Abhishek Mishra

Peter

Language translation, and change of accent for speech-to-speech task using diffusion model

May 04, 2025

Abhishek Mishra, Ritesh Sur Chowdhury, Vartul Bahuguna, Isha Pandey, Ganesh Ramakrishnan

Abstract:Speech-to-speech translation (S2ST) aims to convert spoken input in one language to spoken output in another, typically focusing on either language translation or accent adaptation. However, effective cross-cultural communication requires handling both aspects simultaneously: translating content while adapting the speaker's accent to match the target language context. In this work, we propose a unified approach for simultaneous speech translation and change of accent, a task that remains underexplored in current literature. Our method reformulates the problem as a conditional generation task, where target speech is generated based on phonemes and guided by target speech features. Leveraging the power of diffusion models, known for high-fidelity generative capabilities, we adapt text-to-image diffusion strategies by conditioning on source speech transcriptions and generating Mel spectrograms representing the target speech with desired linguistic and accentual attributes. This integrated framework enables joint optimization of translation and accent adaptation, offering a more parameter-efficient and effective model compared to traditional pipelines.

Guardians of Generation: Dynamic Inference-Time Copyright Shielding with Adaptive Guidance for AI Image Generation

Mar 19, 2025

Soham Roy, Abhishek Mishra, Shirish Karande, Murari Mandal

Abstract:Modern text-to-image generative models can inadvertently reproduce copyrighted content memorized in their training data, raising serious concerns about potential copyright infringement. We introduce Guardians of Generation, a model-agnostic, inference-time framework for dynamic copyright shielding in AI image generation. Our approach requires no retraining or modification of the generative model weights, instead integrating seamlessly with existing diffusion pipelines. It augments the generation process with an adaptive guidance mechanism comprising three components: a detection module, a prompt rewriting module, and a guidance adjustment module. The detection module monitors user prompts and intermediate generation steps to identify features indicative of copyrighted content before they manifest in the final output. If such content is detected, the prompt rewriting module dynamically transforms the user's prompt by sanitizing or replacing references that could trigger copyrighted material while preserving the prompt's intended semantics. The guidance adjustment module steers the diffusion process away from flagged content by modulating the model's sampling trajectory. Together, these components form a robust shield that enables a tunable balance between preserving creative fidelity and ensuring copyright compliance. We validate our method on a variety of generative models such as Stable Diffusion, SDXL, and Flux, demonstrating substantial reductions in copyrighted content generation with negligible impact on output fidelity or alignment with user intent. This work provides a practical, plug-and-play safeguard for generative image models, enabling more responsible deployment under real-world copyright constraints. Source code is available at: https://respailab.github.io/gog

Wafer2Spike: Spiking Neural Network for Wafer Map Pattern Classification

Nov 29, 2024

Abhishek Mishra, Suman Kumar, Anush Lingamoorthy, Anup Das, Nagarajan Kandasamy

Abstract:In integrated circuit design, the analysis of wafer map patterns is critical to improve yield and detect manufacturing issues. We develop Wafer2Spike, an architecture for wafer map pattern classification using a spiking neural network (SNN), and demonstrate that a well-trained SNN achieves superior performance compared to deep neural network-based solutions. Wafer2Spike achieves an average classification accuracy of 98% on the WM-811k wafer benchmark dataset. It is also superior to existing approaches for classifying defect patterns that are underrepresented in the original dataset. Wafer2Spike achieves this improved precision with great computational efficiency.

Ethical Implementation of Artificial Intelligence to Select Embryos in In Vitro Fertilization

Apr 30, 2021

Michael Anis Mihdi Afnan, Cynthia Rudin, Vincent Conitzer, Julian Savulescu, Abhishek Mishra, Yanhe Liu, Masoud Afnan

Abstract:AI has the potential to revolutionize many areas of healthcare. Radiology, dermatology, and ophthalmology are some of the areas most likely to be impacted in the near future, and they have received significant attention from the broader research community. But AI techniques are now also starting to be used in in vitro fertilization (IVF), in particular for selecting which embryos to transfer to the woman. The contribution of AI to IVF is potentially significant, but must be done carefully and transparently, as the ethical issues are significant, in part because this field involves creating new people. We first give a brief introduction to IVF and review the use of AI for embryo selection. We discuss concerns with the interpretation of the reported results from scientific and practical perspectives. We then consider the broader ethical issues involved. We discuss in detail the problems that result from the use of black-box methods in this context and advocate strongly for the use of interpretable models. Importantly, there have been no published trials of clinical effectiveness, a problem in both the AI and IVF communities, and we therefore argue that clinical implementation at this point would be premature. Finally, we discuss ways for the broader AI community to become involved to ensure scientifically sound and ethically responsible development of AI in IVF.

* AIES 2021

Image Completion and Extrapolation with Contextual Cycle Consistency

Jun 04, 2020

Sai Hemanth Kasaraneni, Abhishek Mishra

Abstract:Image Completion refers to the task of filling in the missing regions of an image, and Image Extrapolation refers to the task of extending an image at its boundaries while keeping it coherent. Many recent GAN-based works have shown progress on these problems but lack adaptability across the two cases, i.e. a neural network trained to complete interior masked regions does not generalize well to extrapolating over the boundaries, and vice versa. In this paper, we present a technique to train both completion and extrapolation networks concurrently so that each benefits the other. We demonstrate our method's efficiency in completing large missing regions and compare it with the contemporary state-of-the-art baseline.

* This paper has been accepted to 2020 IEEE International Conference on Image Processing (ICIP 2020)

3DFR: A Swift 3D Feature Reductionist Framework for Scene Independent Change Detection

Dec 26, 2019

Murari Mandal, Vansh Dhar, Abhishek Mishra, Santosh Kumar Vipparthi

Abstract:In this paper we propose an end-to-end swift 3D feature reductionist framework (3DFR) for scene independent change detection. The 3DFR framework consists of three feature streams: a swift 3D feature reductionist stream (AvFeat), a contemporary feature stream (ConFeat) and a temporal median feature map. These multilateral foreground/background features are further refined through an encoder-decoder network. As a result, the proposed framework not only detects temporal changes but also learns high-level appearance features. Thus, it incorporates the object semantics for effective change detection. Furthermore, the proposed framework is validated through a scene independent evaluation scheme in order to demonstrate the robustness and generalization capability of the network. The performance of the proposed method is evaluated on the benchmark CDnet 2014 dataset. The experimental results show that the proposed 3DFR network outperforms the state-of-the-art approaches.

* IEEE Signal Process. Letters, vol. 26, no. 12, pp. 1882-1886, 2019
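
The 3DFR abstract above names a temporal median feature map as one of its three feature streams but does not say how it is computed. The sketch below is a minimal illustration under the assumption that it is a per-pixel median over a stack of frames. The function names `temporal_median` and `naive_change_mask` are hypothetical, and the thresholding step is only a simple stand-in to show why a median background is useful; it is not the encoder-decoder network the paper describes.

```python
import numpy as np


def temporal_median(frames: np.ndarray) -> np.ndarray:
    """Per-pixel median over the time axis of a (T, H, W) or (T, H, W, C) stack.

    Assumption: this is one common way to build a background estimate;
    the abstract does not specify the exact computation.
    """
    if frames.ndim < 3:
        raise ValueError("expected a stack of frames with time on axis 0")
    return np.median(frames, axis=0)


def naive_change_mask(frame: np.ndarray, background: np.ndarray,
                      threshold: float = 0.2) -> np.ndarray:
    """Illustrative only: flag pixels that differ from the median background."""
    diff = np.abs(frame.astype(np.float64) - background)
    if diff.ndim == 3:
        # Collapse colour channels by taking the largest per-channel difference.
        diff = diff.max(axis=-1)
    return diff > threshold


if __name__ == "__main__":
    # Static background of 0.1 with a bright object that moves each frame.
    frames = np.full((5, 4, 4), 0.1)
    for t in range(4):
        frames[t, t, t] = 1.0
    background = temporal_median(frames)
    mask = naive_change_mask(frames[0], background)
    print(background.round(2))
    print(mask.astype(int))
```

Because the moving object occupies any given pixel in only a minority of frames, the median suppresses it and recovers the static background, which is the property that makes this kind of map a useful input alongside learned features.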

A K-fold Method for Baseline Estimation in Policy Gradient Algorithms

Jan 03, 2017

Nithyanand Kota, Abhishek Mishra, Sunil Srinivasa, Xi Chen, Pieter Abbeel

Abstract:The high variance of unbiased policy-gradient methods such as VPG and REINFORCE is typically mitigated by adding a baseline. However, fitting the baseline can itself suffer from underfitting or overfitting. In this paper, we develop a K-fold method for baseline estimation in policy gradient algorithms. The parameter K is a baseline estimation hyperparameter that adjusts the bias-variance trade-off in the baseline estimates. We demonstrate the usefulness of our approach via two state-of-the-art policy gradient algorithms on three MuJoCo locomotion control tasks.
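
The abstract above does not spell out the K-fold procedure, so the following is only one plausible reading, sketched under stated assumptions: the samples are split into K folds, and the baseline for each fold is predicted by a model fitted on the other K-1 folds. The ridge-regularised linear baseline, the function names `kfold_baseline` and `advantages`, and the `ridge` and `seed` parameters are all hypothetical choices made for illustration, not details taken from the paper.

```python
import numpy as np


def kfold_baseline(features: np.ndarray, returns: np.ndarray, k: int,
                   ridge: float = 1e-3, seed: int = 0) -> np.ndarray:
    """Out-of-fold baseline predictions for each sample.

    features: (N, D) state features, returns: (N,) empirical returns.
    Assumption: a linear baseline with ridge regularisation stands in for
    whatever function approximator the paper actually uses.
    """
    n = features.shape[0]
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of samples")
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    # Append a constant column so the baseline can learn an offset.
    x = np.hstack([features, np.ones((n, 1))])
    baseline = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        gram = x[train].T @ x[train] + ridge * np.eye(x.shape[1])
        weights = np.linalg.solve(gram, x[train].T @ returns[train])
        # Each sample's baseline comes from a model that never saw it.
        baseline[held_out] = x[held_out] @ weights
    return baseline


def advantages(returns: np.ndarray, baseline: np.ndarray) -> np.ndarray:
    """Return minus baseline, the quantity that weights the policy gradient."""
    return returns - baseline


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.normal(size=(200, 3))
    rets = feats @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
    b = kfold_baseline(feats, rets, k=5)
    print("advantage variance:", advantages(rets, b).var())
    print("return variance:   ", rets.var())
```

In this reading, each baseline value is predicted by a model that was not fitted on the corresponding return, and K controls how much data each fold's model sees.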
