Rishibha Bansal

Read it to me: An emotionally aware Speech Narration Application

Sep 06, 2022

Rishibha Bansal

Abstract: In this work we perform emotional style transfer on audio. In particular, the MelGAN-VC architecture is explored for various emotion-pair transfers. The generated audio is then classified using an LSTM-based audio emotion classifier. We find that "sad" audio is generated better than "happy" or "anger" audio, since people express sadness in similar ways.

SAVCHOI: Detecting Suspicious Activities using Dense Video Captioning with Human Object Interactions

Jul 24, 2022

Ansh Mittal, Shuvam Ghosal, Rishibha Bansal, Dat Ngyuyen

Abstract: Detecting suspicious activities in surveillance videos has been a longstanding problem, and failing to detect them makes crimes harder to detect. The authors propose a novel approach for detecting and summarizing the suspicious activities taking place in surveillance videos. They also create ground-truth summaries for the UCF-Crime video dataset. Further, the authors test existing state-of-the-art Dense Video Captioning algorithms on a subset of this dataset and propose a model for this task that leverages Human-Object Interaction models for the visual features. They observe that this formulation of Dense Captioning outperforms earlier approaches by a significant margin. The authors also perform an ablation analysis of the dataset and the model and report their findings.

* 11 pages, 4 figures, 5 tables
