Jianhua Tao

Reject Threshold Adaptation for Open-Set Model Attribution of Deepfake Audio

Dec 02, 2024
Via arXiv

Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS

Nov 27, 2024
Via arXiv

LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis

Nov 24, 2024
Via arXiv

DARNet: Dual Attention Refinement Network with Spatiotemporal Construction for Auditory Attention Detection

Oct 15, 2024
Via arXiv

Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0

Sep 18, 2024
Via arXiv

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech

Sep 18, 2024
Via arXiv

WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification

Sep 18, 2024
Via arXiv

Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation

Sep 14, 2024
Via arXiv

Utilizing Speaker Profiles for Impersonation Audio Detection

Aug 30, 2024
Via arXiv

Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models

Aug 24, 2024
Via arXiv