Chen Feng

$D^{3}$ETOR: Debate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing for Weakly-Supervised Camouflaged Object Detection with Scribble Annotations

Dec 23, 2025
Via arXiv

Flexible and Efficient Spatio-Temporal Transformer for Sequential Visual Place Recognition

Oct 05, 2025
Via arXiv

Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models

Jul 10, 2025
Via arXiv

OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding

Jul 03, 2025
Via arXiv

Compressed Video Super-Resolution based on Hierarchical Encoding

Jun 17, 2025
Via arXiv

From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models

Jun 11, 2025
Via arXiv

GARF: Learning Generalizable 3D Reassembly for Real-World Fractures

Apr 07, 2025
Via arXiv

Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework

Mar 11, 2025
Via arXiv

When Language and Vision Meet Road Safety: Leveraging Multimodal Large Language Models for Video-Based Traffic Accident Analysis

Jan 17, 2025
Via arXiv

Extrapolated Urban View Synthesis Benchmark

Dec 10, 2024
Via arXiv