Minh Tran

Social-MAE: A Transformer-Based Multimodal Autoencoder for Face and Voice

Aug 24, 2025
Via arXiv

A Robust BERT-Based Deep Learning Model for Automated Cancer Type Extraction from Unstructured Pathology Reports

Aug 21, 2025
Via arXiv

DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion

Apr 05, 2025
Via arXiv

Negative to Positive Co-learning with Aggressive Modality Dropout

Jan 01, 2025
Via arXiv

A2VIS: Amodal-Aware Approach to Video Instance Segmentation

Dec 02, 2024
Via arXiv

Amodal Instance Segmentation with Diffusion Shape Prior Estimation

Sep 26, 2024
Via arXiv

HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model

Jun 01, 2024
Via arXiv

S3Former: Self-supervised High-resolution Transformer for Solar PV Profiling

May 07, 2024
Via arXiv

CarcassFormer: An End-to-end Transformer-based Framework for Simultaneous Localization, Segmentation and Classification of Poultry Carcass Defect

Apr 17, 2024
Via arXiv

Dyadic Interaction Modeling for Social Behavior Generation

Mar 27, 2024
Via arXiv