Puneet Mathur

PlotEdit: Natural Language-Driven Accessible Chart Editing in PDFs via Multimodal LLM Agents

Jan 20, 2025

Multi-LLM Text Summarization

Dec 20, 2024

GUI Agents: A Survey

Dec 18, 2024

VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation

Dec 14, 2024

DynaSaur: Large Language Agents Beyond Predefined Actions

Nov 04, 2024

Survey of User Interface Design and Interaction Techniques in Generative AI Applications

Oct 28, 2024

Taipan: Efficient and Expressive State Space Language Models with Selective Attention

Oct 24, 2024

DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding

Oct 21, 2024

DocSynthv2: A Practical Autoregressive Modeling for Document Generation

Jun 12, 2024

3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos

Mar 28, 2022