Shafiq Joty

Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings

Mar 19, 2025
Via arXiv

Multi$^2$: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing

Feb 27, 2025
Via arXiv

Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding

Feb 17, 2025
Via arXiv

Demystifying Domain-adaptive Post-training for Financial LLMs

Jan 09, 2025
Via arXiv

StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs

Dec 23, 2024
Via arXiv

Preference Optimization for Reasoning with Pseudo Feedback

Nov 25, 2024
Via arXiv

Investigating Factuality in Long-Form Text Generation: The Roles of Self-Known and Self-Unknown

Nov 24, 2024
Via arXiv

On Positional Bias of Faithfulness for Long-form Summarization

Oct 31, 2024
Via arXiv

JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking

Oct 31, 2024
Via arXiv

P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains

Oct 11, 2024
Via arXiv