Anthony G. Cohn

Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores

Oct 04, 2024
Via arXiv

Exploring Spatial Representations in the Historical Lake District Texts with LLM-based Relation Extraction

Jun 20, 2024
Via arXiv

Dishonesty in Helpful and Harmless Alignment

Jun 04, 2024
Via arXiv

Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning

May 23, 2024
Via arXiv

Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark

Jan 08, 2024
Via arXiv

The ARRT of Language-Models-as-a-Service: Overview of a New Paradigm and its Challenges

Sep 28, 2023
Via arXiv

Object-agnostic Affordance Categorization via Unsupervised Learning of Graph Embeddings

Mar 30, 2023
Via arXiv

A Hierarchical Framework for Collaborative Artificial Intelligence

Dec 14, 2022
Via arXiv

Exploring the GLIDE model for Human Action-effect Prediction

Aug 01, 2022
Via arXiv

Scribble-Supervised Semantic Segmentation by Uncertainty Reduction on Neural Representation and Self-Supervision on Neural Eigenspace

Feb 19, 2021
Via arXiv