Tianyu Zheng

SimulBench: Evaluating Language Models with Creative Simulation Tasks

Sep 11, 2024
Via arXiv

LIME-M: Less Is More for Evaluation of MLLMs

Sep 10, 2024
Via arXiv

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Sep 04, 2024
Via arXiv

Research Advances and New Paradigms for Biology-inspired Spiking Neural Networks

Aug 26, 2024
Via arXiv

I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm

Aug 15, 2024
Via arXiv

GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models

Jun 24, 2024
Via arXiv

Dynamic Generation of Personalities with Large Language Models

Apr 10, 2024
Via arXiv

MuPT: A Generative Symbolic Music Pretrained Transformer

Apr 10, 2024
Via arXiv

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

Apr 09, 2024
Via arXiv

CodeEditorBench: Evaluating Code Editing Capability of Large Language Models

Apr 06, 2024
Via arXiv