Guang Liu

CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models

Oct 24, 2024

Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data

Oct 24, 2024

ReTok: Replacing Tokenizer to Enhance Representation Efficiency in Large Language Model

Oct 06, 2024

Emu3: Next-Token Prediction is All You Need

Sep 27, 2024

Aquila2 Technical Report

Aug 14, 2024

AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

Aug 13, 2024

InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning

Aug 09, 2024

CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

Jan 26, 2024

SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation

Jan 25, 2024

TACO: Topics in Algorithmic COde generation dataset

Dec 27, 2023