Hao Feng

Summer

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Dec 31, 2024
Via arXiv

LiRCDepth: Lightweight Radar-Camera Depth Estimation via Knowledge Distillation and Uncertainty Guidance

Dec 20, 2024
Via arXiv

SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training

Oct 20, 2024
Via arXiv

EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models

Oct 20, 2024
Via arXiv

GET-UP: GEomeTric-aware Depth Estimation with Radar Points UPsampling

Sep 02, 2024
Via arXiv

AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding

Aug 30, 2024
Via arXiv

LaneTCA: Enhancing Video Lane Detection with Temporal Context Aggregation

Aug 25, 2024
Via arXiv

Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression

Jul 05, 2024
Via arXiv

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding

Jul 02, 2024
Via arXiv

CaFNet: A Confidence-Driven Framework for Radar Camera Depth Estimation

Jun 30, 2024
Via arXiv