Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yuqi Huang

DualDiff: Dual-branch Diffusion Model for Autonomous Driving with Semantic Fusion

May 03, 2025

Haoteng Li, Zhao Yang, Zezhong Qian, Gongpeng Zhao, Yuqi Huang, Jun Yu, Huazheng Zhou, Longjun Liu

Abstract:Accurate and high-fidelity driving scene reconstruction relies on fully leveraging scene information as conditioning. However, existing approaches, which primarily use 3D bounding boxes and binary maps for foreground and background control, fall short in capturing the complexity of the scene and integrating multi-modal information. In this paper, we propose DualDiff, a dual-branch conditional diffusion model designed to enhance multi-view driving scene generation. We introduce Occupancy Ray Sampling (ORS), a semantic-rich 3D representation, alongside numerical driving scene representation, for comprehensive foreground and background control. To improve cross-modal information integration, we propose a Semantic Fusion Attention (SFA) mechanism that aligns and fuses features across modalities. Furthermore, we design a foreground-aware masked (FGM) loss to enhance the generation of tiny objects. DualDiff achieves state-of-the-art performance in FID score, as well as consistently better results in downstream BEV segmentation and 3D object detection tasks.

* 8 pages, 6 figures,

Via

Access Paper or Ask Questions

What can LLM tell us about cities?

Nov 25, 2024

Zhuoheng Li, Yaochen Wang, Zhixue Song, Yuqi Huang, Rui Bao, Guanjie Zheng, Zhenhui Jessie Li

Abstract:This study explores the capabilities of large language models (LLMs) in providing knowledge about cities and regions on a global scale. We employ two methods: directly querying the LLM for target variable values and extracting explicit and implicit features from the LLM correlated with the target variable. Our experiments reveal that LLMs embed a broad but varying degree of knowledge across global cities, with ML models trained on LLM-derived features consistently leading to improved predictive accuracy. Additionally, we observe that LLMs demonstrate a certain level of knowledge across global cities on all continents, but it is evident when they lack knowledge, as they tend to generate generic or random outputs for unfamiliar tasks. These findings suggest that LLMs can offer new opportunities for data-driven decision-making in the study of cities.

Via

Access Paper or Ask Questions

A General Framework for Clustering and Distribution Matching with Bandit Feedback

Sep 08, 2024

Recep Can Yavas, Yuqi Huang, Vincent Y. F. Tan, Jonathan Scarlett

Figure 1 for A General Framework for Clustering and Distribution Matching with Bandit Feedback

Figure 2 for A General Framework for Clustering and Distribution Matching with Bandit Feedback

Figure 3 for A General Framework for Clustering and Distribution Matching with Bandit Feedback

Figure 4 for A General Framework for Clustering and Distribution Matching with Bandit Feedback

Abstract:We develop a general framework for clustering and distribution matching problems with bandit feedback. We consider a $K$-armed bandit model where some subset of $K$ arms is partitioned into $M$ groups. Within each group, the random variable associated to each arm follows the same distribution on a finite alphabet. At each time step, the decision maker pulls an arm and observes its outcome from the random variable associated to that arm. Subsequent arm pulls depend on the history of arm pulls and their outcomes. The decision maker has no knowledge of the distributions of the arms or the underlying partitions. The task is to devise an online algorithm to learn the underlying partition of arms with the least number of arm pulls on average and with an error probability not exceeding a pre-determined value $\delta$. Several existing problems fall under our general framework, including finding $M$ pairs of arms, odd arm identification, and $M$-ary clustering of $K$ arms belong to our general framework. We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $\delta$. Furthermore, we develop a computationally-efficient online algorithm based on the Track-and-Stop method and Frank--Wolfe algorithm, and show that the average number of arm pulls of our algorithm asymptotically matches that of the lower bound. Our refined analysis also uncovers a novel bound on the speed at which the average number of arm pulls of our algorithm converges to the fundamental limit as $\delta$ vanishes.

* 22 pages, submitted to Information Theory Transactions in September 2024

Via

Access Paper or Ask Questions

GazeReader: Detecting Unknown Word Using Webcam for English as a Second Language (ESL) Learners

Mar 18, 2023

Jiexin Ding, Bowen Zhao, Yuqi Huang, Yuntao Wang, Yuanchun Shi

Abstract:Automatic unknown word detection techniques can enable new applications for assisting English as a Second Language (ESL) learners, thus improving their reading experiences. However, most modern unknown word detection methods require dedicated eye-tracking devices with high precision that are not easily accessible to end-users. In this work, we propose GazeReader, an unknown word detection method only using a webcam. GazeReader tracks the learner's gaze and then applies a transformer-based machine learning model that encodes the text information to locate the unknown word. We applied knowledge enhancement including term frequency, part of speech, and named entity recognition to improve the performance. The user study indicates that the accuracy and F1-score of our method were 98.09% and 75.73%, respectively. Lastly, we explored the design scope for ESL reading and discussed the findings.

* This paper has been accepted by ACM CHI 2023

Via

Access Paper or Ask Questions