Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Li Jiang

Victor

JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data

Mar 13, 2025

Runjian Chen, Wenqi Shao, Bo Zhang, Shaoshuai Shi, Li Jiang, Ping Luo

Abstract:Deep-learning-based autonomous driving (AD) perception introduces a promising picture for safe and environment-friendly transportation. However, the over-reliance on real labeled data in LiDAR perception limits the scale of on-road attempts. 3D real world data is notoriously time-and-energy-consuming to annotate and lacks corner cases like rare traffic participants. On the contrary, in simulators like CARLA, generating labeled LiDAR point clouds with corner cases is a piece of cake. However, introducing synthetic point clouds to improve real perception is non-trivial. This stems from two challenges: 1) sample efficiency of simulation datasets 2) simulation-to-real gaps. To overcome both challenges, we propose a plug-and-play method called JiSAM , shorthand for Jittering augmentation, domain-aware backbone and memory-based Sectorized AlignMent. In extensive experiments conducted on the famous AD dataset NuScenes, we demonstrate that, with SOTA 3D object detector, JiSAM is able to utilize the simulation data and only labels on 2.5% available real data to achieve comparable performance to models trained on all real data. Additionally, JiSAM achieves more than 15 mAPs on the objects not labeled in the real training set. We will release models and codes.

Via

Access Paper or Ask Questions

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer

Jan 02, 2025

Jiajun Deng, Tianyu He, Li Jiang, Tianyu Wang, Feras Dayoub, Ian Reid

Abstract:Current 3D Large Multimodal Models (3D LMMs) have shown tremendous potential in 3D-vision-based dialogue and reasoning. However, how to further enhance 3D LMMs to achieve fine-grained scene understanding and facilitate flexible human-agent interaction remains a challenging problem. In this work, we introduce 3D-LLaVA, a simple yet highly powerful 3D LMM designed to act as an intelligent assistant in comprehending, reasoning, and interacting with the 3D world. Unlike existing top-performing methods that rely on complicated pipelines-such as offline multi-view feature extraction or additional task-specific heads-3D-LLaVA adopts a minimalist design with integrated architecture and only takes point clouds as input. At the core of 3D-LLaVA is a new Omni Superpoint Transformer (OST), which integrates three functionalities: (1) a visual feature selector that converts and selects visual tokens, (2) a visual prompt encoder that embeds interactive visual prompts into the visual token space, and (3) a referring mask decoder that produces 3D masks based on text description. This versatile OST is empowered by the hybrid pretraining to obtain perception priors and leveraged as the visual connector that bridges the 3D data to the LLM. After performing unified instruction tuning, our 3D-LLaVA reports impressive results on various benchmarks. The code and model will be released to promote future exploration.

Via

Access Paper or Ask Questions

Are Expressive Models Truly Necessary for Offline RL?

Dec 15, 2024

Guan Wang, Haoyi Niu, Jianxiong Li, Li Jiang, Jianming Hu, Xianyuan Zhan

Figure 1 for Are Expressive Models Truly Necessary for Offline RL?

Figure 2 for Are Expressive Models Truly Necessary for Offline RL?

Figure 3 for Are Expressive Models Truly Necessary for Offline RL?

Figure 4 for Are Expressive Models Truly Necessary for Offline RL?

Abstract:Among various branches of offline reinforcement learning (RL) methods, goal-conditioned supervised learning (GCSL) has gained increasing popularity as it formulates the offline RL problem as a sequential modeling task, therefore bypassing the notoriously difficult credit assignment challenge of value learning in conventional RL paradigm. Sequential modeling, however, requires capturing accurate dynamics across long horizons in trajectory data to ensure reasonable policy performance. To meet this requirement, leveraging large, expressive models has become a popular choice in recent literature, which, however, comes at the cost of significantly increased computation and inference latency. Contradictory yet promising, we reveal that lightweight models as simple as shallow 2-layer MLPs, can also enjoy accurate dynamics consistency and significantly reduced sequential modeling errors against large expressive models by adopting a simple recursive planning scheme: recursively planning coarse-grained future sub-goals based on current and target information, and then executes the action with a goal-conditioned policy learned from data rela-beled with these sub-goal ground truths. We term our method Recursive Skip-Step Planning (RSP). Simple yet effective, RSP enjoys great efficiency improvements thanks to its lightweight structure, and substantially outperforms existing methods, reaching new SOTA performances on the D4RL benchmark, especially in multi-stage long-horizon tasks.

* Instead of relying on expressive models, shallow MLPs can also excel in long sequential decision-making tasks with Recursive Skip-Step Planning (RSP)

Via

Access Paper or Ask Questions

Using Zone Inflation and Volume Transfer to Design a Fabric-based Pneumatic Exosuit with both Efficiency and Wearability

Oct 15, 2024

Chendong Liu, Dapeng Yang, Jiachen Chen, Yiming Dai, Li Jiang, Shengquan Xie, Hong Liu

Abstract:Fabric-based pneumatic exosuits have a broad application prospect due to their good human-machine interaction performance, but their structural design paradigm has not yet been finalized and requires in-depth research. This paper proposes the concepts of zone inflation and volume transfer for the design of a fabric-based pneumatic exosuit with both efficiency and wearability. The meaning of zone inflation is to divide the inflation area of pneumatic exosuit into inflation-deflation zone and inflation-holding zone which can reduce the consumption of compressed air and improve efficiency. Volume transfer, a strategic distribution method of inflatable regions inside the garment, can effectively enhance the wearability of the exosuit. Using inexpensive thermoplastic polyurethane film and clothing fabric, the exosuit is made by heat pressing and sewing. The exosuit has a response time of 0.5s, a stress area of 1500mm2, and a profile of only 32mm, which can be hidden inside common clothing. A mathematical model is developed to predict the output torque of the exosuit with an error of 3.6%. Mechanical experiments show that the exosuit outputs a torque of 9.1Nm at a pressure of 100kPa. Surface electromyography experiments show that the exosuit can provide users with a boost from sitting to standing, with an average reduction in electromyography signals of 14.95%. The exosuit designed using these methods synthesizes efficiency and wearability and is expected to be an ideal paradigm for fabric-based pneumatic exosuits.

Via

Access Paper or Ask Questions

MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment

Jul 31, 2024

Anurag Das, Xinting Hu, Li Jiang, Bernt Schiele

Abstract:Recent approaches have shown that large-scale vision-language models such as CLIP can improve semantic segmentation performance. These methods typically aim for pixel-level vision-language alignment, but often rely on low resolution image features from CLIP, resulting in class ambiguities along boundaries. Moreover, the global scene representations in CLIP text embeddings do not directly correlate with the local and detailed pixel-level features, making meaningful alignment more difficult. To address these limitations, we introduce MTA-CLIP, a novel framework employing mask-level vision-language alignment. Specifically, we first propose Mask-Text Decoder that enhances the mask representations using rich textual data with the CLIP language model. Subsequently, it aligns mask representations with text embeddings using Mask-to-Text Contrastive Learning. Furthermore, we introduce MaskText Prompt Learning, utilizing multiple context-specific prompts for text embeddings to capture diverse class representations across masks. Overall, MTA-CLIP achieves state-of-the-art, surpassing prior works by an average of 2.8% and 1.3% on on standard benchmark datasets, ADE20k and Cityscapes, respectively.

* accepted at ECCV 2024

Via

Access Paper or Ask Questions

A Generalized Version of Chung's Lemma and its Applications

Jun 09, 2024

Li Jiang, Xiao Li, Andre Milzarek, Junwen Qiu

Abstract:Chung's lemma is a classical tool for establishing asymptotic convergence rates of (stochastic) optimization methods under strong convexity-type assumptions and appropriate polynomial diminishing step sizes. In this work, we develop a generalized version of Chung's lemma, which provides a simple non-asymptotic convergence framework for a more general family of step size rules. We demonstrate broad applicability of the proposed generalized Chung's lemma by deriving tight non-asymptotic convergence rates for a large variety of stochastic methods. In particular, we obtain partially new non-asymptotic complexity results for stochastic optimization methods, such as stochastic gradient descent and random reshuffling, under a general $(\theta,\mu)$-Polyak-Lojasiewicz (PL) condition and for various step sizes strategies, including polynomial, constant, exponential, and cosine step sizes rules. Notably, as a by-product of our analysis, we observe that exponential step sizes can adapt to the objective function's geometry, achieving the optimal convergence rate without requiring exact knowledge of the underlying landscape. Our results demonstrate that the developed variant of Chung's lemma offers a versatile, systematic, and streamlined approach to establish non-asymptotic convergence rates under general step size rules.

* 43 pages, 5 figures

Via

Access Paper or Ask Questions

Image Copy-Move Forgery Detection and Localization Scheme: How to Avoid Missed Detection and False Alarm

Jun 05, 2024

Li Jiang, Zhaowei Lu, Yuebing Gao, Yifan Wang

Abstract:Image copy-move is an operation that replaces one part of the image with another part of the same image, which can be used for illegal purposes due to the potential semantic changes. Recent studies have shown that keypoint-based algorithms achieved excellent and robust localization performance even when small or smooth tampered areas were involved. However, when the input image is low-resolution, most existing keypoint-based algorithms are difficult to generate sufficient keypoints, resulting in more missed detections. In addition, existing algorithms are usually unable to distinguish between Similar but Genuine Objects (SGO) images and tampered images, resulting in more false alarms. This is mainly due to the lack of further verification of local homography matrix in forgery localization stage. To tackle these problems, this paper firstly proposes an excessive keypoint extraction strategy to overcome missed detection. Subsequently, a group matching algorithm is used to speed up the matching of excessive keypoints. Finally, a new iterative forgery localization algorithm is introduced to quickly form pixel-level localization results while ensuring a lower false alarm. Extensive experimental results show that our scheme has superior performance than state-of-the-art algorithms in overcoming missed detection and false alarm. Our code is available at https://github.com/LUZW1998/CMFDL.

Via

Access Paper or Ask Questions

Hummer: Towards Limited Competitive Preference Dataset

May 21, 2024

Li Jiang, Yusen Wu, Junwu Xiong, Jingqing Ruan, Yichuan Ding, Qingpei Guo, Zujie Wen, Jun Zhou, Xiaotie Deng

Abstract:Preference datasets are essential for incorporating human preferences into pre-trained language models, playing a key role in the success of Reinforcement Learning from Human Feedback. However, these datasets often demonstrate conflicting alignment objectives, leading to increased vulnerability to jailbreak attacks and challenges in adapting downstream tasks to prioritize specific alignment objectives without negatively impacting others. In this work, we introduce a novel statistical metric, Alignment Dimension Conflict, to quantify the degree of conflict within preference datasets. We then present \texttt{Hummer} and its fine-grained variant, \texttt{Hummer-F}, as innovative pairwise preference datasets with reduced-conflict alignment objectives. \texttt{Hummer} is built based on UltraFeedback and is enhanced by AI feedback from GPT-4, marking as the first preference dataset aimed at reducing the competition between alignment objectives. Furthermore, we develop reward models, HummerRM and HummerRM-F, which employ a hybrid sampling approach to balance diverse alignment objectives effectively. This sampling method positions HummerRM as an ideal model for domain-specific further fine-tuning and reducing vulnerabilities to attacks.

* 9 pages, 5 figures

Via

Access Paper or Ask Questions

Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities

May 17, 2024

Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu(+4 more)

Figure 1 for Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities

Figure 2 for Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities

Figure 3 for Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities

Figure 4 for Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities

Abstract:Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities, leading to great progress in many fields. The advancement of LLM techniques also offers promising opportunities to automate many tasks in the telecommunication (telecom) field. After pre-training and fine-tuning, LLMs can perform diverse downstream tasks based on human instructions, paving the way to artificial general intelligence (AGI)-enabled 6G. Given the great potential of LLM technologies, this work aims to provide a comprehensive overview of LLM-enabled telecom networks. In particular, we first present LLM fundamentals, including model architecture, pre-training, fine-tuning, inference and utilization, model evaluation, and telecom deployment. Then, we introduce LLM-enabled key techniques and telecom applications in terms of generation, classification, optimization, and prediction problems. Specifically, the LLM-enabled generation applications include telecom domain knowledge, code, and network configuration generation. After that, the LLM-based classification applications involve network security, text, image, and traffic classification problems. Moreover, multiple LLM-enabled optimization techniques are introduced, such as automated reward function design for reinforcement learning and verbal reinforcement learning. Furthermore, for LLM-aided prediction problems, we discussed time-series prediction models and multi-modality prediction problems for telecom. Finally, we highlight the challenges and identify the future directions of LLM-enabled telecom networks.

Via

Access Paper or Ask Questions

Unified Language-driven Zero-shot Domain Adaptation

Apr 10, 2024

Senqiao Yang, Zhuotao Tian, Li Jiang, Jiaya Jia

Abstract:This paper introduces Unified Language-driven Zero-shot Domain Adaptation (ULDA), a novel task setting that enables a single model to adapt to diverse target domains without explicit domain-ID knowledge. We identify the constraints in the existing language-driven zero-shot domain adaptation task, particularly the requirement for domain IDs and domain-specific models, which may restrict flexibility and scalability. To overcome these issues, we propose a new framework for ULDA, consisting of Hierarchical Context Alignment (HCA), Domain Consistent Representation Learning (DCRL), and Text-Driven Rectifier (TDR). These components work synergistically to align simulated features with target text across multiple visual levels, retain semantic correlations between different regional representations, and rectify biases between simulated and real target visual features, respectively. Our extensive empirical evaluations demonstrate that this framework achieves competitive performance in both settings, surpassing even the model that requires domain-ID, showcasing its superiority and generalization ability. The proposed method is not only effective but also maintains practicality and efficiency, as it does not introduce additional computational costs during inference. Our project page is https://senqiaoyang.com/project/ULDA .

* Accepted by CVPR 2024

Via

Access Paper or Ask Questions