Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ting Huang

DC-Scene: Data-Centric Learning for 3D Scene Understanding

May 21, 2025

Ting Huang, Zeyu Zhang, Ruicheng Zhang, Yang Zhao

Abstract:3D scene understanding plays a fundamental role in vision applications such as robotics, autonomous driving, and augmented reality. However, advancing learning-based 3D scene understanding remains challenging due to two key limitations: (1) the large scale and complexity of 3D scenes lead to higher computational costs and slower training compared to 2D counterparts; and (2) high-quality annotated 3D datasets are significantly scarcer than those available for 2D vision. These challenges underscore the need for more efficient learning paradigms. In this work, we propose DC-Scene, a data-centric framework tailored for 3D scene understanding, which emphasizes enhancing data quality and training efficiency. Specifically, we introduce a CLIP-driven dual-indicator quality (DIQ) filter, combining vision-language alignment scores with caption-loss perplexity, along with a curriculum scheduler that progressively expands the training pool from the top 25% to 75% of scene-caption pairs. This strategy filters out noisy samples and significantly reduces dependence on large-scale labeled 3D data. Extensive experiments on ScanRefer and Nr3D demonstrate that DC-Scene achieves state-of-the-art performance (86.1 CIDEr with the top-75% subset vs. 85.4 with the full dataset) while reducing training cost by approximately two-thirds, confirming that a compact set of high-quality samples can outperform exhaustive training. Code will be available at https://github.com/AIGeeksGroup/DC-Scene.

Via

Access Paper or Ask Questions

3D CoCa: Contrastive Learners are 3D Captioners

Apr 13, 2025

Ting Huang, Zeyu Zhang, Yemin Wang, Hao Tang

Abstract:3D captioning, which aims to describe the content of 3D scenes in natural language, remains highly challenging due to the inherent sparsity of point clouds and weak cross-modal alignment in existing methods. To address these challenges, we propose 3D CoCa, a novel unified framework that seamlessly combines contrastive vision-language learning with 3D caption generation in a single architecture. Our approach leverages a frozen CLIP vision-language backbone to provide rich semantic priors, a spatially-aware 3D scene encoder to capture geometric context, and a multi-modal decoder to generate descriptive captions. Unlike prior two-stage methods that rely on explicit object proposals, 3D CoCa jointly optimizes contrastive and captioning objectives in a shared feature space, eliminating the need for external detectors or handcrafted proposals. This joint training paradigm yields stronger spatial reasoning and richer semantic grounding by aligning 3D and textual representations. Extensive experiments on the ScanRefer and Nr3D benchmarks demonstrate that 3D CoCa significantly outperforms current state-of-the-arts by 10.2% and 5.76% in CIDEr at 0.5IoU, respectively. Code will be available at https://github.com/AIGeeksGroup/3DCoCa.

Via

Access Paper or Ask Questions

Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs

Mar 07, 2025

Ling Team, Binwei Zeng, Chao Huang, Chao Zhang, Changxin Tian, Cong Chen, Dingnan Jin, Feng Yu, Feng Zhu, Feng Yuan(+64 more)

Figure 1 for Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs

Figure 2 for Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs

Figure 3 for Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs

Figure 4 for Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs

Abstract:In this technical report, we tackle the challenges of training large-scale Mixture of Experts (MoE) models, focusing on overcoming cost inefficiency and resource limitations prevalent in such systems. To address these issues, we present two differently sized MoE large language models (LLMs), namely Ling-Lite and Ling-Plus (referred to as "Bailing" in Chinese, spelled B\v{a}il\'ing in Pinyin). Ling-Lite contains 16.8 billion parameters with 2.75 billion activated parameters, while Ling-Plus boasts 290 billion parameters with 28.8 billion activated parameters. Both models exhibit comparable performance to leading industry benchmarks. This report offers actionable insights to improve the efficiency and accessibility of AI development in resource-constrained settings, promoting more scalable and sustainable technologies. Specifically, to reduce training costs for large-scale MoE models, we propose innovative methods for (1) optimization of model architecture and training processes, (2) refinement of training anomaly handling, and (3) enhancement of model evaluation efficiency. Additionally, leveraging high-quality data generated from knowledge graphs, our models demonstrate superior capabilities in tool use compared to other models. Ultimately, our experimental findings demonstrate that a 300B MoE LLM can be effectively trained on lower-performance devices while achieving comparable performance to models of a similar scale, including dense and MoE models. Compared to high-performance devices, utilizing a lower-specification hardware system during the pre-training phase demonstrates significant cost savings, reducing computing costs by approximately 20%. The models can be accessed at https://huggingface.co/inclusionAI.

* 34 pages

Via

Access Paper or Ask Questions

Multi-Agent Simulator Drives Language Models for Legal Intensive Interaction

Feb 08, 2025

Shengbin Yue, Ting Huang, Zheng Jia, Siyuan Wang, Shujun Liu, Yun Song, Xuanjing Huang, Zhongyu Wei

Abstract:Large Language Models (LLMs) have significantly advanced legal intelligence, but the scarcity of scenario data impedes the progress toward interactive legal scenarios. This paper introduces a Multi-agent Legal Simulation Driver (MASER) to scalably generate synthetic data by simulating interactive legal scenarios. Leveraging real-legal case sources, MASER ensures the consistency of legal attributes between participants and introduces a supervisory mechanism to align participants' characters and behaviors as well as addressing distractions. A Multi-stage Interactive Legal Evaluation (MILE) benchmark is further constructed to evaluate LLMs' performance in dynamic legal scenarios. Extensive experiments confirm the effectiveness of our framework.

* Accepted by NAACL 2025

Via

Access Paper or Ask Questions

RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning

Apr 12, 2024

Hongqiao Lian, Zeyuan Ma, Hongshu Guo, Ting Huang, Yue-Jiao Gong

Abstract:Solving multimodal optimization problems (MMOP) requires finding all optimal solutions, which is challenging in limited function evaluations. Although existing works strike the balance of exploration and exploitation through hand-crafted adaptive strategies, they require certain expert knowledge, hence inflexible to deal with MMOP with different properties. In this paper, we propose RLEMMO, a Meta-Black-Box Optimization framework, which maintains a population of solutions and incorporates a reinforcement learning agent for flexibly adjusting individual-level searching strategies to match the up-to-date optimization status, hence boosting the search performance on MMOP. Concretely, we encode landscape properties and evolution path information into each individual and then leverage attention networks to advance population information sharing. With a novel reward mechanism that encourages both quality and diversity, RLEMMO can be effectively trained using a policy gradient algorithm. The experimental results on the CEC2013 MMOP benchmark underscore the competitive optimization performance of RLEMMO against several strong baselines.

* Accepted as full paper at GECCO 2024

Via

Access Paper or Ask Questions

InternLM2 Technical Report

Mar 26, 2024

Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu(+90 more)

Abstract:The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context modeling, and open-ended subjective evaluations through innovative pre-training and optimization techniques. The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types including text, code, and long-context data. InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in pre-training and fine-tuning stages, exhibiting remarkable performance on the 200k ``Needle-in-a-Haystack" test. InternLM2 is further aligned using Supervised Fine-Tuning (SFT) and a novel Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF) strategy that addresses conflicting human preferences and reward hacking. By releasing InternLM2 models in different training stages and model sizes, we provide the community with insights into the model's evolution.

Via

Access Paper or Ask Questions

Leap-LSTM: Enhancing Long Short-Term Memory for Text Categorization

May 28, 2019

Ting Huang, Gehui Shen, Zhi-Hong Deng

Figure 1 for Leap-LSTM: Enhancing Long Short-Term Memory for Text Categorization

Figure 2 for Leap-LSTM: Enhancing Long Short-Term Memory for Text Categorization

Figure 3 for Leap-LSTM: Enhancing Long Short-Term Memory for Text Categorization

Figure 4 for Leap-LSTM: Enhancing Long Short-Term Memory for Text Categorization

Abstract:Recurrent Neural Networks (RNNs) are widely used in the field of natural language processing (NLP), ranging from text categorization to question answering and machine translation. However, RNNs generally read the whole text from beginning to end or vice versa sometimes, which makes it inefficient to process long texts. When reading a long document for a categorization task, such as topic categorization, large quantities of words are irrelevant and can be skipped. To this end, we propose Leap-LSTM, an LSTM-enhanced model which dynamically leaps between words while reading texts. At each step, we utilize several feature encoders to extract messages from preceding texts, following texts and the current word, and then determine whether to skip the current word. We evaluate Leap-LSTM on several text categorization tasks: sentiment analysis, news categorization, ontology classification and topic classification, with five benchmark data sets. The experimental results show that our model reads faster and predicts better than standard LSTM. Compared to previous models which can also skip words, our model achieves better trade-offs between performance and efficiency.

* Accepted by IJCAI 2019, 7 pages, 3 figures

Via

Access Paper or Ask Questions

Learning to Compose over Tree Structures via POS Tags

Aug 21, 2018

Gehui Shen, Zhi-Hong Deng, Ting Huang, Xi Chen

Figure 1 for Learning to Compose over Tree Structures via POS Tags

Figure 2 for Learning to Compose over Tree Structures via POS Tags

Figure 3 for Learning to Compose over Tree Structures via POS Tags

Figure 4 for Learning to Compose over Tree Structures via POS Tags

Abstract:Recursive Neural Network (RecNN), a type of models which compose words or phrases recursively over syntactic tree structures, has been proven to have superior ability to obtain sentence representation for a variety of NLP tasks. However, RecNN is born with a thorny problem that a shared compositional function for each node of trees can't capture the complex semantic compositionality so that the expressive power of model is limited. In this paper, in order to address this problem, we propose Tag-Guided HyperRecNN/TreeLSTM (TG-HRecNN/TreeLSTM), which introduces hypernetwork into RecNNs to take as inputs Part-of-Speech (POS) tags of word/phrase and generate the semantic composition parameters dynamically. Experimental results on five datasets for two typical NLP tasks show proposed models both obtain significant improvement compared with RecNN and TreeLSTM consistently. Our TG-HTreeLSTM outperforms all existing RecNN-based models and achieves or is competitive with state-of-the-art on four sentence classification benchmarks. The effectiveness of our models is also demonstrated by qualitative analysis.

Via

Access Paper or Ask Questions

A Novel Framework for Recurrent Neural Networks with Enhancing Information Processing and Transmission between Units

Jun 02, 2018

Xi Chen, Zhihong Deng, Gehui Shen, Ting Huang

Figure 1 for A Novel Framework for Recurrent Neural Networks with Enhancing Information Processing and Transmission between Units

Figure 2 for A Novel Framework for Recurrent Neural Networks with Enhancing Information Processing and Transmission between Units

Figure 3 for A Novel Framework for Recurrent Neural Networks with Enhancing Information Processing and Transmission between Units

Figure 4 for A Novel Framework for Recurrent Neural Networks with Enhancing Information Processing and Transmission between Units

Abstract:This paper proposes a novel framework for recurrent neural networks (RNNs) inspired by the human memory models in the field of cognitive neuroscience to enhance information processing and transmission between adjacent RNNs' units. The proposed framework for RNNs consists of three stages that is working memory, forget, and long-term store. The first stage includes taking input data into sensory memory and transferring it to working memory for preliminary treatment. And the second stage mainly focuses on proactively forgetting the secondary information rather than the primary in the working memory. And finally, we get the long-term store normally using some kind of RNN's unit. Our framework, which is generalized and simple, is evaluated on 6 datasets which fall into 3 different tasks, corresponding to text classification, image classification and language modelling. Experiments reveal that our framework can obviously improve the performance of traditional recurrent neural networks. And exploratory task shows the ability of our framework of correctly forgetting the secondary information.

Via

Access Paper or Ask Questions

On unbiased performance evaluation for protein inference

Nov 29, 2012

Zengyou He, Ting Huang, Peijun Zhu

Figure 1 for On unbiased performance evaluation for protein inference

Figure 2 for On unbiased performance evaluation for protein inference

Abstract:This letter is a response to the comments of Serang (2012) on Huang and He (2012) in Bioinformatics. Serang (2012) claimed that the parameters for the Fido algorithm should be specified using the grid search method in Serang et al. (2010) so as to generate a deserved accuracy in performance comparison. It seems that it is an argument on parameter tuning. However, it is indeed the issue of how to conduct an unbiased performance evaluation for comparing different protein inference algorithms. In this letter, we would explain why we don't use the grid search for parameter selection in Huang and He (2012) and show that this procedure may result in an over-estimated performance that is unfair to competing algorithms. In fact, this issue has also been pointed out by Li and Radivojac (2012).

Via

Access Paper or Ask Questions