Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ling Feng

Multiple Descents in Deep Learning as a Sequence of Order-Chaos Transitions

May 26, 2025

Wenbo Wei, Nicholas Chong Jia Le, Choy Heng Lai, Ling Feng

Abstract:We observe a novel 'multiple-descent' phenomenon during the training process of LSTM, in which the test loss goes through long cycles of up and down trend multiple times after the model is overtrained. By carrying out asymptotic stability analysis of the models, we found that the cycles in test loss are closely associated with the phase transition process between order and chaos, and the local optimal epochs are consistently at the critical transition point between the two phases. More importantly, the global optimal epoch occurs at the first transition from order to chaos, where the 'width' of the 'edge of chaos' is the widest, allowing the best exploration of better weight configurations for learning.

Via

Access Paper or Ask Questions

AdaptThink: Reasoning Models Can Learn When to Think

May 19, 2025

Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, Juanzi Li

Abstract:Recently, large reasoning models have achieved impressive performance on various tasks by employing human-like deep thinking. However, the lengthy thinking process substantially increases inference overhead, making efficiency a critical bottleneck. In this work, we first demonstrate that NoThinking, which prompts the reasoning model to skip thinking and directly generate the final solution, is a better choice for relatively simple tasks in terms of both performance and efficiency. Motivated by this, we propose AdaptThink, a novel RL algorithm to teach reasoning models to choose the optimal thinking mode adaptively based on problem difficulty. Specifically, AdaptThink features two core components: (1) a constrained optimization objective that encourages the model to choose NoThinking while maintaining the overall performance; (2) an importance sampling strategy that balances Thinking and NoThinking samples during on-policy training, thereby enabling cold start and allowing the model to explore and exploit both thinking modes throughout the training process. Our experiments indicate that AdaptThink significantly reduces the inference costs while further enhancing performance. Notably, on three math datasets, AdaptThink reduces the average response length of DeepSeek-R1-Distill-Qwen-1.5B by 53% and improves its accuracy by 2.4%, highlighting the promise of adaptive thinking-mode selection for optimizing the balance between reasoning quality and efficiency. Our codes and models are available at https://github.com/THU-KEG/AdaptThink.

Via

Access Paper or Ask Questions

MISE: Meta-knowledge Inheritance for Social Media-Based Stressor Estimation

May 03, 2025

Xin Wang, Ling Feng, Huijun Zhang, Lei Cao, Kaisheng Zeng, Qi Li, Yang Ding, Yi Dai, David Clifton

Abstract:Stress haunts people in modern society, which may cause severe health issues if left unattended. With social media becoming an integral part of daily life, leveraging social media to detect stress has gained increasing attention. While the majority of the work focuses on classifying stress states and stress categories, this study introduce a new task aimed at estimating more specific stressors (like exam, writing paper, etc.) through users' posts on social media. Unfortunately, the diversity of stressors with many different classes but a few examples per class, combined with the consistent arising of new stressors over time, hinders the machine understanding of stressors. To this end, we cast the stressor estimation problem within a practical scenario few-shot learning setting, and propose a novel meta-learning based stressor estimation framework that is enhanced by a meta-knowledge inheritance mechanism. This model can not only learn generic stressor context through meta-learning, but also has a good generalization ability to estimate new stressors with little labeled data. A fundamental breakthrough in our approach lies in the inclusion of the meta-knowledge inheritance mechanism, which equips our model with the ability to prevent catastrophic forgetting when adapting to new stressors. The experimental results show that our model achieves state-of-the-art performance compared with the baselines. Additionally, we construct a social media-based stressor estimation dataset that can help train artificial intelligence models to facilitate human well-being. The dataset is now public at \href{https://www.kaggle.com/datasets/xinwangcs/stressor-cause-of-mental-health-problem-dataset}{\underline{Kaggle}} and \href{https://huggingface.co/datasets/XinWangcs/Stressor}{\underline{Hugging Face}}.

* WWW2025, Oral Presentation

Via

Access Paper or Ask Questions

DuckSegmentation: A segmentation model based on the AnYue Hemp Duck Dataset

Mar 27, 2025

Ling Feng, Tianyu Xie, Wei Ma, Ruijie Fu, Yingxiao Zhang, Jun Li, Bei Zhou

Abstract:The modernization of smart farming is a way to improve agricultural production efficiency, and improve the agricultural production environment. Although many large models have achieved high accuracy in the task of object recognition and segmentation, they cannot really be put into use in the farming industry due to their own poor interpretability and limitations in computational volume. In this paper, we built AnYue Shelduck Dateset, which contains a total of 1951 Shelduck datasets, and performed target detection and segmentation annotation with the help of professional annotators. Based on AnYue ShelduckDateset, this paper describes DuckProcessing, an efficient and powerful module for duck identification based on real shelduckfarms. First of all, using the YOLOv8 module designed to divide the mahjong between them, Precision reached 98.10%, Recall reached 96.53% and F1 score reached 0.95 on the test set. Again using the DuckSegmentation segmentation model, DuckSegmentation reached 96.43% mIoU. Finally, the excellent DuckSegmentation was used as the teacher model, and through knowledge distillation, Deeplabv3 r50 was used as the student model, and the final student model achieved 94.49% mIoU on the test set. The method provides a new way of thinking in practical sisal duck smart farming.

Via

Access Paper or Ask Questions

Bezier Distillation

Mar 20, 2025

Ling Feng, SK Yang

Abstract:In Rectified Flow, by obtaining the rectified flow several times, the mapping relationship between distributions can be distilled into a neural network, and the target distribution can be directly predicted by the straight lines of the flow. However, during the pairing process of the mapping relationship, a large amount of error accumulation will occur, resulting in a decrease in performance after multiple rectifications. In the field of flow models, knowledge distillation of multi - teacher diffusion models is also a problem worthy of discussion in accelerating sampling. I intend to combine multi - teacher knowledge distillation with Bezier curves to solve the problem of error accumulation. Currently, the related paper is being written by myself.

Via

Access Paper or Ask Questions

LongReward: Improving Long-context Large Language Models with AI Feedback

Oct 28, 2024

Jiajie Zhang, Zhongni Hou, Xin Lv, Shulin Cao, Zhenyu Hou, Yilin Niu, Lei Hou, Yuxiao Dong, Ling Feng, Juanzi Li

Abstract:Though significant advancements have been achieved in developing long-context large language models (LLMs), the compromised quality of LLM-synthesized data for supervised fine-tuning (SFT) often affects the long-context performance of SFT models and leads to inherent limitations. In principle, reinforcement learning (RL) with appropriate reward signals can further enhance models' capacities. However, how to obtain reliable rewards in long-context scenarios remains unexplored. To this end, we propose LongReward, a novel method that utilizes an off-the-shelf LLM to provide rewards for long-context model responses from four human-valued dimensions: helpfulness, logicality, faithfulness, and completeness, each with a carefully designed assessment pipeline. By combining LongReward and offline RL algorithm DPO, we are able to effectively improve long-context SFT models. Our experiments indicate that LongReward not only significantly improves models' long-context performance but also enhances their ability to follow short instructions. We also find that long-context DPO with LongReward and conventional short-context DPO can be used together without hurting either one's performance.

Via

Access Paper or Ask Questions

LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

Sep 04, 2024

jiajie Zhang, Yushi Bai, Xin Lv, Wanjun Gu, Danqing Liu, Minhao Zou, Shulin Cao, Lei Hou, Yuxiao Dong, Ling Feng(+1 more)

Figure 1 for LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

Figure 2 for LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

Figure 3 for LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

Figure 4 for LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

Abstract:Though current long-context large language models (LLMs) have demonstrated impressive capacities in answering user questions based on extensive text, the lack of citations in their responses makes user verification difficult, leading to concerns about their trustworthiness due to their potential hallucinations. In this work, we aim to enable long-context LLMs to generate responses with fine-grained sentence-level citations, improving their faithfulness and verifiability. We first introduce LongBench-Cite, an automated benchmark for assessing current LLMs' performance in Long-Context Question Answering with Citations (LQAC), revealing considerable room for improvement. To this end, we propose CoF (Coarse to Fine), a novel pipeline that utilizes off-the-shelf LLMs to automatically generate long-context QA instances with precise sentence-level citations, and leverage this pipeline to construct LongCite-45k, a large-scale SFT dataset for LQAC. Finally, we train LongCite-8B and LongCite-9B using the LongCite-45k dataset, successfully enabling their generation of accurate responses and fine-grained sentence-level citations in a single output. The evaluation results on LongBench-Cite show that our trained models achieve state-of-the-art citation quality, surpassing advanced proprietary models including GPT-4o.

Via

Access Paper or Ask Questions

KB-Plugin: A Plug-and-play Framework for Large Language Models to Induce Programs over Low-resourced Knowledge Bases

Feb 02, 2024

Jiajie Zhang, Shulin Cao, Linmei Hu, Ling Feng, Lei Hou, Juanzi Li

Figure 1 for KB-Plugin: A Plug-and-play Framework for Large Language Models to Induce Programs over Low-resourced Knowledge Bases

Figure 2 for KB-Plugin: A Plug-and-play Framework for Large Language Models to Induce Programs over Low-resourced Knowledge Bases

Figure 3 for KB-Plugin: A Plug-and-play Framework for Large Language Models to Induce Programs over Low-resourced Knowledge Bases

Figure 4 for KB-Plugin: A Plug-and-play Framework for Large Language Models to Induce Programs over Low-resourced Knowledge Bases

Abstract:Program induction (PI) has become a promising paradigm for using knowledge bases (KBs) to help large language models (LLMs) answer complex knowledge-intensive questions. Nonetheless, PI typically relies on a large number of parallel question-program pairs to make the LLM aware of the schema of the given KB, and is thus challenging for many low-resourced KBs that lack annotated data. To this end, we propose KB-Plugin, a plug-and-play framework that enables LLMs to induce programs over any low-resourced KB. Firstly, KB-Plugin adopts self-supervised learning to encode the detailed schema information of a given KB into a pluggable module, namely schema plugin. Secondly, KB-Plugin utilizes abundant annotated data from a rich-resourced KB to train another pluggable module, namely PI plugin, which can help the LLM extract question-relevant schema information from the schema plugin of any KB and utilize this information to induce programs over this KB. Experiments on five heterogeneous KBQA datasets show that KB-Plugin achieves better or comparable performance with 25$\times$ smaller backbone LLM compared to SoTA PI methods for low-resourced KBs, and even approaches the performance of supervised methods. Our code and data are available at https://github.com/THU-KEG/KB-Plugin.

Via

Access Paper or Ask Questions

Education distillation:getting student models to learn in shcools

Nov 27, 2023

Ling Feng, Danyang Li, Tianhao Wu, Xuliang Duan

Abstract:Knowledge distillation is one of the methods for model compression, and existing knowledge distillation techniques focus on how to improve the distillation algorithm so as to enhance the distillation efficiency. This paper introduces dynamic incremental learning into knowledge distillation and proposes a distillation strategy for education distillation. Specifically, it is proposed to take fragmented student models divided from the complete student model as lower-grade models. As the grade level rises, fragmented student models deepen in conjunction with designed teaching reference layers, while learning and distilling from more teacher models. By moving from lower to higher grades, fragmented student models were gradually integrated into a complete target student model, and the performance of the student models gradually improved from lower to higher grades of the stage. Education distillation strategies combined with distillation algorithms outperform the results of single distillation algorithms on the public dataset CIFAR100,Caltech256, Food-101 dataset.

Via

Access Paper or Ask Questions

Self-Organization Towards $1/f$ Noise in Deep Neural Networks

Jan 20, 2023

Nicholas Chong Jia Le, Ling Feng

Abstract:Despite $1/f$ noise being ubiquitous in both natural and artificial systems, no general explanations for the phenomenon have received widespread acceptance. One well-known system where $1/f$ noise has been observed in is the human brain, with this 'noise' proposed by some to be important to the healthy function of the brain. As deep neural networks (DNNs) are loosely modelled after the human brain, and as they start to achieve human-level performance in specific tasks, it might be worth investigating if the same $1/f$ noise is present in these artificial networks as well. Indeed, we find the existence of $1/f$ noise in DNNs - specifically Long Short-Term Memory (LSTM) networks modelled on real world dataset - by measuring the Power Spectral Density (PSD) of different activations within the network in response to a sequential input of natural language. This was done in analogy to the measurement of $1/f$ noise in human brains with techniques such as electroencephalography (EEG) and functional Magnetic Resonance Imaging (fMRI). We further examine the exponent values in the $1/f$ noise in "inner" and "outer" activations in the LSTM cell, finding some resemblance in the variations of the exponents in the fMRI signal. In addition, comparing the values of the exponent at "rest" compared to when performing "tasks" of the LSTM network, we find a similar trend to that of the human brain where the exponent while performing tasks is less negative.

Via

Access Paper or Ask Questions