Renmin University of China
Abstract:In this study, we investigate the resource management challenges in next-generation mobile crowdsensing networks with the goal of minimizing task completion latency while ensuring coverage performance, a metric that is essential for comprehensive data collection across the monitored area yet commonly overlooked in existing studies. To this end, we formulate a weighted latency and coverage gap minimization problem that jointly optimizes user selection, subchannel allocation, and sensing task allocation. The formulated problem is a non-convex mixed-integer program. To facilitate the analysis, we decompose it into two subproblems. The first optimizes sensing task and subchannel allocation under a fixed sensing user selection and is solved optimally by the Hungarian algorithm after problem reformulation. Building on this solution, the second refines the scheduled user set with a time-efficient two-sided swapping method to enhance system performance. Extensive numerical results demonstrate the effectiveness of our proposed approach compared to various benchmark strategies.
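The assignment step mentioned in the abstract above can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: that, for a fixed user set, the reformulated subproblem is a one-to-one assignment of users (rows) to subchannel/task slots (columns), and that each matrix entry already combines weighted latency and coverage gap into a single cost. The function name `hungarian` and the numbers in `cost` are hypothetical and purely illustrative. The code is a standard O(n^2 m) potential-based Hungarian (Kuhn-Munkres) routine, not the authors' implementation.

```python
def hungarian(cost):
    """Minimum-cost assignment of rows to distinct columns (needs rows <= cols).

    Returns (assignment, total) where assignment[i] is the column given to row i.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need at least as many columns as rows")
    INF = float("inf")
    u = [0] * (n + 1)      # row potentials (1-indexed)
    v = [0] * (m + 1)      # column potentials (1-indexed)
    p = [0] * (m + 1)      # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)    # back-pointers along the augmenting path

    for i in range(1, n + 1):
        p[0] = i           # virtual column 0 holds the row being inserted
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            # Find the unused column with the smallest reduced cost.
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so the chosen edge becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip the matching along the augmenting path.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1

    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return assignment, total


if __name__ == "__main__":
    # Hypothetical per-pair costs (rows: selected users, columns: slots).
    cost = [[4, 1, 3],
            [2, 0, 5],
            [3, 2, 2]]
    print(hungarian(cost))
```

In this sketch the outer user-selection loop (the swapping step described in the abstract) would call `hungarian` once per candidate user set and keep a swap only if the returned total decreases; that loop is not shown here.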
Abstract:Recent Large Reasoning Models (LRMs), such as DeepSeek-R1 and OpenAI o1, have demonstrated strong performance gains by scaling up the length of Chain-of-Thought (CoT) reasoning during inference. However, a growing concern lies in their tendency to produce excessively long reasoning traces, which are often filled with redundant content (e.g., repeated definitions), over-analysis of simple problems, and superficial exploration of multiple reasoning paths for harder tasks. This inefficiency introduces significant challenges for training, inference, and real-world deployment (e.g., in agent-based systems), where token economy is critical. In this survey, we provide a comprehensive overview of recent efforts aimed at improving reasoning efficiency in LRMs, with a particular focus on the unique challenges that arise in this new paradigm. We identify common patterns of inefficiency, examine methods proposed across the LRM lifecycle, i.e., from pretraining to inference, and discuss promising future directions for research. To support ongoing development, we also maintain a real-time GitHub repository tracking recent progress in the field. We hope this survey serves as a foundation for further exploration and inspires innovation in this rapidly evolving area.
Abstract:Deepfake detection is a long-established research topic vital for mitigating the spread of malicious misinformation. Unlike prior methods that provide either binary classification results or textual explanations separately, we introduce a novel method capable of generating both simultaneously. Our method harnesses the multi-modal learning capability of the pre-trained CLIP and the unprecedented interpretability of large language models (LLMs) to enhance both the generalization and explainability of deepfake detection. Specifically, we introduce a multi-modal face forgery detector (M2F2-Det) that employs tailored face forgery prompt learning, incorporating the pre-trained CLIP to improve generalization to unseen forgeries. Also, M2F2-Det incorporates an LLM to provide detailed textual explanations of its detection decisions, enhancing interpretability by bridging the gap between natural language and subtle cues of facial forgeries. Empirically, we evaluate M2F2-Det on both detection and explanation generation tasks, where it achieves state-of-the-art performance, demonstrating its effectiveness in identifying and explaining diverse forgeries.
Abstract:Model editing aims at selectively updating a small subset of a neural model's parameters with an interpretable strategy to achieve desired modifications. It can significantly reduce the computational cost of adapting large language models (LLMs). Given its ability to precisely target critical components within LLMs, model editing shows great potential for efficient fine-tuning applications. In this work, we investigate model editing as an efficient method for adapting LLMs to solve aspect-based sentiment classification. Through causal interventions, we trace and determine which neuron hidden states are essential for the prediction of the model. By performing interventions and restorations on each component of an LLM, we identify the importance of these components for aspect-based sentiment classification. Our findings reveal that a distinct set of mid-layer representations is essential for detecting the sentiment polarity of given aspect words. Leveraging these insights, we develop a model editing approach that focuses exclusively on these critical parts of the LLM, leading to a more efficient method for adapting LLMs. Our in-domain and out-of-domain experiments demonstrate that this approach achieves competitive results compared to the currently strongest methods with significantly fewer trainable parameters, highlighting a more efficient and interpretable fine-tuning strategy.
Abstract:The rapid advancement of large language models (LLMs) has significantly improved their performance in code generation tasks. However, existing code benchmarks remain static, consisting of fixed datasets with predefined problems. This makes them vulnerable to memorization during training, where LLMs recall specific test cases instead of generalizing to new problems, leading to data contamination and unreliable evaluation results. To address these issues, we introduce DynaCode, a dynamic, complexity-aware benchmark that overcomes the limitations of static datasets. DynaCode evaluates LLMs systematically using a complexity-aware metric, incorporating both code complexity and call-graph structures. DynaCode achieves large-scale diversity, generating up to 189 million unique nested code problems across four distinct levels of code complexity, referred to as units, and 16 types of call graphs. Results on 12 latest LLMs show an average performance drop of 16.8% to 45.7% compared to MBPP+, a static code generation benchmark, with performance progressively decreasing as complexity increases. This demonstrates DynaCode's ability to effectively differentiate LLMs. Additionally, by leveraging call graphs, we gain insights into LLM behavior, particularly their preference for handling subfunction interactions within nested code.
Abstract:Large Language Models (LLMs) are increasingly utilized in scientific research assessment, particularly in automated paper review. However, existing LLM-based review systems face significant challenges, including limited domain expertise, hallucinated reasoning, and a lack of structured evaluation. To address these limitations, we introduce DeepReview, a multi-stage framework designed to emulate expert reviewers by incorporating structured analysis, literature retrieval, and evidence-based argumentation. Using DeepReview-13K, a curated dataset with structured annotations, we train DeepReviewer-14B, which outperforms CycleReviewer-70B with fewer tokens. In its best mode, DeepReviewer-14B achieves win rates of 88.21% and 80.20% against GPT-o1 and DeepSeek-R1 in evaluations. Our work sets a new benchmark for LLM-based paper review, with all resources publicly available. The code, model, dataset and demo have been released at http://ai-researcher.net.
Abstract:This paper investigates joint location and velocity estimation, along with the analysis of their fundamental performance bounds, in a cell-free multiple-input multiple-output (MIMO) integrated sensing and communication (ISAC) system. First, unlike existing studies that derive likelihood functions for target parameter estimation using continuous received signals, we formulate the maximum likelihood estimation (MLE) for radar sensing based on discrete received signals at a given sampling rate. Second, leveraging the proposed MLEs, we derive closed-form Cramér-Rao lower bounds (CRLBs) for joint location and velocity estimation in both single-target and multiple-target scenarios. Third, to enhance computational efficiency, we propose approximate CRLBs and conduct an in-depth accuracy analysis. Additionally, we thoroughly examine the impact of sampling rate, squared effective bandwidth, and time width on CRLB performance. For multiple-target scenarios, the concepts of safety distance and safety velocity are introduced to characterize conditions under which the CRLBs for multiple targets converge to their single-target counterparts. Finally, extensive simulations are conducted to verify the accuracy of the proposed CRLBs and the theoretical results using state-of-the-art waveforms, namely orthogonal frequency division multiplexing (OFDM) and orthogonal chirp division multiplexing (OCDM).
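For orientation, the generic form of the bound named in the abstract above is given here. This is the textbook Cramér-Rao inequality for an unbiased estimator, not the paper's closed-form result. The parameter vector $\boldsymbol{\theta}$ (shown as a 2-D position and velocity) and the stacked sample vector $\mathbf{r}$ are notational assumptions introduced only for illustration.

```latex
% Assumed notation: \boldsymbol{\theta} = [x,\, y,\, v_x,\, v_y]^{\mathsf T} (target location and velocity),
% \mathbf{r} = discrete received samples, p(\mathbf{r};\boldsymbol{\theta}) = their likelihood.
\[
  \mathbb{E}\!\left[(\hat{\boldsymbol{\theta}}-\boldsymbol{\theta})
                    (\hat{\boldsymbol{\theta}}-\boldsymbol{\theta})^{\mathsf T}\right]
  \succeq \mathbf{J}^{-1}(\boldsymbol{\theta}),
  \qquad
  [\mathbf{J}(\boldsymbol{\theta})]_{ij}
  = -\,\mathbb{E}\!\left[
      \frac{\partial^{2} \ln p(\mathbf{r};\boldsymbol{\theta})}
           {\partial \theta_i \, \partial \theta_j}
    \right].
\]
```

The diagonal entries of $\mathbf{J}^{-1}(\boldsymbol{\theta})$ lower-bound the variance of each estimated component, which is the quantity a closed-form CRLB expresses in terms of system parameters such as the sampling rate.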
Abstract:Compressed sensing (CS)-based techniques have been widely applied to grant-free non-orthogonal multiple access (NOMA) with a single-antenna base station (BS). In this paper, we consider multi-antenna reception at the BS for uplink grant-free access in massive machine-type communication (mMTC) with limited channel resources. To enhance the overloading performance of the BS, we develop a general framework that combines the spatial division multiple access (SDMA) technique with CS-based grant-free NOMA. We derive a closed-form statistical beamforming scheme and a dynamic beamforming scheme for inter-cluster interference suppression when applying SDMA. Based on this, we further develop a joint adaptive beamforming and subspace pursuit (J-ABF-SP) algorithm for multiuser detection and data recovery, with a novel sparsity level decision method that does not require accurate knowledge of the noise level. To further improve the data recovery performance, we propose an interference cancellation based J-ABF-SP scheme (J-ABF-SP-IC) that uses the initial signal estimates generated by the J-ABF-SP algorithm. Illustrative simulations verify the superior user detection and signal recovery performance of our proposed algorithms in comparison with existing CS-based grant-free NOMA techniques.
Abstract:Large language models (LLMs) have achieved remarkable success in machine translation, demonstrating impressive performance across diverse languages. However, translationese, characterized by overly literal and unnatural translations, remains a persistent challenge in LLM-based translation systems. Despite their pre-training on vast corpora of natural utterances, LLMs exhibit translationese errors and generate unexpected unnatural translations, stemming from biases introduced during supervised fine-tuning (SFT). In this work, we systematically evaluate the prevalence of translationese in LLM-generated translations and investigate its roots during supervised training. We introduce methods to mitigate these biases, including polishing golden references and filtering unnatural training instances. Empirical evaluations demonstrate that these approaches significantly reduce translationese while improving translation naturalness, validated by human evaluations and automatic metrics. Our findings highlight the need for training-aware adjustments to optimize LLM translation outputs, paving the way for more fluent and target-language-consistent translations. We release the data and code at https://github.com/yafuly/LLM_Translationese.
Abstract:In the multi-turn interaction schema, large language models (LLMs) can leverage user feedback to enhance the quality and relevance of their responses. However, evaluating an LLM's ability to incorporate user refutation feedback is crucial yet challenging. In this study, we introduce RefuteBench 2.0, which significantly extends the original RefuteBench by incorporating LLM agents as refuters and evaluators, allowing for flexible and comprehensive assessment. We design both transient and persistent refutation instructions with different validity periods. Meta-evaluation shows that the LLM-based refuter can generate more human-like refutations and that the evaluators can assign scores that correlate highly with human judgments. Experimental results on various LLMs show that current models can effectively satisfy the refutation but fail to memorize the refutation information. Interestingly, we also observe that performance on the initial task decreases as the number of refutations increases. Analysis of the attention scores further shows a potential weakness of current LLMs: they struggle to retain and correctly use previous information during long-context dialogues. Project repository: https://github.com/ElliottYan/RefuteBench-2.0