Qianen Zhang

Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies

Jan 16, 2026

Qianen Zhang, Zeyu Yang, Satoshi Nakamura

Abstract: Simultaneous Machine Translation (SiMT) requires high-quality translations under strict real-time constraints, which traditional policies with only READ/WRITE actions cannot fully address. We extend the action space of SiMT with four adaptive actions: Sentence_Cut, Drop, Partial_Summarization and Pronominalization, which enable real-time restructuring, omission, and simplification while preserving semantic fidelity. We adapt these actions in a large language model (LLM) framework and construct training references through action-aware prompting. To evaluate both quality and word-level monotonicity, we further develop a latency-aware TTS pipeline that maps textual outputs to speech with realistic timing. Experiments on the ACL60/60 English-Chinese, English-German and English-Japanese benchmarks show that our framework consistently improves semantic metrics and achieves lower delay compared to reference translations and salami-based baselines. Notably, combining Drop and Sentence_Cut leads to consistent improvements in the balance between fluency and latency. These results demonstrate that enriching the action space of LLM-based SiMT provides a promising direction for bridging the gap between human and machine interpretation.

* arXiv admin note: substantial text overlap with arXiv:2509.21801
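As a rough illustration of the extended action space described in the abstract above, the six actions can be written as an enumeration. This is only a sketch: the class name `Action`, the sets `BASE_ACTIONS` and `ADAPTIVE_ACTIONS`, the helper `adaptive_actions_used`, and the example trace are all hypothetical and are not taken from the paper or its code.

```python
from enum import Enum


class Action(Enum):
    # The two actions of a traditional SiMT policy.
    READ = "READ"
    WRITE = "WRITE"
    # The four adaptive actions named in the abstract.
    SENTENCE_CUT = "Sentence_Cut"
    DROP = "Drop"
    PARTIAL_SUMMARIZATION = "Partial_Summarization"
    PRONOMINALIZATION = "Pronominalization"


BASE_ACTIONS = frozenset({Action.READ, Action.WRITE})
ADAPTIVE_ACTIONS = frozenset(Action) - BASE_ACTIONS


def adaptive_actions_used(trace):
    """Return the adaptive actions in a trace, in order of first use."""
    seen = []
    for action in trace:
        if action in ADAPTIVE_ACTIONS and action not in seen:
            seen.append(action)
    return seen


# Hypothetical action trace, for illustration only.
example_trace = [
    Action.READ, Action.READ, Action.WRITE, Action.DROP,
    Action.READ, Action.SENTENCE_CUT, Action.WRITE, Action.DROP,
]
used = adaptive_actions_used(example_trace)
```

A trace made up only of `READ` and `WRITE` would yield an empty list, which corresponds to the traditional policy the abstract contrasts against.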

DPO-Shift: Shifting the Distribution of Direct Preference Optimization

Feb 11, 2025

Xiliang Yang, Feng Jiang, Qianen Zhang, Lei Zhao, Xiao Li

Abstract: Direct Preference Optimization (DPO) and its variants have become increasingly popular for aligning language models with human preferences. These methods aim to teach models to better distinguish between chosen (or preferred) and rejected (or dispreferred) responses. However, prior research has identified that the probability of chosen responses often decreases during training, a phenomenon known as likelihood displacement. To tackle this challenge, in this work we introduce DPO-Shift to controllably shift the distribution of the chosen probability. Then, we show that DPO-Shift exhibits a fundamental trade-off between improving the chosen probability and sacrificing the reward margin, as supported by both theoretical analysis and experimental validation. Furthermore, we demonstrate the superiority of DPO-Shift over DPO on downstream tasks such as MT-Bench and a designed win rate experiment. We believe this study shows that the likelihood displacement issue of DPO can be effectively mitigated with a simple, theoretically grounded solution. Our code is available at https://github.com/Meaquadddd/DPO-Shift.
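To make likelihood displacement concrete, the sketch below computes the standard DPO loss for a single preference pair and shows that the loss can fall while the chosen log-probability also falls. It implements plain DPO, not DPO-Shift, whose exact objective is not given in the abstract. The function name `dpo_loss`, the value of `beta`, and every log-probability are assumptions chosen for illustration.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one pair, given sequence-level log-probabilities."""
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) written as log(1 + exp(-margin)).
    loss = math.log1p(math.exp(-margin))
    return loss, margin


# Hypothetical reference-model log-probabilities.
ref_chosen, ref_rejected = -10.0, -12.0

# Before training the policy equals the reference, so the margin is zero.
loss_before, margin_before = dpo_loss(-10.0, -12.0, ref_chosen, ref_rejected)

# After training (made-up numbers): the chosen log-probability drops by 1,
# the rejected log-probability drops by 4, and the margin still grows.
loss_after, margin_after = dpo_loss(-11.0, -16.0, ref_chosen, ref_rejected)
```

Because the loss depends only on the margin between the two rewards, it rewards lowering the rejected probability faster than the chosen one, even when both go down. That is the behaviour the abstract describes DPO-Shift as controlling.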
