Yiran Ma

What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning

Dec 20, 2024

Yiran Ma, Zui Chen, Tianqiao Liu, Mi Tian, Zhuo Liu, Zitao Liu, Weiqi Luo

Abstract: Step-level reward models (SRMs) can significantly enhance mathematical reasoning performance through process supervision or step-level preference alignment based on reinforcement learning. The performance of SRMs is pivotal, as they serve as critical guidelines, ensuring that each step in the reasoning process is aligned with desired outcomes. Recently, AlphaZero-like methods, where Monte Carlo Tree Search (MCTS) is employed for automatic step-level preference annotation, have proven particularly effective. However, the precise mechanisms behind the success of SRMs remain largely unexplored. To address this gap, this study delves into the counterintuitive aspects of SRMs, particularly focusing on MCTS-based approaches. Our findings reveal that the removal of natural language descriptions of thought processes has minimal impact on the efficacy of SRMs. Furthermore, we demonstrate that SRMs are adept at assessing the complex logical coherence present in mathematical language while having difficulty in natural language. These insights provide a nuanced understanding of the core elements that drive effective step-level reward modeling in mathematical reasoning. By shedding light on these mechanisms, this study offers valuable guidance for developing more efficient and streamlined SRMs, which can be achieved by focusing on the crucial parts of mathematical reasoning.

* AAAI 2025

From Barriers to Tactics: A Behavioral Science-Informed Agentic Workflow for Personalized Nutrition Coaching

Oct 17, 2024

Eric Yang, Tomas Garcia, Hannah Williams, Bhawesh Kumar, Martin Ramé, Eileen Rivera, Yiran Ma, Jonathan Amar, Caricia Catalani, Yugang Jia

Abstract: Effective management of cardiometabolic conditions requires sustained positive nutrition habits, often hindered by complex and individualized barriers. Direct human management is simply not scalable, while previous attempts aimed at automating nutrition coaching lack the personalization needed to address these diverse challenges. This paper introduces a novel LLM-powered agentic workflow designed to provide personalized nutrition coaching by directly targeting and mitigating patient-specific barriers. Grounded in behavioral science principles, the workflow leverages a comprehensive mapping of nutrition-related barriers to corresponding evidence-based strategies. A specialized LLM agent intentionally probes for and identifies the root cause of a patient's dietary struggles. Subsequently, a separate LLM agent delivers tailored tactics designed to overcome those specific barriers with patient context. We designed and validated our approach through a user study with individuals with cardiometabolic conditions, demonstrating the system's ability to accurately identify barriers and provide personalized guidance. Furthermore, we conducted a large-scale simulation study, grounding on real patient vignettes and expert-validated metrics, to evaluate the system's performance across a wide range of scenarios. Our findings demonstrate the potential of this LLM-powered agentic workflow to improve nutrition coaching by providing personalized, scalable, and behaviorally-informed interventions.

* 22 pages

A Transponder Aggregator with Efficient Use of Filtering Function for Transponder Noise Suppression

Oct 03, 2022

Kenya Suzuki, Osamu Moriwaki, Koichi Hadama, Keita Yamaguchi, Hiroki Taniguchi, Yoshiaki Kisaka, Daisuke Ogawa, Makoto Takeshita, Stefano Camatel, Yiran Ma (+1 more)

Abstract: Colorless, directionless, and contentionless reconfigurable optical add/drop multiplexing (CDC-ROADM) provides highly flexible physical layer network configuration. Such CDC-ROADM must operate in multiple wavelength bands which are being increasingly implemented in optical transmission systems. The operation in C+L bands requires switch devices used in CDC-ROADM to also be capable of multiband operation. Recent studies on wavelength division multiplexing (WDM) systems have pointed out the impact of amplified spontaneous emission (ASE) noise generated by signals of different wavelengths, which causes OSNR degradation. Therefore, it is desirable to filter out the ASE noise from different transponders when multiplexing multiple wavelengths at the transmitter side, especially in a system with non-wavelength selective combiners such as directional couplers and multicast switches. The use of transponder aggregators with filtering functions, such as the M x N wavelength selective switch (WSS), is preferable for this filtering. However, the downside of these devices is that it is difficult to provide economical multiband support. Therefore, we propose an economical transponder aggregator configuration by allowing a certain amount of ASE superposition and reducing the number of filtering functions. In this paper, we fabricated a prototype of the proposed transponder aggregator by combining silica-based planar lightwave circuit technology and C+L band WSS, both commercially available, and verified its feasibility through transmission experiments. The novel transponder aggregator is a practical solution for a multiband CDC-ROADM system with improved OSNR performance.

* 10 pages, 11 figures. Submitted to IEEE Journal of Lightwave Technology for possible publication

First demonstration of C + L band CDC-ROADM with simple node configuration using multiband switching devices

Jun 04, 2021

Shuto Yamamoto, Hiroki Taniguchi, Yoshiaki Kisaka, Stefano Camatel, Yiran Ma, Daisuke Ogawa, Koichi Hadama, Mitsunori Fukutoku, Takashi Goh, Kenya Suzuki

Abstract: While ultrahigh-baud-rate optical signals are effective for extending the transmission distance of large capacity signals, they also reduce the number of wavelengths that can be arranged in a band because of their wider bandwidth. This reduces the flexibility of optical path configuration in reconfigurable optical add/drop multiplexing (ROADM) networks. In colorless, directionless and contentionless (CDC)-ROADM in particular, the effect reduces the add/drop ratio at a node. Multiband ROADM systems are an effective countermeasure for overcoming this issue, but they make the node configuration more complicated and its operation more difficult. In this paper, we analyze the challenges of C + L band CDC-ROADM and show that optical switch devices that operate over multiple bands are effective in meeting them. For this purpose, we built a C + L band CDC-ROADM node based on C + L band wavelength selective switches (WSSs) and multicast switches (MCSs) and confirmed its effectiveness experimentally. In particular, to simplify the node configuration, we propose a reduction in the number of optical amplifiers used for node loss compensation and experimentally verify its feasibility.

* 10 pages, 9 figures
