Haotian Guo

Foundation Model-based Evaluation of Neuropsychiatric Disorders: A Lifespan-Inclusive, Multi-Modal, and Multi-Lingual Study

Dec 24, 2025

Zhongren Dong, Haotian Guo, Weixiang Xu, Huan Zhao, Zixing Zhang

Abstract: Neuropsychiatric disorders, such as Alzheimer's disease (AD), depression, and autism spectrum disorder (ASD), are characterized by linguistic and acoustic abnormalities, offering potential biomarkers for early detection. Despite the promise of multi-modal approaches, challenges like multi-lingual generalization and the absence of a unified evaluation framework persist. To address these gaps, we propose FEND (Foundation model-based Evaluation of Neuropsychiatric Disorders), a comprehensive multi-modal framework integrating speech and text modalities for detecting AD, depression, and ASD across the lifespan. Leveraging 13 multi-lingual datasets spanning English, Chinese, Greek, French, and Dutch, we systematically evaluate multi-modal fusion performance. Our results show that multi-modal fusion excels in AD and depression detection but underperforms in ASD due to dataset heterogeneity. We also identify modality imbalance as a prevalent issue, where multi-modal fusion fails to surpass the best mono-modal models. Cross-corpus experiments reveal robust performance in task- and language-consistent scenarios but noticeable degradation in multi-lingual and task-heterogeneous settings. By providing extensive benchmarks and a detailed analysis of performance-influencing factors, FEND advances the field of automated, lifespan-inclusive, and multi-lingual neuropsychiatric disorder assessment. We encourage researchers to adopt the FEND framework for fair comparisons and reproducible research.

DEBATE: A Dataset for Disentangling Textual Ambiguity in Mandarin Through Speech

Jun 09, 2025

Haotian Guo, Jing Han, Yongfeng Tu, Shihao Gao, Shengfan Shen, Wulong Xiang, Weihao Gan, Zixing Zhang

Abstract: Despite extensive research on textual and visual disambiguation, disambiguation through speech (DTS) remains underexplored. This is largely due to the lack of high-quality datasets that pair spoken sentences with richly ambiguous text. To address this gap, we present DEBATE, a unique public Chinese speech-text dataset designed to study how speech cues and patterns (pronunciation, pause, stress, and intonation) can help resolve textual ambiguity and reveal a speaker's true intent. DEBATE contains 1,001 carefully selected ambiguous utterances, each recorded by 10 native speakers, capturing diverse linguistic ambiguities and their disambiguation through speech. We detail the data collection pipeline and provide rigorous quality analysis. Additionally, we benchmark three state-of-the-art large speech and audio-language models, illustrating clear and substantial performance gaps between machine and human understanding of spoken intent. DEBATE represents the first effort of its kind and offers a foundation for building similar DTS datasets across languages and cultures. The dataset and associated code are available at: https://github.com/SmileHnu/DEBATE.

GelSight FlexiRay: Breaking Planar Limits by Harnessing Large Deformations for Flexible, Full-Coverage Multimodal Sensing

Nov 28, 2024

Yanzhe Wang, Hao Wu, Haotian Guo, Huixu Dong

Abstract: The integration of tactile sensing into compliant soft robotic grippers offers a compelling pathway toward advanced robotic grasping and safer human-robot interactions. Visual-tactile sensors realize high-resolution, large-area tactile perception with affordable cameras. However, conventional visual-tactile sensors rely heavily on rigid forms, sacrificing finger compliance and sensing regions to achieve localized tactile feedback. Enabling seamless, large-area tactile sensing in soft grippers remains challenging, as deformations inherent to soft structures can obstruct the optical path and restrict the camera's field of view. To address these challenges, we present GelSight FlexiRay, a multimodal visual-tactile sensor designed for safe and compliant interactions with substantial structural deformation through integration with Fin Ray Effect (FRE) grippers. First, we adopt a multi-mirror configuration, which is systematically modeled and optimized based on the physical force-deformation characteristics of FRE grippers. Second, we enhance GelSight FlexiRay with human-like multimodal perception, including contact force and location, proprioception, temperature, texture, and slippage. Experiments demonstrate GelSight FlexiRay's robust tactile performance across diverse deformation states, achieving a force measurement accuracy of 0.14 N and proprioceptive positioning accuracy of 0.19 mm. Compared with state-of-the-art compliant visual-tactile sensors (VTS), the FlexiRay demonstrates 5 times larger structural deformation under the same loads. Its expanded sensing area and ability to distinguish contact information and execute grasping and classification tasks highlight its potential for versatile, large-area multimodal tactile sensing integration within soft robotic systems. This work establishes a foundation for flexible, high-resolution tactile sensing in compliant robotic applications.

* 14 pages, 8 figures

Debatts: Zero-Shot Debating Text-to-Speech Synthesis

Nov 10, 2024

Yiqiao Huang, Yuancheng Wang, Jiaqi Li, Haotian Guo, Haorui He, Shunsi Zhang, Zhizheng Wu

Abstract: In debating, rebuttal is one of the most critical stages, where a speaker addresses the arguments presented by the opposing side. During this process, the speaker synthesizes their own persuasive articulation given the context from the opposing side. This work proposes a novel zero-shot text-to-speech synthesis system for rebuttal, namely Debatts. Debatts takes two speech prompts, one from the opposing side (i.e., the opponent) and one from the speaker. The prompt from the opponent is intended to provide debating-style prosody, and the prompt from the speaker provides identity information. In particular, we pretrain the Debatts system on an in-the-wild dataset and integrate an additional reference encoder to take the debating prompt for style. In addition, we create a debating dataset to develop Debatts. In this setting, Debatts can generate debating-style speech in rebuttal for any voice. Experimental results confirm the effectiveness of the proposed system in comparison with classic zero-shot TTS systems.

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

Sep 01, 2024

Yuancheng Wang, Haoyue Zhan, Liwei Liu, Ruihong Zeng, Haotian Guo, Jiachen Zheng, Qiang Zhang, Shunsi Zhang, Zhizheng Wu

Abstract: Nowadays, large-scale text-to-speech (TTS) systems are primarily divided into two types: autoregressive and non-autoregressive. The autoregressive systems have certain deficiencies in robustness and cannot control speech duration. In contrast, non-autoregressive systems require explicit prediction of phone-level duration, which may compromise their naturalness. We introduce the Masked Generative Codec Transformer (MaskGCT), a fully non-autoregressive model for TTS that does not require precise alignment information between text and speech. MaskGCT is a two-stage model: in the first stage, the model uses text to predict semantic tokens extracted from a speech self-supervised learning (SSL) model, and in the second stage, the model predicts acoustic tokens conditioned on these semantic tokens. MaskGCT follows the mask-and-predict learning paradigm. During training, MaskGCT learns to predict masked semantic or acoustic tokens based on given conditions and prompts. During inference, the model generates tokens of a specified length in a parallel manner. We scale MaskGCT to a large-scale multilingual dataset with 100K hours of in-the-wild speech. Our experiments demonstrate that MaskGCT achieves superior or competitive performance compared to state-of-the-art zero-shot TTS systems in terms of quality, similarity, and intelligibility while offering higher generation efficiency than diffusion-based or autoregressive TTS models. Audio samples are available at https://maskgct.github.io.
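
The mask-and-predict inference described in the abstract can be illustrated with a small sketch. This is not the MaskGCT implementation: the abstract does not state the unmasking schedule or how confidence is computed, so the cosine schedule, the `MASK` sentinel, and the random `toy_predictor` (a stand-in for the trained model) are all assumptions made for illustration. The sketch only shows the general idea of filling a sequence of a specified length in parallel over a few steps.

```python
import math
import random

MASK = -1  # hypothetical sentinel marking a still-masked position


def toy_predictor(tokens, vocab_size, rng):
    """Stand-in for a trained model (assumption): returns one
    (token, confidence) guess for every position in the sequence."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def mask_and_predict(length, vocab_size=8, steps=4, seed=0):
    """Fill `length` tokens over at most `steps` parallel decoding passes."""
    rng = random.Random(seed)
    tokens = [MASK] * length  # start fully masked at the requested length
    for step in range(1, steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = toy_predictor(tokens, vocab_size, rng)
        # Assumed cosine schedule: how many positions stay masked after this step.
        keep_masked = 0 if step == steps else int(
            length * math.cos(math.pi / 2 * step / steps)
        )
        n_commit = max(1, len(masked) - keep_masked)
        # Commit the most confident predictions; the rest stay masked.
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:n_commit]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens


if __name__ == "__main__":
    print(mask_and_predict(12))
```

Because the output length is fixed before decoding starts, the number of passes depends on `steps` rather than on the sequence length, which is the property the abstract contrasts with autoregressive generation.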

The Stiffness of 3-PRS PM Across Parasitic and Orientational Workspace

May 14, 2024

Hassen Nigatu, Li Jihao, Keqi Zhu, Junhan Zhang, Haotian Guo, Guodong Lu, Doik Kim

Abstract: This study investigates the stiffness characteristics of the Sprint Z3 head, also known as the 3-PRS parallel kinematics machine, one of the most extensively researched and viably successful manipulators for precision machining applications. Despite the wealth of research on these robotic manipulators, no previous work has demonstrated their stiffness performance within the parasitic motion space. Such undesired motion influences their stiffness properties, as stiffness is configuration-dependent. Addressing this gap, this paper develops a stiffness model that accounts for both the velocity-level parasitic motion space and the regular workspace. Numerical simulations are provided to illustrate the stiffness characteristics of the manipulator across all considered spaces. The results indicate that the stiffness profile within the parasitic motion space is shallower, and its values are smaller, than the stiffness distribution across the orientation workspace. This implies that evaluating a manipulator's performance adequately requires assessing its ability to resist external loads during parasitic motion. Therefore, comprehending this aspect is crucial for redesigning components to enhance overall stiffness.

* arXiv admin note: text overlap with arXiv:2404.18575

Under-actuated Robotic Gripper with Multiple Grasping Modes Inspired by Human Finger

Mar 19, 2024

Jihao Li, Tingbo Liao, Hassen Nigatu, Haotian Guo, Guodong Lu, Huixu Dong

Abstract: Under-actuated robot grippers, a pervasive robotic tool, have become a considerable research focus. Despite their simplicity of mechanical design and control strategy, they suffer from poor versatility and weak adaptability, which limits their widespread application. To address these gaps, we present a novel 3-finger linkage-based gripper that realizes retractable and reconfigurable multi-mode grasps driven by a single motor. Firstly, inspired by how the contact surface changes as a human finger moves, we design a slider-slide rail mechanism as the phalanx to achieve retraction of each finger, allowing for better performance in the enveloping grasping mode. Secondly, a reconfigurable structure is constructed to broaden the range of object dimensions that the proposed gripper can grasp. By adjusting the configuration and gesture of each finger, the gripper can achieve five grasping modes. Thirdly, the proposed gripper is actuated by only a single motor, yet it can grasp and reconfigure simultaneously. Finally, various experiments on grasps of slender, thin, and large-volume objects are implemented to evaluate the performance of the proposed gripper in practical scenarios, demonstrating the excellent grasping capabilities of the gripper.

* 8 pages

Theoretical Modeling and Bio-inspired Trajectory Optimization of A Multiple-locomotion Origami Robot

Mar 19, 2024

Keqi Zhu, Haotian Guo, Wei Yu, Hassen Nigatu, Tong Li, Huixu Dong

Abstract: Recent research on mobile robots has focused on increasing their adaptability to unpredictable and unstructured environments using soft materials and structures. However, the determination of key design parameters and control over these compliant robots are predominantly iterated through experiments, lacking a solid theoretical foundation. To improve their efficiency, this paper aims to provide mathematical models of two locomotion modes, crawling and swimming. Specifically, a dynamic model is first devised to reveal the influence of the contact surfaces' frictional coefficients on displacements in different motion phases. In addition, a swimming kinematics model is provided using coordinate transformation, based on which we further develop an algorithm that systematically plans human-like swimming gaits that maximize thrust. The proposed algorithm is highly generalizable and has the potential to be applied in other soft robots with multiple joints. Simulation experiments have been conducted to illustrate the effectiveness of the proposed modeling.

* 8 pages

Theoretical Model Construction of Deformation-Force for Soft Grippers Part I: Co-rotational Modeling and Force Control for Design Optimization

Mar 23, 2023

Huixu Dong, Haotian Guo, Sihao Yang, Chen Qiu, Jiansheng Dai, I-Ming Chen

Abstract: Compliant grippers, owing to adaptivity and safety, have attracted considerable attention for unstructured grasping in real applications, such as industrial or logistic scenarios. However, accurate construction of the mathematical model depicting the bidirectional relationship between shape deformation and contact force for such grippers, such as the Fin-Ray grippers, has seen little progress to date. To address this research gap, this article devises, presents, and experimentally validates a universal bidirectional force-displacement mathematical model for compliant grippers based on the co-rotational concept, which endows such grippers with an intrinsic force sensing capability and offers better insight into design optimization. In Part I of the article, we introduce the fundamental theory of the co-rotational approach, where arbitrary large deformation of beam elements can be modeled. Its intrinsic principle enables the theoretical modeling to consider various types of configurations and key design parameters with very few assumptions made. Further, a force control algorithm is proposed, providing accurate displacement estimations of the gripper under external forces with minor computational loads. The performance of the proposed method is experimentally verified through comparison with Finite Element Analysis, where the influence of four key design parameters on the gripper's performance is investigated, facilitating systematic design optimization. Part II of this article, which demonstrates the force sensing capabilities and the effects of representative co-rotational modeling parameters on model accuracy, is released on Google Drive.

Theoretical Model Construction of Deformation-Force for Soft Grippers Part II: Displacement Control Based Intrinsic Force Sensing

Mar 22, 2023

Huixu Dong, Ziyi Zheng, Haotian Guo, Sihao Yang, Chen Qiu, Jiansheng Dai, I-Ming Chen

Abstract: Compliant grasping is an essential capability for most robots in practical applications. For compliant robotic end-effectors that commonly appear in industrial or logistic scenarios, such as the Fin-Ray gripper, it remains challenging to build a bidirectional mathematical model that mutually maps the shape deformation and contact force. Part I of this article has constructed the force-displacement relationship for design optimization through the co-rotational theory with very few assumptions. In Part II, we further devise a detailed displacement-force mathematical model, enabling the compliant gripper to precisely estimate contact force without a sensor. Specifically, the proposed approach based on the co-rotational theory can calculate contact forces from deformations. The presented displacement-control algorithm investigates contact forces in detail and provides force feedback for a force control system of a gripper, where deformation appears as displacements at contact points. Afterward, simulation experiments are conducted to evaluate the performance of the proposed model through comparisons with the finite-element analysis (FEA). Simulation results reveal that the proposed model accurately estimates contact force, with an average error of around 5% throughout all single/multiple node cases, regardless of various design parameters (Part I of this article is released on Google Drive).
