Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Masaya Kawamura

Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control

Sep 26, 2024

Ryuichi Yamamoto, Yuma Shirahata, Masaya Kawamura, Kentaro Tachibana

Abstract:We propose a novel description-based controllable text-to-speech (TTS) method with cross-lingual control capability. To address the lack of audio-description paired data in the target language, we combine a TTS model trained on the target language with a description control model trained on another language, which maps input text descriptions to the conditional features of the TTS model. These two models share disentangled timbre and style representations based on self-supervised learning (SSL), allowing for disentangled voice control, such as controlling speaking styles while retaining the original timbre. Furthermore, because the SSL-based timbre and style representations are language-agnostic, combining the TTS and description control models while sharing the same embedding space effectively enables cross-lingual control of voice characteristics. Experiments on English and Japanese TTS demonstrate that our method achieves high naturalness and controllability for both languages, even though no Japanese audio-description pairs are used.

* Submitted to ICASSP 2025

Via

Access Paper or Ask Questions

Antagonist Inhibition Control in Redundant Tendon-driven Structures Based on Human Reciprocal Innervation for Wide Range Limb Motion of Musculoskeletal Humanoids

Sep 01, 2024

Kento Kawaharazuka, Masaya Kawamura, Shogo Makino, Yuki Asano, Kei Okada, Masayuki Inaba

Abstract:The body structure of an anatomically correct tendon-driven musculoskeletal humanoid is complex, and the difference between its geometric model and the actual robot is very large because expressing the complex routes of tendon wires in a geometric model is very difficult. If we move a tendon-driven musculoskeletal humanoid by the tendon wire lengths of the geometric model, unintended muscle tension and slack will emerge. In some cases, this can lead to the wreckage of the actual robot. To solve this problem, we focused on reciprocal innervation in the human nervous system, and then implemented antagonist inhibition control (AIC) based on the reflex. This control makes it possible to avoid unnecessary internal muscle tension and slack of tendon wires caused by model error, and to perform wide range motion safely for a long time. To verify its effectiveness, we applied AIC to the upper limb of the tendon-driven musculoskeletal humanoid, Kengoro, and succeeded in dangling for 14 minutes and doing pull-ups.

* Accepted at IEEE Robotics and Automation Letters

Via

Access Paper or Ask Questions

Human Mimetic Forearm Design with Radioulnar Joint using Miniature Bone-Muscle Modules and Its Applications

Aug 19, 2024

Kento Kawaharazuka, Shogo Makino, Masaya Kawamura, Yuki Asano, Yohei Kakiuchi, Kei Okada, Masayuki Inaba

Figure 1 for Human Mimetic Forearm Design with Radioulnar Joint using Miniature Bone-Muscle Modules and Its Applications

Figure 2 for Human Mimetic Forearm Design with Radioulnar Joint using Miniature Bone-Muscle Modules and Its Applications

Figure 3 for Human Mimetic Forearm Design with Radioulnar Joint using Miniature Bone-Muscle Modules and Its Applications

Figure 4 for Human Mimetic Forearm Design with Radioulnar Joint using Miniature Bone-Muscle Modules and Its Applications

Abstract:The human forearm is composed of two long, thin bones called the radius and the ulna, and rotates using two axle joints. We aimed to develop a forearm based on the body proportion, weight ratio, muscle arrangement, and joint performance of the human body in order to bring out its benefits. For this, we need to miniaturize the muscle modules. To approach this task, we arranged two muscle motors inside one muscle module, and used the space effectively by utilizing common parts. In addition, we enabled the muscle module to also be used as the bone structure. Moreover, we used miniature motors and developed a way to dissipate the motor heat to the bone structure. Through these approaches, we succeeded in developing a forearm with a radioulnar joint based on the body proportion, weight ratio, muscle arrangement, and joint performance of the human body, while keeping maintainability and reliability. Also, we performed some motions such as soldering, opening a book, turning a screw, and badminton swinging using the benefits of the radioulnar structure, which have not been discussed before, and verified that Kengoro can realize skillful motions using the radioulnar joint like a human.

* Accepted at IROS2017

Via

Access Paper or Ask Questions

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

Jun 12, 2024

Masaya Kawamura, Ryuichi Yamamoto, Yuma Shirahata, Takuya Hasumi, Kentaro Tachibana

Figure 1 for LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

Figure 2 for LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

Figure 3 for LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

Figure 4 for LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

Abstract:We introduce LibriTTS-P, a new corpus based on LibriTTS-R that includes utterance-level descriptions (i.e., prompts) of speaking style and speaker-level prompts of speaker characteristics. We employ a hybrid approach to construct prompt annotations: (1) manual annotations that capture human perceptions of speaker characteristics and (2) synthetic annotations on speaking style. Compared to existing English prompt datasets, our corpus provides more diverse prompt annotations for all speakers of LibriTTS-R. Experimental results for prompt-based controllable TTS demonstrate that the TTS model trained with LibriTTS-P achieves higher naturalness than the model using the conventional dataset. Furthermore, the results for style captioning tasks show that the model utilizing LibriTTS-P generates 2.5 times more accurate words than the model using a conventional dataset. Our corpus, LibriTTS-P, is available at https://github.com/line/LibriTTS-P.

* Accepted to INTERSPEECH 2024

Via

Access Paper or Ask Questions

A Method of Joint Angle Estimation Using Only Relative Changes in Muscle Lengths for Tendon-driven Humanoids with Complex Musculoskeletal Structures

Apr 22, 2024

Kento Kawaharazuka, Shogo Makino, Masaya Kawamura, Yuki Asano, Kei Okada, Masayuki Inaba

Figure 1 for A Method of Joint Angle Estimation Using Only Relative Changes in Muscle Lengths for Tendon-driven Humanoids with Complex Musculoskeletal Structures

Figure 2 for A Method of Joint Angle Estimation Using Only Relative Changes in Muscle Lengths for Tendon-driven Humanoids with Complex Musculoskeletal Structures

Figure 3 for A Method of Joint Angle Estimation Using Only Relative Changes in Muscle Lengths for Tendon-driven Humanoids with Complex Musculoskeletal Structures

Figure 4 for A Method of Joint Angle Estimation Using Only Relative Changes in Muscle Lengths for Tendon-driven Humanoids with Complex Musculoskeletal Structures

Abstract:Tendon-driven musculoskeletal humanoids typically have complex structures similar to those of human beings, such as ball joints and the scapula, in which encoders cannot be installed. Therefore, joint angles cannot be directly obtained and need to be estimated using the changes in muscle lengths. In previous studies, methods using table-search and extended kalman filter have been developed. These methods express the joint-muscle mapping, which is the nonlinear relationship between joint angles and muscle lengths, by using a data table, polynomials, or a neural network. However, due to computational complexity, these methods cannot consider the effects of polyarticular muscles. In this study, considering the limitation of the computational cost, we reduce unnecessary degrees of freedom, divide joints and muscles into several groups, and formulate a joint angle estimation method that takes into account polyarticular muscles. Also, we extend the estimation method to propose a joint angle estimation method using only the relative changes in muscle lengths. By this extension, which does not use absolute muscle lengths, we do not need to execute a difficult calibration of muscle lengths for tendon-driven musculoskeletal humanoids. Finally, we conduct experiments in simulation and actual environments, and verify the effectiveness of this study.

* Accepted at Humanoids2018

Via

Access Paper or Ask Questions

Online Learning of Joint-Muscle Mapping Using Vision in Tendon-driven Musculoskeletal Humanoids

Apr 08, 2024

Kento Kawaharazuka, Shogo Makino, Masaya Kawamura, Yuki Asano, Kei Okada, Masayuki Inaba

Figure 1 for Online Learning of Joint-Muscle Mapping Using Vision in Tendon-driven Musculoskeletal Humanoids

Figure 2 for Online Learning of Joint-Muscle Mapping Using Vision in Tendon-driven Musculoskeletal Humanoids

Figure 3 for Online Learning of Joint-Muscle Mapping Using Vision in Tendon-driven Musculoskeletal Humanoids

Figure 4 for Online Learning of Joint-Muscle Mapping Using Vision in Tendon-driven Musculoskeletal Humanoids

Abstract:The body structures of tendon-driven musculoskeletal humanoids are complex, and accurate modeling is difficult, because they are made by imitating the body structures of human beings. For this reason, we have not been able to move them accurately like ordinary humanoids driven by actuators in each axis, and large internal muscle tension and slack of tendon wires have emerged by the model error between its geometric model and the actual robot. Therefore, we construct a joint-muscle mapping (JMM) using a neural network (NN), which expresses a nonlinear relationship between joint angles and muscle lengths, and aim to move tendon-driven musculoskeletal humanoids accurately by updating the JMM online from data of the actual robot. In this study, the JMM is updated online by using the vision of the robot so that it moves to the correct position (Vision Updater). Also, we execute another update to modify muscle antagonisms correctly (Antagonism Updater). By using these two updaters, the error between the target and actual joint angles decrease to about 40% in 5 minutes, and we show through a manipulation experiment that the tendon-driven musculoskeletal humanoid Kengoro becomes able to move as intended. This novel system can adapt to the state change and growth of robots, because it updates the JMM online successively.

* Accepted at IEEE Robotics and Automation Letters, 2018

Via

Access Paper or Ask Questions

Online Self-body Image Acquisition Considering Changes in Muscle Routes Caused by Softness of Body Tissue for Tendon-driven Musculoskeletal Humanoids

Apr 08, 2024

Kento Kawaharazuka, Shogo Makino, Masaya Kawamura, Ayaka Fujii, Yuki Asano, Kei Okada, Masayuki Inaba

Abstract:Tendon-driven musculoskeletal humanoids have many benefits in terms of the flexible spine, multiple degrees of freedom, and variable stiffness. At the same time, because of its body complexity, there are problems in controllability. First, due to the large difference between the actual robot and its geometric model, it cannot move as intended and large internal muscle tension may emerge. Second, movements which do not appear as changes in muscle lengths may emerge, because of the muscle route changes caused by softness of body tissue. To solve these problems, we construct two models: ideal joint-muscle model and muscle-route change model, using a neural network. We initialize these models by a man-made geometric model and update them online using the sensor information of the actual robot. We validate that the tendon-driven musculoskeletal humanoid Kengoro is able to obtain a correct self-body image through several experiments.

* Accepted at IROS2018

Via

Access Paper or Ask Questions

High-Power, Flexible, Robust Hand: Development of Musculoskeletal Hand Using Machined Springs and Realization of Self-Weight Supporting Motion with Humanoid

Mar 26, 2024

Shogo Makino, Kento Kawaharazuka, Masaya Kawamura, Yuki Asano, Kei Okada, Masayuki Inaba

Abstract:Human can not only support their body during standing or walking, but also support them by hand, so that they can dangle a bar and others. But most humanoid robots support their body only in the foot and they use their hand just to manipulate objects because their hands are too weak to support their body. Strong hands are supposed to enable humanoid robots to act in much broader scene. Therefore, we developed new life-size five-fingered hand that can support the body of life-size humanoid robot. It is tendon-driven and underactuated hand and actuators in forearms produce large gripping force. This hand has flexible joints using machined springs, which can be designed integrally with the attachment. Thus, it has both structural strength and impact resistance in spite of small size. As other characteristics, this hand has force sensors to measure external force and the fingers can be flexed along objects though the number of actuators to flex fingers is less than that of fingers. We installed the developed hand on musculoskeletal humanoid "Kengoro" and achieved two self-weight supporting motions: push-up motion and dangling motion.

* accepted at IROS2017

Via

Access Paper or Ask Questions

Five-fingered Hand with Wide Range of Thumb Using Combination of Machined Springs and Variable Stiffness Joints

Mar 26, 2024

Shogo Makino, Kento Kawaharazuka, Ayaka Fujii, Masaya Kawamura, Tasuku Makabe, Moritaka Onitsuka, Yuki Asano, Kei Okada, Koji Kawasaki, Masayuki Inaba

Figure 1 for Five-fingered Hand with Wide Range of Thumb Using Combination of Machined Springs and Variable Stiffness Joints

Figure 2 for Five-fingered Hand with Wide Range of Thumb Using Combination of Machined Springs and Variable Stiffness Joints

Figure 3 for Five-fingered Hand with Wide Range of Thumb Using Combination of Machined Springs and Variable Stiffness Joints

Figure 4 for Five-fingered Hand with Wide Range of Thumb Using Combination of Machined Springs and Variable Stiffness Joints

Abstract:Human hands can not only grasp objects of various shape and size and manipulate them in hands but also exert such a large gripping force that they can support the body in the situations such as dangling a bar and climbing a ladder. On the other hand, it is difficult for most robot hands to manage both. Therefore in this paper we developed the hand which can grasp various objects and exert large gripping force. To develop such hand, we focused on the thumb CM joint with wide range of motion and the MP joints of four fingers with the DOF of abduction and adduction. Based on the hand with large gripping force and flexibility using machined spring, we applied above mentioned joint mechanism to the hand. The thumb CM joint has wide range of motion because of the combination of three machined springs and MP joints of four fingers have variable rigidity mechanism instead of driving each joint independently in order to move joint in limited space and by limited actuators. Using the developed hand, we achieved the grasping of various objects, supporting a large load and several motions with an arm.

* accepted at IROS2018

Via

Access Paper or Ask Questions

PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions

Sep 15, 2023

Reo Shimizu, Ryuichi Yamamoto, Masaya Kawamura, Yuma Shirahata, Hironori Doi, Tatsuya Komatsu, Kentaro Tachibana

Abstract:We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis system that allows control over speaker identity using natural language descriptions. To control speaker identity within the prompt-based TTS framework, we introduce the concept of speaker prompt, which describes voice characteristics (e.g., gender-neutral, young, old, and muffled) designed to be approximately independent of speaking style. Since there is no large-scale dataset containing speaker prompts, we first construct a dataset based on the LibriTTS-R corpus with manually annotated speaker prompts. We then employ a diffusion-based acoustic model with mixture density networks to model diverse speaker factors in the training data. Unlike previous studies that rely on style prompts describing only a limited aspect of speaker individuality, such as pitch, speaking speed, and energy, our method utilizes an additional speaker prompt to effectively learn the mapping from natural language descriptions to the acoustic features of diverse speakers. Our subjective evaluation results show that the proposed method can better control speaker characteristics than the methods without the speaker prompt. Audio samples are available at https://reppy4620.github.io/demo.promptttspp/.

* Submitted to ICASSP 2024

Via

Access Paper or Ask Questions