Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bin He

Duplex Self-Aligning Resonant Beam Communications and Power Transfer with Coupled Spatially Distributed Laser Resonator

May 08, 2025

Mingliang Xiong, Qingwen Liu, Hao Deng, Gang Wang, Gang Li, Bin He

Abstract:Sustainable energy supply and high-speed communications are two significant needs for mobile electronic devices. This paper introduces a self-aligning resonant beam system for simultaneous light information and power transfer (SLIPT), employing a novel coupled spatially distributed resonator (CSDR). The system utilizes a resonant beam for efficient power delivery and a second-harmonic beam for concurrent data transmission, inherently minimizing echo interference and enabling bidirectional communication. Through comprehensive analyses, we investigate the CSDR's stable region, beam evolution, and power characteristics in relation to working distance and device parameters. Numerical simulations validate the CSDR-SLIPT system's feasibility by identifying a stable beam waist location for achieving accurate mode-match coupling between two spatially distributed resonant cavities and demonstrating its operational range and efficient power delivery across varying distances. The research reveals the system's benefits in terms of both safety and energy transmission efficiency. We also demonstrate the trade-off among the reflectivities of the cavity mirrors in the CSDR. These findings offer valuable design insights for resonant beam systems, advancing SLIPT with significant potential for remote device connectivity.

Via

Access Paper or Ask Questions

Bayesian Reasoning Enabled by Spin-Orbit Torque Magnetic Tunnel Junctions

Apr 11, 2025

Yingqian Xu, Xiaohan Li, Caihua Wan, Ran Zhang, Bin He, Shiqiang Liu, Jihao Xia, Dehao Kong, Shilong Xiong, Guoqiang Yu(+1 more)

Abstract:Bayesian networks play an increasingly important role in data mining, inference, and reasoning with the rapid development of artificial intelligence. In this paper, we present proof-of-concept experiments demonstrating the use of spin-orbit torque magnetic tunnel junctions (SOT-MTJs) in Bayesian network reasoning. Not only can the target probability distribution function (PDF) of a Bayesian network be precisely formulated by a conditional probability table as usual but also quantitatively parameterized by a probabilistic forward propagating neuron network. Moreover, the parameters of the network can also approach the optimum through a simple point-by point training algorithm, by leveraging which we do not need to memorize all historical data nor statistically summarize conditional probabilities behind them, significantly improving storage efficiency and economizing data pretreatment. Furthermore, we developed a simple medical diagnostic system using the SOT-MTJ as a random number generator and sampler, showcasing the application of SOT-MTJ-based Bayesian reasoning. This SOT-MTJ-based Bayesian reasoning shows great promise in the field of artificial probabilistic neural network, broadening the scope of spintronic device applications and providing an efficient and low-storage solution for complex reasoning tasks.

Via

Access Paper or Ask Questions

Kaiwu: A Multimodal Manipulation Dataset and Framework for Robot Learning and Human-Robot Interaction

Mar 07, 2025

Shuo Jiang, Haonan Li, Ruochen Ren, Yanmin Zhou, Zhipeng Wang, Bin He

Figure 1 for Kaiwu: A Multimodal Manipulation Dataset and Framework for Robot Learning and Human-Robot Interaction

Figure 2 for Kaiwu: A Multimodal Manipulation Dataset and Framework for Robot Learning and Human-Robot Interaction

Figure 3 for Kaiwu: A Multimodal Manipulation Dataset and Framework for Robot Learning and Human-Robot Interaction

Figure 4 for Kaiwu: A Multimodal Manipulation Dataset and Framework for Robot Learning and Human-Robot Interaction

Abstract:Cutting-edge robot learning techniques including foundation models and imitation learning from humans all pose huge demands on large-scale and high-quality datasets which constitute one of the bottleneck in the general intelligent robot fields. This paper presents the Kaiwu multimodal dataset to address the missing real-world synchronized multimodal data problems in the sophisticated assembling scenario,especially with dynamics information and its fine-grained labelling. The dataset first provides an integration of human,environment and robot data collection framework with 20 subjects and 30 interaction objects resulting in totally 11,664 instances of integrated actions. For each of the demonstration,hand motions,operation pressures,sounds of the assembling process,multi-view videos, high-precision motion capture information,eye gaze with first-person videos,electromyography signals are all recorded. Fine-grained multi-level annotation based on absolute timestamp,and semantic segmentation labelling are performed. Kaiwu dataset aims to facilitate robot learning,dexterous manipulation,human intention investigation and human-robot collaboration research.

Via

Access Paper or Ask Questions

Can video generation replace cinematographers? Research on the cinematic language of generated video

Dec 16, 2024

Xiaozhe Li, Kai WU, Siyi Yang, YiZhan Qu, Guohua. Zhang, Zhiyu Chen, Jiayao Li, Jiangchuan Mu, Xiaobin Hu, Wen Fang(+5 more)

Abstract:Recent advancements in text-to-video (T2V) generation have leveraged diffusion models to enhance the visual coherence of videos generated from textual descriptions. However, most research has primarily focused on object motion, with limited attention given to cinematic language in videos, which is crucial for cinematographers to convey emotion and narrative pacing. To address this limitation, we propose a threefold approach to enhance the ability of T2V models to generate controllable cinematic language. Specifically, we introduce a cinematic language dataset that encompasses shot framing, angle, and camera movement, enabling models to learn diverse cinematic styles. Building on this, to facilitate robust cinematic alignment evaluation, we present CameraCLIP, a model fine-tuned on the proposed dataset that excels in understanding complex cinematic language in generated videos and can further provide valuable guidance in the multi-shot composition process. Finally, we propose CLIPLoRA, a cost-guided dynamic LoRA composition method that facilitates smooth transitions and realistic blending of cinematic language by dynamically fusing multiple pre-trained cinematic LoRAs within a single video. Our experiments demonstrate that CameraCLIP outperforms existing models in assessing the alignment between cinematic language and video, achieving an R@1 score of 0.81. Additionally, CLIPLoRA improves the ability for multi-shot composition, potentially bridging the gap between automatically generated videos and those shot by professional cinematographers.

* 13 pages

Via

Access Paper or Ask Questions

Eco-Friendly 0G Networks: Unlocking the Power of Backscatter Communications for a Greener Future

Nov 20, 2024

Shumaila Javaid, Hamza Fahim, Bin He, Nasir Saeed

Abstract:Backscatter Communication (BackCom) technology has emerged as a promising paradigm for the Green Internet of Things (IoT) ecosystem, offering advantages such as low power consumption, cost-effectiveness, and ease of deployment. While traditional BackCom systems, such as RFID technology, have found widespread applications, the advent of ambient backscatter presents new opportunities for expanding applications and enhancing capabilities. Moreover, ongoing standardization efforts are actively focusing on BackCom technologies, positioning them as a potential solution to meet the near-zero power consumption and massive connectivity requirements of next-generation wireless systems. 0G networks have the potential to provide advanced solutions by leveraging BackCom technology to deliver ultra-low-power, ubiquitous connectivity for the expanding IoT ecosystem, supporting billions of devices with minimal energy consumption. This paper investigates the integration of BackCom and 0G networks to enhance the capabilities of traditional BackCom systems and enable Green IoT. We conduct an in-depth analysis of BackCom-enabled 0G networks, exploring their architecture and operational objectives, and also explore the Waste Factor (WF) metric for evaluating energy efficiency and minimizing energy waste within integrated systems. By examining both structural and operational aspects, we demonstrate how this synergy enhances the performance, scalability, and sustainability of next-generation wireless networks. Moreover, we highlight possible applications, open challenges, and future directions, offering valuable insights for guiding future research and practical implementations aimed at achieving large-scale, sustainable IoT deployments.

Via

Access Paper or Ask Questions

SSFold: Learning to Fold Arbitrary Crumpled Cloth Using Graph Dynamics from Human Demonstration

Oct 24, 2024

Changshi Zhou, Haichuan Xu, Jiarui Hu, Feng Luan, Zhipeng Wang, Yanchao Dong, Yanmin Zhou, Bin He

Figure 1 for SSFold: Learning to Fold Arbitrary Crumpled Cloth Using Graph Dynamics from Human Demonstration

Figure 2 for SSFold: Learning to Fold Arbitrary Crumpled Cloth Using Graph Dynamics from Human Demonstration

Figure 3 for SSFold: Learning to Fold Arbitrary Crumpled Cloth Using Graph Dynamics from Human Demonstration

Figure 4 for SSFold: Learning to Fold Arbitrary Crumpled Cloth Using Graph Dynamics from Human Demonstration

Abstract:Robotic cloth manipulation faces challenges due to the fabric's complex dynamics and the high dimensionality of configuration spaces. Previous methods have largely focused on isolated smoothing or folding tasks and overly reliant on simulations, often failing to bridge the significant sim-to-real gap in deformable object manipulation. To overcome these challenges, we propose a two-stream architecture with sequential and spatial pathways, unifying smoothing and folding tasks into a single adaptable policy model that accommodates various cloth types and states. The sequential stream determines the pick and place positions for the cloth, while the spatial stream, using a connectivity dynamics model, constructs a visibility graph from partial point cloud data of the self-occluded cloth, allowing the robot to infer the cloth's full configuration from incomplete observations. To bridge the sim-to-real gap, we utilize a hand tracking detection algorithm to gather and integrate human demonstration data into our novel end-to-end neural network, improving real-world adaptability. Our method, validated on a UR5 robot across four distinct cloth folding tasks with different goal shapes, consistently achieves folded states from arbitrary crumpled initial configurations, with success rates of 99\%, 99\%, 83\%, and 67\%. It outperforms existing state-of-the-art cloth manipulation techniques and demonstrates strong generalization to unseen cloth with diverse colors, shapes, and stiffness in real-world experiments.Videos and source code are available at: https://zcswdt.github.io/SSFold/

Via

Access Paper or Ask Questions

Hologram Reasoning for Solving Algebra Problems with Geometry Diagrams

Aug 20, 2024

Litian Huang, Xinguo Yu, Feng Xiong, Bin He, Shengbing Tang, Jiawen Fu

Figure 1 for Hologram Reasoning for Solving Algebra Problems with Geometry Diagrams

Figure 2 for Hologram Reasoning for Solving Algebra Problems with Geometry Diagrams

Figure 3 for Hologram Reasoning for Solving Algebra Problems with Geometry Diagrams

Figure 4 for Hologram Reasoning for Solving Algebra Problems with Geometry Diagrams

Abstract:Solving Algebra Problems with Geometry Diagrams (APGDs) is still a challenging problem because diagram processing is not studied as intensively as language processing. To work against this challenge, this paper proposes a hologram reasoning scheme and develops a high-performance method for solving APGDs by using this scheme. To reach this goal, it first defines a hologram, being a kind of graph, and proposes a hologram generator to convert a given APGD into a hologram, which represents the entire information of APGD and the relations for solving the problem can be acquired from it by a uniform way. Then HGR, a hologram reasoning method employs a pool of prepared graph models to derive algebraic equations, which is consistent with the geometric theorems. This method is able to be updated by adding new graph models into the pool. Lastly, it employs deep reinforcement learning to enhance the efficiency of model selection from the pool. The entire HGR not only ensures high solution accuracy with fewer reasoning steps but also significantly enhances the interpretability of the solution process by providing descriptions of all reasoning steps. Experimental results demonstrate the effectiveness of HGR in improving both accuracy and interpretability in solving APGDs.

Via

Access Paper or Ask Questions

Contrast, Imitate, Adapt: Learning Robotic Skills From Raw Human Videos

Aug 10, 2024

Zhifeng Qian, Mingyu You, Hongjun Zhou, Xuanhui Xu, Hao Fu, Jinzhe Xue, Bin He

Abstract:Learning robotic skills from raw human videos remains a non-trivial challenge. Previous works tackled this problem by leveraging behavior cloning or learning reward functions from videos. Despite their remarkable performances, they may introduce several issues, such as the necessity for robot actions, requirements for consistent viewpoints and similar layouts between human and robot videos, as well as low sample efficiency. To this end, our key insight is to learn task priors by contrasting videos and to learn action priors through imitating trajectories from videos, and to utilize the task priors to guide trajectories to adapt to novel scenarios. We propose a three-stage skill learning framework denoted as Contrast-Imitate-Adapt (CIA). An interaction-aware alignment transformer is proposed to learn task priors by temporally aligning video pairs. Then a trajectory generation model is used to learn action priors. To adapt to novel scenarios different from human videos, the Inversion-Interaction method is designed to initialize coarse trajectories and refine them by limited interaction. In addition, CIA introduces an optimization method based on semantic directions of trajectories for interaction security and sample efficiency. The alignment distances computed by IAAformer are used as the rewards. We evaluate CIA in six real-world everyday tasks, and empirically demonstrate that CIA significantly outperforms previous state-of-the-art works in terms of task success rate and generalization to diverse novel scenarios layouts and object instances.

* 2024 IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING

Via

Access Paper or Ask Questions

Safety Control of Service Robots with LLMs and Embodied Knowledge Graphs

May 28, 2024

Yong Qi, Gabriel Kyebambo, Siyuan Xie, Wei Shen, Shenghui Wang, Bitao Xie, Bin He, Zhipeng Wang, Shuo Jiang

Abstract:Safety limitations in service robotics across various industries have raised significant concerns about the need for robust mechanisms ensuring that robots adhere to safe practices, thereby preventing actions that might harm humans or cause property damage. Despite advances, including the integration of Knowledge Graphs (KGs) with Large Language Models (LLMs), challenges in ensuring consistent safety in autonomous robot actions persist. In this paper, we propose a novel integration of Large Language Models with Embodied Robotic Control Prompts (ERCPs) and Embodied Knowledge Graphs (EKGs) to enhance the safety framework for service robots. ERCPs are designed as predefined instructions that ensure LLMs generate safe and precise responses. These responses are subsequently validated by EKGs, which provide a comprehensive knowledge base ensuring that the actions of the robot are continuously aligned with safety protocols, thereby promoting safer operational practices in varied contexts. Our experimental setup involved diverse real-world tasks, where robots equipped with our framework demonstrated significantly higher compliance with safety standards compared to traditional methods. This integration fosters secure human-robot interactions and positions our methodology at the forefront of AI-driven safety innovations in service robotics.

Via

Access Paper or Ask Questions

Large Language Models for UAVs: Current State and Pathways to the Future

May 02, 2024

Shumaila Javaid, Nasir Saeed, Bin He

Figure 1 for Large Language Models for UAVs: Current State and Pathways to the Future

Figure 2 for Large Language Models for UAVs: Current State and Pathways to the Future

Figure 3 for Large Language Models for UAVs: Current State and Pathways to the Future

Figure 4 for Large Language Models for UAVs: Current State and Pathways to the Future

Abstract:Unmanned Aerial Vehicles (UAVs) have emerged as a transformative technology across diverse sectors, offering adaptable solutions to complex challenges in both military and civilian domains. Their expanding capabilities present a platform for further advancement by integrating cutting-edge computational tools like Artificial Intelligence (AI) and Machine Learning (ML) algorithms. These advancements have significantly impacted various facets of human life, fostering an era of unparalleled efficiency and convenience. Large Language Models (LLMs), a key component of AI, exhibit remarkable learning and adaptation capabilities within deployed environments, demonstrating an evolving form of intelligence with the potential to approach human-level proficiency. This work explores the significant potential of integrating UAVs and LLMs to propel the development of autonomous systems. We comprehensively review LLM architectures, evaluating their suitability for UAV integration. Additionally, we summarize the state-of-the-art LLM-based UAV architectures and identify novel opportunities for LLM embedding within UAV frameworks. Notably, we focus on leveraging LLMs to refine data analysis and decision-making processes, specifically for enhanced spectral sensing and sharing in UAV applications. Furthermore, we investigate how LLM integration expands the scope of existing UAV applications, enabling autonomous data processing, improved decision-making, and faster response times in emergency scenarios like disaster response and network restoration. Finally, we highlight crucial areas for future research that are critical for facilitating the effective integration of LLMs and UAVs.

Via

Access Paper or Ask Questions