Swarun Kumar

SonicSieve: Bringing Directional Speech Extraction to Smartphones Using Acoustic Microstructures

Apr 15, 2025

Kuang Yuan, Yifeng Wang, Xiyuxing Zhang, Chengyi Shen, Swarun Kumar, Justin Chan

Abstract: Imagine placing your smartphone on a table in a noisy restaurant and clearly capturing the voices of friends seated around you, or recording a lecturer's voice with clarity in a reverberant auditorium. We introduce SonicSieve, the first intelligent directional speech extraction system for smartphones using a bio-inspired acoustic microstructure. Our passive design embeds directional cues onto incoming speech without any additional electronics. It attaches to the in-line mic of low-cost wired earphones which can be attached to smartphones. We present an end-to-end neural network that processes the raw audio mixtures in real time on mobile devices. Our results show that SonicSieve achieves a signal quality improvement of 5.0 dB when focusing on a 30° angular region. Additionally, the performance of our system based on only two microphones exceeds that of conventional 5-microphone arrays.

DeWinder: Single-Channel Wind Noise Reduction using Ultrasound Sensing

Sep 10, 2024

Kuang Yuan, Shuo Han, Swarun Kumar, Bhiksha Raj

Abstract: The quality of audio recordings in outdoor environments is often degraded by the presence of wind. Mitigating the impact of wind noise on the perceptual quality of single-channel speech remains a significant challenge due to its non-stationary characteristics. Prior work in noise suppression treats wind noise as a general background noise without explicit modeling of its characteristics. In this paper, we leverage ultrasound as an auxiliary modality to explicitly sense the airflow and characterize the wind noise. We propose a multi-modal deep-learning framework to fuse the ultrasonic Doppler features and speech signals for wind noise reduction. Our results show that DeWinder can significantly improve the noise reduction capabilities of state-of-the-art speech enhancement models.

ToMoBrush: Exploring Dental Health Sensing using a Sonic Toothbrush

Feb 02, 2024

Kuang Yuan, Mohamed Ibrahim, Yiwen Song, Guoxiang Deng, Suvendra Vijayan, Robert Nerone, Akshay Gadre, Swarun Kumar

Abstract: Early detection of dental disease is crucial to prevent adverse outcomes. Today, dental X-rays are the gold standard for dental disease detection. Unfortunately, regular X-ray exams are still a privilege for billions of people around the world. In this paper, we ask: "Can we develop a low-cost sensing system that enables dental self-examination in the comfort of one's home?" This paper presents ToMoBrush, a dental health sensing system that explores using off-the-shelf sonic toothbrushes for dental condition detection. Our solution leverages the fact that a sonic toothbrush produces rich acoustic signals when in contact with teeth, which contain important information about each tooth's status. ToMoBrush extracts tooth resonance signatures from the acoustic signals to characterize varied dental health conditions of the teeth. We evaluate ToMoBrush on 19 participants and dental-standard models for detecting common dental problems including caries, calculus, and food impaction, achieving a detection ROC-AUC of 0.90, 0.83, and 0.88 respectively. Interviews with dental experts validate ToMoBrush's potential in enhancing at-home dental healthcare.

Low-latency Imaging and Inference from LoRa-enabled CubeSats

Jun 21, 2022

Akshay Gadre, Swarun Kumar, Zachary Manchester

Abstract: Recent years have seen the rapid deployment of low-cost CubeSats in low-Earth orbit, primarily for research, education, and Earth observation. The vast majority of these CubeSats experience significant latency (several hours) from the time an image is captured to the time it is available on the ground. This is primarily due to the limited availability of dedicated satellite ground stations that tend to be bulky to deploy and expensive to rent. This paper explores using LoRa radios in the ISM band for low-latency downlink communication from CubeSats, primarily due to the availability of extensive ground LoRa infrastructure and minimal interference to terrestrial communication. However, the limited bandwidth of LoRa precludes sending rich satellite Earth images; instead, the CubeSats can at best send short messages (a few hundred bytes). This paper details our experience in communicating with a LoRa-enabled CubeSat launched by our team. We present Vista, a communication system that makes software modifications to LoRa encoding onboard a CubeSat and decoding on commercial LoRa ground stations to allow for satellite imagery to be communicated, as well as wide-ranging machine learning inference on these images. This is achieved through a LoRa-channel-aware image encoding that is informed by the structure of satellite images, the tasks performed on them, as well as the Doppler variation of satellite signals. A detailed evaluation of Vista through trace-driven emulation with traces from the LoRa-CubeSat launch (in 2021) shows 4.56 dB improvement in LoRa image PSNR and 1.38x improvement in land-use classification over those images.

High Resolution Point Clouds from mmWave Radar

Jun 18, 2022

Akarsh Prabhakara, Tao Jin, Arnav Das, Gantavya Bhatt, Lilly Kumari, Elahe Soltanaghaei, Jeff Bilmes, Swarun Kumar, Anthony Rowe

Abstract: This paper explores a machine learning approach for generating high resolution point clouds from a single-chip mmWave radar. Unlike lidar and vision-based systems, mmWave radar can operate in harsh environments and see through occlusions like smoke, fog, and dust. Unfortunately, current mmWave processing techniques offer poor spatial resolution compared to lidar point clouds. This paper presents RadarHD, an end-to-end neural network that constructs lidar-like point clouds from low resolution radar input. Enhancing radar images is challenging due to the presence of specular and spurious reflections. Radar data also doesn't map well to traditional image processing techniques due to the signal's sinc-like spreading pattern. We overcome these challenges by training RadarHD on a large volume of raw I/Q radar data paired with lidar point clouds across diverse indoor settings. Our experiments show the ability to generate rich point clouds even in scenes unobserved during training and in the presence of heavy smoke occlusion. Further, RadarHD's point clouds are high-quality enough to work with existing lidar odometry and mapping workflows.

Toolbox Release: A WiFi-Based Relative Bearing Sensor for Robotics

Sep 24, 2021

Ninad Jadhav, Weiying Wang, Diana Zhang, Swarun Kumar, Stephanie Gil

Abstract: This paper presents the WiFi-Sensor-for-Robotics (WSR) toolbox, an open source C++ framework. It enables robots in a team to obtain relative bearing to each other, even in non-line-of-sight (NLOS) settings, which is a very challenging problem in robotics. It does so by analyzing the phase of their communicated WiFi signals as the robots traverse the environment. This capability, based on the theory developed in our prior works, is made available for the first time as an open-source tool. It is motivated by the lack of easily deployable solutions that use robots' local resources (e.g., WiFi) for sensing in NLOS. This has implications for localization, ad-hoc robot networks, and security in multi-robot teams, amongst others. The toolbox is designed for distributed and online deployment on robot platforms using commodity hardware and on-board sensors. We also release datasets demonstrating its performance in NLOS and line-of-sight (LOS) settings for a multi-robot localization use case. Empirical results show that the bearing estimation from our toolbox achieves a mean accuracy of 5.10 degrees. This leads to a median error of 0.5m and 0.9m for localization in LOS and NLOS settings respectively, in a hardware deployment in an indoor office environment.

* 7 pages

A Hybrid mmWave and Camera System for Long-Range Depth Imaging

Jun 15, 2021

Diana Zhang, Akarsh Prabhakara, Sirajum Munir, Aswin Sankaranarayanan, Swarun Kumar

Abstract: mmWave radars offer excellent depth resolution owing to their high bandwidth at mmWave radio frequencies. Yet, they suffer intrinsically from poor angular resolution, which is an order of magnitude worse than camera systems, and are therefore not a capable 3-D imaging solution in isolation. We propose Metamoran, a system that combines the complementary strengths of radar and camera systems to obtain depth images at high azimuthal resolutions at distances of several tens of meters with high accuracy, all from a single fixed vantage point. Metamoran enables rich long-range depth imaging outdoors with applications to roadside safety infrastructure, surveillance and wide-area mapping. Our key insight is to use the high azimuth resolution from cameras using computer vision techniques, including image segmentation and monocular depth estimation, to obtain object shapes and use these as priors for our novel specular beamforming algorithm. We also design this algorithm to work in cluttered environments with weak reflections and in partially occluded scenarios. We perform a detailed evaluation of Metamoran's depth imaging and sensing capabilities in 200 diverse scenes in a major U.S. city. Our evaluation shows that Metamoran estimates the depth of an object up to 60 m away with a median error of 28 cm, an improvement of 13× compared to a naive radar+camera baseline and 23× compared to monocular depth estimation.

WSR: A WiFi Sensor for Collaborative Robotics

Dec 08, 2020

Ninad Jadhav, Weiying Wang, Diana Zhang, Oussama Khatib, Swarun Kumar, Stephanie Gil

Abstract: In this paper we derive a new capability for robots to measure relative direction, or Angle-of-Arrival (AOA), to other robots operating in non-line-of-sight and unmapped environments with occlusions, without requiring external infrastructure. We do so by capturing all of the paths that a WiFi signal traverses as it travels from a transmitting to a receiving robot, which we term an AOA profile. The key intuition is to "emulate antenna arrays in the air" as the robots move in 3D space, a method akin to Synthetic Aperture Radar (SAR). The main contributions include development of i) a framework to accommodate arbitrary 3D trajectories, as well as continuous mobility of all robots, while computing AOA profiles and ii) an accompanying analysis that provides a lower bound on the variance of AOA estimation as a function of robot trajectory geometry based on the Cramér-Rao Bound. This is a critical distinction from previous work on SAR that restricts robot mobility to prescribed motion patterns, does not generalize to 3D space, and/or requires transmitting robots to be static during data acquisition periods. Our method results in more accurate AOA profiles and thus better AOA estimation, and formally characterizes this observation as the informativeness of the trajectory; a computable quantity for which we derive a closed form. All theoretical developments are substantiated by extensive simulation and hardware experiments. We also show that our formulation can be used with an off-the-shelf trajectory estimation sensor. Finally, we demonstrate the performance of our system on a multi-robot dynamic rendezvous task.

* 29 pages, 25 figures, *co-primary authors
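
The "antenna arrays in the air" intuition from the abstract above can be illustrated with a small, simplified sketch. This is not the paper's formulation: it swaps in a plain Bartlett (delay-and-sum) beamformer, works in 2D only, and assumes a noise-free single-path channel from a static, far-field transmitter. The wavelength, the circular trajectory, and all function names are hypothetical choices made for illustration.

```python
import cmath
import math

WAVELENGTH = 0.06  # metres; roughly 5 GHz WiFi (assumed for illustration)


def steering_phase(pos, azimuth):
    """Phase of a far-field plane wave from `azimuth` (radians) at 2-D point `pos`."""
    x, y = pos
    # Project the receiver displacement onto the unit vector toward the source.
    proj = x * math.cos(azimuth) + y * math.sin(azimuth)
    return 2 * math.pi * proj / WAVELENGTH


def simulate_channel(positions, true_azimuth):
    """Noise-free, single-path channel samples along the trajectory (assumption)."""
    return [cmath.exp(1j * steering_phase(p, true_azimuth)) for p in positions]


def aoa_profile(positions, channel, n_angles=360):
    """Bartlett profile: normalized power of the coherent sum for each candidate angle."""
    n = len(positions)
    profile = []
    for k in range(n_angles):
        az = 2 * math.pi * k / n_angles
        acc = sum(h * cmath.exp(-1j * steering_phase(p, az))
                  for p, h in zip(positions, channel))
        profile.append(abs(acc) ** 2 / n ** 2)
    return profile


def estimate_aoa_deg(profile):
    """Angle (degrees) of the strongest peak in the profile."""
    best = max(range(len(profile)), key=profile.__getitem__)
    return best * 360.0 / len(profile)


# Hypothetical trajectory: the receiving robot drives a 0.3 m radius circle,
# sampling the channel at 200 points (spacing well below half a wavelength).
positions = [(0.3 * math.cos(2 * math.pi * i / 200),
              0.3 * math.sin(2 * math.pi * i / 200)) for i in range(200)]
channel = simulate_channel(positions, math.radians(70))
print(estimate_aoa_deg(aoa_profile(positions, channel)))
```

The curved trajectory matters in this sketch: a straight-line path in 2D would leave a mirror ambiguity about the direction of travel, which is one simple way to see why trajectory geometry affects how informative the resulting profile is.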
