Abstract: Multi-channel photoplethysmography (PPG) sensors have found widespread adoption in wearable devices for monitoring cardiac health. The channels serve different functions: green is commonly used for metrics such as heart rate and heart rate variability, whereas red and infrared are commonly used for pulse oximetry. In this paper, we introduce a novel method that fuses simultaneously recorded multi-channel PPG signals into a single recovered PPG signal that can serve as input to further processing. Via signal fusion, our learning-based method compensates for artifacts that affect the individual wavelengths to different extents, such as motion and ambient light changes. We evaluate our method on a novel dataset of multi-channel PPG recordings, with reference electrocardiogram recordings, from 10 participants over the course of 13 hours of real-world activities outside the laboratory. Using the fused PPG signal our method recovers, participants' heart rates can be calculated with a mean error of 4.5\,bpm, 23\% lower than from green PPG signals alone (5.9\,bpm).
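As a rough illustration of the kind of learning-based fusion this abstract describes, the sketch below maps three PPG channels (green, red, infrared) to one fused signal with a small 1-D convolutional network in PyTorch. The actual architecture, window length, and sampling rate are not specified in the abstract; everything below is an assumption for illustration only.

# Minimal sketch of multi-channel PPG fusion (architecture and training
# objective are assumptions; the abstract does not specify them).
import torch
import torch.nn as nn

class PPGFusionNet(nn.Module):
    """Maps green/red/infrared PPG channels to one fused PPG signal."""
    def __init__(self, channels: int = 3, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, hidden, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(hidden, 1, kernel_size=9, padding=4),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, samples) -> fused signal: (batch, 1, samples)
        return self.net(x)

# Example: fuse a 10-second window assumed to be sampled at 64 Hz.
model = PPGFusionNet()
window = torch.randn(1, 3, 64 * 10)   # placeholder green/red/IR channels
fused = model(window)                  # single recovered PPG signal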
Abstract: Reflective photoplethysmography (PPG) has become the default sensing technique in wearable devices for monitoring cardiac activity via a person's heart rate (HR). However, PPG-based HR estimates can be substantially affected by factors such as the wearer's activities, sensor placement and the resulting motion artifacts, and environmental characteristics such as temperature and ambient light, all of which can significantly decrease HR prediction reliability. In this paper, we show that state-of-the-art HR estimation methods struggle when processing \emph{representative} data from everyday activities in outdoor environments, likely because they rely on existing datasets that were captured under controlled conditions. We introduce a novel multimodal dataset and benchmark results for continuous PPG recordings during outdoor activities from 16 participants over 13.5 hours, captured from four wearable sensors, each worn at a different location on the body, totaling 216\,hours of recordings. Our recordings include accelerometer, temperature, and altitude data, as well as a synchronized Lead I electrocardiogram for ground-truth HR references. Participants completed a round trip from Zurich to Jungfraujoch, a high-altitude saddle in the Swiss Alps, over the course of one day. The trip included outdoor and indoor activities such as walking, hiking, stair climbing, eating, drinking, and resting at various temperatures and altitudes (up to 3,571\,m above sea level), as well as using cars, trains, cable cars, and lifts for transport -- all of which impacted participants' physiological dynamics. We also present a novel method that estimates HR more robustly in such real-world scenarios than existing baselines.
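For context on how a Lead I ECG can serve as a ground-truth HR reference, the following sketch derives HR from detected R-peaks. The dataset's actual reference pipeline is not described in the abstract; the sampling rate, peak threshold, and synthetic example below are illustrative assumptions.

# Sketch of deriving a ground-truth HR reference from an ECG via R-peak
# detection (thresholds and sampling rate are illustrative assumptions).
import numpy as np
from scipy.signal import find_peaks

def ecg_heart_rate(ecg: np.ndarray, fs: float) -> float:
    """Return mean HR in bpm from an ECG window sampled at fs Hz."""
    # R-peaks dominate the waveform: threshold at half the maximum deflection
    # and enforce a 0.3 s refractory period (i.e., at most 200 bpm).
    peaks, _ = find_peaks(ecg, distance=int(0.3 * fs), height=0.5 * np.max(ecg))
    rr_intervals = np.diff(peaks) / fs          # seconds between beats
    return 60.0 / np.mean(rr_intervals)         # beats per minute

# Example on synthetic data: a 70 bpm pulse train with additive noise.
fs = 250.0
t = np.arange(0, 30, 1 / fs)
ecg = np.zeros_like(t)
ecg[(np.arange(0, 30, 60 / 70) * fs).astype(int)] = 1.0
ecg += 0.05 * np.random.randn(t.size)
print(round(ecg_heart_rate(ecg, fs)))           # ~70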
Abstract: Smartwatches have become popular for monitoring physiological parameters outside clinical settings. Using reflective photoplethysmography (PPG) sensors, such watches can non-invasively estimate heart rate (HR) in everyday environments and throughout a patient's day. However, achieving consistently high accuracy remains challenging, particularly during moments of increased motion or due to varying device placement. In this paper, we introduce a novel sensor fusion method for estimating HR that flexibly combines samples from multiple PPG sensors placed across the patient's body, including the wrist, ankle, head, and sternum (chest). Our method first estimates the signal quality of all inputs and then dynamically integrates them into a joint, robust PPG signal for HR estimation. We evaluate our method on a novel dataset of PPG and ECG recordings from 14 participants who engaged in real-world activities outside the laboratory over the course of a whole day. Our method achieves a mean HR error of 2.4\,bpm, which is 46\% lower than the mean error of the best-performing single device (4.4\,bpm, worn on the head).
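A minimal sketch of the quality-then-fuse idea described above, assuming a spectral-concentration heuristic as the quality estimate and a weighted sum as the fusion rule; the paper's actual quality metric and integration scheme are not given in the abstract.

# Sketch of quality-weighted fusion across body-worn PPG sensors
# (quality metric and fusion rule are illustrative assumptions).
import numpy as np

def spectral_quality(sig: np.ndarray, fs: float) -> float:
    """Fraction of in-band power concentrated near the dominant cardiac peak."""
    sig = sig - sig.mean()
    spectrum = np.abs(np.fft.rfft(sig)) ** 2
    freqs = np.fft.rfftfreq(sig.size, 1 / fs)
    band = (freqs >= 0.7) & (freqs <= 3.5)             # ~42-210 bpm
    peak = freqs[band][np.argmax(spectrum[band])]
    near_peak = band & (np.abs(freqs - peak) <= 0.2)   # +/- ~12 bpm
    return spectrum[near_peak].sum() / spectrum[band].sum()

def fuse_ppg(signals, fs: float) -> np.ndarray:
    """Combine wrist/ankle/head/sternum PPG into one quality-weighted signal."""
    weights = np.array([spectral_quality(s, fs) for s in signals])
    weights /= weights.sum()
    stacked = np.stack([(s - s.mean()) / (s.std() + 1e-8) for s in signals])
    return weights @ stacked

# Example: fuse four 8-second windows assumed to be sampled at 64 Hz.
fs = 64.0
fused = fuse_ppg([np.random.randn(int(8 * fs)) for _ in range(4)], fs)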
Abstract: Despite the advent of touchscreens, typing on physical keyboards remains the most efficient way to enter text, because users can leverage all fingers across a full-size keyboard for convenient typing. As users increasingly type on the go, text input on mobile and wearable devices has had to compromise on full-size typing. In this paper, we present TapType, a mobile text entry system for full-size typing on passive surfaces -- without an actual keyboard. Using the inertial sensors inside a band worn on either wrist, TapType decodes surface taps and relates them to a traditional QWERTY keyboard layout. The key novelty of our method is to predict the most likely character sequences by fusing the finger probabilities from our Bayesian neural network classifier with the characters' prior probabilities from an n-gram language model. In our online evaluation, participants typed 19 words per minute on average with a character error rate of 0.6\% after 30 minutes of training. Expert typists consistently achieved more than 25 words per minute at a similar error rate. We demonstrate applications of TapType in mobile use around smartphones and tablets, as a complement to interaction in situated Mixed Reality outside visual control, and as an eyes-free mobile text input method using an audio feedback-only interface.
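The decoding idea can be sketched as a beam search that combines per-tap finger probabilities with a character-level language-model prior. The finger-to-key mapping, beam width, and bigram interface below are illustrative assumptions rather than TapType's exact implementation.

# Sketch of fusing finger-classifier probabilities with an n-gram prior
# via beam search (mapping, beam width, and LM interface are assumptions).
import math

FINGER_KEYS = {                     # simplified touch-typing finger-to-key map
    "L_index": "rtfgvb", "L_middle": "edc", "L_ring": "wsx", "L_pinky": "qaz",
    "R_index": "yuhjnm", "R_middle": "ik", "R_ring": "ol", "R_pinky": "p",
}

def decode(tap_probs, bigram_logprob, beam_width=8):
    """Beam search over character sequences.

    tap_probs: list of dicts mapping finger name -> probability (one per tap).
    bigram_logprob: callable (prev_char, char) -> log prior probability.
    """
    beams = [("", 0.0)]                                  # (sequence, log score)
    for probs in tap_probs:
        candidates = []
        for seq, score in beams:
            prev = seq[-1] if seq else " "
            for finger, p in probs.items():
                for ch in FINGER_KEYS[finger]:
                    candidates.append((seq + ch,
                                       score + math.log(max(p, 1e-12))
                                       + bigram_logprob(prev, ch)))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams[0][0]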
Abstract: Full-body ego-pose estimation from head and hand poses alone has become an active area of research to power articulated avatar representations on headset-based platforms. However, existing methods remain tied to the confines of the motion-capture spaces in which their datasets were recorded, while simultaneously assuming continuous capture of joint motions and uniform body dimensions. In this paper, we propose EgoPoser, which overcomes these limitations by 1) rethinking the input representation for headset-based ego-pose estimation and introducing a novel motion decomposition method that predicts full-body pose independently of global positions, 2) robustly modeling body pose from intermittent hand position and orientation tracking, which is only available when the hands are inside the headset's field of view, and 3) generalizing across users with different body sizes. Our experiments show that EgoPoser outperforms state-of-the-art methods both qualitatively and quantitatively while maintaining a high inference speed of over 600\,fps. EgoPoser establishes a robust baseline for future work, where full-body pose estimation no longer needs to rely on outside-in capture and can scale to large-scene environments.
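A sketch of what a global-position-independent input representation could look like, in the spirit of the motion decomposition described above: subtract the head's horizontal position and keep global movement only as per-frame deltas. The axis convention (z-up) and the exact decomposition are assumptions, not EgoPoser's actual formulation.

# Sketch of a position-independent input representation for headset-based
# pose estimation (decomposition and axis convention are assumptions).
import numpy as np

def decompose_inputs(head_pos: np.ndarray, hand_pos: np.ndarray):
    """head_pos: (T, 3); hand_pos: (T, 2, 3) world-space positions, z-up.

    Returns local features that do not depend on where in the scene the
    user stands, plus per-frame global deltas that preserve motion cues.
    """
    anchor = head_pos.copy()
    anchor[:, 2] = 0.0                                # drop horizontal origin, keep height
    local_head = head_pos - anchor                    # (T, 3): only head height remains
    local_hands = hand_pos - anchor[:, None, :]       # (T, 2, 3): hands relative to head
    global_delta = np.diff(head_pos, axis=0, prepend=head_pos[:1])  # (T, 3) per-frame motion
    return local_head, local_hands, global_delta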