Abstract:Fast and accurate physics simulation is an essential component of robot learning, where robots can explore failure scenarios that are difficult to produce in the real world and learn from unlimited on-policy data. Yet, it remains challenging to incorporate RGB-color perception into the sim-to-real pipeline that matches the real world in its richness and realism. In this work, we train a robot dog in simulation for visual parkour. We propose a way to use generative models to synthesize diverse and physically accurate image sequences of the scene from the robot's ego-centric perspective. We present demonstrations of zero-shot transfer to the RGB-only observations of the real world on a robot equipped with a low-cost, off-the-shelf color camera. website visit https://lucidsim.github.io
Abstract:Self-supervised and language-supervised image models contain rich knowledge of the world that is important for generalization. Many robotic tasks, however, require a detailed understanding of 3D geometry, which is often lacking in 2D image features. This work bridges this 2D-to-3D gap for robotic manipulation by leveraging distilled feature fields to combine accurate 3D geometry with rich semantics from 2D foundation models. We present a few-shot learning method for 6-DOF grasping and placing that harnesses these strong spatial and semantic priors to achieve in-the-wild generalization to unseen objects. Using features distilled from a vision-language model, CLIP, we present a way to designate novel objects for manipulation via free-text natural language, and demonstrate its ability to generalize to unseen expressions and novel categories of objects.
Abstract:Exposure to bio-aerosols such as mold spores and pollen can lead to adverse health effects. There is a need for a portable and cost-effective device for long-term monitoring and quantification of various bio-aerosols. To address this need, we present a mobile and cost-effective label-free bio-aerosol sensor that takes holographic images of flowing particulate matter concentrated by a virtual impactor, which selectively slows down and guides particles larger than ~6 microns to fly through an imaging window. The flowing particles are illuminated by a pulsed laser diode, casting their inline holograms on a CMOS image sensor in a lens-free mobile imaging device. The illumination contains three short pulses with a negligible shift of the flowing particle within one pulse, and triplicate holograms of the same particle are recorded at a single frame before it exits the imaging field-of-view, revealing different perspectives of each particle. The particles within the virtual impactor are localized through a differential detection scheme, and a deep neural network classifies the aerosol type in a label-free manner, based on the acquired holographic images. We demonstrated the success of this mobile bio-aerosol detector with a virtual impactor using different types of pollen (i.e., bermuda, elm, oak, pine, sycamore, and wheat) and achieved a blind classification accuracy of 92.91%. This mobile and cost-effective device weighs ~700 g and can be used for label-free sensing and quantification of various bio-aerosols over extended periods since it is based on a cartridge-free virtual impactor that does not capture or immobilize particulate matter.
Abstract:Logit models are usually applied when studying individual travel behavior, i.e., to predict travel mode choice and to gain behavioral insights on traveler preferences. Recently, some studies have applied machine learning to model travel mode choice and reported higher out-of-sample prediction accuracy than conventional logit models (e.g., multinomial logit). However, there has not been a comprehensive comparison between logit models and machine learning that covers both prediction and behavioral analysis. This paper aims at addressing this gap by examining the key differences in model development, evaluation, and behavioral interpretation between logit and machine-learning models for travel-mode choice modeling. To complement the theoretical discussions, we also empirically evaluated the two approaches on stated-preference survey data for a new type of transit system integrating high-frequency fixed routes and micro-transit. The results show that machine learning can produce significantly higher predictive accuracy than logit models and are better at capturing the nonlinear relationships between trip attributes and mode-choice outcomes. On the other hand, compared to the multinomial logit model, the best-performing machine-learning model, the random forest model, produces less reasonable behavioral outputs (i.e. marginal effects and elasticities) when they were computed from a standard approach. By introducing some behavioral constraints into the computation of behavioral outputs from a random forest model, however, we obtained better results that are somewhat comparable with the multinomial logit model. We believe that there is great potential in merging ideas from machine learning and conventional statistical methods to develop refined models for travel-behavior research and suggest some possible research directions.