Abstract:Detecting vehicles in aerial images can be very challenging due to complex backgrounds, small resolution, shadows, and occlusions. Despite the effectiveness of SOTA detectors such as YOLO, they remain vulnerable to adversarial attacks (AAs), compromising their reliability. Traditional AA strategies often overlook the practical constraints of physical implementation, focusing solely on attack performance. Our work addresses this issue by proposing practical implementation constraints for AA in texture and/or shape. These constraints include pixelation, masking, limiting the color palette of the textures, and constraining the shape modifications. We evaluated the proposed constraints through extensive experiments using three widely used object detector architectures, and compared them to previous works. The results demonstrate the effectiveness of our solutions and reveal a trade-off between practicality and performance. Additionally, we introduce a labeled dataset of overhead images featuring vehicles of various categories. We will make the code/dataset public upon paper acceptance.
Abstract:Hand motion capture data is now relatively easy to obtain, even for complicated grasps; however this data is of limited use without the ability to retarget it onto the hands of a specific character or robot. The target hand may differ dramatically in geometry, number of degrees of freedom (DOFs), or number of fingers. We present a simple, but effective framework capable of kinematically retargeting multiple human hand-object manipulations from a publicly available dataset to a wide assortment of kinematically and morphologically diverse target hands through the exploitation of contact areas. We do so by formulating the retarget operation as a non-isometric shape matching problem and use a combination of both surface contact and marker data to progressively estimate, refine, and fit the final target hand trajectory using inverse kinematics (IK). Foundational to our framework is the introduction of a novel shape matching process, which we show enables predictable and robust transfer of contact data over full manipulations while providing an intuitive means for artists to specify correspondences with relatively few inputs. We validate our framework through thirty demonstrations across five different hand shapes and six motions of different objects. We additionally compare our method against existing hand retargeting approaches. Finally, we demonstrate our method enabling novel capabilities such as object substitution and the ability to visualize the impact of design choices over full trajectories.
Abstract:This paper presents a novel approach to the problem of semantic parsing via learning the correspondences between complex sentences and rich sets of events. Our main intuition is that correct correspondences tend to occur more frequently. Our model benefits from a discriminative notion of similarity to learn the correspondence between sentence and an event and a ranking machinery that scores the popularity of each correspondence. Our method can discover a group of events (called macro-events) that best describes a sentence. We evaluate our method on our novel dataset of professional soccer commentaries. The empirical results show that our method significantly outperforms the state-of-theart.