Stephen
Abstract:The Markov property serves as a foundational assumption in most existing work on vehicle driving behavior, positing that future states depend solely on the current state, not the series of preceding states. This study validates the Markov properties of vehicle trajectories for both Autonomous Vehicles (AVs) and Human-driven Vehicles (HVs). A statistical method used to test whether time series data exhibits Markov properties is applied to examine whether the trajectory data possesses Markov characteristics. t test and F test are additionally introduced to characterize the differences in Markov properties between AVs and HVs. Based on two public trajectory datasets, we investigate the presence and order of the Markov property of different types of vehicles through rigorous statistical tests. Our findings reveal that AV trajectories generally exhibit stronger Markov properties compared to HV trajectories, with a higher percentage conforming to the Markov property and lower Markov orders. In contrast, HV trajectories display greater variability and heterogeneity in decision-making processes, reflecting the complex perception and information processing involved in human driving. These results have significant implications for the development of driving behavior models, AV controllers, and traffic simulation systems. Our study also demonstrates the feasibility of using statistical methods to test the presence of Markov properties in driving trajectory data.
Abstract:Motor-evoked potentials (MEPs) are among the few directly observable responses to external brain stimulation and serve a variety of applications, often in the form of input-output (IO) curves. Previous statistical models with two variability sources inherently consider the small MEPs at the low-side plateau as part of the neural recruitment properties. However, recent studies demonstrated that small MEP responses under resting conditions are contaminated and over-shadowed by background noise of mostly technical quality, e.g., caused by the amplifier, and suggested that the neural recruitment curve should continue below this noise level. This work intends to separate physiological variability from background noise and improve the description of recruitment behaviour. We developed a triple-variability-source model around a logarithmic logistic function without a lower plateau and incorporated an additional source for background noise. Compared to models with two or fewer variability sources, our approach better described IO characteristics, evidenced by lower Bayesian Information Criterion scores across all subjects and pulse shapes. The model independently extracted hidden variability information across the stimulated neural system and isolated it from background noise, which led to an accurate estimation of the IO curve parameters. This new model offers a robust tool to analyse brain stimulation IO curves in clinical and experimental neuroscience and reduces the risk of spurious results from inappropriate statistical methods. The presented model together with the corresponding calibration method provides a more accurate representation of MEP responses and variability sources, advances our understanding of cortical excitability, and may improve the assessment of neuromodulation effects.
Abstract:Data-free model stealing involves replicating the functionality of a target model into a substitute model without accessing the target model's structure, parameters, or training data. The adversary can only access the target model's predictions for generated samples. Once the substitute model closely approximates the behavior of the target model, attackers can exploit its white-box characteristics for subsequent malicious activities, such as adversarial attacks. Existing methods within cooperative game frameworks often produce samples with high confidence for the prediction of the substitute model, which makes it difficult for the substitute model to replicate the behavior of the target model. This paper presents a new data-free model stealing approach called Query Efficient Data Generation (\textbf{QEDG}). We introduce two distinct loss functions to ensure the generation of sufficient samples that closely and uniformly align with the target model's decision boundary across multiple classes. Building on the limitation of current methods, which typically yield only one piece of supervised information per query, we propose the query-free sample augmentation that enables the acquisition of additional supervised information without increasing the number of queries. Motivated by theoretical analysis, we adopt the consistency rate metric, which more accurately evaluates the similarity between the substitute and target models. We conducted extensive experiments to verify the effectiveness of our proposed method, which achieved better performance with fewer queries compared to the state-of-the-art methods on the real \textbf{MLaaS} scenario and five datasets.
Abstract:We have witnessed the unprecedented success of diffusion-based video generation over the past year. Recently proposed models from the community have wielded the power to generate cinematic and high-resolution videos with smooth motions from arbitrary input prompts. However, as a supertask of image generation, video generation models require more computation and are thus hosted mostly on cloud servers, limiting broader adoption among content creators. In this work, we propose a comprehensive acceleration framework to bring the power of the large-scale video diffusion model to the hands of edge users. From the network architecture scope, we initialize from a compact image backbone and search out the design and arrangement of temporal layers to maximize hardware efficiency. In addition, we propose a dedicated adversarial fine-tuning algorithm for our efficient model and reduce the denoising steps to 4. Our model, with only 0.6B parameters, can generate a 5-second video on an iPhone 16 PM within 5 seconds. Compared to server-side models that take minutes on powerful GPUs to generate a single video, we accelerate the generation by magnitudes while delivering on-par quality.
Abstract:Vision Transformers (ViT) is known for its scalability. In this work, we target to scale down a ViT to fit in an environment with dynamic-changing resource constraints. We observe that smaller ViTs are intrinsically the sub-networks of a larger ViT with different widths. Thus, we propose a general framework, named Scala, to enable a single network to represent multiple smaller ViTs with flexible inference capability, which aligns with the inherent design of ViT to vary from widths. Concretely, Scala activates several subnets during training, introduces Isolated Activation to disentangle the smallest sub-network from other subnets, and leverages Scale Coordination to ensure each sub-network receives simplified, steady, and accurate learning objectives. Comprehensive empirical validations on different tasks demonstrate that with only one-shot training, Scala learns slimmable representation without modifying the original ViT structure and matches the performance of Separate Training. Compared with the prior art, Scala achieves an average improvement of 1.6% on ImageNet-1K with fewer parameters.
Abstract:Deep learning is reshaping mobile applications, with a growing trend of deploying deep neural networks (DNNs) directly to mobile and embedded devices to address real-time performance and privacy. To accommodate local resource limitations, techniques like weight compression, convolution decomposition, and specialized layer architectures have been developed. However, the \textit{dynamic} and \textit{diverse} deployment contexts of mobile devices pose significant challenges. Adapting deep models to meet varied device-specific requirements for latency, accuracy, memory, and energy is labor-intensive. Additionally, changing processor states, fluctuating memory availability, and competing processes frequently necessitate model re-compression to preserve user experience. To address these issues, we introduce AdaScale, an elastic inference framework that automates the adaptation of deep models to dynamic contexts. AdaScale leverages a self-evolutionary model to streamline network creation, employs diverse compression operator combinations to reduce the search space and improve outcomes, and integrates a resource availability awareness block and performance profilers to establish an automated adaptation loop. Our experiments demonstrate that AdaScale significantly enhances accuracy by 5.09%, reduces training overhead by 66.89%, speeds up inference latency by 1.51 to 6.2 times, and lowers energy costs by 4.69 times.
Abstract:Vision-language pre-training (VLP) models excel at interpreting both images and text but remain vulnerable to multimodal adversarial examples (AEs). Advancing the generation of transferable AEs, which succeed across unseen models, is key to developing more robust and practical VLP models. Previous approaches augment image-text pairs to enhance diversity within the adversarial example generation process, aiming to improve transferability by expanding the contrast space of image-text features. However, these methods focus solely on diversity around the current AEs, yielding limited gains in transferability. To address this issue, we propose to increase the diversity of AEs by leveraging the intersection regions along the adversarial trajectory during optimization. Specifically, we propose sampling from adversarial evolution triangles composed of clean, historical, and current adversarial examples to enhance adversarial diversity. We provide a theoretical analysis to demonstrate the effectiveness of the proposed adversarial evolution triangle. Moreover, we find that redundant inactive dimensions can dominate similarity calculations, distorting feature matching and making AEs model-dependent with reduced transferability. Hence, we propose to generate AEs in the semantic image-text feature contrast space, which can project the original feature space into a semantic corpus subspace. The proposed semantic-aligned subspace can reduce the image feature redundancy, thereby improving adversarial transferability. Extensive experiments across different datasets and models demonstrate that the proposed method can effectively improve adversarial transferability and outperform state-of-the-art adversarial attack methods. The code is released at https://github.com/jiaxiaojunQAQ/SA-AET.
Abstract:On-device adapting to continual, unpredictable domain shifts is essential for mobile applications like autonomous driving and augmented reality to deliver seamless user experiences in evolving environments. Test-time adaptation (TTA) emerges as a promising solution by tuning model parameters with unlabeled live data immediately before prediction. However, TTA's unique forward-backward-reforward pipeline notably increases the latency over standard inference, undermining the responsiveness in time-sensitive mobile applications. This paper presents AdaShadow, a responsive test-time adaptation framework for non-stationary mobile data distribution and resource dynamics via selective updates of adaptation-critical layers. Although the tactic is recognized in generic on-device training, TTA's unsupervised and online context presents unique challenges in estimating layer importance and latency, as well as scheduling the optimal layer update plan. AdaShadow addresses these challenges with a backpropagation-free assessor to rapidly identify critical layers, a unit-based runtime predictor to account for resource dynamics in latency estimation, and an online scheduler for prompt layer update planning. Also, AdaShadow incorporates a memory I/O-aware computation reuse scheme to further reduce latency in the reforward pass. Results show that AdaShadow achieves the best accuracy-latency balance under continual shifts. At low memory and energy costs, Adashadow provides a 2x to 3.5x speedup (ms-level) over state-of-the-art TTA methods with comparable accuracy and a 14.8% to 25.4% accuracy boost over efficient supervised methods with similar latency.
Abstract:The automated vehicle (AV) equipped with the Adaptive Cruise Control (ACC) system is expected to reduce the fuel consumption for the intelligent transportation system. This paper presents the Advanced ACC-Micro (AA-Micro) model, a new energy consumption model based on micro trajectory data, calibrated and verified by empirical data. Utilizing a commercial AV equipped with the ACC system as the test platform, experiments were conducted at the Columbus 151 Speedway, capturing data from multiple ACC and Human-Driven (HV) test runs. The calibrated AA-Micro model integrates features from traditional energy consumption models and demonstrates superior goodness of fit, achieving an impressive 90% accuracy in predicting ACC system energy consumption without overfitting. A comprehensive statistical evaluation of the AA-Micro model's applicability and adaptability in predicting energy consumption and vehicle trajectories indicated strong model consistency and reliability for ACC vehicles, evidenced by minimal variance in RMSE values and uniform RSS distributions. Conversely, significant discrepancies were observed when applying the model to HV data, underscoring the necessity for specialized models to accurately predict energy consumption for HV and ACC systems, potentially due to their distinct energy consumption characteristics.
Abstract:Rank aggregation with pairwise comparisons is widely encountered in sociology, politics, economics, psychology, sports, etc . Given the enormous social impact and the consequent incentives, the potential adversary has a strong motivation to manipulate the ranking list. However, the ideal attack opportunity and the excessive adversarial capability cause the existing methods to be impractical. To fully explore the potential risks, we leverage an online attack on the vulnerable data collection process. Since it is independent of rank aggregation and lacks effective protection mechanisms, we disrupt the data collection process by fabricating pairwise comparisons without knowledge of the future data or the true distribution. From the game-theoretic perspective, the confrontation scenario between the online manipulator and the ranker who takes control of the original data source is formulated as a distributionally robust game that deals with the uncertainty of knowledge. Then we demonstrate that the equilibrium in the above game is potentially favorable to the adversary by analyzing the vulnerability of the sampling algorithms such as Bernoulli and reservoir methods. According to the above theoretical analysis, different sequential manipulation policies are proposed under a Bayesian decision framework and a large class of parametric pairwise comparison models. For attackers with complete knowledge, we establish the asymptotic optimality of the proposed policies. To increase the success rate of the sequential manipulation with incomplete knowledge, a distributionally robust estimator, which replaces the maximum likelihood estimation in a saddle point problem, provides a conservative data generation solution. Finally, the corroborating empirical evidence shows that the proposed method manipulates the results of rank aggregation methods in a sequential manner.