Abstract:The rise of mobile devices equipped with numerous sensors, such as LiDAR and cameras, has spurred the adoption of multi-modal deep intelligence for distributed sensing tasks, such as smart cabins and driving assistance. However, the arrival times of mobile sensory data vary due to modality size and network dynamics, which can lead to delays (if waiting for slower data) or accuracy decline (if inference proceeds without waiting). Moreover, the diversity and dynamic nature of mobile systems exacerbate this challenge. In response, we present a shift to \textit{opportunistic} inference for asynchronous distributed multi-modal data, enabling inference as soon as partial data arrives. While existing methods focus on optimizing modality consistency and complementarity, known as modal affinity, they lack a \textit{computational} approach to control this affinity in open-world mobile environments. AdaFlow pioneers the formulation of structured cross-modality affinity in mobile contexts using a hierarchical analysis-based normalized matrix. This approach accommodates the diversity and dynamics of modalities, generalizing across different types and numbers of inputs. Employing an affinity attention-based conditional GAN (ACGAN), AdaFlow facilitates flexible data imputation, adapting to various modalities and downstream tasks without retraining. Experiments show that AdaFlow significantly reduces inference latency by up to 79.9\% and enhances accuracy by up to 61.9\%, outperforming status quo approaches.
Abstract:Ubiquitous on-device heart rate sensing is vital for high-stress individuals and chronic patients. Non-contact sensing, compared to contact-based tools, allows for natural user monitoring, potentially enabling more accurate and holistic data collection. However, in open and uncontrolled mobile environments, user movement and lighting introduce. Existing methods, such as curve-based or short-range deep learning recognition based on adjacent frames, strike the optimal balance between real-time performance and accuracy, especially under limited device resources. In this paper, we present UbiHR, a ubiquitous device-based heart rate sensing system. Key to UbiHR is a real-time long-range spatio-temporal model enabling noise-independent heart rate recognition and display on commodity mobile devices, along with a set of mechanisms for prompt and energy-efficient sampling and preprocessing. Diverse experiments and user studies involving four devices, four tasks, and 80 participants demonstrate UbiHR's superior performance, enhancing accuracy by up to 74.2\% and reducing latency by 51.2\%.
Abstract:On-device adapting to continual, unpredictable domain shifts is essential for mobile applications like autonomous driving and augmented reality to deliver seamless user experiences in evolving environments. Test-time adaptation (TTA) emerges as a promising solution by tuning model parameters with unlabeled live data immediately before prediction. However, TTA's unique forward-backward-reforward pipeline notably increases the latency over standard inference, undermining the responsiveness in time-sensitive mobile applications. This paper presents AdaShadow, a responsive test-time adaptation framework for non-stationary mobile data distribution and resource dynamics via selective updates of adaptation-critical layers. Although the tactic is recognized in generic on-device training, TTA's unsupervised and online context presents unique challenges in estimating layer importance and latency, as well as scheduling the optimal layer update plan. AdaShadow addresses these challenges with a backpropagation-free assessor to rapidly identify critical layers, a unit-based runtime predictor to account for resource dynamics in latency estimation, and an online scheduler for prompt layer update planning. Also, AdaShadow incorporates a memory I/O-aware computation reuse scheme to further reduce latency in the reforward pass. Results show that AdaShadow achieves the best accuracy-latency balance under continual shifts. At low memory and energy costs, Adashadow provides a 2x to 3.5x speedup (ms-level) over state-of-the-art TTA methods with comparable accuracy and a 14.8% to 25.4% accuracy boost over efficient supervised methods with similar latency.
Abstract:Diffusion generative models, famous for their performance in image generation, are popular in various cross-domain applications. However, their use in the communication community has been mostly limited to auxiliary tasks like data modeling and feature extraction. These models hold greater promise for fundamental problems in network optimization compared to traditional machine learning methods. Discriminative deep learning often falls short due to its single-step input-output mapping and lack of global awareness of the solution space, especially given the complexity of network optimization's objective functions. In contrast, diffusion generative models can consider a broader range of solutions and exhibit stronger generalization by learning parameters that describe the distribution of the underlying solution space, with higher probabilities assigned to better solutions. We propose a new framework Diffusion Model-based Solution Generation (DiffSG), which leverages the intrinsic distribution learning capabilities of diffusion generative models to learn high-quality solution distributions based on given inputs. The optimal solution within this distribution is highly probable, allowing it to be effectively reached through repeated sampling. We validate the performance of DiffSG on several typical network optimization problems, including mixed-integer non-linear programming, convex optimization, and hierarchical non-convex optimization. Our results show that DiffSG outperforms existing baselines. In summary, we demonstrate the potential of diffusion generative models in tackling complex network optimization problems and outline a promising path for their broader application in the communication community.
Abstract:Large language models (LLMs) have shown exceptional performance and vast potential across diverse tasks. However, the deployment of LLMs with high performance in low-resource environments has garnered significant attention in the industry. When GPU hardware resources are limited, we can explore alternative options on CPUs. To mitigate the financial burden and alleviate constraints imposed by hardware resources, optimizing inference performance is necessary. In this paper, we introduce an easily deployable inference performance optimization solution aimed at accelerating LLMs on CPUs. In this solution, we implement an effective way to reduce the KV cache size while ensuring precision. We propose a distributed inference optimization approach and implement it based on oneAPI Collective Communications Library. Furthermore, we propose optimization approaches for LLMs on CPU, and conduct tailored optimizations for the most commonly used models. The code is open-sourced at https://github.com/intel/xFasterTransformer.
Abstract:Artificial Intelligence of Things (AIoT) is an emerging frontier based on the deep fusion of Internet of Things (IoT) and Artificial Intelligence (AI) technologies. Although advanced deep learning techniques enhance the efficient data processing and intelligent analysis of complex IoT data, they still suffer from notable challenges when deployed to practical AIoT applications, such as constrained resources, and diverse task requirements. Knowledge transfer is an effective method to enhance learning performance by avoiding the exorbitant costs associated with data recollection and model retraining. Notably, although there are already some valuable and impressive surveys on transfer learning, these surveys introduce approaches in a relatively isolated way and lack the recent advances of various knowledge transfer techniques for AIoT field. This survey endeavors to introduce a new concept of knowledge transfer, referred to as Crowd Knowledge Transfer (CrowdTransfer), which aims to transfer prior knowledge learned from a crowd of agents to reduce the training cost and as well as improve the performance of the model in real-world complicated scenarios. Particularly, we present four transfer modes from the perspective of crowd intelligence, including derivation, sharing, evolution and fusion modes. Building upon conventional transfer learning methods, we further delve into advanced crowd knowledge transfer models from three perspectives for various AIoT applications. Furthermore, we explore some applications of AIoT areas, such as human activity recognition, urban computing, multi-robot system, and smart factory. Finally, we discuss the open issues and outline future research directions of knowledge transfer in AIoT community.
Abstract:Video Anomaly Detection (VAD) systems can autonomously monitor and identify disturbances, reducing the need for manual labor and associated costs. However, current VAD systems are often limited by their superficial semantic understanding of scenes and minimal user interaction. Additionally, the prevalent data scarcity in existing datasets restricts their applicability in open-world scenarios. In this paper, we introduce Hawk, a novel framework that leverages interactive large Visual Language Models (VLM) to interpret video anomalies precisely. Recognizing the difference in motion information between abnormal and normal videos, Hawk explicitly integrates motion modality to enhance anomaly identification. To reinforce motion attention, we construct an auxiliary consistency loss within the motion and video space, guiding the video branch to focus on the motion modality. Moreover, to improve the interpretation of motion-to-language, we establish a clear supervisory relationship between motion and its linguistic representation. Furthermore, we have annotated over 8,000 anomaly videos with language descriptions, enabling effective training across diverse open-world scenarios, and also created 8,000 question-answering pairs for users' open-world questions. The final results demonstrate that Hawk achieves SOTA performance, surpassing existing baselines in both video description generation and question-answering. Our codes/dataset/demo will be released at https://github.com/jqtangust/hawk.
Abstract:There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been explored to optimize computation distribution, achieve load balance, and minimize communication cost across processors. Yet their practical effectiveness in the dynamic and diverse real-world mobile environment is less explored. This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors. Through carefully designed experiments covering various DL models, mobile software/hardware environments, workload patterns, and resource availability, we identify limitations of existing techniques and highlight opportunities for cross-level optimization.
Abstract:Cross-Domain Recommendation (CDR) seeks to enable effective knowledge transfer across domains. Existing works rely on either representation alignment or transformation bridges, but they struggle on identifying domain-shared from domain-specific latent factors. Specifically, while CDR describes user representations as a joint distribution over two domains, these methods fail to account for its joint identifiability as they primarily fixate on the marginal distribution within a particular domain. Such a failure may overlook the conditionality between two domains and how it contributes to latent factor disentanglement, leading to negative transfer when domains are weakly correlated. In this study, we explore what should and should not be transferred in cross-domain user representations from a causality perspective. We propose a Hierarchical subspace disentanglement approach to explore the Joint IDentifiability of cross-domain joint distribution, termed HJID, to preserve domain-specific behaviors from domain-shared factors. HJID organizes user representations into layers: generic shallow subspaces and domain-oriented deep subspaces. We first encode the generic pattern in the shallow subspace by minimizing the Maximum Mean Discrepancy of initial layer activation. Then, to dissect how domain-oriented latent factors are encoded in deeper layers activation, we construct a cross-domain causality-based data generation graph, which identifies cross-domain consistent and domain-specific components, adhering to the Minimal Change principle. This allows HJID to maintain stability whilst discovering unique factors for different domains, all within a generative framework of invertible transformations that guarantee the joint identifiability. With experiments on real-world datasets, we show that HJID outperforms SOTA methods on a range of strongly and weakly correlated CDR tasks.
Abstract:Persuasion, as one of the crucial abilities in human communication, has garnered extensive attention from researchers within the field of intelligent dialogue systems. We humans tend to persuade others to change their viewpoints, attitudes or behaviors through conversations in various scenarios (e.g., persuasion for social good, arguing in online platforms). Developing dialogue agents that can persuade others to accept certain standpoints is essential to achieving truly intelligent and anthropomorphic dialogue system. Benefiting from the substantial progress of Large Language Models (LLMs), dialogue agents have acquired an exceptional capability in context understanding and response generation. However, as a typical and complicated cognitive psychological system, persuasive dialogue agents also require knowledge from the domain of cognitive psychology to attain a level of human-like persuasion. Consequently, the cognitive strategy-enhanced persuasive dialogue agent (defined as CogAgent), which incorporates cognitive strategies to achieve persuasive targets through conversation, has become a predominant research paradigm. To depict the research trends of CogAgent, in this paper, we first present several fundamental cognitive psychology theories and give the formalized definition of three typical cognitive strategies, including the persuasion strategy, the topic path planning strategy, and the argument structure prediction strategy. Then we propose a new system architecture by incorporating the formalized definition to lay the foundation of CogAgent. Representative works are detailed and investigated according to the combined cognitive strategy, followed by the summary of authoritative benchmarks and evaluation metrics. Finally, we summarize our insights on open issues and future directions of CogAgent for upcoming researchers.