Abstract:Quantization is a widely-used compression technology to reduce the overhead of serving large language models (LLMs) on terminal devices and in cloud data centers. However, prevalent quantization methods, such as 8-bit weight-activation or 4-bit weight-only quantization, achieve limited performance improvements due to poor support for low-precision (e.g., 4-bit) activation. This work, for the first time, realizes practical W4A4KV4 serving for LLMs, fully utilizing the INT4 tensor cores on modern GPUs and reducing the memory bottleneck caused by the KV cache. Specifically, we propose a novel fine-grained mixed-precision quantization algorithm (FMPQ) that compresses most activations into 4-bit with negligible accuracy loss. To support mixed-precision matrix multiplication for W4A4 and W4A8, we develop a highly optimized W4Ax kernel. Our approach introduces a novel mixed-precision data layout to facilitate access and fast dequantization for activation and weight tensors, utilizing the GPU's software pipeline to hide the overhead of data loading and conversion. Additionally, we propose fine-grained streaming multiprocessor (SM) scheduling to achieve load balance across different SMs. We integrate the optimized W4Ax kernel into our inference framework, COMET, and provide efficient management to support popular LLMs such as LLaMA-3-70B. Extensive evaluations demonstrate that, when running LLaMA family models on a single A100-80G-SMX4, COMET achieves a kernel-level speedup of \textbf{$2.88\times$} over cuBLAS and a \textbf{$2.02 \times$} throughput improvement compared to TensorRT-LLM from an end-to-end framework perspective.
Abstract:Point-to-point and periodic motions are ubiquitous in the world of robotics. To master these motions, Autonomous Dynamic System (DS) based algorithms are fundamental in the domain of Learning from Demonstration (LfD). However, these algorithms face the significant challenge of balancing precision in learning with the maintenance of system stability. This paper addresses this challenge by presenting a novel ADS algorithm that leverages neural network technology. The proposed algorithm is designed to distill essential knowledge from demonstration data, ensuring stability during the learning of both point-to-point and periodic motions. For point-to-point motions, a neural Lyapunov function is proposed to align with the provided demonstrations. In the case of periodic motions, the neural Lyapunov function is used with the transversal contraction to ensure that all generated motions converge to a stable limit cycle. The model utilizes a streamlined neural network architecture, adept at achieving dual objectives: optimizing learning accuracy while maintaining global stability. To thoroughly assess the efficacy of the proposed algorithm, rigorous evaluations are conducted using the LASA dataset and a manually designed dataset. These assessments were complemented by empirical validation through robotic experiments, providing robust evidence of the algorithm's performance
Abstract:Cross-lingual emotion detection allows us to analyze global trends, public opinion, and social phenomena at scale. We participated in the Explainability of Cross-lingual Emotion Detection (EXALT) shared task, achieving an F1-score of 0.6046 on the evaluation set for the emotion detection sub-task. Our system outperformed the baseline by more than 0.16 F1-score absolute, and ranked second amongst competing systems. We conducted experiments using fine-tuning, zero-shot learning, and few-shot learning for Large Language Model (LLM)-based models as well as embedding-based BiLSTM and KNN for non-LLM-based techniques. Additionally, we introduced two novel methods: the Multi-Iteration Agentic Workflow and the Multi-Binary-Classifier Agentic Workflow. We found that LLM-based approaches provided good performance on multilingual emotion detection. Furthermore, ensembles combining all our experimented models yielded higher F1-scores than any single approach alone.
Abstract:Cyberharassment is a critical, socially relevant cybersecurity problem because of the adverse effects it can have on targeted groups or individuals. While progress has been made in understanding cyber-harassment, its detection, attacks on artificial intelligence (AI) based cyberharassment systems, and the social problems in cyberharassment detectors, little has been done in designing experiential learning educational materials that engage students in this emerging social cybersecurity in the era of AI. Experiential learning opportunities are usually provided through capstone projects and engineering design courses in STEM programs such as computer science. While capstone projects are an excellent example of experiential learning, given the interdisciplinary nature of this emerging social cybersecurity problem, it can be challenging to use them to engage non-computing students without prior knowledge of AI. Because of this, we were motivated to develop a hands-on lab platform that provided experiential learning experiences to non-computing students with little or no background knowledge in AI and discussed the lessons learned in developing this lab. In this lab used by social science students at North Carolina A&T State University across two semesters (spring and fall) in 2022, students are given a detailed lab manual and are to complete a set of well-detailed tasks. Through this process, students learn AI concepts and the application of AI for cyberharassment detection. Using pre- and post-surveys, we asked students to rate their knowledge or skills in AI and their understanding of the concepts learned. The results revealed that the students moderately understood the concepts of AI and cyberharassment.
Abstract:Online hate is an escalating problem that negatively impacts the lives of Internet users, and is also subject to rapid changes due to evolving events, resulting in new waves of online hate that pose a critical threat. Detecting and mitigating these new waves present two key challenges: it demands reasoning-based complex decision-making to determine the presence of hateful content, and the limited availability of training samples hinders updating the detection model. To address this critical issue, we present a novel framework called HATEGUARD for effectively moderating new waves of online hate. HATEGUARD employs a reasoning-based approach that leverages the recently introduced chain-of-thought (CoT) prompting technique, harnessing the capabilities of large language models (LLMs). HATEGUARD further achieves prompt-based zero-shot detection by automatically generating and updating detection prompts with new derogatory terms and targets in new wave samples to effectively address new waves of online hate. To demonstrate the effectiveness of our approach, we compile a new dataset consisting of tweets related to three recently witnessed new waves: the 2022 Russian invasion of Ukraine, the 2021 insurrection of the US Capitol, and the COVID-19 pandemic. Our studies reveal crucial longitudinal patterns in these new waves concerning the evolution of events and the pressing need for techniques to rapidly update existing moderation tools to counteract them. Comparative evaluations against state-of-the-art tools illustrate the superiority of our framework, showcasing a substantial 22.22% to 83.33% improvement in detecting the three new waves of online hate. Our work highlights the severe threat posed by the emergence of new waves of online hate and represents a paradigm shift in addressing this threat practically.
Abstract:Autonomous Dynamic System (DS)-based algorithms hold a pivotal and foundational role in the field of Learning from Demonstration (LfD). Nevertheless, they confront the formidable challenge of striking a delicate balance between achieving precision in learning and ensuring the overall stability of the system. In response to this substantial challenge, this paper introduces a novel DS algorithm rooted in neural network technology. This algorithm not only possesses the capability to extract critical insights from demonstration data but also demonstrates the capacity to learn a candidate Lyapunov energy function that is consistent with the provided data. The model presented in this paper employs a straightforward neural network architecture that excels in fulfilling a dual objective: optimizing accuracy while simultaneously preserving global stability. To comprehensively evaluate the effectiveness of the proposed algorithm, rigorous assessments are conducted using the LASA dataset, further reinforced by empirical validation through a robotic experiment.
Abstract:Robots are increasingly being deployed not only in workplaces but also in households. Effectively execute of manipulation tasks by robots relies on variable impedance control with contact forces. Furthermore, robots should possess adaptive capabilities to handle the considerable variations exhibited by different robotic tasks in dynamic environments, which can be obtained through human demonstrations. This paper presents a learning-from-demonstration framework that integrates force sensing and motion information to facilitate variable impedance control. The proposed approach involves the estimation of full stiffness matrices from human demonstrations, which are then combined with sensed forces and motion information to create a model using the non-parametric method. This model allows the robot to replicate the demonstrated task while also responding appropriately to new task conditions through the use of the state-dependent stiffness profile. Additionally, a novel tank based variable impedance control approach is proposed to ensure passivity by using the learned stiffness. The proposed approach was evaluated using two virtual variable stiffness systems. The first evaluation demonstrates that the stiffness estimated approach exhibits superior robustness compared to traditional methods when tested on manual datasets, and the second evaluation illustrates that the novel tank based approach is more easily implementable compared to traditional variable impedance control approaches.
Abstract:Pre-trained multilingual language models play an important role in cross-lingual natural language understanding tasks. However, existing methods did not focus on learning the semantic structure of representation, and thus could not optimize their performance. In this paper, we propose Multi-level Multilingual Knowledge Distillation (MMKD), a novel method for improving multilingual language models. Specifically, we employ a teacher-student framework to adopt rich semantic representation knowledge in English BERT. We propose token-, word-, sentence-, and structure-level alignment objectives to encourage multiple levels of consistency between source-target pairs and correlation similarity between teacher and student models. We conduct experiments on cross-lingual evaluation benchmarks including XNLI, PAWS-X, and XQuAD. Experimental results show that MMKD outperforms other baseline models of similar size on XNLI and XQuAD and obtains comparable performance on PAWS-X. Especially, MMKD obtains significant performance gains on low-resource languages.
Abstract:Soft errors in large VLSI circuits pose dramatic influence on computing- and memory-intensive neural network (NN) processing. Understanding the influence of soft errors on NNs is critical to protect against soft errors for reliable NN processing. Prior work mainly rely on fault simulation to analyze the influence of soft errors on NN processing. They are accurate but usually specific to limited configurations of errors and NN models due to the prohibitively slow simulation speed especially for large NN models and datasets. With the observation that the influence of soft errors propagates across a large number of neurons and accumulates as well, we propose to characterize the soft error induced data disturbance on each neuron with normal distribution model according to central limit theorem and develop a series of statistical models to analyze the behavior of NN models under soft errors in general. The statistical models reveal not only the correlation between soft errors and NN model accuracy, but also how NN parameters such as quantization and architecture affect the reliability of NNs. The proposed models are compared with fault simulation and verified comprehensively. In addition, we observe that the statistical models that characterize the soft error influence can also be utilized to predict fault simulation results in many cases and we explore the use of the proposed statistical models to accelerate fault simulations of NNs. According to our experiments, the accelerated fault simulation shows almost two orders of magnitude speedup with negligible simulation accuracy loss over the baseline fault simulations.
Abstract:Ridesplitting, which is a form of pooled ridesourcing service, has great potential to alleviate the negative impacts of ridesourcing on the environment. However, most existing studies only explored its theoretical environmental benefits based on optimization models and simulations. To put into practice, this study aims to reveal the real-world emission reduction of ridesplitting and its determinants based on the observed data of ridesourcing in Chengdu, China. Integrating the trip data with the COPERT model, this study calculates the CO2 emissions of shared rides (ridesplitting) and their substituted single rides (regular ridesourcing) to estimate the CO2 emission reduction of each ridesplitting trip. The results show that not all ridesplitting trips reduce emissions from ridesourcing in the real world. The CO2 emission reduction rate of ridesplitting varies from trip to trip, averaging at 43.15g/km. Then, the interpretable machine learning models, gradient boosting machines, are applied to explore the relationship between the CO2 emission reduction rate of ridesplitting and its determinants. Based on the SHapley Additive exPlanations method, the overlap rate and detour rate of shared rides are identified to be the most important factors that determine the CO2 emission reduction rate of ridesplitting. Increasing the overlap rate, the number of shared rides, average speed, and ride distance ratio and decreasing the detour rate, actual trip distance, ride distance gap can increase the CO2 emission reduction rate of ridesplitting. In addition, nonlinear effects and interactions of several key factors are examined through the partial dependence plots. This study provides a scientific method for the government and ridesourcing companies to better assess and optimize the environmental benefits of ridesplitting.