Abstract: Supply chain networks are critical to the operational efficiency of industries, yet their increasing complexity presents significant challenges in mapping relationships and identifying the roles of various entities. Traditional methods for constructing supply chain networks rely heavily on structured datasets and manual data collection, limiting their scope and efficiency. In contrast, recent advances in Natural Language Processing (NLP) and large language models (LLMs) offer new opportunities for discovering and analyzing supply chain networks from unstructured text data. This paper proposes a novel approach that leverages LLMs to extract and process raw textual information from publicly available sources and construct a comprehensive supply chain graph. We focus on the civil engineering sector as a case study, demonstrating how LLMs can uncover hidden relationships among companies, projects, and other entities. Additionally, we fine-tune an LLM to classify entities within the supply chain graph, providing detailed insights into their roles and relationships. The results show that domain-specific fine-tuning improves classification accuracy, highlighting the potential of LLMs for industry-specific supply chain analysis. Our contributions include a supply chain graph for the civil engineering sector and a fine-tuned LLM that enhances entity classification and the understanding of supply chain networks.
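As a minimal sketch of the graph-construction step described above (not the authors' exact pipeline), the prompt wording, relation names, and the call_llm() wrapper below are illustrative assumptions; only the triple parsing and networkx assembly are shown:

    # Hedged sketch: extract (head, relation, tail) triples with an LLM and
    # assemble them into a directed supply chain graph.
    import json
    import networkx as nx

    PROMPT = (
        "Extract supply chain relations from the text below as a JSON list of "
        "triples with keys head, relation, tail, using relations such as "
        "supplies_to, subcontractor_of, or partner_on.\n\nText:\n"
    )

    def call_llm(prompt: str) -> str:
        """Hypothetical LLM call; replace with a real chat-completion client."""
        raise NotImplementedError

    def build_supply_chain_graph(documents):
        g = nx.DiGraph()
        for doc in documents:
            for t in json.loads(call_llm(PROMPT + doc)):
                # Edge direction follows the extracted relation: head -> tail.
                g.add_edge(t["head"], t["tail"], relation=t["relation"])
        return g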
Abstract: Upcycling pre-trained dense language models into sparse mixture-of-experts (MoE) models is an efficient way to increase the capacity of already trained models. However, optimal techniques for upcycling at scale remain unclear. In this work, we conduct an extensive study of upcycling methods and hyperparameters for billion-parameter-scale language models. We propose a novel "virtual group" initialization scheme and weight scaling approach to enable upcycling into fine-grained MoE architectures. Through ablations, we find that upcycling outperforms continued dense model training. In addition, we show that softmax-then-topK expert routing improves over the topK-then-softmax approach, and that higher-granularity MoEs can help improve accuracy. Finally, we upcycled Nemotron-4 15B on 1T tokens and compared it to a continuously trained version of the same model on the same 1T tokens: the continuously trained model achieved 65.3% MMLU, whereas the upcycled model achieved 67.6%. Our results offer insights and best practices for effectively leveraging upcycling to build MoE language models.
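The two routing variants compared above differ only in the order of normalization and selection; a short PyTorch sketch for a single token's routing logits (illustrative, not the Nemotron implementation) makes the difference concrete:

    import torch
    import torch.nn.functional as F

    def softmax_then_topk(logits: torch.Tensor, k: int):
        # Normalize over ALL experts first, then keep the top-k probabilities;
        # the kept weights need not sum to 1.
        probs = F.softmax(logits, dim=-1)
        weights, idx = torch.topk(probs, k, dim=-1)
        return weights, idx

    def topk_then_softmax(logits: torch.Tensor, k: int):
        # Select the top-k logits first, then renormalize among only those k;
        # the kept weights always sum to 1.
        top_logits, idx = torch.topk(logits, k, dim=-1)
        return F.softmax(top_logits, dim=-1), idx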
Abstract: Diffusion models have recently achieved remarkable advances in image quality and fidelity to textual prompts. Concurrently, the safety of such generative models has become an area of growing concern. This work introduces a novel type of jailbreak that triggers text-to-image (T2I) models to generate images containing visual text, where the image and the text, although safe in isolation, combine to form unsafe content. To systematically explore this phenomenon, we propose a dataset for evaluating current diffusion-based T2I models under such a jailbreak. We benchmark nine representative T2I models, including two closed-source commercial models. The experimental results reveal a concerning tendency to produce unsafe content: all tested models suffer from this type of jailbreak, with unsafe generation rates ranging from 8\% to 74\%. In real-world deployments, various filters, such as keyword blocklists, customized prompt filters, and NSFW image filters, are commonly employed to mitigate these risks. We evaluate the effectiveness of such filters against our jailbreak and find that, while current classifiers may be effective for single-modality detection, they fail against our jailbreak. Our work provides a foundation for further development of more secure and reliable T2I models.
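A hedged sketch of the benchmark loop implied by the abstract, in which the prompt set, the T2I generate() call, and the multimodal safety judge are all placeholders rather than the paper's actual components:

    def unsafe_generation_rate(model, prompts, judge):
        """Fraction of benchmark prompts whose output is judged unsafe when the
        generated image and its rendered visual text are considered jointly."""
        unsafe = 0
        for p in prompts:
            image = model.generate(p)  # T2I inference (placeholder)
            if judge(image):           # multimodal judge, e.g. OCR + classifier
                unsafe += 1
        return unsafe / len(prompts)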
Abstract: Traffic assignment and traffic flow prediction provide critical insights for urban planning, traffic management, and the development of intelligent transportation systems. An efficient model for calculating traffic flows over the entire transportation network could provide a more detailed and realistic understanding of traffic dynamics. However, existing traffic prediction approaches, such as those utilizing graph neural networks, are typically limited to locations where sensors are deployed and cannot predict traffic flows beyond sensor locations. To alleviate this limitation, inspired by the fundamental relationship between link flows and origin-destination (OD) travel demands, we propose the Heterogeneous Spatio-Temporal Graph Sequence Network (HSTGSN). HSTGSN exploits the dependency between origin and destination nodes, even when it is long-range, and learns implicit vehicle route choices under different OD demands. The model combines a heterogeneous graph, consisting of road links and OD links (virtual links connecting origins and destinations), with a spatio-temporal graph encoder-decoder that captures the relationship between OD demands and flow distribution. We show how the graph encoder-decoder recovers incomplete information in the OD demand by using node embeddings from the graph decoder to predict temporal changes in flow distribution. Through extensive experiments on real-world networks with complete/incomplete OD demands, we demonstrate that our method not only captures the implicit spatio-temporal relationship between link traffic flows and OD demands but also achieves accurate prediction and strong generalization.
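One plausible encoding of the described heterogeneous graph uses PyTorch Geometric's HeteroData; the node/edge type names and feature shapes below are assumptions, since the paper's exact schema is not given here:

    from torch_geometric.data import HeteroData

    def build_hetero_graph(node_feats, road_edges, od_edges):
        data = HeteroData()
        data["junction"].x = node_feats                               # [N, F]
        # Physical road links, over which traffic flows are predicted.
        data["junction", "road", "junction"].edge_index = road_edges  # [2, E_road]
        # Virtual OD links connecting each origin to its destination,
        # carrying the (possibly incomplete) travel demand.
        data["junction", "od", "junction"].edge_index = od_edges      # [2, E_od]
        return data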
Abstract: For cardiovascular diseases (CVD), which exhibit elevated prevalence and mortality, the electrocardiogram (ECG) is a popular and standard diagnostic tool, commonly recorded with a 12-lead configuration in clinical practice. However, the ten electrodes placed on the body surface cause considerable inconvenience and discomfort, so rapidly advancing wearable devices adopt reduced-lead or single-lead ECG for more comfortable long-term monitoring. Since the single-lead ECG is a subset of the 12-lead ECG, it carries insufficient cardiac health information and plays a limited role in real-world healthcare applications. It is therefore desirable to use signal generation techniques to narrow this clinical gap by reconstructing the 12-lead ECG from a real single-lead ECG. To this end, this study proposes a multi-channel masked autoencoder (MCMA). Visual comparisons between generated and real signals demonstrate the effectiveness of the proposed framework. In addition, this study introduces a comprehensive evaluation benchmark named ECGGenEval, encompassing signal-level, feature-level, and diagnostic-level evaluations and providing a holistic assessment of 12-lead ECG signals and generative models. Quantitatively, MCMA achieves state-of-the-art performance: mean square errors of 0.0178 and 0.0658 and correlation coefficients of 0.7698 and 0.7237 in the signal-level evaluation, and average F1-scores of 0.8319 and 0.7824 with the two generated 12-lead ECGs in the diagnostic-level evaluation. The open-source code is publicly available at \url{https://github.com/CHENJIAR3/MCMA}.
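The signal-level metrics named above (mean square error and correlation coefficient) can be reproduced in a few lines of NumPy; the array shapes and per-lead averaging in this sketch are assumptions:

    import numpy as np

    def signal_level_eval(generated: np.ndarray, real: np.ndarray):
        """generated, real: arrays of shape [num_records, 12, T]."""
        mse = float(np.mean((generated - real) ** 2))
        # Pearson correlation per lead, averaged over all records and leads.
        corrs = [np.corrcoef(g_lead, r_lead)[0, 1]
                 for g_rec, r_rec in zip(generated, real)
                 for g_lead, r_lead in zip(g_rec, r_rec)]
        return mse, float(np.nanmean(corrs))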
Abstract: In this study, we explore the progression trajectories of artificial intelligence (AI) systems through the lens of complexity theory. We challenge the conventional linear and exponential projections of AI advancement toward Artificial General Intelligence (AGI) underpinned by transformer-based architectures, and posit the existence of critical points, akin to phase transitions in complex systems, at which AI performance might plateau or regress into instability upon exceeding a critical complexity threshold. We employ agent-based modelling (ABM) to simulate hypothetical scenarios of AI systems' evolution under specific assumptions, using benchmark performance as a proxy for capability and complexity. Our simulations demonstrate how increasing the complexity of an AI system can exceed an upper criticality threshold, leading to unpredictable performance behaviours. Additionally, we develop a practical methodology for detecting these critical thresholds using simulation data and stochastic gradient descent to fine-tune detection thresholds. This research offers a novel perspective on AI advancement that is particularly relevant to Large Language Models (LLMs), emphasising the need for a tempered approach to extrapolating AI's growth potential and underscoring the importance of developing more robust and comprehensive AI performance benchmarks.
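A toy illustration, not the paper's ABM, of the hypothesized behaviour: performance improves with complexity until an assumed critical threshold, past which it becomes increasingly unstable; all constants are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    C_CRIT = 75.0  # assumed critical complexity threshold

    def simulate_performance(complexities):
        perf = []
        for c in complexities:
            score = 1.0 - np.exp(-c / 30.0)  # sub-critical: diminishing returns
            if c > C_CRIT:                   # super-critical: growing instability
                score += rng.normal(0.0, 0.3 * (c - C_CRIT) / C_CRIT)
            perf.append(score)
        return np.array(perf)

    scores = simulate_performance(np.linspace(1.0, 150.0, 50))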
Abstract: In industrial recommendation systems, several mini-apps are designed to meet the diverse interests and needs of users. Their sample space is merely a small subset of the entire space, making it challenging to train an efficient model. In recent years, many excellent studies on cross-domain recommendation have aimed at mitigating this data sparsity problem. However, few of them have simultaneously considered the adaptability of both sample and representation transfer to the target task in a continual setting. To overcome this issue, we propose an Entire-space Continual and Adaptive Transfer learning framework called ECAT, which includes two core components. First, for sample transfer, we propose a two-stage method that realizes a coarse-to-fine process: an initial selection through a graph-guided method, followed by a fine-grained selection using a domain adaptation method. Second, we propose an adaptive knowledge distillation method for continually transferring the representations of a model that is well-trained on the entire-space dataset. ECAT enables full utilization of entire-space samples and representations under the supervision of the target task, while avoiding negative transfer. Comprehensive experiments on real-world industrial datasets from Taobao show that ECAT advances state-of-the-art performance on offline metrics, bringing +13.6% CVR and +8.6% orders for Baiyibutie, a popular mini-app of Taobao.
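A minimal sketch of the representation-transfer idea: a distillation term pulls the target-domain student toward a teacher well-trained on the entire space, weighted by an adaptive coefficient; the specific loss and weighting choices here are illustrative, not ECAT's published formulation:

    import torch
    import torch.nn.functional as F

    def ecat_style_loss(student_logits, teacher_logits, labels, alpha):
        """alpha in [0, 1]: adaptive distillation weight; labels: float targets."""
        # Supervised target-task loss (e.g. CVR prediction).
        task = F.binary_cross_entropy_with_logits(student_logits, labels)
        # Distill the entire-space teacher's predictions into the student.
        distill = F.mse_loss(torch.sigmoid(student_logits),
                             torch.sigmoid(teacher_logits).detach())
        return (1.0 - alpha) * task + alpha * distill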
Abstract: With the continuous development and improvement of medical services, there is a growing demand for better diabetes diagnosis. Exhaled breath analysis, characterized by its speed, convenience, and non-invasive nature, is leading the trend in diagnostic development. Studies have shown that acetone levels in the breath of diabetes patients are higher than normal, making acetone a basis for diabetes breath analysis and providing a more readily accepted method for early diabetes prevention and monitoring. Addressing the invasiveness, disease transmission risks, and complexity of conventional diabetes testing, this study designs a detection system for acetone, a gaseous biomarker of diabetes, centered on a sensor array combined with pattern recognition algorithms. The work covers sensor selection, sensor preparation, circuit design, data acquisition and processing, and detection model establishment for the accurate identification of acetone. Titanium dioxide was chosen as the nanoscale gas-sensing material for the acetone gas sensor, with data collection handled by an STM32 microcontroller. The raw sensor data were filtered and then reduced by principal component analysis for feature extraction. A support vector machine model was used for qualitative identification of gas samples, while a backpropagation neural network was employed for quantitative estimation of gas concentrations. Experimental results demonstrate recognition accuracies of 96% and 97.5% for acetone-ethanol and acetone-methanol binary mixtures, respectively, and 90% for the ternary acetone-ethanol-methanol mixture.
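The recognition pipeline maps naturally onto scikit-learn; the component counts and network size below are assumptions, not the study's tuned settings:

    from sklearn.decomposition import PCA
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # X: filtered sensor-array features; y_gas: gas identity; y_conc: concentration.
    # Qualitative identification: PCA features fed to an SVM classifier.
    classifier = make_pipeline(StandardScaler(), PCA(n_components=3),
                               SVC(kernel="rbf"))
    # Quantitative concentration estimation: a backpropagation (MLP) regressor.
    regressor = make_pipeline(StandardScaler(), PCA(n_components=3),
                              MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000))
    # classifier.fit(X, y_gas); regressor.fit(X, y_conc)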
Abstract: Reinforcement learning (RL) has proven highly effective for complex decision-making and control tasks. However, in most traditional RL algorithms, the policy is parameterized as a diagonal Gaussian distribution with learned mean and variance, which constrains the capacity to acquire complex policies. To address this problem, we propose an online RL algorithm termed diffusion actor-critic with entropy regulator (DACER). This algorithm conceptualizes the reverse process of a diffusion model as a novel policy function and leverages the diffusion model's ability to fit multimodal distributions, thereby enhancing the representational capacity of the policy. Since the distribution of the diffusion policy lacks an analytical expression, its entropy cannot be computed in closed form. We therefore propose estimating the entropy of the diffusion policy with a Gaussian mixture model. Building on the estimated entropy, we learn a parameter $\alpha$ that modulates the balance between exploration and exploitation; $\alpha$ adaptively regulates the variance of the noise added to the actions output by the diffusion model. Experiments on MuJoCo benchmarks and a multimodal task demonstrate that DACER achieves state-of-the-art (SOTA) performance in most MuJoCo control tasks while exhibiting the stronger representational capacity of the diffusion policy.
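The entropy-estimation step can be sketched with scikit-learn's GaussianMixture: sample actions from the diffusion policy, fit a mixture, and take a Monte-Carlo estimate of the negative mean log-likelihood. The policy sampler here is a placeholder and the sample/component counts are assumptions:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def estimate_policy_entropy(sample_action, state, n_samples=256, n_components=4):
        """sample_action(state) -> action array; stands in for the diffusion policy."""
        actions = np.stack([sample_action(state) for _ in range(n_samples)])
        gmm = GaussianMixture(n_components=n_components).fit(actions)
        # Monte-Carlo entropy estimate: -mean(log p_GMM(a)) over sampled actions.
        return float(-gmm.score_samples(actions).mean())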
Abstract: Artificial neuronal devices are the basic building blocks of neuromorphic computing systems, which are motivated by realistic brain emulation. Aiming at these applications, various device concepts have been proposed to mimic neuronal dynamics and functions. To date, however, artificial neuron devices with high efficiency, high stability, and low power consumption remain far from practical application. Owing to its insulator-metal phase transition, vanadium dioxide (VO2) has been considered an ideal candidate for neuronal device fabrication. However, its intrinsically insulating state requires VO2 neuronal devices to be driven at large bias voltages, resulting in high power consumption and low operating frequency. In this study, we address this challenge by preparing oxygen-vacancy-modulated VO2 films (VO2-x) and fabricating VO2-x neuronal devices for constructing spiking neural networks (SNNs). The results indicate that these neuron devices can operate at lower voltages with improved processing speed. The proposed VO2-x-based back-propagation SNN (BP-SNN) system, trained on the MNIST dataset, demonstrates excellent accuracy in image recognition. Our study not only demonstrates VO2-x-based neurons and an SNN system for practical applications, but also offers an effective way to optimize future neuromorphic computing systems through a defect engineering strategy.
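As a toy, qualitative analogy rather than a physical device model, a threshold-switching (VO2-like) neuron can be simulated as a leaky integrator that spikes when its state crosses the transition threshold; all constants below are illustrative, not measured device parameters:

    import numpy as np

    def simulate_neuron(input_current, v_th=1.0, leak=0.05, dt=1e-3):
        v, spikes = 0.0, []
        for i in input_current:
            v += dt * (i - leak * v)  # charge integration with leakage
            if v >= v_th:             # insulator-to-metal transition -> spike
                spikes.append(1)
                v = 0.0               # device relaxes to the insulating state
            else:
                spikes.append(0)
        return np.array(spikes)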