Sherman
Abstract:E-commerce platforms provide entrances for customers to enter mini-apps to meet their specific shopping needs. At the entrance of a mini-app, a trigger item recommended based on customers' historical preferences, is displayed to attract customers to enter the mini-app. Existing Click-Through Rate (CTR) prediction approaches have two significant weaknesses: (i) A portion of customer entries is driven by their interest in the mini-app itself rather than the trigger item. In such cases, approaches highly hinging on the trigger item tend to recommend similar items, thus misunderstanding the customers' real intention; (ii) Approaches that consider customers' intention toward mini-apps, require the regular existence of mini-apps for customers to cultivate routine shopping habits, making such approaches less robust for mini-apps that are available for only short periods (1 or 3 days) in Explosive Promotional Scenarios (EPS), such as the Black Friday and China's Double 11 Shopping Carnival. To address the above-mentioned issues, we introduce a more general and robust CTR prediction approach, dubbed Collaborative Contrastive Network (CCN). Given a user, CCN learns to identify two item clusters that can represent the user's interests and disinterests, via leveraging the collaborative relationship of co-click/co-non-click or the non-collaborative relationship of mono-click as the supervision signal for contrastive learning. This paradigm does not need to explicitly estimate user's binary entry intention and avoids amplifying the impact of the trigger item. Online A/B testing on large-scale real-world data demonstrates that CCN sets a new state-of-the-art performance on Taobao, boosting CTR by 12.3% and order volume by 12.7%.
Abstract:E-commerce app users exhibit behaviors that are inherently logically consistent. A series of multi-scenario user behaviors interconnect to form the scene-level all-domain user moveline, which ultimately reveals the user's true intention. Traditional CTR prediction methods typically focus on the item-level interaction between the target item and the historically interacted items. However, the scene-level interaction between the target item and the user moveline remains underexplored. There are two challenges when modeling the interaction with preceding all-domain user moveline: (i) Heterogeneity between items and scenes: Unlike traditional user behavior sequences that utilize items as carriers, the user moveline utilizes scenes as carriers. The heterogeneity between items and scenes complicates the process of aligning interactions within a unified representation space. (ii) Temporal misalignment of linked scene-level and item-level behaviors: In the preceding user moveline with a fixed sampling length, certain critical scene-level behaviors are closely linked to subsequent item-level behaviors. However, it is impossible to establish a complete temporal alignment that clearly identifies which specific scene-level behaviors correspond to which item-level behaviors. To address these challenges and pioneer modeling user intent from the perspective of the all-domain moveline, we propose All-domain Moveline Evolution Network (AMEN). AMEN not only transfers interactions between items and scenes to homogeneous representation spaces, but also introduces a Temporal Sequential Pairwise (TSP) mechanism to understand the nuanced associations between scene-level and item-level behaviors, ensuring that the all-domain user moveline differentially influences CTR predictions for user's favored and unfavored items. Online A/B testing demonstrates that our method achieves a +11.6% increase in CTCVR.
Abstract:Supply chain networks are critical to the operational efficiency of industries, yet their increasing complexity presents significant challenges in mapping relationships and identifying the roles of various entities. Traditional methods for constructing supply chain networks rely heavily on structured datasets and manual data collection, limiting their scope and efficiency. In contrast, recent advancements in Natural Language Processing (NLP) and large language models (LLMs) offer new opportunities for discovering and analyzing supply chain networks using unstructured text data. This paper proposes a novel approach that leverages LLMs to extract and process raw textual information from publicly available sources to construct a comprehensive supply chain graph. We focus on the civil engineering sector as a case study, demonstrating how LLMs can uncover hidden relationships among companies, projects, and other entities. Additionally, we fine-tune an LLM to classify entities within the supply chain graph, providing detailed insights into their roles and relationships. The results show that domain-specific fine-tuning improves classification accuracy, highlighting the potential of LLMs for industry-specific supply chain analysis. Our contributions include the development of a supply chain graph for the civil engineering sector, as well as a fine-tuned LLM model that enhances entity classification and understanding of supply chain networks.
Abstract:Upcycling pre-trained dense language models into sparse mixture-of-experts (MoE) models is an efficient approach to increase the model capacity of already trained models. However, optimal techniques for upcycling at scale remain unclear. In this work, we conduct an extensive study of upcycling methods and hyperparameters for billion-parameter scale language models. We propose a novel "virtual group" initialization scheme and weight scaling approach to enable upcycling into fine-grained MoE architectures. Through ablations, we find that upcycling outperforms continued dense model training. In addition, we show that softmax-then-topK expert routing improves over topK-then-softmax approach and higher granularity MoEs can help improve accuracy. Finally, we upcycled Nemotron-4 15B on 1T tokens and compared it to a continuously trained version of the same model on the same 1T tokens: the continuous trained model achieved 65.3% MMLU, whereas the upcycled model achieved 67.6%. Our results offer insights and best practices to effectively leverage upcycling for building MoE language models.
Abstract:Diffusion models have recently achieved remarkable advancements in terms of image quality and fidelity to textual prompts. Concurrently, the safety of such generative models has become an area of growing concern. This work introduces a novel type of jailbreak, which triggers T2I models to generate the image with visual text, where the image and the text, although considered to be safe in isolation, combine to form unsafe content. To systematically explore this phenomenon, we propose a dataset to evaluate the current diffusion-based text-to-image (T2I) models under such jailbreak. We benchmark nine representative T2I models, including two close-source commercial models. Experimental results reveal a concerning tendency to produce unsafe content: all tested models suffer from such type of jailbreak, with rates of unsafe generation ranging from 8\% to 74\%. In real-world scenarios, various filters such as keyword blocklists, customized prompt filters, and NSFW image filters, are commonly employed to mitigate these risks. We evaluate the effectiveness of such filters against our jailbreak and found that, while current classifiers may be effective for single modality detection, they fail to work against our jailbreak. Our work provides a foundation for further development towards more secure and reliable T2I models.
Abstract:Traffic assignment and traffic flow prediction provide critical insights for urban planning, traffic management, and the development of intelligent transportation systems. An efficient model for calculating traffic flows over the entire transportation network could provide a more detailed and realistic understanding of traffic dynamics. However, existing traffic prediction approaches, such as those utilizing graph neural networks, are typically limited to locations where sensors are deployed and cannot predict traffic flows beyond sensor locations. To alleviate this limitation, inspired by fundamental relationship that exists between link flows and the origin-destination (OD) travel demands, we proposed the Heterogeneous Spatio-Temporal Graph Sequence Network (HSTGSN). HSTGSN exploits dependency between origin and destination nodes, even when it is long-range, and learns implicit vehicle route choices under different origin-destination demands. This model is based on a heterogeneous graph which consists of road links, OD links (virtual links connecting origins and destinations) and a spatio-temporal graph encoder-decoder that captures the spatio-temporal relationship between OD demands and flow distribution. We will show how the graph encoder-decoder is able to recover the incomplete information in the OD demand, by using node embedding from the graph decoder to predict the temporal changes in flow distribution. Using extensive experimental studies on real-world networks with complete/incomplete OD demands, we demonstrate that our method can not only capture the implicit spatio-temporal relationship between link traffic flows and OD demands but also achieve accurate prediction performance and generalization capability.
Abstract:In the context of cardiovascular diseases (CVD) that exhibit an elevated prevalence and mortality, the electrocardiogram (ECG) is a popular and standard diagnostic tool for doctors, commonly utilizing a 12-lead configuration in clinical practice. However, the 10 electrodes placed on the surface would cause a lot of inconvenience and discomfort, while the rapidly advancing wearable devices adopt the reduced-lead or single-lead ECG to reduce discomfort as a solution in long-term monitoring. Since the single-lead ECG is a subset of 12-lead ECG, it provides insufficient cardiac health information and plays a substandard role in real-world healthcare applications. Hence, it is necessary to utilize signal generation technologies to reduce their clinical importance gap by reconstructing 12-lead ECG from the real single-lead ECG. Specifically, this study proposes a multi-channel masked autoencoder (MCMA) for this goal. In the experimental results, the visualized results between the generated and real signals can demonstrate the effectiveness of the proposed framework. At the same time, this study introduces a comprehensive evaluation benchmark named ECGGenEval, encompassing the signal-level, feature-level, and diagnostic-level evaluations, providing a holistic assessment of 12-lead ECG signals and generative model. Further, the quantitative experimental results are as follows, the mean square errors of 0.0178 and 0.0658, correlation coefficients of 0.7698 and 0.7237 in the signal-level evaluation, the average F1-score with two generated 12-lead ECG is 0.8319 and 0.7824 in the diagnostic-level evaluation, achieving the state-of-the-art performance. The open-source code is publicly available at \url{https://github.com/CHENJIAR3/MCMA}.
Abstract:In this study, we explored the progression trajectories of artificial intelligence (AI) systems through the lens of complexity theory. We challenged the conventional linear and exponential projections of AI advancement toward Artificial General Intelligence (AGI) underpinned by transformer-based architectures, and posited the existence of critical points, akin to phase transitions in complex systems, where AI performance might plateau or regress into instability upon exceeding a critical complexity threshold. We employed agent-based modelling (ABM) to simulate hypothetical scenarios of AI systems' evolution under specific assumptions, using benchmark performance as a proxy for capability and complexity. Our simulations demonstrated how increasing the complexity of the AI system could exceed an upper criticality threshold, leading to unpredictable performance behaviours. Additionally, we developed a practical methodology for detecting these critical thresholds using simulation data and stochastic gradient descent to fine-tune detection thresholds. This research offers a novel perspective on AI advancement that has a particular relevance to Large Language Models (LLMs), emphasising the need for a tempered approach to extrapolating AI's growth potential and underscoring the importance of developing more robust and comprehensive AI performance benchmarks.
Abstract:In industrial recommendation systems, there are several mini-apps designed to meet the diverse interests and needs of users. The sample space of them is merely a small subset of the entire space, making it challenging to train an efficient model. In recent years, there have been many excellent studies related to cross-domain recommendation aimed at mitigating the problem of data sparsity. However, few of them have simultaneously considered the adaptability of both sample and representation continual transfer setting to the target task. To overcome the above issue, we propose a Entire space Continual and Adaptive Transfer learning framework called ECAT which includes two core components: First, as for sample transfer, we propose a two-stage method that realizes a coarse-to-fine process. Specifically, we perform an initial selection through a graph-guided method, followed by a fine-grained selection using domain adaptation method. Second, we propose an adaptive knowledge distillation method for continually transferring the representations from a model that is well-trained on the entire space dataset. ECAT enables full utilization of the entire space samples and representations under the supervision of the target task, while avoiding negative migration. Comprehensive experiments on real-world industrial datasets from Taobao show that ECAT advances state-of-the-art performance on offline metrics, and brings +13.6% CVR and +8.6% orders for Baiyibutie, a famous mini-app of Taobao.
Abstract:With the continuous development and improvement of medical services, there is a growing demand for improving diabetes diagnosis. Exhaled breath analysis, characterized by its speed, convenience, and non-invasive nature, is leading the trend in diagnostic development. Studies have shown that the acetone levels in the breath of diabetes patients are higher than normal, making acetone a basis for diabetes breath analysis. This provides a more readily accepted method for early diabetes prevention and monitoring. Addressing issues such as the invasive nature, disease transmission risks, and complexity of diabetes testing, this study aims to design a diabetes gas biomarker acetone detection system centered around a sensor array using gas sensors and pattern recognition algorithms. The research covers sensor selection, sensor preparation, circuit design, data acquisition and processing, and detection model establishment to accurately identify acetone. Titanium dioxide was chosen as the nano gas-sensitive material to prepare the acetone gas sensor, with data collection conducted using STM32. Filtering was applied to process the raw sensor data, followed by feature extraction using principal component analysis. A recognition model based on support vector machine algorithm was used for qualitative identification of gas samples, while a recognition model based on backpropagation neural network was employed for quantitative detection of gas sample concentrations. Experimental results demonstrated recognition accuracies of 96% and 97.5% for acetone-ethanol and acetone-methanol mixed gases, and 90% for ternary acetone, ethanol, and methanol mixed gases.