Abstract:This paper describes the 2nd edition of the ICML Topological Deep Learning Challenge that was hosted within the ICML 2024 ELLIS Workshop on Geometry-grounded Representation Learning and Generative Modeling (GRaM). The challenge focused on the problem of representing data in different discrete topological domains in order to bridge the gap between Topological Deep Learning (TDL) and other types of structured datasets (e.g. point clouds, graphs). Specifically, participants were asked to design and implement topological liftings, i.e. mappings between different data structures and topological domains --like hypergraphs, or simplicial/cell/combinatorial complexes. The challenge received 52 submissions satisfying all the requirements. This paper introduces the main scope of the challenge, and summarizes the main results and findings.
Abstract:Inadequate bounding box modeling in regression tasks constrains the performance of one-stage 3D object detection. Our study reveals that the primary reason lies in two aspects: (1) The limited center-offset prediction seriously impairs the bounding box localization since many highest response positions significantly deviate from object centers. (2) The low-quality sample ignored in regression tasks significantly impacts the bounding box prediction since it produces unreliable quality (IoU) rectification. To tackle these problems, we propose Decoupled and Interactive Regression Modeling (DIRM) for one-stage detection. Specifically, Decoupled Attribute Regression (DAR) is implemented to facilitate long regression range modeling for the center attribute through an adaptive multi-sample assignment strategy that deeply decouples bounding box attributes. On the other hand, to enhance the reliability of IoU predictions for low-quality results, Interactive Quality Prediction (IQP) integrates the classification task, proficient in modeling negative samples, with quality prediction for joint optimization. Extensive experiments on Waymo and ONCE datasets demonstrate that DIRM significantly improves the performance of several state-of-the-art methods with minimal additional inference latency. Notably, DIRM achieves state-of-the-art detection performance on both the Waymo and ONCE datasets.
Abstract:The global economy relies on the flow of goods over supply chain networks, with nodes as firms and edges as transactions between firms. While we may observe these external transactions, they are governed by unseen production functions, which determine how firms internally transform the input products they receive into output products that they sell. In this setting, it can be extremely valuable to infer these production functions, to better understand and improve supply chains, and to forecast future transactions more accurately. However, existing graph neural networks (GNNs) cannot capture these hidden relationships between nodes' inputs and outputs. Here, we introduce a new class of models for this setting, by combining temporal GNNs with a novel inventory module, which learns production functions via attention weights and a special loss function. We evaluate our models extensively on real supply chains data, along with data generated from our new open-source simulator, SupplySim. Our models successfully infer production functions, with a 6-50% improvement over baselines, and forecast future transactions on real and synthetic data, outperforming baselines by 11-62%.
Abstract:Merge trees are a valuable tool in scientific visualization of scalar fields; however, current methods for merge tree comparisons are computationally expensive, primarily due to the exhaustive matching between tree nodes. To address this challenge, we introduce the merge tree neural networks (MTNN), a learned neural network model designed for merge tree comparison. The MTNN enables rapid and high-quality similarity computation. We first demonstrate how graph neural networks (GNNs), which emerged as an effective encoder for graphs, can be trained to produce embeddings of merge trees in vector spaces that enable efficient similarity comparison. Next, we formulate the novel MTNN model that further improves the similarity comparisons by integrating the tree and node embeddings with a new topological attention mechanism. We demonstrate the effectiveness of our model on real-world data in different domains and examine our model's generalizability across various datasets. Our experimental analysis demonstrates our approach's superiority in accuracy and efficiency. In particular, we speed up the prior state-of-the-art by more than 100x on the benchmark datasets while maintaining an error rate below 0.1%.
Abstract:A tremendous range of design tasks in materials, physics, and biology can be formulated as finding the optimum of an objective function depending on many parameters without knowing its closed-form expression or the derivative. Traditional derivative-free optimization techniques often rely on strong assumptions about objective functions, thereby failing at optimizing non-convex systems beyond 100 dimensions. Here, we present a tree search method for derivative-free optimization that enables accelerated optimal design of high-dimensional complex systems. Specifically, we introduce stochastic tree expansion, dynamic upper confidence bound, and short-range backpropagation mechanism to evade local optimum, iteratively approximating the global optimum using machine learning models. This development effectively confronts the dimensionally challenging problems, achieving convergence to global optima across various benchmark functions up to 2,000 dimensions, surpassing the existing methods by 10- to 20-fold. Our method demonstrates wide applicability to a wide range of real-world complex systems spanning materials, physics, and biology, considerably outperforming state-of-the-art algorithms. This enables efficient autonomous knowledge discovery and facilitates self-driving virtual laboratories. Although we focus on problems within the realm of natural science, the advancements in optimization techniques achieved herein are applicable to a broader spectrum of challenges across all quantitative disciplines.
Abstract:In the recommender system of Meituan Waimai, we are dealing with ever-lengthening user behavior sequences, which pose an increasing challenge to modeling user preference effectively. Existing sequential recommendation models often fail to capture long-term dependencies or are too complex, complicating the fulfillment of Meituan Waimai's unique business needs. To better model user interests, we consider selecting relevant sub-sequences from users' extensive historical behaviors based on their preferences. In this specific scenario, we've noticed that the contexts in which users interact have a significant impact on their preferences. For this purpose, we introduce a novel method called Context-based Fast Recommendation Strategy to tackle the issue of long sequences. We first identify contexts that share similar user preferences with the target context and then locate the corresponding PoIs based on these identified contexts. This approach eliminates the necessity to select a sub-sequence for every candidate PoI, thereby avoiding high time complexity. Specifically, we implement a prototype-based approach to pinpoint contexts that mirror similar user preferences. To amplify accuracy and interpretability, we employ JS divergence of PoI attributes such as categories and prices as a measure of similarity between contexts. A temporal graph integrating both prototype and context nodes helps incorporate temporal information. We then identify appropriate prototypes considering both target contexts and short-term user preferences. Following this, we utilize contexts aligned with these prototypes to generate a sub-sequence, aimed at predicting CTR and CTCVR scores with target attention. Since its inception in 2023, this strategy has been adopted in Meituan Waimai's display recommender system, leading to a 4.6% surge in CTR and a 4.2% boost in GMV.
Abstract:Recent studies have revealed the vulnerability of Deep Neural Networks (DNNs) to adversarial examples, which can easily fool DNNs into making incorrect predictions. To mitigate this deficiency, we propose a novel adversarial defense method called "Immunity" (Innovative MoE with MUtual information \& positioN stabilITY) based on a modified Mixture-of-Experts (MoE) architecture in this work. The key enhancements to the standard MoE are two-fold: 1) integrating of Random Switch Gates (RSGs) to obtain diverse network structures via random permutation of RSG parameters at evaluation time, despite of RSGs being determined after one-time training; 2) devising innovative Mutual Information (MI)-based and Position Stability-based loss functions by capitalizing on Grad-CAM's explanatory power to increase the diversity and the causality of expert networks. Notably, our MI-based loss operates directly on the heatmaps, thereby inducing subtler negative impacts on the classification performance when compared to other losses of the same type, theoretically. Extensive evaluation validates the efficacy of the proposed approach in improving adversarial robustness against a wide range of attacks.
Abstract:Low-light image enhancement aims to improve the perception of images collected in dim environments and provide high-quality data support for image recognition tasks. When dealing with photos captured under non-uniform illumination, existing methods cannot adaptively extract the differentiated luminance information, which will easily cause over-exposure and under-exposure. From the perspective of unsupervised learning, we propose a multi-scale attention Transformer named MSATr, which sufficiently extracts local and global features for light balance to improve the visual quality. Specifically, we present a multi-scale window division scheme, which uses exponential sequences to adjust the window size of each layer. Within different-sized windows, the self-attention computation can be refined, ensuring the pixel-level feature processing capability of the model. For feature interaction across windows, a global transformer branch is constructed to provide comprehensive brightness perception and alleviate exposure problems. Furthermore, we propose a loop training strategy, using the diverse images generated by weighted mixing and a luminance consistency loss to improve the model's generalization ability effectively. Extensive experiments on several benchmark datasets quantitatively and qualitatively prove that our MSATr is superior to state-of-the-art low-light image enhancement methods, and the enhanced images have more natural brightness and outstanding details. The code is released at https://github.com/fang001021/MSATr.
Abstract:This paper presents the first approach to visualize the importance of topological features that define classes of data. Topological features, with their ability to abstract the fundamental structure of complex data, are an integral component of visualization and analysis pipelines. Although not all topological features present in data are of equal importance. To date, the default definition of feature importance is often assumed and fixed. This work shows how proven explainable deep learning approaches can be adapted for use in topological classification. In doing so, it provides the first technique that illuminates what topological structures are important in each dataset in regards to their class label. In particular, the approach uses a learned metric classifier with a density estimator of the points of a persistence diagram as input. This metric learns how to reweigh this density such that classification accuracy is high. By extracting this weight, an importance field on persistent point density can be created. This provides an intuitive representation of persistence point importance that can be used to drive new visualizations. This work provides two examples: Visualization on each diagram directly and, in the case of sublevel set filtrations on images, directly on the images themselves. This work highlights real-world examples of this approach visualizing the important topological features in graph, 3D shape, and medical image data.
Abstract:The analysis and mining of user heterogeneous behavior are of paramount importance in recommendation systems. However, the conventional approach of incorporating various types of heterogeneous behavior into recommendation models leads to feature sparsity and knowledge fragmentation issues. To address this challenge, we propose a novel approach for personalized recommendation via Large Language Model (LLM), by extracting and fusing heterogeneous knowledge from user heterogeneous behavior information. In addition, by combining heterogeneous knowledge and recommendation tasks, instruction tuning is performed on LLM for personalized recommendations. The experimental results demonstrate that our method can effectively integrate user heterogeneous behavior and significantly improve recommendation performance.