Abstract:Few-shot medical image segmentation has achieved great progress in improving the accuracy and efficiency of medical analysis in the biomedical imaging field. However, most existing methods cannot explore inter-class relations among base and novel medical classes to reason about unseen novel classes. Moreover, the same kind of medical class has large intra-class variations brought by diverse appearances, shapes and scales, which causes ambiguous visual characterization and degrades the generalization performance of these methods on unseen novel classes. To address the above challenges, in this paper, we propose a \underline{\textbf{P}}rototype correlation \underline{\textbf{M}}atching and \underline{\textbf{C}}lass-relation \underline{\textbf{R}}easoning (i.e., \textbf{PMCR}) model. The proposed model can effectively mitigate false pixel correlation matches caused by large intra-class variations while reasoning about inter-class relations among different medical classes. Specifically, to address the false pixel correlation matches caused by large intra-class variations, we propose a prototype correlation matching module to mine representative prototypes that can characterize the diverse visual information of different appearances well. We explore prototype-level rather than pixel-level correlation matching between support and query features via an optimal transport algorithm to tackle false matches caused by intra-class variations. Meanwhile, to explore inter-class relations, we design a class-relation reasoning module to segment unseen novel medical objects by reasoning about inter-class relations between base and novel classes. Such inter-class relations can be well propagated to the semantic encoding of local query features to improve few-shot segmentation performance. Quantitative comparisons illustrate the large performance improvement of our model over other baseline methods.
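As a rough illustration of prototype-level matching via optimal transport, the sketch below runs entropy-regularized Sinkhorn iterations between a handful of support and query prototypes. It is not the PMCR implementation: the cosine-based cost, the uniform marginals, the regularization strength `eps`, and the function names `cosine` and `sinkhorn_plan` are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def sinkhorn_plan(support_protos, query_protos, eps=0.1, n_iters=200):
    """Return a transport plan between support and query prototypes.

    Assumptions (hypothetical, for illustration only):
    cost = 1 - cosine similarity, uniform mass on every prototype.
    """
    n, m = len(support_protos), len(query_protos)
    cost = [[1.0 - cosine(s, q) for q in query_protos] for s in support_protos]
    # Gibbs kernel of the entropy-regularized problem.
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    row_mass = [1.0 / n] * n
    col_mass = [1.0 / m] * m
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy prototypes: each query prototype is close to one support prototype.
support = [[1.0, 0.0], [0.0, 1.0]]
query = [[0.9, 0.1], [0.1, 0.9]]
plan = sinkhorn_plan(support, query)
```

In this toy case most of the mass in `plan` lands on the matching prototype pairs, which is the behavior one would hope for when matching at the prototype level instead of pixel by pixel.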
Abstract:Text-to-3D content creation is a rapidly evolving research area. Given the scarcity of 3D data, current approaches often adapt pre-trained 2D diffusion models for 3D synthesis. Among these approaches, Score Distillation Sampling (SDS) has been widely adopted. However, the issue of over-smoothing poses a significant limitation on the high-fidelity generation of 3D models. To address this challenge, LucidDreamer replaces the Denoising Diffusion Probabilistic Model (DDPM) in SDS with the Denoising Diffusion Implicit Model (DDIM) to construct Interval Score Matching (ISM). However, ISM inevitably inherits inconsistencies from DDIM, causing reconstruction errors during the DDIM inversion process. This results in poor performance in the detailed generation of 3D objects and loss of content. To alleviate these problems, we propose a novel method named Exact Score Matching (ESM). Specifically, ESM leverages auxiliary variables to mathematically guarantee exact recovery in the DDIM reverse process. Furthermore, to effectively capture the dynamic changes of the original and auxiliary variables, the LoRA of a pre-trained diffusion model implements these exact paths. Extensive experiments demonstrate the effectiveness of ESM in text-to-3D generation, particularly highlighting its superiority in detailed generation.
Abstract:With impressive achievements made, artificial intelligence is on the path toward artificial general intelligence. Sora, developed by OpenAI, is capable of minute-level world simulation and can be considered a milestone on this developmental path. However, despite its notable successes, Sora still encounters various obstacles that need to be resolved. In this survey, we take the perspective of disassembling Sora in text-to-video generation and conduct a comprehensive review of the literature, trying to answer the question, \textit{From Sora What We Can See}. Specifically, after introducing basic preliminaries regarding the general algorithms, we categorize the literature along three mutually orthogonal dimensions: evolutionary generators, excellent pursuit, and realistic panorama. Subsequently, the widely used datasets and metrics are organized in detail. Last but not least, we identify several challenges and open problems in this domain and propose potential future directions for research and development.
Abstract:Despite the widespread application of machine learning force fields (MLFFs) to solids and small molecules, there is a notable gap in applying MLFFs to complex liquid electrolytes. In this work, we introduce BAMBOO (ByteDance AI Molecular Simulation Booster), a novel framework for molecular dynamics (MD) simulations, with a demonstration of its capabilities in the context of liquid electrolytes for lithium batteries. We design a physics-inspired graph equivariant transformer architecture as the backbone of BAMBOO to learn from quantum mechanical simulations. Additionally, we pioneer an ensemble knowledge distillation approach and apply it to MLFFs to improve the stability of MD simulations. Finally, we propose a density alignment algorithm to align BAMBOO with experimental measurements. BAMBOO demonstrates state-of-the-art accuracy in predicting key electrolyte properties such as density, viscosity, and ionic conductivity across various solvents and salt combinations. Our current model, trained on more than 15 chemical species, achieves an average density error of 0.01 g/cm$^3$ on various compositions compared with experimental data. Moreover, our model demonstrates transferability to molecules not included in the quantum mechanical dataset. We envision this work as paving the way to a "universal MLFF" capable of simulating properties of common organic liquids.
Abstract:Unsupervised Domain Adaptation (UDA), which aims to explore transferable features from a well-labeled source domain to a related unlabeled target domain, has made substantial progress. Nevertheless, existing adversarial-based methods, one of the mainstream approaches, neglect to filter out irrelevant semantic knowledge, which hinders improvements in adaptation performance. Besides, they require an additional domain discriminator that drives the feature extractor to generate confused representations, but this separate design may cause model collapse. To tackle the above issues, we propose Crucial Semantic Classifier-based Adversarial Learning (CSCAL), which pays more attention to transferring crucial semantic knowledge and leverages the classifier to implicitly play the role of the domain discriminator without designing an extra network. Specifically, for intra-class-wise alignment, a Paired-Level Discrepancy (PLD) is designed to transfer crucial semantic knowledge. Additionally, based on classifier predictions, a Nuclear Norm-based Discrepancy (NND) is formed that considers inter-class-wise information and improves the adaptation performance. Moreover, CSCAL can be effortlessly merged into different UDA methods as a regularizer and dramatically improve their performance.
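To give a feel for how a nuclear norm can summarize classifier predictions, the sketch below computes the nuclear norm (sum of singular values) of a batch prediction matrix and compares source and target batches. The abstract does not give the NND formula, so the size-normalized gap used here is an assumption, as are the function names `nuclear_norm_two_class` and `nuclear_norm_gap`. To stay dependency-free, the example is restricted to two classes, where the identity $(s_1 + s_2)^2 = \mathrm{tr}(P^\top P) + 2\sqrt{\det(P^\top P)}$ gives the nuclear norm in closed form.

```python
import math


def nuclear_norm_two_class(preds):
    """Nuclear norm of an n x 2 prediction matrix (rows are class probabilities).

    Uses the 2 x 2 Gram matrix G = P^T P: the singular values s1, s2 of P
    satisfy (s1 + s2)^2 = trace(G) + 2 * sqrt(det(G)).
    """
    g00 = sum(p[0] * p[0] for p in preds)
    g01 = sum(p[0] * p[1] for p in preds)
    g11 = sum(p[1] * p[1] for p in preds)
    det = max(g00 * g11 - g01 * g01, 0.0)  # guard against tiny negative round-off
    return math.sqrt(g00 + g11 + 2.0 * math.sqrt(det))


def nuclear_norm_gap(source_preds, target_preds):
    """Hypothetical discrepancy: difference of batch-size-normalized nuclear norms."""
    source_term = nuclear_norm_two_class(source_preds) / len(source_preds)
    target_term = nuclear_norm_two_class(target_preds) / len(target_preds)
    return source_term - target_term


# Confident, diverse source predictions versus uniform target predictions.
source_batch = [[1.0, 0.0], [0.0, 1.0]]
target_batch = [[0.5, 0.5], [0.5, 0.5]]
gap = nuclear_norm_gap(source_batch, target_batch)
```

Confident and class-diverse predictions yield a larger nuclear norm than uncertain ones, so a gap of this kind shrinks as target predictions become as confident and diverse as source predictions. For more than two classes, a general SVD routine would be needed instead of the closed form.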
Abstract:Deep learning-based Computer-Aided Diagnosis (CAD) has attracted considerable attention in academic research and clinical applications. Nevertheless, diagnosis systems based on Convolutional Neural Networks (CNNs) rely heavily on well-labeled lesion datasets, and their sensitivity to variations in data distribution also restricts the potential application of CNNs in CAD. Unsupervised Domain Adaptation (UDA) methods have been developed to address the problems of expensive annotation and domain gaps, and have achieved remarkable success in medical image analysis. Yet existing UDA approaches only adapt knowledge learned from the source lesion domain to a single target lesion domain, which conflicts with the clinical scenario: new unlabeled target domains to be diagnosed always arrive in an online and continual manner. Moreover, the performance of existing approaches degrades dramatically on previously learned target lesion domains because newly learned knowledge overwrites previously learned knowledge (i.e., catastrophic forgetting). To deal with the above issues, we develop a meta-adaptation framework named Consecutive Lesion Knowledge Meta-Adaptation (CLKM), which mainly consists of a Semantic Adaptation Phase (SAP) and a Representation Adaptation Phase (RAP), to learn the diagnosis model in an online and continual manner. In the SAP, the semantic knowledge learned from the source lesion domain is transferred to consecutive target lesion domains. In the RAP, the feature extractor is optimized to align the transferable representation knowledge across the source and multiple target lesion domains.
Abstract:Uplift modeling is a predictive modeling technique that estimates the user-level incremental effect of a treatment using machine learning models. It is often used for targeting promotions and advertisements, as well as for the personalization of product offerings. In these applications, there are often hundreds of features available to build such models. Keeping all the features in a model can be costly and inefficient. Feature selection is an essential step in the modeling process for multiple reasons: improving the estimation accuracy by eliminating irrelevant features, accelerating model training and prediction, reducing the monitoring and maintenance workload for feature data pipelines, and providing better model interpretation and diagnostics capability. However, feature selection methods for uplift modeling have rarely been discussed in the literature. Although there are various feature selection methods for standard machine learning models, we will demonstrate that those methods are sub-optimal for solving the feature selection problem for uplift modeling. To address this problem, we introduce a set of feature selection methods designed specifically for uplift modeling, including both filter methods and embedded methods. To evaluate the effectiveness of the proposed feature selection methods, we use different uplift models and measure the accuracy of each model with a different number of selected features. We use both synthetic and real data to conduct these experiments. We also implemented the proposed filter methods in an open source Python package (CausalML).
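A filter method for uplift modeling scores each feature by how strongly the treatment effect, not the outcome itself, relates to that feature. The sketch below is one simple way to do this under stated assumptions: the feature is already binned, the outcome is binary, and the score is the bin-size-weighted squared gap between treated and control response rates. The function name `uplift_filter_score` and the scoring rule are hypothetical choices for illustration; they are neither the paper's exact statistics nor the CausalML API.

```python
def uplift_filter_score(feature_bins, treated, outcome):
    """Bin-weighted squared gap between treated and control response rates.

    feature_bins: bin label of the feature for each user
    treated:      1 if the user received the treatment, else 0
    outcome:      1 if the user converted, else 0
    """
    # Per-bin counts: [n_treated, conversions_treated, n_control, conversions_control]
    stats = {}
    for bin_label, is_treated, y in zip(feature_bins, treated, outcome):
        counts = stats.setdefault(bin_label, [0, 0, 0, 0])
        if is_treated:
            counts[0] += 1
            counts[1] += y
        else:
            counts[2] += 1
            counts[3] += y

    total = len(feature_bins)
    score = 0.0
    for n_t, y_t, n_c, y_c in stats.values():
        if n_t == 0 or n_c == 0:
            continue  # uplift in this bin cannot be estimated
        gap = y_t / n_t - y_c / n_c  # estimated uplift inside the bin
        score += (n_t + n_c) / total * gap * gap
    return score


# Toy data: the treatment only helps users whose informative feature is in bin 0.
treated = [1, 1, 0, 0, 1, 1, 0, 0]
outcome = [1, 1, 0, 0, 0, 0, 0, 0]
informative = [0, 0, 0, 0, 1, 1, 1, 1]
noise = [0, 1, 0, 1, 0, 1, 0, 1]

score_informative = uplift_filter_score(informative, treated, outcome)
score_noise = uplift_filter_score(noise, treated, outcome)
```

On this toy data the informative feature receives the higher score, so a filter that keeps the top-ranked features would retain it and drop the noise feature. A score built on the outcome alone could not make this distinction, since it ignores the treatment assignment.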