Abstract:In a bipartite graph, a subgraph is an $s$-biplex if each vertex of the subgraph is adjacent to all but at most $s$ vertices on the opposite side. The enumeration of $s$-biplexes from a given graph is a fundamental problem in bipartite graph analysis. However, in real-world data engineering, finding all $s$-biplexes is neither necessary nor computationally affordable. A more realistic problem is to identify some of the largest $s$-biplexes in a large input graph. We formulate this as the {\em top-$k$ $s$-biplex search (TBS) problem}, which aims to find the top-$k$ maximal $s$-biplexes with the most vertices, where $k$ is an input parameter. We prove that the TBS problem is NP-hard for any fixed $k\ge 1$. Then, we propose a branching algorithm, named MVBP, that beats the trivial $2^n$ enumeration algorithm. Furthermore, from a practical perspective, we investigate three techniques to improve the performance of MVBP: 2-hop decomposition, single-side bounds, and progressive search. Complexity analysis shows that the improved algorithm, named FastMVBP, has a running time of $O^*(\gamma_s^{d_2})$, where $\gamma_s<2$ and $d_2$ is a parameter much smaller than the number of vertices in sparse real-world graphs; for example, $d_2$ is only $67$ in the AmazonRatings dataset, which has more than $3$ million vertices. Finally, we conduct extensive experiments on eight real-world and synthetic datasets to demonstrate the empirical efficiency of the proposed algorithms. In particular, FastMVBP outperforms the benchmark algorithms by up to three orders of magnitude in several instances.
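The $s$-biplex definition above can be checked directly. The following is a minimal sketch of that check only, not the MVBP or FastMVBP algorithm; the function name `is_s_biplex` and the adjacency representation (a dict from left vertices to sets of right neighbors) are assumptions made for illustration.

```python
def is_s_biplex(adj, left, right, s):
    """Return True if the subgraph induced by (left, right) is an s-biplex.

    adj   -- hypothetical representation: maps each left vertex to the set
             of right vertices it is adjacent to
    left  -- left-side vertices of the candidate subgraph
    right -- right-side vertices of the candidate subgraph
    s     -- maximum number of non-neighbors allowed per vertex
    """
    left, right = set(left), set(right)
    # Each left vertex may miss at most s vertices on the right side.
    for u in left:
        if len(right - adj.get(u, set())) > s:
            return False
    # Each right vertex may miss at most s vertices on the left side.
    for v in right:
        missing = sum(1 for u in left if v not in adj.get(u, set()))
        if missing > s:
            return False
    return True


# Toy graph: "b" is not adjacent to "y", so one non-neighbor exists.
toy_adj = {"a": {"x", "y"}, "b": {"x"}}
one_biplex = is_s_biplex(toy_adj, {"a", "b"}, {"x", "y"}, 1)
zero_biplex = is_s_biplex(toy_adj, {"a", "b"}, {"x", "y"}, 0)
```

In the toy graph, the pair ({a, b}, {x, y}) passes the check for $s=1$ but fails for $s=0$, since a $0$-biplex is a complete bipartite subgraph (a biclique).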
Abstract:Deep learning holds great promise for optimizing existing peptides toward more desirable properties, a critical step in accelerating new drug discovery. Despite the recent emergence of several methods for generating optimized antimicrobial peptides (AMPs), multi-objective optimization remains quite challenging given the idealism-realism tradeoff. Here, we establish a multi-objective AMP synthesis pipeline (MoFormer) for the simultaneous optimization of multiple attributes of AMPs. MoFormer improves the desired attributes of AMP sequences in a highly structured latent space, guided by conditional constraints and a fine-grained multi-descriptor. We show that MoFormer outperforms existing methods in generating peptides with enhanced antimicrobial activity and minimal hemolysis. We also use a Pareto-based non-dominated sorting algorithm and proxies based on large-model fine-tuning to hierarchically rank the candidates. We demonstrate substantial property improvement using MoFormer from two perspectives: (1) employing molecular simulations and scoring interactions among amino acids to decipher the structure and functionality of AMPs; (2) visualizing the latent space to examine the qualities and distribution features, confirming that MoFormer is an effective means of multi-objective optimization of AMPs under design constraints.
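Pareto-based non-dominated sorting, mentioned above for ranking candidates, can be illustrated generically. The sketch below is a simple quadratic-time version, not MoFormer's actual ranking procedure; the function names, the maximization convention, and the example scores (an activity score and a negated hemolysis score) are all hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (assumption: all objectives are maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def non_dominated_sort(points):
    """Split point indices into successive Pareto fronts.

    Front 0 holds the points no other point dominates; front 1 holds the
    points that become non-dominated once front 0 is removed, and so on.
    """
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i])
                       for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical candidates scored as (activity, -hemolysis); higher is better.
candidates = [(0.9, 0.2), (0.8, 0.8), (0.5, 0.5), (0.4, 0.1)]
fronts = non_dominated_sort(candidates)
```

Here candidates 0 and 1 form the first front because neither dominates the other, candidate 2 forms the second front, and candidate 3 the third.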
Abstract:While various models and computational tools have been proposed for structure and property analysis of molecules, generating molecules that conform to all desired structures and properties remains a challenge. Here, we introduce a multi-constraint molecular generation large language model, TSMMG, which, akin to a student, incorporates knowledge from various small models and tools, namely, the 'teachers'. To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these 'teachers', enabling it to generate novel molecules that conform to the descriptions given in various text prompts. We show experimentally that TSMMG performs remarkably well in generating molecules that meet complex property requirements described in natural language across two-, three-, and four-constraint tasks, with an average molecular validity of over 99% and success ratios of 88.08%, 65.27%, and 61.44%, respectively. The model also exhibits adaptability in zero-shot testing, creating molecules that satisfy combinations of properties it has not encountered. It can comprehend text inputs in various language styles, extending beyond the confines of the outlined prompts, as confirmed through empirical validation. Additionally, the knowledge distillation feature of TSMMG contributes to the continuous enhancement of small models, while the innovative approach to dataset construction effectively addresses the issues of data scarcity and quality, which positions TSMMG as a promising tool in the domains of drug discovery and materials science. Code is available at https://github.com/HHW-zhou/TSMMG.
Abstract:Better understanding the natural world is a crucial task with a wide range of applications. In environments where humans and animals are in close proximity, such as zoos, it is essential to better understand the causes behind animal behaviour and which interventions are responsible for changes in that behaviour. This can help to predict unusual behaviours, mitigate detrimental effects and increase the well-being of animals. There has been work on modelling the dynamics of swarms of birds and insects, but the complex social behaviours of mammalian groups remain less explored. In this work, we propose a method to build behavioural models using causal structure discovery and graph neural networks for time series. We apply this method to a mob of meerkats in a zoo environment and study its ability to predict future actions and to model the behaviour distribution at the individual level and at the group level. We show that our method can match and outperform standard deep learning architectures and generate more realistic data, while using fewer parameters and providing increased interpretability.
Abstract:Link prediction in biomedical knowledge graphs (KGs) aims at predicting unknown interactions between entities, including drug-target interaction (DTI) and drug-drug interaction (DDI), which is critical for drug discovery and therapeutics. Previous methods typically exploit the rich semantic relations and topological structure of the KG to predict missing links, yielding promising outcomes. However, these works focus only on improving predictive performance without considering the inevitable noise and unreliable interactions in the KGs, which limits the development of KG-based computational methods. To address these limitations, we propose a Denoised Link Prediction framework, called DenoisedLP. DenoisedLP obtains reliable interactions from the local subgraph by denoising noisy links in a learnable way, providing a universal module for mining underlying task-relevant relations. To complement this with smoothed semantic information, DenoisedLP introduces a semantic subgraph obtained by blurring conflicting relations around the predicted link. By maximizing the mutual information between the reliable structure and the smoothed semantic relations, DenoisedLP emphasizes the informative interactions for predicting relation-specific links. Experimental results on real-world datasets demonstrate that DenoisedLP outperforms state-of-the-art methods on DTI and DDI prediction tasks, and verify the effectiveness and robustness of denoising unreliable interactions on contaminated KGs.
Abstract:Decomposition-based multiobjective evolutionary algorithms (MOEAs) with clustering-based reference vector adaptation show good optimization performance for many-objective optimization problems (MaOPs). In particular, algorithms that employ a clustering algorithm with a topological structure (i.e., a network composed of nodes and edges) show superior optimization performance to other MOEAs for MaOPs with irregular Pareto optimal fronts (PFs). These algorithms, however, do not effectively utilize the information in the topological structure during the search process. Moreover, the clustering algorithms typically used in conventional studies have limited clustering performance, which inhibits their ability to extract useful information for the search process. This paper proposes an adaptive reference vector-guided evolutionary algorithm using adaptive resonance theory-based clustering with a topological structure. The proposed algorithm utilizes the information in the topological structure not only for reference vector adaptation but also for mating selection. The proposed algorithm is compared with 8 state-of-the-art MOEAs on 78 test problems. Experimental results reveal the outstanding optimization performance of the proposed algorithm over the others on MaOPs with various properties.
Abstract:Interpersonal ties are pivotal to individual efficacy, status and performance in an agent society. This paper explores three important and interrelated themes in social network theory: the center/periphery partition of the network; network dynamics; and social integration of newcomers. We tackle the question: How would a newcomer harness information brokerage to integrate into a dynamic network, going from periphery to center? We model integration as the interplay between the newcomer and the dynamic network and capture information brokerage using a process of relationship building. We analyze theoretical guarantees for the newcomer to reach the center through tactics, proving that a winning tactic always exists for certain types of network dynamics. We then propose three tactics and show their superior performance over alternative methods on four real-world datasets and four network models. In general, our tactics place the newcomer at the center by adding very few new edges on dynamic networks with approximately 14000 nodes.