Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xinyu Gong

Large Language Models for Bioinformatics

Jan 10, 2025

Wei Ruan, Yanjun Lyu, Jing Zhang, Jiazhang Cai, Peng Shu, Yang Ge, Yao Lu, Shang Gao, Yue Wang, Peilong Wang(+45 more)

Figure 1 for Large Language Models for Bioinformatics

Abstract:With the rapid advancements in large language model (LLM) technology and the emergence of bioinformatics-specific language models (BioLMs), there is a growing need for a comprehensive analysis of the current landscape, computational characteristics, and diverse applications. This survey aims to address this need by providing a thorough review of BioLMs, focusing on their evolution, classification, and distinguishing features, alongside a detailed examination of training methodologies, datasets, and evaluation frameworks. We explore the wide-ranging applications of BioLMs in critical areas such as disease diagnosis, drug discovery, and vaccine development, highlighting their impact and transformative potential in bioinformatics. We identify key challenges and limitations inherent in BioLMs, including data privacy and security concerns, interpretability issues, biases in training data and model outputs, and domain adaptation complexities. Finally, we highlight emerging trends and future directions, offering valuable insights to guide researchers and clinicians toward advancing BioLMs for increasingly sophisticated biological and clinical applications.

* 64 pages, 1 figure

Via

Access Paper or Ask Questions

Evaluating the Potential of Leading Large Language Models in Reasoning Biology Questions

Nov 05, 2023

Xinyu Gong, Jason Holmes, Yiwei Li, Zhengliang Liu, Qi Gan, Zihao Wu, Jianli Zhang, Yusong Zou, Yuxi Teng, Tian Jiang(+4 more)

Abstract:Recent advances in Large Language Models (LLMs) have presented new opportunities for integrating Artificial General Intelligence (AGI) into biological research and education. This study evaluated the capabilities of leading LLMs, including GPT-4, GPT-3.5, PaLM2, Claude2, and SenseNova, in answering conceptual biology questions. The models were tested on a 108-question multiple-choice exam covering biology topics in molecular biology, biological techniques, metabolic engineering, and synthetic biology. Among the models, GPT-4 achieved the highest average score of 90 and demonstrated the greatest consistency across trials with different prompts. The results indicated GPT-4's proficiency in logical reasoning and its potential to aid biology research through capabilities like data analysis, hypothesis generation, and knowledge integration. However, further development and validation are still required before the promise of LLMs in accelerating biological discovery can be realized.

Via

Access Paper or Ask Questions

MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition

May 12, 2023

Xinyu Gong, Sreyas Mohan, Naina Dhingra, Jean-Charles Bazin, Yilei Li, Zhangyang Wang, Rakesh Ranjan

Abstract:In this paper, we study a novel problem in egocentric action recognition, which we term as "Multimodal Generalization" (MMG). MMG aims to study how systems can generalize when data from certain modalities is limited or even completely missing. We thoroughly investigate MMG in the context of standard supervised action recognition and the more challenging few-shot setting for learning new action categories. MMG consists of two novel scenarios, designed to support security, and efficiency considerations in real-world applications: (1) missing modality generalization where some modalities that were present during the train time are missing during the inference time, and (2) cross-modal zero-shot generalization, where the modalities present during the inference time and the training time are disjoint. To enable this investigation, we construct a new dataset MMG-Ego4D containing data points with video, audio, and inertial motion sensor (IMU) modalities. Our dataset is derived from Ego4D dataset, but processed and thoroughly re-annotated by human experts to facilitate research in the MMG problem. We evaluate a diverse array of models on MMG-Ego4D and propose new methods with improved generalization ability. In particular, we introduce a new fusion module with modality dropout training, contrastive-based alignment training, and a novel cross-modal prototypical loss for better few-shot performance. We hope this study will serve as a benchmark and guide future research in multimodal generalization problems. The benchmark and code will be available at https://github.com/facebookresearch/MMG_Ego4D.

* Accepted to CVPR 2023

Via

Access Paper or Ask Questions

NeRF-SOS: Any-View Self-supervised Object Segmentation on Complex Scenes

Oct 12, 2022

Zhiwen Fan, Peihao Wang, Yifan Jiang, Xinyu Gong, Dejia Xu, Zhangyang Wang

Figure 1 for NeRF-SOS: Any-View Self-supervised Object Segmentation on Complex Scenes

Figure 2 for NeRF-SOS: Any-View Self-supervised Object Segmentation on Complex Scenes

Figure 3 for NeRF-SOS: Any-View Self-supervised Object Segmentation on Complex Scenes

Figure 4 for NeRF-SOS: Any-View Self-supervised Object Segmentation on Complex Scenes

Abstract:Neural volumetric representations have shown the potential that Multi-layer Perceptrons (MLPs) can be optimized with multi-view calibrated images to represent scene geometry and appearance, without explicit 3D supervision. Object segmentation can enrich many downstream applications based on the learned radiance field. However, introducing hand-crafted segmentation to define regions of interest in a complex real-world scene is non-trivial and expensive as it acquires per view annotation. This paper carries out the exploration of self-supervised learning for object segmentation using NeRF for complex real-world scenes. Our framework, called NeRF with Self-supervised Object Segmentation NeRF-SOS, couples object segmentation and neural radiance field to segment objects in any view within a scene. By proposing a novel collaborative contrastive loss in both appearance and geometry levels, NeRF-SOS encourages NeRF models to distill compact geometry-aware segmentation clusters from their density fields and the self-supervised pre-trained 2D visual features. The self-supervised object segmentation framework can be applied to various NeRF models that both lead to photo-realistic rendering results and convincing segmentation maps for both indoor and outdoor scenarios. Extensive results on the LLFF, Tank & Temple, and BlendedMVS datasets validate the effectiveness of NeRF-SOS. It consistently surpasses other 2D-based self-supervised baselines and predicts finer semantics masks than existing supervised counterparts. Please refer to the video on our project page for more details:https://zhiwenfan.github.io/NeRF-SOS.

Via

Access Paper or Ask Questions

Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis

May 11, 2022

Wuyang Chen, Wei Huang, Xinyu Gong, Boris Hanin, Zhangyang Wang

Figure 1 for Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis

Figure 2 for Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis

Figure 3 for Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis

Figure 4 for Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis

Abstract:Advanced deep neural networks (DNNs), designed by either human or AutoML algorithms, are growing increasingly complex. Diverse operations are connected by complicated connectivity patterns, e.g., various types of skip connections. Those topological compositions are empirically effective and observed to smooth the loss landscape and facilitate the gradient flow in general. However, it remains elusive to derive any principled understanding of their effects on the DNN capacity or trainability, and to understand why or in which aspect one specific connectivity pattern is better than another. In this work, we theoretically characterize the impact of connectivity patterns on the convergence of DNNs under gradient descent training in fine granularity. By analyzing a wide network's Neural Network Gaussian Process (NNGP), we are able to depict how the spectrum of an NNGP kernel propagates through a particular connectivity pattern, and how that affects the bound of convergence rates. As one practical implication of our results, we show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate, and significantly accelerate the large-scale neural architecture search without any overhead. Codes will be released at https://github.com/chenwydj/architecture_convergence.

Via

Access Paper or Ask Questions

Unified Implicit Neural Stylization

Apr 06, 2022

Zhiwen Fan, Yifan Jiang, Peihao Wang, Xinyu Gong, Dejia Xu, Zhangyang Wang

Figure 1 for Unified Implicit Neural Stylization

Figure 2 for Unified Implicit Neural Stylization

Figure 3 for Unified Implicit Neural Stylization

Figure 4 for Unified Implicit Neural Stylization

Abstract:Representing visual signals by implicit representation (e.g., a coordinate based deep network) has prevailed among many vision tasks. This work explores a new intriguing direction: training a stylized implicit representation, using a generalized approach that can apply to various 2D and 3D scenarios. We conduct a pilot study on a variety of implicit functions, including 2D coordinate-based representation, neural radiance field, and signed distance function. Our solution is a Unified Implicit Neural Stylization framework, dubbed INS. In contrary to vanilla implicit representation, INS decouples the ordinary implicit function into a style implicit module and a content implicit module, in order to separately encode the representations from the style image and input scenes. An amalgamation module is then applied to aggregate these information and synthesize the stylized output. To regularize the geometry in 3D scenes, we propose a novel self-distillation geometry consistency loss which preserves the geometry fidelity of the stylized scenes. Comprehensive experiments are conducted on multiple task settings, including novel view synthesis of complex scenes, stylization for implicit surfaces, and fitting images using MLPs. We further demonstrate that the learned representation is continuous not only spatially but also style-wise, leading to effortlessly interpolating between different styles and generating images with new mixed styles. Please refer to the video on our project page for more view synthesis results: https://zhiwenfan.github.io/INS.

Via

Access Paper or Ask Questions

Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search

Dec 09, 2021

Yifan Jiang, Xinyu Gong, Junru Wu, Humphrey Shi, Zhicheng Yan, Zhangyang Wang

Figure 1 for Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search

Figure 2 for Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search

Figure 3 for Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search

Figure 4 for Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search

Abstract:Efficient video architecture is the key to deploying video recognition systems on devices with limited computing resources. Unfortunately, existing video architectures are often computationally intensive and not suitable for such applications. The recent X3D work presents a new family of efficient video models by expanding a hand-crafted image architecture along multiple axes, such as space, time, width, and depth. Although operating in a conceptually large space, X3D searches one axis at a time, and merely explored a small set of 30 architectures in total, which does not sufficiently explore the space. This paper bypasses existing 2D architectures, and directly searched for 3D architectures in a fine-grained space, where block type, filter number, expansion ratio and attention block are jointly searched. A probabilistic neural architecture search method is adopted to efficiently search in such a large space. Evaluations on Kinetics and Something-Something-V2 benchmarks confirm our AutoX3D models outperform existing ones in accuracy up to 1.3% under similar FLOPs, and reduce the computational cost up to x1.74 when reaching similar performance.

* Accepted by WACV'2022

Via

Access Paper or Ask Questions

Searching for Two-Stream Models in Multivariate Space for Video Recognition

Aug 30, 2021

Xinyu Gong, Heng Wang, Zheng Shou, Matt Feiszli, Zhangyang Wang, Zhicheng Yan

Figure 1 for Searching for Two-Stream Models in Multivariate Space for Video Recognition

Figure 2 for Searching for Two-Stream Models in Multivariate Space for Video Recognition

Figure 3 for Searching for Two-Stream Models in Multivariate Space for Video Recognition

Figure 4 for Searching for Two-Stream Models in Multivariate Space for Video Recognition

Abstract:Conventional video models rely on a single stream to capture the complex spatial-temporal features. Recent work on two-stream video models, such as SlowFast network and AssembleNet, prescribe separate streams to learn complementary features, and achieve stronger performance. However, manually designing both streams as well as the in-between fusion blocks is a daunting task, requiring to explore a tremendously large design space. Such manual exploration is time-consuming and often ends up with sub-optimal architectures when computational resources are limited and the exploration is insufficient. In this work, we present a pragmatic neural architecture search approach, which is able to search for two-stream video models in giant spaces efficiently. We design a multivariate search space, including 6 search variables to capture a wide variety of choices in designing two-stream models. Furthermore, we propose a progressive search procedure, by searching for the architecture of individual streams, fusion blocks, and attention blocks one after the other. We demonstrate two-stream models with significantly better performance can be automatically discovered in our design space. Our searched two-stream models, namely Auto-TSNet, consistently outperform other models on standard benchmarks. On Kinetics, compared with the SlowFast model, our Auto-TSNet-L model reduces FLOPS by nearly 11 times while achieving the same accuracy 78.9%. On Something-Something-V2, Auto-TSNet-M improves the accuracy by at least 2% over other methods which use less than 50 GFLOPS per video.

* Accepted by ICCV 2021

Via

Access Paper or Ask Questions

Understanding and Accelerating Neural Architecture Search with Training-Free and Theory-Grounded Metrics

Aug 26, 2021

Wuyang Chen, Xinyu Gong, Yunchao Wei, Humphrey Shi, Zhicheng Yan, Yi Yang, Zhangyang Wang

Figure 1 for Understanding and Accelerating Neural Architecture Search with Training-Free and Theory-Grounded Metrics

Figure 2 for Understanding and Accelerating Neural Architecture Search with Training-Free and Theory-Grounded Metrics

Figure 3 for Understanding and Accelerating Neural Architecture Search with Training-Free and Theory-Grounded Metrics

Figure 4 for Understanding and Accelerating Neural Architecture Search with Training-Free and Theory-Grounded Metrics

Abstract:This work targets designing a principled and unified training-free framework for Neural Architecture Search (NAS), with high performance, low cost, and in-depth interpretation. NAS has been explosively studied to automate the discovery of top-performer neural networks, but suffers from heavy resource consumption and often incurs search bias due to truncated training or approximations. Recent NAS works start to explore indicators that can predict a network's performance without training. However, they either leveraged limited properties of deep networks, or the benefits of their training-free indicators are not applied to more extensive search methods. By rigorous correlation analysis, we present a unified framework to understand and accelerate NAS, by disentangling "TEG" characteristics of searched networks - Trainability, Expressivity, Generalization - all assessed in a training-free manner. The TEG indicators could be scaled up and integrated with various NAS search methods, including both supernet and single-path approaches. Extensive studies validate the effective and efficient guidance from our TEG-NAS framework, leading to both improved search accuracy and over 2.3x reduction in search time cost. Moreover, we visualize search trajectories on three landscapes of "TEG" characteristics, observing that while a good local minimum is easier to find on NAS-Bench-201 given its simple topology, balancing "TEG" characteristics is much harder on the DARTS search space due to its complex landscape geometry. Our code is available at https://github.com/VITA-Group/TEGNAS.

* submitted to TPAMI

Via

Access Paper or Ask Questions

SAFIN: Arbitrary Style Transfer With Self-Attentive Factorized Instance Normalization

May 20, 2021

Aaditya Singh, Shreeshail Hingane, Xinyu Gong, Zhangyang Wang

Figure 1 for SAFIN: Arbitrary Style Transfer With Self-Attentive Factorized Instance Normalization

Figure 2 for SAFIN: Arbitrary Style Transfer With Self-Attentive Factorized Instance Normalization

Figure 3 for SAFIN: Arbitrary Style Transfer With Self-Attentive Factorized Instance Normalization

Figure 4 for SAFIN: Arbitrary Style Transfer With Self-Attentive Factorized Instance Normalization

Abstract:Artistic style transfer aims to transfer the style characteristics of one image onto another image while retaining its content. Existing approaches commonly leverage various normalization techniques, although these face limitations in adequately transferring diverse textures to different spatial locations. Self-Attention-based approaches have tackled this issue with partial success but suffer from unwanted artifacts. Motivated by these observations, this paper aims to combine the best of both worlds: self-attention and normalization. That yields a new plug-and-play module that we name Self-Attentive Factorized Instance Normalization (SAFIN). SAFIN is essentially a spatially adaptive normalization module whose parameters are inferred through attention on the content and style image. We demonstrate that plugging SAFIN into the base network of another state-of-the-art method results in enhanced stylization. We also develop a novel base network composed of Wavelet Transform for multi-scale style transfer, which when combined with SAFIN, produces visually appealing results with lesser unwanted textures.

* Accepted at ICME 2021, 5 Pages + 1 Page (references)

Via

Access Paper or Ask Questions