Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yusong Wang

GenColor: Generative Color-Concept Association in Visual Design

Mar 05, 2025

Yihan Hou, Xingchen Zeng, Yusong Wang, Manling Yang, Xiaojiao Chen, Wei Zeng

Abstract:Existing approaches for color-concept association typically rely on query-based image referencing, and color extraction from image references. However, these approaches are effective only for common concepts, and are vulnerable to unstable image referencing and varying image conditions. Our formative study with designers underscores the need for primary-accent color compositions and context-dependent colors (e.g., 'clear' vs. 'polluted' sky) in design. In response, we introduce a generative approach for mining semantically resonant colors leveraging images generated by text-to-image models. Our insight is that contemporary text-to-image models can resemble visual patterns from large-scale real-world data. The framework comprises three stages: concept instancing produces generative samples using diffusion models, text-guided image segmentation identifies concept-relevant regions within the image, and color association extracts primarily accompanied by accent colors. Quantitative comparisons with expert designs validate our approach's effectiveness, and we demonstrate the applicability through cases in various design scenarios and a gallery.

* 19 pages, 16 figures. Accepted at CHI Conference on Human Factors in Computing Systems (CHI'25), April 26-May 1, 2025, Yokohama, Japan

Via

Access Paper or Ask Questions

Tokenizing 3D Molecule Structure with Quantized Spherical Coordinates

Dec 02, 2024

Kaiyuan Gao, Yusong Wang, Haoxiang Guan, Zun Wang, Qizhi Pei, John E. Hopcroft, Kun He, Lijun Wu

Figure 1 for Tokenizing 3D Molecule Structure with Quantized Spherical Coordinates

Figure 2 for Tokenizing 3D Molecule Structure with Quantized Spherical Coordinates

Figure 3 for Tokenizing 3D Molecule Structure with Quantized Spherical Coordinates

Figure 4 for Tokenizing 3D Molecule Structure with Quantized Spherical Coordinates

Abstract:The application of language models (LMs) to molecular structure generation using line notations such as SMILES and SELFIES has been well-established in the field of cheminformatics. However, extending these models to generate 3D molecular structures presents significant challenges. Two primary obstacles emerge: (1) the difficulty in designing a 3D line notation that ensures SE(3)-invariant atomic coordinates, and (2) the non-trivial task of tokenizing continuous coordinates for use in LMs, which inherently require discrete inputs. To address these challenges, we propose Mol-StrucTok, a novel method for tokenizing 3D molecular structures. Our approach comprises two key innovations: (1) We design a line notation for 3D molecules by extracting local atomic coordinates in a spherical coordinate system. This notation builds upon existing 2D line notations and remains agnostic to their specific forms, ensuring compatibility with various molecular representation schemes. (2) We employ a Vector Quantized Variational Autoencoder (VQ-VAE) to tokenize these coordinates, treating them as generation descriptors. To further enhance the representation, we incorporate neighborhood bond lengths and bond angles as understanding descriptors. Leveraging this tokenization framework, we train a GPT-2 style model for 3D molecular generation tasks. Results demonstrate strong performance with significantly faster generation speeds and competitive chemical stability compared to previous methods. Further, by integrating our learned discrete representations into Graphormer model for property prediction on QM9 dataset, Mol-StrucTok reveals consistent improvements across various molecular properties, underscoring the versatility and robustness of our approach.

* 17 pages, 6 figures, preprint

Via

Access Paper or Ask Questions

Brain-inspired continual pre-trained learner via silent synaptic consolidation

Oct 08, 2024

Xuming Ran, Juntao Yao, Yusong Wang, Mingkun Xu, Dianbo Liu

Abstract:Pre-trained models have demonstrated impressive generalization capabilities, yet they remain vulnerable to catastrophic forgetting when incrementally trained on new tasks. Existing architecture-based strategies encounter two primary challenges: 1) Integrating a pre-trained network with a trainable sub-network complicates the delicate balance between learning plasticity and memory stability across evolving tasks during learning. 2) The absence of robust interconnections between pre-trained networks and various sub-networks limits the effective retrieval of pertinent information during inference. In this study, we introduce the Artsy, inspired by the activation mechanisms of silent synapses via spike-timing-dependent plasticity observed in mature brains, to enhance the continual learning capabilities of pre-trained models. The Artsy integrates two key components: During training, the Artsy mimics mature brain dynamics by maintaining memory stability for previously learned knowledge within the pre-trained network while simultaneously promoting learning plasticity in task-specific sub-networks. During inference, artificial silent and functional synapses are utilized to establish precise connections between the pre-synaptic neurons in the pre-trained network and the post-synaptic neurons in the sub-networks, facilitated through synaptic consolidation, thereby enabling effective extraction of relevant information from test samples. Comprehensive experimental evaluations reveal that our model significantly outperforms conventional methods on class-incremental learning tasks, while also providing enhanced biological interpretability for architecture-based approaches. Moreover, we propose that the Artsy offers a promising avenue for simulating biological synaptic mechanisms, potentially advancing our understanding of neural plasticity in both artificial and biological systems.

Via

Access Paper or Ask Questions

Neural P$^3$M: A Long-Range Interaction Modeling Enhancer for Geometric GNNs

Sep 26, 2024

Yusong Wang, Chaoran Cheng, Shaoning Li, Yuxuan Ren, Bin Shao, Ge Liu, Pheng-Ann Heng, Nanning Zheng

Figure 1 for Neural P$^3$M: A Long-Range Interaction Modeling Enhancer for Geometric GNNs

Figure 2 for Neural P$^3$M: A Long-Range Interaction Modeling Enhancer for Geometric GNNs

Figure 3 for Neural P$^3$M: A Long-Range Interaction Modeling Enhancer for Geometric GNNs

Figure 4 for Neural P$^3$M: A Long-Range Interaction Modeling Enhancer for Geometric GNNs

Abstract:Geometric graph neural networks (GNNs) have emerged as powerful tools for modeling molecular geometry. However, they encounter limitations in effectively capturing long-range interactions in large molecular systems. To address this challenge, we introduce Neural P$^3$M, a versatile enhancer of geometric GNNs to expand the scope of their capabilities by incorporating mesh points alongside atoms and reimaging traditional mathematical operations in a trainable manner. Neural P$^3$M exhibits flexibility across a wide range of molecular systems and demonstrates remarkable accuracy in predicting energies and forces, outperforming on benchmarks such as the MD22 dataset. It also achieves an average improvement of 22% on the OE62 dataset while integrating with various architectures.

* Published as a conference paper at NeurIPS 2024

Via

Access Paper or Ask Questions

Advancing Cross-domain Discriminability in Continual Learning of Vison-Language Models

Jun 27, 2024

Yicheng Xu, Yuxin Chen, Jiahao Nie, Yusong Wang, Huiping Zhuang, Manabu Okumura

Figure 1 for Advancing Cross-domain Discriminability in Continual Learning of Vison-Language Models

Figure 2 for Advancing Cross-domain Discriminability in Continual Learning of Vison-Language Models

Figure 3 for Advancing Cross-domain Discriminability in Continual Learning of Vison-Language Models

Figure 4 for Advancing Cross-domain Discriminability in Continual Learning of Vison-Language Models

Abstract:Continual learning (CL) with Vision-Language Models (VLMs) has overcome the constraints of traditional CL, which only focuses on previously encountered classes. During the CL of VLMs, we need not only to prevent the catastrophic forgetting on incrementally learned knowledge but also to preserve the zero-shot ability of VLMs. However, existing methods require additional reference datasets to maintain such zero-shot ability and rely on domain-identity hints to classify images across different domains. In this study, we propose Regression-based Analytic Incremental Learning (RAIL), which utilizes a recursive ridge regression-based adapter to learn from a sequence of domains in a non-forgetting manner and decouple the cross-domain correlations by projecting features to a higher-dimensional space. Cooperating with a training-free fusion module, RAIL absolutely preserves the VLM's zero-shot ability on unseen domains without any reference data. Additionally, we introduce Cross-domain Task-Agnostic Incremental Learning (X-TAIL) setting. In this setting, a CL learner is required to incrementally learn from multiple domains and classify test images from both seen and unseen domains without any domain-identity hint. We theoretically prove RAIL's absolute memorization on incrementally learned domains. Experiment results affirm RAIL's state-of-the-art performance in both X-TAIL and existing Multi-domain Task-Incremental Learning settings. The code will be released upon acceptance.

Via

Access Paper or Ask Questions

VBIM-Net: Variational Born Iterative Network for Inverse Scattering Problems

May 29, 2024

Ziqing Xing, Zhaoyang Zhang, Zirui Chen, Yusong Wang, Haoran Ma, Zhun Wei, Gang Bao

Figure 1 for VBIM-Net: Variational Born Iterative Network for Inverse Scattering Problems

Figure 2 for VBIM-Net: Variational Born Iterative Network for Inverse Scattering Problems

Figure 3 for VBIM-Net: Variational Born Iterative Network for Inverse Scattering Problems

Figure 4 for VBIM-Net: Variational Born Iterative Network for Inverse Scattering Problems

Abstract:Recently, studies have shown the potential of integrating field-type iterative methods with deep learning (DL) techniques in solving inverse scattering problems (ISPs). In this article, we propose a novel Variational Born Iterative Network, namely, VBIM-Net, to solve the full-wave ISPs with significantly improved flexibility and inversion quality. The proposed VBIM-Net emulates the alternating updates of the total electric field and the contrast in the variational Born iterative method (VBIM) by multiple layers of subnetworks. We embed the calculation of the contrast variation into each of the subnetworks, converting the scattered field residual into an approximate contrast variation and then enhancing it by a U-Net, thus avoiding the requirement of matched measurement dimension and grid resolution as in existing approaches. The total field and contrast of each layer's output is supervised in the loss function of VBIM-Net, which guarantees the physical interpretability of variables of the subnetworks. In addition, we design a training scheme with extra noise to enhance the model's stability. Extensive numerical results on synthetic and experimental data both verify the inversion quality, generalization ability, and robustness of the proposed VBIM-Net. This work may provide some new inspiration for the design of efficient field-type DL schemes.

* 14 pages, 21 figures

Via

Access Paper or Ask Questions

F$^3$low: Frame-to-Frame Coarse-grained Molecular Dynamics with SE Guided Flow Matching

May 01, 2024

Shaoning Li, Yusong Wang, Mingyu Li, Jian Zhang, Bin Shao, Nanning Zheng, Jian Tang

Figure 1 for F$^3$low: Frame-to-Frame Coarse-grained Molecular Dynamics with SE Guided Flow Matching

Figure 2 for F$^3$low: Frame-to-Frame Coarse-grained Molecular Dynamics with SE Guided Flow Matching

Figure 3 for F$^3$low: Frame-to-Frame Coarse-grained Molecular Dynamics with SE Guided Flow Matching

Figure 4 for F$^3$low: Frame-to-Frame Coarse-grained Molecular Dynamics with SE Guided Flow Matching

Abstract:Molecular dynamics (MD) is a crucial technique for simulating biological systems, enabling the exploration of their dynamic nature and fostering an understanding of their functions and properties. To address exploration inefficiency, emerging enhanced sampling approaches like coarse-graining (CG) and generative models have been employed. In this work, we propose a \underline{Frame-to-Frame} generative model with guided \underline{Flow}-matching (F$3$low) for enhanced sampling, which (a) extends the domain of CG modeling to the SE(3) Riemannian manifold; (b) retreating CGMD simulations as autoregressively sampling guided by the former frame via flow-matching models; (c) targets the protein backbone, offering improved insights into secondary structure formation and intricate folding pathways. Compared to previous methods, F$3$low allows for broader exploration of conformational space. The ability to rapidly generate diverse conformations via force-free generative paradigm on SE(3) paves the way toward efficient enhanced sampling methods.

* Accepted by ICLR 2024 GEM workshop

Via

Access Paper or Ask Questions

Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition

May 01, 2024

Dongyuan Li, Ying Zhang, Yusong Wang, Funakoshi Kataro, Manabu Okumura

Abstract:Speech emotion recognition (SER) has garnered increasing attention due to its wide range of applications in various fields, including human-machine interaction, virtual assistants, and mental health assistance. However, existing SER methods often overlook the information gap between the pre-training speech recognition task and the downstream SER task, resulting in sub-optimal performance. Moreover, current methods require much time for fine-tuning on each specific speech dataset, such as IEMOCAP, which limits their effectiveness in real-world scenarios with large-scale noisy data. To address these issues, we propose an active learning (AL)-based fine-tuning framework for SER, called \textsc{After}, that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. Specifically, we first use TAPT to minimize the information gap between the pre-training speech recognition task and the downstream speech emotion recognition task. Then, AL methods are employed to iteratively select a subset of the most informative and diverse samples for fine-tuning, thereby reducing time consumption. Experiments demonstrate that our proposed method \textsc{After}, using only 20\% of samples, improves accuracy by 8.45\% and reduces time consumption by 79\%. The additional extension of \textsc{After} and ablation studies further confirm its effectiveness and applicability to various real-world scenarios. Our source code is available on Github for reproducibility. (https://github.com/Clearloveyuan/AFTER).

* Accepted by Journal of Natural Language Processing. arXiv admin note: text overlap with arXiv:2310.00283

Via

Access Paper or Ask Questions

Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition

Nov 18, 2023

Dongyuan Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura

Abstract:Multimodal emotion recognition aims to recognize emotions for each utterance of multiple modalities, which has received increasing attention for its application in human-machine interaction. Current graph-based methods fail to simultaneously depict global contextual features and local diverse uni-modal features in a dialogue. Furthermore, with the number of graph layers increasing, they easily fall into over-smoothing. In this paper, we propose a method for joint modality fusion and graph contrastive learning for multimodal emotion recognition (Joyful), where multimodality fusion, contrastive learning, and emotion recognition are jointly optimized. Specifically, we first design a new multimodal fusion mechanism that can provide deep interaction and fusion between the global contextual and uni-modal specific features. Then, we introduce a graph contrastive learning framework with inter-view and intra-view contrastive losses to learn more distinguishable representations for samples with different sentiments. Extensive experiments on three benchmark datasets indicate that Joyful achieved state-of-the-art (SOTA) performance compared to all baselines.

Via

Access Paper or Ask Questions

Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition

Sep 30, 2023

Dongyuan Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura

Figure 1 for Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition

Figure 2 for Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition

Figure 3 for Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition

Figure 4 for Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition

Abstract:Speech emotion recognition (SER) has drawn increasing attention for its applications in human-machine interaction. However, existing SER methods ignore the information gap between the pre-training speech recognition task and the downstream SER task, leading to sub-optimal performance. Moreover, they require much time to fine-tune on each specific speech dataset, restricting their effectiveness in real-world scenes with large-scale noisy data. To address these issues, we propose an active learning (AL) based Fine-Tuning framework for SER that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. Specifically, we first use TAPT to minimize the information gap between the pre-training and the downstream task. Then, AL methods are used to iteratively select a subset of the most informative and diverse samples for fine-tuning, reducing time consumption. Experiments demonstrate that using only 20\%pt. samples improves 8.45\%pt. accuracy and reduces 79\%pt. time consumption.

Via

Access Paper or Ask Questions