Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yuxuan Wu

Bi-Directional Multi-Scale Graph Dataset Condensation via Information Bottleneck

Dec 23, 2024

Xingcheng Fu, Yisen Gao, Beining Yang, Yuxuan Wu, Haodong Qian, Qingyun Sun, Xianxian Li

Abstract:Dataset condensation has significantly improved model training efficiency, but its application on devices with different computing power brings new requirements for different data sizes. Thus, condensing multiple scale graphs simultaneously is the core of achieving efficient training in different on-device scenarios. Existing efficient works for multi-scale graph dataset condensation mainly perform efficient approximate computation in scale order (large-to-small or small-to-large scales). However, for non-Euclidean structures of sparse graph data, these two commonly used paradigms for multi-scale graph dataset condensation have serious scaling down degradation and scaling up collapse problems of a graph. The main bottleneck of the above paradigms is whether the effective information of the original graph is fully preserved when consenting to the primary sub-scale (the first of multiple scales), which determines the condensation effect and consistency of all scales. In this paper, we proposed a novel GNN-centric Bi-directional Multi-Scale Graph Dataset Condensation (BiMSGC) framework, to explore unifying paradigms by operating on both large-to-small and small-to-large for multi-scale graph condensation. Based on the mutual information theory, we estimate an optimal ``meso-scale'' to obtain the minimum necessary dense graph preserving the maximum utility information of the original graph, and then we achieve stable and consistent ``bi-directional'' condensation learning by optimizing graph eigenbasis matching with information bottleneck on other scales. Encouraging empirical results on several datasets demonstrates the significant superiority of the proposed framework in graph condensation at different scales.

* Accepted by the Main Technical Track of the 39th Annual AAAI Conference on Artificial Intelligence (AAAI-2025)

Via

Access Paper or Ask Questions

RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning

Sep 30, 2024

Yuxuan Wu, Lei Pan, Wenhua Wu, Guangming Wang, Yanzi Miao, Hesheng Wang

Figure 1 for RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning

Figure 2 for RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning

Figure 3 for RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning

Figure 4 for RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning

Abstract:Sim-to-Real refers to the process of transferring policies learned in simulation to the real world, which is crucial for achieving practical robotics applications. However, recent Sim2real methods either rely on a large amount of augmented data or large learning models, which is inefficient for specific tasks. In recent years, radiance field-based reconstruction methods, especially the emergence of 3D Gaussian Splatting, making it possible to reproduce realistic real-world scenarios. To this end, we propose a novel real-to-sim-to-real reinforcement learning framework, RL-GSBridge, which introduces a mesh-based 3D Gaussian Splatting method to realize zero-shot sim-to-real transfer for vision-based deep reinforcement learning. We improve the mesh-based 3D GS modeling method by using soft binding constraints, enhancing the rendering quality of mesh models. We then employ a GS editing approach to synchronize rendering with the physics simulator, reflecting the interactions of the physical robot more accurately. Through a series of sim-to-real robotic arm experiments, including grasping and pick-and-place tasks, we demonstrate that RL-GSBridge maintains a satisfactory success rate in real-world task completion during sim-to-real transfer. Furthermore, a series of rendering metrics and visualization results indicate that our proposed mesh-based 3D Gaussian reduces artifacts in unstructured objects, demonstrating more realistic rendering performance.

* 7 pages, 5 figures, 4 tables, under review by ICRA2025

Via

Access Paper or Ask Questions

RING#: PR-by-PE Global Localization with Roto-translation Equivariant Gram Learning

Aug 30, 2024

Sha Lu, Xuecheng Xu, Yuxuan Wu, Haojian Lu, Xieyuanli Chen, Rong Xiong, Yue Wang

Figure 1 for RING#: PR-by-PE Global Localization with Roto-translation Equivariant Gram Learning

Figure 2 for RING#: PR-by-PE Global Localization with Roto-translation Equivariant Gram Learning

Figure 3 for RING#: PR-by-PE Global Localization with Roto-translation Equivariant Gram Learning

Figure 4 for RING#: PR-by-PE Global Localization with Roto-translation Equivariant Gram Learning

Abstract:Global localization using onboard perception sensors, such as cameras and LiDARs, is crucial in autonomous driving and robotics applications when GPS signals are unreliable. Most approaches achieve global localization by sequential place recognition and pose estimation. Some of them train separate models for each task, while others employ a single model with dual heads, trained jointly with separate task-specific losses. However, the accuracy of localization heavily depends on the success of place recognition, which often fails in scenarios with significant changes in viewpoint or environmental appearance. Consequently, this renders the final pose estimation of localization ineffective. To address this, we propose a novel paradigm, PR-by-PE localization, which improves global localization accuracy by deriving place recognition directly from pose estimation. Our framework, RING#, is an end-to-end PR-by-PE localization network operating in the bird's-eye view (BEV) space, designed to support both vision and LiDAR sensors. It introduces a theoretical foundation for learning two equivariant representations from BEV features, which enables globally convergent and computationally efficient pose estimation. Comprehensive experiments on the NCLT and Oxford datasets across both vision and LiDAR modalities demonstrate that our method outperforms state-of-the-art approaches. Furthermore, we provide extensive analyses to confirm the effectiveness of our method. The code will be publicly released.

* 23 pages, 19 figures

Via

Access Paper or Ask Questions

A PLMs based protein retrieval framework

Jul 16, 2024

Yuxuan Wu, Xiao Yi, Yang Tan, Huiqun Yu, Guisheng Fan

Figure 1 for A PLMs based protein retrieval framework

Figure 2 for A PLMs based protein retrieval framework

Figure 3 for A PLMs based protein retrieval framework

Figure 4 for A PLMs based protein retrieval framework

Abstract:Protein retrieval, which targets the deconstruction of the relationship between sequences, structures and functions, empowers the advancing of biology. Basic Local Alignment Search Tool (BLAST), a sequence-similarity-based algorithm, has proved the efficiency of this field. Despite the existing tools for protein retrieval, they prioritize sequence similarity and probably overlook proteins that are dissimilar but share homology or functionality. In order to tackle this problem, we propose a novel protein retrieval framework that mitigates the bias towards sequence similarity. Our framework initiatively harnesses protein language models (PLMs) to embed protein sequences within a high-dimensional feature space, thereby enhancing the representation capacity for subsequent analysis. Subsequently, an accelerated indexed vector database is constructed to facilitate expedited access and retrieval of dense vectors. Extensive experiments demonstrate that our framework can equally retrieve both similar and dissimilar proteins. Moreover, this approach enables the identification of proteins that conventional methods fail to uncover. This framework will effectively assist in protein mining and empower the development of biology.

* 16 pages, 12 figures

Via

Access Paper or Ask Questions

Emergent Interpretable Symbols and Content-Style Disentanglement via Variance-Invariance Constraints

Jul 04, 2024

Yuxuan Wu, Ziyu Wang, Bhiksha Raj, Gus Xia

Figure 1 for Emergent Interpretable Symbols and Content-Style Disentanglement via Variance-Invariance Constraints

Figure 2 for Emergent Interpretable Symbols and Content-Style Disentanglement via Variance-Invariance Constraints

Figure 3 for Emergent Interpretable Symbols and Content-Style Disentanglement via Variance-Invariance Constraints

Figure 4 for Emergent Interpretable Symbols and Content-Style Disentanglement via Variance-Invariance Constraints

Abstract:We contribute an unsupervised method that effectively learns from raw observation and disentangles its latent space into content and style representations. Unlike most disentanglement algorithms that rely on domain-specific labels and knowledge, our method is based on the insight of domain-general statistical differences between content and style -- content varies more among different fragments within a sample but maintains an invariant vocabulary across data samples, whereas style remains relatively invariant within a sample but exhibits more significant variation across different samples. We integrate such inductive bias into an encoder-decoder architecture and name our method after V3 (variance-versus-invariance). Experimental results show that V3 generalizes across two distinct domains in different modalities, music audio and images of written digits, successfully learning pitch-timbre and digit-color disentanglements, respectively. Also, the disentanglement robustness significantly outperforms baseline unsupervised methods and is even comparable to supervised counterparts. Furthermore, symbolic-level interpretability emerges in the learned codebook of content, forging a near one-to-one alignment between machine representation and human knowledge.

Via

Access Paper or Ask Questions

Motif-Centric Representation Learning for Symbolic Music

Sep 19, 2023

Yuxuan Wu, Roger B. Dannenberg, Gus Xia

Abstract:Music motif, as a conceptual building block of composition, is crucial for music structure analysis and automatic composition. While human listeners can identify motifs easily, existing computational models fall short in representing motifs and their developments. The reason is that the nature of motifs is implicit, and the diversity of motif variations extends beyond simple repetitions and modulations. In this study, we aim to learn the implicit relationship between motifs and their variations via representation learning, using the Siamese network architecture and a pretraining and fine-tuning pipeline. A regularization-based method, VICReg, is adopted for pretraining, while contrastive learning is used for fine-tuning. Experimental results on a retrieval-based task show that these two methods complement each other, yielding an improvement of 12.6% in the area under the precision-recall curve. Lastly, we visualize the acquired motif representations, offering an intuitive comprehension of the overall structure of a music piece. As far as we know, this work marks a noteworthy step forward in computational modeling of music motifs. We believe that this work lays the foundations for future applications of motifs in automatic music composition and music information retrieval.

Via

Access Paper or Ask Questions

DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion

Sep 09, 2022

Ruibin Yuan, Yuxuan Wu, Jacob Li, Jaxter Kim

Figure 1 for DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion

Figure 2 for DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion

Figure 3 for DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion

Figure 4 for DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion

Abstract:The widespread adoption of speech-based online services raises security and privacy concerns regarding the data that they use and share. If the data were compromised, attackers could exploit user speech to bypass speaker verification systems or even impersonate users. To mitigate this, we propose DeID-VC, a speaker de-identification system that converts a real speaker to pseudo speakers, thus removing or obfuscating the speaker-dependent attributes from a spoken voice. The key components of DeID-VC include a Variational Autoencoder (VAE) based Pseudo Speaker Generator (PSG) and a voice conversion Autoencoder (AE) under zero-shot settings. With the help of PSG, DeID-VC can assign unique pseudo speakers at speaker level or even at utterance level. Also, two novel learning objectives are added to bridge the gap between training and inference of zero-shot voice conversion. We present our experimental results with word error rate (WER) and equal error rate (EER), along with three subjective metrics to evaluate the generated output of DeID-VC. The result shows that our method substantially improved intelligibility (WER 10% lower) and de-identification effectiveness (EER 5% higher) compared to our baseline. Code and listening demo: https://github.com/a43992899/DeID-VC

* Accepted by Interspeech 2022

Via

Access Paper or Ask Questions

Graph-based Heuristic Search for Module Selection Procedure in Neural Module Network

Sep 30, 2020

Yuxuan Wu, Hideki Nakayama

Figure 1 for Graph-based Heuristic Search for Module Selection Procedure in Neural Module Network

Figure 2 for Graph-based Heuristic Search for Module Selection Procedure in Neural Module Network

Figure 3 for Graph-based Heuristic Search for Module Selection Procedure in Neural Module Network

Figure 4 for Graph-based Heuristic Search for Module Selection Procedure in Neural Module Network

Abstract:Neural Module Network (NMN) is a machine learning model for solving the visual question answering tasks. NMN uses programs to encode modules' structures, and its modularized architecture enables it to solve logical problems more reasonably. However, because of the non-differentiable procedure of module selection, NMN is hard to be trained end-to-end. To overcome this problem, existing work either included ground-truth program into training data or applied reinforcement learning to explore the program. However, both of these methods still have weaknesses. In consideration of this, we proposed a new learning framework for NMN. Graph-based Heuristic Search is the algorithm we proposed to discover the optimal program through a heuristic search on the data structure named Program Graph. Our experiments on FigureQA and CLEVR dataset show that our methods can realize the training of NMN without ground-truth programs and achieve superior efficiency over existing reinforcement learning methods in program exploration.

Via

Access Paper or Ask Questions

Squeeze-and-Excitation on Spatial and Temporal Deep Feature Space for Action Recognition

Jul 20, 2018

Gaoyun An, Wen Zhou, Yuxuan Wu, Zhenxing Zheng, Yongwen Liu

Figure 1 for Squeeze-and-Excitation on Spatial and Temporal Deep Feature Space for Action Recognition

Figure 2 for Squeeze-and-Excitation on Spatial and Temporal Deep Feature Space for Action Recognition

Figure 3 for Squeeze-and-Excitation on Spatial and Temporal Deep Feature Space for Action Recognition

Figure 4 for Squeeze-and-Excitation on Spatial and Temporal Deep Feature Space for Action Recognition

Abstract:Spatial and temporal features are two key and complementary information for human action recognition. In order to make full use of the intra-frame spatial characteristics and inter-frame temporal relationships, we propose the Squeeze-and-Excitation Long-term Recurrent Convolutional Networks (SE-LRCN) for human action recognition. The Squeeze and Excitation operations are used to implement the feature recalibration. In SE-LRCN, Squeeze-and-Excitation ResNet-34 (SE-ResNet-34) network is adopted to extract spatial features to enhance the dependencies and importance of feature channels of pixel granularity. We also propose the Squeeze-and-Excitation Long Short-Term Memory (SE-LSTM) network to model the temporal relationship, and to enhance the dependencies and importance of feature channels of frame granularity. We evaluate the proposed model on two challenging benchmarks, HMDB51 and UCF101, and the proposed SE-LRCN achieves the competitive results with the state-of-the-art.

* Need to be Revised

Via

Access Paper or Ask Questions