Qi Feng

Kyoto University

GraphProp: Training the Graph Foundation Models using Graph Properties

Aug 06, 2025

Ziheng Sun, Qi Feng, Lehao Lin, Chris Ding, Jicong Fan

Abstract:This work focuses on training graph foundation models (GFMs) that have strong generalization ability in graph-level tasks such as graph classification. Effective GFM training requires capturing information consistent across different domains. We discover that graph structures provide more consistent cross-domain information compared to node features and graph labels. However, traditional GFMs primarily focus on transferring node features from various domains into a unified representation space but often lack structural cross-domain generalization. To address this, we introduce GraphProp, which emphasizes structural generalization. The training process of GraphProp consists of two main phases. First, we train a structural GFM by predicting graph invariants. Since graph invariants are properties of graphs that depend only on the abstract structure, not on particular labellings or drawings of the graph, this structural GFM has a strong ability to capture the abstract structural information and provide discriminative graph representations comparable across diverse domains. In the second phase, we use the representations given by the structural GFM as positional encodings to train a comprehensive GFM. This phase utilizes domain-specific node attributes and graph labels to further improve cross-domain node feature generalization. Our experiments demonstrate that GraphProp significantly outperforms the competitors in supervised learning and few-shot learning, especially in handling graphs without node attributes.
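
The abstract's point that graph invariants depend only on abstract structure, not on labellings, can be illustrated with a small sketch. The three invariants below (edge count, sorted degree sequence, triangle count) and the helper names `invariants` and `relabel` are illustrative assumptions; the abstract does not say which invariants GraphProp predicts.

```python
import random
from itertools import combinations


def invariants(n, edges):
    """Compute a few simple invariants of an undirected simple graph on nodes 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Sorting the degrees removes any dependence on node labels.
    degree_sequence = tuple(sorted(len(a) for a in adj))
    num_edges = sum(len(a) for a in adj) // 2
    # Count unordered node triples that are pairwise adjacent.
    triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    return {
        "num_edges": num_edges,
        "degree_sequence": degree_sequence,
        "triangles": triangles,
    }


def relabel(n, edges, seed=0):
    """Apply a random permutation of node labels (hypothetical helper)."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return [(perm[u], perm[v]) for u, v in edges]


# A 4-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
original = invariants(4, edges)
shuffled = invariants(4, relabel(4, edges, seed=42))
```

Both dictionaries agree no matter which permutation is applied, which is the property that makes such quantities usable as domain-independent training targets.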

Continuous Policy and Value Iteration for Stochastic Control Problems and Its Convergence

Jun 09, 2025

Qi Feng, Gu Wang

Abstract:We introduce a continuous policy-value iteration algorithm where the approximations of the value function of a stochastic control problem and the optimal control are simultaneously updated through Langevin-type dynamics. This framework applies to both the entropy-regularized relaxed control problems and the classical control problems, with infinite horizon. We establish policy improvement and demonstrate convergence to the optimal control under the monotonicity condition of the Hamiltonian. By utilizing Langevin-type stochastic differential equations for continuous updates along the policy iteration direction, our approach enables the use of distribution sampling and non-convex learning techniques in machine learning to optimize the value function and identify the optimal control simultaneously.

* 37 pages

Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts

May 18, 2025

Qi Feng, Hidetoshi Shimodaira

Abstract:While Multimodal Large Language Models (MLLMs) excel at general vision-language tasks, visuospatial cognition - reasoning about spatial layouts, relations, and dynamics - remains a significant challenge. Existing models often lack the necessary architectural components and specialized training data for fine-grained spatial understanding. We introduce ViCA2 (Visuospatial Cognitive Assistant 2), a novel MLLM designed to enhance spatial reasoning. ViCA2 features a dual vision encoder architecture integrating SigLIP for semantics and Hiera for spatial structure, coupled with a token ratio control mechanism for efficiency. We also developed ViCA-322K, a new large-scale dataset with over 322,000 spatially grounded question-answer pairs for targeted instruction tuning. On the challenging VSI-Bench benchmark, our ViCA2-7B model achieves a state-of-the-art average score of 56.8, significantly surpassing larger open-source models (e.g., LLaVA-NeXT-Video-72B, 40.9) and leading proprietary models (Gemini-1.5 Pro, 45.4). This demonstrates the effectiveness of our approach in achieving strong visuospatial intelligence with a compact model. We release ViCA2, its codebase, and the ViCA-322K dataset to facilitate further research.

* 26 pages, 19 figures, 4 tables. Code, models, and dataset are available at our project page: https://github.com/nkkbr/ViCA

Visuospatial Cognitive Assistant

May 18, 2025

Qi Feng, Hidetoshi Shimodaira

Abstract:Video-based spatial cognition is vital for robotics and embodied AI but challenges current Vision-Language Models (VLMs). This paper makes two key contributions. First, we introduce ViCA (Visuospatial Cognitive Assistant)-322K, a diverse dataset of 322,003 QA pairs from real-world indoor videos (ARKitScenes, ScanNet, ScanNet++), offering supervision for 3D metadata-grounded queries and video-based complex reasoning. Second, we develop ViCA-7B, fine-tuned on ViCA-322K, which achieves new state-of-the-art on all eight VSI-Bench tasks, outperforming existing models, including larger ones (e.g., +26.1 on Absolute Distance). For interpretability, we present ViCA-Thinking-2.68K, a dataset with explicit reasoning chains, and fine-tune ViCA-7B to create ViCA-7B-Thinking, a model that articulates its spatial reasoning. Our work highlights the importance of targeted data and suggests paths for improved temporal-spatial modeling. We release all resources to foster research in robust visuospatial intelligence.

* 31 pages, 10 figures, 6 tables. The implementation and fine-tuned model (ViCA-7B) are publicly available at https://huggingface.co/nkkbr/ViCA. The ViCA-322K dataset can be found at https://huggingface.co/datasets/nkkbr/ViCA-322K, and the ViCA-Thinking-2.68K dataset is at https://huggingface.co/datasets/nkkbr/ViCA-thinking-2.68k

Non-Reversible Langevin Algorithms for Constrained Sampling

Jan 20, 2025

Hengrong Du, Qi Feng, Changwei Tu, Xiaoyu Wang, Lingjiong Zhu

Abstract:We consider the constrained sampling problem where the goal is to sample from a target distribution on a constrained domain. We propose skew-reflected non-reversible Langevin dynamics (SRNLD), a continuous-time stochastic differential equation with skew-reflected boundary. We obtain a non-asymptotic convergence rate of SRNLD to the target distribution in both total variation and 1-Wasserstein distances. By breaking reversibility, we show that the convergence is faster than in the special case of the reversible dynamics. Based on the discretization of SRNLD, we propose skew-reflected non-reversible Langevin Monte Carlo (SRNLMC), and obtain the non-asymptotic discretization error from SRNLD as well as convergence guarantees to the target distribution in 1-Wasserstein distance. We show better performance guarantees than the projected Langevin Monte Carlo in the literature that is based on the reversible dynamics. Numerical experiments are provided for both synthetic and real datasets to show the efficiency of the proposed algorithms.

* 30 pages, 9 figures
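
As background for "breaking reversibility", the following shows the commonly studied unconstrained form of non-reversible Langevin dynamics and why an antisymmetric perturbation leaves the target unchanged. The symbols $f$, $J$ and $\pi$ are notation assumed here, not taken from the abstract, and the skew-reflected boundary term of SRNLD is not shown.

```latex
% Unconstrained non-reversible Langevin dynamics with target \pi \propto e^{-f}
% and a constant antisymmetric matrix J (J^\top = -J); J = 0 is the reversible case.
\[
  dX_t = -(I + J)\,\nabla f(X_t)\,dt + \sqrt{2}\,dW_t .
\]
% The extra drift -J\nabla f does not change the stationary density, because
\[
  \nabla \cdot \bigl( e^{-f} J \nabla f \bigr)
  = e^{-f} \Bigl( \operatorname{tr}\bigl( J \nabla^2 f \bigr)
      - \nabla f^\top J \nabla f \Bigr) = 0 ,
\]
% where both terms vanish: the trace of an antisymmetric matrix times a symmetric
% one is zero, and v^\top J v = 0 for every vector v.
```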

SyncViolinist: Music-Oriented Violin Motion Generation Based on Bowing and Fingering

Dec 11, 2024

Hiroki Nishizawa, Keitaro Tanaka, Asuka Hirata, Shugo Yamaguchi, Qi Feng, Masatoshi Hamanaka, Shigeo Morishima

Abstract:Automatically generating realistic musical performance motion can greatly enhance digital media production, often involving collaboration between professionals and musicians. However, capturing the intricate body, hand, and finger movements required for accurate musical performances is challenging. Existing methods often fall short due to the complex mapping between audio and motion, typically requiring additional inputs like scores or MIDI data. In this work, we present SyncViolinist, a multi-stage end-to-end framework that generates synchronized violin performance motion solely from audio input. Our method overcomes the challenge of capturing both global and fine-grained performance features through two key modules: a bowing/fingering module and a motion generation module. The bowing/fingering module extracts detailed playing information from the audio, which the motion generation module uses to create precise, coordinated body motions reflecting the temporal granularity and nature of the violin performance. We demonstrate the effectiveness of SyncViolinist with significantly improved qualitative and quantitative results from unseen violin performance audio, outperforming state-of-the-art methods. Extensive subjective evaluations involving professional violinists further validate our approach. The code and dataset are available at https://github.com/Kakanat/SyncViolinist.

* 10 pages, 7 figures, 6 tables, WACV 2025

Constrained Exploration via Reflected Replica Exchange Stochastic Gradient Langevin Dynamics

May 13, 2024

Haoyang Zheng, Hengrong Du, Qi Feng, Wei Deng, Guang Lin

Abstract:Replica exchange stochastic gradient Langevin dynamics (reSGLD) is an effective sampler for non-convex learning in large-scale datasets. However, the simulation may encounter stagnation issues when the high-temperature chain delves too deeply into the distribution tails. To tackle this issue, we propose reflected reSGLD (r2SGLD): an algorithm tailored for constrained non-convex exploration by utilizing reflection steps within a bounded domain. Theoretically, we observe that reducing the diameter of the domain enhances mixing rates, exhibiting a quadratic behavior. Empirically, we test its performance through extensive experiments, including identifying dynamical systems with physical constraints, simulations of constrained multi-modal distributions, and image classification tasks. The theoretical and empirical findings highlight the crucial role of constrained exploration in improving the simulation efficiency.

* 28 pages, 13 figures, to appear in ICML 2024
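
The idea of combining replica exchange with reflection steps can be sketched on a one-dimensional toy problem. This is a minimal sketch under stated assumptions, not the paper's r2SGLD: it uses exact gradients and energies of a hypothetical double-well potential, mirror reflection into an interval, and the plain replica-exchange acceptance rule without any correction for noisy energy estimates. All function names and default parameters are illustrative.

```python
import math
import random


def energy(x):
    """Toy double-well potential (an assumption for illustration)."""
    return (x * x - 1.0) ** 2


def grad_energy(x):
    return 4.0 * x * (x * x - 1.0)


def reflect(x, lo, hi):
    """Mirror-reflect a scalar back into the interval [lo, hi]."""
    width = hi - lo
    y = (x - lo) % (2.0 * width)
    return lo + (y if y <= width else 2.0 * width - y)


def swap_probability(u_low, u_high, tau_low, tau_high):
    """Plain replica-exchange acceptance probability for swapping the two chains."""
    log_ratio = (1.0 / tau_low - 1.0 / tau_high) * (u_low - u_high)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def run(steps=2000, eta=1e-3, tau_low=0.1, tau_high=1.0, lo=-2.0, hi=2.0, seed=0):
    rng = random.Random(seed)
    x_low, x_high = -1.0, 1.0
    trace = []
    for _ in range(steps):
        # Langevin update at each temperature, followed by reflection into [lo, hi].
        x_low = reflect(
            x_low - eta * grad_energy(x_low)
            + math.sqrt(2.0 * eta * tau_low) * rng.gauss(0.0, 1.0),
            lo, hi,
        )
        x_high = reflect(
            x_high - eta * grad_energy(x_high)
            + math.sqrt(2.0 * eta * tau_high) * rng.gauss(0.0, 1.0),
            lo, hi,
        )
        # Attempt to exchange the positions of the two chains.
        if rng.random() < swap_probability(energy(x_low), energy(x_high), tau_low, tau_high):
            x_low, x_high = x_high, x_low
        trace.append((x_low, x_high))
    return trace


trace = run()
```

Because every update ends with `reflect`, the high-temperature chain cannot wander into the tails outside `[lo, hi]`, which is the stagnation scenario the abstract describes.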

Fisher information dissipation for time inhomogeneous stochastic differential equations

Feb 01, 2024

Qi Feng, Xinzhe Zuo, Wuchen Li

Abstract:We provide a Lyapunov convergence analysis for time-inhomogeneous variable coefficient stochastic differential equations (SDEs). Three typical examples include overdamped, irreversible drift, and underdamped Langevin dynamics. We first formulate the probability transition equation of Langevin dynamics as a modified gradient flow of the Kullback-Leibler divergence in the probability space with respect to time-dependent optimal transport metrics. This formulation contains both gradient and non-gradient directions depending on a class of time-dependent target distributions. We then select a time-dependent relative Fisher information functional as a Lyapunov functional. We develop a time-dependent Hessian matrix condition, which guarantees the convergence of the probability density function of the SDE. We verify the proposed conditions for several time-inhomogeneous Langevin dynamics. For the overdamped Langevin dynamics, we prove the $O(t^{-1/2})$ convergence in $L^1$ distance for the simulated annealing dynamics with a strongly convex potential function. For the irreversible drift Langevin dynamics, we prove an improved convergence towards the target distribution in an asymptotic regime. We also verify the convergence condition for the underdamped Langevin dynamics. Numerical examples demonstrate the convergence results for the time-dependent Langevin dynamics.

* 9 figures, 36 pages

Reconfigurable Intelligent Surface-Enabled Array Radar for Interference Mitigation

Jan 28, 2024

Shengyao Chen, Qi Feng, Longyao Ran, Feng Xi, Zhong Liu

Abstract:Conventional active array radars often jointly design the transmit and receive beamforming to effectively suppress interference. To further promote the interference suppression performance, this paper introduces a reconfigurable intelligent surface (RIS) to assist the radar receiver because the RIS has the ability to bring plentiful additional degrees-of-freedom. To maximize the output signal-to-interference-plus-noise ratio (SINR) of the receive array, we formulate the codesign of transmit beamforming and RIS-assisted receive beamforming into a nonconvex constrained fractional programming problem, and then propose an alternating minimization-based algorithm to jointly optimize the transmit beamformer, receive beamformer and RIS reflection coefficients. Concretely, we translate the RIS reflection coefficients design into a series of unimodular quadratic programming (UQP) subproblems by employing the Dinkelbach transform, and offer the closed-form optimal solutions of transmit and receive beamformers according to the minimum variance distortionless response principle. To tackle the UQP subproblems efficiently, we propose a second-order Riemannian Newton method (RNM) with improved Riemannian Newton direction, which avoids the line search and has better convergence speed than typical first-order Riemannian manifold optimization methods. Moreover, we derive the convergence of the proposed codesign algorithm by deducing the explicit convergence condition of RNM. We also analyze the computational complexity. Numerical results demonstrate that the proposed RIS-assisted array radar achieves superior interference suppression performance compared with the RIS-free one, and the SINR improvement is proportional to the number of RIS elements.

* 29 pages, 9 figures
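
The Dinkelbach transform mentioned in the abstract replaces a ratio objective f(x)/g(x) with a sequence of subtractive problems f(x) - λ g(x). The sketch below applies it to a hypothetical scalar toy problem over a finite candidate grid; the functions `f` and `g` and the helper `dinkelbach` are illustrative assumptions and do not represent the paper's UQP subproblems or its Riemannian Newton solver.

```python
def dinkelbach(candidates, f, g, tol=1e-12, max_iter=100):
    """Maximize f(x) / g(x) over a finite candidate set, assuming g(x) > 0."""
    best = candidates[0]
    lam = f(best) / g(best)
    for _ in range(max_iter):
        # Subproblem: maximize the subtractive objective for the current lam.
        best = max(candidates, key=lambda x: f(x) - lam * g(x))
        gap = f(best) - lam * g(best)
        # Update lam to the ratio achieved by the subproblem maximizer.
        lam = f(best) / g(best)
        # At the optimal ratio the subproblem maximum is zero.
        if abs(gap) < tol:
            break
    return best, lam


def f(x):
    return 2.0 * x + 1.0


def g(x):
    return x * x + 1.0


grid = [i / 100.0 for i in range(301)]  # candidates in [0, 3]
x_star, ratio = dinkelbach(grid, f, g)
brute_force = max(f(x) / g(x) for x in grid)
```

On this grid the iteration stops near x = 0.62, close to the continuous maximizer (√5 - 1)/2, and `ratio` matches the brute-force maximum. In practice the inner `max` would be replaced by whatever solver suits the subtractive subproblem.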

Reflected Schrödinger Bridge for Constrained Generative Modeling

Jan 06, 2024

Wei Deng, Yu Chen, Nicole Tianjiao Yang, Hengrong Du, Qi Feng, Ricky T. Q. Chen

Abstract:Diffusion models have become the go-to method for large-scale generative models in real-world applications. These applications often involve data distributions confined within bounded domains, typically requiring ad-hoc thresholding techniques for boundary enforcement. Reflected diffusion models (Lou23) aim to enhance generalizability by generating the data distribution through a backward process governed by reflected Brownian motion. However, reflected diffusion models may not easily adapt to diverse domains without the derivation of proper diffeomorphic mappings and do not guarantee optimal transport properties. To overcome these limitations, we introduce the Reflected Schrödinger Bridge algorithm: an entropy-regularized optimal transport approach tailored for generating data within diverse bounded domains. We derive elegant reflected forward-backward stochastic differential equations with Neumann and Robin boundary conditions, extend divergence-based likelihood training to bounded domains, and explore natural connections to entropic optimal transport for the study of approximate linear convergence - a valuable insight for practical training. Our algorithm yields robust generative modeling in diverse domains, and its scalability is demonstrated in real-world constrained generative modeling through standard image benchmarks.
