Long Teng

TMCIR: Token Merge Benefits Composed Image Retrieval

Apr 15, 2025

Chaoyang Wang, Zeyu Zhang, Long Teng, Zijun Li, Shichao Kan

Abstract: Composed Image Retrieval (CIR) retrieves target images using a multi-modal query that combines a reference image with text describing desired modifications. The primary challenge is effectively fusing this visual and textual information. Current cross-modal feature fusion approaches for CIR exhibit an inherent bias in intention interpretation. These methods tend to disproportionately emphasize either the reference image features (visual-dominant fusion) or the textual modification intent (text-dominant fusion through image-to-text conversion). Such an imbalanced representation often fails to accurately capture and reflect the actual search intent of the user in the retrieval results. To address this challenge, we propose TMCIR, a novel framework that advances composed image retrieval through two key innovations: 1) Intent-Aware Cross-Modal Alignment. We first fine-tune CLIP encoders contrastively using intent-reflecting pseudo-target images, synthesized from reference images and textual descriptions via a diffusion model. This step enhances the text encoder's ability to capture nuanced intents in textual descriptions. 2) Adaptive Token Fusion. We further fine-tune all encoders contrastively by comparing adaptive token-fusion features with the target image. This mechanism dynamically balances visual and textual representations within the contrastive learning pipeline, optimizing the composed feature for retrieval. Extensive experiments on the Fashion-IQ and CIRR datasets demonstrate that TMCIR significantly outperforms state-of-the-art methods, particularly in capturing nuanced user intent.

* arXiv admin note: text overlap with arXiv:2310.05473 by other authors
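
To make the contrastive fine-tuning idea in the TMCIR abstract concrete, here is a minimal sketch of an InfoNCE-style loss between composed query features and target image features. It is not the authors' implementation: the `fuse` function is a hypothetical convex blend standing in for adaptive token fusion, and the `temperature` value and all function names are assumptions chosen for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def fuse(image_feat, text_feat, alpha):
    """Hypothetical fusion: convex blend of image and text features.

    alpha = 1.0 keeps only the image feature, alpha = 0.0 only the text.
    This is a stand-in, not the paper's adaptive token fusion.
    """
    return [alpha * i + (1.0 - alpha) * t for i, t in zip(image_feat, text_feat)]


def info_nce(queries, targets, temperature=0.07):
    """Mean cross-entropy in which queries[i] should match targets[i].

    Similarities are cosine similarities divided by the temperature.
    """
    q = [l2_normalize(v) for v in queries]
    t = [l2_normalize(v) for v in targets]
    loss = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, tj)) / temperature for tj in t]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denom - logits[i]
    return loss / len(q)


# Toy batch: each composed query should land near its own target.
composed = [fuse([1.0, 0.0], [0.8, 0.2], alpha=0.5),
            fuse([0.0, 1.0], [0.1, 0.9], alpha=0.5)]
targets = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(composed, targets)
shuffled_loss = info_nce(composed, targets[::-1])
```

In this toy batch the loss is lower when each composed query is paired with its own target than when the targets are swapped, which is the behaviour a contrastive objective rewards.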

How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model

Aug 10, 2024

Yuxin Zhu, Huiyu Duan, Kaiwei Zhang, Yucheng Zhu, Xilei Zhu, Long Teng, Xiongkuo Min, Guangtao Zhai

Abstract: Understanding and predicting viewer attention in omnidirectional videos (ODVs) is crucial for enhancing user engagement in virtual and augmented reality applications. Although both audio and visual modalities are essential for saliency prediction in ODVs, the joint exploitation of these two modalities has been limited, primarily due to the absence of large-scale audio-visual saliency databases and comprehensive analyses. This paper comprehensively investigates audio-visual attention in ODVs from both subjective and objective perspectives. Specifically, we first introduce a new audio-visual saliency database for omnidirectional videos, termed the AVS-ODV database, containing 162 ODVs and corresponding eye movement data collected from 60 subjects under three audio modes: mute, mono, and ambisonics. Based on the constructed AVS-ODV database, we perform an in-depth analysis of how audio influences visual attention in ODVs. To advance the research on audio-visual saliency prediction for ODVs, we further establish a new benchmark based on the AVS-ODV database by testing numerous state-of-the-art saliency models, including visual-only models and audio-visual models. In addition, given the limitations of current models, we propose an innovative omnidirectional audio-visual saliency prediction network (OmniAVS), which is built on the U-Net architecture and hierarchically fuses audio and visual features from the multimodal aligned embedding space. Extensive experimental results demonstrate that the proposed OmniAVS model outperforms other state-of-the-art models on both ODV AVS prediction and traditional AVS prediction tasks. The AVS-ODV database and OmniAVS model will be released to facilitate future research.

A forward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations

Aug 10, 2024

Lorenc Kapllani, Long Teng

Abstract: In this work, we present a novel forward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations (BSDEs). Motivated by the fact that differential deep learning can efficiently approximate the labels and their derivatives with respect to inputs, we transform the BSDE problem into a differential deep learning problem. This is done by leveraging Malliavin calculus, resulting in a system of BSDEs. The unknown solution of the BSDE system is a triple of processes $(Y, Z, \Gamma)$, representing the solution, its gradient, and the Hessian matrix. The main idea of our algorithm is to discretize the integrals using the Euler-Maruyama method and approximate the unknown discrete solution triple using three deep neural networks. The parameters of these networks are then optimized by globally minimizing a differential learning loss function, which is newly defined as a weighted sum of the dynamics of the discretized system of BSDEs. Through various high-dimensional examples, we demonstrate that our proposed scheme is more efficient in terms of accuracy and computation time compared to other contemporary forward deep learning-based methodologies.

* 16 pages, 3 figures, 4 tables. arXiv admin note: text overlap with arXiv:2404.08456
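
For readers unfamiliar with the Euler-Maruyama step mentioned in the abstract, the following shows the standard forward discretization for a generic decoupled forward-backward system. The notation ($\mu$, $\sigma$, $f$, $g$, $\Delta t$, $\Delta W_n$) is assumed here for illustration and does not come from the listing. Only the first-order equations for $X$ and $Y$ are shown; the additional Malliavin-derived equations for $Z$ and $\Gamma$ are not reproduced. Assuming $dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t$ and $Y_t = g(X_T) + \int_t^T f(s, X_s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dW_s$, on a uniform grid $t_n = n\,\Delta t$ with $\Delta W_n = W_{t_{n+1}} - W_{t_n}$:

```latex
\begin{aligned}
X_{n+1} &= X_n + \mu(t_n, X_n)\,\Delta t + \sigma(t_n, X_n)\,\Delta W_n, \\
Y_{n+1} &= Y_n - f(t_n, X_n, Y_n, Z_n)\,\Delta t + Z_n\,\Delta W_n,
\qquad n = 0, \dots, N-1,
\end{aligned}
```

with the terminal condition $Y_N \approx g(X_N)$ supplying the quantity against which the discretized dynamics are matched.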

Unsupervised 4D Cardiac Motion Tracking with Spatiotemporal Optical Flow Networks

Jul 05, 2024

Long Teng, Wei Feng, Menglong Zhu, Xinchao Li

Abstract: Cardiac motion tracking from echocardiography can be used to estimate and quantify myocardial motion within a cardiac cycle. It is a cost-efficient and effective approach for assessing myocardial function. However, ultrasound imaging has the inherent characteristics of low spatial resolution and temporally random noise, which make it difficult to obtain reliable annotations. Thus it is difficult to perform supervised learning for motion tracking. In addition, there is currently no end-to-end unsupervised method in the literature. This paper presents a motion tracking method in which unsupervised optical flow networks are designed with a spatial reconstruction loss and a temporal-consistency loss. Our proposed loss functions make use of pair-wise and temporal correlation to estimate cardiac motion from a noisy background. Experiments using a synthetic 4D echocardiography dataset have shown the effectiveness of our approach and its superiority over existing methods in both accuracy and running speed. To the best of our knowledge, this is the first work that uses an unsupervised end-to-end deep learning optical flow network for 4D cardiac motion tracking.

A backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations

Apr 12, 2024

Lorenc Kapllani, Long Teng

Abstract: In this work, we propose a novel backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations (BSDEs), where the deep neural network (DNN) models are trained not only on the inputs and labels but also on the differentials of the corresponding labels. This is motivated by the fact that differential deep learning can provide an efficient approximation of the labels and their derivatives with respect to inputs. The BSDEs are reformulated as differential deep learning problems by using Malliavin calculus. The Malliavin derivatives of the solution to a BSDE themselves satisfy another BSDE, thus resulting in a system of BSDEs. Such a formulation requires the estimation of the solution, its gradient, and the Hessian matrix, represented by the triple of processes $\left(Y, Z, \Gamma\right)$. All the integrals within this system are discretized by using the Euler-Maruyama method. Subsequently, DNNs are employed to approximate the triple of these unknown processes. The DNN parameters are backwardly optimized at each time step by minimizing a differential learning type loss function, which is defined as a weighted sum of the dynamics of the discretized BSDE system, with the first term providing the dynamics of the process $Y$ and the other the process $Z$. An error analysis is carried out to show the convergence of the proposed algorithm. Various numerical experiments in up to $50$ dimensions are provided to demonstrate the high efficiency. Both theoretically and numerically, it is demonstrated that our proposed scheme is more efficient compared to other contemporary deep learning-based methodologies, especially in the computation of the process $\Gamma$.

* 40 pages, 5 figures, 5 tables

AIGCOIQA2024: Perceptual Quality Assessment of AI Generated Omnidirectional Images

Apr 01, 2024

Liu Yang, Huiyu Duan, Long Teng, Yucheng Zhu, Xiaohong Liu, Menghan Hu, Xiongkuo Min, Guangtao Zhai, Patrick Le Callet

Abstract: In recent years, the rapid advancement of Artificial Intelligence Generated Content (AIGC) has attracted widespread attention. Within AIGC, AI-generated omnidirectional images hold significant potential for Virtual Reality (VR) and Augmented Reality (AR) applications; hence omnidirectional AIGC techniques have also been widely studied. AI-generated omnidirectional images exhibit unique distortions compared to natural omnidirectional images; however, there are no dedicated Image Quality Assessment (IQA) criteria for assessing them. This study addresses this gap by establishing a large-scale AI-generated omnidirectional image IQA database named AIGCOIQA2024 and constructing a comprehensive benchmark. We first generate 300 omnidirectional images based on 5 AIGC models utilizing 25 text prompts. A subjective IQA experiment is then conducted to assess human visual preferences from three perspectives: quality, comfortability, and correspondence. Finally, we conduct a benchmark experiment to evaluate the performance of state-of-the-art IQA models on our database. The database will be released to facilitate future research.

Uncertainty quantification for deep learning-based schemes for solving high-dimensional backward stochastic differential equations

Oct 05, 2023

Lorenc Kapllani, Long Teng, Matthias Rottmann

Abstract: Deep learning-based numerical schemes for solving high-dimensional backward stochastic differential equations (BSDEs) have recently attracted considerable scientific interest. While they enable numerical methods to approximate very high-dimensional BSDEs, their reliability has not been studied and is thus not understood. In this work, we study uncertainty quantification (UQ) for a class of deep learning-based BSDE schemes. More precisely, we review the sources of uncertainty involved in the schemes and numerically study the impact of different sources. Usually, the standard deviation (STD) of the approximate solutions obtained from multiple runs of the algorithm with different datasets is calculated to address the uncertainty. This approach is computationally quite expensive, especially for high-dimensional problems. Hence, we develop a UQ model that efficiently estimates the STD of the approximate solution using only a single run of the algorithm. The model also estimates the mean of the approximate solution, which can be leveraged to initialize the algorithm and improve the optimization process. Our numerical experiments show that the UQ model produces reliable estimates of the mean and STD of the approximate solution for the considered class of deep learning-based BSDE schemes. The estimated STD captures multiple sources of uncertainty, demonstrating its effectiveness in quantifying the uncertainty. Additionally, the model illustrates the improved performance when comparing different schemes based on the estimated STD values. Furthermore, it can identify hyperparameter values for which the scheme achieves good approximations.

* 41 pages, 23 figures and 15 tables
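
The multi-run baseline that the abstract calls computationally expensive can be sketched as follows. This is an illustration only: `toy_solver` is a hypothetical stand-in for one full run of a deep BSDE scheme (here a plain Monte Carlo estimate of $E[W_T^2] = T$), and the seed list and path count are assumed values. It does not implement the paper's single-run UQ model.

```python
import random
import statistics


def toy_solver(seed, n_paths=2000, T=1.0):
    """Hypothetical stand-in for one run of a solver on a fresh dataset.

    Returns a Monte Carlo estimate of E[W_T^2], whose exact value is T.
    """
    rng = random.Random(seed)  # a different seed plays the role of a different dataset
    return sum(rng.gauss(0.0, T ** 0.5) ** 2 for _ in range(n_paths)) / n_paths


def multi_run_uncertainty(solver, seeds):
    """Run the solver once per seed and return (mean, sample STD) of the estimates."""
    estimates = [solver(s) for s in seeds]
    return statistics.mean(estimates), statistics.stdev(estimates)


mean_y0, std_y0 = multi_run_uncertainty(toy_solver, seeds=range(10))
```

The cost is one full solver run per seed, which is why an estimate of the STD from a single run is attractive when each run involves training neural networks on a high-dimensional problem.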

Masked Autoencoders as Image Processors

Mar 30, 2023

Huiyu Duan, Wei Shen, Xiongkuo Min, Danyang Tu, Long Teng, Jia Wang, Guangtao Zhai

Abstract: Transformers have shown significant effectiveness for various vision tasks, including both high-level and low-level vision. Recently, masked autoencoders (MAE) for feature pre-training have further unleashed the potential of Transformers, leading to state-of-the-art performance on various high-level vision tasks. However, the significance of MAE pre-training on low-level vision tasks has not been sufficiently explored. In this paper, we show that masked autoencoders are also scalable self-supervised learners for image processing tasks. We first present an efficient Transformer model, termed CSformer, that considers both channel attention and shifted-window-based self-attention. Then we develop an effective MAE architecture for image processing (MAEIP) tasks. Extensive experimental results show that with the help of MAEIP pre-training, our proposed CSformer achieves state-of-the-art performance on various image processing tasks, including Gaussian denoising, real image denoising, single-image motion deblurring, defocus deblurring, and image deraining.

QoE Driven VR 360 Video Massive MIMO Transmission

Jun 15, 2021

Long Teng, Guangtao Zhai, Yongpeng Wu, Xiongkuo Min, Wenjun Zhang, Zhi Ding, Chengshang Xiao

Abstract: Massive multiple-input and multiple-output (MIMO) enables ultra-high throughput and low latency for tile-based adaptive virtual reality (VR) 360 video transmission in wireless networks. In this paper, we consider a massive MIMO system where multiple users in a single-cell theater watch an identical VR 360 video. Based on tile prediction, the base station (BS) delivers the tiles in the predicted field of view (FoV) to users. By introducing practical supplementary transmission for missing tiles and unacceptable VR sickness, we propose the first stable transmission scheme for VR video. We formulate an integer non-linear programming (INLP) problem to maximize users' average quality of experience (QoE) score. Moreover, we derive the achievable spectral efficiency (SE) expression of predictive tile groups and the approximately achievable SE expression of missing tile groups, respectively. Analytically, the overall throughput is related to the number of tile groups and the length of pilot sequences. By exploiting the relationship between the structure of viewport tiles and the SE expression, we propose a multi-lattice multi-stream grouping method aimed at improving the overall throughput for VR video transmission. Moreover, we analyze the relationship between the QoE objective and the number of predictive tiles. We transform the original INLP problem into an integer linear programming problem by setting the predictive tile groups as constants. With variable relaxation and recovery, we obtain the optimal average QoE. Extensive simulation results validate that the proposed algorithm effectively improves QoE.

* Accepted by IEEE Transactions on Wireless Communications

Deep Learning algorithms for solving high dimensional nonlinear Backward Stochastic Differential Equations

Oct 03, 2020

Lorenc Kapllani, Long Teng

Abstract: We study deep learning-based schemes for solving high-dimensional nonlinear backward stochastic differential equations (BSDEs). First, we show how to improve the performance of the scheme proposed in [W. E and J. Han and A. Jentzen, Commun. Math. Stat., 5 (2017), pp. 349-380] regarding computational time and stability of numerical convergence by using an advanced neural network architecture instead of the stacked deep neural networks. Furthermore, the scheme proposed in that work can get stuck in local minima, especially for a complex solution structure and a longer terminal time. To solve this problem, we investigate reformulating the problem by including local losses and exploit Long Short-Term Memory (LSTM) networks, which are a type of recurrent neural network (RNN). Finally, in order to study numerical convergence and thus illustrate the improved performance of the proposed methods, we provide numerical results for several 100-dimensional nonlinear BSDEs, including a nonlinear pricing problem in finance.

* 23 pages, 7 figures, 12 tables
