Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qian Zhou

Global Regulation and Excitation via Attention Tuning for Stereo Matching

Sep 19, 2025

Jiahao Li, Xinhong Chen, Zhengmin Jiang, Qian Zhou, Yung-Hui Li, Jianping Wang

Figure 1 for Global Regulation and Excitation via Attention Tuning for Stereo Matching

Figure 2 for Global Regulation and Excitation via Attention Tuning for Stereo Matching

Figure 3 for Global Regulation and Excitation via Attention Tuning for Stereo Matching

Figure 4 for Global Regulation and Excitation via Attention Tuning for Stereo Matching

Abstract:Stereo matching achieves significant progress with iterative algorithms like RAFT-Stereo and IGEV-Stereo. However, these methods struggle in ill-posed regions with occlusions, textureless, or repetitive patterns, due to a lack of global context and geometric information for effective iterative refinement. To enable the existing iterative approaches to incorporate global context, we propose the Global Regulation and Excitation via Attention Tuning (GREAT) framework which encompasses three attention modules. Specifically, Spatial Attention (SA) captures the global context within the spatial dimension, Matching Attention (MA) extracts global context along epipolar lines, and Volume Attention (VA) works in conjunction with SA and MA to construct a more robust cost-volume excited by global context and geometric details. To verify the universality and effectiveness of this framework, we integrate it into several representative iterative stereo-matching methods and validate it through extensive experiments, collectively denoted as GREAT-Stereo. This framework demonstrates superior performance in challenging ill-posed regions. Applied to IGEV-Stereo, among all published methods, our GREAT-IGEV ranks first on the Scene Flow test set, KITTI 2015, and ETH3D leaderboards, and achieves second on the Middlebury benchmark. Code is available at https://github.com/JarvisLee0423/GREAT-Stereo.

* International Conference on Computer Vision (ICCV 2025)

Via

Access Paper or Ask Questions

Exploring Generalized Gait Recognition: Reducing Redundancy and Noise within Indoor and Outdoor Datasets

May 21, 2025

Qian Zhou, Xianda Guo, Jilong Wang, Chuanfu Shen, Zhongyuan Wang, Hua Zou, Qin Zou, Chao Liang, Chen Long, Gang Wu

Abstract:Generalized gait recognition, which aims to achieve robust performance across diverse domains, remains a challenging problem due to severe domain shifts in viewpoints, appearances, and environments. While mixed-dataset training is widely used to enhance generalization, it introduces new obstacles including inter-dataset optimization conflicts and redundant or noisy samples, both of which hinder effective representation learning. To address these challenges, we propose a unified framework that systematically improves cross-domain gait recognition. First, we design a disentangled triplet loss that isolates supervision signals across datasets, mitigating gradient conflicts during optimization. Second, we introduce a targeted dataset distillation strategy that filters out the least informative 20\% of training samples based on feature redundancy and prediction uncertainty, enhancing data efficiency. Extensive experiments on CASIA-B, OU-MVLP, Gait3D, and GREW demonstrate that our method significantly improves cross-dataset recognition for both GaitBase and DeepGaitV2 backbones, without sacrificing source-domain accuracy. Code will be released at https://github.com/li1er3/Generalized_Gait.

* 10 pages, 3 figures

Via

Access Paper or Ask Questions

WhatELSE: Shaping Narrative Spaces at Configurable Level of Abstraction for AI-bridged Interactive Storytelling

Feb 25, 2025

Zhuoran Lu, Qian Zhou, Yi Wang

Figure 1 for WhatELSE: Shaping Narrative Spaces at Configurable Level of Abstraction for AI-bridged Interactive Storytelling

Figure 2 for WhatELSE: Shaping Narrative Spaces at Configurable Level of Abstraction for AI-bridged Interactive Storytelling

Figure 3 for WhatELSE: Shaping Narrative Spaces at Configurable Level of Abstraction for AI-bridged Interactive Storytelling

Figure 4 for WhatELSE: Shaping Narrative Spaces at Configurable Level of Abstraction for AI-bridged Interactive Storytelling

Abstract:Generative AI significantly enhances player agency in interactive narratives (IN) by enabling just-in-time content generation that adapts to player actions. While delegating generation to AI makes IN more interactive, it becomes challenging for authors to control the space of possible narratives - within which the final story experienced by the player emerges from their interaction with AI. In this paper, we present WhatELSE, an AI-bridged IN authoring system that creates narrative possibility spaces from example stories. WhatELSE provides three views (narrative pivot, outline, and variants) to help authors understand the narrative space and corresponding tools leveraging linguistic abstraction to control the boundaries of the narrative space. Taking innovative LLM-based narrative planning approaches, WhatELSE further unfolds the narrative space into executable game events. Through a user study (N=12) and technical evaluations, we found that WhatELSE enables authors to perceive and edit the narrative space and generates engaging interactive narratives at play-time.

* In Proceedings of CHI 2025

Via

Access Paper or Ask Questions

Enhanced Photovoltaic Power Forecasting: An iTransformer and LSTM-Based Model Integrating Temporal and Covariate Interactions

Dec 03, 2024

Guang Wu, Yun Wang, Qian Zhou, Ziyang Zhang

Abstract:Accurate photovoltaic (PV) power forecasting is critical for integrating renewable energy sources into the grid, optimizing real-time energy management, and ensuring energy reliability amidst increasing demand. However, existing models often struggle with effectively capturing the complex relationships between target variables and covariates, as well as the interactions between temporal dynamics and multivariate data, leading to suboptimal forecasting accuracy. To address these challenges, we propose a novel model architecture that leverages the iTransformer for feature extraction from target variables and employs long short-term memory (LSTM) to extract features from covariates. A cross-attention mechanism is integrated to fuse the outputs of both models, followed by a Kolmogorov-Arnold network (KAN) mapping for enhanced representation. The effectiveness of the proposed model is validated using publicly available datasets from Australia, with experiments conducted across four seasons. Results demonstrate that the proposed model effectively capture seasonal variations in PV power generation and improve forecasting accuracy.

Via

Access Paper or Ask Questions

Pseudo Dataset Generation for Out-of-Domain Multi-Camera View Recommendation

Oct 17, 2024

Kuan-Ying Lee, Qian Zhou, Klara Nahrstedt

Figure 1 for Pseudo Dataset Generation for Out-of-Domain Multi-Camera View Recommendation

Figure 2 for Pseudo Dataset Generation for Out-of-Domain Multi-Camera View Recommendation

Figure 3 for Pseudo Dataset Generation for Out-of-Domain Multi-Camera View Recommendation

Figure 4 for Pseudo Dataset Generation for Out-of-Domain Multi-Camera View Recommendation

Abstract:Multi-camera systems are indispensable in movies, TV shows, and other media. Selecting the appropriate camera at every timestamp has a decisive impact on production quality and audience preferences. Learning-based view recommendation frameworks can assist professionals in decision-making. However, they often struggle outside of their training domains. The scarcity of labeled multi-camera view recommendation datasets exacerbates the issue. Based on the insight that many videos are edited from the original multi-camera videos, we propose transforming regular videos into pseudo-labeled multi-camera view recommendation datasets. Promisingly, by training the model on pseudo-labeled datasets stemming from videos in the target domain, we achieve a 68% relative improvement in the model's accuracy in the target domain and bridge the accuracy gap between in-domain and never-before-seen domains.

* Accepted to VCIP 2024. Project page: https://eric11220.github.io/publication/VCIP24/

Via

Access Paper or Ask Questions

StoryVerse: Towards Co-authoring Dynamic Plot with LLM-based Character Simulation via Narrative Planning

May 17, 2024

Yi Wang, Qian Zhou, David Ledo

Figure 1 for StoryVerse: Towards Co-authoring Dynamic Plot with LLM-based Character Simulation via Narrative Planning

Figure 2 for StoryVerse: Towards Co-authoring Dynamic Plot with LLM-based Character Simulation via Narrative Planning

Figure 3 for StoryVerse: Towards Co-authoring Dynamic Plot with LLM-based Character Simulation via Narrative Planning

Abstract:Automated plot generation for games enhances the player's experience by providing rich and immersive narrative experience that adapts to the player's actions. Traditional approaches adopt a symbolic narrative planning method which limits the scale and complexity of the generated plot by requiring extensive knowledge engineering work. Recent advancements use Large Language Models (LLMs) to drive the behavior of virtual characters, allowing plots to emerge from interactions between characters and their environments. However, the emergent nature of such decentralized plot generation makes it difficult for authors to direct plot progression. We propose a novel plot creation workflow that mediates between a writer's authorial intent and the emergent behaviors from LLM-driven character simulation, through a novel authorial structure called "abstract acts". The writers define high-level plot outlines that are later transformed into concrete character action sequences via an LLM-based narrative planning process, based on the game world state. The process creates "living stories" that dynamically adapt to various game world states, resulting in narratives co-created by the author, character simulation, and player. We present StoryVerse as a proof-of-concept system to demonstrate this plot creation workflow. We showcase the versatility of our approach with examples in different stories and game environments.

* Proceedings of the 19th international conference on the foundations of digital games 2024

Via

Access Paper or Ask Questions

HyperCLOVA X Technical Report

Apr 13, 2024

Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim(+386 more)

Abstract:We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

* 44 pages; updated authors list and fixed author names

Via

Access Paper or Ask Questions

Encoding Enhanced Complex CNN for Accurate and Highly Accelerated MRI

Jun 21, 2023

Zimeng Li, Sa Xiao, Cheng Wang, Haidong Li, Xiuchao Zhao, Caohui Duan, Qian Zhou, Qiuchen Rao, Yuan Fang, Junshuai Xie(+4 more)

Figure 1 for Encoding Enhanced Complex CNN for Accurate and Highly Accelerated MRI

Figure 2 for Encoding Enhanced Complex CNN for Accurate and Highly Accelerated MRI

Figure 3 for Encoding Enhanced Complex CNN for Accurate and Highly Accelerated MRI

Figure 4 for Encoding Enhanced Complex CNN for Accurate and Highly Accelerated MRI

Abstract:Magnetic resonance imaging (MRI) using hyperpolarized noble gases provides a way to visualize the structure and function of human lung, but the long imaging time limits its broad research and clinical applications. Deep learning has demonstrated great potential for accelerating MRI by reconstructing images from undersampled data. However, most existing deep conventional neural networks (CNN) directly apply square convolution to k-space data without considering the inherent properties of k-space sampling, limiting k-space learning efficiency and image reconstruction quality. In this work, we propose an encoding enhanced (EN2) complex CNN for highly undersampled pulmonary MRI reconstruction. EN2 employs convolution along either the frequency or phase-encoding direction, resembling the mechanisms of k-space sampling, to maximize the utilization of the encoding correlation and integrity within a row or column of k-space. We also employ complex convolution to learn rich representations from the complex k-space data. In addition, we develop a feature-strengthened modularized unit to further boost the reconstruction performance. Experiments demonstrate that our approach can accurately reconstruct hyperpolarized 129Xe and 1H lung MRI from 6-fold undersampled k-space data and provide lung function measurements with minimal biases compared with fully-sampled image. These results demonstrate the effectiveness of the proposed algorithmic components and indicate that the proposed approach could be used for accelerated pulmonary MRI in research and clinical lung disease patient care.

Via

Access Paper or Ask Questions

CrossRoI: Cross-camera Region of Interest Optimization for Efficient Real Time Video Analytics at Scale

May 13, 2021

Hongpeng Guo, Shuochao Yao, Zhe Yang, Qian Zhou, Klara Nahrstedt

Figure 1 for CrossRoI: Cross-camera Region of Interest Optimization for Efficient Real Time Video Analytics at Scale

Figure 2 for CrossRoI: Cross-camera Region of Interest Optimization for Efficient Real Time Video Analytics at Scale

Figure 3 for CrossRoI: Cross-camera Region of Interest Optimization for Efficient Real Time Video Analytics at Scale

Figure 4 for CrossRoI: Cross-camera Region of Interest Optimization for Efficient Real Time Video Analytics at Scale

Abstract:Video cameras are pervasively deployed in city scale for public good or community safety (i.e. traffic monitoring or suspected person tracking). However, analyzing large scale video feeds in real time is data intensive and poses severe challenges to network and computation systems today. We present CrossRoI, a resource-efficient system that enables real time video analytics at scale via harnessing the videos content associations and redundancy across a fleet of cameras. CrossRoI exploits the intrinsic physical correlations of cross-camera viewing fields to drastically reduce the communication and computation costs. CrossRoI removes the repentant appearances of same objects in multiple cameras without harming comprehensive coverage of the scene. CrossRoI operates in two phases - an offline phase to establish cross-camera correlations, and an efficient online phase for real time video inference. Experiments on real-world video feeds show that CrossRoI achieves 42% - 65% reduction for network overhead and 25% - 34% reduction for response delay in real time video analytics applications with more than 99% query accuracy, when compared to baseline methods. If integrated with SotA frame filtering systems, the performance gains of CrossRoI reach 50% - 80% (network overhead) and 33% - 61% (end-to-end delay).

* accepted in 12th ACM Multimedia Systems Conference (MMsys 21')

Via

Access Paper or Ask Questions

DeepRT: A Soft Real Time Scheduler for Computer Vision Applications on the Edge

May 05, 2021

Zhe Yang, Klara Nahrstedt, Hongpeng Guo, Qian Zhou

Figure 1 for DeepRT: A Soft Real Time Scheduler for Computer Vision Applications on the Edge

Figure 2 for DeepRT: A Soft Real Time Scheduler for Computer Vision Applications on the Edge

Figure 3 for DeepRT: A Soft Real Time Scheduler for Computer Vision Applications on the Edge

Figure 4 for DeepRT: A Soft Real Time Scheduler for Computer Vision Applications on the Edge

Abstract:The ubiquity of smartphone cameras and IoT cameras, together with the recent boom of deep learning and deep neural networks, proliferate various computer vision driven mobile and IoT applications deployed on the edge. This paper focuses on applications which make soft real time requests to perform inference on their data - they desire prompt responses within designated deadlines, but occasional deadline misses are acceptable. Supporting soft real time applications on a multi-tenant edge server is not easy, since the requests sharing the limited GPU computing resources of an edge server interfere with each other. In order to tackle this problem, we comprehensively evaluate how latency and throughput respond to different GPU execution plans. Based on this analysis, we propose a GPU scheduler, DeepRT, which provides latency guarantee to the requests while maintaining high overall system throughput. The key component of DeepRT, DisBatcher, batches data from different requests as much as possible while it is proven to provide latency guarantee for requests admitted by an Admission Control Module. DeepRT also includes an Adaptation Module which tackles overruns. Our evaluation results show that DeepRT outperforms state-of-the-art works in terms of the number of deadline misses and throughput.

* Accepted by the Sixth ACM/IEEE Symposium on Edge Computing, 2021

Via

Access Paper or Ask Questions