Yida Zhang

Dual-Polarization Stacked Intelligent Metasurfaces for Holographic MIMO

May 27, 2025

Yida Zhang, Qiuyan Liu, Hongtao Luo, Yuqi Xia, Qiang Wang

Abstract: To address the limited wave-domain signal processing capabilities of traditional single-polarized stacked intelligent metasurfaces (SIMs) in holographic multiple-input multiple-output (HMIMO) systems, which stem from limited integration space, this paper proposes a dual-polarized SIM (DPSIM) architecture. By stacking dual-polarized reconfigurable intelligent surfaces (DPRIS), DPSIM can independently process signals of two orthogonal polarizations in the wave domain, thereby effectively suppressing polarization cross-interference (PCI) and inter-stream interference (ISI). We introduce a layer-by-layer gradient descent with water-filling (LGD-WF) algorithm to enhance end-to-end performance. Simulation results show that, under the same number of metasurface layers and unit size, the DPSIM-aided HMIMO system can support more simultaneous data streams for ISI-free parallel transmission compared to traditional SIM-aided systems. Furthermore, under different polarization imperfection conditions, both the spectral efficiency (SE) and energy efficiency (EE) of the DPSIM-aided HMIMO system are significantly improved, approaching the theoretical upper bound.
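The abstract names water-filling as one ingredient of LGD-WF but does not spell out the algorithm. As a rough illustration only, the sketch below shows classical water-filling power allocation over parallel, interference-free streams. It is not the paper's LGD-WF procedure: the function names `water_filling` and `spectral_efficiency`, the `noise` parameter, and the example gains are all assumptions made for this sketch.

```python
import math


def water_filling(gains, total_power, noise=1.0):
    """Classical water-filling over parallel channels (hypothetical helper).

    gains: per-stream channel power gains; total_power: power budget.
    Returns the per-stream power allocation in the same order as `gains`.
    """
    # Each usable channel has a "floor" noise/gain; weak channels have high floors.
    floors = sorted(noise / g for g in gains if g > 0)
    level = 0.0
    # Try the k best channels, dropping the worst until the water level
    # sits above the floor of every channel that is kept.
    for k in range(len(floors), 0, -1):
        level = (total_power + sum(floors[:k])) / k
        if level > floors[k - 1]:
            break
    return [max(level - noise / g, 0.0) if g > 0 else 0.0 for g in gains]


def spectral_efficiency(gains, powers, noise=1.0):
    """Sum rate in bit/s/Hz for independent streams (assumes no residual interference)."""
    return sum(math.log2(1.0 + p * g / noise) for g, p in zip(gains, powers))


# Illustrative gains only; these values do not come from the paper.
example_gains = [4.0, 1.0, 0.1]
example_powers = water_filling(example_gains, total_power=1.0)
```

With these example gains, the weakest stream receives no power and the budget is split between the two stronger streams, which is the behavior water-filling is meant to produce when the power budget is small.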

Goku: Flow Based Video Generative Foundation Models

Feb 10, 2025

Shoufa Chen, Chongjian Ge, Yuqi Zhang, Yida Zhang, Fengda Zhu, Hao Yang, Hongxiang Hao, Hui Wu, Zhichao Lai, Yifei Hu(+12 more)

Abstract: This paper introduces Goku, a state-of-the-art family of joint image-and-video generation models leveraging rectified flow Transformers to achieve industry-leading performance. We detail the foundational elements enabling high-quality visual generation, including the data curation pipeline, model architecture design, flow formulation, and advanced infrastructure for efficient and robust large-scale training. The Goku models demonstrate superior performance in both qualitative and quantitative evaluations, setting new benchmarks across major tasks. Specifically, Goku achieves 0.76 on GenEval and 83.65 on DPG-Bench for text-to-image generation, and 84.85 on VBench for text-to-video tasks. We believe that this work provides valuable insights and practical advancements for the research community in developing joint image-and-video generation models.

* Demo: https://saiyan-world.github.io/goku/

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation

Feb 07, 2025

Shilong Zhang, Wenbo Li, Shoufa Chen, Chongjian Ge, Peize Sun, Yida Zhang, Yi Jiang, Zehuan Yuan, Binyue Peng, Ping Luo

Abstract: DiT diffusion models have achieved great success in text-to-video generation, leveraging their scalability in model capacity and data scale. High content and motion fidelity aligned with text prompts, however, often requires large model parameters and a substantial number of function evaluations (NFEs). Realistic and visually appealing details are typically reflected in high-resolution outputs, further amplifying computational demands, especially for single-stage DiT models. To address these challenges, we propose a novel two-stage framework, FlashVideo, which strategically allocates model capacity and NFEs across stages to balance generation fidelity and quality. In the first stage, prompt fidelity is prioritized through a low-resolution generation process utilizing large parameters and sufficient NFEs to enhance computational efficiency. The second stage establishes flow matching between low and high resolutions, effectively generating fine details with minimal NFEs. Quantitative and visual results demonstrate that FlashVideo achieves state-of-the-art high-resolution video generation with superior computational efficiency. Additionally, the two-stage design enables users to preview the initial output before committing to full-resolution generation, thereby significantly reducing computational costs and wait times as well as enhancing commercial viability.

* Model and Weight: https://github.com/FoundationVision/FlashVideo
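The second stage described in the abstract relies on flow matching between low- and high-resolution samples, sampled with few NFEs. A minimal toy sketch of that idea follows, assuming a straight-line path between the two endpoints and plain Euler integration. It is not the FlashVideo implementation: `flow_matching_pair`, `euler_sample`, the list-of-floats "samples", and the oracle velocity are hypothetical stand-ins for a learned network operating on video latents.

```python
def flow_matching_pair(x0, x1, t):
    """Point on the straight path from x0 to x1 at time t, plus its target velocity.

    x0 plays the role of a low-resolution sample and x1 the high-resolution
    target (both are plain lists of floats in this toy setting).
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]  # constant along a straight path
    return x_t, velocity


def euler_sample(x0, velocity_fn, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1.

    Each call to velocity_fn counts as one function evaluation (NFE).
    """
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy endpoints; a real system would use a trained model instead of this oracle.
low = [0.0, 1.0]
high = [2.0, -1.0]
oracle = lambda x, t: [b - a for a, b in zip(low, high)]
result = euler_sample(low, oracle, num_steps=4)
```

Because the velocity along a straight path is constant, the Euler integrator in this toy case reaches the target with very few steps, which gives some intuition for why a well-straightened flow can get by with a small NFE budget.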
