Abstract:We introduce MarDini, a new family of video diffusion models that integrate the advantages of masked auto-regression (MAR) into a unified diffusion model (DM) framework. Here, MAR handles temporal planning, while DM focuses on spatial generation in an asymmetric network design: i) a MAR-based planning model containing most of the parameters generates planning signals for each masked frame using low-resolution input; ii) a lightweight generation model uses these signals to produce high-resolution frames via diffusion de-noising. MarDini's MAR enables video generation conditioned on any number of masked frames at any frame positions: a single model can handle video interpolation (e.g., masking middle frames), image-to-video generation (e.g., masking from the second frame onward), and video expansion (e.g., masking half the frames). The efficient design allocates most of the computational resources to the low-resolution planning model, making computationally expensive but important spatio-temporal attention feasible at scale. MarDini sets a new state-of-the-art for video interpolation; meanwhile, within a few inference steps, it efficiently generates videos on par with those of much more expensive advanced image-to-video models.
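The three masking patterns named in this abstract can be illustrated with a small sketch. It is not MarDini's code: the function name `frame_mask`, the task labels, and the convention that `True` marks a masked (to-be-generated) frame are assumptions made for illustration, and "video expansion" is assumed here to mean masking the second half of the clip.

```python
def frame_mask(num_frames: int, task: str) -> list[bool]:
    """Return a per-frame mask; True means the frame is masked and must be generated."""
    if num_frames < 2:
        raise ValueError("need at least two frames")
    if task == "interpolation":
        # Keep the first and last frames, mask everything in between.
        return [0 < i < num_frames - 1 for i in range(num_frames)]
    if task == "image_to_video":
        # Keep only the first frame, mask from the second frame onward.
        return [i >= 1 for i in range(num_frames)]
    if task == "expansion":
        # Assumption: keep the first half of the clip, mask the second half.
        return [i >= num_frames // 2 for i in range(num_frames)]
    raise ValueError(f"unknown task: {task}")
```

Because the mask is just a per-frame flag, one model interface can cover all three tasks by changing only which positions are flagged.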
Abstract:This paper presents, for the first time, the concept of polarforming for wireless communications. Polarforming refers to a novel technique that enables dynamic adjustment of antenna polarization using reconfigurable polarized antennas (RPAs). It can fully leverage polarization diversity to improve the performance of wireless communication systems by aligning the effective polarization state of the incoming electromagnetic (EM) wave with the antenna polarization. To better demonstrate the benefits of polarforming, we propose a general RPA-aided system that allows for tunable antenna polarization. A wavefront-based channel model is developed to properly capture depolarization behaviors in both line-of-sight (LoS) and non-line-of-sight (NLoS) channels. Based on this model, we provide a detailed description of transmit and receive polarforming on planes of polarization (PoPs). We also evaluate the performance gains provided by polarforming under stochastic channel conditions. Specifically, we derive a closed-form expression for the relative signal-to-noise ratio (SNR) gain compared to conventional fixed-polarization antenna (FPA) systems and approximate the cumulative distribution function (CDF) for the RPA system. Our analysis reveals that polarforming offers a diversity gain of two, indicating full utilization of polarization diversity for dual-polarized antennas. Furthermore, extensive simulation results validate the effectiveness of polarforming and exhibit substantial improvements over conventional FPA systems. The results also indicate that polarforming can not only combat depolarization effects caused by wireless channels but also overcome channel correlation when scattering is insufficient.
Abstract:This letter investigates a movable antenna (MA)-aided full-duplex (FD) satellite communication system, where the satellite, equipped with both transmit and receive MAs, serves multiple uplink (UL) and downlink (DL) user terminals (UTs) in FD mode. Specifically, we formulate a multiobjective optimization problem to minimize the UL and DL transmit powers under imperfect channel state information (CSI) conditions. To jointly optimize the MA positions and transmit powers, we propose a two-loop particle swarm optimization (PSO) algorithm based on a multiobjective optimization framework. Simulation results demonstrate that flexible adjustments of MA positions can effectively reduce the total UL and DL transmit powers, while also alleviating the burden on self-interference (SI) cancellation modules.
Abstract:This paper proposes a secure wire-line telephone prototype that leverages physical layer security (PLS) techniques to protect communications from wiretapping. The system generates artificial noise (AN) in both directions over a telephone line and utilizes a telephone hybrid circuit to achieve effective AN cancellation. We conduct a thorough analysis of the secrecy capacity and evaluate the system's performance through both simulations and practical experiments. The results demonstrate that the proposed scheme significantly enhances communication security while preserving the integrity of legitimate signals, making it a robust and viable solution for secure telephone systems.
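For intuition about the secrecy-capacity analysis mentioned in this abstract, a minimal sketch follows. It assumes the textbook Gaussian wiretap-channel expression C_s = max(0, log2(1 + SNR_legit) - log2(1 + SNR_eve)), which the abstract itself does not state. The function name and the SNR values are hypothetical and are not taken from the paper.

```python
import math


def secrecy_capacity(snr_legit: float, snr_eve: float) -> float:
    """Secrecy capacity in bits per channel use, for linear (not dB) SNRs."""
    c_legit = math.log2(1.0 + snr_legit)  # capacity of the legitimate link
    c_eve = math.log2(1.0 + snr_eve)      # capacity of the wiretapper's link
    return max(0.0, c_legit - c_eve)      # secrecy capacity is never negative


# Illustrative values only: artificial noise that the legitimate receiver
# cancels is modelled here as lowering only the eavesdropper's SNR.
print(secrecy_capacity(15.0, 15.0))  # equal SNRs leave no secrecy margin
print(secrecy_capacity(15.0, 1.0))   # degraded eavesdropper SNR
```

Under this simplified model, any mechanism that lowers the eavesdropper's SNR without lowering the legitimate receiver's SNR increases the secrecy capacity.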
Abstract:This letter investigates movable antenna (MA)-aided downlink (DL) multiuser communication systems under the near-field channel condition, in which both the base station (BS) and the users are equipped with MAs to fully exploit the degrees of freedom (DoFs) in antenna position optimization by leveraging the wireless channel variation in spatial regions of large size. The objective is to minimize the transmit power by jointly optimizing the beamformers and the MA positions while satisfying the minimum-achievable-rate requirement for each user. We propose a two-loop dynamic neighborhood pruning particle swarm optimization (DNPPSO) algorithm that significantly reduces computational complexity while effectively maintaining the performance of the standard particle swarm optimization (PSO) algorithm. Simulation results validate the effectiveness and advantages of the proposed scheme in power-saving for multiuser communications.
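As background for the standard PSO baseline that this abstract refers to, the sketch below shows a plain global-best particle swarm optimizer. It is not the proposed DNPPSO algorithm and contains no neighborhood pruning or two-loop structure. The function names, hyperparameter values, and the toy `sphere` objective (standing in for a transmit-power objective over antenna positions) are all assumptions made for illustration.

```python
import random


def pso_minimize(objective, dim, bounds, num_particles=20, iters=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Plain global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [objective(p) for p in pos]      # personal best values
    g = min(range(num_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best

    for _ in range(iters):
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_global * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new coordinate to the feasible search region.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_pos, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

In a movable-antenna setting, each particle would encode candidate antenna positions and the objective would evaluate the resulting transmit power. That mapping is not shown here.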
Abstract:Panoptic Scene Graph Generation (PSG) aims to segment objects and recognize their relations, enabling structured understanding of an image. Previous methods focus on predicting predefined object and relation categories, which limits their applicability in open-world scenarios. With the rapid development of large multimodal models (LMMs), significant progress has been made in open-set object detection and segmentation, yet open-set relation prediction in PSG remains unexplored. In this paper, we focus on the task of open-set relation prediction integrated with a pretrained open-set panoptic segmentation model to achieve true open-set panoptic scene graph generation (OpenPSG). Our OpenPSG leverages LMMs to achieve open-set relation prediction in an autoregressive manner. We introduce a relation query transformer to efficiently extract visual features of object pairs and estimate the existence of relations between them. The latter can enhance the prediction efficiency by filtering irrelevant pairs. Finally, we design the generation and judgement instructions to perform open-set relation prediction in PSG autoregressively. To our knowledge, we are the first to propose the open-set PSG task. Extensive experiments demonstrate that our method achieves state-of-the-art performance in open-set relation prediction and panoptic scene graph generation. Code is available at \url{https://github.com/franciszzj/OpenPSG}.
Abstract:In this paper, we investigate physical layer security (PLS) for full-duplex (FD) multi-user systems. To simultaneously protect uplink (UL) and downlink (DL) transmissions and ensure efficient use of time-frequency resources, we consider a base station (BS) that operates in FD mode and is able to emit artificial noise (AN). Conventional fixed-position antennas (FPAs) at the BS struggle to fully exploit spatial degrees of freedom (DoFs). Therefore, we propose a new paradigm for secure FD multi-user systems, where multiple transmit and receive movable antennas (MAs) are deployed at the BS to serve UL and DL users and effectively counter the cooperative interception by multiple eavesdroppers (Eves). Specifically, the MA positions, the transmit, receive, and AN beamformers at the BS, and the UL powers are jointly optimized to maximize the sum of secrecy rates (SSR). To solve the challenging non-convex optimization problem with highly coupled variables, we propose an alternating optimization (AO) algorithm. This algorithm decomposes the original problem into three sub-problems, which are iteratively solved by the proposed multi-velocity particle swarm optimization (MVPSO) and successive convex approximation (SCA). Simulation results demonstrate that the proposed scheme for MA-aided secure FD multi-user systems can significantly enhance security performance compared to conventional FPA systems.
Abstract:In laparoscopic and robotic surgery, precise tool instance segmentation is an essential technology for advanced computer-assisted interventions. Although publicly available procedures of routine surgeries exist, they often lack comprehensive annotations for tool instance segmentation. Additionally, the majority of standard datasets for tool segmentation are derived from porcine (pig) surgeries. To address this gap, we introduce CholecInstanceSeg, the largest open-access tool instance segmentation dataset to date. Derived from the existing CholecT50 and Cholec80 datasets, CholecInstanceSeg provides novel annotations for laparoscopic cholecystectomy procedures in patients. Our dataset comprises 41.9k annotated frames extracted from 85 clinical procedures and 64.4k tool instances, each labelled with semantic masks and instance IDs. To ensure the reliability of our annotations, we perform extensive quality control, conduct label agreement statistics, and benchmark the segmentation results with various instance segmentation baselines. CholecInstanceSeg aims to advance the field by offering a comprehensive and high-quality open-access dataset for the development and evaluation of tool instance segmentation algorithms.
Abstract:This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionately little attention from the research community. We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization. In each scenario, we underscore the importance of data, highlight promising research directions, and articulate the potential impacts on the research community and, where applicable, society as a whole. For instance, we advocate for a suite of data-centric benchmarks tailored to the scale and complexity of data for LLMs. These benchmarks can be used to develop new data curation methods and document research efforts and results, which can help promote openness and transparency in AI and LLM research.
Abstract:In-context learning (ICL) allows transformer-based language models that are pre-trained on general text to quickly learn a specific task with a few "task demonstrations" without updating their parameters, significantly boosting their flexibility and generality. ICL differs from conventional machine learning in many respects and therefore requires new approaches to interpret this learning paradigm. Taking the viewpoint of recent works showing that transformers learn in context by formulating an internal optimizer, we propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL. We empirically verify that our approach is effective for demonstration attribution while remaining computationally efficient. Leveraging the results, we then show how DETAIL can help improve model performance in real-world scenarios through demonstration reordering and curation. Finally, we experimentally demonstrate the wide applicability of DETAIL by showing that our attribution scores obtained on white-box models are transferable to black-box models in improving model performance.