Abstract:The challenges in dense ultra-reliable low-latency communication networks to deliver the required service to multiple devices are addressed by three main technologies: multiple antennas at the base station (MISO), rate splitting multiple access (RSMA) with private and common message encoding, and simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS). Careful resource allocation, encompassing beamforming and RIS optimization, is required to exploit the synergy between the three. We propose an alternating optimization-based algorithm, relying on minorization-maximization. Numerical results show that the achievable second-order max-min rates of the proposed scheme outperform the baselines significantly. MISO, RSMA, and STAR-RIS all contribute to enabling ultra-reliable low-latency communication (URLLC).
Abstract:We analyze the finite-block-length rate region of wireless systems aided by reconfigurable intelligent surfaces (RISs), employing treating interference as noise. We consider three nearly passive RIS architectures, including locally passive (LP) diagonal (D), globally passive (GP) D, and GP beyond diagonal (BD) RISs. In a GP RIS, the power constraint is applied globally to the whole surface, while some elements may amplify the incident signal locally. The considered RIS architectures provide substantial performance gains compared with systems operating without RIS. GP BD-RIS outperforms, at the price of increasing the complexity, LP and GP D-RIS as it enlarges the feasible set of allowed solutions. However, the gain provided by BD-RIS decreases with the number of RIS elements. Additionally, deploying RISs provides higher gains as the reliability/latency requirement becomes more stringent.
Abstract:In this paper, we develop energy-efficient schemes for multi-user multiple-input single-output (MISO) broadcast channels (BCs), assisted by reconfigurable intelligent surfaces (RISs). To this end, we consider three architectures of RIS: locally passive diagonal (LP-D), globally passive diagonal (GP-D), and globally passive beyond diagonal (GP-BD). In a globally passive RIS, the power of the output signal of the RIS is not greater than its input power, but some RIS elements can amplify the signal. In a locally passive RIS, every element cannot amplify the incident signal. We show that these RIS architectures can substantially improve energy efficiency (EE) if the static power of the RIS elements is not too high. Moreover, GP-BD RIS, which has a higher complexity and static power than LP-D RIS and GP-D RIS, provides better spectral efficiency, but its EE performance highly depends on the static power consumption and may be worse than its diagonal counterparts.
Abstract:This paper addresses the problem of maximizing the capacity of a multiple-input multiple-output (MIMO) link assisted by a beyond-diagonal reconfigurable intelligent surface (BD-RIS). We maximize the capacity by alternately optimizing the transmit covariance matrix, and the BD-RIS scattering matrix, which, according to network theory, should be unitary and symmetric. These constraints make the optimization of BD-RIS more challenging than that of diagonal RIS. To find a stationary point of the capacity we maximize a sequence of quadratic problems in the manifold of unitary matrices. This leads to an efficient algorithm that always improves the capacity obtained by a diagonal RIS. Through simulation examples, we study the capacity improvement provided by a passive BD-RIS architecture over the conventional RIS model in which the phase shift matrix is diagonal.
Abstract:With the success of 2D and 3D visual generative models, there is growing interest in generating 4D content. Existing methods primarily rely on text prompts to produce 4D content, but they often fall short of accurately defining complex or rare motions. To address this limitation, we propose MagicPose4D, a novel framework for refined control over both appearance and motion in 4D generation. Unlike traditional methods, MagicPose4D accepts monocular videos as motion prompts, enabling precise and customizable motion generation. MagicPose4D comprises two key modules: i) Dual-Phase 4D Reconstruction Module} which operates in two phases. The first phase focuses on capturing the model's shape using accurate 2D supervision and less accurate but geometrically informative 3D pseudo-supervision without imposing skeleton constraints. The second phase refines the model using more accurate pseudo-3D supervision, obtained in the first phase and introduces kinematic chain-based skeleton constraints to ensure physical plausibility. Additionally, we propose a Global-local Chamfer loss that aligns the overall distribution of predicted mesh vertices with the supervision while maintaining part-level alignment without extra annotations. ii) Cross-category Motion Transfer Module} leverages the predictions from the 4D reconstruction module and uses a kinematic-chain-based skeleton to achieve cross-category motion transfer. It ensures smooth transitions between frames through dynamic rigidity, facilitating robust generalization without additional training. Through extensive experiments, we demonstrate that MagicPose4D significantly improves the accuracy and consistency of 4D content generation, outperforming existing methods in various benchmarks.
Abstract:Human-human communication is like a delicate dance where listeners and speakers concurrently interact to maintain conversational dynamics. Hence, an effective model for generating listener nonverbal behaviors requires understanding the dyadic context and interaction. In this paper, we present an effective framework for creating 3D facial motions in dyadic interactions. Existing work consider a listener as a reactive agent with reflexive behaviors to the speaker's voice and facial motions. The heart of our framework is Dyadic Interaction Modeling (DIM), a pre-training approach that jointly models speakers' and listeners' motions through masking and contrastive learning to learn representations that capture the dyadic context. To enable the generation of non-deterministic behaviors, we encode both listener and speaker motions into discrete latent representations, through VQ-VAE. The pre-trained model is further fine-tuned for motion generation. Extensive experiments demonstrate the superiority of our framework in generating listener motions, establishing a new state-of-the-art according to the quantitative measures capturing the diversity and realism of generated motions. Qualitative results demonstrate the superior capabilities of the proposed approach in generating diverse and realistic expressions, eye blinks and head gestures.
Abstract:Facial action unit (AU) detection is a fundamental block for objective facial expression analysis. Supervised learning approaches require a large amount of manual labeling which is costly. The limited labeled data are also not diverse in terms of gender which can affect model fairness. In this paper, we propose to use synthetically generated data and multi-source domain adaptation (MSDA) to address the problems of the scarcity of labeled data and the diversity of subjects. Specifically, we propose to generate a diverse dataset through synthetic facial expression re-targeting by transferring the expressions from real faces to synthetic avatars. Then, we use MSDA to transfer the AU detection knowledge from a real dataset and the synthetic dataset to a target dataset. Instead of aligning the overall distributions of different domains, we propose Paired Moment Matching (PM2) to align the features of the paired real and synthetic data with the same facial expression. To further improve gender fairness, PM2 matches the features of the real data with a female and a male synthetic image. Our results indicate that synthetic data and the proposed model improve both AU detection performance and fairness across genders, demonstrating its potential to solve AU detection in-the-wild.
Abstract:Modern wireless communication systems are expected to provide improved latency and reliability. To meet these expectations, a short packet length is needed, which makes the first-order Shannon rate an inaccurate performance metric for such communication systems. A more accurate approximation of the achievable rates of finite-block-length (FBL) coding regimes is known as the normal approximation (NA). It is therefore of substantial interest to study the optimization of the FBL rate in multi-user multiple-input multiple-output (MIMO) systems, in which each user may transmit and/or receive multiple data streams. Hence, we formulate a general optimization problem for improving the spectral and energy efficiency of multi-user MIMO-aided ultra-reliable low-latency communication (URLLC) systems, which are assisted by reconfigurable intelligent surfaces (RISs). We show that a RIS is capable of substantially improving the performance of multi-user MIMO-aided URLLC systems. Moreover, the benefits of RIS increase as the packet length and/or the tolerable bit error rate are reduced. This reveals that RISs can be even more beneficial in URLLC systems for improving the FBL rates than in conventional systems approaching Shannon rates.
Abstract:An emerging technology to enhance the spectral efficiency (SE) and energy efficiency (EE) of wireless communication systems is reconfigurable intelligent surface (RIS), which is shown to be very powerful in single-carrier systems. However, in multi-user orthogonal frequency division multiplexing (OFDM) systems, RIS may not be as promising as in single-carrier systems since an independent optimization of RIS elements at each sub-carrier is impossible in multi-carrier systems. Thus, this paper investigates the performance of various RIS technologies like regular (reflective and passive), simultaneously transmit and reflect (STAR), and multi-sector beyond diagonal (BD) RIS in multi-user multiple-input multiple-output (MIMO) OFDM broadcast channels (BC). This requires to formulate and solve a joint MIMO precoding and RIS optimization problem. The obtained solution reveals that RIS can significantly improve the system performance even when the number of RIS elements is relatively low. Moreover, we develop resource allocation schemes for STAR-RIS and multi-sector BD-RIS in MIMO OFDM BCs, and show that these RIS technologies can outperform a regular RIS, especially when the regular RIS cannot assist the communications for all the users.
Abstract:In this work, we propose MagicDance, a diffusion-based model for 2D human motion and facial expression transfer on challenging human dance videos. Specifically, we aim to generate human dance videos of any target identity driven by novel pose sequences while keeping the identity unchanged. To this end, we propose a two-stage training strategy to disentangle human motions and appearance (e.g., facial expressions, skin tone and dressing), consisting of the pretraining of an appearance-control block and fine-tuning of an appearance-pose-joint-control block over human dance poses of the same dataset. Our novel design enables robust appearance control with temporally consistent upper body, facial attributes, and even background. The model also generalizes well on unseen human identities and complex motion sequences without the need for any fine-tuning with additional data with diverse human attributes by leveraging the prior knowledge of image diffusion models. Moreover, the proposed model is easy to use and can be considered as a plug-in module/extension to Stable Diffusion. We also demonstrate the model's ability for zero-shot 2D animation generation, enabling not only the appearance transfer from one identity to another but also allowing for cartoon-like stylization given only pose inputs. Extensive experiments demonstrate our superior performance on the TikTok dataset.