Abstract:Existing 3D facial emotion modeling has been constrained by limited emotion classes and insufficient datasets. This paper introduces "Emo3D", an extensive "Text-Image-Expression dataset" spanning a wide spectrum of human emotions, each paired with images and 3D blendshapes. Leveraging Large Language Models (LLMs), we generate a diverse array of textual descriptions, facilitating the capture of a broad spectrum of emotional expressions. Using this unique dataset, we conduct a comprehensive evaluation of fine-tuned language-based models and vision-language models such as Contrastive Language-Image Pretraining (CLIP) for 3D facial expression synthesis. We also introduce a new evaluation metric for this task to more directly measure the conveyed emotion. Our new evaluation metric, Emo3D, demonstrates its superiority over Mean Squared Error (MSE) metrics in assessing visual-text alignment and semantic richness in 3D facial expressions associated with human emotions. "Emo3D" has broad applications in animation design, virtual reality, and emotional human-computer interaction.
Abstract:In this paper, a generalization of the deep learning-aided joint source-channel coding (Deep-JSCC) approach to secure communications is studied. We propose an end-to-end (E2E) learning-based approach for secure communication against multiple eavesdroppers over complex-valued fading channels. Both colluding and non-colluding eavesdroppers are studied. In the colluding strategy, eavesdroppers share their logits to collaboratively infer private attributes using an ensemble learning method, while in the non-colluding setup they act alone. The goal is to prevent eavesdroppers from inferring private (sensitive) information about the transmitted images, while delivering the images to a legitimate receiver with minimum distortion. By generalizing the ideas of the privacy funnel and wiretap channel coding, the trade-off between image recovery at the legitimate node and information leakage to the eavesdroppers is characterized. To solve this secrecy funnel framework, we implement deep neural networks (DNNs) to realize a data-driven secure communication scheme that does not rely on a specific data distribution. Simulations over the CIFAR-10 dataset verify the secrecy-utility trade-off. The adversarial accuracy of the eavesdroppers is also studied over Rayleigh fading, Nakagami-m, and AWGN channels to verify the generalization of the proposed scheme. Our experiments show that employing the proposed secure neural encoding can decrease the adversarial accuracy by 28%.
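The difference between colluding and non-colluding eavesdroppers described above can be illustrated with a small sketch. It assumes that collusion means averaging the per-class logits of all eavesdroppers before taking the most likely class; the abstract only states that logits are shared through an ensemble learning method, so the averaging rule and the function names `colluding_prediction` and `non_colluding_predictions` are illustrative assumptions, not the paper's implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def colluding_prediction(logits_per_eve):
    """Assumed ensemble rule: average the logits of all eavesdroppers,
    then return the arg-max class and the resulting probabilities."""
    n_eves = len(logits_per_eve)
    n_classes = len(logits_per_eve[0])
    avg = [sum(l[c] for l in logits_per_eve) / n_eves for c in range(n_classes)]
    probs = softmax(avg)
    best = max(range(n_classes), key=lambda c: probs[c])
    return best, probs


def non_colluding_predictions(logits_per_eve):
    """Each eavesdropper acts alone and picks its own arg-max class."""
    return [max(range(len(l)), key=lambda c: l[c]) for l in logits_per_eve]


# Hypothetical logits from three eavesdroppers over three private classes.
eve_logits = [
    [2.0, 1.0, 0.0],
    [0.0, 1.5, 0.2],
    [0.1, 2.0, 0.0],
]
print(non_colluding_predictions(eve_logits))  # [0, 1, 1]
print(colluding_prediction(eve_logits)[0])    # 1
```

In this toy case the first eavesdropper alone would guess class 0, while the pooled logits favor class 1, which is the kind of collaborative inference a secure encoder would need to withstand.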
Abstract:Physical layer key generation (PLKG) can significantly enhance the security of classic encryption schemes by enabling them to change their secret keys significantly faster and more efficiently. However, because PLKG techniques rely on the channel medium, reaching a high secret key rate is challenging in static environments. Recently, exploiting an intelligent reflecting surface (IRS) as a means to induce randomness in static wireless channels has received significant research interest. However, the impact of spatial correlation between the IRS elements has rarely been studied. Specifically, in this contribution we consider, for the first time, a spatially correlated IRS intended to enhance the secret key generation (SKG) rate in a static medium. Closed-form analytical expressions for the SKG rate are derived for two cases: random phase shifts, and an equal random phase shift for all the IRS elements. We also analyze the temporal correlation between the channel samples to ensure the randomness of the generated secret key sequence. We further formulate an optimization problem in which we determine the optimal portion of time within a coherence interval dedicated to direct and indirect channel estimation. We show the accuracy and fast convergence of our proposed sequential convex programming (SCP) based algorithm and discuss the various parameters affecting spatially correlated IRS-assisted PLKG.
Abstract:Recently, simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) have emerged as a novel technology that facilitates sustainable communication by providing 360° coverage and new degrees-of-freedom (DoF) for manipulating signal propagation as well as simultaneous wireless information and power transfer (SWIPT). Inspired by these applications, this paper presents a novel STAR-RIS-aided secure SWIPT system for downlink multiple-input single-output (MISO) rate-splitting multiple access (RSMA) networks. The transmitter concurrently communicates with the information receivers (IRs) and sends energy to untrusted energy receivers (UERs), which are also able to wiretap the IR streams. The paper assumes that the channel state information (CSI) of the IRs is known at the transmitter, whereas only imperfect CSI (ICSI) for the UERs is available. The paper aims to maximize the achievable worst-case sum secrecy rate (WCSSR) of the IRs under a total transmit power constraint, a sum energy constraint for the UERs, and constraints on the transmission and reflection coefficients by jointly optimizing the precoders and the transmission and reflection beamforming at the STAR-RIS. The formulated problem is non-convex with intricately coupled variables; to tackle this challenge, a suboptimal two-step iterative algorithm based on the sequential parametric convex approximation (SPCA) method is proposed. Specifically, the precoders and the transmission and reflection beamforming vectors are optimized alternately. Simulations show that the proposed RSMA-based algorithm in a STAR-RIS-aided network can improve both the secrecy of the confidential information and the overall spectral efficiency.
Abstract:The Cooperative Rate-Splitting (CRS) scheme evolves from conventional Rate Splitting (RS) and relies on relaying users forwarding a portion of the RS message. In terms of secrecy enhancement, it has been shown that CRS outperforms its non-cooperative counterpart for a two-user Multiple Input Single Output (MISO) Broadcast Channel (BC). Given the massive connectivity requirement of 6G, we generalize the existing secure two-user CRS framework to a multi-user framework, where the highest-security users must be selected as the relay nodes. This paper addresses the problem of maximizing the Worst-Case Secrecy Rate (WCSR) in a UAV-aided downlink network where a multi-antenna UAV Base-Station (UAV-BS) serves a group of users in the presence of an external eavesdropper (Eve). We consider a practical scenario in which only imperfect channel state information of Eve is available at the UAV-BS. Accordingly, we conceive a robust and secure resource allocation algorithm, which maximizes the WCSR by jointly optimizing the Secure Relaying User Selection (SRUS) and the network parameter allocation, including the RS transmit precoders, message splitting variables, time slot sharing and power allocation. To circumvent the non-convexity caused by the discrete variables imposed by SRUS, we propose a two-stage algorithm in which the SRUS and the network parameter allocation are accomplished in two consecutive stages. For the SRUS, we study both centralized and distributed protocols. For jointly optimizing the network parameter allocation, we resort to the Sequential Parametric Convex Approximation (SPCA) algorithm. Our numerical results show that the proposed solution significantly outperforms the existing benchmarks in terms of the WCSR for a wide range of network loads.
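To make the worst-case secrecy rate objective concrete, the following minimal sketch uses the standard secrecy rate definition: the legitimate user's rate minus the eavesdropper's rate, floored at zero. The abstract does not specify the CSI uncertainty model, so the sketch assumes a finite set of candidate Eve SNRs and takes the worst case as the minimum over users against the strongest candidate. The function names are hypothetical.

```python
import math


def secrecy_rate(snr_user, snr_eve):
    """Standard secrecy rate in bit/s/Hz: [log2(1+SNR_user) - log2(1+SNR_eve)]^+."""
    return max(0.0, math.log2(1.0 + snr_user) - math.log2(1.0 + snr_eve))


def worst_case_secrecy_rate(snr_users, snr_eve_candidates):
    """Assumed worst case: the weakest user against the strongest
    eavesdropper SNR in a finite candidate set (a stand-in for the
    imperfect-CSI uncertainty region, which the abstract does not define)."""
    worst_eve = max(snr_eve_candidates)
    return min(secrecy_rate(s, worst_eve) for s in snr_users)


# Two users with linear SNRs 15 and 7; Eve's SNR is either 1 or 3.
print(worst_case_secrecy_rate([15.0, 7.0], [1.0, 3.0]))  # 1.0
```

A robust design in this spirit would choose precoders and power allocations that raise this minimum rather than the average rate.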
Abstract:UAVs are capable of improving the performance of next-generation wireless systems. Specifically, UAVs can be exploited as aerial base-stations (UAV-BS) for supporting legitimate ground users in remote uncovered areas or in environments temporarily requiring high capacity. However, their communication performance is prone to both channel estimation errors and potential eavesdropping. Hence, we investigate the effective secrecy throughput of the UAV-aided uplink, in which rate-splitting multiple access (RSMA) is employed by each legitimate user for secure transmission under the scenario of massive access. To maximize the effective network secrecy throughput in the uplink, the transmission rate vs. power allocation relationship is formulated as a max-min optimization problem, relying on realistic imperfect CSI of both the legitimate users and the potential eavesdroppers (Eves). We then propose a novel transformation of the associated probabilistic constraints for decoupling the variables, so that our design problem can be solved by alternately activating the related block coordinate descent programming. In the model considered, each user transmits a superposition of two messages to a UAV-BS, each having a different transmit power, and the UAV-BS uses a successive interference cancellation (SIC) technique to decode the received messages. Given the non-convexity of the problem, it is decoupled into a pair of sub-problems. In particular, we derive a closed-form expression for the optimal rate-splitting fraction of each user. Then, given the optimal rate-splitting fraction of each user, the \epsilon-constrained transmit power of each user is calculated by harnessing SPCA programming.
Abstract:The Temporal Action Localization (TAL) task, in which the aim is to predict the start and end of each action along with its class label, has many real-world applications. However, due to its complexity, results have lagged behind those of the action recognition task. The complexity stems from predicting precise start and end times for different actions in any video. In this paper, we propose a new network based on the Gated Recurrent Unit (GRU) and two novel post-processing ideas for the TAL task. Specifically, we propose a new design for the output layer of the GRU, resulting in the so-called GRU-Splitted model. Moreover, linear interpolation is used to generate action proposals with precise start and end times. Finally, to rank the generated proposals appropriately, we use a Learn to Rank (LTR) approach. We evaluated the performance of the proposed method on the Thumos14 dataset. Results show that the proposed method outperforms the state of the art. In particular, for the mean Average Precision (mAP) metric at an Intersection over Union (IoU) of 0.7, we obtain 27.52%, which is 5.12% better than that of state-of-the-art methods.
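Two ingredients mentioned above, linear interpolation for precise boundaries and the temporal IoU used in the mAP metric, can be sketched as follows. The abstract does not say how interpolation is applied, so this sketch assumes per-timestamp action scores and places each boundary where the linearly interpolated score crosses a threshold. The function names and the threshold value are illustrative assumptions, not the paper's method.

```python
def temporal_iou(a, b):
    """Intersection over Union of two (start, end) segments in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def interpolate_crossing(t0, s0, t1, s1, thr):
    """Time at which the line through (t0, s0) and (t1, s1) reaches thr.
    Callers guarantee s0 != s1 because thr lies between the two scores."""
    return t0 + (thr - s0) * (t1 - t0) / (s1 - s0)


def proposals_from_scores(times, scores, thr=0.5):
    """Turn sampled action scores into (start, end) proposals whose
    boundaries are refined by linear interpolation (assumed usage)."""
    proposals, start = [], None
    for i, score in enumerate(scores):
        above = score >= thr
        if above and start is None:
            # Rising edge: interpolate between the previous and current sample.
            start = times[0] if i == 0 else interpolate_crossing(
                times[i - 1], scores[i - 1], times[i], score, thr)
        elif not above and start is not None:
            # Falling edge: close the current proposal.
            end = interpolate_crossing(
                times[i - 1], scores[i - 1], times[i], score, thr)
            proposals.append((start, end))
            start = None
    if start is not None:
        proposals.append((start, times[-1]))
    return proposals


times = [0.0, 1.0, 2.0, 3.0, 4.0]
scores = [0.0, 1.0, 1.0, 1.0, 0.0]
print(proposals_from_scores(times, scores, thr=0.5))  # [(0.5, 3.5)]
```

Without interpolation the proposal would snap to the sample grid as (1.0, 3.0); sub-sample boundaries matter most at strict thresholds such as IoU 0.7.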
Abstract:In recent years, neural networks have achieved much success in various applications. The main challenge in training deep neural networks is the lack of sufficient data to improve the model's generalization and avoid overfitting. One solution is to generate new training samples. This paper proposes a novel data augmentation method for time series based on Dynamic Time Warping. The method is inspired by the concept that the warped parts of two time series have the same temporal properties. Applying the proposed approach with the recently introduced ResNet improves results on the 2018 UCR Time Series Classification Archive.
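One way to picture augmentation based on Dynamic Time Warping is sketched below: two series are aligned with the classic DTW recursion, and a new sample is formed by blending each point of the first series with the points of the second that are warped onto it. The abstract does not describe the exact generation rule, so the blending step, the `weight` parameter, and the function names are assumptions for illustration only.

```python
def dtw_path(x, y):
    """Classic DTW with absolute-difference cost.
    Returns (total cost, list of aligned index pairs (i, j))."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return cost[n][m], path


def dtw_augment(x, y, weight=0.5):
    """Assumed augmentation rule: blend each x[i] with the mean of the
    y values that DTW aligns to it. Output has the same length as x."""
    _, path = dtw_path(x, y)
    aligned = [[] for _ in x]
    for i, j in path:
        aligned[i].append(y[j])
    return [(1 - weight) * xi + weight * sum(a) / len(a)
            for xi, a in zip(x, aligned)]


# Two same-class series that differ only by a time shift.
x = [0.0, 1.0, 2.0, 1.0, 0.0]
y = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
distance, _ = dtw_path(x, y)
print(distance)  # 0.0
```

In practice the pair would be drawn from the same class so that the synthetic series inherits a valid label.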
Abstract:This paper investigates the physical layer security design of an untrusted relaying network where the source node coexists with a multi-antenna eavesdropper (Eve). While the communication relies on untrustworthy relay nodes to increase reliability, we aim to protect the confidentiality of information against combined eavesdropping attacks performed by both the untrusted relay nodes and Eve. Taking into account hardware impairments and power budget constraints, this paper presents a novel approach that jointly optimizes the relay beamformer and transmit powers to maximize the average secrecy rate (ASR). The resultant optimization problem is non-convex, and a suboptimal solution is obtained through the sequential parametric convex approximation (SPCA) method. To prevent failure due to infeasibility, we propose an iterative initialization algorithm that finds a feasible initial point of the original problem. To satisfy low latency, one of the main key performance indicators (KPIs) required in beyond 5G (B5G) communications, a computationally efficient data-driven approach is developed that exploits a deep learning model to improve the ASR while significantly reducing the computational burden. Simulation results assess the effect of different system parameters on the ASR performance as well as the effectiveness of the proposed deep learning solution in large-scale cases.