Keld Lundgaard

Sequential Recommendation via Adaptive Robust Attention with Multi-dimensional Embeddings

Sep 08, 2024

Linsey Pang, Amir Hossein Raffiee, Wei Liu, Keld Lundgaard

Abstract: Sequential recommendation models have achieved state-of-the-art performance using self-attention mechanisms. It has since been found that moving beyond using only item ID and positional embeddings leads to a significant accuracy boost when predicting the next item. Recent literature reports that a multi-dimensional kernel embedding with temporal contextual kernels, designed to capture users' diverse behavioral patterns, yields a substantial performance improvement. In this study, we further improve the sequential recommender model's robustness and generalization by introducing a mix-attention mechanism with layer-wise noise injection (LNI) regularization. We refer to our proposed model as the adaptive robust sequential recommendation framework (ADRRec), and demonstrate through extensive experiments that it outperforms existing self-attention architectures.
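
The abstract does not specify how layer-wise noise injection is implemented, so the following is only a minimal sketch of the general idea, under the assumption that zero-mean Gaussian noise is added to each layer's output during training and disabled at inference. The function names, the `sigma` parameter, and the toy scaling layers are all hypothetical and are not taken from the ADRRec paper.

```python
import random

def noisy_forward(x, layers, sigma=0.1, training=True, rng=None):
    """Run x through each layer; in training mode, perturb every layer's output.

    Assumption: additive zero-mean Gaussian noise with standard deviation sigma.
    The paper may use a different noise distribution or injection point.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    for layer in layers:
        x = layer(x)
        if training and sigma > 0:
            x = [v + rng.gauss(0.0, sigma) for v in x]
    return x

def scale(factor):
    """Hypothetical stand-in for a real network layer: multiplies every element."""
    return lambda xs: [factor * v for v in xs]

layers = [scale(2.0), scale(0.5)]
clean = noisy_forward([1.0, -1.0], layers, training=False)
noisy = noisy_forward([1.0, -1.0], layers, sigma=0.1, training=True)
```

With `training=False` the input passes through unperturbed, while the training-mode output differs slightly at every layer, which is the regularizing effect such a scheme relies on.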

Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation

Aug 12, 2023

Zhichao Wang, Mengyu Dai, Keld Lundgaard

Abstract: The advent of ChatGPT has introduced innovative methods for information gathering and analysis. However, the information provided by ChatGPT is limited to text, and the visualization of this information remains constrained. Previous research has explored zero-shot text-to-video (TTV) approaches to transform text into videos. However, these methods lacked control over the identity of the generated audio, i.e., they were not identity-agnostic, which hindered their effectiveness. To address this limitation, we propose a novel two-stage framework for person-agnostic video cloning, specifically focusing on TTV generation. In the first stage, we leverage pretrained zero-shot models to achieve text-to-speech (TTS) conversion. In the second stage, an audio-driven talking head generation method is employed to produce compelling videos given the audio generated in the first stage. This paper presents a comparative analysis of different TTS and audio-driven talking head generation methods, identifying the most promising approach for future research and development. Audio and video samples can be found at the following link: https://github.com/ZhichaoWang970201/Text-to-Video/tree/main.

* 6 pages
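
The two-stage structure described in the abstract above can be sketched as a simple composition of two models. This is an illustrative outline only: the `Audio` and `Video` types, the `reference_voice` and `reference_face` inputs, and the stub models are assumptions made for the example and do not correspond to any specific TTS or talking-head system evaluated in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Audio:
    samples: List[float]
    sample_rate: int

@dataclass
class Video:
    frames: List[str]
    audio: Audio

def text_to_video(
    text: str,
    reference_voice: str,
    reference_face: str,
    tts: Callable[[str, str], Audio],
    talking_head: Callable[[Audio, str], Video],
) -> Video:
    """Stage 1: text -> speech. Stage 2: speech + face -> talking-head video."""
    audio = tts(text, reference_voice)          # stage 1 (zero-shot TTS)
    return talking_head(audio, reference_face)  # stage 2 (audio-driven video)

# Hypothetical stubs standing in for real pretrained models.
def stub_tts(text: str, reference_voice: str) -> Audio:
    # One silent sample per character, purely for illustration.
    return Audio(samples=[0.0] * len(text), sample_rate=16000)

def stub_talking_head(audio: Audio, reference_face: str) -> Video:
    # One copy of the reference face per audio sample, purely for illustration.
    return Video(frames=[reference_face] * len(audio.samples), audio=audio)

video = text_to_video("hello", "voice.wav", "face.png", stub_tts, stub_talking_head)
```

Passing the two stages in as callables reflects the comparative setup the abstract mentions: different TTS and talking-head methods can be swapped without changing the pipeline itself.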
