Abstract:In the realm of competitive gaming, 3D first-person shooter (FPS) games have gained immense popularity, prompting the development of game AI systems to enhance gameplay. However, deploying game AI in practical scenarios still poses challenges, particularly in large-scale and complex FPS games. In this paper, we focus on the practical deployment of game AI in the online multiplayer competitive 3D FPS game called Arena Breakout, developed by Tencent Games. We propose a novel gaming AI system named Private Military Company Agent (PMCA), which is interactable within a large game map and engages in combat with players while utilizing tactical advantages provided by the surrounding terrain. To address the challenges of navigation and combat in modern 3D FPS games, we introduce a method that combines navigation mesh (Navmesh) and shooting-rule with deep reinforcement learning (NSRL). The integration of Navmesh enhances the agent's global navigation capabilities while shooting behavior is controlled using rule-based methods to ensure controllability. NSRL employs a DRL model to predict when to enable the navigation mesh, resulting in a diverse range of behaviors for the game AI. Customized rewards for human-like behaviors are also employed to align PMCA's behavior with that of human players.
Abstract:Deep learning has a wide range of applications in industrial scenario, but reducing false alarm (FA) remains a major difficulty. Optimizing network architecture or network parameters is used to tackle this challenge in academic circles, while ignoring the essential characteristics of data in application scenarios, which often results in increased FA in new scenarios. In this paper, we propose a novel paradigm for fine-grained design of datasets, driven by industrial applications. We flexibly select positive and negative sample sets according to the essential features of the data and application requirements, and add the remaining samples to the training set as uncertainty classes. We collect more than 10,000 mask-wearing recognition samples covering various application scenarios as our experimental data. Compared with the traditional data design methods, our method achieves better results and effectively reduces FA. We make all contributions available to the research community for broader use. The contributions will be available at https://github.com/huh30/OpenDatasets.
Abstract:This paper presents a generative adversarial learning-based human upper body video synthesis approach to generate an upper body video of target person that is consistent with the body motion, face expression, and pose of the person in source video. We use upper body keypoints, facial action units and poses as intermediate representations between source video and target video. Instead of directly transferring the source video to the target video, we firstly map the source person's facial action units and poses into the target person's facial landmarks, then combine the normalized upper body keypoints and generated facial landmarks with spatio-temporal smoothing to generate the corresponding target video's image. Experimental results demonstrated the effectiveness of our method.
Abstract:This paper presents a novel framework to generate realistic face video of an anchor, who is reading certain news. This task is also known as Virtual Anchor. Given some paragraphs of words, we first utilize a pretrained Word2Vec model to embed each word into a vector; then we utilize a Seq2Seq-based model to translate these word embeddings into action units and head poses of the target anchor; these action units and head poses will be concatenated with facial landmarks as well as the former $n$ synthesized frames, and the concatenation serves as input of a Pix2PixHD-based model to synthesize realistic facial images for the virtual anchor. The experimental results demonstrate our framework is feasible for the synthesis of virtual anchor.
Abstract:To improve the experiences of face-to-face conversation with avatar, this paper presents a novel conversation system. It is composed of two sequence-to-sequence models respectively for listening and speaking and a Generative Adversarial Network (GAN) based realistic avatar synthesizer. The models exploit the facial action and head pose to learn natural human reactions. Based on the models' output, the synthesizer uses the Pixel2Pixel model to generate realistic facial images. To show the improvement of our system, we use a 3D model based avatar driving scheme as a reference. We train and evaluate our neural networks with the data from ESPN shows. Experimental results show that our conversation system can generate natural facial reactions and realistic facial images.
Abstract:This paper aims to learn a compact representation of a video for video face recognition task. We make the following contributions: first, we propose a meta attention-based aggregation scheme which adaptively and fine-grained weighs the feature along each feature dimension among all frames to form a compact and discriminative representation. It makes the best to exploit the valuable or discriminative part of each frame to promote the performance of face recognition, without discarding or despising low quality frames as usual methods do. Second, we build a feature aggregation network comprised of a feature embedding module and a feature aggregation module. The embedding module is a convolutional neural network used to extract a feature vector from a face image, while the aggregation module consists of cascaded two meta attention blocks which adaptively aggregate the feature vectors into a single fixed-length representation. The network can deal with arbitrary number of frames, and is insensitive to frame order. Third, we validate the performance of proposed aggregation scheme. Experiments on publicly available datasets, such as YouTube face dataset and IJB-A dataset, show the effectiveness of our method, and it achieves competitive performances on both the verification and identification protocols.