Abstract:We present a novel approach for synthesizing 3D facial motions from audio sequences using key motion embeddings. Despite recent advancements in data-driven techniques, accurately mapping between audio signals and 3D facial meshes remains challenging. Direct regression of the entire sequence often leads to over-smoothed results due to the ill-posed nature of the problem. To this end, we propose a progressive learning mechanism that generates 3D facial animations by introducing key motion capture to decrease cross-modal mapping uncertainty and learning complexity. Concretely, our method integrates linguistic and data-driven priors through two modules: the linguistic-based key motion acquisition and the cross-modal motion completion. The former identifies key motions and learns the associated 3D facial expressions, ensuring accurate lip-speech synchronization. The latter extends key motions into a full sequence of 3D talking faces guided by audio features, improving temporal coherence and audio-visual consistency. Extensive experimental comparisons against existing state-of-the-art methods demonstrate the superiority of our approach in generating more vivid and consistent talking face animations. Consistent enhancements in results through the integration of our proposed learning scheme with existing methods underscore the efficacy of our approach. Our code and weights will be at the project website: \url{https://github.com/ffxzh/KMTalk}.
Abstract:Recent progress of deep generative models in the vision and language domain has stimulated significant interest in more structured data generation such as molecules. However, beyond generating new random molecules, efficient exploration and a comprehensive understanding of the vast chemical space are of great importance to molecular science and applications in drug design and materials discovery. In this paper, we propose a new framework, ChemFlow, to traverse chemical space through navigating the latent space learned by molecule generative models through flows. We introduce a dynamical system perspective that formulates the problem as learning a vector field that transports the mass of the molecular distribution to the region with desired molecular properties or structure diversity. Under this framework, we unify previous approaches on molecule latent space traversal and optimization and propose alternative competing methods incorporating different physical priors. We validate the efficacy of ChemFlow on molecule manipulation and single- and multi-objective molecule optimization tasks under both supervised and unsupervised molecular discovery settings. Codes and demos are publicly available on GitHub at https://github.com/garywei944/ChemFlow.
Abstract:Since the inception of the Transformer architecture in 2017, Large Language Models (LLMs) such as GPT and BERT have evolved significantly, impacting various industries with their advanced capabilities in language understanding and generation. These models have shown potential to transform the medical field, highlighting the necessity for specialized evaluation frameworks to ensure their effective and ethical deployment. This comprehensive survey delineates the extensive application and requisite evaluation of LLMs within healthcare, emphasizing the critical need for empirical validation to fully exploit their capabilities in enhancing healthcare outcomes. Our survey is structured to provide an in-depth analysis of LLM applications across clinical settings, medical text data processing, research, education, and public health awareness. We begin by exploring the roles of LLMs in different medical applications, detailing how they are evaluated based on their performance in tasks such as clinical application, medical text data processing, information retrieval, data analysis, medical scientific writing, educational content generation etc. The subsequent sections delve into the methodologies employed in these evaluations, discussing the benchmarks and metrics used to assess the models' effectiveness, accuracy, and ethical alignment. Through this survey, we aim to equip healthcare professionals, researchers, and policymakers with a comprehensive understanding of the potential strengths and limitations of LLMs in medical applications. By providing detailed insights into the evaluation processes and the challenges faced in integrating LLMs into healthcare, this survey seeks to guide the responsible development and deployment of these powerful models, ensuring they are harnessed to their full potential while maintaining stringent ethical standards.
Abstract:The integration of Large Language Models (LLMs) like GPT-4 into traditional Natural Language Processing (NLP) tasks has opened new avenues for enhancing model performance while reducing the reliance on extensive human annotations. This paper presents a novel approach that leverages the Chain of Thought (CoT) prompting technique to distill knowledge from GPT-4, subsequently applying it to improve the efficiency and effectiveness of a smaller model, BERT, on Named Entity Recognition (NER) tasks. Our method involves a two-phase training process: initially employing GPT-4 annotated data for pre-training and then refining the model with a combination of distilled and original human-annotated data. The results demonstrate that our mixed-training strategy significantly outperforms models trained solely on human annotations, achieving superior F1-scores and showcasing a cost-effective solution for resource-limited or closed-network settings. The study also discusses the challenges encountered, such as LLM output variability and the tendency towards hallucinations, proposing future work directions to enhance prompt design and annotation selection. Our findings indicate a promising synergy between LLM insights and traditional NLP techniques, paving the way for more accessible and robust NLP applications.
Abstract:We introduce a framework for AI-based medical consultation system with knowledge graph embedding and reinforcement learning components and its implement. Our implement of this framework leverages knowledge organized as a graph to have diagnosis according to evidence collected from patients recurrently and dynamically. According to experiment we designed for evaluating its performance, it archives a good result. More importantly, for getting better performance, researchers can implement it on this framework based on their innovative ideas, well designed experiments and even clinical trials.