Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Huan Hu

SLearnLLM: A Self-Learning Framework for Efficient Domain-Specific Adaptation of Large Language Models

May 23, 2025

Xiang Liu, Zhaoxiang Liu, Peng Wang, Kohou Wang, Huan Hu, Kai Wang, Shiguo Lian

Abstract:When using supervised fine-tuning (SFT) to adapt large language models (LLMs) to specific domains, a significant challenge arises: should we use the entire SFT dataset for fine-tuning? Common practice often involves fine-tuning directly on the entire dataset due to limited information on the LLM's past training data. However, if the SFT dataset largely overlaps with the model's existing knowledge, the performance gains are minimal, leading to wasted computational resources. Identifying the unknown knowledge within the SFT dataset and using it to fine-tune the model could substantially improve the training efficiency. To address this challenge, we propose a self-learning framework for LLMs inspired by human learning pattern. This framework takes a fine-tuning (SFT) dataset in a specific domain as input. First, the LLMs answer the questions in the SFT dataset. The LLMs then objectively grade the responses and filter out the incorrectly answered QA pairs. Finally, we fine-tune the LLMs based on this filtered QA set. Experimental results in the fields of agriculture and medicine demonstrate that our method substantially reduces training time while achieving comparable improvements to those attained with full dataset fine-tuning. By concentrating on the unknown knowledge within the SFT dataset, our approach enhances the efficiency of fine-tuning LLMs.

* 12 pages, 5 figures

Via

Access Paper or Ask Questions

A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis

Mar 10, 2025

Xiang Liu, Zhaoxiang Liu, Huan Hu, Zezhou Chen, Kohou Wang, Kai Wang, Shiguo Lian

Abstract:While conversational generative AI has shown considerable potential in enhancing decision-making for agricultural professionals, its exploration has predominantly been anchored in text-based interactions. The evolution of multimodal conversational AI, leveraging vast amounts of image-text data from diverse sources, marks a significant stride forward. However, the application of such advanced vision-language models in the agricultural domain, particularly for crop disease diagnosis, remains underexplored. In this work, we present the crop disease domain multimodal (CDDM) dataset, a pioneering resource designed to advance the field of agricultural research through the application of multimodal learning techniques. The dataset comprises 137,000 images of various crop diseases, accompanied by 1 million question-answer pairs that span a broad spectrum of agricultural knowledge, from disease identification to management practices. By integrating visual and textual data, CDDM facilitates the development of sophisticated question-answering systems capable of providing precise, useful advice to farmers and agricultural professionals. We demonstrate the utility of the dataset by finetuning state-of-the-art multimodal models, showcasing significant improvements in crop disease diagnosis. Specifically, we employed a novel finetuning strategy that utilizes low-rank adaptation (LoRA) to finetune the visual encoder, adapter and language model simultaneously. Our contributions include not only the dataset but also a finetuning strategy and a benchmark to stimulate further research in agricultural technology, aiming to bridge the gap between advanced AI techniques and practical agricultural applications. The dataset is available at https: //github.com/UnicomAI/UnicomBenchmark/tree/main/CDDMBench.

* Accepted by ECCV 2024 (14 pages, 8 figures)

Via

Access Paper or Ask Questions

Optimizing for the Shortest Path in Denoising Diffusion Model

Mar 06, 2025

Ping Chen, Xingpeng Zhang, Zhaoxiang Liu, Huan Hu, Xiang Liu, Kai Wang, Min Wang, Yanlin Qian, Shiguo Lian

Abstract:In this research, we propose a novel denoising diffusion model based on shortest-path modeling that optimizes residual propagation to enhance both denoising efficiency and quality. Drawing on Denoising Diffusion Implicit Models (DDIM) and insights from graph theory, our model, termed the Shortest Path Diffusion Model (ShortDF), treats the denoising process as a shortest-path problem aimed at minimizing reconstruction error. By optimizing the initial residuals, we improve the efficiency of the reverse diffusion process and the quality of the generated samples. Extensive experiments on multiple standard benchmarks demonstrate that ShortDF significantly reduces diffusion time (or steps) while enhancing the visual fidelity of generated samples compared to prior arts. This work, we suppose, paves the way for interactive diffusion-based applications and establishes a foundation for rapid data generation. Code is available at https://github.com/UnicomAI/ShortDF

* Accepet by CVPR 2025 (10 pages, 6 figures)

Via

Access Paper or Ask Questions

Training Interactive Agent in Large FPS Game Map with Rule-enhanced Reinforcement Learning

Oct 07, 2024

Chen Zhang, Huan Hu, Yuan Zhou, Qiyang Cao, Ruochen Liu, Wenya Wei, Elvis S. Liu

Abstract:In the realm of competitive gaming, 3D first-person shooter (FPS) games have gained immense popularity, prompting the development of game AI systems to enhance gameplay. However, deploying game AI in practical scenarios still poses challenges, particularly in large-scale and complex FPS games. In this paper, we focus on the practical deployment of game AI in the online multiplayer competitive 3D FPS game called Arena Breakout, developed by Tencent Games. We propose a novel gaming AI system named Private Military Company Agent (PMCA), which is interactable within a large game map and engages in combat with players while utilizing tactical advantages provided by the surrounding terrain. To address the challenges of navigation and combat in modern 3D FPS games, we introduce a method that combines navigation mesh (Navmesh) and shooting-rule with deep reinforcement learning (NSRL). The integration of Navmesh enhances the agent's global navigation capabilities while shooting behavior is controlled using rule-based methods to ensure controllability. NSRL employs a DRL model to predict when to enable the navigation mesh, resulting in a diverse range of behaviors for the game AI. Customized rewards for human-like behaviors are also employed to align PMCA's behavior with that of human players.

Via

Access Paper or Ask Questions

Data-Centric AI Paradigm Based on Application-Driven Fine-grained Dataset Design

Sep 23, 2022

Huan Hu, Yajie Cui, Zhaoxiang Liu, Shiguo Lian

Figure 1 for Data-Centric AI Paradigm Based on Application-Driven Fine-grained Dataset Design

Figure 2 for Data-Centric AI Paradigm Based on Application-Driven Fine-grained Dataset Design

Figure 3 for Data-Centric AI Paradigm Based on Application-Driven Fine-grained Dataset Design

Figure 4 for Data-Centric AI Paradigm Based on Application-Driven Fine-grained Dataset Design

Abstract:Deep learning has a wide range of applications in industrial scenario, but reducing false alarm (FA) remains a major difficulty. Optimizing network architecture or network parameters is used to tackle this challenge in academic circles, while ignoring the essential characteristics of data in application scenarios, which often results in increased FA in new scenarios. In this paper, we propose a novel paradigm for fine-grained design of datasets, driven by industrial applications. We flexibly select positive and negative sample sets according to the essential features of the data and application requirements, and add the remaining samples to the training set as uncertainty classes. We collect more than 10,000 mask-wearing recognition samples covering various application scenarios as our experimental data. Compared with the traditional data design methods, our method achieves better results and effectively reduces FA. We make all contributions available to the research community for broader use. The contributions will be available at https://github.com/huh30/OpenDatasets.

Via

Access Paper or Ask Questions

Video synthesis of human upper body with realistic face

Sep 12, 2019

Zhaoxiang Liu, Huan Hu, Zipeng Wang, Kai Wang, Jinqiang Bai, Shiguo Lian

Figure 1 for Video synthesis of human upper body with realistic face

Figure 2 for Video synthesis of human upper body with realistic face

Figure 3 for Video synthesis of human upper body with realistic face

Figure 4 for Video synthesis of human upper body with realistic face

Abstract:This paper presents a generative adversarial learning-based human upper body video synthesis approach to generate an upper body video of target person that is consistent with the body motion, face expression, and pose of the person in source video. We use upper body keypoints, facial action units and poses as intermediate representations between source video and target video. Instead of directly transferring the source video to the target video, we firstly map the source person's facial action units and poses into the target person's facial landmarks, then combine the normalized upper body keypoints and generated facial landmarks with spatio-temporal smoothing to generate the corresponding target video's image. Experimental results demonstrated the effectiveness of our method.

* 3 pages, 4 figures,Accepted by ISMAR 2019

Via

Access Paper or Ask Questions

A Neural Virtual Anchor Synthesizer based on Seq2Seq and GAN Models

Sep 12, 2019

Zipeng Wang, Zhaoxiang Liu, Zezhou Chen, Huan Hu, Shiguo Lian

Figure 1 for A Neural Virtual Anchor Synthesizer based on Seq2Seq and GAN Models

Figure 2 for A Neural Virtual Anchor Synthesizer based on Seq2Seq and GAN Models

Figure 3 for A Neural Virtual Anchor Synthesizer based on Seq2Seq and GAN Models

Figure 4 for A Neural Virtual Anchor Synthesizer based on Seq2Seq and GAN Models

Abstract:This paper presents a novel framework to generate realistic face video of an anchor, who is reading certain news. This task is also known as Virtual Anchor. Given some paragraphs of words, we first utilize a pretrained Word2Vec model to embed each word into a vector; then we utilize a Seq2Seq-based model to translate these word embeddings into action units and head poses of the target anchor; these action units and head poses will be concatenated with facial landmarks as well as the former $n$ synthesized frames, and the concatenation serves as input of a Pix2PixHD-based model to synthesize realistic facial images for the virtual anchor. The experimental results demonstrate our framework is feasible for the synthesis of virtual anchor.

* Accepted to ISMAR 2019

Via

Access Paper or Ask Questions

A Realistic Face-to-Face Conversation System based on Deep Neural Networks

Aug 21, 2019

Zezhou Chen, Zhaoxiang Liu, Huan Hu, Jinqiang Bai, Shiguo Lian, Fuyuan Shi, Kai Wang

Figure 1 for A Realistic Face-to-Face Conversation System based on Deep Neural Networks

Figure 2 for A Realistic Face-to-Face Conversation System based on Deep Neural Networks

Figure 3 for A Realistic Face-to-Face Conversation System based on Deep Neural Networks

Figure 4 for A Realistic Face-to-Face Conversation System based on Deep Neural Networks

Abstract:To improve the experiences of face-to-face conversation with avatar, this paper presents a novel conversation system. It is composed of two sequence-to-sequence models respectively for listening and speaking and a Generative Adversarial Network (GAN) based realistic avatar synthesizer. The models exploit the facial action and head pose to learn natural human reactions. Based on the models' output, the synthesizer uses the Pixel2Pixel model to generate realistic facial images. To show the improvement of our system, we use a 3D model based avatar driving scheme as a reference. We train and evaluate our neural networks with the data from ESPN shows. Experimental results show that our conversation system can generate natural facial reactions and realistic facial images.

* Accepted to ICCV 2019 workshop

Via

Access Paper or Ask Questions

Fine-grained Attention-based Video Face Recognition

May 06, 2019

Zhaoxiang Liu, Huan Hu, Jinqiang Bai, Shaohua Li, Shiguo Lian

Figure 1 for Fine-grained Attention-based Video Face Recognition

Figure 2 for Fine-grained Attention-based Video Face Recognition

Figure 3 for Fine-grained Attention-based Video Face Recognition

Figure 4 for Fine-grained Attention-based Video Face Recognition

Abstract:This paper aims to learn a compact representation of a video for video face recognition task. We make the following contributions: first, we propose a meta attention-based aggregation scheme which adaptively and fine-grained weighs the feature along each feature dimension among all frames to form a compact and discriminative representation. It makes the best to exploit the valuable or discriminative part of each frame to promote the performance of face recognition, without discarding or despising low quality frames as usual methods do. Second, we build a feature aggregation network comprised of a feature embedding module and a feature aggregation module. The embedding module is a convolutional neural network used to extract a feature vector from a face image, while the aggregation module consists of cascaded two meta attention blocks which adaptively aggregate the feature vectors into a single fixed-length representation. The network can deal with arbitrary number of frames, and is insensitive to frame order. Third, we validate the performance of proposed aggregation scheme. Experiments on publicly available datasets, such as YouTube face dataset and IJB-A dataset, show the effectiveness of our method, and it achieves competitive performances on both the verification and identification protocols.

* 9 pages, 6 figures

Via

Access Paper or Ask Questions