Yang Fang

Optimizing Web-Based AI Query Retrieval with GPT Integration in LangChain: A CoT-Enhanced Prompt Engineering Approach

Jun 18, 2025

Wenqi Guan, Yang Fang

Abstract: Large language models have radically changed how students learn remotely, among other aspects of educational activity. Current retrieval of remote-learning resources lacks the contextual depth needed to give comprehensive answers to complex student queries. This work proposes a novel approach to enhancing remote-learning retrieval by integrating GPT-based models within the LangChain framework. We make the system more intuitive and productive using chain-of-thought (CoT) reasoning and prompt engineering. The proposed framework emphasizes increasing the precision and relevance of retrieval results so that it returns comprehensive, contextually enriched explanations and resources that best suit each student's needs. We also assess the effectiveness of our approach against paradigmatic LLMs and report improvements in user satisfaction and learning outcomes.

YOLOv11-RGBT: Towards a Comprehensive Single-Stage Multispectral Object Detection Framework

Jun 18, 2025

Dahang Wan, Rongsheng Lu, Yang Fang, Xianli Lang, Shuangbao Shu, Jingjing Chen, Siyuan Shen, Ting Xu, Zecong Ye

Abstract: Multispectral object detection, which integrates information from multiple bands, can enhance detection accuracy and environmental adaptability, holding great application potential across various fields. Although existing methods have made progress in cross-modal interaction, low-light conditions, and model lightweighting, challenges remain, such as the lack of a unified single-stage framework, difficulty in balancing performance and fusion strategy, and unreasonable modality weight allocation. To address these, based on the YOLOv11 framework, we present YOLOv11-RGBT, a new comprehensive multimodal object detection framework. We designed six multispectral fusion modes and successfully applied them to models from YOLOv3 to YOLOv12 and RT-DETR. After reevaluating the importance of the two modalities, we proposed a P3 mid-fusion strategy and a multispectral controllable fine-tuning (MCF) strategy for multispectral models. These improvements optimize feature fusion, reduce redundancy and mismatches, and boost overall model performance. Experiments show our framework excels on three major open-source multispectral object detection datasets, including LLVIP and FLIR. In particular, the multispectral controllable fine-tuning strategy significantly enhanced model adaptability and robustness. On the FLIR dataset, it consistently improved YOLOv11 models' mAP by 3.41%-5.65%, reaching a maximum of 47.61%, verifying the effectiveness of the framework and strategies. The code is available at: https://github.com/wandahangFY/YOLOv11-RGBT.

* 29 pages, 8 figures. The errors in the first version have been corrected, and no new version will be submitted in the near future. The next version will include more experiments.

3rd Place: A Global and Local Dual Retrieval Solution to Facebook AI Image Similarity Challenge

Dec 29, 2021

Xinlong Sun, Yangyang Qin, Xuyuan Xu, Guoping Gong, Yang Fang, Yexin Wang

Abstract: As a basic task of computer vision, image similarity retrieval is facing the challenge of large-scale data and image copy attacks. This paper presents our 3rd place solution to the matching track of the Image Similarity Challenge (ISC) 2021 organized by Facebook AI. We propose a multi-branch retrieval method that combines global descriptors and local descriptors to cover all attack cases. Specifically, we attempt many strategies to optimize global descriptors, including abundant data augmentations, self-supervised learning with a single Transformer model, and overlay detection preprocessing. Moreover, we introduce the robust SIFT feature and GPU Faiss for local retrieval, which makes up for the shortcomings of the global retrieval. Finally, a KNN-matching algorithm is used to judge the match and merge scores. We show some ablation experiments of our method, which reveal the complementary advantages of global and local features.

* This is the 3rd place solution for Facebook Image Similarity Challenge and NIPS2021 Workshop. The current first draft version will be updated later
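
The global-descriptor branch described in this abstract amounts to nearest-neighbour search over one embedding vector per image. The following is only a minimal sketch of that idea, not the authors' pipeline: it uses brute-force cosine similarity in pure Python as a stand-in for a Faiss index, and the names `cosine`, `top_k`, and the toy two-dimensional vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_k(query, references, k=2):
    """Return the k reference names most similar to the query, best first."""
    scored = [(cosine(query, vec), name) for name, vec in references.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(name, score) for score, name in scored[:k]]

# Toy "global descriptors" (hypothetical values, one vector per reference image).
references = {
    "ref_a": [1.0, 0.0],
    "ref_b": [0.0, 1.0],
    "ref_c": [1.0, 1.0],
}
matches = top_k([1.0, 0.1], references, k=2)
print(matches)
```

In a large-scale setting the linear scan would be replaced by an approximate or GPU-backed index, and candidates from the local-descriptor branch would be merged with these scores afterwards.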

RSINet: Rotation-Scale Invariant Network for Online Visual Tracking

Nov 18, 2020

Yang Fang, Geun-Sik Jo, Chang-Hee Lee

Abstract: Most Siamese network-based trackers perform the tracking process without model update, and cannot learn target-specific variation adaptively. Moreover, Siamese-based trackers infer the new state of tracked objects by generating axis-aligned bounding boxes, which contain extra background noise, and are unable to accurately estimate the rotation and scale transformation of moving objects, thus potentially reducing tracking performance. In this paper, we propose a novel Rotation-Scale Invariant Network (RSINet) to address the above problem. Our RSINet tracker consists of a target-distractor discrimination branch and a rotation-scale estimation branch; the rotation and scale knowledge can be explicitly learned by a multi-task learning method in an end-to-end manner. In addition, the tracking model is adaptively optimized and updated under spatio-temporal energy control, which ensures model stability and reliability, as well as high tracking efficiency. Comprehensive experiments on the OTB-100, VOT2018, and LaSOT benchmarks demonstrate that our proposed RSINet tracker yields new state-of-the-art performance compared with recent trackers, while running at a real-time speed of about 45 FPS.

* 8 pages, 5 figures. The paper has been accepted by the International Conference on Pattern Recognition 2020

Exploring Heterogeneous Information Networks via Pre-Training

Jul 07, 2020

Yang Fang, Xiang Zhao, Weidong Xiao

Abstract: To explore heterogeneous information networks (HINs), network representation learning (NRL) is proposed, which represents a network in a low-dimensional space. Recently, graph neural networks (GNNs) have drawn a lot of attention; they are very expressive for mining a HIN, but they suffer from low efficiency. In this paper, we propose a pre-training and fine-tuning framework, PF-HIN, to capture the features of a HIN. Unlike traditional GNNs that have to train the whole model for each downstream task, PF-HIN only needs to fine-tune the model using the pre-trained parameters and minimal extra task-specific parameters, thus improving the model efficiency and effectiveness. Specifically, in the pre-training phase, we first use a ranking-based BFS strategy to form the input node sequence. Then, inspired by BERT, we adopt deep bi-directional transformer encoders to train the model, which is a variant of GNN aggregator that is more powerful than traditional deep neural networks like CNN and LSTM. The model is pre-trained based on two tasks, i.e., masked node modeling (MNM) and adjacent node prediction (ANP). Additionally, we leverage factorized embedding parameterization and cross-layer parameter sharing to reduce the parameters. In the fine-tuning stage, we choose four benchmark downstream tasks, i.e., link prediction, similarity search, node classification and node clustering. We use node sequence pairs as input for link prediction and similarity search, and a single node sequence as input for node classification and clustering. The experimental results of the above tasks on four real-world datasets verify the advancement of PF-HIN, as it outperforms state-of-the-art alternatives consistently and significantly.

Content-Based Top-N Recommendation using Heterogeneous Relations

Jun 27, 2016

Yifan Chen, Xiang Zhao, Junjiao Gan, Junkai Ren, Yang Fang

Abstract: Top-$N$ recommender systems have been extensively studied. However, the sparsity of user-item activities has not been well resolved. While many hybrid systems were proposed to address the cold-start problem, the profile information has not been sufficiently leveraged. Furthermore, the heterogeneity of profiles between users and items intensifies the challenge. In this paper, we propose a content-based top-$N$ recommender system by learning the global term weights in profiles. To achieve this, we bring in PathSim, which measures node similarity well under heterogeneous relations (between users and items). Starting from the original TF-IDF value, the global term weights gradually converge, and eventually reflect both profile and activity information. To facilitate training, the derivative is reformulated into matrix form, which can easily be parallelized. We conduct extensive experiments, which demonstrate the superiority of the proposed method.

* 13 pages, 8 figures, ADC 2016
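
For readers unfamiliar with PathSim, the measure this abstract relies on, it scores two nodes under a symmetric meta-path as twice the number of path instances between them, divided by the sum of each node's path instances back to itself. The sketch below illustrates that standard definition on a user-item-user meta-path; it is not the paper's training procedure, and the `activity` data and function names are hypothetical.

```python
def path_count(adjacency, a, b):
    """Count user-item-user path instances between users a and b.

    adjacency maps each user to a dict of {item: edge weight}; a path exists
    for every item both users are connected to.
    """
    return sum(w * adjacency[b].get(item, 0) for item, w in adjacency[a].items())

def pathsim(adjacency, x, y):
    """PathSim under the symmetric meta-path user-item-user."""
    denom = path_count(adjacency, x, x) + path_count(adjacency, y, y)
    return 2 * path_count(adjacency, x, y) / denom if denom else 0.0

# Toy user-item activity (hypothetical data, unit edge weights).
activity = {
    "alice": {"item1": 1, "item2": 1},
    "bob": {"item2": 1, "item3": 1},
    "carol": {"item4": 1},
}
print(pathsim(activity, "alice", "bob"))    # one shared item out of two each
print(pathsim(activity, "alice", "carol"))  # no shared items
```

The normalization by self-paths is what keeps highly active users from dominating: a user connected to many items does not automatically look similar to everyone.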
