Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jiarui Cai

InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models

Jan 19, 2025

Jing Ding, Kai Feng, Binbin Lin, Jiarui Cai, Qiushi Wang, Yu Xie, Xiaojin Zhang, Zhongyu Wei, Wei Chen

Abstract:The application of large language models (LLMs) has achieved remarkable success in various fields, but their effectiveness in specialized domains like the Chinese insurance industry remains underexplored. The complexity of insurance knowledge, encompassing specialized terminology and diverse data types, poses significant challenges for both models and users. To address this, we introduce InsQABench, a benchmark dataset for the Chinese insurance sector, structured into three categories: Insurance Commonsense Knowledge, Insurance Structured Database, and Insurance Unstructured Documents, reflecting real-world insurance question-answering tasks.We also propose two methods, SQL-ReAct and RAG-ReAct, to tackle challenges in structured and unstructured data tasks. Evaluations show that while LLMs struggle with domain-specific terminology and nuanced clause texts, fine-tuning on InsQABench significantly improves performance. Our benchmark establishes a solid foundation for advancing LLM applications in the insurance domain, with data and code available at https://github.com/HaileyFamo/InsQABench.git.

Via

Access Paper or Ask Questions

Hyperbolic Learning with Synthetic Captions for Open-World Detection

Apr 07, 2024

Fanjie Kong, Yanbei Chen, Jiarui Cai, Davide Modolo

Figure 1 for Hyperbolic Learning with Synthetic Captions for Open-World Detection

Figure 2 for Hyperbolic Learning with Synthetic Captions for Open-World Detection

Figure 3 for Hyperbolic Learning with Synthetic Captions for Open-World Detection

Figure 4 for Hyperbolic Learning with Synthetic Captions for Open-World Detection

Abstract:Open-world detection poses significant challenges, as it requires the detection of any object using either object class labels or free-form texts. Existing related works often use large-scale manual annotated caption datasets for training, which are extremely expensive to collect. Instead, we propose to transfer knowledge from vision-language models (VLMs) to enrich the open-vocabulary descriptions automatically. Specifically, we bootstrap dense synthetic captions using pre-trained VLMs to provide rich descriptions on different regions in images, and incorporate these captions to train a novel detector that generalizes to novel concepts. To mitigate the noise caused by hallucination in synthetic captions, we also propose a novel hyperbolic vision-language learning approach to impose a hierarchy between visual and caption embeddings. We call our detector ``HyperLearner''. We conduct extensive experiments on a wide variety of open-world detection benchmarks (COCO, LVIS, Object Detection in the Wild, RefCOCO) and our results show that our model consistently outperforms existing state-of-the-art methods, such as GLIP, GLIPv2 and Grounding DINO, when using the same backbone.

* CVPR 2024

Via

Access Paper or Ask Questions

MeMOT: Multi-Object Tracking with Memory

Mar 31, 2022

Jiarui Cai, Mingze Xu, Wei Li, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto

Figure 1 for MeMOT: Multi-Object Tracking with Memory

Figure 2 for MeMOT: Multi-Object Tracking with Memory

Figure 3 for MeMOT: Multi-Object Tracking with Memory

Figure 4 for MeMOT: Multi-Object Tracking with Memory

Abstract:We propose an online tracking algorithm that performs the object detection and data association under a common framework, capable of linking objects after a long time span. This is realized by preserving a large spatio-temporal memory to store the identity embeddings of the tracked objects, and by adaptively referencing and aggregating useful information from the memory as needed. Our model, called MeMOT, consists of three main modules that are all Transformer-based: 1) Hypothesis Generation that produce object proposals in the current video frame; 2) Memory Encoding that extracts the core information from the memory for each tracked object; and 3) Memory Decoding that solves the object detection and data association tasks simultaneously for multi-object tracking. When evaluated on widely adopted MOT benchmark datasets, MeMOT observes very competitive performance.

* CVPR 2022 Oral

Via

Access Paper or Ask Questions

ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot

Aug 05, 2021

Jiarui Cai, Yizhou Wang, Jenq-Neng Hwang

Figure 1 for ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot

Figure 2 for ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot

Figure 3 for ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot

Figure 4 for ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot

Abstract:One-stage long-tailed recognition methods improve the overall performance in a "seesaw" manner, i.e., either sacrifice the head's accuracy for better tail classification or elevate the head's accuracy even higher but ignore the tail. Existing algorithms bypass such trade-off by a multi-stage training process: pre-training on imbalanced set and fine-tuning on balanced set. Though achieving promising performance, not only are they sensitive to the generalizability of the pre-trained model, but also not easily integrated into other computer vision tasks like detection and segmentation, where pre-training of classifiers solely is not applicable. In this paper, we propose a one-stage long-tailed recognition scheme, ally complementary experts (ACE), where the expert is the most knowledgeable specialist in a sub-set that dominates its training, and is complementary to other experts in the less-seen categories without being disturbed by what it has never seen. We design a distribution-adaptive optimizer to adjust the learning pace of each expert to avoid over-fitting. Without special bells and whistles, the vanilla ACE outperforms the current one-stage SOTA method by 3-10% on CIFAR10-LT, CIFAR100-LT, ImageNet-LT and iNaturalist datasets. It is also shown to be the first one to break the "seesaw" trade-off by improving the accuracy of the majority and minority categories simultaneously in only one stage. Code and trained models are at https://github.com/jrcai/ACE.

* ICCV 2021 (Oral)

Via

Access Paper or Ask Questions

NTIRE 2021 Multi-modal Aerial View Object Classification Challenge

Jul 02, 2021

Jerrick Liu, Nathan Inkawhich, Oliver Nina, Radu Timofte, Sahil Jain, Bob Lee, Yuru Duan, Wei Wei, Lei Zhang, Songzheng Xu(+23 more)

Figure 1 for NTIRE 2021 Multi-modal Aerial View Object Classification Challenge

Figure 2 for NTIRE 2021 Multi-modal Aerial View Object Classification Challenge

Figure 3 for NTIRE 2021 Multi-modal Aerial View Object Classification Challenge

Figure 4 for NTIRE 2021 Multi-modal Aerial View Object Classification Challenge

Abstract:In this paper, we introduce the first Challenge on Multi-modal Aerial View Object Classification (MAVOC) in conjunction with the NTIRE 2021 workshop at CVPR. This challenge is composed of two different tracks using EO andSAR imagery. Both EO and SAR sensors possess different advantages and drawbacks. The purpose of this competition is to analyze how to use both sets of sensory information in complementary ways. We discuss the top methods submitted for this competition and evaluate their results on our blind test set. Our challenge results show significant improvement of more than 15% accuracy from our current baselines for each track of the competition

* Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, 588-595
* 10 pages, 1 figure. Conference on Computer Vision and Pattern Recognition

Via

Access Paper or Ask Questions

Multi-Target Multi-Camera Tracking of Vehicles using Metadata-Aided Re-ID and Trajectory-Based Camera Link Model

May 03, 2021

Hung-Min Hsu, Jiarui Cai, Yizhou Wang, Jenq-Neng Hwang, Kwang-Ju Kim

Figure 1 for Multi-Target Multi-Camera Tracking of Vehicles using Metadata-Aided Re-ID and Trajectory-Based Camera Link Model

Figure 2 for Multi-Target Multi-Camera Tracking of Vehicles using Metadata-Aided Re-ID and Trajectory-Based Camera Link Model

Figure 3 for Multi-Target Multi-Camera Tracking of Vehicles using Metadata-Aided Re-ID and Trajectory-Based Camera Link Model

Figure 4 for Multi-Target Multi-Camera Tracking of Vehicles using Metadata-Aided Re-ID and Trajectory-Based Camera Link Model

Abstract:In this paper, we propose a novel framework for multi-target multi-camera tracking (MTMCT) of vehicles based on metadata-aided re-identification (MA-ReID) and the trajectory-based camera link model (TCLM). Given a video sequence and the corresponding frame-by-frame vehicle detections, we first address the isolated tracklets issue from single camera tracking (SCT) by the proposed traffic-aware single-camera tracking (TSCT). Then, after automatically constructing the TCLM, we solve MTMCT by the MA-ReID. The TCLM is generated from camera topological configuration to obtain the spatial and temporal information to improve the performance of MTMCT by reducing the candidate search of ReID. We also use the temporal attention model to create more discriminative embeddings of trajectories from each camera to achieve robust distance measures for vehicle ReID. Moreover, we train a metadata classifier for MTMCT to obtain the metadata feature, which is concatenated with the temporal attention based embeddings. Finally, the TCLM and hierarchical clustering are jointly applied for global ID assignment. The proposed method is evaluated on the CityFlow dataset, achieving IDF1 76.77%, which outperforms the state-of-the-art MTMCT methods.

* IEEE Transactions on Image Processing

Via

Access Paper or Ask Questions

IA-MOT: Instance-Aware Multi-Object Tracking with Motion Consistency

Jun 24, 2020

Jiarui Cai, Yizhou Wang, Haotian Zhang, Hung-Min Hsu, Chengqian Ma, Jenq-Neng Hwang

Figure 1 for IA-MOT: Instance-Aware Multi-Object Tracking with Motion Consistency

Figure 2 for IA-MOT: Instance-Aware Multi-Object Tracking with Motion Consistency

Figure 3 for IA-MOT: Instance-Aware Multi-Object Tracking with Motion Consistency

Abstract:Multiple object tracking (MOT) is a crucial task in computer vision society. However, most tracking-by-detection MOT methods, with available detected bounding boxes, cannot effectively handle static, slow-moving and fast-moving camera scenarios simultaneously due to ego-motion and frequent occlusion. In this work, we propose a novel tracking framework, called "instance-aware MOT" (IA-MOT), that can track multiple objects in either static or moving cameras by jointly considering the instance-level features and object motions. First, robust appearance features are extracted from a variant of Mask R-CNN detector with an additional embedding head, by sending the given detections as the region proposals. Meanwhile, the spatial attention, which focuses on the foreground within the bounding boxes, is generated from the given instance masks and applied to the extracted embedding features. In the tracking stage, object instance masks are aligned by feature similarity and motion consistency using the Hungarian association algorithm. Moreover, object re-identification (ReID) is incorporated to recover ID switches caused by long-term occlusion or missing detection. Overall, when evaluated on the MOTS20 and KITTI-MOTS dataset, our proposed method won the first place in Track 3 of the BMTT Challenge in CVPR2020 workshops.

* The 5th Benchmarking Multi-Target Tracking (BMTT) Workshop, CVPR 2020

Via

Access Paper or Ask Questions