Shu Chen

REInstruct: Building Instruction Data from Unlabeled Corpus

Aug 20, 2024

Shu Chen, Xinyan Guan, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun

Abstract: Manually annotating instruction data for large language models is difficult, costly, and hard to scale. Meanwhile, current automatic annotation methods typically rely on distilling synthetic data from proprietary LLMs, which not only limits the upper bound of the quality of the instruction data but also raises potential copyright issues. In this paper, we propose REInstruct, a simple and scalable method to automatically build instruction data from an unlabeled corpus without heavy reliance on proprietary LLMs and human annotation. Specifically, REInstruct first selects a subset of unlabeled texts that potentially contain well-structured, helpful, and insightful content and then generates instructions for these texts. To generate accurate and relevant responses for effective and robust training, REInstruct further proposes a rewriting-based approach to improve the quality of the generated instruction data. By training Llama-7b on a combination of 3k seed data and 32k synthetic data from REInstruct, the fine-tuned model achieves a 65.41% win rate on the AlpacaEval leaderboard against text-davinci-003, outperforming other open-source, non-distilled instruction data construction methods. The code is publicly available at https://github.com/cs32963/REInstruct.

* Accepted by ACL2024 Findings

Sample Enrichment via Temporary Operations on Subsequences for Sequential Recommendation

Jul 25, 2024

Shu Chen, Jinwei Luo, Weike Pan, Jiangxing Yu, Xin Huang, Zhong Ming

Abstract: Sequential recommendation leverages interaction sequences to predict forthcoming user behaviors, crucial for crafting personalized recommendations. However, the true preferences of a user are inherently complex and high-dimensional, while the observed data is merely a simplified and low-dimensional projection of the rich preferences, which often leads to prevalent issues like data sparsity and inaccurate model training. To learn true preferences from the sparse data, most existing works endeavor to introduce some extra information or design some ingenious models. Although these approaches have been shown to be effective, extra information usually increases the cost of data collection, and complex models may result in difficulty in deployment. Innovatively, we avoid the use of extra information or alterations to the model; instead, we fill the transformation space between the observed data and the underlying preferences with randomness. Specifically, we propose a novel model-agnostic and highly generic framework for sequential recommendation called sample enrichment via temporary operations on subsequences (SETO), which temporarily and separately enriches the transformation space via sequence enhancement operations with rationality constraints in training. The transformation space exists not only in the process from input samples to preferences but also in the process from preferences to target samples. We highlight SETO's effectiveness and versatility over multiple representative and state-of-the-art sequential recommendation models (including six single-domain sequential models and two cross-domain sequential models) across multiple real-world datasets (including three single-domain datasets, three cross-domain datasets and a large-scale industry dataset).

* 12 pages, 6 figures

Technique Report of CVPR 2024 PBDL Challenges

Jun 15, 2024

Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li(+91 more)

Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held at a CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches.

* CVPR 2024 Workshop - PBDL Challenge Report

MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

May 08, 2024

Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai(+39 more)

Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.

* MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

A Survey on Cross-Domain Sequential Recommendation

Jan 19, 2024

Shu Chen, Zitao Xu, Weike Pan, Qiang Yang, Zhong Ming

Abstract: Cross-domain sequential recommendation (CDSR) shifts the modeling of user preferences from flat to stereoscopic by integrating and learning interaction information from multiple domains at different granularities (ranging from inter-sequence to intra-sequence and from single-domain to cross-domain). In this survey, we first define the CDSR problem using a four-dimensional tensor and then analyze its multi-type input representations under multidirectional dimensionality reductions. Following that, we provide a systematic overview from both macro and micro views. From a macro view, we abstract the multi-level fusion structures of various models across domains and discuss their bridges for fusion. From a micro view, focusing on the existing models, we specifically discuss the basic technologies and then explain the auxiliary learning technologies. Finally, we exhibit the available public datasets and the representative experimental results as well as provide some insights into future directions for research in CDSR.

Siamese Representation Learning for Unsupervised Relation Extraction

Oct 01, 2023

Guangxin Zhang, Shu Chen

Abstract: Unsupervised relation extraction (URE) aims at discovering underlying relations between named entity pairs from open-domain plain text without prior information on relational distribution. Existing URE models that use contrastive learning, which attracts positive samples and repulses negative samples to promote better separation, have achieved decent results. However, fine-grained relational semantics give rise to spurious negative samples, damaging the inherent hierarchical structure and hindering performance. To tackle this problem, we propose Siamese Representation Learning for Unsupervised Relation Extraction -- a novel framework that leverages only positive pairs for representation learning, with the capability to effectively optimize the relation representation of instances and retain hierarchical information in the relational feature space. Experimental results show that our model significantly advances the state-of-the-art results on two benchmark datasets, and detailed analyses demonstrate the effectiveness and robustness of our proposed model on unsupervised relation extraction.

* 26th European Conference on Artificial Intelligence ECAI 2023

SAM-Deblur: Let Segment Anything Boost Image Deblurring

Sep 05, 2023

Siwei Li, Mingxuan Liu, Yating Zhang, Shu Chen, Haoxiang Li, Hong Chen, Zifei Dou

Abstract: Image deblurring is a critical task in the field of image restoration, aiming to eliminate blurring artifacts. However, the challenge of addressing non-uniform blurring leads to an ill-posed problem, which limits the generalization performance of existing deblurring models. To solve the problem, we propose a framework SAM-Deblur, integrating prior knowledge from the Segment Anything Model (SAM) into the deblurring task for the first time. In particular, SAM-Deblur is divided into three stages. First, we preprocess the blurred images, obtain image masks via SAM, and propose a mask dropout method for training to enhance model robustness. Then, to fully leverage the structural priors generated by SAM, we propose a Mask Average Pooling (MAP) unit specifically designed to average SAM-generated segmented areas, serving as a plug-and-play component which can be seamlessly integrated into existing deblurring networks. Finally, we feed the fused features generated by the MAP Unit into the deblurring model to obtain a sharp image. Experimental results on the RealBlurJ, ReloBlur, and REDS datasets reveal that incorporating our methods improves NAFNet's PSNR by 0.05, 0.96, and 7.03, respectively. Code will be available at https://github.com/HPLQAQ/SAM-Deblur.
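The idea of averaging over segmented areas can be illustrated with a minimal sketch. It is not the authors' MAP unit: it assumes a single-channel image stored as nested lists and an integer label map standing in for SAM masks, and the name `mask_average_pool` is hypothetical.

```python
from collections import defaultdict


def mask_average_pool(image, labels):
    """Replace every pixel with the mean value of the segment it belongs to.

    `image` and `labels` are same-shaped nested lists; `labels` is a
    stand-in for segmentation masks (an assumption for this sketch).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    # First pass: accumulate the sum and pixel count of each segment.
    for img_row, lab_row in zip(image, labels):
        for value, label in zip(img_row, lab_row):
            sums[label] += value
            counts[label] += 1
    means = {label: sums[label] / counts[label] for label in sums}
    # Second pass: write each segment's mean back to its pixels.
    return [[means[label] for label in lab_row] for lab_row in labels]


image = [[1.0, 3.0, 10.0],
         [5.0, 7.0, 20.0]]
labels = [[0, 0, 1],
          [0, 0, 1]]
pooled = mask_average_pool(image, labels)
print(pooled)  # [[4.0, 4.0, 15.0], [4.0, 4.0, 15.0]]
```

In a real network the same pooling would presumably run per channel on feature maps or images, with the pooled result fused into the deblurring model's input; those details are not given in the abstract.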

* Under review

Improving Neural Radiance Fields with Depth-aware Optimization for Novel View Synthesis

Apr 11, 2023

Shu Chen, Junyao Li, Yang Zhang, Beiji Zou

Abstract: With dense inputs, Neural Radiance Fields (NeRF) is able to render photo-realistic novel views under static conditions. Although the synthesis quality is excellent, existing NeRF-based methods fail to obtain moderate three-dimensional (3D) structures. The novel view synthesis quality drops dramatically given sparse input due to the implicitly reconstructed inaccurate 3D-scene structure. We propose SfMNeRF, a method to better synthesize novel views as well as reconstruct the 3D-scene geometry. SfMNeRF leverages the knowledge from the self-supervised depth estimation methods to constrain the 3D-scene geometry during view synthesis training. Specifically, SfMNeRF employs the epipolar, photometric consistency, depth smoothness, and position-of-matches constraints to explicitly reconstruct the 3D-scene structure. Through these explicit constraints and the implicit constraint from NeRF, our method improves the view synthesis as well as the 3D-scene geometry performance of NeRF at the same time. In addition, SfMNeRF synthesizes novel sub-pixels in which the ground truth is obtained by image interpolation. This strategy enables SfMNeRF to include more samples to improve generalization performance. Experiments on two public datasets demonstrate that SfMNeRF surpasses state-of-the-art approaches. Code is available at https://github.com/XTU-PR-LAB/SfMNeRF

Structure-Aware NeRF without Posed Camera via Epipolar Constraint

Oct 01, 2022

Shu Chen, Yang Zhang, Yaxin Xu, Beiji Zou

Abstract: The neural radiance field (NeRF) for realistic novel view synthesis requires camera poses to be pre-acquired by a structure-from-motion (SfM) approach. This two-stage strategy is not convenient to use and degrades the performance because the error in the pose extraction can propagate to the view synthesis. We integrate the pose extraction and view synthesis into a single end-to-end procedure so they can benefit from each other. For training NeRF models, only RGB images are given, without pre-known camera poses. The camera poses are obtained by the epipolar constraint, in which the identical feature in different views has the same world coordinates transformed from the local camera coordinates according to the extracted poses. The epipolar constraint is jointly optimized with the pixel color constraint. The poses are represented by a CNN-based deep network, whose input is the related frames. This joint optimization enables NeRF to be aware of the scene's structure, which improves generalization performance. Extensive experiments on a variety of scenes demonstrate the effectiveness of the proposed approach. Code is available at https://github.com/XTU-PR-LAB/SaNerf.
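The constraint described above, that one feature seen in two views should map to the same world coordinates, can be sketched as a simple consistency error. This is an illustration only, not the paper's loss: it assumes each pose is a camera-to-world rotation matrix and translation vector, and the names `to_world` and `consistency_error` are hypothetical.

```python
def to_world(point_cam, rotation, translation):
    """Map a 3D point from camera to world coordinates: X_w = R @ X_c + t."""
    return [
        sum(rotation[i][j] * point_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


def consistency_error(point_a, pose_a, point_b, pose_b):
    """Squared distance between the world positions implied by two views.

    A value of zero means both poses place the matched feature at the
    same world point; a larger value penalizes inconsistent poses.
    """
    world_a = to_world(point_a, *pose_a)
    world_b = to_world(point_b, *pose_b)
    return sum((x - y) ** 2 for x, y in zip(world_a, world_b))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pose_a = (identity, [0.0, 0.0, 0.0])  # camera A at the world origin
pose_b = (identity, [1.0, 0.0, 0.0])  # camera B shifted one unit along x

# The same world point (2, 0, 5) expressed in each camera's local frame.
error = consistency_error([2.0, 0.0, 5.0], pose_a, [1.0, 0.0, 5.0], pose_b)
print(error)  # 0.0
```

In a joint optimization, an error of this kind would be minimized over the predicted poses together with the pixel color term; how the two terms are weighted is not stated in the abstract.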

Estimation of 3D Human Pose Using Prior Knowledge

May 09, 2021

Shu Chen, Lei Zhang, Beiji Zou

Abstract: Estimating three-dimensional human poses from the positions of two-dimensional joints has shown promising results. However, using two-dimensional joint coordinates as input loses more information than image-based approaches and results in ambiguity. To overcome this problem, we combine bone length and camera parameters with two-dimensional joint coordinates for input. This combination is more discriminative than the two-dimensional joint coordinates alone in that it can improve the accuracy of the model's depth prediction and alleviate the ambiguity that comes from projecting three-dimensional coordinates into two-dimensional space. Furthermore, we introduce direction constraints which can better measure the difference between the ground truth and the output of the proposed model. The experimental results on H36M show that the method performs better than other state-of-the-art three-dimensional human pose estimation approaches.

* letter
