Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Feifei Li

InfoCons: Identifying Interpretable Critical Concepts in Point Clouds via Information Theory

May 26, 2025

Feifei Li, Mi Zhang, Zhaoxiang Wang, Min Yang

Abstract:Interpretability of point cloud (PC) models becomes imperative given their deployment in safety-critical scenarios such as autonomous vehicles. We focus on attributing PC model outputs to interpretable critical concepts, defined as meaningful subsets of the input point cloud. To enable human-understandable diagnostics of model failures, an ideal critical subset should be *faithful* (preserving points that causally influence predictions) and *conceptually coherent* (forming semantically meaningful structures that align with human perception). We propose InfoCons, an explanation framework that applies information-theoretic principles to decompose the point cloud into 3D concepts, enabling the examination of their causal effect on model predictions with learnable priors. We evaluate InfoCons on synthetic datasets for classification, comparing it qualitatively and quantitatively with four baselines. We further demonstrate its scalability and flexibility on two real-world datasets and in two applications that utilize critical scores of PC.

* Accepted by ICML 2025 (Poster)

Via

Access Paper or Ask Questions

Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization

Mar 19, 2025

Feifei Li, Mi Zhang, Yiming Sun, Min Yang

Abstract:Text-to-image diffusion models have achieved state-of-the-art results in synthesis tasks; however, there is a growing concern about their potential misuse in creating harmful content. To mitigate these risks, post-hoc model intervention techniques, such as concept unlearning and safety guidance, have been developed. However, fine-tuning model weights or adapting the hidden states of the diffusion model operates in an uninterpretable way, making it unclear which part of the intermediate variables is responsible for unsafe generation. These interventions severely affect the sampling trajectory when erasing harmful concepts from complex, multi-concept prompts, thus hindering their practical use in real-world settings. In this work, we propose the safe generation framework Detect-and-Guide (DAG), leveraging the internal knowledge of diffusion models to perform self-diagnosis and fine-grained self-regulation during the sampling process. DAG first detects harmful concepts from noisy latents using refined cross-attention maps of optimized tokens, then applies safety guidance with adaptive strength and editing regions to negate unsafe generation. The optimization only requires a small annotated dataset and can provide precise detection maps with generalizability and concept specificity. Moreover, DAG does not require fine-tuning of diffusion models, and therefore introduces no loss to their generation diversity. Experiments on erasing sexual content show that DAG achieves state-of-the-art safe generation performance, balancing harmfulness mitigation and text-following performance on multi-concept real-world prompts.

* CVPR25

Via

Access Paper or Ask Questions

PoI: Pixel of Interest for Novel View Synthesis Assisted Scene Coordinate Regression

Feb 07, 2025

Feifei Li, Qi Song, Chi Zhang, Hui Shuai, Rui Huang

Abstract:The task of estimating camera poses can be enhanced through novel view synthesis techniques such as NeRF and Gaussian Splatting to increase the diversity and extension of training data. However, these techniques often produce rendered images with issues like blurring and ghosting, which compromise their reliability. These issues become particularly pronounced for Scene Coordinate Regression (SCR) methods, which estimate 3D coordinates at the pixel level. To mitigate the problems associated with unreliable rendered images, we introduce a novel filtering approach, which selectively extracts well-rendered pixels while discarding the inferior ones. This filter simultaneously measures the SCR model's real-time reprojection loss and gradient during training. Building on this filtering technique, we also develop a new strategy to improve scene coordinate regression using sparse inputs, drawing on successful applications of sparse input techniques in novel view synthesis. Our experimental results validate the effectiveness of our method, demonstrating state-of-the-art performance on indoor and outdoor datasets.

Via

Access Paper or Ask Questions

Efficient MedSAMs: Segment Anything in Medical Images on Laptop

Dec 20, 2024

Jun Ma, Feifei Li, Sumin Kim, Reza Asakereh, Bao-Hiep Le, Dang-Khoa Nguyen-Vu, Alexander Pfefferle, Muxin Wei, Ruochen Gao, Donghang Lyu(+72 more)

Figure 1 for Efficient MedSAMs: Segment Anything in Medical Images on Laptop

Figure 2 for Efficient MedSAMs: Segment Anything in Medical Images on Laptop

Figure 3 for Efficient MedSAMs: Segment Anything in Medical Images on Laptop

Abstract:Promptable segmentation foundation models have emerged as a transformative approach to addressing the diverse needs in medical images, but most existing models require expensive computing, posing a big barrier to their adoption in clinical practice. In this work, we organized the first international competition dedicated to promptable medical image segmentation, featuring a large-scale dataset spanning nine common imaging modalities from over 20 different institutions. The top teams developed lightweight segmentation foundation models and implemented an efficient inference pipeline that substantially reduced computational requirements while maintaining state-of-the-art segmentation accuracy. Moreover, the post-challenge phase advanced the algorithms through the design of performance booster and reproducibility tasks, resulting in improved algorithms and validated reproducibility of the winning solution. Furthermore, the best-performing algorithms have been incorporated into the open-source software with a user-friendly interface to facilitate clinical adoption. The data and code are publicly available to foster the further development of medical image segmentation foundation models and pave the way for impactful real-world applications.

* CVPR 2024 MedSAM on Laptop Competition Summary: https://www.codabench.org/competitions/1847/

Via

Access Paper or Ask Questions

A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model

Nov 07, 2024

Panwen Hu, Nan Xiao, Feifei Li, Yongquan Chen, Rui Huang

Figure 1 for A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model

Figure 2 for A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model

Figure 3 for A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model

Figure 4 for A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model

Abstract:In this era of videos, automatic video editing techniques attract more and more attention from industry and academia since they can reduce workloads and lower the requirements for human editors. Existing automatic editing systems are mainly scene- or event-specific, e.g., soccer game broadcasting, yet the automatic systems for general editing, e.g., movie or vlog editing which covers various scenes and events, were rarely studied before, and converting the event-driven editing method to a general scene is nontrivial. In this paper, we propose a two-stage scheme for general editing. Firstly, unlike previous works that extract scene-specific features, we leverage the pre-trained Vision-Language Model (VLM) to extract the editing-relevant representations as editing context. Moreover, to close the gap between the professional-looking videos and the automatic productions generated with simple guidelines, we propose a Reinforcement Learning (RL)-based editing framework to formulate the editing problem and train the virtual editor to make better sequential editing decisions. Finally, we evaluate the proposed method on a more general editing task with a real movie dataset. Experimental results demonstrate the effectiveness and benefits of the proposed context representation and the learning ability of our RL-based editing framework.

Via

Access Paper or Ask Questions

CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation

Nov 01, 2024

Ziting Wang, Haitao Yuan, Wei Dong, Gao Cong, Feifei Li

Figure 1 for CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation

Figure 2 for CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation

Figure 3 for CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation

Figure 4 for CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation

Abstract:Large Language Models (LLMs) have demonstrated remarkable generation capabilities but often struggle to access up-to-date information, which can lead to hallucinations. Retrieval-Augmented Generation (RAG) addresses this issue by incorporating knowledge from external databases, enabling more accurate and relevant responses. Due to the context window constraints of LLMs, it is impractical to input the entire external database context directly into the model. Instead, only the most relevant information, referred to as chunks, is selectively retrieved. However, current RAG research faces three key challenges. First, existing solutions often select each chunk independently, overlooking potential correlations among them. Second, in practice the utility of chunks is non-monotonic, meaning that adding more chunks can decrease overall utility. Traditional methods emphasize maximizing the number of included chunks, which can inadvertently compromise performance. Third, each type of user query possesses unique characteristics that require tailored handling, an aspect that current approaches do not fully consider. To overcome these challenges, we propose a cost constrained retrieval optimization system CORAG for retrieval-augmented generation. We employ a Monte Carlo Tree Search (MCTS) based policy framework to find optimal chunk combinations sequentially, allowing for a comprehensive consideration of correlations among chunks. Additionally, rather than viewing budget exhaustion as a termination condition, we integrate budget constraints into the optimization of chunk combinations, effectively addressing the non-monotonicity of chunk utility.

Via

Access Paper or Ask Questions

ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object

Oct 14, 2024

Jiwei Chen, Laiyan Ding, Chi Zhang, Feifei Li, Rui Huang

Figure 1 for ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object

Figure 2 for ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object

Figure 3 for ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object

Figure 4 for ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object

Abstract:Vision-based BEV (Bird-Eye-View) 3D object detection has recently become popular in autonomous driving. However, objects with a high similarity to the background from a camera perspective cannot be detected well by existing methods. In this paper, we propose 2D Region-oriented Attention for a BEV-based 3D Object Detection Network (ROA-BEV), which can make the backbone focus more on feature learning in areas where objects may exist. Moreover, our method increases the information content of ROA through a multi-scale structure. In addition, every block of ROA utilizes a large kernel to ensure that the receptive field is large enough to catch large objects' information. Experiments on nuScenes show that ROA-BEV improves the performance based on BEVDet and BEVDepth. The code will be released soon.

Via

Access Paper or Ask Questions

Segment Anything in Medical Images and Videos: Benchmark and Deployment

Aug 06, 2024

Jun Ma, Sumin Kim, Feifei Li, Mohammed Baharoon, Reza Asakereh, Hongwei Lyu, Bo Wang

Figure 1 for Segment Anything in Medical Images and Videos: Benchmark and Deployment

Figure 2 for Segment Anything in Medical Images and Videos: Benchmark and Deployment

Figure 3 for Segment Anything in Medical Images and Videos: Benchmark and Deployment

Figure 4 for Segment Anything in Medical Images and Videos: Benchmark and Deployment

Abstract:Recent advances in segmentation foundation models have enabled accurate and efficient segmentation across a wide range of natural images and videos, but their utility to medical data remains unclear. In this work, we first present a comprehensive benchmarking of the Segment Anything Model 2 (SAM2) across 11 medical image modalities and videos and point out its strengths and weaknesses by comparing it to SAM1 and MedSAM. Then, we develop a transfer learning pipeline and demonstrate SAM2 can be quickly adapted to medical domain by fine-tuning. Furthermore, we implement SAM2 as a 3D slicer plugin and Gradio API for efficient 3D image and video segmentation. The code has been made publicly available at \url{https://github.com/bowang-lab/MedSAM}.

Via

Access Paper or Ask Questions

Unleash the Power of Ellipsis: Accuracy-enhanced Sparse Vector Technique with Exponential Noise

Jul 29, 2024

Yuhan Liu, Sheng Wang, Yixuan Liu, Feifei Li, Hong Chen

Abstract:The Sparse Vector Technique (SVT) is one of the most fundamental tools in differential privacy (DP). It works as a backbone for adaptive data analysis by answering a sequence of queries on a given dataset, and gleaning useful information in a privacy-preserving manner. Unlike the typical private query releases that directly publicize the noisy query results, SVT is less informative -- it keeps the noisy query results to itself and only reveals a binary bit for each query, indicating whether the query result surpasses a predefined threshold. To provide a rigorous DP guarantee for SVT, prior works in the literature adopt a conservative privacy analysis by assuming the direct disclosure of noisy query results as in typical private query releases. This approach, however, hinders SVT from achieving higher query accuracy due to an overestimation of the privacy risks, which further leads to an excessive noise injection using the Laplacian or Gaussian noise for perturbation. Motivated by this, we provide a new privacy analysis for SVT by considering its less informative nature. Our analysis results not only broaden the range of applicable noise types for perturbation in SVT, but also identify the exponential noise as optimal among all evaluated noises (which, however, is usually deemed non-applicable in prior works). The main challenge in applying exponential noise to SVT is mitigating the sub-optimal performance due to the bias introduced by noise distributions. To address this, we develop a utility-oriented optimal threshold correction method and an appending strategy, which enhances the performance of SVT by increasing the precision and recall, respectively. The effectiveness of our proposed methods is substantiated both theoretically and empirically, demonstrating significant improvements up to $50\%$ across evaluated metrics.

Via

Access Paper or Ask Questions

Optimal Matrix Sketching over Sliding Windows

May 13, 2024

Hanyan Yin, Dongxie Wen, Jiajun Li, Zhewei Wei, Xiao Zhang, Zengfeng Huang, Feifei Li

Figure 1 for Optimal Matrix Sketching over Sliding Windows

Figure 2 for Optimal Matrix Sketching over Sliding Windows

Figure 3 for Optimal Matrix Sketching over Sliding Windows

Figure 4 for Optimal Matrix Sketching over Sliding Windows

Abstract:Matrix sketching, aimed at approximating a matrix $\boldsymbol{A} \in \mathbb{R}^{N\times d}$ consisting of vector streams of length $N$ with a smaller sketching matrix $\boldsymbol{B} \in \mathbb{R}^{\ell\times d}, \ell \ll N$, has garnered increasing attention in fields such as large-scale data analytics and machine learning. A well-known deterministic matrix sketching method is the Frequent Directions algorithm, which achieves the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound and provides a covariance error guarantee of $\varepsilon = \lVert \boldsymbol{A}^\top \boldsymbol{A} - \boldsymbol{B}^\top \boldsymbol{B} \rVert_2/\lVert \boldsymbol{A} \rVert_F^2$. The matrix sketching problem becomes particularly interesting in the context of sliding windows, where the goal is to approximate the matrix $\boldsymbol{A}_W$, formed by input vectors over the most recent $N$ time units. However, despite recent efforts, whether achieving the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound on sliding windows is possible has remained an open question. In this paper, we introduce the DS-FD algorithm, which achieves the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound for matrix sketching over row-normalized, sequence-based sliding windows. We also present matching upper and lower space bounds for time-based and unnormalized sliding windows, demonstrating the generality and optimality of \dsfd across various sliding window models. This conclusively answers the open question regarding the optimal space bound for matrix sketching over sliding windows. Furthermore, we conduct extensive experiments with both synthetic and real-world datasets, validating our theoretical claims and thus confirming the correctness and effectiveness of our algorithm, both theoretically and empirically.

Via

Access Paper or Ask Questions