Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Feng Han

Gemini Embedding: Generalizable Embeddings from Gemini

Mar 10, 2025

Jinhyuk Lee, Feiyang Chen, Sahil Dua, Daniel Cer, Madhuri Shanbhogue, Iftekhar Naim, Gustavo Hernández Ábrego, Zhe Li, Kaifeng Chen, Henrique Schechter Vera(+37 more)

Abstract:In this report, we introduce Gemini Embedding, a state-of-the-art embedding model leveraging the power of Gemini, Google's most capable large language model. Capitalizing on Gemini's inherent multilingual and code understanding capabilities, Gemini Embedding produces highly generalizable embeddings for text spanning numerous languages and textual modalities. The representations generated by Gemini Embedding can be precomputed and applied to a variety of downstream tasks including classification, similarity, clustering, ranking, and retrieval. Evaluated on the Massive Multilingual Text Embedding Benchmark (MMTEB), which includes over one hundred tasks across 250+ languages, Gemini Embedding substantially outperforms prior state-of-the-art models, demonstrating considerable improvements in embedding quality. Achieving state-of-the-art performance across MMTEB's multilingual, English, and code benchmarks, our unified model demonstrates strong capabilities across a broad selection of tasks and surpasses specialized domain-specific models.

* 19 pages

Via

Access Paper or Ask Questions

DuMo: Dual Encoder Modulation Network for Precise Concept Erasure

Jan 02, 2025

Feng Han, Kai Chen, Chao Gong, Zhipeng Wei, Jingjing Chen, Yu-Gang Jiang

Figure 1 for DuMo: Dual Encoder Modulation Network for Precise Concept Erasure

Figure 2 for DuMo: Dual Encoder Modulation Network for Precise Concept Erasure

Figure 3 for DuMo: Dual Encoder Modulation Network for Precise Concept Erasure

Figure 4 for DuMo: Dual Encoder Modulation Network for Precise Concept Erasure

Abstract:The exceptional generative capability of text-to-image models has raised substantial safety concerns regarding the generation of Not-Safe-For-Work (NSFW) content and potential copyright infringement. To address these concerns, previous methods safeguard the models by eliminating inappropriate concepts. Nonetheless, these models alter the parameters of the backbone network and exert considerable influences on the structural (low-frequency) components of the image, which undermines the model's ability to retain non-target concepts. In this work, we propose our Dual encoder Modulation network (DuMo), which achieves precise erasure of inappropriate target concepts with minimum impairment to non-target concepts. In contrast to previous methods, DuMo employs the Eraser with PRior Knowledge (EPR) module which modifies the skip connection features of the U-NET and primarily achieves concept erasure on details (high-frequency) components of the image. To minimize the damage to non-target concepts during erasure, the parameters of the backbone U-NET are frozen and the prior knowledge from the original skip connection features is introduced to the erasure process. Meanwhile, the phenomenon is observed that distinct erasing preferences for the image structure and details are demonstrated by the EPR at different timesteps and layers. Therefore, we adopt a novel Time-Layer MOdulation process (TLMO) that adjusts the erasure scale of EPR module's outputs across different layers and timesteps, automatically balancing the erasure effects and model's generative ability. Our method achieves state-of-the-art performance on Explicit Content Erasure, Cartoon Concept Removal and Artistic Style Erasure, clearly outperforming alternative methods. Code is available at https://github.com/Maplebb/DuMo

* AAAI 2025 accepted

Via

Access Paper or Ask Questions

Explaining Model Overfitting in CNNs via GMM Clustering

Dec 12, 2024

Hui Dou, Xinyu Mu, Mengjun Yi, Feng Han, Jian Zhao, Furao Shen

Abstract:Convolutional Neural Networks (CNNs) have demonstrated remarkable prowess in the field of computer vision. However, their opaque decision-making processes pose significant challenges for practical applications. In this study, we provide quantitative metrics for assessing CNN filters by clustering the feature maps corresponding to individual filters in the model via Gaussian Mixture Model (GMM). By analyzing the clustering results, we screen out some anomaly filters associated with outlier samples. We further analyze the relationship between the anomaly filters and model overfitting, proposing three hypotheses. This method is universally applicable across diverse CNN architectures without modifications, as evidenced by its successful application to models like AlexNet and LeNet-5. We present three meticulously designed experiments demonstrating our hypotheses from the perspectives of model behavior, dataset characteristics, and filter impacts. Through this work, we offer a novel perspective for evaluating the CNN performance and gain new insights into the operational behavior of model overfitting.

Via

Access Paper or Ask Questions

Hyperspectral Image Cross-Domain Object Detection Method based on Spectral-Spatial Feature Alignment

Nov 25, 2024

Hongqi Zhang, He Sun, Hongmin Gao, Feng Han, Xu Sun, Lianru Gao, Bing Zhang

Figure 1 for Hyperspectral Image Cross-Domain Object Detection Method based on Spectral-Spatial Feature Alignment

Figure 2 for Hyperspectral Image Cross-Domain Object Detection Method based on Spectral-Spatial Feature Alignment

Figure 3 for Hyperspectral Image Cross-Domain Object Detection Method based on Spectral-Spatial Feature Alignment

Figure 4 for Hyperspectral Image Cross-Domain Object Detection Method based on Spectral-Spatial Feature Alignment

Abstract:With consecutive bands in a wide range of wavelengths, hyperspectral images (HSI) have provided a unique tool for object detection task. However, existing HSI object detection methods have not been fully utilized in real applications, which is mainly resulted by the difference of spatial and spectral resolution between the unlabeled target domain and a labeled source domain, i.e. the domain shift of HSI. In this work, we aim to explore the unsupervised cross-domain object detection of HSI. Our key observation is that the local spatial-spectral characteristics remain invariant across different domains. For solving the problem of domain-shift, we propose a HSI cross-domain object detection method based on spectral-spatial feature alignment, which is the first attempt in the object detection community to the best of our knowledge. Firstly, we develop a spectral-spatial alignment module to extract domain-invariant local spatial-spectral features. Secondly, the spectral autocorrelation module has been designed to solve the domain shift in the spectral domain specifically, which can effectively align HSIs with different spectral resolutions. Besides, we have collected and annotated an HSI dataset for the cross-domain object detection. Our experimental results have proved the effectiveness of HSI cross-domain object detection, which has firstly demonstrated a significant and promising step towards HSI cross-domain object detection in the object detection community.

Via

Access Paper or Ask Questions

PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs

Jun 06, 2024

Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Haorui Wang, Zhen Qin, Feng Han, Jialu Liu, Simon Baumgartner, Michael Bendersky, Chao Zhang

Figure 1 for PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs

Figure 2 for PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs

Figure 3 for PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs

Figure 4 for PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs

Abstract:Large Language Models (LLMs) have exhibited impressive capabilities in various tasks, yet their vast parameter sizes restrict their applicability in resource-constrained settings. Knowledge distillation (KD) offers a viable solution by transferring expertise from large teacher models to compact student models. However, traditional KD techniques face specific challenges when applied to LLMs, including restricted access to LLM outputs, significant teacher-student capacity gaps, and the inherited mis-calibration issue. In this work, we present PLaD, a novel preference-based LLM distillation framework. PLaD exploits the teacher-student capacity discrepancy to generate pseudo-preference pairs where teacher outputs are preferred over student outputs. Then, PLaD leverages a ranking loss to re-calibrate student's estimation of sequence likelihood, which steers the student's focus towards understanding the relative quality of outputs instead of simply imitating the teacher. PLaD bypasses the need for access to teacher LLM's internal states, tackles the student's expressivity limitations, and mitigates the student mis-calibration issue. Through extensive experiments on two sequence generation tasks and with various LLMs, we demonstrate the effectiveness of our proposed PLaD framework.

* Findings of ACL 2024

Via

Access Paper or Ask Questions

Multi-Scale Dilated Convolution Network for Long-Term Time Series Forecasting

May 09, 2024

Feifei Li, Suhan Guo, Feng Han, Jian Zhao, Furao Shen

Figure 1 for Multi-Scale Dilated Convolution Network for Long-Term Time Series Forecasting

Figure 2 for Multi-Scale Dilated Convolution Network for Long-Term Time Series Forecasting

Figure 3 for Multi-Scale Dilated Convolution Network for Long-Term Time Series Forecasting

Figure 4 for Multi-Scale Dilated Convolution Network for Long-Term Time Series Forecasting

Abstract:Accurate forecasting of long-term time series has important applications for decision making and planning. However, it remains challenging to capture the long-term dependencies in time series data. To better extract long-term dependencies, We propose Multi Scale Dilated Convolution Network (MSDCN), a method that utilizes a shallow dilated convolution architecture to capture the period and trend characteristics of long time series. We design different convolution blocks with exponentially growing dilations and varying kernel sizes to sample time series data at different scales. Furthermore, we utilize traditional autoregressive model to capture the linear relationships within the data. To validate the effectiveness of the proposed approach, we conduct experiments on eight challenging long-term time series forecasting benchmark datasets. The experimental results show that our approach outperforms the prior state-of-the-art approaches and shows significant inference speed improvements compared to several strong baseline methods.

Via

Access Paper or Ask Questions

Gaussian Process-Based Learning Control of Underactuated Balance Robots with an External and Internal Convertible Modeling Structure

Dec 15, 2023

Feng Han, Jingang Yi

Figure 1 for Gaussian Process-Based Learning Control of Underactuated Balance Robots with an External and Internal Convertible Modeling Structure

Figure 2 for Gaussian Process-Based Learning Control of Underactuated Balance Robots with an External and Internal Convertible Modeling Structure

Figure 3 for Gaussian Process-Based Learning Control of Underactuated Balance Robots with an External and Internal Convertible Modeling Structure

Figure 4 for Gaussian Process-Based Learning Control of Underactuated Balance Robots with an External and Internal Convertible Modeling Structure

Abstract:External and internal convertible (EIC) form-based motion control is one of the effective designs of simultaneously trajectory tracking and balance for underactuated balance robots. Under certain conditions, the EIC-based control design however leads to uncontrolled robot motion. We present a Gaussian process (GP)-based data-driven learning control for underactuated balance robots with the EIC modeling structure. Two GP-based learning controllers are presented by using the EIC structure property. The partial EIC (PEIC)-based control design partitions the robotic dynamics into a fully actuated subsystem and one reduced-order underactuated system. The null-space EIC (NEIC)-based control compensates for the uncontrolled motion in a subspace, while the other closed-loop dynamics are not affected. Under the PEIC- and NEIC-based, the tracking and balance tasks are guaranteed and convergence rate and bounded errors are achieved without causing any uncontrolled motion by the original EIC-based control. We validate the results and demonstrate the GP-based learning control design performance using two inverted pendulum platforms.

Via

Access Paper or Ask Questions

Video Summarization: Towards Entity-Aware Captions

Dec 01, 2023

Hammad A. Ayyubi, Tianqi Liu, Arsha Nagrani, Xudong Lin, Mingda Zhang, Anurag Arnab, Feng Han, Yukun Zhu, Jialu Liu, Shih-Fu Chang

Figure 1 for Video Summarization: Towards Entity-Aware Captions

Figure 2 for Video Summarization: Towards Entity-Aware Captions

Figure 3 for Video Summarization: Towards Entity-Aware Captions

Figure 4 for Video Summarization: Towards Entity-Aware Captions

Abstract:Existing popular video captioning benchmarks and models deal with generic captions devoid of specific person, place or organization named entities. In contrast, news videos present a challenging setting where the caption requires such named entities for meaningful summarization. As such, we propose the task of summarizing news video directly to entity-aware captions. We also release a large-scale dataset, VIEWS (VIdeo NEWS), to support research on this task. Further, we propose a method that augments visual information from videos with context retrieved from external world knowledge to generate entity-aware captions. We demonstrate the effectiveness of our approach on three video captioning models. We also show that our approach generalizes to existing news image captions dataset. With all the extensive experiments and insights, we believe we establish a solid basis for future research on this challenging task.

Via

Access Paper or Ask Questions

NeRFTAP: Enhancing Transferability of Adversarial Patches on Face Recognition using Neural Radiance Fields

Nov 29, 2023

Xiaoliang Liu, Furao Shen, Feng Han, Jian Zhao, Changhai Nie

Abstract:Face recognition (FR) technology plays a crucial role in various applications, but its vulnerability to adversarial attacks poses significant security concerns. Existing research primarily focuses on transferability to different FR models, overlooking the direct transferability to victim's face images, which is a practical threat in real-world scenarios. In this study, we propose a novel adversarial attack method that considers both the transferability to the FR model and the victim's face image, called NeRFTAP. Leveraging NeRF-based 3D-GAN, we generate new view face images for the source and target subjects to enhance transferability of adversarial patches. We introduce a style consistency loss to ensure the visual similarity between the adversarial UV map and the target UV map under a 0-1 mask, enhancing the effectiveness and naturalness of the generated adversarial face images. Extensive experiments and evaluations on various FR models demonstrate the superiority of our approach over existing attack techniques. Our work provides valuable insights for enhancing the robustness of FR systems in practical adversarial settings.

Via

Access Paper or Ask Questions

Cascaded Nonlinear Control Design for Highly Underactuated Balance Robots

Oct 02, 2023

Feng Han, Jingang Yi

Abstract:This paper presents a nonlinear control design for highly underactuated balance robots, which possess more numbers of unactuated degree-of-freedom (DOF) than actuated ones. To address the challenge of simultaneously trajectory tracking of actuated coordinates and balancing of unactuated coordinates, the proposed control converts a robot dynamics into a series of cascaded subsystems and each of them is considered virtually actuated. To achieve the control goal, we sequentially design and update the virtual and actual control inputs to incorporate the balance task such that the unactuated coordinates are balanced to their instantaneous equilibrium. The closed-loop dynamics are shown to be stable and the tracking errors exponentially converge towards a neighborhood near the origin. The simulation results demonstrate the effectiveness of the proposed control design by using a triple-inverted pendulum cart system.

Via

Access Paper or Ask Questions