Junlin Zhang

Event Vision Sensor: A Review

Feb 10, 2025

Xinyue Qin, Junlin Zhang, Wenzhong Bao, Chun Lin, Honglei Chen

Abstract:By monitoring temporal contrast, event-based vision sensors can provide high temporal resolution and low latency while maintaining low power consumption and simplicity in circuit structure. These characteristics have garnered significant attention in both academia and industry. In recent years, the application of back-illuminated (BSI) technology, wafer stacking techniques, and industrial interfaces has brought new opportunities for enhancing the performance of event-based vision sensors. This is evident in the substantial advancements made in reducing noise, improving resolution, and increasing readout rates. Additionally, the integration of these technologies has enhanced the compatibility of event-based vision sensors with current and edge vision systems, providing greater possibilities for their practical applications. This paper will review the progression from neuromorphic engineering to state-of-the-art event-based vision sensor technologies, including their development trends, operating principles, and key features. Moreover, we will delve into the sensitivity of event-based vision sensors and the opportunities and challenges they face in the realm of infrared imaging, providing references for future research and applications.
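The temporal-contrast principle mentioned in the abstract can be illustrated with a toy single-pixel model. This is only a sketch under stated assumptions, not a description of any sensor covered by the review: the function name `generate_events`, the threshold value of 0.2 and the sample intensities are all hypothetical. The pixel emits an ON or OFF event each time its log intensity has moved by at least the threshold from the level stored at the previous event.

```python
import math


def generate_events(samples, threshold=0.2):
    """Return (sample_index, polarity) events for one pixel.

    Hypothetical model: an event fires whenever the log intensity differs
    from the stored reference level by at least `threshold`.
    """
    events = []
    reference = math.log(samples[0])  # log intensity stored at the last event
    for i, value in enumerate(samples[1:], start=1):
        delta = math.log(value) - reference
        # A large change produces several events in a row.
        while abs(delta) >= threshold:
            polarity = 1 if delta > 0 else -1
            events.append((i, polarity))
            reference += polarity * threshold
            delta = math.log(value) - reference
    return events


# Illustrative intensities: constant, then brighter, then much darker.
pixel = [100, 100, 150, 150, 60]
events = generate_events(pixel)
print(events)
```

A constant input produces no events at all, which is the intuition behind the low data rate and low power consumption of this sensing style.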

Imagen 3

Aug 13, 2024

Imagen-Team-Google, Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman (+240 more)

Abstract:We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

Relation Modeling and Distillation for Learning with Noisy Labels

Jun 02, 2024

Xiaming Che, Junlin Zhang, Zhuang Qi, Xin Qi

Abstract:Learning with noisy labels has become an effective strategy for enhancing the robustness of models, enabling them to better tolerate inaccurate data. Existing methods either focus on optimizing the loss function to mitigate the interference from noise, or design procedures to detect potential noise and correct errors. However, their effectiveness is often compromised in representation learning because models overfit to noisy labels. To address this issue, this paper proposes a relation modeling and distillation framework that models inter-sample relationships via self-supervised learning and employs knowledge distillation to enhance understanding of latent associations, which mitigates the impact of noisy labels. Specifically, the proposed method, termed RMDNet, includes two main modules. The relation modeling (RM) module uses contrastive learning to learn representations of all data, an unsupervised approach that effectively eliminates the interference of noisy labels on feature extraction. The relation-guided representation learning (RGRL) module uses the inter-sample relations learned by the RM module to calibrate the representation distribution for noisy samples, which improves the generalization of the model in the inference phase. Notably, RMDNet is a plug-and-play framework that can integrate multiple methods to its advantage. Extensive experiments were conducted on two datasets, including performance comparison, ablation study, in-depth analysis and case study. The results show that RMDNet can learn discriminative representations for noisy data, which results in performance superior to that of existing methods.

Robust Visual Tracking via Iterative Gradient Descent and Threshold Selection

Jun 02, 2024

Zhuang Qi, Junlin Zhang, Xin Qi

Abstract:Visual tracking fundamentally involves regressing the state of the target in each frame of a video. Despite significant progress, existing regression-based trackers still tend to experience failures and inaccuracies. To enhance the precision of target estimation, this paper proposes a tracking technique based on robust regression. Firstly, we introduce a novel robust linear regression estimator, which achieves favorable performance when the error vector follows an i.i.d. Gaussian-Laplacian distribution. Secondly, we design an iterative process to quickly solve the problem of outliers: the coefficients are obtained by the Iterative Gradient Descent and Threshold Selection (IGDTS) algorithm. In addition, we extend IGDTS to a generative tracker, and apply the IGDTS-distance to measure the deviation between the sample and the model. Finally, we propose an update scheme to capture the appearance changes of the tracked object and ensure that the model is updated correctly. Experimental results on several challenging image sequences show that the proposed tracker outperforms existing trackers.
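To give a feel for how gradient descent and thresholding can be combined in robust regression, here is a generic sketch. It is not the paper's IGDTS algorithm, whose details the abstract does not give. It assumes the common formulation in which each residual is split into dense noise plus a sparse outlier term `s`, alternating a gradient step on the slope `w` with soft-thresholding of the residuals. The names `robust_slope` and `soft_threshold`, the parameters `lam`, `lr` and `steps`, and the data are all hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink `value` toward zero by `lam`; small values become exactly zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def robust_slope(xs, ys, lam=1.0, lr=0.01, steps=300):
    """Fit y ~ w * x while absorbing gross errors into a sparse outlier vector."""
    w = 0.0
    outliers = [0.0] * len(xs)
    for _ in range(steps):
        # Gradient of 0.5 * sum((y - w*x - s)^2) with respect to w.
        grad = -sum(x * (y - w * x - s) for x, y, s in zip(xs, ys, outliers))
        w -= lr * grad
        # Thresholding step: only large residuals are treated as outliers.
        outliers = [soft_threshold(y - w * x, lam) for x, y in zip(xs, ys)]
    return w, outliers


def least_squares_slope(xs, ys):
    """Ordinary least-squares slope through the origin, for comparison."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Illustrative data: true slope 2.0, with one corrupted observation.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x for x in xs]
ys[5] = 30.0

w_robust, outliers = robust_slope(xs, ys)
w_ols = least_squares_slope(xs, ys)
print(w_robust, w_ols)
```

On this toy data the corrupted point pulls the ordinary least-squares slope far from 2.0, while the thresholded variant stays close to it and isolates the corruption in a single nonzero entry of `outliers`.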

Comparative Study of Neighbor-based Methods for Local Outlier Detection

May 29, 2024

Zhuang Qi, Junlin Zhang, Xiaming Chen, Xin Qi

Abstract:The neighbor-based method has become a powerful tool to handle the outlier detection problem, which aims to infer the abnormal degree of a sample based on the compactness of the sample and its neighbors. However, existing methods commonly focus on designing different processes to locate outliers in the dataset, while the contributions of different types of neighbors to outlier detection have not been well discussed. To this end, this paper studies the neighbors used in existing outlier detection algorithms and introduces a taxonomy, which uses the three-level components of information, neighbor and methodology to define hybrid methods. This taxonomy can serve as a paradigm where a novel neighbor-based outlier detection method can be proposed by combining different components in this taxonomy. A large number of comparative experiments were conducted on synthetic and real-world datasets in terms of performance comparison and case study, and the results show that reverse K-nearest neighbor based methods achieve promising performance and that the dynamic selection method is suitable for working in high-dimensional space. Notably, it is verified that rationally selecting components from this taxonomy may create an algorithm superior to existing methods.
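As a rough illustration of the two neighbor types named in the abstract, the sketch below computes a plain k-nearest-neighbor distance score and a reverse k-nearest-neighbor count on toy data. It is a minimal example under assumed definitions, not any of the methods compared in the paper. The function names, the choice of `k=2` and the points are hypothetical.

```python
import math


def knn_indices(points, k):
    """For every point, return the indices of its k nearest other points."""
    result = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        result.append(others[:k])
    return result


def knn_distance_scores(points, k):
    """Outlier score: mean distance to the k nearest neighbors (higher is more abnormal)."""
    neighbors = knn_indices(points, k)
    return [
        sum(math.dist(points[i], points[j]) for j in nbrs) / k
        for i, nbrs in enumerate(neighbors)
    ]


def reverse_knn_counts(points, k):
    """Count how many other points include each point among their k nearest neighbors."""
    counts = [0] * len(points)
    for nbrs in knn_indices(points, k):
        for j in nbrs:
            counts[j] += 1
    return counts


# Illustrative data: a tight square cluster plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = knn_distance_scores(points, k=2)
rknn = reverse_knn_counts(points, k=2)
print(scores, rknn)
```

The distant point gets the largest distance score, and no other point lists it as a neighbor, so its reverse-neighbor count is zero. A low reverse count is the signal that reverse k-nearest-neighbor approaches typically build on.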

DebCSE: Rethinking Unsupervised Contrastive Sentence Embedding Learning in the Debiasing Perspective

Sep 14, 2023

Pu Miao, Zeyao Du, Junlin Zhang

Abstract:Several prior studies have suggested that word frequency biases can cause the BERT model to learn indistinguishable sentence embeddings. Contrastive learning schemes such as SimCSE and ConSERT have already been adopted successfully in unsupervised sentence embedding to improve the quality of embeddings by reducing this bias. However, these methods still introduce new biases, such as sentence length bias and false negative sample bias, which hinder the model's ability to learn more fine-grained semantics. In this paper, we reexamine the challenges of contrastive sentence embedding learning from a debiasing perspective and argue that effectively eliminating the influence of various biases is crucial for learning high-quality sentence embeddings. We think all those biases are introduced by the simple rules used to construct training data in contrastive learning, and that the key for contrastive sentence embedding learning is to mimic, in an unsupervised way, the distribution of training data in supervised machine learning. We propose a novel contrastive framework for sentence embedding, termed DebCSE, which can eliminate the impact of these biases by using an inverse propensity weighted sampling method to select high-quality positive and negative pairs according to both the surface and semantic similarity between sentences. Extensive experiments on semantic textual similarity (STS) benchmarks reveal that DebCSE significantly outperforms the latest state-of-the-art models with an average Spearman's correlation coefficient of 80.33% on BERTbase.

* ACM International Conference on Information and Knowledge Management (CIKM '23), October 21-25, 2023, Birmingham, United Kingdom

Perception Test: A Diagnostic Benchmark for Multimodal Video Models

May 23, 2023

Viorica Pătrăucean, Lucas Smaira, Ankush Gupta, Adrià Recasens Continente, Larisa Markeeva, Dylan Banarse, Skanda Koppula, Joseph Heyward, Mateusz Malinowski, Yi Yang(+14 more)

Abstract:We propose a novel multimodal video benchmark - the Perception Test - to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g. Flamingo, BEiT-3, or GPT-4). Compared to existing benchmarks that focus on computational tasks (e.g. classification, detection or tracking), the Perception Test focuses on skills (Memory, Abstraction, Physics, Semantics) and types of reasoning (descriptive, explanatory, predictive, counterfactual) across video, audio, and text modalities, to provide a comprehensive and efficient evaluation tool. The benchmark probes pre-trained models for their transfer capabilities, in a zero-shot / few-shot or limited finetuning regime. For these purposes, the Perception Test introduces 11.6k real-world videos, 23s average length, designed to show perceptually interesting situations, filmed by around 100 participants worldwide. The videos are densely annotated with six types of labels (multiple-choice and grounded video question-answers, object and point tracks, temporal action and sound segments), enabling both language and non-language evaluations. The fine-tuning and validation splits of the benchmark are publicly available (CC-BY license), in addition to a challenge server with a held-out test split. Human baseline results compared to state-of-the-art video QA models show a significant gap in performance (91.4% vs 43.6%), suggesting that there is significant room for improvement in multimodal video understanding. Dataset, baselines code, and challenge server are available at https://github.com/deepmind/perception_test

* 25 pages, 11 figures

MemoNet: Memorizing Representations of All Cross Features Efficiently via Multi-Hash Codebook Network for CTR Prediction

Nov 03, 2022

Pengtao Zhang, Junlin Zhang

Abstract:New findings in natural language processing (NLP) demonstrate that strong memorization capability contributes a lot to the success of large language models. This inspires us to explicitly bring an independent memory mechanism into the CTR ranking model to learn and memorize the representations of all cross features. In this paper, we propose the multi-Hash Codebook NETwork (HCNet) as the memory mechanism for efficiently learning and memorizing representations of all cross features in CTR tasks. HCNet uses a multi-hash codebook as the main memory place, and the whole memory procedure consists of three phases: multi-hash addressing, memory restoring and feature shrinking. HCNet can be regarded as a general module and can be incorporated into any current deep CTR model. We also propose a new CTR model named MemoNet, which combines HCNet with a DNN backbone. Extensive experimental results on three public datasets show that MemoNet reaches superior performance over state-of-the-art approaches and validate the effectiveness of HCNet as a strong memory module. Besides, MemoNet shows the prominent feature of big models in NLP, which means we can enlarge the size of the codebook in HCNet to sustainably obtain performance gains. Our work demonstrates the importance and feasibility of learning and memorizing representations of all cross features, which sheds light on a new promising research direction.
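The multi-hash addressing phase can be pictured with a small sketch. This is an assumption-laden illustration of the general idea of mapping one cross feature to several codebook slots, not the HCNet implementation. The SHA-256 based hashing, the function names, the toy codebook and the feature string are all hypothetical, and plain concatenation merely stands in for the memory restoring and feature shrinking phases, which the abstract does not detail.

```python
import hashlib


def multi_hash_address(cross_feature, num_hashes, codebook_size):
    """Map one cross feature to `num_hashes` codeword indices.

    Each seed acts as a different hash function over the same feature string.
    """
    indices = []
    for seed in range(num_hashes):
        digest = hashlib.sha256(f"{seed}:{cross_feature}".encode("utf-8")).digest()
        indices.append(int.from_bytes(digest[:8], "big") % codebook_size)
    return indices


def lookup(codebook, cross_feature, num_hashes):
    """Concatenate the addressed codewords (a placeholder for the later phases)."""
    vector = []
    for index in multi_hash_address(cross_feature, num_hashes, len(codebook)):
        vector.extend(codebook[index])
    return vector


# Toy, untrained codebook: 16 codewords of dimension 2.
codebook = [[float(row), float(row) + 0.5] for row in range(16)]
feature = "user_city=NYC&ad_category=sports"  # hypothetical cross feature

addresses = multi_hash_address(feature, 3, len(codebook))
representation = lookup(codebook, feature, 3)
print(addresses, representation)
```

Because the number of possible cross features is far larger than any practical embedding table, sharing a fixed-size codebook through several hash functions keeps memory bounded, and two different features are unlikely to collide on every one of their addresses.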

FiBiNet++: Improving FiBiNet by Greatly Reducing Model Size for CTR Prediction

Sep 12, 2022

Pengtao Zhang, Junlin Zhang

Abstract:Click-Through Rate (CTR) estimation has become one of the most fundamental tasks in many real-world applications, and various deep models have been proposed to resolve this problem. Some research has shown that FiBiNet is one of the best-performing models and outperforms all other models on the Avazu dataset. However, the large model size of FiBiNet hinders its wider application. In this paper, we propose a novel FiBiNet++ model that redesigns FiBiNet's model structure, which greatly reduces model size while further improving its performance. Extensive experiments on three public datasets show that FiBiNet++ effectively reduces the non-embedding model parameters of FiBiNet by 12x to 16x and has a model size comparable to that of the DNN model, which is the smallest among deep CTR models. On the other hand, FiBiNet++ leads to significant performance improvements compared to state-of-the-art CTR methods, including FiBiNet.

ContextNet: A Click-Through Rate Prediction Framework Using Contextual information to Refine Feature Embedding

Jul 26, 2021

Zhiqiang Wang, Qingyun She, PengTao Zhang, Junlin Zhang

Abstract:Click-through rate (CTR) estimation is a fundamental task in personalized advertising and recommender systems, and it is important for ranking models to effectively capture complex high-order features. Inspired by the success of ELMo and BERT in the NLP field, which dynamically refine word embeddings according to the context of the sentence where the word appears, we think it is also important to dynamically refine each feature's embedding layer by layer according to the context information contained in the input instance in CTR estimation tasks. In this way, we can effectively capture the useful feature interactions for each feature. In this paper, we propose a novel CTR framework named ContextNet that implicitly models high-order feature interactions by dynamically refining each feature's embedding according to the input context. Specifically, ContextNet consists of two key components: the contextual embedding module and the ContextNet block. The contextual embedding module aggregates contextual information for each feature from the input instance, and the ContextNet block maintains each feature's embedding layer by layer and dynamically refines its representation by merging contextual high-order interaction information into the feature embedding. To make the framework specific, we also propose two models (ContextNet-PFFN and ContextNet-SFFN) under this framework by introducing a linear contextual embedding network and two non-linear mapping sub-networks in the ContextNet block. We conduct extensive experiments on four real-world datasets, and the experimental results demonstrate that our proposed ContextNet-PFFN and ContextNet-SFFN models significantly outperform state-of-the-art models such as DeepFM and xDeepFM.

* arXiv admin note: text overlap with arXiv:2102.07619
