Abstract:There has been growing interest in facial video-based remote photoplethysmography (rPPG) measurement recently, with a focus on assessing various vital signs such as heart rate and heart rate variability. Despite previous efforts on static datasets, their approaches have been hindered by inaccurate region of interest (ROI) localization and motion issues, and have shown limited generalization in real-world scenarios. To address these challenges, we propose a novel masked attention regularization (MAR-rPPG) framework that mitigates the impact of ROI localization and complex motion artifacts. Specifically, our approach first integrates a masked attention regularization mechanism into the rPPG field to capture the visual semantic consistency of facial clips, while it also employs a masking technique to prevent the model from overfitting on inaccurate ROIs and subsequently degrading its performance. Furthermore, we propose an enhanced rPPG expert aggregation (EREA) network as the backbone to obtain rPPG signals and attention maps simultaneously. Our EREA network is capable of discriminating divergent attentions from different facial areas and retaining the consistency of spatiotemporal attention maps. For motion robustness, a simple open source detector MediaPipe for data preprocessing is sufficient for our framework due to its superior capability of rPPG signal extraction and attention regularization. Exhaustive experiments on three benchmark datasets (UBFC-rPPG, PURE, and MMPD) substantiate the superiority of our proposed method, outperforming recent state-of-the-art works by a considerable margin.
Abstract:Financial assets exhibit complex dependency structures, which are crucial for investors to create diversified portfolios to mitigate risk in volatile financial markets. To explore the financial asset dependencies dynamics, we propose a novel approach that models the dependencies of assets as an Asset Dependency Matrix (ADM) and treats the ADM sequences as image sequences. This allows us to leverage deep learning-based video prediction methods to capture the spatiotemporal dependencies among assets. However, unlike images where neighboring pixels exhibit explicit spatiotemporal dependencies due to the natural continuity of object movements, assets in ADM do not have a natural order. This poses challenges to organizing the relational assets to reveal better the spatiotemporal dependencies among neighboring assets for ADM forecasting. To tackle the challenges, we propose the Asset Dependency Neural Network (ADNN), which employs the Convolutional Long Short-Term Memory (ConvLSTM) network, a highly successful method for video prediction. ADNN can employ static and dynamic transformation functions to optimize the representations of the ADM. Through extensive experiments, we demonstrate that our proposed framework consistently outperforms the baselines in the ADM prediction and downstream application tasks. This research contributes to understanding and predicting asset dependencies, offering valuable insights for financial market participants.
Abstract:Our comprehension of biological neuronal networks has profoundly influenced the evolution of artificial neural networks (ANNs). However, the neurons employed in ANNs exhibit remarkable deviations from their biological analogs, mainly due to the absence of complex dendritic trees encompassing local nonlinearity. Despite such disparities, previous investigations have demonstrated that point neurons can functionally substitute dendritic neurons in executing computational tasks. In this study, we scrutinized the importance of nonlinear dendrites within neural networks. By employing machine-learning methodologies, we assessed the impact of dendritic structure nonlinearity on neural network performance. Our findings reveal that integrating dendritic structures can substantially enhance model capacity and performance while keeping signal communication costs effectively restrained. This investigation offers pivotal insights that hold considerable implications for the development of future neural network accelerators.
Abstract:Adaptive network pruning approach has recently drawn significant attention due to its excellent capability to identify the importance and redundancy of layers and filters and customize a suitable pruning solution. However, it remains unsatisfactory since current adaptive pruning methods rely mostly on an additional monitor to score layer and filter importance, and thus faces high complexity and weak interpretability. To tackle these issues, we have deeply researched the weight reconstruction process in iterative prune-train process and propose a Protective Self-Adaptive Pruning (PSAP) method. First of all, PSAP can utilize its own information, weight sparsity ratio, to adaptively adjust pruning ratio of layers before each pruning step. Moreover, we propose a protective reconstruction mechanism to prevent important filters from being pruned through supervising gradients and to avoid unrecoverable information loss as well. Our PSAP is handy and explicit because it merely depends on weights and gradients of model itself, instead of requiring an additional monitor as in early works. Experiments on ImageNet and CIFAR-10 also demonstrate its superiority to current works in both accuracy and compression ratio, especially for compressing with a high ratio or pruning from scratch.
Abstract:It is crucial to evaluate the quality and determine the optimal number of clusters in cluster analysis. In this paper, the multi-granularity characterization of the data set is carried out to obtain the hyper-balls. The cluster internal evaluation index based on hyper-balls(HCVI) is defined. Moreover, a general method for determining the optimal number of clusters based on HCVI is proposed. The proposed methods can evaluate the clustering results produced by the several classic methods and determine the optimal cluster number for data sets containing noises and clusters with arbitrary shapes. The experimental results on synthetic and real data sets indicate that the new index outperforms existing ones.
Abstract:Traditional recommender systems are typically passive in that they try to adapt their recommendations to the user's historical interests. However, it is highly desirable for commercial applications, such as e-commerce, advertisement placement, and news portals, to be able to expand the users' interests so that they would accept items that they were not originally aware of or interested in to increase customer interactions. In this paper, we present Influential Recommender System (IRS), a new recommendation paradigm that aims to proactively lead a user to like a given objective item by progressively recommending to the user a sequence of carefully selected items (called an influence path). We propose the Influential Recommender Network (IRN), which is a Transformer-based sequential model to encode the items' sequential dependencies. Since different people react to external influences differently, we introduce the Personalized Impressionability Mask (PIM) to model how receptive a user is to external influence to generate the most effective influence path for the user. To evaluate IRN, we design several performance metrics to measure whether or not the influence path can smoothly expand the user interest to include the objective item while maintaining the user's satisfaction with the recommendation. Experimental results show that IRN significantly outperforms the baseline recommenders and demonstrates its capability of influencing users' interests.
Abstract:Neural Architecture Search (NAS) is an automated architecture engineering method for deep learning design automation, which serves as an alternative to the manual and error-prone process of model development, selection, evaluation and performance estimation. However, one major obstacle of NAS is the extremely demanding computation resource requirements and time-consuming iterations particularly when the dataset scales. In this paper, targeting at the emerging vision transformer (ViT), we present NasHD, a hyperdimensional computing based supervised learning model to rank the performance given the architectures and configurations. Different from other learning based methods, NasHD is faster thanks to the high parallel processing of HDC architecture. We also evaluated two HDC encoding schemes: Gram-based and Record-based of NasHD on their performance and efficiency. On the VIMER-UFO benchmark dataset of 8 applications from a diverse range of domains, NasHD Record can rank the performance of nearly 100K vision transformer models with about 1 minute while still achieving comparable results with sophisticated models.
Abstract:For an autonomous robotic system, monitoring surgeon actions and assisting the main surgeon during a procedure can be very challenging. The challenges come from the peculiar structure of the surgical scene, the greater similarity in appearance of actions performed via tools in a cavity compared to, say, human actions in unconstrained environments, as well as from the motion of the endoscopic camera. This paper presents ESAD, the first large-scale dataset designed to tackle the problem of surgeon action detection in endoscopic minimally invasive surgery. ESAD aims at contributing to increase the effectiveness and reliability of surgical assistant robots by realistically testing their awareness of the actions performed by a surgeon. The dataset provides bounding box annotation for 21 action classes on real endoscopic video frames captured during prostatectomy, and was used as the basis of a recent MIDL 2020 challenge. We also present an analysis of the dataset conducted using the baseline model which was released as part of the challenge, and a description of the top performing models submitted to the challenge together with the results they obtained. This study provides significant insight into what approaches can be effective and can be extended further. We believe that ESAD will serve in the future as a useful benchmark for all researchers active in surgeon action detection and assistive robotics at large.