Abstract:Temporal action detection (TAD), which locates and recognizes action segments, remains a challenging task in video understanding due to variable segment lengths and ambiguous boundaries. Existing methods treat neighboring contexts of an action segment indiscriminately, leading to imprecise boundary predictions. We introduce a single-stage ContextDet framework, which makes use of large-kernel convolutions in TAD for the first time. Our model features a pyramid adaptive context aggragation (ACA) architecture, capturing long context and improving action discriminability. Each ACA level consists of two novel modules. The context attention module (CAM) identifies salient contextual information, encourages context diversity, and preserves context integrity through a context gating block (CGB). The long context module (LCM) makes use of a mixture of large- and small-kernel convolutions to adaptively gather long-range context and fine-grained local features. Additionally, by varying the length of these large kernels across the ACA pyramid, our model provides lightweight yet effective context aggregation and action discrimination. We conducted extensive experiments and compared our model with a number of advanced TAD methods on six challenging TAD benchmarks: MultiThumos, Charades, FineAction, EPIC-Kitchens 100, Thumos14, and HACS, demonstrating superior accuracy at reduced inference speed.
Abstract:Electroencephalography (EEG) provides reliable indications of human cognition and mental states. Accurate emotion recognition from EEG remains challenging due to signal variations among individuals and across measurement sessions. To address these challenges, we introduce a multi-source dynamic contrastive domain adaptation method (MS-DCDA), which models coarse-grained inter-domain and fine-grained intra-class adaptations through a multi-branch contrastive neural network and contrastive sub-domain discrepancy learning. Our model leverages domain knowledge from each individual source and a complementary source ensemble and uses dynamically weighted learning to achieve an optimal tradeoff between domain transferability and discriminability. The proposed MS-DCDA model was evaluated using the SEED and SEED-IV datasets, achieving respectively the highest mean accuracies of $90.84\%$ and $78.49\%$ in cross-subject experiments as well as $95.82\%$ and $82.25\%$ in cross-session experiments. Our model outperforms several alternative domain adaptation methods in recognition accuracy, inter-class margin, and intra-class compactness. Our study also suggests greater emotional sensitivity in the frontal and parietal brain lobes, providing insights for mental health interventions, personalized medicine, and development of preventive strategies.
Abstract:The anti-interference capability of wireless links is a physical layer problem for edge computing. Although convolutional codes have inherent error correction potential due to the redundancy introduced in the data, the performance of the convolutional code is drastically degraded due to multipath effects on the channel. In this paper, we propose the use of a Hidden Markov Model (HMM) for the reconstruction of convolutional codes and decoding by the Viterbi algorithm. Furthermore, to implement soft-decision decoding, the observation of HMM is replaced by Gaussian mixture models (GMM). Our method provides superior error correction potential than the standard method because the model parameters contain channel state information (CSI). We evaluated the performance of the method compared to standard Viterbi decoding by numerical simulation. In the multipath channel, the hybrid HMM decoder can achieve a performance gain of 4.7 dB and 2 dB when using hard-decision and soft-decision decoding, respectively. The HMM decoder also achieves significant performance gains for the RSC code, suggesting that the method could be extended to turbo codes.