Abstract:We present hyper-connections, a simple yet effective method that can serve as an alternative to residual connections. This approach specifically addresses common drawbacks observed in residual connection variants, such as the seesaw effect between gradient vanishing and representation collapse. Theoretically, hyper-connections allow the network to adjust the strength of connections between features at different depths and dynamically rearrange layers. We conduct experiments focusing on the pre-training of large language models, including dense and sparse models, where hyper-connections show significant performance improvements over residual connections. Additional experiments conducted on vision tasks also demonstrate similar improvements. We anticipate that this method will be broadly applicable and beneficial across a wide range of AI problems.
Abstract:Open set recognition enables deep neural networks (DNNs) to identify samples of unknown classes, while maintaining high classification accuracy on samples of known classes. Existing methods basing on auto-encoder (AE) and prototype learning show great potential in handling this challenging task. In this study, we propose a novel method, called Class-Specific Semantic Reconstruction (CSSR), that integrates the power of AE and prototype learning. Specifically, CSSR replaces prototype points with manifolds represented by class-specific AEs. Unlike conventional prototype-based methods, CSSR models each known class on an individual AE manifold, and measures class belongingness through AE's reconstruction error. Class-specific AEs are plugged into the top of the DNN backbone and reconstruct the semantic representations learned by the DNN instead of the raw image. Through end-to-end learning, the DNN and the AEs boost each other to learn both discriminative and representative information. The results of experiments conducted on multiple datasets show that the proposed method achieves outstanding performance in both close and open set recognition and is sufficiently simple and flexible to incorporate into existing frameworks.