Abstract:Cox models with time-dependent coefficients and covariates are widely used in survival analysis. In high-dimensional settings, sparse regularization techniques are employed for variable selection, but existing methods for time-dependent Cox models lack flexibility in enforcing specific sparsity patterns (i.e., covariate structures). We propose a flexible framework for variable selection in time-dependent Cox models, accommodating complex selection rules. Our method can adapt to arbitrary grouping structures, including interaction selection, temporal, spatial, tree, and directed acyclic graph structures. It achieves accurate estimation with low false alarm rates. We develop the sox package, implementing a network flow algorithm for efficiently solving models with complex covariate structures. Sox offers a user-friendly interface for specifying grouping structures and delivers fast computation. Through examples, including a case study on identifying predictors of time to all-cause death in atrial fibrillation patients, we demonstrate the practical application of our method with specific selection rules.
Abstract:This work aims to tackle a major challenge in offline Inverse Reinforcement Learning (IRL), namely the reward extrapolation error, where the learned reward function may fail to explain the task correctly and misguide the agent in unseen environments due to the intrinsic covariate shift. Leveraging both expert data and lower-quality diverse data, we devise a principled algorithm (namely CLARE) that solves offline IRL efficiently via integrating "conservatism" into a learned reward function and utilizing an estimated dynamics model. Our theoretical analysis provides an upper bound on the return gap between the learned policy and the expert policy, based on which we characterize the impact of covariate shift by examining subtle two-tier tradeoffs between the exploitation (on both expert and diverse data) and exploration (on the estimated dynamics model). We show that CLARE can provably alleviate the reward extrapolation error by striking the right exploitation-exploration balance therein. Extensive experiments corroborate the significant performance gains of CLARE over existing state-of-the-art algorithms on MuJoCo continuous control tasks (especially with a small offline dataset), and the learned reward is highly instructive for further learning.
Abstract:Many electromagnetic time reversal (EMTR)-based fault location methods have been proposed over the last decade. In this paper, we briefly review the EMTR-based fault location method using direct convolution (EMTR-conv) and generalize it to multi-phase transmission lines. Moreover, the parameters of real transmission lines are frequency-dependent, whereas previous studies often used constant parameters in the reverse process of EMTR-based methods. We investigate the influence of this simplification on fault location performance by considering frequency-dependent parameters and lossy ground in the forward process. The results show that the location error increases as the distance between the observation point and the fault position increases, especially when the ground resistivity is high. Therefore, we propose a correction method that reduces the location error by using two observation points. Numerical experiments are carried out on a 3-phase 300-km transmission line under different ground resistivities, fault types and fault conditions. They show that the method locates faults accurately and works efficiently via direct convolution of the signals collected from the fault and the pre-stored calculated transient signals.
Abstract:Electromagnetic time reversal (EMTR) is drawing increasing interest in short-circuit fault location. In this letter, we investigate the classic EMTR fault location methods and find that it is not necessary to reverse the obtained signal in time before injecting it into the network, although this is a standard operation in these methods. The effectiveness of the EMTR fault location method results from the specific similarity of the transfer functions in the forward and reverse processes. Therefore, we can inject a source of arbitrary type and length in the reverse process to locate the fault. Based on this observation, we propose a new EMTR fault location method using direct convolution. Unlike the traditional methods, it only needs to pre-calculate the assumed fault transients for a given network, which can be stored in embedded hardware. Faults can then be located efficiently via direct convolution of the signal collected from a fault and the pre-stored calculated transients, even when only a fraction of the fault signal is used.
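The lookup step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The transient model (`toy_transient`, a damped sinusoid whose frequency decreases with fault distance), the constants, and the scoring rule (energy of the convolved signal) are all assumptions made for illustration. In practice the library would hold transients pre-calculated from a model of the actual network.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
WAVE_SPEED_KM_S = 3.0e5   # assumed propagation speed
FS = 100_000.0            # assumed sampling rate in Hz
N_SAMPLES = 2000          # assumed record length (20 ms at FS)
TAU = 0.005               # assumed damping time constant in s


def toy_transient(distance_km):
    """Hypothetical stand-in for a pre-calculated fault transient:
    a unit-energy damped sinusoid whose frequency falls with distance."""
    t = np.arange(N_SAMPLES) / FS
    freq = WAVE_SPEED_KM_S / (4.0 * distance_km)
    x = np.exp(-t / TAU) * np.sin(2.0 * np.pi * freq * t)
    return x / np.linalg.norm(x)


def build_library(candidates_km):
    """Pre-compute and store one transient per assumed fault location."""
    return {d: toy_transient(d) for d in candidates_km}


def locate_fault(measured, library):
    """Convolve the measured signal with every stored transient and
    return the candidate with the highest output energy.
    The energy criterion is an assumption of this sketch."""
    scores = {
        d: float(np.sum(np.convolve(measured, ref) ** 2))
        for d, ref in library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


library = build_library(range(10, 101, 10))
measured = 3.7 * toy_transient(70)           # unknown amplitude, fault at 70 km
best_full, _ = locate_fault(measured, library)
best_half, _ = locate_fault(measured[: N_SAMPLES // 2], library)
print(best_full, best_half)
```

In this toy setting no time reversal is applied to the measured signal, and the second call uses only the first half of the record, mirroring the abstract's remark that a fraction of the fault signal can suffice.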
Abstract:This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. Around 40,000 hours of transcribed audio is first collected from audiobooks, podcasts and YouTube, covering both read and spontaneous speaking styles, and a variety of topics such as arts, science and sports. A new forced alignment and segmentation pipeline is proposed to create sentence segments suitable for speech recognition training, and to filter out segments with low-quality transcription. For system training, GigaSpeech provides five subsets of different sizes: 10h, 250h, 1000h, 2500h, and 10000h. For our 10,000-hour XL training subset, we cap the word error rate at 4% during the filtering/validation stage, and for all our other smaller training subsets, we cap it at 0%. The DEV and TEST evaluation sets, on the other hand, are re-processed by professional human transcribers to ensure high transcription quality. Baseline systems are provided for popular speech recognition toolkits, namely Athena, ESPnet, Kaldi and Pika.
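The word-error-rate cap used for filtering can be illustrated with a small Python sketch. It is a generic example, not the GigaSpeech pipeline: the segment fields `transcript` and `decoded` and the helper names are hypothetical, and WER is computed here as word-level edit distance divided by the number of reference words.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        return 0.0 if not hyp else float("inf")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def filter_segments(segments, wer_cap):
    """Keep segments whose decoded text stays within the WER cap.
    The 'transcript' and 'decoded' keys are hypothetical field names."""
    return [
        s for s in segments
        if word_error_rate(s["transcript"], s["decoded"]) <= wer_cap
    ]


long_ref = " ".join(f"w{i}" for i in range(50))
segments = [
    {"id": "a", "transcript": "the cat sat", "decoded": "the cat sat"},
    {"id": "b", "transcript": long_ref, "decoded": "x " + long_ref[3:]},
    {"id": "c", "transcript": "a b c d", "decoded": "a x c d"},
]
xl_like = filter_segments(segments, wer_cap=0.04)   # 4% cap
strict = filter_segments(segments, wer_cap=0.0)     # 0% cap
print([s["id"] for s in xl_like], [s["id"] for s in strict])
```

With a 4% cap, the segment with one error in 50 words is kept, while a 0% cap keeps only exact matches.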