Abstract:Automated evaluation of text generation systems has recently received increasing attention, particularly for checking whether generated text stays truthful to its input sources. Existing methods frequently rely on task-specific language models for evaluation, which in turn offer little interpretability of the generated scores. We introduce SRLScore, a reference-free evaluation metric designed with text summarization in mind. Our approach constructs fact tuples from Semantic Role Labels for both the input and the summary text. A final factuality score is computed by an adjustable scoring mechanism, which allows for easy adaptation of the method across domains. Correlation with human judgments on English summarization datasets shows that SRLScore is competitive with state-of-the-art methods and generalizes stably across datasets without requiring further training or hyperparameter tuning. We experiment with an optional coreference resolution step, but find that the performance gain is mostly outweighed by the additional compute required. Our metric is available online at https://github.com/heyjing/SRLScore.
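A minimal sketch of the tuple-based scoring idea described above, assuming fact tuples have already been extracted by a semantic role labeler; the role names, weights, and exact-match rule here are illustrative assumptions rather than the actual SRLScore implementation (see the linked repository for that):

```python
# Hypothetical sketch of an SRL-style factuality score over pre-extracted fact tuples.
# Role names, weights, and the exact-match similarity are assumptions for illustration.
from typing import Dict, List, Tuple

FactTuple = Tuple[str, str, str]  # (agent, predicate, patient)

def tuple_similarity(source: FactTuple, summary: FactTuple,
                     weights: Dict[str, float]) -> float:
    """Weighted agreement between two fact tuples (exact string match per role)."""
    roles = ("agent", "predicate", "patient")
    score = 0.0
    for role, s, t in zip(roles, source, summary):
        if s and t and s.lower() == t.lower():
            score += weights[role]
    return score

def srl_style_score(source_tuples: List[FactTuple],
                    summary_tuples: List[FactTuple],
                    weights: Dict[str, float]) -> float:
    """Average, over summary tuples, of the best match against any source tuple."""
    if not summary_tuples:
        return 1.0  # an empty summary makes no unsupported claims
    if not source_tuples:
        return 0.0  # nothing in the source can support the summary's claims
    best_matches = [
        max(tuple_similarity(src, summ, weights) for src in source_tuples)
        for summ in summary_tuples
    ]
    return sum(best_matches) / len(best_matches)

if __name__ == "__main__":
    weights = {"agent": 0.3, "predicate": 0.4, "patient": 0.3}  # adjustable weighting
    source = [("the company", "acquired", "a startup")]
    summary = [("the company", "acquired", "a competitor")]
    print(srl_style_score(source, summary, weights))  # 0.7
```

The adjustable weight dictionary is what makes such a scorer easy to re-tune for a new domain without retraining any model.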
Abstract:With recent advances in Natural Language Processing, the focus is slowly shifting from a purely English-centric view towards more language-specific solutions, including for German. Text summarization systems, which transform long input documents into compressed and more digestible summaries, are especially practical for businesses analyzing their growing amounts of textual data. In this work, we assess the particular landscape of German abstractive text summarization and investigate why practically useful solutions for abstractive text summarization are still absent in industry. Our focus is two-fold, analyzing a) training resources and b) publicly available summarization systems. We show that popular existing datasets exhibit crucial flaws in their assumptions about the original sources, which frequently leads to detrimental effects on system generalization and evaluation biases. We confirm that for the most popular training dataset, MLSUM, over 50% of the training set is unsuitable for abstractive summarization purposes. Furthermore, available systems frequently fail to compare against simple baselines, and ignore more effective and efficient extractive summarization approaches. We attribute poor evaluation quality to a variety of factors, which are investigated in more detail in this work: a lack of high-quality (and diverse) gold data for training, understudied (and untreated) positional biases in some of the existing datasets, and the lack of easily accessible and streamlined pre-processing strategies or analysis tools. We provide a comprehensive assessment of available models on the cleaned datasets, and find that cleaning can lead to a drop of more than 20 ROUGE-1 points during evaluation. The code for dataset filtering and for reproducing our results can be found online at https://github.com/dennlinger/summaries.
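As an illustration of the kind of dataset cleaning discussed above (not the authors' actual filtering code, which lives in the linked repository), the sketch below flags training samples whose supposedly abstractive reference summary is copied verbatim from the source article; the field names "text" and "summary" are assumptions about the dataset layout:

```python
# Hypothetical sanity check for "abstractive" summarization training data:
# drop samples whose reference summary appears verbatim in the source article.

def is_fully_extractive(article: str, summary: str) -> bool:
    """True if the reference summary is copied verbatim from the article."""
    norm = lambda s: " ".join(s.split()).lower()  # collapse whitespace, ignore case
    return norm(summary) in norm(article)

def filter_dataset(samples):
    """Keep only samples that are not trivially extractive."""
    return [s for s in samples if not is_fully_extractive(s["text"], s["summary"])]

if __name__ == "__main__":
    data = [
        {"text": "Berlin is the capital of Germany. It has 3.7 million people.",
         "summary": "Berlin is the capital of Germany."},          # copied lead
        {"text": "Berlin is the capital of Germany. It has 3.7 million people.",
         "summary": "Germany's capital is home to 3.7 million."},  # abstractive
    ]
    print(len(filter_dataset(data)))  # 1
```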
Abstract:In this letter, we present a robust, real-time, inertial navigation system (INS)-centric GNSS-Visual-Inertial navigation system (IC-GVINS) for wheeled robots, in which the precise INS is fully utilized in both the state estimation and the visual process. To improve the system's robustness, the INS information is employed throughout the keyframe-based visual pipeline, together with a strict outlier-culling strategy. GNSS is adopted to perform an accurate and convenient initialization of IC-GVINS, and is further employed to achieve absolute positioning in large-scale environments. The IMU, visual, and GNSS measurements are tightly fused within a factor graph optimization framework. Dedicated experiments were conducted to evaluate the robustness and accuracy of IC-GVINS on a wheeled robot. IC-GVINS demonstrates superior robustness in various visually degraded scenes with moving objects. Compared to state-of-the-art visual-inertial navigation systems, the proposed method yields improved robustness and accuracy in various environments. We open-source our code, together with the dataset, on GitHub.
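Schematically, such a tightly coupled factor graph minimizes a sum of weighted residuals over the estimated states (poses, velocities, IMU biases, landmarks); the objective below is a generic sketch of that idea, not the paper's exact formulation:

```latex
% Generic tightly coupled FGO objective (schematic; residual definitions follow the paper):
\min_{\mathcal{X}} \;
  \sum_{k}     \big\| r_{\mathcal{I}}\!\big(z^{\mathrm{IMU}}_{k,k+1}, \mathcal{X}\big) \big\|^{2}_{\Sigma_{\mathcal{I}}}
+ \sum_{(i,j)} \big\| r_{\mathcal{V}}\!\big(z^{\mathrm{cam}}_{ij},     \mathcal{X}\big) \big\|^{2}_{\Sigma_{\mathcal{V}}}
+ \sum_{k}     \big\| r_{\mathcal{G}}\!\big(z^{\mathrm{GNSS}}_{k},     \mathcal{X}\big) \big\|^{2}_{\Sigma_{\mathcal{G}}}
```

Here $r_{\mathcal{I}}$, $r_{\mathcal{V}}$, and $r_{\mathcal{G}}$ denote IMU-preintegration, visual reprojection, and GNSS position residuals, each weighted by its measurement covariance $\Sigma$, and $\mathcal{X}$ collects the states over the sliding window.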
Abstract:In light of the success of transferring language models to NLP tasks, we ask whether the full BERT model is always the best choice, and whether there exists a simple but effective method to find the winning ticket in state-of-the-art deep neural networks without complex calculations. We construct a series of BERT-based models of different sizes and compare their predictions on 8 binary classification tasks. The results show that there indeed exist smaller sub-networks that perform better than the full model. We then present a further study and propose a simple method to shrink BERT appropriately before fine-tuning. Extended experiments indicate that our method can reduce time and storage overhead considerably, with little or even no accuracy loss.
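One straightforward way to shrink BERT before fine-tuning is to keep only the lower encoder layers; the sketch below illustrates this general idea with Hugging Face Transformers, without claiming it is the exact selection method proposed in the paper:

```python
# Minimal sketch: truncate a pretrained BERT encoder to its first N layers
# before fine-tuning. This illustrates sub-network selection in general; the
# paper's own criterion for choosing the sub-network may differ.
import torch
from transformers import BertModel

def truncate_bert(model: BertModel, num_layers: int) -> BertModel:
    """Keep only the first `num_layers` Transformer blocks of a BERT encoder."""
    model.encoder.layer = model.encoder.layer[:num_layers]  # nn.ModuleList slice
    model.config.num_hidden_layers = num_layers
    return model

if __name__ == "__main__":
    model = truncate_bert(BertModel.from_pretrained("bert-base-uncased"), 6)
    ids = torch.tensor([[101, 7592, 2088, 102]])  # [CLS] hello world [SEP]
    with torch.no_grad():
        out = model(input_ids=ids)
    print(out.last_hidden_state.shape)  # torch.Size([1, 4, 768])
```

The truncated model can then be fine-tuned as usual, paying roughly half the compute and storage of the full 12-layer encoder.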
Abstract:Inertial measurement unit (IMU) preintegration is widely used in factor graph optimization (FGO), e.g., in visual-inertial navigation systems and global navigation satellite system/inertial navigation system (GNSS/INS) integration. However, most existing IMU preintegration models ignore the Earth's rotation and lack a refined integration process, and these limitations severely degrade INS accuracy. In this study, we construct a refined IMU preintegration model that incorporates the Earth's rotation, and analytically compute the corresponding covariance and Jacobian matrices. To mitigate the impact of sensors other than the IMU in the evaluation system, FGO-based GNSS/INS integration is adopted to quantitatively evaluate the accuracy of the refined preintegration. Compared to a classic filtering-based GNSS/INS integration baseline, the FGO-based integration using the refined preintegration yields the same accuracy. In contrast, the existing rough preintegration yields significant accuracy degradation. The performance difference between the refined and rough preintegration models can exceed 200% for an industrial-grade MEMS module and 10% for a consumer-grade MEMS chip. Clearly, the Earth's rotation is the major factor to consider in IMU preintegration in order to maintain IMU precision, even for a consumer-grade IMU.
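Schematically, the difference between a rough and an Earth-rotation-aware gyroscope preintegration step can be written as follows; the notation is simplified and assumed for illustration, while the full model, covariance propagation, and Jacobians are derived in the paper:

```latex
% Rough preintegration of the attitude increment (Earth's rotation ignored):
\Delta \mathbf{R}_{k+1} = \Delta \mathbf{R}_{k}\,
    \mathrm{Exp}\!\big( (\tilde{\boldsymbol{\omega}}_k - \mathbf{b}_g)\,\Delta t \big)
% Refined step compensating the Earth's rotation rate
% \|\boldsymbol{\omega}_{ie}\| \approx 7.292115 \times 10^{-5}\ \mathrm{rad/s},
% expressed in the body frame via the world-to-body rotation \mathbf{R}^{b}_{w}:
\Delta \mathbf{R}_{k+1} = \Delta \mathbf{R}_{k}\,
    \mathrm{Exp}\!\big( (\tilde{\boldsymbol{\omega}}_k - \mathbf{b}_g
    - \mathbf{R}^{b}_{w}\,\boldsymbol{\omega}^{w}_{ie})\,\Delta t \big)
```

Here $\tilde{\boldsymbol{\omega}}_k$ is the raw gyroscope measurement, $\mathbf{b}_g$ the gyroscope bias, and $\mathrm{Exp}(\cdot)$ the SO(3) exponential map; omitting the $\boldsymbol{\omega}_{ie}$ term is what the rough models above effectively do.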
Abstract:Autonomous driving is rapidly becoming a practical part of everyday life, driven largely by deep learning. In particular, effective deep-learning-based traffic sign detection plays a critical role in it. However, different countries have different sets of traffic signs, making the training of localized traffic sign recognition models a tedious and daunting task. To address the long computation time of complicated algorithms and the low detection rate on blurred and sub-pixel images of localized traffic signs, we propose Multi-Scale Deconvolution Networks (MDN), which flexibly combine a multi-scale convolutional neural network with a deconvolution sub-network, leading to efficient and reliable localized traffic sign recognition model training. We demonstrate that the proposed MDN is effective compared with classical algorithms on localized traffic sign benchmarks such as the Chinese Traffic Sign Dataset (CTSD) and the German Traffic Sign Recognition Benchmark (GTSRB).
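The following PyTorch sketch illustrates the general idea of fusing multi-scale convolutional branches with a deconvolution (transposed-convolution) sub-network that upsamples features to help with blurred or very small signs; the layer sizes, branch count, and input resolution are illustrative assumptions, not the exact MDN architecture:

```python
# Illustrative sketch of a multi-scale CNN fused with a deconvolution sub-network.
import torch
import torch.nn as nn

class MultiScaleDeconvNet(nn.Module):
    def __init__(self, num_classes: int = 43):  # 43 classes in GTSRB
        super().__init__()
        # Multi-scale branches: different kernel sizes capture different sign scales.
        self.branch3 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.branch5 = nn.Sequential(nn.Conv2d(3, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2))
        # Deconvolution sub-network: upsample the fused features back to input resolution.
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.branch3(x), self.branch5(x)], dim=1)
        return self.head(self.deconv(fused))

if __name__ == "__main__":
    logits = MultiScaleDeconvNet()(torch.randn(1, 3, 48, 48))
    print(logits.shape)  # torch.Size([1, 43])
```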