Abstract:Recent progress in language model pre-training has led to important improvements in Named Entity Recognition (NER). Nonetheless, this progress has been mainly tested in well-formatted documents such as news, Wikipedia, or scientific articles. In social media the landscape is different, in which it adds another layer of complexity due to its noisy and dynamic nature. In this paper, we focus on NER in Twitter, one of the largest social media platforms, and construct a new NER dataset, TweetNER7, which contains seven entity types annotated over 11,382 tweets from September 2019 to August 2021. The dataset was constructed by carefully distributing the tweets over time and taking representative trends as a basis. Along with the dataset, we provide a set of language model baselines and perform an analysis on the language model performance on the task, especially analyzing the impact of different time periods. In particular, we focus on three important temporal aspects in our analysis: short-term degradation of NER models over time, strategies to fine-tune a language model over different periods, and self-labeling as an alternative to lack of recently-labeled data. TweetNER7 is released publicly (https://huggingface.co/datasets/tner/tweetner7) along with the models fine-tuned on it (NER models have been integrated into TweetNLP and can be found athttps://github.com/asahi417/tner/tree/master/examples/tweetner7_paper).
Abstract:From a conceptual perspective, 5G technology promises to deliver low latency, high data rate and more reliable connections for the next generations of communication systems. To face these demands, modulation schemes based on Orthogonal Frequency Domain Multiplexing (OFDM) can accommodate these requirements for wireless systems. On the other hand, several hybrid OFDM-based systems such as the Time-Interleaved Block Windowed Burst Orthogonal Frequency Division Multiplexing (TIBWB-OFDM) are capable of achieving even better spectral confinement and power efficiency. This paper addresses to the implementation of the TIBWB-OFDM system in a more realistic and practical wireless link scenarios by addressing the challenges of proper and reliable channel estimation and frame synchronization. We propose to incorporate a preamble formed by optimum correlation training sequences, such as the Zadoff-Chu (ZC) sequences. The added ZC preamble sequence is used to jointly estimate the frame beginning, through signal correlation strategies and a threshold decision device, and acquire the channel state information (CSI), by employing estimators based on the preamble sequence and the transmitted data. The employed receiver estimators show that it is possible to detect the TIBWB-OFDM frame beginning and provide a close BER performance comparatively to the one where the perfect channel is known.