Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Topic Detection and Tracking with Time-Aware Document Embeddings

Dec 12, 2021

Hang Jiang, Doug Beeferman, Weiquan Mao, Deb Roy

Figure 1 for Topic Detection and Tracking with Time-Aware Document Embeddings

Figure 2 for Topic Detection and Tracking with Time-Aware Document Embeddings

Figure 3 for Topic Detection and Tracking with Time-Aware Document Embeddings

Figure 4 for Topic Detection and Tracking with Time-Aware Document Embeddings

Share this with someone who'll enjoy it:

Abstract:The time at which a message is communicated is a vital piece of metadata in many real-world natural language processing tasks such as Topic Detection and Tracking (TDT). TDT systems aim to cluster a corpus of news articles by event, and in that context, stories that describe the same event are likely to have been written at around the same time. Prior work on time modeling for TDT takes this into account, but does not well capture how time interacts with the semantic nature of the event. For example, stories about a tropical storm are likely to be written within a short time interval, while stories about a movie release may appear over weeks or months. In our work, we design a neural method that fuses temporal and textual information into a single representation of news documents for event detection. We fine-tune these time-aware document embeddings with a triplet loss architecture, integrate the model into downstream TDT systems, and evaluate the systems on two benchmark TDT data sets in English. In the retrospective setting, we apply clustering algorithms to the time-aware embeddings and show substantial improvements over baselines on the News2013 data set. In the online streaming setting, we add our document encoder to an existing state-of-the-art TDT pipeline and demonstrate that it can benefit the overall performance. We conduct ablation studies on the time representation and fusion algorithm strategies, showing that our proposed model outperforms alternative strategies. Finally, we probe the model to examine how it handles recurring events more effectively than previous TDT systems.

View paper on

Share this with someone who'll enjoy it:

Title:Topic Detection and Tracking with Time-Aware Document Embeddings

Paper and Code