Abstract

This research addresses the challenge of measuring threats in social media comments that target voting, public officials, and institutions in the United States. Current understanding of these online threats and their links to real-world risks is limited, which makes their seriousness difficult to assess. To address this gap, we propose a threat level scale from 0 to 5 and collect a dataset of 1.3 million Telegram responses for developing and testing these threat levels. We also explore combined OpenAI and human annotation to label this large dataset efficiently. Our two-step transfer learning approach first uses an existing pre-trained model for labeling, followed by expert validation. We then use the AI-annotated samples to develop independent models, whose predictions are verified by expert annotators. Notably, our findings show that the GPT-2 model, despite its smaller annotated training set, performs comparably to OpenAI's annotations, which suggests its potential for cost-effective threat detection given more annotated samples. With the long-term objective of establishing continuous threat-level monitoring, we identify the strengths and limitations of our current approach and propose a roadmap for enhancing threat detection.
