vor 5 Tagen

Adam Dorian Wong John D. Hastings

Inhaltsverzeichnis

Zusammenfassung

Mobile Geräte stellen ein häufiges Ziel von Cyberkriminellen dar, die über Phishing-Links in SMS (Smishing) agieren, welche auf Domain-Generation-Algorithmen (DGA) zurückgreifen, um ihre bösartige Infrastruktur dynamisch zu rotieren. Trotz dieser Bedrohungslage konzentriert sich die Forschung und Evaluierung von DGA-Verfahren überwiegend auf Malware-Command-and-Control- (C2-) und E-Mail-Phishing-Datensätze, sodass es nach wie vor an belastbaren Erkenntnissen darüber mangelt, wie gut Detektoren in der Lage sind, DGA-getriebene Domänentaktiken jenseits von Unternehmensperimetern zu erkennen, die durch Smishing ausgelöst werden. Diese Arbeit schließt diese Lücke, indem traditionelle sowie auf maschinellem Lernen basierende DGA-Detektoren gegen den neuen, semi-synthetischen Datensatz „Gravity Falls" evaluiert werden. Dieser Datensatz basiert auf Smishing-Links, die zwischen 2022 und 2025 verbreitet wurden. „Gravity Falls" dokumentiert die Entwicklung eines einzelnen Bedrohungsakteurs innerhalb von vier Taktik-Clustern, die sich von kurzen, zufälligen Zeichenketten hin zur Kombination von Wörterbuchbegriffen sowie thematischen „Combo-Squatting"-Varianten entwickelt haben, die zur Erlangung von Zugangsdaten und zur Begehung von Gebühren- bzw. Strafzahlungs-Betrug dienen. Zwei String-Analyseverfahren (Shannon-Entropie und Exp0se) sowie zwei auf maschinellem Lernen basierende Detektoren (ein LSTM-Klassifikator und COSSAS DGAD) wurden unter Verwendung der Top-1-Million-Domänen als harmlose Baseline bewertet. Die Ergebnisse zeigen sich stark taktikabhängig: Die Leistungsfähigkeit ist bei zufällig generierten Zeichenketten am höchsten, nimmt jedoch bei der Kombination von Wörterbuchbegriffen und thematischen Combo-Squatting-Varianten ab, wobei bei mehreren Werkzeug-Cluster-Kombinationen eine geringe Recall-Rate festgestellt wurde. Insgesamt erweisen sich sowohl traditionelle Heuristiken als auch neuere ML-basierte Detektoren als ungeeignet, um die sich ständig weiterentwickelnden DGA-Taktiken, wie sie im Datensatz „Gravity Falls" beobachtet wurden, konsistent zu identifizieren. Dies unterstreicht die Notwendigkeit kontextbewussterer Ansätze und liefert einen reproduzierbaren Benchmark für künftige Evaluierungen.

One-sentence Summary

Adam Dorian Wong and John D. Hastings of Dakota State University introduce Gravity Falls, a semi-synthetic smishing-derived DGA dataset spanning 2022–2025, revealing that both traditional heuristics and ML detectors (including LSTM and COSSAS DGAD) fail against evolving tactics like themed combo-squatting, urging context-aware defenses for mobile threat landscapes.

Key Contributions

The paper introduces Gravity Falls, a new semi-synthetic DGA dataset derived from real-world SMS spearphishing campaigns (2022–2025), capturing a threat actor’s evolving tactics across four technique clusters—from randomized strings to themed combo-squatting—filling a gap in mobile-targeted DGA research previously dominated by malware C2 and email datasets.
It evaluates four DGA detectors (Shannon entropy, Exp0se, LSTM, COSSAS DGAD) against Gravity Falls using Top-1M domains as benign baselines, revealing that all methods struggle with dictionary-based and themed domains, showing tactic-dependent performance and low recall in multiple tool-cluster pairings.
The findings demonstrate that both traditional heuristics and recent ML-based detectors are ill-suited for the dynamic, context-rich DGA patterns in smishing, motivating context-aware detection methods and providing a reproducible benchmark for future evaluation of mobile threat infrastructure.

Introduction

The authors leverage the Gravity Falls dataset—a semi-synthetic collection of smishing-driven DGA domains from 2022 to 2025—to evaluate how well traditional and machine-learning DGA detectors perform against real-world, evolving attack tactics outside enterprise networks. While prior work focuses on malware C2 or email phishing, smishing targets individuals with fewer protections and rapidly rotating domains, making detection critical yet understudied. The authors find that both entropy-based heuristics and modern ML models like LSTM and COSSAS DGAD struggle with dictionary concatenation and themed combo-squatting variants, revealing a gap in detector adaptability to tactic shifts. Their main contribution is a new benchmark dataset and evidence that current tools are insufficient for smishing-specific DGA evolution, urging context-aware detection methods.

Dataset

The authors use the Gravity Falls dataset, composed of C2 domains delivered via SMS between 2022 and 2025, organized into four technique clusters reflecting annual evolution of the same threat actor’s TTPs. The data is semi-synthetic, blending observed malicious domains with predicted ones used for sinkholing and measurement.
Each cluster has distinct characteristics:
- Cats Cradle (2022): Short randomized 7-character domains with common TLDs; landing pages mimicked CAPTCHA portals.
- Double Helix (2023): Dictionary-based concatenations with newer gTLDs; occasional truncations suggest encoding constraints.
- Pandoras Box (2024): Professional package-delivery lures; combo-squatting with random suffixes; heavy use of Chinese infrastructure.
- Easy Rider (2025): Government/toll-themed lures; shifted to email-to-iMessage/SMS with foreign numbers; combo-squatting stabilized.
Control groups (10,000 domains each) were drawn from Alexa, Cisco, Cloudflare, and Majestic Top-1M lists (2017–2025), treated as benign baselines. Experimental groups combined 5,000 malicious domains from each cluster with 5,000 from Alexa Top-1M to maintain consistent size; Alexa was used for padding due to its static nature.
Data was collected via recipient-side SMS observation, followed by WHOIS lookups (via DomainTools), passive DNS queries (SecurityTrails), and URL snapshots (URLscan). From 2024 onward, Iris Investigate replaced manual workflows, enabling link graphs and structured CSV exports. IOCs were initially shared via OTX, later migrated to GitHub with curation to avoid platform suspensions.
For model evaluation, domains were randomized using Claude AI scripts, fed into tools in order (Control A–D, then Experimental A–D), with malicious samples stacked before benign ones to test for potential model assimilation. No explicit cropping or metadata construction beyond tool outputs was applied, though future work suggests retroactive standardization via DomainTools for higher fidelity.

Method

The authors leverage two distinct CAPTCHA generation techniques to evaluate target validation mechanisms, each designed to simulate human-like input patterns while introducing controlled randomness to thwart automated systems.

In the first approach, Cats Cradle (2022), the system generates randomized sequences of alphabetical characters constrained to lengths between five and eight characters. This method relies on the perceptual unpredictability of letter arrangements to challenge automated solvers, while maintaining a structure that remains legible and interpretable to human users. The technique does not enforce semantic meaning, instead prioritizing visual and typographic variability as a barrier to machine recognition.

The second method, Double Helix (2023), adopts a more linguistically grounded strategy by concatenating pairs of dictionary words. This dual-word structure preserves semantic coherence while increasing combinatorial complexity, making it harder for bots to guess or brute-force valid inputs. The authors assess both techniques under the same objective: validating target systems through the deployment of fake CAPTCHAs that mimic real-world adversarial conditions.

No architectural diagrams or training workflows are provided in the source material; the focus remains on the design and intent of the CAPTCHA generation strategies rather than their implementation or evaluation infrastructure.

Experiment

Evaluated four domain-generation tactics (Cats Cradle, Double Helix, Pandoras Box, Easy Rider) using traditional and ML-based detectors, revealing strong performance only on randomized domains (Cats Cradle) and poor detection on dictionary-based or combo-squatting variants.
Traditional detectors like Exp0se excelled at high-entropy domains but struggled with structured, dictionary-driven tactics, confirming their role as high-throughput sieves rather than comprehensive solutions.
ML-based tools (LSTM, DGAD) showed limited generalization beyond randomized domains, indicating current models are not robust against blended, real-world smishing tactics that mix brand tokens and minor randomization.
Defenders should adopt layered strategies: use lexical heuristics for obvious random domains, and supplement with contextual signals (message content, infrastructure, brand abuse policies) for more sophisticated tactics.
LLMs demonstrated potential in identifying thematic patterns across clusters, suggesting future integration could enhance detection capabilities.
Experimental limitations include semi-synthetic data, sampling duplicates, skewed benign/malicious ratios, and outdated benign baselines, all of which constrain generalizability and should be addressed in future work.

The authors evaluate four domain detection methods across four distinct domain-generation tactics, finding that performance varies significantly by tactic type. Traditional and ML-based detectors achieve high precision and accuracy on randomized domains but struggle with dictionary-based and themed combo-squatting domains. Results indicate that current tools are not robust against real-world smishing tactics that blend recognizable words with minor randomization.

Quell-PDF

Inhaltsverzeichnis

KI mit KI entwickeln

Von der Idee bis zum Launch – beschleunigen Sie Ihre KI-Entwicklung mit kostenlosem KI-Co-Coding, sofort einsatzbereiter Umgebung und bestem GPU-Preis.

KI-gestütztes kollaboratives Programmieren

Sofort einsatzbereite GPUs

Die besten Preise

Erste Schritte Preise anzeigen

HyperAI Newsletters

Abonnieren Sie unsere neuesten Updates

Wir werden die neuesten Updates der Woche in Ihren Posteingang liefern um neun Uhr jeden Montagmorgen

Unterstützt von MailChimp

HyperAI

vor 5 Tagen

Adam Dorian Wong John D. Hastings

Inhaltsverzeichnis

Zusammenfassung

One-sentence Summary

Key Contributions

The paper introduces Gravity Falls, a new semi-synthetic DGA dataset derived from real-world SMS spearphishing campaigns (2022–2025), capturing a threat actor’s evolving tactics across four technique clusters—from randomized strings to themed combo-squatting—filling a gap in mobile-targeted DGA research previously dominated by malware C2 and email datasets.
It evaluates four DGA detectors (Shannon entropy, Exp0se, LSTM, COSSAS DGAD) against Gravity Falls using Top-1M domains as benign baselines, revealing that all methods struggle with dictionary-based and themed domains, showing tactic-dependent performance and low recall in multiple tool-cluster pairings.
The findings demonstrate that both traditional heuristics and recent ML-based detectors are ill-suited for the dynamic, context-rich DGA patterns in smishing, motivating context-aware detection methods and providing a reproducible benchmark for future evaluation of mobile threat infrastructure.

Introduction

Dataset

The authors use the Gravity Falls dataset, composed of C2 domains delivered via SMS between 2022 and 2025, organized into four technique clusters reflecting annual evolution of the same threat actor’s TTPs. The data is semi-synthetic, blending observed malicious domains with predicted ones used for sinkholing and measurement.
Each cluster has distinct characteristics:
- Cats Cradle (2022): Short randomized 7-character domains with common TLDs; landing pages mimicked CAPTCHA portals.
- Double Helix (2023): Dictionary-based concatenations with newer gTLDs; occasional truncations suggest encoding constraints.
- Pandoras Box (2024): Professional package-delivery lures; combo-squatting with random suffixes; heavy use of Chinese infrastructure.
- Easy Rider (2025): Government/toll-themed lures; shifted to email-to-iMessage/SMS with foreign numbers; combo-squatting stabilized.
Control groups (10,000 domains each) were drawn from Alexa, Cisco, Cloudflare, and Majestic Top-1M lists (2017–2025), treated as benign baselines. Experimental groups combined 5,000 malicious domains from each cluster with 5,000 from Alexa Top-1M to maintain consistent size; Alexa was used for padding due to its static nature.
Data was collected via recipient-side SMS observation, followed by WHOIS lookups (via DomainTools), passive DNS queries (SecurityTrails), and URL snapshots (URLscan). From 2024 onward, Iris Investigate replaced manual workflows, enabling link graphs and structured CSV exports. IOCs were initially shared via OTX, later migrated to GitHub with curation to avoid platform suspensions.
For model evaluation, domains were randomized using Claude AI scripts, fed into tools in order (Control A–D, then Experimental A–D), with malicious samples stacked before benign ones to test for potential model assimilation. No explicit cropping or metadata construction beyond tool outputs was applied, though future work suggests retroactive standardization via DomainTools for higher fidelity.

Method

Experiment

Evaluated four domain-generation tactics (Cats Cradle, Double Helix, Pandoras Box, Easy Rider) using traditional and ML-based detectors, revealing strong performance only on randomized domains (Cats Cradle) and poor detection on dictionary-based or combo-squatting variants.
Traditional detectors like Exp0se excelled at high-entropy domains but struggled with structured, dictionary-driven tactics, confirming their role as high-throughput sieves rather than comprehensive solutions.
ML-based tools (LSTM, DGAD) showed limited generalization beyond randomized domains, indicating current models are not robust against blended, real-world smishing tactics that mix brand tokens and minor randomization.
Defenders should adopt layered strategies: use lexical heuristics for obvious random domains, and supplement with contextual signals (message content, infrastructure, brand abuse policies) for more sophisticated tactics.
LLMs demonstrated potential in identifying thematic patterns across clusters, suggesting future integration could enhance detection capabilities.
Experimental limitations include semi-synthetic data, sampling duplicates, skewed benign/malicious ratios, and outdated benign baselines, all of which constrain generalizability and should be addressed in future work.

Quell-PDF

Inhaltsverzeichnis

KI mit KI entwickeln

Von der Idee bis zum Launch – beschleunigen Sie Ihre KI-Entwicklung mit kostenlosem KI-Co-Coding, sofort einsatzbereiter Umgebung und bestem GPU-Preis.

KI-gestütztes kollaboratives Programmieren

Sofort einsatzbereite GPUs

Die besten Preise

Erste Schritte Preise anzeigen

HyperAI Newsletters

Abonnieren Sie unsere neuesten Updates

Wir werden die neuesten Updates der Woche in Ihren Posteingang liefern um neun Uhr jeden Montagmorgen

Unterstützt von MailChimp

Command Palette

Gravity Falls: Eine vergleichende Analyse von Methoden zur Erkennung von Domain-Generation-Algorithmen (DGA) bei Mobile-Device-Spearphishing

Adam Dorian Wong John D. Hastings

Zusammenfassung

One-sentence Summary

Key Contributions

Introduction

Dataset

Method

Experiment

KI mit KI entwickeln

HyperAI Newsletters

Command Palette

Gravity Falls: Eine vergleichende Analyse von Methoden zur Erkennung von Domain-Generation-Algorithmen (DGA) bei Mobile-Device-Spearphishing

Adam Dorian Wong John D. Hastings

Zusammenfassung

One-sentence Summary

Key Contributions

Introduction

Dataset

Method

Experiment

KI mit KI entwickeln

HyperAI Newsletters

Command Palette

Gravity Falls: Eine vergleichende Analyse von Methoden zur Erkennung von Domain-Generation-Algorithmen (DGA) bei Mobile-Device-Spearphishing

Adam Dorian Wong John D. Hastings

Zusammenfassung

One-sentence Summary

Key Contributions

Introduction

Dataset

Method

Experiment

KI mit KI entwickeln

HyperAI Newsletters