3 months ago

Gravity Falls: A Comparative Analysis of Domain-Generation Algorithm (DGA) Detection Methods for Mobile Device Spearphishing

Adam Dorian Wong, John D. Hastings

Abstract

Mobile devices are frequent targets of eCrime threat actors through SMS spearphishing (smishing) links that use Domain Generation Algorithms (DGAs) to rotate hostile infrastructure. Despite this, DGA research and evaluation largely focus on malware C2 (Command and Control) and email phishing datasets, leaving limited evidence on how well detectors generalize to smishing-driven domain tactics outside enterprise perimeters. This work addresses that gap by evaluating traditional and machine-learning DGA detectors against Gravity Falls, a new semi-synthetic dataset derived from smishing links delivered between 2022 and 2025. Gravity Falls captures a single threat actor's evolution across four technique clusters, shifting from short randomized strings to dictionary concatenation and themed combo-squatting variants used for credential theft and fee/fine fraud. Two string-analysis approaches (Shannon entropy and Exp0se) and two ML-based detectors (an LSTM classifier and COSSAS DGAD) are assessed using Top-1M domains as benign baselines. Results are strongly tactic-dependent: performance is highest on randomized-string domains but drops on dictionary concatenation and themed combo-squatting, with low recall across multiple tool/cluster pairings. Overall, both traditional heuristics and recent ML detectors are ill-suited to the consistently evolving DGA tactics observed in Gravity Falls, which motivates more context-aware approaches and provides a reproducible benchmark for future evaluation.

One-sentence Summary

Adam Dorian Wong and John D. Hastings of Dakota State University introduce Gravity Falls, a semi-synthetic smishing-derived DGA dataset spanning 2022–2025, revealing that both traditional heuristics and ML detectors (including LSTM and COSSAS DGAD) fail against evolving tactics like themed combo-squatting, urging context-aware defenses for mobile threat landscapes.

Key Contributions

  • The paper introduces Gravity Falls, a new semi-synthetic DGA dataset derived from real-world SMS spearphishing campaigns (2022–2025), capturing a threat actor’s evolving tactics across four technique clusters—from randomized strings to themed combo-squatting—filling a gap in mobile-targeted DGA research previously dominated by malware C2 and email datasets.
  • It evaluates four DGA detectors (Shannon entropy, Exp0se, LSTM, COSSAS DGAD) against Gravity Falls using Top-1M domains as benign baselines, revealing that all methods struggle with dictionary-based and themed domains, showing tactic-dependent performance and low recall in multiple tool-cluster pairings.
  • The findings demonstrate that both traditional heuristics and recent ML-based detectors are ill-suited for the dynamic, context-rich DGA patterns in smishing, motivating context-aware detection methods and providing a reproducible benchmark for future evaluation of mobile threat infrastructure.

Introduction

The authors leverage the Gravity Falls dataset—a semi-synthetic collection of smishing-driven DGA domains from 2022 to 2025—to evaluate how well traditional and machine-learning DGA detectors perform against real-world, evolving attack tactics outside enterprise networks. While prior work focuses on malware C2 or email phishing, smishing targets individuals with fewer protections and rapidly rotating domains, making detection critical yet understudied. The authors find that both entropy-based heuristics and modern ML models like LSTM and COSSAS DGAD struggle with dictionary concatenation and themed combo-squatting variants, revealing a gap in detector adaptability to tactic shifts. Their main contribution is a new benchmark dataset and evidence that current tools are insufficient for smishing-specific DGA evolution, urging context-aware detection methods.

Dataset

  • The authors use the Gravity Falls dataset, composed of domains from smishing links delivered via SMS between 2022 and 2025 and organized into four technique clusters that reflect the year-by-year evolution of a single threat actor's TTPs. The data is semi-synthetic: it blends observed malicious domains with predicted ones used for sinkholing and measurement.

  • Each cluster has distinct characteristics:

    • Cats Cradle (2022): Short randomized 7-character domains with common TLDs; landing pages mimicked CAPTCHA portals.
    • Double Helix (2023): Dictionary-based concatenations with newer gTLDs; occasional truncations suggest encoding constraints.
    • Pandoras Box (2024): Professional package-delivery lures; combo-squatting with random suffixes; heavy use of Chinese infrastructure.
    • Easy Rider (2025): Government/toll-themed lures; shifted to email-to-iMessage/SMS with foreign numbers; combo-squatting stabilized.
  • Control groups (10,000 domains each) were drawn from Alexa, Cisco, Cloudflare, and Majestic Top-1M lists (2017–2025), treated as benign baselines. Experimental groups combined 5,000 malicious domains from each cluster with 5,000 from Alexa Top-1M to maintain consistent size; Alexa was used for padding due to its static nature.

  • Data was collected via recipient-side SMS observation, followed by WHOIS lookups (via DomainTools), passive DNS queries (SecurityTrails), and URL snapshots (URLscan). From 2024 onward, Iris Investigate replaced manual workflows, enabling link graphs and structured CSV exports. IOCs were initially shared via OTX, later migrated to GitHub with curation to avoid platform suspensions.

  • For model evaluation, the domain lists were randomized using Claude AI scripts and fed into the tools in a fixed order (Control A–D, then Experimental A–D), with malicious samples stacked before benign ones to test for potential model assimilation. No cropping or metadata construction was applied beyond the tool outputs; as future work, the authors suggest retroactive standardization via DomainTools for higher fidelity.
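
The group construction described above (5,000 malicious domains plus 5,000 Alexa domains, malicious rows first) can be sketched roughly as follows. This is a minimal illustration and not the authors' script: the function name, the `per_class` and `seed` parameters, and the de-duplication step are all assumptions introduced here.

```python
import random

def build_experimental_group(malicious, benign, per_class=5000, seed=0):
    """Return (domain, label) pairs with malicious rows stacked before benign ones.

    Labels: 1 = malicious, 0 = benign. All names here are hypothetical.
    """
    rng = random.Random(seed)
    # Assumption: de-duplicate each pool before sampling (not stated in the source).
    mal_pool = sorted(set(malicious))
    ben_pool = sorted(set(benign))
    # random.sample draws without replacement and returns the picks in random order.
    mal = rng.sample(mal_pool, per_class)
    ben = rng.sample(ben_pool, per_class)
    return [(d, 1) for d in mal] + [(d, 0) for d in ben]
```

A fixed seed makes the sampled group reproducible, which matters if the same group has to be fed to several detectors in turn.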

Method

The Method summary covers the first two technique clusters. Both are domain-generation tactics whose generated domains led to fake CAPTCHA landing pages, which the threat actor apparently used for target validation.

In the first cluster, Cats Cradle (2022), the domains are randomized sequences of alphabetical characters between five and eight characters long. The strings carry no semantic meaning, which is the property that entropy-style lexical heuristics rely on.

In the second cluster, Double Helix (2023), the domains concatenate pairs of dictionary words. The resulting labels look like natural language, so they are harder to separate from benign domains on lexical features alone.

No architectural diagrams or training workflows are provided in the source material; the summary describes the design and intent of the domain-generation tactics rather than the implementation of the detectors or the evaluation infrastructure.
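
To make the two string patterns concrete, here is a toy sketch of labels shaped like each cluster: a random alphabetical string of five to eight characters, and a concatenation of two dictionary words. It only illustrates the lexical shapes named in the summary. The word list, function names, and length parameters are hypothetical, and nothing here reproduces the threat actor's actual algorithm.

```python
import random
import string

# Hypothetical word list, for illustration only.
WORDS = ["river", "stone", "maple", "cloud", "amber", "field"]

def random_label(rng, min_len=5, max_len=8):
    """Random lowercase letters, similar in shape to the Cats Cradle pattern."""
    n = rng.randint(min_len, max_len)  # inclusive on both ends
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(n))

def dictionary_label(rng, words=WORDS):
    """Two distinct dictionary words joined, similar to the Double Helix pattern."""
    first, second = rng.sample(words, 2)
    return first + second
```

Labels of the first kind have no word structure, while labels of the second kind are built entirely from natural-language tokens, which is the contrast the detectors are tested against.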

Experiment

  • Evaluated four domain-generation tactics (Cats Cradle, Double Helix, Pandoras Box, Easy Rider) using traditional and ML-based detectors, revealing strong performance only on randomized domains (Cats Cradle) and poor detection on dictionary-based or combo-squatting variants.
  • Traditional detectors like Exp0se excelled at high-entropy domains but struggled with structured, dictionary-driven tactics, confirming their role as high-throughput sieves rather than comprehensive solutions.
  • ML-based tools (LSTM, DGAD) showed limited generalization beyond randomized domains, indicating current models are not robust against blended, real-world smishing tactics that mix brand tokens and minor randomization.
  • Defenders should adopt layered strategies: use lexical heuristics for obvious random domains, and supplement with contextual signals (message content, infrastructure, brand abuse policies) for more sophisticated tactics.
  • LLMs demonstrated potential in identifying thematic patterns across clusters, suggesting future integration could enhance detection capabilities.
  • Experimental limitations include semi-synthetic data, sampling duplicates, skewed benign/malicious ratios, and outdated benign baselines, all of which constrain generalizability and should be addressed in future work.
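
As an illustration of the lexical-heuristic layer mentioned above, the following sketch computes the character-level Shannon entropy of a domain's leftmost label and compares it with a threshold. It is not the implementation evaluated in the paper: the function names are hypothetical, the threshold is a caller-supplied assumption, and the input is assumed to be a registered domain without subdomains.

```python
import math
from collections import Counter

def shannon_entropy(label: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not label:
        return 0.0
    n = len(label)
    counts = Counter(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def flag_by_entropy(domain: str, threshold: float) -> bool:
    """Flag a domain whose leftmost label meets or exceeds the threshold.

    Assumption: the leftmost label is the one of interest (no subdomains).
    """
    label = domain.lower().split(".")[0]
    return shannon_entropy(label) >= threshold
```

The entropy of a label of length n cannot exceed log2(n), so any threshold has to be tuned to the label lengths in play. A label made of concatenated dictionary words can score close to a random one, which is consistent with the weak results reported for dictionary-based tactics.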

The authors evaluate four domain detection methods across four distinct domain-generation tactics, finding that performance varies significantly by tactic type. Traditional and ML-based detectors achieve high precision and accuracy on randomized domains but struggle with dictionary-based and themed combo-squatting domains. Results indicate that current tools are not robust against real-world smishing tactics that blend recognizable words with minor randomization.
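
The precision, recall, and accuracy figures discussed here follow the standard confusion-matrix definitions. The sketch below shows how a detector can score high precision while its recall stays low; the function name and the toy labels are illustrative assumptions, not data from the paper.

```python
def detection_metrics(y_true, y_pred):
    """Precision, recall, and accuracy for binary labels (1 = malicious, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    total = tp + fp + fn + tn
    return {
        # Guard against division by zero when a detector flags nothing.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }

# Toy example: the detector flags one of four malicious domains and no benign ones.
toy = detection_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0])
```

In the toy case every flagged domain is truly malicious, so precision is perfect, yet three of the four malicious domains are missed. With a balanced group, accuracy can still look moderate in that situation, which is why recall is reported separately per tool and cluster.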

