StreakMind: AI-Based Detection and Analysis of Satellite Streaks with Automated Database Integration
Rafael Carrillo, René Duffard, Pablo García-Martín, Javier Romero, Nicolás Morales, Luis Gonçalves
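Abstract
Satellites and space debris increasingly contaminate astronomical images, degrading scientific observations and producing large numbers of streaked exposures. Manual inspection of large observational datasets is no longer feasible, and identifying and characterising streaks has become essential both for data quality control and for monitoring objects in Earth orbit. This work presents StreakMind, an automated pipeline designed to detect near-Earth object (NEO) and satellite streaks, characterise their geometry, and cross-identify them with known orbital objects. The system integrates all inferred results into a structured database suited to large-scale observing programmes. StreakMind trains a YOLO-OBB object detection model on a hybrid manual-synthetic dataset of 2335 images and detects streaks in processed FITS frames. Geometric refinement, inter-frame association, satellite cross-identification, and Gaussian-based confidence scoring then produce the final identifications, which are stored in a normalised relational database. Images acquired at La Sagra Observatory (L98) with a Celestron C14+Fastar telescope were used to develop and validate the automated streak detection and characterisation methods. On the test set, the model achieved a precision of 94% and a recall of 97%.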
One-sentence Summary
StreakMind is an automated pipeline employing a YOLO-OBB model trained on a hybrid manual-synthetic dataset of 2335 images to detect and characterise satellite streaks and near-Earth objects in processed FITS frames, cross-identify them with known orbital objects, and integrate results into a normalised relational database to support data quality control and orbital monitoring, achieving 94% precision and 97% recall on test data from La Sagra Observatory acquired with a Celestron C14+Fastar telescope.
Key Contributions
- This work presents StreakMind, an end-to-end pipeline designed to detect linear streaks in ground-based astronomical images, refine their geometry, and cross-identify candidate artificial objects using external ephemerides. The system standardises measurements into Minor Planet Center-style records and integrates all outputs into a relational database suitable for large-scale analyses.
- A YOLO-OBB model was trained on a hybrid manual-synthetic dataset of 2335 images to detect streaks in processed FITS frames. Inter-frame association and Gaussian-based confidence scoring are applied to produce final identifications.
- Images acquired at La Sagra Observatory with a Celestron C14+Fastar telescope were used to develop and test the automated streak detection methods. The model achieved a precision of 94% and a recall of 97% on the test set.
Introduction
Wide-field astronomical surveys now generate massive volumes of imagery contaminated by artificial satellites and space debris, making manual inspection infeasible for near-Earth object detection and orbital monitoring. While existing detection methods can identify linear features, they often lack robust end-to-end integration for large-scale database management and precise geometric characterization. The authors present StreakMind, an automated pipeline that leverages a YOLO-OBB model trained on hybrid manual and synthetic data to detect and characterize linear streaks. This system refines geometric measurements, associates detections across consecutive frames, and cross-identifies candidates against external ephemerides before integrating all outputs into a normalized relational database.
Dataset
Dataset Composition and Sources
- The authors combine 2055 real astronomical FITS images from La Sagra Observatory with 280 synthetically generated images.
- Real observations were conducted between April and June 2019 using a Celestron C14+Fastar telescope equipped with an SBIG ST-10 3 CCD camera.
- Images were acquired with 2x2 binning to reduce data volume and facilitate nightly transfers.
- Synthetic data was introduced specifically to balance the dataset by increasing the representation of long streaks.
Key Details for Each Subset
- Real images measure 1092 x 736 pixels and contain 765 manually identified streaks ranging from 8.5 to 1161.7 pixels in length.
- Images are categorized based on a 269.1 pixel threshold derived from the 75th percentile of the streak-length distribution.
- This classification yields 1523 images without streaks, 412 with short streaks, and 120 with long streaks.
- The synthetic subset includes 280 images where streaks have a minimum length of 269 pixels and follow a Gaussian angular distribution.
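The length-based categorisation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper names `percentile` and `image_class` are hypothetical, the percentile uses plain linear interpolation (the source does not say which estimator produced 269.1), and the rule that an image counts as "long" if any of its streaks reaches the threshold is an assumption.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def image_class(streak_lengths, threshold=269.1):
    """Assign an image to one of the three classes used for stratification.

    Assumption: an image is 'long' if its longest streak reaches the threshold.
    """
    if not streak_lengths:
        return "no_streak"
    return "long" if max(streak_lengths) >= threshold else "short"
```

In practice the threshold would be obtained as `percentile(all_streak_lengths, 75)` over the 765 labelled streaks and then passed to `image_class` for every image.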
Training Splits and Data Usage
- The final dataset is divided into training (70%), validation (20%), and test (10%) subsets using stratified sampling.
- Stratification ensures each subset preserves the original class distribution of short-streak, long-streak, and no-streak images.
- FITS files are converted to PNG format with ZScale normalisation to enhance contrast for faint structures.
- Manual labelling via Tycho Tracker software generates Oriented Bounding Boxes (OBBs) for each detected streak.
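A stratified 70/20/10 split of the kind described above could look roughly like the following. This is a sketch using only the standard library; the function name `stratified_split`, the fixed seed, and the rounding rule for per-class counts are assumptions rather than details from the source.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.2, 0.1), seed=0):
    """Split sample indices into train/val/test while preserving class shares.

    `labels` holds one class name per image, e.g. 'no_streak', 'short', 'long'.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    splits = {"train": [], "val": [], "test": []}
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        splits["train"] += idxs[:n_train]
        splits["val"] += idxs[n_train:n_train + n_val]
        # Whatever remains goes to the test set, so no image is dropped.
        splits["test"] += idxs[n_train + n_val:]
    return splits
```

Splitting each class separately is what keeps the no-streak, short-streak, and long-streak proportions close to the original distribution in every subset.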
Processing and Metadata Construction
- Images are aligned to a common reference frame, resulting in dead margins caused by telescope pointing variations.
- A vertical flip correction is applied to coordinates during conversion to align FITS origins with standard PNG raster conventions.
- A 40 pixel edge threshold is used to determine if a streak is complete or incomplete relative to image borders.
- Metadata construction includes observatory codes, telescope details, astrometric coordinates, and synthesized MPC-formatted observation records.
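The vertical flip and the edge test described above can be illustrated as follows. This is a hedged sketch: both function names are hypothetical, coordinates are assumed to be 0-based pixel centres, and "complete" is assumed to mean that both streak endpoints lie more than the margin away from every image border (dead margins from alignment are ignored here).

```python
def fits_to_png_y(y_fits, height):
    """Convert a row coordinate from a bottom-left origin to a top-left origin."""
    return (height - 1) - y_fits


def is_complete(endpoints, width, height, margin=40):
    """Return True if every endpoint is at least `margin` pixels from each border."""
    return all(
        margin <= x <= width - 1 - margin and margin <= y <= height - 1 - margin
        for x, y in endpoints
    )
```

For the 1092 x 736 pixel frames mentioned earlier, a streak ending at x = 10 would be flagged as incomplete, since it lies inside the 40 pixel border zone.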
Method
The core of the StreakMind pipeline is built upon the You Only Look Once (YOLO) family of real-time object detection models, specifically utilizing the YOLO11 architecture introduced in 2024. This single-stage detector is chosen for its ability to predict object location and category in one pass, which is critical for processing large volumes of astronomical imagery efficiently. The model retains the standard three-part structure: a backbone for feature extraction, a neck for multi-scale feature combination, and a head for final prediction. For this specific application, the network is configured to output Oriented Bounding Boxes (OBBs) rather than standard axis-aligned boxes, allowing it to accurately capture the arbitrary orientation of linear streaks.
The geometric representation of these detections is central to the pipeline's accuracy. Each OBB is defined by four vertices (v1 to v4), a center point (c), a length (L), a width (w), and an orientation angle (θ) relative to the image axes, as illustrated in the accompanying figure of the paper.
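To make the relation between these parameters concrete, the sketch below computes the four vertices from c, L, w, and θ. It assumes θ is given in radians and measured from the image x-axis, and the vertex ordering is an arbitrary choice for illustration; the paper's own conventions may differ.

```python
import math


def obb_vertices(cx, cy, length, width, theta):
    """Return the four corners of an oriented box centred on (cx, cy)."""
    ux, uy = math.cos(theta), math.sin(theta)  # unit vector along the major axis
    nx, ny = -uy, ux                           # unit vector across the box
    hl, hw = length / 2.0, width / 2.0
    return [
        (cx - hl * ux - hw * nx, cy - hl * uy - hw * ny),  # v1
        (cx + hl * ux - hw * nx, cy + hl * uy - hw * ny),  # v2
        (cx + hl * ux + hw * nx, cy + hl * uy + hw * ny),  # v3
        (cx - hl * ux + hw * nx, cy - hl * uy + hw * ny),  # v4
    ]
```

With θ = 0 the box reduces to an axis-aligned rectangle of size L by w, which is a convenient sanity check.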
While the YOLO11 model provides the initial detection, the authors note that standard regressors often underestimate the true extent of long streaks. To mitigate this, a photometric pre-analysis stage is implemented to longitudinally extend the OBBs. This process involves transforming the image region into a photometrically enhanced format and sampling a one-dimensional flux profile I(s) along the major axis. The box is extended iteratively as long as the measured flux remains above a dynamic threshold, I(s) > I_bg + k·σ, where I_bg is the background level, σ is the noise estimate, and k is a multiplier that sets the threshold in units of the noise. This ensures that the faint wings of the streak are captured, as suggested by the extended dashed green boundary in the paper's figure.
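The extension rule can be sketched on a one-dimensional profile as shown below. This is an illustrative simplification, not the authors' implementation: the function name is hypothetical, the background and noise are held fixed rather than re-estimated dynamically, and k = 3 is an assumed default that the source does not specify.

```python
def extend_along_profile(profile, start, end, background, sigma, k=3.0):
    """Grow the index range [start, end] while neighbouring samples exceed I_bg + k*sigma.

    `profile` is the flux I(s) sampled along the major axis of the box.
    """
    threshold = background + k * sigma
    # Extend towards the beginning of the profile.
    while start > 0 and profile[start - 1] > threshold:
        start -= 1
    # Extend towards the end of the profile.
    while end < len(profile) - 1 and profile[end + 1] > threshold:
        end += 1
    return start, end
```

The returned indices can then be mapped back to image coordinates along the major axis to obtain the lengthened box.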
The training process utilizes a pretrained YOLO11 model initially trained on the DOTAv1.0 dataset, which is then fine-tuned on an augmented dataset containing both real and synthetically generated astronomical images. Training is conducted on cloud-based NVIDIA A100 GPUs. Following detection and geometric refinement, the pipeline employs a catalogue-driven filtering stage to remove false positives caused by stellar diffraction spikes by cross-matching with the Gaia DR3 catalogue. Finally, the refined detections undergo inter-frame association, where geometric extrapolation and temporal metadata are used to link streaks across consecutive frames, enabling the identification of moving objects.
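One simple ingredient of inter-frame association is a collinearity test between streaks in consecutive frames. The sketch below is only an assumption about how such a check might look: it presumes both frames share the same pixel grid, it ignores the temporal metadata the pipeline also uses, and the function name and tolerance values are hypothetical.

```python
import math


def collinear(streak_a, streak_b, max_angle_deg=2.0, max_offset_px=5.0):
    """Check whether streak_b lies along the extrapolated line of streak_a.

    Each streak is a pair of endpoints ((x1, y1), (x2, y2)) with non-zero length.
    """
    (ax1, ay1), (ax2, ay2) = streak_a
    (bx1, by1), (bx2, by2) = streak_b

    # Orientation difference, folded into [0, 90] degrees (direction is ambiguous).
    ang_a = math.atan2(ay2 - ay1, ax2 - ax1)
    ang_b = math.atan2(by2 - by1, bx2 - bx1)
    diff = abs(ang_a - ang_b) % math.pi
    diff = min(diff, math.pi - diff)

    # Perpendicular distance of B's midpoint from the infinite line through A.
    mx, my = (bx1 + bx2) / 2.0, (by1 + by2) / 2.0
    len_a = math.hypot(ax2 - ax1, ay2 - ay1)
    offset = abs((ax2 - ax1) * (my - ay1) - (ay2 - ay1) * (mx - ax1)) / len_a

    return math.degrees(diff) <= max_angle_deg and offset <= max_offset_px
```

A full association step would additionally compare the gap between streaks with the displacement expected from the exposure times and the apparent motion of the object.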
Experiment
The evaluation framework combined quantitative testing on a held-out dataset with qualitative visual inspections to validate detection accuracy under controlled astronomical conditions. Subsequent application to real observational data confirmed the automated pipeline's superiority over manual inspection in terms of scalability, sensitivity to faint features, and reproducible database integration. Geometric characterization proved robust across most streak lengths, establishing the system as a viable end-to-end solution for processing large volumes of survey data.
The dataset composition table details the split into training, validation, and test subsets, categorized by streak presence and length. The no-streak background class makes up the majority of every subset, while the long-streak and short-streak classes form smaller, balanced portions whose relative proportions stay consistent across the splits. This partitioning ensures that the model is trained and tested on a representative sample of both positive detections and negative background frames.
The streak-length table gives the percentile distribution of detected streak lengths in pixels, from the 25th to the 95th percentile. The lengths are widely spread, with the upper percentiles far longer than the median. According to the text, geometric accuracy is highly reliable for streaks up to about half the image width, that is, in the lower to middle range of the length distribution. For the longest streaks, detection remains stable but requires additional photometric-based geometric post-processing.
Another table details the composition of the dataset used for the experiment, divided into training, validation, and test subsets. The data are heavily imbalanced: the no-streak class accounts for the majority of samples in every split, short streaks form a substantial minority, and long streaks are the rarest category, appearing in only a small fraction of the images. The class proportions remain consistent across the training, validation, and test sets.
The evaluation setup consists of training, validation, and test subsets with a consistent class distribution in which the no-streak background class predominates alongside balanced portions of short and long streaks. Experimental results show that the model achieves robust geometric accuracy for streaks in the lower to middle length ranges, up to about half the image width. Detection remains stable for the longest streaks, but these instances require specific geometric and photometric post-processing steps to ensure reliable performance across the full range of streak lengths.