HyperAIHyperAI

Command Palette

Search for a command to run...

FirstAidQA: 저연결성 환경에서의 응급처치 및 응급 대응을 위한 합성 데이터셋

Saiyma Sittul Muna Rezwan Islam Salvi Mushfiqur Rahman Mushfique Ajwad Abrar

초록

비상 상황에서는 한 초도 소중하다. 대규모 언어 모델(Large Language Models, LLM)은 시간이 급한 환경이나 저연결 또는 무연결 환경에서의 적용이 여전히 제한적이다. 현재의 모델들은 계산 자원을 많이 소모하기 때문에, 구조대원이나 일반 시민이 자주 사용하는 저사양 장치에는 부적합하다. 경량화된 특화 솔루션 개발의 주요 장벽은 응급처치 및 응급 대응 분야에 특화된 고품질 데이터셋이 부족하다는 점이다. 이 격차를 해소하기 위해, 우리는 5,500개의 고품질 질문-답변 쌍을 포함한 합성 데이터셋인 FirstAidQA를 소개한다. 이 데이터셋은 다양한 응급처치 및 응급 대응 시나리오를 포괄하고 있으며, ChatGPT-4o-mini라는 대규모 언어 모델을 활용해, 2019년 출간된 '바이탈 응급처치서(Vital First Aid Book)'의 텍스트를 기반으로 프롬프트 기반의 인-컨텍스트 학습(in-context learning) 방식으로 생성되었다. 이후 텍스트 정제, 맥락 기반 청크화, 필터링 등의 전처리 과정을 거친 후, 인간 검증을 통해 질문-답변 쌍의 정확성, 안전성, 실용적 관련성을 보장하였다. FirstAidQA는 LLM과 소규모 언어 모델(Small Language Models, SLM)의 지시어 조정(instruction-tuning) 및 미세조정(fine-tuning)을 지원하도록 설계되었으며, 응급 상황에서 더 빠르고 신뢰성 높으며 오프라인 작동이 가능한 시스템 구현을 가능하게 한다. 본 연구는 응급처치 및 응급 대응 분야에서 안전이 핵심이 되고 자원이 제한된 AI 응용 기술의 발전을 촉진하기 위해 이 데이터셋을 공개한다. 데이터셋은 Hugging Face에서 다음 링크를 통해 공개되어 있다: https://huggingface.co/datasets/i-am-mushfiq/FirstAidQA.

One-sentence Summary

Muna et al. from Islamic University of Technology introduce FirstAidQA, a synthetic dataset of 5,500 high-quality first aid question-answer pairs generated via ChatGPT-4o-mini using prompt-based in-context learning and human validation, addressing the scarcity of domain-specific emergency response data to train lightweight LLMs and SLMs for offline-capable systems in time-sensitive, low-connectivity scenarios.

Key Contributions

  • Identifies the critical absence of domain-specific datasets for first aid as a barrier to deploying lightweight AI in low-connectivity emergency scenarios and introduces FirstAidQA, a synthetic dataset of 5,500 question-answer pairs generated via ChatGPT-4o-mini using in-context learning from the Vital First Aid Book with rigorous preprocessing and human validation.
  • Validates dataset safety and accuracy through expert evaluation of 200 randomly sampled pairs by three medical professionals, assessing criteria including safety completeness and relevance while documenting flagged examples for cautious handling as evidenced in the provided evaluation tables.
  • Enables offline-capable emergency response systems by structuring FirstAidQA specifically for fine-tuning small language models, building on methodologies proven effective in prior resource-constrained medical applications like Cahlen's offline first-aid systems.

Introduction

The authors address a critical gap in emergency response tools for low-connectivity regions where immediate, accurate first-aid guidance can save lives but internet access is unreliable. Prior solutions like FAQ-based chatbots or commercial voice assistants often omit evidence-based steps or provide incomplete instructions, while existing medical QA datasets focus on clinical records or general health information—not actionable, step-by-step first aid for laypeople. Synthetic datasets like Self-Instruct or Offline Practical Skills QA demonstrate LLMs' potential for scalable data generation but lack first-aid specificity. The authors' main contribution is FirstAidQA, a purpose-built synthetic dataset generated to deliver reliable, guideline-compliant first-aid instructions offline, overcoming the absence of dedicated resources for this high-stakes domain.

Dataset

  • The authors introduce FirstAidQA, a synthetic dataset comprising 5,500 question-answer pairs focused on first aid and emergency response scenarios. It is generated using ChatGPT-4o-mini via prompt-based in-context learning, with source material exclusively drawn from the certified Vital First Aid Book (2019).
  • Key category details include:
    • Total size: 5,500 QA pairs spanning 15 emergency categories (e.g., CPR, burns, fractures, head injuries, bleeding management).
    • Source: Text chunks from the Vital First Aid Book, manually segmented to preserve context (e.g., casualty movement protocols or burn treatment steps).
    • Filtering: Irrelevant theoretical content was excluded; only text applicable to real-world emergencies was retained for QA generation.
    • Safety rules: Prompts explicitly constrained the LLM to generate answers strictly from provided context chunks, with diversified topic sampling to reduce bias.
  • The dataset supports instruction-tuning and fine-tuning of lightweight LLMs/SLMs for offline deployment in low-connectivity environments. The authors use the full dataset (without specified train/validation splits) to train models requiring rapid, reliable emergency guidance, emphasizing practical procedural knowledge over clinical diagnostics.
  • Processing includes contextual chunking of source text, structured JSON-formatted output generation (20 QA pairs per prompt batch), human validation for accuracy/safety, and iterative refinement to ensure diversity (e.g., adding pediatric/elderly scenarios). No cropping strategy is applied; instead, context-preserving chunks maintain situational relevance for edge-device deployment.

Experiment

  • Expert evaluation of 200 randomly sampled QA pairs by three medical professionals validated clarity, relevance, specificity and completeness, and safety and accuracy, with mean ratings documented in Table 2
  • Tables 3 and 4 highlight specific QA pairs containing potentially unsafe instructions that require cautious handling during dataset utilization

The authors use expert evaluation to assess 200 QA pairs across four criteria, with scores averaged across three medical professionals. Results show the highest mean score for Relevance (4.7) and the lowest for Safety & Accuracy (3.7), indicating that while questions are well-targeted and clear, some answers may contain medically inaccurate or unsafe content.


AI로 AI 구축

아이디어에서 출시까지 — 무료 AI 코코딩, 즉시 사용 가능한 환경, 최적의 GPU 가격으로 AI 개발을 가속화하세요.

AI 협업 코딩
바로 사용 가능한 GPU
최적의 가격

HyperAI Newsletters

최신 정보 구독하기
한국 시간 매주 월요일 오전 9시 에 이번 주의 최신 업데이트를 메일로 발송합니다
이메일 서비스 제공: MailChimp
FirstAidQA: 저연결성 환경에서의 응급처치 및 응급 대응을 위한 합성 데이터셋 | 문서 | HyperAI초신경