منذ 2 أشهر

جدول المحتويات

الملخص

في الحالات الطارئة، تُعد كل ثانية حاسمة. لا تزال هناك قيود كبيرة في تطبيق نماذج اللغة الكبيرة (LLMs) في البيئات الحساسة للزمن، أو ذات الاتصال المنخفض أو المعدوم. تُعد النماذج الحالية مكثفة من حيث الحوسبة، مما يجعلها غير مناسبة للأجهزة من الفئة الدنيا التي يُستخدمها عادةً العاملون في الإغاثة الأولية أو المدنيون. ويعتبر أحد العوائق الرئيسية في تطوير حلول خفيفة الوزن ومخصصة للمجال هو نقص البيانات عالية الجودة المُعدة خصيصًا للاستجابة الأولية والطوارئ. لسد هذا الفجوة، نقدم "FirstAidQA"، وهو مجموعة بيانات اصطناعية تحتوي على 5500 زوجًا من الأسئلة والأجوبة عالية الجودة، تغطي طيفًا واسعًا من السيناريوهات المتعلقة بالاستجابة الأولية والطوارئ. تم إنشاء هذه المجموعة باستخدام نموذج لغة كبير، ChatGPT-4o-mini، باستخدام تقنية التعلم السياقي المُشَغَّل بالأسئلة (prompt-based in-context learning)، مستندة إلى نصوص من كتاب "الإسعافات الأولية الحيوية" (2019). وتم تطبيق خطوات ما قبل المعالجة مثل تنظيف النصوص، وتقسيمها إلى وحدات سياقية، وتصفية المحتوى، تليها مراجعة بشرية لضمان دقة الأزواج وسلامتها وملاءمتها العملية. صُممت "FirstAidQA" لدعم عملية ضبط التعليم التوجيهي (instruction-tuning) وتحسين النماذج الدقيقة (fine-tuning) لنموذج اللغة الكبير (LLMs) ونماذج اللغة الصغيرة (SLMs)، مما يمكّن من تطوير أنظمة أسرع وأكثر موثوقية وقادرة على العمل دون اتصال بالإنترنت في البيئات الطارئة. ونُشرت المجموعة بشكل عام لتعزيز الأبحاث في مجال تطبيقات الذكاء الاصطناعي الحرجة من حيث السلامة والمقيدة بالموارد في مجالات الإسعافات الأولية والاستجابة للطوارئ. يمكن الوصول إلى المجموعة على منصة Hugging Face من خلال الرابط التالي: https://huggingface.co/datasets/i-am-mushfiq/FirstAidQA.

One-sentence Summary

Muna et al. from Islamic University of Technology introduce FirstAidQA, a synthetic dataset of 5,500 high-quality first aid question-answer pairs generated via ChatGPT-4o-mini using prompt-based in-context learning and human validation, addressing the scarcity of domain-specific emergency response data to train lightweight LLMs and SLMs for offline-capable systems in time-sensitive, low-connectivity scenarios.

Key Contributions

Identifies the critical absence of domain-specific datasets for first aid as a barrier to deploying lightweight AI in low-connectivity emergency scenarios and introduces FirstAidQA, a synthetic dataset of 5,500 question-answer pairs generated via ChatGPT-4o-mini using in-context learning from the Vital First Aid Book with rigorous preprocessing and human validation.
Validates dataset safety and accuracy through expert evaluation of 200 randomly sampled pairs by three medical professionals, assessing criteria including safety completeness and relevance while documenting flagged examples for cautious handling as evidenced in the provided evaluation tables.
Enables offline-capable emergency response systems by structuring FirstAidQA specifically for fine-tuning small language models, building on methodologies proven effective in prior resource-constrained medical applications like Cahlen's offline first-aid systems.

Introduction

The authors address a critical gap in emergency response tools for low-connectivity regions where immediate, accurate first-aid guidance can save lives but internet access is unreliable. Prior solutions like FAQ-based chatbots or commercial voice assistants often omit evidence-based steps or provide incomplete instructions, while existing medical QA datasets focus on clinical records or general health information—not actionable, step-by-step first aid for laypeople. Synthetic datasets like Self-Instruct or Offline Practical Skills QA demonstrate LLMs' potential for scalable data generation but lack first-aid specificity. The authors' main contribution is FirstAidQA, a purpose-built synthetic dataset generated to deliver reliable, guideline-compliant first-aid instructions offline, overcoming the absence of dedicated resources for this high-stakes domain.

Dataset

The authors introduce FirstAidQA, a synthetic dataset comprising 5,500 question-answer pairs focused on first aid and emergency response scenarios. It is generated using ChatGPT-4o-mini via prompt-based in-context learning, with source material exclusively drawn from the certified Vital First Aid Book (2019).
Key category details include:
- Total size: 5,500 QA pairs spanning 15 emergency categories (e.g., CPR, burns, fractures, head injuries, bleeding management).
- Source: Text chunks from the Vital First Aid Book, manually segmented to preserve context (e.g., casualty movement protocols or burn treatment steps).
- Filtering: Irrelevant theoretical content was excluded; only text applicable to real-world emergencies was retained for QA generation.
- Safety rules: Prompts explicitly constrained the LLM to generate answers strictly from provided context chunks, with diversified topic sampling to reduce bias.
The dataset supports instruction-tuning and fine-tuning of lightweight LLMs/SLMs for offline deployment in low-connectivity environments. The authors use the full dataset (without specified train/validation splits) to train models requiring rapid, reliable emergency guidance, emphasizing practical procedural knowledge over clinical diagnostics.
Processing includes contextual chunking of source text, structured JSON-formatted output generation (20 QA pairs per prompt batch), human validation for accuracy/safety, and iterative refinement to ensure diversity (e.g., adding pediatric/elderly scenarios). No cropping strategy is applied; instead, context-preserving chunks maintain situational relevance for edge-device deployment.

Experiment

Expert evaluation of 200 randomly sampled QA pairs by three medical professionals validated clarity, relevance, specificity and completeness, and safety and accuracy, with mean ratings documented in Table 2
Tables 3 and 4 highlight specific QA pairs containing potentially unsafe instructions that require cautious handling during dataset utilization

The authors use expert evaluation to assess 200 QA pairs across four criteria, with scores averaged across three medical professionals. Results show the highest mean score for Relevance (4.7) and the lowest for Safety & Accuracy (3.7), indicating that while questions are well-targeted and clear, some answers may contain medically inaccurate or unsafe content.

ملف PDF المصدر

جدول المحتويات

بناء الذكاء الاصطناعي بالذكاء الاصطناعي

من الفكرة إلى الإطلاق — سرّع تطوير الذكاء الاصطناعي الخاص بك مع المساعدة البرمجية المجانية بالذكاء الاصطناعي، وبيئة جاهزة للاستخدام، وأفضل أسعار لوحدات معالجة الرسومات.

البرمجة التعاونية باستخدام الذكاء الاصطناعي

وحدات GPU جاهزة للعمل

أفضل الأسعار

ابدأ عرض الأسعار

HyperAI Newsletters

اشترك في آخر تحديثاتنا

سنرسل لك أحدث التحديثات الأسبوعية إلى بريدك الإلكتروني في الساعة التاسعة من صباح كل يوم اثنين

مدعوم بواسطة MailChimp

HyperAI

منذ 2 أشهر

مجموعة بيانات

الضبط الدقيق المراقب

الإجابة على الأسئلة الذكية

بنية ذكاء اصطناعي الأساسية

النهج/المعمارية

معالجة اللغة الطبيعية

مهمة

Saiyma Sittul Muna Rezwan Islam Salvi Mushfiqur Rahman Mushfique Ajwad Abrar

جدول المحتويات

الملخص

One-sentence Summary

Key Contributions

Identifies the critical absence of domain-specific datasets for first aid as a barrier to deploying lightweight AI in low-connectivity emergency scenarios and introduces FirstAidQA, a synthetic dataset of 5,500 question-answer pairs generated via ChatGPT-4o-mini using in-context learning from the Vital First Aid Book with rigorous preprocessing and human validation.
Validates dataset safety and accuracy through expert evaluation of 200 randomly sampled pairs by three medical professionals, assessing criteria including safety completeness and relevance while documenting flagged examples for cautious handling as evidenced in the provided evaluation tables.
Enables offline-capable emergency response systems by structuring FirstAidQA specifically for fine-tuning small language models, building on methodologies proven effective in prior resource-constrained medical applications like Cahlen's offline first-aid systems.

Introduction

Dataset

The authors introduce FirstAidQA, a synthetic dataset comprising 5,500 question-answer pairs focused on first aid and emergency response scenarios. It is generated using ChatGPT-4o-mini via prompt-based in-context learning, with source material exclusively drawn from the certified Vital First Aid Book (2019).
Key category details include:
- Total size: 5,500 QA pairs spanning 15 emergency categories (e.g., CPR, burns, fractures, head injuries, bleeding management).
- Source: Text chunks from the Vital First Aid Book, manually segmented to preserve context (e.g., casualty movement protocols or burn treatment steps).
- Filtering: Irrelevant theoretical content was excluded; only text applicable to real-world emergencies was retained for QA generation.
- Safety rules: Prompts explicitly constrained the LLM to generate answers strictly from provided context chunks, with diversified topic sampling to reduce bias.
The dataset supports instruction-tuning and fine-tuning of lightweight LLMs/SLMs for offline deployment in low-connectivity environments. The authors use the full dataset (without specified train/validation splits) to train models requiring rapid, reliable emergency guidance, emphasizing practical procedural knowledge over clinical diagnostics.
Processing includes contextual chunking of source text, structured JSON-formatted output generation (20 QA pairs per prompt batch), human validation for accuracy/safety, and iterative refinement to ensure diversity (e.g., adding pediatric/elderly scenarios). No cropping strategy is applied; instead, context-preserving chunks maintain situational relevance for edge-device deployment.

Experiment

Expert evaluation of 200 randomly sampled QA pairs by three medical professionals validated clarity, relevance, specificity and completeness, and safety and accuracy, with mean ratings documented in Table 2
Tables 3 and 4 highlight specific QA pairs containing potentially unsafe instructions that require cautious handling during dataset utilization

ملف PDF المصدر

جدول المحتويات

بناء الذكاء الاصطناعي بالذكاء الاصطناعي

البرمجة التعاونية باستخدام الذكاء الاصطناعي

وحدات GPU جاهزة للعمل

أفضل الأسعار

ابدأ عرض الأسعار

HyperAI Newsletters

اشترك في آخر تحديثاتنا

سنرسل لك أحدث التحديثات الأسبوعية إلى بريدك الإلكتروني في الساعة التاسعة من صباح كل يوم اثنين

مدعوم بواسطة MailChimp

Command Palette

FirstAidQA: مجموعة بيانات اصطناعية للإسعافات الأولية والاستجابة للطوارئ في البيئات ذات الاتصال المنخفض

Saiyma Sittul Muna Rezwan Islam Salvi Mushfiqur Rahman Mushfique Ajwad Abrar

الملخص

One-sentence Summary

Key Contributions

Introduction

Dataset

Experiment

بناء الذكاء الاصطناعي بالذكاء الاصطناعي

HyperAI Newsletters

Command Palette

FirstAidQA: مجموعة بيانات اصطناعية للإسعافات الأولية والاستجابة للطوارئ في البيئات ذات الاتصال المنخفض

Saiyma Sittul Muna Rezwan Islam Salvi Mushfiqur Rahman Mushfique Ajwad Abrar

الملخص

One-sentence Summary

Key Contributions

Introduction

Dataset

Experiment

بناء الذكاء الاصطناعي بالذكاء الاصطناعي

HyperAI Newsletters

Command Palette

FirstAidQA: مجموعة بيانات اصطناعية للإسعافات الأولية والاستجابة للطوارئ في البيئات ذات الاتصال المنخفض

Saiyma Sittul Muna Rezwan Islam Salvi Mushfiqur Rahman Mushfique Ajwad Abrar

الملخص

One-sentence Summary

Key Contributions

Introduction

Dataset

Experiment

بناء الذكاء الاصطناعي بالذكاء الاصطناعي

HyperAI Newsletters