OpenAI Launches GABRIEL: Open-Source AI Toolkit to Scale Social Science Research by Turning Qualitative Data into Quantitative Insights

At OpenAI, a core part of our mission is empowering scientists to work faster and tackle more complex challenges. Today, we’re introducing GABRIEL, an open-source toolkit that leverages GPT to transform unstructured text and images into quantitative measurements. Designed for economists, social scientists, and data scientists, GABRIEL enables the analysis of qualitative data at scale, something that has long been a major bottleneck in social science research.

Qualitative data captures the depth and nuance of human experience: what people say, write, teach, argue, and feel. It includes everything from interviews and syllabi to social media posts and photographs. While this data is abundant, turning it into rigorous, actionable insights has traditionally been slow, labor-intensive, and often impractical. As a result, many important research questions go unanswered not because the data is missing, but because it’s too difficult to analyze.

GABRIEL is built to change that. It allows researchers to define what they want to measure using plain language, such as “How family-friendly is this job listing?”, and then apply that same criterion consistently across thousands or even millions of documents, generating a numerical score for each. This automation drastically reduces the time spent on repetitive data labeling, freeing researchers to focus on higher-level tasks: selecting meaningful research questions, validating findings, and interpreting results with care.

For example, GABRIEL can analyze vast collections of scientific papers to track how specific research methods evolve over time. It can assess course curricula to measure the emphasis on different topics or skills. It can extract detailed historical information from documents about every small town across Europe or uncover patterns in customer reviews to reveal what consumers value most.
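The measurement workflow described above (one plain-language criterion applied identically to every document, yielding one number per document) can be illustrated with a minimal Python sketch. This is not GABRIEL's API: the names `score_documents` and `keyword_scorer` are hypothetical, and the scorer is a deliberately crude keyword counter standing in for a language-model call so that the example runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function mapping (criterion, text) to a score in [0, 100].
Scorer = Callable[[str, str], float]


def keyword_scorer(criterion: str, text: str) -> float:
    """Placeholder scorer (an assumption, not GABRIEL's method).

    A real pipeline would send the criterion and the text to a language
    model. Here the criterion is ignored and we only count a few
    illustrative terms, so the sketch runs without network access.
    """
    terms = ("parental leave", "flexible hours", "childcare", "remote")
    hits = sum(term in text.lower() for term in terms)
    return 100.0 * hits / len(terms)


def score_documents(
    criterion: str, documents: Dict[str, str], scorer: Scorer
) -> List[dict]:
    """Apply the same criterion to every document; return one row per document."""
    return [
        {"id": doc_id, "score": scorer(criterion, text)}
        for doc_id, text in documents.items()
    ]


CRITERION = "How family-friendly is this job listing?"

listings = {
    "job_a": "We offer parental leave, flexible hours, and on-site childcare.",
    "job_b": "Must be available nights and weekends; frequent travel required.",
}

results = score_documents(CRITERION, listings, keyword_scorer)
for row in results:
    print(row["id"], row["score"])
```

Keeping the scorer as a pluggable function is a design choice for the sketch: the loop that guarantees a consistent criterion stays fixed, while the scoring backend can be swapped for a model call and checked against a hand-labeled sample.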
In our research paper, we benchmark GABRIEL’s performance across multiple domains and find that GPT-based labeling achieves high accuracy and consistency.

Beyond measurement, GABRIEL includes a suite of practical tools that researchers frequently need. These include intelligent data merging across mismatched datasets, smart deduplication, passage coding, support for generating new scientific hypotheses, and deidentification of personal information to protect privacy.

GABRIEL is now available as an open-source Python library, complete with a beginner-friendly tutorial notebook to help researchers get started quickly. It’s designed to be accessible even to those with limited technical experience.

We’re committed to continuously improving GABRIEL based on feedback from the academic community. Our hope is that GABRIEL will help more social scientists unlock the power of qualitative data, bringing the richness of human stories into evidence-based research like never before.
