Boost LLM Performance with Strategic Context Enrichment Using Metadata and On-Demand Data Retrieval
My goal for this article is to highlight the importance of providing LLMs with relevant data and to show you how to feed that data into your models effectively to significantly improve their performance. After reading, you should understand why context enrichment matters, where to find useful data, and how to implement both preprocessing-time and on-demand data retrieval in real-world applications; short code sketches of each technique appear at the end of the article.

Large language models are trained on massive amounts of text, essentially the entire internet, during pre-training. This vast exposure allows them to learn patterns, facts, and language structures. During inference, however, their performance depends heavily on the quality and completeness of the context provided. Often, we limit their potential by failing to include key pieces of information, even when they're readily available.

For example, in a document Q&A system, you might pass the text content of uploaded files to the LLM but forget to include the filename. Yet the filename may contain critical context, such as "Q3_2024_Financial_Report.pdf", which helps the model understand the timing and nature of the document. Similarly, metadata such as file type, folder path, upload date, page numbers, or author can significantly improve accuracy and relevance.

This metadata is often already present in your system: file systems, databases, and cloud storage platforms store rich contextual details. Extracting these details and including them in the LLM's prompt can make a major difference. For instance, filtering search results to include only PDFs from a specific department or date range ensures the model receives only relevant data, reducing noise and improving response quality.

Beyond existing metadata, you can enrich context by proactively extracting additional information from documents. This can be done during preprocessing by using an LLM to identify and extract structured data such as names, dates, locations, document types, or key entities. You define the desired data points, create a system prompt, and run the LLM on the text. The output can then be stored in a database and reused across queries.

The main drawback of this approach is that you must anticipate what data will be needed. In dynamic or open-ended scenarios, you can't always predict what information will be relevant. That's where on-demand information retrieval comes in: you equip your LLM with a tool that calls an external function to extract specific data when needed. For example, you can create a function that takes a query like "What is the project deadline?" together with a document's text and returns the deadline using a simple LLM prompt. The LLM can then invoke this function during reasoning, effectively expanding its knowledge in real time. This technique mirrors the advanced agent architectures used by Anthropic and others, where an orchestrator agent delegates sub-tasks to specialized tools. While powerful, it increases token usage, so monitoring cost and efficiency is essential.

Two key applications demonstrate the value of enriched context. First, metadata filtering in retrieval-augmented generation (RAG) lets you narrow down document chunks before feeding them to the LLM: if a user asks about a sales forecast, filtering for documents labeled "Sales" and dated after January 2024 ensures the model works with the right data. Second, AI agents can perform real-time internet searches to answer questions about events beyond the model's training cutoff.
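To make the filename and metadata point concrete, here is a minimal sketch of a document Q&A call that passes file metadata alongside the extracted text. It assumes the OpenAI Python SDK; the `doc` record, field names, and model name are illustrative placeholders, and any other chat-completion client would work the same way.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK; other clients work similarly

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical document record: content plus metadata that is usually already
# available in the file system, database, or cloud storage platform.
doc = {
    "filename": "Q3_2024_Financial_Report.pdf",
    "file_type": "pdf",
    "folder_path": "/finance/reports/2024",
    "upload_date": "2024-10-02",
    "author": "Finance Team",
    "content": "...extracted text of the report...",
}

def answer_question(doc: dict, question: str) -> str:
    """Answer a question about one document, passing its metadata as explicit context."""
    prompt = (
        "You are answering questions about a single document.\n"
        f"Filename: {doc['filename']}\n"
        f"File type: {doc['file_type']}\n"
        f"Folder: {doc['folder_path']}\n"
        f"Uploaded: {doc['upload_date']}\n"
        f"Author: {doc['author']}\n\n"
        f"Document text:\n{doc['content']}\n\n"
        f"Question: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(answer_question(doc, "Which quarter does this report cover?"))
```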
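For preprocessing-time enrichment, one possible implementation is to define the data points you want, ask the model to return them as JSON, and persist the result so later queries can reuse it. The prompt wording, field list, and SQLite schema below are my own assumptions, not a prescribed format.

```python
import json
import sqlite3
from openai import OpenAI

client = OpenAI()

EXTRACTION_SYSTEM_PROMPT = (
    "Extract the following fields from the document and reply with JSON only: "
    "document_type, author, date, locations (list), key_entities (list). "
    "Use null for anything that is not present."
)

def extract_metadata(text: str) -> dict:
    """Run the extraction prompt over raw document text and parse the JSON reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": EXTRACTION_SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},  # request strict JSON output
    )
    return json.loads(response.choices[0].message.content)

def store_metadata(db_path: str, doc_id: str, metadata: dict) -> None:
    """Persist extracted fields so later queries can reuse them without re-running the LLM."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS doc_metadata (doc_id TEXT PRIMARY KEY, data TEXT)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO doc_metadata VALUES (?, ?)",
            (doc_id, json.dumps(metadata)),
        )

# Illustrative dummy text; in practice this runs once per uploaded document.
metadata = extract_metadata("Project Phoenix kickoff memo, written on 2024-03-12 ...")
store_metadata("metadata.db", "doc-001", metadata)
```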
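On-demand retrieval can be exposed as a tool the model may call while reasoning. The sketch below shows the extraction function itself and, assuming the OpenAI tool-calling interface, the schema you would register so an orchestrating model can invoke it; the function name and parameters are placeholders of my own.

```python
from openai import OpenAI

client = OpenAI()

def extract_from_document(query: str, document_text: str) -> str:
    """Answer a narrow question (e.g. 'What is the project deadline?') from one document."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Using only the document below, answer the question.\n"
                       f"Question: {query}\n\nDocument:\n{document_text}",
        }],
    )
    return response.choices[0].message.content

# Schema to register the function as a tool, e.g. via
# client.chat.completions.create(..., tools=[EXTRACT_TOOL]); names are illustrative.
EXTRACT_TOOL = {
    "type": "function",
    "function": {
        "name": "extract_from_document",
        "description": "Extract one specific piece of information from a document.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "What to look for."},
                "document_text": {"type": "string", "description": "Full text of the document."},
            },
            "required": ["query", "document_text"],
        },
    },
}
```

In a real deployment you would more likely pass a document ID and resolve the text on the server side, so the model does not have to echo the whole document into the tool arguments; the schema above keeps the query-plus-text shape described in the text for clarity, and the extra tool-call round trips are part of the token cost mentioned above.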
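The RAG filtering idea reduces to applying metadata predicates before any chunk reaches the prompt. The sketch below uses a plain in-memory list so it runs as-is; in production the same predicates would typically be pushed down into your vector store's metadata filter, and the field names here are assumptions about your schema.

```python
from datetime import date

# Chunks as they might come back from a retriever, each carrying metadata.
chunks = [
    {"text": "FY2024 sales forecast: ...", "department": "Sales", "doc_date": date(2024, 2, 15)},
    {"text": "2023 holiday party plan", "department": "HR", "doc_date": date(2023, 11, 1)},
    {"text": "Q1 2024 pipeline review", "department": "Sales", "doc_date": date(2024, 4, 3)},
]

def filter_chunks(chunks: list[dict], department: str, after: date) -> list[dict]:
    """Keep only chunks matching the department and dated after the cutoff."""
    return [
        c for c in chunks
        if c["department"] == department and c["doc_date"] > after
    ]

relevant = filter_chunks(chunks, department="Sales", after=date(2024, 1, 1))
context = "\n\n".join(c["text"] for c in relevant)  # only these chunks reach the prompt
print(context)
```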
Such real-time search enables up-to-date responses on news, sports, or breaking developments, something a static model alone cannot provide. In summary, enhancing LLM performance isn't just about better models or prompts; it's about better context. By leveraging existing metadata, extracting structured information in advance, and retrieving data on demand, you empower your LLM to deliver more accurate, relevant, and insightful responses. The missing piece in many AI applications isn't the model; it's the data.