Lessons Learned: Building an Effective MCP Server for Dynamic Code Analysis
Developing an MCP (Model Context Protocol) server for an observability application to enhance dynamic code analysis involves several crucial lessons. The initial proof of concept (POC) showed tremendous potential, but delivering on that promise required multiple iterations and significant adjustments.

Lesson 1: Don't Dump All Your Data on the Agent

Initially, I exposed a detailed API designed for a complex UI, expecting the AI agent to process and make sense of the vast amount of data. This approach quickly backfired. The response JSON contained around 360,000 characters and 16,000 words, including call stacks, error frames, and references. Although the payload fit within the model's context window (Claude 3.7 Sonnet supports up to 200,000 tokens), the sheer volume overwhelmed the agent: it responded ineffectively and even claimed there were no errors despite receiving tangible results.

Switching to a model with a larger context window, such as Gemini 2.5 Pro (up to one million tokens), partially resolved the issue. The agent produced a more intelligent response, parsing the errors and identifying systemic causes. Nevertheless, relying on a specific model is impractical. Instead, I optimized the API to deliver smaller, more focused data sets, ensuring the agent could consistently analyze important new exceptions and suggest fixes.

Lesson 2: Use Time Duration Values

Another stumbling block was the agent's handling of time ranges. Instead of standard 'From' and 'To' parameters with datetime values, I had the tools accept ISO 8601 duration values, such as "P7D" for the past seven days. This change stemmed from the agent's lack of awareness of the current date and time: asking the agent for the current date returned a date in the past, which made relative time durations the safer choice.
This solution was effective, but it underscored the importance of documenting the expected value and example syntax in the tool parameter descriptions. Proper documentation ensured the agent could handle time-related queries accurately and seamlessly.

Lesson 3: Correct the Agent's Mistakes

When the agent made mistakes, such as using environment names verbatim in requests, my MCP server would fail silently, returning no data or a generic error. The agent would then try alternative strategies, often abandoning the correct approach. To improve this, I modified the implementation to return detailed error messages along with the valid options. This feedback loop allowed the agent to learn from its mistakes and avoid repeating them. For instance, if the agent provided invalid environment IDs, the server would explicitly state the error and offer a list of valid environment IDs. This change made the agent's interactions significantly more reliable and efficient.

Lesson 4: Emphasize User Intent

The agent often failed to connect user intent to the right tool. A tool that displayed runtime usage of methods and endpoints was initially described in purely functional terms, without highlighting its practical applications. As a result, the agent never triggered the tool for relevant prompts. Rewriting the tool description to emphasize use cases, such as detecting breaking changes or generating tests, helped the agent recognize and utilize the tool effectively. For example, a prompt like "compare slow traces for this method between the test and prod environments" now seamlessly triggers the tool, because the agent understands its relevance and utility.

Lesson 5: Document JSON Responses

Effective communication with the agent depended heavily on well-documented JSON responses. Without comments, the agent struggled to interpret the context and significance of the returned data.
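To make this concrete, here is a hypothetical sketch of what an annotated response might look like; the field names, score values, and comment text are invented for illustration, not taken from the actual server:

```python
import json

# Hypothetical error-analysis payload: the "_comment" entry explains the
# scoring fields inline, so the agent can interpret them without guessing.
response = {
    "errors": [
        {
            "name": "NullReferenceException",
            "score": {"total": 82, "recency": 30, "frequency": 52},
            "_comment": (
                "score.total ranks this error's importance (0-100); "
                "recency rewards errors seen recently, frequency rewards "
                "errors affecting many traces. Prioritize higher totals."
            ),
        }
    ]
}

print(json.dumps(response, indent=2))
```

The comment costs a few tokens per response but spares the agent from inventing its own interpretation of opaque numeric fields.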
For instance, the "Score" object in the error response included various parameters without clear explanations. Adding a "_comment" section to the JSON provided the essential context, enabling the agent to explain scores to users and incorporate this information into its recommendations.

Choosing the Right Architecture: SSE vs. STDIO

Two primary architectures are available for MCP servers: STDIO and Server-Sent Events (SSE). With STDIO, the server runs as a client-side command (e.g., via npx, docker, or python). This setup is widely supported but has limitations: updates or new capabilities require clients to manually roll out changes, and versioning becomes complex due to tight coupling with backend APIs.

Instead, I opted for an SSE server, hosted as part of our application services. This eliminated the need for client-side commands, reduced the friction of updating and versioning, and allowed seamless integration with the application's code. While not all clients support SSE, tools like supergateway can serve as proxies, making it easier to transition from the STDIO model to the SSE server.

Final Thoughts

Industry insiders have praised the adaptability and potential of MCP servers in enhancing dynamic code analysis. Companies like Digma, which focuses on AI-driven observability solutions, are leading the way in integrating MCP into their products. The lessons learned in this development process highlight the importance of optimizing data delivery, clear documentation, and user-centric design in MCP implementations. As the technology matures, best practices and user feedback will further refine these approaches, making MCP servers an invaluable tool for developers and observability teams.

Follow the author on Twitter at @doppleware or LinkedIn for more insights and updates on MCP and dynamic code analysis using observability. You can also explore the MCP server project at https://github.com/digma-ai/digma-mcp-server.
