New Open-Source Project Uses AI to Visualize and Enhance Government Grant Funding Data
Summary

Introduction to Government Funding Graph

The Government Funding Graph is an open-source project that aims to improve grant writing by suggesting research topics, funding bodies, research institutions, and researchers. It was started by someone with experience of Innovate UK grant applications, and it is timely given the growing focus on government efficiency and AI integration in both the US and the UK. Examples include Elon Musk's proposed Department of Government Efficiency (DOGE) and Keir Starmer's efforts to incorporate AI into government processes.

Key Components and Libraries Used

UKRI API: The UK Research and Innovation (UKRI) API provides access to public grant funding datasets. No authentication is required. The project primarily uses the 'Search projects' and 'Projects' endpoints to fetch and parse project data. Code is provided for asynchronous pagination and project data retrieval, with results sorted by funded value.

NetworkX: NetworkX is a Python library for constructing and manipulating graphs. A directed graph (DiGraph) represents the relationships between funders, projects, research organizations, and the individuals involved. Nodes fall into four groups: funder_name, lead_research_organisation, project_title, and person_name. Custom attributes such as “funding” are added to nodes so that their sizes can be normalized as a percentage of total funding.

PyVis: PyVis renders the graph as an interactive visualization. Its Network class is configured with settings such as height, width, and background color. The graph is converted from NetworkX to PyVis format and saved as an HTML file, which is then embedded in the Streamlit UI for display.

LlamaIndex: LlamaIndex is employed for graph retrieval-augmented generation (RAG).
This AI framework enhances the accuracy of responses by grounding them in the external knowledge base provided by the graph. A chat engine is created to handle user queries, maintaining a chat history to improve context and using the tree_summarize response builder to generate comprehensive summaries.

Streamlit: Streamlit is a lightweight Python web framework used to build and deploy the application quickly. Users can interact with the graph through a simple web interface. The Streamlit app includes forms for user input, filtering options, and dynamic graph visualizations.

Detailed Project Steps

Data Retrieval: Projects are searched using the UKRI API, and results are paginated asynchronously to handle large datasets efficiently. Project data is parsed to extract funders, projects, organizations, and individuals, with superfluous information removed.

Graph Construction: A directed graph is constructed using NetworkX, with nodes representing various entities and edges representing their relationships. Node sizes are normalized based on the total funding received, improving the visual representation of importance.

Graph Filtering: Users can filter nodes through the Streamlit UI, creating a subgraph that excludes irrelevant context. This is achieved using NetworkX's subgraph_view function. Filters help in focusing on specific projects or organizations, making the graph more manageable and relevant.

Graph Visualization: The graph is converted to PyVis format and rendered as an interactive HTML visualization within the Streamlit app. Interactive features like clickable links to detailed project pages enhance user experience.

Graph RAG Implementation: LlamaIndex is used to implement graph RAG, allowing users to query the graph for detailed information. User queries are processed through a chat engine, which looks up entities in the graph to provide grounded responses, reducing the risk of hallucinations.
An alternative LangChain implementation was also explored but ultimately not chosen for this project.

Linting with Pylint: Pylint is used to check and enforce coding standards, helping to identify potential bugs and stylistic issues. A .pylintrc file is created to configure Pylint settings, and the linter is run automatically during Docker image builds.

Final Deployment

The application is deployed on Streamlit Community Cloud, allowing anyone to access it for free. The live demo can be found at: https://governmentfundinggraph.streamlit.app. All the code for the project is available in the GitHub repository.

Industry Insights and Company Profiles

The Government Funding Graph project leverages modern AI and data visualization techniques to address a real-world problem in grant writing. By integrating tools like NetworkX, PyVis, and LlamaIndex, it offers a dynamic and intuitive way to explore government funding data, which is crucial for researchers and businesses seeking financial support. The project aligns with the growing trend of using data-driven approaches to optimize government spending and enhance transparency. Streamlit's ease of use in deploying data apps further underscores the accessibility and practicality of this solution. The author invites feedback and contributions to the project, highlighting the value of community involvement in open-source initiatives.
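
The asynchronous pagination described under Data Retrieval can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: fetch_page, FAKE_PROJECTS, PAGE_SIZE, and the response field names are hypothetical stand-ins for a real HTTP call to a paginated search endpoint, and the data is canned so the sketch runs without network access.

```python
import asyncio

# Hypothetical canned data standing in for a paginated search response.
FAKE_PROJECTS = [
    {"title": f"Project {i}", "funded_value": value}
    for i, value in enumerate([120_000, 450_000, 80_000, 300_000, 50_000])
]
PAGE_SIZE = 2  # assumed page size, chosen only for this sketch


async def fetch_page(page: int) -> dict:
    """Return one page of results plus the total page count (pages are 1-indexed)."""
    await asyncio.sleep(0)  # yield control, as a real network request would
    start = (page - 1) * PAGE_SIZE
    total_pages = -(-len(FAKE_PROJECTS) // PAGE_SIZE)  # ceiling division
    return {
        "total_pages": total_pages,
        "projects": FAKE_PROJECTS[start:start + PAGE_SIZE],
    }


async def fetch_all_projects() -> list[dict]:
    """Fetch page 1 to learn the page count, then fetch the remaining pages concurrently."""
    first = await fetch_page(1)
    remaining = await asyncio.gather(
        *(fetch_page(p) for p in range(2, first["total_pages"] + 1))
    )
    projects = list(first["projects"])
    for page in remaining:
        projects.extend(page["projects"])
    # Sort by funded value, largest first
    return sorted(projects, key=lambda p: p["funded_value"], reverse=True)


projects = asyncio.run(fetch_all_projects())
```

Fetching the first page on its own is one plausible design: it reveals how many pages exist, after which the remaining requests can be issued together with asyncio.gather instead of one at a time.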

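The Graph Construction and Graph Filtering steps might look something like the sketch below. The four node groups, the “funding” attribute, and subgraph_view come from the description above; the sample records, the funder-to-organisation-to-project-to-person edge layout, the "size" attribute name, and the funding threshold are assumptions made purely for illustration.

```python
import networkx as nx

# Illustrative records; names and values are made up for this sketch.
records = [
    {"funder_name": "Funder A", "lead_research_organisation": "University X",
     "project_title": "Project 1", "person_name": "Researcher P",
     "funded_value": 300_000},
    {"funder_name": "Funder A", "lead_research_organisation": "University Y",
     "project_title": "Project 2", "person_name": "Researcher Q",
     "funded_value": 100_000},
]
GROUPS = ["funder_name", "lead_research_organisation", "project_title", "person_name"]

graph = nx.DiGraph()
for rec in records:
    chain = [(group, rec[group]) for group in GROUPS]
    for group, name in chain:
        if name not in graph:
            graph.add_node(name, group=group, funding=0)
        # Accumulate the funding associated with each entity
        graph.nodes[name]["funding"] += rec["funded_value"]
    # Assumed edge layout: funder -> organisation -> project -> person
    for (_, src), (_, dst) in zip(chain, chain[1:]):
        graph.add_edge(src, dst)

# Normalize node size as a percentage of total funding
total = sum(rec["funded_value"] for rec in records)
for _, data in graph.nodes(data=True):
    data["size"] = 100 * data["funding"] / total


def keep(node) -> bool:
    """Hypothetical filter: keep nodes tied to at least 200,000 of funding."""
    return graph.nodes[node]["funding"] >= 200_000


# subgraph_view returns a read-only filtered view without copying the graph
filtered = nx.subgraph_view(graph, filter_node=keep)
```

Because subgraph_view produces a view, not a copy, changing the filter in the UI would only require building a new view over the same underlying graph.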