Create a Simple ReAct Agent in Raw Python Without Frameworks
The article details the process of building a ReAct agent, a type of AI system that combines reasoning and action capabilities, in raw Python, without relying on frameworks like LangChain, AutoGen, or CrewAI. The author implemented the agent to better understand the inner workings of such systems and to build confidence for developing more complex AI constructs.

What Is a ReAct Agent?

ReAct stands for Reason and Act, a framework outlined in the paper "ReAct: Synergizing Reasoning and Acting in Language Models." Traditional language models (LMs) excel at generating reasoning traces and taking basic actions, such as responding in a chat loop. However, they lack the ability to diversify actions based on reasoning. For example, a typical chatbot might repeatedly answer questions from its own knowledge without consulting external resources or performing calculations. By integrating reasoning and action, the ReAct framework allows LMs to decide on and execute various actions, like querying Wikipedia or searching academic databases such as ArXiv, enhancing their performance and intelligence.

Problem Statement

The goal is to build an AI agent that can not only respond in a loop but also perform additional actions based on user queries: browsing Wikipedia, searching ArXiv, and performing simple calculations. The implementation uses OpenAI's GPT-4 as the core language model.

Implementation

Setting Up the Environment

The first step is setting up a Python environment.
The author recommends creating a new Conda environment and installing the necessary packages (the pip line below lists the packages implied by the imports that follow):

```bash
conda create -n react-agent python=3.12
conda activate react-agent
pip install openai httpx requests python-dotenv
```

The required Python imports are minimal:

```python
from openai import OpenAI
import re
import httpx
import requests
import xml.etree.ElementTree as ET
import json
```

The OPENAI_API_KEY is stored in a .env file and loaded using the dotenv package:

```python
from dotenv import load_dotenv
load_dotenv()
```

A simple test confirms that the language model is working correctly:

```python
chat_completion = OpenAI().chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello there!"}]
)
print(chat_completion.choices[0].message.content)
```

Building the ChatBot

The ChatBot class handles messages and uses the LLM to generate responses. It maintains a list of messages and appends the user's message before calling the LLM:

```python
class ChatBot:
    def __init__(self, system=""):
        self.system = system
        self.messages = []
        if self.system:
            self.messages.append({"role": "system", "content": system})

    def __call__(self, message):
        self.messages.append({"role": "user", "content": message})
        result = self.run_llm()
        self.messages.append({"role": "assistant", "content": result})
        return result

    def run_llm(self):
        completion = OpenAI().chat.completions.create(
            model="gpt-4",
            temperature=0,
            messages=self.messages
        )
        return completion.choices[0].message.content
```

Creating the Agent

The Agent class extends the ChatBot by adding a loop to handle multiple turns and diverse actions.
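The shape of that loop can be sketched end to end with a scripted stand-in for the model; every name below (react_loop, the lookup action, the canned replies) is illustrative, not the article's code:

```python
def react_loop(model, actions, question, max_turns=3):
    """Drive a Reason -> Act -> Observe cycle until the model emits an Answer."""
    prompt = question
    for _ in range(max_turns):
        reply = model(prompt)
        if reply.startswith("Answer:"):
            return reply
        # Expect a line of the form "Action: <name>: <input>".
        _, name, arg = (part.strip() for part in reply.split(":", 2))
        # Feed the action's result back to the model as an observation.
        prompt = "Observation: {}".format(actions[name](arg))
    return None

# Scripted stand-in for an LLM: acts on the first turn, answers on the second.
replies = iter([
    "Action: lookup: France",
    "Answer: The capital of France is Paris.",
])
actions = {"lookup": lambda q: "France is a country. The capital is Paris."}

print(react_loop(lambda p: next(replies), actions, "What is the capital of France?"))
# -> Answer: The capital of France is Paris.
```

The real Agent below follows this same pattern, except the model is GPT-4 behind the ChatBot class and the actions hit live APIs.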
The agent takes a system prompt, a maximum number of turns, and a dictionary of known actions. The action_re pattern (referenced but not defined in the original write-up; reconstructed here to match the traces below) extracts an action name and input from lines such as "Action: wikipedia: France":

```python
# Matches lines like "Action: wikipedia: France" -> ("wikipedia", "France")
action_re = re.compile(r'^Action: (\w+): (.*)$')

class Agent:
    def __init__(self, system_prompt="", max_turns=1, known_actions=None):
        self.max_turns = max_turns
        self.bot = ChatBot(system_prompt)
        self.known_actions = known_actions

    def run(self, question):
        i = 0
        next_prompt = question
        while i < self.max_turns:
            i += 1
            result = self.bot(next_prompt)
            print(result)
            actions = [action_re.match(a) for a in result.split('\n') if action_re.match(a)]
            if actions:
                action, action_input = actions[0].groups()
                if action not in self.known_actions:
                    raise Exception("Unknown action: {}: {}".format(action, action_input))
                print(" -- running {} {}".format(action, action_input))
                observation = self.known_actions[action](action_input)
                print("Observation:", observation)
                next_prompt = "Observation: {}".format(observation)
            else:
                return
```

Defining Actions

Three actions are defined: wikipedia, arxiv_search, and calculate.

Wikipedia queries the Wikipedia search API and returns the snippet of the top result:

```python
def wikipedia(q):
    return httpx.get("https://en.wikipedia.org/w/api.php", params={
        "action": "query",
        "list": "search",
        "srsearch": q,
        "format": "json"
    }).json()["query"]["search"][0]["snippet"]
```

ArXiv Search queries the ArXiv API for research papers and returns the first result's title and summary (ARXIV_NAMESPACE, not defined in the original write-up, is the Atom namespace prefix used by the ArXiv feed):

```python
ARXIV_NAMESPACE = "{http://www.w3.org/2005/Atom}"

def arxiv_search(q):
    url = f'http://export.arxiv.org/api/query?search_query=all:{q}&start=0&max_results=1'
    res = requests.get(url)
    et_root = ET.fromstring(res.content)
    for entry in et_root.findall(f"{ARXIV_NAMESPACE}entry"):
        title = entry.find(f"{ARXIV_NAMESPACE}title").text.strip()
        summary = entry.find(f"{ARXIV_NAMESPACE}summary").text.strip()
        return json.dumps({"title": title, "summary": summary})
```

Calculate evaluates mathematical expressions with Python's eval:

```python
def calculate(what):
    return eval(what)
```

Prompting the Agent

The prompt is crucial to the agent's operation.
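One caveat on calculate: eval executes arbitrary Python, so a hostile model output like `__import__('os')` would run real code. A lightly hardened sketch (safe_calculate is my own name, not the article's, and this is a mitigation rather than a sandbox) whitelists arithmetic characters before evaluating:

```python
import re

# Permit only digits, whitespace, and basic arithmetic punctuation.
ARITHMETIC_ONLY = re.compile(r'^[0-9\s\.\+\-\*\/\(\)%]+$')

def safe_calculate(expr):
    """Evaluate a plain arithmetic expression, rejecting anything with
    letters, quotes, or underscores. eval is still used underneath, so
    treat this as a mitigation, not a security boundary."""
    if not ARITHMETIC_ONLY.match(expr):
        raise ValueError("unsupported expression: {!r}".format(expr))
    # Empty globals/locals keep builtins out of reach of the expression.
    return eval(expr, {"__builtins__": {}}, {})

print(safe_calculate("4 * 7 / 3"))  # -> 9.333333333333334
```

For a toy agent on trusted input the article's one-liner is fine; for anything exposed to untrusted queries, prefer a real expression parser over eval.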
The prompt guides the agent through a loop of Thought, Action, PAUSE, and Observation, providing examples of each step (the placeholder arguments in angle brackets stand in for example queries that did not survive in the source):

```python
prompt = """
You run in a loop of Thought, Action, PAUSE, Observation.
At the end of the loop you output an Answer.
Use Thought to describe your thoughts about the question you have been asked.
Use Action to run one of the actions available to you - then return PAUSE.
Observation will be the result of running those actions.

Your available actions are:

calculate:
e.g., calculate: 4 * 7 / 3
Runs a calculation and returns the number - uses Python so be sure to use floating point syntax if necessary

wikipedia:
e.g., wikipedia: <search term>
Returns a summary from searching Wikipedia

arxiv_search:
e.g., arxiv_search: <search term>
Returns a summary of research papers

Example session:

Question: What is the capital of France?
Thought: I should look up France on Wikipedia
Action: wikipedia: France
PAUSE

Observation: France is a country. The capital is Paris.
Answer: The capital of France is Paris
"""
```

Testing the Agent

The agent is created with the dictionary of known actions and a maximum of 3 turns:

```python
known_actions = {
    "wikipedia": wikipedia,
    "calculate": calculate,
    "arxiv_search": arxiv_search
}

agent = Agent(prompt, max_turns=3, known_actions=known_actions)

# Test 1: Capital of Indonesia
agent.run("what is the capital of indonesia?")
```

Output:

Thought: I should look up Indonesia on Wikipedia to find out its capital.
Action: wikipedia: Indonesia
PAUSE
 -- running wikipedia Indonesia
Observation: Indonesia, officially the Republic of Indonesia, is a country in Southeast Asia and Oceania, between the Indian and Pacific oceans. Comprising over 17
Answer: The capital of Indonesia is Jakarta.

```python
# Test 2: Explaining the LightRAG paper
agent.run("explain the lightRAG paper")
```

Output:

Thought: I should search for the "lightRAG paper" to find relevant information about it, as it seems to be a specific research topic or publication.
Action: arxiv_search: lightRAG
PAUSE
 -- running arxiv_search lightRAG
Observation: {"title": "LightRAG: Simple and Fast Retrieval-Augmented Generation", "summary": "Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user needs. However, existing RAG systems have significant limitations, including reliance on flat data representations and inadequate contextual awareness, which can lead to fragmented answers that fail to capture complex inter-dependencies. To address these challenges, we propose LightRAG, which incorporates graph structures into text indexing and retrieval processes. This innovative framework employs a dual-level retrieval system that enhances comprehensive information retrieval from both low-level and high-level knowledge discovery. Additionally, the integration of graph structures with vector representations facilitates efficient retrieval of related entities and their relationships, significantly improving response times while maintaining contextual relevance. This capability is further enhanced by an incremental update algorithm that ensures the timely integration of new data, allowing the system to remain effective and responsive in rapidly changing data environments. Extensive experimental validation demonstrates considerable improvements in retrieval accuracy and efficiency compared to existing approaches. We have made our LightRAG open-source and available at the link: https://github.com/HKUDS/LightRAG"}
Answer: The LightRAG paper, titled "LightRAG: Simple and Fast Retrieval-Augmented Generation," presents a system that enhances large language models by integrating external knowledge sources. This approach aims to provide more accurate and contextually relevant responses.
LightRAG addresses limitations in existing Retrieval-Augmented Generation (RAG) systems, such as reliance on flat data representations and inadequate contextual awareness, which can lead to fragmented answers. It incorporates graph structures into text indexing and retrieval processes, employing a dual-level retrieval system for comprehensive information retrieval. This system improves response times and maintains contextual relevance by efficiently retrieving related entities and their relationships. LightRAG also features an incremental update algorithm for timely integration of new data, ensuring effectiveness in rapidly changing environments. The paper reports significant improvements in retrieval accuracy and efficiency, and the LightRAG system is available as open source at the provided GitHub link.

Industry Evaluation and Author Background

Industry experts and AI researchers have praised the ReAct framework for its effectiveness in enhancing the capabilities of language models. By integrating reasoning and action, ReAct makes LMs more versatile and capable of handling complex tasks. The simplicity of the framework and its implementation in raw Python highlight the accessibility of advanced AI techniques and encourage developers to experiment and innovate.

The author, a skilled AI developer, shares insights and updates on their X account and YouTube channel. They emphasize the importance of understanding the foundational concepts behind AI systems in order to build more robust and intelligent models. The article offers a valuable resource for those looking to dive deeper into AI agents and agentic systems, with both code and visual explanations. Follow the author on X for daily AI news and research updates, and subscribe to their YouTube channel for visual explanations of AI concepts and papers.
Clapping and sharing the article are also encouraged to support their efforts and the community's learning journey.