HyperAI超神経

How to Build a ReAct Agent in Raw Python: Enhancing LLMs with Dynamic Actions


In this article, the author walks through building a simple ReAct agent in raw Python, without frameworks such as LangChain, AutoGen, or CrewAI. The goal is an AI agent that can not only respond to user queries but also perform actions such as browsing Wikipedia, searching ArXiv (a repository of free research papers), or doing simple calculations. Building this by hand gives a deeper understanding of how these systems work and builds confidence for constructing more complex agentic systems.

What is a ReAct Agent?

A ReAct (Reason and Act) agent combines the reasoning capabilities of large language models (LLMs) with the ability to perform actions in an environment. Traditional chatbots loop between generating responses and waiting for user input; they cannot take diverse actions based on their reasoning. A ReAct agent, by contrast, can decide to look up information on Wikipedia before answering a user's query. This combination of reasoning and action execution significantly enhances what LLMs can do, marking the beginning of the agentic era in AI.

Problem Statement

The author aims to build an AI agent that can:
1. Browse Wikipedia or ArXiv for information.
2. Perform simple calculations.

The agent uses OpenAI's GPT-4 model as its LLM backbone but can be adapted to work with any LLM.

Environment Setup

The first step is setting up the Python environment using Conda:

```bash
conda create -n react-agent python=3.12
conda activate react-agent
```

Next, the necessary packages are imported:

```python
from openai import OpenAI
import re
import httpx
import requests
import xml.etree.ElementTree as ET
import json
from dotenv import load_dotenv
```

The OPENAI_API_KEY is loaded from a .env file to authenticate the OpenAI SDK.
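Under the hood, `load_dotenv` just reads KEY=VALUE pairs from the .env file into the process environment, from which the OpenAI SDK picks up the key. A minimal stdlib-only sketch of that step (the helper name `parse_dotenv` is mine, not from the article; the real python-dotenv library handles more edge cases):

```python
import os

def parse_dotenv(path=".env"):
    """Simplified stand-in for dotenv.load_dotenv: read KEY=VALUE lines
    into os.environ, skipping blank lines and # comments."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Existing environment variables win, matching load_dotenv's default
            os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))

# The article then authenticates the OpenAI SDK with the loaded key:
# client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```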
Basic ChatBot

A basic chatbot class handles user messages and generates responses from the LLM:

```python
class ChatBot:
    def __init__(self, system=""):
        self.system = system
        self.messages = []
        if self.system:
            self.messages.append({"role": "system", "content": system})

    def __call__(self, message):
        self.messages.append({"role": "user", "content": message})
        result = self.run_llm()
        self.messages.append({"role": "assistant", "content": result})
        return result

    def run_llm(self):
        completion = client.chat.completions.create(
            model="gpt-4",
            temperature=0,
            messages=self.messages
        )
        return completion.choices[0].message.content
```

Agent Class

The agent class orchestrates the ReAct loop, which consists of Thought, Action, PAUSE, and Observation. It scans each LLM reply for an `Action:` line with a regex:

```python
# Matches lines such as "Action: wikipedia: France"
# (not shown in the original snippet, but required by Agent.run below)
action_re = re.compile(r'^Action: (\w+): (.*)$')

class Agent:
    def __init__(self, system_prompt="", max_turns=1, known_actions=None):
        self.max_turns = max_turns
        self.bot = ChatBot(system_prompt)
        self.known_actions = known_actions

    def run(self, question):
        i = 0
        next_prompt = question
        while i < self.max_turns:
            i += 1
            result = self.bot(next_prompt)
            print(result)
            actions = [action_re.match(a) for a in result.split('\n') if action_re.match(a)]
            if actions:
                action, action_input = actions[0].groups()
                if action not in self.known_actions:
                    raise Exception("Unknown action: {}: {}".format(action, action_input))
                print(" -- running {} {}".format(action, action_input))
                observation = self.known_actions[action](action_input)
                print("Observation:", observation)
                next_prompt = "Observation: {}".format(observation)
            else:
                return result
```

The agent limits the number of turns to prevent infinite loops and only runs predefined actions.
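The Action-line protocol can be checked in isolation. A quick sketch of how such a regex splits a reply into an action name and its input (the pattern is assumed from the prompt format, since the article never shows its definition):

```python
import re

# Assumed pattern for lines like "Action: wikipedia: France"
action_re = re.compile(r'^Action: (\w+): (.*)$')

reply = """Thought: I should look up France on Wikipedia.
Action: wikipedia: France
PAUSE"""

# Same filtering the Agent.run loop performs on each LLM reply
matches = [action_re.match(line) for line in reply.split('\n') if action_re.match(line)]
action, action_input = matches[0].groups()
print(action, action_input)  # wikipedia France
```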
Action Functions

Three action functions handle the different types of tasks. Note that `ARXIV_NAMESPACE`, left undefined in the original snippet, is the Atom XML namespace used by ArXiv API responses:

```python
ARXIV_NAMESPACE = "{http://www.w3.org/2005/Atom}"

def wikipedia(q):
    return httpx.get("https://en.wikipedia.org/w/api.php", params={
        "action": "query",
        "list": "search",
        "srsearch": q,
        "format": "json"
    }).json()["query"]["search"][0]["snippet"]

def arxiv_search(q):
    url = f'http://export.arxiv.org/api/query?search_query=all:{q}&start=0&max_results=1'
    res = requests.get(url)
    et_root = ET.fromstring(res.content)
    for entry in et_root.findall(f"{ARXIV_NAMESPACE}entry"):
        title = entry.find(f"{ARXIV_NAMESPACE}title").text.strip()
        summary = entry.find(f"{ARXIV_NAMESPACE}summary").text.strip()
        return json.dumps({"title": title, "summary": summary})

def calculate(what):
    return eval(what)  # fine for a demo, but eval executes arbitrary code
```

Prompt Design

Prompts are crucial for guiding the ReAct agent. The author uses few-shot prompting, providing examples to help the LLM understand the desired behavior:

```python
prompt = """
You run in a loop of Thought, Action, PAUSE, Observation.
At the end of the loop you output an Answer.
Use Thought to describe your thoughts about the question you have been asked.
Use Action to run one of the actions available to you - then return PAUSE.
Observation will be the result of running those actions.

Your available actions are:
- calculate: e.g., calculate: 4 * 7 / 3
- wikipedia: e.g., wikipedia: France
- arxiv_search: e.g., arxiv_search: lightrag paper

Example session:
Question: What is the capital of France?
Thought: I should look up France on Wikipedia.
Action: wikipedia: France
PAUSE

Observation: France is a country. The capital is Paris.
Answer: The capital of France is Paris.
"""
```

Testing the Agent

The agent is tested with two queries.
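One aside before running the tests: the `calculate` action's `eval` executes arbitrary Python, so a model could be steered into running dangerous code. A minimal restricted-evaluator sketch (my addition, not from the article; `safe_calculate` is a hypothetical name), assuming only basic arithmetic is needed:

```python
import ast
import operator

# Whitelisted operators: anything else is rejected
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_calculate(what):
    """Evaluate an arithmetic expression like '4 * 7 / 3' without eval."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {what!r}")
    return ev(ast.parse(what, mode="eval").body)

print(safe_calculate("4 * 7 / 3"))  # 9.333333333333334
```

Swapping this in for `calculate` in `known_actions` leaves the agent's behavior unchanged for the arithmetic the prompt advertises.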
Query 1: Capital of Indonesia

```python
known_actions = {
    "wikipedia": wikipedia,
    "calculate": calculate,
    "arxiv_search": arxiv_search
}

agent = Agent(prompt, max_turns=3, known_actions=known_actions)
agent.run("what is the capital of indonesia?")
```

Output:

```
Thought: I should look up Indonesia on Wikipedia to find out its capital.
Action: wikipedia: Indonesia
PAUSE
 -- running wikipedia Indonesia
Observation: Indonesia, officially the Republic of Indonesia, is a country in Southeast Asia and Oceania, between the Indian and Pacific oceans. Comprising over 17...
Answer: The capital of Indonesia is Jakarta.
```

Query 2: Explain the LightRAG Paper

```python
agent.run("explain the lightrag paper")
```

Output:

```
Thought: I should search for the "lightrag paper" to find relevant information about it, as it seems to be a specific research topic or publication.
Action: arxiv_search: lightrag
PAUSE
 -- running arxiv_search lightrag
Observation: {"title": "LightRAG: Simple and Fast Retrieval-Augmented Generation", "summary": "Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user needs. However, existing RAG systems have significant limitations, including reliance on flat data representations and inadequate contextual awareness, which can lead to fragmented answers that fail to capture complex inter-dependencies. To address these challenges, we propose LightRAG, which incorporates graph structures into text indexing and retrieval processes. This innovative framework employs a dual-level retrieval system that enhances comprehensive information retrieval from both low-level and high-level knowledge discovery. Additionally, the integration of graph structures with vector representations facilitates efficient retrieval of related entities and their relationships, significantly improving response times while maintaining contextual relevance. This capability is further enhanced by an incremental update algorithm that ensures the timely integration of new data, allowing the system to remain effective and responsive in rapidly changing data environments. Extensive experimental validation demonstrates considerable improvements in retrieval accuracy and efficiency compared to existing approaches. We have made our LightRAG open-source and available at the link: https://github.com/HKUDS/LightRAG"}
Answer: The LightRAG paper, titled "LightRAG: Simple and Fast Retrieval-Augmented Generation," presents a system that enhances large language models by integrating external knowledge sources. This approach aims to provide more accurate and contextually relevant responses. LightRAG addresses limitations in existing Retrieval-Augmented Generation (RAG) systems, such as reliance on flat data representations and inadequate contextual awareness, which can lead to fragmented answers. It incorporates graph structures into text indexing and retrieval processes, employing a dual-level retrieval system for comprehensive information retrieval. This system improves response times and maintains contextual relevance by efficiently retrieving related entities and their relationships. LightRAG also features an incremental update algorithm for timely integration of new data, ensuring effectiveness in rapidly changing environments. The paper reports significant improvements in retrieval accuracy and efficiency, and the LightRAG system is available as open-source at the provided GitHub link.
```

Notice how the agent autonomously decided to use the arxiv_search action to find information about the LightRAG paper, demonstrating its intelligent behavior.

Visual Explanation

For a more visual approach, the author provides a diagram detailing the flow of the ReAct loop, which includes Thought, Action, PAUSE, and Observation.
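The same loop can be exercised offline by scripting the LLM side, which makes the Thought/Action/Observation flow easy to trace without API calls. A minimal self-contained sketch (FakeBot, the standalone `run` function, and the regex definition are my illustrations, not from the article):

```python
import re

action_re = re.compile(r'^Action: (\w+): (.*)$')

class FakeBot:
    """Stands in for the LLM: returns pre-scripted ReAct-style replies."""
    def __init__(self, replies):
        self.replies = iter(replies)

    def __call__(self, message):
        return next(self.replies)

def calculate(what):
    return eval(what)

def run(bot, known_actions, question, max_turns=3):
    """Same loop as Agent.run: act on Action lines, feed back Observations."""
    next_prompt = question
    for _ in range(max_turns):
        result = bot(next_prompt)
        matches = [action_re.match(l) for l in result.split('\n') if action_re.match(l)]
        if not matches:
            return result  # no action requested: this is the final answer
        action, action_input = matches[0].groups()
        observation = known_actions[action](action_input)
        next_prompt = "Observation: {}".format(observation)

bot = FakeBot([
    "Thought: I should calculate this.\nAction: calculate: 4 * 7 / 3\nPAUSE",
    "Answer: 4 * 7 / 3 is about 9.33",
])
print(run(bot, {"calculate": calculate}, "What is 4 * 7 / 3?"))
# Answer: 4 * 7 / 3 is about 9.33
```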
Industry Evaluation and Company Profiles

Building ReAct agents from scratch is a valuable exercise for developers and researchers. It clarifies the inner workings of LLM-based agents and how they can be extended to perform more complex tasks. The approach outlined in this article is flexible and applies to LLMs beyond GPT-4, and the simplicity of the implementation makes it accessible to a wide range of users, fostering innovation and exploration in AI. The LightRAG project, which the agent successfully explained, showcases the potential of combining LLMs with external knowledge sources, particularly in research and development.

Follow the author on X (formerly Twitter) for daily AI news and updates, or subscribe to their YouTube channel for visual explanations of AI concepts and papers.
