ChatGPT Agent: A Slow but Steady AI Assistant That Struggles with Practical Tasks
OpenAI recently unveiled ChatGPT Agent, a tool designed to complete complex, multi-step tasks using its own "virtual computer." Typing "/agent" activates the tool, which suggests tasks such as finding a top-rated coffee grinder under $150, reviewing The Wall Street Journal's coverage of rare earth metals, creating a Google Maps list of the best bakeries in Copenhagen, finding a vintage "Japanese-style" lamp on Etsy for less than $200, and scheduling a date night via Google Calendar.
The Verge tested the tool through the $200-per-month ChatGPT Pro subscription and found that, while it showed promise, it also had significant limitations. For instance, when asked to find a specific Japanese-inspired, vintage-style lamp on Etsy, ChatGPT Agent took about 50 minutes to complete the search. Its step-by-step process included navigating to Etsy, filtering the search results, and checking shipping details. However, despite claiming to have added the items to the user's cart, it provided only individual URLs, because it operates on its own virtual computer and lacks access to the user's accounts.
OpenAI says it is optimizing ChatGPT Agent for difficult tasks rather than for speed. Yash Kumar, a product lead, and Isa Fulford, a research lead, explained during a private demo that the tool is intended to work in the background, leaving users free to do other things while it completes its assigned jobs. They acknowledged that the agent is slow but emphasized that it can still provide a meaningful speed-up on tasks that would take a user even longer to complete manually.
The tool is also restricted in what it will do. When asked to manage a financial transaction, such as setting up an automatic bank transfer, ChatGPT Agent refused and flagged the task as unauthorized, citing security concerns. The agent can assist with everyday consumer purchases, like groceries and travel bookings, but not with high-stakes financial actions. This limitation is reinforced by Watch Mode, which requires users to stay on the ChatGPT tab during certain tasks for security reasons.
To further test its capabilities, The Verge asked ChatGPT Agent to buy flowers for a friend in Colorado. The agent produced a list of options, complete with price ranges and delivery times, but ran into problems when it came to placing the actual order. Although it had recommended a local florist, the agent struggled to access that florist's website and could not complete the transaction, citing its inability to log into third-party sites or enter payment details.
These issues highlight a significant gap between the tool's promise and its current functionality. ChatGPT Agent excels at gathering and comparing information, but it falters when executing tasks on the user's behalf, particularly when that means interacting with external websites or handling sensitive information. This can be a major inconvenience for users who expect the agent to handle the entire process from start to finish.
Industry insiders and early adopters have noted that ChatGPT Agent represents a step forward in the development of AI assistants, but that its performance and reliability issues could limit its adoption. The tool's reliance on a virtual computer and its restricted access to user accounts make it less useful than it could be. As the technology matures, these limitations are expected to be addressed, potentially making ChatGPT Agent a more compelling option for tasks that humans find tedious or time-consuming.
OpenAI's continued investment in AI tools like ChatGPT Agent underscores its commitment to advancing the field and addressing real-world challenges. However, the tool's current state suggests that significant work remains to improve its efficiency and trustworthiness. Until these issues are resolved, the practical value of ChatGPT Agent may remain limited.
In summary, ChatGPT Agent is a promising addition to OpenAI's suite of AI tools, designed to handle complex, multi-step tasks. However, its slow performance, its reliability issues, and its inability to complete transactions on users' behalf are significant drawbacks that must be addressed before it can become a truly effective assistant. OpenAI's focus on hard tasks and background processing is commendable, but the tool's immediate utility is constrained by its technical limitations. As the technology evolves, it has the potential to change how people delegate everyday tasks, but for now it falls short of its ambitious goals.