HyperAI

Gemini in Chrome: Early Steps Toward Google's Agentic AI Future

5 days ago

Google has taken a significant step forward in integrating its AI assistant, Gemini, directly into the Chrome browser. The feature is currently in early access for AI Pro and AI Ultra subscribers using the Beta, Dev, or Canary versions of Chrome. By clicking the Gemini button in the top-right corner of Chrome, users can start conversations, and the assistant can "see" what's on their screen, enhancing its ability to provide contextually relevant assistance.

One of the standout features of Gemini in Chrome is its ability to summarize articles and identify objects in images and videos. I tested this by using Gemini to summarize articles on The Verge and even find gaming-related news on the homepage. It successfully pointed out new Game Boy games added to Nintendo's Switch Online service, the upcoming "Elden Ring" film adaptation, and a major Steam Deck update. However, to get accurate summaries, you need to make the content visible to the assistant. For instance, if you want Gemini to summarize The Verge's comments section, you must first expand it.

Gemini also supports voice commands through its "Live" feature, activated by a button in the bottom-right corner of the dialog box. This feature was particularly useful when paired with YouTube videos. I asked Gemini about tools being used in a bathroom remodeling video, and it correctly identified a nail gun and other equipment. Similarly, it accurately described components in a motherboard repair video, recognizing a capacitor and the tools used to remove it. However, the accuracy of these responses depended on the video having clear labels or captions; without them, Gemini sometimes struggled.

Another practical application I discovered was Gemini's recipe extraction from YouTube videos.
I watched a cooking tutorial and asked Gemini to list the ingredients and steps, which it did, saving me the trouble of writing them down or searching the video description for a link. Additionally, when shopping on Amazon, Gemini was helpful in identifying specific items, such as waterproof bags, and suggesting similar products.

Despite these successes, Gemini's integration with Chrome is not without its limitations. At times, responses were too long for the pop-up window, especially on smaller screens like my MacBook Air's 13-inch display. AI is often praised for delivering quick, concise answers, but Gemini sometimes produced detailed responses that required expanding the window, which detracted from the experience. The assistant also frequently repeated follow-up questions, such as offering more information on a topic, which became tiresome.

Moreover, Gemini occasionally encountered issues with real-time data. For example, when I asked it to locate MrBeast in a video of him exploring ancient Mayan cities, it initially responded that it couldn't access real-time information. A second attempt yielded the correct location, listed in the video's description: Mexico. This inconsistency highlights the ongoing challenges with real-time data processing and context awareness.

The vision behind Gemini's integration with Chrome is broader than just answering simple queries. Google aims to make its AI more "agentic," meaning capable of performing tasks on behalf of users. After summarizing a restaurant's menu, I considered asking Gemini to place a pickup order, a task it cannot yet handle. In the future, it could seamlessly assist with activities such as bookmarking travel-related pages or adding specific YouTube videos to your Watch Later playlist.
Google's plans to enhance Gemini's capabilities are evident with Project Mariner's "Agent Mode," which will allow the assistant to manage up to 10 tasks simultaneously and search the web autonomously. While this advanced mode is currently only available in the Gemini app, it suggests a path where the browser integration could become much more powerful. The goal is to create an AI that acts as a seamless extension of the user, enhancing productivity and convenience across multiple digital platforms.

Industry insiders view Gemini in Chrome as a promising development, signaling a shift toward more interactive and proactive AI assistants. This integration marks a step toward achieving Google's vision of "agentic" AI, where assistants can take action independently. As Google continues to refine and expand Gemini's features, it is expected to significantly change how users interact with web content and perform everyday tasks online. Google, known for its robust AI and machine learning capabilities, is positioning itself to lead in this evolving space, potentially setting new standards for browser-based AI assistants.
