Newszop

Google announces Gemini 2.5 Computer Use AI model that can control web browsers like humans do

Google just rolled out Gemini 2.5 Computer Use, an AI model that can actually click buttons, fill out forms, and scroll through websites just like a person would. Instead of relying on structured APIs to interact with software, this model uses visual understanding to navigate interfaces designed for humans.

This is how the Gemini 2.5 Computer Use model works
Built on Gemini 2.5 Pro's visual understanding capabilities, the model operates in a continuous loop. It receives screenshots of the current environment, analyses the user's request along with action history, and generates responses as function calls representing UI actions. The system supports 13 actions including opening browsers, typing text, dragging and dropping elements, and navigating URLs. After each action executes, the model receives a new screenshot to restart the loop until the task completes.
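
The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's API: the names `Action`, `FakePage`, `scripted_model`, and `run_agent` are all hypothetical, the "screenshot" is a plain string rather than an image, and the "model" is a hard-coded stand-in so the example runs without any network calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """A UI action expressed as a function call (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


class FakePage:
    """Toy stand-in for a browser page: one text field and one submit button."""

    def __init__(self) -> None:
        self.field_text = ""
        self.submitted = False

    def screenshot(self) -> str:
        # A real client would return image bytes; a string keeps the sketch simple.
        return f"field={self.field_text!r} submitted={self.submitted}"

    def execute(self, action: Action) -> None:
        if action.name == "type_text":
            self.field_text = action.args["text"]
        elif action.name == "click":
            self.submitted = True
        else:
            raise ValueError(f"unsupported action: {action.name}")


def scripted_model(goal: str, screenshot: str, history: List[Action]) -> Optional[Action]:
    """Stand-in for the model: picks the next action from what it 'sees'."""
    if "submitted=True" in screenshot:
        return None  # task complete, stop the loop
    if "field=''" in screenshot:
        return Action("type_text", {"text": goal})
    return Action("click", {"target": "submit"})


def run_agent(
    goal: str,
    take_screenshot: Callable[[], str],
    propose_action: Callable[[str, str, List[Action]], Optional[Action]],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Screenshot -> model proposes action -> execute -> repeat until done."""
    history: List[Action] = []
    for _ in range(max_steps):
        shot = take_screenshot()
        action = propose_action(goal, shot, history)
        if action is None:
            break
        execute(action)
        history.append(action)
    return history


page = FakePage()
steps = run_agent("Rex", page.screenshot, scripted_model, page.execute)
print([a.name for a in steps])
```

The `max_steps` cap is a design choice of this sketch: it keeps a confused model from looping forever when the task never reaches a finished state.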


Google demonstrated the model through use cases ranging from managing pet spa appointments across multiple websites to organising digital sticky notes. The model shows particular strength in web browsers and Android mobile interfaces, though it's not yet optimised for desktop operating system control.


Google’s Gemini 2.5 Computer Use beats Claude, ChatGPT on benchmarks
Google claims Gemini 2.5 Computer Use outperforms rivals like Claude and ChatGPT on several web and mobile control benchmarks, while also delivering lower latency. Early testers are already putting it to work. One AI assistant company said it's often 50% faster than competing solutions, while another found it boosted performance by up to 18% on complex data parsing tasks. Google's own payments team uses it to fix broken UI tests, successfully recovering over 60% of failed test runs.

Safety guardrails are in place to mitigate AI risks
Since AI agents controlling computers come with unique risks—including potential misuse and unexpected behavior—Google built safety features directly into the model. Developers can set up controls to prevent the AI from auto-completing high-risk actions like bypassing CAPTCHAs or compromising system security.
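
One way a developer might implement such a control is a confirmation gate that sits between the model's proposed action and its execution. The sketch below is an assumption about how that could look, not the mechanism Google ships: the action names in `HIGH_RISK` and the `guarded_execute` helper are hypothetical, chosen to mirror the examples in the paragraph above.

```python
from typing import Callable, List

# Hypothetical labels for actions a developer chooses to treat as high-risk.
HIGH_RISK = {"bypass_captcha", "change_security_settings"}


def guarded_execute(
    action_name: str,
    execute: Callable[[str], None],
    confirm: Callable[[str], bool],
) -> bool:
    """Run the action unless it is high-risk and the user declines it.

    Returns True if the action was executed, False if it was blocked.
    """
    if action_name in HIGH_RISK and not confirm(action_name):
        return False
    execute(action_name)
    return True


performed: List[str] = []
deny_all = lambda name: False  # stand-in for a user who refuses every prompt

ran_click = guarded_execute("click", performed.append, deny_all)
ran_captcha = guarded_execute("bypass_captcha", performed.append, deny_all)
print(performed, ran_click, ran_captcha)
```

Low-risk actions pass straight through, so the gate adds friction only where the developer has asked for it.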

The model's available now in preview through Google AI Studio and Vertex AI, and there's a demo on Browserbase where you can watch it tackle tasks like playing 2048 or browsing Hacker News.

