Introduction
This week brought three AI developments worth your attention.
TL;DR
- Agents can now use UIs reliably enough for real work.
- Security gets a detect → patch → PR loop, not just linting.
- 6 GW of GPUs means cheaper, faster AI—if power & cooling keep up.
First, agents learned to operate software interfaces visually—no API required. Second, security got an automated teammate that hunts vulnerabilities and proposes fixes. Third, OpenAI locked in massive compute capacity that will make advanced AI cheaper and more accessible.
I’ll explain what happened, why it matters, and what you can do with it. No fluff. Just the useful bits.
1. Google launches Gemini 2.5 “Computer Use”
Released: Oct 7, 2025 (preview) [1]
Google released a Gemini 2.5 capability that actually uses computers the way you and I do. It sees the screen, clicks buttons, fills forms, scrolls pages, and completes multi-step tasks with safety rails. Google reports state-of-the-art results on browser/mobile UI control and is making it available via the Gemini API. [1]
Is this truly new?
- Concept: not new—OpenAI showed a “computer-using agent”/Operator earlier in Jan 2025. [2, 3]
- What’s new now: Google’s public preview focused on browser control, with benchmarks and an API path. [1]
Scope differences (this week): Google’s preview targets browser actions (no broad OS/file access), whereas OpenAI has showcased agents with a broader virtual computer concept. [1, 2, 3, 4]
New API access + better reported benchmark scores make this practical for teams who struggled with brittle RPA/DOM scripts. [1]
RPA = Robotic Process Automation.
In plain English: it’s software that mimics what a person does on a computer—clicking buttons, filling forms, copying data between apps—to automate repetitive, rule-based tasks. No physical robots; just “screen robots” (scripts/bots).
What the brittle code looks like:

```python
# Clicks the first button in the third column... until the layout changes.
page.click("//div[3]//button[1]")
```

Less brittle:

```python
# Stable, semantic hooks: data attributes / ARIA roles.
page.click("[data-action='checkout']")  # your app adds this attribute
# or
page.get_by_role("button", name="Checkout")
```
Most automation breaks when the website changes. Vision-based agents adapt like humans do. That’s the difference between brittle scripts and robust helpers.
Implementation Strategy: Visual Agents
| Builder Action | Implementation Detail |
|---|---|
| Add Stable UX Hooks | Inject data-action="pay" or data-role="primary-cta" onto key buttons to provide reliable semantic selection for the vision model. |
| Keep Agents On-Rails | Strictly enforce allow-list domains and hard step caps (e.g., maximum of 12 steps per execution loop) to prevent runaway processes. |
| Enforce Idempotency | Log all state-mutating actions with idempotency keys at the API level to absolutely prevent double-purchases or duplicate records if the agent retries. |
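The three guardrails in the table above (semantic hooks aside) can be sketched in a few lines. This is an illustrative sketch only: `run_agent`, `perform`, and the action-dict shape are hypothetical stand-ins, not part of the Gemini API.

```python
import uuid
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"shop.example.com"}  # allow-list of permitted domains
MAX_STEPS = 12                          # hard step cap per execution loop

executed = []  # stand-in for the real executor's audit log

def perform(action):
    """Hypothetical executor: record the action as performed."""
    executed.append(action["name"])

def run_agent(actions):
    """Run agent-proposed actions with allow-list, step-cap, and idempotency checks."""
    seen_keys = set()
    for step, action in enumerate(actions, start=1):
        if step > MAX_STEPS:
            raise RuntimeError("step cap exceeded; aborting run")
        domain = urlparse(action["url"]).netloc
        if domain not in ALLOWED_DOMAINS:
            raise PermissionError(f"blocked domain: {domain}")
        # State-mutating actions carry an idempotency key; retries are skipped.
        if action.get("mutates_state"):
            key = action.setdefault("idempotency_key", str(uuid.uuid4()))
            if key in seen_keys:
                continue
            seen_keys.add(key)
        perform(action)

run_agent([
    {"name": "open_cart", "url": "https://shop.example.com/cart"},
    {"name": "checkout", "url": "https://shop.example.com/pay",
     "mutates_state": True, "idempotency_key": "order-42"},
    {"name": "checkout", "url": "https://shop.example.com/pay",
     "mutates_state": True, "idempotency_key": "order-42"},  # agent retry: skipped
])
print(executed)  # ['open_cart', 'checkout']
```

The idempotency key is the piece teams most often skip: enforce it at the API layer, not in the agent, so a retried "checkout" is a no-op rather than a double charge.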
2. DeepMind unveils CodeMender
Published: Oct 2025 (blog & early results) [5]
What happened. DeepMind introduced CodeMender, an AI agent that hunts for bugs and fixes them automatically. It combines fuzzing, static analysis, differential testing, and LLM reasoning to spot vulnerabilities and propose patches. In early trials it submitted dozens of fixes to real OSS projects (with human review). [5]
This goes beyond “AI code suggestions.” It’s continuous security maintenance: detect risky patterns → propose fixes → open PRs → harden codebases over time.
Example.
Unsafe buffer handling in an image library is flagged; the agent proposes a safe rewrite, runs tests, then opens a PR with a clear diff and rationale.
Security debt compounds silently. An agent that finds and fixes vulnerabilities continuously? That’s not just helpful—it’s necessary.
Implementation Strategy: Automated Security
| Builder Action | Implementation Detail |
|---|---|
| Baseline Core Libraries | Begin deployment strictly on your top 3 internal libraries. Baseline your MTTR (mean time to repair) prior to implementation to measure hard ROI. |
| Enforce Human Gates | Strictly require human code-review and passing smoke tests on all auto-patch Pull Requests before merging. Do not allow autonomous merging to main. |
| Track Security Metrics | Quantify the agent’s value by tracking “vulns prevented / 1k LOC changed” on a monthly rolling basis. |
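The metric in the last row is cheap to compute once you log vulnerabilities prevented and lines changed per month. A minimal sketch, with an assumed record shape:

```python
def vulns_per_kloc(vulns_prevented: int, loc_changed: int) -> float:
    """Vulnerabilities prevented per 1,000 lines of code changed."""
    if loc_changed <= 0:
        raise ValueError("loc_changed must be positive")
    return vulns_prevented / (loc_changed / 1000)

# Monthly rolling window of (vulns_prevented, loc_changed) tuples (illustrative data).
monthly = [(4, 12_500), (7, 9_800), (3, 4_200)]
rates = [round(vulns_per_kloc(v, loc), 2) for v, loc in monthly]
print(rates)  # [0.32, 0.71, 0.71]
```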
3. OpenAI and AMD: 6 gigawatts of AI compute
Announced: Oct 2025 (multi-year partnership; first 1 GW planned for 2H 2026 with MI450) [6]
What happened. OpenAI and AMD signed a deal for up to 6 GW of AMD Instinct GPUs. It’s one of the largest AI compute build-outs announced to date, with milestone-linked warrants. [6]
Compute capacity is oxygen for AI. More capacity → longer training runs, better multimodal models, and cheaper inference—if power and cooling keep pace.
What this means for you.
- Expect faster rollouts of long-context, tool-using agents with planning and memory.
- Fewer waitlists and downward pressure on API prices as capacity comes online.
- But timelines will depend on siting, power, and networking readiness.
Raw GPU supply isn’t the only bottleneck anymore—**infrastructure** is. The best AI in the world is useless if you can’t power it.
The Weekly AI Delta
| Category | Previous State | New Development |
|---|---|---|
| Computer Use | The capability existed primarily in closed concepts (OpenAI Operator). | Google launched a broad public preview via Vertex with published benchmarks and an API path. |
| Security Agents | Static linters and passive LLM autocomplete suggestions. | CodeMender introduces an integrated detect → patch → PR loop validated against real OSS repositories. [5] |
| Compute Capacity | Ambiguous hyperscale build-outs. | OpenAI committed to up to 6 GW of AMD capacity mapped to a concrete MI450 timeline. [6] |
Quick comparison: Google vs. OpenAI (computer-using agents)
| Capability | Google Gemini 2.5 Computer Use | OpenAI Operator (concept) |
|---|---|---|
| Primary scope | Browser UI actions | Virtual computer + broader flows |
| Input signal | Visual/DOM + prompts | Visual/DOM + OS sandbox |
| Access model | API/Vertex preview | Limited demos/announcements |
| Guardrails focus | Step caps, allow-lists | Sandboxed VM + human reviews |
| Best fit (today) | Web workflows with flaky DOM | End-to-end app simulations |
| Maturity (this week) | New public preview | Earlier concept, evolving |
Risk Mitigation
| Domain | Limitation / Gotcha | Strategic Mitigation |
|---|---|---|
| Visual Agents | Cookie banners, captchas, MFA logic, and legal consent flows break autonomous agents. | These edge cases require manual product-level design, hardcoded overrides, or API bypasses. |
| CodeMender | Auto-generated security patches can inadvertently regress application performance. | You must run rigorous performance benchmarks in CI/CD alongside your standard security checks. |
| Compute Scale | Pledged 6 GW capacity does not immediately equal API availability. | Assume grid constraints and facility cooling delays will heavily dictate when token prices actually drop. |
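The CodeMender mitigation above (performance benchmarks alongside security checks) can be a small CI gate. A hedged sketch, assuming your CI can call a Python check; the budget and function names are illustrative:

```python
import time

REGRESSION_BUDGET = 1.10  # fail the auto-patch PR if >10% slower than baseline

def bench(fn, *args, repeats=5):
    """Best-of-N wall-clock time for fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

def assert_no_regression(baseline_fn, patched_fn, *args):
    """Raise if the patched function exceeds the slowdown budget."""
    base = bench(baseline_fn, *args)
    patched = bench(patched_fn, *args)
    if patched > base * REGRESSION_BUDGET:
        raise AssertionError(f"perf regression: {patched / base:.2f}x baseline")
```

Best-of-N timing reduces scheduler noise; for hot paths, swap in a proper benchmark harness rather than wall-clock timing.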
Conclusion
So what changed this week? Agents got hands. Security got smarter. Compute got bigger.
Google’s computer-use model means automation can work wherever humans work—legacy systems, government portals, clunky interfaces—without waiting for APIs.[1]
DeepMind’s CodeMender shifts security from reactive firefighting to proactive maintenance. [5]
AMD’s 6-gigawatt deal with OpenAI signals more capacity and lower costs—if the infrastructure keeps pace. [6]
What to do now: Pilot visual agents in a safe sandbox, try security automation on your riskiest code, and design your stack for multi-provider LLM backends.
The tools are coming. Be ready to use them and have fun :)
Did you like this post? Please let me know if you have any comments or suggestions.
References
- [1] Google — Introducing the Gemini 2.5 Computer Use model
- [2] OpenAI — Computer-Using Agent (announcement page)
- [3] OpenAI — Introducing Operator
- [4] VentureBeat — Google’s AI can now surf the web, click buttons, and fill out forms
- [5] Google DeepMind — Introducing CodeMender: an AI agent for code security
- [6] AMD Investor Relations — AMD and OpenAI announce strategic partnership to deploy 6 GW of AMD GPUs