Elena's AI Blog

Safety, Agents, and Compute

10 Oct 2025 / 8 minutes to read

Elena Daehnhardt


Image (Midjourney 7.0): An artificial intelligence bot sits on a gigantic iceberg at the North Pole and holds a refreshing cocktail in hand, realistic, HD


Introduction

This week brought three AI developments worth your attention.

TL;DR

  • Agents can now use UIs reliably enough for real work.
  • Security gets a detect → patch → PR loop, not just linting.
  • 6 GW of GPUs means cheaper, faster AI—if power & cooling keep up.

First, agents learned to operate software interfaces visually—no API required. Second, security got an automated teammate that hunts vulnerabilities and proposes fixes. Third, OpenAI locked in massive compute capacity that will make advanced AI cheaper and more accessible.

I’ll explain what happened, why it matters, and what you can do with it. No fluff. Just the useful bits.

1. Google launches Gemini 2.5 “Computer Use”

Released: Oct 7, 2025 (preview) [1]

Google released a Gemini 2.5 capability that actually uses computers the way you and I do. It sees the screen, clicks buttons, fills forms, scrolls pages, and completes multi-step tasks with safety rails. Google reports state-of-the-art results on browser/mobile UI control and is making it available via the Gemini API. [1]

Is this truly new?

  • Concept: not new—OpenAI showed its “computer-using agent”/Operator in January 2025. [2, 3]
  • What’s new now: Google’s public preview focused on browser control, with benchmarks and an API path. [1]

Scope differences (this week): Google’s preview targets browser actions (no broad OS/file access), whereas OpenAI has showcased agents with a broader virtual computer concept. [1, 2, 3, 4]

New API access + better reported benchmark scores make this practical for teams who struggled with brittle RPA/DOM scripts. [1]

RPA = Robotic Process Automation.

In plain English: it’s software that mimics what a person does on a computer—clicking buttons, filling forms, copying data between apps—to automate repetitive, rule-based tasks. No physical robots; just “screen robots” (scripts/bots).

What brittle code looks like:

# clicks the first button in the third column... until layout changes
page.click("//div[3]//button[1]")

Less brittle:

# stable, semantic hook: data attributes / ARIA roles
page.click("[data-action='checkout']")         # your app adds this
# or
page.get_by_role("button", name="Checkout")

Most automation breaks when the website changes. Vision-based agents adapt like humans do. That’s the difference between brittle scripts and robust helpers.

Action for builders

  • Add stable UX hooks: data-action="pay" / data-role="primary-cta" on key buttons for reliable selection.
  • Keep agents on-rails: allow-list domains and step caps (e.g., 12 steps).
  • Log actions with idempotency keys to prevent double-purchases.
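The three guardrails above can be sketched as a thin wrapper around an agent loop. This is a minimal illustration, not a real Gemini API: `agent_step` and the action dictionary format are hypothetical placeholders for whatever your agent framework returns.

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"shop.example.com", "pay.example.com"}  # allow-list
MAX_STEPS = 12                                             # step cap

def guarded_run(agent_step, start_url, idempotency_key, seen_keys):
    """Run an agent with a domain allow-list, a step cap, and
    idempotency keys to prevent double-purchases on retries."""
    if idempotency_key in seen_keys:          # replayed request: do nothing
        return "duplicate-suppressed"
    seen_keys.add(idempotency_key)

    url = start_url
    for _ in range(MAX_STEPS):
        action = agent_step(url)              # hypothetical agent call
        if action["type"] == "done":
            return "completed"
        if action["type"] == "navigate":
            host = urlparse(action["url"]).hostname
            if host not in ALLOWED_DOMAINS:   # block off-list domains
                return "blocked-domain"
            url = action["url"]
    return "step-cap-reached"                 # cap hit: stop, don't loop
```

The idempotency set would live in shared storage (e.g., Redis) in practice; the in-memory set here just shows the check.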

Read more: Google [1]

2. DeepMind unveils CodeMender

Published: Oct 2025 (blog & early results) [5]

What happened. DeepMind introduced CodeMender, an AI agent that hunts for bugs and fixes them automatically. It combines fuzzing, static analysis, differential testing, and LLM reasoning to spot vulnerabilities and propose patches. In early trials it submitted dozens of fixes to real OSS projects (with human review). [5]

This goes beyond “AI code suggestions.” It’s continuous security maintenance: detect risky patterns → propose fixes → open PRs → harden codebases over time.

Example.
Unsafe buffer handling in an image library is flagged; the agent proposes a safe rewrite, runs tests, then opens a PR with a clear diff and rationale.
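CodeMender's internals aren't public, but the detect → patch idea maps onto classic risky patterns. Here is a hypothetical Python example (not from DeepMind's blog) of the kind of flaw such an agent flags and the rewrite it might propose: shell-string command construction replaced by an argument list.

```python
import subprocess

# Flagged pattern: a user-controlled path interpolated into a shell
# string. A filename like "a.png; rm -rf ~" injects extra commands.
def convert_cmd_unsafe(path: str) -> str:
    return f"convert {path} out.png"          # injection risk

# Proposed patch: build an argument list instead; the path stays a
# single argv entry no matter what characters it contains.
def convert_cmd_safe(path: str) -> list[str]:
    return ["convert", path, "out.png"]

def run_safe(path: str) -> None:
    subprocess.run(convert_cmd_safe(path), check=True)  # shell=False
```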

Security debt compounds silently. An agent that finds and fixes vulnerabilities continuously? That’s not just helpful—it’s necessary.

Action for builders

  • Start with your top 3 internal libraries; baseline MTTR (mean time to repair) and measure improvements.
  • Require human review + smoke tests on all auto-patch PRs.
  • Track “vulns prevented / 1k LOC changed” monthly.
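Baselining MTTR is a small computation over your vulnerability records; the record shape below (detected/fixed timestamp pairs) is an assumption about your tracker's export, not a standard.

```python
from datetime import datetime

def mttr_hours(records):
    """Mean time to repair: average hours from detection to merged fix.
    Each record is a (detected_at, fixed_at) pair; fixed_at is None
    for still-open vulnerabilities, which are excluded."""
    deltas = [(fixed - found).total_seconds() / 3600
              for found, fixed in records
              if fixed is not None]
    return sum(deltas) / len(deltas) if deltas else float("nan")
```

Run it on a monthly window before and after enabling auto-patch PRs to see whether the agent actually moves the number.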

Read more: Google DeepMind [5]

3. OpenAI and AMD: 6 gigawatts of AI compute

Announced: Oct 2025 (multi-year partnership; first 1 GW planned for 2H 2026 with MI450) [6]

What happened. OpenAI and AMD signed a deal for up to 6 GW of AMD Instinct GPUs. It’s one of the largest AI compute build-outs announced to date, with milestone-linked warrants. [6]

Compute capacity is oxygen for AI. More capacity → longer training runs, better multimodal models, and cheaper inference—if power and cooling keep pace.

What this means for you.

  • Expect faster rollouts of long-context, tool-using agents with planning and memory.
  • Fewer waitlists and downward pressure on API prices as capacity comes online.
  • But timelines will depend on siting, power, and networking readiness.

Computing power isn’t the bottleneck anymore; infrastructure is. The best AI in the world is useless if you can’t power it.

Read more: AMD [6]

What changed (this week vs. before)

  • Computer Use: The capability existed (OpenAI Operator, Jan 2025). New: Google’s broader public preview + benchmarks + API path. [1–4]
  • Security agents: Linters and LLM suggestions existed. New: an integrated detect → patch → PR loop validated on real OSS. [5]
  • Compute: Hyperscale build-outs are ongoing. New: the size (6 GW) and explicit MI450 timeline. [6]

Quick comparison: Google vs. OpenAI (computer-using agents)

| Capability | Google Gemini 2.5 Computer Use | OpenAI Operator (concept) |
|---|---|---|
| Primary scope | Browser UI actions | Virtual computer + broader flows |
| Input signal | Visual/DOM + prompts | Visual/DOM + OS sandbox |
| Access model | API/Vertex preview | Limited demos/announcements |
| Guardrails focus | Step caps, allow-lists | Sandboxed VM + human reviews |
| Best fit (today) | Web workflows with flaky DOM | End-to-end app simulations |
| Maturity (this week) | New public preview | Earlier concept, evolving |

Citations: [1, 2, 3, 4]

Limits & gotchas

  • Agents: cookie banners, captchas, MFA, and legal consent flows still need product-level design and explicit handling.
  • CodeMender: patches can regress performance; keep perf benchmarks in CI alongside security checks.
  • Compute: capacity ≠ availability; grid constraints and cooling determine how fast tokens actually get cheaper.

Conclusion

So what changed this week? Agents got hands. Security got smarter. Compute got bigger.

Google’s computer-use model means automation can work wherever humans work—legacy systems, government portals, clunky interfaces—without waiting for APIs. [1] DeepMind’s CodeMender shifts security from reactive firefighting to proactive maintenance. [5]
AMD’s 6-gigawatt deal with OpenAI signals more capacity and lower costs—if the infrastructure keeps pace. [6]

What to do now: Pilot visual agents in a safe sandbox, try security automation on your riskiest code, and design your stack for multi-provider LLM backends.

The tools are coming. Be ready to use them and have fun :)

Did you like this post? Please let me know if you have any comments or suggestions.

References

  1. Google — Introducing the Gemini 2.5 Computer Use model
  2. OpenAI — Computer-Using Agent (announcement page)
  3. OpenAI — Introducing Operator
  4. VentureBeat — Google’s AI can now surf the web, click buttons, and fill out forms
  5. Google DeepMind — Introducing CodeMender: an AI agent for code security
  6. AMD Investor Relations — AMD and OpenAI announce strategic partnership to deploy 6 GW of AMD GPUs

About Elena

Elena, a PhD in Computer Science, simplifies AI concepts and helps you use machine learning.




Citation
Elena Daehnhardt. (2025) 'Safety, Agents, and Compute', daehnhardt.com, 10 October 2025. Available at: https://daehnhardt.com/blog/2025/10/10/safety-agents-and-compute/