Introduction
Last week, I wrote about new models, security risks, and the scaling ceiling.
This week feels different. It is less about the model as an object and more about AI becoming part of ordinary systems: chat defaults, APIs, coding tools, government evaluation, labour law, electricity markets, and local permitting.
The model is still important. But increasingly, the signal is not only what the model can do. It is where the model is placed, who controls access to it, how developers build around it, and who pays when its costs leave the data centre.
I have picked the key signals this week, plus a short follow-up from last week’s DeepSeek story.
Models and Defaults
1. GPT-5.5 Instant becomes the default — and defaults are distribution power
OpenAI releases GPT-5.5 Instant, a new default model for ChatGPT — TechCrunch, 5 May 2026
OpenAI claims ChatGPT's new default model hallucinates way less — The Verge, 5 May 2026
OpenAI makes default ChatGPT more personal — Axios, 5 May 2026
OpenAI released GPT-5.5 Instant on 5 May and made it the new default ChatGPT model, replacing GPT-5.3 Instant for everyday use. It is not the most dramatic kind of AI announcement. There was no theatrical launch event, no single jaw-dropping demo, and no new “this changes everything” benchmark chart. But default model changes matter because most people do not carefully choose their model. They use the one placed in front of them.
That makes GPT-5.5 Instant a distribution story as much as a model story.
According to OpenAI’s reported internal evaluations, GPT-5.5 Instant reduces hallucinated claims in high-stakes areas such as law, medicine, and finance compared with GPT-5.3 Instant, while keeping the low-latency behaviour expected from an “Instant” model. The Verge and Axios both highlighted the same direction of travel: fewer inaccurate claims, more concise answers, and more use of contextual memory for users who have those features enabled.
Golden Stat: OpenAI reported a 52.5% reduction in hallucinated claims in high-stakes domains versus GPT-5.3 Instant.
That last part is important. A default model is not just a bundle of weights. It is also a set of product decisions: how much context it uses, when it searches, how verbose it is, how it handles uncertainty, how it uses memory, and how much control users have over that memory.
There is a privacy tension here that should not be brushed aside. A more personal assistant can be genuinely useful. It can remember what you are working on, avoid asking the same questions again, and adapt to your style. But personalization also raises the stakes for transparency. Users need to know what the system remembers, where that information comes from, and how to delete or change it.
Why This Matters
The most important model update is not always the most powerful model. Sometimes it is the model that becomes normal.
For developers and AI product builders, GPT-5.5 Instant is a reminder that model quality is only one layer of the system. Defaults shape user behaviour. Latency shapes trust. Memory shapes perceived intelligence. And small changes in hallucination rate matter more when they are multiplied across millions of everyday interactions.
The frontier-model race is loud. Default distribution is quieter, but it may be more consequential.
Developer Tools
2. GPT-Realtime-2: voice becomes a reasoning interface
Advancing voice intelligence with new models in the API, 7 May 2026
OpenAI published a genuinely new developer signal on 7 May: three new real-time voice models in the API.
The most important one is GPT-Realtime-2, described by OpenAI as its first voice model with GPT-5-class reasoning. This matters because voice AI has often been treated as a thin layer around speech recognition and text generation: transcribe the user, pass the text to a model, generate an answer, then speak it back.
GPT-Realtime-2 points to something more integrated. The model is designed for live voice interactions where the AI can keep a conversation moving while reasoning through a task, handling corrections, using tools, and responding in a tone that fits the moment.
OpenAI also introduced two companion models:
- GPT-Realtime-Translate, for live speech translation from more than 70 input languages into 13 output languages.
- GPT-Realtime-Whisper, a streaming speech-to-text model that transcribes speech in real time, while the speaker is still talking.
This is not just a “better voice assistant” update. It is a developer platform signal.
If voice models can reason, translate, transcribe, and use tools in real time, then voice becomes a serious interface for software. Think customer support, travel apps, healthcare intake, live education, accessibility tools, meetings, field work, and multilingual collaboration.
The interesting part is not only that the model speaks. The interesting part is that it can act while speaking.
That makes the product design problem much harder. Developers will need to think about interruption handling, consent, safety, disclosure, latency, privacy, and what the user should hear while the model is checking a calendar, calling an API, or recovering from an error.
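To make that concrete, here is a toy asyncio sketch of just one of those decisions: what the user hears while the model waits on a slow tool call. It has nothing to do with the Realtime API itself; the tool, the filler phrase, and the timeout are placeholder choices a product team would have to make deliberately.

```python
import asyncio

async def call_tool(name, args):
    # Stand-in for a slow calendar lookup or external API call.
    await asyncio.sleep(2)
    return {"free_slots": ["10:00", "14:30"]}

async def speak(text):
    # Stand-in for streaming text-to-speech output.
    print(f"[assistant says] {text}")

async def handle_tool_call_during_turn(name, args):
    # Start the tool call, but keep the conversation alive while it runs.
    task = asyncio.create_task(call_tool(name, args))
    await speak("Let me check your calendar, one moment.")
    try:
        result = await asyncio.wait_for(task, timeout=5.0)
        await speak(f"You are free at {result['free_slots'][0]} or {result['free_slots'][1]}.")
    except asyncio.TimeoutError:
        # Disclose the failure instead of going silent.
        await speak("That is taking longer than expected. Want me to keep trying?")

asyncio.run(handle_tool_call_during_turn("calendar.lookup", {"day": "tomorrow"}))
```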
Voice AI is moving from novelty to infrastructure. And infrastructure always brings defaults.
3. Google’s developer week — multimodal RAG, webhooks, and faster Gemma inference
Gemini API File Search is now multimodal: build efficient, verifiable RAG — Google, 5 May 2026
Webhooks — Gemini API documentation
Reduce friction and latency for long-running jobs with Webhooks in Gemini API — Google, 4 May 2026
Accelerating Gemma 4: faster inference with Multi-Token Prediction — Google, 5 May 2026
Google had one of the more developer-relevant AI weeks, but not because of a single giant model launch. The signal was a cluster of practical infrastructure updates.
First, Gemini API File Search became multimodal. Google added three features: multimodal support, custom metadata, and page-level citations. In plain language, this means developers can build retrieval-augmented generation systems over mixed material — not only text chunks, but also visual and document-like data — while keeping stronger links back to the original sources.
That matters because many real company knowledge bases are not clean text. They contain PDFs, screenshots, charts, slides, scanned diagrams, product images, tables, and messy internal documents. Classic RAG often reduces that world to plain text chunks. Multimodal retrieval is a step toward systems that can search over the material as it actually exists.
The page-level citation feature is also not a small detail. Production AI systems need more than answers; they need traceability. A model that says “the answer is in this document somewhere” is less useful than a model that can point to the relevant page. Grounding and citation quality are becoming product features, not just research concerns.
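As a small illustration of why page-level grounding matters on the application side, here is a minimal sketch of carrying citations through to the user. It deliberately avoids the Gemini SDK; the types and the retrieved chunks are stand-ins for whatever File Search, or any other retrieval layer, actually returns.

```python
from dataclasses import dataclass

@dataclass
class RetrievedChunk:
    text: str          # extracted text, or a description of an image/table
    source_doc: str    # e.g. "q3-pricing-deck.pdf"
    page: int          # page-level grounding, not just document-level

def format_grounded_answer(answer: str, chunks: list[RetrievedChunk]) -> str:
    """Attach page-level citations so users can check the answer themselves."""
    citations = [f"[{i + 1}] {c.source_doc}, p. {c.page}" for i, c in enumerate(chunks)]
    return answer + "\n\nSources:\n" + "\n".join(citations)

# Usage with whatever your retriever returns:
chunks = [
    RetrievedChunk("Enterprise tier includes SSO.", "q3-pricing-deck.pdf", 12),
    RetrievedChunk("SSO requires the identity add-on.", "security-faq.pdf", 3),
]
print(format_grounded_answer(
    "Yes, SSO is included in the enterprise tier via the identity add-on.", chunks))
```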
Second, Google introduced Gemini API Webhooks for long-running jobs. Instead of making developers repeatedly poll an API to check whether a batch job, video generation task, or long-running agentic workflow has finished, Gemini can send a signed notification when the work is complete.
This is boring in the best possible way.
Polling works for demos. Webhooks belong to production systems. They reduce unnecessary requests, lower latency, and make long-running AI workflows easier to integrate with queues, databases, dashboards, and notification systems. Serious AI applications need lifecycle plumbing: job states, retries, timeouts, signatures, audit trails, and failure handling.
For developers, this directly reduces the classic serverless timeout pain: if your agent chain runs longer than a typical synchronous execution window, webhooks let you move to asynchronous completion without babysitting polling loops.
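For concreteness, here is a minimal sketch of the receiving side. The endpoint path, signature header, and payload fields are assumptions for illustration, not the documented Gemini contract, so check the Webhooks documentation for the actual verification scheme.

```python
import hashlib
import hmac
import os

from flask import Flask, request, abort

app = Flask(__name__)
WEBHOOK_SECRET = os.environ["WEBHOOK_SECRET"]  # shared secret you configure

def enqueue_result(job_id, status):
    # Placeholder: hand the completed job to your own queue or database.
    print(f"job {job_id} finished with status {status}")

@app.post("/gemini/jobs/complete")
def job_complete():
    # Verify the notification is signed before trusting it.
    # Header name and HMAC scheme here are illustrative assumptions.
    signature = request.headers.get("X-Signature", "")
    expected = hmac.new(WEBHOOK_SECRET.encode(), request.data, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        abort(401)

    event = request.get_json(force=True)
    enqueue_result(event.get("job_id"), event.get("status"))
    return "", 204
```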
Third, Google announced Multi-Token Prediction drafters for Gemma 4, designed to speed up inference. Multi-token prediction is a speculative decoding technique: a smaller or auxiliary drafting process predicts several likely next tokens, and the main model verifies them. If the guesses are good, generation can move faster without changing the final model output.
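A toy sketch of that accept/verify loop, with the draft and target models reduced to plain callables. This is the generic speculative-decoding idea rather than Google's drafter implementation; in production the verification step is a single batched forward pass of the main model, which is where the speedup comes from.

```python
def speculative_decode(target_next, draft_next, prompt, k=4, max_new=32):
    """Toy greedy speculative decoding: the drafter proposes, the target verifies.

    target_next(tokens) -> next token from the large model (expensive)
    draft_next(tokens)  -> next token from the small drafter (cheap)
    The output matches pure greedy decoding with the target model.
    """
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1. Drafter cheaply proposes k candidate tokens.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))

        # 2. Target verifies the draft; accept the longest matching prefix.
        #    In a real system these k checks are one batched forward pass.
        accepted = 0
        for i in range(k):
            if target_next(tokens + draft[:i]) == draft[i]:
                accepted += 1
            else:
                break
        tokens.extend(draft[:accepted])

        # 3. On the first mismatch, take the target's own token, so the
        #    final output is unchanged; only the speed changes.
        if accepted < k:
            tokens.append(target_next(tokens))
    return tokens[len(prompt):]
```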
Google says the Gemma 4 drafters can deliver up to a 3× inference speedup without quality or reasoning degradation, and that the release supports common developer stacks such as Hugging Face Transformers, vLLM, SGLang, MLX, Ollama, and LiteRT-LM.
Golden Stat: Google reports up to 3× faster inference with Gemma 4 Multi-Token Prediction drafters.
That connects directly to one of the biggest practical AI pressures of 2026: latency. Inference cost matters, but so does waiting time. For coding agents, chat interfaces, customer-support tools, and interactive assistants, a slow model feels less intelligent even when its answer is good.
Why This Matters
This was the week’s clearest “AI becomes infrastructure” signal for developers.
Multimodal File Search points to better RAG. Webhooks point to more reliable agentic workflows. Gemma 4 drafters point to faster inference. None of these features is as glamorous as a frontier-model launch, but they are the kind of features developers need when moving from prototypes to production.
The industry is learning that a useful AI system is not just a model call. It is retrieval, state, citations, latency, async orchestration, permissions, monitoring, and failure recovery. The model may be the engine, but the infrastructure decides whether the car can be driven every day.
4. Azure SQL adds AI-aware memory right-sizing
Azure SQL Managed Instance resource limits and flexible memory — Microsoft Learn
Microsoft introduced automated memory right-sizing for Azure SQL Managed Instances to better handle bursty workloads from retrieval-augmented generation and AI-integrated apps.
This is not a flashy release, but it is exactly how infrastructure adapts in practice: database defaults are being tuned for AI traffic patterns rather than only traditional business software patterns.
In enterprise systems, overprovisioning memory to survive occasional AI bursts gets expensive quickly. Automated right-sizing is a practical attempt to protect performance while reducing persistent overspend.
Why This Matters
AI cost control is moving downward into core cloud primitives. If your database layer is becoming AI-aware, AI is no longer an “add-on feature”; it is part of baseline platform engineering.
5. DeepMind x EVE Online treats agentic behaviour as a systems problem
Fenris Creations enters research partnership with Google DeepMind — CCP/Fenris, 6 May 2026
Google DeepMind partners with EVE Online for AI model testing — Ars Technica, 6 May 2026
Google DeepMind and CCP Games (now Fenris Creations) announced on 6 May a partnership using EVE Online as a living environment for evaluating longer-horizon agent behaviour under complex social and economic constraints.
Unlike short benchmark tasks, this type of environment tests persistence, adaptation, and multi-agent strategy in a world that changes because of other actors. That makes it relevant for where autonomous software is heading.
Why This Matters
Agent progress is shifting from “can it solve one prompt?” to “can it behave coherently over time in shared systems?”
6. GitHub Copilot CLI moves from personal assistant to managed developer infrastructure
Rubber Duck in GitHub Copilot CLI now supports more models — GitHub Changelog, 7 May 2026
About the Rubber Duck agent — GitHub Docs
GitHub shipped two Copilot CLI updates this week that point in the same direction: coding agents are becoming managed infrastructure inside organizations.
On 6 May, GitHub announced enterprise-managed plugins for GitHub Copilot CLI in public preview. Enterprise administrators can configure and distribute plugins across an organization, set baseline standards, and make custom agents or skills available automatically to users.
That may sound like a small admin feature, but it is a real shift. Coding agents are no longer only personal tools installed by individual developers. They are becoming part of the internal developer platform. Companies will want approved plugins, shared workflows, controlled access, secure defaults, and repeatable onboarding.
The next day, GitHub announced broader model support for the Rubber Duck feature in Copilot CLI. Rubber Duck is a built-in critic agent that gives Copilot a second opinion on its plans, code, and tests, using a different model family from the one driving the main session. In GPT-led sessions, Rubber Duck can use Claude. In Claude-led sessions, it can use GPT-5.5 as the reviewer model.
This is a subtle but important pattern: cross-model review.
The first generation of AI coding tools felt like autocomplete. The second generation felt like chat. The current generation increasingly looks like a small software team: one model plans, another writes, another reviews, and tools execute inside the development environment. That does not remove the need for human review. If anything, it changes what human review must focus on: requirements, architecture, security, edge cases, and whether the agent solved the right problem.
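The pattern generalizes well beyond Copilot. Here is a rough sketch of a writer/reviewer loop across two model families, with the model calls reduced to plain callables; it is not the Rubber Duck implementation or the Copilot CLI API, just the shape of the workflow.

```python
def cross_model_review(task, writer_llm, reviewer_llm, max_rounds=2):
    """One model family drafts; a different family critiques the draft.

    writer_llm(prompt) and reviewer_llm(prompt) are stand-in callables that
    return text; wire them to whichever providers you actually use.
    """
    draft = writer_llm(f"Write code for this task:\n{task}")
    for _ in range(max_rounds):
        critique = reviewer_llm(
            "Review this code for bugs, security issues, and missed requirements. "
            f"Reply 'LGTM' if it is fine.\n\nTask: {task}\n\nCode:\n{draft}"
        )
        if critique.strip().upper().startswith("LGTM"):
            break
        draft = writer_llm(
            f"Revise the code to address this review.\n\nReview:\n{critique}\n\nCode:\n{draft}"
        )
    return draft  # still goes to human review before merge
```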
Why This Matters
The GitHub signal is not merely “Copilot got a new feature.” It is that coding agents are moving into the governance layer of software development.
Enterprise-managed plugins make agent behaviour more standardized across teams. Rubber Duck makes multi-model critique more normal. Together, they suggest a future where AI coding is less about one clever assistant and more about controlled, observable agent workflows embedded in the developer platform.
For developers, this could be useful. For engineering leaders, it creates new responsibilities. If AI agents can run commands, modify code, call tools, and influence architecture, they need the same kind of governance we already apply to CI/CD, secrets, dependencies, and production access.
Evaluation and Governance
7. The EU’s Digital Omnibus brings AI governance closer to implementation
EU agrees to simplify AI rules and ban nudification apps — European Commission, 7 May 2026
AI Act deal on simplification measures and nudifier app ban — European Parliament
The EU’s Digital Omnibus agreement (7 May 2026) signals a governance shift from broad principle debates toward operational timelines and enforceable boundaries.
Two details matter for this week’s governance signal: the move toward a hard ban on non-consensual AI nudification apps, and a clearer implementation schedule for AI Act obligations, including a runway toward the high-risk requirements in late 2027.
This does not simplify compliance overnight. But it does reduce ambiguity for teams planning product roadmaps across Europe.
Why This Matters
Governance is now less theoretical. Teams can map legal risk and engineering timelines against concrete dates instead of policy drafts.
8. CAISI expands pre-release frontier model testing
Advancing AI evaluation with CAISI and the UK AI Security Institute — Microsoft, 5 May 2026
On 5 May, NIST’s Center for AI Standards and Innovation (CAISI) announced expanded agreements with Google DeepMind, Microsoft, and xAI for frontier AI national-security testing. Reuters and The Guardian reported the same core point: major AI developers will give the US government early access to advanced models for security evaluation before public release.
This is one of the more important governance signals of the week because it moves evaluation upstream.
Historically, many AI risks became visible after release: users discovered jailbreaks, researchers found dangerous capabilities, and companies patched behaviour after public pressure. Pre-release testing changes that rhythm. It does not solve AI safety by itself, and it is not the same thing as binding regulation. But it does make government evaluation part of the frontier-model release pipeline.
The focus is national-security risk: cyber capabilities, biosecurity, chemical threats, and other high-consequence areas where small improvements in model capability can matter. NIST says these expanded collaborations provide for pre-deployment evaluations and related research. CAISI also says it has completed more than 40 evaluations across released and unreleased systems.
There is a political tension here. The US government wants to evaluate frontier systems without slowing domestic AI development so much that capability moves elsewhere. Companies want to demonstrate responsibility without handing over too much strategic control. Users and civil society want assurance that the most powerful systems are tested before they are widely deployed.
So far, this is still mostly a voluntary and cooperative model. But the direction is clear: frontier AI releases are becoming more formal, more watched, and more entangled with national-security institutions.
Why This Matters
The story is not simply that “the government is testing AI models.” The deeper signal is that pre-release evaluation is becoming part of the normal release process for frontier models.
For developers, this may feel distant. Most of us are not training frontier models. But evaluation norms tend to travel downward. What starts with national-security testing for frontier labs can influence procurement rules, enterprise risk reviews, red-teaming expectations, audit trails, and model-card requirements.
The release pipeline for powerful AI systems is starting to look less like consumer software and more like critical infrastructure.
Society and Work
9. China signals that AI adoption alone does not justify dismissal
A tech worker in China is laid off and replaced by AI. Is it legal? — NPR, 1 May 2026
No, China hasn't made it illegal to fire humans and replace them with AI — PC Gamer, 6 May 2026
Asia tech news roundup — The Register, 4 May 2026
A Chinese court ruling became one of the week’s clearest labour signals: AI adoption alone is not enough to justify dismissing a worker.
The case involved a tech worker in Hangzhou, identified in reports as Zhou, whose work included checking the quality of large language model outputs. The company automated parts of the role, offered him a lower-paid position, and then terminated him after he refused. Zhou challenged the dismissal and won, and the Hangzhou Intermediate People’s Court upheld the ruling in his favour.
The precise wording matters here. This is not the same as saying “China made it illegal to replace people with AI.” That would be too broad. Companies can still restructure, automate, and reorganize. But the ruling signals that an employer cannot simply point to AI adoption as a magic legal reason for dismissal. The company still needs lawful grounds, reasonable process, and fair treatment.
PC Gamer’s headline captured the nuance well: China has not made AI replacement categorically illegal, but it has made the move more legally and financially difficult for companies that try to shift automation costs directly onto workers.
There is another important caveat. China is a civil-law system, so this ruling does not create precedent in the same way a court decision might in the United States or the United Kingdom. But Chinese courts and state bodies can elevate cases as guiding examples. That makes this ruling more than a private employment dispute. It becomes a signal to companies, workers, arbitrators, and local courts about how AI-related dismissals may be viewed.
Why This Matters
This is one of the first clear legal boundaries around AI-driven job replacement.
The principle is simple but powerful: adopting AI is a business choice, not an act of nature. If a company chooses to automate work, it cannot automatically push the entire cost of that decision onto the worker.
Golden Stat: The Hangzhou court rejected AI replacement as a standalone legal basis for dismissal in this case.
That idea will travel, even if the legal mechanism does not. Other countries may not copy the ruling directly, but labour lawyers, unions, regulators, and policymakers will notice the argument. In the US and Europe, where companies increasingly describe layoffs as AI-enabled efficiency, the Hangzhou case gives a clean counter-formulation: AI adoption does not remove ordinary employment obligations.
The social question surrounding AI-driven job displacement is often framed as “Will AI take jobs?” This ruling asks a more practical question: when AI changes a job, what process is owed to the person doing it?
Infrastructure
10. Data centres become a ratepayer and permitting fight
US power grid operator PJM is considering market overhaul — Reuters, 6 May 2026
New evidence on data center employment effects — Brookings, 4 May 2026
Capacity cost explosion: what PJM's bill means for data centers — Utility Dive, 29 April 2026
The AI data-centre story moved further into electricity politics this week.
The most immediate hook is PJM, the large US grid operator serving parts of the Mid-Atlantic, Midwest, and South. Reuters reported on 6 May that PJM is considering market changes after warning of possible electricity shortfalls as early as 2027. Reuters also reported on 7 May that PJM expects sufficient generation for this summer’s peak demand, with projected peak load around 156,400 megawatts and about 180,200 megawatts of generation available.
So the story is not simply “the grid is about to fail.” It is more specific: the region is trying to manage fast demand growth, high capacity costs, generator delays, and political anger about electricity bills.
Data centres are a major part of that debate. Brookings reported this week that in the PJM region, which serves roughly 65 million people across 13 states and Washington, D.C., power supply costs jumped from $2.2 billion to $14.7 billion in a single year, with data centres accounting for nearly two-thirds of the increase. Utility Dive separately described a sharp rise in PJM capacity billings, from $2.69 billion in 2024 to $10.39 billion in 2025.
Golden Stat: PJM power supply costs rose from $2.2B to $14.7B in one year, with data centres tied to most of the increase.
The exact numbers vary depending on which cost category is measured, but the direction is unambiguous: AI and cloud data-centre growth are colliding with electricity markets that were not designed for this speed or concentration of demand.
This is why the Sanders/Ocasio-Cortez AI Data Centre Moratorium Act, introduced on 25 March, still matters as context even though it is outside this week’s window. The bill is unlikely to pass in the current Congress, and critics have fairly argued that a broad moratorium is a blunt instrument. But the underlying political problem is real. Local communities and ratepayers are asking why the costs of AI infrastructure should be socialized through higher bills, water use, land use, and grid upgrades.
That question will not disappear because one federal bill fails.
Why This Matters
AI infrastructure is becoming local politics.
For AI companies, the constraints may not be limited to GPUs or model architecture. They may be substations, transmission queues, permitting, water use, community opposition, and electricity market design. For cloud customers, infrastructure costs eventually become product costs. For households and small businesses, the concern is simpler: whether the AI boom makes electricity more expensive.
This is a very different kind of AI governance. It is not about whether a model can reason. It is about whether a county wants another data centre, whether a grid operator can connect it, and whether residents believe they are paying for someone else’s compute.
The model may live in the cloud. The bill arrives somewhere very physical.
Follow-up From Last Week
DeepSeek V4 gets its independent benchmark reality check: the end of benchmark exceptionalism
CAISI evaluation of DeepSeek V4 Pro — NIST, 1 May 2026
Last week, DeepSeek V4 was one of the central model stories: open weights, aggressive pricing, long context, and a strong cost-efficiency claim. I do not want to repeat that full discussion here. The new signal this week is narrower and more useful: CAISI’s independent evaluation complicated the frontier-parity story.
NIST’s CAISI evaluation found that DeepSeek V4 Pro remains cost-efficient, but lags the US frontier by approximately eight months on CAISI’s independent benchmarks. The evaluation also highlighted the difference between public benchmark claims and harder-to-contaminate, non-public assessments.
Golden Stat: CAISI assessed DeepSeek V4 Pro at roughly an 8-month lag versus US frontier capability on its independent evaluation track.
That does not make DeepSeek V4 unimportant. It makes it more practical to discuss.
The lesson for developers is simple: treat launch benchmarks as a starting point, not a procurement decision. If a model is much cheaper, it may be excellent for classification, extraction, translation, summarization, or bulk document processing even if it is not the strongest model for hard reasoning or agentic coding. But the only evaluation that really matters is your own workload.
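“Evaluate on your own workload” does not have to mean a heavy eval framework. A minimal harness is a labelled sample of real tasks, a scoring function, and a cost column; everything below, including the model names and prices, is a placeholder.

```python
def evaluate(models, test_cases, score):
    """Compare candidate models on your own tasks, not launch benchmarks.

    models:     {name: (generate_fn, usd_per_1k_calls)}
    test_cases: [(input_text, expected_output), ...] drawn from real traffic
    score:      (output, expected) -> float in [0, 1]
    """
    results = {}
    for name, (generate, cost_per_1k) in models.items():
        scores = [score(generate(x), y) for x, y in test_cases]
        results[name] = {
            "accuracy": sum(scores) / len(scores),
            "cost_per_1k_calls_usd": cost_per_1k,
        }
    return results

# Example with stand-in models and an exact-match scorer:
models = {
    "cheap-model":  (lambda x: x.upper(), 0.30),   # placeholder generate fns
    "strong-model": (lambda x: x.upper(), 4.00),
}
cases = [("invoice #123", "INVOICE #123"), ("po 9", "PO 9")]
print(evaluate(models, cases, lambda out, exp: float(out == exp)))
```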
Why This Matters
The DeepSeek story has moved from “is this frontier?” to “where is this economically useful?”
That is a healthier question. Most production AI systems do not need the absolute best model for every step. They need the right model for each part of the pipeline: a cheap model for extraction, a stronger model for synthesis, a fast model for user interaction, and perhaps a specialist model for code or mathematics.
The cost floor is changing. The benchmark story is messier. Both facts can be true.
Closing Thoughts
Step back from the week, and the theme is hard to miss: AI is becoming infrastructure.
A default ChatGPT model shapes everyday behaviour. Gemini’s developer updates make retrieval, async jobs, and inference speed feel more like ordinary software engineering problems. GitHub’s Copilot CLI changes show coding agents becoming something enterprises configure and govern. CAISI’s agreements move frontier model evaluation closer to the release pipeline. A Chinese labour ruling gives automation a legal boundary. PJM’s electricity debate shows that AI data centres are no longer an abstract cloud story; they are a grid, ratepayer, and permitting story.
None of this is as clean as a model launch.
But that may be the point. When a technology becomes infrastructure, it stops appearing only as a product announcement. It starts appearing in admin settings, court rulings, API docs, capacity markets, electricity bills, and local planning meetings.
The model is still part of the story. This week, it just was not the whole story.
Did you like this post? Please let me know if you have any comments or suggestions.