Introduction
This week, AI didn’t make a fuss. Instead, it quietly slipped into places where it can genuinely help: in our editors, in our browsers, and even at the roadside charger. We gained a coding model that understands messy prompts, an assistant that makes shopping less painful, a small predictive model that tackles EV range worries, and a new open-source maths system that writes and checks its own proofs. It’s the sort of progress that whispers, not shouts.
The Tools Are Getting Smarter, Softer, and Surprisingly Practical
1. Claude Opus 4.5
Anthropic released Claude Opus 4.5, a refined version of their flagship model. The announcement highlights improved reasoning, stronger coding ability, more reliable safety behaviour, and faster responses. It’s designed with practical development work in mind: fewer hallucinations, better handling of multi-step tasks, and steadier code generation across languages.
For anyone who’s ever stared at a misbehaving function with dramatic flair, Opus 4.5 feels like the calm colleague who reads your puzzled commit and simply says, “I know what you meant.”
Why developers appreciate it:
- It handles multi-language code without losing track of context.
- Its generation is more cautious, producing results that feel closer to production-ready.
- Safety improvements reduce the risk of leaking sensitive details when coding with real codebases.
- It works smoothly through the API with no special configuration.
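As a concrete illustration of that last point, here is a minimal sketch of preparing a request for Anthropic's Messages API. Note that the model ID string `claude-opus-4-5` is an assumption here; check Anthropic's current model list for the exact identifier.

```python
import json

# Build a request payload for Anthropic's Messages API.
# NOTE: the model ID "claude-opus-4-5" is an assumption — consult
# Anthropic's documentation for the exact identifier.
def build_review_request(code_snippet: str, model: str = "claude-opus-4-5") -> dict:
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [
            {
                "role": "user",
                "content": f"Review this function and point out any bugs:\n\n{code_snippet}",
            }
        ],
    }

payload = build_review_request("def add(a, b):\n    return a - b  # oops")
print(json.dumps(payload, indent=2))
```

You would POST this payload to `https://api.anthropic.com/v1/messages` with your `x-api-key` and `anthropic-version` headers, or pass the same fields to `client.messages.create(...)` in the official `anthropic` SDK.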
Beyond the model itself, a significant part of this release is the native integration with Excel and Chrome, allowing it to handle complex spreadsheets and perform web research without the usual copy-paste friction.
If you are interested in real use cases, see this lovely review video:
2. ChatGPT’s New Shopping Research
Introducing shopping research in ChatGPT
OpenAI introduced a shopping research feature inside ChatGPT. According to the announcement, it helps users compare items across brands, understand trade-offs, and receive curated suggestions based on their preferences. Instead of scrolling through long lists, you describe what you need, and ChatGPT compares relevant products with transparent summaries and links.
For developers and product teams, this is another small sign that “agentic shopping assistance” is becoming a pattern. It’s not about pushing products; it’s about filtering complexity into something you can actually act on.
We once believed online shopping would save us time. Now we need AI to save us from online shopping. At this pace, we’ll soon have AI agents negotiating with other AI agents while we sip tea and wonder how life became so civilised.
This video explains that it is not a strong price-comparison or deal-finding tool, but it is a great product-research tool:
3. Reducing EV Range Anxiety with Predictive AI
Reducing EV range anxiety: How a simple AI model predicts port availability
Google Research shared a practical study using a simple linear regression model to predict whether an EV charging port will be available. The post confirms this isn’t a giant neural network but a deliberately lightweight model chosen for speed and on-device efficiency, proving that sometimes a scalpel works better than a sledgehammer.
Despite its simplicity, the model significantly improves predictions over common heuristics. It helps drivers reduce uncertainty about whether they’ll find an open charging port on arrival — a small but meaningful improvement for anyone planning longer trips.
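The post doesn't publish the model's exact features or weights, so the following is only a toy sketch of the idea: fit a plain linear model on simple signals (hour of day, recent occupancy) and read the output as an availability estimate. All feature names and data here are invented for illustration.

```python
import numpy as np

# Toy illustration of a lightweight linear model predicting whether an
# EV charging port will be free. The features and synthetic data are
# invented for this sketch — the real model's inputs are not public.
rng = np.random.default_rng(0)

n = 500
hour = rng.uniform(0, 24, n)             # time of day
recent_occupancy = rng.uniform(0, 1, n)  # fraction of ports busy lately

# Synthetic ground truth: busier stations are less likely to have a free port
p_free = np.clip(0.9 - 0.6 * recent_occupancy, 0, 1)
y = (rng.uniform(0, 1, n) < p_free).astype(float)

# Ordinary least squares fit: [bias, hour, occupancy] -> availability
X = np.column_stack([np.ones(n), hour, recent_occupancy])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict_free(hour_of_day: float, occupancy: float) -> float:
    score = w @ np.array([1.0, hour_of_day, occupancy])
    return float(np.clip(score, 0.0, 1.0))  # treat as a rough probability

print(predict_free(9.0, 0.2))   # quiet morning: higher chance of a free port
print(predict_free(18.0, 0.9))  # busy evening: lower chance
```

The appeal of a model this small is exactly what the post highlights: it evaluates in microseconds, fits on-device, and is easy to inspect.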
It’s funny how our worries shift. We moved from “Will the next petrol station appear before the fuel light panics?” to “Will someone else be using the charger?” At least this time, an AI is trying to help, which is more than my old satnav ever managed when it triumphantly declared I had arrived in the middle of a field.
4. DeepSeekMath-V2: An Open-Source Leap in Mathematical Reasoning
DeepSeekMath-V2: A Model That Doesn’t Just Guess — It Checks Itself
One of the quiet highlights this week comes from DeepSeek: a maths model that feels less like a calculator and more like a careful classmate who double-checks every step. According to the official documentation and early reporting, DeepSeekMath-V2 actually reaches competition-level performance on some of the world’s toughest maths contests.
The team reports gold-medal-level results on the 2025 International Mathematical Olympiad (IMO) and similarly strong performance on the 2024 Chinese Mathematical Olympiad (CMO) — both famously difficult and usually solved by the sharpest teenage mathematicians worldwide.
See the GitHub PDF and the Hugging Face model card.
And then there’s the 2024 Putnam Competition, which is notorious for reducing entire generations of undergraduates to silence. For context: many human participants score somewhere between 0 and 10. Even the top students hover around 70–90. DeepSeekMath-V2 reportedly scored 118 out of 120 — almost perfect, a claim detailed in their technical report.
How does it manage this? Not by guessing. The model uses a generator–verifier loop: first it writes a full proof, then it checks each step. If it finds a gap, it tries again. It’s a bit like watching someone rewrite their homework until every line finally makes sense.
**DeepSeekMath-V2 vs. Typical Human Top Scores**
| Competition | Human Performance (Typical Top Scorers) | DeepSeekMath-V2 Performance | Sources |
|---|---|---|---|
| International Mathematical Olympiad (IMO) 2025 | Gold medallists usually solve 4–5 out of 6 problems (≈ 28–35 points). Only ~10–12% earn gold. | Reported to solve 5 out of 6 problems → gold-medal-level. | GitHub PDF • Hugging Face |
| Chinese Mathematical Olympiad (CMO) 2024 | One of the hardest national contests; gold ≈ top tier of solvers, typically scoring near the upper boundary. | Reported gold-medal-level performance matching strong human contestants. | GitHub PDF • Hugging Face |
| Putnam Competition 2024 | Exceptional human scores vary yearly; top scorers often fall in the 70–90 range out of 120. Many participants score 0–10 points. | Reported 118/120 — near-perfect and well above historical human highs. | GitHub PDF |
There’s a small footnote worth keeping in mind: these results use scaled test-time compute — the model is allowed many attempts, while humans only get one. Even so, the direction is striking. It shows what happens when an AI system aims not just for the right answer, but for reasoning that can be inspected, corrected, and trusted. It’s the kind of steady progress that quietly reshapes what we expect from machine intelligence.
Please note that only ~10–12% of IMO participants earn gold medals, and those gold medallists typically solve 4–5 problems, whereas the average participant solves far fewer.
This video provides some technical details on this new math model:
Conclusion
This week’s AI updates arrived without drama — and perhaps that’s why they matter. We gained a steadier coding companion, a calmer shopping guide, a simple model that makes EV driving less stressful, and an open-source system that can write and verify its own mathematical proofs. None of these are flashy leaps, yet together they mark a shift towards tools that genuinely fit into our days.
Did you like this post? Please let me know if you have any comments or suggestions.
Recent weekly posts that might interest you