AI Daily Dev — Week 21

OpenAI Merges ChatGPT, Codex, and Developer API Into One Org Under Brockman

Co-founder Greg Brockman takes permanent product leadership; Codex chief Thibault Sottiaux now runs the combined core platform across consumer, enterprise, and developer.
Memo: 'invest in a single agentic platform and merge ChatGPT and Codex into one unified agentic experience for all' — Atlas browser folds into the same surface.
Lands four days before Google I/O and amid escalating pressure from Anthropic's enterprise lead; framed as run-up to a potential IPO later this year.
Reverses the 'side quests' era that began with December's 'code red' refocus on ChatGPT.

industry thenextweb.com

Single model generates synchronized video, audio, voice, and text in one pass — replacing the Veo 3.1 + Nano Banana split pipeline.
Chat-based editing: 'swap the red car for a black one' or 'remove the watermark' rewrites only the affected frames, keeping the rest pixel-stable.
Inherits Gemini's long context — characters keep their faces, outfits, and props across scenes of a short film.
Rolls out in Flash and Pro tiers across the Gemini app and AI Studio; the UI leak that surfaced two weeks ago ('Powered by Omni' next to codename Toucan) is now live.

models blog.google

New top-tier Gemini debuts in AI Mode in Search on day one — first time Google has shipped a frontier model into Search at launch.
Pre-keynote benchmarks circulating cite ~84.6% on ARC-AGI-2 and meaningful jumps on multimodal reasoning, but still trail Anthropic's April-7 Mythos and OpenAI's GPT-5.5 on the hardest agentic-coding evals.
Available via the Gemini app, AI Studio, and the Gemini API; Antigravity IDE picks it up as the default agent model.
Demis Hassabis framed it as the model that unifies all Gemini capabilities — single-model multimodality, deeper reasoning, lower latency at the Flash tier.

models blog.google

Self-hosted sandboxes shipped in public beta — tool execution moves to your infra while Anthropic still runs orchestration, context, and recovery.
Launch providers are Cloudflare, Daytona, Modal, and Vercel; each ships a configured guide for the Claude Platform.
MCP tunnels in research preview let agents reach private MCP servers over a single outbound connection — no inbound firewall rules, no public endpoints.
Lands the day before Anthropic's Code with Claude London Day 1 keynote (May 20).

tools claude.com

Gemini CLI and Gemini Code Assist extensions stop serving Google AI Pro, Ultra, and free Code Assist accounts on June 18, 2026.
Standard and Enterprise licenseholders keep current access — the cut hits independent devs and hobbyists first.
Replacement Antigravity CLI is a ~140 MB Go binary with built-in browser control, sandbox, Git policies, skills runtime, and subagents.
Agent Skills, Hooks, Subagents, and Extensions all carry over as Antigravity plugins.
HN thread on the front page within hours; commenters bristling at the forced migration and binary size.

tools developers.googleblog.com

OpenAI co-founder and ex-Tesla AI lead starts this week under pre-training lead Nick Joseph.
Mandate: stand up a new team that uses Claude itself to accelerate pre-training research — the bottleneck Anthropic just bought 220K H-class GPUs to push on.
Karpathy pauses his year-old education startup Eureka Labs to take the role.
HN front-page thread 'I've joined Anthropic' is the top developer story of the day.

industry techcrunch.com

Built on Moonshot's Kimi K2.5 checkpoint, with Cursor putting 85% of compute into extra training and RL on top.
79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1 — within points of Opus 4.7 and GPT-5.5.
Priced at $0.50 / $2.50 per million input/output tokens, roughly a tenth of Anthropic and OpenAI's frontier tiers.
Trained on 25× more synthetic tasks than Composer 2, including 'feature deletion' rebuild puzzles.
HN reaction split between developers cheering open-base coding models and skeptics flagging benchmark gaming on Cursor's own evals.

models cursor.com

Internal general-purpose reasoning model produced an infinite family of point configurations that beat square-grid scaling on the 1946 planar unit distance problem.
First time a prominent open question central to a math subfield has been solved autonomously by AI — and by a model not specially trained on math.
Fields medalist Tim Gowers wrote the companion paper and said he would recommend it to the Annals of Mathematics 'without hesitation.'
Verified externally by Noga Alon, Melanie Wood, Thomas Bloom, and Princeton's Will Sawin, who tightened the exponent.
HN front-page thread; reaction split between mathematicians calling it a milestone and skeptics asking how the proof tracks attribution to prior literature.

research openai.com

Benioff on the All-In podcast: a single customer's $300M annual token bill makes Salesforce one of Anthropic's largest commercial accounts.
Salesforce has also invested $300M+ in Anthropic and holds ~1% — payer and shareholder at the same time.
Teasing 'cool stuff with Slack and code' next; wants coding agents inside Slack alongside Agentforce.
Calls for an intermediary routing layer so simple tokens go to cheap models, not Claude — a hint at where the spend cap eventually bites.

industry thenextweb.com

iOS and Android, every plan including Free and Go; sessions run on your Mac, devbox, or remote env while the phone is just the control surface.
Approve commands, review diffs and test output, switch models, add context, or start new threads — files and credentials never leave the host machine.
Secure relay keeps sessions reachable across devices without exposing them to the public internet; Codex for Windows remote support 'coming soon.'
Remote SSH for company devboxes also went GA the same day; the long-running r/OpenAI and GitHub request to match Claude's mobile loop is finally answered.

tools openai.com