AI DAILY / DEV
Weekly Rollup
Week 21

OpenAI Merges ChatGPT, Codex, and Developer API Into One Org Under Brockman

  • Co-founder Greg Brockman takes permanent product leadership; Codex chief Thibault Sottiaux now runs the combined core platform across consumer, enterprise, and developer.
  • Memo: 'invest in a single agentic platform and merge ChatGPT and Codex into one unified agentic experience for all' — Atlas browser folds into the same surface.
  • Lands four days before Google I/O and amid escalating pressure from Anthropic's enterprise lead; framed as a run-up to a potential IPO later this year.
  • Reverses the 'side quests' era that began with December's 'code red' refocus on ChatGPT.
industry thenextweb.com

Gemini Omni Lands at I/O as Google's First Unified Video, Audio, and Image Model

  • Single model generates synchronized video, audio, voice, and text in one pass — replacing the Veo 3.1 + Nano Banana split pipeline.
  • Chat-based editing: 'swap the red car for a black one' or 'remove the watermark' rewrites only the affected frames, keeping the rest pixel-stable.
  • Inherits Gemini's long context — characters keep their faces, outfits, and props across scenes of a short film.
  • Rolls out in Flash and Pro tiers across the Gemini app and AI Studio; the UI leak that surfaced two weeks ago ('Powered by Omni' next to codename Toucan) is now live.
models blog.google

Google Ships a New Gemini Flagship at I/O Behind Claude Mythos and GPT-5.5

  • New top-tier Gemini debuts in AI Mode in Search on day one — first time Google has shipped a frontier model into Search at launch.
  • Benchmarks circulating before the keynote cite ~84.6% on ARC-AGI-2 and meaningful jumps on multimodal reasoning, but the model still trails Anthropic's April 7 Mythos and OpenAI's GPT-5.5 on the hardest agentic-coding evals.
  • Available via the Gemini app, AI Studio, and the Gemini API; Antigravity IDE picks it up as the default agent model.
  • Demis Hassabis framed it as the model that unifies all Gemini capabilities — single-model multimodality, deeper reasoning, lower latency at the Flash tier.
models blog.google

Claude Managed Agents Get Self-Hosted Sandboxes and Private MCP Tunnels

  • Self-hosted sandboxes shipped in public beta — tool execution moves to your infra while Anthropic still runs orchestration, context, and recovery.
  • Launch providers are Cloudflare, Daytona, Modal, and Vercel; each ships a configured guide for the Claude Platform.
  • MCP tunnels in research preview let agents reach private MCP servers over a single outbound connection — no inbound firewall rules, no public endpoints.
  • Lands the day before Anthropic's Code with Claude London Day 1 keynote (May 20).
tools claude.com

Google Sunsets Gemini CLI on June 18, Folds It Into Antigravity CLI

  • Gemini CLI and Gemini Code Assist extensions stop serving Google AI Pro, Ultra, and free Code Assist accounts on June 18, 2026.
  • Standard and Enterprise license holders keep current access; the cut hits independent devs and hobbyists first.
  • Replacement Antigravity CLI is a ~140 MB Go binary with built-in browser control, sandbox, Git policies, skills runtime, and subagents.
  • Agent Skills, Hooks, Subagents, and Extensions all carry over as Antigravity plugins.
  • HN thread on the front page within hours; commenters bristling at the forced migration and binary size.
tools developers.googleblog.com

Andrej Karpathy Joins Anthropic to Run a Claude-Powered Pre-Training Team

  • OpenAI co-founder and ex-Tesla AI lead starts this week under pre-training lead Nick Joseph.
  • Mandate: stand up a new team that uses Claude itself to accelerate pre-training research — the bottleneck Anthropic just bought 220K H-class GPUs to push on.
  • Karpathy pauses his year-old education startup Eureka Labs to take the role.
  • HN front-page thread 'I've joined Anthropic' is the top developer story of the day.
industry techcrunch.com

Cursor Composer 2.5 Matches Opus 4.7 and GPT-5.5 at One-Tenth the Price

  • Built on Moonshot's Kimi K2.5 checkpoint, with Cursor putting 85% of compute into extra training and RL on top.
  • 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1 — within points of Opus 4.7 and GPT-5.5.
  • Priced at $0.50 / $2.50 per million input/output tokens, roughly a tenth of Anthropic's and OpenAI's frontier tiers.
  • Trained on 25× more synthetic tasks than Composer 2, including 'feature deletion' rebuild puzzles.
  • HN reaction split between developers cheering open-base coding models and skeptics flagging benchmark gaming on Cursor's own evals.
models cursor.com
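
To make the pricing concrete, here is a minimal sketch of the per-token arithmetic. Only the $0.50 / $2.50 rates come from the item above; the workload size is hypothetical, and the frontier rates are simply assumed to be 10× Composer's, following the "roughly a tenth" framing rather than any published price list.

```python
def token_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost in USD given token counts and per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Rates quoted in the item above (USD per million tokens).
COMPOSER_INPUT, COMPOSER_OUTPUT = 0.50, 2.50

# Assumption: frontier tiers cost 10x, per the "roughly a tenth" claim.
FRONTIER_INPUT, FRONTIER_OUTPUT = COMPOSER_INPUT * 10, COMPOSER_OUTPUT * 10

# Hypothetical monthly workload: 40M input tokens, 8M output tokens.
INPUT_TOKENS, OUTPUT_TOKENS = 40_000_000, 8_000_000

composer_cost = token_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, COMPOSER_INPUT, COMPOSER_OUTPUT)
frontier_cost = token_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, FRONTIER_INPUT, FRONTIER_OUTPUT)

print(f"Composer 2.5: ${composer_cost:,.2f}")            # $40.00
print(f"Assumed frontier tier: ${frontier_cost:,.2f}")   # $400.00
```

Swap in your own token counts and the real list prices of whichever model you are comparing against; the 10× multiplier is a placeholder.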

OpenAI's Reasoning Model Disproves an 80-Year-Old Erdős Conjecture

  • Internal general-purpose reasoning model produced an infinite family of point configurations that beat square-grid scaling on the 1946 planar unit distance problem.
  • First time a prominent open question central to a math subfield has been solved autonomously by AI — and by a model not specially trained on math.
  • Fields medalist Tim Gowers wrote the companion paper and said he would recommend it to the Annals of Mathematics 'without hesitation.'
  • Verified externally by Noga Alon, Melanie Wood, Thomas Bloom, and Princeton's Will Sawin, who tightened the exponent.
  • HN front-page thread; reaction split between mathematicians calling it a milestone and skeptics asking how the proof credits prior literature.
research openai.com

Salesforce Will Spend $300M on Anthropic Tokens This Year — 'Almost Entirely on Coding'

  • Benioff on the All-In podcast: a single customer's $300M annual token bill makes Salesforce one of Anthropic's largest commercial accounts.
  • Salesforce has also invested $300M+ in Anthropic and holds ~1% — payer and shareholder at the same time.
  • Teasing 'cool stuff with Slack and code' next; wants coding agents inside Slack alongside Agentforce.
  • Calls for an intermediary routing layer so simple requests go to cheap models, not Claude; a hint at where the spend cap eventually bites.
industry thenextweb.com
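
The routing-layer idea can be sketched in a few lines. This is an illustration only, not anything Salesforce or Anthropic has described: the model names, the keyword list, and the length threshold are all hypothetical, and a real router would likely use a classifier rather than string matching.

```python
from dataclasses import dataclass

# Hypothetical model identifiers, for illustration only.
CHEAP_MODEL = "cheap-small-model"
FRONTIER_MODEL = "frontier-model"

# Hypothetical hints that a request is coding work worth the expensive model.
CODING_HINTS = ("refactor", "stack trace", "def ", "class ", "compile", "unit test")


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


def route_request(prompt: str, max_simple_chars: int = 200) -> Route:
    """Pick a model for a prompt using two crude heuristics."""
    text = prompt.lower()
    if any(hint in text for hint in CODING_HINTS):
        return Route(FRONTIER_MODEL, "coding hint found")
    if len(prompt) <= max_simple_chars:
        return Route(CHEAP_MODEL, "short prompt with no coding hint")
    return Route(FRONTIER_MODEL, "long prompt")


for prompt in ("What time zone is the meeting in?",
               "Refactor this function to remove the global state"):
    route = route_request(prompt)
    print(f"{route.model}: {route.reason}")
```

The design choice is to default to the expensive model whenever the request is long or looks like code, so the cheap model only ever sees traffic the heuristics are confident is simple.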

Codex Lands in the ChatGPT Mobile App for Remote Control From Your Phone

  • iOS and Android, every plan including Free and Go; sessions run on your Mac, devbox, or remote env while the phone is just the control surface.
  • Approve commands, review diffs and test output, switch models, add context, or start new threads — files and credentials never leave the host machine.
  • Secure relay keeps sessions reachable across devices without exposing them to the public internet; Codex for Windows remote support 'coming soon.'
  • Remote SSH for company devboxes also went GA the same day; the long-running r/OpenAI and GitHub request to match Claude's mobile loop is finally answered.
tools openai.com