AI Daily Dev — Week 26

Fable 5 Ban Hits Day 10 as NSA Testimony Reshapes the Story

NSA Director Joshua Rudd told Sen. Warner in a classified June 11 briefing that Mythos breached 'nearly all' NSA classified systems in hours during a red-team.
Senator Warner went public over the weekend; now the most-cited reason for the June 12 export-control directive.
Free Fable 5/Mythos 5 trial for Pro/Max/Team/Enterprise subscribers expired today with the models still dark globally.
Fable 5 reappeared in the Claude Android model picker Sunday but throws a rate-limit error — Anthropic confirms it's a UI artifact, not a partial restore.

industry anthropic.com

Deep Think reasoning rolls out to 2.5 Pro on the Gemini API, AI Studio, and Vertex AI — no longer Ultra-only.
Project Mariner's browser-control tool and native audio output now exposed for both 2.5 Pro and 2.5 Flash.
Thinking budgets extend to 2.5 Pro so developers can cap or disable reasoning tokens per call; structured thought summaries land in the response payload.
Cited benchmarks: 87.6% LiveCodeBench, 65th percentile USAMO, IMO gold-medal standard.

models deepmind.google

Full GPT-5.5-Cyber GA for verified defenders: 85.6% on CyberGym vs 81.8% for stock GPT-5.5.
Codex Security plugin ships in the Codex app — finds, validates, and patches vulnerabilities inline with attack-path traces and severity reports.
'Patch the Planet' partners with Trail of Bits, HackerOne, and Calif across cURL, Python, Go, urllib3, PyPI, Valkey, RustCrypto, and 12 more.
First-week tally: hundreds of bugs, 64 PRs, 51 issues across 19 projects; field results include a Firefox WASM CVE patched before Pwn2Own.

tools openai.com

Tag @Claude in any Slack channel and it joins the conversation as a persistent team member, building context as the channel evolves.
Ambient mode lets Claude proactively post updates, follow up on forgotten threads, and surface relevant info from other channels.
Research preview today for Claude Enterprise and Team customers on Opus 4.8; replaces the legacy Claude in Slack app on August 3.
Anthropic says 65% of its own product team's code now comes from the internal version of Claude Tag.

tools anthropic.com

Custom LLM-inference accelerator co-developed with Broadcom; OpenAI's first Intelligence Processor.
Nine months from initial design to manufacturing tape-out — billed as the fastest ASIC cycle ever for a chip of this class.
Engineering samples already running GPT-5.3-Codex-Spark in the lab at production frequency and power.
Targets ~50% inference cost reduction vs current GPUs; gigawatt-scale rollout with Microsoft starts end of 2026, 10GW committed through 2029.
Direct shot at Nvidia's pricing power — HN thread climbed the front page within hours.

industry openai.com

Letter to the US Senate Banking Committee alleges operators tied to Alibaba's Qwen lab ran 28.8M Claude exchanges from 25,000 fraudulent accounts between April 22 and June 5.
Targets called out by name: software engineering, agentic reasoning, and long-horizon tasks — exactly the capabilities Anthropic charges premium for.
HN thread surfaces the underlying market: Chinese resellers offer Claude tokens at 70–90% below list by pooling Max accounts and reselling reasoning traces.
Alibaba ADRs slid to a 16-month low on the news, extending YTD losses to 33%.

industry news.ycombinator.com

Anthropic rerun of last year's quadruped study with Opus 4.7 driving non-roboticist employees.
Claude was 10x+ faster than every human team that finished a task, 37x faster than the no-AI team, 19x faster than the team using an AI assistant.
Generated nearly 10x less code than humans for comparable or better results.
Still couldn't fetch the actual ball — failed at closed-loop visual precision control.

research anthropic.com

Demonstrate a workflow on your Mac once, Codex turns the recording into a reusable skill.
Shipped June 18 in Codex 26.616; requires Computer Use enabled.
Available on Plus, Pro, Business, Enterprise, and Edu — excludes EU, UK, and Switzerland.
Generated skill describes when to use it, inputs, steps, and how to verify.

tools openai.com

Skill/plugin that runs Claude Code, Codex, Cursor, and Gemini CLI through a YAGNI ladder before they generate anything.
24K stars in three days post-launch; 44K stars and 2,100 forks by June 21.
Author benchmarks: 80–94% less code, 3–6x faster tasks, 47–77% lower API cost.
HN debates whether the 'lazy senior dev' framing breaks on genuinely custom work.

open-source github.com

Tenet Security disclosure: attacker-crafted Sentry events are pulled in via MCP and executed by the coding agent.
Claude Code, Cursor, and Codex all ran attacker commands at developer privilege in tests.
2,388 organizations exposed via public DSNs; Sentry called the issue 'technically not defensible' and declined to patch.
First high-profile demo of MCP tool-poisoning landing on production coding agents.

tools thehackernews.com