AI DAILY / DEV
Weekly Rollup
Week 26

Fable 5 Ban Hits Day 10 as NSA Testimony Reshapes the Story

  • NSA Director Joshua Rudd told Sen. Warner in a classified June 11 briefing that Mythos breached 'nearly all' NSA classified systems in hours during a red-team.
  • Senator Warner went public over the weekend; now the most-cited reason for the June 12 export-control directive.
  • Free Fable 5/Mythos 5 trial for Pro/Max/Team/Enterprise subscribers expired today with the models still dark globally.
  • Fable 5 reappeared in the Claude Android model picker Sunday but throws a rate-limit error — Anthropic confirms it's a UI artifact, not a partial restore.
industry anthropic.com

Gemini 2.5 Pro Gets Deep Think, Computer Use, and Native Audio in One Drop

  • Deep Think reasoning rolls out to 2.5 Pro on the Gemini API, AI Studio, and Vertex AI — no longer Ultra-only.
  • Project Mariner's browser-control tool and native audio output now exposed for both 2.5 Pro and 2.5 Flash.
  • Thinking budgets extend to 2.5 Pro so developers can cap or disable reasoning tokens per call; structured thought summaries land in the response payload.
  • Cited benchmarks: 87.6% LiveCodeBench, 65th percentile USAMO, IMO gold-medal standard.
models deepmind.google

OpenAI Opens GPT-5.5-Cyber to All Defenders and Sics Codex on Open-Source CVEs

  • Full GPT-5.5-Cyber GA for verified defenders: 85.6% on CyberGym vs 81.8% for stock GPT-5.5.
  • Codex Security plugin ships in the Codex app — finds, validates, and patches vulnerabilities inline with attack-path traces and severity reports.
  • 'Patch the Planet' partners with Trail of Bits, HackerOne, and Calif across cURL, Python, Go, urllib3, PyPI, Valkey, RustCrypto, and 12 more.
  • First-week tally: hundreds of bugs, 64 PRs, 51 issues across 19 projects; field results include a Firefox WASM CVE patched before Pwn2Own.
tools openai.com

Anthropic Drops Claude Tag, an Always-On Slack Teammate Running on Opus 4.8

  • Tag @Claude in any Slack channel and it joins the conversation as a persistent team member, building context as the channel evolves.
  • Ambient mode lets Claude proactively post updates, follow up on forgotten threads, and surface relevant info from other channels.
  • Research preview today for Claude Enterprise and Team customers on Opus 4.8; replaces the legacy Claude in Slack app on August 3.
  • Anthropic says 65% of its own product team's code now comes from the internal version of Claude Tag.
tools anthropic.com

OpenAI Unveils Jalapeño, Its First Custom Inference Chip Co-Built With Broadcom

  • Custom LLM-inference accelerator co-developed with Broadcom; OpenAI's first Intelligence Processor.
  • Nine months from initial design to manufacturing tape-out — billed as the fastest ASIC cycle ever for a chip of this class.
  • Engineering samples already running GPT-5.3-Codex-Spark in the lab at production frequency and power.
  • Targets ~50% inference cost reduction vs current GPUs; gigawatt-scale rollout with Microsoft starts end of 2026, 10GW committed through 2029.
  • Direct shot at Nvidia's pricing power — HN thread climbed the front page within hours.
industry openai.com

Anthropic Says Alibaba Ran the 'Largest Known Distillation Attack' on Claude

  • Letter to the US Senate Banking Committee alleges operators tied to Alibaba's Qwen lab ran 28.8M Claude exchanges from 25,000 fraudulent accounts between April 22 and June 5.
  • Targets called out by name: software engineering, agentic reasoning, and long-horizon tasks — exactly the capabilities Anthropic charges premium for.
  • HN thread surfaces the underlying market: Chinese resellers offer Claude tokens at 70–90% below list by pooling Max accounts and reselling reasoning traces.
  • Alibaba ADRs slid to a 16-month low on the news, extending YTD losses to 33%.
industry news.ycombinator.com

Claude Opus 4.7 Beats Human Teams 20x on Robot-Dog Tasks in Project Fetch Phase Two

  • Anthropic rerun of last year's quadruped study with Opus 4.7 driving non-roboticist employees.
  • Claude was 10x+ faster than every human team that finished a task, 37x faster than the no-AI team, 19x faster than the team using an AI assistant.
  • Generated nearly 10x less code than humans for comparable or better results.
  • Still couldn't fetch the actual ball — failed at closed-loop visual precision control.
research anthropic.com

OpenAI Ships Record & Replay for Codex on macOS

  • Demonstrate a workflow on your Mac once, Codex turns the recording into a reusable skill.
  • Shipped June 18 in Codex 26.616; requires Computer Use enabled.
  • Available on Plus, Pro, Business, Enterprise, and Edu — excludes EU, UK, and Switzerland.
  • Generated skill describes when to use it, inputs, steps, and how to verify.
tools openai.com

Ponytail Cracks 44K Stars Telling Coding Agents Not to Write the Code

  • Skill/plugin that runs Claude Code, Codex, Cursor, and Gemini CLI through a YAGNI ladder before they generate anything.
  • 24K stars in three days post-launch; 44K stars and 2,100 forks by June 21.
  • Author benchmarks: 80–94% less code, 3–6x faster tasks, 47–77% lower API cost.
  • HN debates whether the 'lazy senior dev' framing breaks on genuinely custom work.
open-source github.com

Agentjacking: Poisoned Sentry Errors Hijack Claude Code, Cursor, and Codex

  • Tenet Security disclosure: attacker-crafted Sentry events are pulled in via MCP and executed by the coding agent.
  • Claude Code, Cursor, and Codex all ran attacker commands at developer privilege in tests.
  • 2,388 organizations exposed via public DSNs; Sentry called the issue 'technically not defensible' and declined to patch.
  • First high-profile demo of MCP tool-poisoning landing on production coding agents.
tools thehackernews.com