Guides, insights, and news from the world of AI agents.
Hashimoto's HN-#1 "entire companies under AI psychosis" framing, turned into a 9-question audit you can run on your own agent stack tomorrow.
Codex now ships inside the ChatGPT mobile app. What mobile actually unlocks, what it doesn't, and how to pair it with a real backend safely.
Domain-specific skill bundles are filling in around the generic .claude-directory frame — scientific, academic, learning packs all trending in one cycle.
Agent-volume PRs broke CI/CD. Here's the four-layer Continuous Compute stack — routing, filesystems, Agent View, skills, memory — that ops teams need now.
Sal Khan admitted Khanmigo was 'a non-event.' Quizlet killed Q-Chat. The teacher tools won. Here's what the post-Khanmigo AI tutoring field looks like.
Cowork on Opus 4.7 booked 8 flights end-to-end. Anthropic's racing the open stack for the agent shell layer — and the 2014 container playbook is back.
Lindy, JP Morgan, and OpenAI all shipped a separate judge layer for production agents in Q2 2026. It's a category, not a fad.
A two-day-old skill ecosystem has already spawned validators: react-doctor, agentmemory's LongMemEval benchmarks, and Osmani's curation outpacing first-party.
YC named it tokenmaxxing — one founder + agent harness doing the work of 400 engineers. Here's the stack: Codex parallel tabs and Claude Code skills.
ByteDance's UI-TARS-desktop hit GitHub trending #6 inside the Chinese-AI surge week. How its visual-agent approach compares with Claude and Operator.
When to switch agent retrieval from embeddings to PageIndex's vectorless tree search — and when not to. The honest 2026 read.
Dexter (24.4k stars, MIT) vs Anthropic's 10 finance agent templates: a job-by-job buyer's guide for self-host vs managed financial research.
Anthropic shipped ten finance agent templates today — KYC, pitchbooks, reconciliation, more. Which one fits which job, and the open-source alternatives.
After DeepSeek V4, the coding-agent stack has three substrates with very different pricing and lock-in. Which should you switch to?
TradingAgents +3,315 stars/day. Maigret +1,117. Dexter, TaxHacker, Pixelle. The horizontal framework era is over — here's what's replacing it.
Three harness substrates embedding into CI/CD this month. We compare use case, distribution, lock-in, and pricing across all three.
DeepSeek-TUI, Hermes, and the non-English tokenizer tax just stacked into a coherent harness alternative to Claude Code Max. Install, cost math, gaps.
David Gomes' AI Engineer talk turned Cursor's WorkTrees rewrite into the first production case study for Skills-as-Runtime. What 200 lines actually replaced.
mattpocock/skills (+5,551), awesome-codex-skills (+637), pi-mono (+949): coverage, license, governance side-by-side. Pick the one for your stack.
Two skill directories landed within 24h — one for Claude, one for Codex. Which one to publish into, and why the cross-vendor index is the real prize.
ml-intern is HuggingFace's open-source AI agent that reads papers, discovers datasets, and trains models autonomously — outperforming Claude Code on GPQA.
Garry Tan's gstack gives Claude Code 23 specialist skills: CEO, Eng Manager, QA. 82K stars and still climbing. Here's what actually works.
Shannon autonomously pentests web apps with a 96% success rate on the XBOW benchmark, for ~$50 a run. We review how it works, what it misses, and who should use it.
Claude Mythos found 271 Firefox bugs in one pass. How Anthropic's restricted security agent works — and what it means for defenders.
95.6K stars in 7 weeks. Hermes Agent's v0.10 adds Ollama local models and Chrome CDP browser integration. Honest review of what works — and what doesn't.
deer-flow (62.8k stars), evolver, and GenericAgent hit GitHub top 10 simultaneously. We compare architecture, security posture, and production readiness.
GenericAgent and EvoMap hit 800+ GitHub stars/day building AI that grows its own skill trees. How each works, and the security risk nobody mentions.
Cloudflare shipped AI Platform, Email for Agents, and Artifacts/Git on the same day. Setup guide + when to use each for production AI agents.
SOUL.md gives AI agents persistent identity across sessions — no more blank-slate resets. Learn the pattern, write your first soul file.
Archon makes AI coding agents deterministic with YAML workflows. 17K stars, +452 today. Is 'harness engineering' a real category — or just retry logic?
multica: 1,724 stars, #5 GitHub trending. Managed agents platform for Claude Code — task routing, skill compounding, and team-level coordination reviewed.
NousResearch's hermes-agent hit 7,450 stars in 24 hours. We tested its self-improvement loop, compared it to Archon and Multica, and asked the hard question.
cc-switch unifies five AI coding CLIs into one app. Here's how it works, when to use which agent, and what the platform economics mean for developers.
obra/superpowers gained 1,589 GitHub stars in one day. A hands-on guide to what it is, how it differs from Archon and multica, and when to use it.
Meta just launched Muse Spark, their first closed-weight frontier model. We break down the benchmarks, the 16-tool suite, and what it means for agent builders.
Block cut 40% of its workforce and hit its best quarter ever. Goose, its open-source AI agent, made it possible. Here's how it works.
Google's Gemma 4 31B just hit #3 on the open model global leaderboard — and it has a perfect Tool Call 15 score. Here's the complete agent developer review: benchmarks, deployment, Apache 2.0 license, and how it stacks up against DeepSeek V3.2, Qwen 3.5, and Llama 4.
The LiteLLM supply chain attack compromised ~500K machines in 40 minutes. Here's why AI agent pipelines are uniquely vulnerable — and 5 concrete steps to protect your stack today.
Alibaba Qwen 3.6-Plus offers a 1M context window and agent benchmarks rivaling Claude Sonnet 4. Real Claude/GPT alternative? We cut through the benchmark spin.
GitHub trending is exploding with agent orchestration frameworks. We cover the top 6: Superpowers, oh-my-claudecode, hermes-agent, learn-claude-code, claude-mem, and AgentScope — what they do, who they're for, and which to pick.
François Chollet launched ARC-AGI V3 — interactive video game environments where agents must learn goals and controls with zero instructions. Humans: 100%. GPT-5.4 + Opus 4.6: 0.3%. This is the benchmark that exposes the gap between trained intelligence and actual intelligence.
Deep-dive review of Anthropic Dispatch — the AI desktop agent that takes over your Mac, opens apps, clicks through UIs, and delivers completed work while you're away. How it compares to basic Computer Use and Open Interpreter, and what the 'finished work' paradigm actually means in practice.
MiniMax M2.7 participated in its own training. Meta's Darwin-Gödel HyperAgent rewrites its own code to become a better coder. The era of self-evolving AI agents has arrived — here's how it works technically, what it means for agent builders, and why open-source weights change everything.
Claude Code's unannounced Auto Dream feature consolidates agent memory like REM sleep. Meanwhile, ETH Zurich found context files hurt more than they help. The agent memory problem is the unsolved infrastructure challenge of 2026 — here's what's actually working.
McKinsey predicts AI agents will mediate over $1 trillion in consumer purchases. But most businesses are invisible to agents — blocked by the very anti-bot infrastructure they spent 20 years building. Here's what's actually required to become agent-ready, why wrapping an API in MCP isn't enough, and what Walmart's failed ChatGPT checkout reveals about the real challenges of agent commerce.
An in-depth review of OpenCode, the open-source AI coding agent with 120K GitHub stars that hit 1,099 points on Hacker News. How does it compare to Claude Code, Codex CLI, and GSD 2?
Three new studies paint a brutal picture of AI agent reliability in 2026. Scale AI's benchmark shows a 97.5% failure rate on real freelance work. Alibaba finds 75% of frontier models break working code. Harvard data reveals employers already regret AI-driven layoffs. Here's what the data actually says.
The agent stack is standardizing around model → runtime → harness → agent. We compare LangChain Deep Agents, CrewAI, AutoGen, Agency Swarm, Haystack, and OpenClaw — the best open-source frameworks for building your own AI agents in 2026.
GSD 2, Claude Code, and Codex CLI compared head-to-head. Architecture, autonomy, pricing, and git workflow — which coding agent CLI fits your workflow?
A comprehensive comparison of the best AI browser automation agents in 2026 — from Claude's Browser Extension to BrowserBase, Browser-Use, AgentQL, and more. Covers personal automation, enterprise scraping, and QA testing use cases.
A comprehensive comparison of the best AI agents transforming finance and accounting in 2026. Covers Ramp AI, Vic.ai, Truewind, Stampli, Puzzle, Zeni, and more — with practical guidance on evaluation, compliance, and choosing the right tool for your team.
A comprehensive comparison of the best AI computer-use agents in 2026, including Perplexity Computer, Claude Computer Use, OpenAI Operator, and top open-source alternatives. Capabilities, pricing, security, and practical recommendations.
A curated guide to the best AI agents and models you can self-host in 2026. From NVIDIA Nemotron to Ollama-powered agents, discover what runs on your hardware — with full privacy, zero API bills, and no data leaving your machine.
The shift from standalone AI agents to embedded AI agents built into your existing apps is accelerating. See how Google Gemini, Microsoft Copilot, and others are integrating agents directly into productivity tools — and what it means for you.
Discover the top AI agents transforming creative industries in 2026 — from Suno's music generation to Sora's video creation to Midjourney's design capabilities. A hands-on guide to AI creative tools.
A step-by-step guide to automating your work with AI agents in 2026. Real workflows for developers, marketers, researchers, and business professionals with specific tool recommendations.
Compare the top AI research agents of 2026 — OpenAI Deep Research, Perplexity, Grok, and Elicit. We test them on research depth, accuracy, speed, and best use cases.
A comprehensive guide to AI agent security risks and best practices, covering prompt injection, data exfiltration, over-permissioning, and how to safely deploy AI agents.
Everything you need to know about AI coding agents in 2026: how they work, the best options available, real-world use cases, and how to integrate them into your development workflow.
Explore how AI agents are revolutionizing customer service with faster response times, 24/7 availability, personalized interactions, and reduced costs for businesses of all sizes.
Discover the best free AI agents available in 2026, from coding assistants to productivity tools, research agents, and creative AI — all with generous free tiers.
Understand the key differences between AI agents and AI chatbots, including capabilities, use cases, and how each technology is transforming business and productivity.
A practical framework for evaluating and selecting AI agents that align with your business needs and goals.
A comprehensive roundup of the best AI coding agents available in 2026, from pair programming to autonomous development.
Learn what AI agents are, how they work, and why they're transforming the way we interact with technology.