Andyyyy64/whichllm is a local LLM benchmarker that hit GitHub trending on 2026-05-17 with +230 stars in 24 hours (988 total). The pitch is operator-grade and narrow: given your hardware profile (CPU, RAM, GPU, VRAM), it benchmarks the local-LLM candidates that can actually run on your machine and ranks them by quality-per-watt and quality-per-token. The bet is that as the local-inference wave continues (openhuman, llama.cpp, mlx, candle, etc.), the bottleneck moves from 'which model exists' to 'which model fits this laptop', and a benchmark tool tuned to real on-device constraints becomes the operator's first stop. whichllm pairs naturally with openhuman (a Rust local-AI shell), cc-switch (provider routing), and the broader open-stack local-inference movement.
The premise is operator-focused: given your hardware, what’s the best local LLM you can actually run? whichllm benchmarks the candidates that fit, ranks them on the metrics operators care about (quality-per-watt, quality-per-token), and returns a concrete recommendation rather than a generic leaderboard.
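One plausible reading of the two ranking metrics is a simple ratio: a quality score divided by average power draw, and the same score divided by the tokens generated. The sketch below assumes that reading. `BenchResult`, its fields, and `rank` are hypothetical names for illustration, not whichllm's actual API or scoring formula.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class BenchResult:
    """Hypothetical record of one model's benchmark run on one machine."""
    model: str
    quality: float          # task score, assumed normalized to [0, 1]
    avg_power_watts: float  # assumed mean power draw during the run
    tokens_generated: int   # tokens the model emitted to reach that score

    @property
    def quality_per_watt(self) -> float:
        # Assumed definition: score earned per watt of average draw.
        return self.quality / self.avg_power_watts

    @property
    def quality_per_token(self) -> float:
        # Assumed definition: score earned per generated token.
        return self.quality / self.tokens_generated


def rank(results: List[BenchResult],
         metric: Callable[[BenchResult], float]) -> List[BenchResult]:
    """Sort results best-first by the given metric (higher is better)."""
    return sorted(results, key=metric, reverse=True)


# Illustrative numbers only, not measured data.
results = [
    BenchResult("model-a", quality=0.80, avg_power_watts=40.0, tokens_generated=2000),
    BenchResult("model-b", quality=0.60, avg_power_watts=20.0, tokens_generated=1000),
]
by_power = rank(results, lambda r: r.quality_per_watt)
by_tokens = rank(results, lambda r: r.quality_per_token)
```

Here `model-a` has the higher raw quality, yet `model-b` comes first on both ratios. This is the kind of trade-off an efficiency-oriented ranking surfaces and a raw-quality leaderboard hides.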
The differentiator is hardware-awareness. Most benchmark leaderboards rank models by raw quality measured on standardized infrastructure that does not match the operator’s actual machine. whichllm flips the framing: it starts from the hardware constraint, rejects any model that won’t fit, and only then ranks the rest. The output is a short list of viable models with operator-grade comparison metrics, not a 50-row leaderboard that is mostly irrelevant to the available silicon.
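The filter-then-rank order can be sketched as follows. Everything here is an assumption for illustration: the memory estimate (parameters × bits per weight ÷ 8, plus a flat 20% headroom for KV cache and runtime buffers), the rule of budgeting against VRAM when a GPU is present and system RAM otherwise, and the names `Hardware`, `Candidate`, and `shortlist`. The source does not describe whichllm's real fit check, and it may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hardware:
    ram_gb: float
    vram_gb: float  # 0 when there is no discrete GPU


@dataclass(frozen=True)
class Candidate:
    name: str
    params_billion: float
    bits_per_weight: int  # e.g. 4 for a 4-bit quantization
    quality: float        # hypothetical quality score


OVERHEAD = 1.2  # assumed headroom for KV cache and runtime buffers


def estimated_gb(c: Candidate) -> float:
    # 1e9 params * bits / 8 = bytes in decimal GB, scaled by the assumed overhead.
    return c.params_billion * c.bits_per_weight / 8 * OVERHEAD


def fits(c: Candidate, hw: Hardware) -> bool:
    # Assumed rule: budget against VRAM if a GPU exists, else system RAM.
    budget = hw.vram_gb if hw.vram_gb > 0 else hw.ram_gb
    return estimated_gb(c) <= budget


def shortlist(candidates: List[Candidate], hw: Hardware, top_n: int = 3) -> List[Candidate]:
    viable = [c for c in candidates if fits(c, hw)]  # reject non-fitting models first
    return sorted(viable, key=lambda c: c.quality, reverse=True)[:top_n]  # then rank


# Illustrative candidates; scores are made up.
candidates = [
    Candidate("70b-q4", 70, 4, quality=0.85),
    Candidate("14b-q4", 14, 4, quality=0.75),
    Candidate("8b-q4", 8, 4, quality=0.70),
    Candidate("3b-q8", 3, 8, quality=0.55),
]
laptop = Hardware(ram_gb=32, vram_gb=8)
picks = shortlist(candidates, laptop)  # 70b-q4 (~42 GB) and 14b-q4 (~8.4 GB) are rejected
```

Because rejection happens before ranking, the 70B model's higher hypothetical score never crowds out the runnable options. The shortlist contains only models this machine can load.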
The local-inference wave has been accelerating through 2026, with openhuman (a Rust local-AI shell) hitting GitHub #1 for multiple days running and the broader llama.cpp / mlx / candle ecosystem maturing. As more operators adopt local-first stacks, the question shifts from “can I run a model locally” to “which model is the right one for this laptop.” whichllm is the first widely trending tool to address that question with operator-grade rigor. It’s a small, focused utility that fits the moment.