About whichllm

Andyyyy64/whichllm is a local LLM benchmarker that hit GitHub trending on 2026-05-17 with +230 stars in 24 hours (988 total). The pitch is narrow and aimed squarely at operators: given your hardware profile (CPU, RAM, GPU, VRAM), it benchmarks the local-LLM candidates that can actually run on your machine and ranks them by quality-per-watt and quality-per-token. The bet is that as the local-inference wave continues (openhuman, llama.cpp, mlx, candle, and others), the bottleneck moves from "which model exists" to "which model fits this laptop", and a benchmark tool tuned to real on-device constraints becomes the operator's first stop. whichllm pairs naturally with openhuman (a Rust local-AI shell), cc-switch (provider routing), and the broader open-stack local-inference movement.

Key Features

Hardware-aware benchmarking — won't recommend models that exceed your VRAM or RAM ceiling

Quality-per-watt and quality-per-token rankings — operator metrics, not abstract benchmarks

Local-only — no cloud calls, no telemetry, runs against models already on disk

Multi-backend support — llama.cpp, mlx, candle, vLLM-local profiles

Single-command setup — designed for operators evaluating local stacks in <10 minutes

Open-source and free — MIT-licensed, no API key required
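The hardware-aware check comes down to comparing a model's memory footprint against the VRAM or RAM ceiling. The source does not say how whichllm estimates that footprint, so the sketch below uses a common rule of thumb as an assumption: weight bytes are roughly parameter count × bits per weight ÷ 8, plus a hypothetical 20% overhead for KV cache and runtime buffers. `Candidate`, `estimated_gib`, and `placement` are illustrative names, not whichllm's API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical multiplier for KV cache, activations, and runtime buffers.
# Real overhead varies with context length, batch size, and backend.
OVERHEAD = 1.2


@dataclass
class Candidate:
    name: str
    params_billions: float  # parameter count, in billions
    bits_per_weight: float  # e.g. 16 for FP16, roughly 4.5 for a 4-bit quant


def estimated_gib(c: Candidate) -> float:
    """Rough footprint in GiB: weight bytes scaled by the overhead factor."""
    weight_bytes = c.params_billions * 1e9 * c.bits_per_weight / 8
    return weight_bytes * OVERHEAD / 2**30


def placement(c: Candidate, vram_gib: float, ram_gib: float) -> Optional[str]:
    """Return "gpu" if the model fits in VRAM, "cpu" if it only fits in RAM,
    or None if it exceeds both ceilings and should not be recommended."""
    need = estimated_gib(c)
    if need <= vram_gib:
        return "gpu"
    if need <= ram_gib:
        return "cpu"
    return None


if __name__ == "__main__":
    candidates = [
        Candidate("8b-q4", 8, 4.5),
        Candidate("13b-fp16", 13, 16),
        Candidate("70b-q4", 70, 4.5),
    ]
    for c in candidates:
        print(c.name, round(estimated_gib(c), 1), placement(c, 8, 32))
```

On a machine with 8 GiB of VRAM and 32 GiB of RAM, this estimates about 5.0, 29.1, and 44.0 GiB respectively: the 8B quant runs on the GPU, the 13B FP16 model falls back to the CPU, and the 70B quant is rejected outright.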

Overview

whichllm is a local LLM benchmarker from Andyyyy64 that crossed 988 stars on 2026-05-17 (+230 in 24 hours). The premise is operator-focused: given your hardware, what’s the best local LLM you can actually run? The tool benchmarks the candidates that fit, ranks them on metrics operators care about (quality-per-watt, quality-per-token), and returns a concrete recommendation rather than a generic leaderboard.
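The source names quality-per-watt and quality-per-token but does not define them. One plausible reading, assumed here purely for illustration, divides a task-quality score (for example, accuracy on a fixed prompt set, scaled 0 to 1) by the mean power draw during generation, or by the mean number of tokens emitted per answer. `BenchResult` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BenchResult:
    model: str
    quality: float     # hypothetical task score in [0, 1]
    avg_watts: float   # mean power draw while generating, in watts
    avg_tokens: float  # mean tokens generated per answer


def quality_per_watt(r: BenchResult) -> float:
    return r.quality / r.avg_watts


def quality_per_token(r: BenchResult) -> float:
    return r.quality / r.avg_tokens


def rank(results: List[BenchResult],
         metric: Callable[[BenchResult], float]) -> List[BenchResult]:
    """Return results sorted best-first by the chosen metric."""
    return sorted(results, key=metric, reverse=True)


if __name__ == "__main__":
    a = BenchResult("a", quality=0.8, avg_watts=40.0, avg_tokens=200.0)
    b = BenchResult("b", quality=0.7, avg_watts=20.0, avg_tokens=350.0)
    print([r.model for r in rank([a, b], quality_per_watt)])
    print([r.model for r in rank([a, b], quality_per_token)])
```

Under these assumptions the two metrics can disagree: model "b" wins on quality-per-watt (0.035 vs 0.02) because it draws half the power, while model "a" wins on quality-per-token (0.004 vs 0.002) because it answers more concisely. This is why a recommendation has to name the metric it optimizes.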

Key Capabilities

The differentiator is hardware-awareness. Most benchmark leaderboards rank models by raw quality on standardized infrastructure that does not match what’s on the operator’s actual machine. whichllm flips the framing: it starts with the hardware constraint and rejects any model that won’t fit before ranking the rest. The output is a short list of viable models with operator-grade comparison metrics, not a 50-row leaderboard that is mostly irrelevant to the available silicon.
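The filter-before-rank order can be sketched as a two-stage pipeline. First, drop every candidate whose memory requirement exceeds the machine's budget. Then sort only the survivors. This is an illustrative sketch, not whichllm's code: `Model`, `required_gib`, `score`, and `shortlist` are hypothetical names, and `score` stands in for whichever operator metric is chosen.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Model:
    name: str
    required_gib: float  # hypothetical memory needed to load and run
    score: float         # hypothetical operator metric, higher is better


def shortlist(models: List[Model], budget_gib: float, top_n: int = 3) -> List[Model]:
    # Stage 1: hardware filter. Anything over budget is rejected outright,
    # so it can never appear in the ranking, however high its score.
    viable = [m for m in models if m.required_gib <= budget_gib]
    # Stage 2: rank only the models that fit, best score first.
    viable.sort(key=lambda m: m.score, reverse=True)
    return viable[:top_n]


if __name__ == "__main__":
    models = [
        Model("big", 40.0, 0.95),
        Model("mid", 10.0, 0.80),
        Model("small", 5.0, 0.70),
        Model("tiny", 2.0, 0.55),
    ]
    print([m.name for m in shortlist(models, budget_gib=12.0)])
```

With a 12 GiB budget the shortlist is mid, small, tiny. "big" has the best score but never enters the ranking, which is the practical difference from a raw leaderboard.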

Why It Matters

The local-inference wave has been accelerating through 2026, with openhuman (a Rust local-AI shell) hitting #1 on GitHub on multiple consecutive days and the broader llama.cpp / mlx / candle ecosystem maturing. As more operators adopt local-first stacks, the question shifts from “can I run a model locally” to “which model is the right one for this laptop.” whichllm is one of the first widely trending tools to address that question directly, with operator-focused metrics. It’s a small, focused utility that fits the moment.

Use Cases

Developers evaluating local-inference stacks for the first time and choosing a starter model
Operators upgrading hardware and re-benchmarking the local-model lineup against new VRAM headroom
Teams running an internal mix of laptops with different specs that need consistent model recommendations per device
Researchers comparing model quality-per-watt on consumer silicon (Apple Silicon, Ryzen AI, RTX 40-series)

Similar Agents

agent-native

Agent-Reach

agentmemory