AgentConn

whichllm

Coding Free

About whichllm

Andyyyy64/whichllm is a local LLM benchmarker that hit GitHub trending on 2026-05-17, gaining 230 stars in 24 hours (988 total). The pitch is deliberately narrow: given your hardware profile (CPU, RAM, GPU, VRAM), it benchmarks only the local LLM candidates that can actually run on your machine and ranks them by quality-per-watt and quality-per-token. The bet is that as the local-inference wave continues (openhuman, llama.cpp, mlx, candle, etc.), the bottleneck moves from 'which model exists' to 'which model fits this laptop', and a benchmark tool tuned for real on-device constraints becomes the operator's first stop. whichllm pairs naturally with openhuman (Rust local-AI shell), cc-switch (provider routing), and the broader open-stack local-inference movement.

Key Features

  • Hardware-aware benchmarking — won't recommend models that exceed your VRAM or RAM ceiling
  • Quality-per-watt and quality-per-token rankings — operator metrics, not abstract benchmarks
  • Local-only — no cloud calls, no telemetry, runs against models already on disk
  • Multi-backend support — llama.cpp, mlx, candle, vLLM-local profiles
  • Single-command setup — designed for operators evaluating local stacks in <10 minutes
  • Open-source and free — MIT-licensed, no API key required
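
The listing does not define how the two ranking metrics are computed, so the following is only a minimal sketch of one plausible reading: quality-per-watt as a quality score divided by average power draw, and quality-per-token as the same score divided by tokens generated. The RunResult type, its fields, and the rank helper are hypothetical names chosen for illustration, not whichllm's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RunResult:
    """One benchmark run for one model (hypothetical record, not whichllm's format)."""
    name: str
    quality: float    # assumed quality score, higher is better
    avg_watts: float  # assumed mean power draw during the run
    tokens: int       # assumed number of tokens generated during the run


def quality_per_watt(r: RunResult) -> float:
    # Assumed definition: quality delivered per watt of average power draw.
    return r.quality / r.avg_watts


def quality_per_token(r: RunResult) -> float:
    # Assumed definition: quality delivered per generated token.
    return r.quality / r.tokens


def rank(results: List[RunResult], metric: Callable[[RunResult], float]) -> List[str]:
    # Sort best-first by the chosen metric and return model names only.
    return [r.name for r in sorted(results, key=metric, reverse=True)]


results = [
    RunResult("model-a", quality=0.8, avg_watts=40.0, tokens=1000),
    RunResult("model-b", quality=0.6, avg_watts=20.0, tokens=500),
    RunResult("model-c", quality=0.9, avg_watts=60.0, tokens=600),
]

by_watt = rank(results, quality_per_watt)
by_token = rank(results, quality_per_token)
```

The two metrics can disagree: in this made-up data the highest-quality model ranks last per watt but first per token, which is why a tool of this kind would report them separately rather than as a single score.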

Overview

whichllm is a local LLM benchmarker from Andyyyy64 that reached 988 stars on 2026-05-17 (+230 in 24 hours). The premise is operator-focused: given your hardware, what’s the best local LLM you can actually run? The tool benchmarks the candidates that fit, ranks them on metrics operators care about (quality-per-watt, quality-per-token), and returns a concrete recommendation rather than a generic leaderboard.

Key Capabilities

The differentiator is hardware-awareness. Most benchmark leaderboards rank models by raw quality on standardized infrastructure that does not match the operator’s actual machine. whichllm flips the framing: it starts with the hardware constraint and rejects any model that won’t fit before ranking the rest. The output is a short list of viable models with comparison metrics operators can act on, rather than a 50-row leaderboard, most of which is irrelevant to the available silicon.
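
The filter-then-rank flow described above can be sketched as follows. This is an illustration under stated assumptions, not whichllm's implementation: the Hardware and Candidate types, the fits rule (use VRAM as the memory budget when a GPU is present, otherwise system RAM), and the shortlist helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hardware:
    ram_gb: float
    vram_gb: float  # 0 means no dedicated GPU memory


@dataclass(frozen=True)
class Candidate:
    name: str
    required_gb: float  # assumed memory footprint of the loaded model
    quality: float      # assumed quality score, higher is better


def fits(model: Candidate, hw: Hardware) -> bool:
    # Assumed rule: the budget is VRAM when a GPU is present, else system RAM.
    budget = hw.vram_gb if hw.vram_gb > 0 else hw.ram_gb
    return model.required_gb <= budget


def shortlist(candidates: List[Candidate], hw: Hardware, top_n: int = 3) -> List[str]:
    # Step 1: reject anything that cannot fit on this machine.
    viable = [c for c in candidates if fits(c, hw)]
    # Step 2: rank only the survivors, best-first, and keep a short list.
    viable.sort(key=lambda c: c.quality, reverse=True)
    return [c.name for c in viable[:top_n]]


candidates = [
    Candidate("small", required_gb=4.0, quality=0.6),
    Candidate("medium", required_gb=7.5, quality=0.7),
    Candidate("large", required_gb=40.0, quality=0.9),
]

laptop = Hardware(ram_gb=32.0, vram_gb=8.0)
picks = shortlist(candidates, laptop)
```

Filtering before ranking is the design choice that matters here: the highest-quality candidate never appears in the output if it cannot be loaded, so the result is always something the machine can run.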

Why It Matters

The local-inference wave has been accelerating through 2026, with openhuman (Rust local-AI shell) hitting GitHub #1 multiple days running and the broader llama.cpp / mlx / candle ecosystem maturing. As more operators adopt local-first stacks, the question shifts from “can I run a model locally” to “which model is the right one for this laptop.” whichllm is the first widely trending tool to address that question with operator-grade rigor. It’s a small, focused utility that fits the moment.

Use Cases

  • Developers evaluating local-inference stacks for the first time and choosing a starter model
  • Operators upgrading hardware and re-benchmarking the local-model lineup against new VRAM headroom
  • Teams running an internal mix of laptops with different specs that need consistent model recommendations per device
  • Researchers comparing model quality-per-watt on consumer silicon (Apple Silicon, Ryzen AI, RTX 40-series)
