llmtop

shipped

2026 · python · textual · llm · vllm · ollama · gpu · monitoring · tui · github.com/rxxusp/llmtop

An nvtop for local LLM inference: zero-config autodiscovery of every running engine, plus live GPU and serving metrics in a terminal UI.

Run one command and llmtop finds every inference engine on the machine, works out what each one is serving and how it is configured, and shows live GPU and serving metrics in a terminal UI. There are no flags, no config file, and no manual port list.

Discovery cross-correlates three independent signals: a scan of the process table for known launchers, a port scan on loopback, and read-only API fingerprinting that classifies each open port by the shape of its response. It ships adapters for vLLM, llama.cpp, Ollama, TGI, SGLang, and routers, plus a generic OpenAI-compatible fallback. Adding a new engine takes one small module.
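As a rough illustration of classifying a port by response shape, here is a minimal sketch. It is not llmtop's actual code: the `Adapter` class, the `classify` function, and the probe paths are hypothetical, and the specific checks (Ollama answering `/api/tags` with a `models` list, vLLM reporting `owned_by: "vllm"` in its `/v1/models` entries) are assumptions about those engines' responses. The input is a mapping from probed path to the parsed JSON body that came back.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional

# Probe results: path -> parsed JSON body, for paths that answered with JSON.
Probes = Mapping[str, Any]


@dataclass(frozen=True)
class Adapter:
    """Hypothetical adapter: a name plus a read-only fingerprint check."""
    name: str
    matches: Callable[[Probes], bool]


def _openai_models(probes: Probes) -> Optional[list]:
    # OpenAI-style model listing: {"object": "list", "data": [...]}
    body = probes.get("/v1/models")
    if isinstance(body, dict) and body.get("object") == "list":
        data = body.get("data")
        if isinstance(data, list):
            return data
    return None


def is_ollama(probes: Probes) -> bool:
    # Assumption: Ollama's /api/tags returns {"models": [...]}.
    body = probes.get("/api/tags")
    return isinstance(body, dict) and isinstance(body.get("models"), list)


def is_vllm(probes: Probes) -> bool:
    # Assumption: every model entry served by vLLM has owned_by == "vllm".
    models = _openai_models(probes)
    return bool(models) and all(
        isinstance(m, dict) and m.get("owned_by") == "vllm" for m in models
    )


def is_openai_compatible(probes: Probes) -> bool:
    return _openai_models(probes) is not None


# Specific engines come first; the generic fallback is checked last.
ADAPTERS = [
    Adapter("ollama", is_ollama),
    Adapter("vllm", is_vllm),
    Adapter("openai-compatible", is_openai_compatible),
]


def classify(probes: Probes) -> Optional[str]:
    """Return the name of the first adapter whose fingerprint matches."""
    for adapter in ADAPTERS:
        if adapter.matches(probes):
            return adapter.name
    return None
```

Ordering matters in a scheme like this: an engine that also exposes an OpenAI-style endpoint would be caught by the generic fallback unless its own adapter is tried first. Under this sketch, supporting another engine means writing one more `matches` function and adding it to the list.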

It is read-only by default: it only ever issues cheap introspection and metrics requests, and never sends a generation request that would cost a token. On unified-memory devices such as the NVIDIA GB10, where the GPU shares memory with system RAM, it detects this configuration, reports the correct totals instead of bogus zeros, and still attributes per-engine VRAM by walking the GPU process tree, even when the engine runs inside a container.
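The process-tree attribution can be pictured with a small sketch. This is an illustration under stated assumptions, not llmtop's implementation: `attribute_vram` and its three inputs are hypothetical, and in practice the maps would presumably be built from the GPU driver's per-process memory accounting and the OS process table. Each GPU process is charged to the nearest ancestor that discovery has identified as an engine.

```python
from typing import Dict

UNATTRIBUTED = "(unattributed)"  # hypothetical bucket for GPU processes with no engine ancestor


def attribute_vram(
    gpu_mem_by_pid: Dict[int, int],   # pid -> GPU memory used (any consistent unit)
    parent_of: Dict[int, int],        # pid -> parent pid
    engine_roots: Dict[int, str],     # pid of a discovered engine -> engine name
) -> Dict[str, int]:
    """Sum GPU memory per engine by walking each GPU process up to an engine root."""
    totals = {name: 0 for name in engine_roots.values()}
    totals[UNATTRIBUTED] = 0
    for pid, used in gpu_mem_by_pid.items():
        seen = set()
        current = pid
        owner = UNATTRIBUTED
        # Walk towards the root; `seen` guards against a malformed, cyclic parent map.
        while current is not None and current not in seen:
            seen.add(current)
            if current in engine_roots:
                owner = engine_roots[current]
                break
            current = parent_of.get(current)
        totals[owner] += used
    return totals
```

For example, with GPU processes 101 and 202 that are children of engine launchers 100 and 200, and a stray process 303 with no engine ancestor, the memory of 101 and 202 is charged to their respective engines and 303 lands in the unattributed bucket.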

Built with Python and Textual. MIT licensed.
