llama.cpp

Overview

Nitro is an inference server built on top of llama.cpp. It exposes an OpenAI-compatible API and adds request queueing and scaling.

Nitro is the default AI engine bundled with Jan, so no additional setup is needed.
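
Because the API is OpenAI-compatible, any OpenAI-style client can talk to the local server. Below is a minimal sketch in Python using only the standard library; the base URL (http://localhost:1337), the /v1/chat/completions route, and the model id are assumptions for illustration and may differ from your local setup.

```python
# Minimal sketch: send a chat completion request to the local OpenAI-compatible server.
# Assumptions (not from this page): server at http://localhost:1337, route
# /v1/chat/completions, and a model id "mistral-ins-7b-q4" already downloaded in Jan.
import json
import urllib.request

payload = {
    "model": "mistral-ins-7b-q4",  # assumed model id; use one installed in Jan
    "messages": [
        {"role": "user", "content": "Hello! What can you do?"}
    ],
}

req = urllib.request.Request(
    "http://localhost:1337/v1/chat/completions",  # assumed local endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# The response follows the OpenAI chat completion schema.
print(body["choices"][0]["message"]["content"])
```

Since the request and response schemas match OpenAI's, existing OpenAI SDKs can also be pointed at the local base URL instead of api.openai.com.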