Using llama.cpp models with Codex keeps prompts on the local machine while letting Codex work against a GGUF model served by a local llama-server instance.
The Codex --oss shortcut currently targets Ollama and LM Studio, so llama.cpp is best connected through a custom model provider. Current llama-server builds expose the OpenAI-compatible /v1/responses endpoint that Codex expects for custom providers.
The llama-server process must already be running, reachable from the local machine, and serving a model alias that matches the model name Codex will request. Keep the listener on 127.0.0.1 unless another host genuinely needs access, because any client that can reach the port can submit prompts to the local model.
$ llama-server -m ~/Models/Llama-3.2-3B-Instruct-Q4_K_M.gguf --alias llama3.2-3b-instruct-q4_k_m --host 127.0.0.1 --port 8080
build: 8680
main: HTTP server listening on http://127.0.0.1:8080
--alias sets the model name returned by the API, so it should match the model name you plan to configure in Codex.
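Before wiring up Codex, it is worth confirming that the endpoint actually answers. The request below is a minimal smoke test, assuming the server's Responses API accepts the standard model and input fields; a build that predates /v1/responses support will return 404, in which case upgrading llama.cpp or pointing the provider at wire_api = "chat" is the workaround.
$ curl http://127.0.0.1:8080/v1/responses -H "Content-Type: application/json" -d '{"model": "llama3.2-3b-instruct-q4_k_m", "input": "Say OK."}'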
[model_providers.llamacpp]
name = "llama.cpp"
base_url = "http://127.0.0.1:8080/v1"
wire_api = "responses"

[profiles.llamacpp]
model_provider = "llamacpp"
model = "llama3.2-3b-instruct-q4_k_m"
Save the block in ~/.codex/config.toml.
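To avoid passing the profile flag on every run, Codex also reads a top-level profile key from the same file; assuming a current CLI build, the single line below makes llamacpp the default. Because TOML assigns any key that follows a table header to that table, the line must sit above the [model_providers.llamacpp] block.
profile = "llamacpp"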
This method uses a custom profile rather than --oss because the current Codex OSS shortcut only covers Ollama and LM Studio.
$ codex exec -p llamacpp -C /home/user/projects/example-repo "Return OK."
OK
-C keeps the run anchored to a Git repository so the trusted-directory check stays enabled.
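The same profile also drives an interactive session; this assumes the config block above is already saved:
$ codex -p llamacpp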
$ curl http://127.0.0.1:8080/v1/models
{"object":"list","data":[{"id":"llama3.2-3b-instruct-q4_k_m","object":"model"}]}
The id value should match the model name in the Codex profile or the name passed with -m on a one-off run.
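If llama-server is serving a second model under a different --alias, -m selects it for a single run without editing the profile; the alias below is a placeholder, not one of the models started earlier.
$ codex exec -p llamacpp -m qwen2.5-coder-7b-instruct-q4_k_m "Return OK."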