How to use llama.cpp models with Codex

Using llama.cpp models with Codex keeps prompts on the local machine while letting Codex work against a GGUF model served from the local llama-server API.

The Codex --oss shortcut currently targets Ollama and LM Studio, so llama.cpp is best connected through a custom model provider. Current llama-server builds expose the OpenAI-compatible /v1/responses endpoint that Codex expects for custom providers.

Codex needs a llama-server process that is running, reachable from the local machine, and serving a model alias that matches the model name Codex will request. Keep the listener on 127.0.0.1 unless another host genuinely needs access, because any client that can reach the port can submit prompts to the local model.

Steps to use llama.cpp models with Codex:

Start llama-server with a local bind address and a model alias that Codex can request.
```
$ llama-server -m ~/Models/Llama-3.2-3B-Instruct-Q4_K_M.gguf --alias llama3.2-3b-instruct-q4_k_m --host 127.0.0.1 --port 8080
build: 8680
main: HTTP server listening on http://127.0.0.1:8080
```
--alias sets the model name returned by the API, so it should match the model name you plan to configure in Codex.
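
A small launcher script can keep the alias in one variable, so the server and the Codex profile are less likely to drift apart, and can refuse a non-loopback bind address unless explicitly overridden. This is a minimal sketch, not part of llama.cpp: the function names is_loopback_host and start_llama_server, the LLAMA_MODEL, LLAMA_ALIAS, LLAMA_HOST and LLAMA_PORT variables, and the DRY_RUN and ALLOW_REMOTE switches are hypothetical. It only passes the llama-server flags already shown in the step above.

```
# Hypothetical launcher sketch: the LLAMA_* variables, DRY_RUN and ALLOW_REMOTE are illustrative names.

# Return success when the host string is a loopback address.
is_loopback_host() {
  case "$1" in
    127.*|localhost|::1) return 0 ;;
    *) return 1 ;;
  esac
}

start_llama_server() {
  model_path="${LLAMA_MODEL:-$HOME/Models/Llama-3.2-3B-Instruct-Q4_K_M.gguf}"
  alias_name="${LLAMA_ALIAS:-llama3.2-3b-instruct-q4_k_m}"  # must match "model" in the Codex profile
  host="${LLAMA_HOST:-127.0.0.1}"
  port="${LLAMA_PORT:-8080}"

  # Any client that can reach the port can submit prompts, so refuse non-loopback hosts by default.
  if ! is_loopback_host "$host" && [ "${ALLOW_REMOTE:-0}" != "1" ]; then
    echo "refusing to bind to $host; set ALLOW_REMOTE=1 to override" >&2
    return 1
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of starting the server.
    echo llama-server -m "$model_path" --alias "$alias_name" --host "$host" --port "$port"
    return 0
  fi

  llama-server -m "$model_path" --alias "$alias_name" --host "$host" --port "$port"
}
```

Source the file and call start_llama_server; running it as DRY_RUN=1 start_llama_server prints the command without starting the server.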

Add a custom Codex provider and a profile that points to the local llama.cpp server.
```
[model_providers.llamacpp]
name = "llama.cpp"
base_url = "http://127.0.0.1:8080/v1"
wire_api = "responses"
 
[profiles.llamacpp]
model_provider = "llamacpp"
model = "llama3.2-3b-instruct-q4_k_m"
```
Save the block in ~/.codex/config.toml.

This method uses a custom profile rather than --oss because the current Codex OSS shortcut only covers Ollama and LM Studio.
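
When the block is added from a setup script, checking for the table headers first avoids appending duplicate tables, which TOML does not allow. This sketch assumes the configuration file lives at $CODEX_HOME/config.toml and falls back to ~/.codex/config.toml; the function name add_llamacpp_profile is hypothetical, and the script only appends text without parsing or validating the TOML.

```
# Hypothetical helper: append the llama.cpp provider and profile unless already present.
# Usage: add_llamacpp_profile [config_file] [model_alias]
add_llamacpp_profile() {
  config="${1:-${CODEX_HOME:-$HOME/.codex}/config.toml}"
  model="${2:-llama3.2-3b-instruct-q4_k_m}"

  mkdir -p "$(dirname "$config")"
  touch "$config"

  # TOML rejects duplicate tables, so skip the append if either table already exists.
  if grep -q '^\[model_providers\.llamacpp\]' "$config" || grep -q '^\[profiles\.llamacpp\]' "$config"; then
    echo "llamacpp entries already present in $config" >&2
    return 0
  fi

  cat >> "$config" <<EOF

[model_providers.llamacpp]
name = "llama.cpp"
base_url = "http://127.0.0.1:8080/v1"
wire_api = "responses"

[profiles.llamacpp]
model_provider = "llamacpp"
model = "$model"
EOF
}
```

Running it a second time leaves the file unchanged, so it is safe to call from a provisioning script.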

Run Codex with the llama.cpp profile and point it at a repository directory.
```
$ codex exec -p llamacpp -C /home/user/projects/example-repo "Return OK."
OK
```
-C sets the working directory for the run. Pointing it at a Git repository lets codex exec pass its trusted-directory check without needing --skip-git-repo-check.
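
For repeated runs, a wrapper can confirm that the target directory is a Git work tree before handing it to codex exec, so the trusted-directory error shows up as a clear message instead of a failed run. This is a hypothetical sketch: run_codex_local is an illustrative name, and it relies only on the -p and -C options shown above plus git rev-parse.

```
# Hypothetical wrapper: run one prompt through the llama.cpp profile inside a Git repository.
# Usage: run_codex_local <repo_dir> <prompt>
run_codex_local() {
  repo="$1"
  prompt="$2"

  if [ -z "$repo" ] || [ -z "$prompt" ]; then
    echo "usage: run_codex_local <repo_dir> <prompt>" >&2
    return 2
  fi

  # codex exec checks for a Git repository by default, so fail early with a clear message.
  if ! git -C "$repo" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "not a Git work tree: $repo" >&2
    return 1
  fi

  codex exec -p llamacpp -C "$repo" "$prompt"
}
```

For example, run_codex_local /home/user/projects/example-repo "Return OK." runs the same command as the step above.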

Confirm that the llama.cpp API exposes the same model alias that the Codex profile uses.
```
$ curl http://127.0.0.1:8080/v1/models
{"object":"list","data":[{"id":"llama3.2-3b-instruct-q4_k_m","object":"model"}]}
```
The id value should match the model name in the Codex profile, or the model name passed to codex with -m on a one-off run.
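
The comparison can be scripted so a mismatch is caught before Codex sends a request. This sketch assumes the /v1/models response shape shown above and pulls ids out with grep and sed rather than a JSON parser, so it is a quick check, not robust parsing; model_alias_served is a hypothetical name.

```
# Hypothetical check: succeed if a /v1/models JSON body lists the given model id.
# Usage: model_alias_served <json> <alias>
model_alias_served() {
  json="$1"
  alias_name="$2"

  # Extract every "id":"..." value, tolerating whitespace around the colon,
  # then require an exact, literal match against the alias.
  printf '%s\n' "$json" \
    | grep -oE '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed -E 's/^"id"[[:space:]]*:[[:space:]]*"//; s/"$//' \
    | grep -qxF -- "$alias_name"
}
```

Against a running server: model_alias_served "$(curl -s http://127.0.0.1:8080/v1/models)" llama3.2-3b-instruct-q4_k_m && echo "alias matches".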

Author: Mohd Shakir Zakaria
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.