Using llama.cpp models with Codex keeps prompts on the local machine while letting Codex work against a GGUF model served from the local llama-server API.

The Codex --oss shortcut currently targets Ollama and LM Studio, so llama.cpp is best connected through a custom model provider. Current llama-server builds expose the OpenAI-compatible /v1/responses endpoint that Codex expects when a custom provider is configured with wire_api = "responses".

The llama-server process must already be running, reachable from the local machine, and serving a model alias that matches the model name Codex will request. Keep the listener on 127.0.0.1 unless another host genuinely needs access, because any client that can reach the port can submit prompts to the local model.
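Before wiring up Codex, it can help to confirm that something is actually listening on the chosen address. A minimal Python sketch (the host and port match the defaults used in the steps below):

```python
import socket

def server_reachable(host: str = "127.0.0.1", port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if a TCP listener answers on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Check the default llama-server bind address used in step 1.
print(server_reachable())
```

A True result only shows a listener is up; the /v1/models check in step 4 confirms it is the right model.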

Steps to use llama.cpp models with Codex:

  1. Start llama-server with a local bind address and a model alias that Codex can request.
    $ llama-server -m ~/Models/Llama-3.2-3B-Instruct-Q4_K_M.gguf --alias llama3.2-3b-instruct-q4_k_m --host 127.0.0.1 --port 8080
    build: 8680
    main: HTTP server listening on http://127.0.0.1:8080

    --alias sets the model name returned by the API, so it should match the model name you plan to configure in Codex.

  2. Add a custom Codex provider and a profile that points to the local llama.cpp server.
    [model_providers.llamacpp]
    name = "llama.cpp"
    base_url = "http://127.0.0.1:8080/v1"
    wire_api = "responses"
     
    [profiles.llamacpp]
    model_provider = "llamacpp"
    model = "llama3.2-3b-instruct-q4_k_m"

    Save the block in ~/.codex/config.toml.

    This method uses a custom profile rather than --oss because the current Codex OSS shortcut only covers Ollama and LM Studio.

  3. Run Codex with the llama.cpp profile and point it at a repository directory.
    $ codex exec -p llamacpp -C /home/user/projects/example-repo "Return OK."
    OK

    -C keeps the run anchored to a Git repository so the trusted-directory check stays enabled.

  4. Confirm that the llama.cpp API exposes the same model alias that the Codex profile uses.
    $ curl http://127.0.0.1:8080/v1/models
    {"object":"list","data":[{"id":"llama3.2-3b-instruct-q4_k_m","object":"model"}]}

    The id value should match the model name in the Codex profile or the name passed with -m on a one-off run.
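The same check can be scripted. This sketch parses the /v1/models payload shown above (captured here as a literal string rather than fetched over the network) and confirms the configured model name appears among the served ids:

```python
import json

# The /v1/models response from the step above, as a literal payload.
payload = '{"object":"list","data":[{"id":"llama3.2-3b-instruct-q4_k_m","object":"model"}]}'

configured_model = "llama3.2-3b-instruct-q4_k_m"  # `model` from the Codex profile

served_ids = [entry["id"] for entry in json.loads(payload)["data"]]
print(configured_model in served_ids)  # → True
```

In a live setup the payload would come from curl or an HTTP client pointed at the local server; a mismatch here is the usual cause of "model not found" errors from Codex.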