Routing Codex tasks to local or hosted models reduces data exposure while keeping access to heavier reasoning for the jobs that need it.

Local routing uses --oss with a configured local provider (ollama or lmstudio), while hosted routing omits --oss and selects a hosted model with -m. You can override the local provider per run using --local-provider without changing config.toml.
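
A provider entry in config.toml might look like the sketch below. The key names and endpoint are assumptions that can vary by Codex version and backend, so check your installed documentation before relying on them.

```toml
# Hedged sketch of a local provider entry in config.toml.
# Key names and the default Ollama endpoint are assumptions;
# verify against your Codex version's configuration reference.
[model_providers.ollama]
name = "Ollama"
base_url = "http://localhost:11434/v1"
```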

Local providers must be running and reachable, model names and context limits vary by backend, and capability gaps (tools, long context, structured outputs) can change which tier is appropriate for a given task.
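
A quick pre-flight check can confirm the local provider is reachable before routing a task to it. This is a minimal sketch assuming Ollama's default endpoint on localhost:11434; adjust the URL for lmstudio or a non-default port.

```shell
# Hedged sketch: probe the local provider before choosing the local tier.
# Assumes Ollama's default endpoint http://localhost:11434; adjust as needed.
check_local_provider() {
  if curl -sf --max-time 2 "http://localhost:11434/api/tags" > /dev/null 2>&1; then
    echo "local provider reachable"
  else
    echo "local provider unreachable; consider the hosted tier"
  fi
}
check_local_provider
```

The function exits cleanly either way, so it can gate a routing script without aborting it.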

Steps to route Codex tasks to local or hosted models:

  1. Classify the task as local or hosted.

    Local fits prompts containing secrets, customer data, or internal identifiers
    Hosted fits deep reasoning, long-context synthesis, and cross-checking outputs
    Hosted prompts should be redacted when the original contains sensitive data
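
The classification rule above can be sketched as a small pre-flight check. The patterns below are illustrative assumptions, not a complete secret scanner; extend them to match your own key formats and internal identifiers.

```shell
# Hedged sketch: classify a prompt as local or hosted by pattern match.
# The regex is an illustrative assumption -- extend it for your environment.
classify_prompt() {
  prompt="$1"
  if printf '%s' "$prompt" | grep -Eqi 'api[_-]?key|password|customer|internal-'; then
    echo "local"
  else
    echo "hosted"
  fi
}
classify_prompt "Rotate the api_key for the staging cluster"   # -> local
classify_prompt "Compare two approaches to cache invalidation" # -> hosted
```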

  2. Run the prompt on a local model with the last assistant message saved to a file.
    $ codex exec --oss --local-provider ollama -m llama3.2 --output-last-message /tmp/codex-local.txt "Summarize this incident update with risks: Login latency increased after a cache change, mitigation was applied, monitoring shows recovery, background jobs are still draining."
    Login latency spiked after the cache change and mitigation is in place, with monitoring showing recovery. Residual risk remains until the background jobs drain and retry rates normalize.

    --output-last-message writes only the final assistant response to /tmp/codex-local.txt.

  3. Run the same prompt on a hosted model with the last assistant message saved to a file.
    $ codex exec -m gpt-5.2-codex --output-last-message /tmp/codex-hosted.txt "Summarize this incident update with risks: Login latency increased after a cache change, mitigation was applied, monitoring shows recovery, background jobs are still draining."
    Login latency increased after a cache-related change and mitigation is in place, with dashboards showing recovery. Residual risk remains while background jobs drain, which can trigger retries and secondary latency spikes, so monitor error rates and queue depth until stable.

    Hosted runs send prompt text to remote infrastructure, so redact secrets and regulated data before using a hosted model.
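
Redaction can be scripted as a filter applied to the prompt before the hosted run. This is a minimal sketch with two assumed patterns (email addresses and api_key assignments); tailor the expressions to the identifiers you actually handle.

```shell
# Hedged sketch: strip common sensitive patterns before a hosted run.
# The two regexes are assumptions -- extend them for your data formats.
redact() {
  sed -E \
    -e 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/<EMAIL>/g' \
    -e 's/(api[_-]?key[[:space:]]*[:=][[:space:]]*)[^[:space:]]+/\1<REDACTED>/g'
}
echo "Ping alice@example.com and rotate api_key=abc123" | redact
# -> Ping <EMAIL> and rotate api_key=<REDACTED>
```

Piping the prompt through the filter before interpolating it into the hosted command keeps the raw text out of remote infrastructure.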

  4. Preview the saved responses before comparing them.
    $ head -n 2 /tmp/codex-local.txt
    Login latency spiked after the cache change and mitigation is in place, with monitoring showing recovery. Residual risk remains until the background jobs drain and retry rates normalize.
    $ head -n 2 /tmp/codex-hosted.txt
    Login latency increased after a cache-related change and mitigation is in place, with dashboards showing recovery. Residual risk remains while background jobs drain, which can trigger retries and secondary latency spikes, so monitor error rates and queue depth until stable.

  5. Compare the saved answers to decide whether the local response is sufficient.
    $ diff -u /tmp/codex-local.txt /tmp/codex-hosted.txt
    --- /tmp/codex-local.txt
    +++ /tmp/codex-hosted.txt
    @@ -1 +1 @@
    -Login latency spiked after the cache change and mitigation is in place, with monitoring showing recovery. Residual risk remains until the background jobs drain and retry rates normalize.
    +Login latency increased after a cache-related change and mitigation is in place, with dashboards showing recovery. Residual risk remains while background jobs drain, which can trigger retries and secondary latency spikes, so monitor error rates and queue depth until stable.

    Differences in wording are normal, so focus on missing risks, unclear next actions, or incorrect assumptions when deciding which output to keep.
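
As a rough aid for that comparison, a keyword tally can flag which saved answer surfaces more risk-related terms. This is a minimal sketch with an assumed keyword list and hypothetical demo files; it supplements, rather than replaces, reading both outputs.

```shell
# Hedged sketch: count risk-related terms in each saved answer.
# The keyword list is an assumption -- adjust it to your incident vocabulary.
risk_hits() {
  grep -oiE 'risk|retry|retries|latency|queue|error' "$1" | wc -l
}

# Hypothetical demo files standing in for the saved responses.
printf 'Residual risk remains; watch retry rates.\n' > /tmp/local-demo.txt
printf 'Risk remains; monitor error rates and queue depth until retries settle.\n' > /tmp/hosted-demo.txt

echo "local mentions:  $(risk_hits /tmp/local-demo.txt)"
echo "hosted mentions: $(risk_hits /tmp/hosted-demo.txt)"
```

A higher tally in the hosted answer is a hint, not a verdict; the local response may still be sufficient if the extra terms add no new risks or actions.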