# Tune LM Studio model and server settings for ApexSpriteAI

> Tune LM Studio model settings including context window size, temperature, and server binding to optimize ApexSpriteAI performance for your workload.

LM Studio acts as the AI engine for ApexSpriteAI. It loads the language model you choose, exposes a local HTTP server, and accepts requests in the Anthropic messages format. Before Claude Code can make useful requests, you need to configure a handful of server and model parameters in LM Studio. This page covers each setting, explains why it matters, and shows you how to test your configuration end-to-end.

## Key settings overview

| Setting        | Recommended value        | Where to change            |
| -------------- | ------------------------ | -------------------------- |
| Server port    | `1234`                   | LM Studio → Local Server   |
| Bind address   | `0.0.0.0`                | LM Studio → Local Server   |
| Context window | 32,000 – 64,000 tokens   | LM Studio → Model Settings |
| Temperature    | `0.2` – `0.4` for coding | LM Studio → Model Settings |

## Server port

LM Studio's default server port is `1234`. ApexSpriteAI expects this port unless you override `ANTHROPIC_BASE_URL` in your config. If you change the port, update `ANTHROPIC_BASE_URL` to match.
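If you move LM Studio to a different port, the base URL has to move with it. The lines below are a minimal sketch, assuming a hypothetical port `8080` and that `ANTHROPIC_BASE_URL` is set as a shell environment variable; your ApexSpriteAI config may set it in a different place.

```bash
# Hypothetical non-default port; use whatever you entered in LM Studio.
LMSTUDIO_PORT=8080

# Keep the base URL in sync with the server port.
export ANTHROPIC_BASE_URL="http://localhost:${LMSTUDIO_PORT}"

echo "$ANTHROPIC_BASE_URL"   # prints http://localhost:8080
```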

## Bind address

Set the bind address to `0.0.0.0` so that LM Studio listens on every network interface, including Tailscale. If you leave it at `127.0.0.1`, only processes on the same machine can reach the server. Because `0.0.0.0` also exposes the server on any other network the machine is connected to, use it only on networks you trust. See [Network configuration](/configuration/networking) for more detail on when this matters.
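One way to confirm the bind address is working is to probe the server from a different machine. The helpers below are a sketch under stated assumptions: `lmstudio_url` and `check_lmstudio` are hypothetical names, `100.64.0.10` is a placeholder address, and the probe assumes LM Studio's OpenAI-compatible `/v1/models` listing endpoint is available on the same port.

```bash
# Print the URL of an LM Studio endpoint for a given host.
# Usage: lmstudio_url HOST [PORT] [ENDPOINT]
lmstudio_url() {
  printf 'http://%s:%s%s\n' "$1" "${2:-1234}" "${3:-/v1/models}"
}

# Print the HTTP status returned by the server, or 000 if it is unreachable.
check_lmstudio() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$(lmstudio_url "$@")"
}

# Run from a different machine, replacing the placeholder address:
# check_lmstudio 100.64.0.10
```

A `000` result from a remote machine, combined with a successful response on the LM Studio host itself, usually points at the bind address or a firewall rather than the model.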

## Context window

The context window is the maximum number of tokens that the model can hold in memory at once. A token is a sub-word unit: typically a short word, a piece of a longer word, or a punctuation mark. The window includes your prompt, the conversation history, any tool call results, and the model's reply.
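Because every part of a request shares the same window, it can help to add the parts up. The function below is an illustrative sketch; `context_budget` is a hypothetical helper and the token counts are made-up figures, not measurements.

```bash
# Report whether all parts of a request fit in the context window.
# Usage: context_budget WINDOW PROMPT HISTORY TOOL_RESULTS REPLY
context_budget() {
  used=$(( $2 + $3 + $4 + $5 ))
  free=$(( $1 - used ))
  if [ "$free" -ge 0 ]; then
    echo "fits: ${used} used, ${free} free"
  else
    echo "over: ${used} used, $(( 0 - free )) too many"
  fi
}

context_budget 32000 2000 18000 9000 4096   # over: 33096 used, 1096 too many
context_budget 64000 2000 18000 9000 4096   # fits: 33096 used, 30904 free
```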

### Why context window size matters

A larger context window lets the model read more of your codebase at once, retain longer conversation histories, and process large tool outputs without truncation. However, larger windows consume more GPU memory and increase the time needed to process each token.

### Recommended sizes

For **Qwen2.5-Coder-32B** on a GPU with 128 GB of memory:

* **32,000 tokens** — Fast responses, suitable for most coding sessions.
* **64,000 tokens** — Slower but handles large files and long conversations without truncation.

<Note>
  Increasing the context window beyond what your hardware can comfortably hold causes LM Studio to offload layers to CPU RAM, which significantly reduces throughput. Start at 32k and increase only if you find responses are being cut off.
</Note>
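To get a feel for how memory grows with context length, you can estimate the key-value cache size. This is a rough sketch only: `kv_cache_mib` is a hypothetical helper, and the architecture figures (64 layers, 8 key-value heads, head dimension 128) and the 16-bit cache are assumptions based on the published Qwen2.5 32B configuration. Check them against the model you actually load. The estimate excludes the model weights and compute buffers.

```bash
# Rough KV-cache size in MiB for a given context length.
# Usage: kv_cache_mib TOKENS LAYERS KV_HEADS HEAD_DIM BYTES_PER_VALUE
kv_cache_mib() {
  # Two tensors (keys and values) per layer, per KV head, per token.
  bytes_per_token=$(( 2 * $2 * $3 * $4 * $5 ))
  echo $(( $1 * bytes_per_token / 1048576 ))
}

kv_cache_mib 32000 64 8 128 2   # prints 8000
kv_cache_mib 64000 64 8 128 2   # prints 16000
```

Under these assumptions, doubling the context length doubles the cache, which is why a window that fits at 32k may no longer fit at 64k.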

### How to set it

In LM Studio, load your model and open **Model Settings**. Find **Context Length** and enter your target value. Changes take effect the next time the model is loaded.

## Loading a model

<Steps>
  <Step title="Open the Models tab in LM Studio">
    Use the search bar to find the model you want. ApexSpriteAI works best with models in the 32B–70B range. See [Optimize speed and performance](/troubleshooting/performance) for a full comparison.
  </Step>

  <Step title="Download the model">
    Click **Download**. A 32B model at Q4 quantization is roughly 18–20 GB, so ensure you have sufficient disk space before starting.
  </Step>

  <Step title="Load the model">
    Click **Load** after the download completes. LM Studio allocates GPU memory and displays a green status indicator when the model is ready.
  </Step>

  <Step title="Start the local server">
    Switch to the **Local Server** tab, confirm the port and bind address, then click **Start Server**.
  </Step>
</Steps>

## Testing the server with a direct request

Before connecting Claude Code, verify that LM Studio is responding correctly by sending a test request from your terminal. ApexSpriteAI uses the Anthropic messages format at the `/v1/messages` endpoint.

<CodeGroup>
  ```bash curl
  curl -s http://localhost:1234/v1/messages \
    -H "Content-Type: application/json" \
    -H "x-api-key: local" \
    -d '{
      "model": "local-model",
      "max_tokens": 64,
      "messages": [
        {"role": "user", "content": "Reply with: ready"}
      ]
    }' | python3 -m json.tool
  ```

  ```json expected response shape
  {
    "id": "msg_...",
    "type": "message",
    "role": "assistant",
    "content": [
      {
        "type": "text",
        "text": "ready"
      }
    ],
    "model": "local-model",
    "stop_reason": "end_turn"
  }
  ```
</CodeGroup>

<Tip>
  The `x-api-key: local` header satisfies LM Studio's authentication check without requiring a real Anthropic API key. Claude Code sends this header automatically when `ANTHROPIC_BASE_URL` is set to a local address.
</Tip>

If the response contains a `content` array with your expected text, the server is working and you can proceed to configure Claude Code. If you receive a connection error, check that the server is started and that the bind address and port match your request URL.
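If you want to script this check rather than read the JSON by eye, something like the following could work. It is a sketch: `first_text` is a hypothetical helper that relies on `python3` being installed, and it assumes the response follows the shape shown above.

```bash
# Read an Anthropic-style response on stdin and print its first text block.
# Prints an empty line if the response has no text content.
first_text() {
  python3 -c '
import json, sys
body = json.load(sys.stdin)
blocks = [b for b in body.get("content", []) if b.get("type") == "text"]
print(blocks[0]["text"] if blocks else "")
'
}

# Offline check against the expected response shape.
echo '{"content":[{"type":"text","text":"ready"}]}' | first_text   # prints ready
```

To use it against the live server, pipe the `curl` command from the test request into `first_text` instead of `python3 -m json.tool`.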

## Model format compatibility

LM Studio translates the Anthropic messages format into the prompt template required by the loaded model (for example, Qwen's ChatML template). You do not need to change the request format — Claude Code always sends Anthropic-formatted requests, and LM Studio handles the conversion automatically.
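To make the conversion concrete, the snippet below renders a single user message in ChatML. It is a simplified illustration, not LM Studio's actual implementation: `to_chatml` is a hypothetical helper, and real model templates also handle system prompts, multi-turn history, and tool calls.

```bash
# Render one message as a ChatML turn.
# Usage: to_chatml ROLE CONTENT
to_chatml() {
  printf '<|im_start|>%s\n%s<|im_end|>\n' "$1" "$2"
}

# The user turn from the test request, followed by the opening of the
# assistant turn that the model is asked to complete.
to_chatml user "Reply with: ready"
printf '<|im_start|>assistant\n'
```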
