
LM Studio acts as the AI engine for ApexSpriteAI. It loads the language model you choose, exposes a local HTTP server, and accepts requests in the Anthropic messages format. Before Claude Code can make useful requests, you need to configure a handful of server and model parameters in LM Studio. This page covers each setting, explains why it matters, and shows you how to test your configuration end-to-end.

Key settings overview

| Setting        | Recommended value    | Where to change            |
| -------------- | -------------------- | -------------------------- |
| Server port    | 1234                 | LM Studio → Local Server   |
| Bind address   | 0.0.0.0              | LM Studio → Local Server   |
| Context window | 32,000–64,000 tokens | LM Studio → Model Settings |
| Temperature    | 0.2–0.4 for coding   | LM Studio → Model Settings |

Server port

LM Studio’s default server port is 1234. ApexSpriteAI expects this port unless you override ANTHROPIC_BASE_URL in your config. If you change the port, update ANTHROPIC_BASE_URL to match.
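
For example, if you move the server to port 8080 (a hypothetical choice, shown for illustration), export the matching base URL before launching Claude Code:

# Hypothetical example: LM Studio serving on port 8080 instead of 1234
export ANTHROPIC_BASE_URL=http://localhost:8080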

Bind address

Set the bind address to 0.0.0.0 so that LM Studio accepts requests from any network interface, including Tailscale. If you leave it at 127.0.0.1, only processes on the same machine can reach the server. See Network configuration for more detail on when this matters.
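
To confirm remote reachability before involving Claude Code, you can probe the port from another device on your tailnet. The IP below is a placeholder; substitute your machine's Tailscale address:

# Placeholder Tailscale IP; substitute your own machine's address
curl -si http://100.101.102.103:1234/ | head -n 1

Any HTTP status line in the output (even a 404) means the server is reachable. A connection refused or timeout error points to the bind address, the port, or a firewall.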

Context window

The context window is the maximum number of tokens — roughly words and punctuation marks — that the model can hold in memory at once. It includes your prompt, the conversation history, any tool call results, and the model’s reply.

Why context window size matters

A larger context window lets the model read more of your codebase at once, retain longer conversation histories, and process large tool outputs without truncation. However, larger windows consume more GPU memory and increase the time needed to process each token. For Qwen2.5-Coder-32B on a 128 GB GPU:
  • 32,000 tokens — Fast responses, suitable for most coding sessions.
  • 64,000 tokens — Slower but handles large files and long conversations without truncation.
Increasing the context window beyond what your hardware can comfortably hold causes LM Studio to offload layers to CPU RAM, which significantly reduces throughput. Start at 32k and increase only if you find responses are being cut off.
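
As a back-of-the-envelope check on memory cost, you can estimate the size of the KV cache alone. The arithmetic below assumes Qwen2.5-Coder-32B's published shape (64 layers, 8 KV heads, head dimension 128) and an FP16 cache; adjust for your model and quantization:

# KV cache per token: 2 (K and V) x 64 layers x 8 KV heads x 128 head dim x 2 bytes
echo $(( 2 * 64 * 8 * 128 * 2 ))          # 262144 bytes, about 0.25 MB per token
echo $(( 2 * 64 * 8 * 128 * 2 * 32000 )) # about 8.4 GB at a 32k window, on top of the weights

By this estimate, doubling the window to 64k roughly doubles the cache to about 17 GB, which is why the jump is noticeable even on large GPUs.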

How to set it

In LM Studio, load your model and open Model Settings. Find Context Length and enter your target value. Changes take effect the next time the model is loaded.

Loading a model

1. Open the Models tab in LM Studio
   Use the search bar to find the model you want. ApexSpriteAI works best with models in the 32B–70B range. See Optimize speed and performance for a full comparison.

2. Download the model
   Click Download. Large models (32B at Q4 quantization) are roughly 18–20 GB. Ensure you have sufficient disk space before starting.

3. Load the model
   Click Load after the download completes. LM Studio allocates GPU memory and displays a green status indicator when the model is ready.

4. Start the local server
   Switch to the Local Server tab, confirm the port and bind address, then click Start Server.

Testing the server with a direct request

Before connecting Claude Code, verify that LM Studio is responding correctly by sending a test request from your terminal. ApexSpriteAI uses the Anthropic messages format at the /v1/messages endpoint.
curl -s http://localhost:1234/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: local" \
  -d '{
    "model": "local-model",
    "max_tokens": 64,
    "messages": [
      {"role": "user", "content": "Reply with: ready"}
    ]
  }' | python3 -m json.tool
The x-api-key: local header satisfies LM Studio’s authentication check without requiring a real Anthropic API key. Claude Code sends this header automatically when ANTHROPIC_BASE_URL is set to a local address.
If the response contains a content array with your expected text, the server is working and you can proceed to configure Claude Code. If you receive a connection error, check that the server is started and that the bind address and port match your request URL.
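
A successful reply looks roughly like this (abridged; IDs, token counts, and other fields will vary):

{
    "id": "msg_...",
    "type": "message",
    "role": "assistant",
    "content": [
        {"type": "text", "text": "ready"}
    ],
    "stop_reason": "end_turn"
}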

Model format compatibility

LM Studio translates the Anthropic messages format into the prompt template required by the loaded model (for example, Qwen’s ChatML template). You do not need to change the request format — Claude Code always sends Anthropic-formatted requests, and LM Studio handles the conversion automatically.
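
As a sketch of what that conversion produces, the single-turn test request above would render into Qwen's ChatML template roughly like this (simplified; the exact system prompt and special tokens depend on the loaded model's template):

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Reply with: ready<|im_end|>
<|im_start|>assistant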