
ApexSpriteAI pairs the Claude Code CLI with a locally hosted LLM to give you a fast, private AI coding assistant that supports the full Model Context Protocol (MCP) tool system. This guide walks you through every step — from installing LM Studio to sending your first query — and takes about 15 minutes to complete.
1. Install LM Studio

Download and install LM Studio on the machine that has your GPU. LM Studio provides a graphical interface for downloading models and running a local OpenAI-compatible API server. Once installed, open LM Studio and sign in (or skip sign-in). You should land on the Discover tab, where you can search for models.
LM Studio 0.4.1 or later is required. Earlier versions may not support the /v1/messages endpoint that Claude Code uses.
2. Download the Qwen2.5-Coder-32B model

In LM Studio’s Discover tab, search for Qwen2.5-Coder-32B-Instruct and download a quantized GGUF variant (Q4_K_M is a good balance of speed and quality). This model is the recommended choice for ApexSpriteAI because it:
  • Runs at low latency on 32 GB or more of GPU or unified RAM
  • Matches Claude 3.5 Sonnet on many coding benchmarks
  • Reliably follows the tool-calling format that MCP depends on
If you have 128 GB of unified RAM, you can also try Llama-3.3-70B-Instruct for stronger general reasoning. For the fastest responses on modest hardware, DeepSeek-Coder-V2-Lite-Instruct (16B) is a lightweight alternative.
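As a rough sanity check on whether a model will fit in memory before downloading it, you can estimate the quantized file size from the parameter count. This is a back-of-envelope sketch, not an official formula: it assumes Q4_K_M averages roughly 4.85 bits per weight and ignores KV cache and runtime overhead.

```shell
# Estimate a quantized model's size in GB from its parameter count
# (in billions) and bits per weight. ~4.85 bpw for Q4_K_M is an
# approximate figure; actual file sizes vary slightly.
est_model_gb() {
  awk -v p="$1" -v bpw="$2" 'BEGIN { printf "%.1f", p * bpw / 8 }'
}

est_model_gb 32 4.85   # ~19.4 GB for a 32B model at Q4_K_M
```

On a 32 GB GPU this leaves headroom for context, but note that larger context windows add KV-cache memory on top of the weights.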
3. Start the local server on port 1234

In LM Studio, switch to the Developer tab (the </> icon in the left sidebar). Load your downloaded model using the model selector at the top, then click Start Server. LM Studio will bind to 0.0.0.0:1234 by default, making the server reachable from other machines on your network (including over Tailscale). Confirm the server is running by checking for the green status indicator and a log entry like:
Server listening on http://0.0.0.0:1234
You can verify the server is reachable from your Mac with a quick connectivity check:
nc -vz 100.x.x.x 1234
Replace 100.x.x.x with your server’s Tailscale IP. You should see Connection to ... succeeded.
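Beyond a raw TCP check, you can also confirm the API itself is answering. The sketch below wraps both checks in one function; check_lm_server is a hypothetical helper written for this guide, and /v1/models is LM Studio's standard OpenAI-compatible model-listing endpoint.

```shell
# Check that the LM Studio server is reachable, then list its loaded models.
# check_lm_server is a hypothetical helper, not part of LM Studio.
check_lm_server() {
  host="$1"
  port="${2:-1234}"
  if nc -z "$host" "$port" 2>/dev/null; then
    # OpenAI-compatible endpoint served by LM Studio
    curl -s "http://$host:$port/v1/models"
  else
    echo "cannot reach $host:$port"
    return 1
  fi
}

# Usage (substitute your server's Tailscale IP):
#   check_lm_server 100.x.x.x
```

If the JSON response lists your downloaded model, both the network path and the server are working.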
4. Install Claude Code CLI

On your Mac (or local workstation), install the Claude Code CLI globally using npm:
npm install -g @anthropic-ai/claude-code
Confirm the installation succeeded:
claude --version
Node.js 18 or later is required. Run node --version to check. If you need to upgrade, use nvm or download the latest LTS release from nodejs.org.
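The version requirement above can be checked in a script before installing. This is a small sketch; node_major is a hypothetical helper that parses the output of node --version.

```shell
# Extract the major version from a `node --version` string (e.g. "v20.11.1").
# node_major is a hypothetical helper written for this guide.
node_major() {
  v="${1#v}"        # strip the leading "v"
  echo "${v%%.*}"   # keep the digits before the first dot
}

# Fall back to "v0" if node is not installed at all.
if [ "$(node_major "$(node --version 2>/dev/null || echo v0)")" -ge 18 ]; then
  echo "Node.js is new enough for Claude Code"
else
  echo "Upgrade Node.js to 18 or later (nvm or nodejs.org)"
fi
```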
5. Configure Claude Code to use your local LM Studio server

Create (or edit) the Claude Code configuration file at ~/.claude/config.json. This tells Claude Code to send requests to your LM Studio server instead of Anthropic’s cloud API.
~/.claude/config.json
{
  "ANTHROPIC_BASE_URL": "http://100.x.x.x:1234",
  "ANTHROPIC_API_KEY": "lm-studio"
}
Replace 100.x.x.x with the actual Tailscale IP of your LM Studio server. The ANTHROPIC_API_KEY value is arbitrary — LM Studio does not validate it — but the field must be present.
If you are running LM Studio on the same machine as Claude Code, use http://localhost:1234 as the base URL instead.
You can also set these values as environment variables if you prefer not to store them in a config file: export ANTHROPIC_BASE_URL=http://100.x.x.x:1234 and export ANTHROPIC_API_KEY=lm-studio.
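If you script your machine setup, the config file can be generated in one step. The sketch below is a hypothetical helper (write_claude_config is not part of Claude Code); the two JSON keys are the ones this guide uses.

```shell
# Write a minimal Claude Code config pointing at a local LM Studio server.
# write_claude_config is a hypothetical helper written for this guide.
write_claude_config() {
  target="$1"
  base_url="$2"
  mkdir -p "$(dirname "$target")"
  cat > "$target" <<EOF
{
  "ANTHROPIC_BASE_URL": "$base_url",
  "ANTHROPIC_API_KEY": "lm-studio"
}
EOF
}

# Usage: write_claude_config ~/.claude/config.json http://localhost:1234
```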
6. Run your first query

Open a terminal in any project directory and run:
claude "Explain what this project does and suggest one improvement"
Claude Code reads your local files, formats the request with available MCP tools, and sends it to LM Studio. You should see a response stream back within a few seconds. For an interactive session where you can ask follow-up questions, run claude with no arguments:
claude
If you see a response, your setup is complete. Claude Code is now running fully locally with MCP tool support.

Troubleshooting

Check that ANTHROPIC_API_KEY is present in ~/.claude/config.json. The value can be anything — LM Studio ignores it — but omitting the key causes Claude Code to reject the configuration.
Confirm LM Studio’s server is running (green indicator in the Developer tab). If you are connecting over Tailscale, verify both machines appear as connected peers with tailscale status and that you are using the correct 100.x.x.x IP.
Switch to a smaller model. The 32B Qwen model is the recommended starting point. If you loaded a 70B or 120B model, latency will be significantly higher. See the model selection guide for a detailed comparison.
MCP tools run locally on your Mac through the Claude Code CLI, regardless of where the LLM is hosted. Run claude mcp list to see which tools are registered. If the list is empty, add tools with claude mcp add <name> <command>.