ApexSpriteAI connects four distinct components to deliver a private, GPU-accelerated AI coding assistant. Your local Claude Code CLI handles user interaction and tool execution, a Tailscale VPN tunnel carries requests securely to your server, LM Studio runs the language model on dedicated GPU hardware, and MCP tools extend the agent’s capabilities by letting the model read files, run commands, and call custom integrations — all without sending any data to the cloud.

Documentation Index
Fetch the complete documentation index at: https://reliatrack.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
System components
Frontend: Claude Code CLI
Runs on your Mac. Accepts your input, renders the terminal UI, formats requests in the Anthropic messages format, and executes MCP tools locally when the model calls them.
Network layer: Tailscale VPN
Creates a secure, peer-to-peer encrypted tunnel between your Mac and your GPU server. API requests travel over a private
100.x.x.x address — your LM Studio server is never exposed to the public internet.

Backend AI engine: LM Studio
Runs on your NVIDIA GPU server with 128 GB of unified RAM. Hosts an OpenAI-compatible API on port 1234 and processes every inference request locally using the loaded model.
Tool execution: MCP
When the model decides to use a tool (for example, reading a file or running a shell command), Claude Code executes it on your Mac and injects the result back into the conversation context.
Component details
Frontend: Claude Code CLI
The Claude Code CLI (@anthropic-ai/claude-code) is the entry point for every interaction. It runs on your local Mac and is responsible for:
- Accepting your natural-language prompts from the terminal
- Assembling the full message payload, including the list of available MCP tools
- Sending HTTP POST requests to the /v1/messages endpoint on your LM Studio server
- Parsing the model’s response and executing any tool calls it contains
- Displaying the final answer in your terminal
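The payload Claude Code assembles looks roughly like the following sketch. The message and tool shapes follow Anthropic’s Messages API; the model name and the read_file tool definition are illustrative, not taken from Claude Code’s internals.

```python
# Illustrative payload in the Anthropic messages format; field values are examples.
def build_payload(prompt: str, tools: list[dict]) -> dict:
    return {
        "model": "qwen2.5-coder-32b-instruct",  # whatever model LM Studio has loaded
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
    }

# A hypothetical MCP tool definition, expressed as a JSON Schema input spec.
read_file_tool = {
    "name": "read_file",
    "description": "Read a file from the local filesystem",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

payload = build_payload("Add a new endpoint to my Express app", [read_file_tool])
```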
Claude Code is configured through ~/.claude/config.json. Setting ANTHROPIC_BASE_URL to your server’s Tailscale address is all that is required to redirect traffic away from Anthropic’s cloud.
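A minimal sketch of that file. The exact schema can vary between Claude Code versions, so treat the key names as illustrative; 100.82.56.40 is the example Tailscale address used throughout this guide.

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://100.82.56.40:1234"
  }
}
```

If the config-file approach does not work on your version, exporting ANTHROPIC_BASE_URL as an environment variable in your shell profile has the same effect.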
Network layer: Tailscale VPN
Tailscale creates a private mesh network between your Mac and your GPU server. Each device gets a stable 100.x.x.x IP address that persists across restarts and network changes. For example, your GPU server might be assigned 100.82.56.40 — replace this with whatever address tailscale ip -4 reports on your server.
Key properties of the Tailscale layer:
- Encrypted in transit. All traffic between your Mac and the server uses WireGuard encryption.
- No public exposure. LM Studio binds to 0.0.0.0:1234, but only Tailscale peers can reach that address.
- Zero configuration routing. Once both devices are on the same Tailscale network, no additional firewall rules or port forwarding are required.
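As a quick sanity check: Tailscale assigns peer addresses from the 100.64.0.0/10 CGNAT block, which you can verify with Python’s standard ipaddress module (100.82.56.40 is the example server address from this guide):

```python
import ipaddress

# Tailscale assigns peer addresses from the 100.64.0.0/10 CGNAT range.
TAILSCALE_BLOCK = ipaddress.ip_network("100.64.0.0/10")

def is_tailscale_address(addr: str) -> bool:
    """Return True if addr falls inside Tailscale's address block."""
    return ipaddress.ip_address(addr) in TAILSCALE_BLOCK

print(is_tailscale_address("100.82.56.40"))  # the example server address
print(is_tailscale_address("192.168.1.10")) # an ordinary LAN address
```

If is_tailscale_address returns False for the address you configured, Claude Code is not pointing at a Tailscale peer and your traffic may be taking a different route than you expect.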
Backend AI engine: LM Studio
LM Studio runs on your NVIDIA GPU server and hosts a local API that speaks the same format as Anthropic’s Messages API. Claude Code sends requests to it without any modification.

| Property | Value |
|---|---|
| API port | 1234 |
| Bind address | 0.0.0.0 (all interfaces) |
| API format | Anthropic /v1/messages |
| Minimum version | LM Studio 0.4.1 |
| Recommended model | Qwen2.5-Coder-32B-Instruct |
Qwen2.5-Coder-32B-Instruct is state-of-the-art for coding and tool use at the 32B-parameter scale and delivers low-latency responses on 128 GB of unified RAM. You can also run Llama-3.3-70B-Instruct for deeper reasoning, or DeepSeek-Coder-V2-Lite-Instruct (16B) for maximum speed.
Tool execution: MCP
The Model Context Protocol (MCP) lets the language model declare that it wants to use a tool rather than generating a plain-text answer. The tool call itself is just a JSON object in the model’s response — the actual execution happens locally on your Mac through Claude Code. This design means:
- Tool execution is always local. No matter where the model is hosted, your files and shell stay on your machine.
- Adding tools is straightforward. Run claude mcp add <name> <command> on your Mac to register any MCP-compatible tool.
- The model’s location does not limit tool access. LM Studio on a remote server returns the tool-call JSON; Claude Code on your Mac carries out the action.
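To make the division of labor concrete, here is a minimal sketch of the dispatch step the host performs: the model’s response contains a tool_use content block, and the host resolves it against locally registered handlers. The registry and handler are illustrative; Claude Code’s real internals differ, but the tool_use and tool_result shapes follow Anthropic’s Messages API.

```python
# Hypothetical local tool registry; Claude Code's real internals differ.
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

LOCAL_TOOLS = {"read_file": read_file}

def execute_tool_call(block: dict) -> dict:
    """Run one tool_use block locally and wrap the output as a tool_result."""
    handler = LOCAL_TOOLS[block["name"]]
    output = handler(**block["input"])
    return {
        "type": "tool_result",
        "tool_use_id": block["id"],  # ties the result back to the model's request
        "content": output,
    }
```

The key point the code makes visible: the model only ever produces the input dict (`{"path": ...}`); the open() call runs on your Mac.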
Data flow walkthrough
The following sequence shows exactly what happens when you type a prompt into Claude Code.

You enter a prompt
You run claude "Add a new endpoint to my Express app" in your terminal. Claude Code reads your project files for context and compiles the list of registered MCP tools.

Claude Code sends a request to LM Studio
Claude Code formats your prompt plus the tool definitions into an Anthropic-style messages payload and sends an HTTP POST to http://100.82.56.40:1234/v1/messages over the Tailscale tunnel.

LM Studio generates a response
LM Studio passes the request to the loaded model (for example, Qwen2.5-Coder-32B). The model generates a response — either a plain-text answer or a tool-call JSON object — and LM Studio returns it to Claude Code.

Claude Code checks for tool calls
If the response contains a tool call (for example, read_file on a source file), Claude Code executes that tool locally on your Mac. The result — the file contents, command output, or other data — is injected back into the conversation as a new message.

Architecture diagram
The MCP tool execution step (between Claude Code and MCP tools) happens entirely on your Mac, even though the model that requested the tool is running on a remote server. The model never has direct access to your filesystem or shell.
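The full request loop described above can be condensed into a short sketch. Here fake_model stands in for the HTTP round trip to LM Studio so the control flow is visible; a real client would POST each messages payload to http://&lt;tailscale-ip&gt;:1234/v1/messages instead. The content-block shapes follow Anthropic’s Messages API; everything else is illustrative.

```python
# Condensed agent loop: send, check for tool calls, execute locally, repeat.
# fake_model stands in for the HTTP round trip to LM Studio.

def fake_model(messages: list[dict]) -> dict:
    # First turn: request a file read; once tool results are present, answer.
    has_tool_results = any(
        m["role"] == "user" and isinstance(m["content"], list) for m in messages
    )
    if not has_tool_results:
        return {"stop_reason": "tool_use",
                "content": [{"type": "tool_use", "id": "t1",
                             "name": "read_file", "input": {"path": "app.js"}}]}
    return {"stop_reason": "end_turn",
            "content": [{"type": "text", "text": "Added the endpoint."}]}

def run_tool(block: dict) -> str:
    # Local execution step; a real host dispatches to registered MCP tools.
    return f"<contents of {block['input']['path']}>"

def agent_loop(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    while True:
        response = fake_model(messages)
        if response["stop_reason"] != "tool_use":
            return response["content"][0]["text"]
        # Append the assistant turn, then feed tool results back as a user turn.
        messages.append({"role": "assistant", "content": response["content"]})
        results = [{"type": "tool_result", "tool_use_id": b["id"],
                    "content": run_tool(b)}
                   for b in response["content"] if b["type"] == "tool_use"]
        messages.append({"role": "user", "content": results})

print(agent_loop("Add a new endpoint to my Express app"))
```

Note that run_tool is the only place local state is touched; the model, wherever it runs, only ever sees the strings placed into the conversation.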