ApexSpriteAI connects four distinct components to deliver a private, GPU-accelerated AI coding assistant. Your local Claude Code CLI handles user interaction and tool execution, a Tailscale VPN tunnel carries requests securely to your server, LM Studio runs the language model on dedicated GPU hardware, and MCP tools extend the agent’s capabilities by letting the model read files, run commands, and call custom integrations — all without sending any data to the cloud.

Documentation Index
Fetch the complete documentation index at: https://reliatrack.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
System components
Frontend: Claude Code CLI
Runs on your Mac. Accepts your input, renders the terminal UI, formats requests in the Anthropic messages format, and executes MCP tools locally when the model calls them.
Network layer: Tailscale VPN
Creates a secure, peer-to-peer encrypted tunnel between your Mac and your GPU server. API requests travel over a private
100.x.x.x address — your LM Studio server is never exposed to the public internet.

Backend AI engine: LM Studio
Runs on your NVIDIA GPU server with 128 GB of unified RAM. Hosts an OpenAI-compatible API on port 1234 and processes every inference request locally using the loaded model.
Tool execution: MCP
When the model decides to use a tool (for example, reading a file or running a shell command), Claude Code executes it on your Mac and injects the result back into the conversation context.
Component details
Frontend: Claude Code CLI
The Claude Code CLI (@anthropic-ai/claude-code) is the entry point for every interaction. It runs on your local Mac and is responsible for:
- Accepting your natural-language prompts from the terminal
- Assembling the full message payload, including the list of available MCP tools
- Sending HTTP POST requests to the /v1/messages endpoint on your LM Studio server
- Parsing the model’s response and executing any tool calls it contains
- Displaying the final answer in your terminal
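The payload Claude Code assembles looks roughly like the following sketch. The message and tool shapes follow Anthropic’s Messages API; the model name and the read_file tool definition are illustrative, not taken from Claude Code’s internals.

```python
# Illustrative payload in the Anthropic messages format; field values are examples.
def build_payload(prompt: str, tools: list[dict]) -> dict:
    return {
        "model": "qwen2.5-coder-32b-instruct",  # whatever model LM Studio has loaded
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
    }

# A hypothetical MCP tool definition, expressed as a JSON Schema input spec.
read_file_tool = {
    "name": "read_file",
    "description": "Read a file from the local filesystem",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

payload = build_payload("Add a new endpoint to my Express app", [read_file_tool])
```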
Claude Code is configured through ~/.claude/config.json. Setting ANTHROPIC_BASE_URL to your server’s Tailscale address is all that is required to redirect traffic away from Anthropic’s cloud.
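A minimal sketch of that file. The exact schema can vary between Claude Code versions, so treat the key names as illustrative; 100.82.56.40 is the example Tailscale address used throughout this guide.

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://100.82.56.40:1234"
  }
}
```

If the config-file approach does not work on your version, exporting ANTHROPIC_BASE_URL as an environment variable in your shell profile has the same effect.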
Network layer: Tailscale VPN
Tailscale creates a private mesh network between your Mac and your GPU server. Each device gets a stable 100.x.x.x IP address that persists across restarts and network changes. For example, your GPU server might be assigned 100.82.56.40 — replace this with whatever address tailscale ip -4 reports on your server.
Key properties of the Tailscale layer:
- Encrypted in transit. All traffic between your Mac and the server uses WireGuard encryption.
- No public exposure. LM Studio binds to 0.0.0.0:1234, but only Tailscale peers can reach that address.
- Zero configuration routing. Once both devices are on the same Tailscale network, no additional firewall rules or port forwarding are required.
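As a quick sanity check: Tailscale assigns peer addresses from the 100.64.0.0/10 CGNAT block, which you can verify with Python’s standard ipaddress module (100.82.56.40 is the example server address from this guide):

```python
import ipaddress

# Tailscale assigns peer addresses from the 100.64.0.0/10 CGNAT range.
TAILSCALE_BLOCK = ipaddress.ip_network("100.64.0.0/10")

def is_tailscale_address(addr: str) -> bool:
    """Return True if addr falls inside Tailscale's address block."""
    return ipaddress.ip_address(addr) in TAILSCALE_BLOCK

print(is_tailscale_address("100.82.56.40"))  # the example server address
print(is_tailscale_address("192.168.1.10")) # an ordinary LAN address
```

If is_tailscale_address returns False for the address you configured, Claude Code is not pointing at a Tailscale peer and your traffic may be taking a different route than you expect.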
Backend AI engine: LM Studio
LM Studio runs on your NVIDIA GPU server and hosts a local API that speaks the same format as Anthropic’s Messages API. Claude Code sends requests to it without any modification.

| Property | Value |
|---|---|
| API port | 1234 |
| Bind address | 0.0.0.0 (all interfaces) |
| API format | Anthropic /v1/messages |
| Minimum version | LM Studio 0.4.1 |
| Recommended model | Qwen2.5-Coder-32B-Instruct |
Qwen2.5-Coder-32B-Instruct is state-of-the-art for coding and tool use at the 32B-parameter scale and delivers low-latency responses on 128 GB of unified RAM. You can also run Llama-3.3-70B-Instruct for deeper reasoning, or DeepSeek-Coder-V2-Lite-Instruct (16B) for maximum speed.
Tool execution: MCP
The Model Context Protocol (MCP) lets the language model declare that it wants to use a tool rather than generating a plain-text answer. The tool call itself is just a JSON object in the model’s response — the actual execution happens locally on your Mac through Claude Code. This design means:
- Tool execution is always local. No matter where the model is hosted, your files and shell stay on your machine.
- Adding tools is straightforward. Run claude mcp add <name> <command> on your Mac to register any MCP-compatible tool.
- The model’s location does not limit tool access. LM Studio on a remote server returns the tool-call JSON; Claude Code on your Mac carries out the action.
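To make the division of labor concrete, here is a minimal sketch of the dispatch step the host performs: the model’s response contains a tool_use content block, and the host resolves it against locally registered handlers. The registry and handler are illustrative; Claude Code’s real internals differ, but the tool_use and tool_result shapes follow Anthropic’s Messages API.

```python
# Hypothetical local tool registry; Claude Code's real internals differ.
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

LOCAL_TOOLS = {"read_file": read_file}

def execute_tool_call(block: dict) -> dict:
    """Run one tool_use block locally and wrap the output as a tool_result."""
    handler = LOCAL_TOOLS[block["name"]]
    output = handler(**block["input"])
    return {
        "type": "tool_result",
        "tool_use_id": block["id"],  # ties the result back to the model's request
        "content": output,
    }
```

The key point the code makes visible: the model only ever produces the input dict (`{"path": ...}`); the open() call runs on your Mac.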
Data flow walkthrough
The following sequence shows exactly what happens when you type a prompt into Claude Code.

You enter a prompt
You run claude "Add a new endpoint to my Express app" in your terminal. Claude Code reads your project files for context and compiles the list of registered MCP tools.

Claude Code sends a request to LM Studio
Claude Code formats your prompt plus the tool definitions into an Anthropic-style messages payload and sends an HTTP POST to http://100.82.56.40:1234/v1/messages over the Tailscale tunnel.

LM Studio generates a response
LM Studio passes the request to the loaded model (for example, Qwen2.5-Coder-32B). The model generates a response — either a plain-text answer or a tool-call JSON object — and LM Studio returns it to Claude Code.

Claude Code checks for tool calls
If the response contains a tool call (for example, read_file on a source file), Claude Code executes that tool locally on your Mac. The result — the file contents, command output, or other data — is injected back into the conversation as a new message.

Architecture diagram
The MCP tool execution step (between Claude Code and MCP tools) happens entirely on your Mac, even though the model that requested the tool is running on a remote server. The model never has direct access to your filesystem or shell.
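The full request loop described above can be condensed into a short sketch. Here fake_model stands in for the HTTP round trip to LM Studio so the control flow is visible; a real client would POST each messages payload to http://&lt;tailscale-ip&gt;:1234/v1/messages instead. The content-block shapes follow Anthropic’s Messages API; everything else is illustrative.

```python
# Condensed agent loop: send, check for tool calls, execute locally, repeat.
# fake_model stands in for the HTTP round trip to LM Studio.

def fake_model(messages: list[dict]) -> dict:
    # First turn: request a file read; once tool results are present, answer.
    has_tool_results = any(
        m["role"] == "user" and isinstance(m["content"], list) for m in messages
    )
    if not has_tool_results:
        return {"stop_reason": "tool_use",
                "content": [{"type": "tool_use", "id": "t1",
                             "name": "read_file", "input": {"path": "app.js"}}]}
    return {"stop_reason": "end_turn",
            "content": [{"type": "text", "text": "Added the endpoint."}]}

def run_tool(block: dict) -> str:
    # Local execution step; a real host dispatches to registered MCP tools.
    return f"<contents of {block['input']['path']}>"

def agent_loop(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    while True:
        response = fake_model(messages)
        if response["stop_reason"] != "tool_use":
            return response["content"][0]["text"]
        # Append the assistant turn, then feed tool results back as a user turn.
        messages.append({"role": "assistant", "content": response["content"]})
        results = [{"type": "tool_result", "tool_use_id": b["id"],
                    "content": run_tool(b)}
                   for b in response["content"] if b["type"] == "tool_use"]
        messages.append({"role": "user", "content": results})

print(agent_loop("Add a new endpoint to my Express app"))
```

Note that run_tool is the only place local state is touched; the model, wherever it runs, only ever sees the strings placed into the conversation.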