> ## Documentation Index
> Fetch the complete documentation index at: https://docs-apexspriteai.reliatrack.org/llms.txt
> Use this file to discover all available pages before exploring further.

# How ApexSpriteAI works: a system architecture guide

> Learn how ApexSpriteAI's four components — CLI, VPN, LM Studio, and MCP tools — work together to deliver private, GPU-accelerated AI agent workflows.

ApexSpriteAI connects four distinct components to deliver a private, GPU-accelerated AI coding assistant. Your local Claude Code CLI handles user interaction and tool execution, a Tailscale VPN tunnel carries requests securely to your server, LM Studio runs the language model on dedicated GPU hardware, and MCP tools extend the agent's capabilities by letting the model read files, run commands, and call custom integrations — all without sending any data to the cloud.

## System components

<CardGroup cols={2}>
  <Card title="Frontend: Claude Code CLI" icon="terminal">
    Runs on your Mac. Accepts your input, renders the terminal UI, formats requests in the Anthropic messages format, and executes MCP tools locally when the model calls them.
  </Card>

  <Card title="Network layer: Tailscale VPN" icon="shield">
    Creates a secure, peer-to-peer encrypted tunnel between your Mac and your GPU server. API requests travel over a private `100.x.x.x` address — your LM Studio server is never exposed to the public internet.
  </Card>

  <Card title="Backend AI engine: LM Studio" icon="microchip">
    Runs on your NVIDIA GPU server with 128 GB of unified RAM. Hosts an OpenAI-compatible API on port 1234 and processes every inference request locally using the loaded model.
  </Card>

  <Card title="Tool execution: MCP" icon="wrench">
    When the model decides to use a tool (for example, reading a file or running a shell command), Claude Code executes it on your Mac and injects the result back into the conversation context.
  </Card>
</CardGroup>

## Component details

### Frontend: Claude Code CLI

The Claude Code CLI (`@anthropic-ai/claude-code`) is the entry point for every interaction. It runs on your local Mac and is responsible for:

* Accepting your natural-language prompts from the terminal
* Assembling the full message payload, including the list of available MCP tools
* Sending HTTP POST requests to the `/v1/messages` endpoint on your LM Studio server
* Parsing the model's response and executing any tool calls it contains
* Displaying the final answer in your terminal

The CLI is configured via `~/.claude/config.json`. Setting `ANTHROPIC_BASE_URL` to your server's Tailscale address is all that is required to redirect traffic away from Anthropic's cloud.

```json ~/.claude/config.json theme={null}
{
  "ANTHROPIC_BASE_URL": "http://100.x.x.x:1234",
  "ANTHROPIC_API_KEY": "lm-studio"
}
```

### Network layer: Tailscale VPN

Tailscale creates a private mesh network between your Mac and your GPU server. Each device gets a stable `100.x.x.x` IP address that persists across restarts and network changes. For example, your GPU server might be assigned `100.82.56.40` — replace this with whatever address `tailscale ip -4` reports on your server.

Key properties of the Tailscale layer:

* **Encrypted in transit.** All traffic between your Mac and the server uses WireGuard encryption.
* **No public exposure.** LM Studio binds to `0.0.0.0:1234`, but only Tailscale peers can reach that address.
* **Zero configuration routing.** Once both devices are on the same Tailscale network, no additional firewall rules or port forwarding are required.

### Backend AI engine: LM Studio

LM Studio runs on your NVIDIA GPU server and hosts a local API that speaks the same format as Anthropic's Messages API. Claude Code sends requests to it without any modification.

| Property          | Value                      |
| ----------------- | -------------------------- |
| API port          | `1234`                     |
| Bind address      | `0.0.0.0` (all interfaces) |
| API format        | Anthropic `/v1/messages`   |
| Minimum version   | LM Studio 0.4.1            |
| Recommended model | Qwen2.5-Coder-32B-Instruct |

The recommended model — `Qwen2.5-Coder-32B-Instruct` — is state-of-the-art for coding and tool use at the 32B parameter scale and delivers low-latency responses on 128 GB of unified RAM. You can also run `Llama-3.3-70B-Instruct` for deeper reasoning or `DeepSeek-Coder-V2-Lite-Instruct` (16B) for maximum speed.

### Tool execution: MCP

The Model Context Protocol (MCP) lets the language model declare that it wants to use a tool rather than generating a plain-text answer. The tool call itself is just a JSON object in the model's response — the actual execution happens locally on your Mac through Claude Code.

This design means:

* **Tool execution is always local.** No matter where the model is hosted, your files and shell stay on your machine.
* **Adding tools is straightforward.** Run `claude mcp add <name> <command>` on your Mac to register any MCP-compatible tool.
* **The model's location does not limit tool access.** LM Studio on a remote server returns the tool-call JSON; Claude Code on your Mac carries out the action.

## Data flow walkthrough

The following sequence shows exactly what happens when you type a prompt into Claude Code.

<Steps>
  <Step title="You enter a prompt">
    You run `claude "Add a new endpoint to my Express app"` in your terminal. Claude Code reads your project files for context and compiles the list of registered MCP tools.
  </Step>

  <Step title="Claude Code sends a request to LM Studio">
    Claude Code formats your prompt plus the tool definitions into an Anthropic-style messages payload and sends an HTTP POST to `http://100.82.56.40:1234/v1/messages` over the Tailscale tunnel.

    ```
    Mac  ──── HTTPS over Tailscale (100.82.56.40) ────▶  LM Studio :1234
    ```
  </Step>

  <Step title="LM Studio generates a response">
    LM Studio passes the request to the loaded model (for example, `Qwen2.5-Coder-32B`). The model generates a response — either a plain-text answer or a tool-call JSON object — and LM Studio returns it to Claude Code.
  </Step>

  <Step title="Claude Code checks for tool calls">
    If the response contains a tool call (for example, `read_file` on a source file), Claude Code executes that tool locally on your Mac. The result — the file contents, command output, or other data — is injected back into the conversation as a new message.
  </Step>

  <Step title="LM Studio generates the final answer">
    Claude Code sends the tool result back to LM Studio. The model incorporates it and generates a final response, which Claude Code renders in your terminal.

    ```
    LM Studio  ──── JSON response ────▶  Mac  ──── displayed in terminal
    ```
  </Step>
</Steps>

## Architecture diagram

```mermaid theme={null}
flowchart LR
    A["You\n(Mac terminal)"] -- "claude &quot;...&quot;" --> B["Claude Code CLI\n(~/.claude/config.json)"]
    B -- "POST /v1/messages\nover Tailscale VPN" --> C["LM Studio\nport 1234"]
    C -- "JSON response\n(text or tool call)" --> B
    B -- "executes tool call\nlocally on Mac" --> D["MCP tools\n(file, shell, custom)"]
    D -- "tool result" --> B
    B -- "final answer" --> A
```

<Note>
  The MCP tool execution step (between Claude Code and MCP tools) happens entirely on your Mac, even though the model that requested the tool is running on a remote server. The model never has direct access to your filesystem or shell.
</Note>
