LM Studio turns your GPU server into a drop-in replacement for the Anthropic API. Once it’s running, any tool that speaks the /v1/messages protocol — including the Claude Code CLI — can send requests to your local machine instead of the cloud. This guide walks you through installing LM Studio, loading a model, and confirming the server is reachable.

Prerequisites

  • A GPU server with at least 32 GB of VRAM or unified RAM (128 GB recommended for 32B+ models); a quick memory check is shown after this list
  • A supported operating system: macOS, Windows, or Linux
  • Network access to the server (direct or via Tailscale)
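
One quick way to confirm the memory requirement before installing anything, assuming either an NVIDIA GPU on Linux or an Apple silicon Mac (adjust for your hardware):
# NVIDIA GPUs: total VRAM per card
nvidia-smi --query-gpu=memory.total --format=csv
# Apple silicon Macs: total unified memory, reported in bytes
sysctl -n hw.memsize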

1. Download LM Studio

Go to lmstudio.ai and download the installer for your server’s operating system. You need version 0.4.1 or later — earlier releases do not include the local server feature used in this guide.
LM Studio v0.4.1+ ships with a built-in OpenAI-compatible server. If your installed version is older, update it before continuing.

2. Install LM Studio on your server

Run the downloaded installer and follow the on-screen prompts. On Linux, the package is distributed as an AppImage:
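# Make the AppImage executable, then launch it from the directory you downloaded it to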
chmod +x LM_Studio-*.AppImage
./LM_Studio-*.AppImage
On macOS, drag LM Studio into your Applications folder and open it. On Windows, run the .exe installer directly.

3. Load a model

After LM Studio opens, navigate to the Discover tab and search for a model to download. The recommended starting point is Qwen2.5-Coder-32B-Instruct — it delivers state-of-the-art coding and tool-use performance at 32B parameters, with low latency on hardware with 64 GB or more of memory.
If your server has 128 GB of unified RAM, Qwen2.5-Coder-32B-Instruct loads comfortably and responds quickly. For complex reasoning tasks, Llama-3.3-70B-Instruct is a solid alternative at the cost of slightly higher latency. See Choose the right AI model for a full comparison.
Select a quantized variant (Q4_K_M or Q5_K_M) to balance quality and speed, then click Download. Wait for the download to complete before moving to the next step.
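As a rough sizing check (an approximation, not an official LM Studio figure): Q4_K_M averages roughly 4.8 bits per weight, so a 32B-parameter model takes on the order of 19 to 20 GB of memory, plus extra for the context (KV cache) while generating.
# Back-of-the-envelope size in GB: parameters x bits per weight / 8 / 1e9
echo "32 * 10^9 * 4.8 / 8 / 10^9" | bc -l    # ~19.2 GB for a 32B model at Q4_K_M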

4. Enable the local server

Switch to the Developer tab (the </> icon in the left sidebar). You will see a Local Server panel. Configure the server with these settings:
Setting         Value
Port            1234
Bind address    0.0.0.0
CORS            Enabled
Setting the bind address to 0.0.0.0 allows connections from other machines on the same network or VPN — this is required if you are connecting from a separate Mac or workstation over Tailscale.
Binding to 0.0.0.0 exposes the server on all network interfaces. Use a VPN such as Tailscale or a firewall rule to restrict access to trusted clients only. Do not expose port 1234 directly to the public internet.
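For example, on a Linux server you can scope the port to the Tailscale interface. This is a minimal sketch, assuming ufw is active with its default deny-incoming policy and that the Tailscale interface is named tailscale0 (check with ip link):
# Accept connections to LM Studio's port only when they arrive via Tailscale
sudo ufw allow in on tailscale0 to any port 1234 proto tcp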
Click Start Server. The status indicator turns green when the server is accepting connections.

5. Select the loaded model in the server

In the Local Server panel, open the model selector dropdown and choose the model you downloaded in the previous step (e.g., Qwen2.5-Coder-32B-Instruct). LM Studio loads it into memory and makes it available at the /v1/messages endpoint.
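Before moving on, you can confirm the model is actually exposed by listing the models the server reports. This uses the OpenAI-compatible model listing mentioned in step 1; the response should include the model you just selected:
curl http://<SERVER_IP>:1234/v1/models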

6. Verify the server with a curl test

From any machine that can reach your server, run the following command to send a test message. Replace <SERVER_IP> with your server’s IP address (or localhost if you are testing from the same machine).
curl http://<SERVER_IP>:1234/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: local" \
  -d '{
    "model": "local-model",
    "max_tokens": 64,
    "messages": [
      { "role": "user", "content": "Reply with: Server is running." }
    ]
  }'
A successful response comes back in the Anthropic Messages format and looks similar to this:
{
  "id": "msg_...",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Server is running."
    }
  ],
  "model": "local-model",
  "stop_reason": "end_turn"
}
If you receive a connection refused error, confirm that the server is started in LM Studio and that your firewall allows traffic on port 1234.
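To narrow down whether the problem is the network path or LM Studio itself, a simple port probe from the client machine helps (assuming netcat is installed):
# Succeeds only if something is listening on port 1234 at <SERVER_IP>
nc -vz <SERVER_IP> 1234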

Next steps

With your LM Studio server running, you can connect the Claude Code CLI to it so all inference requests are handled locally. Follow Connect Claude Code CLI to your local model to complete the setup.