LM Studio turns your GPU server into a drop-in replacement for the Anthropic API. Once it’s running, any tool that speaks the
/v1/messages protocol — including the Claude Code CLI — can send requests to your local machine instead of the cloud. This guide walks you through installing LM Studio, loading a model, and confirming the server is reachable.
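For instance, once the server from this guide is running, a client that reads the standard Anthropic environment variables can be pointed at it. This is a sketch, assuming your client honors ANTHROPIC_BASE_URL (Claude Code does); the IP is a placeholder:

```bash
# Point Anthropic-compatible clients at the local LM Studio server
export ANTHROPIC_BASE_URL="http://<SERVER_IP>:1234"
# Many clients expect a key to be set even if the local server ignores it
export ANTHROPIC_API_KEY="lm-studio"
```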
Prerequisites
- A GPU server with at least 32 GB of VRAM or unified memory (128 GB recommended for 32B+ models)
- A supported operating system: macOS, Windows, or Linux
- Network access to the server (direct or via Tailscale)
Download LM Studio
Go to lmstudio.ai and download the installer for your server’s operating system. You need version 0.4.1 or later — earlier releases do not include the local server feature used in this guide.
LM Studio v0.4.1+ ships with a built-in OpenAI-compatible server. If your installed version is older, update it before continuing.
Install LM Studio on your server
Run the downloaded installer and follow the on-screen prompts. On macOS, drag LM Studio into your Applications folder and open it. On Windows, run the .exe installer directly. On Linux, the package is distributed as an AppImage:
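A typical AppImage launch looks like the following sketch; the filename pattern is illustrative, so match it to the file you actually downloaded:

```bash
# Mark the AppImage as executable, then launch it
chmod +x LM-Studio-*.AppImage
./LM-Studio-*.AppImage
```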
Load a model
After LM Studio opens, navigate to the Discover tab and search for a model to download. The recommended starting point is Qwen2.5-Coder-32B-Instruct — it delivers state-of-the-art coding and tool-use performance at 32B parameters, with low latency on hardware with 64 GB or more of memory.

Select a quantized variant (Q4_K_M or Q5_K_M) to balance quality and speed, then click Download. Wait for the download to complete before moving to the next step.
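If you are working over SSH, LM Studio's bundled lms command-line tool can handle the download as well. This is a sketch, assuming lms is on your PATH and that the model key below matches the listing in the Discover tab:

```bash
# Download the model from the command line (model key may differ slightly)
lms get qwen2.5-coder-32b-instruct

# List locally downloaded models to confirm it arrived
lms ls
```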
Enable the local server
Switch to the Developer tab (the </> icon in the left sidebar). You will see a Local Server panel. Configure the server with these settings:

| Setting | Value |
|---|---|
| Port | 1234 |
| Bind address | 0.0.0.0 |
| CORS | Enabled |

Setting the bind address to 0.0.0.0 allows connections from other machines on the same network or VPN — this is required if you are connecting from a separate Mac or workstation over Tailscale.

Click Start Server. The status indicator turns green when the server is accepting connections.
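You can also confirm the server is listening from the server itself. This assumes LM Studio exposes the standard OpenAI-style /v1/models route alongside /v1/messages:

```bash
# Should return a JSON list of available models once the server is up
curl http://localhost:1234/v1/models
```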
Select the loaded model in the server

In the Local Server panel, open the model selector dropdown and choose the model you downloaded in the previous step (e.g., Qwen2.5-Coder-32B-Instruct). LM Studio loads it into memory and makes it available at the /v1/messages endpoint.

Verify the server with a curl test
From any machine that can reach your server, run the following command to send a test message. Replace <SERVER_IP> with your server’s IP address (or localhost if you are testing from the same machine).
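The command below is a sketch of such a test, assuming the server accepts Anthropic Messages API request bodies at /v1/messages and that the model field matches the identifier shown in LM Studio's model selector:

```bash
curl http://<SERVER_IP>:1234/v1/messages \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen2.5-coder-32b-instruct",
    "max_tokens": 128,
    "messages": [
      {"role": "user", "content": "Reply with a one-sentence greeting."}
    ]
  }'
```

A successful response looks like this; the shape follows the Anthropic Messages schema, and the exact id, model string, and text will differ on your machine:

```json
{
  "id": "msg_01...",
  "type": "message",
  "role": "assistant",
  "model": "qwen2.5-coder-32b-instruct",
  "content": [
    {"type": "text", "text": "Hello! The local server is up and responding."}
  ],
  "stop_reason": "end_turn"
}
```

If you receive a connection refused error, confirm that the server is started in LM Studio and that your firewall allows traffic on port 1234.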