Local Scaler

By default, Synthetic starts new clusters on a GPU cloud (e.g. Together k8s or RunPod), even in development.

Sometimes you may wish to run LLM clusters locally, either for faster development loops or to test changes to the vLLM container.

Setup

  1. Install Docker

  2. Install the NVIDIA Container Toolkit.

  3. Verify your local Docker + GPU setup is working:

docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

Usage

warning

This should only be enabled temporarily, e.g. while developing the vLLM image, because it disables several code paths (such as parts of the autoscaler).

Add this to your local .env file:

USE_LOCAL_SCALER="true"
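Environment flags like this are usually read as strings and compared case-insensitively. The sketch below shows one common way such a boolean flag is parsed; `use_local_scaler` is a hypothetical helper for illustration, not Synthetic's actual implementation, which may accept a different set of values.

```python
import os

def use_local_scaler() -> bool:
    # Hypothetical parsing helper (assumption, not Synthetic's real code):
    # treat "true"/"1"/"yes" (case-insensitive) as enabled, anything else as disabled.
    value = os.environ.get("USE_LOCAL_SCALER", "false")
    return value.strip().lower() in ("true", "1", "yes")

os.environ["USE_LOCAL_SCALER"] = "true"
print(use_local_scaler())  # → True
```

Quoting the value in .env (`"true"`) is harmless here, since most .env loaders strip surrounding quotes before the value reaches the process environment.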