# Local Scaler
By default, Synthetic starts new clusters on a GPU cloud (e.g., Together k8s or RunPod), even in development. Sometimes you may wish to run LLM clusters locally instead, for faster development loops or to test changes to the vLLM container.
## Setup
- Install Docker.
- Install the NVIDIA Container Toolkit.
- Verify your local Docker + GPU setup is working:

  ```shell
  docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
  ```
## Usage
:::warning
This should only be enabled temporarily, for vLLM image development, as it disables several code paths (e.g., parts of the autoscaler).
:::
Add this to your local `.env`:

```shell
USE_LOCAL_SCALER="true"
```
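As a rough sketch of how such an environment flag is typically consumed (the function name below is hypothetical, not the actual Synthetic codebase), the scaler selection might branch on the variable like this:

```python
import os

def use_local_scaler() -> bool:
    """Return True when the local Docker-based scaler should be used.

    Hypothetical helper: reads USE_LOCAL_SCALER from the environment
    (populated from .env) and treats common truthy strings as enabled.
    Any other value falls back to the default cloud scaler.
    """
    value = os.environ.get("USE_LOCAL_SCALER", "")
    return value.strip().lower() in ("1", "true", "yes")
```

Because the value arrives as a string, comparing against a small set of truthy spellings avoids surprises like `USE_LOCAL_SCALER="false"` being treated as enabled simply because the variable is set.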