# Local Scaler
By default, Synthetic starts new clusters on a GPU cloud (e.g., Together k8s or RunPod), even in development. Sometimes you may wish to run LLM clusters locally instead, for faster development loops or to test changes to the vLLM container.
## Setup
- Install Docker.
- Install the NVIDIA Container Toolkit.
- Verify your local Docker + GPU setup is working:

  ```shell
  docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
  ```
## Usage
:::warning
This should only be enabled temporarily, for vLLM image development, as it disables several code paths (e.g., parts of the autoscaler).
:::
Add this to your local `.env`:

```shell
USE_LOCAL_SCALER="true"
```
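As a rough sketch of how such an environment flag is typically consumed (the function name below is hypothetical, not the actual Synthetic codebase), the scaler selection might branch on the variable like this:

```python
import os

def use_local_scaler() -> bool:
    """Return True when the local Docker-based scaler should be used.

    Hypothetical helper: reads USE_LOCAL_SCALER from the environment
    (populated from .env) and treats common truthy strings as enabled.
    Any other value falls back to the default cloud scaler.
    """
    value = os.environ.get("USE_LOCAL_SCALER", "")
    return value.strip().lower() in ("1", "true", "yes")
```

Because the value arrives as a string, comparing against a small set of truthy spellings avoids surprises like `USE_LOCAL_SCALER="false"` being treated as enabled simply because the variable is set.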