Kubernetes Scaler
First-time Setup
Set up the Together AI Kubernetes cluster.
We currently partner with Together AI, which manages a Kubernetes GPU cluster for us.
info
At the time of writing (January 2025), we rent a single 8xA100 80GB node, which hosts both the worker pods and the Kubernetes control plane.
(Optional) Set up a DNS alias for the Together AI control plane node.
# /etc/hosts
# Note that we do not have a guarantee from Together AI that this IP is static.
45.63.2.146 e328afc8-01.cloud.together.ai together-gpu
(Alternatively, you can configure your local DNS server if you have one set up.)
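To sanity-check the alias, the following should work (a quick check using the syntheticlab user and the ~/.ssh/togetherai.id_ed25519 key path referenced later in this guide):
# The alias should resolve to and reach the control plane node.
ping -c 1 together-gpu
# SSH access with the Together AI key from 1Password should also work.
ssh -i ~/.ssh/togetherai.id_ed25519 syntheticlab@together-gpu hostname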
Get and set up the Together AI kubeconfig.
# Option 1: (Recommended) Get config from azure key vault.
az keyvault secret download --name GLHF-DEV-KUBERNETES-SCALER-TOGETHER-KUBECONFIG --vault-name glhf-key-vault --file $HOME/.kube/synthetic_labs.kubeconfig
# Option 2: Download the kube config directly from the control node.
# Note: The Together AI SSH key is stored in 1Password. Please keep this key safe and protect it as you would any other production secret.
# Note: The canonical root-only kubeconfig is at /etc/together/synthetic_labs.kubeconfig
scp -i ~/.ssh/togetherai.id_ed25519 syntheticlab@e328afc8-01.cloud.together.ai:/home/syntheticlab/.kube/synthetic_labs.kubeconfig ~/.kube/synthetic_labs.kubeconfig
# Merge this kubeconfig with your existing one.
cp ~/.kube/config ~/.kube/config.bak # Make a backup
KUBECONFIG=$HOME/.kube/config:$HOME/.kube/synthetic_labs.kubeconfig kubectl config view --merge --flatten > ~/.kube/config
# Verify that the new config contains the `synthetic_labs` context.
kubectl config get-contexts
# (Optional) Set the current context to Together AI's cluster.
kubectl config use-context synthetic_labs
Feel free to rename the context or cluster in your ~/.kube/config to be more descriptive.
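For example, to rename the context and confirm the cluster is reachable (the name together-gpu below is just an illustration; pick whatever you like):
# (Optional) Give the context a more descriptive name.
kubectl config rename-context synthetic_labs together-gpu
# Verify cluster access.
kubectl --context together-gpu get nodes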
Set up a local Kubernetes cluster.
Some tools for this include minikube, kind, and k3s.
We use k3s in this guide.
Prerequisites (a consolidated set of check commands follows this list):
- A machine with an NVIDIA GPU and working drivers (e.g. nvidia-smi should work).
- Docker with the NVIDIA Container Toolkit installed (e.g. sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi should work).
- kubectl and helm.
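A quick way to confirm all of the prerequisites at once (the same checks mentioned in the list above):
# GPU drivers work on the host.
nvidia-smi
# Docker can run GPU workloads via the NVIDIA Container Toolkit.
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
# kubectl and helm are installed.
kubectl version --client
helm version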
warning
This will create a new context in your .kube/config and set it as the default.
To see all contexts, run kubectl config get-contexts; to set a new default, run kubectl config use-context <context>.
# Configure containerd to support the nvidia runtime.
sudo nvidia-ctk runtime configure --runtime=containerd
# Install k3s
curl -fsL https://get.k3s.io | sh -
# Verify the NVIDIA container runtime has been recognized by k3s
sudo grep nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml
# Add "--default-runtime nvidia" to the server runtime command
sudo vi /etc/systemd/system/k3s.service
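# For reference, the edited ExecStart stanza might look roughly like this
# (illustrative sketch; your existing flags may differ):
#
#   ExecStart=/usr/local/bin/k3s \
#       server \
#       --default-runtime nvidia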
sudo systemctl daemon-reload
sudo systemctl restart k3s
# (Optional) Merge the new k3s kubeconfig with your original one.
cp /home/$(whoami)/.kube/config /home/$(whoami)/.kube/config.bak # Make a backup
sudo KUBECONFIG=/home/$(whoami)/.kube/config:/etc/rancher/k3s/k3s.yaml kubectl config view --flatten > /home/$(whoami)/.kube/config
# Select the new k3s context.
kubectl config use-context default
# Verify k3s is working
kubectl cluster-info
# Install the NVIDIA GPU operator
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
# DISABLE_DEV_CHAR_SYMLINK_CREATION: https://github.com/NVIDIA/gpu-operator/issues/569
helm install gpu-operator --wait -n gpu-operator --create-namespace nvidia/gpu-operator \
--set driver.enabled=false \
--set toolkit.enabled=false \
--set "validator.driver.env[0].name=DISABLE_DEV_CHAR_SYMLINK_CREATION" \
--set-string "validator.driver.env[0].value=true"
# Wait for all pods to finish initializing (~seconds)
kubectl get pods --namespace gpu-operator
# Verify our node has capability nvidia.com/gpu
kubectl get node -o json | jq '.items[0].status.capacity'
# {
# ...
# "nvidia.com/gpu": "1",
# ...
# }
# Create a pod with a CUDA workload to test our configuration.
kubectl apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
    - name: cuda-vectoradd
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04"
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
# Verify everything is running!
kubectl logs cuda-vectoradd
# [Vector addition of 50000 elements]
# Copy input data from the host memory to the CUDA device
# CUDA kernel launch with 196 blocks of 256 threads
# Copy output data from the CUDA device to the host memory
# Test PASSED
# Done
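# (Optional) Clean up the test pod once you have verified the output.
kubectl delete pod cuda-vectoradd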
Appendix
Run nvidia-smi in a pod
kubectl apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-version-check
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
    - name: nvidia-version-check
      image: "nvidia/cuda:12.6.0-base-ubuntu22.04"
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 8
EOF
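# View the nvidia-smi output once the pod completes, then clean it up.
kubectl logs nvidia-version-check
kubectl delete pod nvidia-version-check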