Kubernetes Scaler

First-time Setup

Set up the Together AI Kubernetes cluster.

We currently partner with Together AI, which manages a Kubernetes GPU cluster for us.

info

At the time of writing (January 2025), we rent a single 8xA100 80GB node, which hosts both the worker pods and the Kubernetes control plane.

(Optional) Set up a DNS alias for the Together AI control plane node.

# /etc/hosts

# Note that we do not have a guarantee from Together AI that this IP is static.
45.63.2.146 e328afc8-01.cloud.together.ai together-gpu

(Alternatively, you can configure your local DNS server if you have one set up.)
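To sanity-check the alias (assuming the together-gpu name from the example /etc/hosts entry above), a quick ping is enough:

ping -c 1 together-gpu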

Get and set up the Together AI kubeconfig.

# Option 1: (Recommended) Get the config from the Azure key vault.
mkdir -p ~/.kube
az keyvault secret download --name GLHF-DEV-KUBERNETES-SCALER-TOGETHER-KUBECONFIG --vault-name glhf-key-vault --file $HOME/.kube/synthetic_labs.kubeconfig

# Option 2: Download the kube config directly from the control node.
# Note: The Together AI SSH key is stored in 1Password. Please keep this key safe and protect it as you would any other production secret.
# Note: The canonical root-only kubeconfig is at /etc/together/synthetic_labs.kubeconfig
scp -i ~/.ssh/togetherai.id_ed25519 syntheticlab@e328afc8-01.cloud.together.ai:/home/syntheticlab/.kube/synthetic_labs.kubeconfig ~/.kube/synthetic_labs.kubeconfig

# Merge this kubeconfig with your existing one.
cp ~/.kube/config ~/.kube/config.bak # Make a backup
KUBECONFIG=$HOME/.kube/config:$HOME/.kube/synthetic_labs.kubeconfig kubectl config view --merge --flatten > ~/.kube/config

# Verify that the new config contains the `synthetic_labs` context.
kubectl config get-contexts

# (Optional) Set the current context to Together AI's cluster.
kubectl config use-context synthetic_labs

Feel free to rename the context or cluster in your ~/.kube/config to be more descriptive.
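For example, the context can be renamed with kubectl config rename-context (the new name below is just illustrative):

kubectl config rename-context synthetic_labs together-gpu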

Set up a local Kubernetes cluster.

Some tools for this include minikube, kind, and k3s.

We use k3s in this guide.

Prerequisites (a quick verification sketch follows this list):

  • A machine with an NVIDIA GPU and the proper drivers installed (e.g. nvidia-smi should work)
  • Docker with the NVIDIA Container Toolkit installed (e.g. sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi should work)
  • kubectl, helm
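A quick way to verify the prerequisites (this simply re-runs the checks listed above, plus the standard version commands for kubectl and helm):

nvidia-smi
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
kubectl version --client
helm version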
warning

This will create a new context in your ~/.kube/config and set it as the default.

To see all contexts, run kubectl config get-contexts; to switch the default, run kubectl config use-context <context>.

# Configure containerd to support the nvidia runtime.
sudo nvidia-ctk runtime configure --runtime=containerd

# Install k3s
curl -fsL https://get.k3s.io | sh -

# Verify the NVIDIA container runtime has been recognized by k3s
sudo grep nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml

# Add "--default-runtime nvidia" to the server runtime command
sudo vi /etc/systemd/system/k3s.service
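# For reference, the edited ExecStart stanza will typically look something like this
# (the exact binary path and any existing flags may differ on your install):
#   ExecStart=/usr/local/bin/k3s \
#       server \
#       --default-runtime nvidia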
sudo systemctl daemon-reload
sudo systemctl restart k3s

# (optional) Merge the new k3s kube config with your original one.
cp /home/$(whoami)/.kube/config /home/$(whoami)/.kube/config.bak
sudo KUBECONFIG=/home/$(whoami)/.kube/config:/etc/rancher/k3s/k3s.yaml kubectl config view --flatten > /home/$(whoami)/.kube/config

# Select the new k3s context.
kubectl config use-context default

# Verify k3s is working
kubectl cluster-info

# Install the NVIDIA GPU operator
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
# DISABLE_DEV_CHAR_SYMLINK_CREATION: https://github.com/NVIDIA/gpu-operator/issues/569
helm install gpu-operator --wait -n gpu-operator --create-namespace nvidia/gpu-operator \
--set driver.enabled=false \
--set toolkit.enabled=false \
--set "validator.driver.env[0].name=DISABLE_DEV_CHAR_SYMLINK_CREATION" \
--set-string "validator.driver.env[0].value=true"

# Wait for all pods to finish initializing (typically a matter of seconds)
kubectl get pods --namespace gpu-operator
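# Alternatively, watch the rollout instead of re-running the command
# (Ctrl-C once everything is Running or Completed):
kubectl get pods --namespace gpu-operator --watch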

# Verify our node reports nvidia.com/gpu in its capacity
kubectl get node -o json | jq '.items[0].status.capacity'
# {
# ...
# "nvidia.com/gpu": "1",
# ...
# }

# Create a pod with a CUDA workload to test our configuration.
kubectl apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
    - name: cuda-vectoradd
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04"
      resources:
        limits:
          nvidia.com/gpu: 1
EOF

# Verify everything is running!
kubectl logs cuda-vectoradd
# [Vector addition of 50000 elements]
# Copy input data from the host memory to the CUDA device
# CUDA kernel launch with 196 blocks of 256 threads
# Copy output data from the CUDA device to the host memory
# Test PASSED
# Done
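Once the test passes, the pod can be deleted (the sample only needs to run once):

kubectl delete pod cuda-vectoradd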

Appendix

Run nvidia-smi in a pod

kubectl apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-version-check
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
    - name: nvidia-version-check
      image: "nvidia/cuda:12.6.0-base-ubuntu22.04"
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 8
EOF
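After the pod completes, inspect the output and clean up. Note that the nvidia.com/gpu: 8 limit matches the 8-GPU Together AI node; on a local single-GPU cluster, lower it to the number of GPUs you actually have, or the pod will stay Pending.

kubectl logs nvidia-version-check
kubectl delete pod nvidia-version-check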