Kubernetes Scaler
First-time Setup
Set up the Together AI Kubernetes cluster.
We currently partner with Together AI, which manages a Kubernetes GPU cluster for us.
info
At the time of writing (January 2025), we rent a single 8xA100 80GB node, which hosts both the worker pods and the Kubernetes control plane.
(Optional) Set up a DNS alias for the Together AI control plane node.
# /etc/hosts
# Note that we do not have a guarantee from Together AI that this IP is static.
45.63.2.146 e328afc8-01.cloud.together.ai together-gpu
(Alternatively, you can configure your local DNS server if you have one set up.)
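To sanity-check the alias, the following should work (a quick check using the syntheticlab user and the ~/.ssh/togetherai.id_ed25519 key path referenced later in this guide):
# The alias should resolve to and reach the control plane node.
ping -c 1 together-gpu
# SSH access with the Together AI key from 1Password should also work.
ssh -i ~/.ssh/togetherai.id_ed25519 syntheticlab@together-gpu hostname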
Get and set up the Together AI kubeconfig.
# Option 1: (Recommended) Get config from azure key vault.
az keyvault secret download --name GLHF-DEV-KUBERNETES-SCALER-TOGETHER-KUBECONFIG --vault-name glhf-key-vault --file $HOME/.kube/synthetic_labs.kubeconfig
# Option 2: Download the kube config directly from the control node.
# Note: The Together AI SSH key is stored in 1Password. Please keep this key safe and protect it as you would any other production secret.
# Note: The canonical root-only kubeconfig is at /etc/together/synthetic_labs.kubeconfig
scp -i ~/.ssh/togetherai.id_ed25519 syntheticlab@e328afc8-01.cloud.together.ai:/home/syntheticlab/.kube/synthetic_labs.kubeconfig ~/.kube/synthetic_labs.kubeconfig
# Merge this kubeconfig with your existing one.
cp ~/.kube/config ~/.kube/config.bak # Make a backup
KUBECONFIG=$HOME/.kube/config:$HOME/.kube/synthetic_labs.kubeconfig kubectl config view --merge --flatten > ~/.kube/config
# Verify that the new config contains the `synthetic_labs` context.
kubectl config get-contexts
# (Optional) Set the current context to Together AI's cluster.
kubectl config use-context synthetic_labs
Feel free to rename the context or cluster in your ~/.kube/config to be more descriptive.
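For example, to rename the context and confirm the cluster is reachable (the name together-gpu below is just an illustration; pick whatever you like):
# (Optional) Give the context a more descriptive name.
kubectl config rename-context synthetic_labs together-gpu
# Verify cluster access.
kubectl --context together-gpu get nodes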
Set up a local Kubernetes cluster.
Some tools for this include minikube, kind, and k3s.
We use k3s in this guide.
Prerequisites (a consolidated set of check commands follows this list):
- A machine with an NVIDIA GPU and working drivers (e.g. nvidia-smi should work).
- Docker with the NVIDIA Container Toolkit installed (e.g. sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi should work).
- kubectl and helm.
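A quick way to confirm all of the prerequisites at once (the same checks mentioned in the list above):
# GPU drivers work on the host.
nvidia-smi
# Docker can run GPU workloads via the NVIDIA Container Toolkit.
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
# kubectl and helm are installed.
kubectl version --client
helm version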
warning
This will create a new context in your .kube/config and set it as the default.
To see all contexts, run kubectl config get-contexts; to set a new default, run kubectl config use-context <context>.
# Configure containerd to support the nvidia runtime.
sudo nvidia-ctk runtime configure --runtime=containerd
# Install k3s
curl -fsL https://get.k3s.io | sh -
# Verify the NVIDIA container runtime has been recognized by k3s
sudo grep nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml
# Add "--default-runtime nvidia" to the server runtime command
sudo vi /etc/systemd/system/k3s.service
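# For reference, the edited ExecStart stanza might look roughly like this
# (illustrative sketch; your existing flags may differ):
#
#   ExecStart=/usr/local/bin/k3s \
#       server \
#       --default-runtime nvidia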
sudo systemctl daemon-reload
sudo systemctl restart k3s
# (Optional) Merge the new k3s kubeconfig with your original one.
cp /home/$(whoami)/.kube/config /home/$(whoami)/.kube/config.bak # Make a backup
sudo KUBECONFIG=/home/$(whoami)/.kube/config:/etc/rancher/k3s/k3s.yaml kubectl config view --flatten > /home/$(whoami)/.kube/config
# Select the new k3s context.
kubectl config use-context default
# Verify k3s is working
kubectl cluster-info
# Install the NVIDIA GPU operator
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
# DISABLE_DEV_CHAR_SYMLINK_CREATION: https://github.com/NVIDIA/gpu-operator/issues/569
helm install gpu-operator --wait -n gpu-operator --create-namespace nvidia/gpu-operator \
--set driver.enabled=false \
--set toolkit.enabled=false \
--set "validator.driver.env[0].name=DISABLE_DEV_CHAR_SYMLINK_CREATION" \
--set-string "validator.driver.env[0].value=true"
# Wait for all pods to finish initializing (~seconds)
kubectl get pods --namespace gpu-operator
# Verify our node has capability nvidia.com/gpu
kubectl get node -o json | jq '.items[0].status.capacity'
# {
# ...
# "nvidia.com/gpu": "1",
# ...
# }
# Create a pod with a CUDA workload to test our configuration.
kubectl apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
    - name: cuda-vectoradd
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04"
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
# Verify everything is running!
kubectl logs cuda-vectoradd
# [Vector addition of 50000 elements]
# Copy input data from the host memory to the CUDA device
# CUDA kernel launch with 196 blocks of 256 threads
# Copy output data from the CUDA device to the host memory
# Test PASSED
# Done
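# (Optional) Clean up the test pod once you have verified the output.
kubectl delete pod cuda-vectoradd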
Appendix
Run nvidia-smi in a pod
kubectl apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-version-check
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
    - name: nvidia-version-check
      image: "nvidia/cuda:12.6.0-base-ubuntu22.04"
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 8
EOF
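# View the nvidia-smi output once the pod completes, then clean it up.
kubectl logs nvidia-version-check
kubectl delete pod nvidia-version-check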