The Affinode Dashboard

Affinode deploys a lightweight controller on each GPU node. A per-node dashboard gives you real-time visibility into every process consuming GPU resources — and the ability to reclaim that memory automatically when processes go idle.

Getting Started

One Helm chart, zero code changes

Affinode is distributed as a Helm chart and deploys entirely within your existing Kubernetes cluster. A single helm install command creates everything it needs:

  • DaemonSet A node controller runs on every GPU node, monitoring CUDA activity and managing offload/restore cycles.
  • MutatingWebhookConfiguration Intercepts GPU pod creation and injects the lightweight CUDA shim into each container — no changes to your workload manifests.
  • Deployment + Service The GPU shim injector webhook, responsible for transparently wiring up CUDA interception at pod startup.
  • RBAC Least-privilege service account, role, and role binding scoped to what the controller needs.
Terminal showing Affinode Helm install completing successfully with all Kubernetes objects created

01

GPU resource visibility, per node

Affinode installs a controller on each GPU node as a Kubernetes DaemonSet. The dashboard connects to the node controller and shows every process currently holding GPU resources — their pod and container name, compute utilization, memory footprint, and a live stream of intercepted CUDA API calls. Node-level stats (total memory usage, rolling compute utilization) are shown at the top.

Affinode dashboard showing GPU processes, compute utilization and memory usage on a Tesla T4 node

02

Idle detection

Affinode intercepts CUDA API calls at the container level. When a process stops issuing CUDA calls for a configurable period, the controller marks it IDLE. The status updates in real time — you can see exactly which processes are actively using the GPU and which are holding memory without doing any work.

Affinode dashboard showing two pods marked as IDLE after stopping CUDA activity

03

Auto offload idle processes

Enabling Auto Offload Idle Processes tells the controller to automatically checkpoint idle processes using NVIDIA's CUDA checkpoint API, moving their GPU memory (VRAM) to host CPU RAM. In this example, two pods with a combined footprint of ~9 GiB are offloaded — dropping total GPU memory usage from 9 GiB to 123 MiB and freeing the GPU for other workloads. Both processes remain alive on the CPU and resume the instant they issue a new CUDA call, with no code changes and no state loss.

Affinode dashboard with auto offload enabled, showing GPU memory reduced from 9 GiB to 123 MiB

See It Running on Your Cluster

We're working with early customers to deploy Affinode in production environments.