
Workload Configuration

This doc explains how to allocate vGPU resources for your AI workloads using annotations and WorkloadProfile custom resources.

Add Pod Annotations

Add the following annotations to your Pod metadata to configure GPU workload requirements.

Annotation Reference

Basic Annotations

| Annotation | Description | Example Value |
| --- | --- | --- |
| `tensor-fusion.ai/tflops-request` | Requested TFLOPs (FP16) per vGPU worker per GPU device | `'10'` |
| `tensor-fusion.ai/vram-request` | Requested VRAM (video memory / frame buffer) per vGPU worker per GPU device | `4Gi` |
| `tensor-fusion.ai/tflops-limit` | Maximum TFLOPs (FP16) allowed per vGPU worker per GPU device | `'20'` |
| `tensor-fusion.ai/vram-limit` | Maximum VRAM (video memory / frame buffer) allowed per vGPU worker per GPU device | `4Gi` |
| `tensor-fusion.ai/inject-container` | Container to inject GPU resources into; comma-separated for multiple containers | `python` |
| `tensor-fusion.ai/qos` | Quality of service level | `low`, `medium`, `high`, `critical` |
| `tensor-fusion.ai/is-local-gpu` | Schedule the workload onto the same GPU server that runs its vGPU worker for best performance; defaults to `false` | `'true'` |
| `tensor-fusion.ai/gpu-count` | Requested GPU device count. Each vGPU worker maps to the N physical GPU devices set by this field, and VRAM/TFLOPs consumption is scaled accordingly; defaults to `1`. Your AI workload sees devices starting from `cuda:0` | `'4'` |
| `tensor-fusion.ai/gpupool` | Specifies the target GPU pool | `default-pool` |
| `tensor-fusion.ai/vendor` | Specifies the GPU/NPU vendor: `NVIDIA`, `AMD`, `Ascend`, `Intel`, `Hygon`, `MetaX`, `MThreads`, `Cambricon`, `Enflame`, `Qualcomm`, `Cerebras`, `AWSNeuron`, `Google` | `NVIDIA` |
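
These basic annotations can also be set directly on a standalone Pod rather than a workload template. The following is a minimal illustrative sketch; the Pod name, container name, and image are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: tf-basic-example                        # placeholder name
  labels:
    tensor-fusion.ai/enabled: "true"
  annotations:
    tensor-fusion.ai/inject-container: python   # must match a container name below
    tensor-fusion.ai/tflops-request: '10'
    tensor-fusion.ai/tflops-limit: '20'
    tensor-fusion.ai/vram-request: 4Gi
    tensor-fusion.ai/vram-limit: 4Gi
    tensor-fusion.ai/gpupool: default-pool
spec:
  containers:
    - name: python
      image: python:3.12-slim                   # placeholder image
      command: ["python", "train.py"]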

Advanced Annotations

| Annotation | Description | Example Value |
| --- | --- | --- |
| `tensor-fusion.ai/gpu-model` | Specifies the GPU/NPU model | `A100`, `H100`, `L4`, `L40s` |
| `tensor-fusion.ai/dedicated-gpu` | Use together with the `tensor-fusion.ai/gpu-model` annotation to occupy a whole GPU for this workload | `'true'` |
| `tensor-fusion.ai/isolation` | Isolation mode; one of `shared`, `soft`, `hard`, `partitioned` | `soft` |
| `tensor-fusion.ai/compute-percent-request` | Compute resource request as a percentage (0-100); mutually exclusive with the TFLOPs request, set only one of them | `'100'` |
| `tensor-fusion.ai/compute-percent-limit` | Compute resource limit as a percentage (0-100); mutually exclusive with the TFLOPs limit, set only one of them | `'100'` |
| `tensor-fusion.ai/gpu-indices` | Specifies GPU device indices (0-N) to restrict which devices the workload can be scheduled on; comma-separated when requesting multiple cards | `'0,1'` |
| `tensor-fusion.ai/workload` | TensorFusionWorkload name; Pods referencing the same name share the same vGPU workers | `pytorch-example` |
| `tensor-fusion.ai/workload-profile` | References a WorkloadProfile to reuse pre-defined parameters | `default-profile` |
| `tensor-fusion.ai/enabled-replicas` | Set to any number less than or equal to the ReplicaSet's replica count, for grey-releasing (gradually rolling out) TensorFusion | `'1'`, `'42'` |
| `tensor-fusion.ai/auto-requests` | Automatically set VRAM and/or TFLOPs requests based on the workload's historical metrics; for detailed settings use the WorkloadProfile custom resource | `'true'` |
| `tensor-fusion.ai/auto-limits` | Automatically set VRAM and/or TFLOPs limits based on the workload's historical metrics; for detailed settings use the WorkloadProfile custom resource | `'true'` |
| `tensor-fusion.ai/auto-replicas` | Automatically set vGPU worker replicas based on the workload's historical metrics; for detailed settings use the WorkloadProfile custom resource | `'true'` |
| `tensor-fusion.ai/standalone-worker-mode` | Only meaningful when `is-local-gpu` is `true`: if this option is `false`, the vGPU worker is injected as an init container instead of running as a standalone worker, which gives the best performance; the trade-off is that the user might bypass the vGPU worker and use the physical GPU directly. When `is-local-gpu` is `false`, this option has no effect | `'true'` |
| `tensor-fusion.ai/disable-features` | Kill switch to partially disable TensorFusion's built-in features; comma-separated for multiple features | `'gpu-limiter,gpu-opt,mem-manager'` |
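
To illustrate how the advanced annotations combine, the sketch below restricts a workload to a specific GPU model and sizes compute as a percentage instead of TFLOPs. The concrete values are placeholders, not recommendations:

kind: Deployment
apiVersion: apps/v1
metadata: {}
spec:
  template:
    metadata:
      labels:
        tensor-fusion.ai/enabled: "true"
      annotations:
        tensor-fusion.ai/inject-container: python
        tensor-fusion.ai/gpu-model: A100                 # only schedule onto A100 devices
        tensor-fusion.ai/compute-percent-request: '50'   # percentage-based request, instead of tflops-request
        tensor-fusion.ai/compute-percent-limit: '100'    # percentage-based limit, instead of tflops-limit
        tensor-fusion.ai/vram-request: 8Gi
        tensor-fusion.ai/vram-limit: 8Gi
        tensor-fusion.ai/isolation: soft                 # shared / soft / hard / partitioned
    spec: {}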

Example Config

kind: Deployment
apiVersion: apps/v1
metadata: {}
spec:
  template:
    metadata:
      labels:
        tensor-fusion.ai/enabled: "true"
      annotations:
        tensor-fusion.ai/inject-container: python # comma-separated if multiple containers use the GPU
        tensor-fusion.ai/tflops-limit: '20'
        tensor-fusion.ai/tflops-request: '10'
        tensor-fusion.ai/vram-limit: 4Gi
        tensor-fusion.ai/vram-request: 4Gi
        tensor-fusion.ai/qos: medium
        tensor-fusion.ai/workload-profile: default-profile # the referenced WorkloadProfile acts as a lower-priority template; inline annotations override it
        tensor-fusion.ai/is-local-gpu: 'true'
        tensor-fusion.ai/gpu-count: '1' # number of GPU devices per vGPU worker
    spec: {}

Configure WorkloadProfile Custom Resource

For advanced features like auto-scaling, create a WorkloadProfile custom resource and reference it in your Pod annotations.

apiVersion: tensor-fusion.ai/v1
kind: WorkloadProfile
metadata:
  name: example-workload-profile
  namespace: same-namespace-as-your-workload
spec:
  # Specify AI computing resources needed
  resources:
    requests:
      tflops: "5"
      vram: "3Gi"
    limits:
      tflops: "15"
      vram: "3Gi"
  # Specify the number of vGPU workers, usually the same as Deployment replicas
  replicas: 1

  # Schedule the workload to the same GPU server that runs the vGPU worker for best performance
  isLocalGPU: true

  # Specify pool name (optional)
  poolName: default-pool

  # Specify QoS level (defaults to medium)
  qos: medium

  # Specify the number of GPU devices per vGPU worker (optional, defaults to 1)
  gpuCount: 1

  # Specify the GPU/NPU model (optional)
  gpuModel: A100

  # Auto-scaling configuration options (optional)
  autoScalingConfig: {}

Then reference this profile in your Pod annotation:

tensor-fusion.ai/workload-profile: example-workload-profile
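
In context, the annotation sits in the Pod template metadata next to any inline overrides; because the profile acts as a lower-priority template, inline annotations take precedence over the values it defines. A minimal sketch:

metadata:
  labels:
    tensor-fusion.ai/enabled: "true"
  annotations:
    tensor-fusion.ai/workload-profile: example-workload-profile
    tensor-fusion.ai/vram-limit: 6Gi   # inline annotation overrides the profile's 3Gi limit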

For more details on the WorkloadProfile schema, see the WorkloadProfile Schema Reference.
