TensorFusion Docs


TensorFusionWorkload

TensorFusionWorkload is the Schema for the tensorfusionworkloads API.

Resource Information

| Field | Value |
| --- | --- |
| API Version | tensor-fusion.ai/v1 |
| Kind | TensorFusionWorkload |
| Scope | Namespaced |

Spec

WorkloadProfileSpec defines the desired state of WorkloadProfile.

| Property | Type | Description |
| --- | --- | --- |
| autoScalingConfig | object | Auto-scaling configuration set here overrides the Pool's schedulingConfig. This field cannot be fully expressed in annotations; to enable auto-scaling via annotation, set tensor-fusion.ai/auto-resources instead. |
| gpuCount | integer\<int32\> | Number of GPUs used by the workload. Defaults to 1. |
| gpuModel | string | Required GPU model (e.g., "A100", "H100"). |
| isLocalGPU | boolean | Schedule the workload onto the same GPU server that runs the vGPU worker, for best performance. Defaults to false. |
| nodeAffinity | object | Node affinity requirements for the workload. |
| poolName | string | |
| qos | string | Quality of service level for the client. Allowed values: low, medium, high, critical. |
| replicas | integer\<int32\> | If not set, the replica count is dynamic, based on pending Pods. If isLocalGPU is true, replicas must be dynamic and this field is ignored. |
| resources | object | |
| sidecarWorker | boolean | When true, the worker runs in sidecar mode: always Local GPU mode and hard-isolated with shared memory. Defaults to false, meaning the workload's embedded worker runs in the same process and is soft-isolated. |
| workerPodTemplate | object | Template for the worker pod. Only takes effect in remote vGPU mode. |
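As a sketch, the spec fields above can be combined into a manifest like the following. All values (the workload name, pool name, GPU model, and counts) are illustrative, not prescriptive:

```yaml
apiVersion: tensor-fusion.ai/v1
kind: TensorFusionWorkload
metadata:
  name: example-workload      # illustrative name
  namespace: default
spec:
  poolName: shared-gpu-pool   # assumed pool name; replace with your GPU pool
  replicas: 2                 # omit to let the replica count follow pending Pods
  gpuCount: 1
  gpuModel: "A100"
  qos: medium
  isLocalGPU: false
```

Fields such as resources, nodeAffinity, and workerPodTemplate are objects whose shapes are not shown in the table above, so they are omitted here.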

Status

TensorFusionWorkloadStatus defines the observed state of TensorFusionWorkload.

| Property | Type | Description |
| --- | --- | --- |
| activeCronScalingRule | object | The currently active cron scaling rule. |
| appliedRecommendedReplicas | integer\<int32\> | Number of replicas currently applied, based on the latest recommendation. |
| conditions | array | Latest available observations of the workload's current state. |
| phase | string | Allowed values: Pending, Running, Failed, Unknown. Defaults to Pending. |
| podTemplateHash | string | Hash of the pod template used to create worker pods. |
| readyWorkers | integer\<int32\> | Number of vGPU workers that are ready. |
| recommendation | object | GPU resources most recently recommended by the autoscaler. |
| workerCount (required) | integer\<int32\> | Number of vGPU workers. |
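For orientation, a populated status on a running workload might look like the following sketch. The hash and condition type are illustrative placeholders, not values confirmed by this reference:

```yaml
status:
  phase: Running
  workerCount: 2
  readyWorkers: 2
  podTemplateHash: "7d4b9c5f"   # illustrative hash value
  conditions:
    - type: Ready               # assumed condition type for illustration
      status: "True"
```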
