TensorFusion Docs

GPUPool

API documentation for GPUPool

Resource Information

| Field | Value |
| --- | --- |
| API Version | tensor-fusion.ai/v1 |
| Kind | GPUPool |
| Scope | Cluster |
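
The values above are what a manifest needs in order to address this resource. As a minimal sketch in Python, assuming a hypothetical pool name `example-pool` and leaving `spec` empty because its nested schemas are not described on this page, the top-level structure might be assembled as follows. Because the scope is Cluster, `metadata` carries no `namespace`.

```python
import json

API_VERSION = "tensor-fusion.ai/v1"  # from the Resource Information table
KIND = "GPUPool"                     # from the Resource Information table


def build_gpupool_manifest(name: str) -> dict:
    """Return the top-level skeleton of a GPUPool manifest.

    The resource is cluster-scoped, so metadata has a name but no namespace.
    The spec is left empty: its nested fields are not documented here.
    """
    return {
        "apiVersion": API_VERSION,
        "kind": KIND,
        "metadata": {"name": name},
        "spec": {},
    }


if __name__ == "__main__":
    # "example-pool" is a hypothetical name used only for illustration.
    print(json.dumps(build_gpupool_manifest("example-pool"), indent=2))
```

The printed JSON is only a skeleton; a usable pool would still need its `spec` filled in according to the property list in the Spec section.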

Spec

GPUPoolSpec defines the desired state of GPUPool.

| Property | Type | Description |
| --- | --- | --- |
| capacityConfig | object | |
| componentConfig | object | Customize system components for seamless onboarding. |
| nodeManagerConfig | object | |
| qosConfig | object | Define the QoS levels and their prices. |
| schedulingConfigTemplate | string | |
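
The property list above is enough for a quick client-side sanity check before a spec is submitted. The sketch below is an illustration only: it checks just the top-level keys and their types as listed in the table, and the helper name `check_spec` and the template name `default-template` are hypothetical. The CRD schema enforced by the API server remains authoritative.

```python
# Top-level GPUPoolSpec properties and their types, as listed in the table.
SPEC_FIELDS = {
    "capacityConfig": dict,
    "componentConfig": dict,
    "nodeManagerConfig": dict,
    "qosConfig": dict,
    "schedulingConfigTemplate": str,
}


def check_spec(spec: dict) -> list:
    """Return a list of problems found in the top-level keys of a spec."""
    problems = []
    for key, value in spec.items():
        expected = SPEC_FIELDS.get(key)
        if expected is None:
            problems.append(f"unknown property: {key}")
        elif not isinstance(value, expected):
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


if __name__ == "__main__":
    # "default-template" is a hypothetical template name.
    print(check_spec({"qosConfig": {}, "schedulingConfigTemplate": "default-template"}))
    print(check_spec({"qosConfig": "high", "extra": 1}))
```

The check deliberately does not descend into the nested objects, since their fields are not listed on this page.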

Status

GPUPoolStatus defines the observed state of GPUPool.

| Property | Type | Description |
| --- | --- | --- |
| availableTFlops * | any | `pattern: ^(+`… |
| availableVRAM * | any | `pattern: ^(+`… |
| budgetExceeded | string | If the budget is exceeded, this is set to a comma-separated string indicating which periods caused the overrun. While this field is not empty, the scheduler does not schedule new AI workloads and stops scale-up checks. |
| cluster | string | |
| componentStatus * | object | When any component version or config is updated, the pool controller performs a rolling update. The status is refreshed periodically (every 5s by default), and progress ranges from 0 to 100. When progress reaches 100, the component version or config is fully updated. |
| conditions | array | |
| lastCompactionTime | string<date-time> | |
| notReadyNodes * | integer<int32> | |
| phase * | string | Default: Pending. Allowed values: Pending, Running, Updating, Destroying, Unknown |
| potentialSavingsPerMonth | string | |
| provisioningPhase | string | Default: None. Allowed values: None, Initializing, Provisioning, Completed |
| readyNodes * | integer<int32> | |
| runningAppsCnt | integer<int32> | |
| savedCostsPerMonth | string | |
| totalGPUs | integer<int32> | |
| totalNodes | integer<int32> | |
| totalTFlops * | any | `pattern: ^(+`… |
| totalVRAM * | any | `pattern: ^(+`… |
| virtualAvailableTFlops | any | `pattern: ^(+`… |
| virtualAvailableVRAM | any | `pattern: ^(+`… |
| virtualTFlops * | any | `pattern: ^(+`… |
| virtualVRAM * | any | `pattern: ^(+`… |
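
The status fields above can be read by tooling to decide whether a pool is healthy and still accepting work. The following Python sketch is one way to do that under stated assumptions: it treats the `*` marker as meaning "required", uses a hypothetical helper name `summarize`, and uses made-up period names (`daily`, `monthly`) because the page does not list the actual values that `budgetExceeded` can contain.

```python
ALLOWED_PHASES = {"Pending", "Running", "Updating", "Destroying", "Unknown"}

# Properties marked with * in the Status table (assumed to mean "required").
REQUIRED_STATUS_FIELDS = (
    "availableTFlops", "availableVRAM", "componentStatus", "notReadyNodes",
    "phase", "readyNodes", "totalTFlops", "totalVRAM",
    "virtualTFlops", "virtualVRAM",
)


def exceeded_periods(status: dict) -> list:
    """Split the comma-separated budgetExceeded string into its periods."""
    raw = status.get("budgetExceeded") or ""
    return [part.strip() for part in raw.split(",") if part.strip()]


def summarize(status: dict) -> dict:
    """Summarize a GPUPool status dict using only fields documented above."""
    phase = status.get("phase", "Pending")  # Pending is the documented default
    periods = exceeded_periods(status)
    return {
        "phase_valid": phase in ALLOWED_PHASES,
        "missing": [f for f in REQUIRED_STATUS_FIELDS if f not in status],
        "exceeded_periods": periods,
        # A non-empty budgetExceeded stops new scheduling and scale-up checks.
        "scheduling_blocked": bool(periods),
    }


if __name__ == "__main__":
    # The period names here are hypothetical placeholders.
    print(summarize({"phase": "Running", "budgetExceeded": "daily, monthly"}))
```

The summary does not inspect `componentStatus`, because the layout of that object is not described on this page.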
