Migrate Existing Workload

Migrate existing workload to TensorFusion GPU Pool

This guide walks you through migrating your existing GPU workloads to TensorFusion's virtualized GPU infrastructure. The migration process is designed to be gradual and safe, allowing you to test the new setup before fully switching over.

Prerequisites

  • Existing workload running on physical GPUs, using the Kubernetes Device Plugin to allocate GPU resources
  • TensorFusion cluster deployed and configured
  • Access to your current workload's GPU specifications

Step 1: Map Current GPU Requests to vGPU TFLOPS/VRAM Requests

Before migrating, you need to understand your current GPU resource requirements and map them to TensorFusion's vGPU specifications.

1.1 Identify Your GPU Instance Type

First, determine the GPU instance type currently used by your workload:

# Check your current pod's GPU requests
kubectl describe pod <your-pod-name> | grep -A 5 "Requests:"
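
If your workload uses the Device Plugin, the output will include an entry like the following (CPU and memory values are illustrative):

Requests:
  cpu:             4
  memory:          16Gi
  nvidia.com/gpu:  1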

1.2 Analyze Resource Requirements

Look up the total TFLOPS and VRAM specifications for your GPU instance type. You can find this information in:

  • Cloud provider documentation (AWS, GCP, Azure)
  • GPU manufacturer specifications (NVIDIA, AMD)
  • Your cluster's node specifications

You also need to know the total number of replicas and the number of GPUs required by your workload. Then query your monitoring system to find the TFLOPS and VRAM profile that best matches your workload's actual usage.
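
For example, if your cluster runs the NVIDIA DCGM exporter with Prometheus (an assumption; adapt the metric names to your monitoring stack), queries like these surface peak compute and VRAM usage over the past week:

# Peak GPU utilization (%) over the last 7 days
max_over_time(DCGM_FI_DEV_GPU_UTIL[7d])

# Peak VRAM (framebuffer) usage in MiB over the last 7 days
max_over_time(DCGM_FI_DEV_FB_USED[7d])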

Note that some frameworks may not report actual VRAM usage because they maintain an internal GPU memory pool; in that case, estimate actual VRAM usage from the model parameters and context window.
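
As a rough sketch of such an estimate (illustrative numbers, not TensorFusion-specific): a 7B-parameter LLM served in FP16 needs about 7B × 2 bytes ≈ 14 GiB for the weights alone; the KV cache adds roughly 2 × n_layers × hidden_size × 2 bytes per token per sequence (less with grouped-query attention), so a long context window or a large batch size can add several more GiB. Size your vram-request to weights + KV cache + framework overhead.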

1.3 Configure Pod Annotations

Add the following annotations to your workload's pod specification to match your current GPU resources:

metadata:
  annotations:
    tensor-fusion.ai/tflops-limit: "{total_tflops_of_instance_type}"
    tensor-fusion.ai/tflops-request: "{total_tflops_of_instance_type}"
    tensor-fusion.ai/vram-limit: "{total_vram_of_instance_type}"
    tensor-fusion.ai/vram-request: "{total_vram_of_instance_type}"

Example:

metadata:
  annotations:
    tensor-fusion.ai/tflops-limit: "312"
    tensor-fusion.ai/tflops-request: "312"
    tensor-fusion.ai/vram-limit: "24Gi"
    tensor-fusion.ai/vram-request: "24Gi"

Step 2: Deploy and Test New Workload with TensorFusion

Deploy a test version of your workload using TensorFusion's GPU pool to validate the migration.

2.1 Enable TensorFusion for Your Workload

Add the following configuration to enable TensorFusion:

Labels:

metadata:
  labels:
    tensor-fusion.ai/enabled: "true"

Annotations:

metadata:
  annotations:
    tensor-fusion.ai/enabled-replicas: "1"  # Start with 1 replica for testing
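
Putting Steps 1.3 and 2.1 together, a test Deployment might look like the sketch below. Names and values are placeholders for your own workload, and both the label and the annotations are set on the pod template, matching the pod-specification guidance in Step 1.3. The Device Plugin's nvidia.com/gpu resource request is omitted, since vGPU resources are requested through the annotations:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-gpu-workload            # placeholder
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-gpu-workload
  template:
    metadata:
      labels:
        app: my-gpu-workload
        tensor-fusion.ai/enabled: "true"
      annotations:
        tensor-fusion.ai/enabled-replicas: "1"
        tensor-fusion.ai/tflops-request: "312"
        tensor-fusion.ai/tflops-limit: "312"
        tensor-fusion.ai/vram-request: "24Gi"
        tensor-fusion.ai/vram-limit: "24Gi"
    spec:
      containers:
        - name: model-server
          image: your-registry/your-model-server:latest   # placeholder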

2.2 Deploy Test Workload

Deploy your workload with the TensorFusion configuration:

kubectl apply -f your-workload-with-tensorfusion.yaml
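
Then wait for the rollout to complete:

kubectl rollout status deployment/<your-deployment>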

2.3 Validate the Migration

Test your workload to ensure it functions correctly with virtualized GPUs; example commands follow the checklist:

  • Verify GPU resource allocation
  • Run your typical workload tests
  • Monitor performance metrics
  • Check for any compatibility issues
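
A few generic checks with standard kubectl (adapt the names to your workload):

# Confirm the pod is running and where it was scheduled
kubectl get pods -l app=<your-app-label> -o wide

# Verify the TensorFusion annotations were applied to the pod
kubectl get pod <your-pod-name> -o jsonpath='{.metadata.annotations}'

# Watch for errors during startup
kubectl logs <your-pod-name> --tail=100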

Step 3: Gradual Traffic Migration

Once testing is successful, gradually shift traffic from your old workload to the new TensorFusion-enabled workload.

3.1 Control Traffic Distribution

Use the enabled-replicas annotation to control how many of your workload's replicas use virtualized GPUs:

metadata:
  annotations:
    tensor-fusion.ai/enabled-replicas: "{number_of_replicas_to_use_tensorfusion}"
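
For example, to move a Deployment to 4 TensorFusion-enabled replicas, you could patch the annotation in place (a sketch, assuming the annotation lives on the pod template as in the snippets above; note that changing the template triggers a rolling restart):

kubectl patch deployment <your-deployment> --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"tensor-fusion.ai/enabled-replicas":"4"}}}}}'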

Migration Strategy:

  • Start with 25% of replicas: tensor-fusion.ai/enabled-replicas: "2" (if you have 8 total replicas)
  • Gradually increase to 50%, 75%, and finally 100%
  • Monitor performance and stability at each stage

3.2 Complete Migration

When you're confident in the new setup, set all replicas to use TensorFusion:

metadata:
  annotations:
    tensor-fusion.ai/enabled-replicas: "{total_replicas}"
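
Example (for a workload with 8 replicas, as in the strategy above):

metadata:
  annotations:
    tensor-fusion.ai/enabled-replicas: "8"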
