Migrate Existing Workload

Migrate existing workload to TensorFusion GPU Pool

This guide walks you through migrating your existing GPU workloads to TensorFusion's virtualized GPU infrastructure. The migration process is designed to be gradual and safe, allowing you to test the new setup before fully switching over.

Prerequisites

  • Existing workload running on physical GPUs, using the Kubernetes Device Plugin to allocate GPU resources
  • TensorFusion cluster deployed and configured
  • Access to your current workload's GPU specifications

Step 1: Map Current GPU Requests to vGPU TFLOPS/VRAM Requests

Before migrating, you need to understand your current GPU resource requirements and map them to TensorFusion's vGPU specifications.

1.1 Identify Your GPU Instance Type

First, determine the GPU instance type currently used by your workload:

# Check your current pod's GPU requests
kubectl describe pod <your-pod-name> | grep -A 5 "Requests:"
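
If your workload uses the Device Plugin, the output will include an entry like the following (CPU and memory values are illustrative):

Requests:
  cpu:             4
  memory:          16Gi
  nvidia.com/gpu:  1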

1.2 Analyze Resource Requirements

Look up the total TFLOPS and VRAM specifications for your GPU instance type. You can find this information in:

  • Cloud provider documentation (AWS, GCP, Azure)
  • GPU manufacturer specifications (NVIDIA, AMD)
  • Your cluster's node specifications

You also need to know the total number of replicas and the number of GPUs required by your workload. Then query your monitoring system to find the TFLOPS and VRAM profile that best matches your workload's actual usage.
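
For example, if your cluster runs the NVIDIA DCGM exporter with Prometheus (an assumption; adapt the metric names to your monitoring stack), queries like these surface peak compute and VRAM usage over the past week:

# Peak GPU utilization (%) over the last 7 days
max_over_time(DCGM_FI_DEV_GPU_UTIL[7d])

# Peak VRAM (framebuffer) usage in MiB over the last 7 days
max_over_time(DCGM_FI_DEV_FB_USED[7d])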

Note that some frameworks may not report actual VRAM usage because they maintain an internal GPU memory pool; in that case, estimate actual VRAM usage from the model parameters and context window.
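
As a rough sketch of such an estimate (illustrative numbers, not TensorFusion-specific): a 7B-parameter LLM served in FP16 needs about 7B × 2 bytes ≈ 14 GiB for the weights alone; the KV cache adds roughly 2 × n_layers × hidden_size × 2 bytes per token per sequence (less with grouped-query attention), so a long context window or a large batch size can add several more GiB. Size your vram-request to weights + KV cache + framework overhead.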

1.3 Configure Pod Annotations

Add the following annotations to your workload's pod specification to match your current GPU resources:

metadata:
  annotations:
    tensor-fusion.ai/tflops-limit: "{total_tflops_of_instance_type}"
    tensor-fusion.ai/tflops-request: "{total_tflops_of_instance_type}"
    tensor-fusion.ai/vram-limit: "{total_vram_of_instance_type}"
    tensor-fusion.ai/vram-request: "{total_vram_of_instance_type}"

Example:

metadata:
  annotations:
    tensor-fusion.ai/tflops-limit: "312"
    tensor-fusion.ai/tflops-request: "312"
    tensor-fusion.ai/vram-limit: "24Gi"
    tensor-fusion.ai/vram-request: "24Gi"

Step 2: Deploy and Test New Workload with TensorFusion

Deploy a test version of your workload using TensorFusion's GPU pool to validate the migration.

2.1 Enable TensorFusion for Your Workload

Add the following configuration to enable TensorFusion:

Labels:

metadata:
  labels:
    tensor-fusion.ai/enabled: "true"

Annotations:

metadata:
  annotations:
    tensor-fusion.ai/enabled-replicas: "1"  # Start with 1 replica for testing
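
Putting Steps 1.3 and 2.1 together, a test Deployment might look like the sketch below. Names and values are placeholders for your own workload, and both the label and the annotations are set on the pod template, matching the pod-specification guidance in Step 1.3. The Device Plugin's nvidia.com/gpu resource request is omitted, since vGPU resources are requested through the annotations:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-gpu-workload            # placeholder
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-gpu-workload
  template:
    metadata:
      labels:
        app: my-gpu-workload
        tensor-fusion.ai/enabled: "true"
      annotations:
        tensor-fusion.ai/enabled-replicas: "1"
        tensor-fusion.ai/tflops-request: "312"
        tensor-fusion.ai/tflops-limit: "312"
        tensor-fusion.ai/vram-request: "24Gi"
        tensor-fusion.ai/vram-limit: "24Gi"
    spec:
      containers:
        - name: model-server
          image: your-registry/your-model-server:latest   # placeholder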

2.2 Deploy Test Workload

Deploy your workload with the TensorFusion configuration:

kubectl apply -f your-workload-with-tensorfusion.yaml
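
Then wait for the rollout to complete:

kubectl rollout status deployment/<your-deployment>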

2.3 Validate the Migration

Test your workload to ensure it functions correctly with virtualized GPUs; example commands follow the checklist:

  • Verify GPU resource allocation
  • Run your typical workload tests
  • Monitor performance metrics
  • Check for any compatibility issues
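
A few generic checks with standard kubectl (adapt the names to your workload):

# Confirm the pod is running and where it was scheduled
kubectl get pods -l app=<your-app-label> -o wide

# Verify the TensorFusion annotations were applied to the pod
kubectl get pod <your-pod-name> -o jsonpath='{.metadata.annotations}'

# Watch for errors during startup
kubectl logs <your-pod-name> --tail=100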

Step 3: Gradual Traffic Migration

Once testing is successful, gradually shift traffic from your old workload to the new TensorFusion-enabled workload.

3.1 Control Traffic Distribution

Use the enabled-replicas annotation to control how many of your workload's replicas use virtualized GPUs:

metadata:
  annotations:
    tensor-fusion.ai/enabled-replicas: "{number_of_replicas_to_use_tensorfusion}"
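
For example, to move a Deployment to 4 TensorFusion-enabled replicas, you could patch the annotation in place (a sketch, assuming the annotation lives on the pod template as in the snippets above; note that changing the template triggers a rolling restart):

kubectl patch deployment <your-deployment> --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"tensor-fusion.ai/enabled-replicas":"4"}}}}}'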

Migration Strategy:

  • Start with 25% of replicas: tensor-fusion.ai/enabled-replicas: "2" (if you have 8 total replicas)
  • Gradually increase to 50%, 75%, and finally 100%
  • Monitor performance and stability at each stage

3.2 Complete Migration

When you're confident in the new setup, set all replicas to use TensorFusion:

metadata:
  annotations:
    tensor-fusion.ai/enabled-replicas: "{total_replicas}"
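
Example (for a workload with 8 replicas, as in the strategy above):

metadata:
  annotations:
    tensor-fusion.ai/enabled-replicas: "8"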
