TensorFusion Docs

Terminology

This glossary explains the key terms used across the TensorFusion docs.

Basic concepts

  • TFLOPS: Trillions of floating-point operations per second; the core unit for compute allocation and scheduling. The system standardizes on dense FP16 TFLOPS.
  • VRAM: GPU/NPU memory (often referred to as GPU Mem). The system uses MiB as the minimum unit for accounting, allocation, and scheduling.
  • vGPU: A software-defined virtual GPU created by isolating and limiting GPU/NPU resources. From the application perspective, a vGPU behaves like a physical GPU.
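The two resource units above can be made concrete with a small sketch. This is illustrative only (the helper names are not part of TensorFusion's API): it rounds a byte count up to whole MiB, the minimum accounting unit, and computes TFLOPS from an operation count and elapsed time.

```python
import math

MIB = 1 << 20  # 1 MiB = 2**20 bytes

def vram_mib(n_bytes: int) -> int:
    """Round a byte count up to whole MiB, the minimum VRAM accounting unit."""
    return math.ceil(n_bytes / MIB)

def tflops(ops: float, seconds: float) -> float:
    """Trillions of floating-point operations per second."""
    return ops / seconds / 1e12

print(vram_mib(7 * (1 << 30)))  # 7 GiB of weights -> 7168 MiB
print(tflops(1e13, 0.1))        # 1e13 ops in 0.1 s -> 100.0 TFLOPS
```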

Model and inference terms

  • FP16: 16-bit floating-point precision (half precision), widely used for training and inference.
  • BF16 (BFloat16): 16-bit floating-point precision with the same exponent range as FP32, trading mantissa precision for better numerical stability in training.
  • INT8: 8-bit integer precision, commonly used for inference acceleration and lower memory usage via quantization.
  • KV Cache: The cache of attention keys/values used to speed up long-context or multi-turn inference; its size grows linearly with sequence length and batch size.
  • MoE: Mixture of Experts architecture that sparsely activates expert networks to scale parameters efficiently.
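A rough KV cache size estimate ties these terms together: keys and values are stored for every layer and token, so memory scales with sequence length, batch size, and the precision's bytes per element. A minimal sketch, using illustrative Llama-2-7B-like shapes (not figures from these docs):

```python
def kv_cache_mib(seq_len: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, dtype_bytes: int = 2, batch: int = 1) -> float:
    """Per-request KV cache size in MiB: keys + values for every layer/token."""
    n_bytes = 2 * n_layers * seq_len * n_kv_heads * head_dim * dtype_bytes * batch
    return n_bytes / (1 << 20)

# Illustrative shapes: 32 layers, 32 KV heads, head_dim 128, FP16 (2 bytes)
print(kv_cache_mib(4096, 32, 32, 128))  # 2048.0 MiB at a 4096-token context
```

Halving `dtype_bytes` (e.g. an INT8-quantized cache) halves the footprint, which is why precision choice matters for long-context serving.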
