Terminology
This glossary explains the key terms used across the TensorFusion docs.
Basic concepts
- TFLOPS: Trillions of floating-point operations per second. This is the core unit for compute allocation and scheduling. The system standardizes on FP16 dense TFLOPS.
- VRAM: GPU/NPU memory (often referred to as GPU Mem). The system uses MiB as the minimum unit for accounting, allocation, and scheduling.
- vGPU: A software-defined virtual GPU created by isolating and limiting GPU/NPU resources. From the application perspective, a vGPU behaves like a physical GPU.
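To make the VRAM accounting unit concrete, here is a minimal sketch (the function name and numbers are illustrative, not part of TensorFusion's API) of rounding a byte count up to whole MiB, the minimum unit mentioned above:

```python
# Hypothetical illustration of MiB-based VRAM accounting; not TensorFusion API.

def vram_mib(num_bytes: int) -> int:
    """Round a byte count up to whole MiB (the minimum accounting unit)."""
    MIB = 1024 * 1024
    return -(-num_bytes // MIB)  # ceiling division

# Example: a 7B-parameter model in FP16 (2 bytes per parameter)
weights_bytes = 7_000_000_000 * 2
print(vram_mib(weights_bytes))  # MiB needed just for the weights
```

Because allocation is in whole MiB, even a 1-byte request would be accounted as 1 MiB.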
Model and inference terms
- FP16: 16-bit floating-point precision (half precision), widely used for training and inference.
- BF16: BFloat16, a 16-bit floating-point format with a wider exponent range than FP16, which improves training stability.
- INT8: 8-bit integer precision, commonly used for inference acceleration and lower memory usage via quantization.
- KV Cache: The cache of attention keys/values used to speed up long-context or multi-turn inference; cache size grows with sequence length.
- MoE: Mixture of Experts architecture that sparsely activates expert networks to scale parameters efficiently.
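The note that KV cache size grows with sequence length can be made concrete with a back-of-the-envelope calculation. The sketch below uses the standard formula (two tensors per layer, keys and values); the model dimensions are illustrative, not tied to any specific TensorFusion deployment:

```python
# Back-of-the-envelope KV cache sizing; dimensions below are illustrative.

def kv_cache_bytes(layers: int, seq_len: int, kv_heads: int,
                   head_dim: int, dtype_bytes: int = 2, batch: int = 1) -> int:
    """Keys + values: 2 tensors per layer, each [batch, kv_heads, seq_len, head_dim]."""
    return 2 * layers * batch * kv_heads * seq_len * head_dim * dtype_bytes

# Illustrative 7B-class dimensions: 32 layers, 32 KV heads, head_dim 128, FP16.
print(kv_cache_bytes(32, 1, 32, 128))             # bytes per cached token
print(kv_cache_bytes(32, 4096, 32, 128) // 2**20)  # MiB at a 4K context
```

The linear growth in `seq_len` is why long-context and multi-turn inference are VRAM-bound: doubling the context doubles the cache.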