TensorFusion Docs

Command Line Reference

This document provides a comprehensive reference for all command line interfaces in TensorFusion.

Operator & Scheduler CLI

CLI Parameters

| Parameter | Description | Default |
|---|---|---|
| -enable-http2 | Enables HTTP/2 for the metrics and webhook servers | - |
| -health-probe-bind-address | The address the probe endpoint binds to | :8081 |
| -kubeconfig | Path to a kubeconfig file (only required if out-of-cluster) | - |
| -leader-elect | Enable leader election for controller manager to ensure only one active instance | - |
| -metrics-bind-address | The address the metrics endpoint binds to | 0 (disabled) |
| -metrics-secure | Serve metrics endpoint securely via HTTPS (use --metrics-secure=false for HTTP) | - |
| -zap-devel | Use development mode for logging | true |
| -zap-encoder | Zap log encoding format (json or console) | - |
| -zap-log-level | Verbosity level for logging (debug, info, error, or any integer value > 0) | - |
| -zap-stacktrace-level | Level at which stacktraces are captured (info, error, panic) | - |
| -zap-time-encoding | Time encoding format (epoch, millis, nano, iso8601, rfc3339, rfc3339nano) | epoch |
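
As an illustration of how these flags might be combined, the fragment below passes a few of them as container arguments. It is a minimal sketch that assumes the operator runs as a Kubernetes container and reads its flags from `args`; the placement in a Deployment and the chosen values are assumptions, and every flag and value is taken from the table above.

```yaml
# Hypothetical container args for the operator; adjust to your deployment.
args:
- "-health-probe-bind-address=:8081"  # default probe address from the table
- "-leader-elect"                     # only one active controller instance
- "-zap-devel=false"                  # turn off development-mode logging
- "-zap-encoder=json"                 # json or console
- "-zap-log-level=info"               # debug, info, error, or an integer > 0
- "-zap-time-encoding=iso8601"        # default is epoch
```

The metrics endpoint stays disabled here because `-metrics-bind-address` is left at its default of 0.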

Environment Variables

| Variable | Description | Example |
|---|---|---|
| INITIAL_GPU_NODE_LABEL_SELECTOR | Initial label selector for GPU nodes | nvidia.com/gpu.present=true |
| ENABLE_WEBHOOKS | Enable webhook functionality | true |
| OPERATOR_NAMESPACE | Namespace for the operator | tensor-fusion-sys |
| KUBECONFIG | Path to kubeconfig file | <kubeconfig> |
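
The variables above could be set in the operator's container spec roughly as follows. This is a sketch using the example values from the table; where the `env` block sits in your manifest is an assumption. `KUBECONFIG` is left out because the related `-kubeconfig` flag is described as only required out-of-cluster.

```yaml
# Hypothetical env block for the operator container.
env:
- name: INITIAL_GPU_NODE_LABEL_SELECTOR
  value: nvidia.com/gpu.present=true
- name: ENABLE_WEBHOOKS
  value: "true"              # Kubernetes env values must be strings
- name: OPERATOR_NAMESPACE
  value: tensor-fusion-sys
```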

Hypervisor CLI

CLI Parameters

| Parameter | Description | Default / Options |
|---|---|---|
| --sock_path | Worker unix socket path | /tensor-fusion/worker/sock |
| --gpu_metrics_file | GPU metrics file location | /logs/metrics.log |
| --scheduler | Scheduling policy for multiple processes on single GPU node (when GPU load is high) | Options: FIFO for simple first-in-first-out, MLFQ for multi-level feedback queue |
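
A hypervisor container might receive these parameters as shown below. This is only a sketch: it assumes the `flag=value` form is accepted and that the scheduler option is spelled exactly as listed in the table (`FIFO` or `MLFQ`), neither of which is confirmed here.

```yaml
# Hypothetical container args for the hypervisor.
args:
- "--sock_path=/tensor-fusion/worker/sock"  # default from the table
- "--gpu_metrics_file=/logs/metrics.log"    # default from the table
- "--scheduler=MLFQ"                        # or FIFO; exact spelling is an assumption
```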

Node Discovery CLI

CLI Parameters

| Parameter | Description | Example |
|---|---|---|
| --hostname | Custom hostname for binding current node with GPUNode custom resource | <hostname> |
| --gpu-info-config | Path to the GPU info configuration file | See example below |

GPU Info Config Example

- model: RTX5090
  fullModelName: "NVIDIA GeForce RTX 5090"
  vendor: NVIDIA
  costPerHour: 0.65
  fp16TFlops: 419

Environment Variables

| Variable | Description | Example |
|---|---|---|
| HOSTNAME | Node hostname | <hostname> |
| KUBECONFIG | Path to kubeconfig file | <kubeconfig> |
| NODE_DISCOVERY_REPORT_GPU_NODE | GPU node custom resource name | <gpu-node-custom-resource-name> |
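
One way to supply the node hostname is the Kubernetes Downward API, which can expose the name of the node a Pod is scheduled on. The sketch below assumes that the node name is the hostname you want Node Discovery to use; that mapping is an assumption, and the GPU node resource name is kept as a placeholder.

```yaml
# Hypothetical env block for the Node Discovery container.
env:
- name: HOSTNAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName   # Downward API: name of the scheduled node
- name: NODE_DISCOVERY_REPORT_GPU_NODE
  value: <gpu-node-custom-resource-name>   # placeholder, replace with your resource name
```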

Worker CLI

CLI Parameters

| Parameter | Description | Default/Notes |
|---|---|---|
| -n | Network protocol | Currently only native (native TCP communication) |
| -p | Worker port | Random value assigned by TensorFusion Operator-Scheduler |
| -s | Unix socket path folder | Should be /tensor-fusion/worker/sock/ in Kubernetes |

Environment Variables

| Variable | Description | Value |
|---|---|---|
| TF_ENABLE_LOG | Enable logging | 1 |
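
For illustration, a worker container with logging enabled might look like the fragment below. In Kubernetes the port is normally assigned by the TensorFusion Operator-Scheduler, so `<port>` is a placeholder, and passing each flag and its value as separate arguments is an assumption.

```yaml
# Hypothetical worker container fragment; the port is normally assigned for you.
args:
- "-n"
- "native"                        # currently the only supported protocol
- "-p"
- "<port>"                        # placeholder, assigned by the Operator-Scheduler
- "-s"
- "/tensor-fusion/worker/sock/"   # expected socket folder in Kubernetes
env:
- name: TF_ENABLE_LOG
  value: "1"                      # enable logging
```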

GPU Client Stub

The GPU Client Stub consists of two libraries that are injected through LD_PRELOAD, so they are loaded ahead of other shared libraries in every process started inside the container or server:

  • libadd_path.so: Adds additional library paths for AI application environments (e.g., hooked NVML)
  • libcuda.so: Hooks into CUDA runtime

Example configuration in worker template:

env:
- name: LD_PRELOAD
  value: /tensor-fusion/libadd_path.so:/tensor-fusion/libcuda.so

Environment Variables

| Variable | Description | Value/Notes |
|---|---|---|
| TF_PATH | Appended to PATH environment variable | /tensor-fusion |
| TF_LD_PRELOAD | Appended to LD_PRELOAD | Varies |
| TF_LD_LIBRARY_PATH | Appended to LD_LIBRARY_PATH | /tensor-fusion |
| TF_ENABLE_LOG | Enable or disable logging (disabled by default) | 0 |
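
Putting the stub variables together, a client container's environment could be sketched as follows. The values come from the table and from the worker template example above; `TF_LD_PRELOAD` is omitted because its value varies, and setting `TF_ENABLE_LOG` to 1 to turn logging on is inferred from the Worker CLI table rather than stated for the client stub.

```yaml
# Hypothetical env block for a container that uses the GPU Client Stub.
env:
- name: LD_PRELOAD
  value: /tensor-fusion/libadd_path.so:/tensor-fusion/libcuda.so
- name: TF_PATH
  value: /tensor-fusion      # appended to PATH
- name: TF_LD_LIBRARY_PATH
  value: /tensor-fusion      # appended to LD_LIBRARY_PATH
- name: TF_ENABLE_LOG
  value: "1"                 # assumed to enable logging; 0 is the default
```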
