TensorFusion Documentation

Command Line Reference

This document provides a comprehensive reference for all TensorFusion command line interfaces.

Operator & Scheduler CLI

CLI Parameters

| Parameter | Description | Default |
|---|---|---|
| `-enable-http2` | Enables HTTP/2 for the metrics and webhook servers | - |
| `-health-probe-bind-address` | The address the probe endpoint binds to | `:8081` |
| `-kubeconfig` | Path to a kubeconfig file (only required if running out-of-cluster) | - |
| `-leader-elect` | Enable leader election for the controller manager to ensure only one active instance | - |
| `-metrics-bind-address` | The address the metrics endpoint binds to | `0` (disabled) |
| `-metrics-secure` | Serve the metrics endpoint securely via HTTPS (use `--metrics-secure=false` for HTTP) | - |
| `-zap-devel` | Use development mode for logging | `true` |
| `-zap-encoder` | Zap log encoding format (`json` or `console`) | - |
| `-zap-log-level` | Verbosity level for logging (`debug`, `info`, `error`, or any integer value > 0) | - |
| `-zap-stacktrace-level` | Level at which stacktraces are captured (`info`, `error`, `panic`) | - |
| `-zap-time-encoding` | Time encoding format (`epoch`, `millis`, `nano`, `iso8601`, `rfc3339`, `rfc3339nano`) | `epoch` |
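As an illustration of how these flags are typically wired up, here is a minimal Deployment container snippet passing a few of them. This is a sketch, not an official manifest: the container name and image are placeholders, and the flag values shown are only examples.

```yaml
# Hypothetical operator container spec; name and image are placeholders.
containers:
- name: tensor-fusion-operator
  image: <operator-image>
  args:
  - -health-probe-bind-address=:8081   # probe endpoint (default)
  - -metrics-bind-address=:8080        # enable the metrics endpoint
  - -leader-elect                      # single active instance
  - -zap-log-level=info
  - -zap-time-encoding=rfc3339
```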

Environment Variables

| Variable | Description | Example |
|---|---|---|
| `INITIAL_GPU_NODE_LABEL_SELECTOR` | Initial label selector for GPU nodes | `nvidia.com/gpu.present=true` |
| `ENABLE_WEBHOOKS` | Enable webhook functionality | `true` |
| `OPERATOR_NAMESPACE` | Namespace for the operator | `tensor-fusion-sys` |
| `KUBECONFIG` | Path to kubeconfig file | `<kubeconfig>` |
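For reference, these variables could be set on the operator container with a standard Kubernetes `env` section. This is a sketch using the example values from the table above, not a prescribed configuration.

```yaml
# Illustrative env section for the operator container.
env:
- name: INITIAL_GPU_NODE_LABEL_SELECTOR
  value: nvidia.com/gpu.present=true
- name: ENABLE_WEBHOOKS
  value: "true"          # quoted so YAML treats it as a string, not a boolean
- name: OPERATOR_NAMESPACE
  value: tensor-fusion-sys
```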

Hypervisor CLI

CLI Parameters

| Parameter | Description | Default / Options |
|---|---|---|
| `--sock_path` | Worker unix socket path | `/tensor-fusion/worker/sock` |
| `--gpu_metrics_file` | GPU metrics file location | `/logs/metrics.log` |
| `--scheduler` | Scheduling policy for multiple processes on a single GPU node (when GPU load is high) | `FIFO` (simple first-in-first-out) or `MLFQ` (multi-level feedback queue) |
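A hypervisor container might be launched with these flags as follows. This is a sketch only: the `--flag=value` syntax and the choice of `MLFQ` are illustrative assumptions, not a recommended setup.

```yaml
# Hypothetical hypervisor container args using the parameters above.
args:
- --sock_path=/tensor-fusion/worker/sock   # default worker socket path
- --gpu_metrics_file=/logs/metrics.log     # default metrics location
- --scheduler=MLFQ                         # multi-level feedback queue policy
```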

Node Discovery CLI

CLI Parameters

| Parameter | Description | Example |
|---|---|---|
| `--hostname` | Custom hostname for binding the current node with the GPUNode custom resource | `<hostname>` |
| `--gpu-info-config` | Path to the GPU info configuration file | See example below |

GPU Info Config Example

```yaml
- model: RTX5090
  fullModelName: "NVIDIA GeForce RTX 5090"
  vendor: NVIDIA
  costPerHour: 0.65
  fp16TFlops: 419
```
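Putting the two parameters together, a node-discovery container could be invoked like this. The config file path here is an assumption for illustration; `$(HOSTNAME)` uses standard Kubernetes env-var expansion in `args`.

```yaml
# Hypothetical node-discovery args; the config path is a placeholder.
args:
- --hostname=$(HOSTNAME)                           # expanded from the HOSTNAME env var
- --gpu-info-config=/etc/tensor-fusion/gpu-info.yaml
```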

Environment Variables

| Variable | Description | Example |
|---|---|---|
| `HOSTNAME` | Node hostname | `<hostname>` |
| `KUBECONFIG` | Path to kubeconfig file | `<kubeconfig>` |
| `NODE_DISCOVERY_REPORT_GPU_NODE` | GPU node custom resource name | `<gpu-node-custom-resource-name>` |

Worker CLI

CLI Parameters

| Parameter | Description | Default/Notes |
|---|---|---|
| `-n` | Network protocol | Currently only `native` (native TCP communication) |
| `-p` | Worker port | Random value assigned by the TensorFusion Operator-Scheduler |
| `-s` | Unix socket path folder | Should be `/tensor-fusion/worker/sock/` in Kubernetes |
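For illustration, a worker container's args could look like the snippet below. This is a sketch: `-p` is omitted because, per the table above, the port is assigned by the Operator-Scheduler rather than set by hand.

```yaml
# Hypothetical worker container args (space-separated flag style assumed).
args:
- -n
- native                          # only supported protocol
- -s
- /tensor-fusion/worker/sock/     # expected socket folder in Kubernetes
```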

Environment Variables

| Variable | Description | Value |
|---|---|---|
| `TF_ENABLE_LOG` | Enable logging | `1` |

GPU Client Stub

The GPU Client Stub consists of two libraries that are injected via LD_PRELOAD into every process started inside the container or server:

  • libadd_path.so: Adds additional library paths for AI application environments (e.g., hooked NVML)
  • libcuda.so: Hooks into CUDA runtime

Example configuration in worker template:

```yaml
env:
- name: LD_PRELOAD
  value: /tensor-fusion/libadd_path.so:/tensor-fusion/libcuda.so
```

Environment Variables

| Variable | Description | Value/Notes |
|---|---|---|
| `TF_PATH` | Appended to the `PATH` environment variable | `/tensor-fusion` |
| `TF_LD_PRELOAD` | Appended to `LD_PRELOAD` | Varies |
| `TF_LD_LIBRARY_PATH` | Appended to `LD_LIBRARY_PATH` | `/tensor-fusion` |
| `TF_ENABLE_LOG` | Enable or disable logging | `0` (disabled, the default) or `1` (enabled) |
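Combining the preload libraries with the stub's logging switch, a client container's environment might be configured as follows. This is an illustrative sketch; only the `LD_PRELOAD` value comes from the worker template example above.

```yaml
# Illustrative env for a container using the GPU Client Stub.
env:
- name: LD_PRELOAD
  value: /tensor-fusion/libadd_path.so:/tensor-fusion/libcuda.so
- name: TF_ENABLE_LOG
  value: "1"     # enable stub logging (disabled by default)
```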
