TensorFusion Docs

Tracing/Profiling

Advanced troubleshooting tools, including vGPU call tracing/profiling and the hypervisor TUI.

🚧 Under Construction

Step 1. Enable Logging

Add the following environment variables to both the business container and the worker container:

- name: TF_ENABLE_LOG
  value: '1'
# Log level: error/warn/info/trace
- name: TF_LOG_LEVEL
  value: 'warn'
# Log to a file instead of stdout
- name: TF_LOG_PATH
  value: '/tmp/tensor-fusion/tf.log'
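
Put together, these variables slot into the `env` section of a container spec. A minimal sketch follows; the Deployment, container, and image names are placeholders, not TensorFusion defaults:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-workload            # placeholder name
spec:
  selector:
    matchLabels:
      app: my-workload
  template:
    metadata:
      labels:
        app: my-workload
    spec:
      containers:
        - name: business       # repeat the same env block on the worker container
          image: my-image:latest   # placeholder image
          env:
            - name: TF_ENABLE_LOG
              value: '1'
            - name: TF_LOG_LEVEL     # error/warn/info/trace
              value: 'warn'
            - name: TF_LOG_PATH      # log to file instead of stdout
              value: '/tmp/tensor-fusion/tf.log'
```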

Logs will be written inside the container, collected by Vector, and sunk to the time-series database (TSDB).
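The collection side of that pipeline follows Vector's standard file-source pattern. The sketch below is illustrative only: real TensorFusion deployments ship their own Vector configuration and sink to the TSDB rather than the console used here for brevity.

```toml
# Illustrative Vector config (assumption: not the shipped TensorFusion config).
# Tail the TF_LOG_PATH files written by the containers.
[sources.tf_logs]
type = "file"
include = ["/tmp/tensor-fusion/*.log"]

# Print collected events as JSON; a real deployment would sink to the TSDB instead.
[sinks.debug_console]
type = "console"
inputs = ["tf_logs"]
encoding.codec = "json"
```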

Tracing with NCU Tool

/usr/local/cuda-12.8/nsight-compute-2025.1.1/target/linux-desktop-glibc_2_11_3-x64/ncu \
  --config-file off \
  --export profile-$(date +%Y%m%d-%H%M%S) \
  --call-stack --force-overwrite \
  python3 main.py --batch_size=1 --num_synth_data=10 --num_epochs=2
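The `--export` flag above writes a `profile-YYYYMMDD-HHMMSS.ncu-rep` report file. It can be opened in the Nsight Compute GUI, or inspected from the command line with `ncu --import` (flags as found in recent Nsight Compute versions); a hedged sketch:

```shell
# Inspect the most recent Nsight Compute report from the CLI.
# Assumes ncu is on PATH and a profile-*.ncu-rep file exists in the
# current directory (produced by the --export pattern above).
report=$(ls -t profile-*.ncu-rep 2>/dev/null | head -n 1)
if command -v ncu >/dev/null 2>&1 && [ -n "$report" ]; then
  ncu --import "$report" --page summary   # print the summary page to stdout
else
  echo "no ncu report found (or ncu not on PATH)"
fi
```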

Hypervisor TUI

./hypervisor tui
