Aggregate remote GPUs across hosts and expose them as a single logical pool to workloads.
Vendor integration for discovery, telemetry, and isolation modes where available.
Support AWS Neuron devices for scheduling, monitoring, and isolation templates where applicable.
Schedule and route workloads across clusters/regions with "compute to data" policies and global quotas.
Cross-device GPU resource sync for GPU Go personal/team plans.
Build your own private MaaS (Model-as-a-Service) with model caching and preloading.
Production-ready limiter workflow and observability for Ascend NPU oversubscription scenarios.
Hook-based time-sharing isolation for AMD GPUs, aligned with TensorFusion quota + scheduler.
First-class support for multi-vGPU / multi-accelerator workloads requiring atomic placement.
Place workloads with awareness of NUMA/NVLink/PCIe/IB topology to maximize performance and stability.
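As a rough illustration of the ranking involved, the Go sketch below orders interconnect types by how tightly they couple two devices; the type names and scores are assumptions for illustration, not the project's actual placement policy.

```go
package topology

// LinkType orders interconnects from best to worst for co-locating the
// devices of one workload; the ranking is a common heuristic, not the
// project's exact policy.
type LinkType int

const (
	NVLink LinkType = iota
	PCIeSameNUMA
	PCIeCrossNUMA
	InfiniBand
	Ethernet
)

// placementScore favors tighter interconnects so that multi-GPU workers
// land on devices that can exchange data with the least overhead.
func placementScore(link LinkType) int {
	switch link {
	case NVLink:
		return 100
	case PCIeSameNUMA:
		return 80
	case PCIeCrossNUMA:
		return 60
	case InfiniBand:
		return 40
	default:
		return 20
	}
}
```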
Standard benchmark suite across vendors, isolation modes, transport (Ethernet/RDMA), and frameworks.
Remote GPU support for AMD GPUs with TensorFusion scheduling and telemetry.
Remote GPU path for Hygon DCU devices with unified scheduling integration.
Standardized partition/isolation templates for NPUs to accelerate onboarding and operations.
Support multiple GPU/NPU vendors in the same cluster with unified scheduling.
Space-sharing mode for stronger isolation guarantees (no oversubscription).
Hardware-partitioned isolation scheduling for MIG and similar technologies.
Dedicated controller for managing accelerator lifecycle and health.
Three isolation modes for compute-percentage scheduling, each with different trade-offs.
Adaptive compute throttling with PID controller for smooth resource sharing.
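A minimal sketch of what a PID-based throttle looks like, assuming a simple utilization feedback loop; the gains and field names are illustrative rather than the actual controller.

```go
package throttle

// pidController is an illustrative PID loop that nudges a worker's
// compute share toward a target utilization without abrupt swings.
type pidController struct {
	kp, ki, kd float64 // proportional, integral, derivative gains (tuning assumed)
	integral   float64
	prevErr    float64
}

// Next returns an adjustment to the current compute quota given the
// observed and target utilization (both in 0..1) and the sample interval.
func (c *pidController) Next(observedUtil, targetUtil, dt float64) float64 {
	err := targetUtil - observedUtil
	c.integral += err * dt
	derivative := (err - c.prevErr) / dt
	c.prevErr = err
	return c.kp*err + c.ki*c.integral + c.kd*derivative
}
```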
Strict memory enforcement for GPU workloads requiring hard memory limits.
Auto-scale GPU workloads based on utilization and pending demand.
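One plausible scale-up rule, sketched below: grow proportionally to the utilization ratio (as the standard Kubernetes HPA formula does) and add headroom when requests are pending. The names and the exact headroom policy are assumptions.

```go
package autoscale

import "math"

// DesiredWorkers grows the worker count proportionally to the observed
// utilization ratio and reserves extra capacity for queued demand.
func DesiredWorkers(current int, observedUtil, targetUtil float64, pending int) int {
	desired := int(math.Ceil(float64(current) * observedUtil / targetUtil))
	if pending > 0 {
		desired++ // reserve capacity for pending workloads
	}
	if desired < 1 {
		desired = 1
	}
	return desired
}
```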
Auto-expand GPU nodes when pods are pending, integrated with Karpenter.
Preempt lower-priority GPU workers to improve scheduling fairness.
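A hedged sketch of victim selection: evict the lowest-priority workers until enough compute is freed for the higher-priority workload. The worker fields and the give-up rule are illustrative only.

```go
package preemption

import "sort"

// worker is a minimal stand-in for a schedulable GPU worker; field names
// are illustrative.
type worker struct {
	Name     string
	Priority int     // lower values are preempted first
	Tflops   float64 // compute the worker would release if evicted
}

// victims picks the lowest-priority workers whose combined compute covers
// the shortfall needed to place a higher-priority workload.
func victims(running []worker, needed float64) []worker {
	sort.Slice(running, func(i, j int) bool { return running[i].Priority < running[j].Priority })
	var picked []worker
	var freed float64
	for _, w := range running {
		if freed >= needed {
			break
		}
		picked = append(picked, w)
		freed += w.Tflops
	}
	if freed < needed {
		return nil // even preempting everything would not fit; give up
	}
	return picked
}
```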
RDMA path for low-latency/high-throughput remote GPU access and scheduling.
Healthz/readyz APIs for hypervisor liveness and readiness monitoring.
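In Go terms this is a standard pair of HTTP probes, sketched below; the port and the readiness flag are placeholders, not the hypervisor's actual wiring.

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// ready flips to true once the hypervisor has finished loading GPU state.
var ready atomic.Bool

func main() {
	mux := http.NewServeMux()
	// Liveness: the process is up and able to answer requests.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// Readiness: only report OK once initialization has completed.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		if ready.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	})
	http.ListenAndServe(":8080", mux)
}
```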
Performance optimization for high-GPU-count clusters, driven by benchmarking.
Cloud vendor integration and Karpenter auto-scaling for GPU nodes.
Migrate from existing NVIDIA operator/device-plugin setups incrementally.
Native K8s device plugin integration in the hypervisor for standard resource management.
Real-time terminal UI for monitoring workers and GPU state.
Production-grade GPU-over-IP for NVIDIA, including Windows vGPU and Remote GPU.
Refactored onto the Kubernetes scheduler framework to enable advanced scheduling policies.
Integrated alerting with Prometheus Alertmanager for GPU cluster monitoring.
Allow workloads to request multiple GPUs with model filtering.
Set CUDA limits per GPU using device UUIDs or indices.
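The sketch below shows one way a per-device limit could be matched against either a UUID or a numeric index; the selector format and struct are hypothetical, not the project's configuration schema.

```go
package cudalimit

import "strconv"

// DeviceLimit associates a CUDA compute limit (percent) with a device
// selector, which may be a UUID such as "GPU-..." or an index such as "0".
// The selector format is purely illustrative.
type DeviceLimit struct {
	Selector string
	Percent  int
}

// Matches reports whether the limit applies to the device with the given
// UUID or index.
func (l DeviceLimit) Matches(uuid string, index int) bool {
	if l.Selector == uuid {
		return true
	}
	if n, err := strconv.Atoi(l.Selector); err == nil {
		return n == index
	}
	return false
}
```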
Weighted scheduler for fair GPU resource distribution.
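As a toy example of weighted scoring, the sketch below prefers devices with more free compute and memory headroom; the weights and fields are assumptions, and the real fairness policy may differ.

```go
package scheduler

// gpuCandidate holds the free capacity a scheduler would weigh when
// distributing workers across GPUs; the fields are illustrative.
type gpuCandidate struct {
	FreeTflops float64
	FreeVRAMGB float64
}

// score returns a weighted sum of free compute and memory so that devices
// with more headroom are preferred; higher is better.
func score(c gpuCandidate, wCompute, wMemory float64) float64 {
	return wCompute*c.FreeTflops + wMemory*c.FreeVRAMGB
}
```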
Gradual rollout support for TensorFusion-enabled Pods.
Hook CUDA memory APIs for strict memory limit enforcement.
Limit GPU resources based on TFLOPs for fine-grained control.
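The underlying arithmetic is simple: a TFLOPs request translates to a fraction of the device's rated throughput, which the limiter then enforces. A sketch, with the clamping behaviour assumed:

```go
package tflops

// fraction converts a TFLOPs request into the share of a physical device
// to reserve; requests above the device's rated throughput are clamped
// to a full device (an assumption for this sketch).
func fraction(requestedTflops, deviceTflops float64) float64 {
	if deviceTflops <= 0 {
		return 0
	}
	f := requestedTflops / deviceTflops
	if f > 1 {
		f = 1
	}
	return f
}
```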
Control workload distribution across nodes with maxSkew parameter.
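maxSkew here follows the standard Kubernetes topologySpreadConstraints semantics. Below is a sketch of building such a constraint with the upstream core/v1 types; the label selector values are placeholders.

```go
package scheduling

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// spreadAcrossNodes builds a topology spread constraint that keeps the
// difference in matching pods between any two nodes at or below maxSkew.
func spreadAcrossNodes(maxSkew int32) corev1.TopologySpreadConstraint {
	return corev1.TopologySpreadConstraint{
		MaxSkew:           maxSkew,
		TopologyKey:       "kubernetes.io/hostname",
		WhenUnsatisfiable: corev1.DoNotSchedule,
		LabelSelector: &metav1.LabelSelector{
			// Placeholder label; real selectors would match the worker pods.
			MatchLabels: map[string]string{"app": "tensor-fusion-worker"},
		},
	}
}
```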
Monitor GPU temperature for thermal management and alerting.
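One way to read temperatures is NVML; the sketch below uses the go-nvml bindings (an assumption about the library used) and leaves thresholds and alert routing out.

```go
package monitor

import (
	"fmt"

	"github.com/NVIDIA/go-nvml/pkg/nvml"
)

// PrintTemperatures reads the core temperature of every visible NVIDIA GPU
// via NVML; threshold checks and alert wiring would live elsewhere.
func PrintTemperatures() error {
	if ret := nvml.Init(); ret != nvml.SUCCESS {
		return fmt.Errorf("nvml init failed: %v", ret)
	}
	defer nvml.Shutdown()

	count, ret := nvml.DeviceGetCount()
	if ret != nvml.SUCCESS {
		return fmt.Errorf("device count failed: %v", ret)
	}
	for i := 0; i < count; i++ {
		dev, ret := nvml.DeviceGetHandleByIndex(i)
		if ret != nvml.SUCCESS {
			continue
		}
		tempC, ret := dev.GetTemperature(nvml.TEMPERATURE_GPU)
		if ret == nvml.SUCCESS {
			fmt.Printf("GPU %d: %d°C\n", i, tempC)
		}
	}
	return nil
}
```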
TFLOPs/VRAM metrics pipeline across controller and engine.
Manage GPU resources as pools with component configuration.