Leak Detection

gpulse's leak detection engine continuously analyzes GPU memory usage patterns and predicts out-of-memory events before they crash your workload. It uses three pluggable detection algorithms running in real time.

gpulse Predict view showing a risk overview with GPU 0 flagged as HIGH risk, linear trend detector output, and ETA to OOM
A leak detected with high confidence — the Predict view shows the growth rate, R², and estimated time to OOM

Detection Algorithms

gpulse ships with three detection algorithms. By default, the CompositeDetector runs both sub-algorithms and returns the highest-confidence result.

1. LinearTrendDetector

Performs least-squares linear regression on memory usage over time. If the slope is positive and the R² value exceeds the configured threshold, it flags a leak.

How It Works

Collects memory snapshots at the configured sampling interval. Fits a line y = mx + b where y is VRAM usage and x is time. The slope m is the growth rate. R² measures how well the data fits the line.

Best For

Gradual, steady memory growth typical of training loops with accumulating tensors, unreleased CUDA contexts, or growing optimizer state.

Default Thresholds

  • Minimum R²: 0.7 (configurable)
  • Minimum slope: 1 MB/s (configurable)
  • Minimum analysis duration: 5 minutes

2. SpikeDetector

Detects sudden, large increases in memory usage within a sliding time window. Unlike the trend detector, it does not require a sustained linear pattern.

How It Works

Compares memory usage at the start and end of a configurable window (default: 30 seconds). If the increase exceeds the spike percentage threshold, it fires. Useful for catching allocation bursts.

Best For

Sudden allocation bursts during batch processing, model loading events, or resource exhaustion scenarios where memory jumps in large steps rather than growing linearly.

Default Thresholds

  • Spike percentage: 20% increase
  • Time window: 30 seconds

3. CompositeDetector

Runs both LinearTrendDetector and SpikeDetector in parallel, then returns the result with the highest confidence score. This is the default detector.

Why Use Composite

You do not need to know what kind of leak you are looking for. The composite approach catches both gradual leaks and sudden spikes with a single configuration. It is the recommended default for unknown or heterogeneous workloads.

Reading the Predict View Output

When you press P to open Predict view, the following metrics appear for each GPU:

Metric Meaning Example
Growth Rate Rate of memory increase per unit time 12.4 MB/min
Goodness of fit (0.0 = no trend, 1.0 = perfect line) 0.982
Confidence Composite score combining R², slope significance, and pattern 91%
ETA to OOM Projected time until VRAM exhaustion 0h 47m
Detector Which algorithm produced the result LinearTrend
Pattern Classified leak pattern type Gradual Leak

Pattern Types

The pattern recognition engine classifies detected leaks into one of five categories:

Gradual Leak

Steady, linear memory growth over time. Common in training loops where tensors accumulate due to missing del or torch.cuda.empty_cache() calls. R² is typically high (0.9+).

Sudden Spike

A large, one-time jump in memory. Often caused by loading a new model, opening a large dataset, or a batch size that exceeds expectations. Check if the spike is followed by stabilization (expected) or continued growth (problem).

Cyclic

Memory rises and falls in a repeating pattern, but the peaks gradually increase over time. Typical of batch processing where each cycle leaks a small amount. The net effect is slow growth hiding inside normal-looking oscillations.

Step Increase

Memory increases in discrete jumps separated by stable plateaus. Common when a process periodically allocates a new buffer or loads additional model layers without releasing previous ones.

Unknown

The data does not clearly match any recognized pattern. May occur when there is too little data, too much noise, or a genuinely novel memory behavior.

Framework Detection

gpulse identifies the ML framework responsible for GPU processes, which helps contextualize the leak:

Framework Detection Method Common Leak Causes
PyTorch Process name, linked libraries Retained computation graphs, growing loss history, unclosed DataLoaders
TensorFlow Process name, linked libraries Session accumulation, eager mode tensor leaks, TensorBoard writer buffers
JAX Process name, XLA indicators JIT cache growth, unreleased XLA buffers
Ollama Process name ollama Context window accumulation, concurrent model loading
MLX Metal API indicators Array accumulation on Apple Silicon unified memory

Configuration

Detection thresholds can be tuned in the configuration file. The defaults work well for most workloads, but you can adjust sensitivity if you experience false positives or missed detections.

toml
[detection]
# Minimum R² for trend significance (0.0-1.0)
min-r-squared = 0.8

# Minimum slope for leak classification (MB/sec)
min-slope-mb-per-sec = 1.0

# Confidence threshold for leak alerts (0.0-1.0)
confidence-threshold = 0.75

# Minimum samples required before analysis starts
minimum-samples = 20

# Analysis window in hours
analysis-window-hours = 1.0

See Configuration for the full reference.

Tips

  • Give it time. The detector needs at least 20 samples (default) before it has enough data to analyze. With a 30-second interval, that is about 10 minutes of data.
  • R² below 0.7 is usually noise. Memory fluctuations are normal. Only R² above 0.8 strongly suggests a real trend.
  • Check the suspected process. The Predict view names the process most likely causing the leak. Use d to switch to Detail view and examine it further.
  • False positives happen. If a workload legitimately grows over time (e.g., loading checkpoints), it may trigger a detection. Adjust confidence-threshold upward to reduce false alerts.