What it does
HierarchicalForecast takes any base forecasts — from ARIMA, neural nets, whatever — and adjusts them so that the numbers actually add up across your hierarchy. If your total sales forecast doesn't equal the sum of your regional forecasts, this fixes that without you having to re-train anything.
Why I starred it
Most forecasting libraries stop at generating predictions. Nobody ships a clean solution for the reconciliation step that actually happens in practice: you have 200 product-region combinations and they all need to be consistent when you roll them up. The usual answer is "just use bottom-up" — sum the leaf nodes and aggregate. But bottom-up ignores useful signal from the aggregate levels. This library implements the full arsenal of alternatives, including the proper statistical solution (MinTrace), without requiring you to understand the matrix algebra behind it.
What pushed me over the edge: they implemented probabilistic reconciliation, not just point estimates. You can generate coherent prediction intervals, not just coherent means.
How it works
The central idea is captured in one equation in `core.py`:

`ỹ = S @ P @ ŷ`
`S` is the summing matrix — a binary matrix describing which bottom-level series aggregate into which higher-level series. `P` is the reconciliation projection matrix — this is what each method computes differently. `ŷ` is your base forecast. The output `ỹ` is guaranteed to be coherent by construction.
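To make the equation concrete, here's a toy two-leaf hierarchy worked through with the bottom-up projection (the ordering and values are illustrative, not the library's internals):

```python
import numpy as np

# Two bottom series (A, B) rolling up into one total; rows of S are
# ordered [total, A, B].
S = np.array([[1, 1],
              [1, 0],
              [0, 1]])

# Bottom-up projection: P just picks the bottom rows out of y_hat.
P = np.array([[0, 1, 0],
              [0, 0, 1]])

y_hat = np.array([100.0, 45.0, 50.0])  # incoherent: 45 + 50 != 100
y_tilde = S @ P @ y_hat                # -> [95., 45., 50.]
```

After reconciliation the total is exactly the sum of the leaves, because every output is built by multiplying through `S`.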
Every reconciler in `methods.py` inherits from `HReconciler` and implements `_get_PW_matrices()` to produce `P` and an optional weight matrix `W`. The differences between methods live entirely in how they compute `P`:
- `BottomUp`: `P` just selects the bottom-level series and aggregates via `S`. Ignores top-level forecasts entirely.
- `TopDown`: `P` distributes the top-level total down through the tree using historical proportions or forecast proportions.
- `MinTrace`: Solves a GLS problem minimizing the trace of the reconciled forecast variance. The `method` parameter selects the covariance estimator — `ols`, `wls_struct`, `wls_var`, `mint_shrink`, `mint_cov`.
- `ERM`: Instead of minimizing variance, it directly minimizes squared error on validation data. The Lasso-regularized variant (`reg_bu`) anchors `P` toward the BottomUp solution to prevent overfitting.
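For the simplest MinTrace case this is easy to check by hand: with an identity weight matrix (the `ols` estimator), the general MinTrace projection P = (SᵀW⁻¹S)⁻¹SᵀW⁻¹ reduces to an ordinary least-squares projection onto the column space of S. A small sketch, reusing the [total, A, B] toy hierarchy:

```python
import numpy as np

S = np.array([[1., 1.],
              [1., 0.],
              [0., 1.]])

# MinTrace with W = I ('ols'): P = (S'S)^{-1} S'
P = np.linalg.inv(S.T @ S) @ S.T

y_hat = np.array([100.0, 45.0, 50.0])
y_tilde = S @ P @ y_hat
# Unlike bottom-up, the top-level forecast pulls the result:
# y_tilde ≈ [98.33, 46.67, 51.67], and 46.67 + 51.67 = 98.33
```

The reconciled leaves move up toward the (higher) total — exactly the aggregate-level signal that pure bottom-up throws away.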
`MinTrace` with `mint_shrink` is the most statistically principled option. It estimates the covariance matrix of the forecast errors using the Schäfer-Strimmer shrinkage estimator — a regularized approach that avoids numerical instability when you have more series than observations. This is delegated to a C++ backend called through `_shrunk_covariance_schaferstrimmer_no_nans()` in `utils.py`, which wraps `_lib_recon` for performance.
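The shrinkage idea itself is simple to sketch. This is a simplified illustration with a fixed shrinkage intensity `lam` — the actual Schäfer-Strimmer estimator chooses λ̂ from the data, and the library's version lives in compiled code:

```python
import numpy as np

def shrunk_covariance(residuals, lam):
    """Shrink the empirical covariance of in-sample forecast errors
    toward its diagonal: lam=0 keeps the raw (possibly singular)
    estimate, lam=1 keeps only the per-series variances."""
    emp = np.cov(residuals, rowvar=False)
    target = np.diag(np.diag(emp))
    return lam * target + (1.0 - lam) * emp
```

The diagonal is left untouched while off-diagonal covariances are pulled toward zero, which is what keeps the estimate well-conditioned when series outnumber observations.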
For probabilistic output, the `_get_sampler()` method in `methods.py` selects from four approaches: `Normality` (closed-form variance under Gaussianity), `Bootstrap` (Gamakumara's resampling), `PERMBU` (bottom-up aggregation with rank permutation copulas to reintroduce multivariate dependence), and `Conformal` (distribution-free, valid under exchangeability). These are all in `probabilistic_methods.py`.
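The bootstrap approach is the easiest to sketch: resample rows of in-sample errors, add them to the base forecast, and push every sample through the same reconciliation so each draw is coherent. This is a simplified illustration of the resampling idea, not the library's `Bootstrap` class:

```python
import numpy as np

def bootstrap_coherent_samples(y_hat, errors, S, P, n_samples=1000, seed=0):
    # y_hat: (n_series,) base forecasts
    # errors: (n_obs, n_series) in-sample base-forecast errors
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, errors.shape[0], size=n_samples)
    base_samples = y_hat + errors[idx]   # resampled incoherent forecasts
    return base_samples @ (S @ P).T      # every row is coherent

# Quantiles across samples then give coherent intervals, e.g.
# np.quantile(samples, [0.1, 0.9], axis=0) for an 80% band.
```

Because the reconciliation map is applied to every draw, the interval for the total is consistent with the intervals for the leaves by construction.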
The `HierarchicalReconciliation` class in `core.py` orchestrates the whole thing. It recently gained a diagnostics flag that computes coherence residuals before and after reconciliation — the `_compute_coherence_residual()` function checks `y - S @ y_bottom` at each level, which is a clean way to verify the method actually worked.
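The check itself is essentially a one-liner. A minimal stand-alone version, assuming series are ordered with aggregates first and bottom-level series last (a sketch, not the library's function):

```python
import numpy as np

def coherence_residual(y, S):
    n_bottom = S.shape[1]
    y_bottom = y[-n_bottom:]     # bottom-level series come last
    return y - S @ y_bottom      # all zeros iff y is coherent

S = np.array([[1, 1], [1, 0], [0, 1]])
coherence_residual(np.array([95., 45., 50.]), S)   # -> [0., 0., 0.]
coherence_residual(np.array([100., 45., 50.]), S)  # -> [5., 0., 0.]
```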
Narwhals handles the DataFrame abstraction layer, so the library accepts both pandas and polars inputs without separate code paths.
Using it
```python
from statsforecast.core import StatsForecast
from statsforecast.models import AutoARIMA
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import MinTrace, BottomUp

# Base forecasts from any model.
# Y_train_df, S_df, and tags are assumed to exist already
# (e.g. built with the aggregate() utility from your raw data).
fcst = StatsForecast(models=[AutoARIMA(season_length=4)], freq='QE', n_jobs=-1)
Y_hat_df = fcst.forecast(df=Y_train_df, h=4)

# Reconcile with multiple methods in one call
hrec = HierarchicalReconciliation(reconcilers=[
    BottomUp(),
    MinTrace(method='mint_shrink'),
    MinTrace(method='ols'),
])
Y_rec_df = hrec.reconcile(
    Y_hat_df=Y_hat_df,
    Y_df=Y_train_df,
    S_df=S_df,
    tags=tags,
)
```
For probabilistic intervals:
```python
Y_rec_df = hrec.reconcile(
    Y_hat_df=Y_hat_df,
    Y_df=Y_train_df,
    S_df=S_df,
    tags=tags,
    level=[80, 95],
    intervals_method='normality',
)
```
The output DataFrame contains columns like `AutoARIMA/BottomUp` and `AutoARIMA/MinTrace_method-mint_shrink`, plus corresponding quantile columns. Running multiple reconcilers in a single call is efficient — the summing matrix construction and data alignment happen only once.
Rough edges
The `tags` argument is the friction point for new users. It's a dict mapping each hierarchy level name to an array of `unique_id` values at that level. Getting this right requires understanding how your hierarchy is encoded. The `aggregate()` utility in `utils.py` can build `S_df` and `tags` from your raw data, but the docs don't make it obvious that this is the entry point you should reach for first.
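To show the shapes involved, here's a hand-rolled toy builder for a two-level Country/Region hierarchy — an illustration of what `tags` and a summing matrix look like, not the library's `aggregate()` itself:

```python
import numpy as np

def build_hierarchy(leaf_ids):
    """Toy builder: derive parents from 'Country/Region' leaf ids and
    return (tags, S) with aggregate rows first, identity rows last."""
    parents = sorted({lid.split('/')[0] for lid in leaf_ids})
    tags = {
        'Country': np.array(parents),
        'Country/Region': np.array(leaf_ids),
    }
    agg_rows = [[1 if lid.startswith(p + '/') else 0 for lid in leaf_ids]
                for p in parents]
    S = np.vstack([agg_rows, np.eye(len(leaf_ids), dtype=int)])
    return tags, S

tags, S = build_hierarchy(['US/CA', 'US/NY', 'UK/LDN'])
# tags maps each level name to the unique_ids at that level; S has one
# row per series (2 countries + 3 regions) and one column per leaf.
```

Once you see that `tags` is just "level name → ids at that level" and that every row of S marks which leaves feed a series, the real `aggregate()` output is much easier to sanity-check.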
Temporal reconciliation (reconciling weekly vs. monthly vs. annual forecasts of the same series) is supported but listed as experimental. The `temporal=True` flag in `reconcile()` changes the internal grouping logic — I haven't stress-tested how it handles unbalanced temporal hierarchies.
The sparse variants (`MinTraceSparse`, `BottomUpSparse`, etc.) exist for large hierarchies, but not every method has a sparse counterpart. `ERM` doesn't, which matters if you have thousands of series and need the regularized reconciliation.
Maintenance is active — recent commits added Polars support via Narwhals, sparse S-matrix handling, and the conformal prediction intervals. Not a dormant repo.
Bottom line
If you're doing hierarchical forecasting and your aggregation constraints matter — retail, energy, demand planning — this library handles the reconciliation step cleanly. MinTrace with mint_shrink is a strong default; start there before reaching for ERM.
