mlforecast wraps any scikit-learn-compatible regressor into a complete time series forecasting pipeline — lag features, rolling statistics, differencing, target scaling, recursive prediction, conformal intervals, and distributed training. You bring the model; it handles the surrounding machinery.
Why I starred it
The standard approach to ML-based time series forecasting has a painful middle section: constructing lag features across thousands of series without leaking future data, running recursive prediction where each forecast step feeds into the next, and doing all of this in a way that scales beyond a single in-memory DataFrame. Most people re-solve these problems from scratch every project.
mlforecast bundles these into a coherent pipeline with a consistent API. What caught my attention was that it isn't just a convenience wrapper — the feature engineering runs through coreforecast, a C++ library under the hood, and the whole thing supports pandas, polars, Dask, Spark, and Ray with a single unified interface.
How it works
The main abstraction is MLForecast, which wraps a TimeSeries object (mlforecast/core.py). When you call fit, the TimeSeries._fit method sorts the input DataFrame by series ID and timestamp, then flattens all series values into a single contiguous numpy array — the GroupedArray (mlforecast/grouped_array.py).
GroupedArray stores data in one data array plus an indptr index array, similar to how scipy stores sparse matrices. Every series sits in the same memory block; indptr holds the boundary offsets. This design means lag transforms never need to allocate per-series arrays — they operate in-place over slices.
# grouped_array.py — the core data structure
class GroupedArray:
    def __init__(self, data: np.ndarray, indptr: np.ndarray):
        self.data = data
        self.indptr = indptr

    def __getitem__(self, idx: int) -> np.ndarray:
        return self.data[self.indptr[idx] : self.indptr[idx + 1]]
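To make the CSR-style layout concrete, here's a minimal usage sketch (the values and offsets are made up for illustration; the class is re-stated so the snippet runs standalone):

```python
import numpy as np

class GroupedArray:
    """Minimal re-statement of the structure above, for illustration only."""
    def __init__(self, data, indptr):
        self.data = data      # all series concatenated into one buffer
        self.indptr = indptr  # boundary offsets, CSR-style

    def __getitem__(self, idx):
        return self.data[self.indptr[idx] : self.indptr[idx + 1]]

# two series of lengths 3 and 4, flattened into a single contiguous array
ga = GroupedArray(
    data=np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]),
    indptr=np.array([0, 3, 7]),
)
print(ga[1])  # the second series: [10. 11. 12. 13.]
```

Indexing is just a slice between consecutive offsets, so no per-series arrays are ever allocated.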
Lag transforms delegate to coreforecast, which implements rolling means, expanding statistics, exponentially weighted averages, and differencing in C++ (coreforecast/src/rolling.cpp, expanding.cpp, diff.cpp). The Python wrappers in mlforecast/lag_transforms.py call through via coreforecast.lag_transforms — pure sklearn BaseEstimator subclasses that carry a _core_tfm attribute pointing to the compiled implementation.
When num_threads > 1, GroupedArray.apply_multithreaded_transforms splits work between a ThreadPoolExecutor for numba-based transforms and the C++ CoreGroupedArray's built-in parallelism for the compiled transforms — two different threading strategies coexisting in the same call.
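The dual dispatch can be sketched in miniature (this is a toy illustration of the idea, not mlforecast's actual code; `apply_multithreaded` and the callables are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

def apply_multithreaded(numba_tfms, core_tfms, num_threads=2):
    """Toy sketch of the two coexisting strategies: Python/numba-style
    transforms are fanned out over a thread pool, while compiled 'core'
    transforms (which parallelize internally on the C++ side) are
    invoked directly and manage their own threads."""
    results = []
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results.extend(pool.map(lambda f: f(), numba_tfms))
    results.extend(f() for f in core_tfms)  # C++ handles its own threading
    return results

out = apply_multithreaded(
    [lambda: 'numba-1', lambda: 'numba-2'],
    [lambda: 'core-1'],
)
print(out)  # ['numba-1', 'numba-2', 'core-1']
```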
Recursive prediction is handled in TimeSeries._predict_recursive. For each step in the forecast horizon, it appends the model's output back to the GroupedArray, recomputes only the necessary lag features (via updates_only=True), then runs inference again. The updates_only flag triggers a different code path in _transform_series — instead of computing the full transform over the series, it computes only the last value, which is all the recursive loop needs.
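The recursive loop boils down to something like this sketch (`predict_recursive` and the stand-in model are hypothetical; the real logic lives in `TimeSeries._predict_recursive`):

```python
import numpy as np

def predict_recursive(model_predict, history, horizon, lag=1):
    """Toy recursive forecaster: each step's prediction is appended to the
    history so the next step can compute its lag feature from it."""
    series = list(history)
    preds = []
    for _ in range(horizon):
        # recompute only the feature value the next step needs
        # (mlforecast achieves this via updates_only=True)
        features = np.array([series[-lag]])
        y_hat = float(model_predict(features))
        preds.append(y_hat)
        series.append(y_hat)  # feed the forecast back in
    return preds

# stand-in "model": lag-1 value plus a constant trend of 1
preds = predict_recursive(lambda f: f[0] + 1.0, [1.0, 2.0, 3.0], horizon=3)
print(preds)  # [4.0, 5.0, 6.0]
```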
Target transforms (mlforecast/target_transforms.py) sit between raw data and feature computation. Differences subtracts previous values to remove trend; LocalStandardScaler, LocalMinMaxScaler, and LocalBoxCox scale per-series. These are inverted automatically during prediction, so you never manually undo them.
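First differencing and its inversion can be sketched like this (hypothetical helpers, not mlforecast's implementation — the point is that the tail of the original series is all you need to undo the transform on forecasts):

```python
import numpy as np

def difference(y, d=1):
    """Order-d differencing removes trend; keep the last d raw values
    so the transform can be inverted later."""
    tail = y[-d:].copy()
    return y[d:] - y[:-d], tail

def inverse_difference(preds, tail, d=1):
    """Undo differencing on forecasts by cumulatively adding back
    from the stored tail."""
    out, last = [], list(tail)
    for p in preds:
        val = p + last[-d]
        out.append(val)
        last.append(val)
    return np.array(out)

y = np.array([10.0, 12.0, 15.0, 19.0])
dy, tail = difference(y)                          # dy = [2. 3. 4.], tail = [19.]
restored = inverse_difference(np.array([5.0, 6.0]), tail)
print(restored)  # [24. 30.]
```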
The conformal prediction intervals (forecast.py:_add_conformal_distribution_intervals) work by running cross-validation to collect residuals, then using those error distributions to construct quantile bounds at inference time — no distributional assumptions required.
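The core idea is simple enough to sketch (a split-conformal toy, not mlforecast's actual code path; `conformal_bounds` is a hypothetical helper):

```python
import numpy as np

def conformal_bounds(point_forecasts, residuals, level=80):
    """Split-conformal sketch: bound each forecast by the empirical
    quantile of absolute cross-validation residuals. No distributional
    assumption about the errors is needed."""
    q = np.quantile(np.abs(residuals), level / 100)
    point = np.asarray(point_forecasts)
    return point - q, point + q

residuals = np.array([-1.0, 0.5, 2.0, -0.5, 1.0])
lo, hi = conformal_bounds([10.0, 11.0], residuals, level=80)
# lo and hi are the 80% lower/upper bounds around each point forecast
```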
Using it
import lightgbm as lgb
from mlforecast import MLForecast
from mlforecast.lag_transforms import RollingMean, ExpandingMean
from mlforecast.target_transforms import Differences
fcst = MLForecast(
    models=lgb.LGBMRegressor(n_estimators=100, verbosity=-1),
    freq='D',
    lags=[7, 14, 28],
    lag_transforms={
        7: [RollingMean(window_size=28)],
        1: [ExpandingMean()],
    },
    date_features=['dayofweek', 'month'],
    target_transforms=[Differences([1])],
    num_threads=4,
)
fcst.fit(df) # df: unique_id, ds, y columns
preds = fcst.predict(h=14)
Prediction intervals via conformal prediction — note that the intervals are configured at fit time (so the cross-validation residuals get collected), then requested at predict time with level:
from mlforecast.utils import PredictionIntervals
fcst.fit(df, prediction_intervals=PredictionIntervals(n_windows=3, h=14))
preds = fcst.predict(h=14, level=[80, 95])
# returns columns: model-lo-80, model-hi-80, model-lo-95, model-hi-95
For Dask or Spark, you swap MLForecast for DistributedMLForecast from mlforecast.distributed, pass a Fugue execution engine, and the rest of the API stays identical.
Rough edges
The library is built on notebooks (nbs/) and compiled with nbdev. This means the "source" files are generated — editing mlforecast/forecast.py directly works, but the authoritative source is actually the Jupyter notebooks. That's a workflow choice that can confuse contributors expecting a standard Python package layout.
The pandas version pin (pandas<3.0) could become a friction point as the ecosystem moves toward pandas 3. Polars support exists but isn't always on equal footing with the pandas path in terms of test coverage — the test suite has explicit polars fixtures, but some edge cases in the distributed path are pandas-only.
The freq parameter accepts pandas offset strings, integers, or offset objects — but validation only runs at _fit time, so a bad frequency string raises deep in the stack rather than at construction. Not a showstopper, but it's a debugging annoyance.
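A cheap workaround is to validate the frequency yourself before constructing the forecaster — pandas' to_offset raises immediately on a bad alias (check_freq is a hypothetical helper, not part of mlforecast):

```python
import pandas as pd

def check_freq(freq):
    """Fail fast on a bad offset alias instead of deep inside fit()."""
    if isinstance(freq, int):
        return freq  # integer steps are fine for integer timestamps
    return pd.tseries.frequencies.to_offset(freq)  # ValueError if invalid

check_freq('D')  # valid daily alias, returns an offset object
try:
    check_freq('daily')  # not a valid pandas offset alias
except ValueError as e:
    print('invalid freq:', e)
```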
Documentation is thorough for the happy path — end-to-end walkthrough, cross-validation, conformal intervals — but sparse on how to write custom lag transforms or integrate with pipelines outside the sklearn.pipeline.Pipeline pattern it already handles.
Bottom line
If you're building ML forecasts across many time series and don't want to write your own lag feature engine, mlforecast is the most complete option in the Python ecosystem. The C++ core and multithreaded transform pipeline make it genuinely fast at scale, not just fast in benchmarks on one series.
