Build, test, and deploy ML-driven trading strategies — from data sourcing to live execution.
This repository hosts the code for Machine Learning for Trading, 3rd Edition by Stefan Jansen — a ground-up rebuild, organized around one end-to-end workflow: how you define a research idea and develop it iteratively into a strategy you can actually run, and keep running, in a live market.
- Nine case studies illustrate the workflow throughout the 27 chapters of the book, from raw data through features, models, backtests, costs, and risk to deployment.
- Generative AI and autonomous agents are new to this edition and cut across that workflow, bringing retrieval-augmented generation, knowledge graphs, and multi-agent systems to financial research.
- The companion website features 112 primers, 56 agent skills, and six production Python libraries that facilitate substantial parts of the workflow.
For the first time, the third edition comes with a live cohort course, hands-on workshops, and free lightning lessons taught by Stefan on Maven — full schedule on the courses page.
- ▶ Machine Learning for Trading: From Research to Production — the flagship live cohort course: take a research idea all the way to a deployed, monitored strategy, working through the book's end-to-end workflow with direct feedback. The first cohort starts Monday, July 6, 2026 — enrollment closes Friday, July 3.
- Getting Stuff Done with Coding Agents — a free lightning lesson on putting coding agents to work.
- Building Multi-Agent Forecasting Systems — a hands-on workshop on engineering the forecasting-agent loop: building auditable, debate-driven multi-agent systems for financial research.
The whole book traces one path: from data infrastructure and strategy research, across an evidence boundary that separates tuning from evaluation, to deployment and monitoring — with a feedback loop that retrains, pauses, or retires a strategy as its edge decays.
Where earlier editions moved technique by technique, the third edition runs that one process end to end — and adds substantial new material:
- A wider model toolkit: from gradient boosting (XGBoost, LightGBM, CatBoost) to deep time-series architectures (PatchTST, iTransformer, TSMixer, TCN, Mamba) and newer tabular and latent-factor models (TabPFN, TabM, conditional and supervised autoencoders).
- Dedicated strategy-design chapters: transaction costs and risk management each get a full chapter for the first time. They join portfolio construction and strategy synthesis, so a raw signal is carried through to a sized, cost- and risk-aware portfolio.
- A full production track: live trading systems (Interactive Brokers, Alpaca, QuantConnect), MLOps and governance (drift detection, safe rollout, circuit breakers, feature stores, experiment tracking), and the operational reality of running strategies, not just building them.
- Generative AI: retrieval-augmented generation grounded in SEC filings, knowledge graphs and Graph RAG, and autonomous, multi-agent research systems.
- Causal machine learning: Double ML, Bayesian structural time series, and causal discovery for separating real effects from spurious correlation.
- Reinforcement learning: optimal execution, market making with inventory, and deep hedging.
- Synthetic financial data: TimeGAN, Tail-GAN, Sig-CWGAN, and diffusion-based generators for validation when history is short.
Methodological rigor is treated as a first-class topic rather than an afterthought. The book draws an explicit line between exploration and confirmation — the evidence boundary — uses walk-forward cross-validation throughout, and confronts the multiple-testing and overfitting problems that quietly invalidate most backtests, with tools like the Deflated Sharpe Ratio, the Rademacher Anti-Serum, and White's Reality Check, plus conformal prediction for honest uncertainty estimates.
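The walk-forward idea can be sketched in a few lines of plain Python. This is an illustrative splitter, not the book's implementation: the function name `walk_forward_splits`, the fixed-length rolling training window, and the `embargo` gap are all assumptions chosen for the example.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int,
    train_size: int,
    test_size: int,
    embargo: int = 0,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for rolling walk-forward validation.

    Hypothetical helper: each test window starts `embargo` observations
    after its training window ends, so labels that straddle the boundary
    cannot leak into training.
    """
    start = 0
    while start + train_size + embargo + test_size <= n_samples:
        train_end = start + train_size
        test_start = train_end + embargo
        train_idx = list(range(start, train_end))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx
        start += test_size  # roll the whole window forward by one test block


splits = list(walk_forward_splits(n_samples=12, train_size=6, test_size=2, embargo=1))
for train_idx, test_idx in splits:
    print(train_idx[0], train_idx[-1], "->", test_idx)
```

Every test index is strictly later than every training index in the same split, which is the property that makes the evaluation out-of-sample in time.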
The data layer moves to Polars for fast, expression-based manipulation, and every chapter ships in reproducible Docker environments so results repeat across machines; PyTorch, LightGBM, Optuna, and Plotly round out the modeling and visualization stack.
The structural centerpiece of the third edition is nine case studies that run the length of the book. ETFs, crypto perpetuals, intraday equities, options, FX, futures, and equity factor panels are each carried through the same pipeline — from raw data and labels to features, models, backtests, costs, risk overlays, and a final deployment assessment. One disciplined process applied to nine very different markets shows where it works, where it breaks, and why.
| Case Study | Asset Class | Frequency | What It Explores |
|---|---|---|---|
| ETFs | Multi-asset ETFs | Daily | Cross-asset momentum and mean-reversion across 100 ETFs |
| Crypto Perps | Crypto | 8-hourly | Funding-rate arbitrage on perpetual futures |
| NASDAQ-100 | Equities | 15-min | Intraday microstructure signals from order flow and the LOB |
| S&P 500 Equity + Options | Equities + Options | Daily | Equity selection enhanced with implied-volatility features |
| US Firm Characteristics | Equities | Monthly | Firm-level characteristics panel (size, value, momentum, quality) |
| FX Pairs | FX | Daily | Carry and momentum across major currency pairs |
| CME Futures | Futures | Daily | Term-structure and roll-yield signals across commodity and financial futures |
| S&P 500 Options | Options | Daily | Options-only strategies (straddles, delta-hedged positions) |
| US Equities | Equities | Daily | Broad cross-section of US stocks with classic factor exposures |
Free concept explainers for every idea the book relies on. Each part links to its full list; a few topics show the range:
- Foundations: 8 topics spanning limit order book mechanics, bitemporal data models, and the stylized facts a simulator must reproduce.
- Research Design and Feature Engineering: 21 topics, including multiple testing in factor research, fractional differencing, and path signatures for financial sequences.
- Model Development: 22 topics, among them regularization geometry, conformal prediction in finance, and the mechanism behind double machine learning.
- Strategy Implementation: 27 topics, from the deflated Sharpe ratio and hierarchical risk parity to Almgren-Chriss optimal execution.
- Advanced AI: 8 topics such as Markov decision processes, the policy-gradient theorem, and proper scoring rules for event forecasts.
- Production: 2 topics, champion-challenger evaluation and training-serving skew with feature stores.
- Cross-cutting concepts: 20 building blocks referenced across chapters, for example momentum and mean reversion, the bias-variance tradeoff, and walk-forward validation.
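To give a flavor of one topic from the list, fractional differencing applies the binomial expansion of (1 - B)^d, where B is the lag operator and d need not be an integer. The sketch below is a minimal fixed-window version in plain Python; the function names are illustrative and not taken from the primers or the ml4t libraries.

```python
from typing import List


def frac_diff_weights(d: float, n_weights: int) -> List[float]:
    """First `n_weights` coefficients of (1 - B)^d.

    Recursion: w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k.
    """
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


def frac_diff(series: List[float], d: float, n_weights: int) -> List[float]:
    """Fixed-window fractional difference (hypothetical helper).

    The output is shorter than the input by n_weights - 1 observations,
    and each value uses only current and past data.
    """
    w = frac_diff_weights(d, n_weights)
    out = []
    for t in range(n_weights - 1, len(series)):
        out.append(sum(w[k] * series[t - k] for k in range(n_weights)))
    return out


print(frac_diff_weights(0.5, 4))
print(frac_diff([1.0, 3.0, 6.0, 10.0], d=1.0, n_weights=2))
```

With d = 1 the weights collapse to the ordinary first difference; values of d between 0 and 1 trade off stationarity against retained memory.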
Reusable, guard-railed tasks for coding agents, each with built-in defenses against lookahead bias, data leakage, and multiple-testing errors. Each category links to its full set; a few skills show the range:
- Concepts: 10 skills, including lookahead bias, data leakage, and the information coefficient.
- Data Acquisition: 7 skills spanning fetching data, building bars, and data validation.
- Feature Engineering: 10 skills, among them computing features, triple-barrier labels, and feature selection.
- Evaluation & Validation: 8 skills, from walk-forward CV and purging-and-embargo to the deflated Sharpe ratio.
- Backtesting: 5 skills such as running backtests, cost models, and tear sheets.
- Portfolio Management: 5 skills, including position sizing, risk metrics, and kill switches.
- Infrastructure: 4 skills, for example the canonical schema, the registry system, and Polars patterns.
- Workflows: 5 skills covering factor research, model validation, and production readiness.
- Production: 2 skills, live trading and monitoring & alerting.
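As an illustration of what one of these skills deals with, here is a minimal triple-barrier labeler in plain Python. It is a sketch under simplifying assumptions (symmetric handling of fixed fractional barriers, close prices only), and the function name and signature are hypothetical rather than the skill's actual interface.

```python
from typing import List


def triple_barrier_label(
    prices: List[float],
    entry: int,
    upper: float,
    lower: float,
    max_holding: int,
) -> int:
    """Label one entry point (hypothetical helper).

    Returns +1 if the profit barrier is touched first, -1 if the stop
    barrier is touched first, and 0 if the time barrier expires first.
    `upper` and `lower` are positive fractional returns, e.g. 0.02 for 2%.
    Entries whose window runs past the end of `prices` are evaluated on
    a truncated window; a caller may prefer to drop them instead.
    """
    p0 = prices[entry]
    last = min(entry + max_holding, len(prices) - 1)
    for t in range(entry + 1, last + 1):
        ret = prices[t] / p0 - 1.0
        if ret >= upper:
            return 1
        if ret <= -lower:
            return -1
    return 0


prices = [100.0, 101.0, 103.0, 99.0, 98.0, 100.0]
print(triple_barrier_label(prices, entry=0, upper=0.02, lower=0.02, max_holding=4))
print(triple_barrier_label(prices, entry=2, upper=0.02, lower=0.02, max_holding=3))
```

The label looks forward from the entry bar by construction, so any feature paired with it must be computed from data available at or before `entry`; that is the lookahead guard the skills are meant to enforce.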
The notebooks are built on six production Python packages, each documented and usable on its own — one per stage of the workflow:
| Library | Stage | What it does |
|---|---|---|
| ml4t-data | Data | Unified market-data acquisition from 19+ providers behind one interface |
| ml4t-engineer | Signal | Features, labels, alternative bars, and leakage-safe dataset preparation |
| ml4t-models | Models | Finance-native latent factors, SDFs, direct prediction, and portfolio learning |
| ml4t-diagnostic | Evaluation | Feature validation, strategy diagnostics, and the Deflated Sharpe Ratio |
| ml4t-backtest | Strategy | Event-driven backtesting with realistic execution |
| ml4t-live | Deployment | Production trading with broker integrations |
An introduction and a closing chapter bookend six workflow-aligned parts. Chapter titles link to their guides as each part is published; the rest are added part by part over the coming weeks.
Why process discipline beats model sophistication. Introduces the ML4T workflow as a research-to-production system, regime detection on factor returns and macro indicators, and the evidence boundary that separates exploration from confirmation.
The markets, instruments, and infrastructure the rest of the book builds on: a taxonomy of sources, raw exchange messages turned into feature-ready bars, point-in-time fundamentals, and synthetic histories for robust validation.
A taxonomy of market, fundamental, and alternative data. Surveys eight asset classes, quantifies survivorship bias, benchmarks storage formats (Parquet, DuckDB, kdb+, TimescaleDB), and establishes the data-quality framework used throughout the book.
From raw exchange messages to feature-ready bars. Parses NASDAQ ITCH, reconstructs limit order books from multiple data sources, validates Lee-Ready trade classification, and compares bar-sampling methods, with dollar bars producing returns closest to normal.
Point-in-time pipelines for SEC EDGAR filings, entity resolution across identifier systems, macro and commodity fundamentals, and alternative-data evaluation — including on-chain crypto fundamentals and prediction markets (Kalshi, Polymarket).
Generating alternative market histories for robust validation. Implements TimeGAN, Tail-GAN, Sig-CWGAN, Diffusion-TS, and LLM-based tabular generation, evaluated through a fidelity–utility–privacy framework.
Define the trading problem, then turn data into model-ready signals: research design, labels, features, and the evaluation that determines what any model can learn.
Defining the trading game before building models: universe rules, decision schedule, cost model, evaluation protocol, and run logging. Introduces the nine case studies and the walk-forward cross-validation discipline that anchors Chapters 7–20.
Label engineering (forward returns, triple-barrier, trend scanning), univariate feature evaluation (information coefficients, quantile analysis, feasibility screens), multiple-testing control (BH-FDR, Deflated Sharpe Ratio), and causal plausibility checks.
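For readers new to multiple-testing control, the Benjamini-Hochberg step-up procedure mentioned above fits in a dozen lines. This is a generic textbook sketch in plain Python, not code from the chapter; the function name is illustrative.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return one reject flag per hypothesis, controlling the FDR at `alpha`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Five candidate features with made-up p-values, for illustration only.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.20]))
```

In this toy input, 0.039 and 0.041 would pass a naive 5% threshold but are not rejected once the test accounts for five simultaneous hypotheses.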
Five feature families from price data (momentum, reversal, volatility, liquidity, microstructure), structural and cross-instrument features (yield curve, term structure, relative value), contextual features (macro regime, calendar, sentiment), and feature selection with robustness testing.
Features from fitted models: stationarity diagnostics, Kalman filters, Fourier and wavelet spectral features, GARCH volatility, and HMM regime probabilities — with point-in-time correctness enforced throughout.
From bag-of-words through transformers: TF-IDF, Word2Vec and GloVe embeddings, LSTM sequence models, FinBERT sentiment, financial NER fine-tuning, and news-return signal construction.
Five model families applied to the same nine case studies, each building on the linear baseline.
Regularized linear models (Ridge, LASSO, Elastic Net) as the baseline every later model must beat. Logistic regression for direction, SHAP interpretability, conformal prediction for uncertainty, and a cross-dataset comparison across all nine case studies.
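A minimal sketch of split conformal prediction shows how the uncertainty bands arise. It assumes absolute residuals from a held-out calibration set as the conformity score and exchangeable data, an assumption that financial time series often violate and that the chapter may relax; the function names are illustrative.

```python
import math
from typing import List, Tuple


def conformal_quantile(abs_residuals: List[float], alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of calibration residuals."""
    n = len(abs_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based order statistic
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(abs_residuals)[rank - 1]


def conformal_interval(prediction: float, q: float) -> Tuple[float, float]:
    """Symmetric prediction interval around a point forecast."""
    return prediction - q, prediction + q


# Made-up calibration residuals |y - y_hat|, for illustration only.
calibration = [float(v) for v in range(1, 11)]
q = conformal_quantile(calibration, alpha=0.5)
print(q, conformal_interval(0.0, q))
```

The `(n + 1)` correction is what gives the interval its coverage guarantee under exchangeability; using the plain empirical quantile would under-cover slightly.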
XGBoost, LightGBM, and CatBoost with Optuna multi-objective tuning, plus deep-learning tabular alternatives (TabPFN, TabM). TreeSHAP explainability and cross-dataset results, where gradient boosting is the strongest tabular model in most case studies.
LSTM, N-BEATS, Transformers (PatchTST, iTransformer, TFT), TSMixer, TCN, and Mamba, set against the LTSF-Linear debate. A practitioner selection framework and cross-dataset evidence on when deep learning helps and when simpler models suffice.
PCA eigenportfolios, IPCA with time-varying loadings, conditional and supervised autoencoders, adversarial SDF estimation, and yield-curve decomposition — with cross-dataset results on when latent factors add predictive value.
Double Machine Learning for isolating factor treatment effects, Bayesian Structural Time Series for event impact, and causal discovery (PCMCI, NOTEARS, VAR-LiNGAM), applied across the nine case studies.
From predictions to deployable strategies — backtesting, portfolio construction, costs, risk, and synthesis.
Backtesting as falsification: trading-protocol specification, vectorized vs event-driven engines, an ETF baseline strategy, core metric reporting, regime diagnostics, and strategy-level overfitting control (Deflated Sharpe Ratio, Rademacher Anti-Serum, White's Reality Check).
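The Deflated Sharpe Ratio can be sketched with the standard library alone. The code below follows the formulas published by Bailey and López de Prado; it is not the ml4t-diagnostic implementation, and the function names and argument order are assumptions. Inputs are per-period (not annualized) Sharpe ratios, and `kurt` is raw kurtosis (3 for a normal distribution).

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()
EULER_GAMMA = 0.5772156649015329


def expected_max_sharpe(n_trials: int, sr_variance: float) -> float:
    """Approximate expected maximum Sharpe ratio across `n_trials`
    strategies whose true Sharpe ratio is zero."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    a = _NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    b = _NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(sr_variance) * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe_ratio(
    sr: float,
    n_obs: int,
    skew: float,
    kurt: float,
    n_trials: int,
    sr_variance: float,
) -> float:
    """Probability that the true Sharpe ratio exceeds the level expected
    from selection among `n_trials` backtests."""
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    return _NORMAL.cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)


# Same observed Sharpe ratio, judged against 10 versus 1,000 trials
# (all numbers are made up for illustration).
few = deflated_sharpe_ratio(0.1, 1250, 0.0, 3.0, n_trials=10, sr_variance=0.001)
many = deflated_sharpe_ratio(0.1, 1250, 0.0, 3.0, n_trials=1000, sr_variance=0.001)
print(round(few, 3), round(many, 3))
```

The comparison illustrates the point of the metric: an identical backtest result becomes much less convincing once it is known to be the best of many attempts.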
From scores to portfolios: mean-variance optimization and its pitfalls, Hierarchical Risk Parity, the Kelly criterion, conformal position sizing, deep portfolio allocation, and a controlled allocator comparison across case studies.
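As a small worked example of the Kelly criterion, the continuous-time approximation for a single asset sets the leverage to mean excess return divided by variance. The sketch assumes normally distributed returns and perfectly known parameters, which real strategies never have; function names are illustrative.

```python
def kelly_fraction(mean_excess_return: float, variance: float) -> float:
    """Growth-optimal leverage under the continuous-time approximation."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    return mean_excess_return / variance


def expected_log_growth(fraction: float, mean_excess_return: float, variance: float) -> float:
    """Approximate expected log growth rate at a given leverage."""
    return fraction * mean_excess_return - 0.5 * fraction ** 2 * variance


# Made-up inputs: 8% annual excess return, 20% annual volatility.
mu, var = 0.08, 0.20 ** 2
f_star = kelly_fraction(mu, var)
print(f_star, expected_log_growth(f_star, mu, var))
print(expected_log_growth(0.5 * f_star, mu, var))  # half-Kelly
```

In this approximation half-Kelly keeps three quarters of the optimal growth rate at half the leverage, which is one reason practitioners often scale the Kelly bet down.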
Cost taxonomy, spread estimation, market-impact calibration, execution algorithms (VWAP, TWAP, Almgren-Chriss optimal execution), transaction-cost analysis, and practical guardrails — with breakeven costs that vary widely by asset class.
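The shape of an Almgren-Chriss liquidation schedule can be sketched as follows. This uses the continuous-time limit with linear temporary impact, where remaining holdings follow x(t) = X sinh(κ(T − t)) / sinh(κT) with κ = sqrt(λσ²/η); the chapter's own calibration and discretization may differ, and all parameter names are assumptions.

```python
import math
from typing import List


def almgren_chriss_holdings(
    total_shares: float,
    n_steps: int,
    horizon: float,
    risk_aversion: float,
    volatility: float,
    temp_impact: float,
) -> List[float]:
    """Remaining position at each of n_steps + 1 equally spaced times.

    Hypothetical helper. With zero risk aversion the schedule reduces
    to a straight line, i.e. TWAP.
    """
    kappa = math.sqrt(risk_aversion * volatility ** 2 / temp_impact)
    times = [horizon * j / n_steps for j in range(n_steps + 1)]
    if kappa * horizon < 1e-8:
        return [total_shares * (1.0 - t / horizon) for t in times]
    denom = math.sinh(kappa * horizon)
    return [total_shares * math.sinh(kappa * (horizon - t)) / denom for t in times]


# Illustrative parameters only; units must be mutually consistent.
risk_averse = almgren_chriss_holdings(1e6, 10, 1.0, 2e-6, 0.95, 2.5e-6)
twap = almgren_chriss_holdings(1e6, 10, 1.0, 0.0, 0.95, 2.5e-6)
print([round(x) for x in risk_averse])
print([round(x) for x in twap])
```

A risk-averse trader front-loads the schedule, paying more impact early to reduce exposure to price moves later.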
VaR/CVaR tail measurement, drawdown and path-risk controls, factor and sector decomposition, stress testing, adaptive risk overlays, deep hedging, and kill switches. Overlay effectiveness turns out to be strategy-specific.
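A minimal historical-simulation estimate of VaR and CVaR makes the two tail measures concrete. The convention here (losses as positive numbers, CVaR as the mean of losses at or beyond VaR) is one of several in use and is an assumption of this sketch, as is the function name.

```python
import math
from typing import List, Tuple


def historical_var_cvar(returns: List[float], level: float = 0.95) -> Tuple[float, float]:
    """Return (VaR, CVaR) as positive loss numbers at confidence `level`."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    losses = sorted(-r for r in returns)  # ascending: largest losses last
    n = len(losses)
    idx = min(n - 1, math.ceil(level * n) - 1)  # index of the VaR order statistic
    var = losses[idx]
    tail = losses[idx:]  # losses at or beyond VaR
    cvar = sum(tail) / len(tail)
    return var, cvar


# Ten made-up daily returns, for illustration only.
sample = [-0.10, -0.05, -0.02, -0.01, 0.0, 0.01, 0.01, 0.02, 0.03, 0.04]
var_85, cvar_85 = historical_var_cvar(sample, level=0.85)
print(var_85, cvar_85)
```

CVaR is never smaller than VaR at the same level, because it averages over the losses that VaR only bounds from below.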
What nine experiments reveal about translating ML predictions into strategies: IC–Sharpe decorrelation, Fundamental Law diagnostics, the model-family cascade, cost-survival analysis, holdout failure modes, and a practitioner's decision framework.
Reinforcement learning, large language models, knowledge graphs, and autonomous agents for finance.
MDP formulation for finance, DQN/PPO/SAC algorithms, optimal execution, market making with inventory management, deep hedging with PFHedge, inverse RL for strategy recovery, and the sim-to-real gap.
Retrieval-augmented generation grounded in SEC filings: ingestion, domain-specific embeddings, hybrid retrieval with re-ranking, constraint-based prompting, RAG evaluation and failure diagnostics, and the transition to agentic workflows.
When graphs earn their infrastructure cost: KG construction from SEC filings, Graph RAG for multi-hop reasoning, graph features for ML (GNN embeddings, centrality, community detection), financial networks, and temporal-leakage prevention.
Agent architectures (ReAct, Tree of Thoughts, Reflexion), memory systems, tool contracts, the engineering stack (LangGraph, Claude SDK), a stateful equity-research agent, multi-agent forecasting with adversarial debate, and production reliability.
Taking strategies live — trading systems and the operational infrastructure that keeps them running.
A unified framework bridging research and production: Interactive Brokers and Alpaca integration, managed platforms (QuantConnect), order-lifecycle management, pipeline verification, and operational readiness.
An ML failure taxonomy (pipeline divergence vs performance decay), drift detection, safe model rollout, circuit breakers, feature stores, experiment tracking, and the MLOps infrastructure financial ML systems need.
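One simple way to detect feature drift is the Population Stability Index, which compares the binned distribution of a feature in live data with a reference sample. The source does not say which drift metrics the chapter uses, so PSI is an assumption chosen for illustration; the function name and quantile binning are likewise hypothetical.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    live: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample and a live sample of one feature."""
    ref_sorted = sorted(reference)
    # Interior bin edges at reference quantiles.
    edges = [ref_sorted[int(len(ref_sorted) * k / n_bins)] for k in range(1, n_bins)]

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # edges strictly below v
        # Floor each share at eps so empty bins do not break the logarithm.
        return [max(c / len(values), eps) for c in counts]

    p = bin_shares(reference)
    q = bin_shares(live)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


reference = [i / 100 for i in range(1000)]
shifted = [v + 2.0 for v in reference]
print(population_stability_index(reference, reference))
print(population_stability_index(reference, shifted))
```

Commonly quoted rules of thumb treat values above roughly 0.1 as worth investigating and above roughly 0.25 as a material shift, but such thresholds are heuristics and should be calibrated per feature.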
The systematic philosophy, quant career paths, learning resources, research frontiers, and how to build your own edge. The closing bookend to Chapter 1: the process is the edge.
Run everything from the repository root. Clone and set up with Docker or a local uv environment:
git clone https://github.com/stefan-jansen/machine-learning-for-trading.git
cd machine-learning-for-trading
cp .env.example .env
docker compose pull ml4t # Option A: Docker (recommended)
pip install uv && uv sync # Option B: local with uv
See the installation guide for platform-specific setup (Linux, Windows WSL2, macOS) and GPU instructions.
Download data. Most notebooks need datasets; start with the free ones (no API keys):
uv run python data/download_all.py --free-only
The data guide documents every dataset, API-key setup, the loaders, and storage tiers (from ≈70 MB for the free tier to ≈7 GB for the full tier).
Run notebooks. Notebooks are paired Jupytext files (.py source + generated .ipynb). Run a quick smoke test, or open Jupyter Lab:
uv run python 01_process_is_edge/factor_regimes.py
docker compose up -d ml4t # then open http://localhost:8888
See the guide to running notebooks for Papermill parameters and the experiment workflow.
Most notebooks run on the default ml4t image; a few need a specialized one, and each such notebook says so in its preamble. Full details in the Docker environments guide.
| Image | Covers | When you need it |
|---|---|---|
| ml4t | All 27 chapters + 9 case studies (CPU) | Default for everything |
| ml4t-gpu | Same ml4t image, run with the NVIDIA runtime (--profile gpu) | Deep-learning chapters |
| ml4t-py312 | Python 3.12 for signatory, esig, gensim, pfhedge, tfcausalimpact | ~10 notebooks |
| benchmark | Database clients (TimescaleDB, ClickHouse, QuestDB, InfluxDB) | Ch02 storage benchmarks |
| rapids | RAPIDS cuML + LightGBM CUDA (build locally) | One Ch12 GPU benchmark |
New chapters and notebooks will be added over the coming weeks. ⭐ Watch or star the repo to follow along, and subscribe to the twice-weekly Insights newsletter.
Looking for the second edition? It is complete and stable on the second-edition branch: run git checkout second-edition, and everything is exactly where the book describes it.
Found an error or a broken link, or have a suggestion? Early feedback is especially valuable before the book launches.
- Issues: open a GitHub issue
- Website and contact: ml4trading.io
Code: MIT License · Book content: © 2026 Stefan Jansen. All rights reserved.


