

Advances in Financial Machine Learning

Machine learning (ML) is changing virtually every aspect of our lives. Today ML algorithms accomplish tasks that until recently only expert humans could perform. As it relates to finance, this is the most exciting time to adopt a disruptive technology that will transform how everyone invests for generations. This website explains scientifically sound ML tools that have worked for me over the course of two decades, helping me to manage large pools of funds for some of the most demanding institutional investors.

Most of the publications in finance are written by authors who have not practiced what they teach. They contain extremely elegant mathematics that describes a world that does not exist. Just because a theorem is true in a logical sense does not mean it is true in a physical sense. Other publications are written by authors who offer explanations devoid of any academic theory. They misuse mathematical tools to describe actual observations. Their models are overfit and fail when implemented. Academic investigation and publication are divorced from practical application to financial markets, and much application in the trading/investment world is not grounded in proper science.

The purpose of this website is to cross the proverbial divide that separates academia and the industry. I have been on both sides of the rift, and understand how difficult it is to cross it, and how easy it is to get entrenched on one side. Virtue is in the balance. This website does not advocate a theory merely because of its mathematical beauty, and does not propose a solution just because it appears to work. My goal is to transmit to my students the kind of knowledge that only comes from experience, formalized in a scientifically rigorous manner.


Most investment strategies uncovered by practitioners and academics are false. This partially explains the high rate of failure, especially among quantitative hedge funds (smart beta, factor investing, stat-arb, CTAs, etc.). This video examines why false discoveries are so prevalent in finance, why researchers fail (in many cases purposely) to detect them, and why firms are able to monetize their scheme.

Thanks to recent advances in financial machine learning, today we have tools that help detect and prevent false investment strategies. The SEC, FINRA and other regulatory agencies worldwide could use these tools to take a more active role in curbing this rampant financial fraud. Investors can take matters into their own hands immediately, by: (1) avoiding asset managers that do not disclose all trials involved in the development of an investment product; and (2) avoiding products where the firm has not certified the probability of backtest overfitting, the deflated Sharpe ratio, or an equivalent estimate of selection bias under multiple testing.
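As a rough illustration of what certifying a deflated Sharpe ratio could involve, here is a minimal Python sketch. It assumes the commonly cited form of the statistic: the probability that the true Sharpe ratio exceeds a hurdle, adjusted for sample length, skewness and kurtosis. The function name, its parameters and the example numbers are hypothetical choices for this sketch, not taken from this website.

```python
import math
from statistics import NormalDist


def deflated_sharpe_ratio(sr, sr_hurdle, n_obs, skew=0.0, kurt=3.0):
    """Probability that the true Sharpe ratio exceeds sr_hurdle.

    Assumed inputs (hypothetical names for this sketch):
      sr        -- observed per-period (non-annualized) Sharpe ratio
      sr_hurdle -- per-period Sharpe ratio the strategy must beat
      n_obs     -- number of return observations in the backtest
      skew      -- skewness of the returns
      kurt      -- kurtosis of the returns (3.0 for a Normal distribution)
    """
    # Standard error term: fat tails and negative skew widen it.
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    z = (sr - sr_hurdle) * math.sqrt(n_obs - 1) / denom
    return NormalDist().cdf(z)


# Example: same observed Sharpe ratio, Normal versus fat-tailed, negatively skewed returns.
normal_case = deflated_sharpe_ratio(0.2, 0.1, 1000)
fat_tail_case = deflated_sharpe_ratio(0.2, 0.1, 1000, skew=-2.0, kurt=9.0)
print(normal_case, fat_tail_case)
```

In this sketch the hurdle is an input. In practice it would be set to the maximum Sharpe ratio one expects to find by chance given the number of trials, which is why disclosing all trials matters.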


If I had to choose one plot that every Finance student should learn, it would be the one displayed on the right. Understand it well, and it will save you a lot of misery. Ignore it, and it may cost you your career. Ask yourself why you were not taught this plot in college...

Suppose that you are looking for an investment strategy. You run multiple backtests on a bunch of ideas, coming up with results that achieve high Sharpe ratios, some of them above 3. You show these results to your boss, who decides to place the strategy in paper trading for a few weeks. Luckily, paper trading performance seems consistent with the backtest, so the investment committee approves its deployment. The strategy receives a $100 million allocation, but unfortunately a 20% loss follows shortly after. The strategy never recovers, and it is eventually decommissioned, alongside its author. What happened?

The plot helps us answer that question. The y-axis displays the distribution of the maximum Sharpe ratio (max{SR}) for a given number of trials (x-axis), when the true Sharpe ratio is zero. A lighter color indicates a higher probability of obtaining that result, and the dashed line indicates the expected value. For example, after only 1,000 backtests, the expected maximum Sharpe ratio (E[max{SR}]) is 3.26, even if the true Sharpe ratio of the strategy is zero! How is this possible? Why does your backtest achieve such a high Sharpe ratio although there is no strategy?

The reason is Backtest Overfitting: When selection bias (picking the best result) takes place under multiple testing (running many alternative configurations), the chosen backtest is likely to be a false discovery. Finance books, academic journals and TV channels are filled with false discoveries. The retraction rate for Finance articles is essentially null, compared to thousands of papers retracted in other fields. Most quantitative firms invest in false positives, and fail, because they do not own the technology to prevent them.
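The effect of selection bias under multiple testing can be reproduced with a small simulation. The sketch below is only an illustration under stated assumptions: every "strategy" is pure Gaussian noise with a true Sharpe ratio of zero, each backtest covers 252 daily observations, and the function names and default values are hypothetical.

```python
import math
import random


def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of a list of per-period excess returns."""
    n = len(returns)
    mu = math.fsum(returns) / n
    var = math.fsum((r - mu) ** 2 for r in returns) / (n - 1)
    return mu / math.sqrt(var) * math.sqrt(periods_per_year)


def best_of_noise(n_trials=200, n_obs=252, seed=1):
    """Pick the best in-sample backtest among pure-noise strategies.

    Returns (in-sample Sharpe of the winner, its out-of-sample Sharpe).
    """
    rng = random.Random(seed)
    best_is, best_oos = -math.inf, 0.0
    for _ in range(n_trials):
        # Zero-mean daily returns: the true Sharpe ratio is zero by construction.
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_obs)]
        out_sample = [rng.gauss(0.0, 0.01) for _ in range(n_obs)]
        sr_is = sharpe(in_sample)
        if sr_is > best_is:
            # Selection bias: keep only the configuration that looked best.
            best_is, best_oos = sr_is, sharpe(out_sample)
    return best_is, best_oos


if __name__ == "__main__":
    sr_is, sr_oos = best_of_noise()
    print(f"selected backtest, in-sample Sharpe:  {sr_is:.2f}")
    print(f"same strategy, out-of-sample Sharpe: {sr_oos:.2f}")
```

Averaged over many seeds, the selected in-sample Sharpe ratio should sit well above zero while the out-of-sample Sharpe ratio hovers around zero, which is the signature of a false discovery.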

This plot is an experimental verification of the "False Strategy" theorem, first proven in this 2014 NAMS paper. This theorem essentially states that, unless max{SR}>>E[max{SR}], the discovered strategy is likely to be a false positive. Moreover, the False Strategy theorem is notable for providing a closed-form estimate of the rising hurdle that the researcher must beat as he conducts more backtests. The plot confirms that this estimated hurdle (the dashed line) is quite precise under a wide range of trials (in the plot, between 2 and 1,000,000). Read this 2014 JPM paper to learn more about ways of preventing false discoveries.
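The closed-form hurdle can be sketched in a few lines of Python. The block assumes the commonly cited form of the approximation, E[max{SR}] ≈ σ·((1−γ)·Z⁻¹[1−1/N] + γ·Z⁻¹[1−1/(N·e)]), where γ is the Euler-Mascheroni constant, Z⁻¹ is the inverse standard Normal CDF, N is the number of trials and σ is the standard deviation of Sharpe ratios across trials. The function names and the unit default for σ are assumptions made for this sketch.

```python
import math
import random
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials, sr_std=1.0):
    """Approximate E[max{SR}] over n_trials when the true Sharpe ratio is zero."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    inv = NormalDist().inv_cdf
    return sr_std * (
        (1.0 - EULER_GAMMA) * inv(1.0 - 1.0 / n_trials)
        + EULER_GAMMA * inv(1.0 - 1.0 / (n_trials * math.e))
    )


def simulated_max_sharpe(n_trials, n_runs=500, seed=0):
    """Monte Carlo check: average maximum of n_trials standard Normal draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        total += max(rng.gauss(0.0, 1.0) for _ in range(n_trials))
    return total / n_runs


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(expected_max_sharpe(n), 2), round(simulated_max_sharpe(n), 2))
```

With 1,000 trials and unit standard deviation the approximation comes out at roughly 3.26, in line with the figure quoted above, and the simulated average should land close to it.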



As you can see in the innovations section, this website discusses a wide variety of investment subjects: From backtesting and strategy selection, to robust portfolio construction, to signal processing, ... all the way down to market microstructure. After many years searching for answers to critical questions in the academic literature, I began to collaborate with accomplished scientists and mathematicians, leaders in their fields. Our preferred approaches were inspired by Experimental math, modern Geometry and Topology. The outcome has been some of the most-read publications in Finance. As Congresses and University programs invited me to present my methods, I developed materials for Seminars, including lectures, videos and software. Any earnings from these resources are donated in full to charities, like the John Hunter Memorial Fund. You can find my scheduled talks in the news section.

A few mathematical colleagues and I founded the first online community dedicated to Quantum Computational Finance, named Quantum4Quants.org. We also set up a blog and the M-A-F-F-I-A think tank, where we debunk popular misconceptions in academic and practitioner research. We have been particularly vocal about the need to correct for selection bias and multiple testing in empirical studies.

Despite my general criticism of academic financial research, there are quite a few excellent publications that have made useful contributions. Some are listed in the favorites section.



If you take anything away from this website, I hope it is this:

  1. The mathematical tools commonly used to solve financial problems are hopelessly rudimentary:

    • Regression analysis is a 200-year-old technique. Yes, I know that every month an expensive Econometrics textbook comes out with new chapters, but any 19th century mathematician could read it and understand it promptly. It's mostly linear algebra with some basic calculus and inferential probability.

    • Economics is the only discipline that gives the Nobel prize to individuals who apply methods established in the 18th century. You won't strike gold at the same spot where everyone else has been digging for decades, and even if you do, you will have to share it with them. Search outside their reach.

  2. Markets are complex networks that require tools and techniques adequate to capture such complexity. If you want to have a chance at outperforming your peers, you will have to:

    • Use/build datasets that nobody has, or nobody is able to model: Unstructured, asynchronous, hierarchical, Big data.

    • Apply more advanced methods than theirs: Graph theory, combinatorial mathematics, integer optimization, Bayesian networks, algorithm complexity, machine learning, etc. Embrace math by experiment. You may want to start by reading the books written by two of my co-authors, David H. Bailey and Jon M. Borwein.

    • Solve intractable (NP-Hard) problems: The harder to compute, the better. A problem that can be solved with a commercial computer or server is probably unworthy of your time. You need to build a customized High-Performance Computing (HPC) cluster, supercomputer or, even better, gain access to a Quantum Computer.

    • Run your research department like a Physics laboratory:

      • If statistical analysis alone sufficed to identify investment strategies, most financial academics would be multi-millionaires. That is not the case, hence there is no scientific evidence supporting their claims. Toy models and simulations do not constitute scientific evidence. No matter how ingenious, every theory must be tested and validated in the real markets. Apply rigorous Quantitative Meta-Strategies to your investment process.

      • Avoid falling into the trap of backtest overfitting: Always control for the probability that your backtest is overfit. There is absolutely no merit in producing a backtest with a high Sharpe ratio. We have developed tools that, within a couple of minutes, find an investment strategy with a Sharpe ratio of 5.

Some of the most successful hedge funds in history are math-driven. They are the product of the second quant revolution, which combines: Big Data + Machine Learning + HPC + Meta Strategies.



Berkeley Lab: lopezdeprado(at)lbl(dot)gov
Personal e-mail: mldp(at)QuantResearch(dot)org



The statements made in this communication are strictly those of the author and do not necessarily represent the views of the entities and agencies he is affiliated with. No investment advice or particular course of action is recommended. All rights reserved. 2010-2018 Marcos López de Prado.