
Harnessing the Wisdom of Crowds to Assess Recession Risks in OECD Countries

by Thomas Chalaux, Dave Turner and Steven Cassimon, OECD Economics Department.

Macroeconomic forecasters have struggled to reliably pinpoint the precise timing of business cycle turning points and future recessions. Recognising this inherent difficulty, a growing body of work has shifted focus to probabilistic models, aiming to assess the risk of a future downturn rather than attempting exact prediction.

Researchers from major institutions, including the IMF, ECB, and the Bank of England, have lauded Random Forests (RF), or closely related methods, as the most consistently effective machine-learning method for identifying crisis episodes, often deemed superior to traditional probit/logit modelling [Bluwstein et al. (2020), Hellwig (2021), IMF (2021), Jarmulska (2020)]. However, the OECD Working Paper, “Harnessing the wisdom of crowds to assess recession risks in OECD countries” (Chalaux et al., 2025), challenges this prevailing view, demonstrating that a customised algorithm based on enhanced probit modelling can match, and in some key areas surpass, the performance of Random Forests when predicting recession episodes across 20 OECD countries.

The key to this revitalisation of probit modelling lies in embracing the concept of ensemble forecasting, the “wisdom of crowds.”

The Doombot Algorithm and the Power of Averaging

The working paper introduces the latest version of a highly customised algorithm known as Doombot. While Random Forests achieve superior performance by averaging predictions across many decision trees, the newest Doombot algorithm mimics this strategy by averaging predictions from many well-fitting probit equations. This feature, termed the “wisdom of crowds,” boosts the algorithm’s out-of-sample predictive capability. The benefit of averaging is widely acknowledged in the broader forecasting literature, where simple averages often outperform more complex aggregation schemes.
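The averaging step can be illustrated with a minimal sketch. It is not the Doombot implementation: the three equations, their coefficients, and the indicator names (`credit_gap`, `house_prices`, `yield_slope`) are hypothetical, and the equations are assumed to be already estimated. Each probit equation maps its own subset of indicators to a recession probability through the standard normal CDF, and the "crowd" prediction is the simple unweighted mean of those probabilities.

```python
import math

def probit_probability(intercept, coefs, x):
    """P(recession) = Phi(intercept + sum(coef * value)) for one probit equation."""
    index = intercept + sum(coefs[name] * x[name] for name in coefs)
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))  # standard normal CDF

def crowd_probability(equations, x):
    """Simple (unweighted) average of the individual equations' probabilities."""
    probs = [probit_probability(eq["intercept"], eq["coefs"], x) for eq in equations]
    return sum(probs) / len(probs)

# Hypothetical fitted equations, each using a different subset of indicators.
equations = [
    {"intercept": -1.2, "coefs": {"credit_gap": 0.8, "house_prices": 0.5}},
    {"intercept": -1.0, "coefs": {"yield_slope": -0.9}},
    {"intercept": -1.4, "coefs": {"credit_gap": 0.6, "yield_slope": -0.4}},
]

# Hypothetical standardised indicator readings for one country-quarter.
x = {"credit_gap": 1.0, "house_prices": 0.5, "yield_slope": -0.5}

individual = [probit_probability(eq["intercept"], eq["coefs"], x) for eq in equations]
averaged = crowd_probability(equations, x)
print([round(p, 3) for p in individual], round(averaged, 3))
```

By construction the averaged probability lies between the lowest and highest individual predictions, so no single equation's outlying prediction drives the result.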

Doombot’s design features substantial customisation. It employs a “brute force” method to test a large number of combinations of explanatory variables. To ensure the resulting predictions are credible and comprehensible to external audiences, the algorithm retains only well-fitting equations with statistically significant variables and imposes sign restrictions to maintain a coherent and consistent economic narrative across countries and forecast horizons.
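The selection logic described above might be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the variable names, the expected signs, the fit measure, the cut-offs (`min_fit`, `t_crit`), and the table of estimation results are all hypothetical stand-ins for what a real probit estimation would return.

```python
from itertools import combinations

# Expected sign of each candidate variable (hypothetical names and signs).
EXPECTED_SIGN = {"credit_gap": +1, "house_prices": +1, "yield_slope": -1}

def candidate_specs(variables, max_size):
    """Brute force: enumerate every combination of up to max_size variables."""
    for size in range(1, max_size + 1):
        yield from combinations(sorted(variables), size)

def keep_equation(result, min_fit=0.7, t_crit=1.96):
    """Retain an equation only if it fits well, every variable is
    statistically significant, and every coefficient has its expected sign.
    The cut-offs are illustrative assumptions, not values from the paper."""
    if result["fit"] < min_fit:
        return False
    for name, (coef, t_stat) in result["coefs"].items():
        if abs(t_stat) < t_crit:
            return False  # not statistically significant
        if coef * EXPECTED_SIGN[name] <= 0:
            return False  # violates the sign restriction
    return True

# Hypothetical estimation results: spec -> fit measure and (coefficient, t-stat).
results = {
    ("credit_gap",): {"fit": 0.78, "coefs": {"credit_gap": (0.8, 3.1)}},
    ("house_prices",): {"fit": 0.62, "coefs": {"house_prices": (0.4, 2.2)}},
    ("yield_slope",): {"fit": 0.74, "coefs": {"yield_slope": (0.5, 2.4)}},
    ("credit_gap", "house_prices"): {
        "fit": 0.81,
        "coefs": {"credit_gap": (0.7, 2.9), "house_prices": (0.2, 1.1)},
    },
    ("credit_gap", "yield_slope"): {
        "fit": 0.83,
        "coefs": {"credit_gap": (0.6, 2.7), "yield_slope": (-0.7, -2.5)},
    },
    ("house_prices", "yield_slope"): {
        "fit": 0.76,
        "coefs": {"house_prices": (0.5, 2.3), "yield_slope": (-0.6, -2.1)},
    },
}

specs = list(candidate_specs(EXPECTED_SIGN, max_size=2))
retained = [spec for spec in specs if keep_equation(results[spec])]
print(len(specs), retained)
```

In this toy table, one equation is dropped for poor fit, one for a wrongly signed coefficient, and one for an insignificant variable; the three survivors would then feed the averaging step.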

An advantage highlighted in the paper is that Doombot is built on country-specific models. This contrasts with Random Forests, which perform best when pooling countries to estimate a single common model. The authors argue that country-specific models inherently produce more intuitively appealing properties, enhancing credibility when communicating with stakeholders.

The Predictive “Horse Race”

The OECD research compared the out-of-sample performance of five methods: Probit employing the “Wisdom of Crowds” [hereafter “Probit (WoC)”], the single-equation Probit, Random Forests estimated for individual countries (IRF), Pooled Random Forests (PRF), and LASSO.

The results show that Probit (WoC) successfully matches the performance of Random Forest methods in rolling out-of-sample quarterly predictions over a two-year horizon, including the turbulent period of the Global Financial Crisis (GFC) (Figure 1). All methods perform much better at predicting a recession in the next four quarters than in the subsequent four quarters (compare panels A and B of Figure 1). However, the application of the “Wisdom of Crowds” feature clearly improved the performance of the probit model compared to its single-equation predecessor at all horizons.

Disadvantages of Pooling

While pooling Random Forests (PRF) shows a superior performance to estimation of Random Forests using individual country models (IRF) on some conventional metrics like the median Area-Under-the-Curve score (AUC), the study highlights some disadvantages associated with pooling country data:

  1. Low Probability Ceiling: PRF rarely generates high recession probabilities that exceed 50%. This makes it difficult to ascertain when a recession is “more likely than not”. When tested using a higher F-score threshold of 50% rather than a low threshold of 15%, PRF dropped from a top performer to the last ranked method (Figure 2), demonstrating its poor ability to distinguish highly elevated risk cases.
  2. High Correlation: PRF predictions are typically highly correlated across countries. This approach may struggle to identify isolated recession risks for single countries or specific groups, such as the concentrated recession risk among European countries observed in 2022 and 2023. The more country-specific Probit (WoC) model successfully picked up a significantly higher differential risk for European countries during this period.
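The effect of a low probability ceiling on a threshold-based score can be seen in a small sketch. It assumes the balanced F1 variant of the F-score (the post does not say which variant the paper uses), and the outcome and probability series are made up for illustration. Both forecast series rank quarters identically, but one never exceeds 40%.

```python
def f_score(probs, outcomes, threshold):
    """F1 score after classifying probabilities >= threshold as recession calls."""
    calls = [p >= threshold for p in probs]
    tp = sum(1 for c, y in zip(calls, outcomes) if c and y == 1)
    fp = sum(1 for c, y in zip(calls, outcomes) if c and y == 0)
    fn = sum(1 for c, y in zip(calls, outcomes) if not c and y == 1)
    if tp == 0:
        return 0.0  # no recession correctly called
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical data: 1 marks a recession quarter.
outcomes = [0, 0, 0, 1, 1, 0, 0, 1]
capped = [0.05, 0.10, 0.20, 0.35, 0.40, 0.10, 0.05, 0.30]    # never exceeds 50%
uncapped = [0.05, 0.10, 0.20, 0.60, 0.75, 0.10, 0.05, 0.55]  # same ranking, higher peaks

for threshold in (0.15, 0.50):
    print(threshold,
          round(f_score(capped, outcomes, threshold), 3),
          round(f_score(uncapped, outcomes, threshold), 3))
```

At the 15% threshold the two series score the same, whereas at 50% the capped series makes no recession calls at all and its F-score falls to zero.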

Figure 1. Distribution of out-of-sample AUC scores across 20 countries for 5 methods

Note: The box and whiskers chart summarises the distribution of Area-Under-the-Curve (AUC) scores in the out-of-sample tests for 20 OECD countries: the box shows the interquartile range; the horizontal line is the median; the cross is the average; and the whiskers are the extreme scores. The AUC score is a common measure for evaluating machine-learning models because it summarises how accurately a model predicts a binary outcome (here, whether or not a recession occurs) across all possible probability thresholds for classifying a prediction as an event. The AUC score ranges from 0 to 1, with a higher value indicating better performance. An AUC of 0.5 means the model is no better than chance at distinguishing recession from non-recession quarters, indicating it is essentially uninformative. The ordering of the methods on the x-axis reflects the ranking of their median country scores.
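For readers unfamiliar with the measure, the AUC can be computed without choosing any threshold: it equals the share of (recession, non-recession) quarter pairs in which the recession quarter received the higher predicted probability, with ties counted as half. The sketch below uses that pairwise definition on made-up data; it is an illustration, not the evaluation code used in the paper.

```python
def auc_score(probs, outcomes):
    """AUC as the share of (recession, non-recession) pairs ranked correctly.

    Ties between a recession and a non-recession quarter count as half.
    """
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one recession and one non-recession quarter")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: 1 marks a recession quarter.
outcomes = [0, 0, 1, 1]
perfect = auc_score([0.1, 0.2, 0.6, 0.9], outcomes)      # every recession ranked above every non-recession
uninformative = auc_score([0.3, 0.3, 0.3, 0.3], outcomes)  # constant forecast, all pairs tied
mixed = auc_score([0.1, 0.7, 0.6, 0.9], outcomes)        # one of four pairs ranked wrongly
print(perfect, uninformative, mixed)
```

Because only the ranking of predictions matters, a method can score well on AUC even if its probabilities never rise above a low ceiling.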

Figure 2. Distribution of F-scores across 20 countries with various thresholds over Q1-Q8

Note: The box and whiskers chart summarises the distribution of F-scores in the out-of-sample tests for 20 OECD countries: the box shows the interquartile range; the horizontal line is the median; the cross is the average; and the whiskers are the extreme scores. The threshold for the F-score test (15% in panel A, 50% in panel B) is the probability at or above which a prediction is classified as a recession rather than a non-recession. The ordering of the methods on the x-axis reflects the ranking of their median country scores.

What Drives a Recession? Variables and Horizons

The robustness of this research comes from applying the same framework across 20 countries and eight consecutive quarterly horizons. This broad application confirms that the importance of explanatory variables shifts dramatically depending on the forecast horizon (Figure 3).

  • Shorter Horizons (Q1-Q2): Predictors for the immediate quarters are dominated by activity variables such as capacity utilisation, unemployment, and industrial production.
  • Longer Horizons: For horizons further out, financial cycle variables dominate, particularly credit and house prices.
  • Other Factors: Interest rates and inflation variables also make significant contributions. Consistent with previous OECD work, international or global indicators are found to be strong predictors of recession risks.

The Real-Time Data Innovation

Another important feature of this paper is the rigorous use of real-real time data for GDP in out-of-sample exercises. This means the estimation uses the precise vintage of data that would have been available at the point in time the predictions were made, rather than the most recent, often-revised data vintage (quasi-real time data).

The distinction is important because revisions to GDP data can be substantial. The study found that while using the latest vintage of data generally results in a slight aggregate performance gain, it can influence (and likely improve) forecast performance just when it matters most, such as on the eve of the GFC. For example, using the latest data vintage for June 2008 forecasts suggested an additional seven countries had already experienced negative GDP growth in Q1 2008 compared to the data available at the time. This change alone increased the predicted overall recession probability for those seven countries by 15 to 30 percentage points (Figure 4).
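The difference between the two data concepts can be sketched as a lookup over dated vintages. The publication dates and growth figures below are invented for illustration and are not taken from the paper; the point is only that a real-time exercise must select the vintage published on or before the forecast date, whereas a quasi-real-time exercise reads the same quarter from the latest vintage.

```python
from datetime import date

def vintage_available(vintages, forecast_date):
    """Return the latest data vintage published on or before forecast_date."""
    eligible = [published for published in vintages if published <= forecast_date]
    if not eligible:
        raise ValueError("no vintage was available at the forecast date")
    return vintages[max(eligible)]

# Hypothetical quarterly GDP growth (%) for one country, keyed by publication date.
vintages = {
    date(2008, 2, 15): {"2007Q4": 0.5},
    date(2008, 5, 15): {"2007Q4": 0.4, "2008Q1": 0.2},
    date(2024, 12, 1): {"2007Q4": 0.3, "2008Q1": -0.1},  # heavily revised
}

forecast_date = date(2008, 6, 30)
real_time = vintage_available(vintages, forecast_date)  # what forecasters saw then
latest = vintages[max(vintages)]                        # what the data show now

print(real_time["2008Q1"], latest["2008Q1"])
```

In this made-up example, the first quarter of 2008 looks positive in the vintage available in June 2008 but negative after revision, which is the kind of change that would raise a model's predicted recession probability.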

Concluding Insights

The findings of this working paper challenge the recent consensus regarding machine-learning superiority in crisis prediction. By harnessing the “wisdom of crowds”, that is, averaging predictions from many well-fitting probit equations, the customised Probit (WoC) algorithm achieves out-of-sample performance comparable to Random Forests.

The country-specific nature of Doombot, combined with its ability to generate high probability predictions (exceeding 50%), offers practical advantages over pooled methods. Furthermore, the detailed, multi-horizon analysis confirms the critical role of financial cycle variables (credit and house prices) in predicting medium- to long-term recession risks, offering granular detail that can inform policy and forecasting. The use of real-real time data adds another layer of rigour, ensuring that forecast evaluations reflect the information environment actually available to policymakers at the time.

References

Bluwstein, K. et al. (2020), “Credit Growth, the Yield Curve and Financial Crisis Prediction: Evidence from a Machine Learning Approach”, Bank of England Working Paper No. 848, January, https://doi.org/10.1016/j.jinteco.2023.103773.

Chalaux, T., D. Turner and S. Cassimon (2025), “Harnessing the wisdom of crowds to assess recession risks in OECD countries”, OECD Economics Department Working Papers, No. 1837, OECD Publishing, Paris, https://doi.org/10.1787/46880adc-en.

Hellwig, K.-P. (2021), “Predicting fiscal crises: A machine learning approach”, IMF Working Papers, 150, https://doi.org/10.5089/9781513573588.001.

IMF (2021), “How to Assess Country Risk: The Vulnerability Exercise Approach Using Machine Learning”, Technical Notes and Manuals (International Monetary Fund), TNM/21/03, Washington, DC, https://doi.org/10.5089/9781513574219.005.

Jarmulska, B. (2020), “Random forest versus logit models: which offers better early warning of fiscal stress?”, ECB Working Paper No. 2408, May, https://doi.org/10.2866/214327.
