Predicting corporate bond illiquidity via machine learning
This paper examines how well machine learning methods predict the illiquidity of U.S. corporate bonds. We compare machine learning-based estimators (linear regressions, tree-based models, and neural networks) with the most commonly used benchmark model, which is based on historical illiquidity. Machine learning techniques outperform the historical illiquidity-based approach from both a statistical and an economic perspective. Moreover, tree-based models and neural networks outperform linear regressions that incorporate the same set of covariates. Gradient boosted regression trees perform particularly well. While historical illiquidity is the most important single predictor because of its high persistence, several fundamental, risk-based, and return-based covariates also possess notable predictive power. Capturing interactions and nonlinear effects among these predictors further enhances predictive performance.
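To make the comparison concrete, the following is a minimal sketch of the kind of out-of-sample horse race described above. It is not the paper's data, specification, or code. Everything in it is an assumption made for illustration: the synthetic data-generating process, the covariate name `vol`, the coefficients, the sample split, and the boosting settings. The historical benchmark is taken to be the lagged illiquidity value itself, and the tree-based model is a hand-rolled gradient boosting of one-split stumps. Stumps can pick up the nonlinear effect in the simulated data but not interactions, which would require deeper trees.

```python
import random

def simulate(n, seed=0):
    """Hypothetical data: illiquidity depends on its lag and, nonlinearly, on a covariate."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        lag = rng.gauss(0.0, 1.0)   # lagged (historical) illiquidity, standardized
        vol = rng.gauss(0.0, 1.0)   # stand-in risk covariate (assumed)
        y = 0.6 * lag + 0.8 * max(vol, 0.0) + rng.gauss(0.0, 0.3)
        rows.append(((lag, vol), y))
    return rows

def mse(preds, ys):
    return sum((p - t) ** 2 for p, t in zip(preds, ys)) / len(ys)

def fit_ols(X, y):
    """OLS with intercept: solve the normal equations by Gauss-Jordan elimination."""
    A = [[1.0, *x] for x in X]
    k = len(A[0])
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    return lambda x: beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def fit_stump(X, resid):
    """Best one-split regression stump on the residuals, thresholds at deciles."""
    best, n = None, len(X)
    for j in range(len(X[0])):
        values = sorted(x[j] for x in X)
        for q in range(1, 10):
            thr = values[q * n // 10]
            left = [r for x, r in zip(X, resid) if x[j] <= thr]
            right = [r for x, r in zip(X, resid) if x[j] > thr]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, ml, mr)
    return best[1:]

def fit_boosted_stumps(X, y, rounds=40, lr=0.3):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        resid = [t - p for t, p in zip(y, preds)]
        j, thr, ml, mr = fit_stump(X, resid)
        stumps.append((j, thr, ml, mr))
        preds = [p + lr * (ml if x[j] <= thr else mr) for p, x in zip(preds, X)]
    return lambda x: base + lr * sum(ml if x[j] <= thr else mr
                                     for j, thr, ml, mr in stumps)

def compare(seed=0):
    """Out-of-sample MSE of the three predictors on a 50/50 train/test split."""
    data = simulate(800, seed)
    train, test = data[:400], data[400:]
    X_tr, y_tr = [x for x, _ in train], [t for _, t in train]
    X_te, y_te = [x for x, _ in test], [t for _, t in test]
    ols = fit_ols(X_tr, y_tr)
    gbt = fit_boosted_stumps(X_tr, y_tr)
    return {
        "historical": mse([x[0] for x in X_te], y_te),  # forecast = lagged value
        "linear": mse([ols(x) for x in X_te], y_te),
        "boosted_stumps": mse([gbt(x) for x in X_te], y_te),
    }

results = compare()
print(results)
```

Because the simulated target is only partly explained by its lag, both fitted models should beat the historical benchmark here by construction. The sketch makes no claim about how the linear and boosted models rank against each other on real bond data.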