From: Machine learning methods in sport injury prediction and prevention: a systematic review
Authors | Train, Validate and Test Strategy | Data Pre-processing | Feature Selection/ Dimensionality Reduction | Machine Learning Classification Methods | Deficits of ML Analysis |
---|---|---|---|---|---|
AYALA ET AL | threefold stratified cross-validation for comparison of 68 algorithms | - Data imputation: missing data were replaced by the mean values of the players in the same division - Data discretization | No | - Decision tree ensembles - Adjusted for imbalance via synthetic minority oversampling - Aggregated using bagging and boosting methods | Discretization before data splitting |
CAREY ET AL | - Split into training dataset (data of 2014 and 2015) and test dataset (data of 2016) - Hyperparameter tuning via tenfold cross-validation - Each analysis repeated 50 times | NR | Principal Component Analysis | - Decision tree ensembles (Random Forests), Support Vector Machines - Adjusted for imbalance via undersampling and synthetic minority oversampling | Dependency between training and test dataset |
LÓPEZ-VALENCIANO ET AL | fivefold stratified cross-validation for comparison of 68 algorithms | - Data imputation: missing data were replaced by the mean values of the players in the same division - Data discretization using literature and Weka software | No | - Decision tree ensembles - Adjusted for imbalance via synthetic minority oversampling, random oversampling, random undersampling - Aggregated using bagging and boosting methods | Discretization before data splitting |
MCCULLAGH ET AL | tenfold cross-validation for testing | NR | No | Artificial Neural Networks with backpropagation | Dependency between training and test dataset |
OLIVER ET AL | fivefold cross-validation for comparison of 57 models | - Data discretization using literature and Weka software | No | - Decision tree ensembles - Adjusted for imbalance via synthetic minority oversampling, random oversampling, random undersampling - Aggregated using bagging and boosting methods | Discretization before data splitting |
RODAS ET AL | - Outer fivefold cross-validation for model testing - Inner tenfold cross-validation for hyperparameter tuning | - Synthetic variant imputation | Least Absolute Shrinkage and Selection Operator (LASSO) | Decision tree ensembles (Random Forests), Support Vector Machines | None reported |
ROMMERS ET AL | - Split into training (80%) and test (20%) dataset - Cross-validation for hyperparameter tuning | NR | No | Decision tree ensembles - Aggregated using boosting methods | None reported |
ROSSI ET AL | - Split into dataset 1 (30%) for feature elimination and dataset 2 (70%) for training and testing - Stratified twofold cross-validation on dataset 2, repeated 10,000 times | NR | Recursive Feature Elimination with Cross-Validation | - Decision tree ensembles - Adjusted for imbalance via adaptive synthetic sampling - Aggregated using Random Forests | Dependency between training and test dataset |
RUDDY ET AL | Between-year approach: - Split into training dataset (2013) and test dataset (2015) Within-year approach: - Split into training (70%) and test (30%) dataset Both approaches: - Tenfold cross-validation for hyperparameter tuning - Each analysis repeated 10,000 times | - Data standardization | No | - Single decision tree, decision tree ensembles (Random Forests), Artificial Neural Networks, Support Vector Machines - Adjusted for imbalance via synthetic minority oversampling | Standardization performed independently on training and test datasets |
THORNTON ET AL | Split into training (70%), validation (15%), and test (15%) dataset | NR | No | Decision tree ensembles - Aggregated using Random Forests | None reported |
WHITESIDE ET AL | fivefold cross-validation for comparison of models | NR | Brute-force feature selection: every possible subset of features is tested | Support Vector Machines | None reported |
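The most common deficit in the table, "Discretization before data splitting," means the preprocessing step saw the test data before evaluation. A minimal sketch of the leakage-free alternative, assuming scikit-learn and synthetic data (the bin count, classifier, and metric are illustrative, not taken from the reviewed studies): wrapping the discretizer and classifier in a `Pipeline` re-fits the bin edges on each training fold only.

```python
# Sketch: discretization inside each cross-validation fold, so bin edges
# are learned from training data only (avoids the leakage flagged above).
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced data standing in for injury records (assumption).
X, y = make_classification(n_samples=200, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

pipe = Pipeline([
    ("discretize", KBinsDiscretizer(n_bins=4, encode="ordinal",
                                    strategy="quantile")),
    ("tree", DecisionTreeClassifier(random_state=0)),
])

# Stratified fivefold CV: the discretizer is re-fitted inside each fold,
# so no information from the held-out fold leaks into the bin edges.
scores = cross_val_score(pipe, X, y, cv=StratifiedKFold(n_splits=5),
                         scoring="roc_auc")
```

Fitting the discretizer once on the full dataset before splitting would let test-set values influence the bin boundaries, optimistically biasing the evaluation.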
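The outer/inner cross-validation scheme reported for Rodas et al. can be sketched with scikit-learn's nested cross-validation; the estimator, parameter grid, and data below are assumptions for illustration, not the study's actual configuration.

```python
# Sketch: nested cross-validation. The inner tenfold loop tunes
# hyperparameters; the outer fivefold loop estimates generalization
# performance on folds never seen during tuning.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=150, n_features=8, random_state=1)

# Inner loop: tenfold CV over a (hypothetical) regularization grid.
inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1.0, 10.0]}, cv=10)

# Outer loop: fivefold CV for testing the tuned model.
outer_scores = cross_val_score(inner, X, y, cv=5)
```

Because tuning happens only inside each outer training fold, the outer score is an unbiased estimate, avoiding the "dependency between training and test dataset" deficit noted for several studies.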
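The deficit flagged for Ruddy et al., standardization performed independently on training and test datasets, has a standard remedy: compute the scaling statistics on the training split alone and reuse them on the test split. A minimal sketch with synthetic data (split ratio and data are assumptions):

```python
# Sketch: fit the scaler on the training split only, then apply the
# same mean/std to the test split, rather than standardizing each
# split with its own statistics.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=100, n_features=5, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2)

scaler = StandardScaler().fit(X_train)  # statistics from training data only
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)   # reuse training mean/std on test
```

Standardizing the test set with its own statistics makes test features distributed differently from what the model saw in training, and in the pre-split case leaks test information into the training features.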