Inspired by a recent conversation about machine learning and decision trees, I made my own for predicting 50+ point games. Using the same dataset we used in the previous post, I built the decision tree model below in R. It is ~84% accurate with a sensitivity of 64% (of the actual 50+ point games, the model correctly flagged 64% of them). This is an OK model, and it continues to back up what was found in the regression model that was built earlier.
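As a quick aside on the metric quoted above: sensitivity (also called recall) is the share of *actual* 50+ point games the model catches, as distinct from precision, the share of *predicted* 50+ games that were right. A small sketch, with made-up placeholder counts (not the post's real confusion matrix):

```python
def sensitivity(true_pos, false_neg):
    """TP / (TP + FN): of all real 50+ games, how many did the model catch?"""
    return true_pos / (true_pos + false_neg)

def precision(true_pos, false_pos):
    """TP / (TP + FP): of all predicted 50+ games, how many were right?"""
    return true_pos / (true_pos + false_pos)

# Hypothetical counts for illustration only:
# 16 real 50+ games caught, 9 missed, 5 false alarms.
print(sensitivity(16, 9))  # 0.64 -> a 64% sensitivity like the model above
print(precision(16, 5))
```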
This is a very simple model. One of the most impactful settings is that I required at least 10 examples in a node before a split can be made. In other words, at each node the model looks for the variable that best separates the 50+ games from the rest, but it only commits to that split if the node holds at least 10 examples. This keeps the model fairly generalized, but then again, hitting 50+ isn't a common occurrence. I ran the data through another algorithm and got back nothing!
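To illustrate that constraint (this is a toy sketch, not the author's R code; in R's rpart it roughly corresponds to a control like `minsplit = 10`), here is a one-feature tree builder that only splits a node when it holds at least 10 examples, and otherwise makes it a leaf with the majority label:

```python
MIN_SPLIT = 10  # a node needs this many examples before it may split

def majority(labels):
    return max(set(labels), key=labels.count)

def build(samples, labels):
    # Stop: too few examples to justify a split, or the node is already pure.
    if len(samples) < MIN_SPLIT or len(set(labels)) == 1:
        return {"leaf": majority(labels)}
    # Pick the threshold (single numeric feature here) with the fewest errors.
    best = None
    for t in sorted(set(samples)):
        left = [l for s, l in zip(samples, labels) if s < t]
        right = [l for s, l in zip(samples, labels) if s >= t]
        if not left or not right:
            continue
        errors = (len(left) - left.count(majority(left))) + \
                 (len(right) - right.count(majority(right)))
        if best is None or errors < best[0]:
            best = (errors, t)
    if best is None:
        return {"leaf": majority(labels)}
    t = best[1]
    keep = lambda cond: ([s for s in samples if cond(s)],
                         [l for s, l in zip(samples, labels) if cond(s)])
    return {"thresh": t,
            "yes": build(*keep(lambda s: s < t)),
            "no": build(*keep(lambda s: s >= t))}

# 12 made-up FGM counts: high-FGM games labeled "50+", the rest "under".
fgm = [8, 9, 10, 11, 12, 12, 13, 18, 19, 20, 21, 22]
y = ["under"] * 7 + ["50+"] * 5
tree = build(fgm, y)
print(tree["thresh"])  # 18: the split that separates the two groups cleanly
```

With only 4 examples, the same function would refuse to split and just return a leaf, which is exactly the behavior the 10-example rule buys you on rare events like 50+ games.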
Let’s use this decision tree against a recent 56 point performance from Trae Young.
- FGM < 17? NO! He had exactly 17, so we go right
- FTM < 11? NO! He had 15, so we go right
- FG3A < 7? NO! He had 12, so we go right and end with a YES
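The right-most branch those bullets trace can be written down directly. The thresholds below (17, 11, 7) are read straight off the walkthrough above; what the real tree does on its left branches isn't shown in this walkthrough, so those are stubbed out rather than guessed at:

```python
def predict_50_plus(fgm, ftm, fg3a):
    # Encodes only the right-most path from the walkthrough above.
    if fgm < 17:
        return "unknown"  # left branch: not covered by the walkthrough
    if ftm < 11:
        return "unknown"  # left branch: not covered by the walkthrough
    if fg3a < 7:
        return "unknown"  # left branch: not covered by the walkthrough
    return "YES"          # three rights in a row -> predicted 50+

# Trae Young's 56-point line: 17 FGM, 15 FTM, 12 FG3A.
print(predict_50_plus(17, 15, 12))  # YES
```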
Cool! Let’s try another example!
- He made 19 FGM, so FGM < 17 is a NO and we go right after the first node, to FG3M
- He had 5 FG3M, so again we go to the right, this time to FTM < 5
- He had 7 FTM, so that is another NO and we go right
- The model would have predicted a YES
Steph Curry had a 50-point game earlier this year:
- Steph had 14 FGM, so we go left this time, twice, to FTM < 18
- FTM was 13, so we go left again to FG3M < 10
- FG3M was 9, so LEFT AGAIN to FGM < 16
- FGM: we know he had fewer than 16, so that is a YES, which sends us RIGHT to FTM < 11
- Steph had more than 11 FTM (he had 13), so we go RIGHT again to FG3A < 8
- FG3A was 19, so that is a NO! The model would have predicted Steph not to hit 50+