(This is Part 4 of a series highlighting some of the ways enrollment predictions are putting higher education institutions in peril. Don’t miss our previous entries: the Series Introduction; Part 1, Treating Adolescent Decision-Making as Linear; Part 2, Not Adequately Testing Models in Real World Scenarios; and Part 3, Trying to Forecast October’s Weather on January 1.)
Choosing Interpretability Over Accuracy
The Weather Channel’s Jim Cantore loves the weather. He. Loves. It. What else would compel a sane person to hold a microphone in an electrical storm while braving 40 MPH winds?
Unless you are like Jim, though, you typically don’t care how a weather prediction gets made; you just want to know whether you need an umbrella when heading outside.
Consider this statement: As a predictive model’s complexity increases, its interpretability decreases. This tension sits at the heart of a significant nerd battle (the best kind, if you ask us) between traditional statisticians and computer scientists. In the strategic enrollment management (SEM) space, the interpretable model is often framed as a “seven-factor model” with weightings on certain variables: a recipe for exactly what goes into the prediction and in what proportion.
The machine learning approach, in deference to prediction accuracy, is less concerned with providing such interpretable results, which may be exceedingly complex.
Thus goes the argument:
Traditional Statistician: After reading 20 academic papers, and understanding their results, it seems pretty clear that distance from campus is strongly correlated with propensity to enroll.
Machine Learning Practitioner: After testing 10,000 variables using 75 algorithms, we found that the price of coffee exports during the rainy season in Suriname is actually a better predictor than the campus visit.
The punch line, though, is that the truth lies somewhere between these technocratic Jets and Sharks. It all depends on the outcome you are seeking and its strategic value. In situations where understanding what makes a student more likely to enroll is more critical to recruitment strategy, a frequentist statistical model, with its relatively clean explanations, can be quite valuable. Likewise, when accuracy is of paramount interest, bring on the “algos.”
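The tradeoff can be sketched in a few lines of Python. To be clear, the factor names, weights, and rules below are entirely hypothetical illustrations, not anyone’s actual enrollment model: the first score is a transparent linear recipe you can read off term by term, while the second mimics an ensemble whose final number no single rule explains.

```python
# Interpretable "recipe": each factor's contribution is visible.
# (Hypothetical factors and weights, for illustration only.)
WEIGHTS = {
    "campus_visit": 0.40,      # visited campus (0 or 1)
    "distance_miles": -0.002,  # farther from campus -> lower score
    "fafsa_filed": 0.25,       # filed financial aid paperwork (0 or 1)
}

def interpretable_score(student):
    """Linear score: you can say exactly why it is high or low."""
    return sum(WEIGHTS[k] * student[k] for k in WEIGHTS)

# "Black box" stand-in: an average over many small, arbitrary rules.
# Real ensembles combine thousands of such rules; accuracy can rise
# while any single-rule explanation of the output disappears.
RULES = [
    lambda s: 1.0 if s["campus_visit"] and s["distance_miles"] < 100 else 0.0,
    lambda s: 1.0 if s["fafsa_filed"] else 0.2,
    lambda s: 0.8 if s["distance_miles"] < 50 else 0.3,
]

def black_box_score(student):
    """Ensemble-style score: often more accurate, harder to explain."""
    return sum(rule(student) for rule in RULES) / len(RULES)

student = {"campus_visit": 1, "distance_miles": 30, "fafsa_filed": 1}
print(interpretable_score(student))  # 0.40 - 0.06 + 0.25 = 0.59
print(black_box_score(student))      # (1.0 + 1.0 + 0.8) / 3
```

With the linear score, a counselor can tell a dean "the campus visit added 0.40"; with the ensemble, all she can truthfully say is "the model said so."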
The central issue, however, is that the strategic enrollment management prediction industry has never provided much of a choice between interpretable models and accurate ones. Maybe there are some enrollment managers out there who care more about why that email from the Nigerian prince was deemed to be spam than they do about the overall accuracy of their spam filter. I am betting, though, that most enrollment managers value predictive accuracy above all else, especially given our overstuffed email inboxes.
Those of us who make predictions for a living should plan accordingly.
Check back here tomorrow for Part 5 of this series, “Bringing Too Few Tools to the Job Site.”
By Thom Golden, Ph.D., Vice President of Data Science; Brad Weiner, Ph.D., Director of Data Science; and Pete Barwis, Ph.D., Senior Data Scientist, Capture Higher Ed