We know that Envision, Capture Higher Ed’s cutting-edge predictive engine for higher education, makes predictions. We know that Envision needs student record data to make those predictions. But what happens between feeding student data in and getting predictions out?
Let’s crack open the black box and look inside.
Envision is a customized computer program with four basic steps: loading, cleaning, building and iterating.
The first step involves loading student data into a statistical package so we can analyze it. Every time you open an Excel file and view the contents of a worksheet, you are performing a similar task: you are loading data into a statistical package.
We perform a similar task in Envision, just using a few lines of code instead of using mouse clicks to start the import. We import untidy historical data from a school, and we import supplementary behavioral and contextual data.
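The loading step can be sketched in a few lines. This is a minimal illustration using pandas, with an invented in-memory CSV standing in for a school's exported student records; the column names are hypothetical, not Envision's actual schema.

```python
import io
import pandas as pd

# Stand-in for a hypothetical file such as "student_records.csv";
# in practice this would be a file path, not an in-memory string.
raw_csv = io.StringIO(
    "student_id,app_date,gpa,enrolled\n"
    "1001,2016-09-01,3.40,Y\n"
    "1002,09/15/2016,3.1,N\n"
    "1003,2016-10-02,,Y\n"
)

# A few lines of code instead of a series of mouse clicks.
records = pd.read_csv(raw_csv)
print(records.shape)  # (3, 4): three students, four columns
```

Note the untidiness already visible in this toy file: two different date formats and a missing GPA. That is what the next step deals with.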
The second step is data cleaning. This step takes the most time and manual effort of any of the four steps. Every time you take some data in Excel and filter rows, find and replace values, convert dates, round numbers, or write if-statements to make the information more useful to you, you are cleaning data.
Similarly, we write customized code that organizes, assembles and converts the school’s untidy data into a tidy shape that predictive engines find useful.
Predictive engines are very particular about their data format. Since each school’s data are unique in structure and quirkiness, a capable analyst needs to perform this task. An analyst determines whether the data are coded consistently over time, and if they aren’t, she or he writes lines of code to fix it.
The result of the cleaning step is an analytic dataset that contains one row for each student and columns of numeric values that a predictive model can use.
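A minimal sketch of that cleaning work, again using pandas on invented columns: standardize dates that were coded inconsistently over time, fill a missing value, and recode a Y/N flag to numbers so every predictor column is numeric. The recoding rules here are illustrative assumptions, not Envision's actual logic.

```python
import pandas as pd

records = pd.DataFrame({
    "student_id": [1001, 1002, 1003],
    "app_date":   ["2016-09-01", "09/15/2016", "2016-10-02"],  # inconsistent formats
    "gpa":        [3.4, 3.1, None],                            # missing value
    "enrolled":   ["Y", "N", "Y"],                             # text, not numeric
})

# Parse each date individually so mixed historical formats all standardize.
records["app_date"] = records["app_date"].apply(pd.to_datetime)

# Fill the missing GPA with the column mean (one simple choice among many).
records["gpa"] = records["gpa"].fillna(records["gpa"].mean())

# Recode Y/N to 1/0 so the column is usable by a predictive model.
records["enrolled"] = records["enrolled"].map({"Y": 1, "N": 0})

# The analytic dataset: one row per student, numeric columns only.
analytic = records[["student_id", "gpa", "enrolled"]]
```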
The third step is the model building stage. Model building involves a process of testing the performance of different, well-suited algorithms on the same analytic dataset. If you’ve ever moved to a new city and found the fastest way to get to the grocery store by driving a few different promising-looking routes and timing them, you’ve developed an algorithm (turning right at Oak St., left at Green St., and doing 20 over on Colfax Ave for 5 miles is 2 minutes faster than turning right at Canoe Blvd, waiting at the light at Green St., and speeding 25 over down Colfax for 4 miles).
Similarly, we take an analytic dataset and apply different predictive algorithms to it, tuning those algorithms as we go and assessing their performance. We determine which combination of algorithms made the most accurate predictions on past data, and then we select that ensemble to generate predictions for the future.
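The route-timing analogy maps directly onto code. Here is a sketch using scikit-learn and synthetic data: fit a few well-suited candidate algorithms on the same analytic dataset, score each on held-out past outcomes, and keep the best performer. The candidates and data are stand-ins; Envision's actual algorithms and tuning are not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for an analytic dataset: one row per student,
# numeric predictors, and a known past outcome (enrolled or not).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Two promising-looking "routes" to compare.
candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Time each route: score every candidate on held-out slices of past data.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}

# Pick the algorithm that predicted past outcomes most accurately.
best_name = max(scores, key=scores.get)
```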
The fourth step is the iteration stage. Each time you rely on your algorithm to drive the fastest route to the grocery store, you are iterating on that algorithm. It’s a new day with a new traffic pattern and different weather system, but based on your earlier testing, you drive the same way to the grocery store knowing that the route you selected earlier is likely to still be the fastest.
We perform a similar task with Envision. Like each new day’s grocery store trip, we take newly provided student data for the coming year and apply our best-performing ensemble algorithm to it. We expect the algorithm we selected earlier will maximize the outcome we want. But rather than making the quickest trip to the grocery store, we expect to make the most accurate predictions possible.
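The iteration step then looks something like this sketch: refit the previously selected model on all historical records and apply it to the coming year's students, whose outcomes are not yet known. Again this uses scikit-learn and synthetic data as assumed stand-ins for Envision's actual ensemble and inputs.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Historical students (outcomes known) and a new year's students (unknown).
X_hist, y_hist = make_classification(n_samples=500, n_features=10, random_state=0)
X_new, _ = make_classification(n_samples=50, n_features=10, random_state=1)

# The "route" chosen during model building, driven again on a new day.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_hist, y_hist)

# Predicted probability that each new student enrolls.
probs = model.predict_proba(X_new)[:, 1]
```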
See what Capture’s next generation predictive engines can do for your admissions office. Sign up for a Free Envision Trial.
By Pete Barwis, Ph.D., Senior Data Scientist, Capture Higher Ed