The original version of this article appeared in Towards Data Science.
When I started teaching data science and artificial intelligence in Duke University’s Pratt School of Engineering, I was frustrated by how little insight I had into how effective my teaching was until the end-of-semester final exam grades and student assessments came in.
Being new to teaching, I spent time reading up on pedagogical best practices and how methods like mastery learning and one-on-one personalized guidance could drastically improve student outcomes. Yet even with my relatively small class sizes I did not feel I had enough insight into each individual student’s learning to provide useful personalized guidance to them. In the middle of the semester, if you had asked me to tell you exactly what a specific student had mastered from the class to date and where he or she was struggling, I would not have been able to give you a very good answer. When students came to me for one-on-one coaching, I had to ask them where they needed help and hope that they were self-aware enough to know.
Knowing that my colleagues in other programs and universities teach much larger class sizes than mine, I asked them how aware they felt they were of each of their students’ level of mastery at any point in time. For the most part, they admitted they were also largely “flying blind” until final assessment results came in. The tradeoff between scale and achievable quality of instruction is historically one of the most vexing problems in education: As class sizes grow larger, the ability of a teacher to provide the type of personalized guidance shown by learning science research to be most effective is diminished.
Yet as instructors in the new world of online education, we have access to ever-increasing amounts of data—from recorded lecture videos, electronically submitted homework, discussion forums, and online quizzes and assessments—that may give us insights into individual student learning. In summer 2020, we began a research project at Duke to explore how we could use this data to help us as instructors do our job better. The specific question we set out to answer was: “As an instructor, how can I use the data available to me to support my ability to provide effective personalized guidance to my students?”
Identifying Student Knowledge States
What we wanted to know was, for any given student in a class at any point during a semester, what material have they mastered and what are they struggling with? The model of Knowledge Space Theory, introduced by Doignon and Falmagne in 1985 and significantly expanded on since, posits that a given “domain” of knowledge (such as the subject of a course) contains a discrete set of topics (or “items”) that often have interdependencies. The set of topics that a student has mastered to date is called their “knowledge state.” In order to provide effective instruction for the whole class and to provide personalized guidance for individual students, understanding the knowledge state of each student at any point is critical.
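To make the idea concrete, here is a minimal sketch of a knowledge domain and a knowledge state in code. The topic names and prerequisite relationships are invented for illustration and are not taken from my course.

```python
# Minimal sketch: a course "domain" is a set of topics with prerequisite
# dependencies, and a student's "knowledge state" is the subset of topics
# mastered so far. Topic names and prerequisites are invented examples.

domain = {
    "data types": set(),
    "web scraping": {"data types"},
    "APIs": {"data types"},
    "data ethics": set(),
}

def is_consistent(state, domain):
    """A knowledge state is consistent if every mastered topic's
    prerequisites have also been mastered."""
    return all(domain[topic] <= state for topic in state)

print(is_consistent({"data types", "APIs"}, domain))  # True
print(is_consistent({"web scraping"}, domain))        # False: prerequisite missing
```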
So how does one identify a student’s knowledge state? The most common method is through assessment—either via homework or quizzes and tests. For my classes, I use low-stakes formative quiz assessments each week. Each quiz contains around 10 questions, with roughly half of the questions evaluating student knowledge of topics covered in the previous week’s lecture, and the remaining half covering topics from earlier in the course. In this way, I continue to evaluate students’ mastery of topics from the whole course each week. In addition, we have weekly homework, which tests a variety of topics covered to date.
But digging through dozens or hundreds of quiz or homework question results for tens or hundreds of students in a class to identify patterns that provide insight on the students’ knowledge states is not the easiest task. Effective teachers need to be good at a lot of things—delivering compelling lectures, creating and grading homework and assessments, etc.—but most teachers are not also trained data scientists, nor should they have to be to do their jobs.
This is where machine learning comes in. Fundamentally, machine learning is used to recognize patterns in data, and in this case the technology can be used to identify students’ knowledge states from their performance patterns across quizzes and homework.
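As a rough illustration of that idea (and not the model inside our tool), one could estimate a student’s knowledge state by averaging their question-level results per topic and applying a mastery threshold. The sample data, column names, and 0.7 cutoff below are assumptions chosen for the example.

```python
import pandas as pd

# Illustration only: average each student's question-level correctness per
# topic and apply a mastery threshold. Data and cutoff are invented.
results = pd.DataFrame({
    "student": ["ana", "ana", "ana", "ben", "ben", "ben"],
    "topic":   ["APIs", "APIs", "web scraping", "APIs", "web scraping", "web scraping"],
    "correct": [1, 1, 0, 0, 1, 1],
})

# Rows = students, columns = topics, values = fraction of questions correct.
mastery = results.groupby(["student", "topic"])["correct"].mean().unstack()

MASTERY_THRESHOLD = 0.7
knowledge_states = mastery >= MASTERY_THRESHOLD  # True = estimated as mastered
print(knowledge_states)
```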
Building the Intelligent Classroom Assistant
To help improve my own teaching and that of my fellow faculty members in Duke’s AI for Product Innovation master’s program, we set out to develop a system that could, given a set of class quiz and homework results and a set of learning topics, identify each student’s knowledge state at any time and present that information to both instructor and learner. This would facilitate more effective personalized guidance by the instructor and better awareness on the part of the student as to where they need to put additional focus in their study. Additionally, by aggregating this information across the class, an instructor could gain insight into where the class was successfully learning the material and where he or she needs to reinforce certain topics.
The project culminated in the creation of a prototype tool called the Intelligent Classroom Assistant. The tool reads instructor-provided class quiz or homework results and the set of learning topics covered so far in the course. It then analyzes the data using a machine learning algorithm and provides the instructor with three automated analyses: the quiz and homework questions with which the class has struggled; the learning topics the class has and has not mastered; and the performance of each individual student.
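As a sketch of how those three views could be produced from a question-level results table (this is an assumed aggregation, not the tool’s actual implementation, and the sample data is invented):

```python
import pandas as pd

# Sketch of the three class-level views from question-level results.
results = pd.DataFrame({
    "student":  ["ana", "ana", "ben", "ben", "cruz", "cruz"],
    "question": ["q1", "q2", "q1", "q2", "q1", "q2"],
    "topic":    ["APIs", "web scraping", "APIs", "web scraping", "APIs", "web scraping"],
    "correct":  [1, 0, 0, 0, 1, 1],
})

# 1. Questions the class has struggled with (lowest average scores first).
hard_questions = results.groupby("question")["correct"].mean().sort_values()

# 2. Learning topics the class has and has not mastered.
topic_mastery = results.groupby("topic")["correct"].mean().sort_values()

# 3. Each individual student's performance, broken out by topic.
per_student = results.pivot_table(index="student", columns="topic",
                                  values="correct", aggfunc="mean")

print(hard_questions, topic_mastery, per_student, sep="\n\n")
```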
One of the key challenges in developing the tool was the mapping of quiz and homework questions to the most relevant learning topic. To accomplish this, I developed a custom algorithm that uses natural-language processing and draws on open-source libraries to understand the context of each question and map it to the primary learning topic it was designed to evaluate.
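I won’t reproduce that algorithm here, but the sketch below illustrates the general approach: assign each question to the learning topic whose description it most resembles textually, here using TF-IDF vectors and cosine similarity from scikit-learn. The topic descriptions and questions are invented examples, and the tool’s actual algorithm is more involved.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustration of the general approach (not the tool's actual algorithm):
# map each question to the most textually similar learning-topic description.
topics = {
    "web scraping": "extracting data from web pages with HTML parsing and crawlers",
    "APIs": "requesting data from REST APIs with authentication and JSON responses",
    "data ethics": "privacy, consent, and responsible use of collected data",
}

questions = [
    "Which HTTP method retrieves JSON from a REST API endpoint?",
    "Name one library you could use to parse HTML when extracting data from web pages.",
]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(list(topics.values()) + questions)
topic_vectors, question_vectors = matrix[: len(topics)], matrix[len(topics):]

similarity = cosine_similarity(question_vectors, topic_vectors)
topic_names = list(topics)
for question, scores in zip(questions, similarity):
    print(f"{topic_names[scores.argmax()]:12s} <- {question}")
```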
Trying Out the Tool
The Intelligent Classroom Assistant tool was built while I taught the Sourcing Data for Analytics course at Duke, an introductory-level data science course for graduate engineering students that covered technical as well as regulatory and ethical topics. This gave me an opportunity to try out the tool on my class as the semester progressed.
One of the key things I wanted to evaluate was how well the algorithm under the hood of the tool could classify each quiz or homework question into the most relevant of the 20 learning topics covered in the course. On the full set of 85 quiz questions I used during the semester, the algorithm identified the relevant learning topic correctly about 82 percent of the time. While not perfect, this was good enough to make the analyses provided by the tool useful to me.
During the course, I used the prototype in two main ways to inform my teaching. I spent extra time in lecture sessions covering learning topics and specific quiz questions that the tool flagged due to low student performance. And during one-on-one help sessions with students, I used the personalized student analysis module of the tool to understand where the student needed extra reinforcement and make tutoring sessions more focused.
It's too soon to quantify whether the tool changed student outcomes, because the course I used it in was new, which means there is no historical benchmark for comparison. But this year, we are expanding the tool's use and are working to evaluate the effects it has on student engagement and performance. We are trying it out in another engineering class of 25 and also in an undergraduate finance class of more than 200 students. I also plan to use the prototype in my spring machine learning class to guide my teaching through the semester. Since students can benefit from seeing the results of the tool’s analysis as much as instructors, for spring we hope to add a student portal that lets students see their own results and provides personalized study recommendations based on their identified knowledge states.
The amount of electronic data now available to instructors can help support their teaching. But teachers are not (usually) data scientists themselves, and need analytics tools to help them extract value from the data. Such tools are helpful, but their value is directly proportional to how well an instructor defines course learning objectives and structures material and assessments to support and evaluate those objectives.
Machine learning tools such as the Intelligent Classroom Assistant can not only help teachers improve the quality of their classes (as measured by student learning outcomes), but also enable them to do so at increased scale, offering the promise of widespread personalized teaching. When teachers can teach more effectively, learners can learn more, and as a society we all reap the benefits.