As schools and districts choose from a growing menu of online educational products and services, many educators seek evidence to inform their purchasing decisions. Of little use are hyperbolic marketing flyers and bubbly testimonials. They’re looking for data—but which numbers are the most useful to collect and analyze?
Too often, parents, educators and, yes, reporters simply ask: “What works?” Those two words fail to capture the complexity of factors that influence whether a tool can deliver results. To glean any meaningful insight into whether a product has made an impact on outcomes, one must also ask: “In what context?”
Research organizations and universities have long examined the impact of technology products in the classroom. Yet the process is labor- and time-intensive, often taking a year or more. By the time these studies are done, the product may have already changed.
Recognizing the need for research to keep pace with technological development, a few organizations aim to help schools answer questions about product efficacy more quickly. To do so, they help schools figure out how to ask the right questions—and what data to collect and analyze.
Measure What Matters
What should the result look like when something “works”? Before thinking about collecting and analyzing data, educators must first be able to articulate the desired outcomes.
For instructional technology products, the results often revolve around student achievement. An online math tool that “works” should be correlated with better grades in a class or higher scores on a summative assessment—data that schools already collect. Popular measures include the Measures of Academic Progress (MAP) assessments created by the Northwest Evaluation Association, which aim to measure student growth in math and reading.
Yet not all student outcomes are tied to academic performance, says Alexandra Resch, associate director of human services research at Mathematica, which evaluates the impact of public policy and programs. “These days many districts also want to focus on social-emotional learning” and other non-cognitive outcomes, she says. So far there is little consistency in how educators measure these outcomes; a proxy, for instance, may be attendance or disciplinary incidents. Data may also be self-reported by teachers and students on surveys built by companies such as Panorama Education, or simply with Google Forms.
Supported by a grant from the U.S. Department of Education, Mathematica recently unveiled its Ed Tech Rapid Cycle Evaluation (RCE) Coach, which aims to walk educators through asking questions and collecting data to ascertain how well a product has worked. Currently in public beta, the tool is primarily designed to evaluate impact on student outcomes on benchmark assessments. “Over the next year we’ll be building a database of other measures beyond typical student achievement measures,” says Resch.
Commitment & Fidelity
Companies design products with specific use cases in mind. But these ideal scenarios may not line up with how the tool is actually used in schools. A math product may be most “effective” in a classroom where students have their own devices and spend 90 minutes per week on the program. Yet not every school enjoys that luxury of time and resources.
“A district will buy a product based on an assumption about how often it will be used,” says Resch, “then leave it to teachers to decide how to use it. Some may use it as intended; others may not.” She recommends that companies and schools first agree on an implementation plan—say, 90 minutes per week—before committing to a pilot.
Developers should then provide schools with data on whether the students are using the tool as intended. But these measures vary as companies report usage differently. Some systems track how long a student or teacher spends on different parts of a program. Others may only report how many times someone has logged in, with no further information about how he or she engaged with the features.
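To make that concrete, here is a minimal sketch of how a school’s data team might check implementation fidelity from a product’s usage export, assuming hypothetical file and column names (student_id, week, minutes) and the 90-minutes-per-week target mentioned above; real vendor exports will look different.

```python
# A minimal sketch of a fidelity check, assuming a hypothetical usage log
# with one row per student session and columns "student_id", "week" and
# "minutes" (names are illustrative, not from any particular vendor).
import pandas as pd

usage = pd.read_csv("usage_log.csv")  # hypothetical export from the product

# Total minutes per student per week, compared against the agreed-upon
# implementation target (e.g., 90 minutes per week).
TARGET_MINUTES_PER_WEEK = 90
weekly = usage.groupby(["student_id", "week"])["minutes"].sum().reset_index()
weekly["met_target"] = weekly["minutes"] >= TARGET_MINUTES_PER_WEEK

# Share of student-weeks that met the target gives a rough fidelity rate.
fidelity_rate = weekly["met_target"].mean()
print(f"Implementation fidelity: {fidelity_rate:.0%} of student-weeks met the target")
```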
“The challenge with regard to evaluating the product efficacy across a variety of educational software programs is the variation in the way data are collected by the educational software provider,” says Bi Vuong, director of Proving Ground, an initiative at Harvard University’s Center for Education Policy Research. Her team is currently working with three school districts and 10 charter management organizations to study the impact and implementation of edtech programs including Achieve 3000 and ST Math.
Correlations, Comparisons
“Knowing these two categories of information—implementation and impact—is foundational,” Vuong adds, to drawing insights into whether a tool has made an impact on student outcomes. In other words, matching product usage data with user outcomes data is a starting point to understanding how well an intervention may be working.
Software developers should also be able to match outcomes data to student demographic information. Key characteristics may include gender, ethnicity, socioeconomic status and English language proficiency. “What we are interested in here is whether the software is more effective for certain students than others,” states Vuong.
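As a rough illustration of what matching usage, outcome and demographic data might look like, here is a minimal sketch in Python, assuming hypothetical files and column names; actual exports vary by vendor and student information system.

```python
# A minimal sketch of joining usage data with outcomes and demographics,
# assuming hypothetical files and columns ("student_id", "total_minutes",
# "score_gain", "ell_status"); real data layouts will differ.
import pandas as pd

usage = pd.read_csv("usage_summary.csv")        # total_minutes per student_id
outcomes = pd.read_csv("assessment_gains.csv")  # score_gain per student_id
demographics = pd.read_csv("demographics.csv")  # gender, ell_status, etc.

merged = usage.merge(outcomes, on="student_id").merge(demographics, on="student_id")

# How does usage relate to score gains overall?
print(merged[["total_minutes", "score_gain"]].corr())

# Does the pattern differ for certain groups of students,
# e.g., English language learners vs. non-ELLs?
for group, df in merged.groupby("ell_status"):
    corr = df["total_minutes"].corr(df["score_gain"])
    print(f"{group}: n={len(df)}, usage-gain correlation={corr:.2f}")
```

A correlation like this only suggests a relationship; it cannot, by itself, show that the product caused any gains.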
But to thoroughly understand how well a product “works,” analyzing one group of students is not enough. Instructional technology tools are not cheap, and so educators very reasonably ask: Would the students have progressed just as much without using the product? “To really assess effectiveness, you need a comparison group that isn’t using the technology,” says Resch. “You can’t make a strong claim unless you have data on what would happen if you weren’t using it.”
Here’s where the randomized controlled trial—what many researchers call the “gold standard” of causal data analysis—comes in. These experiments compare two groups of students with similar demographics and achievement levels: one using the product (the “treatment” group) and another using the same tools as before (the “control” group).
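For a sense of how the resulting data might be compared, here is a minimal sketch of a treatment-versus-control comparison, assuming a hypothetical results file; a real evaluation would also adjust for baseline scores, classroom clustering and other factors.

```python
# A minimal sketch of a treatment/control comparison, assuming a hypothetical
# file with a "group" column ("treatment" or "control") and a "score_gain"
# column per student.
import pandas as pd
from scipy import stats

data = pd.read_csv("pilot_results.csv")
treatment = data.loc[data["group"] == "treatment", "score_gain"]
control = data.loc[data["group"] == "control", "score_gain"]

# Difference in average gains between the two groups.
diff = treatment.mean() - control.mean()

# Welch's t-test as a rough check on whether the difference
# is larger than chance alone would likely produce.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"Estimated effect: {diff:.2f} points (p = {p_value:.3f})")
```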
Matching students to create these two comparable groups can be challenging. Steve Schneider, a senior program director of STEM at WestEd, a research nonprofit that performs efficacy studies of edtech products, offers a glimpse of the many variables involved in these studies:
“We might look at the school demographics at one point. We may look at years of teacher experience as another point. Also, how much professional development does the school or company provide? What test scores are available? These are a few of all the points we consider when matching classrooms and schools.”
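As a simplified illustration of that matching process, here is a minimal sketch that pairs each school using a product with its most similar non-user school, assuming a hypothetical school-level file and a handful of covariates; actual efficacy studies use far more careful designs, such as propensity-score matching.

```python
# A minimal sketch of covariate matching at the school level, assuming a
# hypothetical file with columns "school_id", "uses_product", "pct_frl",
# "avg_teacher_years" and "prior_score".
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors

schools = pd.read_csv("schools.csv")
covariates = ["pct_frl", "avg_teacher_years", "prior_score"]

# Put covariates on a common scale so no single one dominates the distance.
X = StandardScaler().fit_transform(schools[covariates])
users = schools["uses_product"] == 1

# For each school using the product, find the most similar non-user school.
nn = NearestNeighbors(n_neighbors=1).fit(X[~users])
_, idx = nn.kneighbors(X[users])
matches = schools.loc[~users].iloc[idx.ravel()]["school_id"].values

pairs = pd.DataFrame({
    "user_school": schools.loc[users, "school_id"].values,
    "matched_comparison_school": matches,
})
print(pairs)
```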
A Quicker Fix
Running rigorous randomized controlled trials can take several years—time that neither students, teachers nor software developers have. They are expensive, too; research organizations typically charge companies six to seven figures to run these studies. (WestEd is running a $3 million study of Khan Academy’s impact on community college math achievement.) Most startups will run out of time and money before seeing any results.
The RCE tool that Resch’s team at Mathematica is building aims to cut that research time and price tag, but she cautions that evaluations “shouldn’t be too rapid. Changes in student achievement may take six months, and you should allow yourself that time.” Seeing impact for other outcomes, such as attendance, can be quicker. In these areas “maybe you can expect to measure meaningful changes in a month,” she adds.