As a lecturer at the Princeton School of Public and International Affairs, where I teach econometrics and research methods, I spend a lot of time thinking about the intersection of data, education and social justice, and about how generative AI will reshape the experience of gathering, analyzing and using data for change.
My students are working toward a master’s degree in public affairs, and many of them are interested in pursuing careers in international and domestic public policy. The graduate-level econometrics course I teach is required, and it’s designed to foster analytical and critical thinking skills in causal research methods. Throughout the course, students craft four memos on designated policy issues. Typically, we examine publicly available datasets related to societal concerns, such as determining optimal criteria for loan forgiveness or evaluating the effectiveness of stop-and-frisk police policies.
To better understand how my students can use generative AI effectively, and to prepare them to apply these tools in the data-related work they’ll encounter after graduate school, I knew I needed to try it myself. So I set up an experiment: I would complete one of the assignments I give my students using generative AI.
My goal was twofold. I wanted to experience what it feels like to use the tools my students have access to. And, since I assume many of my students are now using AI for these assignments, I wanted to develop a more evidence-based stance on whether I should or shouldn’t change my grading practices.
I pride myself on giving practical yet intellectually challenging assignments, and to be honest, I didn’t have much faith that any AI tool could coherently conduct statistical analysis and make the connections needed to turn its results into pertinent policy recommendations.
Experiments With Code Interpreter
For my experiment, I replicated an assignment from last semester that asked students to imagine creating a grant program for health providers offering perinatal (before and after childbirth) services to women, with the goal of promoting infant health and reducing low birth weight. Students were given a publicly available dataset and were required to develop eligibility criteria by constructing a statistical model to predict low birth weight. They needed to substantiate their selections with references from existing literature, interpret the results, provide relevant policy recommendations and produce a positionality statement.
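To make the modeling task concrete, here is a minimal sketch of what such a model might look like: a logistic regression predicting low birth weight from maternal and prenatal characteristics. The predictor names and the simulated data below are hypothetical stand-ins for illustration; the actual assignment used a real public dataset.

```python
# Hypothetical sketch of the assignment's core task: predict low birth weight
# and use the fitted model to reason about eligibility criteria. The predictors
# and simulated data are illustrative stand-ins, not the course dataset.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "mother_age": rng.integers(16, 45, n),
    "prenatal_visits": rng.poisson(9, n),
    "smoker": rng.integers(0, 2, n),
})
# Simulate an outcome loosely tied to the predictors (for illustration only).
index = -2.0 + 0.8 * df["smoker"] - 0.08 * df["prenatal_visits"]
df["low_birth_weight"] = rng.binomial(1, 1 / (1 + np.exp(-index)))

# Fit a logistic regression; students would justify each predictor with
# references to the perinatal-health literature.
model = smf.logit(
    "low_birth_weight ~ mother_age + prenatal_visits + smoker", data=df
).fit()
print(model.summary())

# Odds ratios are the natural bridge from coefficients to eligibility criteria.
print(np.exp(model.params))
```

Students would then translate the coefficients (or odds ratios) into eligibility thresholds for the grant program and defend each predictor with citations.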
As for the tool, I decided to test ChatGPT’s new Code Interpreter, which lets users upload data (in any format) and use conversational language to execute code. I gave ChatGPT the same guidelines I give my students and uploaded the dataset into Code Interpreter.
First, Code Interpreter broke down each task. Then, after choosing variables (the eligibility criteria for the perinatal program) for the statistical model, it asked me whether I would like to proceed with the analysis.
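For readers who haven’t tried it, the workflow looks roughly like this. What follows is a hedged reconstruction of the kind of first-pass Python that Code Interpreter writes and runs over an uploaded file, not a transcript of my session; the file name is a hypothetical placeholder.

```python
# A hedged reconstruction of the first-pass Python that Code Interpreter
# typically writes and executes after a dataset is uploaded; "births.csv"
# is a hypothetical placeholder, not the actual file from my session.
import pandas as pd

df = pd.read_csv("births.csv")

print(df.shape)          # how many rows and columns
print(df.dtypes)         # the type of each variable
print(df.isna().mean())  # share of missing values per column
print(df.describe())     # summary statistics for numeric columns
```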
After running the statistics and analyzing and interpreting the data, Code Interpreter created a memo with four policy recommendations. While the recommendations were solid, the tool did not cite any prior literature or draw a direct connection to its results. It was also unable to create a positionality statement; that part hinged on students reflecting on their own background and experiences to consider any biases they might bring, which the tool could not do.
Another flaw was that each part of the assignment arrived in separate chunks, so I found myself repeatedly going back to the tool to ask for omitted elements or for clarity on results. It quickly became obvious that it was easier to weave the disparate pieces together myself.
Without any human touch, the memo would not have received a passing grade: it was too high-level and didn’t provide a literature review with proper citations. Once I stitched all the pieces together, however, the work could have merited a solid B.
While Code Interpreter wasn’t capable of producing passing work on its own, it’s important to recognize what the tool can already do. It adeptly performed statistical analysis from conversational prompts, and by offering viable policy recommendations it demonstrated the kind of critical thinking I hope to see from my students. As the field of generative AI continues to advance, it’s merely a matter of time before these tools consistently deliver “A-caliber” work.
How I’m Using Lessons Learned
Generative AI tools like the one I experimented with are available to my students, so I’m going to assume they’re using them for the assignments in my course. In light of this reality, educators need to adapt their teaching methods to incorporate these tools into the learning process, especially since, given the current limitations of AI detectors, it’s difficult if not impossible to distinguish AI-generated from human-produced content. That’s why I’m committing to incorporating the exploration of generative AI tools into my courses while maintaining my emphasis on critical thinking and problem-solving skills, which I believe will continue to be key to thriving in the workforce.
As I consider how to weave these tools into my curriculum, two pathways have emerged. I can support students in using AI to generate initial content, teaching them to review and enhance it with human input; this can be especially helpful when students hit writer’s block, but it may inadvertently stifle creativity. Alternatively, I can support students in creating original work first and leveraging AI to refine it afterward.
While I’m more drawn to the second approach, I recognize that both require students to develop essential skills in writing, critical thinking and computational thinking to collaborate effectively with computers, and those skills are core to the future of education and the workforce.
As an educator, I have a duty to remain informed about the latest developments in generative AI, not only to ensure learning is happening, but also to stay on top of what tools exist, what benefits and limitations they have and, most importantly, how students might be using them.
However, it’s also important to acknowledge that expectations for the quality of student work must now be higher, which may require adjustments to grading practices. The baseline is no longer zero; it is AI. And the upper limit of what humans can achieve with these new capabilities remains an unknown frontier.