A college probably wouldn’t hire a teaching assistant who tends to lie to students about course content or deadlines. So despite the recent buzz about how new AI software like ChatGPT could serve as a helper in classes, there’s widespread concern about the tendency of the technology to simply make up facts.
Researchers at the Georgia Institute of Technology think they may have a way to keep the chatbots honest. And they’re testing the approach in three online courses this summer.
At stake is whether it is even possible to tame so-called “large language models” like ChatGPT, which are usually trained with information drawn from the internet and are designed to spit out answers that fit predictable patterns rather than hew strictly to reality.
“ChatGPT doesn’t care about facts, it just cares about what’s the next most-probable word in a string of words,” explains Sandeep Kakar, a research scientist at Georgia Tech. “It’s like a conceited human who will present a detailed lie with a straight face, and so it’s hard to detect. I call it a brat that’s not afraid to lie to impress the parents. It has problems saying, ‘I don’t know.’”
As a result, researchers and companies working to develop consumer products using these new AI bots, including in education, are searching for ways to keep them from unexpected bouts of fabrication.
“Everybody working with ChatGPT is trying to stop hallucinations,” Kakar adds, “but it is literally in the DNA of large language models.”
Georgia Tech happens to have an unusual ally in its quest to tame ChatGPT. The university has spent many years building its own AI chatbot that it uses as a teaching assistant, known as Jill Watson. This digital TA has gotten so good that in some cases online students can’t tell whether they’re getting answers from a human TA or from the bot.
But the latest versions of ChatGPT and rivals from other tech giants are even more powerful. So Ashok K. Goel, a professor of computer science and human-centered computing at the university who leads the development of Jill Watson, devised an unusual plan. He's asking Jill Watson to serve as a kind of monitor or lifeguard for ChatGPT. Essentially, Jill Watson fact-checks the work of its peer chatbot before sending results on to students.
“Jill Watson is the intermediary,” Goel tells EdSurge.
The plan is to train Jill Watson on the specific materials of each course it serves, feeding in transcripts of lecture videos, the text of slides and the contents of the textbook. Then Jill Watson can either point ChatGPT to the part of the textbook it should consult before sending an answer to a student, or fact-check the results that ChatGPT drew from the internet, using the textbook material as a source of truth. "It can do some verification," is how Goel puts it.
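Goel's team hasn't published the system's internals, but the pattern he describes follows a familiar shape: retrieve the relevant course material, generate an answer grounded in it, then verify the answer against that material. The Python sketch below is a hypothetical illustration only; every name in it is invented, and a real system would use semantic embeddings and a production language-model API rather than the deliberately crude word-overlap similarity shown here.

```python
# Hypothetical sketch of a retrieve-generate-verify loop. This is not
# Georgia Tech's code; word_overlap is a toy stand-in for the semantic
# similarity a real verifier would compute with embeddings.

def word_overlap(a: str, b: str) -> float:
    """Crude stand-in for semantic similarity: Jaccard overlap of words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def answer_with_verification(question, course_materials, ask_llm):
    """Retrieve course material, generate a grounded answer, verify it."""
    # 1. Retrieve: find the course passage most relevant to the question
    #    (a lecture transcript, slide text or textbook excerpt).
    passage = max(course_materials, key=lambda p: word_overlap(question, p))

    # 2. Generate: have the language model draft an answer constrained
    #    to that passage.
    draft = ask_llm(
        "Using only this course material:\n"
        f"{passage}\n\n"
        f"Answer the student's question: {question}"
    )

    # 3. Verify: score how well the draft is supported by the course
    #    material; this becomes the bot's confidence in its own answer.
    confidence = word_overlap(draft, passage)
    return draft, confidence
```

The key design choice implied by Goel's description is that the verifier never trusts the generator: the confidence score comes from comparing the draft against the course's own documents, not from the model's self-assessment.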
Kakar says that having the bots working together may be the best way to keep them honest, since hallucinations may just be a permanent feature of large language models.
“I doubt we can change the DNA, but we can catch those errors coming out,” Kakar says. “It can detect when ‘this doesn’t smell right,’ and it can basically stop [wrong answers] from going forward.”
The experimental chatbot is in use this summer in three online courses — Introduction to Cognitive Science (taught by Goel), Human-Computer Interaction, and Knowledge-Based AI. Those courses enroll between 100 and 370 students each. Students can try the experimental chatbot TA in one of two ways: They can ask the chatbot questions on a public discussion board where everyone in the class can see the answers, or they can pose questions to the chatbot privately. Students have consented to let the researchers pore through all the results, including the private chats, to monitor the bots and try to make improvements.
How is it going?
Kakar admits it’s a work in progress. Just this week, for instance, researchers were testing the chatbot and it gave an answer that included “a beautiful citation of a book and a summary of it.” But there was one catch. The book it cited with such confidence doesn’t exist.
The chatbot did pass along the made-up answer, but Kakar says it also detected that something wasn’t quite right, so it attached a warning to the answer that said “I have low confidence in this answer.”
“We don’t want hallucinations to get through,” Kakar says, “but hopefully if they get through, there will be a low-confidence warning.”
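That warning could be produced by a simple gate on the confidence score from the earlier sketch. Again, this is a hypothetical illustration, and the threshold is an invented placeholder, not a value the researchers have disclosed.

```python
# Continuing the hypothetical sketch: gate answers on the verifier's
# confidence score. The 0.5 threshold is an invented placeholder.
LOW_CONFIDENCE_THRESHOLD = 0.5

def deliver(draft: str, confidence: float) -> str:
    """Attach a warning to answers the verifier can't support."""
    if confidence < LOW_CONFIDENCE_THRESHOLD:
        return "I have low confidence in this answer.\n\n" + draft
    return draft
```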
Kakar says that in the vast majority of cases — more than 95 percent of the time so far in tests — the chatbot delivers accurate information. And students so far seem to like it — some have even asked the chatbot out for dinner. (To which it is programmed to deliver one of several snappy comebacks, including “I’d love to but I eat only bytes.”)
Still, it’s hard to imagine Georgia Tech, or any college, hiring a TA willing to make up books to cite, even if only occasionally.
“We are fighting for the last couple of percentage points,” says Kakar. “We want to make sure our accuracies are close to 99 percent.”
And Kakar admits the problem is so tough that he sometimes wakes up at 3 in the morning worrying about scenarios he hasn't planned for yet: "Imagine a student asking when is this assignment due, and ChatGPT makes up a date. That's the kind of stuff we have to guard against, and that's what we're trying to do: basically build those guardrails."
Goel hopes the summer experiment goes well enough to expand the chatbot to more classes in the fall, and to more subject areas, including biology and economics.
So if these researchers can create this robot TA, what does that mean for the role of professors?
“Jill Watson is just a teaching assistant — it’s a mouthpiece for the professor, it is not the professor,” Kakar says. “Nothing changes in the role of the professor.”
He points out that the chatbot is trained entirely on materials students already have access to in other forms, like textbooks, slides and lecture videos. Also, these days, students can go on YouTube and get answers to just about anything on their own. But he says earlier experiments with free or low-cost online courses have shown that students still need a human professor to keep them motivated and to make the material current and relatable.
“Teaching assistants never replaced professors,” he says, “so why would Jill Watson replace professors?”