Why do universities need to radically rethink exams in the age of artificial intelligence?

Since the launch of the chatbot ChatGPT in late 2022, educators have been grappling with how to harness AI to enhance learning while minimizing risks to educational outcomes and the fairness of assessments.

The use of artificial intelligence among students is now the norm. In February, a poll1 of more than 1,000 full-time UK university students found that 92% use AI in some form, up from 66% in 2024. And 88% reported relying on generative AI (a form of AI that can generate text, images and code from large data sets) to support their academic work, compared with 53% in 2024.

As artificial intelligence continues to outperform humans in basic tasks such as reading comprehension and computer programming2, concerns have grown about its impact on learning and academic integrity. For example, the value of traditional essays and other written assessments is increasingly questionable, given that AI can now produce writing that often exceeds the quality of most students’ work.

Other concerns include over-reliance on chatbots leading to superficial learning3, reduced opportunities for self-reflection4 and a loss of student agency5, in which students become passive users of technology rather than active learners.

Universities have responded with tools that try to detect students’ use of generative AI, but these have proved unreliable6. This has led to short-term fixes, such as “stress testing” written assessments1, replacing them with oral exams, handwritten tests or reflective formats (portfolios and journals; see go.nature.com/43btcxf), and issuing clearer guidance on when AI can and cannot be used. Although these measures help, their effectiveness is limited.

Instead, learning and assessment need to be fundamentally rethought. Here we highlight three promising assessment approaches that adapt existing methods, such as conversation-based assessment, to the era of artificial intelligence. These strategies are intended to promote authentic intellectual development while ensuring that assessments accurately reflect students’ understanding and skills.

Use other types of assessment

One of the cornerstones of modern education is the idea that “writing is thinking”7. Writing is a non-linear process8 that demands genuine engagement, critical thinking and problem solving, all of which stimulate intellectual development.

When AI assists with or creates student texts, it becomes almost impossible to know how much of the final work reflects the student’s understanding and critical thinking (see go.nature.com/47tjv93). This uncertainty undermines the use of writing as evidence of learning.

Having the student and teacher follow a structured conversation is one way to elicit critical thinking. For example, Socratic questioning is a form of disciplined inquiry that helps students to work through complex ideas, question their assumptions and judge the validity of information. In ancient Greece, intellectual dialogue was so highly valued that some philosophers of the time worried that over-reliance on writing could impair human memory (see go.nature.com/43grxsp).

Traditional exams can still have a place alongside AI-based assessment. Credit: Jorge Gil/Europa Press via Getty

A contemporary version of the discourse-based approach used in ancient Greece, known as conversation-based assessment, has been used for several decades in primary, secondary and tertiary education. For example, AutoTutor, developed at the University of Memphis, Tennessee, has been used to teach subjects such as Newtonian physics while improving computer literacy and critical-thinking skills9. It engages students in natural-language conversations and uses algorithmic techniques to gauge their understanding, analyzing factors such as accuracy, word choice and time taken to respond. However, these systems typically have limited conversational capabilities and still rely mostly on simple text analytics and the detection of specific words and expressions.
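
To make that concrete, here is a minimal sketch, in Python, of the kind of keyword-and-timing analysis such systems rely on. The concept list, weights and 60-second threshold are invented for illustration; this is not AutoTutor’s actual implementation.

```python
import time

# Toy scorer in the spirit of early conversation-based assessment systems,
# which mostly matched expected keywords and timed responses. The concept
# list, weights and threshold below are invented for this sketch.

EXPECTED_CONCEPTS = {
    "newtons_third_law": {"equal", "opposite", "reaction", "force"},
}

def score_response(answer: str, concept: str, seconds_taken: float) -> float:
    """Return a 0-1 score from keyword coverage, lightly penalizing slow answers."""
    expected = EXPECTED_CONCEPTS[concept]
    words = set(answer.lower().split())
    coverage = len(words & expected) / len(expected)    # fraction of expected terms used
    speed_penalty = 0.1 if seconds_taken > 60 else 0.0  # crude fluency proxy
    return max(0.0, coverage - speed_penalty)

start = time.monotonic()
reply = input("Why don't action-reaction force pairs cancel each other out? ")
print(f"score: {score_response(reply, 'newtons_third_law', time.monotonic() - start):.2f}")
```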

This is where AI integration could be a game changer. AI can maintain an open, context-sensitive dialogue much more convincingly than current conversation-based assessment methods can. AI tools can ask students follow-up questions, offer personalized hints and adapt to a student’s knowledge level in real time, providing flexible, individualized learning support. Questioning can also span a broader range of topics than is possible with conventional conversation-based assessment systems, which are usually specialized for a particular domain.

The crucial opportunity for AI is not simply to automate question-and-answer exchanges, but to enable students to learn through conversation with AI systems and to use that dialogue as a form of assessment, making assessment a dynamic and personal process.
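
By way of illustration, the loop below sketches how such an assessment dialogue might be structured. Here, ask_model() is a hypothetical stand-in for whichever language model an institution uses; the function names and rubric text are placeholders, not a real API.

```python
# Minimal sketch of a conversation-as-assessment loop, under the assumptions
# stated above: ask_model() and the rubric are illustrative placeholders.

def ask_model(messages: list[dict]) -> str:
    """Placeholder: send the dialogue history to a model and return its reply.
    A real implementation would call an LLM here; this stub just keeps probing."""
    return "Interesting. Can you explain why that holds, or give an example?"

RUBRIC = ("Probe the student's reasoning with one follow-up question at a time. "
          "If they struggle, give a hint rather than the answer. After each "
          "turn, judge whether they explained, applied or merely recalled the concept.")

def run_assessment_dialogue(opening_question: str, turns: int = 5) -> list[dict]:
    history = [{"role": "system", "content": RUBRIC},
               {"role": "assistant", "content": opening_question}]
    print(opening_question)
    for _ in range(turns):
        history.append({"role": "user", "content": input("> ")})
        reply = ask_model(history)  # the model adapts its hints and questions
        history.append({"role": "assistant", "content": reply})
        print(reply)
    return history  # the full transcript becomes the evidence for assessment
```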

There are still challenges. First, AI systems will need to guide conversations in a balanced way, encouraging students to ask questions, explore topics that interest them and take an active part in their learning. At the same time, the dialogue must be structured enough for the AI system to collect meaningful evidence of the student’s understanding, such as how they solve a problem, explain a concept or apply knowledge in context. Striking this balance between open exploration and measurable evaluation remains a major research challenge.

Miscommunication is another concern – AI systems may misunderstand a student’s intent or provide inaccurate or misleading information. When this happens, students can have difficulty identifying the sources of their errors. The highly personal and open nature of AI-based learning and assessment will also make standardization difficult. Therefore, there will still be a place for traditional assessments, especially in the college admissions process, where consistency and fairness across large groups of students are a priority.

Evaluate on an ongoing basis

A critical issue with many of the proposed responses to students’ widespread adoption of AI is that, although they try to protect academic integrity, they continue to operate within a high-stakes examination model. Even if a test is reframed as a conversation, students still know that the result carries significant weight. Students find high-stakes tests stressful, and might perform poorly or be tempted to cheat. The key challenge, then, is to reduce the need for high-stakes tests in an AI-fueled age in which cheating might become easier.

Continuous assessment can be an effective alternative10. Replacing end-of-semester exams with a series of interconnected assessments that build a comprehensive picture of student learning is urgently needed in many academic fields. Continuous assessment is well established in medical education: during clinical rotations, for example, medical students are constantly evaluated by supervisors who monitor their clinical reasoning, teamwork and communication with patients. This feedback, along with written reflections and peer assessments, creates a comprehensive picture of a student’s proficiency over time. However, such models remain rare in other disciplines, mainly because of the extra workload they place on teachers.

The increasing availability of AI-based systems makes continuous assessment more feasible. Conversations between a student and an AI tool can be treated not as one-off exchanges, but as part of an ongoing learning process in which multiple low-stakes interactions gradually build a rich picture of the student’s progress10.

The key challenge is ensuring that AI systems can effectively track and analyze this learning progress. Existing general-purpose tools, such as ChatGPT, Gemini and Copilot, are not designed for this purpose: they do not analyze student responses over time to identify growth or persistent misconceptions. To truly support continuous assessment, there is an urgent need for learning-oriented AI platforms that can capture longitudinal data on student performance, provide useful insights into learning trajectories and integrate seamlessly into course and program design.
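
As a sketch of what such a platform might record, the Python below tracks low-stakes interactions over time and flags recurring misconceptions. The schema and the “persistent misconception” threshold are assumptions made for illustration, not a description of any existing tool.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date

# Hypothetical longitudinal record for a learning-oriented platform.

@dataclass
class Interaction:
    day: date
    topic: str
    score: float                       # 0-1 judgment from one low-stakes dialogue
    misconceptions: list[str] = field(default_factory=list)

@dataclass
class LearnerRecord:
    interactions: list[Interaction] = field(default_factory=list)

    def trajectory(self, topic: str) -> list[float]:
        """Scores over time for one topic: growth, not a one-off snapshot."""
        return [i.score for i in sorted(self.interactions, key=lambda i: i.day)
                if i.topic == topic]

    def persistent_misconceptions(self, min_count: int = 3) -> list[str]:
        """Misconceptions that keep recurring across sessions."""
        counts = Counter(m for i in self.interactions for m in i.misconceptions)
        return [m for m, n in counts.items() if n >= min_count]
```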
