The Assessment System is Broken and we don’t seem to want to fix it

Our current assessment system is very much a case of shoving every peg through a round hole, regardless of the shape – do advances in technology, specifically the rise of AI, offer a way to reform the system?

I was listening to Sunday’s Westminster Hour a couple of weeks back (I know, rock and roll) and one of the guests was Sir Anthony Seldon, the current headmaster of Epsom College and a renowned educationalist. He said: “what we have in our schools is a massive emphasis and validation on exams; the very exams that are testing the skills that the algorithms with AI will be able to replicate and do far more successfully. What we haven’t got is the emphasis on social skills.”

In a recent LinkedIn post, Professor Ethan Mollick shared a graphic showing how ChatGPT 4 is now capable of passing a range of standardised exams with higher scores than human candidates (https://www.linkedin.com/posts/emollick_in-every-group-i-speak-to-from-business-activity-7172084334709960704-fnsk?utm_source=share&utm_medium=member_desktop).

I’ve encountered many voices that have spoken about this issue with greater eloquence and wider knowledge than I have, particularly here in the educational blogosphere, but I think the argument bears repeating – if it can be automated, should we assess it?

It seems that educationalists (and in this case a historian) can see the problem and it’s baffling that the education system seems completely unresponsive to the issue.

I’m not suggesting that this knowledge and these skills aren’t worth learning or practising, but I am suggesting that they are no longer effective ways of categorising a student’s abilities.

What is the issue?

We are currently assessing our students in ways that can be easily replicated by AI and on criteria that don’t demonstrate anything that is uniquely human. We are also testing in a way that excludes so many students with any form of neurodiversity – we aim for inclusive education, but not inclusive assessment. We do our best to support students with these assessments, but don’t change the assessment to support the students.

Okay, this is something of an overstatement – there are many systems out there that do assess creativity and critical thinking, but most written exams, certainly at the high school level, don’t fall into this category, and my own first subject, English, seems to be one of the worst offenders. We ask our students to look at interesting and stimulating texts and then require them to name the parts and give formulaic responses. I know they are formulaic because I literally teach formulas for question responses.

Practical subjects like Art, DT etc do assess qualities of creativity and seem to have more holistic methods, yet English, the subject that should be at the heart of creative expression, is reduced to being assessed along the lines of a set of simplistic formulaic responses that, truth be told, can be and often are taught parrot fashion and learned by rote.

It is, to say the least, frustrating. And I know it is not just my subject and not just at secondary level where this is an issue.

If the problem is so stark, why does it persist?

Because the exam system has always been this way?

Because all the other methods of assessment – written coursework, speaking and listening assessments etc – are ‘too easy’?

Because exams are the only fair way to test all students equally?

Because alternatives are costly, both in terms of resources and in terms of time?

None of these answers is satisfactory, although each one is offered as a reason for retaining a redundant mode of assessment.

Our system suffers from two distinct problems. One: it is a way to place value on a person’s abilities, but only within a very narrow range of contexts. Two: it is a way to measure, quantify and control the work of educators.

Education, or at least the assessment system, seems no longer to be a way to develop the skills and potential of the individual, but a system designed to categorise individuals along very narrow, historically fixed lines that may once have served the needs of an industrial society. In the post-industrial world these systems are inadequate. This is not to say schools and educators are not doing their level best to prepare students for the real world; it’s just that they are doing it despite the system, not because of it.

Beyond measuring student abilities, the assessment system serves another key purpose: it acts as a mechanism for standardising the work of educators and providing quality control over it. This is a political issue and not one that can easily be resolved, but as long as it remains an end point of the assessment system it will be a significant barrier to meaningful reform.

What Now?

The question still remains – how long will the current system persist in the face of these changes and what will come next?

I don’t know if technology has the answer to this, but it has certainly posed the question. We need to address the elephant in the room – our current system values what it can measure, and as a result we have placed value on the narrow range of things that can easily be measured, categorised, ranked and ordered.

Here is the AI part – that is what we’re interested in after all.

I believe that personalised assistance will be one positive benefit of AI – the capacity to offer contextually relevant and adaptive assistance to students.

There are examples from the UAE: https://news.microsoft.com/source/europe/features/a-future-facing-minister-a-young-inventor-and-a-shared-vision-an-ai-tutor-for-every-student/

And from Singapore: https://www.todayonline.com/singapore/more-ai-schools-moe-success-maths-tool-pri5-2259496

Sal Khan has also been a widely quoted proponent of the personalised learning capacities of AI: https://courier.unesco.org/en/articles/sal-khan-i-see-ai-additional-tool-very-powerful-one

Personalised and adaptive tutoring, as an adjunct to the work of educators, becomes a powerful tool for reinforcing the learning done in the classroom. It can never replace the face-to-face interactions of teacher and students, but as a way of giving opportunities for practice and revision it could be a powerful lever.

It also provides opportunities to engage learners in material that is more directly relevant to their experience and the world in which they exist, rather than the world in which the education system was conceived.

The issues with AI involvement in the assessment and delivery component are well known and well articulated:

Bias in the training data leading to reinforcement of current cultural biases – These biases exist in the same curriculums we currently deliver, and as educators we need to highlight this and work to develop our students’ critical faculties so that they are aware of these complexities. I’m constantly puzzled by the bias argument, as it is often stated as though bias were unique to AI platforms – it is not. It is an issue inherent in all informational systems. Raising it as a reason to be cautious about AI is reasonable, but it is no more than the caveat we should be applying to books, academic journals or the internet as a source. Ryan Tannenbaum, writing on LinkedIn, has pointed out on more than one occasion that we should treat AI output as a form of media and evaluate it using the same critical lens we would apply to other forms, and I completely agree with this assessment.

The fact that LLMs hallucinate – This is definitely a real problem, but as the models become more accurate the hallucinations are less prevalent. However, this is where we need to ensure that foundational knowledge is secure before going to any source and, as with bias, remain critical and sceptical of the source until we have triangulated and verified what it tells us.

So teaching can leverage AI, but what about assessment?

None of this changes the fundamental issue – if we only assess narrow criteria, even if learning is more powerful, more contextually relevant, more engaging, it is all for nothing if at the end we stick with these narrow tests that reward responses written by hand in timed conditions that conform to mechanistic mark schemes.

While AI may have a role to play in assisting assessment, there are still valid reasons for students to master foundational knowledge through repetitive practice. However, we currently have the capacity to create a system where the foundational knowledge base is tested through consistent, low-level, low-stakes, quiz-style assessments, delivered, monitored and recorded by an AI system – Quizizz or Kahoot writ large. The power of gamifying interleaved, repetitive practice can remove the sting of end-of-unit tests and convert them into something fun and engaging. Here the build-up of factual recall can be constantly monitored and the regurgitative exam becomes unnecessary.

AI can’t replace humans in the assessment loop – this may be the dream of exam boards who are short of qualified staff or education ministers desperate to slash budgets and increase class sizes, but AI is not yet capable, and may never be capable, of meaningfully assessing something like creativity or critical and evaluative thinking. LLMs remain stochastic text generators, and whilst they are capable of generating impressive and increasingly accurate and useful output, the level of judgment that they can make is restricted to structural accuracy and linguistic fluency – important elements, yes, but not a complete snapshot of the deep learning, creativity and critical thinking that we need to be looking for in a truly holistic assessment model. However, technology can automate some of the lower-level, repetitive mechanisms of assessment and free up educators to focus on the more important and fundamentally human aspects of assessing learning.

Can AI give us more opportunities to explore a student’s true potential and their creative and critical processes?

What if, instead of the traditional mode of assessment where students respond to fixed questions, we assessed their interactions with an adaptive AI? Imagine if, rather than assessing the answers, we assessed the questions students asked and their exploration of a topic through the prompts that they input into an AI.

I’ve been using Mizou as a platform for developing AI chatbots to create interactive revision resources, and some of the most interesting content I’ve seen has been in the dialogue windows where students engage with a chatbot that has been given a specific topic area and focus – for example, postmodernist concepts of simulation and simulacra in the media. Here students explored a complex topic, were able to give specific and current examples and could interrogate the material in an interactive and personalised way. It was the interaction that was uniquely human, not the summary of the content that the revision tool was used to create – I could have given the students this as a handout, or they could simply have asked any one of a number of sources for this material.

While the logistics of standardising such interactive AI assessments would need to be worked out, this approach shows promise as a way to evaluate uniquely human exploration and inquiry.

There has also been a lot of discussion around the idea of portfolios of learning artifacts – gathering evidence of different skills and knowledge being applied in a variety of contexts. This is a more equitable way of assessing the individual. There are logistical problems here, but none that are insurmountable – it works for the EPQ and DofE as well as a variety of non-examined components in current systems of assessment; it just becomes a question of scale.

AI and technology writ large may not have all of the answers – I never thought they did – but they have highlighted the weaknesses that we have all long known about and present opportunities for us to change a system that is crying out for reform.

It’s no longer a question of if, but when we reform our assessment system and what should come next.

The AI English Teacher