How should we act when we’re not sure the problem is real? And how soon?

I keep hearing this concern. In the same week, teachers asked me “What if we draw incorrect inferences from a hinge question?” and heads of department asked “How can we be sure what a teacher needs to work on, based on a brief observation?” I get it: no one wants to make a mistake, waste students’ time, or misguide their colleagues.

But I think we’re fetishising familiar objects – the exam, the hour-long observation – and certainty itself. We’ve come to feel we need extensive evidence for a decision: a carefully-marked, written test; a structured, formal observation. We worry about acting based on a single question, or a five-minute drop-in. But responding to people’s needs must mean making decisions rapidly. We need to be confident acting before we’re certain: this post sets out why, and how.

Are we reporting, or deciding what to do next?

We can think of assessment as summative and formative, but every time I see this distinction reiterated, my eyes glaze over. A more helpful, and more interesting, distinction was articulated by Dylan Wiliam and Paul Black (1996), and revived by Daisy Christodoulou (2017). It separates assessing to create a shared meaning and assessing to create a consequence.

Assessing to create a shared meaning

When we assess to create shared meaning, we produce grades and results we agree on. Pretty much everyone agrees that an A at A level maths means you’re good at maths. Students (and parents) know it means they’ve done well. Universities and employers can use it to select promising candidates.

To achieve a shared meaning, inferences must be reliable: if Mike, Marnie and Mohammed all get an A, this should be because they have a similar degree of competence – not because Mike had a generous marker. And they must be valid: the questions they answer should cover the breadth of the A level curriculum (and mathematical competence). So we need carefully designed tests. We must ensure Mike, Marnie and Mohammed sit the exam in identical conditions. And we need examiner training and second marking. (Similarly, a shared meaning from a lesson observation requires a team of observers trained to work on the same rubric.) We can’t do this very often: it would cost a fortune, and the PE department would never get to use the hall.

Assessing to create a consequence

When we assess to create consequences we assess in order to respond. After Marnie has done her A levels, there’s not much I can do to help her – I can only improve my teaching next year. But after answering a hinge question, I can do something, straight away: perhaps just a two-minute explanation or activity. Because the stakes are lower, I don’t need as much evidence – or certainty – to act. Good questions and careful interpretation still matter. But if I’m saying Mike deserves only a B at A level (and should miss his university place), I’d better be certain. If I’m deciding to spend two minutes reviewing quadratic equations, I don’t need to be quite so sure.

Why not wait for greater certainty?

If we’re assessing to create a consequence, I suggest we need to act at the first twinge of suspicion, rather than waiting until we’re certain. We can stop the lesson and try again because a few students seem confused about the first question. Better this than waiting for ten wrong answers, or a full confession: “Sir, I wasn’t listening properly and nothing you said made sense.” Similarly, if it looks like a teacher is struggling with entry routines, we can do something about it – rather than waiting for another observation. There are three reasons why I think we should be more willing to act hastily (and less worried about being wrong):

  1. We can modify the next step if we realise we are wrong. We start explaining a misconception to students, check, and discover none of them hold it. Fine, two minutes lost, we move on. We suggest an action step to a teacher, visit their lesson a week later, and they execute it flawlessly – it was in their teaching repertoire all along. Fine, time for a different action step. Both scenarios are pretty unlikely. More importantly, at worst, we’ve revisited existing learning: we know retrieval and overpractice help. Most importantly, deciding exam grades and promotions rapidly is unlikely to be (or seem) fair. Deciding rapidly about next steps – when we may learn more, and change our mind two minutes later – is fine.
  2. Our next step tests our inference. I think students hold a specific misconception. I could wait and watch – for a long time – before acting. But I might end up none the wiser. If I respond immediately, I test my inference. By targeting the misconception, I can prove myself wrong more quickly.

  3. We’re never actually certain anyway. Having designed our entire school system around exams, we may forget that they’re fallible too. Yes, an exam tells us more about what students know than a hinge question. No, not every student gets the grade they deserve: an exam is accurate only to within a grade, so a student awarded a 4 at GCSE may really deserve a 3, a 4 or a 5 (Ofqual, n.d., pp. 15-24). Observation judgements are no different. As Rob Coe memorably put it: “If your lesson is judged ‘Outstanding’, do whatever you can to avoid getting a second opinion: three times out of four you would be downgraded. If your lesson is judged ‘Inadequate’ there is a 90% chance that a second observer would give a different rating.”

We’ll never be certain. We’ll learn more by acting quickly, and modifying our actions if necessary. So instead of looking for more evidence before deciding, perhaps we should look to make more (low-stakes) decisions. Don’t worry so much about how accurate Mohammed’s predicted grade is. Worry more about how soon you can assess what help he needs right now.

More broadly: action under uncertainty is crucial

In writing this, I realised this argument connects to a much bigger worry I have. Across domains, expecting certainty is a pernicious brake on informed and intelligent action. Sure, let’s ask for the evidence, and weigh it carefully. But let’s stop allowing the demand for perfect studies, tailored to our context, to delay action until that conclusive evidence is produced, sometime around the end of the century.

This has come up a lot in the pandemic. Take rapid tests. We know they aren’t as accurate as PCR tests – that’s not why they’re useful. But this was seen as a black mark against them. Irene Bosch developed a rapid Covid test and applied for (US) authorisation for it in March 2020. The FDA never authorised it… because it wasn’t as accurate as a PCR. Even though hardly anyone could access a PCR. In the UK, ‘experts’ did their best to spike the introduction of rapid tests in Liverpool, claiming they would do “more harm than good” and were “putting people at risk.” Rapid tests cut hospital stays in Liverpool by a third and allowed key workers to keep working.

The critics missed two things. First, the value of speed. I’ve never had a PCR test result back in under 24 hours – sometimes it’s been three or four days. Better to know I’m likely to be infectious now than to be certain I was infectious three days ago. Second, frequency. It’s affordable, feasible, convenient and easy to use a rapid test every day. It’s impossible to test at this rate with PCRs. As we collectively realised over Christmas, fast, frequent, easy, imperfect data permits better judgement than data which is slow, infrequent and hard to access – but certain. (Alex Tabarrok set this out a year and a half ago.)

It sounds very clever and very rigorous and very noble to argue “We can’t know that this principle applies because we don’t have an RCT done in a context just like ours.” But delaying action while we wait for certainty is frequently harmful. We may never have that RCT. Currently, you can count the number of randomised cognitive science experiments in history classrooms on no fingers. We have to extrapolate: what’s true of the human brain learning English is true of the human brain learning history. If we delay, we’ll learn nothing. If we act, we can learn, and we can modify.

Conclusion

The quest for certainty is flawed. Effective decision-making is uncertain, humble but rapid.

Asked how his decision-making had changed, Patrick Collison (co-founder of Stripe) said:

“I now just place more value on decision speed. If you can make twice as many decisions at half the precision, that’s actually often better.”

Spending more time deliberating yields diminishing returns. Instead, we should:

“Make more decisions with less confidence but in significantly less time. And just recognize that in most cases, you can course correct and treat fast decisions as a kind of asset and capability in their own right” (p. 14).

Consider this in the classroom. Would you rather find out – for sure – what students learned once a fortnight? Or have a pretty good guess every week? Would you rather give teachers feedback based on an hour-long observation once a term? Or four times a term, based on a fifteen-minute observation? Would you rather make a best guess about your teacher development programme now, or wait five years for the next (inconclusive) systematic review?

It’s time to embrace uncertainty. Responsive teaching (and coaching, and leadership) is fast. This means the boldness to act urgently, the humility to recognise we may be wrong, and the willingness to adapt our actions as we learn more. Act now, check later.

If you like this, you might appreciate

  • My piece on the false certainty to which school assessment systems pretend(ed).
  • My suggestion that instructional coaching changes school culture, by encouraging frequent, low-stakes observations.
  • Daisy Christodoulou on the problems of pursuing certainty with VAR (see also Tom Chivers).
  • Scott Alexander on pretended certainty in science journalism.

References

Christodoulou, D. (2017) Making Good Progress: The Future of Assessment for Learning. Oxford: OUP.

Ofqual (n.d.) Marking consistency metrics: an update.

Wiliam, D. and Black, P. (1996) Meanings and Consequences: A Basis for Distinguishing Formative and Summative Functions of Assessment? British Educational Research Journal, 22(5), 537-548.