The Oakland A’s were a fairly successful baseball team facing a problem: a budget half that of their top rivals.  In Moneyball, Michael Lewis explained their response: exploiting market inefficiencies which left great baseball players undervalued.  Other analysts used statistics reflecting dramatic but unimportant aspects of the game; baseball scouts focused more on players’ looks than their abilities.  Smart buying allowed the A’s to recruit fantastic players who had gone unrecognised by richer teams.  The A’s achieved impressive winning streaks against far richer sides: well used, knowledge – data and statistics – is power.

In my career so far I’ve moved from outright suspicion of ‘data’ to a recognition of its usefulness – under certain circumstances, interpreted carefully.  Moneyball reminded me of the limitations of relying on instinct, experience and conventional wisdom; one passage discussing Bill James, one of the first people to use baseball statistics to identify genuinely effective players, led me to rethink the role of data in schools:

What James’s wider audience had failed to understand was that the statistics were beside the point.  The point was understanding; the point was to make life on earth just a bit more intelligible; and that point, somehow, had been lost.  ‘I wonder,’ James wrote, ‘if we haven’t become so numbed by all these numbers that we are no longer capable of truly assimilating any knowledge which might result from them.’

None of the data I reported as a teacher made my classroom more intelligible; more often, it obscured reality.  The principle – close examination of student progress to better support those most in need – is great.  However, schools have wound up measuring the wrong things, in the wrong ways.  Baseball insiders incorrectly valued bunts and steals over less showy but more effective tactics, like walks.  Likewise, schools fetishise shallow, impermanent progress over deeper knowledge and understanding.

Two massive problems

Levels are broken.  Reiterating this may seem pointless given their ‘abolition,’ but of six secondary schools I’ve visited in the last month, only one wasn’t planning to continue using them.  In history, focusing on skills assumes that successfully explaining the causes of the First World War automatically confers the same ability with the Russian Revolution (and so encourages the prioritisation of flashy turns of phrase above deeper understanding).  Reporting an overall level for history gives the impression that a student whose essay on the causes of the First World War earned a ‘6b’ can also evaluate interpretations at ‘6b’ or better (a claim resting on the false assumption that progress is linear).  The gaps between levels aren’t equal; the descriptors could arguably be reordered!  In sum – stating that a student has reached ‘level 4b’ in history is meaningless (without extensive caveats and explanation), and using movement between levels to measure progress is entirely fallacious.

This wouldn’t be so bad, if it weren’t for another problem:

We still carry the historical baggage of a Platonic heritage that seeks sharp essences and definite boundaries…  This… leads us to view statistical measures of central tendency wrongly, indeed opposite to the appropriate interpretation in our actual world of variation, shadings, and continua. In short, we view means and medians as the hard “realities,” and the variation that permits their calculation as a set of transient and imperfect measurements of this hidden essence.

Stephen Jay Gould, The Median Isn’t the Message

Schools fall into the trap Gould explains: they take rough, average, best-guess data, then treat it as fact rather than as a reflection of a messier reality.  I once found myself on the hit list because my Year 8 class was the worst performing (of ten).  Reading more closely, I was relieved to find this reflected ‘under-performance’ by only two students.  Looking more closely still, one of those two had been ill for most of the term; the other (a ‘persistent absentee’) I had never met.  This is a petty example of the overall problem: when the data is aggregated, rough guesses are decontextualised and used to drive intervention – all the caveats about the data’s inaccuracies are lost.

So how should we use data in schools?

I’ve fought the corner outlined in the previous two paragraphs for ages (to no effect).  What Moneyball reminded me was how powerful data can be: data ‘denial’ and relying on gut instinct alone is no solution.  The quotation above led me to wonder how we could use data to make our classrooms “just a bit more intelligible”:

1) Abolish whole-school data reporting on anything less than an annual basis.

Whether from teachers to school leaders, from school to parent, you name it: get rid of it.  Only on an annual basis can we conduct assessments thorough enough to provide valid inferences about students’ overall understanding of a subject, and allow time for sufficiently thorough moderation.  (If you doubt the former point, ask language teachers trying to mark assessments of speaking, reading, writing and listening on a termly basis.)  Valid and reliable data would provide excellent justification for significant interventions for students.  Abolishing more frequent whole-school reporting is also what creates the time to pursue point 2 properly:

2) Devolve regular assessment to teachers and departments

I’m not arguing for less assessment; I’m arguing for frequent, useful assessment.  Levels do not help a teacher or a head of department (if they did, it would be possible to explain how to support a student at ‘level 5c’ in history without any further information).

What teachers need to assess on a regular basis is students’ knowledge of individual concepts and ideas, and their capacity to use that knowledge; this kind of analysis must happen at departmental level.  Question-level analysis in departments provides usable insights: if 80% of students in Class A answered a question about the Blitz well, but only 40% in Class B did so, it seems highly likely that the teacher of Class A has done something her colleague would benefit from learning about.  Departments can create short-term solutions (the teacher of Class B spends a few minutes reteaching the Blitz using her colleague’s approach) alongside longer-term ones (reconsidering the unit plans).  ‘Intervention’ would be more frequent, and more useful, than anything conducted at whole-school level.
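To make this concrete, here is a minimal sketch of the kind of question-level comparison a department might run.  Everything in it – the data layout, class names, question labels and the 30-percentage-point gap used for flagging – is an illustrative assumption rather than a prescribed method; the point is only that per-question success rates by class are simple to compute and compare.

```python
from collections import defaultdict

# Each record: (class, question, answered_correctly).
# The records below are invented to mirror the 80% vs 40% example above.
results = [
    ("Class A", "Blitz", True), ("Class A", "Blitz", True),
    ("Class A", "Blitz", True), ("Class A", "Blitz", True),
    ("Class A", "Blitz", False),
    ("Class B", "Blitz", True), ("Class B", "Blitz", True),
    ("Class B", "Blitz", False), ("Class B", "Blitz", False),
    ("Class B", "Blitz", False),
]

# Tally correct and attempted answers per (class, question).
tallies = defaultdict(lambda: [0, 0])
for cls, question, correct in results:
    tallies[(cls, question)][1] += 1
    if correct:
        tallies[(cls, question)][0] += 1

# Group success rates by question, then flag large gaps between classes.
by_question = defaultdict(dict)
for (cls, question), (right, attempted) in tallies.items():
    by_question[question][cls] = right / attempted

GAP_THRESHOLD = 0.3  # an assumed cut-off for 'worth discussing as a department'
for question, rates in sorted(by_question.items()):
    gap = max(rates.values()) - min(rates.values())
    note = "  <- compare approaches" if gap >= GAP_THRESHOLD else ""
    summary = ", ".join(f"{cls}: {rate:.0%}" for cls, rate in sorted(rates.items()))
    print(f"{question}: {summary}{note}")
```

Run against a department’s own marksheets, a report like this points the next meeting at specific questions and topics rather than at a level or a grade.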

Is this really original?

No…

No, because there are many teachers doing great question-level analysis already.  These approaches are often based on variations of hinge questions, sometimes combined with apps like QuickKey, Plicker or Kahoot; with them teachers like John Tomsett, Damian Benney and Kristian Still have found exciting ways to pinpoint what students have understood and where they need help.

No, because the idea of moving to less frequent summative assessment is not new, having been articulated by, among others, Dylan Wiliam.

Yes…

Yes, because I don’t know of any schools which have done this (although they’re out there, no doubt).

Yes (to me), because it was only with the flash of insight reading Moneyball offered that I realised that data is not the problem; the problem is over-simplified, time-consuming junk data masquerading as assessment, which (for me at least) crowded out time I would rather have spent working out exactly what students needed me to change in my teaching.

But how will Ofsted evaluate progress?

Apart from being the worst question in the business: if Ofsted can’t deal with the complexity of the question-level data departments should be working with – data which more closely represents reality – then this surely reinforces Michael Fordham’s case that inspectors must either be subject specialists or stay out of evaluating the quality of teaching and learning.


I wrote recently about measuring what matters, from a very different angle.

It’s not often I read sports literature, let alone recommend it: this is a worthy exception – Moneyball: The Art of Winning an Unfair Game