Moneyball for schools: can we use data like the Oakland A’s?

The Oakland A’s were a fairly successful baseball team facing a problem: a budget half that of their top rivals.  In Moneyball, Michael Lewis explained their response: exploiting market inefficiencies which left great baseball players undervalued.  Other analysts used statistics reflecting dramatic but unimportant aspects of the game; baseball scouts focused more on players’ looks than their abilities.  Smart buying allowed the A’s to recruit fantastic players who had gone unrecognised by richer teams.  The A’s achieved impressive winning streaks against far richer sides: well used, knowledge – data and statistics – is power.

In my career so far I’ve moved from outright suspicion of ‘data’ to a recognition of its usefulness – under certain circumstances, interpreted carefully.  Moneyball reminded me of the limitations of relying on instinct, experience and conventional wisdom; one passage discussing Bill James, the first person to collect baseball statistics identifying effective players, led me to rethink the role of data in schools:

What James’s wider audience had failed to understand was that the statistics were beside the point.  The point was understanding; the point was to make life on earth just a bit more intelligible; and that point, somehow, had been lost.  ‘I wonder,’ James wrote, ‘if we haven’t become so numbed by all these numbers that we are no longer capable of truly assimilating any knowledge which might result from them.’

None of the data I reported as a teacher made my classroom more intelligible; more often, it obscured reality.  The principle – close examination of student progress to better support those most in need – is great.  However, schools have wound up measuring the wrong things, in the wrong ways.  Baseball insiders incorrectly valued bunts and steals over less showy but more effective tactics, like walks.  Likewise, schools fetishise shallow, impermanent progress over deeper knowledge and understanding.

Two massive problems

Levels are broken.  Reiterating this may seem pointless given their ‘abolition,’ but of six secondary schools I’ve visited in the last month, only one wasn’t planning to continue using them.  In history, focusing on skills assumes that successfully explaining the causes of the First World War automatically confers the same ability for the Russian Revolution (and so encourages the prioritisation of flashy turns of phrase above deeper understanding).  Reporting an overall level for history gives the impression that a student’s ‘6b’ essay on the causes of the First World War allows them to evaluate interpretations to at least a ‘6b’ as well (based on the false assumption that progress is linear).  The gaps between levels aren’t equal; the descriptors could arguably be reordered!  In sum – stating that a student has reached ‘level 4b’ in history is meaningless (without extensive caveats and explanation); using movement between levels to measure progress is entirely fallacious.

This wouldn’t be so bad, if it weren’t for another problem:

We still carry the historical baggage of a Platonic heritage that seeks sharp essences and definite boundaries…  This… leads us to view statistical measures of central tendency wrongly, indeed opposite to the appropriate interpretation in our actual world of variation, shadings, and continua. In short, we view means and medians as the hard “realities,” and the variation that permits their calculation as a set of transient and imperfect measurements of this hidden essence. Stephen Jay Gould, The Median Isn’t the Message

Schools fall into the trap Gould explains: they take rough, average, best guess data, then treat it as fact, rather than a reflection of a messier reality.  I once found myself on the hit list because my Year 8 class was the worst performing (of ten).  Reading more closely, I was relieved to find this reflected ‘under-performance’ by only two students.  Looking more closely still, my worst-performing group comprised two students: one had been ill for most of the term; I had never met the other (a ‘persistent absentee’).  This is a petty example of the overall problem: when the data is aggregated, rough guesses are decontextualised and used to intervene – all the caveats about the inaccuracies of data are lost.
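To make the arithmetic concrete, here is a minimal sketch using invented scores (not my real class data): eight students who attended, plus two stand-ins for the ill and absent students. The variable names and numbers are assumptions chosen purely for illustration.

```python
from statistics import mean, median

# Invented scores, for illustration only.
attending = [55, 60, 65, 60, 70, 50, 60, 60]  # students who were actually taught
rarely_present = [0, 10]                      # stand-ins for the ill and absent students

everyone = attending + rarely_present

# The class mean is what a whole-school league table of classes would report.
class_mean = mean(everyone)
mean_of_attending = mean(attending)

# The median is far less sensitive to two extreme values.
class_median = median(everyone)

print(f"Mean, whole class:      {class_mean}")
print(f"Mean, attending only:   {mean_of_attending}")
print(f"Median, whole class:    {class_median}")
```

With these made-up numbers the class mean falls from 60 to 49 once the two students are included, while the median stays at 60: the ‘worst-performing class’ label says nothing about the other eight.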

So how should we use data in schools?

I’ve fought the corner I’ve outlined in the previous two paragraphs for ages (to no effect).  What Moneyball reminded me was how powerful data can be: data ‘denial’ and relying on gut instinct alone is no solution.  The quotation above led me to wonder how we could use data to make our classrooms “just a bit more intelligible:”

1) Abolish whole-school data reporting on anything less than an annual basis.

Whether from teachers to school leaders, from school to parent, you name it, get rid of it.  It is only on an annual basis that we can conduct assessments thorough enough to provide valid inferences about students’ overall understanding of a subject, and where we can conduct sufficiently thorough moderation.  (If you doubt the former point, ask language teachers trying to mark assessments of speaking, reading, writing and listening on a termly basis.)  Valid and reliable data would provide excellent justification for significant interventions for students.  Cutting back whole-school reporting is also necessary to create the time to pursue point 2 properly:

2) Devolve regular assessment to teachers and departments

I’m not arguing for less assessment; I’m arguing for frequent, useful assessment.  Levels do not help a teacher or a head of department (if they did, it would be possible to explain how to support a student at ‘level 5c’ in history without any further information).

What teachers need to assess on a regular basis is students’ knowledge of individual concepts and ideas, and their capacity to use that knowledge; this kind of analysis must happen at a departmental level.  Question-level analysis in departments provides usable insights: if 80% of students in Class A answered a question about the Blitz well, but only 40% in Class B did so, it seems highly likely that the teacher of Class A has done something her colleague would benefit from learning about.  Departments can create short-term solutions (the teacher of Class B spends a few minutes reteaching the Blitz using her colleague’s approach) alongside longer-term ones (reconsidering the unit plans).  ‘Intervention’ would be more frequent, and more useful, than at a whole-school level.
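A department could run this kind of comparison with very little machinery. The sketch below is one possible approach, not a description of any particular school’s system: the function names, the invented responses and the 30-point gap threshold are all assumptions for illustration.

```python
from collections import defaultdict

def percent_correct(responses):
    """Return {question: {class: percentage of students who answered well}}."""
    # For each question and class, keep [answered well, attempted].
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for class_name, question, answered_well in responses:
        counts = totals[question][class_name]
        counts[0] += int(answered_well)
        counts[1] += 1
    return {
        question: {c: 100 * right / n for c, (right, n) in by_class.items()}
        for question, by_class in totals.items()
    }

def questions_worth_discussing(percentages, min_gap=30):
    """List (question, strongest class, weakest class, gap) where the gap is large."""
    flagged = []
    for question, by_class in percentages.items():
        best = max(by_class, key=by_class.get)
        worst = min(by_class, key=by_class.get)
        gap = by_class[best] - by_class[worst]
        if gap >= min_gap:  # min_gap is an arbitrary, hypothetical threshold
            flagged.append((question, best, worst, gap))
    return flagged

# Invented responses mirroring the Blitz example: ten students per class.
responses = (
    [("Class A", "Blitz", True)] * 8 + [("Class A", "Blitz", False)] * 2
    + [("Class B", "Blitz", True)] * 4 + [("Class B", "Blitz", False)] * 6
    + [("Class A", "Evacuation", True)] * 7 + [("Class A", "Evacuation", False)] * 3
    + [("Class B", "Evacuation", True)] * 6 + [("Class B", "Evacuation", False)] * 4
)

percentages = percent_correct(responses)
flagged = questions_worth_discussing(percentages)
for question, best, worst, gap in flagged:
    print(f"{question}: {best} is {gap:.0f} points ahead of {worst}")
```

Here only the Blitz question is flagged (80% against 40%); the Evacuation question, with a 10-point gap, is not. The output is a prompt for a conversation between colleagues, not a judgement: with classes this small, a gap could easily reflect a handful of students rather than the teaching.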

Is this really original?

No…

No, because there are many teachers doing great question-level analysis already.  These approaches are often based on variations of hinge questions, sometimes combined with apps like QuickKey, Plickers or Kahoot; with them teachers like John Tomsett, Damian Benney and Kristian Still have found exciting ways to pinpoint what students have understood and where they need help.

No, because the idea of moving to less frequent summative assessment is not new, having been articulated by, among others, Dylan Wiliam.

Yes…

Yes, because I don’t know any schools which have done this (although they’re out there, no doubt).

Yes (to me), because it was only with the flash of insight reading Moneyball offered that I realised that data is not the problem; the problem is over-simplified, time-consuming junk data masquerading as assessment, which (for me at least) crowded out the time I would have preferred to have spent focused on exactly what students needed me to change in my teaching.

But how will Ofsted evaluate progress?

Apart from this being the worst question in the business: if Ofsted can’t deal with the complexity of question-level data, which represents reality more closely and which departments should be working with, this surely reinforces Michael Fordham’s case that inspectors must either be subject specialists or stay out of evaluating the quality of teaching and learning.


I wrote recently about measuring what matters, from a very different angle.

It’s not often I read sports literature, let alone recommend it: this is a worthy exception – Moneyball: The Art of Winning an Unfair Game

23 responses to “Moneyball for schools: can we use data like the Oakland A’s?”

  1. Yes! This!
    This is an issue I keep coming back to: we’ve got caught up confusing *assessment* with *tracking*. The latter is important for leaders (and annually should be sufficient); the former is what matters for teachers, students, parents and classrooms.
    And Ofsted should be able to deal with both! BUT… the reality is that Ofsted are still too often expecting six-weekly tracking trawls which leaves less time for the important business of assessment.

  2. As often, a great piece of analysis. The film, MoneyBall, was good, too. As you say, none of this is really new, and, ironically, our gut instinct would tell us that some data will be useful, and some not. Presently, data generation is driven by the nonsense accountability system in place. First, design the data system needed to inform teaching, then design the accountability system – where data will probably not feature large. Time for Change (obviously).

  3. Harry, despite your visits implying the contrary, have you come across any useful post-levels models which marry the related but different imperatives of assessment and tracking?
    All of this has worrying echoes of the to-grade-or-not-to-grade-lessons debate, which I’m currently rehearsing at my school. Teachers are sure they want feedback that develops them (formative assessment) while the Head still wants a spreadsheet she can use to track them.

    • Not yet. At the moment I’m trying to get an overview (for colleagues) of what systems schools are using/planning. Primarily I’m coming across either lack of change or quite a lot of uncertainty (I’m usually talking to classroom teachers not SLT/data managers – although they may well be uncertain too).

      If I come across anything new that looks worthwhile I’ll certainly try to understand how it works and can let you know.

  4. Excellent summary. It was both an interesting and a slightly depressing feature of BETT, the number of times I saw banners promising schools the ‘solution to life beyond levels’. They always came with a spreadsheet.
    I think one of the key things will be for schools, and crucially subject disciplines to work out what attainment and progress actually look like in their subject.
    The one other element here is to consider what, how and how often we report to parents. It may be that to be effective and time efficient we develop a shorthand in schools to understand and record students’ learning. We have to be very careful about how we translate this into ‘human’ for our parents, without resorting to simple league tables of in-class performance (which is what so many seem to want).

    • Thanks for the comment Dave.
      Allowing/encouraging subjects to create models/assessments which reflect how students learn their subjects is absolutely crucial.
      Reporting to parents is a tricky aspect – and the idea of ‘humanising’ it is extremely important. If that’s our aim, I can’t see a way around some/all teachers writing something personalised (supplemented with parents’ evenings). I think most parents would prefer this – if done properly. My view is that only a minority want a league table – some of them would be happy to have this prefaced by knowing whether their child is ‘where we’d expect them to be’ (or, less attractively – where they stand in the class).

  5. I’m really glad that someone has finally written about Moneyball – it’s such a fascinating book (and a pretty good film too) – and I think you’ve nailed it here, Harry… it’s not statistics which are the problem: it’s the overextrapolation of and borderline fanaticism in very rough and imperfect snapshot data to make long term judgements. The challenge for teachers and schools is to focus on the right kind of measurements. My main interest in the short term is getting much more focused on measuring students’ habits and dispositions – this shows their direction of travel much better than a snapshot can… @thinklish PS Liked your piece on gambling too – familiar!

    • This is a fascinating question. I think there’s beginning to be some interesting work out there that reliably measures dispositions (alongside some more questionable stuff)… and I’d agree that changing students’ direction of travel can be what creates the real, long-term impact.

  6. This is going to be a long reply, so it did cross my mind to post it on my own blog on the grounds that I should get the hits. Then I thought, Harry’s a nice guy, he can have both clicks.

    My view is that whilst you have correctly diagnosed the problem – that is much of the data collected in schools is of variable quality and is often used for different reasons than it is collected – I would say that (my reading of) the suggested cure is in some ways almost the opposite of what is required.

    On the first point of abolishing whole-school data reporting on a less than annual basis, I am unsure here if you mean just the reporting, or the collection. You suggest that only by using “assessments thorough enough to provide valid inferences about students’ overall understanding of a subject, and where we can pursue reliably through moderation” can justify significant interventions for students. I’m assuming that by significant interventions you mean ones that require extra-departmental mandate?

    The issue that I have with this is twofold. The main one is that assessing interventions on an annual basis strikes me as being less than is required. It assumes that the interventions would be either short-term, work and everything could go back to the main track, or they would be year long interventions and the annualised testing would establish their efficacy. The second issue I have with this is that I disagree that only annual assessments can be thorough enough. All subjects (tho’ I agree some more than others) have discrete areas of knowledge that can be assessed and which can provide useful indicators as to the progress of the student. I would like to see a lot more work on establishing a shared understanding of these “assessable chunks” (inventing terminology as I go along here). You do nod to this in your second point.

    Of course, if you are only referring to the reporting of data on an annualised basis then I sort of agree with you. Except that I would say that all data should be open and accessible by all those that have a legitimate interest (which includes parents) and that there should (possibly) be no annualised reporting at all (more about that idea here – https://cogitateit.wordpress.com/2014/10/28/standin-in-line-marking-time/ ).

    On the second point about devolving regular assessment to teachers and departments. Well, here I would argue that this is predominantly the case in most schools already and is the main reason that we end up with a significant amount of useless data. So, sticking my head way above the parapet, I would say that as a profession teachers are less statistically knowledgeable than they need to be. Actually, that’s not controversial; it’s a statement of the obvious. I say this in the knowledge that there are many data experts out there in schools. But too many schools fall into the trap of equating ability to use someone else’s Excel spreadsheet with statistical capability. To digress a little, I would (sometimes) go as far as to say that the use of Excel as the key data analysis tool should be an indicator to Ofsted that leadership should be graded RI.

    But back to the point.

    If we devolve responsibility to departments to collect whatever data they want, whenever they want to collect it, then I would say that we will end up with data that is even more unreliable and more invalid than what we currently have. And also (in my experience) we end up with less data. We are then also looking to have 30,000 data experts across the secondary system, a number we are unlikely to reach in the short or medium term. We will also end up with an infinite number of spreadsheets, a situation no sane person would aspire to.

    Visibility (and consistency) of data at a whole school level, which requires professional data collection process and systems, helps to overcome some of these issues. This does not mean that the data should not be used at departmental level. On the contrary, use of good data at the departmental and teacher level is one of the levers to school improvement that few have thus far found the way to pull. It is an essential next step to better schools. On that point I think we are in agreement.

    We need better data collection systems, better understanding of statistics and data among all teachers, and we need an appropriately qualified data manager in every school.
    Mike

    • Mike, as ever your kindness is only exceeded by your charm (and generosity regarding clicks). Thanks for taking the time and thought to give such a thorough comment. I did wonder if I’d been sufficiently explicit about my thinking on exactly who might intervene where – but didn’t expect sufficient interest for it to matter.

      By significant interventions, I’m talking about anything that requires additional structure/resources/timetable changes. In my view, most effective interventions happen within lessons or over a handful of lessons: teachers identify a misconception, knowledge or skill gap, and close it. If this were done well, for the majority of students no further intervention would be needed.

      In terms of ‘reporting’ – I’m talking about reporting both by teachers to the leadership team and by the school to parents. I don’t feel the data created is sufficiently comparable to be comprehensible at a school level. So, rather than waste energy on such comparison (why is Jimmy a 6b in French but only a 5c in Maths?), I’d argue we’re better doing away with it, rather than conning ourselves/others.

      You’re correct in diagnosing my first assumption – but I’m not entirely clear why you disagree with it from your comment. Regarding your second issue, I think we agree, save on the issue of when we can turn that into a report on a student’s progress. Certainly we can track and identify students’ acquisition of discrete content areas (and, subsequently, their retention of the same). My problem is that acquiring knowledge of a topic (which can be tested often) is too often assumed to represent general progress in a subject… a false assumption. But I agree, much more work on this is needed (not necessarily by schools alone – banks of multiple choice questions, for example, would be a great step forward).

      I could not agree more strongly that better statistical literacy would benefit anyone and everyone involved in schools (or anything). To me, misunderstanding of assessments is part of the reason why we have such poor data – but I also think that much of this is attributable to a very poorly designed system. I think we’re agreed on the power and need for strong evaluation of the data we collect – but it’s redesigning what we collect which, to me, is the crux of the change.

      • Cheers Harry.

        I think the certainty is that we agree on more than we disagree on this, and we disagree on the nuance rather than the substantive points.

        Your last point (in your reply) is the important one – we collect what we collect in the way that we collect it because that’s how we started doing it over 20 years ago. We could do so, so much more now if we redesigned data collection from the bottom up. But that bit I am keeping for my blog😉

    • I agree with the majority of both the main article and the comment above. I believe that if schools are pursuing a cross-departmental strategy of providing intelligent interventions to students then the in-year data collected which may inform where intervention is placed needs to carry a degree of consistency. However I concede that perhaps the whole intervention thing could be done differently (devolved) or not at all.

      I too am a big fan of question level analysis – in fact I provide a free tool via my blog that can assist departments who are thinking of exploring this route in greater detail.

      I am also a big fan of Excel and the flexibility it can bring if harnessed appropriately. Excel is the only analysis tool used in my school and I see this as a positive rather than a negative. It means we can adapt quickly to the changing face of education; I place great value in keeping that flexibility and not being restricted by more formal systems.

      https://dataeducator.wordpress.com/the-way-forward/

  7. Great blog. I’ve read your other blogs on assessment too (which I agree with). Having thought, and read about it, over the past few months, I actually think there’s a way of reconciling authentic assessment, regular data tracking and intervention (thereby keeping everyone happy) but it does involve moving away from levels and focusing on assessment as a process of ‘understanding’ – I love this point in your blog. Working title for it is ‘Ronseal Assessment.’

  8. Very interesting and worthwhile read. There’s much that chimes with my views both in the blog and the comments below – particularly the idea of assessment being a means of understanding your pupils’ thinking and also the criticism of the current practice of over-extrapolating meaning from levels (which were first intended to be used as end of key stage descriptors – or a progression model – but were instead turned into a marking proforma).

    I am a History PGCE student and have begun to see and use marking and assessment as a way of getting to know my pupils and forming a dialogue with them. I am currently marking my Year 9 class’s assessments on the causes of WWII and it has provided me a fascinating insight into their causal thinking and has also made clear the ways in which I have taught them well and the areas I should have spent more time on.

    Data does not just mean numbers. It includes the messy markbooks, marking of pupils’ books, reflections on their thinking based on what they have said in class discussions. It is vital but only when used in the right way and for the right purposes.

    This post is spot on.

    • That sounds like a very promising use of assessments and marking. I think your final point is key – taking us back to an older meaning of the word ‘data’ as information in all its forms.

      Thanks for the comment.

  9. Thanks for the post – am very much with you on the “there’s stuff we teachers can learn from Moneyball” trip and I wholeheartedly support your idea of bottom-up departmental level approach.

    The main lesson I took away from the book was that Billy Beane was brilliant at using data to help him work out:
    a) where he was wrong &
    b) where his prejudices (small p) were hindering him.

    The talent-scouts hiring by going on “feel” and being able to see a top player through years of experience are not a million miles off many teachers I have spoken to. Many have a view of “what a good student looks like”, but this ranges from “neat and always underlines” to “searingly good questions”. There’s nothing wrong with qualitative data, and I don’t think we should boot teachers’ experience out of consideration, but they are hugely prone to bias.

    The main use of data, for me, is to counter that bias. Not as a hard and fast – “the data is always right” – but as a “here are some indications we may not be getting it right, what if anything should we change?” Ironically, I think most teachers do that naturally in a pastoral manner. “Sue seems down today, anyone know what is wrong?”

    Changing the culture of a school to one that uses academic data to spot where it’s getting things wrong is difficult, though, in that it’s often intertwined with accountability. A top-down “this is how we’re going to use data and we want you to show us where you’re getting it wrong” is doomed to inertia, at best.

    Bottom-up departmental level data seems a good way to start seeding the culture of data as a friend.

  11. As @jack_marwood recommended, The Signal and the Noise is also an excellent read on the problems with data (Nate Silver started off in baseball stats and the Oakland A’s feature). My experience matches yours – I know very few schools that are even really talking about moving away from levels. I can’t help thinking that both your post, and Tom Sherrington’s a while back (think it was his one from #TLT14) remind me of the start of my teaching career when we did one big lot of school exams each year and that was how we tracked and reported achievement. I’m not suggesting that this was anything other than very crude but annual reporting is an old wheel (that perhaps needs dragging out of the brambles). More recently in the college I worked at, we used mastery-style topic tests every few weeks and that gave us great information on progress and learning in a way that meant we could identify problems. Combine that with other formative assessment and I think we usually had a pretty good idea of students’ learning (before I sound too conceited it took a lot of false starts before nailing good tests, and identifying the problems certainly wasn’t the same as fixing them). SLT expected us to know where our students were at but there was a lot of flexibility about how we did it. Of course, in 6th Form every year is an exam year – and I left whilst every semester was still an ‘exam semester’ – so that’s different from trying to spot a problem in Y8 that’s going to affect performance in Y11 but it’s an SLT obsession with data, built on harsh accountability, that has created the problem.
    Best wishes
