Do we undermine formative assessment by confusing learning and performance? Student performance during a lesson is “a highly imperfect index of long-term learning” (Soderstrom and Bjork, 2015), but it’s easy to assume that correct answers mean students have learned something – and will remember it. Formative assessment relies on being able to elicit evidence of student achievement and adapt our teaching accordingly. If we can’t rely on student answers in the lesson as evidence, does this undermine formative assessment?
Learning vs performance
Learning and performance are different:
- Learning is a permanent change in behaviour or knowledge which supports retention and transfer.
- Performance is a temporary fluctuation in behaviour and knowledge which can be observed and measured during and immediately after acquisition (Soderstrom and Bjork, 2015).
Measuring progress in lessons – students knowing more at the end than the beginning – means measuring a temporary fluctuation: it means measuring performance, not learning.
Strategies to increase performance can hinder learning; strategies that decrease performance can help students apply knowledge better and retain it longer. One study asked children to throw beanbags at a target:
- Group A threw from 3 feet
- Group B threw from 2 and 4 feet
After a delay, both groups were tested throwing from 3 feet. Even though Group A had practised from 3 feet, and Group B hadn’t, Group B did better: the variation in their practice made the initial task harder, but they learned more (Kerr and Booth, 1978 in Soderstrom and Bjork, 2015).
The same effect – worse performance, better learning – applies in the classroom. Logic suggests that students should complete several similar maths problems at once, for example, allowing them to gain confidence and facility in the calculation. However, the principles of the beanbag study apply here too; in one study, on finding the area of unusual shapes:
- Group A studied four problems at a time for each shape
- Group B studied sixteen problems in a random order
Group A performed far better during acquisition, but Group B learned more: one week later they answered three times as many questions correctly as Group A (Rohrer and Taylor, 2007, in Brown et al., 2014). Varying practice increases ‘germane cognitive load’: it forces students to think harder in ways which help them develop schemas more rapidly (Sweller et al., 1998). A wide range of studies, reviewed by Soderstrom and Bjork (2015), demonstrates this effect: what does this mean for assessment?
Formative assessment undermined?
David Didau has argued that the big idea of formative assessment is “fundamentally, and fatally, flawed”. Didau suggests that, because performance in the lesson is no guarantee of learning, we cannot meaningfully adapt our teaching in response to evidence of student achievement. He also claims that students are likely to be mimicking desired answers without knowledge (Didau and Rose, 2016). He and Nick Rose therefore conclude that:
“Testing should not be used primarily to assess the efficacy of your teaching and students’ learning; it should be used as a powerful tool in your pedagogical armoury to help them learn” (Didau and Rose, 2016: 102).
Didau and Rose are right that testing is a powerful tool to help students learn – but this does not undermine formative assessment, for three reasons:
1. Correct answers during lessons may not indicate what students will remember, but incorrect answers certainly indicate what students won’t remember correctly. Formative assessment remains valuable, because we can identify student misconceptions in the moment and adapt our teaching accordingly. (Didau makes exactly this point in the same blog post, a few sentences away from his declaration that formative assessment is fundamentally and fatally flawed.)
2. No teacher would assume that a correct answer from a student today means it will be remembered tomorrow. Teachers do not need to have encountered Ebbinghaus’s forgetting curve to have experienced student forgetfulness.
We can assess what students have understood today without losing sight of the need to assess it again in future. As Soderstrom and Bjork (2015) put it, we must:
“Distinguish, in some way, between the relatively permanent changes in behavior and knowledge that characterize long-term learning and transfer and the momentary changes in performance that occur during the acquisition of such behavior and knowledge.”
This neatly introduces the third point:
3. Formative assessment allows us to measure learning as well as performance. Although Didau argues that formative assessment is “predicated on the assumption that you can assess what pupils have learned in an individual lesson”, this simply isn’t the case. Within a lesson, we can get a sense of how much students have acquired; at the end of a week, unit or term, we can identify how much they have learned, and adapt our teaching accordingly. We can test performance, and we can test learning: in both cases we can adapt our teaching accordingly.
Conclusion
Didau’s arguments are a worthy reminder of the difference between learning and performance, but they do not undermine formative assessment. If we assumed performance and learning were identical, formative assessment would be weakened – although not undermined. But teachers are reminded every day that students forget things – even things they seemed to understand last lesson. It’s important we distinguish between learning and performance and plan for learning. It’s just as important that we assess how well our plans have worked and adapt our teaching accordingly: we cannot do without formative assessment.
Hi Harry,
Great post as always!
I think that another issue arising from the learning-performance distinction is that lots of teachers – especially those who are very skillful in the use of formative assessment in class – use this to keep a very tight control of what students are learning and where mistakes and misconceptions are being made.
This means that lessons appear to be “outstanding” because students are never permitted to struggle, as teachers pick up on and deal with this as soon as it occurs.
What I took from the Soderstrom paper is that, in fact, this can counterintuitively lead to a decrease in long-term learning.
“Given that the goal of instruction and practice— whether in the classroom or on the field – should be to facilitate learning, instructors and students need to appreciate the distinction between learning and performance and understand that expediting acquisition performance today does not necessarily translate into the type of learning that will be evident tomorrow. On the contrary, conditions that slow or induce more errors during instruction often lead to better long-term learning outcomes, and thus instructors and students, however disinclined to do so, should consider abandoning the path of least resistance with respect to their own teaching and study strategies.”
I think a problem with how some people use AfL is that they don’t ever really allow students to struggle. Lessons clip along at a nice pace, and from the outside appear to be such that every student is making huge amounts of progress, whereas in reality it’s all too easy for the students and long-term retention is harmed.
Tricky one to get teachers and school leaders to appreciate, however…
Thanks,
Josh
This makes a lot of sense: while pinpointing student misconceptions is critical, it’s important that we then know when not to intervene as well as when to intervene… Interesting point, thank you.
I’ve already conceded point 1 although I’d say that most student mistakes & misconceptions are entirely predictable and ought to be accounted for in a decent teaching sequence. If this comes as a surprise to teachers it could well be evidence of poor planning.
You’re entirely wrong about point 2. Very many teachers assume “a correct answer from a student today means it will be remembered tomorrow.” This was certainly true of me and it still comes as a surprise when I point it out to teachers now.
As for point 3, yes of course you can use formative assessment as retrieval practice. That’s exactly what we suggested in Psychbook. But you’re mistaken to think that this is anything but a minority practice. Huge numbers of teachers really are still assessing progress in a single lesson.
I’d missed your concession on point 1 – although I’d add that I think student misconceptions are only predictable if we have experience teaching the course or we have access to a collection of student misconceptions; we should be doing more to create these collections.
A few thoughts on point 2… the key point I was seeking to make was that the distinction between learning and performance does not invalidate AfL. You describe a problem with teachers’ understanding of learning, not a problem with the foundations of AfL. So while we’re agreed that there’s much good evidence about learning which is little known or applied in education, AfL stands.
My third point was not that formative assessment can be used as retrieval practice, but that we can use it to assess performance or learning, depending on when we use it. Its effects on memory are a welcome additional benefit.
1. AfL is flawed because it’s only useful if something else (poor planning or pedagogical content knowledge) has gone wrong. It’s sub-optimal. The problem is with the foundation *and* exacerbated by teachers’ misunderstandings.
2. Giving formative feedback is different – this can certainly be useful although there are pitfalls: http://www.learningspy.co.uk/learning/the-feedback-continuum/
3. We can’t use it to assess learning; we can only use it to assess performance more or less reliably. I can make a better inference by waiting and assessing elsewhere but I will still be looking at proxies.
1) It is most useful if other things have gone wrong, but we are so far away from having excellent planning and pedagogical content knowledge on a national scale that it remains a priority.
3) Many of the kind of studies cited by Soderstrom and Bjork are happy to call it learning if knowledge is sustained a week later. The longer the interval the more secure the inference, but if I assess something a month or six months after teaching it, I am certainly no longer assessing performance – which, as Soderstrom and Bjork define it, is ‘during and immediately after acquisition’.
As a trainee teacher (Secondary Maths), from what I have observed so far, I totally agree with you: AfL is indeed flawed.
Saying we should prioritise AfL because other areas are weak seems profoundly mistaken. Better to put our energy into curriculum expertise and into understanding what assessment actually is.
Again, you’re running into slippery definitions. Performance a week from now may not be replicated in a month or year.
I agree that as a system we should be putting our energy into curriculum expertise. But almost none of the infrastructure for this exists, like collections of misconceptions. And I can’t believe classroom teachers have the time needed to gain sufficient expertise in curriculum design to do this for every class they teach. Therefore, a priority remains ensuring students learn what we’re trying to teach them.
Performance a week from now may not be replicated in a month or a year, and that’s why formative assessment will continue to prove useful: I will still want to know how much students know in a month, or a year, and adapt accordingly.
Fascinating blog Harry (and discussion in the comments). I think David’s point about misconceptions and addressing them in the planning and explanation stage is a valid one, but with an important caveat. Thinking as a Science teacher, I always try to teach concepts in a way that addresses common misconceptions. However, students have such differing amounts of prior (subject) knowledge that we cannot be sure how they will link up the knowledge in the lesson with their own existing schema. This means that there are more possible misconceptions or misunderstandings than can ever be planned for in advance. This is why I think formative assessment is so valuable.
Damian