August 14, 2020.
The lack of merit in A-level grading shows a misplaced belief in the power of statistical modelling
The famous expression, “lies, damned lies, and statistics,” is extended by statistical modelling to take in the future as well as the past, whether to predict it, or, in the case of Ofqual, to dictate it. Those responsible, notably the Chair and Chief Regulator, are statisticians rather than specialists in education. They have educational advice, some of it excellent, but their main interest, as Roger Taylor, its Chair, put it, is “How do you measure stuff?” When the “stuff” is the standards reached by children in school, and when evidence, such as analysis of the demands of questions in maths exams, or of the language used by candidates in English answers, shows that standards have fallen, there is a clear need to improve the way they are measured, and to apply a brake.
Examination reform has addressed the first part of the problem by cutting out the corruption that had grown up around school-based assessments and coursework. Shortly before its abolition, I had six clear examples in GCSE German alone, including teachers doing the coursework instead of pupils, and a grammar school head who destroyed the integrity of his staff by refusing to accept any grade below a pupil’s target, usually an A. Honest people suffered, and a very senior Ofsted official – not an HMCI – told me it was so widespread that Ofsted could do nothing about it.
The brake is another matter. Before 1993, A level grading was simple. The top five per cent of entrants got an A, for example, and other grades were also decided on percentages of entry, with sample papers kept to ensure comparability between years. A major change in 1993 doubled the proportion of A grades and destroyed the comparison papers, an act of vandalism without parallel in educational record-keeping. It was perhaps no accident that this coincided with the redesignation of polytechnics as universities, setting the scene for Blair’s disastrous target of sending half the population to university at the expense of lifelong debt. Michelle Donelan’s recent comments on this issue are a welcome sign that the tide has turned. Her point that every student with the right grades should be able to obtain the right place is a matter of principle. But what are the right grades, and how do we ensure that students receive them?
Yesterday’s report on A levels shows record numbers of A* and A grades, so claims that the government has set out deliberately to penalise schools and teachers are clearly untrue. Donelan pointed out that more candidates got into their first choice of university this year than last.
Nevertheless, schools and candidates cannot be confident about having been graded on their merits. One successful headteacher described the results as “a dog’s dinner”, blaming Ofqual for a statistical straitjacket that did not discriminate between subjects, seriously downgrading the school’s strengths in its specialist areas. It had also ignored a substantial improvement in GCSE results among this year’s candidates, pulling them back on the basis of earlier results. Fortunately, Oxbridge had accepted all but one of the candidates who had received offers, and that case was pending decision.
Ofqual’s failure to discriminate accurately is founded on a misplaced belief in the power of statistical modelling, and a chronic failure in its leadership to pay sufficient attention to detail when measuring “stuff”. We can expect hard cases to make headlines, and universities to do what they can to mitigate the most obvious injustices. We can also expect an application for Judicial Review.
August 20
Ofqual needs a Chairman and Chief Regulator who know about education. If these can’t be found, we must start again.
Ofqual’s A level grades could not stand. The standard for a judicial review – that no reasonable person, acting reasonably, could have reached the decision in question (Associated Provincial Picture Houses v Wednesbury Corporation, 1948) – was met with ease.
Failing a person without even looking at their work can never be reasonable. It is equally clear that Ofqual’s Saturday night U-turn was the result of its Board, which had not met since last September, deciding that it was not going to go down with the Chief Regulator and Chairman. Ofqual should have spent the money it wasted on Public First on some decent legal advice. A first-year law student could have told them.
Last week’s dog’s dinner has been followed by a dog’s breakfast. As universities struggle with the flood of candidates deemed successful, while the smaller number who feel let down by their schools are left with no redress, schools and sixth forms are hit with a huge increase in top GCSE grades.
In fairness to Gavin Williamson, his instruction to Ofqual when the exams were cancelled in March was “that these students should be issued with calculated results based on their exam centres’ judgements of their ability in the relevant subjects, supplemented by a range of other evidence.” Ofqual was legally required to do this, but instead overruled these calculations via a statistical rigmarole that took no notice of them, except where they had five or fewer candidates in a subject.
The Chief Regulator and Chairman decided to do it their way, and so hit the rocks. To that extent, the Government is justified in saying that the mess is Ofqual’s fault, and its expression of confidence in the Chief Regulator would shame a football club chairman. The DfE’s own failure lay in not following its instructions through to ensure that they were carried out. The Daily Mail’s front page cartoon of the Prime Minister and the Secretary of State as Laurel and Hardy sums it all up.
So, what now? First, we need to get rid of the idea that these grades are results. They are not, and cannot be relied on. Geoff Barton, of the Association of School and College Leaders, said that schools had given borderline candidates the benefit of the doubt, but this is not quite the case.
A university source from the North of England told me that many had given the most optimistic estimate of what might have been achieved with full teaching and revision, but that some had simply entered mock results, even if these had been lower than teachers’ estimates. No appeal was available, and university places had been lost as a result.
Barton’s view is more realistic than the corruption that took over GCSE school-based assessments, but the conflict of interest can’t be disguised. When a school gives a pupil an A, it gives itself one too, and I’ve seen unjustified top grades lead to pupils struggling and failing in the next stage of education.
Ofqual itself is an odd fish. Devised by Labour in 2009 to counter well-founded suspicions of dumbing down and grade inflation, it is, like Ofsted, notionally independent, but must “have regard” to government policy when publicly directed to do so.
This leaves the Chief Regulator very wide discretion, exemplified by Sally Collier’s statement, after lowering A level grade boundaries in 2017, that “I want the message to be that students have done fantastically well. All our kids are brilliant”. If all are brilliant, all must have prizes. In the end, Ofqual’s Board meeting on Saturday simply obliged her to base judgements on Williamson’s instruction, rather than ignoring it. What the Board could not do was meet his instruction to take account of additional evidence, hence opening the floodgates.
The statute requires Ofqual to perform its functions “efficiently and effectively”. It has failed to do so, but it is unfair to judge an educational body on its handling of a pandemic. More important are its failure to ensure fair and equitable grading – leading to able pupils taking physics and languages receiving lower grades than in other subjects – and a structure that allows its chief regulator to base major decisions on personal views. Improving supervision by the Board, and appointing a Chairman and Chief Regulator who know about education may both help. Failing that, we need to start again.
September 4.
Ofqual’s evidence at a Select Committee this week demonstrated why it should be wound up
Ofqual’s appearance at the Education Select Committee on Wednesday showed more clearly than anything to date just how far the organisation’s faith in statistical modelling and lack of understanding of education led it into error – and the education system into chaos.
Roger Taylor, its Chairman, started confidently, saying that Ofqual had wanted examinations to continue, but had been overruled by the Secretary of State. A second option had been to delay the examinations, and the third to find “some form of calculated grades.”
Gavin Williamson wrote to Ofqual on March 31 to say that students should receive “calculated results based on their exam centres’ judgements of their ability in the relevant subjects, supplemented by a range of other evidence.”
He went on to say that the approach should be “standardised across centres”, and that steps should be taken to maintain a similar grade profile to previous years. Ofqual then used “statistics and teachers’ rankings” to produce something which, said Taylor, was as fair as it could be.
The first error was to advise that examinations continue. This was impossible because some schools, following trade union advice, stopped direct online teaching as soon as lockdown started, while others – only a handful in the state sector – did not.
Stopping teaching when it would have been perfectly possible to continue it for A level classes would have put the affected pupils at a serious disadvantage. The same issue would have affected delayed examinations.
Ofqual’s statisticians could not have been expected to understand these considerations, but ministers did. Ofqual’s Board, which has highly experienced and expert practitioners, would have been able to explain the position but, according to its official records, did not meet between 26th September 2019 and a late-night session on 15th August, when it put its collective foot down over the botched appeals process. Why not?
What seems to have happened instead is the delegation of the work to a technical group, which did not standardise teachers’ assessments, as instructed, but ignored them completely by applying a statistical model to their rankings. Michelle Meadows, Ofqual’s “Executive director, strategy, risk and research”, justified this by saying that teachers’ grades were not accurate, but that their rankings were.
There is some research evidence to support this view, notably from Daisy Christodoulou, but to ignore teachers’ grades completely was a victory for statistics over reality. Dr Meadows told the committee that 0.2 per cent of grades were “potentially anomalous” and that the statistical model – which I will not flatter with the term “algorithm” – was fair and unbiased.
Furthermore, as teachers were often unsure whether to enter candidates for lower or higher tiers in some subjects, Ofqual had removed any limitation on grades for foundation candidates. That sounds fair – until we see pupils with very limited skills awarded grade 9 on the basis of work they’d never even seen.
Conservative committee members Jonathan Gullis and Christian Wakeford made the case for reality, Gullis pointing to the unfairness of the model to large entries from FE colleges, and Wakeford echoing a pupil’s lament, “I’ve got somebody else’s D”.
The consequences of not applying the model to entries of fewer than five candidates, which favoured private schools and some subjects, had clearly blind-sided Ofqual, as did the question why they did not run this year’s results, which they had had since June, through the model to see how far it worked.
Dr Meadows evaded this question, saying they had done all sorts of trials. The point is: why not this one, which would have allowed problems to be identified in advance? It is hard to see how a system that only claimed 60 per cent accuracy could result in only 0.2 per cent of potential anomalies, but Dr Meadows was undaunted. Analysis did not show any bias in the system.
Robert Halfon concluded by asking whether Ofqual was fit for purpose, to which the witnesses, all of them Ofqual officials, predictably replied in the affirmative.
I do not agree with them. Assigning children’s futures to a statistical model, without considering the quality of their work, or even looking at it, is not the action of a reasonable body, acting reasonably, and would have brought a well-deserved hammering on judicial review.
If Ofqual had moderated teachers’ assessments sensibly, perhaps, as suggested by Bob3142 in response to my previous article, by requiring schools to justify any overall change from past performance, we could have had a fair outcome. As it is, we have had to swallow the grade inflation, and leave schools and universities to sort out the mess. Ofqual should be wound up.
These three articles first appeared on the political website, Conservative Home. I've reposted them here as their significance is primarily professional.