This weblog aims to communicate with everyone interested in literacy, including researchers. The following paper is technical, but is needed to correct the error of trying to impose a single structure on educational studies, a structure which may not always tell us what we need to know. As always, correspondence is appreciated. This is a slightly revised version of the one originally posted. See also notes on randomisation of small trials, and Dr Ben Goldacre's paper on randomised trials.
Since 2001, Professors David and Carole Torgerson have argued for randomised controlled trials in educational research, and have lamented the lack of them. Their work consistently asserts the benefits of this approach, for example:
The randomised controlled trial is the best method of assessing causality[i]
The best method of ascertaining whether an intervention is effective or not is through the use of a randomised trial[ii]
The only research method that can adequately control for external confounding factors is the randomised controlled trial [iii]
This gold standard methodology should be more widely used as it is an appropriate and robust research technique.[iv]
At times their arguments amount to a reasonable suggestion that two fields of enquiry might learn from each other – for example:
Many aspects of health care research have sufficient similarities to educational research that some of the lessons learned by health care researchers over the past two decades are readily applicable to the re-emerging interest in randomised controlled trials methodology by educational researchers[v]
Elsewhere, their confidence in their “gold standard methodology” comes close to totalitarianism. In Carole Torgerson’s review of evidence on phonics, following the publication of the Clackmannanshire study in 2005[vi], she and her co-authors systematically disqualified every study that did not use this method, irrespective of any other strengths it may have had, including the key issue of long-term follow-up to check whether improvements are lasting. Several of the randomised controlled trials admitted by the authors had very serious flaws, including imprecise methodology, lack of follow-up of initial results, and sample sizes too small to detect potentially significant improvements.[vii]
The last issue is particularly important in view of Professors Torgerson’s convincing arguments on sample size in research. They say that most studies show a benefit at or below half a standard deviation (roughly 5% more progress than a control group), and that studies should be designed to detect this with 80% certainty. This requires a minimum sample of 64, with the same number for a control group.[viii] Of the 20 studies included in the DfES survey, only three (including an initial study by Johnston and Watson) met this criterion for sample size. The rest had samples of under 50, and mostly well under 50, with some as low as 12. These studies used a wide range of methods, some to teach initial reading, some to teach pupils described as “learning disabled”; there is no consistency in their methodology, and in some cases it is barely described. Basing conclusions on grouping flawed studies together, an approach that has had a long and sad history in phonics research, might produce, and did produce, a result that looks favourable to phonics, but it is not reliable science.
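The arithmetic behind the figure of 64 can be checked directly. The sketch below, in Python, uses the standard normal-approximation formula for comparing two group means; the function name is mine, and the calculation is offered as an illustration rather than a reproduction of the Torgersons' own working.

from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Pupils needed per arm to detect a standardised effect size d
    in a two-group comparison of means, using a two-sided test."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # 1.96 for a two-sided 5% test
    z_beta = z(power)            # 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print(n_per_group(0.5))  # 63; Lehr's rule of thumb, 16 / d**2, gives 64

For an effect of half a standard deviation this gives 63 pupils per arm, in line with the minimum of 64 the Torgersons cite; for smaller effects the required sample grows rapidly, which is why studies of 12 to 50 pupils cannot settle the question.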
Professors Torgerson recognise potential flaws in studies using randomised controlled trials: chance bias, where an unrepresentative sample arises by chance; technical bias, chiefly from a sample that is too small; subversion bias, from someone who does not believe in whatever they have been asked to do; attrition bias, where people lost from a sample may leave the final sample skewed; attribution bias, arising in the allocation of people to one or the other group (in practice very similar to chance bias); reporting bias; dilution bias, that is, seepage of a technique from the group being given it to other pupils; and exclusion bias.[ix]
These problems and more are illustrated in a randomised controlled trial led by Professor Greg Brooks, one of the co-authors of the DfES review, with the participation of Carole Torgerson.[x] The study involved the complete Year 7 of a comprehensive school (155 pupils), and its goal was to bring about an overall improvement in spelling using an unidentified computerised system that based spelling on pupils’ own pronunciation. Pupils were allocated to the additional spelling and control groups at random, and rigorously pre- and post-tested using NFER-Nelson reading and spelling tests. The results showed no improvement in spelling as a result of the exercise, and even a dip in reading scores in the group given the extra spelling, though this was corrected later. So the answer to the question in the study’s title, Is an intervention using computer software effective in literacy learning?, is, apparently, no.
But is it? First, the study used one computerised program only, and gives no clear reason for its selection over dozens of others. Any skilled teacher investigating the use of ICT to improve spelling would select software carefully, considering the needs of the pupils and the nature of their spelling difficulties. No such process took place: there is no evidence of children or teachers being consulted about the choice of the software, or of any analysis of strengths and weaknesses in their spelling beyond the mean test scores. Next, it is not clear that the software was used properly. Children had to use it, on laptops, in groups of six, for one hour a day. However, one group had to have five two-hour sessions, and another had one two-hour session and eight one-hour sessions. Here again, teaching experience should have been taken into account and wasn’t. Would any sensible teacher put a child in front of a computer for two hours a day to improve their spelling? The effects on motivation might be predicted. Did the anonymous program’s manual provide for two hours a day? We should at least have been told if it did. If it did not, and I have never heard of any that did, the error is serious enough to invalidate this group’s scores. Finally, the researchers retested children’s spelling and reading after just two weeks, with a further test after the control group had been given access to the machines. In other words, the pupils were tested three times in half a term, with no follow-up to see whether any changes were permanent, or indeed whether there were any longer-term benefits that were not immediately apparent.
Professors Torgerson’s claims for randomisation’s ability to eliminate sampling bias range from the categoric “randomisation only guarantees comparable groups at pre-test” to (in the same paper!) the qualified “although randomised controlled trials should, in theory, eliminate selection bias, there are instances where bias can and does occur”.[xi] In this study, the second was true. The authors say that the randomised controlled trial is the only research method that can adequately control for all the unmeasured variables that may affect student outcomes, and that randomisation ensures that all potential confounding factors are distributed without bias across the randomised groups, and controls for temporal and regression-to-the-mean effects. Nevertheless, the group given the extra spelling had significantly more girls than boys, and a higher starting point that had to be dealt with by additional statistical analysis. It is not clear just how large a sample would be needed to meet the claim that randomisation could control for all variables, but it is clearly larger than the sample needed for statistical validity. How “regression to the mean” is meant to operate over a six-week period is not explained, any more than the phenomenon itself can be shown to apply to reading and spelling, where there is evidence of increasing divergence as children move into secondary school.[xii] Finally, the agglomeration of the scores for all pupils in each group into a single figure does not allow us to say whether any group of pupils (higher, average or lower-attaining, or those with special educational needs) gained any more or less benefit from the extra spelling than any other. The effect of randomisation here is not to improve the quality of evidence generated by the study, but to tell us less, and to make the evidence generated less precise. Overall, there is not a scrap of evidence that randomisation added to the value of this study or compensated for its elementary errors. Its only identifiable effect has been to make a bad study worse.
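The gap between “distributed without bias” on average and “balanced” in any one trial is easy to demonstrate. The sketch below, in Python, repeatedly splits a year group of 155 pupils into two arms at random and records the girl/boy imbalance in the intervention arm; the figure of 78 girls is a hypothetical illustration, not a figure taken from the Brooks study.

import random

def imbalance_in_intervention_arm(n_pupils=155, n_girls=78):
    """Randomly split a year group into two arms and return the
    girl/boy imbalance in the intervention arm. The 78 girls are
    a hypothetical illustration, not data from the Brooks study."""
    pupils = ["girl"] * n_girls + ["boy"] * (n_pupils - n_girls)
    random.shuffle(pupils)
    arm = pupils[: n_pupils // 2]  # intervention group of 77
    girls = arm.count("girl")
    return abs(girls - (len(arm) - girls))

random.seed(1)
results = [imbalance_in_intervention_arm() for _ in range(10_000)]
# Fraction of simulated trials in which one sex outnumbers the
# other by ten or more pupils in the intervention arm:
print(sum(r >= 10 for r in results) / len(results))

Averaged over thousands of repetitions the arms are comparable, but in a noticeable minority of individual trials one sex outnumbers the other by ten or more, and the single trial a researcher actually runs may be one of them. That is exactly the kind of imbalance the Brooks study encountered.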
The Brooks study shows that sampling is only one element of research design, that randomisation is one of a range of options, and that it is not necessarily the best. There are examples of successful educational randomised controlled trials: one showed the effectiveness of extra funding for schools in poor areas of New York, and another the negative response of pupils to an anti-smoking campaign[xiii]. Full consideration of what made these studies successful is beyond the scope of this paper, though it seems that they were concerned with a single issue that was relatively easy to measure, and used large samples. Professors Torgerson estimate that a study designed to show the likelihood of raising by one the number in a class achieving five or more higher-grade GCSEs might need a sample of 8,000 to 10,000.[xiv] The key point is whether a sample will give us the information we are looking for, and the success of some randomised trials does not constitute grounds for rejecting well-constructed studies that use different sampling techniques better suited to their purposes.
The final, crucial weakness in Professors Torgerson’s argument for randomisation as an essential factor in educational research is their application of statistical techniques without sufficient consideration of the context of education. They cite with approval a 1997 longitudinal study showing that socio-economic status is a consistent predictor of educational success[xv]. This is a generally accepted view, but the issue is complex and has consequences for researchers. To make an impact on the problem of low achievement by poorer socio-economic groups, we need to pinpoint why they are achieving less, and we cannot do this simply by agglomerating their results with those of other groups and randomising. This is obvious, but Brooks et al did not do it in the study cited above. Could this be another uncontrolled variable? We cannot tell, and yet randomisation is supposed to distribute such variables across the sample. At one point, Professors Torgerson suggest “stratified randomisation” as a possible solution to the problem[xvi], though they appear to retreat from this in their 2008 book. The key point is that educational performance is not the result of chance, but the result of a complex interaction between children’s experiences, their personal characteristics, including the way their brain is structured (e.g. sensitivity to light, dyslexia[xvii]), and their intellect. To investigate these issues we need precision, not randomisation.
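For readers unfamiliar with the term, stratified randomisation simply means randomising within groups rather than across them. The sketch below shows the idea in Python; the socio-economic bands and the function are my own illustration, assumed for the example, and not a description of the Torgersons' procedure.

import random
from collections import defaultdict

def stratified_allocation(pupils, stratum_of):
    """Randomise pupils to intervention and control separately within
    each stratum (here a hypothetical socio-economic band), so that
    every band is split roughly evenly between the two arms."""
    strata = defaultdict(list)
    for pupil in pupils:
        strata[stratum_of(pupil)].append(pupil)
    intervention, control = [], []
    for members in strata.values():
        random.shuffle(members)
        half = len(members) // 2
        intervention.extend(members[:half])
        control.extend(members[half:])
    return intervention, control

# Illustrative use: pupils tagged with an assumed socio-economic band.
pupils = [("Ann", "low"), ("Ben", "low"), ("Cal", "mid"),
          ("Dee", "mid"), ("Eve", "high"), ("Fay", "high")]
treatment, control = stratified_allocation(pupils, stratum_of=lambda p: p[1])

This guarantees that no arm can be dominated by one band by chance, which is precisely the imbalance simple randomisation allows; but it presupposes that the researcher has already identified the variables that matter, which is the point about precision made above.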
The issue’s political dimensions are summarised in this quotation from the former Secretary of State, David Blunkett:
We welcome studies which combine large-scale, quantitative information, on effect sizes that will enable us to generalise.
Generalisation leads to policy: in this case, the commitment of the Labour and Conservative parties to the use of phonics as the main vehicle for teaching reading and spelling in infant schools, a commitment reinforced by the statistically significant, long-term gains in the Clackmannanshire study quoted above. The imposition of a new criterion for validating research was very useful to opponents of this approach, as it appeared to knock out the research that showed the strongest evidence in favour of phonics. As far as I can find, none of these opponents had organised randomised controlled trials themselves, but this did not limit their enthusiasm. Professors Torgerson and Brooks are not responsible for triggering this response, but no discussion of randomised controlled trials in the present context can ignore it. Perhaps the most important point of all, however, lies in the application of generalisations derived from statistical science to educational research. Statistical generalisations are derived from observed tendencies, and are not laws of nature. They need to be tested against established knowledge in any field to which they are applied.
John Bald, independent consultant.
Author, Using Phonics to Teach Reading and Spelling (Sage, 2007).
[i] Torgerson, DJ and Torgerson, CJ (2008) Designing Randomised Trials in Health, Education and the Social Sciences: An Introduction. Palgrave Macmillan, p. viii
[ii] Torgerson, DJ and Torgerson, CJ (2003) Avoiding bias in randomised controlled trials in educational research. British Journal of Educational Studies 51: 36-45
[iii] Torgerson, CJ and Torgerson, DJ (2003) The design and conduct of randomised controlled trials in education: Lessons from health care. Oxford Review of Education 29: 67-80
[iv] Torgerson, CJ and Torgerson, DJ (2001) The need for randomised controlled trials in educational research. British Journal of Educational Studies 49, 3: 316-328
[v] Torgerson, CJ and Torgerson, DJ (2003) The design and conduct of randomised controlled trials in education: Lessons from health care. Oxford Review of Education 29: 67-80 (p. 68)
[vi] Johnston, R and Watson, J (2005) The Effects of Synthetic Phonics Teaching on Reading and Spelling Attainment: A Seven Year Longitudinal Study. Scottish Executive Central Research Unit
[vii] Torgerson, C, Brooks, G and Hall, J (2006) A Systematic Review of the Research Literature on the Use of Phonics in the Teaching of Reading and Spelling. DfES Research Report RR711
[viii] Torgerson and Torgerson (2008), p. 130
[ix] Torgerson, DJ and Torgerson, CJ (2003) Avoiding bias in randomised controlled trials in educational research. British Journal of Educational Studies 51: 36-45
[x] Brooks, G et al (2006) Is an intervention using computer software effective in literacy learning? A randomised controlled trial. Educational Studies 32: 133-143
[xi] Torgerson, DJ and Torgerson, CJ (2003) Avoiding bias in randomised controlled trials in educational research. British Journal of Educational Studies 51: 36-45
[xii] E.g. Chall, J et al (1990) The Reading Crisis: Why Poor Children Fall Behind. Harvard University Press
[xiii] Crain and York (1976), cited in Torgerson and Torgerson (2001)
[xiv] Torgerson, CJ and Torgerson, DJ (2003) The design and conduct of randomised controlled trials in education: Lessons from health care. Oxford Review of Education 29: 67-80
[xv] Robinson, P (1997) Literacy, Numeracy and Economic Performance. LSE; cited in Torgerson, CJ and Torgerson, DJ (2003) The design and conduct of randomised controlled trials in education: Lessons from health care. Oxford Review of Education 29: 67-80
[xvi] Torgerson, DJ and Torgerson, CJ (2003) Avoiding bias in randomised controlled trials in educational research. British Journal of Educational Studies 51: 36-45
[xvii] See, for example, Reading Through Colour (Wilkins, A, 2003), The Learning Brain (Blakemore, S and Frith, U, 2005) and Pink Brain, Blue Brain (Eliot, L, 2010)