A case study of a graphical misrepresentation: Drawing the wrong conclusions about MMR vaccine
Anthony R Cox, Harold Kirkham
Graphs have been used in attempts to show a relationship between the MMR vaccine and autism. We examine the topic of graphical representation of data in general, and one of these graphs in particular, the one in a 1999 letter to The Lancet. That graph combined data from England and from California. The author alleged that this graph illustrated a rise in autism rates linked to the use of MMR vaccine. By examining the presentation closely, we are able to show how this graph misrepresented the data used. We give advice for both authors and publishers in the use of such graphical treatments of data.
Graphs are often used in scientific and technical papers. They can make things easier to understand. They can clarify relationships. They can allow straightforward extrapolation. They can present a lot of data concisely. However, they can also confuse and deceive, so they should always be constructed carefully by the author and viewed carefully by the reader. The controversy over a link between MMR vaccine and autism provides an example of why.
The 1999 Lancet graph
The publication in 1998 of a paper by Andrew Wakefield and co-authors, and a subsequent controversial press conference at which Wakefield called for suspension of the triple MMR vaccine, led to a crisis of confidence in MMR vaccine that has had a detrimental effect on vaccination rates. However, in 2004, the 1998 paper was retracted by ten of the authors, and the editor of The Lancet stated that publication of the paper would have been handled differently had the full context in which the research was done been known. Public confidence in MMR vaccine has subsequently risen, with increased uptake of MMR vaccine in 2005-2006.
The 1998 paper was not the only contribution to the MMR vaccine debate to appear in The Lancet. In September 1999, one of the authors of the 1998 paper published a letter in The Lancet containing a graph (Figure 1) that combined data from the Department of Developmental Services in California with data from England obtained from a paper by Taylor et al. The graph was used to allege that both sets of data illustrated a rise in autism rates coinciding with the introduction of MMR vaccine in each country.
Figure 1: The original graph published by The Lancet
The original caption accompanying the graph was:
Temporal trends for autism in the USA (California*) and the UK (north-west London). In 1998 the expected numbers of newly diagnosed autistic children in California should have been 105–263 cases, according to DSM-IV; the actual figure was 1685 new cases. The temporal trend in north-west London is almost identical, although the rise is delayed by about 10 years. The two countries use the same diagnostic criteria. The sequential trends are consistent with the timing of introduction of MMR to both regions.
*Data from Department of Developmental Services, Sacramento, 1987–98 (www.dds.ca.gov)
It is not uncommon to use a time-series graph such as this to show correlation between two variables. Probably the example that springs most readily to mind is the similarity between planetary temperature variations and carbon dioxide levels shown by studies of ancient ice. But it is often not good practice to use a time-series. Sometimes, the supposed association can be demonstrated only weakly. In any case correlation does not imply causation.
In the case of the autism data of Figure 1, the two data sets do not have a cause-and-effect relationship. They are presented by Wakefield as if both curves represent an effect (autism) with a common cause (MMR vaccine). Were that truly the case, the graphical approach would be both valid and useful. However, as we shall see, it is misleading to show the two curves in this way.
The California data were obtained from a 1999 graph produced by the Department of Developmental Services. The authors of this diagram were at pains to point out in their report that their graph did not show how many people entered their system in a given year, but instead the number in the system born in any given year. The graph was a distribution of birth dates. They argued that the quality and type of information examined in their report were not suitable for measuring incidence in the population of persons with autism. (Incidence in this context is a term of art meaning the number of people per unit of population diagnosed with a condition in a specified time such as a month or a year.) Nor does the California report present prevalence data. (Prevalence is the number of people diagnosed with the condition per unit of population regardless of the date of diagnosis.) That is why the vertical axis of the original California graph was labelled “Number of Enrolled Persons with Autism” as shown in Figure 2. The authors of the California report were making the case to the legislature for increased departmental funding; theirs was not an academic study.
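The distinction between a birth-year distribution, incidence, and prevalence can be made concrete with a small sketch. The records below are invented purely for illustration, the population figure is arbitrary, and the function names are hypothetical. The prevalence function also makes the simplifying assumption that nobody leaves the population after diagnosis.

```python
from collections import Counter

# Hypothetical records, invented for illustration: (birth_year, diagnosis_year)
records = [
    (1985, 1989), (1985, 1995), (1986, 1990),
    (1990, 1994), (1990, 1995), (1991, 1995),
]

def birth_year_distribution(records):
    """Count enrolled people by year of birth (the kind of count the California chart showed)."""
    return Counter(birth for birth, _ in records)

def incidence(records, year, population):
    """New diagnoses made in `year`, per unit of population."""
    new_cases = sum(1 for _, diagnosed in records if diagnosed == year)
    return new_cases / population

def prevalence(records, year, population):
    """Everyone diagnosed in or before `year`, per unit of population.

    Simplifying assumption: nobody leaves the population after diagnosis.
    """
    diagnosed_so_far = sum(1 for _, diagnosed in records if diagnosed <= year)
    return diagnosed_so_far / population

by_birth = birth_year_distribution(records)
inc_1995 = incidence(records, 1995, population=1000)
prev_1995 = prevalence(records, 1995, population=1000)
```

In this toy data set, three of the six people were diagnosed in 1995, yet none of them was born in 1995. A count by birth year therefore says nothing about when the diagnoses were made.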
Figure 2: The Californian chart
Instead of using a time-series graph, the authors of the California report should have used a bar chart. In fact, they should have used a specific kind of bar chart: a histogram. The use of a histogram, instead of a line graph, is required by the interaction between the bin width and the count. Were the bins to be made narrower (less than one year), the count in each bin would decrease. This is a characteristic of the histogram. The distribution of birth dates of people enrolled in a growing programme is not a time-series trend. Indeed, the data are neither continuous nor differentiable: the data do not represent a function of time. Given that the data show the count by years, the usual presentation is the population pyramid, a histogram with the bars horizontal.
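The dependence of the count on the bin width can be sketched as follows. The birth data are hypothetical (one birth per month for four years), and `bin_counts` is an illustrative helper, not something taken from either report.

```python
def bin_counts(values, start, stop, width):
    """Count integer values in [start, stop) using equal bins of the given integer width."""
    n_bins = (stop - start) // width
    counts = [0] * n_bins
    for v in values:
        if start <= v < start + n_bins * width:
            counts[(v - start) // width] += 1
    return counts

# Hypothetical data: one birth in each month, month 0 to month 47
birth_months = list(range(48))

yearly = bin_counts(birth_months, 0, 48, 12)       # bins one year wide
half_yearly = bin_counts(birth_months, 0, 48, 6)   # bins half a year wide

# Dividing each count by its bin width gives births per month,
# which does not change when the bins are made narrower.
yearly_rate = [c / 12 for c in yearly]
half_yearly_rate = [c / 6 for c in half_yearly]
```

Halving the bin width halves every count (12 becomes 6), while the count per unit of bin width stays the same. A point on a true time series, such as a temperature reading, would not change with the sampling interval in this way.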
Re-drawing as a population pyramid the data from the original Department of Developmental Services graphic presents the Californian data in a different light. It is shown in Figure 3 along with the English data, treated similarly. We discuss the English data later.
Figure 3: The combined data presented as a population pyramid
In this figure, the meaning of the bars is unambiguous. The first horizontal bar for California can only be interpreted as the number of people “in the system” who were 38 years old when the data were taken, and so on. The caption for the pyramid could have indicated that the data were analyzed in 1998 for patient records up to 1992. (It should also be pointed out that, in the period shown, the population of California grew by nearly a factor of 2. This fact alone must lead, ceteris paribus, to there being a higher number of young patients.)
This population pyramid presentation is valid, and a line graph is not. However, this was not the graph published in California. Nor was it used by Taylor et al. In both cases, a line graph was used instead.
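As a rough sketch of how birth-year counts map onto the bars of a pyramid, the snippet below turns each birth year into an age at the time of the snapshot and draws a text bar. The counts are invented for illustration, `pyramid_rows` is a hypothetical helper, and the age is approximate because it ignores the day of birth within the year.

```python
def pyramid_rows(counts_by_birth_year, snapshot_year):
    """Return one text row per birth year, oldest first, labelled by age at the snapshot."""
    rows = []
    for birth_year in sorted(counts_by_birth_year):
        count = counts_by_birth_year[birth_year]
        age = snapshot_year - birth_year   # approximate age when the data were taken
        bar = "#" * count                  # one mark per enrolled person
        rows.append(f"age {age:2d} (born {birth_year}) | {bar} {count}")
    return rows

# Hypothetical counts of enrolled people by year of birth
counts = {1960: 3, 1961: 4, 1962: 6}

rows = pyramid_rows(counts, snapshot_year=1998)
for row in rows:
    print(row)
```

Labelling each bar by age makes it plain that the figure is a snapshot of who was enrolled at one moment, not a record of events unfolding over time.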
Examination of the 1999 Lancet graph
Taking the California graph (Figure 2) as the starting point, we can see that when the English data were added to create the 1999 Lancet graph (Figure 1), a number of changes were introduced:
- A new scale was added on the right, to be used with the added English data, labelled “number of new cases per year”.
- The word “enrolled” was deleted from the label on the left, so it reads “Number of persons with autism”.
- The old figure caption, with its words about “Distribution of Birth Dates”, is gone.
- New words appear at the top of the graph, explaining that the arrows added to each curve indicate the “First birth cohorts [that were] eligible for MMR . . .”
The data in each graph are a snapshot of the birth years of people in the system. We have no information about when they were diagnosed.
The California report included all diagnosed cases of autism, without exclusion criteria. In contrast, Taylor et al. obtained their data by selection from medical records. They selected patients who were in the records of eight North Thames health districts who were born since 1979 and before 1992, and who were aged 5 or less at the time of diagnosis. The use of these criteria makes the data not directly comparable with the California data.
We may note some problems with the changes made to the graphs in order to merge the data sets:
- The new title for the graph, in boldface, is “Temporal trends for autism in the USA (California) and the UK (north-west London).” However, this is in defiance of the injunction given by authors of the original California graph not to use the information in this way.
- The new scale on the right starts at zero, while the suppressed zero of the original left scale has been retained. It is not valid in terms of the graphics to present one curve with a zero and the other with a zero suppressed. When the two things graphed are the same, there is at least an expectation by the reader that the offset is zero and the scale factor (at least when the data are normalized) is the same. It is not appropriate to increase the scale factor and change the offset of one of the graphs. In fact, the California numbers have a smaller dynamic range than the English results.
- The deletion of the word “enrolled” in the ordinate label is significant: it considerably broadens the meaning from the scope of the original, giving the impression that it was fair to compare the California data of enrolled children with the data from England, with its stricter inclusion criteria, supposedly representing “Number of new cases per year.”
- The 1977 arrow purporting to show the start of the MMR program in California is misleading in its precision. Combined MMR vaccine was licensed for use in the United States in 1971, so the first eligible birth cohorts would have been those born a year or two before that. Throughout the 1970s, MMR vaccine replaced use of the individual measles, mumps, and rubella vaccines.
- Locating the appropriate age for the arrows in the population pyramid makes a very different case. For several years after the introduction of the MMR vaccine in California, the number of people who (at some time) entered the DDS system remained more or less constant. This contrasts with the English data, where the numbers appear to be increasing even before MMR was introduced.
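The effect of a suppressed zero, noted in the list above, can be quantified with a short sketch. The series and axis limits are hypothetical and are not the values in the Lancet graph. `apparent_rise` is an illustrative measure: the fraction of the axis height that the data span.

```python
def apparent_rise(values, axis_min, axis_max):
    """Fraction of the plotted axis height spanned by the data."""
    return (max(values) - min(values)) / (axis_max - axis_min)

# Hypothetical counts, invented for illustration
series = [200, 220, 250, 300]

with_zero = apparent_rise(series, axis_min=0, axis_max=300)         # axis starts at zero
suppressed = apparent_rise(series, axis_min=200, axis_max=300)      # zero suppressed
```

The same 50 per cent increase fills one third of the plot when the axis starts at zero and the whole plot when the axis starts at 200. Two curves drawn on the same graph, one with a zero and one without, therefore cannot be compared by eye.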
The number of changes associated with adding the English data is unusually large, and many of them are important in creating an impression in the mind of the reader. Of course, unless the reader has taken the trouble to examine the original graphs, none of the differences listed above will be obvious. As readers, we take it for granted that a citation is valid, that the author of the citation actually said what he is alleged to have said. It is a matter of trust between reader and author.
It is worth noting that a later publication by the California Department of Health Services shows no correlation between early childhood MMR immunization rates in California and the numbers of children with autism enrolled in California’s regional service center system.
We believe there are lessons to be learned by both authors and editorial staff from this case study.
There are some general points for those wishing to use graphics:
- Choose carefully the kind of graph you use to show your data. Even if you select an apparently conventional kind of graph, be sure your selection is appropriate. Had the California DDS authors and Taylor et al. selected the histogram or the population pyramid, it would have been much harder to misinterpret their birth year distribution as a time series. Wakefield’s choice of graph is understandable in view of the graphical choices made by the authors of the California report and by Taylor et al.
- Quote sources accurately. That applies both to verbal and graphical statements. In quoting verbal material, it is customary to show words left out (ellipsis) by printing dots or a long dash, and to show additions in square brackets. Changes to graphical material are not exempt from having such changes indicated.
- When referring to a graph, authors are under an ethical obligation to have read and understood the paper it was extracted from, and any surrounding explanatory text. In this case, the authors of the original California report clearly and repeatedly stated that the graph they had created did not show the incidence of autism in a given year or indicate any temporal trend. This word incidence is chosen correctly: to show incidence, the numbers would have to show how many new cases occurred in a given time per unit of population. The authors of the California report were not concerned with incidence; they were concerned with total numbers: theirs was a report to the legislature (for funding), not a science paper. (One might note in passing that Taylor’s study does not show incidence, either.)
There is also a lesson for medical journals. The alleged link between MMR vaccine and autism has been refuted by both epidemiological studies [11, 12] and virological studies [13, 14]. In addition, the World Health Organisation’s Global Advisory Committee on Vaccine Safety has dismissed any link between autism and MMR vaccine.
However, the debate about MMR vaccine continues outside of the scientific community, particularly in tabloid newspapers in the United Kingdom. Although a lack of trust in the scientific consensus runs through these concerns, paradoxically the high reputation of journals is invoked in “appeals to authority” by anti-vaccination campaigners. So, the respectability of The Lancet is invoked as a defence of the initial 1998 paper, and the 1999 graph can be invoked as further published evidence, in a peer-reviewed journal, for a link between MMR and autism. It is therefore important that the publication of such figures, even if they are correspondence items, perhaps not normally subject to formal peer review, should be done with great care, especially in crucial areas of public health.
Statistical review policies of biomedical journals are not consistent, and it has been argued that improvements could be made in biomedical publishing; we would add the recommendation that specific review of visual presentations of data should also be made. Although it would have necessitated some investigation of the original sources, a review of the 1999 graph may well have influenced the decision about the graphic’s suitability for publication, and prevented propagation of its erroneous message. Whether such specialist review is easily available is open to discussion, but given the controversial situation that existed when the Lancet letter was published, further scrutiny would have been justified.
[1] Wakefield AJ, Murch SH, Anthony A, Linnell J, Casson DM, Malik M, et al. Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. Lancet 1998;351:637-41.
[2] Fitzpatrick M. MMR and Autism: What parents need to know. 1st ed. London: Routledge, 2004.
[3] Asaria P, MacMahon E. Measles in the United Kingdom: can we eradicate it by 2010? BMJ 2006;333:890-895.
[4] Murch SH, Anthony A, Casson DH, Malik M, Berelowitz M, Dhillon AP, et al. Retraction of an interpretation. Lancet 2004;363:750.
[5] Horton R. The lessons of MMR. Lancet 2004;363:747-749.
[6] The Information Centre. Immunisation Statistics England 2005-2006. Available from http://www.ic.nhs.uk/pubs/immstats2005to2006 [accessed on 4th April 2007]
[7] Wakefield AJ. MMR vaccination and autism. Lancet 1999;354:949-50.
[8] Department of Developmental Services. Changes in the Populations of Persons with Autism and Pervasive Developmental Disorders in California’s Developmental Services System: 1987 through 1998. Sacramento 1999. Available from http://www.dds.ca.gov/autism/autism_main.cfm [accessed on 26th March 2007]
[9] Taylor B, Miller E, Farrington CP, Petropoulos M-C, Favot-Mayaud I, Li J, Waight PA. MMR vaccine and autism: no epidemiological evidence for a causal association. Lancet 1999;353:2026-2029.
[10] Dales L, Hammer SJ, Smith NJ. Time trends in Autism and in MMR immunization coverage in California. JAMA 2001;285:1183-1185.
[11] Madsen KM, Hviid A, Vestergaard M, Schendel D, Wohlfahrt J, Thorsen P, et al. A population-based study of measles, mumps, and rubella vaccination and autism. N Engl J Med 2002;347(19):1477-82.
[12] Honda H, Shimizu Y, Rutter M. No effect of MMR withdrawal on the incidence of autism: a total population study. Journal of Child Psychology and Psychiatry 2005;46(6):572-9.
[13] Afzal MA, Ozoemena LC, O’Hare A, Kidger KA, Bentley ML, Minor PD. Absence of detectable measles virus genome sequence in blood of autistic children who have had their MMR vaccination during the routine childhood immunization schedule of UK. Journal of Medical Virology 2006;78(5):623-30.
[14] D’Souza Y, Fombonne E, Ward BJ. No evidence of persisting Measles virus in peripheral blood mononuclear cells from children with autistic spectrum disorder. Pediatrics 2006;118(4):1664-1675.
[15] Global Advisory Committee on Vaccine Safety. MMR and Autism. Available from http://www.who.int/vaccine_safety/topics/mmr/mmr_autism/en/ [accessed on 4th April 2007]
[16] Goodman SN, Altman DG, George SL. Statistical reviewing policies of medical journals, caveat lector? J Gen Intern Med 1998;13:753-6.
We would like to thank Don Eckley, a retired pharmacist, for bringing us together to write this article, and Dr Patrick Waller, Consultant in Pharmacoepidemiology, for his valuable advice relating to publication. We would also like to thank the anonymous reviewers of this article, whose thought-provoking comments improved the paper.
Contributors and sources:
The authors came together specifically to write this article. ARC is the Pharmacovigilance Pharmacist at The West Midlands Centre for Adverse Drug Reactions in Birmingham and Teaching Fellow at The School of Pharmacy, Aston University, Birmingham. HK is a principal engineer at the Jet Propulsion Laboratory, California Institute of Technology. He has an interest in the graphical treatment of data, and is in the process of writing a book on the topic.
ARC is also employed on a part-time basis at the Yellow Card Centre West Midlands, a regional education centre of the Medicines and Healthcare products Regulatory Agency (MHRA). The viewpoints expressed in this commentary are those of the authors and are not necessarily endorsed by the MHRA. HK includes the 1999 Lancet graph discussed in this article as one of several case studies in the proposed book. No funding was received for the preparation of this review.