
The social and phenotypic heterogeneity of autism: identifying clusters in a large population-based sample

Christine Fountain

Department of Sociology and Anthropology

Fordham University

Address Correspondence to: Christine Fountain, Department of Sociology and Anthropology, Fordham University, 113 W 60th St., New York, NY 10023, Phone: 646-293-3959, email:


Acknowledgements: I thank Peter Bearman, Ka-Yuet Liu, Marissa King, Keely Cheslack-Postava, Soumya Mazumdar, and Alix Winter for their contributions to this work. This research was supported by the NIH Director's Pioneer Award program, part of the NIH Roadmap for Medical Research, through grant number 1 DP1 OD003635-01 and the National Institutes of Mental Health award number R21MH096122. Partial computing support for this research came from a Eunice Kennedy Shriver National Institute of Child Health and Human Development research infrastructure grant, R24 HD042828, to the Center for Studies in Demography & Ecology at the University of Washington.

Working Paper Draft: Do not cite or distribute


Autism is a “spectrum” disorder characterized by myriad combinations of behavioral symptoms and severity, and by a heterogeneous set of risk factors. Although the “heterogeneity” of autism is a truism, most research fails to account for it methodologically. In this paper, I identify and describe five typical autism subgroups in a population-level dataset consisting of the birth records of all children with autism born in California in 1992-2005. Using cluster analysis, I find groups of children with similar attributes on socioeconomic, biological, and autism symptom variables. These clusters represent consistent and coherent groups and reveal important associations between sets of characteristics. The clusters also show clear and meaningful temporal patterns: particular autism subtypes have risen and fallen in relative size as the diagnostic context has changed. Cluster composition also varies across administrative boundaries relevant to the diagnosis of autism and the provision of services. Cluster analysis thus reveals not only the ways that social and biological factors combine to jointly create this heterogeneous disorder, but also how diagnostic patterns vary over space and time.


The social and phenotypic heterogeneity of autism: identifying clusters in a large population-based sample


Autism is a neurodevelopmental disorder characterized by deficits in communication and social interaction, as well as repetitive and stereotyped behaviors, and is typically diagnosed in early childhood. Autism is considered a “spectrum” disorder, including persons with varying constellations of symptoms of varying severity. Even within the core autism diagnosis, there is substantial heterogeneity of autistic symptoms and associated features. A diagnosis based on the Diagnostic and Statistical Manual (DSM) requires a minimum number of symptoms from a list of twelve (at least six under DSM-IV), distributed among social, communication, and repetitive behavior dimensions, as well as delayed or abnormal functioning in at least one of these dimensions. Aside from variations in the severity of these symptoms, this leaves a great deal of room for variability in the portfolio of symptoms and behaviors captured by an autism diagnosis. Comorbidities are also common; persons with autism may also carry diagnoses of intellectual disability, ADHD, epilepsy, anxiety disorders, or other conditions. In addition, many persons with autism experience symptoms and traits beyond the core features. These include gastrointestinal disorders, food preferences and sensitivities, sensory hypersensitivities or abnormalities, motor coordination abnormalities, and sleep problems, as well as unusual skills.

Thus, the variety of core and secondary symptoms that may be expressed by persons with autism is enormous. The multiplicity of autism phenotypes, not to mention autism etiologies, points to the heterogeneity of this disorder. Even more complex is how this heterogeneity unfolds over time, as children’s symptoms change through childhood and adolescence (Fountain, Winter, and Bearman 2012).

The process by which autism is identified and diagnosed has also changed over the past three decades. Public awareness of autism has increased, and the stigma associated with autism, arising from the mistaken belief that autism was a psychogenic disorder, has decreased. Against this background, autism prevalence has risen dramatically, if unevenly.

Of particular sociological interest is the possibility that social factors intervene in the diagnostic process. Children are typically diagnosed at preschool age, and diagnoses are based solely on behavioral symptoms. Further, there is some ambiguity in distinguishing autism from other communication disorders and developmental delays, particularly at the high- and low-functioning ends of the autism spectrum and at the youngest ages. This ambiguity creates room for parents’ understanding and resources to influence which children receive which diagnoses. This potential has been amplified by changes in the diagnostic criteria and by increasing awareness of autism among parents, teachers, and caregivers.

In this paper I explore the social and phenotypic heterogeneity in autism using a unique population-level data set on the annual evaluations of all children with autism born in California in 1992-2005 and enrolled with the Department of Developmental Services. By linking birth certificate records to autism caseload records, I am able to connect parental socioeconomic background, neighborhood characteristics, and information on the autism diagnosis and symptoms. I use clustering methods to describe the subgroups of children with autism found in the data. These coherent groups reveal how social and behavioral factors combine in complex and shifting ways to produce the oft-observed heterogeneity of autism.


I first describe what is known about the heterogeneity of autism and its association with key social and biological risk factors. Next, I apply a k-medoids clustering algorithm to identify subgroups within this population of children with autism. After describing these groups, I examine their correlates: when, and in what contexts, are these subtypes likely to be found? Finally, I explore the implications of these findings for autism research in general.


Autism prevalence and heterogeneity

In recent years, autism has become more visible as both incidence and prevalence have increased. Autism was once considered a very rare disorder, with a rate of about 4 per 10,000 until the 1970s (Fombonne 2005); recent prevalence estimates have reached 1 in 68 children aged eight (Developmental Disabilities Monitoring Network Surveillance Year 2010 Principal Investigators 2014). Many reasons have been proposed for this increase, and there is much disagreement about their relative importance.

Risk factors for autism

Autism’s heterogeneity is likely associated with different etiologies. Recent studies using high-resolution genomic techniques have found inherited and de novo deletions and duplications at a wide range of locations on the genome in autism cases as compared to controls, suggesting heterogeneous genetic predispositions. Consistent with this idea, family aggregation studies suggest that the three main characteristics of autism have different levels of heritability (Freitag 2006; Georgiades et al. 2007; Ronald et al. 2006).

Autism has also been associated with various factors relating to the prenatal and perinatal environment and to socioeconomic status that may also contribute to varying symptom presentations, including gestation length, birth weight, labor and birth complications, short inter-pregnancy intervals, and multiple births (Larsson et al. 2005; King and Bearman 2011; Hultman, Sparen, and Cnattingius 2002; Croen, Grether, and Selvin 2002). Parental characteristics such as advanced parental age, education, and history of schizophrenia are also associated with increased autism risk (King et al. 2009; Croen et al. 2007; Durkin et al. 2008; Larsson et al. 2005; King and Bearman 2011; Croen, Grether, and Selvin 2002) and may point to mechanisms that lead to heterogeneous symptom presentation.

Socioeconomic Status and Autism

Although many of these risk factors are biological, many others are at least partly social in nature. It has been firmly established that children born to older mothers and fathers are at higher risk of autism (Croen et al. 2007; Durkin et al. 2008; King et al. 2009). Although this is certainly in large part for biological reasons (e.g., the higher risk of de novo mutations and the riskier pregnancies that come with older parental age), there may also be a social component, because children born to older parents may be scrutinized more carefully for these symptoms. For example, one analysis comparing children with autism conceived with assisted reproductive technology (ART), whose parents tended to be significantly older, to those conceived without it found that the ART group was diagnosed earlier and with milder symptoms, but that this difference disappeared when controlling for socioeconomic factors, particularly parental age and education (Schieve et al. 2015). This suggests that the key difference between these groups may be in ascertainment, not phenotype.

The increasing prevalence of autism may therefore be in part a result of the broad sociological phenomenon of increasing parental ages (Liu, Zerubavel, and Bearman 2010). The rising use of ART may also be contributing to this trend by pushing on the upper bound of the fertile age range. Recent evidence suggests that ART conceptions are at higher risk of autism, as well as of other developmental disorders (Fountain et al. 2015; Hvidtjørn et al. 2011; Hvidtjørn et al. 2009). In addition to resulting in older parents and smaller families, delayed fertility may also contribute to shifts in the spacing of children: some research has suggested that short inter-pregnancy intervals, perhaps arising from delayed fertility, may also be associated with increased autism risk (Cheslack-Postava, Liu, and Bearman 2011).

Higher parental education has been associated with increased autism risk (Croen, Grether, and Selvin 2002; Durkin et al. 2010; King and Bearman 2011; Larsson et al. 2005). Many studies have also found racial and ethnic disparities in autism rates (ADDMN 2007; Centers for Disease Control and Prevention 2006; Fountain and Bearman 2011; Liptak et al. 2008; Mandell et al. 2009; Shattuck et al. 2009). There is no current evidence for a genetic explanation of these differences. Rather, most researchers consider race and ethnicity to be a proxy for other socioeconomic variables, including wealth and income, education, and culture (Burchard et al. 2003; Link et al. 1998). In some cases cultural or language differences may contribute to these disparities; in addition, teachers and caregivers can interpret and make decisions about symptoms based partly on race (Mandell et al. 2009; Palmer et al. 2009).

In fact, unlike almost every other known disease or disorder, autism shows a reversed socioeconomic gradient, such that the socioeconomically disadvantaged tend to have lower measured risk, likely due to underdiagnosis. One of the most stable social facts is the socioeconomic gradient for health and mortality. Specifically, higher-status people, whether status is based on education, income, occupation, or Nobel Prizes and Academy Awards, have better health and lower risk of death (Marmot 2004; Rablen and Oswald 2008; Redelmeier and Singh 2001). This stylized fact has held across all societies and groups where it has been studied, and across time, and it has persisted in the face of enormous social change and medical advances. The reasons for this are myriad, including differential access to health care and insurance, knowledge and efficacy, health behaviors, cultural and linguistic barriers, and social capital (Pescosolido 1992).

The resistance of this pattern to change in time, technology, and scale has motivated some medical sociologists to argue that socioeconomic status is a “fundamental cause” of health (Link and Phelan 1995). This concept contrasts with the mainstream approach of epidemiology, which focuses on modifiable, proximate risk factors. The fundamental cause theory argues that risk factors such as nutrition, exercise, and exposure to toxic substances directly affect health outcomes, but that they themselves are fundamentally caused by the social conditions in which individuals are embedded. Proponents point to the fact that reducing or eliminating some of the proximate causes of poor health, or even the diseases disproportionately affecting the poor, does little to eliminate the association between SES and mortality. For example, the 19th-century poor often experienced poor nutrition and sanitation, as well as overcrowding. These risk factors made them especially vulnerable to typhoid, smallpox, tuberculosis, diphtheria, and other infectious diseases. However, neither the improvement in social conditions that came with economic development nor the effective eradication of these diseases due to widespread vaccination and medical treatment has reduced the socioeconomic gradient for mortality (Phelan, Link, and Tehranifar 2010). In developed countries, infectious diseases have been replaced by cancers and chronic conditions like heart disease and diabetes. Thus, they argue, the focus on modifiable intervening risk factors will not reduce the mortality gap (although it may reduce mortality, as did the war against infectious diseases) if there is no change in the underlying social conditions.

Because resources can be used in an adaptable and flexible way as conditions change, there is no one mechanism linking SES to health, and interventions that focus on proximate causes are likely to be ineffective as new mechanisms replace the old ones. Perversely, new information and technologies can actually exacerbate the SES gradient if high-status people are better able to harness these advances to improve their health. One example is smoking behavior. Prior to 1954, when scientists began to definitively establish that smoking caused cancer, there was little socioeconomic gradient in smoking or in the knowledge that smoking was unhealthy. After this time, however, a strong socioeconomic gradient in both knowledge and behavior opened up, as more educated people were more likely to believe that smoking causes cancer and to change their behavior accordingly (Link and Phelan 2009).

In the case of autism, this type of mechanism may account for the peculiarly reversed socioeconomic gradient, much as it has done previously for certain types of cancers (Link et al. 1998). If the risk factors for a disease are not easily modifiable or understood, as is true of breast cancer and autism, then people are unable to effectively use the resources that come with high socioeconomic status to avoid the disease. However, SES may translate into differential screening and identification of the disease, as it has with mammography. Similarly, although no one knows how to reduce the risk of having a child with autism, parental resources do make a difference when it comes to obtaining a diagnosis (Durkin et al. 2010; Fountain, King, and Bearman 2010; King and Bearman 2011; Mandell et al. 2009; Russell, Steer, and Golding 2010). Thus, the reversed SES gradient pattern for autism is consistent with the theory of SES as a fundamental cause of health.

Spatial Distribution and Social Influence

Research on social influence and the social diffusion of health behaviors and outcomes has a long history in sociology and has experienced a recent resurgence (Cacioppo, Fowler, and Christakis 2009; Christakis and Fowler 2007, 2008; Fowler and Christakis 2008; Liu, King, and Bearman 2010). This vein of research emphasizes how health is shaped by the social context in which one is embedded, particularly the web of social relationships. Some of the latest work has found that smoking, obesity, and happiness, as well as loneliness, spread through social networks (Christakis and Fowler 2007, 2008; Fowler and Christakis 2008), although the methodology of this work has also been criticized (Cohen-Cole and Fletcher 2008; Lyons 2010; Shalizi and Thomas 2011).

Although autism is not, strictly speaking, a contagious disease¹, knowledge and information about the existence and symptoms of this once-rare disorder may spread through social ties. Social influence, the ways in which one is affected by the behaviors, values, desires, and persuasion of one’s social connections, may also play an important role in the spread of autism, from the decline in stigmatization of the disorder to the receipt of advice and counsel from kin, friends, neighbors, and teachers. In a recent paper, Liu, King, and Bearman (2010) find that living close to a child with autism increases the chance, all else equal, that a child will be diagnosed in the next year. They attribute this relationship to the local sharing of information about symptoms, as well as about finding doctors and obtaining diagnoses and services.

Aside from social influence, the characteristics of local neighborhoods can have important consequences for autism diagnoses. The density and visibility of autism in an area, the availability of professionals qualified to diagnose and treat autism, the resources and experience located in the local school system, and the socioeconomic composition of a neighborhood, among other factors, can affect the rate of diagnosis as well as the timing of those diagnoses (Fountain et al. 2010; King and Bearman 2011). Autism cases are not spread evenly over space (Mazumdar et al. 2010, 2013), and the social influence processes identified by Liu et al. (2010) can amplify disparities in local autism resources, contributing to variability in local diagnostic regimes (Liu and Bearman 2012).

¹ There is some evidence that a subset of autism cases may be caused by prenatal exposure to viruses, including congenital rubella or, less compellingly, influenza (Chess 1971, 1977; Shi et al.

Prior Research on Autism Clusters

A small body of past research has used clustering methods to identify subtypes of autism (Eaves, Ho, and Eaves 1994; Prior et al. 1998; Stevens et al. 2000). However, in addition to covering only short time periods, these studies have been based on small and non-representative samples (Eaves et al. 1994; Stevens et al. 2000) and have focused exclusively on behavioral symptoms without including socioeconomic or familial variables (Prior et al. 1998). This research has identified coherent groups within its samples, although these symptom groupings do not necessarily map onto standard autism diagnostic categories. In previously published work, I used a related strategy, group-based trajectory modeling, to identify subgroups with similar longitudinal symptom trajectories (Fountain et al. 2012). In this work I found six distinct trajectories, one of which was characterized by a surprisingly large amount of improvement. Although the groups were identified based solely on the patterns of change in symptoms, these trajectories were highly correlated with socioeconomic factors, such that more advantaged children were higher functioning and more likely to display this pattern of marked improvement.


In summary, different contexts can lead to different autism risk factors as well as different kinds of children being diagnosed. If we think of a risk factor as something that shifts a child’s chance of being diagnosed with autism by a fixed amount, then we will have an incomplete understanding of the phenomenon. Risk is specific to particular social contexts in ways that matter greatly to our measured estimates of the prevalence of conditions like autism (Fountain and Bearman 2011; King and Bearman 2011). We need to understand the diversity of autism not just in its symptom expression but also in its social demography. This must include attention to how the salience of risk factors changes over space and time. However, it also requires applying a more diverse set of methods to this problem in order to appreciate the variability in children diagnosed with autism.


The data for this paper consist of birth and administrative records for all children born in California from 1992 through 2007 who were diagnosed with autism and enrolled with the California Department of Developmental Services (DDS). The DDS provides diagnoses and services to persons with developmental disabilities, including autism, through its system of 21 Regional Centers. Although enrollment is voluntary, the strong financial incentive to obtain services through the DDS means that the vast majority of persons with autism in California are enrolled, making the DDS the largest administrative source of data on autism diagnoses (Croen, Grether, Hoogstrate, et al. 2002). Services and support are provided to persons with Autistic Disorder (DSM-IV code 299.0), but not to those with other autism spectrum or pervasive developmental disorders unless they have another qualifying condition or substantial disability.

DDS caseload records were matched to birth certificate records, resulting in 42,362 children born 1992-2007 and diagnosed with autism before 2011. Linkage was conducted using deterministic and probabilistic matching in Link Plus (Division of Cancer Prevention and Control 2007); uncertain matches were reviewed manually. Ninety-one percent of DDS files for children ever diagnosed with autism were successfully linked to birth records; typically, those not linked were born outside of California and moved in later.
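The internal scoring used by Link Plus is not described here. As a generic illustration of how probabilistic record linkage typically scores a candidate pair, the Python sketch below implements Fellegi-Sunter-style match weights: each compared field adds log2(m/u) when it agrees and log2((1 - m)/(1 - u)) when it disagrees, where m and u are the probabilities of agreement among true matches and among non-matches. The field names, m/u values, and thresholds are hypothetical; the middle "review" band corresponds in spirit to the manual review of uncertain matches.

```python
import math

def field_weight(agrees, m, u):
    """Log2 likelihood-ratio weight for one compared field.

    m: P(field agrees | records truly match)
    u: P(field agrees | records do not match)
    """
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))

def match_score(agreements, params):
    """Sum field weights; agreements maps field -> bool, params maps field -> (m, u)."""
    return sum(field_weight(agreements[f], m, u) for f, (m, u) in params.items())

def classify(score, upper, lower):
    """Two thresholds give three outcomes; pairs in between go to manual review."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "review"

# Hypothetical fields and parameters, for illustration only.
PARAMS = {
    "child_dob": (0.97, 0.003),
    "child_sex": (0.99, 0.50),
    "mother_last_name": (0.90, 0.01),
}

if __name__ == "__main__":
    pair = {"child_dob": True, "child_sex": True, "mother_last_name": False}
    s = match_score(pair, PARAMS)
    print(round(s, 2), classify(s, upper=10.0, lower=0.0))
```

In practice m and u are usually estimated from the data (for example by EM) rather than fixed by hand, and the thresholds trade off false matches against the volume of manual review.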

Variables extracted from birth records include maternal age at birth and education level, child’s sex, and birth weight (less than 2,500 grams was considered low birth weight). From the DDS records I obtained information on the features of the child’s autism at diagnosis, including the presence of comorbid intellectual disability, age at diagnosis, and symptom severity. Information on symptoms comes from the Client Development Evaluation Report (CDER), which is given to each client at entry into the DDS system and approximately every year while they remain on the caseload. Through the CDER, DDS clients are evaluated for symptom severity and function across a variety of dimensions. The evaluative element of the CDER is designed to help determine appropriate services and needs, not as a diagnostic instrument. However, these items contain useful information on the presence and severity of core autism symptoms. In this analysis, I use items measuring verbal communication and social interaction, two of the three main dimensions of autism symptoms. To create scores for communication and social function, I summed the five communication and three social items collected at entry into the DDS system, weighting each item equally in the index (further information can be obtained from the author). The highest and lowest quintiles on each index are categorized as high and low functioning, respectively. In 2008 the DDS revised the CDER, reducing the items used to create these indices to a single item each. This lack of comparability between those diagnosed before and after 2008 is problematic, but these items are still the main source of information on social and communication function. For children evaluated after the revision, I collapse the five ordered response options for each item to create categories consisting of approximately the 20% highest- and lowest-functioning children on these items. Robustness checks are conducted to confirm that these categories are substantively similar across the years.
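The index construction and quintile split described above can be sketched in Python as follows. The sketch assumes that each CDER item is coded so that higher values indicate better functioning and that items are simply summed with equal weight; the actual CDER coding and the function name are assumptions, not the author's code. Because summed scores take few distinct values, children tied at a cutoff all fall on the same side, so the low and high groups will only approximate 20% each.

```python
import numpy as np

def functioning_categories(item_scores, low_pct=20, high_pct=80):
    """Sum equally weighted items per child and split the index into
    low / medium / high functioning groups at the given percentiles.

    item_scores: 2-D array-like, one row per child, one column per item
    (e.g. five communication items or three social items).
    ASSUMPTION: higher item scores mean better functioning.
    """
    index = np.asarray(item_scores, dtype=float).sum(axis=1)  # equal weights
    low_cut, high_cut = np.percentile(index, [low_pct, high_pct])
    labels = np.full(index.shape, "medium", dtype=object)
    labels[index <= low_cut] = "low"    # bottom quintile (ties at the cutoff included)
    labels[index >= high_cut] = "high"  # top quintile (ties at the cutoff included)
    return index, labels

if __name__ == "__main__":
    # Ten hypothetical children whose five item scores sum to 0, 1, ..., 9.
    items = [[i, 0, 0, 0, 0] for i in range(10)]
    idx, cats = functioning_categories(items)
    print(list(zip(idx.tolist(), cats.tolist())))
```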

I exclude all children born after 2005 in order to ensure complete ascertainment through at least age six for all cases. Listwise deletion was also used to eliminate cases with missing data on key clustering variables, resulting in a final analysis sample of 36,180 children. Summary statistics on this sample are presented in Table 1. Because this sample is composed of children with autism, it differs from the general population of children; for example, the proportion of female children is much lower, and the proportion of low birthweight births much higher, than for California births in general.

Table 1. Sample Description for Key Clustering Variables

Female
Low birthweight (<2,500 g)
Maternal age
    Young (<25)
    Middle (25-35)
    Older (>35)
Maternal education
    < High school
    High school or some college
    College graduate
Diagnosed late (age 5+)
Intellectual disability diagnosis
Social functioning
    Low (<20th percentile)
    Medium (20th-80th percentile)
    High (>80th percentile)
Communication functioning
    Low (<20th percentile)
    Medium (20th-80th percentile)
    High (>80th percentile)

Motivation for Clustering Methods

The main approach used in this paper is cluster analysis, which identifies subgroups within a population that tend to have similar attributes (Everitt et al. 2011a). The idea is to find and group together similar observations in order to describe the subgroups in the data. This method is used widely in many fields, including biology, genetics, marketing, and computer science, although in the social sciences it is much less common. Cluster analysis differs from regression analysis in that its purpose is mainly descriptive: it is not intended to capture causal effects, but rather to show which characteristics of observations tend to hang together. These groups can then be related to other outcomes, such as the particular contexts or mechanisms that produced each subtype, using regression or other methods.

Clustering is especially useful for heterogeneous data in which there may be no “typical” or average case. An excellent sociological example is the analysis of the diverse causes of migration from Mexico to the US (Garip 2012). Autism is a perfect example of this sort of phenomenon. During the study period, the diagnostic criteria for autism changed multiple times (King and Bearman 2009), the visibility and awareness of autism increased and stigma decreased, treatment options expanded, and prevalence rose, accompanied by a rise in milder cases. Thus, there are likely to be multiple paths to an autism diagnosis during this period, resulting in multiple types of children with autism. Regression, which models variation around a single conditional mean, is poorly suited to capturing this heterogeneity.

Steps in Cluster Analysis

The clustering analysis was carried out using the cluster package for R. The first step in identifying clusters is to choose a set of relevant variables on which the researcher expects the data to cluster. I have created 12 dummy variables that capture the characteristics of the child, the mother, and the autism symptoms and diagnosis (see Table 1 for a description of these variables). Although paternal age has been shown in many studies to be an important risk factor for autism, I do not include it as a clustering variable because of high rates of missing data on this variable and its high correlation with maternal age. [Note: this selection is still in progress, and the final variables and clustering solution may change.]
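Table 1 lists sixteen categories, while the text reports twelve dummy variables; that count results from omitting one reference category for each of the four three-level variables. The Python sketch below encodes a child's record this way. Which categories serve as references (here, the middle category of each variable) is an assumption, and the variable and level names are hypothetical. One consequence is worth noting for the Manhattan distance described in the next step: two children in different non-reference categories of the same variable (for example, young versus older mothers) differ by 2, while a child in the reference category differs from either by 1.

```python
# Hypothetical variable and level names; the reference levels are ASSUMED.
BINARY = ["female", "low_birthweight", "diagnosed_late", "intellectual_disability"]
CATEGORICAL = {
    "maternal_age": ["young", "middle", "older"],
    "maternal_education": ["lt_high_school", "hs_some_college", "college_grad"],
    "social": ["low", "medium", "high"],
    "communication": ["low", "medium", "high"],
}
REFERENCE = {
    "maternal_age": "middle",
    "maternal_education": "hs_some_college",
    "social": "medium",
    "communication": "medium",
}

def dummy_names():
    """Column names: 4 binary flags plus 2 dummies for each of 4 categorical variables."""
    names = list(BINARY)
    for var, levels in CATEGORICAL.items():
        names += [f"{var}_{lvl}" for lvl in levels if lvl != REFERENCE[var]]
    return names

def encode(child):
    """Return the 0/1 vector for one child (a dict keyed by the names above)."""
    row = [int(bool(child[b])) for b in BINARY]
    for var, levels in CATEGORICAL.items():
        if child[var] not in levels:
            raise ValueError(f"unknown level {child[var]!r} for {var}")
        row += [int(child[var] == lvl) for lvl in levels if lvl != REFERENCE[var]]
    return row

if __name__ == "__main__":
    child = {"female": False, "low_birthweight": False, "diagnosed_late": False,
             "intellectual_disability": False, "maternal_age": "older",
             "maternal_education": "college_grad", "social": "medium",
             "communication": "high"}
    print(dict(zip(dummy_names(), encode(child))))
```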

Next, these variables are used to produce a distance matrix. This is an N x N matrix containing, for each observation, its dissimilarity from every other observation. This dissimilarity can be thought of as a social distance: a measure of how many of the variables two observations have in common. The distance matrix was calculated using Manhattan distances, which are well suited to binary variables: for 0/1 variables, the Manhattan distance between two observations is simply the number of variables on which they differ.
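The NumPy sketch below illustrates the Manhattan distance matrix for 0/1 data; it is an illustration in Python, not the R code used for the analysis. It builds the full matrix by broadcasting, which is fine for a toy example but memory-hungry at scale: for the analysis sample of N = 36,180 there are N(N - 1)/2 = 654,478,110, or about 6.5 x 10^8, distinct pairs. In R, `dist(x, method = "manhattan")` or `cluster::daisy(x, metric = "manhattan")` compute the same quantity.

```python
import numpy as np

def manhattan_matrix(X):
    """Pairwise Manhattan (L1) distances between the rows of X.

    For 0/1 dummy variables this equals the number of variables on which
    two observations differ (their Hamming distance).
    Builds an N x N x p intermediate array, so use only for small N.
    """
    X = np.asarray(X, dtype=np.int64)
    return np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)

if __name__ == "__main__":
    X = [[0, 1, 1],
         [1, 1, 0],
         [0, 1, 1]]
    print(manhattan_matrix(X))
```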

This distance matrix is then passed to a clustering algorithm. I chose the k-medoids method as implemented in the R pam function, a robust version of the popular k-means algorithm (Everitt et al. 2011b). Whereas k-means represents each cluster by its mean, k-medoids represents it by an actual observation (the medoid), which makes it less sensitive to outliers and allows it to work directly from a dissimilarity matrix. Briefly, the algorithm chooses k initial medoids, assigns each observation to the cluster whose medoid it is closest to, and then iteratively swaps medoids with other observations whenever a swap reduces the total dissimilarity. The final clustering solution thus minimizes (at least locally) the sum of the distances between all the observations and the medoids of their respective clusters.
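To make the procedure concrete, the Python sketch below implements a simplified k-medoids in the spirit of PAM on a precomputed dissimilarity matrix: a greedy build step picks initial medoids, and medoids are then swapped with non-medoids whenever a swap lowers the total distance from observations to their nearest medoid. It is a teaching sketch, not the R `pam` implementation (which evaluates swaps more efficiently and differs in details), and like PAM it returns a local optimum.

```python
import numpy as np

def total_cost(D, medoids):
    """Sum over observations of the distance to the nearest medoid."""
    return D[:, medoids].min(axis=1).sum()

def greedy_build(D, k):
    """Start from the most central point, then add the point that most reduces cost."""
    medoids = [int(D.sum(axis=1).argmin())]
    while len(medoids) < k:
        candidates = [h for h in range(D.shape[0]) if h not in medoids]
        costs = [total_cost(D, medoids + [h]) for h in candidates]
        medoids.append(candidates[int(np.argmin(costs))])
    return medoids

def k_medoids(D, k):
    """Swap-based k-medoids on a precomputed N x N dissimilarity matrix D.

    Quadratic work per sweep, so this is only suitable for small examples.
    """
    D = np.asarray(D, dtype=float)
    if not 1 <= k <= D.shape[0]:
        raise ValueError("k must be between 1 and the number of observations")
    medoids = greedy_build(D, k)
    best = total_cost(D, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in range(D.shape[0]):
                if h in medoids:
                    continue
                candidate = medoids.copy()
                candidate[i] = h  # try replacing medoid i with observation h
                cost = total_cost(D, candidate)
                if cost < best:  # accept any strictly improving swap
                    medoids, best, improved = candidate, cost, True
    labels = D[:, medoids].argmin(axis=1)  # assign each observation to its nearest medoid
    return medoids, labels, best

if __name__ == "__main__":
    # Two hypothetical groups of binary profiles.
    X = np.array([[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0],
                  [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1]])
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    print(k_medoids(D, 2))
```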


An important step is determining the number of clusters. The analyst must specify the number of clusters as an input to the k-medoids algorithm, but this quantity is itself a key question of interest. I ran the clustering algorithm across a range of k values from 2 through 12. The best-fitting solution, based on the criteria of mean silhouette index (which compares how close each observation is to the members of its own cluster with how close it is to the nearest other cluster, and is higher when clusters are well separated) as well as parsimony, was five clusters (Everitt et al. 2011b).²

² More information on cluster validation is available from the author.
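The mean silhouette criterion can be illustrated with the Python sketch below. For observation i, a(i) is its mean dissimilarity to the other members of its own cluster and b(i) is the smallest mean dissimilarity to the members of any other cluster; s(i) = (b(i) - a(i)) / max(a(i), b(i)) ranges from -1 to 1, and members of singleton clusters get s(i) = 0 by convention. The helper `pick_k` (a hypothetical name) returns the k with the highest mean silhouette among candidate clusterings, such as k = 2 through 12; the parsimony judgment described in the text is not modeled. In R, `cluster::silhouette` computes the same statistic.

```python
import numpy as np

def silhouette_values(D, labels):
    """Silhouette width s(i) for each observation, from a dissimilarity matrix.

    Requires at least two clusters.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = D[i, own].sum() / (n_own - 1)  # D[i, i] = 0, so i itself adds nothing
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s

def pick_k(D, labelings):
    """labelings maps k -> cluster labels; return (best k, mean silhouette per k)."""
    scores = {k: float(silhouette_values(D, lab).mean()) for k, lab in labelings.items()}
    return max(scores, key=scores.get), scores
```

In use, each candidate labeling would come from running k-medoids with a different k on the same distance matrix.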

After creating the clusters, the analyst must examine them to see whether they make substantive sense given what is known about the context. To do this, I examine the composition of each cluster in order to understand what kinds of children have been aggregated into each group and which variable values are most salient to each. Finally, I plot these clusters over time and across administrative boundaries in order to reveal the temporal and spatial patterning of each subtype of autism case, and to assess whether they are more prevalent in particular time periods or regional centers.
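Plotting clusters over time and across regional centers rests on a simple cross-tabulation: within each birth year (or regional center), the share of children assigned to each cluster. A minimal Python sketch with hypothetical inputs follows; the resulting shares could be passed to any plotting library.

```python
from collections import Counter, defaultdict

def cluster_shares(groups, labels):
    """Share of each cluster within each group (e.g. birth year or regional center).

    groups, labels: equal-length sequences, one entry per child.
    Returns {group: {cluster: share}}; shares within a group sum to 1.
    Clusters with no members in a group are simply absent (share 0).
    """
    if len(groups) != len(labels):
        raise ValueError("groups and labels must have the same length")
    counts = defaultdict(Counter)
    for g, c in zip(groups, labels):
        counts[g][c] += 1
    shares = {}
    for g, counter in counts.items():
        total = sum(counter.values())
        shares[g] = {c: n / total for c, n in sorted(counter.items())}
    return shares

if __name__ == "__main__":
    birth_years = [1992, 1992, 1992, 1993]  # hypothetical
    clusters = [1, 2, 1, 3]                 # hypothetical
    print(cluster_shares(birth_years, clusters))
```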


Table 2 contains the composition of each cluster for the five-cluster solution. Some variables are more salient to some clusters than others; next I discuss the key characteristics of each cluster.
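A composition table of this kind can be produced by averaging the dummy variables within each cluster, since the mean of a 0/1 variable is the proportion of children with that attribute. The Python sketch below also reports each cluster's share of the sample and the ratio of each cluster rate to the overall rate, a quick way to flag salient variables (for example, a rate of 37% against an overall 23% gives a ratio of about 1.6). The function name and outputs are illustrative assumptions, not the author's code.

```python
import numpy as np

def cluster_profiles(X, labels):
    """Per-cluster attribute rates for 0/1 variables, plus comparisons to the sample.

    X: N x p array of dummy variables; labels: length-N cluster assignments.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    overall = X.mean(axis=0)  # proportion with each attribute in the full sample
    profiles = {}
    for c in np.unique(labels):
        members = labels == c
        rates = X[members].mean(axis=0)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(overall > 0, rates / overall, np.nan)
        profiles[c.item()] = {
            "share_of_sample": float(members.mean()),
            "rates": rates,
            "ratio_to_overall": ratio,
        }
    return profiles, overall
```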


[To do: calculate significance tests comparing each cluster to the overall sample]

Beginning with Cluster 1 (about 21% of the sample), it is immediately obvious that maternal characteristics are particularly salient to this classification. All of the children in this group have mothers who are college graduates, and these mothers are substantially older than those in the other clusters: 37% are over 35 (compared to 23% for the sample as a whole), and very few (<3%) are under 25. Children in this group are unlikely to be diagnosed late, and have about half the rate of intellectual disability of the sample as a whole. Although they are not especially likely to be high functioning, they are unlikely to be low functioning (especially in the communication domain). We can think of these children as the autism cases associated with delayed fertility among the highly educated. These cases tend to be less severe than others, less often accompanied by intellectual disability, and identified earlier rather than later.

Table 2. Composition of Clusters

                                      Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5
Female
Low birthweight (<2,500 g)
Maternal age
    Young (<25)
    Middle (25-35)
    Older (>35)
Maternal education
    < High school
    High school or some college
    College graduate
Diagnosed late (age 5+)
Intellectual disability diagnosis
Social functioning
    Low (<20th percentile)
    Medium (20th-80th percentile)
    High (>80th percentile)
Communication functioning
    Low (<20th percentile)
    Medium (20th-80th percentile)
    High (>80th percentile)
Total (% of sample)

Cluster 2 (24%) consists of quite mild, high-functioning cases. In particular, these children have mild communication symptoms (77% are categorized as high functioning in this domain), and only a handful are low functioning in either domain. They were also very likely to be diagnosed at age 5 or later, which likely reflects the difficulty of identifying their less severe symptoms as autism. Although the DDS does not provide services to those with an Asperger’s diagnosis³, these children may be closer to that part of the autism spectrum. Their mothers do not differ as dramatically from the sample as those in Cluster 1; however, they are unlikely to be very young or poorly educated.

Cluster 3 (14%), in contrast, consists of more severe cases: 77% and 82% are low functioning in the social and communication domains, respectively, and two-thirds have a diagnosis of intellectual disability. Although the differences are not large, these children are also more likely than the overall sample to be male and to have been born at low birthweight. Their mothers are neither particularly old nor particularly young and have average education (although they are slightly less likely than average to have a college degree).

Cluster 4 (13%) is the smallest cluster, and as with Cluster 1, maternal characteristics are highly salient to its classification. These mothers tend to be young (80% are under 25) and poorly educated (77% did not complete high school, and only 1% graduated from college). This group has a higher than average rate of intellectual disability diagnosis, but its autism was frequently identified late (69% were diagnosed at age 5 or later). Although these children’s social and communication symptoms are not especially likely to be categorized as low functioning, they are unlikely to fall in the high-functioning social category. It seems likely that the cognitive deficits of this group, combined with limited maternal resources, delayed the identification of autism. This group is composed mainly of relatively disadvantaged children whose autism is comorbid with intellectual disability.

[Note to self: should investigate the role of race for this category.]

3 The distinction between Asperger’s Syndrome and Autistic Disorder is that people with Asperger’s have social deficits combined with restricted interests or repetitive behaviors but do not have communication deficits sufficient for a diagnosis of autism.

Finally, Cluster 5 (28%) is the largest cluster and is closest to the sample mean on many variables. These children’s symptoms resemble those of Cluster 1: they have few intellectual disability diagnoses, tend to be neither very high nor very low functioning, and were all diagnosed by age 4. The main difference lies in their mothers’ characteristics: 85% graduated from high school but not from college, and they are much less likely to be over age 35. More than any other cluster, these represent the “typical” modern autism case, and as Figure 1 shows, they have risen most dramatically during the study period.

Figure 1 displays the proportion of all children in the sample assigned to each cluster, by birth year. By tracing the rise (or fall) of particular subgroups in the California autism population, we can begin to understand how the composition of autism cases has changed over time, and perhaps find clues to the mechanisms producing each type of diagnosis. The first thing to notice about this graph is that Clusters 1 and 5 have risen most dramatically and steadily during this period. Recall that these are the two groups with non-severe symptoms, few ID diagnoses, and early diagnosis. Such cases make up an increasing proportion of the population: combined, they account for 18% of cases born in 1992 but 66% of cases born in 2005.
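The quantity plotted in Figure 1 is a within-year proportion: for each birth year, the number of cases assigned to a cluster divided by all cases born that year. The following is a minimal Python sketch under stated assumptions, not the code used in this study. It assumes child-level records arrive as (birth year, cluster) pairs, the function names are hypothetical, and the toy records are invented for illustration rather than drawn from the California data. The combined share of Clusters 1 and 5 is computed the way the 18% and 66% figures above are presumably obtained.

```python
from collections import Counter, defaultdict


def annual_counts(records):
    """Tally cases per cluster within each birth year.

    records: iterable of (birth_year, cluster) pairs, one per child
    (a hypothetical input format).
    """
    counts = defaultdict(Counter)
    for birth_year, cluster in records:
        counts[birth_year][cluster] += 1
    return counts


def annual_shares(counts):
    """Convert counts to within-year proportions; each year's shares sum to 1."""
    shares = {}
    for year, by_cluster in counts.items():
        total = sum(by_cluster.values())
        shares[year] = {c: n / total for c, n in sorted(by_cluster.items())}
    return shares


def combined_share(counts, year, clusters):
    """Proportion of a birth year's cases that fall in any of the given clusters."""
    by_cluster = counts[year]
    # Counter returns 0 for clusters absent in that year.
    return sum(by_cluster[c] for c in clusters) / sum(by_cluster.values())


# Invented toy records (birth_year, cluster); NOT the study data.
toy = [(1992, 2), (1992, 2), (1992, 3), (1992, 4), (1992, 1),
       (2005, 5), (2005, 5), (2005, 1), (2005, 3), (2005, 2)]
counts = annual_counts(toy)
print(annual_shares(counts)[1992])
print(combined_share(counts, 1992, [1, 5]))  # 0.2
print(combined_share(counts, 2005, [1, 5]))  # 0.6
```

Plotting each cluster's share across birth years, one line per cluster, gives the layout of Figure 1. With pandas, an equivalent tabulation would be `pd.crosstab(df.birth_year, df.cluster, normalize="index")`.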

Figure 1. Proportion of All Cases in Each Cluster, by Birth Year (x-axis: birth year, 1992–2005; y-axis: proportion of annual cases; series: “Typicals”; Older Educated Mothers; Late Dx, Good Communication; Severe Cases with ID; Young Low Education Mothers)

In contrast, Cluster 2 declines steadily, from the most common cluster in 1992 to the least common in 2005. This reflects the declining age at diagnosis over time, as autism is increasingly identified at preschool ages, a trend documented in other studies (Fountain, King, and Bearman 2011). This group also tends to be high functioning in communication. Although symptoms in the California autism population have on average become milder, which might be expected to enlarge this cluster, its decline may instead reflect the changing association between symptoms and socioeconomic status. In recent years, higher functioning, particularly in the communication domain, has become increasingly associated with socioeconomic status. Thus, one possibility is that during the study period less severe symptoms became more closely associated with maternal age and education, leading these children to be assigned to Clusters 1 and 5 rather than Cluster 2.

Clusters 3 and 4 show less dramatic temporal patterns. Both have remained relatively steady, with perhaps a slight decline over time as a proportion of the population. Cluster 3, composed of the most severe cases with comorbid ID, shows a slight bump in 1995. At the risk of overinterpreting what may be a random fluctuation, this bump comes directly after the 1994 revision of the autism diagnostic criteria in DSM-IV, which other research has linked to an increase in autism cases among those with ID (King and Bearman 2009). Cluster 4, which has young and poorly educated mothers, also shows a slight downward trend. Although the number of children in this cluster has risen over time, its share of the population has diminished because the total autism population grew more quickly. These children continue to exist in the population, but autism has increasingly become a diagnosis for the children of more educated mothers.
the population, although it has become increasingly a diagnosis for the children of more educated


Next, we examine how these clusters map onto the system of regional centers. These are the local institutions responsible for establishing or confirming autism diagnoses, monitoring symptoms and needs, and coordinating service provision for clients. Although all regional centers use standardized diagnostic and evaluative instruments and follow the same “best practices” guidelines, they can differ in important ways, such as the personalities and preferences of local leadership and the demographic composition of their catchment areas.

In Figure 2, each bar depicts the cluster composition of a regional center (RC). Bar width is scaled by the size of the RC’s autism caseload, and bars are presented in alphabetical order. There is substantial variability in the kinds of autism clients diagnosed and served by each RC. For example, Cluster 1 (older, highly educated mothers) is much more common in RCs located in well-off areas such as Orange County, Golden Gate (San Francisco), and LA’s West Side. On the other hand, Cluster 4 (young, poorly educated mothers) is most common among RCs in the agricultural Kern and Central Valley regions and in East and South Central LA. The severe cases with comorbid ID in Cluster 3 are most common in the agricultural Valley Mountain RC (containing Stanislaus and San Joaquin Counties) as well as in San Diego and Orange County. It is unclear why this cluster is so large in these Southern California RCs; more analysis of the association between neighborhood characteristics and these clusters is needed to understand the contextual production of autism subtypes.
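Figure 2 is a variable-width stacked bar chart (sometimes called a Marimekko chart): each bar’s width reflects an RC’s caseload, and its segments show cluster shares. Below is a minimal Python sketch of that geometry. It assumes caseloads are tabulated as counts per RC and cluster, and that width is directly proportional to caseload; the text says only that width is “scaled by” caseload. The function name and the two regional centers are hypothetical, and this is not the code used to draw the figure.

```python
def variable_width_layout(caseloads):
    """Geometry for a variable-width stacked bar chart (Marimekko-style).

    caseloads: dict mapping RC name -> dict mapping cluster -> case count.
    Returns (rc, x_left, width, segments) tuples in alphabetical RC order,
    where width is the RC's share of all cases (so widths sum to 1) and
    segments are (cluster, y_bottom, height) with heights equal to the
    within-RC cluster proportions (so each bar stacks to 1).
    """
    grand_total = sum(sum(by_cluster.values()) for by_cluster in caseloads.values())
    layout, x_left = [], 0.0
    for rc in sorted(caseloads):
        by_cluster = caseloads[rc]
        rc_total = sum(by_cluster.values())
        width = rc_total / grand_total  # assumed: width proportional to caseload
        segments, y_bottom = [], 0.0
        for cluster in sorted(by_cluster):
            height = by_cluster[cluster] / rc_total
            segments.append((cluster, y_bottom, height))
            y_bottom += height
        layout.append((rc, x_left, width, segments))
        x_left += width
    return layout


# Two made-up regional centers with invented counts; NOT the study data.
toy = {
    "RC B": {1: 10, 3: 30, 4: 60},
    "RC A": {1: 150, 2: 50, 5: 100},
}
for rc, x_left, width, segments in variable_width_layout(toy):
    print(rc, round(x_left, 3), round(width, 3),
          [(cluster, round(height, 3)) for cluster, _, height in segments])
```

Each tuple maps onto matplotlib’s `Axes.bar`. Pass `x_left` as `x`, the segment height as `height`, `width` as `width`, and `y_bottom` as `bottom`, with `align="edge"` so that each bar starts at its left edge rather than being centered.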

Figure 2. Cluster Composition of Client Population by Regional Center


[Additional analyses to come: clusters by the characteristics of their neighborhoods (density of pediatricians and child psychiatrists, density of autism cases, socioeconomic composition).]


In this paper I have identified and described a heterogeneous set of five subgroups within the population of children with autism born in California between 1992 and 2005. These groups differ in phenotypic, biological, and sociological ways. Although this analysis is descriptive, the varying associations among these sets of factors reveal important patterns, such as the frequent link between symptom severity and maternal education.

Moreover, the production of these subgroups is uneven over space and time. Some subtypes were more common in the early years of the autism “epidemic,” while others have become dominant as the prevalence of autism has risen. This is no coincidence: as visibility has increased, treatment options have improved, and stigma has diminished, autism has become less often a very severe disorder accompanied by ID and more common in its milder forms. As the autism population has changed, its association with socioeconomic variables has changed as well. Thus, as parents increasingly delay fertility, we also observe a rapid rise of non-severe autism among the children of older, highly educated mothers.

These findings also have important implications for autism research in general. If there is no average or typical autism case around which others are normally distributed, but rather distinct and variable subgroups, then we need to be careful about the inferences drawn from regression analysis. At the very least, we need to take time and place more seriously in our analyses and avoid pooling individuals who were diagnosed under very different regimes. The context in which autism is diagnosed is changing, and so the kinds and magnitudes of risk factors are changing as well. Several other papers in this area have demonstrated the importance of the shifting context of autism, with respect to immigration policy (Fountain and Bearman 2011), parental age (Liu, Zerubavel, and Bearman 2010; King et al. 2009), diagnostic change (King and Bearman 2009), and social influence (Liu, King, and Bearman 2010).

This research does have some limitations. The data come from linked administrative datasets that were collected for non-scientific purposes and thus are not always ideal for research. Although maternal education does appear to be highly salient to the identification of these clusters, it is only a proxy for socioeconomic status, and more detailed socioeconomic information would be preferable. Similarly, the variables capturing the severity of autism symptoms were chosen to assess service needs and are not consistent across the study period. Finally, as a descriptive method, clustering cannot evaluate causal explanations; it can only find patterns in the data and suggest avenues for future study.

The next step in this research is to examine the spatial and socioeconomic distribution of these autism subgroups in more detail by locating these children in their neighborhoods of birth and diagnosis. We can then link in data from other sources on the density of pediatricians and child psychiatrists and the number of other autism cases, as well as census data on the socioeconomic composition of neighborhoods, in order to understand how the characteristics of places, including neighborhood resources, contribute to the production of these subgroups.



This is the first cluster analysis of autism in a large, representative, statewide population. It is also the first to jointly consider the phenotypic, biological, and social factors that contribute to autism diagnoses. As such, it has revealed not only the patterns of association among these factors in children with autism, but also their variation over space and time.


ADDMN. 2007. “Prevalence of Autism Spectrum Disorders–Autism and Developmental Disabilities Monitoring Network, 14 Sites, United States, 2002.” Morbidity and Mortality Weekly Report Surveillance Summaries 56:12–28.

Burchard, Esteban Gonzalez et al. 2003. “The Importance of Race and Ethnic Background in Biomedical Research and Clinical Practice.” N Engl J Med 348(12):1170–75.

Cacioppo, John T., James H. Fowler, and Nicholas A. Christakis. 2009. “Alone in the Crowd: The Structure and Spread of Loneliness in a Large Social Network.” Journal of Personality and Social Psychology 97(6):977–91.

Centers for Disease Control and Prevention. 2006. “Mental Health in the United States: Parental Report of Diagnosed Autism in Children Aged 4-17 Years--United States, 2003-2004.” MMWR. Morbidity and Mortality Weekly Report 55(17):481–86.

Cheslack-Postava, Keely, Kayuet Liu, and Peter S. Bearman. 2011. “Closely Spaced Pregnancies Are Associated With Increased Odds of Autism in California Sibling Births.” Pediatrics 127(2):246–53.

Chess, Stella. 1971. “Autism in Children with Congenital Rubella.” Journal of Autism and Childhood Schizophrenia 1(1):33–47.

Chess, Stella. 1977. “Follow-up Report on Autism in Congenital Rubella.” Journal of Autism and Childhood Schizophrenia 7(1):69–81.

Christakis, Nicholas A. and James H. Fowler. 2007. “The Spread of Obesity in a Large Social Network over 32 Years.” New England Journal of Medicine 357(4):370–79.

Christakis, Nicholas A. and James H. Fowler. 2008. “The Collective Dynamics of Smoking in a Large Social Network.” New England Journal of Medicine 358(21):2249–58.

Cohen-Cole, E. and J. M. Fletcher. 2008. “Detecting Implausible Social Network Effects in Acne, Height, and Headaches: Longitudinal Analysis.” BMJ 337:a2533.

Croen, Lisa A., Judith K. Grether, Jenny Hoogstrate, and Steve Selvin. 2002. “The Changing Prevalence of Autism in California.” Journal of Autism and Developmental Disorders


Croen, Lisa A., Judith K. Grether, and Steve Selvin. 2002. “Descriptive Epidemiology of Autism in a California Population: Who Is at Risk?” Journal of Autism and Developmental Disorders 32(3):217–24.

Croen, Lisa A., Daniel V. Najjar, Bruce Fireman, and Judith K. Grether. 2007. “Maternal and Paternal Age and Risk of Autism Spectrum Disorders.” Archives of Pediatrics & Adolescent Medicine 161(4):334–40.

Developmental Disabilities Monitoring Network Surveillance Year 2010 Principal Investigators. 2014. “Prevalence of Autism Spectrum Disorder among Children Aged 8 Years - Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2010.” Morbidity and Mortality Weekly Report. Surveillance Summaries 63(Suppl 2):1–21.

Division of Cancer Prevention and Control. 2007. Link Plus. Atlanta, GA: Centers for Disease Control and Prevention.

Durkin, Maureen S. et al. 2008. “Advanced Parental Age and the Risk of Autism Spectrum Disorder.” Am. J. Epidemiol. 168(11):1268–76.

Durkin, Maureen S. et al. 2010. “Socioeconomic Inequality in the Prevalence of Autism Spectrum Disorder: Evidence from a U.S. Cross-Sectional Study.” PLoS ONE


Eaves, Linda C., Helena H. Ho, and David M. Eaves. 1994. “Subtypes of Autism by Cluster Analysis.” Journal of Autism and Developmental Disorders 24(1):3–22.

Everitt, Brian S., Sabine Landau, Morven Leese, and Daniel Stahl. 2011a. “An Introduction to Classification and Clustering.” Pp. 1–13 in Cluster Analysis. John Wiley & Sons, Ltd. Retrieved January 2, 2016


Everitt, Brian S., Sabine Landau, Morven Leese, and Daniel Stahl. 2011b. “Optimization Clustering Techniques.” Pp. 111–42 in Cluster Analysis. John Wiley & Sons, Ltd. Retrieved January 2, 2016


Fombonne, E. 2005. “Epidemiology of Autistic Disorder and Other Pervasive Developmental Disorders.” The Journal of Clinical Psychiatry 66:3.

Fountain, C., M. D. King, and P. S. Bearman. 2011. “Age of Diagnosis for Autism: Individual and Community Factors across 10 Birth Cohorts.” Journal of epidemiology and community health 65(6):503–10.

Fountain, C., A. S. Winter, and P. S. Bearman. 2012. “Six Developmental Trajectories Characterize Children with Autism.” Pediatrics 129(5):e1112–e1120.

Fountain, Christine et al. 2015. “Association between Assisted Reproductive Technology Conception and Autism in California, 1997-2007.” American Journal of Public Health


Fountain, Christine and Peter Bearman. 2011. “Risk as Social Context: Immigration Policy and Autism in California.” Sociological Forum (Randolph, N.J.) 26(2):215–40.

Fountain, Christine, Marissa D. King, and Peter S. Bearman. 2010. “Age of Diagnosis for Autism: Individual and Community Factors across 10 Birth Cohorts.” Journal of Epidemiology and Community Health.

Fowler, James H. and Nicholas A. Christakis. 2008. “Dynamic Spread of Happiness in a Large Social Network: Longitudinal Analysis over 20 Years in the Framingham Heart Study.” BMJ 337:a2338.

Freitag, C. M. 2006. “The Genetics of Autistic Disorders and Its Clinical Relevance: A Review of the Literature.” Mol Psychiatry 12(1):2–22.

Garip, Filiz. 2012. “Discovering Diverse Mechanisms of Migration: The Mexico–US Stream 1970–2000.” Population and Development Review 38(3):393–433.

Georgiades, Stelios et al. 2007. “Structure of the Autism Symptom Phenotype: A Proposed Multidimensional Model.” Journal of the American Academy of Child & Adolescent Psychiatry 46(2):188–96.

Hultman, Christina M., Pär Sparén, and Sven Cnattingius. 2002. “Perinatal Risk Factors for Infantile Autism.” Epidemiology 13(4):417–23.

Hvidtjørn, D. et al. 2009. “Cerebral Palsy, Autism Spectrum Disorders, and Developmental Delay in Children Born after Assisted Conception: A Systematic Review and Meta-Analysis.” Archives of Pediatrics and Adolescent Medicine 163(1):72.

Hvidtjørn, D. et al. 2011. “Risk of Autism Spectrum Disorders in Children Born after Assisted Conception: A Population-Based Follow-up Study.” Journal of Epidemiology and Community Health 65(6):497–502.

Liu, Kayuet, Noam Zerubavel, and Peter Bearman. 2010. “Social Demographic Change and Autism.” Demography 47(2):327–43.

King, Marissa D. and Peter S. Bearman. 2009. “Diagnostic Change and the Increased Prevalence of Autism.” Int. J. Epidemiol. 38(5):1224–34.

King, Marissa D. and Peter S. Bearman. 2011. “Socioeconomic Status and the Increased Prevalence of Autism in California.” American Sociological Review 76(2):320–46.

King, Marissa D., Christine Fountain, Diana Dakhlallah, and Peter S. Bearman. 2009. “Estimated Autism Risk and Older Reproductive Age.” Am J Public Health 99(9):1673–79.

Larsson, Heidi Jeanet et al. 2005. “Risk Factors for Autism: Perinatal Factors, Parental Psychiatric History, and Socioeconomic Status.” Am. J. Epidemiol. 161(10):916–25.

Link, B. G., M. E. Northridge, J. C. Phelan, and M. L. Ganz. 1998. “Social Epidemiology and the Fundamental Cause Concept: On the Structuring of Effective Cancer Screens by Socioeconomic Status.” The Milbank Quarterly 76(3):375–402.

Link, B. G. and J. Phelan. 1995. “Social Conditions as Fundamental Causes of Disease.” Journal of Health and Social Behavior 35:80–94.

Link, Bruce G. and Jo Phelan. 2009. “The Social Shaping of Health and Smoking.” Drug and Alcohol Dependence 104(Supplement 1):S6–S10.

Liptak, Gregory S. et al. 2008. “Disparities in Diagnosis and Access to Health Services for Children with Autism: Data from the National Survey of Children’s Health.” Journal of Developmental and Behavioral Pediatrics: JDBP 29(3):152–60.

Liu, Ka-Yuet, Marissa D. King, and Peter S. Bearman. 2010. “Social Influence and the Autism Epidemic.” American Journal of Sociology 115(5):1387–1434.

Liu, Kayuet and Peter S. Bearman. 2012. “Focal Points, Endogenous Processes, and Exogenous Shocks in the Autism Epidemic.” Sociological Methods & Research. Retrieved October 1, 2012 (http://smr.sagepub.com/content/early/2012/09/17/0049124112460369).

Lyons, Russell. 2010. “The Spread of Evidence-Poor Medicine via Flawed Social-Network Analysis.” 1007.2876. Retrieved August 1, 2011 (http://arxiv.org/abs/1007.2876).

Mandell, David S. et al. 2009. “Racial/ethnic Disparities in the Identification of Children with Autism Spectrum Disorders.” American Journal of Public Health 99(3):493–98.

Marmot, M. G. 2004. The Status Syndrome: How Social Standing Affects Our Health and Longevity. Macmillan.

Mazumdar, Soumya, Marissa D. King, Ka-Yuet Liu, Noam Zerubavel, and Peter S. Bearman. 2010. “The Spatial Structure of Autism in California, 1993-2001.” Health & Place


Mazumdar, Soumya, Alix Winter, Ka-Yuet Liu, and Peter Bearman. 2013. “Spatial Clusters of Autism Births and Diagnoses Point to Contextual Drivers of Increased Prevalence.” Social Science & Medicine (1982) 95:87–96.

Palmer, Raymond F., Tatjana Walker, David S. Mandell, Bryan Bayles, and Claudia S. Miller. 2009. “Explaining Low Rates of Autism Among Hispanic Schoolchildren in Texas.” Am J Public Health AJPH.2008.150565.

Pescosolido, Bernice A. 1992. “Beyond Rational Choice: The Social Dynamics of How People Seek Help.” American Journal of Sociology 97(4):1096.

Phelan, Jo C., Bruce G. Link, and Parisa Tehranifar. 2010. “Social Conditions as Fundamental Causes of Health Inequalities.” Journal of Health and Social Behavior 51(1 suppl):S28 –


Prior, Margot et al. 1998. “Are There Subgroups within the Autistic Spectrum? A Cluster Analysis of a Group of Children with Autistic Spectrum Disorders.” The Journal of Child Psychology and Psychiatry and Allied Disciplines 39(06):893–902.

Rablen, Matthew D. and Andrew J. Oswald. 2008. “Mortality and Immortality: The Nobel Prize as an Experiment into the Effect of Status upon Longevity.” Journal of Health Economics


Redelmeier, Donald A. and Sheldon M. Singh. 2001. “Survival in Academy Award–Winning Actors and Actresses.” Annals of Internal Medicine 134(10):955–62.

Ronald, Angelica et al. 2006. “Genetic Heterogeneity between the Three Components of the Autism Spectrum: A Twin Study.” Journal of the American Academy of Child and Adolescent Psychiatry 45(6):691–99.

Russell, Ginny, Colin Steer, and Jean Golding. 2010. “Social and Demographic Factors That Influence the Diagnosis of Autistic Spectrum Disorders.” Social Psychiatry and Psychiatric Epidemiology. Retrieved December 15, 2010


Schieve, Laura A. et al. 2015. “Does Autism Diagnosis Age or Symptom Severity Differ Among Children According to Whether Assisted Reproductive Technology Was Used to Achieve Pregnancy?” Journal of Autism and Developmental Disorders 45(9):2991–3003.

Shalizi, C. R. and A. C. Thomas. 2011. “Homophily and Contagion Are Generically Confounded in Observational Social Network Studies.” Sociological Methods & Research 40(2):211–


Shattuck, Paul T. et al. 2009. “Timing of Identification among Children with an Autism Spectrum Disorder: Findings from a Population-Based Surveillance Study.” Journal of the American Academy of Child and Adolescent Psychiatry 48(5):474–83.

Shi, Limin, S. Hossein Fatemi, Robert W. Sidwell, and Paul H. Patterson. 2003. “Maternal Influenza Infection Causes Marked Behavioral and Pharmacological Changes in the Offspring.” J. Neurosci. 23(1):297–302.

Stevens, Michael C. et al. 2000. “Subgroups of Children With Autism by Cluster Analysis: A Longitudinal Examination.” Journal of the American Academy of Child & Adolescent Psychiatry 39(3):346–52.




Copyright of Conference Papers - American Sociological Association is the property of