
25/10/14 01:11 Job satisfaction versus family income (GSS)

Page 1 of 10 le:///Users/paulacookimac/Documents/MATHIEU/MOOC/Statistics/Dat0and%20Stat%20Inference%20Duke/Project/dasi_project_template.html
Job satisfaction versus family income (GSS)
October 18, 2014
Introduction:
Steve Jobs once said: "The only way to be truly satisfied is to do what you believe is great work, and the
only way to do great work is to love what you do." This is a quote from Apple Inc.'s late legend, but
such a belief is widespread among the world's elite, big corporations and beyond. It is believed that a
satisfied staff will produce great work, which in turn will make the company successful. But is there a
financial reward for employees who satisfy such expectations? In other words, do satisfied
employees have, on average, a higher family income than less satisfied ones? Or we could ask the question
the other way round: does a well-paid job bring, on average, more satisfaction?
To evaluate those questions with data, an R-ready data-set from the General Social Survey (Smith et al.,
2013) was used (see below for the full citation). In particular, the subsequent statistical analysis uses
respondents' answers about job satisfaction and (inflation-adjusted) family income.
Data:
Data Collection / Cases
The GSS data were collected between 1972 and 2012. The population (universe) from which the
individuals were sampled consists of all non-institutionalized, English- and Spanish-speaking persons
18 years of age or older living in the United States. Until 2006, only English-speaking persons were
surveyed. Starting in 2006, the GSS sampled Spanish speakers in addition to English speakers.
These individuals were surveyed using a standardized questionnaire administered through computer-assisted
personal interview (CAPI), face-to-face interview, and telephone interview. The individuals were selected
using a modified probability design (from 1972 through 1974) and a full probability design (from 1975 onward).
Each case (each respondent's answers to the survey) is one row of the data-set (one filled questionnaire).
Selected Variables
The following GSS variables were used for the subsequent statistical analyses:
First variable = satjob: Respondents were asked the following question: "On the whole, how satisfied are
you with the work you do?" The answers were collected on an ordinal scale (from 1 to 4, i.e. an ordinal
categorical variable) labelled as follows: Very satisfied (1), Moderately satisfied (2), A little dissatisfied (3),
Very dissatisfied (4).
Second variable = coninc: The family income in constant dollars (inflation-adjusted). The values
vary between a minimum of 383 dollars and a maximum of 180386 dollars. It is thus a quantitative
(numerical) variable.
Type of Study
The General Social Survey (GSS) is an observational study. The data were collected using a survey
questionnaire. As it is a survey, respondents were not randomly assigned to treatment or control groups.
Therefore, even though the authors did a great job of selecting a representative sample of the population,
this study cannot be considered an experiment. Indeed, it would be both unethical and impractical to
assign an individual to a group that is "Very satisfied", "Moderately satisfied", "A little
dissatisfied", or "Very dissatisfied" with his/her job.
Scope of Inference - generalizability
The findings of this analysis can be generalized to all non-institutionalized, English- and Spanish-speaking
persons 18 years of age or older living in the United States. This is because the cases were
selected using a modified or full probability design (the investigators took care to develop a sampling
design that gives an equal probability of selection to any individual belonging to the population). Since it
is a survey, there is potential for response and non-response bias. Since liking one's job is a socially
desirable trait, a social desirability bias (a type of response bias) cannot be excluded. Also, since only
English and Spanish speakers were sampled, the results of the inference cannot be generalized to speakers
of other languages (less than 1% of the USA household population according to the authors of the
study). Observations collected before 2006 did not include Spanish speakers (less than 2% of the
population).
Scope of Inference - causality
The inference intends to explore the relationship between job satisfaction and family income. The data
cannot be used to establish a causal relationship between these two variables since the GSS is an
observational study (and thus not an experiment). It would be difficult to blindly administer job satisfaction
(or dissatisfaction) to someone and then see the impact on the family income ;-). In other words, correlation
is not causation: there could be a confounding variable not included in the analysis.
Exploratory data analysis:
Exploratory data analysis for the first variable: satjob
The GSS survey questionnaire asked respondents to rate their job satisfaction on an ordinal scale as
follows: "Very Satisfied", "Moderately Satisfied", "A Little Dissatisfied", or "Very Dissatisfied". If (for any
reason) the respondent did not rate his/her job satisfaction, an NA was recorded in the data-set. The
R-ready data-set has 57061 rows (respondents / cases). Out of these, 19717 (35 %) declared they were
"Very Satisfied" with their job, 15736 (28 %) "Moderately Satisfied", 4109 (7 %) "A Little Dissatisfied",
1715 (3 %) "Very Dissatisfied", and 15784 (28 %) did not rate their job satisfaction (NA). These data
show that the surveyed Americans are on average satisfied with their jobs, with approximately 62 % being
very or moderately satisfied, and only 10 % confessing some form of dissatisfaction. Nevertheless, these
high rates of job satisfaction could suggest some social desirability bias since being satisfied with one's
job is a socially desirable trait. Also, a striking 28 % did not rate their job satisfaction (NAs). This
substantial non-response rate could impact the generalizability of the subsequent statistical analyses
(non-response bias). To get a better overview of the job satisfaction results and investigate this substantial
non-response rate, the results were broken down per year and displayed using a mosaic plot (where the
outermost bar on the left summarizes the whole GSS data-set):
The mosaicplot shows non-response rates for job satisfaction around 20% for most surveyed years with
the exception of 1972, 2002, 2004, and 2006 when high non-response rates were recorded (41%, 62%,
50%, and 52% respectively).
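The counts and non-response rates discussed above can be tabulated in a few lines of base R. The sketch below is only an illustration: `gss_demo` is a simulated, hypothetical stand-in for the GSS extract (three made-up survey years, made-up probabilities), so its numbers will not match those quoted in the text.

```r
# Simulated stand-in for the GSS extract (hypothetical data, not the real survey).
set.seed(1)
lv <- c("Very Satisfied", "Mod. Satisfied", "A Little Dissat", "Very Dissatisfied")
n  <- 2000
gss_demo <- data.frame(
  year   = sample(c(1972, 2002, 2012), n, replace = TRUE),
  satjob = factor(sample(c(lv, NA), n, replace = TRUE,
                         prob = c(0.35, 0.28, 0.07, 0.03, 0.27)),
                  levels = lv)
)

# Overall shares, counting NA (non-response) as its own category.
overall <- prop.table(table(gss_demo$satjob, useNA = "always"))
print(round(100 * overall, 1))

# Non-response rate per survey year: the mean of a logical vector is a proportion.
na_rate <- tapply(is.na(gss_demo$satjob), gss_demo$year, mean)
print(round(100 * na_rate, 1))

# Year-by-satisfaction table with NA kept as a level.
tab_year <- table(gss_demo$year, addNA(gss_demo$satjob))
print(tab_year)
```

Passing a table such as `tab_year` to `mosaicplot()` would draw a per-year breakdown of the kind described above.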
For the remainder of the project, non-responses (NAs) were omitted. The mosaic plot showing annual job
satisfaction results without non-responses can be found below along with a summary() of the 2-way
contingency table (i.e. years versus job satisfaction levels). Although job satisfaction rates seem to
remain fairly constant throughout the surveyed years (e.g. high level of "Very Satisfied"), the summary()
function applied to the contingency table displays a chi-squared statistic of 228 for 84 degrees of
freedom (= (29 years - 1) x (4 levels - 1)), which yields a very tiny p-value. Therefore, the data
support the rejection of the null hypothesis (independence of survey year and job satisfaction) and,
consequently, we conclude that there are small but statistically significant differences between annual job
satisfaction results. Using a Pareto chart analysis (not shown), the major contributors to the chi-squared
statistic of 228 were identified as years 1984 (14% of the chi-squared statistic), 1987 (11%), 1975 (8.7%),
2000 (8.3%), and 1980 (5.7%). Nevertheless, for the inference part of the project, annual results for job
satisfaction were pooled (in order to keep only 2 variables, as requested).
## Number of cases in table: 41277
## Number of factors: 2
## Test for independence of all factors:
## Chisq = 228, df = 84, p-value = 3e-15
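As a hedged illustration of the test of independence reported above, the sketch below builds a small year-by-satisfaction table from simulated data (the years, sample size and probabilities are assumptions, not GSS values), checks the degrees of freedom by hand, and splits the chi-squared statistic into per-year shares in the spirit of the Pareto breakdown mentioned in the text. It uses `chisq.test()` rather than `summary()` because the former also returns the expected counts.

```r
# Simulated, hypothetical data: 4 survey years instead of the 29 in the GSS.
set.seed(2)
lv    <- c("Very Satisfied", "Mod. Satisfied", "A Little Dissat", "Very Dissatisfied")
years <- c(1972, 1984, 2000, 2012)
n     <- 4000
year   <- sample(years, n, replace = TRUE)
satjob <- factor(sample(lv, n, replace = TRUE, prob = c(0.48, 0.38, 0.10, 0.04)),
                 levels = lv)
tab <- table(year, satjob)

# Pearson chi-squared test of independence between year and satisfaction level.
test <- chisq.test(tab)
print(test)

# Degrees of freedom by hand: (rows - 1) x (columns - 1).
df_by_hand <- (nrow(tab) - 1) * (ncol(tab) - 1)

# Contribution of each cell to the statistic, then the share attributable to each year.
cell_contrib <- (tab - test$expected)^2 / test$expected
year_share   <- rowSums(cell_contrib) / sum(cell_contrib)
print(round(100 * sort(year_share, decreasing = TRUE), 1))
```

With 29 years and 4 levels the same formula gives (29 - 1) x (4 - 1) = 84 degrees of freedom, as in the output above.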
Exploratory data analysis for the second variable: coninc
To start exploring respondents' family income, basic summary statistics were produced using the whole
GSS data-set:
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
     383   18400   35600   44500   59500  180000    5829
Excluding NAs, the family income of the sample ranges from $383 to $180,386 (the summary above shows
rounded values). The median family income is $35,600 and the mean family income is $44,500. This suggests
a right-skewed distribution of family income. The histogram for family income confirms that the distribution
is right skewed (with a few large family incomes that pull the mean to the right of the median).
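The reasoning "mean above median suggests right skew" can be checked numerically. The following is a minimal sketch on simulated log-normal incomes (an assumption chosen only because it is right skewed; `income` is not the GSS `coninc` variable), with a moment-based skewness computed by hand.

```r
# Hypothetical right-skewed incomes with some missing values.
set.seed(3)
income <- round(rlnorm(5000, meanlog = log(35000), sdlog = 0.8))
income[sample(5000, 500)] <- NA

# summary() reports the five-number summary, the mean and the NA count.
print(summary(income))

mean_val   <- mean(income, na.rm = TRUE)
median_val <- median(income, na.rm = TRUE)

# Moment-based skewness: positive values indicate a longer right tail.
x    <- income[!is.na(income)]
skew <- mean((x - mean(x))^3) / sd(x)^3
cat("mean:", round(mean_val), " median:", round(median_val),
    " skewness:", round(skew, 2), "\n")
```

Calling `hist(x)` on the same vector would show the long right tail graphically.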
To explore how the distribution of inflation-adjusted family income has evolved from 1972 to 2012, a boxplot
(using years as factors) was produced. The plot shows that while the mean family income (red dots) has
increased from $38,418 in 1972 to $48,385 in 2012, the median family income (blue dots) has remained
fairly constant (although there are some ups and downs, it was $33,333 back in 1972 versus $34,470 in
2012). Even more interestingly, the skewness of the distribution has increased throughout the study period,
suggesting that inequalities in family income have increased over the years covered by the survey.
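A per-year comparison of mean and median, as read off the boxplot above, could be computed along these lines. This is a sketch on simulated data in which the spread is deliberately made to widen over three hypothetical years while the median stays fixed; none of the values are taken from the GSS.

```r
# Hypothetical incomes: same median each year, increasing spread (sdlog).
set.seed(6)
years         <- c(1972, 1992, 2012)
sdlog_by_year <- c(0.6, 0.8, 1.0)
demo <- do.call(rbind, Map(function(y, s) {
  data.frame(year = y, coninc = rlnorm(1500, meanlog = log(34000), sdlog = s))
}, years, sdlog_by_year))

# Mean and median per year; a growing mean/median ratio signals growing right skew.
mean_by_year   <- tapply(demo$coninc, demo$year, mean)
median_by_year <- tapply(demo$coninc, demo$year, median)
ratio          <- mean_by_year / median_by_year
print(round(rbind(mean = mean_by_year, median = median_by_year, ratio = ratio), 2))
```

`boxplot(coninc ~ factor(year), data = demo)` would give the graphical counterpart, onto which the per-year means can be added with `points()`.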
For the inference part of the project, a subset of the GSS keeping only the cases for which job
satisfaction and family income are known was used (i.e. NAs were omitted).
Inference:
A boxplot (for the whole observational period) was drawn to explore the relationship (if any) between job
satisfaction and inflation-adjusted family income. The plot suggests that lower job satisfaction is
associated with lower inflation-adjusted family income.
ANOVA is a widely used method for comparing the means of more than two groups. There are
three conditions for an ANOVA. First, all observations must be independent. This first condition
is met since the GSS sample is a random sample that represents less than 10% of the population. The
second condition is that the distribution within each group must be approximately normal. This second
condition is not strictly met since the boxplot shows that all the groups deviate from normality (i.e. are
right skewed) and have outliers (i.e. some very high family incomes). However, since the smallest group
(i.e. the "Very Dissatisfied" group) has 1570 observations, the Central Limit Theorem implies that the
sampling distribution of the group means should be approximately normal. To verify this, 5000 simulations
were run, each taking a random sample of 1570 observations with replacement (using set.seed(100)). For each
simulation the sample mean was determined and the sampling distribution of these means was tested for
normality using the Shapiro-Wilk normality test. The null hypothesis states that the sampling distribution
is normal. The alternative hypothesis states that the sampling distribution is not normal. The results are
displayed below:
Shapiro-Wilk normality test
data: sample_means
W = 0.9996, p-value = 0.4854
With a p-value of 0.4854 for the Shapiro-Wilk normality test, the null hypothesis of normality is not
rejected, and the second condition is therefore considered met.
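The simulation described above can be sketched as follows. The data here are hypothetical: `group_income` is a simulated right-skewed sample of 1570 values standing in for one satisfaction group, and resampling from that group itself is an assumption, since the text does not say which observations were resampled. Only the sample size, the number of simulations and the seed are taken from the text.

```r
set.seed(100)

# Hypothetical right-skewed incomes standing in for the smallest group (n = 1570).
group_income <- rlnorm(1570, meanlog = log(25000), sdlog = 0.9)

# 5000 resamples of size 1570, drawn with replacement; keep the mean of each.
n_sims <- 5000
sample_means <- replicate(
  n_sims,
  mean(sample(group_income, size = length(group_income), replace = TRUE))
)

# Shapiro-Wilk test on the simulated sampling distribution of the mean.
# shapiro.test() accepts between 3 and 5000 values, so 5000 is the upper limit.
sw <- shapiro.test(sample_means)
print(sw)
```

With as many as 5000 values the Shapiro-Wilk test can flag very small departures from normality, so a histogram or normal Q-Q plot of `sample_means` is a useful complement to the p-value.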
The third condition is that the group variances should be about equal. This third condition is not met
since, as displayed below, the standard deviations decrease with decreasing job satisfaction:
Very Satisfied     Mod. Satisfied     A Little Dissat    Very Dissatisfied
37346              33667              32662              30481
Therefore, a one-way ANOVA with Welch's correction applied (i.e. homogeneity of variance is not
assumed) was used to test for differences between the group means. The null hypothesis states that
the mean family income is the same for the 4 job satisfaction groups, while the alternative hypothesis
states that at least one mean is different. The results are displayed below:
One-way analysis of means (not assuming equal variances)
data: my_gss$coninc and my_gss$satjob
F = 261.5, num df = 3, denom df = 6103, p-value < 2.2e-16
With such a tiny p-value, the null hypothesis is rejected and it is concluded that at least one mean is
different.
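The output header above ("One-way analysis of means (not assuming equal variances)") is what `oneway.test()` prints when `var.equal = FALSE`. The sketch below shows the per-group standard deviations and the Welch test on simulated data; the group sizes, means and standard deviations are hypothetical values chosen for illustration, and `demo` is not the author's `my_gss` data frame.

```r
# Hypothetical groups: gamma-distributed incomes with chosen means and SDs.
set.seed(4)
lv    <- c("Very Satisfied", "Mod. Satisfied", "A Little Dissat", "Very Dissatisfied")
n_per <- c(800, 700, 300, 150)
mu    <- c(52000, 45000, 40000, 35000)
sdv   <- c(37000, 34000, 33000, 30000)
demo <- data.frame(
  satjob = factor(rep(lv, times = n_per), levels = lv),
  coninc = unlist(Map(function(n, m, s) rgamma(n, shape = (m / s)^2, rate = m / s^2),
                      n_per, mu, sdv))
)

# Standard deviation per group, to judge the equal-variance condition.
group_sd <- tapply(demo$coninc, demo$satjob, sd)
print(round(group_sd))

# Welch's one-way test: var.equal = FALSE drops the equal-variance assumption.
welch <- oneway.test(coninc ~ satjob, data = demo, var.equal = FALSE)
print(welch)
```

The numerator degrees of freedom equal the number of groups minus one (here 3), while the denominator degrees of freedom are estimated from the data and are usually not an integer.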
In order to identify which mean(s) is (are) different, all 6 possible pairwise comparisons were
performed using the Welch Two Sample t-test (i.e. equal variance not assumed). Since this test requires
the same conditions as the one-way ANOVA, the above discussion is not repeated. To avoid
finding differences just by chance, the Bonferroni correction was applied, giving a more stringent
significance level of 0.05 / 6 = 0.0083. Consequently, for each pairwise comparison a confidence
level of 0.9917 was used for the interval of the difference between means. The null hypothesis states that
the means of Sample-1 and Sample-2 are equal (i.e. no difference in means, see columns 1 and 2), while the
alternative hypothesis states that the means are not equal. The results are summarized in the data frame
below (where the first and second columns identify the levels being compared, the next 2 columns give
the lower and upper boundaries of the 0.9917 confidence interval for the difference in means, and the
last column gives the actual p-value for the observed difference):
  Sample_1         Sample_2           Low_CI_99.2  High_CI_99.2  p_values
1 Very Satisfied   Mod. Satisfied            5484          7568  4.04e-61
2 Very Satisfied   A Little Dissat          10384         13548  6.36e-86
3 Very Satisfied   Very Dissatisfied        15243         19564  1.02e-90
4 Mod. Satisfied   A Little Dissat           3856          7024  1.68e-19
5 Mod. Satisfied   Very Dissatisfied         8715         13039  1.13e-38
6 A Little Dissat  Very Dissatisfied         2970          7905  6.59e-09
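Based on the tiny p-values listed above (all far below the Bonferroni-adjusted significance level of 0.0083),
we conclude that there are statistically significant differences in mean family income between all the job
satisfaction groups that were compared.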
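A table of this shape could be produced roughly as follows. This is a sketch under assumptions: the data are simulated (hypothetical group sizes, means and standard deviations), the base significance level of 0.05 is assumed, and the column names merely imitate those of the data frame above.

```r
# Hypothetical groups, as in a typical one-way layout.
set.seed(5)
lv    <- c("Very Satisfied", "Mod. Satisfied", "A Little Dissat", "Very Dissatisfied")
n_per <- c(800, 700, 300, 150)
mu    <- c(52000, 45000, 40000, 35000)
sdv   <- c(37000, 34000, 33000, 30000)
demo <- data.frame(
  satjob = factor(rep(lv, times = n_per), levels = lv),
  coninc = unlist(Map(function(n, m, s) rgamma(n, shape = (m / s)^2, rate = m / s^2),
                      n_per, mu, sdv))
)

# All pairs of levels: choose(4, 2) = 6 comparisons, one per column.
pairs     <- combn(lv, 2)
alpha_adj <- 0.05 / ncol(pairs)   # Bonferroni: 0.05 / 6, about 0.0083

# Welch two-sample t-test per pair, with the matching confidence level (about 0.9917).
res <- do.call(rbind, lapply(seq_len(ncol(pairs)), function(i) {
  a  <- demo$coninc[demo$satjob == pairs[1, i]]
  b  <- demo$coninc[demo$satjob == pairs[2, i]]
  tt <- t.test(a, b, var.equal = FALSE, conf.level = 1 - alpha_adj)
  data.frame(Sample_1 = pairs[1, i], Sample_2 = pairs[2, i],
             Low_CI = tt$conf.int[1], High_CI = tt$conf.int[2],
             p_value = tt$p.value)
}))

# A pair differs significantly when its p-value is below the adjusted level.
res$significant <- res$p_value < alpha_adj
print(res)
```

Comparing each raw p-value with `alpha_adj` is equivalent to multiplying the p-values by 6 and comparing them with 0.05; a confidence interval that excludes zero corresponds to a significant pair.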
Conclusion:
The GSS survey data-set was used to test for differences in inflation-adjusted family income between
respondents who reported different levels of job satisfaction. Statistically significant differences were
observed between all the levels of job satisfaction: on average, decreasing levels of job satisfaction were
associated with decreasing inflation-adjusted mean family income. Nevertheless, it cannot be concluded
that a higher income brings more job satisfaction or, turning the statement around, that a higher level of
job satisfaction produces a higher income. Indeed, the GSS survey is an observational study (i.e. not a
randomized experiment) and therefore no causality can be derived from the data. It should also be noted
that non-negligible non-response rates were recorded throughout the whole study for both variables.
Therefore, since there is a risk of non-response bias that could impact the generalization of the results to
the American population, the results can only be generalized to Americans who would answer both
questions if surveyed. Moreover, since a high family income and a high level of job satisfaction are both
socially desirable, there is also a risk of a social desirability bias (a type of response bias, e.g. inflating
the income when low and deflating it when high, or refusing to confess any job dissatisfaction).
Therefore, the results of the statistical analysis presented herein should be used with caution.
To further explore factors that affect job satisfaction and/or family income, other variables of the GSS
survey could be used (such as gender, education, mental health, etc.). To identify these factors, a
principal components analysis (PCA) could be performed. This would provide the variables needed for
multivariate data analysis (MVDA) or help identify potential confounding factors for the effects that were
observed in this study. Ideas for subsequent GSS surveys would be to include questions or other survey
methods to adjust for or evaluate non-response and social desirability bias (or one could simply compare
family income and job satisfaction results from face-to-face interviews versus computer-assisted
interviews). Also, statistical regression (that will be seen during the last two weeks of this course) could be
used to further evaluate the observed correlation between decreasing job satisfaction and reduced mean
family income (although a glimpse is already provided in Figure 5 where the regression line was plotted on
the boxplot).
References:
Data Citation
The data used herein are from Smith, Tom W., Michael Hout, and Peter V. Marsden. General Social
Survey, 1972-2012 [Cumulative File]. ICPSR34802-v1. Storrs, CT: Roper Center for Public Opinion
Research, University of Connecticut / Ann Arbor, MI: Inter-university Consortium for Political and Social
Research [distributors], 2013-09-11. Persistent URL: http://doi.org/10.3886/ICPSR34802.v1
The codebook describing the survey questions can be found at the following link:
https://d396qusza40orc.cloudfront.net/statistics%2Fproject%2Fgss1.html
The statistical analyses were performed using R. The corresponding citations are:
##
## To cite R in publications use:
##
## R Core Team (2014). R: A language and environment for
## statistical computing. R Foundation for Statistical Computing,
## Vienna, Austria. URL http://www.R-project.org/.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {R: A Language and Environment for Statistical Computing},
## author = {{R Core Team}},
## organization = {R Foundation for Statistical Computing},
## address = {Vienna, Austria},
## year = {2014},
## url = {http://www.R-project.org/},
## }
##
## We have invested a lot of time and effort in creating R, please
## cite it when using it for data analysis. See also
## 'citation("pkgname")' for citing R packages.
The 'graphics' and 'stats' packages are part of R and carry the same R Core Team (2014) citation as above.
Appendix:
head(my_gss, n = 10)
## caseid year coninc satjob
## 1 1 1972 25926 A Little Dissat
## 3 3 1972 33333 Mod. Satisfied
## 4 4 1972 41667 Very Satisfied
## 6 6 1972 60185 Mod. Satisfied
## 7 7 1972 50926 Very Satisfied
## 8 8 1972 18519 A Little Dissat
## 9 9 1972 3704 Mod. Satisfied
## 10 10 1972 25926 Mod. Satisfied
## 12 12 1972 18519 Very Satisfied
## 13 13 1972 18519 Very Satisfied
tail(my_gss, n = 10)
## caseid year coninc satjob
## 57050 57050 2012 6894 Mod. Satisfied
## 57051 57051 2012 4213 Very Satisfied
## 57052 57052 2012 63195 Mod. Satisfied
## 57054 57054 2012 14363 A Little Dissat
## 57056 57056 2012 383 Mod. Satisfied
## 57057 57057 2012 14363 Mod. Satisfied
## 57058 57058 2012 383 Mod. Satisfied
## 57059 57059 2012 76600 Very Satisfied
## 57060 57060 2012 14363 Mod. Satisfied
## 57061 57061 2012 383 A Little Dissat
