Introduction

This project will allow you to pull together many of the concepts you are studying this semester,

including organizing and analyzing data, drawing conclusions using confidence intervals and

hypothesis tests, and presenting your work in a well-organized paper. Your overall report will

be a narrative that clearly explains your process and your conclusions. Your included

mathematical calculations may be neatly handwritten and scanned, but overall summaries and

written conclusions must be word processed.

This is the signature assignment for this course. It is to be posted in your e-portfolio as a single

document. Your e-portfolio must be linked through your SLCC MyPage account. See the specific

e-portfolio instructions for details.

Data Collection

Because of the scope of this project (and because of issues with access to data), data from the

year 2014 has been collected in advance for you to analyze. This is actual data collected from a

simple random sample taken from the US Department of Labor statistics on college graduates.

Just in case you were thinking these numbers are a little high, know that these starting

compensations include the value of benefit packages (retirement contributions, insurance,

etc.). Using the information provided in the Excel file, calculate the following (using Excel

functions):

Major Field                     Sample Mean    Sample Std. Dev.
Business                        51537          7232
Communications                  46227          7214
Computer Science                59542          4670
Education                       40021          2365
Engineering                     60664          7879
Humanities & Social Sciences    38817          7146
Math & Sciences                 41923          4938

Report Introduction

Your report must begin with an introduction explaining the process of the project. Do not

assume that your reader knows in advance what the assignment is about. Get started on your

introduction paragraph. Explain the overall procedures and goals of the assignment in your

own words. Note that you will be editing and adding to this introduction as you proceed

through all the parts of the project.

Whenever data is collected, it is usually in the form of tables or arrays that make it difficult
to observe patterns. In addition, it is difficult to draw meaningful real-life conclusions from
raw data, or to compare it with data obtained from another sample of the same population.
Statistical methods provide a consistent way of organizing and analyzing data, drawing
conclusions using confidence intervals and hypothesis tests, and presenting the data in an
organized fashion.

This report summarizes the 2014 starting compensation data for college graduates employed in
the major fields. Histograms were used to show the distribution of the data in each field and
to suggest the distribution curve that best fits it. The central tendency of the data was
explored using the mean and median. The spread of the data was checked by calculating the
standard deviation, quartiles, and maximum and minimum values. The results of the analysis
were compared across the major fields in order to draw evidence-based conclusions.

First, you will make a histogram for compensation distribution for each Major Field. Your

graphics must have descriptive titles and be appropriately labeled. Throughout this project,

represent calculated dollar values as whole numbers, and use round numbers for your

histogram categories.
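The grouping step behind a histogram can be sketched in a few lines of Python. This is only an illustration of how values fall into round-number categories: the compensation values are made up (they are not from the project's Excel file), and the function name `bin_counts` and the category width of 5,000 are assumptions.

```python
from collections import Counter

def bin_counts(values, width=5000):
    """Count how many values fall in each category [k*width, (k+1)*width)."""
    counts = Counter((int(v) // width) * width for v in values)
    # Label each category by its inclusive whole-dollar range.
    return {f"{lo:,}-{lo + width - 1:,}": counts[lo] for lo in sorted(counts)}

# Hypothetical compensation values (NOT taken from the project data).
sample = [38200, 41750, 44900, 45000, 47300, 52100, 39999, 40000]

# Print a simple text histogram, one row per category.
for label, freq in bin_counts(sample).items():
    print(f"{label:>15}  {'#' * freq}  ({freq})")
```

Using round category boundaries like these keeps the horizontal axis easy to read, which is the reason the instructions ask for round numbers.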

Write a paragraph discussing your observations of this data. Do the graphs reflect what you

expected to see? Are the distributions skewed or symmetric? Are they Uniform, Normal, or do

they seem to follow some other distribution? Comment on the differences amongst different

Major Fields. In which Major Field does your intended career lie?

[Figures: histograms of starting compensation for each Major Field. Vertical axis: Frequency; horizontal axis: Compensation Category (lowest category 0-35,000).]

Calculate the 5-Number Summary for each Major Field (please use Excel to make these

calculations). Make boxplots for the seven categories. Do any of them seem to have any

outliers? If so, give an explanation of why you think certain Major Fields might have such

extreme high or low values. Also determine if normal, t-distribution, and χ²-distribution
procedures seem to be appropriate for this data.

Write a paragraph discussing your observations of this data. Do any of them seem to have any

outliers? If so, give an explanation of why you think certain Major Fields might have such

extreme high or low values. Also determine if normal and t-distribution procedures seem to

be appropriate for this data.
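The 5-number summary and a common outlier check can also be sketched outside Excel. The example below is a minimal illustration, assuming the usual boxplot rule (points more than 1.5 × IQR beyond the quartiles are flagged). The data values and the function names are hypothetical, and the quartile method is chosen to mirror Excel's QUARTILE.INC interpolation as I understand it.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # method="inclusive" interpolates between data points, intended to
    # mirror Excel's QUARTILE.INC convention.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

def outliers(values):
    """Flag values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    fence = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - fence or v > q3 + fence]

# Hypothetical compensation values (NOT taken from the project data).
sample = [36000, 38000, 39000, 40000, 41000, 42000, 43000, 44000, 70000]

print(five_number_summary(sample))
print(outliers(sample))
```

A value flagged this way is only a candidate outlier; whether it reflects a data error or a genuinely unusual compensation package still has to be argued in the written discussion.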

Confidence Interval Estimates

Explain in general the purpose and meaning of a confidence interval.

Whenever a parameter of interest (e.g., the mean) of a population needs to be
investigated, a sample is drawn from the population for analysis. The purpose of
taking a random sample and computing the corresponding statistic (e.g., the sample
mean) is to approximate the population parameter. Every random sample will
produce a different value of that statistic. A confidence interval addresses this
issue because it provides a range of values that is likely to contain the population
parameter of interest. The confidence level describes how often intervals built this
way, from repeated random samples, would capture the true parameter.

95% confidence interval estimate for the mean starting compensation for students graduating

in Humanities and Social Sciences.

Confidence interval = sample mean ± (z0.975 × standard deviation)/√(sample size)
Confidence interval = 38,817 ± (1.96 × 7,146)/√50
Confidence interval = 38,817 ± 1,981
Confidence interval = (36,836, 40,798)
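As a cross-check on the arithmetic, the interval can be recomputed in Python. The sketch below assumes the standard large-sample formula, sample mean ± z × (standard deviation / √n), and uses the Humanities and Social Sciences figures quoted in this report (mean 38,817, standard deviation 7,146, n = 50). The function name is hypothetical.

```python
import math

def mean_confidence_interval(mean, std_dev, n, z):
    """Return (lower, upper) for mean +/- z * std_dev / sqrt(n)."""
    margin = z * std_dev / math.sqrt(n)
    return mean - margin, mean + margin

# Humanities & Social Sciences figures from the report; z = 1.96 for 95%.
low, high = mean_confidence_interval(mean=38817, std_dev=7146, n=50, z=1.96)
print(round(low), round(high))
```

The margin of error shrinks with the square root of the sample size, so quadrupling the sample would only halve the width of the interval.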

80% confidence interval estimate for the proportion of all students with starting

compensation over $50,000.

Sample proportion with compensation above 50,000= 149/350=0.43

Confidence interval

= sample proportion ± z0.90 × √[sample proportion × (1 - sample proportion)/sample size]
= 0.43 ± 1.28 × √[0.43 × (1 - 0.43)/350]
= 0.43 ± 0.03

=(0.40, 0.46)
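The same calculation can be sketched in Python, assuming the usual large-sample interval for a proportion, p̂ ± z × √(p̂(1 - p̂)/n). The function name is hypothetical. Note that this version carries the unrounded proportion 149/350 through the calculation, so its lower limit comes out nearer 0.39 than the 0.40 obtained above from values rounded to two decimals.

```python
import math

def proportion_confidence_interval(successes, n, z):
    """Return (lower, upper) for p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 149 of the 350 sampled graduates were above $50,000; z = 1.28 for 80%.
low, high = proportion_confidence_interval(successes=149, n=350, z=1.28)
print(f"{low:.3f} to {high:.3f}")
```

Rounding only at the final step is generally the safer habit, because early rounding can shift a limit by a full percentage point, as it does here.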

Discuss and interpret the results of each of your interval estimates. Include neatly written and
scanned copies of your work.

Based on our sample data, we are 95% confident that the mean starting compensation for
students graduating in Humanities and Social Sciences is between $36,836 and $40,798. This
means that if we repeatedly drew random samples of the same size and built an interval from
each one, about 95 out of every 100 such intervals would contain the true population mean.

Based on our sample data, we are 80% confident that the proportion of all students with
starting compensation over $50,000 is between 40% and 46%. This means that if we repeatedly
drew random samples of the same size and built an interval from each one, about 80 out of
every 100 such intervals would contain the true population proportion.
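One way to see what a confidence level describes is a small simulation: draw many random samples from a population whose mean is known, build an interval from each, and count how many intervals capture the true mean. The sketch below does this under stated assumptions. The population is taken to be normal, and its mean (40,000) and standard deviation (7,000) are hypothetical values chosen only for illustration, as is the function name.

```python
import math
import random
import statistics

def coverage_rate(true_mean, true_sd, n, z, trials, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        margin = z * statistics.stdev(sample) / math.sqrt(n)
        # The interval captures the true mean when the sample mean
        # lies within one margin of error of it.
        if abs(statistics.mean(sample) - true_mean) <= margin:
            hits += 1
    return hits / trials

# Hypothetical population parameters, chosen only for illustration.
rate = coverage_rate(true_mean=40000, true_sd=7000, n=50, z=1.96, trials=2000)
print(f"{rate:.1%} of intervals captured the true mean")
```

The printed rate should land close to 95%. The confidence level is a statement about this long-run capture rate of the method, not about where individual compensation values or future sample means will fall.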

Hypothesis Tests

Explain in general the purpose and meaning of a hypothesis test.

A hypothesis test is a statistical test that is used to examine whether there is enough evidence in a

sample of data to infer that a certain condition is true for the entire population. Such a test

examines two opposing hypotheses about a population: the null hypothesis and the alternative

hypothesis. The null hypothesis is the statement being tested. Usually the null hypothesis is a

statement of "no effect" or "no difference". The alternative hypothesis is the statement you want

to be able to conclude is true. Based on the available data, the test determines whether to reject

the null hypothesis. The criterion for this decision is the p-value. If the p-value is less

than or equal to the level of significance, which is a cut-off point that is pre-defined, then you

can reject the null hypothesis.

Use a 0.05 significance level to test the claim that students graduating in Education have an

average starting compensation of under $35,000.

H0: μ = 35,000
Ha: μ < 35,000 (claim)
z = (sample mean - value in hypothesis)/(standard deviation/√(sample size))
z = (40,021 - 35,000)/(2,365/√50)
z = 15.01
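The test statistic can be cross-checked in Python. This sketch assumes the one-sample z formula, (sample mean - hypothesized value)/(standard deviation/√n), and a left-tailed test, since the claim is "under $35,000". It uses the Education figures quoted in this report (mean 40,021, standard deviation 2,365, n = 50); the function name is hypothetical. With these inputs the formula evaluates to about 15.0.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, hypothesized_mean, std_dev, n):
    """z statistic for a one-sample test of a mean."""
    return (sample_mean - hypothesized_mean) / (std_dev / math.sqrt(n))

# Education figures from the report.
z = one_sample_z(sample_mean=40021, hypothesized_mean=35000, std_dev=2365, n=50)

# Left-tailed test at alpha = 0.05: reject H0 only if z falls below this value.
critical = NormalDist().inv_cdf(0.05)
# p-value for a left-tailed test is P(Z <= z).
p_value = NormalDist().cdf(z)

print(f"z = {z:.2f}, critical value = {critical:.3f}, reject H0: {z < critical}")
```

Because the sample mean is above $35,000, the statistic is large and positive, so it cannot fall in the left-tail rejection region no matter how small the standard error is.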

Use a 0.01 significance level to test the claim that 80% of students graduating with a college

degree will find a starting compensation package valued at over $40,000.

Proportion of graduates with starting compensation package valued at over $40,000= 269/350

=0.77

H0: p = 0.8
Ha: p > 0.8 (claim)
z = (0.77 - 0.8)/√[0.8 × (1 - 0.8)/350]
z = -1.40
Critical value for an upper-tailed test at the 0.01 significance level = 2.33
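The proportion test can be cross-checked the same way. The sketch below assumes the one-proportion z formula, (p̂ - p0)/√(p0(1 - p0)/n), and the upper-tailed setup used above. It takes the rounded sample proportion 0.77 from this report; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def one_proportion_z(p_hat, p0, n):
    """z statistic for a one-sample test of a proportion."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Rounded sample proportion from the report, tested against p0 = 0.80.
z = one_proportion_z(p_hat=0.77, p0=0.80, n=350)

# Upper-tailed test at alpha = 0.01: reject H0 only if z exceeds this value.
critical = NormalDist().inv_cdf(1 - 0.01)

print(f"z = {z:.2f}, critical value = {critical:.2f}, reject H0: {z > critical}")
```

Note that the standard error here uses the hypothesized proportion 0.8 rather than the sample proportion, which is the usual convention for a test (as opposed to a confidence interval).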

Discuss and interpret the results of each of your two hypothesis tests. Include neatly written and

scanned copies of your work.

Since the z-value is greater than the critical value of -1.645, we fail to reject the null
hypothesis that students graduating in Education have an average starting compensation equal
to $35,000. At the 0.05 significance level, there is not sufficient evidence to support the
claim that the average is under $35,000.

Since the z-value is less than the critical value of 2.33, we fail to reject the null
hypothesis that 80% of graduating students find a starting compensation package valued at
over $40,000. At the 0.01 significance level, there is not sufficient evidence to support the
alternative hypothesis.

Reflection

Interval estimates and hypothesis tests are appropriate when the following conditions are met:

1. Independence Assumption:
a. Random sampling condition: the sample must be random and should be less than 10%
of the population size.

2. Nearly Normal Condition: The sample data has a roughly symmetric distribution, so we can
assume that it comes from a nearly normal population. In addition, when the sample
size is large (here, 50 per Major Field), we can treat the sampling distribution of the
mean as approximately normal.

My samples meet both conditions. The sample was selected by simple random sampling and is
clearly less than 10% of the population, so the independence assumption is met. In addition,
the histograms and the sample size of 50 per Major Field support treating the data as
approximately normal.

The possible errors that can be made are Type I and Type II errors.

A Type I error occurs when the null hypothesis is true and you reject it. The
probability of making a Type I error is α, which is the level of significance you set for your
hypothesis test. An α of 0.05 indicates that you are willing to accept a 5% chance of
rejecting the null hypothesis when it is actually true. To lower this risk, you must use a
lower value for α. However, using a lower value for α means that you will be less likely
to detect a true difference if one really exists.

A Type II error occurs when the null hypothesis is false and you fail to reject it. The
probability of making a Type II error is β, which depends on the power of the test
(power = 1 - β). You can decrease your risk of committing a Type II error by ensuring your
test has enough power. You can do this by ensuring your sample size is large enough to
detect a practical difference when one truly exists.
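The link between sample size and the risk of a Type II error can be illustrated with a short power calculation. The sketch below assumes a left-tailed z test with a known standard deviation, for which power = Φ(z_α + (μ0 - μ_true)/(σ/√n)). The "true" mean of 34,000 is a hypothetical value chosen only for illustration; the standard deviation 2,365 is the Education figure from this report, and the function name is an assumption.

```python
import math
from statistics import NormalDist

def left_tailed_power(mu0, mu_true, sigma, n, alpha):
    """Power of a left-tailed z test of H0: mu = mu0 when the mean is mu_true."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(alpha)  # rejection cutoff on the z scale
    shift = (mu0 - mu_true) / (sigma / math.sqrt(n))
    return std_normal.cdf(z_alpha + shift)

# Hypothetical scenario: the true mean is 1,000 below the hypothesized 35,000.
for n in (10, 50, 200):
    power = left_tailed_power(mu0=35000, mu_true=34000, sigma=2365, n=n, alpha=0.05)
    print(f"n = {n:>3}: power = {power:.2f}, beta = {1 - power:.2f}")
```

As n grows, the power rises and β falls, which is the sense in which a larger sample protects against Type II errors. When the true mean equals the hypothesized mean, the same formula returns α, the Type I error rate.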

The sampling can be improved by increasing the sample size for each major field.

From this data, it can be concluded that graduating students in major fields receive

compensations whose median and mean are different from one another.

You may use one or more of the following ideas to build your reflection paper. Your paper must

be at least one page (double spaced) and use correct spelling and grammar. See specific
e-portfolio instructions in your syllabus.

What have you learned as a result of this project?

Discuss how the math skills that you applied in this project will impact other classes you

will take in your school career.

Identify specific parts of the project and your own process in completing the project that

may have applications for other classes.

Discuss how the project helped to develop your problem solving skills.

Discuss how this project changed the way you think about real-world math applications.

If your thinking was not changed, then discuss how the project supported your views

about real-world math applications.

Kanlayanee Phothiworn

Spencer Bartholomew

Math 1040

December 8, 2016

What have you learned as a result of this project?

From this project, I have learned how to use the mean, standard deviation, and 5-number summary.
Incorporating graphs, Excel, and boxplots through this project contributed a great deal to my
learning. I love how I was able to use different formulas to create a good Excel spreadsheet.

Working in a group is hard because we all have different schedules and are busy with our daily
lives. It was great to learn about other people who are from different countries and have different
beliefs. It also helped me realize that math, and statistics in particular, is a nerve-racking subject
for all of us, just as it is in most countries. I'm glad that our team communicated well with one
another and worked in unity.

Discuss how the math skills that you applied in this project will impact other classes you will take in
your school career.

I know I will use math skills in my daily life, and I know that the higher-level science classes
I am going to take in the future will use sample means. One of my friends told me that we will
be using statistical applications in Biology 1610.

Identify specific parts of the project and your own process in completing the project that may have
applications for other classes.

I worked with my team to organize the final work on the project, such as making sure the
materials were printed, checking the printouts for errors, and collecting ideas from the group.

Discuss how the project helped to develop your problem solving skills.

This project helped me resolve problems by talking to other people. I learned that
work can be done more efficiently as a group than as an individual. It also helped me
understand that individuals can contribute in their own significant way. This knowledge will help
me in the future to be more open to others' ideas when completing a piece of work.

Discuss how this project changed the way you think about real-world math applications. If

your thinking was not changed, then discuss how the project supported your views about

real-world math applications.

Yes, this project has changed the way I think about real-world math applications. It

helped me to think critically to solve a problem.

