Attribution Non-Commercial (BY-NC)


Ensuring one's own survival is one of the basic instincts that drive human beings. However, other factors such as one's age, gender, wealth, education, cultural or religious norms and beliefs, and care for one's family may affect how much importance a person places on others' lives. RMS Titanic was a passenger liner that struck an iceberg on her maiden voyage from Southampton, England, to New York City, and sank on 15 April 1912, resulting in the deaths of 1,517 people in one of the deadliest peacetime maritime disasters in history. This report presents a short statistical analysis of the passenger data. It first looks for associations or differences among the variables recorded about the passengers, and then attempts to ascribe reasons for the observed patterns.

I. Is there a significant difference in Age distribution between those who survived and those who did not?

To start with, a simple pictorial representation of data is presented. We break the given data into two sections based on survival. The histograms of age of survivors and non-survivors are shown below:

From the above histograms, we can comment that the age distribution seems different for the two samples. It seems that the infants (less than 5 years of age) survived more. The Box-plots of the same are presented:

The random variable of interest is age. Let X1 denote the age of survivors and X2 the age of non-survivors. To answer the first question, we start with the following assumptions:

1. X1 and X2 are continuous.
2. The samples of X1 and X2 are independent and identically distributed.

Let us check the assumption of normality of the populations of survivors and non-survivors.

The p-values for all the tests for the two samples are tabulated as follows:

Test                                              p-value (Survived)   p-value (Non-Survived)
Shapiro-Wilk normality test                       0.0004997854         1.461442e-09
Anderson-Darling normality test                   0.0017831268         5.014279e-16
Cramer-von Mises normality test                   0.0068092810         4.824687e-10
Lilliefors (Kolmogorov-Smirnov) normality test    0.0322461921         2.646866e-16
Shapiro-Francia normality test                    0.0020591830         1.719147e-08

The p-values for all the tests suggest that neither the population of survivors nor that of non-survivors is normal (at an assumed α-value of 0.05).

In addition to the above assumptions, we assume that the CDFs of X1 and X2 have the same shape. This allows us to apply the Wilcoxon rank-sum test. The calculated p-value for the Wilcoxon rank-sum test (0.19) does not provide enough evidence against Ho (Ho: X1 is stochastically equal to X2).

However, the histograms and ECDFs of survivors and non-survivors suggest a marked difference in survival probability for the 0-5 years age group. We therefore do a Kolmogorov-Smirnov two-sample test and get a p-value of 0.03428, which indicates a significant difference in age distributions (at an α-value of 0.05).

Based on the observed data and our intuition from the histograms, we categorise survivors and non-survivors into the age categories 0 to 5, 5+ to 15, 15+ to 30, 30+ to 45, 45+ to 60, and 60+. We then do a chi-square test to see whether survivors and non-survivors have a homogeneous distribution across these age categories. We get a p-value of 5.477 × 10^-6, which supports our belief that there is a difference in the age distributions of survivors and non-survivors.

Since the sample size of survivors is 313 and that of non-survivors is 443, we can do a z-test for a difference in proportions for each age category separately. Writing p(category; group) for the proportion of a group that falls in an age category, the null hypotheses are:

p(0-5; Survivors) = p(0-5; Non-Survivors)
p(5-15; Survivors) = p(5-15; Non-Survivors)
p(15-30; Survivors) = p(15-30; Non-Survivors)
p(30-45; Survivors) = p(30-45; Non-Survivors)
p(45-60; Survivors) = p(45-60; Non-Survivors)
p(60+; Survivors) = p(60+; Non-Survivors)

We began with two-tailed tests, and single-tailed tests were done wherever the null was rejected. On performing the z-tests, we get the following p-values and conclusions:

Age Category   P value   Conclusion
0 to 5         2.8e-6    p(0-5; Survivors) > p(0-5; Non-Survivors)
5+ to 15       0.208     p(5-15; Survivors) = p(5-15; Non-Survivors)
15+ to 30      4.09e-4   p(15-30; Survivors) < p(15-30; Non-Survivors)
30+ to 45      0.556     p(30-45; Survivors) = p(30-45; Non-Survivors)
45+ to 60      0.1578    p(45-60; Survivors) = p(45-60; Non-Survivors)
60+            0.0355    p(60+; Survivors) < p(60+; Non-Survivors)
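The per-category z-test for a difference in proportions can be sketched as follows. This is a minimal illustration using the pooled standard error, which is one common form of the test and may not be exactly what the authors ran. The function name and the category counts (30 and 15) are hypothetical, since the report does not give per-category counts. Only the sample sizes 313 and 443 come from the text.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z-test of H0: p1 == p2.

    Returns (z, two-sided p-value, one-sided p-value for Ha: p1 > p2).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal tail areas via the complementary error function.
    two_sided = math.erfc(abs(z) / math.sqrt(2))
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))
    return z, two_sided, upper_tail


N_SURVIVORS, N_NON_SURVIVORS = 313, 443  # sample sizes stated in the report

# Hypothetical counts of passengers in one age category (NOT from the report).
z, p_two_sided, p_greater = two_proportion_z_test(
    30, N_SURVIVORS, 15, N_NON_SURVIVORS
)
print(f"z = {z:.3f}, two-sided p = {p_two_sided:.4g}, one-sided p = {p_greater:.4g}")
```

Following the procedure described above, the one-sided p-value would be consulted only after the two-sided test rejects the null for that category.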

The above analysis suggests that there is a significant difference in age distribution between those who survived and those who did not.

II. (a) Is there a significant difference in Age distribution between male survivors and male non survivors?

Histograms and box-plots for male survivors and non-survivors are presented and compared. The distributions do not seem to be normal, as supported by the normality tests. They do not even look alike. It seems that infant males survived more and old males died more.

Normality tests on data

The p-values for all the tests for the two samples of survivors and non-survivors are tabulated as follows:

Test                                              p-value (Survived)   p-value (Non-Survived)
Shapiro-Wilk normality test                       0.004201117          6.366845e-10
Anderson-Darling normality test
Cramer-von Mises normality test
Lilliefors (Kolmogorov-Smirnov) normality test
Shapiro-Francia normality test                    0.013508441          8.325134e-09

The p-values for all the tests suggest that neither the population of male survivors nor that of male non-survivors is normal (at an assumed α-value of 0.05). A Kolmogorov-Smirnov two-sample test indicates that the two samples come from different distributions (p-value = 0.002), implying a significant difference in the age distributions of male survivors and male non-survivors.

We use the same approach as in part I, dividing the population into age categories to find out whether survival probability depends on age category, the only difference being that here the two samples come from the male passengers. The chi-square p-value of 1.47e-11 implies that the populations of male survivors and non-survivors are not homogeneous with respect to age categories. Thus, we go ahead with 6 separate z-tests, one for each age category. Writing p(category; group) for the proportion of a group that falls in an age category, the null hypotheses are:

p(0-5; Male_Survivors) = p(0-5; Male_Non-Survivors)
p(5-15; Male_Survivors) = p(5-15; Male_Non-Survivors)
p(15-30; Male_Survivors) = p(15-30; Male_Non-Survivors)
p(30-45; Male_Survivors) = p(30-45; Male_Non-Survivors)
p(45-60; Male_Survivors) = p(45-60; Male_Non-Survivors)
p(60+; Male_Survivors) = p(60+; Male_Non-Survivors)

We began with two-tailed tests, and single-tailed tests were done wherever the null was rejected. On performing the z-tests, we get the following p-values and conclusions:

Age Category   P value   Conclusion
0 to 5                   p(0-5; Male_Survivors) > p(0-5; Male_Non-Survivors)
5+ to 15                 p(5-15; Male_Survivors) > p(5-15; Male_Non-Survivors)
15+ to 30                p(15-30; Male_Survivors) < p(15-30; Male_Non-Survivors)
30+ to 45                p(30-45; Male_Survivors) = p(30-45; Male_Non-Survivors)
45+ to 60                p(45-60; Male_Survivors) = p(45-60; Male_Non-Survivors)
60+                      p(60+; Male_Survivors) < p(60+; Male_Non-Survivors)

The above analysis suggests that there is a significant difference in age distribution between male survivors and male non-survivors.

II. (b) Is there a significant difference in Age distribution between females who survived and those who did not?

Histograms and box-plots for female survivors and non-survivors are compared. The distributions do not seem to be normal, as supported by the normality tests.

Normality tests on data

The p-values for all the tests for the two samples of female survivors and female non-survivors are tabulated below.

Tests: Shapiro-Wilk, Anderson-Darling, Cramer-von Mises, Lilliefors (Kolmogorov-Smirnov) and Shapiro-Francia normality tests.

p-value (Survived)   p-value (Non-Survived)
0.0077707718         0.11744076

The p-values for the tests suggest that the sample of female survivors is not normal, whereas that of female non-survivors is consistent with a normal distribution (at an assumed α-value of 0.05). This suggests that the distributions are not the same. To reinforce this, we do a Kolmogorov-Smirnov two-sample test. It also indicates that the two samples come from different distributions (p-value = 0.01326), implying a significant difference in the age distributions of female survivors and female non-survivors.

We use the same approach as in part I, dividing the population into age categories to find out whether survival probability depends on age category, the only difference being that here the two samples come from the female passengers and not from the total sample. The chi-square p-value of 0.03454 implies that the populations of female survivors and non-survivors are not homogeneous with respect to age categories. Thus, we go ahead with 6 separate z-tests, one for each age category. Writing p(category; group) for the proportion of a group that falls in an age category, the null hypotheses are:

p(0-5; Female_Survivors) = p(0-5; Female_Non-Survivors)
p(5-15; Female_Survivors) = p(5-15; Female_Non-Survivors)
p(15-30; Female_Survivors) = p(15-30; Female_Non-Survivors)
p(30-45; Female_Survivors) = p(30-45; Female_Non-Survivors)
p(45-60; Female_Survivors) = p(45-60; Female_Non-Survivors)
p(60+; Female_Survivors) = p(60+; Female_Non-Survivors)

We began with two-tailed tests, and single-tailed tests were done wherever the null was rejected. On performing the z-tests, we get the following p-values and conclusions:

Conclusion
p(0-5; Female_Survivors) = p(0-5; Female_Non-Survivors)
p(5-15; Female_Survivors) = p(5-15; Female_Non-Survivors)
p(15-30; Female_Survivors) = p(15-30; Female_Non-Survivors)
p(30-45; Female_Survivors) = p(30-45; Female_Non-Survivors)
p(45-60; Female_Survivors) > p(45-60; Female_Non-Survivors)
p(60+; Female_Survivors) = p(60+; Female_Non-Survivors)

The above analysis suggests that there is a significant difference in age distribution between female survivors and female non-survivors.

III. Remark on how Age affected the Survival Probability of a passenger on board the Titanic, based on consolidations of your findings in 1 and 2 above.

The findings in 1 and 2 above suggest that females had a higher survival probability than males. Among male passengers, infants and teenagers had a higher survival probability, whereas the age groups of 15 to 30 and above 60 years had a lower survival probability. Among female passengers, the age group of 45 to 60 had a higher survival probability. Possible reasons are that women and children were given preference in boarding the lifeboats, and that the old may have chosen to sacrifice their lives for the young.

IV. Is there a significant difference in Survival Probability between the two genders?

Ho: No difference in the survival probability of the two genders, viz. male and female.
Ha: Significant difference in the survival probability of the two genders, viz. male and female (two-sided).

Data: The table below displays the problem's data:

Survivor   Non-Survivor   Total

Test adopted for testing the hypothesis: Since it is a problem of proportions and we would like to compare the survival probabilities of males and females, we can use the following tests:

1. Fisher's exact test
2. Z-test

Fisher's exact test is the more powerful test in this case, but we can also do a Z-test as the sample size is large.

The p-value of Fisher's exact test is found to be less than 2.2 × 10^-16. Since the p-value is very small, there is sufficient evidence to reject the null hypothesis.

Conclusion: On the basis of Fisher's exact test, we conclude that there is a significant difference in the survival probability of the two genders. The true odds ratio is not equal to 1. The odds ratio is found to be 0.1003494.

Note: The odds ratio is defined as the ratio of the odds of an event occurring in one group to the odds of it occurring in another group.

The p-value of the Z-test is found to be approximately 0. Since the p-value is very small, there is sufficient evidence to reject the null hypothesis.
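The definition of the odds ratio in the note can be written out explicitly. The symbols p_M and p_F (survival probabilities of males and females) are introduced here for illustration, and taking males as the first group is an assumption, since the report does not state the ordering.

```latex
\[
\text{odds}_g = \frac{p_g}{1 - p_g},
\qquad
\mathrm{OR} = \frac{p_M / (1 - p_M)}{p_F / (1 - p_F)}
\]
```

Under that assumed ordering, an odds ratio of about 0.10 would mean that the odds of survival for males were roughly one tenth of those for females. Software that reports a conditional maximum-likelihood estimate of the odds ratio will give a value slightly different from the simple ratio computed from the observed counts.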

[Figure: bar chart of Survival Probability for Males and Females; vertical axis from 0% to 70%]

Conclusion: On the basis of Z-test we conclude that there is a significant difference in the survival probability of the two genders.

V. Is there a significant difference in Survival Probability among the three passenger classes?

We do a chi-square test to check the following hypotheses:

Ho: The populations of survivors and non-survivors are distributed homogeneously across the three passenger classes.
Ha: The populations of survivors and non-survivors are not distributed homogeneously across the three passenger classes.

We have the following data:

                      Survivors   Non-Survivors
Passenger Class I     193         129
Passenger Class II    119         161
Passenger Class III   138         573

The p-value of 2.2 × 10^-16 suggests that there is enough evidence to reject the null hypothesis (at an α-value of 0.05). It can be said that there is a significant difference between the population distributions across passenger classes. We further break down the data to compare the classes. We did single-tailed Fisher's tests, taking two classes at a time. This helped us find which passenger class had the better chance of survival. It was observed that the survival probability is highest for Class I, followed by Class II, with Class III having the lowest probability of survival.
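The chi-square test of homogeneity on the class table can be reproduced with a short sketch. The counts are those given in the report. The function and variable names are illustrative, and the closed-form tail probability exp(-x/2) is valid only for 2 degrees of freedom, which is the case for this 2 × 3 table.

```python
import math


def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Counts from the report: columns are Class I, Class II, Class III.
survivors = [193, 119, 138]
non_survivors = [129, 161, 573]

stat, dof = chi_square_homogeneity([survivors, non_survivors])

# Upper tail of the chi-square distribution; this shortcut holds only for dof == 2.
p_value = math.exp(-stat / 2)

# Observed survival proportion in each class.
rates = [s / (s + n) for s, n in zip(survivors, non_survivors)]

print(f"chi-square = {stat:.2f}, dof = {dof}, p-value = {p_value:.3g}")
print("survival proportions by class:", [round(r, 3) for r in rates])
```

The observed proportions decrease from Class I to Class III, which matches the ordering reported from the pairwise tests.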

[Figure: bar chart of the proportion Survived for Class I, Class II and Class III; vertical axis from 0% to 70%]

The above conclusion agrees with the common knowledge that passengers in first class had the first option to board the lifeboats, while passengers in third class were the last to board them.

VI. Is there a significant difference in Survival Probability between the two genders even after taking the effect of Passenger Class into Account?

We make three 2×2 contingency tables, one for each class, and do Fisher's test on each as follows:

Class I          Male   Female
Survivors        59     134
Non Survivors    120    9

We did a two-sided Fisher's test, which yielded a p-value of less than 2.2e-16, i.e., there is a significant difference in Survival Probability between the two genders for Class I. We then did a one-sided Fisher's test with the alternative hypothesis that the survival probability of males is less than that of females given that they belong to Class I. Here too we reject the null in favor of the alternative, because we get a p-value of less than 2.2e-16.
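Fisher's exact test on the Class I table can be sketched directly from the hypergeometric distribution. This is a minimal illustration with a hypothetical function name. For the two-sided p-value it sums the probabilities of all tables that are no more likely than the observed one, which is one common convention and an assumption here, since the report does not say which convention its software used.

```python
from fractions import Fraction
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the table [[a, b], [c, d]].

    Returns (two-sided p-value, one-sided p-value for P(X <= a)),
    where X is the top-left cell with all margins held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    denom = comb(n, row1)

    def pmf(k):
        # Hypergeometric probability of k in the top-left cell (exact arithmetic).
        return Fraction(comb(col1, k) * comb(n - col1, row1 - k), denom)

    p_observed = pmf(a)
    two_sided = sum(p for p in map(pmf, range(lowest, highest + 1)) if p <= p_observed)
    lower_tail = sum(pmf(k) for k in range(lowest, a + 1))
    return float(two_sided), float(lower_tail)


# Class I counts from the report: rows are Survivors / Non Survivors,
# columns are Male / Female.
p_two_sided, p_less = fisher_exact_2x2(59, 134, 120, 9)

# Simple odds ratio from the observed counts (male odds over female odds).
sample_odds_ratio = (59 * 9) / (134 * 120)

print(f"two-sided p = {p_two_sided:.3g}, one-sided p = {p_less:.3g}")
print(f"sample odds ratio = {sample_odds_ratio:.4f}")
```

The lower-tail p-value corresponds to the alternative that males had a lower survival probability than females, because small counts of male survivors are the tables in that tail.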

Class II         Male   Female
Survivors        25     94
Non Survivors           13

We did a two-sided Fisher's test, which yielded a p-value of less than 2.2e-16, i.e., there is a significant difference in Survival Probability between the two genders for Class II. We then did a one-sided Fisher's test with the alternative hypothesis that the survival probability of males is less than that of females given that they belong to Class II. Here too we reject the null in favor of the alternative, because we get a p-value of less than 2.2e-16.

Class III        Male   Female
Survivors        58     80

We did a two-sided Fisher's test, which yielded a p-value of less than 2.2e-16, i.e., there is a significant difference in Survival Probability between the two genders for Class III. We then did a one-sided Fisher's test with the alternative hypothesis that the survival probability of males is less than that of females given that they belong to Class III. Here too we reject the null in favor of the alternative, because we get a p-value of 9.239e-15.

Conclusion: There is a significant difference in Survival Probability between the two genders even after taking the effect of Passenger Class into account for all the classes. The above three conclusions also fall in line with the cultural norm of saving the fairer sex.
