
1. What is statistics?

Everything that deals even remotely with the collection, processing, interpretation and
presentation of data belongs to the domain of statistics, and so does the detailed planning
that precedes all these activities.
Statistical methods can be used to answer questions such as:
- What kind of data, and how much, need to be collected?
- How should we organize and summarize the data?
- How can we analyse the data and draw conclusions from it?
- How can we assess the strength of the conclusions and evaluate their uncertainty?
That is, statistics provides methods for
a. Design: Planning and carrying out research studies.
b. Description: Summarizing and exploring data.
c. Inference: Making predictions and generalizing about phenomena represented by the data.

A major objective of statistics is to make inferences about a population from an analysis of
the information contained in sample data. This includes assessing the extent of uncertainty
involved in these inferences.
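
To make the idea of inference with uncertainty concrete, here is a minimal sketch that estimates a population mean from a sample and attaches an approximate 95% confidence interval. The population is simulated (the heights are hypothetical), the helper name `mean_with_ci` is made up for this illustration, and the interval uses the normal approximation, which is an assumption that suits moderately large samples.

```python
import math
import random
import statistics

def mean_with_ci(sample, z=1.96):
    """Return (mean, lower, upper) for an approximate 95% confidence interval.

    Uses the normal approximation mean +/- z * s / sqrt(n); this is an
    assumption that is reasonable only for moderately large samples.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return mean, mean - z * std_err, mean + z * std_err

# Hypothetical population: 10,000 simulated heights in cm.
rng = random.Random(0)
population = [rng.gauss(170, 10) for _ in range(10_000)]

# In practice we only ever see a sample, here 100 draws without replacement.
sample = rng.sample(population, 100)

mean, lower, upper = mean_with_ci(sample)
print(f"sample mean = {mean:.1f} cm, approx. 95% CI = ({lower:.1f}, {upper:.1f})")
```

The width of the interval is the "extent of uncertainty": it shrinks as the sample grows, roughly in proportion to one over the square root of the sample size.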

2. What is the difference between nominal, ordinal, interval and ratio variables?

A simple example illustrates the difference among the four basic measurement scales:
nominal, ordinal, interval and ratio.
Consider a 100-meter race in a tournament in which three runners from three different states
of Malaysia take part. Each runner is assigned a number (displayed on the uniform) to tell the
runners apart. This number is an example of a nominal scale: it identifies a runner but carries
no order or magnitude. Once the race is over, the winner, the first runner-up and the second
runner-up are declared according to who reaches the finish line first, second and last. The
rank order of the runners (1 for the winner, 2 for the first runner-up, 3 for the second
runner-up) is an example of an ordinal scale.
During the tournament, a judge is asked to rate each runner on a scale of 1 to 10 based on
certain criteria. The rating given by the judge is an example of an interval scale (assuming
the steps between ratings are equally spaced). The time each runner takes to complete the race
is an example of a ratio scale.
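
As a rough illustration of the race example, the sketch below attaches made-up values to the three runners (the bib numbers, ratings and times are hypothetical, not taken from any real race) and shows one operation that is meaningful on each scale.

```python
import statistics

# Hypothetical data for the three runners in the example.
runners = [
    {"bib": 7, "rank": 1, "rating": 9, "time_s": 10.5},
    {"bib": 12, "rank": 2, "rating": 7, "time_s": 11.0},
    {"bib": 3, "rank": 3, "rating": 6, "time_s": 12.6},
]

# Nominal (bib number): only equality makes sense; 12 is not "more" than 7.
same_runner = runners[0]["bib"] == runners[1]["bib"]

# Ordinal (rank): order is meaningful, so a median is fine, but the gap
# between rank 1 and rank 2 need not equal the gap between rank 2 and rank 3.
median_rank = statistics.median([r["rank"] for r in runners])

# Interval (rating, assuming equal steps): differences are meaningful.
rating_gap = runners[0]["rating"] - runners[1]["rating"]

# Ratio (time): there is a true zero, so ratios are meaningful.
time_ratio = runners[2]["time_s"] / runners[0]["time_s"]

print("same runner:", same_runner)
print("median rank:", median_rank)
print("rating gap between winner and first runner-up:", rating_gap)
print(f"last runner took {time_ratio:.2f} times as long as the winner")
```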
Consider temperature:
- When T is given in kelvin, the scale has an absolute zero, and kT (k being the Boltzmann
constant) has the dimension of an energy, so 400 K represents twice the energy corresponding
to 200 K: ratio variable!
- The same temperature given in degrees Celsius has no absolute zero on the scale (the melting
point of ice at atmospheric pressure is a useful but otherwise arbitrary zero point), but 10
more degrees correspond to the same energy difference wherever you are on the scale: interval
variable!
The funny thing here is that it is the same quantity; depending on its coding, it is
represented by either a ratio or an interval variable, and the two codings differ only by a
constant shift: C = K - 273.15 (C in degrees Celsius, K in kelvin; absolute zero is about
-273 °C).
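
The temperature example can be checked numerically. This is a small sketch using the standard offset of 273.15 between the two scales; the function name `kelvin_to_celsius` is just an illustrative helper.

```python
def kelvin_to_celsius(kelvin):
    """Convert a temperature from kelvin to degrees Celsius."""
    return kelvin - 273.15

hot_k, cold_k = 400.0, 200.0
hot_c = kelvin_to_celsius(hot_k)
cold_c = kelvin_to_celsius(cold_k)

# Ratios are meaningful in kelvin (true zero) but not in Celsius.
ratio_k = hot_k / cold_k
ratio_c = hot_c / cold_c

# Differences are the same on both scales, up to floating-point rounding.
diff_k = hot_k - cold_k
diff_c = hot_c - cold_c

print(f"ratio in kelvin:  {ratio_k:.2f}")
print(f"ratio in Celsius: {ratio_c:.2f}")
print(f"difference in kelvin:  {diff_k:.2f}")
print(f"difference in Celsius: {diff_c:.2f}")
```

The kelvin ratio is 2, matching the "twice the energy" statement, while the Celsius ratio is a negative number with no physical interpretation. The differences agree, which is what makes Celsius an interval scale.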

Statistical Learning: Linear Reg, Logistic Reg, High dimensional Reg, Generalized Additive
Model, Non-Linear, Panel, Advance Clustering
ML-1: SL+ML
ML-2: Decision Tree, Boosting, Bagging, Random Forest, Support Vector Machine Order -1
ML-3: Bayesian Optimization, BLB stochastic Random Forest, SVM Order -2
ML-4: ML in a text and in a video

Bootstrap = (Decision Tree, Boosting, Bagging, Random Forest)

Model Building Steps

1. Read the data.
2. Data understanding: analyse the data and identify the type of variability and distribution.
3. Data preparation: treatment of outliers and missing values, modification of data, and
creation of derived variables and dummy variables (n-1 dummies for a variable with n
categories).
4. Train and test: divide the data into train (80%) and test (20%) sets.
5. Check for multicollinearity: a VIF > 2 flags a variable as collinear.
6. Build the model: keep variables with p-value < 0.05.
7. Test the model (R^2 method).
8. Check the model: Accuracy, Case, Gini.
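
The steps above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not a full workflow: the data are simulated, all helper names are hypothetical, the VIF shortcut 1 / (1 - r^2) holds only when there are exactly two predictors, the model is a one-predictor least-squares line, and the p-value check of step 6 is left out because it needs a t-distribution that the standard library does not provide.

```python
import random
import statistics

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when there are exactly two of them."""
    return 1.0 / (1.0 - pearson_r(x1, x2) ** 2)

def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(ys, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = statistics.mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Step 1: "read" the data (simulated: y depends on x1, x2 is unrelated).
rng = random.Random(42)
rows = []
for _ in range(200):
    x1, x2 = rng.uniform(0, 10), rng.uniform(0, 10)
    rows.append((x1, x2, 3.0 + 2.0 * x1 + rng.gauss(0, 1)))

# Step 4: 80% train, 20% test.
train, test = train_test_split(rows)

# Step 5: multicollinearity check on the training predictors.
vif = vif_two_predictors([r[0] for r in train], [r[1] for r in train])

# Step 6: build the model on the training data (x1 only).
intercept, slope = fit_simple_ols([r[0] for r in train], [r[2] for r in train])

# Step 7: evaluate R^2 on the held-out test data.
predictions = [intercept + slope * r[0] for r in test]
r2 = r_squared([r[2] for r in test], predictions)

print(f"train rows = {len(train)}, test rows = {len(test)}")
print(f"VIF = {vif:.2f} (rule of thumb from the notes: investigate if > 2)")
print(f"fitted line: y = {intercept:.2f} + {slope:.2f} * x1")
print(f"test R^2 = {r2:.3f}")
```

Evaluating R^2 on the test rows rather than the training rows is the point of step 4: it measures how well the model generalizes to data it has not seen.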

Topic of Research - Deregulated Regulation of Banks using Big Data instead of the predefined
method of the Central Bank
