
MB 0040- Statistics for Management

Question 1. Write short notes on

A. Descriptive Statistics
B. Measurement Scales
C. Editing of Data
D. Frequency Distribution
E. Bar Diagram

A. Descriptive statistics is the discipline of quantitatively describing the main features of a collection of information, or the quantitative description itself. Descriptive statistics are distinguished from inferential statistics (or inductive statistics) in that descriptive statistics aim to summarize a sample, rather than use the data to learn about the population that the sample is thought to represent. This generally means that descriptive statistics, unlike inferential statistics, are not developed on the basis of probability theory.[2] Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented. For example, in a paper reporting on a study involving human subjects, there typically appears a table giving the overall sample size, sample sizes in important subgroups (e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average age, the proportion of subjects of each sex, and the proportion of subjects with related comorbidities.
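Such summary tables are straightforward to compute; here is a minimal sketch in Python (the ages and sexes are invented illustrative data, not from any real study):

```python
import statistics

# Hypothetical subject data (illustrative only)
ages = [34, 45, 52, 29, 61, 48, 55, 38]
sexes = ["F", "M", "F", "F", "M", "M", "F", "M"]

sample_size = len(ages)                       # overall sample size
mean_age = statistics.mean(ages)              # average age
prop_female = sexes.count("F") / len(sexes)   # proportion of each sex

print(f"n = {sample_size}, mean age = {mean_age:.2f}, "
      f"proportion female = {prop_female:.2f}")
```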
B. Measurement Scales: When one hears the term measurement, one may think of measuring the length of something (e.g., the length of a piece of wood) or measuring a quantity of something (e.g., a cup of flour). This represents a limited use of the term. In statistics, the term measurement is used more broadly and is more appropriately described as scales of measurement. Scales of measurement refer to the ways in which variables/numbers are defined and categorized. Each scale of measurement has certain properties, which in turn determine the appropriateness of certain statistical analyses. The four scales of measurement are nominal, ordinal, interval, and ratio.

C. Data editing is defined as the process involving the review and adjustment of collected survey data. Its purpose is to control the quality of the collected data.

Data editing can be performed manually, with the assistance of a computer, or by a combination of both. The term interactive editing is commonly used for modern computer-assisted manual editing. Most interactive data editing tools applied at National Statistical Institutes (NSIs) allow one to check the specified edits during or after data entry and, if necessary, to correct erroneous data immediately. Several approaches can be followed to correct erroneous data:
 Recontact the respondent
 Compare the respondent's data to their data from the previous year
 Compare the respondent's data to data from similar respondents
 Use the subject matter knowledge of the human editor

Interactive editing is a standard way to edit data. It can be used to edit both categorical and continuous data.[3] Interactive editing reduces the time needed to complete the cyclical process of review and adjustment.

D. Frequency Distribution: In statistics, a frequency distribution is a table that displays the frequency of various outcomes in a sample.[1] Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way the table summarizes the distribution of values in the sample.

A frequency distribution shows a summarized grouping of data divided into mutually exclusive classes and the number of occurrences in each class. It is a way of organizing raw data, e.g. to show the results of an election, the income of people in a certain region, sales of a product within a certain period, student loan amounts of graduates, etc. Some of the graphs that can be used with frequency distributions are histograms, line charts, bar charts, and pie charts. Frequency distributions are used for both qualitative and quantitative data.
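For instance, a frequency table for a small set of observations can be built with Python's standard library (the letter grades below are invented for illustration):

```python
from collections import Counter

# Hypothetical observations (illustrative only)
grades = ["B", "A", "C", "B", "B", "A", "D", "C", "B", "A"]

freq = Counter(grades)   # count of occurrences per category
for value, count in sorted(freq.items()):
    print(f"{value}: {count}")
```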
E. A bar diagram is a pictorial rendition of statistical data in which the independent variable can attain only certain discrete values. The dependent variable may be discrete or continuous. The most common form of bar graph is the vertical bar graph, also called a column graph. In a vertical bar graph, values of the independent variable are plotted along a horizontal axis from left to right, and function values are shown as shaded or colored vertical bars of equal thickness extending upward from the horizontal axis to various heights. In a horizontal bar graph, the independent variable is plotted along a vertical axis from the bottom up, and values of the function are shown as shaded or colored horizontal bars of equal thickness extending toward the right, with their left ends vertically aligned.

Question 2. What do you mean by probability? Define the following basic terminologies used in probability theory.

Answer 2. Probability theory is the branch of mathematics concerned with probability, the analysis of random phenomena. The central objects of probability theory are random variables, stochastic processes, and events: mathematical abstractions of non-deterministic events or measured quantities that may either be single occurrences or evolve over time in an apparently random fashion. It is not possible to predict precisely the result of a single random event.[1] However, if a sequence of individual events, such as coin flips or rolls of dice, is repeated many times, it will exhibit certain patterns, which can be studied and predicted.[2] Two representative mathematical results describing such patterns are the law of large numbers and the central limit theorem. As a mathematical foundation for statistics, probability theory is essential to many human activities that involve quantitative analysis of large sets of data.[3] Methods of probability theory also apply to descriptions of complex systems given only partial knowledge of their state, as in statistical mechanics. A great discovery of twentieth-century physics was the probabilistic nature of physical phenomena at atomic scales, described in quantum mechanics.

Question 3. Define Probability Distributions. State the assumptions and usage of Binomial Distribution
and Normal Distribution.

Answer 3. Probability Distribution: A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence. Consider a simple experiment in which we flip a coin two times. An outcome of the experiment might be the number of heads that we see in the two coin flips. The table below associates each possible outcome with its probability.
Number of heads    Probability
0                  0.25
1                  0.50
2                  0.25

Suppose the random variable X is defined as the number of heads that result from two coin flips. Then,
the above table represents the probability distribution of the random variable X.
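This distribution can be reproduced by enumerating the four equally likely outcomes of two fair coin flips; a short sketch:

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# All equally likely outcomes of two flips: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# Count how many outcomes give each number of heads
counts = Counter(seq.count("H") for seq in outcomes)
dist = {heads: Fraction(c, len(outcomes)) for heads, c in counts.items()}

for heads in sorted(dist):
    print(heads, float(dist[heads]))   # prints 0 0.25, 1 0.5, 2 0.25
```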

The Binomial Distribution and the Normal Approximation to the Binomial Distribution

Assumptions. A common difficulty in learning probability is determining how to approach a problem. You have learned a few ways already, but how do we know that we are supposed to use the binomial distribution? We can solve a problem using the binomial distribution if the following conditions are satisfied:

1. Each trial has only one of two possible outcomes (a Bernoulli trial). That is, on each trial of our experiment, there are only two possible outcomes: success or failure. It is important to define what a success is.
2. The trials are independent. That is, the outcome of each trial does not depend on the outcome of any other trial. This can be tricky sometimes; if the independence assumption is violated, we can still proceed if the sample size is not more than 10% of the population size.

In the binomial distribution we also need to know two values: n, the number of trials, and p, the probability of success. Despite all this new material, the same general principles of probability apply.

To use the binomial distribution, we are given some fixed number of trials n and a probability of success p that we may need to find first. We are interested in the probability of achieving some number of successes out of the n trials. First, we define what a success is in the context of the problem. Then, let X be a random variable that represents the number of successes in n trials. Suppose we roll a die 3 times and we win if we roll a 6. We can use the binomial distribution because each roll of the die is independent, and each roll is a Bernoulli trial (success: a 6 appears; failure: something other than a 6 appears). Some questions we can answer are: "Find the probability that we win on exactly 2 of the 3 rolls," that is, find P(X = 2); and "Find the probability that we never win," that is, find P(X = 0).
Illustration. Consider the first question. We want to win (roll a 6) on exactly 2 rolls. This means that we must roll a 6 on two of the rolls and roll something else (1 through 5) on the other roll.

Note that there are several ways to do this. Let W = win and L = loss:

WWL
WLW
LWW
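This enumeration can be verified by brute force over all 6³ = 216 equally likely outcomes of three die rolls; a short sketch in Python:

```python
from itertools import product

# All 216 equally likely outcomes of three fair die rolls
rolls = list(product(range(1, 7), repeat=3))

# Outcomes with exactly two 6s: 3 arrangements (WWL, WLW, LWW),
# each with 5 choices for the non-6 roll, so 3 * 5 = 15 outcomes
wins_twice = [r for r in rolls if r.count(6) == 2]

print(len(wins_twice))                 # 15
print(len(wins_twice) / len(rolls))    # P(X = 2) = 15/216
```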

So there are 3 ways that we can achieve this goal. So

P(X = 2) = 3 × (1/6) × (1/6) × (5/6) = 3 × (1/6)² × (5/6)

Now suppose we want to find the probability that we win at least once. We use the probability tricks we have already learned to help us solve this problem:

P(X ≥ 1) = 1 − P(X = 0), where P(X = 0) is plugged into the binomial formula.

In general, the binomial formula is

P(X = k) = nCk × p^k × (1 − p)^(n−k)

The nCk operator is called the "choose" operator (or formally, the binomial coefficient), and it gives us the number of ways that the k successes can be arranged in n trials. It is read "n choose k" and is equal to

nCk = n! / (k! (n − k)!)

where ! is the factorial function, n! = n × (n − 1) × (n − 2) × ... × 1. For example, 5! = 5 × 4 × 3 × 2 × 1 = 120. A calculator is not necessary to compute the choose function, though many calculators offer it (called nCr on some calculators). If done by hand, a lot of factors cancel nicely. Here is an example:

10C8 = 10! / (8! 2!) = (10 × 9) / (2 × 1) = 90 / 2 = 45

Some quick shortcuts: nC0 = nCn = 1, and nC1 = nC(n−1) = n. Oh, and always remember that 0! = 1.

But what if n is too large and would require 10 million binomial expressions? There may be a solution to that problem...

Normal Approximation to the Binomial
Suppose n is large and k is an inconvenient value such that the tricks we have learned in class (particularly the complement trick) do not help much. What can we do? To make this situation clearer, consider the following example.

Illustration. "Suppose an airline checks in 600 pieces of luggage and the probability that a bag will arrive at its intended destination is 0.6. Find the probability that at least 400 bags arrive."

From the previous section we know that we must define success, a random variable, and the parameters n and p. Define a success as the event that a bag arrives at its correct location. Then the probability of success p is given in the problem as 0.6. We can consider each bag a Bernoulli trial, since a bag either arrives at its correct location or it doesn't. Let's assume that each bag (trial) is independent of any other bag. Let X be the number of bags that arrive at the intended location. I can calculate the probability as

P(X ≥ 400) = 600C400 (0.6)^400 (0.4)^200 + 600C401 (0.6)^401 (0.4)^199 + ... + 600C600 (0.6)^600 (0.4)^0

which is insane. There may be a better way. We can instead use the normal approximation to the binomial if some conditions are met first:

1. All of the conditions for the binomial distribution must be satisfied.
2. np ≥ 10.
3. nq ≥ 10.

In this problem, we have already shown that we can use the binomial distribution. Furthermore, we see that np = (600)(0.6) = 360 > 10 and nq = (600)(0.4) = 240 > 10, so both of those conditions are satisfied and we can use the normal approximation to the binomial. So, define a random variable, say Y. We can say that Y follows a normal distribution with mean np and standard deviation √(npq). Notice that we are replacing the usual μ in the normal distribution with the expected value (mean) of the binomial distribution, and the usual σ with the standard deviation of the binomial distribution. Now we can compute a z score for this problem, since Y is normal, and consider the problem of finding P(Y ≥ 400):

z = (Y − np) / √(npq) = (400 − 360) / √144 ≈ 3.33

P(Y ≥ 400) = P(Z ≥ 3.33) = 1 − P(Z ≤ 3.33) = 1 − 0.9995 (from the table) = 0.0005

So if we were asked whether we would be surprised if at least 400 bags arrived, we would say yes. If we were asked whether it would be likely for 400 or more bags to arrive at the intended destination, we would say no.

Challenging Problems

Do not attempt these problems until you are
comfortable with the binomial distribution. They are designed to be tricky to illustrate the variations of
the problem that we can consider. These problems are challenging and are designed to not only assess
how well you understand the concepts, but how well you can interpret real-life situations and translate
them into what you have learned in class.

1. Packet loss is a problem that occurs very frequently on the Internet. When some node A sends a piece of data (called a packet) to a second node B, we would like to believe that it arrives. Sometimes this is not the case. Lossy data transmission results in slow speeds or loss of connectivity. Suppose a network technician sends 10,000 packets from A to B and observes that 7,200 of them were not received by B. Define packet loss as the proportion of packets that were not received at B.

(a) Suppose that 10 packets are waiting to be transmitted from A to B. The data is only useful if all of the packets arrive successfully. Find the probability that the data is usable; that is, find the probability that all 10 packets arrive successfully. Assume that packet arrival is independent.

Solution. Define each packet transmission as a trial. Note that each trial has only two outcomes: a packet either arrives or it doesn't, so the Bernoulli trial condition is satisfied. Next, we assume that packet arrival is independent; that condition is satisfied because it is given in the problem. Since both conditions are satisfied, we can use the binomial distribution. Now let's find n and p as well as our random variable X. Since we transmit 10 packets and each packet's transmission constitutes a trial, the number of trials is n = 10. Define a success as the event that a packet from A arrives at B, and let X denote the number of packets that arrive at B. Now we need to find the probability of success p. Note that packet loss is the proportion of packets that did not arrive at B, so 1 − p = 7200/10000 = 0.72, and therefore p = 1 − 0.72 = 0.28. The data is usable only if all 10 packets arrive, so we need to find P(X = 10):

P(X = 10) = 10C10 (0.28)^10 (0.72)^0 = (0.28)^10 ≈ 3 × 10⁻⁶ ≈ 0

In other words, get a new Internet connection.

Aside. Suppose we want to find the probability that at least 9 of the 10 packets arrive at B. We can use the basic probability tricks we have learned to solve this problem:

P(X ≥ 9) = P(X = 9) + P(X = 10) = 10C9 (0.28)^9 (0.72)^1 + 10C10 (0.28)^10 (0.72)^0

(b) Suppose the technician sends 10,000 packets from node A to node B. What is the probability that 2,850 packets or fewer will arrive at B?

Solution. Note that n is very large, which should suggest using the normal approximation. We note that np = (10,000)(0.28) = 2,800 > 10 and nq = (10,000)(0.72) = 7,200 > 10. Also, we are given that packet arrival is independent as a freebie in the problem, so all of the conditions for the normal approximation are satisfied. So, let Y follow a normal distribution with mean np = 2800 and standard deviation √(npq) = √(10000 × 0.28 × 0.72) ≈ 44.9. Since Y is normally distributed, we can calculate the z value as follows:

z = (Y − np) / √(npq) = (2850 − 2800) / 44.9 ≈ 1.11

P(Z ≤ 1.11) = 0.867

Therefore, there is a 0.867 probability that 2,850 or fewer of the 10,000 packets sent will arrive at node B under 72% packet loss.
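The normal-approximation arithmetic in part (b) can be checked with a short script, using the standard normal CDF computed from the error function (a sketch, not a replacement for the z table):

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 10_000, 0.28
mu = n * p                            # mean np = 2800
sigma = math.sqrt(n * p * (1 - p))    # sqrt(2016), about 44.9

z = (2850 - mu) / sigma               # about 1.11
print(round(z, 2), round(normal_cdf(z), 3))
```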

Question 4. Write short notes on

a. Type I and Type II error


b. Level of Significance
c. Null Hypothesis
d. Two–tailed Tests and One–tailed Tests
e. Test Statistics
Answer a. Two types of errors are possible: type I and type II. The risks of these two errors are inversely related and are determined by the level of significance and the power of the test. Therefore, you should determine which error has more severe consequences for your situation before you set their risks. No hypothesis test is 100% certain; because the test is based on probabilities, there is always a chance of drawing an incorrect conclusion.

Type I error

When the null hypothesis is true and you reject it, you make a type I error. The probability of making a
type I error is α, which is the level of significance you set for your hypothesis test. An α of 0.05 indicates
that you are willing to accept a 5% chance that you are wrong when you reject the null hypothesis. To
lower this risk, you must use a lower value for α. However, using a lower value for alpha means that you
will be less likely to detect a true difference if one really exists.

Type II error

When the null hypothesis is false and you fail to reject it, you make a type II error. The probability of
making a type II error is β, which depends on the power of the test. You can decrease your risk of
committing a type II error by ensuring your test has enough power. You can do this by ensuring your
sample size is large enough to detect a practical difference when one truly exists.

The probability of rejecting the null hypothesis when it is false is equal to 1–β. This value is the power of
the test.

Answer c. Null Hypothesis: the table below summarizes the possible decisions about the null hypothesis and their probabilities.

Decision        | Null hypothesis is true         | Null hypothesis is false
Fail to reject  | Correct decision (prob. = 1 − α) | Type II error (prob. = β)
Reject          | Type I error (prob. = α)         | Correct decision (prob. = 1 − β)

Example of type I and type II error

To understand the interrelationship between type I and type II error, and to determine which error has
more severe consequences for your situation, consider the following example.

A medical researcher wants to compare the effectiveness of two medications. The null and alternative
hypotheses are:
Null hypothesis (H0): μ1= μ2

The two medications are equally effective.

Alternative hypothesis (H1): μ1≠ μ2

The two medications are not equally effective.

A type I error occurs if the researcher rejects the null hypothesis and concludes that the two
medications are different when, in fact, they are not. If the medications have the same effectiveness,
the researcher may not consider this error too severe because the patients still benefit from the same
level of effectiveness regardless of which medicine they take. However, if a type II error occurs, the
researcher fails to reject the null hypothesis when it should be rejected. That is, the researcher
concludes that the medications are the same when, in fact, they are different. This error is potentially
life-threatening if the less-effective medication is sold to the public instead of the more effective one.

Answer d. In statistical significance testing, a one-tailed test and a two-tailed test are alternative ways of computing the statistical significance of a parameter inferred from a data set, in terms of a test statistic. A two-tailed test is used if deviations of the estimated parameter in either direction from some benchmark value are considered theoretically possible; in contrast, a one-tailed test is used if only deviations in one direction are considered possible. Alternative names are one-sided and two-sided tests; the terminology "tail" is used because the extreme portions of distributions, where observations lead to rejection of the null hypothesis, are small and often "tail off" toward zero, as in the normal distribution or "bell curve".
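The practical difference shows up in how a p-value is computed from a test statistic; a minimal sketch for a z statistic (the value 1.96 is an illustrative choice):

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = 1.96  # illustrative test statistic

p_one_tailed = 1 - normal_cdf(z)             # area in the upper tail only
p_two_tailed = 2 * (1 - normal_cdf(abs(z)))  # area in both tails

print(round(p_one_tailed, 3), round(p_two_tailed, 3))  # 0.025 0.05
```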

Answer e. Test Statistics: A test statistic is a statistic (a quantity derived from the sample) used in statistical hypothesis testing.[1] A hypothesis test is typically specified in terms of a test statistic,
considered as a numerical summary of a data-set that reduces the data to one value that can be used to
perform the hypothesis test. In general, a test statistic is selected or defined in such a way as to
quantify, within observed data, behaviours that would distinguish the null from the alternative
hypothesis, where such an alternative is prescribed, or that would characterize the null hypothesis if
there is no explicitly stated alternative hypothesis.
An important property of a test statistic is that its sampling distribution under the null hypothesis must
be calculable, either exactly or approximately, which allows p-values to be calculated. A test statistic
shares some of the same qualities of a descriptive statistic, and many statistics can be used as both test
statistics and descriptive statistics. However, a test statistic is specifically intended for use in statistical
testing, whereas the main quality of a descriptive statistic is that it is easily interpretable. Some
informative descriptive statistics, such as the sample range, do not make good test statistics since it is
difficult to determine their sampling distribution.

Question 5. Explain the concept of Chi-Square. The following table depicts the production in three shifts and the number of defective goods that turned out in three weeks in an organization.

Test at the 5% level of significance whether weeks and shifts are independent.

Answer 5. A chi-squared test, also referred to as a χ² test (or chi-square test), is any statistical hypothesis test wherein the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Chi-squared tests are often constructed from a sum of squared errors, or through the sample variance. Test statistics that follow a chi-squared distribution arise from an assumption of independent normally distributed data, which is valid in many cases due to the central limit theorem. A chi-squared test can be used to attempt rejection of the null hypothesis that the data are independent.

Also considered a chi-square test is a test in which this is asymptotically true, meaning that the sampling
distribution (if the null hypothesis is true) can be made to approximate a chi-square distribution as
closely as desired by making the sample size large enough. The chi-squared test is used to determine
whether there is a significant difference between the expected frequencies and the observed
frequencies in one or more categories.
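Since the question's data table is not reproduced above, the mechanics of the test can still be sketched with hypothetical counts (rows = weeks, columns = shifts; the numbers are invented for illustration). Expected counts come from row and column totals, and the statistic is compared with the 5% critical value for (3 − 1)(3 − 1) = 4 degrees of freedom, which is 9.488:

```python
# Hypothetical observed defectives: rows = weeks, columns = shifts
observed = [
    [15, 20, 25],
    [20, 18, 22],
    [25, 22, 23],
]

grand = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Expected count under independence: (row total * column total) / grand total
expected = [[rt * ct / grand for ct in col_totals] for rt in row_totals]

# Chi-squared statistic: sum of (O - E)^2 / E over all cells
chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)  # 4 degrees of freedom
critical_5pct = 9.488                              # chi-squared table, df = 4

print(round(chi2, 3),
      "reject independence" if chi2 > critical_5pct else "fail to reject")
```

With real data, the same computation applies: if the statistic exceeds 9.488, weeks and shifts are judged not independent at the 5% level.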
