Академический Документы
Профессиональный Документы
Культура Документы
Categorical (qualitative) variables take categories as their values such as yes, no, or blue, brown, green. ie. Marital Status,
Political Party, Eye Color
Numerical (quantitative) variables have values that represent a counted or measured quantity.
Discrete variables arise from a counting process ie. Number of Children, Defects per hour, (Counted items)
Continuous variables arise from a measuring process. ie. Weight, Voltage, (Measured characteristics)
Operational definitions are universally accepted meanings that are clear to all associated with an analysis. It is a clear and precise
statement that provides a common understanding of meaning.
What does DCOVA stand for? It stands for Define, Collect, Organize, Visualize, Analyze.
Examples Of Business Objectives:
1. A marketing research analyst needs to assess the effectiveness of a new television advertisement.
2. A pharmaceutical manufacturer needs to determine whether a new drug is more effective than those currently in use.
3. An operations manager wants to monitor a manufacturing process to find out whether the quality of the product being
manufactured is conforming to company standards.
4. An auditor wants to review the financial transactions of a company in order to determine whether the company is in compliance
with generally accepted accounting principles
Primary Sources: The data collector is the one using the data for analysis. For example, Data from a political survey, Data collected from
an experiment, Observed data.
Secondary Sources: The person performing data analysis is not the data collector. For example, Analyzing census data, Examining data
from print journals or data published on the internet.
Sources of data fall into five categories:
1. Data distributed by an organization or an individual
2. The outcomes of a designed experiment
3. The responses from a survey
4. The results of conducting an observational study
5. Data collected by ongoing business activities
Structured Data Follows An Organizing Principle & Unstructured Data Does Not.
A Stock Ticker Provides Structured Data:
The stock ticker repeatedly reports a company name, the number of shares last traded, the bid price, and the percent
change in the stock price.
Due to their inherent structure, data from tables and forms are structured data.
E-mails from five people concerning stock trades is an example of unstructured data.
In these e-mails you cannot count on the information being shared in a specific order or format.
This book deals exclusively with structured data
Samples broken into two groups, 1. non-probability and 2. probability samples. The non-probability samples consist of 1. judgement and
2. convenience. The probability samples consist of 1. Simple random, 2. Systematic, 3. Stratified, and 4. Cluster.
In a nonprobability sample, items included are chosen without regard to their probability of occurrence.
In convenience sampling, items are selected based only on the fact that they are easy, inexpensive, or convenient to
sample.
In a judgment sample, you get the opinions of pre-selected experts in the subject matter.
In a probability sample, items in the sample are chosen on the basis of known probabilities.
Simple Random Sample = In simple random sample every individual or item from the frame has an equal chance of being selected.
Selection may be with replacement (selected individual is returned to frame for possible reselection) or without replacement (selected
individual isnt returned to the frame). Samples obtained from table of random numbers or computer random number generators.
Systematic Sample = In systematic sample you decide on sample size: n. You divide frame of N individuals into groups of k individuals:
Then K=big n divided little n. k=N/n. You randomly select one individual from the 1st group. Then select every k th individual thereafter.
Stratified Sample = In stratified sample you divide population into two or more subgroups (called strata) according to some common
characteristic. A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes. Samples from
subgroups are combined into one. This is a common technique when sampling population of voters, stratifying across racial or socio-
economic lines.
Cluster Sample = In a cluster sample the population is divided into several clusters, each representative of the population. A simple
random sample of clusters is selected
All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique. A
common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled.