
This document contains descriptions and context information for many of the sample data sets that are provided with SAS Enterprise Guide.
----------------------------------------------------------------------
Name: A
Analysis: Logistic ANOVA, Logistic Regression, Mixed Model ANOVA
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: A clinical trial was conducted comparing two treatments, an experimental drug versus a control. The study was conducted at eight clinics. At each clinic, patients were assigned at random to either the experimental drug or the control. The variables in the data set are the clinic number (CLINIC), the assignment to the experimental drug or the control (TRT), the number of patients with favorable responses (FAV), the number of patients with unfavorable responses (UNFAV), and the total number of patients at that clinic assigned to that treatment (NIJ). For educational purposes, the data can be analyzed with CLINIC as a random or a fixed effect.
Graphic Analysis: scatter plot
SAS Product: Enterprise Guide; SAS/STAT
Size: 16 rows, 5 columns
----------------------------------------------------------------------
Name: AML_Survival
Analysis: Survival Analysis
Reference: Embury, S.H., Elias, L., Heller, P.H., Hood, C.E., Greenberg, P.L., and Schrier, S.L. 1977. Remission maintenance therapy in acute myelogenous leukemia. Western Journal of Medicine. 126: 267-272.
Description: The AML_Survival data set contains information on a trial conducted by Embury et al. (1977) at Stanford University. The investigators were concerned with the efficacy of maintenance therapy for acute myelogenous leukemia (AML). Initially, patients were treated by chemotherapy until remission. Then, these patients were randomized into two groups: a treatment group that received maintenance therapy and a control group that did not. Individuals in both groups were followed until they suffered a relapse, the event of interest. The event time variable is defined as the length of time in remission, i.e., the time from entry into the study until relapse.
Graphic Analysis:
SAS Product: Enterprise Guide; SAS/STAT
Size: 23 rows, 3 columns
----------------------------------------------------------------------
Name: Arrestrates
Analysis: Descriptive statistics, time series analysis, ANOVA
Reference: U.S. Department of Justice, Office of Justice Programs, Bureau of Justice Statistics http://www.ojp.usdoj.gov/bjs/dtdata.htm
Description: The data set is a record of the arrests per 100,000 people in each age group in the United States from 1970 through 1999. The variables in the data set are the year (YEAR), the arrests per 100,000 population (RATE), and the age group (AGEGROUP). The age groups are defined as (1) 14 and under, (2) 15-17, (3) 18-20, (4) 21-24, and (5) 25 or over. The data set could be used to generate descriptive statistics by age group or to do a time series analysis to predict the arrest rates by age group. These predictions might be used in assessing the need for judicial system infrastructure changes. Finally, the data could be used to compare age groups with an ANOVA. This data set is a subset of the data in the data set totarrests.
Graphic Analysis: scatter plots
SAS Product: Enterprise Guide; SAS/STAT; SAS/ETS
Size: 150 rows, 3 columns
----------------------------------------------------------------------
Name: auction
Analysis: multiple linear regression
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: This is data from 19 livestock auction markets. The columns include the number of head of different livestock sold (in thousands), namely CATTLE, CALVES, HOGS, and SHEEP; the cost of operation of the auction market (in thousands of dollars) (COST); and the market identifier (MARKETID). The object is to use multiple linear regression to describe the relationship between the cost of operations and the number of livestock sold in the various classes. COST will be the dependent variable and CATTLE, CALVES, HOGS, and SHEEP the independent variables. An additional variable, VOLUME, is the total of all major livestock sold in each market.
It is the sum of the variables CATTLE, CALVES, HOGS, and SHEEP, and can be used to demonstrate an exact linear dependency between independent variables.
Graphic Analysis: scatter plot
SAS Product: Enterprise Guide; SAS/STAT
Size: 19 rows, 7 columns
----------------------------------------------------------------------
Name: Beer_Sales
Analysis: Spline Regression; Local Regression; Time Series
Reference: Neter J., Wasserman, W., and Whitmore, G.A. 1988. Applied Statistics. 3rd edition. Allyn & Bacon, New York. 975-6.
Description: The Beer_Sales data set records monthly sales of beer in hectoliters, along with the average high and low temperatures in the region, over a period of five years. The object is to see how beer sales change over time. You can also consider the relationships between beer sales and temperatures.
Graphic Analysis: Scatter Plots
SAS Product: SAS/STAT; SAS/ETS; Enterprise Guide (for time series analysis)
Size: 60 rows, 4 columns
----------------------------------------------------------------------
Name: BloodPressure
Analysis: Paired-sample t-test; one-sample t-test; regression
Reference: Generated for use in SAS Course Notes
Description: Consider an experiment to examine the effectiveness of a medication in reducing blood pressure. A random sample of individuals with high blood pressure is taken and their diastolic pressure is recorded. The individuals are then placed on medication and one month later their diastolic blood pressure is once again recorded. The data set contains the following variables: subject, age, baseline blood pressure, and new blood pressure.
Graphic Analysis: histograms, box plots, scatter plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 60 rows, 4 columns
----------------------------------------------------------------------
Name: Boston Housing Data
Analysis: two-sample t-test; simple and multiple linear regression with transformation of the response variable
Reference: Blake, C., Keogh, E. and Merz, C.J. (1998), UCI Repository of machine learning databases (http://www.ics.uci.edu/~mlearn/MLRepository.html), Irvine, CA: University of California, Department of Information and Computer Science.
Description: The data set contains census information for 506 housing tracts in the Boston area. You can perform a two-sample t-test to examine the median values of owner-occupied homes in two groups of housing tracts, those near the Charles River and those farther away from it. The data can also be used to develop a regression model to predict median home values based on the other variables in the data set, such as crime rate, the percentage of industrial business acres, nitrogen oxide concentration, the average number of rooms in a home, the percentage of homes built before 1940, the accessibility to radial highways, the property tax rate, the percentage of lower economic status families in the housing tract, and the pupil/teacher ratio in the local schools.
Graphic Analysis: scatter plots, confidence ellipses
SAS Product: Enterprise Guide; SAS/STAT
Size: 506 rows, 13 columns
----------------------------------------------------------------------
Name: bullets
Analysis: Descriptive statistics, confidence intervals, two-sample t-test
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: The data was collected to determine if there is a difference in the muzzle velocity (VELOCITY) of cartridges made from two types of gunpowder (POWDER).
Graphic Analysis: box plot, histogram
SAS Product: Enterprise Guide; SAS/STAT
Size: 18 rows, 2 columns
----------------------------------------------------------------------
Name: calves
Analysis: ANOVA
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: Two types of feed rations (FEED) are given to calves from three different sires (SIRE). The dependent variable is the coded amount of weight gain for each calf (WEIGHTGAIN).
Graphic Analysis: box plots, histograms, means plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 18 rows, 3 columns
----------------------------------------------------------------------
Name: calves2
Analysis: ANOVA
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: Two types of feed rations (FEED) are given to calves from three different sires (SIRE). The dependent variable is the coded amount of weight gain for each calf (WEIGHTGAIN). (Note: this is the same data as the calves data set but with an empty cell.)
Graphic Analysis: box plots, histograms, means plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 18 rows, 3 columns
----------------------------------------------------------------------
Name: Candy
Analysis: Descriptive Statistics; Confidence Intervals
Reference: Using StatView, 2nd edition. SAS Institute Inc.
Description: Since 1994, the United States Food and Drug Administration (FDA) has required uniform, easy-to-read nutrition labeling for nearly all foods. The purpose of the new label is to reduce confusion and help consumers choose more healthful diets. The United States Department of Agriculture (USDA) and the Department of Health and Human Services (HHS) have teamed up to produce the Food Guide Pyramid, which recommends eating a variety of foods, an appropriate number of calories, and a modest amount of fat. Specifically, 30% or fewer of your total calories per day should be calories from fat, and only a third of those should be calories from saturated fat. For adults consuming 2000 calories per day, that works out to no more than 65 grams of fat, no more than 20 grams of which are saturated fat. We want to know how many candy bars can fit into this daily diet. We collected nutritional facts for every candy bar we could find. We also included some non-bar candies like M&Ms, Reese's Pieces, Skittles, and Super hot Tamales.
Graphic Analysis: box plots; scatter plots; histograms
SAS Product: Enterprise Guide; Base SAS; SAS/GRAPH
Size: 75 rows, 17 columns
----------------------------------------------------------------------
Name: Candy data sets: Candy_Customers, Candy_Products, Candy_Sales_History, Candy_Sales_Summary, Candy_Time_Periods
Analysis: Descriptive statistics, time series analysis, ANOVA, correlations, query, data management, data mining
Reference: Stephen McDaniel, SAS, 2005
Description: This collection of data sets is for a fictional candy company, Lots O' Calories.
Graphic Analysis: scatter plots, bar, box, bubble, line, bar-line
SAS Product: Enterprise Guide; SAS/STAT; SAS/ETS; SAS/GRAPH; SAS Forecast Studio; SAS Enterprise Miner
----------------------------------------------------------------------
Name: Cars
Analysis: Descriptive statistics; ANOVA
Reference: StatView Reference, 2nd edition (1998). SAS Institute Inc.
Description: The data set contains information such as weight, gas tank size, turning radius, horsepower, and engine displacement for 116 cars from different countries.
Graphic Analysis: scatter plots; box plots; histograms
SAS Product: Enterprise Guide; SAS/STAT; SAS/GRAPH
Size: 116 rows, 8 columns
----------------------------------------------------------------------
Name: cars_1993
Analysis: descriptive statistics, t-tests, ANOVA, Regression, ANCOVA, data transformation
Reference: This represents a subset of the information reported in the 1993 Cars Annual Auto Issue published by Consumer Reports and from Pace New Car and Truck 1993 Buying Guide
Description: This data set contains a random sample of 92 cars from the 1993 model year. The information for each car includes manufacturer, model, type (sporty, van, small, midsize, large, or compact), price (in thousands of dollars), city mpg, highway mpg, engine size, horsepower, fuel tank size, weight, and origin (US or non-US). The data is well suited to descriptive statistics by groups, or to an ANOVA or regression with price as the response variable. Note that violations of the assumptions are probably present and transformation of the response variable is most likely necessary.
Graphic Analysis: scatter plot; histogram; box plot; bar chart
SAS Product: Enterprise Guide; Base SAS; SAS/STAT
Size: 92 rows, 12 columns
----------------------------------------------------------------------
Name: challenger
Analysis: Logistic Regression, Probit regression
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: Data documenting the presence or absence of primary O-ring thermal distress in the 23 shuttle launches preceding the Challenger mission were collected. The focus of this data is to determine if there is a relationship between the temperature at launch time and O-ring thermal distress. The variables in the data set are temperature at launch (TEMP), the number of launches in which thermal distress occurred at that temperature (TD), the total number of launches at that temperature (TOTAL), and the number of launches in which thermal distress did not occur at that temperature (NO_TD). This data exists in an alternative form in the data set O-ring. In that case, the variables in the data set are the flight number (FLT), the temperature at launch (TEMP), and an indicator variable for whether or not there was thermal distress during the launch (TD) (0=no distress, 1=distress).
Graphic Analysis: box plots, histograms, means plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 16 rows, 4 columns
----------------------------------------------------------------------
Name: chips
Analysis: Crossed-Nested ANOVA
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: An engineer in a semiconductor plant investigated the effect of several modes of a process condition (ET) on the resistance in computer chips. Twelve silicon wafers (WAFER) were drawn from a lot, and three wafers were randomly assigned to each of four modes of ET. Resistance (RESISTANCE) in the chips was measured in four positions (POSITION) on each wafer after processing.
Graphic Analysis: box plots, histograms, means plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 48 rows, 4 columns
----------------------------------------------------------------------
Name: Cholesterol
Analysis: paired sample t-test
Reference: Generated for SAS training course
Description: Suppose that cholesterol measurements are taken on a group of subjects with high cholesterol levels.
After these measurements are collected, the subjects attend a training session that discusses methods to control cholesterol levels, such as diet and exercise. After a specified period of time, cholesterol measurements are collected on each of the subjects again. You want to determine whether there is a difference between the cholesterol measurements before and after the training.
Graphic Analysis: scatter plot, box plot
SAS Product: Enterprise Guide; SAS/STAT
Size: 95 rows, 3 columns
----------------------------------------------------------------------
Name: Coffee
Analysis: contingency table; simple logistic regression
Reference: MacMahon, B., S. Yen, D. Trichopoulos, K. Warren, and G. Nardi. 1981. Coffee and cancer of the pancreas. New England Journal of Medicine. 304(11). 630-33.
Description: This example is based on a data set relating coffee consumption to incidence of pancreatic cancer. These data arose from a case-control study, and for this illustration we will use the data for male subjects. Case Outcome is a binary category variable recording whether each individual represents a case (pancreatic cancer) or a control (no cancer). Daily Coffee is a continuous variable recording how much coffee each individual drinks: 0 for none, 1.5 for 1-2 cups per day, 3.5 for 3-4 cups per day, or 5.5 for 5 or more cups per day.
Graphic Analysis:
SAS Product: Enterprise Guide; SAS/STAT
Size: 523 rows, 3 columns
----------------------------------------------------------------------
Name: College
Analysis: One-Way ANOVA, Regression, ANCOVA
Reference: Money magazine, 1991.
Description: The data is a collection of information on colleges and universities collected in the early 1990s. The primary interest is in predicting graduation rates, the percent of students who graduate from the institution in four years. Potential predictor variables are tuition, type of college (public or private), and region of the country.
Graphic Analysis: Box Plots, Scatter Plots
Size: 200 rows, 6 columns
----------------------------------------------------------------------
Name: Colonoscopy
Analysis: Contingency Table Analysis, Ordinal Logistic Regression
Reference: Grossman, S., M. Milos, I. S. Tekawa, and N. P. Jewell. 1989. Colonoscopic screening of persons with suspected risk factors for colon cancer: II. Past history of colorectal neoplasms. Gastroenterology. 96. 299-306.
Description: The data are from a prospective colonoscopy screening study of individuals considered to be at high risk of colon cancer. The purpose of the study was to determine the role of past history in predicting the findings of a current colonoscopy. The cases considered here correspond to 406 individuals who had adenoma findings in previous colon examinations and who are therefore considered to be at high risk of a subsequent significant finding. The two variables in the data set are Finding (coded 0 for negative examination, 1 for small adenoma, and 2 for large adenoma) and Age. All ages have been rounded: ages 30-39 years are coded as 35, ages 40-49 years as 45, etc.
Graphic Analysis: bar charts
SAS Product: Enterprise Guide; SAS/STAT
Size: 406 rows, 2 columns
----------------------------------------------------------------------
Name: corn
Analysis: correlation and regression
Reference: Draper, N.R. and Smith, H. (1981) Applied Regression Analysis, Second Edition, New York: John Wiley & Sons, Inc.
Description: The data was collected to examine the effect of weather-related phenomena on corn yield. The data set includes information on the total precipitation (in inches) for the year prior to the start of the growing season, the average daily temperature (in degrees Fahrenheit) for each of the months of May through August, the total rain (in inches) during each of the months June through August, and the corn yield (in bushels per acre). This information was collected for each of the years 1930 through 1962. The year is also included in the data set. You are interested in determining the relationship between the corn yield and the other variables.
Graphic Analysis: scatter plots; box plots; histograms
SAS Product: Enterprise Guide; SAS/STAT; SAS/GRAPH
Size: 33 rows, 10 columns
----------------------------------------------------------------------
Name: cotton
Analysis: Mixed Model
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: This data is from a two-factor factorial with two stages of subsampling.
The object of the study is to estimate the weight of usable lint (LINT) from the total weight of cotton bolls (BOLLWT). In addition, the researcher wants to see if lint estimation is affected by varieties of cotton (VARIETY) and the distance between planting rows (SPACING). The study is a factorial experiment with two levels of VARIETY and two levels of SPACING. There are two plants for each VARIETY x SPACING treatment combination, and there are from five to nine bolls per plant (PLANT).
Graphic Analysis: box plots, histograms, means plots, scatter plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 49 rows, 5 columns
----------------------------------------------------------------------
Name: cotton1
Analysis: Multivariate ANOVA
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: The total weight of a mature cotton boll can be divided into three parts: the weight of the seeds, the weight of the lint, and the weight of the bract. Lint and seed constitute the economic yield of cotton. In this data, the differences in the three components of the cotton bolls due to two varieties (VARIETY) and two plant spacings (SPACING) are studied. Five plants are chosen at random from each of the four treatment combinations. Two bolls are picked from each plant, and the weights of the seeds, lint, and bract are recorded.
Graphic Analysis: box plots, histograms, means plots, scatter plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 40 rows, 6 columns
----------------------------------------------------------------------
Name: counts
Analysis: Poisson Regression
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: This data is from an insect control experiment. The treatment design consisted of an untreated control group (TRT=0) and a 3x3 factorial for a total of 10 treatments. The experiment was conducted as a randomized complete block design with four blocks (BLOCK). The response variable was the insect count (COUNT). The variable CTL_TRT is coded 0 for control and 1 otherwise. The two treatment factors, A and B, have 3 levels each (1, 2, and 3) and both are coded 0 for the control.
Graphic Analysis: scatter plot
SAS Product: Enterprise Guide; SAS/STAT
Size: 40 rows, 6 columns
----------------------------------------------------------------------
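The treatment structure described for the counts data set (one untreated control plus a 3x3 factorial, repeated in four blocks) can be sketched in Python as follows. This is only an illustration of how the design columns relate to one another. The numbering of TRT 1 through 9 across the A-by-B combinations is an assumption, and COUNT is left out because no observed values are reproduced here.

```python
from itertools import product

def build_design(n_blocks=4, levels=(1, 2, 3)):
    """Return one row per BLOCK x treatment for a control-plus-factorial layout."""
    # Control first: TRT=0, CTL_TRT=0, and both factors coded 0.
    treatments = [{"TRT": 0, "CTL_TRT": 0, "A": 0, "B": 0}]
    # Factorial treatments; the TRT numbering 1..9 is an assumed ordering.
    for trt, (a, b) in enumerate(product(levels, levels), start=1):
        treatments.append({"TRT": trt, "CTL_TRT": 1, "A": a, "B": b})
    return [
        {"BLOCK": block, **t}
        for block in range(1, n_blocks + 1)
        for t in treatments
    ]

design = build_design()
print(len(design))   # number of design rows (blocks x treatments)
print(design[0])     # the control row in block 1
```

With COUNT added as the response, the five design columns give the six columns listed under Size.
----------------------------------------------------------------------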

Name: cult_inoc
Analysis: Split-Plot
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: This data was collected to analyze the effect of three bacterial inoculation treatments (INOCULATION) applied to two cultivars of grasses (CULTIVAR) on dry weight yields (DRYWT). The experiment is a split-plot design with CULTIVAR (levels a and b) as the main plot factor and INOCULATION as the subplot factor. INOCULATION has the values control, live, and dead.
Graphic Analysis: box plots, histograms, means plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 24 rows, 4 columns
----------------------------------------------------------------------
Name: defecttypes
Analysis: quality control: pareto charts
Reference: Generated for StatView Reference, 2nd edition. 1998. SAS Institute Inc.
Description: You are in charge of the quality control effort at a bicycle manufacturer that specializes in limited production frames. The most popular model your company produces is a day touring model called the "Arribe!", which is a racing-style frame for weekend warriors. The seat tube has been a source of quality problems in the manufacturing plant in the past. This data set has information on reasons for rejection of the seat tubes during weeks 7 and 8 of a recent manufacturing cycle. You will use this data to determine if the pattern of defects is the same for both weeks.
Graphic Analysis: pareto charts
SAS Product: Enterprise Guide; SAS/QC
Size: 1000 rows, 2 columns
----------------------------------------------------------------------
Name: drug
Analysis: ANOVA
Reference: Generated for SAS training course
Description: This is data from an experiment to evaluate the effect of four different drugs on blood pressure for individuals with one of three possible diseases. Each individual is administered one of the four drugs over a period of time and the increase in systolic blood pressure is recorded. You want to compare the average increase in blood pressure for the different drugs and diseases.
Graphic Analysis: box plot; histogram
SAS Product: Enterprise Guide; SAS/STAT
Size: 72 rows, 3 columns
----------------------------------------------------------------------
Name: drugs
Analysis: Unbalanced ANOVA, Mixed Model
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: A pharmaceutical company compared effects of two drugs, A and B, on a clinical measurement called FLUSH. The study utilized patients in 10 clinics in order to obtain representation from diverse patient populations. The variables in the data set are STUDY (which is the clinic identifier), TREATMENT, PATIENT, FLUSH0, and FLUSH. The values of FLUSH0 were obtained prior to administration of the drugs. If you assume the clinics are randomly selected from a population of clinics, then clinic becomes a random effect and the model is a mixed model.
Graphic Analysis: box plots, histograms, means plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 135 rows, 5 columns
----------------------------------------------------------------------
Name: drugs1
Analysis: Unbalanced ANOVA with empty cells
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: A pharmaceutical company compared effects of two drugs, A and B, on a clinical measurement called FLUSH. The study utilized patients in 10 clinics in order to obtain representation from diverse patient populations. The variables in the data set are STUDY (which is the clinic identifier), TREATMENT, PATIENT, FLUSH0, and FLUSH. The values of FLUSH0 were obtained prior to administration of the drugs. (Note: this is the same data as the DRUGS data set with the addition of STUDY 41, which only had observations for drug B. Therefore, there is an empty cell associated with the STUDY 41, DRUG A combination.)
If you assume that the clinics were randomly selected from a larger population of clinics, then clinic (STUDY) is a random effect and the model becomes a mixed model.
Graphic Analysis: box plots, histograms, means plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 151 rows, 5 columns
----------------------------------------------------------------------
Name: Exercise
Analysis: ANOVA, MANOVA
Reference: Generated for StatView Reference, 2nd edition. 1998. SAS Institute Inc. 108.
Description: You are an exercise physiologist who wants to determine whether stretching and wearing ankle weights has any effect on the value of treadmill exercise. You could test this hypothesis by measuring calories burned, average speed in meters per minute, and oxygen consumed in liters for a number of subjects whom you have previously determined to have roughly the same level of physical fitness, divided randomly into four groups: with or without ankle weights, and with or without a period of stretching before the exercise.
Graphic Analysis: box plots, histograms
SAS Product: Enterprise Guide (ANOVA); SAS/STAT
Size: 20 rows, 5 columns
----------------------------------------------------------------------
Name: FEV1MULT
Analysis: Repeated Measures
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: A pharmaceutical company examined effects of three drugs on respiratory ability of asthma patients. The drugs were randomly assigned to 24 patients each. The assigned drug was administered to each patient. Then a standard measure of respiratory ability called FEV1 was measured hourly for eight hours following treatment. FEV1 was also measured immediately prior to administering the drug (BASEFEV1). This data set is organized to perform a multivariate analysis of repeated measures data. That is, every patient appears in only one row of the data set with a separate column for each of the eight FEV1 measurements. The FEV1UNI data set is organized to perform a univariate ANOVA of the repeated measures data. That is, every FEV1 measurement is a different row in the data set. Therefore, each patient has 8 rows in the data set.
Graphic Analysis: box plots, histograms, means plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 72 rows, 11 columns
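The difference between the two layouts can be sketched in Python. The example below reshapes a wide table (one row per patient, as in FEV1MULT) into a long table (one row per measurement, as in FEV1UNI). Only BASEFEV1 is a name taken from the description. The column names PATIENT, DRUG, FEV1_1 through FEV1_8, HOUR, and FEV1, the drug labels, and the patient numbering are hypothetical, and the values are placeholders rather than the actual data.

```python
HOURS = range(1, 9)  # eight hourly measurements after treatment

def wide_to_long(wide_rows):
    """Turn one row per patient into one row per patient-hour."""
    long_rows = []
    for row in wide_rows:
        for hour in HOURS:
            long_rows.append({
                "PATIENT": row["PATIENT"],
                "DRUG": row["DRUG"],
                "BASEFEV1": row["BASEFEV1"],
                "HOUR": hour,
                "FEV1": row[f"FEV1_{hour}"],
            })
    return long_rows

# Placeholder wide table: 3 drugs x 24 patients, every measurement set to 0.0.
wide = [
    {"PATIENT": p, "DRUG": d, "BASEFEV1": 0.0,
     **{f"FEV1_{h}": 0.0 for h in HOURS}}
    for d in ("a", "b", "c")
    for p in range(1, 25)
]
long_rows = wide_to_long(wide)
print(len(wide), len(wide[0]))            # rows and columns, wide layout
print(len(long_rows), len(long_rows[0]))  # rows and columns, long layout
```

Under these assumptions the row and column counts of the two layouts match the sizes listed for FEV1MULT and FEV1UNI.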

----------------------------------------------------------------------
Name: FEV1UNI
Analysis: Repeated Measures
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: A pharmaceutical company examined effects of three drugs on respiratory ability of asthma patients. The drugs were randomly assigned to 24 patients each. The assigned drug was administered to each patient. Then a standard measure of respiratory ability called FEV1 was measured hourly for eight hours following treatment. FEV1 was also measured immediately prior to administering the drug (BASEFEV1). This data set is organized to perform a univariate ANOVA of the repeated measures data. That is, every FEV1 measurement is a different row in the data set. Therefore, each patient has 8 rows in the data set. The data set FEV1MULT is organized to perform a multivariate analysis of repeated measures data. That is, every patient appears in only one row of the data set with a separate column for each of the eight FEV1 measurements.
Graphic Analysis: box plots, histograms, means plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 576 rows, 5 columns
----------------------------------------------------------------------
Name: filemarks
Analysis: quality control: c/u charts
Reference: Generated for StatView Reference, 2nd edition. 1998. SAS Institute Inc.
Description: You are in charge of the quality control effort at a bicycle manufacturer that specializes in limited production frames. The most popular model your company produces is a day touring model called the "Arribe!", which is a racing-style frame for weekend warriors. The seat tubes of the bicycle frames are inspected. One of the most common problems with the tubes is stray file marks. Although this does not affect the functionality of the seat tube, it does affect the looks. The filemarks data set contains 10 weeks of inspection data and has the total number of file marks for all bicycle tubes inspected during each of the ten weeks.
Graphic Analysis: quality control charts
SAS Product: Enterprise Guide; SAS/QC
Size: 10 rows, 3 columns
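For weekly defect counts of this kind, a c chart conventionally places the center line at the mean count and the control limits three standard deviations away, taking the standard deviation as the square root of the mean (a Poisson assumption). A u chart does the same for defects per unit when the number of units inspected varies. The Python sketch below shows both calculations. The weekly totals and inspection counts are invented for illustration and are not the values in the filemarks data set.

```python
from math import sqrt

def c_chart_limits(counts):
    """Center line and 3-sigma limits for a c chart (equal-size samples)."""
    center = sum(counts) / len(counts)
    sigma = sqrt(center)  # Poisson: the variance equals the mean
    lower = max(0.0, center - 3 * sigma)
    return lower, center, center + 3 * sigma

def u_chart_limits(counts, sizes):
    """Per-sample limits for a u chart (defects per unit, unequal sizes)."""
    u_bar = sum(counts) / sum(sizes)
    limits = []
    for n in sizes:
        sigma = sqrt(u_bar / n)
        limits.append((max(0.0, u_bar - 3 * sigma), u_bar, u_bar + 3 * sigma))
    return limits

# Invented weekly totals of file marks and tubes inspected (10 weeks).
marks = [12, 9, 15, 11, 8, 14, 10, 13, 9, 9]
tubes = [50, 50, 60, 50, 40, 60, 50, 50, 40, 50]

lcl, cl, ucl = c_chart_limits(marks)
print(round(lcl, 2), round(cl, 2), round(ucl, 2))
print(u_chart_limits(marks, tubes)[0])
```

A week whose count falls outside its limits would be flagged for investigation.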

----------------------------------------------------------------------Name: Flaxoil Analysis: Randomized Complete Block ANOVA, Mixed Model ANOVA Reference: Steel, R. G. D., and Torrie, J.H. 1980. Principles and Procedures of Statistics: a Biometrical Approach. McGraw-Hill, New York. Description: The Flax Oil data set includes percentage measurements of oil content in flaxseed grown in each of four different locations for six different treatments. At each location one plant was inoculated with bacteria as a seedling, one plant in early bloom, one in full bloom, one at a lower dose in full bloom, and one when the plant was ripening. A sixth plant in each location was a control case, not inoculated at all. There was no replication of treatment by location combinations. The purpose of the experiment was to determine whether the treatments had any effect on the oil content of the flaxseed. The four locations represent the blocks in this experiment. One could consider that the blocking factor should be treated as a random effect, which would result in a mixed model. Graphic Analysis: histogram; box plot SAS Product: Enterprise Guide (ANOVA); SAS/STAT (Mixed Model) Size: 24 rows, 3 columns ----------------------------------------------------------------------Name: fr_t7_3 Analysis: Logistic ANCOVA, Logistic Regression Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SA S for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc. Description: This data set has data from a bioassay involving two drugs, standar d (STD) and treated (TRT) injected in varying dosages to 20 mice per treatment-dose combination. The response varia ble of interest is the number of mice (out of the 20) that are ALIVE versus the number DEAD. An alternative analy sis approach is to use the base 2 log of dosage rather than the actual dosages. This is included as the var iable X. 
Graphic Analysis: scatter plot
SAS Product: Enterprise Guide; SAS/STAT
Size: 9 rows, 6 columns
----------------------------------------------------------------------
Name: garments
Analysis: Latin Square Design
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: Four materials (MAterial) used in permanent press garments are subjected to a test for weight loss (WTLOSS) and shrinkage (SHRINK). The materials are placed in a heat chamber that has four control settings or positions (POSITION). The test is conducted in four runs (RUN), with each material assigned to each of the four positions in one run of the experiment. The weight loss and shrinkage are measured on each sample after each test.
Graphic Analysis: box plots, histograms
SAS Product: Enterprise Guide; SAS/STAT
Size: 16 rows, 5 columns
----------------------------------------------------------------------
Name: grasses
Analysis: Two-way ANOVA, Mixed Model
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: Three methods of promoting seed growth (METHOD) are applied to seed from each of five varieties (VARIETY). Six pots are planted with seed from each METHODxVARIETY combination. The resulting 90 pots were randomly placed in a growth chamber and the dry matter yields were measured after clipping at the end of four weeks. The data are recorded in the data set with each of the six replicate measurements on the same row in the data set. Therefore, the data will have to be reorganized in order to analyze them as a two-way ANOVA. This can be done using the stack columns function in Enterprise Guide or with a data step program. (Note: the data set grasses1 contains the same data reorganized for analysis.) In the case where you are interested in a whole population of varieties and these five are a random sample from that population, VARIETY would be treated as a random effect, resulting in a mixed model.
Graphic Analysis: box plots, histograms, means plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 15 rows, 9 columns
----------------------------------------------------------------------
Name: grasses1
Analysis: Two-way ANOVA, Mixed Model
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: Three methods of promoting seed growth (METHOD) are applied to seed from each of five varieties (VARIETY). Six pots are planted with seed from each METHODxVARIETY combination. The resulting 90 pots were randomly placed in a growth chamber and the dry matter yields were measured after clipping at the end of four weeks (YIELD). Note: this data set contains the same data as the data set grasses except the data has been reorganized for analysis. In the case where you are interested in a whole population of varieties and these five are a random sample from that population, VARIETY would be treated as a random effect, resulting in a mixed model.
Graphic Analysis: box plots, histograms, means plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 90 rows, 4 columns
----------------------------------------------------------------------
Name: Hospice
Analysis: Nonparametric ANOVA
Reference: Kathryn Skarzynski, Wright State University, Dayton, OH
Description: Consider a study done to determine whether there was a change in the number of referrals received from physicians after a visit by a hospice marketing nurse. A portion of Ms. Skarzynski's data about these hospice marketing visits includes physician ID, type of visit, type of practice, date, and change in referrals after one month (change1) and after three months (change3).
Graphic Analysis: histogram; box plot
SAS Product: Enterprise Guide; SAS/STAT
Size: 54 rows, 6 columns
----------------------------------------------------------------------
Name: Leppik
Analysis: Poisson Regression with Repeated Measures
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: This data is from a study evaluating a new treatment for epilepsy. The variable ID identifies each patient in the study. The treatments are TRT=0, a placebo, and TRT=1, an anti-epileptic drug.
The response variable is the number of seizures over a two-week interval. For the eight weeks prior to placing the participants on treatment, the number of seizures was counted for each patient in order to form a baseline measurement (BASE). The patients' ages (AGE) in years are also included in the data set. The number of seizures was recorded for each of four two-week time intervals after being placed on the treatment and appears in the data set as Y1 through Y4. Two additional variables appear in the data set that might be used during the analysis. One is the log of age (LOG_AGE) and the other is the log of (BASE/4). This data set will need to be reorganized for analysis so that one post-treatment observation appears on each row of the data set. This can be done with Enterprise Guide using the stack function, or with a data step. Also, the appropriately organized data is included in the data set SEIZURE.
Graphic Analysis: scatter plot
SAS Product: Enterprise Guide; SAS/STAT
Size: 59 rows, 10 columns
----------------------------------------------------------------------
Name: Lipid Data
Analysis: Descriptive Statistics, one-sample t-tests, paired t-tests, correlation analysis, regression, ANOVA
Reference: Dr. Terence T. Kuske, Professor of Medicine, Medical College of Georgia, Augusta, GA.
Description: Data has been collected from blood lipid screenings as well as patient history. Information such as gender, age, weight, total cholesterol level, blood pressure, coffee consumption, and history of heart disease was collected. The blood lipid screenings were conducted three months after the initial screenings. This data is rich for various analyses.
Graphic Analysis: histograms, box plots, bar charts, scatter plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 95 rows, 25 columns
----------------------------------------------------------------------
Name: Marathons
Analysis: Two-sample t-test
Reference:
Description: You are interested in comparing the time it takes to run the marathon in New York City and Boston. A random sample of 50 observations from the Boston marathon and 100 observations from the New York marathon have been recorded and saved.
The variables in the dataset include city and time (in hours).
Graphic Analysis: histogram; box plot
SAS Product: Enterprise Guide

Size: 150 rows, 2 columns
----------------------------------------------------------------------
Name: market
Analysis: simple linear regression
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: This is data from 19 livestock auction markets, including the number of head of cattle sold (in thousands) (CATTLE), the cost of operations of the auction market (in thousands of dollars) (COST), and the market identifier (MARKETID). The object is to use simple linear regression to describe the relationship between the cost of operations and the number of cattle sold. COST will be the dependent variable and CATTLE the independent variable.
Graphic Analysis: scatter plot
SAS Product: Enterprise Guide; SAS/STAT
Size: 19 rows, 3 columns
----------------------------------------------------------------------
Name: methods
Analysis: One-way ANOVA, ANOVA with blocks, Mixed Model with random block
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: Five methods of providing irrigation (IRRIG) are used on an orange grove. At harvest, the fruit is weighed to determine if the method of irrigation affects fruit weight (FRUITWT). In this case, the grove was divided into eight blocks (BLOCK) to account for local variation in the grove. The assignment of the irrigation method to the trees within the block was done randomly and each of the irrigation methods appears in every block. Therefore, this is a randomized complete block design. The blocking factor can also be considered a random effect, in which case the analysis would be done with a mixed model.
Graphic Analysis: box plots, histogram
SAS Product: Enterprise Guide; SAS/STAT
Size: 40 rows, 3 columns
----------------------------------------------------------------------
Name: Microbgs
Analysis: Nested Mixed Models

Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: Microbial counts are made on samples of ground beef in a study to assess sources of variation in numbers of microbes. Twenty packages of ground beef (PACKAGE) are purchased. Three samples are drawn from each package and two replicate counts are made on each sample. In the data set CT11 refers to the first sample, first replicate count for the package; CT12 refers to the first sample, second replicate count; CT21 refers to the second sample, first replicate count; and so on. Again this data set will have to be reorganized before analysis. (Note: the data set microbgs1 contains the reorganized data.) Also, because of the skewed nature of the data, it is common to take the logarithm of the counts for analysis purposes. In this case, the sample is nested within the package and package is a random effect.
Graphic Analysis: box plots, histograms, means plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 20 rows, 7 columns
----------------------------------------------------------------------
Name: Microbgs1
Analysis: Nested Mixed Models
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: Microbial counts are made on samples of ground beef in a study to assess sources of variation in numbers of microbes. Twenty packages of ground beef (PACKAGE) are purchased. Three samples are drawn from each package (SAMPLE) and two replicate counts are made on each sample (REPLICATE). (Note: the data set here is the same as the data set microbgs except that the data has been reorganized for analysis.) Also, because of the skewed nature of the data, it is common to take the logarithm of the counts for analysis purposes. In this case, the sample is nested within the package and package is a random effect.
Graphic Analysis: box plots, histograms, means plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 20 rows, 7 columns
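The reorganization described for microbgs (one count per row, with sample and replicate identified on each row) can also be sketched outside of SAS. The Python sketch below is only an illustration under stated assumptions: the counts are invented, the output column names COUNT and LOGCOUNT are hypothetical rather than taken from the data set, and the natural logarithm is an assumed choice because the description does not state a base.

```python
import math

# Hypothetical wide-format rows: one row per package, six count columns.
# These counts are invented for illustration; they are NOT the real data.
wide_rows = [
    {"PACKAGE": 1, "CT11": 527, "CT12": 821, "CT21": 107,
     "CT22": 299, "CT31": 1382, "CT32": 3524},
    {"PACKAGE": 2, "CT11": 2813, "CT12": 2322, "CT21": 3901,
     "CT22": 4422, "CT31": 383, "CT32": 479},
]


def stack_counts(rows):
    """Turn each CT<sample><replicate> column into its own row."""
    long_rows = []
    for row in rows:
        for sample in (1, 2, 3):
            for replicate in (1, 2):
                count = row[f"CT{sample}{replicate}"]
                long_rows.append({
                    "PACKAGE": row["PACKAGE"],
                    "SAMPLE": sample,
                    "REPLICATE": replicate,
                    "COUNT": count,               # hypothetical column name
                    "LOGCOUNT": math.log(count),  # natural log is an assumption
                })
    return long_rows


long_rows = stack_counts(wide_rows)
print(len(long_rows))  # 2 packages x 3 samples x 2 replicates
for r in long_rows[:3]:
    print(r)
```

Each wide row produces six long rows. The log transform requires strictly positive counts, so zero counts would need separate handling.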

----------------------------------------------------------------------
Name: Nosocomial
Analysis: ANOVA, ANCOVA, Regression
Reference:
Description: The data are from a study conducted to determine whether the risk of nosocomial (hospital-acquired) infection is affected by other hospital characteristics such as: type of hospital (public or private), average number of patients at the hospital, average age of the patients, average number of beds, and average number of nurses on staff.
Graphic Analysis: scatter plots; histograms; box plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 113 rows, 7 columns
----------------------------------------------------------------------
Name: oranges
Analysis: ANCOVA
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: The data are from a study of the relationship between the price of oranges and sales per customer. The hypothesis is that sales vary as a function of price differences for different stores (STORE) and days of the week (DAY). The price is varied daily for two varieties of oranges. The variables P1 and P2 denote the prices for the two varieties, respectively. The variables Q1 and Q2 are the sales per customer of the corresponding varieties. Q1 and Q2 are used as the dependent variables, with STORE, DAY, P1, and P2 as the independent variables.
Graphic Analysis: box plots, histograms, means plots, scatter plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 36 rows, 6 columns
----------------------------------------------------------------------
Name: oysters
Analysis: ANCOVA
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: Four bags with 10 oysters in each bag are randomly placed at each of five stations in the cooling water canal of a power-generating plant. Each location, or station, is considered a treatment and is represented by the variable TRT in the data set. Each bag is considered to be one experimental unit. Two stations are located in the intake canal, and two stations are located in the discharge canal, one at the top and the other at the bottom of each location. A single mid-depth station is located in a shallow portion of the bay near the power plant. The treatments are coded 1 through 5 in the data set as follows: (1) intake-bottom, (2) intake-surface, (3) discharge-bottom, (4) discharge-surface, and (5) bay. The purpose of the experiment is to determine if exposure to water heated artificially affects growth and if the position in the water column (surface or bottom) affects growth. Stations in the intake canal act as controls for those in the discharge canal, which has a higher temperature. The station in the bay is an overall control in case some factor other than the heat difference due to water depth or location is responsible for an observed change in growth rate. The oysters are cleaned and measured at the beginning of the experiment (INITIAL) and again about one month later (FINAL). These two weights are recorded for each bag.
Graphic Analysis: box plots, histograms, means plots, scatter plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 20 rows, 4 columns
----------------------------------------------------------------------
Name: o_ring
Analysis: Logistic Regression, Probit regression
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: Data documenting the presence or absence of primary O-ring thermal distress in the 23 shuttle launches preceding the Challenger mission were collected. The focus of this data is to determine if there is a relationship between the temperature at launch time and O-ring thermal distress.
The variables in the data set are the flight number (FLT), the temperature at launch (TEMP), and an indicator variable for whether or not there was thermal distress during the launch (TD) (0=no distress, 1=distress). This data exists in an alternative form in the data set Challenger. In that case, the variables in the data set are temperature at launch (TEMP), the number of launches in which thermal distress occurred at that temperature (TD), the total number of launches at that temperature (TOTAL), and the number of launches in which thermal distress did not occur at that temperature (NO_TD).
Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT
Size: 23 rows, 3 columns
----------------------------------------------------------------------
Name: peppers
Analysis: Descriptive statistics, confidence interval, one-sample t-test
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: An engineer wants to design a mechanical harvester for bell peppers. He measured and recorded the angle at which peppers hang on the plant. The purpose of the analysis is to construct a 95% confidence interval for the mean and to determine if the average angle is equal to zero.
Graphic Analysis: box plot, histogram
SAS Product: Enterprise Guide; SAS/STAT
Size: 28 rows, 1 column
----------------------------------------------------------------------
Name: pressure
Analysis: Paired-sample t-test; one-sample t-test; regression; ANCOVA
Reference: Generated for SAS Course Notes
Description: Consider an experiment to examine the effectiveness of a medication in reducing blood pressure. A random sample of individuals with high blood pressure is taken and their diastolic pressure is recorded. The individuals are then placed on medication and one month later their diastolic blood pressure is once again recorded. The dataset contains the following variables: subject, age, baseline blood pressure, and new blood pressure. In addition there is a column in the data set that includes the type of drug taken (new, approved, or placebo). If this information is included, an analysis of covariance can be done.
Graphic Analysis: histograms, box plots, scatter plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 93 rows, 4 columns
----------------------------------------------------------------------
Name: pulse
Analysis: Descriptive statistics, confidence interval, paired-sample t-test
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: A drug is administered to animals and pulse rates before (PRE) and after (POST) the drug are recorded. An additional column, D, is included in the data set, which is the difference between PRE and POST. The purpose of the experiment is to determine if the drug changes the pulse rates of the animals.
Graphic Analysis: box plot, histogram
SAS Product: Enterprise Guide; SAS/STAT
Size: 15 rows, 3 columns
----------------------------------------------------------------------
Name: rats
Analysis: ANOVA, MANOVA
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: Weight gains in rats given a special diet were measured at one (GAIN1), two (GAIN2), three (GAIN3), and four (GAIN4) weeks after beginning the administration of the diet. The question of interest is whether the rats' weight gains stayed constant over the course of the experiment. In other words, were the average weight gains the same at each of the four weeks? An intercept-only MANOVA model can be used to answer this question, or the data can be stacked and an ANOVA used.
Graphic Analysis: box plots, histograms, means plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 10 rows, 4 columns
See Index_Sorted for SAS products and analysis type per sample dataset.
----------------------------------------------------------------------
Name: Sales
Analysis: Logistic Regression
Reference: Generated for use in SAS course notes
Description: A mail-order company wants to identify those customers to whom their advertising efforts should be directed. They have decided that customers who spend 100 dollars or more are their target group. They have collected data on their customers such as purchase level (1 = at least $100; 0 = less than $100), gender, income level (Low, Medium, or High), and age.
Graphic Analysis: bar chart, histogram
SAS Product: Enterprise Guide; SAS/STAT
Size: 431 rows, 4 columns
----------------------------------------------------------------------

Name: seizure
Analysis: Poisson Regression with Repeated Measures
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: This data is from a study evaluating a new treatment for epilepsy. The variable ID identifies each patient in the study. The treatments are TRT=0, a placebo, and TRT=1, an anti-epileptic drug. The response variable is the number of seizures over a two-week interval. For the eight weeks prior to placing the participants on treatment, the number of seizures was counted for each patient in order to form a baseline measurement (BASE). The patients' ages (AGE) in years are also included in the data set. The number of seizures (Y) was recorded for each of four two-week time intervals after being placed on the treatment. The variable TIME indicates which two-week time period is recorded on that row (1=first two weeks, 2=second two weeks, etc.). Two additional variables appear in the data set that might be used during the analysis. One is the log of age (LOG_AGE) and the other is the log of (BASE/4). This data set is the same data that appears in the data set LEPPIK, but has been reorganized for analysis.
Graphic Analysis: scatter plot
SAS Product: Enterprise Guide; SAS/STAT
Size: 236 rows, 8 columns
----------------------------------------------------------------------
Name: teachers
Analysis: ANOVA, MANOVA
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: Student exam scores are collected, where each student is taught by one of three teachers. The purpose of the analysis is to compare the average scores for each of the three teachers. The teachers can be compared using either the score on the first exam (SCORE1) or the score on the second exam (SCORE2).
A multivariate ANOVA (MANOVA) can be used to determine if there is a difference between teachers when considering both SCORE1 and SCORE2 simultaneously.
Graphic Analysis: box plots, histograms, means plots
SAS Product: Enterprise Guide; SAS/STAT

Size: 30 rows, 3 columns
----------------------------------------------------------------------
Name: Teaching
Analysis: Repeated Measures ANOVA
Reference: Generated for StatView Reference, 2nd edition. 1998. SAS Institute Inc.
Description: A study was conducted in an industrial setting to test the effectiveness of several techniques for teaching the use of a respirator mask. Subjects were divided randomly into three groups: a control group that received no training in the use of the mask, a group that received a detailed instruction sheet, and a third group that attended a thirty minute class. The effectiveness of the mask was measured for each of the subjects before training and also one and two weeks after training. The purpose of the study is to determine whether, on average, there is any difference in effectiveness among the three teaching techniques.
Graphic Analysis: box plot, histogram, scatter plot
SAS Product: SAS/STAT
Size: 28 rows, 4 columns
----------------------------------------------------------------------
Name: Totarrests
Analysis: Descriptive statistics, time series analysis, paired sample t-test, ANOVA
Reference: U.S. Department of Justice, Office of Justice Programs, Bureau of Justice Statistics http://www.ojp.usdoj.gov/bjs/dtdata.htm
Description: The data set is a record of the total number of arrests in the United States from 1970 through 1999. The variables in the data set are: year (YEAR), total number of arrests (TOTALARRESTS), total number of arrests by age group (AGE1 through AGE5), arrests per 100 thousand in population, total (ARRESTRATE) and by age group (AGE1RATE through AGE5RATE), total population of the U.S. on July 1 of the given year (POPULATION), and total population for each age group (AGE1POP through AGE5pop). The data set could be used to generate descriptive statistics for the total population and by age group, or to do a time series analysis and predict total number of arrests for the population as a whole or by age group. These predictions might be used in assessing the need for judicial system infrastructure changes. Finally, the data could be used to compare age groups with an ANOVA, but the data would have to be reorganized to conduct such an analysis using the stack function in Enterprise Guide or a data step.

The data set arrestrates has the arrest rates for the 5 age groups organized to do an analysis of variance.
Graphic Analysis: scatter plots
SAS Product: Enterprise Guide; SAS/STAT; SAS/ETS
Size: 30 rows, 19 columns
----------------------------------------------------------------------
Name: Tree
Analysis: Regression, ANCOVA, Polynomial Regression, Nonlinear Regression
Reference: StatView Reference, 2nd edition. 1998. SAS Institute Inc.
Description: In the 1930's, the weights and trunk girths were measured for eight specimens from each of thirteen rootstocks, for a total of 104 tree specimens. The purpose is to determine if the girth and/or rootstock of the trees are useful in predicting the weight of trees. This would make it possible to get accurate estimates of weight without having to cut trees down and weigh them, a destructive and difficult process.
Graphic Analysis: box plot, scatter plot, histogram
SAS Product: Enterprise Guide; SAS/STAT
Size: 104 rows, 3 columns
----------------------------------------------------------------------
Name: tubeangle
Analysis: quality control: xbar, r, and s charts, capability analysis
Reference: Generated for StatView Reference, 2nd edition. 1998. SAS Institute Inc.
Description: You are in charge of the quality control effort at a bicycle manufacturer that specializes in limited production frames. The most popular model your company produces is a day touring model called the "Arribe!", which is a racing-style frame for weekend warriors. The seat tube angle of a bicycle frame can dramatically affect the finished bicycle's handling characteristics. This is the angle formed by the intersection of the tube that holds the seat post with the top horizontal frame tube. A small seat tube angle endows the frame with forgiving handling characteristics. Weekend warriors want frames that are responsive and quick; they prefer frames with steep seat tube angles. The "Arribe!" is manufactured with these specifications in mind.
The purpose of this analysis is to determine if the manufacturing process is in control. The target angle is 74 degrees, with specification limits of 73.7 and 74.3 degrees.
Graphic Analysis: quality control charts
SAS Product: Enterprise Guide; SAS/QC
Size: 100 rows, 2 columns
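The capability analysis listed for tubeangle compares process spread with the specification limits given above (73.7 and 74.3 degrees, target 74). The Python sketch below shows the usual Cp and Cpk formulas as an illustration only: the measurements are invented, and using the overall sample standard deviation as the sigma estimate is a simplifying assumption (it ignores subgroup structure, and is not necessarily how SAS/QC estimates sigma).

```python
import statistics

# Specification limits and target taken from the tubeangle description.
LSL, TARGET, USL = 73.7, 74.0, 74.3

# Hypothetical seat tube angles in degrees; NOT the real tubeangle data.
angles = [74.02, 73.95, 74.10, 73.98, 74.05,
          73.90, 74.00, 74.08, 73.97, 74.03]


def capability(values, lsl, usl):
    """Return (mean, sd, Cp, Cpk) using the overall sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: overall sd as the sigma estimate
    cp = (usl - lsl) / (6 * sd)    # potential capability, ignores centering
    cpk = min(usl - mean, mean - lsl) / (3 * sd)  # penalizes an off-center mean
    return mean, sd, cp, cpk


mean, sd, cp, cpk = capability(angles, LSL, USL)
print(f"mean={mean:.3f} sd={sd:.4f} Cp={cp:.2f} Cpk={cpk:.2f}")
print(f"offset from target={mean - TARGET:+.3f} degrees")
```

Cpk can never exceed Cp, and the two are equal only when the process mean sits exactly midway between the specification limits.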

----------------------------------------------------------------------
Name: TubeDefects
Analysis: quality control, p and np charts
Reference: Generated for StatView Reference, 2nd edition. 1998. SAS Institute Inc.
Description: Often, it is more cost-effective to simply evaluate whether an item is defective or not. This data is recorded from frame tubes prior to assembly. Frame tubes need to be meticulously filed, mitered and sanded before they are joined into a complete frame. The tube ends are then inspected to assure that they fit together properly. Rather than base your analyses on each of the measures that affect whether tubes fit together, you will analyze a single characteristic, specifically whether each individual tube is defective or not.
Graphic Analysis: histogram, box plot, scatter plot
SAS Product: Enterprise Guide; SAS/QC
Size: 960 rows, 2 columns
----------------------------------------------------------------------
Name: Turnips
Analysis: ANOVA with two blocks, a Latin square design
Reference: Steel, R. G. D., and Torrie, J.H. 1980. Principles and Procedures of Statistics: a Biometrical Approach. McGraw-Hill, New York.
Description: To determine whether the moisture content of turnip green leaves is affected by time in storage, researchers classified the leaves of five turnip plants into five size groups, subjected these leaves to one of five lengths of storage time according to a specific pattern, and finally measured the moisture content of each leaf. Plant and leaf size are the blocking factors for this experiment.
Graphic Analysis: box plot, histogram
SAS Product: Enterprise Guide; SAS/STAT
Size: 25 rows, 4 columns
----------------------------------------------------------------------
Name: type_dose
Analysis: ANOVA
Reference: Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition. Cary, N.C.: SAS Institute Inc.
Description: This data set contains data from an experiment designed to compare the response to increasing dosage for two types of drugs. There were three levels of the actual dosage (DOSE). The data were analyzed using the base 10 log of dose (LOGDOSE). The experiment was conducted as a randomized complete block design, where BLOCK is the blocking factor. Y is the response variable.

If the blocking factor is treated as a random effect this becomes a mixed model.
Graphic Analysis: box plots, histograms, means plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 24 rows, 6 columns
----------------------------------------------------------------------
Name: Ulcers
Analysis: nonparametric ANOVA, transformation of response for ANOVA
Reference:
Description: Consider an experiment to investigate the content of gastric juices of patients. The goal of the experiment is to determine the average lysozyme level. (Lysozyme is an enzyme that can destroy the cell walls of some kinds of bacteria.)
Graphic Analysis: histogram, box plot, normal probability plot
SAS Product: Enterprise Guide; SAS/STAT
Size: 60 rows, 2 columns
----------------------------------------------------------------------
Name: veneer
Analysis: nonparametric ANOVA
Reference: Generated for SAS training course
Description: Consider an experiment to investigate the durability of three brands of synthetic wood veneer. This type of veneer is often used in office furniture and on kitchen countertops. To determine durability, samples of each of the three brands were subjected to a friction test. The amount of veneer material that is worn away due to friction is measured. The resulting wear measurement is recorded for each sample. Brands that have small measurements are desirable.
Graphic Analysis: box plot, histogram
SAS Product: Enterprise Guide; SAS/STAT
Size: 30 rows, 2 columns
----------------------------------------------------------------------
Name: WCGS (Western Collaborative Group Study)
Analysis: Survival Analysis
Reference: Rosenman, R.H., Brand, R.J., Jenkins, C.D. et al. 1975. Coronary heart disease in the Western Collaborative Group study. Journal of the American Medical Association. 223: 872-877.
Description: The data are from a prospective study of the occurrence of coronary events, usually heart attacks. Covariates that may influence the risk of a coronary event include smoking, blood pressure history, and cholesterol level. These data are from a group of 3,154 male employees from ten California companies during 1960-1961. The original purpose of the study was to investigate the effects of behavior type and smoking habits on heart disease. After the recruitment, the study followed participants for nine years, although a few were lost to follow-up before the end of the study. The time variable of interest was the interval from entry into the study until the appearance, as determined by a medical expert, of coronary heart disease. The dataset contains event time and censor variables for 614 participants, as well as measurements of two covariates of interest: smoking behavior at study entry and behavior type. Individuals were classified into behavior types on the basis of an interview; in general terms, Type A behavior is characterized by aggressiveness and competitiveness, whereas Type B behavior is considered more relaxed and noncompetitive. In this subsample, events were observed in 60 individuals.
Graphic Analysis: bar charts, mosaic plots
SAS Product: Enterprise Guide; SAS/STAT
Size: 614 rows, 4 columns
----------------------------------------------------------------------
Name: westernrates
Analysis: correlation
Reference: Places Rated Almanac by Roger Boyer and David Savageau, Rand McNally.
Description: Different western cities are rated by nine criteria. For all but two of the variables, the higher the score, the better. For Housing and Crime, the lower the score, the better. You want to determine whether there is a linear correlation between any two of the criteria.
Graphic Analysis: scatter plots, histogram
SAS Product: Enterprise Guide
Size: 52 rows, 11 columns
----------------------------------------------------------------------
Name: Wine Tasting
Analysis: nonparametric ANOVA; measures of agreement.
Reference: Collected for StatView Reference, 2nd edition. 1998. SAS Institute Inc.
Description: In this experiment fifteen people rated six red wines. Each wine was rated using criteria commonly used to judge wine quality. The totals for each judge and wine were calculated. You will determine whether there is a difference in the quality of the wines as determined by the judges.

This data could also be used to rank the wines and then determine if there is agreement among the judges in terms of the ranks of the wines. Some data manipulation will be necessary to conduct either of these analyses.
Graphic Analysis: scatterplot, histogram
SAS Product: Enterprise Guide; SAS/STAT
Size: 15 rows, 7 columns
----------------------------------------------------------------------
Name: writing
Analysis: ANCOVA
Reference: Generated for StatView Reference, 2nd edition. 1998. SAS Institute Inc.
Description: A university English department wants to know whether its first-year composition course is as effective for history and math majors as it is for English majors. We could do a simple analysis of variance with final class scores as the dependent variable and major as the factor. However, students could have differing verbal abilities, and we must control for that by including their Verbal SAT scores as a covariate. The main question is whether the course is equally effective for students of different majors. Secondly, we want to estimate the average class score for students in each major. Finally, we want to know whether SAT scores are effective for controlling for variability among individual students.
Graphic Analysis: box plot, histogram
SAS Product: Enterprise Guide; SAS/STAT
Size: 19 rows, 3 columns
