SAS HW1 SAS Data Manipulation

Individual assignment. Submit your SAS file AND the SAS output to the eLearning assignment drop
box. Due date: 09/14/2015.
This homework is meant to teach you some basic SAS data manipulation skills. You will use the
dataset regdiet.xls for this first SAS assignment. The dataset is available on eLearning. It
consists of two worksheets: one for training and the other for testing.
Data documentation.
Tmm: month (01-jan, 12-dec): can be treated as a continuous variable
Twknd: 1 implies weekend, 0 for weekday
acnmod: binary variable (1484 for diet, 1553 for regular)
Pssex: Primary shopper's sex. 0 unknown, 1 male, 2 female
Psage: Primary shopper's age (1-9)
Coupuse: binary, 1 implies coupons were used in the transaction
Cad: Indicates the type of deals available (0-5), 0 for no ads in store, 1 for
ads in store etc.
hhsize: actual number of members in the household
hhcomp: household composition
1 - married
2 - female head living with others related
3 - male head living with others related
5 - female living alone
6 - female living with non-related
7 - male living alone
8 - male living with non-related
Child: indicates the self-reported number of children in the household (which
may not be very accurate).
income: 03-27 such that 03 is low and 27 is high.
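The coded values documented above can be given readable labels in output. The following is a minimal sketch, not something the assignment requires: it assumes the training worksheet has already been read into a dataset called train (a hypothetical name), that these variables are numeric, and the format names are made up for illustration. The code-to-label pairs are copied from the list above.

```sas
/* Hypothetical format names; the code-to-label pairs come from the  */
/* data documentation above.                                         */
proc format;
  value wkndfmt   0 = 'Weekday'
                  1 = 'Weekend';
  value acnfmt    1484 = 'Diet'
                  1553 = 'Regular';
  value sexfmt    0 = 'Unknown'
                  1 = 'Male'
                  2 = 'Female';
  value hhcompfmt 1 = 'Married'
                  2 = 'Female head living with others related'
                  3 = 'Male head living with others related'
                  5 = 'Female living alone'
                  6 = 'Female living with non-related'
                  7 = 'Male living alone'
                  8 = 'Male living with non-related';
run;

/* Assumption: a dataset named train holds the training worksheet    */
/* and the four variables below are stored as numeric.               */
proc freq data=train;
  tables twknd acnmod pssex hhcomp;
  format twknd wkndfmt. acnmod acnfmt. pssex sexfmt. hhcomp hhcompfmt.;
run;
```

The formats only change how values are displayed; the stored codes stay the same, so conditions such as acnmod = 1484 still work.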
Specifically, complete the following tasks:
1. Write a SAS program to read the training dataset from the Excel file, then view the contents (Proc
Contents) and print the first 10 observations. Try Proc Import first. Then try the infile
statement: save the Excel file as a space- or tab-delimited text file and read the data from it.
A tab is coded as '09'x, so use dlm='09'x to indicate a tab-delimited file.
2. Create a new variable that converts acnmod into a binary variable taking only the values 0 and 1.
Name the new variable diet, indicating Diet Coke: if acnmod = 1484, let diet = 1;
otherwise let diet = 0.

3. Use where to achieve the same result as in the previous question. Feel free to use multiple data
steps if you can't get it done in one data step.
4. For those who purchased diet coke, compute the total number of coupon users. Try Retain and
Sum.
5. For each income level, compute the rate of coupon usage (total number of coupon users / total
number of users). Use an array to store the value for each level of income.
Hint: the result should look like the following table, and you are expected to find a way to use
an array in SAS to produce the table.
Income_Level  Coupon_Users  Total_users  Usage_Rate
4             0             3            0
6             0             79           0.076

6. Create a new dataset that keeps only the observations with 2 or fewer children. Then sort
the new data by hhsize and income and output the result to a permanent tab-delimited text file.
Keep only four variables: UserID, hhsize, income and coupuse. Try to use libname to create a
library and use the file statement to write the variables to a file in that library. Print the
first 10 observations of the final data.
7. Run Proc means over the final data created in the previous step and report the results.
8. Generate a one-way and a two-way table using Proc Freq. Feel free to use any variables you are
interested in.
9. Now read in the testing dataset from regdiet.xls and combine it with the training dataset you
used in the previous steps. Try to interleave the two datasets by month (tmm).
10. Create an example of your own and practice with match-merge, update, and the in= data set option.
You first need to create two small datasets yourself.
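As a starting point for tasks 2, 4 and 5, the recode, the Retain/Sum running total and the income-indexed array could be sketched as below. This is one possible approach, not a model solution. The toy dataset and its values are made up for illustration and are not taken from regdiet.xls, and the dataset names (toy, toy2, diet_coupon, usage_by_income) and array names (cu, tot) are hypothetical. The sketch also assumes income is always an integer between 3 and 27, since it is used directly as the array subscript.

```sas
/* Made-up toy data; values are illustrative only. */
data toy;
  input UserID acnmod coupuse income;
  datalines;
1 1484 1 3
2 1553 0 3
3 1484 0 4
4 1484 1 4
5 1553 1 27
;
run;

/* Task 2: recode acnmod into a 0/1 indicator named diet. */
data toy2;
  set toy;
  if acnmod = 1484 then diet = 1;
  else diet = 0;
run;

/* Task 4: total coupon users among diet buyers, using RETAIN and SUM. */
/* WHERE filters rows as they are read, so END= flags the last diet row. */
data diet_coupon;
  set toy2 end=last;
  where diet = 1;
  retain n_coupon 0;
  n_coupon = sum(n_coupon, coupuse);
  if last then output;
  keep n_coupon;
run;

/* Task 5: one array slot per income level (3 to 27).                  */
/* Temporary arrays keep their values across observations.             */
data usage_by_income;
  array cu[3:27]  _temporary_ (25*0);   /* coupon users per income level */
  array tot[3:27] _temporary_ (25*0);   /* all users per income level    */
  set toy end=last;
  cu[income]  = sum(cu[income], coupuse);
  tot[income] = tot[income] + 1;
  /* After the last row, write one output row per non-empty level. */
  if last then do Income_Level = 3 to 27;
    if tot[Income_Level] > 0 then do;
      Coupon_Users = cu[Income_Level];
      Total_users  = tot[Income_Level];
      Usage_Rate   = Coupon_Users / Total_users;
      output;
    end;
  end;
  keep Income_Level Coupon_Users Total_users Usage_Rate;
run;

proc print data=usage_by_income noobs;
run;
```

To run this against the real data, replace toy with the dataset read from the training worksheet. Indexing the arrays by income avoids a separate lookup from level to position, at the cost of a subscript-out-of-range error if an income value falls outside 3 to 27.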