
SAS Programming by Example [3]

Chapter 3 SET, MERGE, and UPDATE: Reading and Combining SAS Data Sets
A new SAS data set can be created from existing ones as:
. A subset of an old one
. A combination of multiple existing SAS data sets
. A new version (update) of an existing data set

Example 1 Subsetting a SAS Data Set: Selecting Observations That Meet Certain Conditions

Features: SET, IN, LIBNAME, subsetting IF, and WHERE Statements, DROP and KEEP Statements and Options

    LIBNAME MARY 'C:\EMPLOYEE\JOBDATA';
    DATA NJEMPLOY;
        SET MARY.EMPLOY;
        IF STATE EQ 'NJ';
    RUN;

Note: LIBNAME creates a library reference (libref) that points to an external SAS data library where you store SAS data sets. An IF statement with no THEN clause is called a subsetting IF; it is equivalent to IF NOT condition THEN DELETE. The WHERE statement is an alternative:

    LIBNAME MARY 'C:\EMPLOYEE\JOBDATA';
    DATA NJEMPLOY;
        SET MARY.EMPLOY;
        WHERE STATE EQ 'NJ';
    RUN;

Note: Differences between WHERE and IF
1. The WHERE statement may be more efficient than the subsetting IF, especially if you are taking a very small subset from a large file.
2. The WHERE statement can only be used with variables in the existing data set, whereas a subsetting IF statement can be used with raw data as well.
3. When the WHERE condition is not true, the observation is not brought into the PDV, and therefore it does not affect the logical values of the FIRST. and LAST. variables.
4. You may include a WHERE statement in SAS procedures.

Note: Some useful WHERE operators and their actions

BETWEEN - AND
    Selects observations that fall within a range (endpoints included).
    Ex: WHERE AGE BETWEEN 20 AND 40;

CONTAINS or ?
    Used for character variables only. Selects records that include or contain the specified string.
    Ex: WHERE NAME CONTAINS 'Mc';
        WHERE NAME ? 'Mc';

IS MISSING or IS NULL
    Selects observations for which the value of the variable is missing. It works with both numeric and character variables.
    Ex: WHERE AGE IS MISSING;
        WHERE NAME IS NULL;

LIKE
    Selects observations based on patterns, using the percent sign (%) and underscore (_) wildcard operators.
    % is a variable-length wildcard; _ matches exactly one character.
    Ex: WHERE NAME LIKE 'BOY%';
        WHERE NAME LIKE 'AB__';
        WHERE NAME LIKE 'A__%';

=*
    Attempts a phonetic match based on a Soundex algorithm. It is useful for "fuzzy matches".
    Ex: WHERE NAME =* 'CODY';
    Given the names: CODY, Cody, KODY, etc.

KEEP and DROP Statements

    LIBNAME MARY 'C:\EMPLOYEE\JOBDATA';
    DATA NJEMPLOY;
        SET MARY.EMPLOY;
        WHERE STATE EQ 'NJ';
        KEEP ID DEPT;
    RUN;

Note: Alternatively, list the variables to exclude instead of the ones to keep:

    DROP GENDER SALARY YEARS STATE;

KEEP= and DROP= Data Set Options

    LIBNAME MARY 'C:\EMPLOYEE\JOBDATA';
    DATA NJEMPLOY;
        SET MARY.EMPLOY (KEEP=ID DEPT STATE);
        IF STATE EQ 'NJ';
        DROP STATE;
    RUN;

Note: A chain of OR comparisons such as

    WHERE STATE EQ 'NJ' OR STATE EQ 'NY' OR STATE EQ 'FL' OR STATE EQ 'CA';

can be written more compactly with the IN operator:

    WHERE STATE IN ('NJ', 'NY', 'FL', 'CA');

Example 2 Combining SAS Data Sets by Adding Observations

Feature: SET Statement

. Data set SURVEY1: ID GENDER HEIGHT WEIGHT YEAR
. Data set SURVEY2: ID GENDER HEIGHT WEIGHT YEAR
. Data set SURVEY3: ID GENDER HEIGHT YEAR IQ

    LIBNAME GEORGE 'C:\DATA\SURVEYS';
    DATA GEORGE.SURV1_2;
        SET GEORGE.SURVEY1 GEORGE.SURVEY2;
    RUN;

    LIBNAME GEORGE 'C:\DATA\SURVEYS';
    DATA GEORGE.SURV1_3;
        SET GEORGE.SURVEY1 GEORGE.SURVEY3;
    RUN;

Note: SURVEY3 has no WEIGHT and SURVEY1 has no IQ, so SURV1_3 contains both variables, with missing values wherever the contributing data set lacks the variable.

Example 3 Combining SAS Data Sets by Adding Variables

Feature: MERGE Statement

. Data set LEFT
. Data set RIGHT
ID HEIGHT WEIGHT GENDER RACE

    DATA AMBIDEX;
        MERGE LEFT RIGHT;
    RUN;

Example 4 Adding Variables from One Data Set to Another Based on an Identifying Variable

Features: MERGE and BY Statements, SORT Procedure

. Data set DEMOG
    ID  GENDER  STATE
    1   M       NJ
    5   F       NJ
    2   F       NJ

. Data set EMPLOYEE
    ID  DEPT  SALARY
    1   SAL   30,000
    2   SAL   30,000
    5   SAL   30,000

. Data set DEMOG2
    ID  GENDER  STATE
    1   M       NJ
    5   F       NJ
    2   F       NJ
    3   M       NJ

Note: Sort both data sets by ID, then match-merge them:

    PROC SORT DATA=DEMOG;

        BY ID;
    RUN;
    PROC SORT DATA=EMPLOYEE;
        BY ID;
    RUN;
    DATA COMBINED;
        MERGE DEMOG EMPLOYEE;
        BY ID;
    RUN;

Example 5 Controlling Which Observations Are Added to the Merged Data Set

Features: MERGE, Subsetting IF, BY Statement, IN= Data Set Option

Note: After sorting both data sets, if EMPLOYEE has data to contribute to the current observation, then EMP=1; otherwise EMP=0.

    DATA BOTH;
        MERGE DEMOG2 EMPLOYEE (IN=EMP);
        BY ID;
        IF EMP=1;
    RUN;

    DATA BOTH;
        MERGE DEMOG2 (IN=DEM) EMPLOYEE (IN=EMP);
        BY ID;
        IF DEM=1 AND EMP=1;
    RUN;

Example 6 Creating More Than One Data Set at a Time

Features: MERGE and OUTPUT Statements, IN=, KEEP=, and RENAME= Data Set Options

Note: After sorting both data sets:

    DATA ACTIVE INACTIVE (KEEP=ID GENDER STATE);
        MERGE DEMOG2 EMPLOYEE (IN=ACT);
        BY ID;
        IF ACT=1 THEN OUTPUT ACTIVE;
        ELSE OUTPUT INACTIVE;
    RUN;

. What if the BY Variable Has a Different Name in Each Data Set?

Ex: Let's say you have a variable called EMP_ID in data set ONE and a variable called EMP_NUM in data set TWO. Assume that the two data sets are already sorted.

    DATA COMBINED;

        MERGE ONE TWO (RENAME=(EMP_NUM=EMP_ID));
        BY EMP_ID;
    RUN;

. Merging Data Sets That Contain Variables with the Same Name, Not Used as BY Variables (Later)

Example 7 Performing "Fuzzy" Merges

Features: "Fuzzy" Merges, SOUNDEX Function, Merging with Two BY Variables, IN= Option

. Data set ONE
    NAME      DOB       HEIGHT
    CODY      10/21/46  68
    CLARK     5/01/40   70
    CLARKE    5/10/45   72
    ALBERT    10/01/46  69

. Data set TWO
    NAME      DOB       WEIGHT
    MCKLEARY  9/01/55   200
    COTY      10/21/46  152
    CLARK     7/02/60   160
    ALBIRT    10/01/46  200
    CLARKE    5/01/40   210

    DATA ONE_TEMP;
        SET ONE (RENAME=(NAME=NAME_ONE));
        S_NAME=SOUNDEX(NAME_ONE);
    RUN;
    DATA TWO_TEMP;
        SET TWO (RENAME=(NAME=NAME_TWO));
        S_NAME=SOUNDEX(NAME_TWO);
    RUN;
    PROC SORT DATA=ONE_TEMP;
        BY S_NAME DOB;
    RUN;
    PROC SORT DATA=TWO_TEMP;
        BY S_NAME DOB;
    RUN;
    PROC PRINT DATA=ONE_TEMP NOOBS;
    RUN;
    DATA BOTH;
        MERGE ONE_TEMP (IN=ONE) TWO_TEMP (IN=TWO);
        BY S_NAME DOB;
        IF ONE=1 AND TWO=1;
        FORMAT DOB MMDDYY8.;
    RUN;
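To see why the merge in Example 7 pairs up differently spelled names, it can help to print the SOUNDEX code for each name. The following is a minimal sketch, not part of the original notes: the data set name SOUNDEX_DEMO is hypothetical, and the names are simply typed in from the tables for data sets ONE and TWO.

```sas
/* Hypothetical demo data set: names copied from data sets ONE and TWO */
DATA SOUNDEX_DEMO;
    LENGTH NAME $ 10 S_NAME $ 10;
    INPUT NAME $;
    /* Phonetic code used as the first BY variable in Example 7 */
    S_NAME = SOUNDEX(NAME);
    DATALINES;
CODY
COTY
CLARK
CLARKE
ALBERT
ALBIRT
MCKLEARY
;
RUN;

/* List each name next to its code so matching codes are easy to spot */
PROC PRINT DATA=SOUNDEX_DEMO NOOBS;
RUN;
```

CODY and COTY should receive the same code, because D and T fall in the same Soundex class; CLARK/CLARKE and ALBERT/ALBIRT differ only by vowels, which Soundex discards. Because Example 7 also uses DOB as a BY variable, two names with the same code are matched only when their dates of birth agree exactly.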

Example 8 Updating a Master File from a Transaction File

Feature: UPDATE Statement

. Data set MASTER
    ID  DEPT     SALARY
    1   PARTS    13,000
    2   PERSON   21,000
    3   PARTS    15,000
    4   EXEC     55,000
    5            18,000

. Data set TRANS
    ID  DEPT     SALARY  GENDER
    2            22,000  M
    3   SALES    24,000  F
    5   RECORDS          M
    9            10,000  F

    DATA NEWMAST;
        UPDATE MASTER TRANS;
        BY ID;
    RUN;
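The behavior of UPDATE is easiest to see by running Example 8 on its own data. The sketch below is an illustration under stated assumptions: the MASTER and TRANS rows are typed in as they appear to read in the listing above (blank cells taken as missing values), salaries are entered without commas, and the comma-delimited DATALINES layout with the DSD option is a choice made here, not something the source specifies.

```sas
/* Assumed contents of MASTER; ID 5 has a missing DEPT */
DATA MASTER;
    INFILE DATALINES DSD MISSOVER;
    INPUT ID DEPT :$8. SALARY;
    DATALINES;
1,PARTS,13000
2,PERSON,21000
3,PARTS,15000
4,EXEC,55000
5,,18000
;
RUN;

/* Assumed contents of TRANS; empty fields are missing values */
DATA TRANS;
    INFILE DATALINES DSD MISSOVER;
    INPUT ID DEPT :$8. SALARY GENDER :$1.;
    DATALINES;
2,,22000,M
3,SALES,24000,F
5,RECORDS,,M
9,,10000,F
;
RUN;

/* Both data sets are already in ID order, so no PROC SORT is needed */
DATA NEWMAST;
    UPDATE MASTER TRANS;
    BY ID;
RUN;

PROC PRINT DATA=NEWMAST NOOBS;
RUN;
```

With these inputs, a missing value in TRANS should leave the MASTER value unchanged: ID 2 keeps DEPT PERSON while SALARY becomes 22000, and ID 5 keeps SALARY 18000 while DEPT becomes RECORDS. ID 9, which exists only in TRANS, is added as a new observation, and GENDER is added as a new variable that stays missing for IDs 1 and 4.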
