
-->Combining SAS data sets

-->One-to-one reading: multiple SET statements
data a; set b; /* optional condition here */ set c; run;

If both data sets contain a variable with the same name, the value from the last data set read overwrites the value from the first. The number of observations in the new data set equals the number of observations in the smallest input data set.

-->Concatenating: one SET statement
data a; set b c; run;
The result holds all observations from b followed by all observations from c, with all variables from both data sets.

-->Interleaving: SET and BY statements
data a; set b c; by varname; run;
Data sets b and c must be sorted by varname.

-->Match-merging: MERGE and BY statements
data a; merge b c; by varname; run;
Data sets b and c must be sorted by varname.

-->DO loops
do indexvar = start to stop by increment; SAS statements; end;
With no OUTPUT statement inside the loop, this syntax produces only one observation, written at the end of the DATA step. To create an observation for each iteration, place an OUTPUT statement inside the loop:
do indexvar = start to stop by increment; SAS statements; output; end;
The BY value can be negative if you want to decrement the loop; in that case the start value must be greater than the stop value.
Series of items:
do indexvar = value1, value2, value3, ...; SAS statements; end;
The values can be numbers, character constants in quotes ('a', 'b', 'c'), or variable names (no quotes around variable names).

-->DO ... END variants
Iterative DO: executes the statements between DO and END repeatedly, based on the value of the index variable.
DO UNTIL: executes the loop repeatedly until a condition is true. The condition is checked at the bottom of the loop, after each iteration, so the loop always executes at least once.
DO WHILE: executes the loop repeatedly while a condition is true. The condition is checked at the top of the loop, before each iteration, so if it is false the first time, the loop never executes.
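The DO loop rules above can be sketched as follows. This is a minimal illustration, not part of the original notes: the data set and variable names (invest, balance, year, loops, n, m) are made up, and the 5% growth rate is arbitrary.

```sas
/* Hypothetical example: all names and numbers are illustrative. */
data invest;
   balance = 1000;
   do year = 1 to 3;
      balance = balance * 1.05;
      output;                 /* one observation per iteration: 3 in total */
   end;
run;

data loops;
   /* DO UNTIL checks at the bottom: the body runs once even though
      the condition n >= 5 is already true on entry. */
   n = 10;
   do until (n >= 5);
      n = n + 1;
   end;                       /* n is now 11 */

   /* DO WHILE checks at the top: the body never runs because
      m < 5 is false on entry. */
   m = 10;
   do while (m < 5);
      m = m + 1;
   end;                       /* m is still 10 */
run;
```

Because the first step contains an explicit OUTPUT statement, SAS does not also write an observation at the end of the step.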

-->Arrays
array array-name{dimension} elements;
Example:
data a; set b; array wkday{7} mon tue wed thu fri sat sun; do i = 1 to 7; wkday{i} = 5*(wkday{i}-32)/9; end; run;
Elements: the variables to include in the array. They must be all numeric or all character. If no elements are listed, SAS creates new variables with default names. To create temporary elements, use the keyword _TEMPORARY_. Elements can be a variable list (var1-varn), _ALL_, _NUMERIC_, _CHARACTER_, or individual names (mon tue wed).
Dimension: for one dimension, a number (4), a range (96:99), or * (SAS counts the elements). The dimension can be enclosed in ( ), { }, or [ ].
DIM function: returns the number of elements in the array: dim(array-name).
array wt{*} wei1-wei6; do i = 1 to dim(wt);
With DIM as the stop value, there is no need to re-specify the stop value when the array dimension changes.
To create new character variables with an array, put $ after the array dimension; nothing is required for numeric variables.
Assigning initial values to array elements:
array wt{*} wei1-wei6 (10 20 3 20 20 2);
If the initial values are character, put $ after the array dimension and enclose each value in quotes. Separate the values with commas or blanks and enclose the whole list in parentheses. You can also assign initial values without listing the array elements.

-->Two-dimensional arrays
array new{3,4} x1-x12; (3 rows, 4 columns)

-->System options
PAGESIZE= controls how many lines each page of output contains. Other options: LINESIZE=, NUMBER | NONUMBER, DATE | NODATE, YEARCUTOFF= (default 1920), PAGENO= (page number).
To modify system options, use the OPTIONS statement. The OPTIONS procedure displays the values of one or more SAS system options:
proc options option=yearcutoff; run;

-->Data set and library contents
PROC PRINT shows the data portion of a data set; PROC CONTENTS shows the descriptor portion. PROC CONTENTS can also describe the contents of a library: _ALL_ requests a listing of all files in the library, and NODS suppresses the detailed information about each file.
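proc contents data=datasetname; run;
proc contents data=libref._all_ nods; run;
proc datasets; contents data=libref._all_ nods; quit; (or contents data=datasetname;)
PROC FSLIST: displays a raw data file in a SAS window for browsing; it does not create a data set.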
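The option and contents statements above can be tried together as in this short sketch. It assumes the SASHELP sample library that ships with SAS; the option values are arbitrary choices for illustration.

```sas
/* Sketch only: SASHELP.CLASS is a sample data set shipped with SAS;
   the option values below are arbitrary. */
options pagesize=60 linesize=80 nodate nonumber;

/* Show the current value of one system option in the log. */
proc options option=yearcutoff;
run;

/* Descriptor portion of a single data set. */
proc contents data=sashelp.class;
run;

/* List every file in the library, without per-file detail. */
proc contents data=sashelp._all_ nods;
run;

/* Same descriptor information through PROC DATASETS. */
proc datasets library=sashelp nolist;
   contents data=class;
quit;
```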

Note: always read the question thoroughly to check what is being asked: how many observations, or how many variables. Do not confuse variables with observations.

VARNUM: PROC CONTENTS option that lists the variables in the order they were created.
PAD: when column or formatted input reads fixed-field data from variable-length records, the PAD option in the INFILE statement pads the short records with blanks.
DLM=: names the delimiter. Two consecutive delimiters are treated as one if DSD is not specified. If the delimiter is anything other than a comma, specify it in the INFILE statement (dlm='*') together with DSD so that missing values are read correctly.
DSD: treats two consecutive delimiters as a missing value and assigns it to the variable. If DLM= is not specified, DSD makes the comma the default delimiter.
If you use DSD in the FILE statement to write a raw data file, SAS encloses data values that contain a comma in double quotes, and DLM= is not required. The same holds for INFILE when reading raw data.
The LIBNAME command opens the LIBNAME window.
POINT=variable: the variable is a temporary numeric variable that contains the observation number to read. Example:
data a; obsnum = 5; set b point=obsnum; output; stop; run;
POINT= and END= cannot be used together, because with POINT= SAS never reaches the end of file.

-->Retained values
When reading a SAS data set, SAS retains the values of variables read with the SET statement and of variables created with a sum statement. All other variables are set to missing at the start of each iteration. At the beginning of the first iteration every variable is missing (a sum variable starts at 0); from the second iteration on, the retained values carry over.
When reading raw data, SAS sets every variable in the DATA step to missing at the beginning of each iteration, except for variables named in a RETAIN statement, variables created by a sum statement, _TEMPORARY_ array elements, variables created by options in the FILE or INFILE statement, and automatic variables.

List input: one or more blanks or other delimiters must separate the values on a record.
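The DLM= and DSD notes above, applied to list input, can be sketched as follows. The data set names, variable names, and data lines are made up for illustration.

```sas
/* Hypothetical data: names and values are illustrative only. */
data comma_demo;
   infile datalines dsd;            /* DSD alone: comma is the delimiter */
   input id name $ score;
   datalines;
1,Ann,90
2,,85
;
run;
/* In the second record the two consecutive commas give a missing
   (blank) name, and score is still read as 85. */

data star_demo;
   infile datalines dlm='*' dsd;    /* other delimiter: DLM= plus DSD */
   input id name $ score;
   datalines;
3*Cy*70
4**60
;
run;
```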
A delimiter is a marker that separates the value of one variable from the next. A blank is the most common delimiter; commas and tabs are also common. Use list input when the values are separated by blanks or other delimiters but their columns vary from line to line.

Modified list input: if a character value has embedded blanks, use the & modifier before $ (name & $20.) instead of just $, and make sure the data values are at least two blanks apart. To use an informat with list input, the colon modifier (:) is required to associate the informat with the variable name. Numeric variables, and character variables of length 8 or less, need no informat. In list input the informat $10. does not mean that SAS reads 10 columns; SAS reads up to the next delimiter.
MISSOVER: INFILE option for records with missing values at the end of the line; the remaining variables are set to missing instead of being read from the next line.

Formatted input: unlike column input, it can read both standard and nonstandard numeric data. @n and +n are column pointer controls; +(-n) moves the column pointer backward. Always put a period after an informat in the INPUT statement. You must specify w to indicate the number of columns to be read: here the informat $10. means that SAS reads 10 columns. An @ gives the beginning column for a variable, followed by the variable name and then the informat with its width. Formatted input is also used for data in a particular or unusual form, most commonly date values.

Column input: can be used when no blanks or other delimiters separate the values. It also handles numeric data recorded without a decimal point: the number of digits to place after the decimal point is given immediately after the column specification. The ampersand (&) is not needed for an embedded blank in an address field, because the columns, including the blanks, are specified. Columns can be read in any order, fields can be re-read, and no placeholder is required for missing data.

-->Multiple lines per record (several raw data lines, one observation)
A slash (/) in the INPUT statement is a line pointer control: variables after the slash are read from the next line. In the data below, the second line of each record must keep STNO in columns 40-43 and STNAME starting in column 45.

DATA NEW1;
INPUT @1 ID 5. @7 HT 3.1 @10 WT 3.
/ @1 LNAME & $10. FNAME & $ @40 STNO 4. @45 STNAME $10.;
CARDS;
23901 684145
Jovanovic  Mary                          69 North St.
45392 735199
Mc Alligator  John-Paul                1239 Smith Ave.
38389 770201
Xzavior-McCullagh  Nancy                 37 Northwestern Ave.
;
RUN;

-->Several observations per line (one raw data line, several observations)
A trailing @@ holds the current line across iterations of the DATA step, so the next INPUT statement continues on the same line. It should not be used with MISSOVER, with the @ column pointer control, or with column input.

DATA NEW1;
INPUT SID AGE 2. PULSE 2. EDUC 2. @@;
CARDS;
01 221604 02 242216 03 332112 04 594007
;
RUN;

-->Varying numbers of values per record
A trailing @ holds the current line until a subsequent INPUT statement is executed. (One raw data line becomes several observations that all share the first field.) If the records in the raw data are not all the same length, use the MISSOVER option.

data a;
infile b;
input id $ @;
do i = 1 to 3;
input sales : comma. @;
output;
end;
run;

File b:
0734 1,323.34 2,472.85 6,539.50
0658 6,985.54 2,528.45 3,5569.25

#n is the line pointer control; it points to a specific line (#1, #2, ...) of a multi-line record.

-->Special missing values
A variety of missing value designations can be assigned:
IF VAR1=7 THEN VAR1=.R; ELSE IF VAR1=8 THEN VAR1=.N; ELSE IF VAR1=9 THEN VAR1=.M;
The special missing value .R is assigned to refusals, originally entered as 7.
The special missing value .N is assigned to not applicable, originally entered as 8.
The special missing value .M is assigned to missing values, originally entered as 9.
SAS orders the missing value types. From smallest to largest: ._ . .A .B .C and so forth to .Z
Note: SAS treats ._ as the smallest missing value and .Z as the largest.

The MISSING statement: sometimes missing numeric data is supplied as letters rather than as periods or blanks. Use the MISSING statement to handle missing numeric data entered as a letter. Place it before the INPUT statement so that SAS reads these letters as missing values rather than as invalid numeric data.
DATA TEMP; MISSING R N;

The INVALIDDATA= option is a useful device: it lets you detect invalid data and distinguish it from actual missing data.
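The special missing values and the MISSING statement described above can be sketched as follows. The survey data, the data set names, and the variable names are hypothetical.

```sas
/* Hypothetical survey data: id and income are illustrative names. */
data survey;
   missing R N;          /* read the letters R and N in numeric fields as .R and .N */
   input id income;
   datalines;
1 52000
2 R
3 N
4 .
;
run;

/* Missing values sort before numbers, in the order given above:
   . (id 4), then .N (id 3), then .R (id 2), then 52000 (id 1). */
proc sort data=survey;
   by income;
run;

data flagged;
   set survey;
   refused     = (income = .R);     /* 1 only for the refusal code */
   any_missing = missing(income);   /* 1 for ., .N and .R alike */
run;
```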

INVALIDDATA= appears in the OPTIONS statement, not as part of a particular DATA step. Example:
OPTIONS INVALIDDATA='X';

-->Saving formats
CNTLOUT= writes (saves) format definitions to a SAS data set. CNTLIN= reads format definitions from a data set that was saved previously. If neither is used, the formats created are available only during the current SAS session.

-->PUT and FILE
Use a PUT statement to write information to the location named in a preceding FILE statement. Note: a FILE statement does not require a companion LIBNAME statement, because the FILE statement already gives a full path. If no FILE statement precedes it, the PUT statement writes to the SAS log; this is the way to write information to the log. To write each record to the log as it is read, use this statement after the INPUT statement:
PUT _INFILE_;

-->Numeric-to-character conversion
When SAS converts a numeric value to character automatically, it writes the value with the BEST12. format and the resulting character value is right-aligned. If the original value has fewer than 12 digits, the character value has leading blanks.

-->Character functions: concatenation, SCAN, SUBSTR, TRIM
SUBSTR (right side of an assignment): extracts data when the starting position is known. The argument must be character; a numeric argument is converted to character with BEST12. and right-aligned.
A = substr(variable, start-position, length);
If length is omitted, SUBSTR returns the rest of the string. The new variable gets the same length as the source variable, no matter how many characters you extract.
SUBSTR (left side of an assignment): replaces characters in place.
substr(variable, start-position, length) = 'value'; (a character constant goes in quotes)
SCAN: returns the nth word of a character value. Use it to extract words when their relative order is known but their starting positions vary.
Words are marked by delimiters, and you can specify more than one delimiter:
A = scan(variable, n, 'delimiters'); (n is the position of the word)
Two consecutive delimiters are treated as one, and leading delimiters have no effect. The default length of A is 200.

INTCK('interval', from, to): from and to are SAS date values, for example date constants ('...'d); interval is a character constant or variable such as month, day, week, weekday, hour, minute, second.
INTCK counts interval boundaries crossed, not elapsed time. Example:
intck('week', '31dec2000'd, '01jan2001'd) = 0: no week boundary is crossed (Sunday to Monday).
intck('week', '31dec2000'd, '07jan2001'd) = 1: the next Sunday is reached.
intck('week', '31dec2000'd, '06jan2001'd) = 0: the next Sunday is not reached.
The same rule holds for every interval: the boundary must be crossed.
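The SUBSTR, SCAN, and INTCK notes above can be checked with a short DATA _NULL_ step that writes its results to the log. The name string and the variable names are made up for illustration.

```sas
/* Hypothetical values: the name string is illustrative only. */
data _null_;
   length name $ 20;
   name = 'Smith, John Q.';

   last  = scan(name, 1, ', ');     /* Smith: comma and blank both delimit */
   first = scan(name, 2, ', ');     /* John */
   init  = substr(name, 1, 1);      /* S */
   substr(name, 1, 5) = 'SMITH';    /* left side: name is now SMITH, John Q. */

   w0 = intck('week',  '31dec2000'd, '01jan2001'd);   /* 0 */
   w1 = intck('week',  '31dec2000'd, '07jan2001'd);   /* 1 */
   m1 = intck('month', '31jan2001'd, '01feb2001'd);   /* 1: boundary crossed after one day */

   put last= first= init= name= w0= w1= m1=;
run;
```

The month example shows why INTCK should not be read as elapsed time: one day apart still counts as one month when the first of the month lies between the two dates.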

DATDIF, YRDIF: (start-date, end-date, basis), where the dates are SAS date values. Basis values: '30/360', 'act/act', 'act/360', 'act/365'. All four are valid in YRDIF; DATDIF accepts only '30/360' and 'act/act'.
TRIM: removes trailing blanks from a character value: trim(argument), or trim(left(argument)) to left-align first.
CATX: concatenates character strings, removes leading and trailing blanks, and inserts separators. It combines the work of TRIM and LEFT: catx(separator, string1, ..., stringN)
FIND: find(variable, 'string', modifiers, startpos). Modifiers: i ignores case, t trims trailing blanks. A positive startpos searches left to right; a negative startpos searches right to left.
INDEX: searches a character argument for a specified character string, from left to right: index(variable, 'character string'). It returns the starting position of the first occurrence, or 0 if the string is not found. Related functions: INDEXW, INDEXC.
UPCASE: upcase(variable). LOWCASE: lowcase(variable). PROPCASE: propcase(argument, delimiters).
TRANWRD: replaces or removes all occurrences of a given word (or pattern of characters) within a character string. It does not remove trailing blanks in target or replacement. A = tranwrd(source, target, replacement); the default length of A is 200.
COMPBL: compresses multiple blanks to a single blank; the result takes the length of its argument.

-->Dates and times
A SAS date constant has a one- or two-digit day, a three-letter month, and a two- or four-digit year, enclosed in quotation marks and followed by d: 'ddmmmyy<yy>'d. You can assign date values to variables in assignment statements by using date constants.
Date constant: '17jan2009'd. Time constant: '9:25't. Datetime constant: '17jan2009:9:27:05'dt.
Formats:
mmddyy8., ddmmyy8.: 01/01/60
yymmdd8.: 60-01-01
mmddyy10.: 01/01/1960
date7.: 01JAN60
date9.: 01JAN1960
time11.: hh:mm:ss.ss
Functions: year(date) returns the 4-digit year; qtr(date) 1-4; month(date) 1-12; day(date) the day of the month, 1-31; weekday(date) the day of the week, 1-7 (1 = Sunday). The date is either a SAS date constant or a variable. mdy(month, day, year) creates a SAS date. today() returns the current date, time() the current time, datetime() the current datetime. hour(time), minute(time), and second(time) extract parts of a time value.
Date formats: WORDDATEw., WEEKDAYw.
YEARCUTOFF= (default 1920) sets the first year of the 100-year span used for two-digit years. It does not affect storage: SAS stores date values as the number of days since 1 January 1960.

-->RETAIN and the sum statement
RETAIN is placed before the assignment statement to initialize a variable. It is a compile-time-only statement that creates the variable if it does not exist.
retain A 0;

A = A + X;
Or write the sum statement directly: A + X; The result is the same running total, except that the sum statement treats a missing X as 0.
To permanently associate a format with a variable: FORMAT statement in a DATA step. User-defined formats: FORMAT procedure.

-->Missing period after an informat
Without the period, SAS does not treat the number as an informat. For example, in input @1 height 2. @4 weight 2; the specification weight 2 is column input, so weight is read from column 2 rather than from a 2-column field starting at column 4.

-->Libraries
libname libref 'SAS-data-library';
set libref.filename;

-->Formats and informats
Formats write values out in a particular form. Informats read data values into standard SAS values; they determine how data values are read into a SAS data set.
To store a format permanently, assign a libref with a LIBNAME statement and use it later in PROC FORMAT:
libname library 'path';
proc format library=library;
value $formatname 'value of var' = 'Display name' 'value of var' = 'Display name';
run;
(Put $ before the format name for character values; no quotes on the left side if the value is a number.)
proc print data=xyz; format varname $formatname.; run;

-->SELECT group
select (select-expression);
when (when-expression-1, ..., when-expression-n) statement;
when (when-expression-1, ..., when-expression-n) statement;
otherwise statement;
end;
Example, where a is a variable name (character values go in quotes):
select (a); when (1) x = x*10; when (3, 4, 5) x = x*100; otherwise; end;
OTHERWISE can also carry a statement such as an assignment.

-->Choosing a procedure
For continuous numeric values, use PROC MEANS or PROC SUMMARY to calculate statistics. For discrete data (character or numeric), use PROC FREQ. DROP and KEEP statements cannot be used in SAS procedures, only in the DATA step.

-->Basic report: PROC PRINT
Column totals: sum varname;
Column subtotals: sum varname; by varname; (in the output the BY variable name and value appear before each BY group, and the BY variable name and subtotal appear at the end of each BY group)
ID, BY, and SUM statements can be used together; the ID variable replaces the Obs column.
LABEL option: required if you want PROC PRINT to display labels.
VAR variable-names; WHERE condition;
NOOBS option: removes the Obs column.
ID statement: if the ID variable is also listed in the VAR statement, it appears twice.
CONTAINS operator (?): where varname ? 'String'; The IN operator can also be used in a WHERE statement.
PAGEBY statement: puts each BY group on a separate page. DOUBLE option: double spacing. FORMAT and LABEL statements are also available.

-->List report: PROC REPORT
NOWD option: sends the report to the Output window instead of the interactive REPORT window.
Statements: COLUMN, WHERE, DEFINE.
DEFINE statement: usage (ACROSS, ANALYSIS, COMPUTED, DISPLAY, GROUP, ORDER); attributes (FORMAT=, WIDTH=, SPACING=; WIDTH= and SPACING= have no effect on HTML output); options (DESCENDING, NOPRINT, NOZERO, PAGE); justification (CENTER, LEFT, RIGHT); column heading.
Default split character: '/'.
By default character variables are display variables and numeric variables are analysis variables, which are used to calculate the SUM statistic.
ORDER usage displays all observations but prints each value of the order variable only at its first occurrence.

-->Summary report
GROUP usage creates a summary report: the group variable collapses the detail rows into one. All character variables must be defined as GROUP, or else as ANALYSIS, ACROSS, or COMPUTED; if any one is not, the report is a list report and GROUP behaves like ORDER.
PROC SUMMARY: by default does not print output; use the PRINT option to produce it.

-->PROC MEANS
VAR statement; MAXDEC= option. By default PROC MEANS prints its output; the NOPRINT option blocks it.
CLASS statement: group processing; no statistics are computed for class variables.
BY statement: group processing like CLASS, with no statistics for BY variables. CLASS does not require the data to be sorted and produces a single large table; BY requires sorted data and produces a small table for each value of the BY variable.
Statistic keywords request other statistics.

-->PROC FREQ
TABLES statement: tables row*column; produces a crosstabulation; tables variable-names; produces a frequency table for each variable.
Options for the TABLES statement:
NOCUM: no cumulative frequencies. NOFREQ: no frequency values. NOPERCENT: no percent values. NOCOL, NOROW: no column or row percentages. CROSSLIST: output the crosstabulation in ODS column format, i.e. one big table. LIST: list output for the crosstabulation.

-->Hierarchical files: one observation for each detail record
After the LIBNAME and FILENAME statements:
data a;
infile b;
retain adress;
input type $1. @;
if type = 'H' then input @3 adress $15.;
if type = 'P';
input @3 name $10. @13 age 3. @16 gender $1.;
run;

-->Hierarchical files: one observation for each header record
After the LIBNAME and FILENAME statements:
data a;
infile b end=last;
retain adress;
input type $1. @;
if type = 'H' then do;
if _n_ > 1 then output;
total = 0;
input adress $ 3-17;
end;
else if type = 'P' then total + 1;
if last then output;
run;
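The header-record step above can be tried end to end with a small temporary file. This is a sketch under assumptions: the fileref hier, the data set name households, and the five records are hypothetical, and TRUNCOVER is added here so that header lines shorter than 17 columns are still read.

```sas
/* Hypothetical records: H lines carry an address, P lines a person. */
filename hier temp;

data _null_;
   file hier;
   put 'H 321 Oak St';
   put 'P Mary      21 F';
   put 'P John      25 M';
   put 'H 45 Elm Ave';
   put 'P Ann       30 F';
run;

data households (drop=type);
   infile hier end=last truncover;
   retain adress;
   input type $1. @;
   if type = 'H' then do;
      if _n_ > 1 then output;     /* write the previous household */
      total = 0;
      input adress $ 3-17;
   end;
   else if type = 'P' then total + 1;
   if last then output;           /* write the final household */
run;

proc print data=households;      /* two observations, with total = 2 and total = 1 */
run;
```

END= is read from an external file here because it cannot be combined with in-stream DATALINES input.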
