Вы находитесь на странице: 1из 31

DATA MANAGEMENT

Using EpiData and SPSS

References
Public domain (pdf) book on data management:
Bennett, et al. (2001). Data Management for
Surveys and Trials. A Practical Primer Using
EpiData. The EpiData Documentation Project. :
http://www.epidata.dk/downloads/dmepidata.pdf
EpiData Association Website: http://www.epidata.dk/
Importing raw data into SPSS:
http://www.ats.ucla.edu/stat/spss/modules/input.htm

Data Management

Planning data needs


Data collection
Data entry and control
Validation and checking
Data cleaning and variable transformation
Data backup and storage
System documentation
Other

Types of Data Base


Management Systems
(DBMSs)

Spreadsheets (e.g., Excel, SPSS Data Editor)

Prone to error, data corruption, & mismanagement


Lack data controls, limited programmability
Suitable only for small and didactic projects
Also good for last step data cleaning

Commercial DBMS programs (e.g., Oracle, Access)


Limited data control, good programmability
Slow & expensive
Powerful and widely available

Public domain programs (e.g., EpiData, Epi Info)


Controlled data entry, good programmability
Suitable for research and field use

We will use two


platforms:

EpiData

controlled data entry


data documentation
export (write) data

SPSS
import (read) data
analysis
reporting

What is EpiData ?

EpiData is computer program (small in size


1.2Mb) for simple or programmed data entry
and data documentation
It is highly reliable
It runs on Windows computers
Runs on Macs and Linus with emulator software (only)

Interface
pull down menus
work bar

History of EpiInfo & EpiData

19761995: EpiInfo (DOS program) created by


CDC (in wake of swine flu epidemic)
Small, fast, reliable, 100,000+ users worldwide

19952000: DOS dies slow painful death


2000: CDC releases EpiInfo2000
Based on Microsoft Jet (Access) data engine
Large, slow, unreliable (resembled EpiInfo in name only)

2001: Loyal EpiInfo user group decides it needs


real EpiInfo for Windows
Creates open source public domain program
Calls program EpiData

Goal: Create & Maintain


Error-Free Datasets

Two types of data errors


Measurement error (i.e., information bias)
discussed last couple of weeks
Processing errors = errors that occur during
data handling discussed this week

Examples of data processing errors

Transpositions (91 instead of 19)


Copying errors (O instead of 0)
Additional processing errors described on p.
18.2

Avoiding Data Processing


Errors

Manual checks (e.g., handwriting


legibility)
Range and consistency checks* (e.g., do
not allow hysterectomy dates for men)
Double entry and validation*

Operator 1 enters data


Operator 2 enters data in separate file
Check files for inconsistencies

Screening during analysis (e.g., look for


outliers)

* covered in lab

Controlled Data Entry

Criteria for accepting & rejecting data


Types of data controls
Range checks (e.g., restrict AGE to
reasonable range)
Value labels (e.g., SEX: 1 = male, 2 = female)
Jumps (e.g., if male, jump to Q8)
Consistency checks (e.g., if sex = male,
do not allow hysterectomy = yes)
Must enters
etc.

Data Processing Steps


1.
2.
3.
4.
5.
6.
7.
8.
9.
10.

File naming conventions


Variables types and names
QES (questionnaire) development
Convert .QES file to .REC (record) file
Add .CHK file
Enter data in REC file
Validate data (double entry procedure)
Documentation data (code book)
Export data to SPSS
Import data into SPSS

Filenaming and File


Management

c:\path\filename.ext
A web address is a good example of a filename, e.g.,
http://www2.sjsu.edu/faculty/gerstman/StatPrimer/data.ppt

Some systems are case sensitive (Unix)


Others are not (Windows)

Always be aware of
Physical location (local, removable, network)
Path (folders and subfolders)

Filename (proper)
Extension

Demo Windows Network Explorer: right-click Start Bar


> Explore

File extensions you should


know
Extension

Software program

.qes

EpiInfo/EpiData questionnaire

.rec

EpiInfo/EpiData records (data)

.chk

EpiInfo/EpiData check (controls & labels)

.not

EpiData notes (data documentation)

.sav

SPSS permanent data file

.sps

SPSS syntax file (program)

.txt

Generic (flat) text data

.htm

Web Browser

.doc

Microsoft Word

.xls

Microsoft Excel

Selected EpiData
Variable Types
Variable Type
Text
Numeric
Date

Examples
_
<A
>
#
##.#
<mm/dd/yyyy>
<dd/mm/yyyy>
<IDNUM>

Auto ID
Sondex (sanitized) <S >

EpiData Variable
Names

Variable name based on text that occurs


before variable type indicator code
EpiData variable naming default vary
depending on installation
Create variable names exactly as specified
To be safe, denote variable names in {curly
brackets}

For example, to create a two byte numeric


variable called age, use the question:

What is your {age}? ##

Demo / Work Along

Create QES file [demo.qes]


Convert QES to REC [demo.rec]
Create CHK file [demo.chk]
Create double entry file [demo2.rec]
Enter data
Validate data

Fname

Lname

DOB

SEX

DEATHAGE

John

Snow

3/15/1813

45

George

Orwell

6/25/1903

46

We will stop here and


pick up the second part
of the lecture next
week
Stay tuned

Codebooks

Contain info that helps users decipher


data file content and structure
Includes:

Filename(s)
File location(s)
Variable names
Coding schemes
Units
Anything else you think might be useful

EpiData codebook
generators

File Structure
Codebook

Full codebook contains descriptive statistics (demo)

Full Codebook

Notice
descriptive
statistics

Conversion of Data
File

Requires common intermediate file format


Examples of common intermediate files
.TXT = plain text
.DBF = dBase program
.XLS = Excel

Steps

Export .REC file .TXT file


Import .TXT file into SPSS
Save permanent SAV file

Current Export Formats


Supported by EpiData

Plain (raw) TXT data

plain ASCII data format


no column demarcations
no variable names
no labels

TXT file with codebook

tox-samp.txt

tox-samp.not

SPSS Data Export /


Import
TXT
(raw data)

SAV

REC
SPS
(syntax)

Top of tox-samp.sps
Lines beginning with * are
comments (ignored by
command interpreter)

Next set of commands show


file location and structure
via SPSS command syntax

Bottom part of toxsamp.sps file


Labels being imported
into SPSS

Delete * if you want this


command to run

Opening the SPS (command)


file

Running the SPS file

Ethics of Data
Keeping

Confidentiality (sanitized files


free of identifiers)
Beneficence
Equipoise
Informed consent (To what extent?)
Oversight (IRB)

Вам также может понравиться