
1. TITLE
Pima Indians Diabetes Database
2. USE IN STATLOG
2.1- Testing Mode
12 Fold Cross-Validation
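A minimal sketch of how a 12-fold split of the 768 examples could be set up. The source states only the testing mode; the shuffling, the fixed seed, the lack of stratification, and the helper names `k_fold_indices` and `train_test_splits` are assumptions made for illustration, not the StatLog procedure.

```python
import random


def k_fold_indices(n_examples, k, seed=0):
    """Shuffle the indices 0..n_examples-1 and deal them into k folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # assumption: random, unstratified
    # Fold i takes every k-th index starting at i, so sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_examples, k, seed=0):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    folds = k_fold_indices(n_examples, k, seed)
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


splits = list(train_test_splits(768, 12))
# Number of splits, training size and test size of the first split.
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 12 704 64
```

With 768 examples and 12 folds, each test fold holds 64 examples and each training set holds the other 704.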
2.2- Special PreProcessing
2.3- Test Results
                 Error Rate          Time
Algorithm      Train    Test      Train   Test
-----------------------------------------------
LogDisc        0.219    0.223        31      7
Dipol92        0.220    0.224        36      1
Discrim        0.220    0.225        27      7
Smart          0.177    0.232      3762      ?
Radial         0.218    0.243         5      0
Itrule         0.223    0.245        31      2
BackProp       0.198    0.248      7171      0
Cal5           0.232    0.250       237      0
Cart           0.227    0.255        30      1
Castle         0.260    0.258        35      5
QuaDisc        0.237    0.262        24      7
Bayes          0.239    0.262        25      7
C4.5           0.131    0.270        12      1
IndCart        0.079    0.2710      216    209
BayTree        0.008    0.271        10      0
LVQ            0.101    0.272       140      1
Kohonen        0.134    0.273      1966      3
Ac2            0.0      0.276      4377    241
NewId          0        0.289        10     10
Cn2            0.010    0.289        38      3
Alloc80        0.288    0.301      1374      ?
KNN            0        0.324         1      2
Default        0.350    0.350         ?      ?
Cascade        ?        0.00
3. SOURCES and PAST USAGE
3.1 SOURCES
(a) Original owners: National Institute of Diabetes and Digestive and
    Kidney Diseases
(b) Donor of database: Vincent Sigillito (vgs@aplcen.apl.jhu.edu)
    Research Center, RMI Group Leader
    Applied Physics Laboratory
    The Johns Hopkins University
    Johns Hopkins Road
    Laurel, MD 20707
    (301) 953-6231
(c) Date received: 9 May 1990
3.2 Past Usage:
1. Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., &
   Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast
   the onset of diabetes mellitus. In "Proceedings of the Symposium
   on Computer Applications and Medical Care" (pp. 261--265). IEEE
   Computer Society Press.

   The diagnostic, binary-valued variable investigated is whether the
   patient shows signs of diabetes according to World Health Organization
   criteria (i.e., if the 2 hour post-load plasma glucose was at least
   200 mg/dl at any survey examination or if found during routine medical
   care). The population lives near Phoenix, Arizona, USA.

   Results: Their ADAP algorithm makes a real-valued prediction between
   0 and 1. This was transformed into a binary decision using a cutoff of
   0.448. Using 576 training instances, the sensitivity and specificity
   of their algorithm were both 76% on the remaining 192 instances.
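The cutoff step and the two reported measures can be sketched as follows. This is only an illustration: the scores and labels are made-up toy values, not ADAP output, and treating a score equal to the cutoff as positive is an assumption the source does not settle. The function names are hypothetical.

```python
def binarize(scores, cutoff=0.448):
    """Turn real-valued predictions in [0, 1] into 0/1 decisions."""
    # Assumption: a score at or above the cutoff counts as positive.
    return [1 if s >= cutoff else 0 for s in scores]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for 0/1 labels and decisions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, invented for this example only.
scores = [0.90, 0.50, 0.30, 0.20, 0.60, 0.10, 0.40, 0.70]
labels = [1, 1, 1, 0, 0, 0, 0, 1]

decisions = binarize(scores)
print(sensitivity_specificity(labels, decisions))  # (0.75, 0.75)
```

Sensitivity is the fraction of true positives that are detected and specificity is the fraction of true negatives that are correctly rejected; moving the cutoff trades one against the other.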
4. DATASET DESCRIPTION
NUMBER of EXAMPLES: 768
NUMBER of CLASSES: 2
    1, 2
    (class value 1 is interpreted as "tested positive for diabetes")
Class Distribution:
    Class Value    Number of instances
    1              500 (65.1%)
    2              268 (34.9%)
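The percentages above follow directly from the counts, and the same arithmetic gives the error of a rule that always predicts the larger class. Reading the "Default" row of the results table as such a majority-class rule is an assumption; the source does not define it.

```python
# Class counts as listed in the class distribution.
counts = {1: 500, 2: 268}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Assumed baseline: always predict the most frequent class.
majority = max(counts, key=counts.get)
baseline_error = 1 - shares[majority]

print(total)                        # 768
print(round(100 * shares[1], 1))    # 65.1
print(round(100 * shares[2], 1))    # 34.9
print(round(baseline_error, 3))     # 0.349
```

The resulting 0.349 is close to the 0.350 reported for Default in the test results.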
NUMBER of ATTRIBUTES: 8
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral
glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
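Attribute 6 is defined by a formula, which can be written out directly. The function name and the sample weight and height are hypothetical and do not come from the dataset.

```python
def body_mass_index(weight_kg, height_m):
    """Body mass index: weight in kg divided by the square of height in m."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Made-up example: 80 kg at 1.60 m.
print(round(body_mass_index(80.0, 1.60), 2))  # 31.25
```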
Brief statistical analysis:

Attribute number:    Mean:    Standard Deviation:
1.                     3.8      3.4
2.                   120.9     32.0
3.                    69.1     19.4
4.                    20.5     16.0
5.                    79.8    115.2
6.                    32.0      7.9
7.                     0.5      0.3
8.                    33.2     11.8
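The attributes sit on very different scales (compare attribute 7 with attribute 5), so a user of the data may want to standardize them. The sketch below rescales one record to z-scores using the means and standard deviations listed above. The source does not say that any StatLog algorithm was run on standardized data; this step, the function name, and the sample record are assumptions for illustration.

```python
# Means and standard deviations for attributes 1..8, copied from the table.
MEANS = [3.8, 120.9, 69.1, 20.5, 79.8, 32.0, 0.5, 33.2]
STDS = [3.4, 32.0, 19.4, 16.0, 115.2, 7.9, 0.3, 11.8]


def standardize(row):
    """Return the z-score (x - mean) / std for each of the 8 attributes."""
    if len(row) != len(MEANS):
        raise ValueError("expected 8 attribute values")
    return [(x - m) / s for x, m, s in zip(row, MEANS, STDS)]


# Made-up record: attributes 2, 4, 6, 7 and 8 lie one standard deviation
# above their means, the rest sit exactly at the mean.
record = [3.8, 152.9, 69.1, 36.5, 79.8, 39.9, 0.8, 45.0]
print([round(z, 2) for z in standardize(record)])
```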

Missing Attribute Values: None

Relevant Information:
Several constraints were placed on the selection of these instances
from a larger database. In particular, all patients here are females
at least 21 years old of Pima Indian heritage.
ADAP is an adaptive learning routine that generates and executes
digital analogs of perceptron-like devices. It is a unique algorithm;
see the paper for details.
CONTACTS
statlog-adm@ncc.up.pt
bob@stams.strathclyde.ac.uk
