Вы находитесь на странице: 1из 5

DataMiningClassification&Prediction

Advertisements

PreviousPage NextPage

There are two forms of data analysis that can be used for extracting models describing
importantclassesortopredictfuturedatatrends.Thesetwoformsareasfollows

Classification

Prediction

Classification models predict categorical class labels and prediction models predict
continuousvaluedfunctions.Forexample,wecanbuildaclassificationmodeltocategorize
bank loan applications as either safe or risky, or a prediction model to predict the
expenditures in dollars of potential customers on computer equipment given their income
andoccupation.

Whatisclassification?
FollowingaretheexamplesofcaseswherethedataanalysistaskisClassification

Abankloanofficerwantstoanalyzethedatainordertoknowwhichcustomer(loan
applicant)areriskyorwhicharesafe.

A marketing manager at a company needs to analyze a customer with a given


profile,whowillbuyanewcomputer.

Inbothoftheaboveexamples,amodelorclassifierisconstructedtopredictthecategorical
labels.Theselabelsareriskyorsafeforloanapplicationdataandyesornoformarketing
data.

Whatisprediction?
FollowingaretheexamplesofcaseswherethedataanalysistaskisPrediction
Suppose the marketing manager needs to predict how much a given customer will spend
duringasaleathiscompany.Inthisexamplewearebotheredtopredictanumericvalue.

Thereforethedataanalysistaskisanexampleofnumericprediction.Inthiscase,amodel
or a predictor will be constructed that predicts a continuousvaluedfunction or ordered
value.

NoteRegressionanalysisisastatisticalmethodologythatismostoftenusedfornumeric
prediction.

HowDoesClassificationWorks?
Withthehelpofthebankloanapplicationthatwehavediscussedabove,letusunderstand
theworkingofclassification.TheDataClassificationprocessincludestwosteps

BuildingtheClassifierorModel

UsingClassifierforClassification

BuildingtheClassifierorModel

Thisstepisthelearningsteporthelearningphase.

Inthissteptheclassificationalgorithmsbuildtheclassifier.

The classifier is built from the training set made up of database tuples and their
associatedclasslabels.

Each tuple that constitutes the training set is referred to as a category or class.
Thesetuplescanalsobereferredtoassample,objectordatapoints.

UsingClassifierforClassification
Inthisstep,theclassifierisusedforclassification.Herethetestdataisusedtoestimate
theaccuracyofclassificationrules.Theclassificationrulescanbeappliedtothenewdata
tuplesiftheaccuracyisconsideredacceptable.

ClassificationandPredictionIssues
The major issue is preparing the data for Classification and Prediction. Preparing the data
involvesthefollowingactivities

Data Cleaning Data cleaning involves removing the noise and treatment of
missing values. The noise is removed by applying smoothing techniques and the
problem of missing values is solved by replacing a missing value with most
commonlyoccurringvalueforthatattribute.

Relevance Analysis Database may also have the irrelevant attributes.


Correlationanalysisisusedtoknowwhetheranytwogivenattributesarerelated.

Data Transformation and reduction The data can be transformed by any of


thefollowingmethods.

Normalization The data is transformed using normalization.


Normalizationinvolvesscalingallvaluesforgivenattributeinordertomake
themfallwithinasmallspecifiedrange.Normalizationisusedwheninthe
learningstep,theneuralnetworksorthemethodsinvolvingmeasurements
areused.

Generalization The data can also be transformed by generalizing it to


thehigherconcept.Forthispurposewecanusetheconcepthierarchies.
NoteDatacanalsobereducedbysomeothermethodssuchaswavelettransformation,
binning,histogramanalysis,andclustering.

ComparisonofClassificationandPredictionMethods
HereisthecriteriaforcomparingthemethodsofClassificationandPrediction

Accuracy Accuracy of classifier refers to the ability of classifier. It predict the


class label correctly and the accuracy of the predictor refers to how well a given
predictorcanguessthevalueofpredictedattributeforanewdata.

SpeedThisreferstothecomputationalcostingeneratingandusingtheclassifier
orpredictor.

Robustness It refers to the ability of classifier or predictor to make correct


predictionsfromgivennoisydata.

ScalabilityScalabilityreferstotheabilitytoconstructtheclassifierorpredictor
efficientlygivenlargeamountofdata.

InterpretabilityItreferstowhatextenttheclassifierorpredictorunderstands.

PreviousPage NextPage

Advertisements

TRY PROCESS MINING - UP AND


RUNNING IN MINUTES

celonis.com/free-trial

Test Celonis 30 days for free. Get started with tutorials!


Write for us FAQ's Helping Contact
Copyright 2016. All Rights Reserved.

Enter email for newsletter go

Вам также может понравиться