
Suppose we have a binary classifier that must classify 1000 items. Of those items, 700 belong to class A and 300 to class B.

The results are as follows:

       | Predicted A | Predicted B
True A | 550         | 150
True B | 50          | 250
We'll call class B a positive result (1) and class A a negative one (0). So there were 550 true negatives, 150 false positives, 50 false negatives and 250 true positives.
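The following minimal Python sketch shows where the four counts come from. It rebuilds label lists that reproduce the example (class A as 0, class B as 1) and tallies each (actual, predicted) pair. The lists and the `confusion_counts` helper are illustrative and do not come from the original.

```python
from collections import Counter

# Illustrative label lists that reproduce the example (A = 0 negative, B = 1 positive).
# Order of predictions: 550 A->A, 150 A->B, 50 B->A, 250 B->B.
y_true = [0] * 700 + [1] * 300
y_pred = [0] * 550 + [1] * 150 + [0] * 50 + [1] * 250

def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    pairs = Counter(zip(y_true, y_pred))
    tn = pairs[(0, 0)]  # actual A, predicted A
    fp = pairs[(0, 1)]  # actual A, predicted B
    fn = pairs[(1, 0)]  # actual B, predicted A
    tp = pairs[(1, 1)]  # actual B, predicted B
    return tn, fp, fn, tp

print(confusion_counts(y_true, y_pred))  # (550, 150, 50, 250)
```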

There are some metrics defined for this classification:
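One way to compute the usual metrics from these counts is sketched below. It assumes the standard definitions of accuracy, precision, recall, specificity, and F1. The function name `binary_metrics` is hypothetical. The sketch also assumes that no denominator is zero, which holds for this example.

```python
def binary_metrics(tn, fp, fn, tp):
    """Common binary-classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)     # of items predicted B, fraction that are truly B
    recall = tp / (tp + fn)        # of true B items, fraction predicted B (sensitivity)
    specificity = tn / (tn + fp)   # of true A items, fraction predicted A
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

m = binary_metrics(tn=550, fp=150, fn=50, tp=250)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.800
# precision: 0.625
# recall: 0.833
# specificity: 0.786
# f1: 0.714
```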




When a data scientist has chosen a target variable (the “column” in a spreadsheet they wish to predict) and has completed the prerequisites of transforming the data and building a model, one of the most important remaining steps is evaluating the model’s performance.

Confusion Matrix
Choosing a performance metric often depends on the business problem being solved. Let’s say you have 100 examples in your data, and you’ve fed each one to your model and received a classification. The predicted vs. actual classifications can be charted in a table called a confusion matrix.

                  | Negative (predicted) | Positive (predicted)
Negative (actual) | 98                   | 0
Positive (actual) | 1                    | 1

The table above describes an output of negative vs. positive. These two outcomes are the “classes” of the examples. Because there are only two classes, the model used to generate the confusion matrix can be described as a binary classifier.
To better interpret the table, you can label each cell as a true negative, false positive, false negative, or true positive:

                  | Negative (predicted) | Positive (predicted)
Negative (actual) | true negative        | false positive
Positive (actual) | false negative       | true positive

Accuracy

Overall, how often is our model correct?

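As a heuristic, accuracy can immediately tell us whether a model is being trained correctly and how it may perform in general. However, it gives little detail about how the model behaves on the problem at hand, and on imbalanced data it can be misleading. In the 100-example confusion matrix above, accuracy is (98 + 1) / 100 = 99%, yet the model caught only one of the two actual positives.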
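To make this limitation concrete, the following sketch compares the model’s accuracy on the 100-example matrix with that of a hypothetical baseline that always predicts negative. The baseline is an illustration and is not part of the original example.

```python
def accuracy(tn, fp, fn, tp):
    """Fraction of all examples the model classified correctly."""
    return (tp + tn) / (tn + fp + fn + tp)

# Counts from the 100-example confusion matrix: 98 TN, 0 FP, 1 FN, 1 TP.
model_acc = accuracy(tn=98, fp=0, fn=1, tp=1)

# Hypothetical baseline that always predicts "negative": both positives become false negatives.
baseline_acc = accuracy(tn=98, fp=0, fn=2, tp=0)

print(model_acc, baseline_acc)  # 0.99 0.98
```

The baseline never detects a single positive, yet its accuracy is only one percentage point lower.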

Precision

When the model predicts positive, how often is it correct?

Precision matters most when the cost of false positives is high. Let’s assume the business problem involves detecting skin cancer. If our model has very low precision, many patients will be told they have melanoma when they do not. That means a lot of extra testing and stress.
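Precision is TP / (TP + FP). A small sketch follows. Returning 0.0 when the model makes no positive predictions is an assumed convention, since conventions for this case vary. The function name is illustrative.

```python
def precision(tp, fp):
    """Of the examples predicted positive, the fraction that are actually positive.

    Returns 0.0 when nothing is predicted positive (an assumed convention).
    """
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0

# 100-example matrix: the single positive prediction was correct (TP = 1, FP = 0).
print(precision(tp=1, fp=0))  # 1.0
# Edge case: no positive predictions at all.
print(precision(tp=0, fp=0))  # 0.0
```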

Recall

When an example is actually positive, how often does the model predict positive?

Recall matters most when the cost of false negatives is high. What if our problem requires checking for a fatal virus such as Ebola? If many patients are told they don’t have Ebola when they actually do, the likely result is widespread infection and an epidemiological crisis.
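Recall is TP / (TP + FN). The sketch below applies it to the 100-example matrix. Returning 0.0 when there are no actual positives is an assumed convention, and the function name is illustrative.

```python
def recall(tp, fn):
    """Of the examples that are actually positive, the fraction the model predicted positive.

    Returns 0.0 when there are no actual positives (an assumed convention).
    """
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0

# 100-example matrix: one of the two actual positives was missed (TP = 1, FN = 1).
print(recall(tp=1, fn=1))  # 0.5
```

By contrast, the same matrix has a precision of 1 / (1 + 0) = 1.0. A model can therefore look flawless on one metric while missing half of the positives.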

F1 Score

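F1 is a helpful single-number measure of a test’s accuracy that takes both precision and recall into account. It is their harmonic mean: F1 = 2 × (precision × recall) / (precision + recall). An F1 score of 1 is perfect (precision and recall are both 1), and a score of 0 is a total failure (precision or recall is 0).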
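The following sketch computes F1 for the 100-example matrix (precision 1.0, recall 0.5). Treating F1 as 0.0 when both inputs are 0 is an assumed convention. Comparing the result with the arithmetic mean shows how the harmonic mean pulls the score toward the weaker of the two metrics.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are 0 (assumed convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# 100-example matrix: precision = 1 / (1 + 0) = 1.0, recall = 1 / (1 + 1) = 0.5
p, r = 1.0, 0.5
print(f1_score(p, r))  # about 0.667
print((p + r) / 2)     # 0.75 (arithmetic mean, for comparison)
```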