Model evaluation techniques in machine learning

You should always ask yourself: "How good is my model? How much confidence do I have in it?" Model evaluation techniques help us answer this question.

Model evaluation techniques in machine learning help us find the best model among several candidates. Put simply, they are how we measure the performance of machine learning models and select between them.

There are two families of model evaluation techniques in machine learning:

  • Evaluation techniques for Classification
  • Evaluation techniques for Regression

Classification is all about assigning data points to discrete classes based on the given data: man or woman, fraud or not, spam or not, etc.

A). Model Evaluation techniques for Classification

  1. Accuracy
  2. Confusion Matrix
  3. Precision
  4. Recall
  5. F1- Score
  6. ROC (Receiver Operating Characteristic curve) & AUC (Area Under Curve)
  7. Log loss

1. Accuracy

Accuracy can tell us how good our model is, but not always. It is a very simple metric for measuring a machine learning model's performance, and it always lies between 0 and 1.

 Accuracy = No. of correctly classified points / Total no. of points

                    0 - Bad
                    1 - Good

Let us take the example of classifying gender as male or not. Here there are 100 data points in total: 60 are men and 40 are women. These are the actual labels in our data set. Let's find out how good our model is.


Here, out of the 60 men, 53 are classified correctly and 7 incorrectly. Of the 40 women, 35 are classified correctly and 5 incorrectly. Let's calculate the accuracy.

Accuracy = (53 + 35)/100 = 88/100 = 0.88

Oh! We got 88% accuracy. That's easy to calculate and easy to understand, right?

But we can't trust accuracy all the time, because it breaks down on an imbalanced dataset. If 90 of the data points are men and only 10 are women, a model that simply predicts "man" every time already gets 90% accuracy.
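
As a quick sketch, here is the same calculation in Python using scikit-learn's accuracy_score. The label arrays below are invented to reproduce the 60 men / 40 women example:

from sklearn.metrics import accuracy_score

# Toy labels for the example: 0 = man (60 points), 1 = woman (40 points).
# The model classifies 53 of the men and 35 of the women correctly.
y_true = [0] * 60 + [1] * 40
y_pred = [0] * 53 + [1] * 7 + [1] * 35 + [0] * 5

print(accuracy_score(y_true, y_pred))  # 0.88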

2. Confusion Matrix

I know, the confusion matrix really can be confusing, but that's OK; let me give you a clear picture of it. And whenever you forget the confusion matrix, just visit www.devpyjp.com. Thank you!

The confusion matrix gives a complete picture of our model's predictions: for every class, it shows how many points were classified correctly and how many were not.

                      Predicted: 0 (Men)    Predicted: 1 (Women)
  Actual: 0 (Men)            TN                     FP
  Actual: 1 (Women)          FN                     TP

We have actual and predicted class labels for men and women:

  • Actual – the true class labels
  • Predicted – our ML model's predictions
  • 0 – Negative class (Men)
  • 1 – Positive class (Women)

We have the freedom to interchange which class is positive and which is negative. The other terminology is below.

  • TN – True Negative
  • TP – True Positive
  • FN – False Negative
  • FP – False Positive

Here it is. Don't worry, I don't want to confuse you, so the matrix below is the same example from above, filled in with the numbers. Are you ready to dig in? Then come with me!

                      Predicted: 0 (Men)    Predicted: 1 (Women)
  Actual: 0 (Men)          TN = 53               FP = 7
  Actual: 1 (Women)        FN = 5                TP = 35

TN – True Negative:

TN is the number of negative-class points the model predicted correctly; in this case, we predicted men correctly 53 times. Dividing by the total number of negative points gives the True Negative Rate (TNR).

True Negative Rate = Correctly predicted negatives / Total no. of negative points

                                TNR = 53/60 ≈ 0.883

The actual class is Negative and our prediction is also Negative.

TP – True positive:

TP is the number of positive-class points the model predicted correctly; in this case, we predicted women correctly 35 times. Dividing by the total number of positive points gives the True Positive Rate (TPR).

True Positive Rate = Correctly predicted positives / Total no. of positive points

                               TPR = 35/40 = 0.875

The actual class is Positive and our prediction is also Positive.

FN – False Negative:

Here the actual class label is Positive but we predicted Negative; such a mistake is called a False Negative. The corresponding rate is the False Negative Rate (FNR).

False Negative Rate = No. of false negative predictions / Total no. of positive points

                                FNR = 5/40 = 0.125

FP – False Positives:

Here the actual class label is Negative but we predicted Positive; such a mistake is called a False Positive. The corresponding rate is the False Positive Rate (FPR).

False Positive Rate = No. of false positive predictions / Total no. of negative points

                                 FPR = 7/60 ≈ 0.117

Always remember these two things throughout your machine learning studies.

Note 1: (TP + TN) / Total gives the accuracy, or correct prediction rate. We always try to maximize this.

Note 2: (FN + FP) / Total gives the incorrect prediction rate. We always try to minimize this.
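
As a sketch, scikit-learn's confusion_matrix returns all four counts directly; this reuses the toy labels from the accuracy example above:

from sklearn.metrics import confusion_matrix

y_true = [0] * 60 + [1] * 40
y_pred = [0] * 53 + [1] * 7 + [1] * 35 + [0] * 5

# For binary labels [0, 1], ravel() returns the counts as TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)          # 53 7 5 35

print("TPR:", tp / (tp + fn))  # 35/40 = 0.875
print("TNR:", tn / (tn + fp))  # 53/60 ≈ 0.883
print("FPR:", fp / (fp + tn))  # 7/60  ≈ 0.117
print("FNR:", fn / (fn + tp))  # 5/40  = 0.125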

With these four numbers, we can evaluate every aspect of our model's predictions. Let's now move a little beyond the confusion matrix.

3. Precision

Precision is a metric that focuses entirely on the positive class. It always ranges between 0 and 1.

It is mainly used in information retrieval, e.g. Google Search, where the results shown should be relevant to the search query.

Precision = TP / (TP + FP)
Precision = 35 / (35 + 7) = 35/42 ≈ 0.833

"WHEN WE PREDICTED YES, HOW OFTEN OUR PREDICTION IS CORRECT."

4. Recall

Recall is a metric that also focuses entirely on the positive class, and it always ranges between 0 and 1. It too is used in information retrieval. Recall is simply the True Positive Rate.

Recall = TP / P
Recall = 35/40 = 0.875

here P represents the total number of positive points.

"OUT OF ALL ACTUAL POSTIVES, HOW MANY YOU PREDICTED AS POSTIVE."

5. F1 – Score

F1-score is a metric that combines precision and recall. We always want precision and recall to both be high, so the F1-score, their harmonic mean, is the better metric whenever you care about both.

F1 - Score = 2 * (Precision * Recall)/(Precision + Recall)
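
Here is a minimal sketch of all three metrics with scikit-learn, again reusing the toy labels from the accuracy example:

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0] * 60 + [1] * 40
y_pred = [0] * 53 + [1] * 7 + [1] * 35 + [0] * 5

p = precision_score(y_true, y_pred)  # 35/42 ≈ 0.833
r = recall_score(y_true, y_pred)     # 35/40 = 0.875
f = f1_score(y_true, y_pred)         # 2*p*r/(p+r) ≈ 0.854
print(p, r, f)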

6. ROC & AUC

These are mainly binary classification metrics. The ROC curve is easy to understand and one of the most used evaluation tools for binary classification models.

There are a few steps to compute ROC & AUC on our data (a scikit-learn sketch follows the figure below).

  • Calculate the prediction probabilities for all points.
  • Sort the probability scores.
  • Now decide your threshold.
  • Write a condition like the one below:

threshold = 0.90
if y_pred_probability >= threshold:
    return 1
else:
    return 0

  • Now compute the TPR and FPR.
  • Change the threshold and repeat the process, recording the TPR and FPR at each threshold.
  • Now plot the graph of TPR against FPR.
  • See the MAGIC.
[Figure: ROC curve – TPR plotted against FPR, with AUC as the area under the curve]
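
Here is a minimal sketch of those steps using scikit-learn and matplotlib; roc_curve sweeps all the thresholds for us. The labels and scores below are hypothetical:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and predicted probabilities of the positive class.
y_true   = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7])

# roc_curve returns the FPR and TPR at every distinct threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print("AUC:", roc_auc_score(y_true, y_scores))

plt.plot(fpr, tpr)
plt.xlabel("False Positive Rate (FPR)")
plt.ylabel("True Positive Rate (TPR)")
plt.show()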

AUC is a value in the range 0 to 1: 0 is BAD and 1 is GOOD. Be careful with imbalanced data, where AUC can look deceptively high. AUC depends only on the ordering of the predicted probability scores, not on their actual values. The AUC of a random model is 0.5.

7. Log loss

Log loss is one of the most used evaluation metrics for machine learning models, and it works even for multi-class classification. It uses the predicted probability scores: it is simply the average of the negative log of the probability assigned to the true class. Since it is a loss, lower is better, and a perfect model scores 0.

Log_loss = -1/n Σ[ yi * log(pi) + (1 - yi) * log(1 - pi) ]

The above formula is for binary classification. It is simple to calculate.
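
As a sketch, here is the binary formula next to scikit-learn's log_loss; the labels and probabilities below are hypothetical:

import numpy as np
from sklearn.metrics import log_loss

y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.9, 0.8, 0.3]  # hypothetical predicted P(class = 1)

print(log_loss(y_true, y_prob))  # ≈ 0.198

# The same value from the binary formula above:
y, p = np.array(y_true), np.array(y_prob)
print(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))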

There are some other classification metrics, but these are the most common and used evaluation techniques in machine learning.

Also check out: BAG of Words in NLP

B). Model Evaluation techniques for Regression

Regression: Regression is an analysis technique that models the relationship between one dependent variable and one or more independent variables. Regression is used for predicting stock prices, sales, investments, and in other sectors.

We need to find the better model in regression as well. There are many regression metrics; here we discuss the most commonly used evaluation metrics in machine learning.

  1. R-Squared
  2. MSE (Mean Squared Error)
  3. RMSE ( Root Mean Squared Error )

1. R – Squared

It is also called the coefficient of determination. If the R-squared value is low, our model is bad; if it is high, our model is good. It usually lies between 0 and 1, although it can go negative for a very bad model, as shown below.

R2 = 1 - SSresidual / SStotal
SStotal = Σ(yi - ymean)2          # ymean - mean of the actual values
SSresidual = Σ(yi - ypred)2       # SS - Sum of Squares

  • SSresidual = 0, then R2 = 1: the machine learning model is perfect.
  • SStotal > SSresidual: R2 lies between 0 and 1.
  • SStotal = SSresidual: R2 = 0, the model is no better than simply predicting the mean.
  • SStotal < SSresidual: R2 is negative, so our machine learning model is worse than the mean model.
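
Here is a minimal sketch checking r2_score against the formula; the target and prediction values are invented for illustration:

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])  # hypothetical actual values
y_pred = np.array([2.8, 5.3, 7.1, 8.9])  # hypothetical predictions

print(r2_score(y_true, y_pred))          # ≈ 0.9925

# The same value by hand: R2 = 1 - SS_residual / SS_total
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print(1 - ss_res / ss_tot)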

2. MSE (Mean Squared Error)

Mean Squared Error is a way to compute the error of a regression machine learning model. It is very simple to compute and to interpret: we calculate the mean of the squared errors of the model's predictions.

MSE =  1/n Σ(yi - ypred)2 

The error or loss should always be near zero: if MSE is high our model is bad, and if MSE is low our model is good.

3. RMSE ( Root Mean Squared Error )

It is simply the square root of MSE, which puts the error back in the same units as the target variable. If RMSE is high our model is bad; if RMSE is low our model is good.

 RMSE =  Root( 1/n Σ(yi - ypred)2 )
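
As a sketch, both metrics take a few lines of Python (reusing the hypothetical values from the R-squared example):

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 7.1, 8.9])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # RMSE is just the square root of MSE
print(mse, rmse)     # 0.0375, ≈ 0.1936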

There are some other model evaluation techniques for regression out there, and you can look them up as well. But these are the most commonly used model evaluation techniques for regression.

OK, thank you for reading. I hope you got a good idea of all the evaluation techniques in machine learning, for both regression and classification. If you have any queries or suggestions, please leave a comment.

Join our newsletter and we will send you a FREE PDF of complete machine learning tutorials. Thank you!
