Logistic regression in machine learning – Quick guide

Logistic regression is a classification algorithm, not a regression technique. For example, take men and women as two categories: we can classify a person into one of them based on features such as hair_length, height, and weight.

Many people confuse linear and logistic regression. If you are not familiar with linear regression, please read this first: Linear regression in machine learning.

Let’s talk about a few features of logistic regression.

Features of Logistic Regression

1. Easy to understand
2. Robust to outliers
3. Provides feature importance
4. Fast
5. Works well on non-linear data (after feature transformation)

Because it is fast at prediction time, logistic regression is widely used in low-latency applications. Its training time is also low.
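To make the feature-transformation point concrete, here is a minimal sketch in Python with NumPy. The toy data, the helper name square_features, and the bias term b are all assumptions made for illustration; they are not from the original article. Points inside a circle cannot be separated from points outside it by a straight line, but after squaring each feature a straight line does the job.

```python
import numpy as np

# Hypothetical toy data: points inside the unit circle are +1, outside are -1.
X = np.array([[0.1, 0.2], [-0.3, 0.4], [0.5, -0.5],    # inside the circle
              [1.5, 0.0], [-1.2, 1.1], [0.0, -2.0]])   # outside the circle
y = np.array([1, 1, 1, -1, -1, -1])

def square_features(X):
    """Map (x1, x2) to (x1^2, x2^2); the circle x1^2 + x2^2 = 1 becomes a line."""
    return X ** 2

# In the transformed space the line -z1 - z2 + 1 = 0 separates the two classes.
# w and b are picked by hand here, not learned.
w = np.array([-1.0, -1.0])
b = 1.0

Z = square_features(X)
pred = np.where(Z @ w + b > 0, 1, -1)
print(pred)
```

In practice the right transformation depends on the data, and w would be learned rather than written down by hand.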

In logistic regression, we assume that the data is linearly separable, so that we can draw a line or plane between the points. Let’s assume that our dataset has positive and negative points and that they are linearly separable.

Here, w is the normal to the plane and π is the plane that separates the green and red points. Green points are +ve and red points are -ve.

Now let’s dive into the details of logistic regression. Look at the picture below.

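Here, di is the distance from a +ve point Xi (green) to the plane, and dj is the distance from the plane to a -ve point Xj (red). Notice that 2 points are on the wrong side of the plane; let’s call them misclassified points. The distances are given by

$$d_i = \frac{w^T x_i}{\lVert w \rVert}, \qquad d_j = \frac{w^T x_j}{\lVert w \rVert}$$

Let’s take w to be a unit vector, so ||w|| = 1. Then di = wTXi, which is positive because w and Xi lie on the same side of the plane, and dj = wTXj, which is negative because w and Xj lie on opposite sides.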

• If wTXi > 0, the model classifies the point as +ve (green).
• If wTXj < 0, the model classifies the point as -ve (red).
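The sign rule above can be sketched in a few lines of Python with NumPy. This is only an illustration: the function names and sample points are hypothetical, labels are assumed to be coded as +1 and -1, and the plane is assumed to pass through the origin (no bias term), as in the text.

```python
import numpy as np

def signed_distances(X, w):
    """Signed distance of each row of X from the plane w^T x = 0."""
    return X @ w / np.linalg.norm(w)

def predict(X, w):
    """Return +1 where w^T x > 0 and -1 otherwise."""
    return np.where(X @ w > 0, 1, -1)

w = np.array([3.0, 4.0])         # hypothetical normal vector, ||w|| = 5
X = np.array([[3.0, 4.0],        # lies on the same side as w
              [-6.0, -8.0]])     # lies on the opposite side

d = signed_distances(X, w)
labels = predict(X, w)
print(d, labels)
```

A point that lies exactly on the plane (wTX = 0) is assigned to the -ve class here; that tie-breaking choice is arbitrary.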

Objective:

Our main objective is to minimize the number of misclassifications and maximize the number of correct predictions, that is, to have yi*wTXi > 0 for as many points as possible.

How the LR model classifies the data:

Case – 1: The actual class label yi is +ve (green), and the point is on the same side of the plane as w, so wTXi > 0 and yi*wTXi > 0. The model classifies it as a +ve (green) point. [ + * + = + ]. These points are correctly classified.

Case – 2: The actual class label yi is -ve (red), and the point is on the opposite side of the plane from w, so wTXi < 0 and yi*wTXi > 0. The model classifies it as a -ve (red) point. [ – * – = + ]. These points are correctly classified.

Case – 3: The actual class label yi is -ve (red), but the point is on the same side of the plane as w, so wTXi > 0 and yi*wTXi < 0. The model classifies it as a +ve (green) point. [ – * + = – ]. These points are misclassified.

Case – 4: The actual class label yi is +ve (green), but the point is on the opposite side of the plane from w, so wTXi < 0 and yi*wTXi < 0. The model classifies it as a -ve (red) point. [ + * – = – ]. These points are misclassified.
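The cases above all reduce to checking the sign of yi*wTXi. A small Python sketch, assuming labels coded as +1 and -1 and a hand-picked unit normal w (both assumptions for illustration):

```python
import numpy as np

w = np.array([1.0, 0.0])           # hypothetical unit normal
X = np.array([[ 2.0, 0.0],         # y = +1, w^T x > 0  -> correct
              [-2.0, 0.0],         # y = -1, w^T x < 0  -> correct
              [ 1.0, 0.0],         # y = -1, w^T x > 0  -> misclassified
              [-1.0, 0.0]])        # y = +1, w^T x < 0  -> misclassified
y = np.array([1, -1, -1, 1])

margins = y * (X @ w)              # y_i * w^T x_i for every point
correct = margins > 0              # positive product means correct prediction
print(margins, correct)
```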

Optimization:

Here Dn = { (Xi, Yi) } is the dataset, which consists of features Xi and class labels Yi. The optimization equation of logistic regression is

$$w^* = \arg\max_{w} \sum_{i=1}^{n} y_i \, w^T x_i$$

Xi and Yi are already given in the dataset, so we need to maximize the sum by choosing W.

• The value of yi*wTXi is +ve if correctly classified.
• The value of yi*wTXi is -ve if the point is misclassified.
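As a rough illustration of this sum, the sketch below scores two candidate normals on a tiny made-up dataset. The function name sum_signed_distances and the data are hypothetical; w is normalized to unit length so that yi*wTXi is a true signed distance.

```python
import numpy as np

def sum_signed_distances(w, X, y):
    """Sum of y_i * w^T x_i after scaling w to unit length."""
    w = w / np.linalg.norm(w)
    return float(np.sum(y * (X @ w)))

# Hypothetical toy dataset with labels coded as +1 / -1.
X = np.array([[1.0, 1.0], [2.0, 0.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])

score_a = sum_signed_distances(np.array([1.0, 0.0]), X, y)
score_b = sum_signed_distances(np.array([0.0, 1.0]), X, y)
print(score_a, score_b)
```

Under this objective the first normal is preferred because it gives the larger sum.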

Note: Outlier points can dominate the sum of signed distances. As a result, a plane with high accuracy can get a very low value of the sum, and a plane with low accuracy can get a high value.
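The effect of a single outlier can be seen in a small made-up example. Everything in this sketch (the data, the two candidate normals, the helper names) is hypothetical and chosen only to show the problem.

```python
import numpy as np

X = np.array([[ 1.0, 1.0], [ 1.0, 2.0], [ 1.0, 3.0],    # +ve points
              [-1.0, 1.0], [-1.0, 2.0], [-1.0, 3.0],    # -ve points
              [50.0, 1.0]])                             # one -ve outlier, far away
y = np.array([1, 1, 1, -1, -1, -1, -1])

def raw_sum(w):
    """Sum of signed distances y_i * w^T x_i (w is assumed to be a unit vector)."""
    return float(np.sum(y * (X @ w)))

def accuracy(w):
    """Fraction of points with y_i * w^T x_i > 0."""
    return float(np.mean(y * (X @ w) > 0))

w1 = np.array([1.0, 0.0])     # plane x1 = 0: only the outlier is misclassified
w2 = np.array([0.0, -1.0])    # plane x2 = 0: all three +ve points are misclassified

print(raw_sum(w1), accuracy(w1))
print(raw_sum(w2), accuracy(w2))
```

Here w1 classifies 6 of the 7 points correctly but its sum is -44, while w2 classifies only 4 of 7 correctly and still gets a sum of 1, so the raw sum prefers the worse plane.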

The squashing (sigmoid) function helps us by squashing all values into the range 0 to 1. Instead of using the raw signed distance:

• if the signed distance is small, use it as it is.
• if the signed distance is large, make it small.

The sigmoid function squashes even very large signed distances to values below 1:

$$f(x) = \frac{1}{1 + e^{-x}}$$

where x = yi*wTXi.

Properties of the sigmoid function

• The value of the sigmoid function is always greater than 0; it approaches 0 for large negative x.
• The value of the sigmoid function is always less than 1; it approaches 1 for large positive x.
• If x = 0 then f(x) = 0.5
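These properties are easy to check numerically. A minimal sketch with NumPy, using a few arbitrary sample inputs:

```python
import numpy as np

def sigmoid(x):
    """Standard logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

xs = np.array([-50.0, -1.0, 0.0, 1.0, 50.0])
values = sigmoid(xs)
print(values)   # the middle entry, for x = 0, is 0.5
```

In exact arithmetic the outputs stay strictly between 0 and 1; in floating point, a large positive input such as 50 may round to exactly 1.0.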

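Now the optimization function of logistic regression looks like this:

$$w^* = \arg\max_{w} \sum_{i=1}^{n} \frac{1}{1 + e^{-y_i w^T x_i}}$$

Or, taking the log of each sigmoid term, which is the form normally used in practice:

$$w^* = \arg\max_{w} \sum_{i=1}^{n} -\log\left(1 + e^{-y_i w^T x_i}\right)$$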

Our task is to find the W that maximizes this optimization function. We compute its value for different candidate planes πi with normals Wi; the plane that gives the maximum value is our classifier.
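The idea of trying different planes and keeping the best one can be sketched as a brute-force search. This is only an illustration under stated assumptions: the toy data (including one far-away -ve outlier) is made up, the candidates are 360 unit normals spaced evenly around the circle, and the objective is taken to be the sum of sigmoid(yi*wTXi). Real implementations normally use gradient-based optimization instead of enumerating planes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def objective(w, X, y):
    """Sum over all points of sigmoid(y_i * w^T x_i)."""
    return float(np.sum(sigmoid(y * (X @ w))))

# Hypothetical toy data with one -ve outlier far from the rest.
X = np.array([[ 1.0, 1.0], [ 1.0, 2.0], [ 1.0, 3.0],
              [-1.0, 1.0], [-1.0, 2.0], [-1.0, 3.0],
              [50.0, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1, -1])

# Candidate unit normals, one per degree around the circle.
angles = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
candidates = np.column_stack([np.cos(angles), np.sin(angles)])

scores = [objective(w, X, y) for w in candidates]
w_best = candidates[int(np.argmax(scores))]
print(w_best, max(scores))

# Two specific planes for comparison.
score_vertical = objective(np.array([1.0, 0.0]), X, y)     # misclassifies only the outlier
score_horizontal = objective(np.array([0.0, -1.0]), X, y)  # misclassifies three points
print(score_vertical, score_horizontal)
```

Because the sigmoid caps each point's contribution, the outlier can no longer dominate: the plane that misclassifies only the outlier scores higher than the plane that misclassifies three points.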

I hope you now understand logistic regression. Please check out more machine learning algorithms here: Machine learning Algorithms.
