KNN Algorithm in Machine Learning – A Simple Guide


The KNN algorithm is one of the simplest and easiest classification algorithms in supervised machine learning. K-NN stands for K-Nearest Neighbors. It is a basic and very popular algorithm, and it is also called a lazy learner because it does almost no work at training time and defers the computation to prediction time.

Features of the KNN algorithm

  • One of the most common and simplest classification algorithms, yet it can give highly effective results.
  • KNN can also be used for regression, where we take the average of the neighbors' target values (see the sketch after this list).
  • Easy to interpret.
  • It can also be used for filling in missing values.
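
For example, scikit-learn provides both a classifier and a regressor version of KNN. Below is a minimal sketch of both; the tiny dataset is made up purely for illustration.

```python
# Minimal sketch of KNN classification and regression with scikit-learn.
# The toy training data below is invented for illustration only.
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X_train = [[1, 2], [2, 3], [3, 1], [6, 5], [7, 7], [8, 6]]
y_class = [0, 0, 0, 1, 1, 1]                # class labels (classification)
y_value = [1.0, 1.2, 0.9, 4.8, 5.1, 5.0]    # continuous targets (regression)

# Classification: majority vote among the 5 nearest neighbors
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_class)
print(clf.predict([[7, 6]]))    # predicted class label

# Regression: average of the 5 nearest neighbors' target values
reg = KNeighborsRegressor(n_neighbors=5)
reg.fit(X_train, y_value)
print(reg.predict([[7, 6]]))    # predicted value
```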

The KNN algorithm depends entirely on the majority class among the neighboring points. K represents the number of neighbors. In the picture below, we take k = 5.

[Figure: KNN example with positive and negative data points, k = 5]

In our dataset, we have positive and negative data points. They are plotted as above, and we take k = 5.

Among the positive points, we draw a circle around the green point. It is surrounded entirely by positive points, so the green point is classified as positive.

Among the negative points, we draw a circle around the yellow point. It is surrounded entirely by negative points, so the yellow point is classified as negative.

Now the problem is: what about the purple point? It is surrounded by both positive and negative points, so we cannot immediately decide which class it belongs to.

Step by step through the KNN algorithm:

Step – 1:

Choose the number k of neighbors. This is the step we need to take the most care with, because it largely decides the algorithm's performance.

Note: For binary classification, k should be an odd integer; otherwise we may get a tie and cannot decide which class the new data point belongs to. If k = 4, we sometimes get 2 positive and 2 negative neighbors.
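
Because k drives performance, a common practice is to try a few odd values of k and keep the one with the best cross-validated accuracy. Here is a rough sketch; the candidate values and the iris dataset are just assumptions for the example.

```python
# Sketch: choosing k by cross-validated accuracy (dataset and k values are illustrative).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

best_k, best_score = None, 0.0
for k in [1, 3, 5, 7, 9, 11]:           # odd values only, to avoid ties
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print(best_k, round(best_score, 3))
```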

Step – 2:

Now take the k nearest neighbors of the new data point based on the Euclidean distance or the Manhattan distance. Euclidean distance is the most commonly used distance in machine learning algorithms.
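
For reference, both distances can be written in a couple of lines of NumPy. This is just a sketch of the formulas, not any library's internal implementation.

```python
import numpy as np

def euclidean_distance(a, b):
    # Square root of the sum of squared coordinate differences
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan_distance(a, b):
    # Sum of absolute coordinate differences
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b))

print(euclidean_distance([0, 0], [3, 4]))   # 5.0
print(manhattan_distance([0, 0], [3, 4]))   # 7.0
```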

Step – 3:

Now count the class labels among the k neighbors. The new data point belongs to the majority class. For example, if k = 5 and 2 of the neighbors are negative points while 3 are positive points, then the new data point is classified as positive.
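
Putting the three steps together, here is a rough from-scratch sketch of a KNN classifier. The function and data names are my own and are only meant to mirror the steps above.

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_new, k=5):
    """Classify x_new by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)

    # Step 2: Euclidean distance from x_new to every training point
    distances = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))

    # Take the indices of the k nearest neighbors
    nearest = np.argsort(distances)[:k]

    # Step 3: count the class labels and return the majority class
    labels = [y_train[i] for i in nearest]
    return Counter(labels).most_common(1)[0][0]

# The majority of the 5 nearest neighbors are '+', so the prediction is '+'
X_train = [[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6], [2, 2]]
y_train = ['+', '+', '+', '-', '-', '-', '+']
print(knn_predict(X_train, y_train, [1.5, 1.5], k=5))
```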

Cautions:

  • Sensitive to outliers.
  • Does not perform well on noisy or randomly distributed data.
  • Time and space complexities are high: we need to store all of the training data when the model is in production.
  • Not suitable for low-latency applications.

OK! I hope you now understand how the KNN algorithm works. Check out all the machine learning algorithms on this site, and please comment below if you have any queries about the K-NN algorithm. Thank you!
