K-Nearest Neighbors (KNN) Algorithm

[Figure: Simple analogy for K-Nearest Neighbors (K-NN)]

What is K-NN?

K-NN is a non-parametric, lazy learning algorithm. Non-parametric means there is no assumption about the underlying data distribution, i.e. the model structure is determined from the dataset itself. Lazy means there is no explicit training phase: the algorithm simply stores the training data and defers all computation until a prediction is requested.

K-NN classification

In K-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.

[Figure: The red point is classified to the class most common among its k nearest neighbors.]
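In code, the plurality vote boils down to counting the labels of the k nearest neighbors and returning the most frequent one. A minimal sketch in Python, with made-up neighbor labels (the variable names are illustrative):

```python
from collections import Counter

# Labels of the k = 5 nearest neighbors of some query point (toy data).
neighbor_labels = ["red", "blue", "red", "red", "blue"]

# Plurality vote: the most common label among the k nearest neighbors wins.
predicted_class = Counter(neighbor_labels).most_common(1)[0][0]
print(predicted_class)  # -> "red"
```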

The Euclidean distance

  • The Euclidean distance is the most common distance metric for low-dimensional data sets. It is also known as the L2 norm, and it is the usual way distance is measured in the physical world.
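For two points p and q with the same number of coordinates, it is the square root of the sum of squared coordinate differences. A minimal sketch in plain Python (the function name is illustrative):

```python
import math

def euclidean_distance(p, q):
    """L2 norm of the difference between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance([0, 0], [3, 4]))  # -> 5.0
```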

Other popular distance measures (sketched in code after this list):

  • Hamming Distance: Calculates the distance between binary vectors as the number of positions at which they differ.
  • Manhattan Distance: Calculates the distance between real vectors as the sum of their absolute differences. Also called City Block distance.
  • Minkowski Distance: A generalization of the Euclidean and Manhattan distances.
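A sketch of all three, assuming equal-length input vectors (the function names are illustrative):

```python
def hamming_distance(p, q):
    """Number of positions at which two binary vectors differ."""
    return sum(a != b for a, b in zip(p, q))

def manhattan_distance(p, q):
    """Sum of absolute coordinate differences (city block distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def minkowski_distance(p, q, r=2):
    """Generalization: r = 1 gives Manhattan, r = 2 gives Euclidean."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

print(hamming_distance([1, 0, 1, 1], [1, 1, 1, 0]))  # -> 2
print(manhattan_distance([1, 2], [4, 6]))            # -> 7
print(minkowski_distance([1, 2], [4, 6], r=2))       # -> 5.0
```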

The steps carried out by the K-NN algorithm are as follows (see the code sketch after this list):

  1. Divide the data into training and test sets.
  2. Select a value of K.
  3. Determine which distance function is to be used.
  4. Choose a sample from the test data that needs to be classified and compute its distance to each of the training samples.
  5. Sort the distances obtained and keep the k nearest data samples.
  6. Assign the test sample to the class determined by the majority vote of its k nearest neighbors.
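Steps 3 through 6 fit naturally into a single function. A minimal sketch, assuming steps 1 and 2 (the train/test split and the choice of K) happen before the call; the function and variable names are illustrative:

```python
import math
from collections import Counter

def knn_classify(train_x, train_y, test_point, k=3):
    # Step 3: use Euclidean distance as the distance function.
    def distance(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    # Step 4: compute the distance from the test point to every training sample.
    dists = [(distance(x, test_point), y) for x, y in zip(train_x, train_y)]

    # Step 5: sort the distances and keep the k nearest samples.
    dists.sort(key=lambda pair: pair[0])
    k_labels = [label for _, label in dists[:k]]

    # Step 6: assign the class chosen by the majority vote of the k neighbors.
    return Counter(k_labels).most_common(1)[0][0]

train_x = [(1.0, 1.1), (1.2, 0.9), (7.8, 8.1), (8.2, 7.9)]
train_y = ["A", "A", "B", "B"]
print(knn_classify(train_x, train_y, (1.1, 1.0), k=3))  # -> "A"
```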
[Figure: Steps carried out in the K-NN algorithm]

Performance of the K-NN algorithm is influenced by three main factors (each maps onto a parameter in the sketch after this list):

  1. The distance function or distance metric used to determine the nearest neighbors.
  2. The decision rule used to derive a classification from the K-nearest neighbors.
  3. The number of neighbors used to classify the new example.
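These three factors map directly onto constructor parameters of scikit-learn's KNeighborsClassifier. A sketch on toy data, assuming scikit-learn is installed:

```python
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1.0, 1.1], [1.2, 0.9], [7.8, 8.1], [8.2, 7.9]]
y_train = ["A", "A", "B", "B"]

clf = KNeighborsClassifier(
    n_neighbors=3,       # factor 3: the number of neighbors (K)
    metric="minkowski",  # factor 1: the distance metric (p=2 gives Euclidean)
    p=2,
    weights="uniform",   # factor 2: the decision rule ("distance" weights votes by proximity)
)
clf.fit(X_train, y_train)
print(clf.predict([[1.1, 1.0]]))  # -> ['A']
```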

Advantages of K-NN:

  1. The K-NN algorithm is very easy to implement.
  2. Nearly optimal in the large sample limit.
  3. Uses local information, which can yield highly adaptive behavior.
  4. Lends itself very easily to parallel implementation.

Disadvantages of K-NN:

  1. Large storage requirements.
  2. Computationally intensive recall.
  3. Highly susceptible to the curse of dimensionality.

The K-NN algorithm finds applications in:

  1. Finance — financial institutions predicting the credit rating of customers.
  2. Healthcare — gene-expression analysis.
  3. Political science — classifying potential voters into two classes: will vote or won't vote.
  4. Handwriting recognition.
  5. Image recognition.
  6. Video recognition.
  7. Pattern recognition.
[Figure: Some applications of K-NN]
