Distance Metrics in Machine Learning

Created by Daniel Zaldaña


https://x.com/ZaldanaDaniel

Euclidean Distance

Formula

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

When to use:

• Continuous data in low-dimensional space
• When scale and magnitude matter
• Default choice for many clustering algorithms (e.g., k-means)

Properties:

• Symmetric: d(x, y) = d(y, x)
• Non-negative: d(x, y) ≥ 0
• Sensitive to outliers and scale

```python
from sklearn.metrics.pairwise import (
    euclidean_distances,
    manhattan_distances,
    cosine_distances,
)
import numpy as np
from numpy import ndarray  # ndarray lives in NumPy, not in typing
from typing import Optional


def euclidean(
    x: ndarray,
    y: ndarray
) -> float:
    """Calculate Euclidean distance using sklearn."""
    # Reshape if needed for single vectors
    X = x.reshape(1, -1) if x.ndim == 1 else x
    Y = y.reshape(1, -1) if y.ndim == 1 else y
    return euclidean_distances(X, Y)[0, 0]
```

Manhattan Distance

Formula

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

When to use:

• Grid-like patterns (e.g., city blocks)
• When movement is restricted to axis-aligned steps, so a diagonal move costs more than the straight-line distance
• When more robustness to outliers than Euclidean distance is needed

```python
def manhattan(
    x: ndarray,
    y: ndarray
) -> float:
    """Calculate Manhattan distance using sklearn."""
    X = x.reshape(1, -1) if x.ndim == 1 else x
    Y = y.reshape(1, -1) if y.ndim == 1 else y
    return manhattan_distances(X, Y)[0, 0]
```

Cosine Similarity

Formula

$$\text{similarity}(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|}$$

When to use:

• Text analysis and document similarity
• High-dimensional sparse data
• When direction matters more than magnitude

```python
def cosine_similarity(
    x: ndarray,
    y: ndarray
) -> float:
    """Calculate cosine similarity using sklearn."""
    X = x.reshape(1, -1) if x.ndim == 1 else x
    Y = y.reshape(1, -1) if y.ndim == 1 else y
    # sklearn returns the cosine distance, which is 1 - similarity
    return 1 - cosine_distances(X, Y)[0, 0]
```

Mahalanobis Distance

Formula

$$d(x, y) = \sqrt{(x - y)^{T} \Sigma^{-1} (x - y)}$$

When to use:

• Correlated features
• Anomaly detection
• Scale-invariant clustering

```python
from sklearn.covariance import EmpiricalCovariance


def mahalanobis(
    x: ndarray,
    y: ndarray,
    cov: Optional[ndarray] = None
) -> float:
    """Calculate Mahalanobis distance using sklearn."""
    X = x.reshape(1, -1) if x.ndim == 1 else x
    Y = y.reshape(1, -1) if y.ndim == 1 else y

    if cov is None:
        # Estimate covariance from data. When x and y are single vectors
        # of dimension two or more, this two-point estimate is
        # rank-deficient, so pass a `cov` fitted on the full dataset
        # whenever one is available.
        cov_estimator = EmpiricalCovariance()
        cov_estimator.fit(np.vstack([X, Y]))
        cov = cov_estimator.covariance_

    diff = X - Y
    # pinv equals inv for invertible matrices and tolerates singular ones
    inv_covmat = np.linalg.pinv(cov)
    return np.sqrt(
        diff.dot(inv_covmat).dot(diff.T)
    )[0, 0]
```

Minkowski Distance

Formula

$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^{p} \right)^{1/p}$$

When to use:

• Generalizing distance metrics
• When you need to tune the influence of large differences
• Experimenting with different p-norms

```python
from sklearn.metrics import pairwise_distances


def minkowski(
    x: ndarray,
    y: ndarray,
    p: float = 2
) -> float:
    """Calculate Minkowski distance using sklearn."""
    X = x.reshape(1, -1) if x.ndim == 1 else x
    Y = y.reshape(1, -1) if y.ndim == 1 else y
    return pairwise_distances(
        X, Y, metric='minkowski', p=p
    )[0, 0]
```

Jaccard Distance

Formula

$$d(x, y) = 1 - \frac{|x \cap y|}{|x \cup y|}$$

When to use:

• Binary or set-based data
• Comparing discrete features
• Document similarity with word sets

```python
def jaccard(
    x: ndarray,
    y: ndarray
) -> float:
    """Calculate Jaccard distance using sklearn."""
    # Treat inputs as boolean membership vectors (nonzero = in the set)
    X = (x.reshape(1, -1) if x.ndim == 1 else x).astype(bool)
    Y = (y.reshape(1, -1) if y.ndim == 1 else y).astype(bool)
    return pairwise_distances(
        X, Y, metric='jaccard'
    )[0, 0]
```

Hamming Distance

Formula

$$d(x, y) = \sum_{i=1}^{n} \mathbb{1}[x_i \neq y_i]$$

When to use:

• Categorical data
• Error detection in communication
• Comparing equal-length strings

```python
def hamming(
    x: ndarray,
    y: ndarray
) -> float:
    """Calculate Hamming distance using sklearn.

    Returns the fraction of differing positions, i.e. the sum in the
    formula divided by n.
    """
    X = x.reshape(1, -1) if x.ndim == 1 else x
    Y = y.reshape(1, -1) if y.ndim == 1 else y
    return pairwise_distances(
        X, Y, metric='hamming'
    )[0, 0]
```
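To make the set-based formulas concrete, here is a minimal worked sketch that computes the Jaccard distance and the Hamming count by hand with NumPy only. The vectors `a` and `b` are made-up illustration data, not taken from the sheet, and the sketch assumes the union is non-empty.

```python
import numpy as np

# Hypothetical binary membership vectors (1 = item present in the set).
a = np.array([1, 1, 0, 1, 0, 0], dtype=bool)
b = np.array([1, 0, 0, 1, 1, 0], dtype=bool)

# Jaccard: 1 - |intersection| / |union|.
intersection = np.logical_and(a, b).sum()   # positions 0 and 3
union = np.logical_or(a, b).sum()           # positions 0, 1, 3 and 4
jaccard_distance = 1.0 - intersection / union

# Hamming: number of positions where the two vectors disagree.
hamming_count = int(np.sum(a != b))         # positions 1 and 4
# Some libraries report the normalized form (count divided by length).
hamming_fraction = hamming_count / a.size

print(jaccard_distance, hamming_count, hamming_fraction)
```

Note that positions where both vectors are 0 lower the Hamming fraction but leave the Jaccard distance unchanged, which is one reason Jaccard is often preferred for sparse set data.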

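Several of the metrics on this sheet reduce to one another in special cases, and checking this numerically is a quick way to validate an implementation. The sketch below is an illustration written directly from the formulas, not the author's code. The helper names `minkowski_np` and `mahalanobis_np` are hypothetical, and the reference sample is synthetic data used only to show a covariance matrix estimated from many points rather than from the two vectors being compared.

```python
import numpy as np


def minkowski_np(x: np.ndarray, y: np.ndarray, p: float) -> float:
    """Minkowski distance written directly from the formula."""
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))


def mahalanobis_np(x: np.ndarray, y: np.ndarray, cov: np.ndarray) -> float:
    """Mahalanobis distance for an explicitly supplied covariance matrix."""
    diff = x - y
    # Solve cov @ z = diff instead of forming the inverse explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])

manhattan_xy = float(np.sum(np.abs(x - y)))   # 3 + 4 + 0
euclidean_xy = float(np.linalg.norm(x - y))   # sqrt(9 + 16 + 0)

# p = 1 and p = 2 recover the Manhattan and Euclidean distances.
assert np.isclose(minkowski_np(x, y, p=1), manhattan_xy)
assert np.isclose(minkowski_np(x, y, p=2), euclidean_xy)

# With the identity covariance, Mahalanobis equals Euclidean.
assert np.isclose(mahalanobis_np(x, y, np.eye(3)), euclidean_xy)

# Synthetic reference sample with correlated features (assumption).
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0], [1.5, 0.5, 0.0], [0.0, 0.0, 1.0]])
sample = rng.normal(size=(500, 3)) @ mixing
cov = np.cov(sample, rowvar=False)
d_sample = mahalanobis_np(x, y, cov)

print(manhattan_xy, euclidean_xy, d_sample)
```

Estimating the covariance from a reference sample keeps the matrix invertible; a covariance fitted on only the two compared points would be rank-deficient in more than one dimension.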