What is a measure of dispersion? (Data spread – range, variance and standard deviation)
A measure of dispersion describes how spread out the data points are; it tells us how much the data vary around the central tendency
Range:
The range is the difference between the maximum and minimum values of the data: range = max – min
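As a small illustration of the range, the sketch below uses a made-up list of numbers (the values in `data` are hypothetical, not taken from any real dataset) and computes the minimum, the maximum and their difference with Python built-ins.

```python
# Hypothetical data, chosen only for illustration
data = [4, 8, 15, 16, 23, 42]

minimum = min(data)              # smallest value in the data
maximum = max(data)              # largest value in the data
data_range = maximum - minimum   # range = max - min

print(minimum, maximum, data_range)  # 4 42 38
```

Because the range depends only on the two extreme values, a single outlier can change it a lot, which is one reason variance and standard deviation are usually reported as well.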
Variance:
The variance is the average squared deviation of the data points from the mean. We first calculate the mean, then find the difference between each data point and the mean. Each difference is squared so that negative and positive deviations do not cancel out, and finally the squared differences are summed and averaged
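There are 2 kinds of variance, depending on the situation:
Sample variance: if we are working with a sample drawn from a larger population, we compute the sample variance. Dividing by n – 1 instead of n (Bessel's correction) compensates for the fact that the sample mean is itself estimated from the same data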
s² = (Σ(xᵢ – x̄)² ) / (n – 1)
· Σ (sigma) represents the sum of all the values
· xᵢ represents each individual value in the sample
· x̄ (x bar) represents the sample mean
· n represents the total number of elements in the sample
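The sample variance formula above can be sketched in Python as follows. The data in `sample` and the helper name `sample_variance` are made up for illustration; the result is cross-checked against `statistics.variance` from the standard library, which also divides by n – 1.

```python
import statistics

# Hypothetical sample, chosen only for illustration
sample = [2, 4, 4, 4, 5, 5, 7, 9]

def sample_variance(values):
    """Sample variance: sum of squared deviations divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n                                   # x-bar
    squared_deviations = [(x - mean) ** 2 for x in values]   # (x_i - x-bar)^2
    return sum(squared_deviations) / (n - 1)

s2 = sample_variance(sample)
print(s2)                           # 32 / 7, approximately 4.571
print(statistics.variance(sample))  # standard-library equivalent
```

Here the mean is 5, the squared deviations sum to 32, and n – 1 = 7, so s² = 32 / 7.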
Population variance: if we are working with data for the entire population, we compute the population variance
σ² = (Σ(xᵢ – μ)² ) / N
· Σ (sigma) represents the sum of all the values
· xᵢ represents each individual value in the dataset
· μ (mu) represents the population mean
· N represents the total number of elements in the population
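The population variance formula above differs from the sample version only in the denominator. The sketch below assumes the hypothetical list `population` contains every member of the population; the helper name `population_variance` is made up, and the result is compared with `statistics.pvariance` from the standard library.

```python
import statistics

# Hypothetical data, treated here as the complete population
population = [2, 4, 4, 4, 5, 5, 7, 9]

def population_variance(values):
    """Population variance: sum of squared deviations divided by N."""
    N = len(values)
    mu = sum(values) / N                                   # population mean
    squared_deviations = [(x - mu) ** 2 for x in values]   # (x_i - mu)^2
    return sum(squared_deviations) / N

sigma2 = population_variance(population)
print(sigma2)                               # 4.0
print(statistics.pvariance(population))     # standard-library equivalent
```

For the same eight values, dividing by N = 8 gives 4, while dividing by n – 1 = 7 gives about 4.571, so the sample formula always yields the slightly larger figure.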
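Note: When the variance is smaller, the data are less spread out and cluster more tightly around the mean, so the peak of the distribution is taller and narrower. Peakedness is often described with the terms leptokurtic (sharply peaked) and mesokurtic (normal-like), although strictly speaking kurtosis is a separate measure from variance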
Standard Deviation: it is the square root of the variance. Because the variance is expressed in squared units, taking the square root brings the measure of spread back to the same units as the original data
Similarly, there are 2 kinds of standard deviation:
Population Standard Deviation (σ):
σ = √(σ² ) = √((Σ(xᵢ – μ)² ) / N)
Sample Standard Deviation (s):
s = √(s² ) = √((Σ(xᵢ – x̄)² ) / (n – 1))
· Σ (sigma) represents the sum of all the values
· xᵢ represents each individual value in the dataset
· μ (mu) represents the population mean
· x̄ (x bar) represents the sample mean
· N represents the total number of elements in the population
· n represents the total number of elements in the sample
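Both standard deviations can be sketched in Python by taking the square root of the corresponding variance. The list `data` is hypothetical; `statistics.pstdev` and `statistics.stdev` are the standard-library functions for the population and sample versions respectively.

```python
import math
import statistics

# Hypothetical data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
sum_sq = sum((x - mean) ** 2 for x in data)   # sum of squared deviations

population_sd = math.sqrt(sum_sq / len(data))        # sigma: divide by N
sample_sd = math.sqrt(sum_sq / (len(data) - 1))      # s: divide by n - 1

print(population_sd)            # 2.0
print(sample_sd)                # sqrt(32 / 7), approximately 2.138

# Cross-check against the standard library
print(statistics.pstdev(data))  # population standard deviation
print(statistics.stdev(data))   # sample standard deviation
```

Which of the two to use depends on whether the values are treated as the whole population or as a sample from it, exactly as with the variance.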