Hello friends, welcome back!

Anyone who wants to become a Data Scientist or move into an Analytics role can't skip Statistics. Stats plays a big role in understanding the more complex parts of Analytics.

The basic building blocks of Statistics are Measures of Centrality and Spread!

Why do we need Measures of Centrality & Spread? –> In the real world, we may have millions of data points arranged in rows and columns. It is very hard to get a sense of the data just by looking at it. Hence:

  • Drawing plots can give a good visual summary
  • In some situations, we want an even more succinct summary that gives an idea of what the data looks like!


These summaries are called Statistics (estimators of population parameters). For Quantitative Data (i.e. Numerical Data) we have

  1. Measures of Centrality –> Mean, Median & Mode
  2. Percentiles –> Quartiles, Quintiles, Deciles
  3. Measures of Spread –> Range, IQR, Variance & Standard Deviations

Measures of Centrality

Mean: Simply the average of the data points, i.e. the sum of the numbers in the collection divided by the count of numbers in the collection:

Mean = (x1 + x2 + ... + xn) / n

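Median: The value that appears at the center of the data once the data is sorted. There are 2 cases, depending on whether the number of data points n is odd or even:

  • n is odd: the median is the value at position (n + 1) / 2 in the sorted data
  • n is even: the median is the average of the values at positions n / 2 and (n / 2) + 1 in the sorted data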

Mode: The most frequently occurring value in the dataset. A dataset can have more than one mode if several values tie for the highest frequency.
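As a quick illustration of the three definitions above, here is a minimal sketch using Python's built-in `statistics` module. The sample values are made up for this example and are not from any real dataset.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(data)      # sum of values / count of values
median_value = statistics.median(data)  # even count: average of the two middle values
mode_value = statistics.mode(data)      # most frequently occurring value

# Dropping the last value gives an odd count, so the median is the single middle value.
odd_median = statistics.median(data[:-1])

print("mean:", mean_value)
print("median (even count):", median_value)
print("median (odd count):", odd_median)
print("mode:", mode_value)
```

Here the even-count median falls between 3 and 5, while the odd-count median is an actual data point.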

Characteristics of Measures of Centrality

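The mean is known as the “Centre of Gravity of the Dataset”. The deviation of a point from the mean is defined as the difference between this point and the mean.

Note: “The sum of the deviations of all points from the mean is 0”. The proof is short. Since Mean = (x1 + x2 + ... + xn) / n, we have x1 + x2 + ... + xn = n * Mean, and therefore:

(x1 - Mean) + (x2 - Mean) + ... + (xn - Mean) = (x1 + x2 + ... + xn) - n * Mean = n * Mean - n * Mean = 0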
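The zero-sum property of deviations can also be checked numerically. The sketch below uses made-up values and `fractions.Fraction` so that the arithmetic is exact and floating-point rounding does not blur the result.

```python
from fractions import Fraction

# Hypothetical sample, chosen only for illustration.
data = [4, 8, 15, 16, 23, 42]

# Exact mean as a fraction, to avoid floating-point rounding.
mean_value = Fraction(sum(data), len(data))

# Deviation of each point = point - mean.
deviations = [x - mean_value for x in data]
total_deviation = sum(deviations)

print("mean:", mean_value)
print("deviations:", [str(d) for d in deviations])
print("sum of deviations:", total_deviation)
```

The negative deviations (points below the mean) exactly cancel the positive ones, which is what "centre of gravity" means here.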

Sensitivity of the Measures of Centrality to Outliers

An outlier is any point that lies far away from the other values in the data.

The mean is very sensitive to outliers, but the median and mode are much less so. To reduce this sensitivity, it is advised to compute the “Trimmed Mean”. The trimmed mean is computed by sorting the data and dropping the K most extreme values from each end before calculating the mean. For example, with K = 1 the values 2, 4, 5, 6, 100 become 4, 5, 6, whose mean is 5 instead of 23.4.
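To see the effect in code, here is a minimal sketch. `trimmed_mean` is a hypothetical helper written for this post (not a library function), and the data values are invented for illustration.

```python
import statistics


def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values (hypothetical helper)."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must satisfy 0 <= k < len(values) / 2")
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical sample, then the same sample with one extreme value added.
clean = [10, 12, 13, 14, 16]
with_outlier = clean + [500]

mean_clean = statistics.mean(clean)
mean_outlier = statistics.mean(with_outlier)
median_clean = statistics.median(clean)
median_outlier = statistics.median(with_outlier)
trimmed = trimmed_mean(with_outlier, 1)

print("mean:", mean_clean, "->", mean_outlier)
print("median:", median_clean, "->", median_outlier)
print("trimmed mean (K=1):", trimmed)
```

A single extreme value moves the mean a long way, while the median barely changes and the trimmed mean stays close to the bulk of the data. For real work, SciPy provides `scipy.stats.trim_mean`, which trims a proportion of the data from each end rather than a fixed count.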

Points to Remember:

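  • Perfectly symmetric distribution with a single peak – Mean = Median = Mode.
  • Left skewed distribution – typically Mean < Median < Mode
  • Right skewed distribution – typically Mean > Median > Mode

The two skewed orderings are rules of thumb: they hold for most smooth single-peaked distributions, but exceptions exist.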
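The usual ordering for skewed data can be seen on a small example. The sketch below uses an invented right-skewed sample and its mirror image. It only illustrates the rule of thumb and does not prove it for every dataset.

```python
import statistics


def summarize(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )


# Hypothetical right-skewed sample: most values are small, with a long tail to the right.
right_skewed = [1, 2, 2, 3, 4, 10, 20]

# Mirroring the values around a constant flips the tail to the left.
left_skewed = [21 - x for x in right_skewed]

right_mean, right_median, right_mode = summarize(right_skewed)
left_mean, left_median, left_mode = summarize(left_skewed)

print("right skewed (mean, median, mode):", right_mean, right_median, right_mode)
print("left skewed  (mean, median, mode):", left_mean, left_median, left_mode)
```

In both cases the mean is pulled furthest toward the long tail, the mode stays at the peak, and the median sits in between.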

Effects of Transformation on the Measures of Centrality

  1. Scaling: Multiply/Divide each value in the dataset by a constant value, i.e. Xnew = a*X
  2. Shifting: Add/Subtract a constant value to/from each value in the dataset, i.e. Xnew = X + b
  3. Scaling and Shifting: Combination of both, i.e. Xnew = a*X + b

When a transformation is applied to the data, the mean gets scaled and shifted in the same way: if Xnew = a*X + b, then the new mean is a*(old mean) + b.

The median and mode follow the same rule. The new median is a*(old median) + b, and the new mode is a*(old mode) + b (for a not equal to 0).
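A quick numerical check shows how each measure responds to scaling and shifting. The data and the constants `a` and `b` below are arbitrary values picked for illustration.

```python
import statistics

# Hypothetical sample and arbitrary transformation constants.
data = [1, 2, 2, 3, 7]
a, b = 3, 10

# Scaling and shifting: Xnew = a*X + b
transformed = [a * x + b for x in data]

old_mean, new_mean = statistics.mean(data), statistics.mean(transformed)
old_median, new_median = statistics.median(data), statistics.median(transformed)
old_mode, new_mode = statistics.mode(data), statistics.mode(transformed)

print("mean:  ", old_mean, "->", new_mean, "| a*mean + b =", a * old_mean + b)
print("median:", old_median, "->", new_median, "| a*median + b =", a * old_median + b)
print("mode:  ", old_mode, "->", new_mode, "| a*mode + b =", a * old_mode + b)
```

In this example, each new measure equals a times the old measure plus b.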

Summary:

  • Mean is sensitive to outliers but median is not
  • Mean is the center of gravity of the data
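  • Effect of Skewness: the mean is pulled toward the long tail more than the median, while the mode stays at the peak
  • Effect of Transformations: under scaling and shifting, the mean, median and mode all change in the same way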

You can also learn about Sampling and Confidence Interval from our earlier blog posts.

Some books to refer to for more on this topic: