
These relationships are pretty clear once you see the other distributional metrics. Going further, there's skewness and kurtosis for the third and fourth statistical moments, respectively.
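To make the ladder of moments concrete, here's a small numpy sketch (not from the thread; the exponential distribution and sample size are my choices) computing the first moment plus the second central moment and the standardized third and fourth moments of a skewed sample:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=50_000)  # a right-skewed distribution

m = x.mean()                                 # first moment: location
var = np.mean((x - m) ** 2)                  # second central moment: spread
skew = np.mean((x - m) ** 3) / var ** 1.5    # third standardized moment: asymmetry
kurt = np.mean((x - m) ** 4) / var ** 2      # fourth standardized moment: tail weight

# For Exponential(1): mean = 1, variance = 1, skewness = 2, kurtosis = 9.
print(m, var, skew, kurt)
```

With 50k samples the estimates land close to the theoretical values, though the higher moments converge noticeably more slowly than the mean.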


Not sure what you're saying? The article is about "central tendency" or "location" statistics (i.e. the "first moment"), and how three common ones (mode, median, mean) pop out of minimising different distances (L0, L1, L2).

It doesn't even mention variance (second central moment), let alone skew or kurtosis?
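The minimisation the article describes can be checked numerically. This sketch (my own illustration, using an integer grid so equality comparisons are exact) scores candidate "centers" under each distance and shows which statistic each one recovers:

```python
import numpy as np

data = np.array([1, 2, 2, 3, 12])
candidates = np.arange(0, 13)  # integer grid, so the optima land exactly on it

# L2: sum of squared distances, minimized by the mean.
l2 = [np.sum((data - c) ** 2) for c in candidates]
# L1: sum of absolute distances, minimized by the median.
l1 = [np.sum(np.abs(data - c)) for c in candidates]
# "L0": count of points not equal to c, minimized by the mode.
l0 = [np.sum(data != c) for c in candidates]

print(candidates[np.argmin(l2)])  # 4  == mean
print(candidates[np.argmin(l1)])  # 2  == median
print(candidates[np.argmin(l0)])  # 2  == mode
```

The outlier at 12 drags the L2 minimizer (mean) away from the bulk of the data, while the L1 and L0 minimizers stay put, which is the usual robustness argument for the median and mode.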


I'm saying distance techniques are related to covariance, and there's a lot of useful info from stats when you go to higher moments. I never see this in ML, and I'm wondering why, so I'm pointing it out.


Careful though: "kurtosis" doesn't always mean the same thing, to the point where different R packages have functions named "kurtosis" that implement different formulas. Here's an example I stumbled upon today:

https://stat.ethz.ch/pipermail/r-help/2005-December/083875.h...

"pkg:moments uses the ratio of 4th sample moment to square of second sample moment, while pkg:fBasics uses the variance instead of the second moment and subtracts 3 (for reasons to do with the Normal distribution)."

"The "correct" number for kurtosis depends on your purpose. The number for "kurtosis" that subtracts 3 estimates a "cumulant", which is the standard fourth moment correction weight in an Edgeworth expansion approximation to a distribution. Neither of the numbers described ... compute the "4th sample k statistic", which is "the unique unbiased estimator" for that number (http://mathworld.wolfram.com/k-Statistic.html)."
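The two conventions quoted above are easy to reproduce outside R. This numpy sketch (my translation of the mailing-list description, not the R packages' actual code) computes both: the pkg:moments-style ratio of the 4th sample moment to the squared 2nd sample moment, and the pkg:fBasics-style version that uses the (n-1)-denominator variance and subtracts 3:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)

m = x.mean()
m2 = np.mean((x - m) ** 2)   # second sample moment (biased, n denominator)
m4 = np.mean((x - m) ** 4)   # fourth sample moment

# pkg:moments-style: m4 / m2^2, ~3 for a normal distribution.
kurt_moments = m4 / m2 ** 2
# pkg:fBasics-style: use the sample variance (n-1 denominator) and
# subtract 3, giving "excess" kurtosis, ~0 for a normal distribution.
var = x.var(ddof=1)
kurt_fbasics = m4 / var ** 2 - 3

print(kurt_moments, kurt_fbasics)
```

For large n the two differ by almost exactly 3 (the (n/(n-1))^2 factor is negligible), but for small samples the denominator choice matters too, on top of the subtract-3 convention.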


This is the first time the concept of skewness and kurtosis has made sense to me, thanks



