Metabolomics and Data Analysis blog

What to do when points need averaging? Obvious, not so fast…

Data analysis

By Miroslava Cuperlovic-Culf

November 2023

Taking the average is one of the most basic, simplest steps in any analysis and a great topic to start a new blog with. It is so simple and so basic that it is rarely thought about, but how you do this step will greatly influence the rest of the analysis. Arithmetic mean vs. geomean vs. harmonic mean: which one to use? All three are Pythagorean means*, each providing information about the central tendency of the data but with very different optimal applications.

Arithmetic mean is simply calculated as:

$$\mathrm{AM} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

that is, the sum of the n values divided by n.

The question that this method answers is “what single value should all values be changed to in order to achieve the same total?”.
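As a minimal sketch of that "same total" idea, the function below computes the arithmetic mean and checks that replacing every value by the mean preserves the sum. The function name and the temperature readings are hypothetical, chosen only for illustration.

```python
def arithmetic_mean(values):
    """Sum of the values divided by how many there are."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)


# Hypothetical temperature readings (additive data).
temperatures = [36.5, 37.0, 38.0]
mean_temperature = arithmetic_mean(temperatures)

# Replacing every reading by the mean leaves the total unchanged.
total_from_mean = mean_temperature * len(temperatures)
print(mean_temperature, total_from_mean, sum(temperatures))
```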

General situations when arithmetic mean is the most appropriate:

When it is inappropriate to use arithmetic mean:

As in the general case, in computational biology arithmetic mean is appropriate for data with additive relationships between values, such as temperature measurements. Interestingly, Principal Component Analysis (PCA) assumes arithmetic mean in its derivation (the data are centered on it). Thus, if the data are not normally distributed, they have to be log transformed prior to PCA, so that the arithmetic mean of the transformed values corresponds to the geomean, which is more appropriate for such data.

Geometric mean is calculated as:

$$\mathrm{GM} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$

that is, the n-th root of the product of the n values. The question it tries to answer is “what single value should all values be changed to in order to achieve the same product?”
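The "same product" idea can be sketched in a few lines. The function name and the fold-change values are hypothetical, and the restriction to strictly positive inputs is an assumption of this sketch.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n strictly positive values."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("this sketch expects at least one value, all positive")
    return math.prod(values) ** (1.0 / len(values))


# Hypothetical fold changes (multiplicative data).
fold_changes = [0.5, 2.0, 4.0]
gm = geometric_mean(fold_changes)

# Replacing every value by the geomean leaves the product unchanged.
product_from_gm = gm ** len(fold_changes)
print(gm, product_from_gm, math.prod(fold_changes))
```

For long lists of large or tiny values the direct product can overflow or underflow, in which case working with logarithms is the safer route.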

Geomean is the most appropriate when:

When geomean is inappropriate:

Geomean is preferred whenever values are separated by a multiplier of some sort, that is, when data are not linearly related. Pharmacokinetic (PK) data follow a skewed distribution in trials and thus geomean is the most appropriate measure of central tendency. In omics, when an average over genes is required, as in averaging housekeeping gene expression in transcriptomics, geomean is a good choice. Similarly, geomean is the most appropriate for averaging metabolites across samples.
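As an illustration of the housekeeping-gene case, the sketch below computes one reference level per sample as the geomean of its housekeeping genes. The sample names, gene choice and expression values are made up for the example, and `reference_level` is a hypothetical helper, not part of any package.

```python
import statistics

# Hypothetical expression values; genes and numbers are illustrative only.
housekeeping = {
    "sample_A": {"GAPDH": 1000.0, "ACTB": 4000.0},
    "sample_B": {"GAPDH": 500.0, "ACTB": 2000.0},
}


def reference_level(expression_by_gene):
    """Geomean of the housekeeping gene expression values in one sample."""
    return statistics.geometric_mean(list(expression_by_gene.values()))


reference = {name: reference_level(genes) for name, genes in housekeeping.items()}
print(reference)  # roughly 2000 for sample_A and 1000 for sample_B
```

In this made-up data every gene in sample_A is twice as high as in sample_B, and the geomean reference reflects exactly that factor of two.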

Interestingly, the log of geomean is the arithmetic mean of the log values:

$$\log(\mathrm{GM}) = \frac{1}{n}\sum_{i=1}^{n} \log x_i$$

And just because all is clearer with some more equations – the relationship between two methods is further:

So, because of this, when arithmetic mean is needed on non-linearly dependent data, first log transform the data: the arithmetic mean of the log values is then the log of the geomean, and exponentiating it gives the geomean itself. Fascinating!
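A quick numerical check of this equivalence, as a sketch with hypothetical intensity values: the geomean computed directly from the product is compared with the back-transformed arithmetic mean of the logs.

```python
import math


def geomean_via_logs(values):
    """Arithmetic mean of the log values, transformed back with exp."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))


# Hypothetical peak intensities spanning orders of magnitude.
intensities = [10.0, 100.0, 1000.0]

direct = math.prod(intensities) ** (1.0 / len(intensities))
via_logs = geomean_via_logs(intensities)

# Both routes agree up to floating-point rounding (about 100 here).
print(direct, via_logs)
```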

Harmonic mean is calculated as:

$$\mathrm{HM} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

Harmonic mean is appropriate for data that consist of a set of rates, and it tries to answer the question “what number can all values be changed to in order for the rate of change to be the same?”. It is defined as the reciprocal of the arithmetic mean of the reciprocals of the values. The advantages of harmonic mean are that it gives the highest weight to the smallest item in the group and that it can be computed for negative as well as positive values, although it is usually only meaningful when all values are positive. Its disadvantages include great sensitivity to extreme values, particularly very small ones, and complete inability to deal with zero values.
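A minimal sketch of the definition, using the textbook case of two speeds over equal distances. The numbers are illustrative, the function name is hypothetical, and the restriction to positive values is an assumption of this sketch.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("this sketch expects at least one value, all positive")
    return len(values) / sum(1.0 / v for v in values)


# Same distance covered once at 40 km/h and once at 60 km/h (made-up numbers).
speeds = [40.0, 60.0]
hm = harmonic_mean(speeds)

# The overall average speed is about 48 km/h, below the arithmetic mean of 50,
# because more time is spent travelling at the slower speed.
print(hm)
```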

Harmonic mean is optimal when averaging variables that are best described as ratios, such as rates of growth. As another example, a published comparison of methods has shown that it is the best method for determining the activity of a mixture in drug IC50 analysis (doi: 10.1021/co100065a).

Interestingly, for positive values arithmetic mean is always the biggest and harmonic mean the smallest of the three means. In other words:

$$\mathrm{HM} \le \mathrm{GM} \le \mathrm{AM}$$

These three means are identical only when all values are identical and thus, in real-world applications, it is very important to choose correctly.
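The ordering of the three means, and the equality case, can be checked with Python's standard `statistics` module. This is only a sketch: `pythagorean_means` is a hypothetical helper and the input values are arbitrary positive numbers.

```python
import statistics


def pythagorean_means(values):
    """Return (arithmetic, geometric, harmonic) means of positive values."""
    values = list(values)
    return (
        statistics.fmean(values),
        statistics.geometric_mean(values),
        statistics.harmonic_mean(values),
    )


# Arbitrary positive values: the three means differ.
am, gm, hm = pythagorean_means([1.0, 4.0, 16.0])
print(am, gm, hm)

# Identical values: the three means coincide (up to rounding).
same = pythagorean_means([5.0, 5.0, 5.0])
print(same)
```

The more spread out the values, the further apart the three means drift, which is why the choice matters most for skewed data.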

*These approaches do date from the time of Pythagoras, and geometric mean and arithmetic mean can be linked to the Pythagorean theorem that we all know and love; there are many sites showing how. Legend has it that Pythagoras observed that harmonious ringing of blacksmiths’ hammers is heard when the masses of the hammers are in certain proportions, and these proportions, it turned out, can be given in whole numbers, known as Pythagorean tuning. This tuning relationship is defined through harmonic mean frequencies, and this observation later resulted in our current musical scale and musical chords.
