Welch’s t-test for comparing two groups: An Excel calculator (1.0)

In this post, I wish to promote my latest Excel calculator, Welch’s t-test for comparing two groups (version 1.0). It was released 22nd October 2018.

Preliminary note

In Version 1.0, there are a couple of minor bugs that all relate to the calculator’s ancillary output. I am currently correcting these for a Version 1.2 release. This pending version will also have the additional functions:

  • Users will be able to perform analyses based on either the raw scores for two groups, or alternatively, just the aggregate measures (M, SD, and n)
  • Users who opt for raw score input will be able to perform additional normality checks
  • Users will be able to measure effect sizes as they relate to Student’s t-test, as well as to Welch’s t-test

When Version 1.2 becomes available, I shall update this page and provide a link to a Version 1.2 notes page.

Who is it for?

I designed this with one of my students in mind, who needed to correct his analyses for the fact that sample size was unbalanced between two comparative groups. However, this calculator is for anyone who wants to compare aggregate measures between any two groups.

What do you need?


Some prior knowledge of what a t-test is, and how to interpret it, is recommended. You also need to know that when comparing differences between two groups using a t-test, there is an optimal variant of the test called Welch’s procedure. Welch’s t-test corrects for measurement bias caused by the two groups’ having different sample sizes and sample variances, whereas your classic Student’s t-test makes no such attempt to correct this bias.


All you need is a set of average and standard deviation scores for two groups. You also need a number describing how many independent scores were made by each group:


In the above demonstration, imagine you are comparing the average height (in cms) of students in two tutorial classes. Group 1 has 13 students, whose average height is 164.2 cm, with a standard deviation of 14.1. Group 2 is larger (n = 25), and consists of taller students on average (M = 173.9), albeit with heights that are less variable (SD = 11.3).


By default, the calculator assumes a number of things that you are free to challenge with your own specs:

  1. Mean difference constant is zero. You wish to compare the group mean difference to a baseline of zero
  2. Criterion of significance is five percent. You believe outcomes that have less than a 5 in 100 chance of being caused at random are probably not false positives
  3. Two tails are better than one. You want to run a two-tailed comparison of the groups’ scores
  4. Control event rate is 20 percent. You guess that around 20 in 100 of the individual scores from the control group are more like scores from the experimental group. This parameter might matter if your experiment involved the administration of a certain treatment, and you want to know what effect that treatment is having relative to a control group. If that or something similar is the case, then my calculator will produce you an effect size known as number needed to treat. Otherwise, you can probably ignore this assumption.

Here is a screen capture of what these parameters look like in my calculator, bearing in mind that you are free to swap in your own custom values:


Note that the calculator does not assume that your two groups are the same size, nor does it care about their variances being the same size. Such is the beauty of the Welch procedure – it corrects for these discrepancies ‘just in case‘, whether or not they are large enough to be meaningful.

Look, if you are still paranoid about your groups’ variances being unequal, my calculator has an inequality of variances test, built-in and free of charge. Go nuts!

What do you get?

With your data entered and the parameters set to your liking, my Excel calculator will calculate automatically a range of useful outputs.

In a nutshell, you get your test outcomes, t(df) and p, both with and without the Welch correction. Extending on the demonstration above, here’s what the Welch t-test output looks like:


And here’s what the output looks like without Welch’s magic touch:


Do you see the difference?

With Welch’s correction, the average height of the second tute’s students is statistically higher at the 0.05 significance level, t‘(20.23) = -2.15, p‘ = 0.044, 95% CIdiff [-19.12, -0.28]. Note that without the Welch correction, the t value is biased, which in turn produces an apparently ‘less likely null’ effect (p = .027).

What else? Plus, a warning…

As a bonus, you also get mean difference confidence intervals, plus a range of effect sizes – Cohen’s d and his somewhat misfit cousins, including Cohen’s U3, which you can read about here. In Version 1.0 of my calculator, bonus effect sizes corresponding to the Welch-corrected test are produced only; in a later version, I may chuck them in for the uncorrected test too.


Continuing the running demonstration of the two tutorial groups with different average heights, we see in my calculator’s grey box that variances are not statistically different, F (12, 24) = 1.56, p = 0.172.

The observant among you will notice a bug that needs fixing; Cohen’s d value is correct, but please ignore the negative sign! So we have an apparently large effect (d = 0.78), with only about one fifth of the heights sampled overlapping between tutorial groups (Cohen’s U3 = 21.52%). The Superiority statistic shows the likelihood that a student picked at random from Group 1 (being shorter students on average) will be taller than someone randomly selected from Group 2 (the taller group).

At present, there are a few known bugs in the ancillary output, which I will fix in time. For now, just know that these will invert (and so will their interpretation) if Group 1 has a larger mean than Group 2. For example, if Group 1 was the taller group, then Cohen’s U3 would be 100% – 21.52% = 78.48%. There’s also a fault in the n Needed to Treat formula that I used, so please ignore that particular output in Version 1.0 of my calculator.

How do you use it?

Simple usage instructions are provided in the Excel worksheet. If those are too confusing, feel free to contact me.

If you are preparing a manuscript for publication, please cite this calculator in the body of your text as Gaetano (2018). Please also add the following reference to your reference list or bibliography:


Gaetano, J. (2018). Welch’s t-test for comparing two independent groups: An Excel calculator (1.0). [Microsoft Excel workbook]. doi:10.13140/RG.2.2.24158.28484

Please click here to access the calculator hosting page, then click the Download button.



Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s