Seeing triple (a short guide to experiment reproducibility)


With the reproducibility crisis in science showing no signs of abating, it’s never been more important to clearly communicate how rigorously your data were obtained. Here’s TIR’s short guide to technical replicates, biological replicates, independent experiments, and what they do and don’t tell you. 

When you’re doing experiments and, later on, preparing to publish them, it’s essential that you are clear – both to yourself and your readers – on how reproducible those conclusions are. However, there are many sources of variation in data, and demonstrating reproducibility requires time and care.

1. Technical replicates
Technical replicates are when you set up multiple identical experiments and run them in parallel, or take multiple readings from a single experiment. Good examples are enzyme assays or growth curves. At a chosen timepoint, you take measurements from each of those replicates, and calculate the mean value. The more technical replicates you have, the more accurate your estimation of the mean value will become. In other words, technical replicates control for the experimental variability for a given biological sample on a given day. However, no matter how many technical replicates you have – whether it’s 5 or 500 – your overall sample size is still 1 (n=1). All those measurements are averaged into a single mean value, and that’s your result from that one experiment. The number of technical replicates you use will usually be determined by how well-behaved the setup is and by simple convenience; calculating cumulative mean* values across technical replicates and seeing how quickly they converge on a mean is a good way of getting a sense of how much variation there is in your data.
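As a minimal sketch of the "still n=1" point, here's what collapsing technical replicates looks like in Python. The readings and the name `technical_readings` are invented purely for illustration; they don't come from any real assay.

```python
from statistics import mean

# Hypothetical readings: five technical replicates from ONE sample on ONE day.
technical_readings = [0.50, 0.52, 0.48, 0.51, 0.49]

# The technical replicates are averaged into a single value for the experiment.
experiment_result = mean(technical_readings)

# However many readings went into that mean, the experiment contributes one data point.
n = 1

print(f"mean of {len(technical_readings)} technical replicates = {experiment_result:.2f} (n={n})")
```

Adding more readings to the list changes how well the mean is estimated, but not the value of `n`.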

2. Biological replicates
Of course, what technical replicates can’t control for is how representative your sample is. There will always be biological variation between different samples, whether they’re mice, cell lines, or preparations of purified protein, and it’s likely that this will be the greatest source of variation in your data. You can do as many technical replicates as you like, but if your sample is abnormal then your mean value is not going to be representative of the population as a whole. Biological replicates address this. By taking measurements from different biological samples (different mice, different cell lines, different protein preparations), you get a sense of how reproducible your data are against the background of intrinsic biological variation.

3. Independent experiments
However, what biological replicates don’t necessarily control for is human error. You can set up multiple technical replicates for multiple biological samples, but if you forget to add the enzyme/drug, or mix up your tubes, or do any of the hundreds upon hundreds of things that can spoil an experiment without your realising, then your data won’t be right. No amount of technical or biological replicates can protect you from human error on the day. Human error is an unavoidable feature of lab work, and the best way to control for it is to do independent experiments. Only by carrying out multiple independent experiments on different days can you be confident that an effect, or lack of it, is likely to be genuine.

Blurring the lines
Ideally, you should aim to quantify multiple technical replicates per experiment (exact number estimated through calculation of cumulative means), from multiple biological replicates, and on different days (independent experiments) in order to be as confident as possible in your data. However, there’s a trade-off between rigour and rate of progress. Obviously you want your data to be as ironclad as possible, but at the same time it’s worth asking how much time to invest in obtaining what might be a trivial conclusion.
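One possible way of keeping the three levels separate when you record and summarise data is sketched below in Python. Everything here is an assumption made for illustration: the nested dictionary layout, the names `experiments` and `summarise`, and the numbers themselves. It is not a prescribed analysis, just a way of making explicit which values get averaged at which level.

```python
from statistics import mean, stdev

# Hypothetical data: experiments[day][sample] -> list of technical readings.
experiments = {
    "day_1": {"mouse_A": [1.0, 1.2, 1.1], "mouse_B": [1.4, 1.5, 1.6]},
    "day_2": {"mouse_C": [0.9, 1.0, 1.1], "mouse_D": [1.3, 1.3, 1.3]},
    "day_3": {"mouse_E": [1.2, 1.2, 1.5], "mouse_F": [1.0, 1.1, 1.2]},
}

def summarise(experiments):
    """Collapse technical replicates, then biological replicates, per experiment."""
    per_experiment = {}
    for day, samples in experiments.items():
        # Step 1: each biological sample is represented by the mean of its technical replicates.
        sample_means = [mean(readings) for readings in samples.values()]
        # Step 2: each independent experiment is represented by the mean of its sample means.
        per_experiment[day] = mean(sample_means)
    return per_experiment

per_experiment = summarise(experiments)

# Counts worth reporting in the Materials & Methods and figure legends.
n_independent = len(per_experiment)
n_biological = sum(len(samples) for samples in experiments.values())

overall_mean = mean(list(per_experiment.values()))
overall_sd = stdev(list(per_experiment.values()))

print(f"{n_independent} independent experiments, {n_biological} biological replicates")
print(f"mean = {overall_mean:.3f}, SD = {overall_sd:.3f}")
```

Whether you treat the days or the individual samples as your n depends on the definitions you choose, which is exactly why they need stating explicitly.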

In some circumstances, blurring the lines between biological replicates and independent experiments is an acceptable shortcut – i.e. assaying your biological replicates on separate days in order to kill two birds with one stone. Whether you take this shortcut or not (and remember, it is a shortcut, and a questionable one), it’s still good practice to define what you mean by “technical replicate”, “biological replicate” and “independent experiment” in your Materials & Methods section so that your readers are absolutely clear on what you did. If the data obtained in this way are highly variable though, there’s no way of telling if it’s human error or biological variability or both – so be prepared to backtrack and do things rigorously if that’s the case.

In cell biology
In whole organism (e.g. mouse) work, the difference between technical replicates and biological replicates is pretty obvious – technical replicates are readings derived from a single mouse, biological replicates are readings derived from different mice. In cell biology it can be a bit harder to define things. For instance, if you’re doing an immunofluorescence experiment, what constitutes a technical replicate? Preparing identical coverslips in parallel, or are the hundreds/thousands of cells on a single coverslip all technical replicates, given that they’ve been exposed to the same labelling conditions but may respond slightly differently? If you’re dealing with a transgenic cell line, then biological replicates are different clones – but those clones could have been obtained from only one or several separate transfections (that trade-off between rigour and speed again).

As per the preceding section, the best thing is to state explicitly in your Materials & Methods section (in the statistics part) what definitions you are using. And don’t forget to provide detail on biological replicates and independent experiments in your figure legends!

Counting the ways
How many independent experiments should you do? As many as possible really, but three is an achievable minimum for cell biology work. The number will generally be determined along empirical lines, with certain assays/techniques being more amenable to higher sample sizes than others.

Statistical significance (an update for 2019!)
Statistical significance tests are outside TIR’s area of expertise, but luckily there is a truly excellent technical perspective published by MBoC in May 2019 that is explicitly targeted at cell biologists – you can find it (free download) HERE. Fig. 2 of that perspective is a flow chart designed to help you find the right statistical test for the experiment you’ve performed.

Two key observations/recommendations from it: (1) proportions and percentages are categorical responses and therefore not numerical data, despite appearances. As such, not all significance tests – and in particular, not the t-test – are applicable; (2) standard deviation, and not standard error of the mean, should be used for error bars in charts.
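To make the distinction in the second recommendation concrete, here's a small Python sketch computing both quantities from the same data. The values are invented, and the snippet only illustrates how the two quantities are defined; it isn't taken from the MBoC perspective.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical results from five independent experiments (one value each).
values = [10.0, 12.0, 11.0, 13.0, 9.0]

average = mean(values)
sd = stdev(values)             # sample standard deviation: the spread of the data
sem = sd / sqrt(len(values))   # standard error of the mean: precision of the mean estimate

print(f"mean = {average:.2f}, SD = {sd:.2f}, SEM = {sem:.2f}")
```

Because the SEM is the SD divided by the square root of n, it is always the smaller of the two once n is greater than 1 and keeps shrinking as n grows, so SEM error bars can make the same data look tighter than they are.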

As always, corrections and clarifications from readers are very welcome…


* Cumulative mean: calculating the cumulative mean is very simple. You take your first measurement, A. Then you take your second measurement, B. The mean value for your experiment is now (A+B)/2. With your third measurement (C) the mean value becomes (A+B+C)/3. And so on. These iterative calculations of the mean are the cumulative mean.

By plotting the cumulative mean values (A, (A+B)/2, (A+B+C)/3 etc) versus total number of observations (1, 2, 3) on a graph, you will see how your mean value for the experiment gradually converges towards a particular value. This value is your mean from the experiment.
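The calculation described above can be sketched in a few lines of Python. The function name `cumulative_means` and the example readings are my own choices for illustration.

```python
from itertools import accumulate

def cumulative_means(measurements):
    """Return the running mean after each successive measurement."""
    running_totals = accumulate(measurements)  # A, A+B, A+B+C, ...
    return [total / count for count, total in enumerate(running_totals, start=1)]

# Hypothetical measurements A, B, C, D.
readings = [4.0, 6.0, 5.0, 5.0]

print(cumulative_means(readings))  # prints [4.0, 5.0, 5.0, 5.0]
```

Plotting the returned list against 1, 2, 3, … gives the convergence graph described above.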

A good rule of thumb is that your sample size for the experiment in question – i.e. the number of technical replicates – should be double the total number of observations required for the mean value to converge on a set point. Under circumstances of high experimental reproducibility, that means that your total number of technical replicates will be relatively low; however if there’s more variation in your setup then the mean will take longer to converge and you should be prepared to take a higher number of measurements. If no convergence is readily seen, this suggests that the level of variability is very high – possibly indicative of low-quality data.
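Here's a rough Python sketch of that rule of thumb. Note the big assumption: "converged" isn't given a numerical definition above, so the sketch treats the mean as converged once every later cumulative mean stays within a chosen fraction (`tolerance`, 5% by default) of the final cumulative mean. That criterion, the function names, and the example data are all hypothetical, and the function expects at least one measurement.

```python
from itertools import accumulate

def cumulative_means(measurements):
    """Running mean after each successive measurement."""
    return [total / count
            for count, total in enumerate(accumulate(measurements), start=1)]

def observations_to_converge(measurements, tolerance=0.05):
    """Smallest number of observations after which every cumulative mean stays
    within `tolerance` (as a fraction) of the final cumulative mean.
    Returns None if no convergence is seen before the last observation."""
    means = cumulative_means(measurements)
    final = means[-1]
    converged_at = len(means)
    # Walk backwards while the cumulative mean stays inside the tolerance band.
    for count in range(len(means), 0, -1):
        if abs(means[count - 1] - final) > tolerance * abs(final):
            break
        converged_at = count
    # If only the last point is inside the band, there is no evidence of convergence.
    return converged_at if converged_at < len(means) else None

def recommended_replicates(measurements, tolerance=0.05):
    """Rule of thumb: double the number of observations needed to converge."""
    needed = observations_to_converge(measurements, tolerance)
    return None if needed is None else 2 * needed

# Hypothetical pilot data from a well-behaved setup.
pilot = [8.0, 12.0, 10.0, 10.0, 10.0, 10.0]
print(observations_to_converge(pilot), recommended_replicates(pilot))
```

A result of `None` corresponds to the "no convergence is readily seen" case: take more measurements, or look hard at the quality of the data.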

15 thoughts on “Seeing triple (a short guide to experiment reproducibility)”

  1. Thanks for this article. We all should be doing more to limit irreproducibility. I would like to include a short comment about your guide in Biofisica-Magazine (news media by the Spanish Biophysics Society). Would you mind? May I reuse the figure from your post as the header of the comment in Biofisica-Magazine?

  2. This is super helpful…I was wondering, what happens in (bio)chemistry experiments? For instance, I work with enzymes produced by ourselves and we know that they are not very reproducible…every batch produced presents a high variability in activity because it’s very sensitive to many variables (purification, lyophilisation, storage, time etc…). So we usually produce a lot of the enzyme and use that single batch for all of our experiments….but sometimes it’s finished before we are done with the experiments, requiring more enzyme to be produced. In that case, when I’m measuring activity of the enzyme, can I say that my n = 3 if I do 3 “independent experiments” from the same enzyme batch (meaning that I do each experiment on a different day and, in that case, the enzyme is considered as another chemical reactant of the experiment – like pyruvate, etc)? Thank you.


    1. Thanks, delighted that you found it helpful! And your question is a very good one, and very relevant. I was having a similar discussion not long ago concerning a single-particle cryo-EM study where the question was whether the biological replicates were the batches (purifications) or individual molecules used for averaging.

      The point about doing replicates – whether technical or biological – is to minimise error and account for variability. In most cases, biological replicates will be the source of the largest variability. Given that you have already identified that the different purifications/batches of your enzyme appear to be the source of highest variability in your assays, that means that the separate purifications are the biological replicates.

      What you propose doing is totally correct. If you do 3 separate experiments on different days, then your measured activity derives from 3 independent experiments and your n=3. That measured activity relates only to a single purification however, and so the question is: how representative is that purification of the enzyme’s real activity? To get a better sense of that, you will probably be better off obtaining an average from multiple purifications, where each purification is a separate biological replicate.

      I realise of course that it may not be possible to take such an approach. If that is the case, I would recommend being as transparent as possible in your description of the data and in your Materials & Methods. Note that you did 3 independent experiments using only a single batch of enzyme, but that batch variability does occur and so the measured values should be treated with a degree of caution as they may not be truly representative. If possible though, I would definitely recommend calculating activity using multiple batches – in that way you have best accounted for the variability in your setup. Sound reasonable?


      1. Dear Brooke. Thanks for your prompt reply. Indeed I’d love to be able to do a triplicate of the desired enzyme, starting from the production of the enzyme itself (e.g. using 3 different culture plates) and also keeping the 3 batches separated during purification. However the limiting issue here is time (it takes about 2-3 weeks to get some 500-600 mg of the lyophilised enzyme when we are lucky) and money (lots of expensive resources go into enzyme production). So I think my best compromise will be to use the single batch and perform the triplicate independent experiments on different days whenever possible. Maybe if I need more enzyme then I’ll have to produce more and could then compare the values of both batches, which is a good thing for my statistics but also means loads of extra work.


  3. Sure, that’s what I meant with “it may not be possible…”. It’s not always possible to have multiple biological replicates – in cell biology, when using a wild-type strain, it’s generally not possible to compare results with other independent wild isolates, for instance. In your case, just be as clear as possible in your reporting in terms of batches, independent experiments etc and then it will be clear to readers that you are being transparent and have identified possible sources of variability in your data. Good luck! B.


    1. Dear Brooke

      I forgot to ask something: let’s say I have 3 batches of the same enzyme, produced in the same way but in different months. If I calculate their activity, would I still be able to consider each as an independent/biological replicate (n=3), or does the time factor not allow that?

      Looking forward to your opinion.



      1. Hi, sorry for the slow reply. I think that different batches can be considered separate biological replicates, even given the time difference. The question is whether you think the activity has been affected by the different storage times. As always, clarity about what you’ve done is the most important thing.

