What is a confidence interval?

Written by Dr. Eva Arnold Posted in Method validation

In every drug manufacturing process, controls are mandatory. Thinking about the patient, the goal should be to implement processes that are 100% error free, but that is practically impossible. For example, it is not feasible to unpack, check, and analyse every tablet from a batch to check that no mistake has been overlooked.

For this reason, a sample is collected randomly from the batch and examined for any unacceptable parameters. Then, based on the results, an estimate is made that relates to the entire batch (statistically called “population”). This is where the factor “confidence interval” comes into play.

Values determined during in-process controls (IPCs) are compared with previously defined target values. If production adheres to these targets closely, the measured values will be approximately normally distributed and scatter tightly around the target value. Take, for example, a tablet that is expected to contain 400 mg of active ingredient. The tablets produced would seldom match this content exactly but would rather contain something between 390 mg and 410 mg. As the producer strives to meet the 400 mg mark, the chances are high that the content of most tablets would lie between 395 mg and 405 mg. If it were feasible to plot the active ingredient content of every tablet of the batch, a classical normal distribution curve would be obtained. From this, the "real" mean value, the so-called expectation value µ, could be determined. As a quality feature, in this case, the specification requires the “real” mean value µ to lie between 395 mg and 405 mg.

As mentioned earlier, it is not feasible to examine every single tablet, and hence the expectation value µ will remain unknown to us. Since we nevertheless want to know whether the mean of the current batch lies within the target range, a sample of e.g. 100 tablets is drawn and analysed. It is highly unlikely that the mean value x of the sample will exactly match the “true” mean µ. In addition, the sample mean of this batch will differ slightly from the sample mean of the next batch. So, it is fair to say that the “real” mean value µ can't be determined without destroying the entire batch, and that indeed would be a genuine problem.

To address this issue, a “range” rather than a single value is chosen within which the “true” value µ is presumed to lie. With “range” we take into account that the measured values differ between samples / batches, and with “presumed” we express a certain probability. This range is called the “confidence interval”. The larger this range is, the higher the likelihood that it contains the “true” value µ. It is therefore tempting to choose the widest possible confidence interval so that the “real” mean value µ is always included. But as the confidence interval widens, the precision of the estimate decreases. Thus, we would like to have a small confidence interval. This raises another important point: a randomly drawn sample may, by chance, consist largely of tablets from the outer edge of the normal distribution of the active ingredient content. The chance that the confidence interval calculated from such an unrepresentative sample contains the “true” mean µ is slim. Although the probability of drawing such a sample is low, it cannot be neglected. To account for this, a so-called “confidence level” is defined. It states how often the procedure yields a confidence interval that actually contains the “true” mean value µ. A confidence level of 95% essentially means that, if sampling were repeated many times, about 95% of the confidence intervals calculated from these samples would contain the "true" mean µ.
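The meaning of the confidence level can be made tangible with a small simulation. The following Python sketch is only an illustration under stated assumptions: it invents a batch with a "true" mean µ of 400 mg and a standard deviation of 12.4 mg, repeatedly draws samples of 100 tablets, and builds each interval as sample mean ± z * sample standard deviation / √(n), the formula introduced further below. The function name and all simulated values are assumptions made for this example.

```python
import random
from statistics import NormalDist, mean, stdev


def coverage(mu=400.0, sigma=12.4, n=100, level=0.95, trials=2000, seed=1):
    """Share of simulated samples whose confidence interval contains mu.

    mu and sigma describe an invented batch; they are assumptions,
    not measured values.
    """
    rng = random.Random(seed)
    # Two-sided critical z-value, about 1.96 for a 95% confidence level
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        half_width = z * stdev(sample) / n ** 0.5
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials


print(f"Share of intervals containing mu: {coverage():.3f}")
```

With a 95% confidence level, the printed share should come out close to 0.95: roughly one interval in twenty misses µ purely by chance.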

But how is the confidence interval determined? Its width depends on the standard deviation (σ) of the sample, the sample size (n), and the selected confidence level. The choice of the confidence level can be made individually. In the pharmaceutical industry, a confidence level of 95% is usually applied. The selected confidence level is then used to determine the critical z-value using a normal distribution table. For a (two-sided) confidence level of 95%, this results in a z-value of 1.96. Using this z-value, the standard deviation of the sample (σ) and the sample size (n), the confidence interval (CI) for the true value µ can be determined according to the formula:

CI for µ = x ± z * σ / √(n)

Let's say a total of 100 tablets were analyzed and a mean x of 398.8 mg and a standard deviation σ of 12.4 mg were obtained. The confidence limits for our confidence interval may be calculated as follows:

CI for µ = x ± 1.96 * σ / √(n) = 398.8 mg ± 1.96 * 12.4 mg / √(100) = 398.8 mg ± 2.43 mg

That is, with 95% confidence the "true" mean µ lies within the confidence interval [396.4; 401.2] and would thus be within our specification (between 395 and 405 mg).
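For readers who prefer a script over a z-table, the calculation above could be sketched in Python as follows. The function name confidence_interval is a hypothetical helper chosen for this example; the z-value is taken from the standard library's NormalDist instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(x_bar, sd, n, level=0.95):
    """Return (lower, upper) limits of the z-based confidence interval."""
    # Two-sided critical z-value, about 1.96 for a 95% confidence level
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Values from the worked example: 100 tablets, mean 398.8 mg, SD 12.4 mg
lower, upper = confidence_interval(398.8, 12.4, 100)
print(f"[{lower:.1f}; {upper:.1f}]")

# Specification for the mean content: 395 mg to 405 mg
within_spec = 395.0 <= lower and upper <= 405.0
print("Within specification:", within_spec)
```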

We can use the CONFIDENCE.NORM function in Excel to avoid having to perform old-fashioned calculations using a normal distribution table. The function returns the margin that is added to and subtracted from the mean x. In addition to the values for the standard deviation σ of our sample and the sample size n, “α” must be entered. α is nothing more than the probability of error, i.e. the value that remains after the confidence level has been subtracted from 1. Accordingly, for a 95% confidence level, α is equal to 0.05 (1 - 0.95 = 0.05). For small samples, the t-distribution rather than the normal distribution applies, which is why the Excel function CONFIDENCE.T should be selected for small samples.
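To show what these two Excel functions compute, here is a minimal Python sketch. The names confidence_norm and confidence_t are hypothetical helpers that merely mirror the argument order (alpha, standard deviation, size) of the Excel functions. Because the Python standard library has no t-distribution, the t-value is approximated numerically here (Simpson's rule plus bisection); this is an assumption of the sketch, and a statistics package would normally supply that value directly.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """CDF for t >= 0 via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for p > 0.5, found by bisection (meant for small df only)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_norm(alpha, standard_dev, size):
    """Margin z * sd / sqrt(n), based on the normal distribution."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_dev / sqrt(size)


def confidence_t(alpha, standard_dev, size):
    """Margin t * sd / sqrt(n), with n - 1 degrees of freedom."""
    t = t_quantile(1 - alpha / 2, size - 1)
    return t * standard_dev / sqrt(size)


for n in (100, 10):
    print(n, round(confidence_norm(0.05, 12.4, n), 2),
          round(confidence_t(0.05, 12.4, n), 2))
```

For n = 100 the two margins differ only slightly, whereas for n = 10 the t-based margin is noticeably larger than the z-based one, which is why the choice of function matters for small samples.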

The higher the standard deviation σ in the sample (thus the statistical spread), the broader the confidence interval. This also explains why a broad confidence interval is a sign of a poor estimate (i.e. poorer precision). At the same time, the confidence interval can be tightened by increasing the sample size n. The standard deviation σ itself does not shrink systematically when more tablets are analyzed; what shrinks is the uncertainty of the sample mean, σ / √(n), because the more tablets are analyzed, the closer the sample mean gets to the "true" mean µ. For example, if only 10 tablets were examined, confidence limits of ± 7.7 mg (1.96 * 12.4 mg / √(10)) would be obtained for the same mean and standard deviation, resulting in a confidence interval of [391.1; 406.5] that is no longer compliant with the specification. Hence, the lot would not be allowed to be released. In addition, the choice of the confidence level also contributes to the width of the confidence interval: a confidence interval based on a 99% confidence level is wider than one based on a 95% confidence level, because more certainty is demanded with 99% than with 95%.
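How strongly sample size and confidence level influence the width can be tabulated with a few lines of Python. This is a sketch using the z-based formula from above and the standard deviation of 12.4 mg from the example; the helper name half_width and the chosen sample sizes are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 12.4  # standard deviation of the sample in mg (from the example)


def half_width(sigma, n, level=0.95):
    """Half-width z * sigma / sqrt(n) of the z-based confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sigma / sqrt(n)


print("n       90%     95%     99%")
for n in (10, 30, 100, 1000):
    row = "  ".join(
        f"{half_width(SIGMA, n, level):6.2f}" for level in (0.90, 0.95, 0.99)
    )
    print(f"{n:<6}{row}")
```

Quadrupling the sample size halves the half-width, since it scales with 1 / √(n). Note that for a sample as small as n = 10 the t-distribution would give a somewhat wider interval than the z-based values printed here.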

To conclude, the larger the sample size n chosen, the smaller the confidence interval and the more precise our estimate. In addition to the confidence level, the choice of sample size is therefore crucial for the reliability of the control of the manufacturing process.

Finally, for method validation, knowing how to calculate confidence intervals is necessary, as they are recommended by the ICH Q2(R1) guideline for the parameter trueness as well as for all types of precision. Whether this is useful or not, and what other possible applications for confidence intervals exist with respect to method validation, is detailed in this blog article.

But speaking of possible applications... Confidence intervals can be created not only around a single value, but also around a trend. This makes sense, for example, in stability studies to determine the shelf life of tablets before they are marketed. To do this, we use stability-indicating methods at specified times to track, for example, the content of the tablets of a certain number of batches (e.g. 3) and compare it to the specification. If we plot the content on the y-axis against the time in months on the x-axis, we will see that the content is likely to decrease a little over time. Even though we used the mean values of the 3 batches we examined to form the data points, our observed progression is still only a small sample, and if we examined a larger sample (i.e., many more batches), we would conceivably also see more steeply declining progressions. Therefore, in order to cover the possible progressions with a selected probability (see the confidence level above), we span a confidence interval around the decreasing regression line of our data points and thus obtain a range within which the true stability progression is expected to lie. The time at which the lower limit of the confidence interval intersects with the lower specification limit for the content is the maximum shelf life.
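The shelf-life estimate described above can be sketched in Python. Everything numeric in this example is invented for illustration: the time points, the content values (in % of label claim), and the lower specification limit of 95.0% are hypothetical. The sketch also assumes a one-sided 95% confidence limit for the mean regression line and takes the matching t-value for 4 degrees of freedom (2.132) from a t-table; other choices are possible.

```python
from math import sqrt

# Hypothetical illustration data, not taken from any real stability study
months = [0, 3, 6, 9, 12, 18]
content = [100.2, 99.6, 99.1, 98.7, 98.0, 97.1]  # % of label claim
LOWER_SPEC = 95.0  # assumed lower specification limit in %
T_CRIT = 2.132     # one-sided 95% t-value for n - 2 = 4 degrees of freedom

n = len(months)
x_mean = sum(months) / n
y_mean = sum(content) / n
sxx = sum((x - x_mean) ** 2 for x in months)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(months, content))

# Ordinary least-squares regression line: content = intercept + slope * time
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Residual standard deviation around the regression line
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(months, content))
residual_sd = sqrt(sse / (n - 2))


def lower_bound(t):
    """Lower confidence limit of the mean regression line at time t."""
    fitted = intercept + slope * t
    se = residual_sd * sqrt(1 / n + (t - x_mean) ** 2 / sxx)
    return fitted - T_CRIT * se


# Last whole month at which the lower limit still meets the specification
shelf_life = 0
while shelf_life < 120 and lower_bound(shelf_life + 1) >= LOWER_SPEC:
    shelf_life += 1

print(f"Slope: {slope:.3f} % per month")
print(f"Estimated shelf life: {shelf_life} months")
```

The confidence limit widens the further a time point lies from the centre of the measured data, so the lower limit crosses the specification earlier than the regression line itself.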

Subsequently, the confidence interval placed around the stability data submitted for market authorization could also be used as intervention limits for out-of-trend (OOT) results. This could be done in the context of trending for results obtained in the course of ongoing stability studies. Accordingly, all values that do not lie within the confidence interval would be considered OOT results and corresponding actions would have to be taken.