# What is a confidence interval?

Written by Dr. Eva Arnold. Posted in Method validation.

In every drug manufacturing process, controls are mandatory. With the patient in mind, the goal should be processes that are 100% error-free, but that is practically impossible. For example, it is not feasible to unpack, check, and analyse every tablet of a batch to make sure that no mistake has been overlooked.

For this reason, a sample is drawn at random from the batch and examined for out-of-specification parameters. Based on the results, an estimate is then made for the entire batch. This is where the “confidence interval” comes into play.

Values determined during the in-process controls are compared with previously defined target values. If production adheres to these values very precisely, the actual values will be normally distributed closely around the target value. As a simple example, consider a tablet that is expected to contain 400 mg of active ingredient. The tablets produced will seldom match this content exactly but will rather contain something between 390 mg and 410 mg. As the producer strives to meet the 400 mg mark, chances are high that the content of most tablets will lie between 395 mg and 405 mg. If it were feasible to measure every tablet of the batch and plot its active ingredient content, a classical normal distribution curve would be obtained. From this curve, the "real" mean value, the so-called expectation value µ, could be determined. In this example, the specification requires the “real” mean to lie between 395 mg and 405 mg.

As mentioned earlier, it is not viable to examine every single tablet, so the expectation value remains unknown to us. Since we still want to know whether the current batch of tablets fits within the range around the target value, a sample of, e.g., 100 tablets is drawn and analysed. There is a high probability that the mean of the sample differs from the “true” mean. In other words, it is very unlikely that the mean value of the sample exactly matches the expectation value. So, it is fair to say that the “real” mean value cannot be determined without destroying the entire batch, and that would indeed be a genuine problem.

To address this issue, a “range” rather than a single value is chosen, within which the “true” value is presumed to lie. This range is called the “confidence interval”. The larger this range, the higher the likelihood that it contains the “true” value. It is therefore tempting to choose the widest possible confidence interval so that the “real” mean value is always included. But as the confidence interval widens, the precision of the estimate decreases. Thus, we would like to have a narrow confidence interval. This raises another important point: a randomly drawn sample may, by chance, consist largely of tablets from the outer edge of the normal distribution of the active substance content. An interval calculated from such an unrepresentative sample is unlikely to contain the “true” mean. Although the probability of drawing such a sample is low, it cannot be neglected. To account for this probability, a so-called “confidence level” is defined. A confidence level of 95% essentially means that, if the sampling were repeated many times, about 95% of the confidence intervals calculated from those samples would contain the "true" mean. The remaining 5% of the intervals, those calculated from unrepresentative samples, would miss it.
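The meaning of the confidence level can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the "true" mean of 400 mg, the standard deviation of 12 mg, and the number of repetitions are invented for illustration, and the interval is computed with the usual normal approximation (mean ± z * s / √n).

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(42)          # fixed seed so the run is repeatable

TRUE_MEAN = 400.0        # assumed "true" batch mean in mg (hypothetical)
TRUE_SD = 12.0           # assumed spread of single tablets in mg (hypothetical)
SAMPLE_SIZE = 100        # tablets per sample
RUNS = 2000              # number of repeated samplings

# Two-sided critical value for a 95% confidence level (about 1.96)
z = NormalDist().inv_cdf(0.975)

def interval_covers_true_mean() -> bool:
    """Draw one sample, build its 95% interval, report whether it contains TRUE_MEAN."""
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    centre = mean(sample)
    margin = z * stdev(sample) / sqrt(SAMPLE_SIZE)
    return centre - margin <= TRUE_MEAN <= centre + margin

coverage = sum(interval_covers_true_mean() for _ in range(RUNS)) / RUNS
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should come out close to 0.95: most intervals contain the "true" mean, and a small fraction, those from unlucky samples, do not.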

But how is the confidence interval defined? Its width depends on the standard deviation (σ) of the sample, the sample size (n), and the selected confidence level. The confidence level can be chosen individually. In the pharmaceutical industry, a confidence level of 95% is usually applied. The selected confidence level is then used to determine the critical value zα/2, where α = 1 − 0.95 = 0.05. Half of the confidence level, 0.95 / 2 = 0.475, is the area under the curve between the mean and the critical value, and the corresponding z-value is looked up in a z-table (standard normal distribution table). For a confidence level of 95%, this results in a z-value of 1.96. Using this z-value, the standard deviation (σ) of the sample and the sample size (n), the confidence interval (CI) for the true value can be determined according to the formula:

CI for µ = x ± z * σ / √(n)
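Instead of a printed z-table, the critical value can also be computed. The sketch below uses `NormalDist` from Python's standard library; the helper name `critical_z` is only an illustrative choice. Note that `inv_cdf` expects the cumulative probability to the left of z, i.e. 0.5 + 0.475 = 0.975 for a 95% confidence level.

```python
from statistics import NormalDist

def critical_z(confidence_level: float) -> float:
    """Two-sided critical value of the standard normal distribution."""
    alpha = 1.0 - confidence_level
    # Cumulative probability left of the critical value: 1 - alpha/2
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level -> z = {critical_z(level):.3f}")
# 90% -> 1.645, 95% -> 1.960, 99% -> 2.576
```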

Let's say a total of 100 tablets were analyzed and a mean (x) of 398.8 mg and a standard deviation of 12.4 mg were obtained. The confidence interval may be calculated as follows:

CI for µ = x ± 1.96 * σ / √(n) = 398.8 mg ± 1.96 * 12.4 mg / √(100) = 398.8 mg ± 2.43 mg

That is, with 95% confidence the "true" mean µ lies within the interval [396.4; 401.2] and would thus be within our specification (between 395 and 405 mg).
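The same calculation can be written as a few lines of Python. This is a minimal sketch: the function name `mean_confidence_interval` and the variable names are illustrative, and the interval uses the normal approximation from the formula above.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(mean, sd, n, level=0.95):
    """Return (lower, upper) of the CI for the mean: mean ± z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

# Values from the tablet example: 100 tablets, mean 398.8 mg, SD 12.4 mg
lower, upper = mean_confidence_interval(398.8, 12.4, 100)

# Specification limits for the mean, in mg
spec_low, spec_high = 395.0, 405.0
within_spec = spec_low <= lower and upper <= spec_high

print(f"CI: [{lower:.1f}; {upper:.1f}] mg, within specification: {within_spec}")
# CI: [396.4; 401.2] mg, within specification: True
```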

The higher the standard deviation in the sample (that is, the statistical spread), the wider the confidence interval. This also explains why a broad confidence interval is a sign of a poor estimate. At the same time, the confidence interval can be narrowed by increasing the sample size. The standard deviation of the individual tablets does not shrink with a larger sample, but the uncertainty of the sample mean, σ / √(n), does: the more tablets are analyzed, the closer the sample mean gets to the "true" mean. For example, if only 10 tablets were examined, the same mean and standard deviation would give a margin of ± 7.7 mg (1.96 * 12.4 mg / √(10)), resulting in a confidence interval of [391.1; 406.5], which no longer lies within the specification. Hence, the lot could not be released.
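How strongly the sample size drives the width of the interval can be seen by repeating the calculation for several values of n. The sketch below keeps the mean (398.8 mg) and standard deviation (12.4 mg) of the example fixed; the list of sample sizes and the helper name `margin_of_error` are arbitrary choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

MEAN_MG = 398.8                     # sample mean from the example
SD_MG = 12.4                        # sample standard deviation from the example
SPEC_LOW, SPEC_HIGH = 395.0, 405.0  # specification limits for the mean, in mg
Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96

def margin_of_error(n: int) -> float:
    """Half-width of the 95% CI for the mean: z * sd / sqrt(n)."""
    return Z_95 * SD_MG / sqrt(n)

for n in (10, 30, 100, 1000):
    m = margin_of_error(n)
    lower, upper = MEAN_MG - m, MEAN_MG + m
    ok = SPEC_LOW <= lower and upper <= SPEC_HIGH
    print(f"n = {n:4d}: ± {m:.2f} mg -> [{lower:.1f}; {upper:.1f}], within spec: {ok}")
```

Because the margin scales with 1 / √(n), a tenfold larger sample narrows the interval only by a factor of about 3.2, not by a factor of 10.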

To conclude, the larger the sample size (n), the narrower the confidence interval and the more precise our estimate. The size of each individual sample is therefore crucial for the trustworthiness of the control of the manufacturing process, whereas the total number of tablets tested is, on its own, less decisive. Take the tablet example: if 10 random samples of 10 tablets each are taken per batch and evaluated separately, each estimate is less precise than the one from 1 sample of 100 tablets, even though the same number of tablets, 100, is used.

Finally, for method validation, knowing how to calculate confidence intervals is necessary, as the ICH Q2(R1) guideline recommends them for the parameter trueness as well as for all types of precision. In another blog article, we address if it makes sense...