McCray, Gareth and Titman, Andrew and Ghaneh, Paula and Lancaster, Gillian
(2017)
*Internal pilot sample size re-estimation in paired comparative diagnostic accuracy trials with a binary response.*
Trials, 18 (Suppl.): 200.
ISSN 1745-6215

## Abstract

Background: The sample size required to power a paired comparative diagnostic accuracy trial to a nominal level, i.e. a trial in which the diagnostic accuracy of two testing procedures is compared relative to a gold standard, depends on the correlation between the two diagnostic tests being compared. The lower the correlation between the tests, the higher the sample size required; the higher the correlation, the lower the sample size required. A priori, we usually do not know the correlation between the two tests and thus cannot determine the exact sample size. Furthermore, the correlation between two tests is a quantity for which 1) it is difficult to make an accurate intuitive estimate and 2) estimates are unlikely to exist in the literature, particularly if one of the tests is new, as is very likely to be the case. One option, suggested in the literature, is to use the sample size implied by the maximal negative correlation between the two tests, thus giving the largest possible sample size. However, this overly conservative technique is highly likely to be wasteful of resources and unnecessarily burdensome on trial participants, as the trial is likely to be overpowered and to recruit many more participants than needed. A more accurate estimate of the sample size can be determined at a planned interim analysis point at which the sample size is re-estimated, thereby incorporating into the trial design an internal pilot study intended to produce an accurate estimate of the correlation between the tests.

Methods: This paper discusses a sample size estimation and re-estimation method based on the maximum likelihood estimates, under an implied multinomial model, of the observed values of the correlation between the two tests and, if required, the prevalence, at a planned interim. The method is illustrated by comparing the accuracy of two procedures for the detection of pancreatic cancer, one using the standard battery of tests and the other using the standard battery with the addition of a PET/CT scan, both relative to the gold standard of a cell biopsy. Simulations of the proposed method are also conducted to determine its robustness under various conditions.

Results: The results show that the type I error rate of the overall experiment is stable using our suggested method and that the type II error rate is close to or above nominal. Furthermore, the instances in which the type II error rate is above nominal are the situations in which the lowest sample size is required, meaning a lower impact on the actual number of participants recruited.

Conclusion: We recommend a paired comparative diagnostic accuracy trial design that uses an internal pilot study to re-estimate the sample size at the interim. This design would use a maximum likelihood estimate, under a multinomial model, of the correlation between the two tests being compared for diagnostic accuracy, in order to estimate more effectively the number of participants required to power the trial to at least the nominal level.
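
The dependence of the required sample size on the between-test correlation, and the re-estimation of that correlation at an interim, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a comparison of two sensitivities among diseased participants and swaps in Connor's McNemar-based approximation for the sample size. The function names (`joint_positive_prob`, `paired_sample_size`, `estimate_rho`) and the default `alpha` and `power` values are hypothetical choices made for illustration only.

```python
from math import ceil, sqrt
from statistics import NormalDist


def joint_positive_prob(p1, p2, rho):
    """P(both tests positive) given marginal positive rates p1, p2 and
    correlation rho, clipped to the range a valid 2x2 table allows."""
    p11 = p1 * p2 + rho * sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    lower = max(0.0, p1 + p2 - 1.0)   # most negative dependence possible
    upper = min(p1, p2)               # most positive dependence possible
    return min(max(p11, lower), upper)


def paired_sample_size(p1, p2, rho, alpha=0.05, power=0.9):
    """Approximate number of diseased participants needed to detect
    p1 != p2 with a two-sided McNemar test (Connor's approximation,
    used here as an assumed stand-in for the paper's formula)."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    p11 = joint_positive_prob(p1, p2, rho)
    psi = p1 + p2 - 2 * p11           # probability of a discordant pair
    delta = p1 - p2                   # difference to be detected
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    n = (z_a * sqrt(psi) + z_b * sqrt(psi - delta ** 2)) ** 2 / delta ** 2
    return ceil(n)


def estimate_rho(n11, n10, n01, n00):
    """Plug-in maximum likelihood estimate of the correlation from the
    interim 2x2 counts, treating the four cells as multinomial."""
    n = n11 + n10 + n01 + n00
    p11 = n11 / n
    p1 = (n11 + n10) / n              # test 1 positive rate
    p2 = (n11 + n01) / n              # test 2 positive rate
    return (p11 - p1 * p2) / sqrt(p1 * (1 - p1) * p2 * (1 - p2))


if __name__ == "__main__":
    # Conservative plan: assume the most negative correlation.
    print(paired_sample_size(0.8, 0.7, rho=-1.0))
    # Re-estimate at the interim from made-up pilot counts.
    rho_hat = estimate_rho(65, 15, 5, 15)
    print(rho_hat, paired_sample_size(0.8, 0.7, rho=rho_hat))
```

Under these assumptions, a higher correlation lowers the discordant-pair probability and therefore the required sample size. The figure returned counts diseased participants only; the total to recruit would be this number divided by the (possibly re-estimated) prevalence.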