Statistical Methods for Combining Diagnostic Tests and Performance Evaluation Metrics
Author: Chengning Zhang (Ph.D.)
Published: 2022
In biomedical studies, several diagnostic tests can often be performed on an individual, or multiple disease markers may be available simultaneously, and many of them may be associated with the clinical outcome. In practice, a single test or marker often has limited diagnostic performance, so it is important to combine the multiple sources of information available to achieve higher classification performance. This dissertation focuses on statistical methods for combining multiple diagnostic tests and the corresponding performance evaluation metrics.

In the first project, we survey the current state of the art in methods for combining multiple tests. We categorize existing methods into three general groups and conduct extensive simulation studies to compare the performance of different combination methods. The reviewed methods serve as a benchmark for the new combination approaches developed in the subsequent projects.

In the second project, we consider the problem of combining multiple tests whose values are missing at random (MAR). In addition, we aim to exploit the known monotonicity relationship between the input variables and the disease outcome to gain diagnostic accuracy. We develop a novel likelihood-based approach to monotone classification that accounts for missing inputs in a natural and principled way. The risk score function is obtained through nonparametric maximum likelihood estimation (NPMLE). A novel expectation-maximization (EM)-type algorithm is devised to compute the NPMLE by treating the monotonicity-constrained risk score function as the cumulative distribution function of a latent random vector. Through simulation studies and a real data example, we demonstrate that the proposed method outperforms state-of-the-art methods for combining multiple inputs under the monotonicity assumption, especially when the inputs contain missing data.
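The flavor of monotonicity-constrained estimation can be illustrated in one dimension: sorting subjects by a single test value and applying the pool-adjacent-violators algorithm (PAVA) to their binary outcomes yields a nondecreasing fit of event risk against the test. The sketch below is only a toy 1-D analogue for intuition, not the dissertation's multivariate EM-based NPMLE; all names and inputs are illustrative.

```python
def pava(values, weights=None):
    """Pool-Adjacent-Violators: least-squares fit constrained to be
    nondecreasing. Returns the fitted value for each input position."""
    if weights is None:
        weights = [1.0] * len(values)
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge adjacent blocks while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            w_tot = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / w_tot, w_tot, n1 + n2])
    fit = []
    for m, _, n in blocks:
        fit.extend([m] * n)
    return fit

# Binary outcomes ordered by increasing test value; the isotonic fit
# pools the violating pair (1 then 0) into a common risk of 0.5.
risks = pava([0, 0, 1, 0, 1, 1])  # -> [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
```

With binary outcomes, this 1-D isotonic fit is itself a constrained maximum likelihood estimate of a nondecreasing risk function, which is the property the dissertation's EM-type algorithm generalizes to multiple, possibly missing, inputs.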
We illustrate our approach with a dataset from a recent nonalcoholic fatty liver disease (NAFLD) study.

In the third project, the approach established in the second project is extended to the scenario where one covariate is randomly censored. The proposed approach consists of two steps. In step one, we use a Cox proportional hazards model for the distribution of the censored covariate given the other covariates in the model; this conditional distribution is used to calculate the observed-data likelihood. In step two, a similar EM-type algorithm, based on the observed-data likelihood from step one, is devised to compute the NPMLE of the monotonicity-constrained risk score function. Through simulation studies, we demonstrate that the proposed method outperforms the simple but inefficient complete-case analysis as well as substitution methods. We apply our method to data from a primary biliary cirrhosis (PBC) study conducted at the Mayo Clinic.

The methods proposed in the second and third projects can be extended to multi-class settings in which the labels have an inherent order but no meaningful numeric distance between them. A natural question is how to evaluate classification performance in such a setting. Therefore, in the fourth project, we consider performance evaluation metrics for ordinal classification. We propose three novel metrics that better capture the ordinality of the outcomes. The first is adapted from the area under the receiver operating characteristic (ROC) curve (AUC), while the latter two are simple and interpretable generalizations of Harrell's concordance index (C-index). Moreover, we show the optimality of the AUC-based metrics through the Neyman-Pearson lemma. We conduct extensive simulation studies to confirm the usefulness of the proposed performance metrics for ordinal classification.
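One natural pairwise form of a concordance index for ordinal outcomes counts, among all pairs of subjects with different ordinal labels, the fraction in which the subject with the higher label also receives the higher risk score, with score ties credited one half. The sketch below illustrates this generic construction; it is not necessarily the exact definitions proposed in the dissertation, and the function name and data are hypothetical.

```python
from itertools import combinations

def ordinal_c_index(scores, labels):
    """Pairwise concordance for ordinal labels: over all usable pairs
    (pairs with unequal labels), the proportion in which score order
    agrees with label order; tied scores count 1/2 (Harrell-style)."""
    concordant = 0.0
    usable = 0
    for (s_i, y_i), (s_j, y_j) in combinations(zip(scores, labels), 2):
        if y_i == y_j:
            continue  # pairs with equal labels carry no ordering information
        usable += 1
        if (s_i - s_j) * (y_i - y_j) > 0:
            concordant += 1.0   # score order matches label order
        elif s_i == s_j:
            concordant += 0.5   # tied scores split the credit
    return concordant / usable

# Four subjects with ordinal stages 0 < 1 < 2; one discordant pair
# out of five usable pairs gives a concordance of 0.8.
c = ordinal_c_index([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 2])
```

When the labels are binary, this quantity reduces to the usual AUC, which is one way to see the ordinal metrics as generalizations of the two-class case.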