Enhancing Error Detection in Clinical Laboratories Using Machine Learning: A Multicenter Retrospective Study

Introduction

The precision and reliability of laboratory tests are paramount for accurate diagnosis. However, errors, particularly sample misidentification, pose significant risks to patient safety. Traditional methods for detecting such errors, such as delta checks, often lack the sensitivity and specificity required. A recent study, “Machine Learning-Based Sample Misidentification Error Detection in Clinical Laboratory Tests: A Retrospective Multicenter Study,” published in Clinical Chemistry, introduces and evaluates machine learning (ML) models for improving the detection of sample misidentification errors across multiple clinical laboratories.
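
For context, a conventional delta check flags a result whose change from the patient’s previous value exceeds a fixed limit. The sketch below shows the delta percent change (DPC) and its absolute variant (absDPC) as commonly defined; the 50% limit and the example values are illustrative only, since real limits are analyte-specific.

```python
def delta_percent_change(previous: float, current: float) -> float:
    """Delta percent change (DPC): signed percent difference from the prior result."""
    return (current - previous) / previous * 100.0

def flag_delta_check(previous: float, current: float,
                     limit: float = 50.0, use_abs: bool = True) -> bool:
    """Flag a result for review when DPC (or |DPC| for absDPC) exceeds the limit.

    The 50% limit is illustrative only; real limits are analyte-specific.
    """
    dpc = delta_percent_change(previous, current)
    return (abs(dpc) if use_abs else dpc) > limit

# A CEA result jumping from 2.1 to 8.4 ng/mL would be flagged (|DPC| = 300%).
print(flag_delta_check(2.1, 8.4))  # True
```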

Study Findings

This retrospective multicenter study developed and validated several ML models—Deep Neural Networks (DNN), Extreme Gradient Boosting (XGB), Random Forest (RF), and Logistic Regression (LR)—to detect misidentification errors in tumor marker tests. The study involved large datasets from four hospitals: Asan Medical Center (AMC) for model training and internal validation, and Haeundae Paik Hospital, Pusan National University Hospital, and Ilsan Paik Hospital for external validation. The datasets included tumor markers such as alpha-fetoprotein (AFP), cancer antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA), and prostate-specific antigen (PSA).
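
The paper’s models take a deliberately simple input of current and previous results (see the Discussion below). As a rough illustration of that modeling setup, here is a minimal training sketch using scikit-learn and xgboost; the random data and labels, and the use of MLPClassifier as a stand-in for the paper’s DNN, are assumptions for illustration, not the authors’ code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier  # stand-in for the paper's DNN
from xgboost import XGBClassifier

# Stand-in data: (previous result, current result) pairs for one tumor marker,
# with label 1 marking a simulated misidentification error. The study trained
# on AMC data; these random values are placeholders only.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=1.0, sigma=0.8, size=(1000, 2))
y = rng.integers(0, 2, size=1000)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "XGB": XGBClassifier(n_estimators=200, eval_metric="logloss"),
    "DNN": MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0),
}
for name, model in models.items():
    model.fit(X, y)  # in the study: train on AMC data, validate externally
```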

[Figure: Bar chart comparing AUROC values for the DNN, XGB, RF, LR, DPC, and absDPC models across the tumor markers AFP, CA19-9, CEA, and PSA.]

Key Observations:

  • The DNN and XGB models generally outperform the conventional delta check methods (delta percent change, DPC, and absolute delta percent change, absDPC) as well as the other machine learning models (RF, LR).
  • LR has the lowest AUROC values, indicating poorer performance in detecting sample misidentification errors.
  • The chart shows that machine learning models like DNN and XGB hold a clear advantage over the traditional methods in discriminative performance (AUROC).

The study found that the DNN and XGB models outperformed traditional delta check methods. Specifically, DNN and XGB achieved an area under the receiver operating characteristic curve (AUROC) between 0.834 and 0.903, compared with 0.705 to 0.816 for the conventional methods. External validation across the three additional hospitals demonstrated that the ML models’ balanced accuracy (BAC) ranged from 0.760 to 0.836, surpassing the 0.670 to 0.773 range observed with the conventional methods. These results indicate that ML models, particularly DNN and XGB, are more effective at identifying sample misidentification errors and could be crucial in enhancing the reliability of laboratory test results.

Performance of Machine Learning Models (AUROC 0.834 to 0.903):

  • The study found that the Deep Neural Networks (DNN) and Extreme Gradient Boosting (XGB) models achieved an area under the receiver operating characteristic curve (AUROC) between 0.834 and 0.903. This means that these models were highly effective in distinguishing between correctly identified samples and misidentified ones, outperforming traditional methods, which had an AUROC between 0.705 and 0.816.
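
For readers less familiar with the metric, AUROC measures how well a model ranks misidentified samples above correctly identified ones across all possible thresholds; 0.5 is chance and 1.0 is perfect separation. A minimal sketch with hypothetical scores:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical probabilities for six samples; label 1 marks a
# simulated misidentification error.
y_true  = [0, 0, 1, 1, 0, 1]
y_score = [0.10, 0.35, 0.80, 0.65, 0.20, 0.90]

# Every error scores above every non-error, so AUROC = 1.0.
print(roc_auc_score(y_true, y_score))
```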

Balanced Accuracy (BAC) of 0.760 to 0.836 in External Validation:

  • When the machine learning models were tested across three hospitals, they maintained a balanced accuracy (BAC) ranging from 0.760 to 0.836. This indicates that the models were consistently reliable in different clinical settings, successfully identifying errors while minimizing false positives and false negatives.
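
Balanced accuracy is the mean of sensitivity and specificity, which keeps the metric meaningful when misidentification errors are rare. A short sketch with hypothetical predictions:

```python
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

# Hypothetical predictions at a fixed decision threshold;
# label 1 marks a misidentification error.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # 2/2 = 1.000
specificity = tn / (tn + fp)   # 7/8 = 0.875
print((sensitivity + specificity) / 2)          # 0.9375
print(balanced_accuracy_score(y_true, y_pred))  # same value
```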

Improved Sensitivity Over Traditional Methods:

  • The machine learning models demonstrated higher sensitivity compared to traditional delta check methods. This means the ML models were better at detecting sample misidentification errors, reducing the likelihood of missed errors that could lead to incorrect patient diagnoses and treatments.

Complex Decision Boundaries:

  • The DNN and XGB models developed non-linear, complex decision boundaries that were more effective at classifying errors than the linear boundaries used by traditional methods. This ability to handle complex, real-world data patterns enhances the accuracy and reliability of error detection.
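
To make the contrast concrete, the sketch below trains a logistic regression and a gradient-boosted model on a synthetic (previous result, current result) space in which the “error” region follows a non-linear rule; the data and labeling rule are invented for illustration and do not come from the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# Invented feature space: (previous result, current result). A pair is
# labeled an "error" when the ratio between results is implausibly large,
# a region no single straight line can capture.
rng = np.random.default_rng(1)
X = rng.uniform(0.1, 10.0, size=(2000, 2))
y = (np.abs(np.log(X[:, 1] / X[:, 0])) > 1.0).astype(int)

lr = LogisticRegression(max_iter=1000).fit(X, y)
xgb = XGBClassifier(n_estimators=100, eval_metric="logloss").fit(X, y)

print("LR  training accuracy:", lr.score(X, y))   # limited by its linear boundary
print("XGB training accuracy:", xgb.score(X, y))  # approximates the curved region
```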

Potential Increase in Manual Verification:

  • Despite the higher sensitivity, the study notes that ML models may increase the need for manual verification owing to a decrease in specificity. This trade-off means that while more errors are detected, more cases are flagged for review, requiring additional human resources to confirm them (see the sketch below).
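
This trade-off is governed by the model’s decision threshold: lowering it catches more true errors but flags more correct samples for review. A sketch with hypothetical scores:

```python
import numpy as np

def review_load(y_score, threshold):
    """Fraction of all samples flagged for manual verification."""
    return float(np.mean(np.asarray(y_score) >= threshold))

# Hypothetical model probabilities for 10 samples.
scores = [0.05, 0.10, 0.15, 0.30, 0.40, 0.55, 0.60, 0.75, 0.85, 0.95]
for t in (0.8, 0.5, 0.3):
    print(f"threshold={t}: {review_load(scores, t):.0%} of samples flagged")
# Lower thresholds detect more true errors but widen the review queue.
```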

Discussion

The study’s discussion highlights the superior performance of ML models in detecting sample misidentification errors compared to conventional delta checks. For example, the DNN model maintained over 90% of its internal validation performance when externally validated across different hospitals. This demonstrates the robustness and generalizability of ML-based approaches in real-world clinical settings. However, the study also notes that ML models, despite their higher sensitivity, may increase the workload due to a decrease in specificity, potentially leading to more manual verifications.

Another significant point is the complexity of decision boundaries developed by ML models. The DNN and XGB models exhibited non-linear, complex decision boundaries that were more effective at classifying errors than the linear boundaries of traditional methods. This capability allows ML models to better handle real-world data’s variability and complexity, which is crucial for accurate error detection in diverse laboratory environments.

The study also acknowledges limitations, including the reliance on a simplified input model that only considers current and previous test results without accounting for other demographic variables or environmental factors. While beneficial for standardization across different laboratories, this simplification may introduce biases and reduce the model’s applicability in more complex scenarios. Additionally, the study used in silico simulations to generate misidentification errors, which may not fully capture the complexities of actual clinical settings.
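
The authors’ in silico error generation can be approximated by swapping current results between randomly chosen patients, mimicking a specimen mix-up. A minimal sketch, with the column names and error rate assumed for illustration rather than taken from the paper:

```python
import numpy as np
import pandas as pd

def simulate_misidentification(df: pd.DataFrame, error_rate: float = 0.05,
                               seed: int = 0) -> pd.DataFrame:
    """Swap 'current' results between random patient pairs to mimic
    specimen mix-ups; the 'previous'/'current' column names are assumed."""
    rng = np.random.default_rng(seed)
    out = df.copy().reset_index(drop=True)
    out["label"] = 0
    n_pairs = int(len(out) * error_rate / 2)
    idx = rng.choice(len(out), size=2 * n_pairs, replace=False)
    a, b = idx[:n_pairs], idx[n_pairs:]
    swapped = out.loc[a, "current"].to_numpy()
    out.loc[a, "current"] = out.loc[b, "current"].to_numpy()
    out.loc[b, "current"] = swapped
    out.loc[np.concatenate([a, b]), "label"] = 1  # mark both swapped rows as errors
    return out

# Exaggerated rate so this 4-row demo actually swaps one pair.
demo = pd.DataFrame({"previous": [2.1, 3.5, 1.2, 4.8],
                     "current":  [2.3, 3.4, 1.1, 5.0]})
print(simulate_misidentification(demo, error_rate=0.5))
```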

Conclusion

This retrospective multicenter study demonstrates the potential of ML-based models, particularly DNN and XGB, to significantly improve the detection of sample misidentification errors in clinical laboratories. The study provides compelling evidence that ML models can outperform traditional delta checks in both sensitivity and accuracy, making them valuable tools for enhancing laboratory test reliability. However, the study also highlights notable limitations, such as the increased need for manual verification and the potential biases introduced by the simplified input model. While ML-based approaches show great promise, further prospective studies and optimization are needed to fully realize their potential in clinical practice.

Reference:

Seok, H. S., Yu, S., Shin, K. H., Lee, W., Chun, S., & Kim, S. (2024). Machine Learning-Based Sample Misidentification Error Detection in Clinical Laboratory Tests: A Retrospective Multicenter Study. Clinical Chemistry. DOI: 10.1093/clinchem/hvae114