A collaboration between graduate students and faculty at Harrisburg University of Science and Technology may yield long-term benefits for clinical trial planning and positively impact patient experiences and outcomes.

HU doctoral students Shao-Wen Lai, Bowen Long, and Jiawen Wu, along with their instructor, Dr. Bellur Srikar, Assistant Professor of Data Analytics, recently published a research paper called “Predicting Phase 1 Lymphoma Clinical Trial Durations Using Machine Learning: An In-depth Analysis and Broad Application Insights.”

The paper, which appears in the journal “Clinics and Practice,” describes a predictive model that performed robustly in predicting clinical trial durations for both lymphoma and lung cancer research. Scientists have observed rising rates of lymphoma diagnoses in the United States and beyond, making it one of the most frequently diagnosed types of cancer in the world today. Lung cancer is the second-most-common type of cancer for both men and women, according to the American Cancer Society.

According to “Predicting Phase 1 Lymphoma Clinical Trial Durations,” the case for more accurate predictions of clinical trial durations is clear. With more accurate expectations established from the beginning, stakeholders in clinical trials are better able to plan the distribution of resources, anticipate possible shortcomings, improve patient safety and participation, and engage in a more productive dialogue with regulatory authorities. Most importantly, improved research planning has the potential to result in better patient outcomes.

The team developed a machine learning-based predictive model that shows promise in comparing likely Phase 1 lymphoma trial durations to expected norms. The model, called the Random Forest model, performed similarly well when applied to lung cancer trials, which highlights the method’s versatility.

Previously, researchers focused the bulk of their predictive efforts on sample size, that is, on finding the likely number of participants required to achieve the desired level of statistical relevance. Only recently have greater efforts focused on predicting how long a particular trial might take. As this paper notes, several variables can influence this prediction and cause unforeseen delays, including operational setbacks, commercial barriers, and strategic challenges.

According to the authors, close to 85 percent of clinical trials experience some type of setback, which highlights the need for a better predictive tool. Their research sought methods for comparing likely trial durations with expected averages. It also provides insights into the challenges of predicting clinical trial durations, a topic that has been underrepresented in the current conversation.

Key contributions of the study include extensive modeling using eight machine learning models, insights into data volume requirements, in-depth model probability analysis, and arguments for the broad applicability demonstrated by the Random Forest model. The authors also acknowledge the study’s limitations, including the exclusive use of data, potential biases, and the need for further validation across different cancer types. Ultimately, however, the conclusions are positive. The paper highlights the Random Forest model’s potential contribution to efficient trial resource allocation, cost savings, and improved success rates.

The full research paper on using machine learning to predict clinical trial durations is available in the journal “Clinics and Practice.” Harrisburg University offers two multidisciplinary, PhD-track degree programs connected to this recently published research: Doctor of Computational Sciences and Doctor of Data Sciences.