Applying machine-learning algorithms to large datasets, such as those available for Huntington’s disease, can reveal latent patterns that clinical observation typically misses. Researchers used probabilistic machine-learning approaches to build and test a model of Huntington’s disease progression. Longitudinal data from four observational studies (PREDICT-HD, REGISTRY, TRACK-HD, and Enroll-HD) were combined with two machine-learning methods (Bayesian latent-variable analysis and continuous-time hidden Markov models) to create a probabilistic model of disease progression. The model was then validated on a separate Enroll-HD dataset against current clinical reference assessments (Unified Huntington’s Disease Rating Scale [UHDRS] diagnostic confidence level, total functional capacity, and total motor score) and the CAG-age product.

Nine disease states, corresponding to the reference assessments, were identified from 44 motor, cognitive, and functional measures. The validation set included 3,158 participants (mean age 48.4 years), 61.5% of whom had manifest disease. Analysis of transition times showed that the “early-disease” states 1 and 2, which occur before motor diagnosis, lasted around 16 years. Motor onset occurred in an increasing proportion of participants during the “transition” states 3 to 5, which lasted ten years, and the “late-disease” states 6 to 9, which also lasted ten years. The annual probability of transitioning from one of the nine identified disease states to the next ranged from 5% to 27%. The natural history of Huntington’s disease can therefore be divided into nine states, each with a different severity. Characterizing the features of each state and the probability of progression between states can improve trial design and participant selection.
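
To give a rough sense of how an annual transition probability relates to the time spent in a state, the sketch below converts the two reported extremes (5% and 27%) into a mean dwell time. It is an illustration only, not the study's method: it assumes a single constant exit rate per state, as in a simple continuous-time Markov chain, and the function names `annual_prob_to_rate` and `mean_dwell_years` are hypothetical.

```python
import math


def annual_prob_to_rate(p_annual: float) -> float:
    """Convert a one-year exit probability into a constant exit rate (per year).

    Assumption: the time spent in a state is exponentially distributed,
    so P(leave within 1 year) = 1 - exp(-rate).
    """
    if not 0.0 < p_annual < 1.0:
        raise ValueError("p_annual must lie strictly between 0 and 1")
    return -math.log(1.0 - p_annual)


def mean_dwell_years(p_annual: float) -> float:
    """Expected time in the state, in years, under the same assumption."""
    return 1.0 / annual_prob_to_rate(p_annual)


if __name__ == "__main__":
    # The two extremes of the reported range of annual transition probabilities.
    for p in (0.05, 0.27):
        print(f"annual probability {p:.0%}: "
              f"mean dwell time {mean_dwell_years(p):.1f} years")
```

Under this simplification, a lower annual probability implies a longer expected stay in the state. The state durations reported above come from the fitted model itself and need not match this back-of-the-envelope conversion.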