Evaluating Score-Based Disease Progression Using Item Response Theory

Item response theory (IRT) has recently gained importance in the evaluation of diseases such as Alzheimer's disease and Parkinson's disease. It offers an efficient way of evaluating overall disease progression when severity is recorded as scores. This short review discusses the main aspects of IRT that improve the evaluation of score-based disease progression.


Introduction
Score-based evaluation of disease progression is in practice for many diseases, including Alzheimer's disease, Parkinson's disease, pain-score-based conditions, schizophrenia and many more. The scores used to record the severity of the disease are usually categorical, meaning that there are only finitely many choices for ranking severity. For example, Hoehn and Yahr staging in the Unified Parkinson's Disease Rating Scale (UPDRS) is on a scale of 0 to 5 in unit increments, where 0 is assigned when almost negligible symptoms are observed and 5 indicates an extremely severe condition. One major drawback of score-based evaluation is that each severity measure focuses on a specific symptom of the disease rather than on the overall disease progression. In this discussion, we focus on Parkinson's disease (PD). PD progression is evaluated with a score-based mechanism that uses responses on various motor and non-motor symptoms, collectively called sub-scores, collected in an organized way under the UPDRS [1].
The standard measurement guideline contains six parts, viz. Part I (evaluation of mentation, behavior and mood), Part II (self-evaluation of the activities of daily living (ADLs), including speech, swallowing, handwriting, dressing, hygiene, falling, salivating, turning in bed, walking, and cutting food), Part III (clinician-scored monitored motor evaluation), Part IV (complications of therapy), Part V (Hoehn and Yahr staging of severity of Parkinson's disease), and Part VI (Schwab and England ADL scale). Each of these parts contains several sub-parts (also known as sub-scores) that cover the section in detail. For example, Part I consists of 4 sub-scores, Part II of 13, Part III of 27 and Part IV of 11. The score values recorded for these sub-parts range from 0 to 4 or from 0 to 5 in increments of either 0.5 or 1, with higher numbers representing greater severity of the disease. The scores are categorical and hence offer finitely many choices in each section. The score obtained from this evaluation measures the status of an individual symptom at a particular time.
For example, the motor responses in Part III include Hand Movement, Leg Agility, Rigidity, etc. At any particular time, this scoring regimen provides the status of each individual sub-score as recorded under the UPDRS guideline. The problem arises when the overall disease progression needs to be evaluated. One way to overcome the problem is to add the scores obtained from the different sub-scores and use the sum to evaluate the overall disease progression. Suppose that at a particular time the Hand Movement score is 1 and the Leg Agility score is 2.5; the summed score for these two categories is then 3.5. If the situation later reverses, so that Hand Movement is 2.5 and Leg Agility is 1, the total remains the same. But does that mean the disease state of that person did not change? Or does it imply that the drug has no effect on the patient? The sum cannot answer these questions, so clinicians need a procedure that measures overall disease progression from score-based data.
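The limitation of the summed score can be checked with a few lines of Python. This is only an illustrative sketch: the two visits reuse the Hand Movement and Leg Agility values from the example above, and the dictionary keys and the `total_score` helper are hypothetical names introduced here.

```python
# Sub-score values from the example in the text; the key names are hypothetical.
visit_1 = {"hand_movement": 1.0, "leg_agility": 2.5}
visit_2 = {"hand_movement": 2.5, "leg_agility": 1.0}


def total_score(subscores):
    """Sum all sub-scores into one total, ignoring which item each value came from."""
    return sum(subscores.values())


total_1 = total_score(visit_1)
total_2 = total_score(visit_2)

# The symptom profiles differ between visits even though the totals are equal.
profiles_differ = visit_1 != visit_2

print("total at visit 1:", total_1)
print("total at visit 2:", total_2)
print("profiles differ:", profiles_differ)
```

Both visits give the same total although the symptom profile has changed, which is exactly the information the sum discards.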

Importance of IRT in the evaluation of disease status
Item response theory (IRT) appears to evaluate overall disease progression in a much more efficient and concrete way. The idea of IRT was introduced by the Educational Testing Service (ETS) [2] for ranking students not merely on the total score obtained in an exam, but also by evaluating their intelligence quotient (IQ) from their performance on difficult questions compared with easier ones. If students perform better on harder questions, they are assumed to have a higher IQ and are ranked above others who perform better on easier questions. On this basis, percentile scoring was introduced by major examinations such as the Graduate Record Examination (GRE). The same concept holds for evaluating the overall severity of disease in score-based disease evaluation. If a person with PD has a high score in a sub-score where most people in the database score in the lower range (which corresponds to a difficult question in an examination), that person's disease status is more severe than that of a person who has a high score in a sub-score where the majority also score high (comparable to an easier question in the exam).
Figure 1. Item response functions (IRFs) for the Hand Movement item. The red curve corresponds to a Hand Movement score of 1 and the blue curve to a score of 2.5. The x-axis shows the latent score (severity of the overall disease) and the y-axis shows the probability of observing the particular score (score = 1 or score = 2.5 in Hand Movement in this case).
Returning to our previous example, a Hand Movement score of 1 and a Leg Agility score of 2.5 at a particular time are evaluated against the data set, and a percentile rank or weight is assigned to each category. If Hand Movement is an "easier problem" than Leg Agility, the subject will have a higher latent score (also termed disability) when Leg Agility is high than when Hand Movement is high. A probability function, known as the item response function (IRF), is generated to give the probability of a certain score given the latent score of a particular person. Figure 1 shows the IRFs for Hand Movement scores of 1 (in red) and 2.5 (in blue). For lower latent scores, a subject is more likely to obtain a score of 1 in Hand Movement, whereas for higher latent scores a score of 2.5 is more likely. The function thus gives a statistical interpretation of the expected scores. Since the latent scores are continuous, regression functions can readily be applied to them to extend and evaluate the probabilities of the different sub-scores at future time points. This gives a robust way of measuring disease progression in score-based disease evaluation, and it can be applied to track disease progression and the efficacy of drug action.
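The behaviour of the IRF described above can be sketched numerically. The text does not name a specific IRT model, so the sketch below assumes a graded response model, one common choice for ordered categorical items. The category set, the discrimination value and the thresholds are hypothetical illustration values, not estimates from any UPDRS database.

```python
import math

# Hypothetical parameters for a "Hand Movement" item (assumed, not fitted to data).
CATEGORIES = [0.0, 1.0, 2.5, 4.0]  # reduced, illustrative set of score levels
DISCRIMINATION = 1.5               # slope: how sharply the item separates severities
THRESHOLDS = [-1.5, 0.0, 1.5]      # increasing boundaries between adjacent categories


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(theta):
    """Return P(score = category | latent score theta) for every category.

    Under the assumed graded response model, P(score >= k-th category) is a
    logistic curve in theta, and each category probability is the difference
    between two adjacent cumulative curves.
    """
    cumulative = [1.0]
    cumulative += [logistic(DISCRIMINATION * (theta - b)) for b in THRESHOLDS]
    cumulative += [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(CATEGORIES))]


def irf(theta, score):
    """Item response function: probability of observing `score` at latent score theta."""
    return category_probabilities(theta)[CATEGORIES.index(score)]


low_theta, high_theta = -0.75, 0.75  # a milder and a more severe latent score

for theta in (low_theta, high_theta):
    print(f"theta={theta:+.2f}  P(score=1)={irf(theta, 1.0):.3f}  "
          f"P(score=2.5)={irf(theta, 2.5):.3f}")
```

With these assumed parameters, a score of 1 is the more probable response at the lower latent score and a score of 2.5 is the more probable response at the higher one, which mirrors the crossing red and blue curves described for Figure 1.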

Conclusion
Since the method is entirely data-driven, the choice of database has a large influence on the estimated progression of the disease. The technique compares the present status of a subject with the states of the subjects included in the database and ranks the subject accordingly. It helps to evaluate the severity of each symptom, whether motor or non-motor, in a PD patient, and thereby determines the progression. For neurological diseases, this method turns out to be a more robust and elegant way of evaluating overall disease progression.