Predicting Patients' Functional Status - GMU Machine Learning and Inference Laboratory

Predicting Patients' Functional Status from Clinical Notes and Diagnoses

(Wojtusiak, Giang, Alemi, Oz)

Assessing the functional status of residents in nursing homes and medical foster homes is a time-consuming and costly process. It requires an assessment by a registered nurse with specific training in these assessments, in consultation with other members of the interdisciplinary team. The status is usually assessed using a standardized form called the Minimum Data Set (MDS). The MDS resident assessment is conducted quarterly, as well as at admission, readmission, and discharge, and whenever there is a significant change in condition. The MDS has nearly 400 data elements, covering cognitive function, physical functioning, continence, preferences for routine and activity, psychosocial well-being, mood state, disease diagnoses, health conditions, and nutritional status. This project concerns predicting patients’ functional status as measured by the Barthel Index, which consists of 10 data elements and can be considered a simplified version of the MDS.
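For reference, the Barthel Index total is conventionally obtained by summing its 10 item scores. The sketch below assumes the widely used 0 to 100 variant (items scored in 5-point steps); the text does not say which Barthel variant the project uses, and the item names are illustrative labels chosen here.

```python
# Allowed scores per item in the common 0-100 Barthel Index variant.
# Assumption: the project's exact Barthel variant is not stated in the source.
BARTHEL_ITEMS = {
    "feeding": (0, 5, 10),
    "bathing": (0, 5),
    "grooming": (0, 5),
    "dressing": (0, 5, 10),
    "bowels": (0, 5, 10),
    "bladder": (0, 5, 10),
    "toilet_use": (0, 5, 10),
    "transfers": (0, 5, 10, 15),   # bed to chair and back
    "mobility": (0, 5, 10, 15),    # walking on level surfaces
    "stairs": (0, 5, 10),
}


def barthel_score(item_scores: dict[str, int]) -> int:
    """Validate the 10 item scores and sum them into a total between 0 and 100."""
    missing = BARTHEL_ITEMS.keys() - item_scores.keys()
    if missing:
        raise ValueError(f"missing items: {sorted(missing)}")
    total = 0
    for item, allowed in BARTHEL_ITEMS.items():
        value = item_scores[item]
        if value not in allowed:
            raise ValueError(f"{item}={value} is not one of {allowed}")
        total += value
    return total
```

A fully independent patient scores the maximum on every item, for a total of 100; a fully dependent patient scores 0. This additive structure is what makes it possible to predict each item separately and then combine the item predictions into a total score.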

The investigated approach applies a set of machine learning methods to patients’ histories, represented both by their diagnoses and by their clinical notes. The predictions obtained from these two sources are then integrated to provide the final prediction. Patients’ past diagnoses are provided in a time-stamped structured database. Clinical notes are retrieved from the 6 months preceding the assessment of functional status.
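The data selection described above can be sketched as follows. The record layouts (`Diagnosis`, `Note`), the helper names, and the diagnosis codes are hypothetical, since the actual database schema is not described; "6 months" is approximated here as 182 days, and records dated on the assessment day itself are excluded, which is also an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Hypothetical record layouts; the project's real schema is not given in the source.
@dataclass(frozen=True)
class Diagnosis:
    recorded: date
    code: str  # the coding system (e.g. ICD) is an assumption


@dataclass(frozen=True)
class Note:
    written: date
    text: str


# Assumption: "6 months" approximated as 182 days.
NOTE_WINDOW = timedelta(days=182)


def notes_in_window(notes, assessment_date, window=NOTE_WINDOW):
    """Keep notes written in [assessment_date - window, assessment_date)."""
    start = assessment_date - window
    return [n for n in notes if start <= n.written < assessment_date]


def diagnosis_indicators(diagnoses, assessment_date, vocabulary):
    """Binary features: 1 if a code in `vocabulary` was recorded before the assessment."""
    seen = {d.code for d in diagnoses if d.recorded < assessment_date}
    return [1 if code in seen else 0 for code in vocabulary]
```

Diagnoses use the full prior history, while notes are restricted to the lookback window, mirroring the distinction drawn in the paragraph above.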

The specific methods used in the project include:

- AQ21 rule learning for analyzing structured data
- Guided Bayesian approach for analyzing clinical notes
- Random forest-based selection of relevant diagnoses and parts of notes
- Mapping of structured data and notes onto concepts within the Unified Medical Language System (UMLS)


The general architecture of the system is presented in the figure below. A set of independent classifiers predicts scores for all elements of the Barthel Index, and these scores are then aggregated to obtain the Barthel Score. At the same time, a regression model is applied to predict the Barthel Score directly. Finally, the guided Bayesian approach is applied to clinical notes to obtain a score independently of the structured data. All of these scores are averaged to obtain the resulting Barthel Score for a given patient.
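The final combination step can be sketched as below. The function name and argument layout are hypothetical, and the sketch assumes that the per-item predictions are aggregated by summation (the standard way a Barthel total is formed) and that the three estimates are combined by an unweighted mean; the source says only that the scores are "aggregated" and "averaged".

```python
from statistics import mean


def combine_barthel_estimates(item_predictions, regression_score, notes_score):
    """Average three Barthel Score estimates (hypothetical helper).

    item_predictions: predicted score for each Barthel item from the per-item classifiers
    regression_score: Barthel Score predicted directly by the regression model
    notes_score:      Barthel Score obtained from the clinical notes
    """
    # Assumption: item-level predictions are aggregated by summing them.
    from_items = sum(item_predictions.values())
    # Unweighted average of the three independent estimates.
    return float(mean([from_items, regression_score, notes_score]))
```

For example, item predictions summing to 70, a regression estimate of 64, and a notes-based estimate of 76 average to 70.0. An unweighted mean is the simplest reading of "averaged"; a weighted combination tuned on validation data would be a natural alternative but is not described in the source.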



The project is conducted in collaboration with, and funded in part by, the Department of Veterans Affairs.


For references, see the Publications section.

MLI Copyright © 2017 Machine Learning and Inference Laboratory
College of Health and Human Services, George Mason University
4400 University Dr, MSN 1J3, Fairfax, VA 22030, U.S.A