Machine Learning and Inference Laboratory
Department of Health Administration and Policy
Janusz Wojtusiak, PhD
Research Interests and Selected Projects:
My research covers the development and analysis of machine learning and data/knowledge mining algorithms, and the application of intelligent computational methods in healthcare. It falls into two main themes: (M) theoretical and algorithmic work on machine learning methods; and (A) applications in health informatics.
I actively participate in several research projects in these areas. Many of them can be described as what has recently been called big data analytics, and some are outlined below. The projects often intersect; some focus more on (M)ethods and others on (A)pplications.
(M) Rule learning that includes development of machine learning algorithms for deriving accurate and transparent attributional rules from data and background knowledge. The novelty of the methods is in their "understanding" of concepts that are linked to domain ontologies. In principle, the method knows how attributes and their values are related. For example, it can derive from UMLS (a large medical ontology) the knowledge that an ICD-9 code corresponds to a diagnosis or a procedure, and how it relates to an HCPCS code representing a treatment within the data. Several other forms of incorporating semantic information into the machine learning process are also investigated. By doing so, the method can arrive at results that are potentially more accurate and more natural to domain experts, and thus have a higher chance of being accepted.
(M) Intelligent Synthetic Patient Data Generation that investigates novel approaches to learning machine learning-based models from data and using these models to generate synthetic data. Such generated data need to be accurate both individually (every single patient's record is clinically reasonable) and statistically (the generated data statistically match the target population).
(MA) Prediction of patients' functional status based on their medical histories is part of a larger project in the US Department of Veterans Affairs whose goal is to evaluate the performance of the VA's Medical Foster Home program. The predictions are based on diagnosis codes (ICD-9) as well as clinical notes, and incorporate information about time dependencies between different events.
(M) Learning from aggregated data is a novel approach to machine learning in which individual data points are replaced with aggregated summaries that are provided as input for learning. This type of data arises when learning from published medical results, in which only statistical summaries of cohorts of patients are available, and in distributed analysis of massive datasets, in which a single node cannot process all the data and the aggregated summaries/models can be shared among nodes. The methods are investigated for learning rules and other representations. Some investigated applications include prediction of liver complications in patients with metabolic syndrome.
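The distributed case can be illustrated with a minimal sketch. It is not the laboratory's method: it swaps rule learning for ordinary least-squares line fitting, because that model can be recovered exactly from a handful of sums. The names `Summary`, `summarize`, and `fit_line`, and the two small "node" datasets, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Aggregated statistics of (x, y) pairs; no individual points are kept."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def __add__(self, other):
        # Summaries from different nodes combine by simple addition.
        return Summary(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                       self.sxx + other.sxx, self.sxy + other.sxy)


def summarize(points):
    """Reduce a node's local data to its aggregated summary."""
    s = Summary()
    for x, y in points:
        s = s + Summary(1, x, y, x * x, x * y)
    return s


def fit_line(s):
    """Least-squares slope and intercept computed from the summary alone."""
    denom = s.n * s.sxx - s.sx ** 2
    slope = (s.n * s.sxy - s.sx * s.sy) / denom
    intercept = (s.sy - slope * s.sx) / s.n
    return slope, intercept


# Made-up data held by two nodes (illustrative values only).
node_a = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
node_b = [(4.0, 8.1), (5.0, 9.8)]

# Only the summaries are shared; the learner never sees the raw points.
slope, intercept = fit_line(summarize(node_a) + summarize(node_b))
```

For this particular model the summaries are sufficient statistics, so the fit matches what pooling the raw data would give. For rules and other representations that equivalence generally does not hold, which is part of what makes the problem a research topic.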
(MA) Medical claim payments prediction is an important practical problem in revenue cycle management of hospitals and private practices. The work, in collaboration with Jay Shiver (GMU) and Ron Ewald (Inova), aims at discovering patterns that describe claims for provided services that are partially or entirely denied. The project also includes methodological work on unsupervised labeling of data for supervised learning, and on combining classification and regression learning.
(A) Treatment options selection and classification for prostate cancer patients is part of a larger project whose goal is to compare the selection of treatment options among prostate cancer patients. In this work, machine learning methods are applied to predict mortality, as well as to create homogeneous groups of patients for whom disparities in treatment selection can be investigated.
(M) Learnable evolution model (LEM) is a novel non-Darwinian evolutionary method that uses machine learning to guide the evolutionary optimization process. Instead of randomly searching the space of possible solutions, LEM hypothesizes why some candidate solutions perform better than others and uses these hypotheses to create new solutions that are likely to perform better.
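The general loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the LEM implementation: the "hypothesis" here is a single interval rule (one range per variable) covering the high-performing candidates, standing in for a real rule learner that would contrast high- and low-performing groups. The objective `sphere`, the function names, and all parameter values are hypothetical.

```python
import random


def sphere(x):
    """Toy objective to minimize (an assumption for illustration)."""
    return sum(v * v for v in x)


def learn_interval_rule(high_group):
    """Describe the high-performing group as one interval per variable.

    Simplified stand-in for a rule learner: a single conjunctive rule
    [lo_1 <= x_1 <= hi_1] & ... that covers every high performer.
    """
    dims = len(high_group[0])
    return [(min(c[d] for c in high_group), max(c[d] for c in high_group))
            for d in range(dims)]


def instantiate(rule, rng):
    """Create a new candidate that satisfies the learned rule."""
    return [rng.uniform(lo, hi) for lo, hi in rule]


def lem_sketch(objective, dims=3, pop_size=40, generations=30,
               top_frac=0.3, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-5.0, 5.0) for _ in range(dims)]
                  for _ in range(pop_size)]
    n_high = max(2, int(top_frac * pop_size))
    for _ in range(generations):
        ranked = sorted(population, key=objective)
        high_group = ranked[:n_high]             # best candidates so far
        rule = learn_interval_rule(high_group)   # hypothesis about what works
        offspring = [instantiate(rule, rng)      # new candidates from the rule
                     for _ in range(pop_size - n_high)]
        population = high_group + offspring      # keep high performers
    return min(population, key=objective)


best = lem_sketch(sphere)
```

Because the high-performing group is carried over unchanged, the best objective value in this sketch never gets worse from one generation to the next. New candidates come from the learned description rather than from random mutation or crossover.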
(MA) Non-Darwinian evolutionary optimization for engineering design investigates theoretical and practical aspects of applying the learnable evolution model to solve hard optimization problems in engineering. In a project supported by the National Institute of Standards and Technology, LEM is applied to optimize heat exchangers (ISHED system).
(MA) Autonomous learning and optimization in intelligent transportation logistics is a project at the University of Bremen, Germany, whose goal is to enable learning capabilities in distributed logistic systems. Although focused on transportation logistics, the project resulted in more general algorithms that allow for learning in autonomous distributed environments.
(M) Inferential theory of learning is a theoretical framework created by R.S. Michalski that describes the learning process as a set of operations (called transmutations). The theory has recently been extended to describe operations performed within the learnable evolution model.
(A) Prediction of possible claims resulting from reported medical errors investigates the possibility of using machine learning methods to predict which reported medical errors/near misses result in claims or lawsuits. It is part of a larger project led by Lorens Helmchen (GMU) in collaboration with Inova Health System whose goal is to improve reporting, and ultimately reduce error rates and improve patient care.
Most of the projects involve my current and past PhD and MS students: Che Ngufor, Talha Oz, Kat Irvin, Bo Yu, Chris Jose, and others. To learn about the research interests of the Machine Learning and Inference Laboratory, please visit the laboratory website, which includes more detailed descriptions of several of the projects mentioned here. You can find additional information in my publications and in the general MLI publications. If you have further questions, feel free to contact me.
I would like to acknowledge the funding support that made this research possible. Among the supporting agencies are: National Institute of Standards and Technology, Department of Veterans Affairs, Mason-Inova fund, GMU provost office, National Science Foundation, Robert Wood Johnson Foundation, Healthcare Risk Management and Patient Safety, Cochrane Collaboration group at GMU, German Science Foundation, and others.