Julio Souza (1,2), João Vasco Santos (1,2), João Viana (1,2), Fernando Lopes (1,2), Alberto Freitas (1,2)
(1) MEDCIDS – Department of Community Medicine, Informatics and Decision in Health; (2) CINTESIS – Center for Health Technology and Services Research
Aim: The APR-DRG (All Patient Refined Diagnosis Related Groups) grouping algorithm relies on routinely collected hospital data. Portugal’s national DRG database allows an unlimited number of secondary diagnoses and procedures per episode of care. Our aim is therefore to discuss desirable meta-features of hospitalization data, namely the optimal number of secondary diagnosis and procedure fields that hospital data sets should provide for episodes of care to be consistently assigned to APR-DRGs.
Methods: We used hospitalization data (2014–2016) from Portugal’s national DRG database and applied supervised machine learning techniques to build a classifier for a set of APR-DRGs. Secondary diagnosis and procedure fields will then be removed, both systematically and randomly, according to different cut-off scenarios, and the data will be re-submitted for grouping. After each iteration, the classification results will be compared with those obtained from the original data set, and the minimal number of secondary diagnosis and procedure fields required for consistent APR-DRG assignment will be estimated.
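The removal-and-regrouping procedure in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation. The real APR-DRG grouper is proprietary and is not reproduced here, so `toy_grouper` is a hypothetical stand-in. It derives a base group from the principal diagnosis (surgical if any procedure is coded, medical otherwise) and a severity level from a hypothetical set of complication codes. All codes and names (`Episode`, `truncate`, `agreement`, `minimal_fields`) and the 0.99 agreement threshold are illustrative assumptions. For simplicity, the same cut-off is applied to secondary diagnoses and procedures. A trained classifier could be passed in place of the grouper without changing the loop.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Episode:
    """One episode of care (hypothetical minimal record layout)."""
    principal_dx: str
    secondary_dx: List[str]
    procedures: List[str]


Grouper = Callable[[Episode], str]

# Hypothetical complication codes, used only by the toy grouper below.
COMPLICATION_CODES = {"CC1", "CC2", "CC3"}


def toy_grouper(ep: Episode) -> str:
    """Toy stand-in for the APR-DRG grouper (NOT the real algorithm).

    Base group: principal diagnosis plus 'S' if any procedure is coded, else 'M'.
    Severity: 1 + number of complication codes among secondary diagnoses, capped at 4.
    """
    base = ep.principal_dx + ("S" if ep.procedures else "M")
    severity = min(4, 1 + sum(code in COMPLICATION_CODES for code in ep.secondary_dx))
    return f"{base}-{severity}"


def _keep(codes: Sequence[str], k: int, rng: Optional[random.Random]) -> List[str]:
    """Keep at most k codes: the first k (systematic) or a random k (random)."""
    if len(codes) <= k:
        return list(codes)
    if rng is None:
        return list(codes[:k])
    return rng.sample(list(codes), k)


def truncate(ep: Episode, k_dx: int, k_proc: int,
             rng: Optional[random.Random] = None) -> Episode:
    """Simulate a data set that provides only k_dx / k_proc secondary fields."""
    return Episode(ep.principal_dx,
                   _keep(ep.secondary_dx, k_dx, rng),
                   _keep(ep.procedures, k_proc, rng))


def agreement(episodes: List[Episode], grouper: Grouper, k_dx: int, k_proc: int,
              rng: Optional[random.Random] = None) -> float:
    """Share of episodes whose group is unchanged after truncation."""
    if not episodes:
        raise ValueError("no episodes to compare")
    same = sum(grouper(ep) == grouper(truncate(ep, k_dx, k_proc, rng))
               for ep in episodes)
    return same / len(episodes)


def minimal_fields(episodes: List[Episode], grouper: Grouper, max_fields: int,
                   threshold: float = 0.99,
                   rng: Optional[random.Random] = None) -> Optional[int]:
    """Smallest cut-off k (same k for diagnoses and procedures) reaching threshold."""
    for k in range(max_fields + 1):
        if agreement(episodes, grouper, k, k, rng) >= threshold:
            return k
    return None  # no cut-off up to max_fields is sufficient


episodes = [
    Episode("P10", ["X1", "CC1", "X2", "CC2"], ["PR1"]),
    Episode("P20", ["CC1"], []),
    Episode("P30", ["X1", "X2", "X3"], ["PR1", "PR2"]),
]
print(minimal_fields(episodes, toy_grouper, max_fields=10))  # → 4
```

In this toy data set, full agreement with the original grouping is reached only when four secondary diagnosis fields are kept, because the first episode's second complication code is recorded in the fourth position. Passing a seeded `random.Random` to `agreement` or `minimal_fields` switches from systematic removal (keep the first k fields) to random removal. In practice, random scenarios would be repeated and their agreement averaged.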
Expected Results and Discussion: We selected 56 APR-DRGs that the Agency for Healthcare Research and Quality (AHRQ) Inpatient Quality Indicators use for risk adjustment. We expect to provide insight into whether an optimal number of secondary diagnosis and procedure fields exists for proper APR-DRG grouping. We also expect to clarify the importance of these fields both for adequately reflecting the clinical complexity of hospitalizations in Portugal’s national DRG database and for ensuring the quality and comparability of data used in health services research, quality-of-care assessment and hospital billing.
Keywords: supervised machine learning, Agency for Healthcare Research and Quality, APR-DRG, hospital data, data quality