The last decade has seen unprecedented growth in the availability and size of digital health data, including electronic health records, genetics, and wearable sensors. These rich data sources present opportunities to develop and apply machine learning methods to enable precision medicine. The aim of this workshop is to engender discussion between machine learning and clinical researchers about how statistical learning can enhance both the science and the practice of medicine. 

Of particular interest to this year’s workshop is "Big Health Data", a phrase recently coined by the British Medical Journal, where the focus is on modeling and improving health outcomes across large numbers of patients with diverse genetic, phenotypic, and environmental characteristics. The majority of clinical informatics research has focused on narrow populations, for example patients from a single institution or sharing a common disease, and on modeling clinical factors such as lab test results and treatments. Big Health Data considers large and diverse cohorts, often exceeding 100 million patients, as well as environmental factors that are known to impact health outcomes, including socioeconomic status, health care delivery and utilization, and pollution. Big Health Data problems pose a variety of challenges for standard statistical learning, many of them nontraditional. Including a patient’s race and income in statistical analysis, for example, raises concerns about patient privacy. Novel approaches to differential privacy may help alleviate such concerns. Other examples include modeling biased measurements and non-random missingness, and performing causal inference in the presence of latent confounders.

In this workshop we will bring together clinicians, health data experts, and machine learning researchers working on healthcare solutions. The goal is to discuss clinical needs and the technical challenges that arise from them, including the development of interpretable techniques that can adapt to noisy, dynamic environments and the handling of biases inherent in data generated during routine care.