Benson Hsu, VP, Data and Analytics, Sanford Health
Healthcare analytics is in its infancy. Most organizations lack a cohesive data strategy, and even fewer have effectively integrated analytics into the care delivery process; most use analytics in one-off projects. For the majority of providers, the goal of analytics is a predictive algorithm, built on fast-moving clinical data, that improves care delivery at the bedside. However, there is a new frontier beyond applying predictive algorithms to data captured within the four walls of healthcare.
Before starting on this journey, an organization must establish the foundation for effective data and analytics. Specifically, two core capabilities must be addressed first: a data warehouse and data governance.
The silos of data must be broken down. Healthcare data often reside in multiple distinct silos, whether functional or historical. Functional silos separate operational data, such as supply chain information, from patient data in the electronic medical record, which in turn is kept apart from financial data and from patient satisfaction and quality data. Historical silos result from mergers or acquisitions, when similar data, such as clinical data in an electronic medical record, sit in different databases. Well-defined linking of the data is vital to the consistency and reproducibility of analytics. Without a cohesive data warehouse strategy, whether through systems such as Hadoop, HANA, or virtualization, there will never be a single source of truth on which to base analytics.
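As a rough illustration of what linking means in practice, the sketch below assigns one enterprise patient ID to records held in two source systems, such as an EMR inherited through an acquisition. It is a minimal sketch that assumes a simple deterministic match on normalized name and birth date; the record fields, source names, and ID format are hypothetical, and production master patient indexes typically use richer, often probabilistic, matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    source: str      # hypothetical source system name, e.g. an acquired hospital's EMR
    local_id: str    # identifier that is unique only within its source system
    last_name: str
    first_name: str
    birth_date: str  # ISO 8601 date string, e.g. "1962-03-14"


def match_key(rec: PatientRecord) -> tuple[str, str, str]:
    """Deterministic match key: case- and whitespace-normalized names plus birth date."""
    return (rec.last_name.strip().lower(), rec.first_name.strip().lower(), rec.birth_date)


def build_crosswalk(records: list[PatientRecord]) -> dict[tuple[str, str], str]:
    """Map each (source, local_id) pair to a single hypothetical enterprise patient ID."""
    enterprise_ids: dict[tuple[str, str, str], str] = {}  # match key -> enterprise ID
    crosswalk: dict[tuple[str, str], str] = {}            # (source, local_id) -> enterprise ID
    for rec in records:
        key = match_key(rec)
        if key not in enterprise_ids:
            enterprise_ids[key] = f"E{len(enterprise_ids) + 1:06d}"
        crosswalk[(rec.source, rec.local_id)] = enterprise_ids[key]
    return crosswalk


records = [
    PatientRecord("emr_a", "1001", "Olson", "Maria", "1962-03-14"),
    PatientRecord("emr_b", "A-77", " OLSON", "maria ", "1962-03-14"),
    PatientRecord("emr_b", "A-78", "Berg", "Lars", "1948-11-02"),
]
crosswalk = build_crosswalk(records)
# The two Olson records from different systems resolve to the same enterprise ID.
```

Deterministic keys like this are easy to audit but miss records with typos or name changes, which is one reason the matching rules themselves belong under data governance.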
If the single source of truth provided by a cohesive data warehouse strategy is the most important building block of analytics, a common language for analytics is a close second. Without a common language and strict data governance, analytics becomes a jumbled set of terms and calculations that are rarely consistent or actionable. For example, if average length of stay is calculated from claims data, where a day counts as one day whether the patient is discharged in the morning or the afternoon, a project to improve hospital capacity by targeting early-in-the-day discharges may have no effect on the measured length of stay.
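The length-of-stay example can be made concrete. The sketch below assumes a claims-style convention that counts calendar days between admission and discharge dates (treating a same-day stay as one day is an additional assumption for illustration) and compares it with an hour-based measure for the same admission discharged in the morning or in the afternoon. The function names are hypothetical.

```python
from datetime import datetime


def los_days(admit: datetime, discharge: datetime) -> int:
    """Day-based length of stay: calendar days between admission and discharge dates.

    Time of day is ignored. Counting a same-day stay as 1 day is an assumed convention.
    """
    return max((discharge.date() - admit.date()).days, 1)


def los_hours(admit: datetime, discharge: datetime) -> float:
    """Time-based length of stay in hours."""
    return (discharge - admit).total_seconds() / 3600


admit = datetime(2024, 5, 1, 14, 0)
morning_discharge = datetime(2024, 5, 4, 9, 0)
afternoon_discharge = datetime(2024, 5, 4, 17, 0)

for label, discharge in [("morning", morning_discharge), ("afternoon", afternoon_discharge)]:
    print(label, los_days(admit, discharge), los_hours(admit, discharge))
# Day-based LOS is 3 in both cases; hour-based LOS is 67.0 vs 75.0.
```

Only the hour-based measure registers the earlier discharge, so the governance decision about which definition to use determines whether a capacity project can show any effect at all.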
A common language and strict governance also extend to benchmarking. Healthcare is still too complex and fragmented to have a common measure. Payers and providers often differ in how they measure healthcare value: some rely on operational efficiency, others on cost metrics, and all overlay clinical quality. Quality, however, is at times even more complex. Is diabetes well treated if there are no inpatient hospitalizations, or when hemoglobin A1C is less than 7%? Does random blood glucose matter, or does BMI matter more? Ultimately, if it is a combination, how is each variable weighted? Organizations need to define the metrics, or goalposts, for success; otherwise, analytics can never be used to focus the organization's limited resources on improvement.
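One way to make the weighting question explicit is to encode the agreed definition as a composite score. The sketch below combines three of the diabetes criteria raised above into points out of 100. The A1C target of 7% comes from the question in the text; the point weights and the BMI cutoff of 30 are hypothetical placeholders that an organization's governance process would have to set, not clinical recommendations.

```python
from dataclasses import dataclass


@dataclass
class DiabetesSnapshot:
    had_inpatient_admission: bool  # any inpatient hospitalization in the measurement period
    hba1c_percent: float           # most recent hemoglobin A1C, in percent
    bmi: float                     # body mass index, kg/m^2


# Hypothetical governance decisions: points per criterion (summing to 100) and thresholds.
WEIGHTS = {"no_admission": 40, "hba1c_controlled": 40, "bmi_below_cutoff": 20}
HBA1C_TARGET = 7.0
BMI_CUTOFF = 30.0  # hypothetical cutoff


def composite_score(s: DiabetesSnapshot, weights: dict[str, int] = WEIGHTS) -> int:
    """Sum the points for each criterion the patient meets."""
    met = {
        "no_admission": not s.had_inpatient_admission,
        "hba1c_controlled": s.hba1c_percent < HBA1C_TARGET,
        "bmi_below_cutoff": s.bmi < BMI_CUTOFF,
    }
    return sum(points for name, points in weights.items() if met[name])


# No admission but uncontrolled A1C: "well treated" under an admission-only definition,
# not under an A1C-only definition; the composite makes the trade-off explicit.
print(composite_score(DiabetesSnapshot(False, 8.2, 32.0)))
```

Changing a single weight changes which patients count as well treated, which is exactly why the weights need to be decided once, centrally, rather than per project.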
Only once a standard data warehouse has been built and a common language, including data governance and benchmarking, has been established can the hard work of analytics begin.
At this point, most organizations start down the road of advanced analytics, generally beginning with predictive algorithms. For instance, among patients discharged from the hospital after a cardiovascular admission (such as a heart attack), can we identify a score that predicts who is at highest risk of readmission?
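A readmission score of this kind is often a logistic model: a weighted sum of patient features passed through the logistic function to give a probability. The sketch below assumes that form; the feature names, intercept, and coefficients are hypothetical placeholders, whereas a real model would be fit to the organization's own historical discharges.

```python
import math

# Hypothetical model parameters, for illustration only.
INTERCEPT = -2.5
COEFFICIENTS = {
    "age_75_or_older": 0.6,        # 1 if true, else 0
    "heart_failure": 0.9,          # 1 if diagnosed, else 0
    "prior_admissions_12m": 0.35,  # per admission in the prior 12 months
    "length_of_stay_days": 0.05,   # per day of the index stay
}


def readmission_risk(features: dict[str, float]) -> float:
    """Predicted probability of readmission; missing features are treated as 0."""
    z = INTERCEPT + sum(coef * features.get(name, 0.0) for name, coef in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


discharges = {
    "patient_1": {"age_75_or_older": 0, "heart_failure": 0,
                  "prior_admissions_12m": 0, "length_of_stay_days": 2},
    "patient_2": {"age_75_or_older": 1, "heart_failure": 1,
                  "prior_admissions_12m": 2, "length_of_stay_days": 6},
}
# Rank discharged patients from highest to lowest predicted risk.
ranked = sorted(discharges, key=lambda pid: readmission_risk(discharges[pid]), reverse=True)
```

Every feature here is clinical and comes from inside the health system, which is the limitation the rest of this article addresses.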
This is a standard question, and it applies directly to the work we do in healthcare. But we can do better. Most predictive algorithms look only within the four walls of our healthcare system for clues to readmission. But readmission, like most events in healthcare, often occurs because of events and factors external to our healthcare system. Socioeconomic status and behavior patterns drive healthcare utilization as much as, if not more than, clinical characteristics. But how do we measure socioeconomic status and behavior? How do we leverage data external to the healthcare system?
As a clinician, I know that I rarely capture socioeconomic or behavioral data unless it directly affects the clinical issue at hand. I will ask parents whether their child attends daycare to understand the pattern of exposure to an infection, but fail to ask whether the family has faced recent financial difficulties that have placed the child under significant psychological stress, thereby decreasing her ability to mount a robust immune response. I will ask about vaccinations but not about how often the family eats fast food because of its low cost and high convenience when both parents work two jobs.
However, these nuggets of knowledge are just as important as clinical data. Applying these non-clinical variables in our analytic process can vastly improve our predictive algorithms. These variables can be captured in the traditional way, by having clinicians enter them into the electronic medical record. Alternatively, de-identified data sets can serve as proxies for socioeconomic and behavioral data. Government data, such as census data, can give insight into the patient population a healthcare organization serves when patient outcomes and metrics are examined by zip code. Other de-identified and aggregated data sources, such as those from CMS and AHRQ, can also serve as training data sets that link existing clinical data to new variables predictive of behavior.
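As a sketch of the zip-code approach, the code below attaches a zip-level median household income, of the kind that can be derived from census tables, to each patient and compares readmission rates above and below an income cutoff. The zip codes, income values, cutoff, and record layout are all hypothetical placeholders, not real census figures.

```python
# Hypothetical zip-level median household income (USD); placeholder values only.
ZIP_MEDIAN_INCOME = {"00001": 48_000, "00002": 71_000, "00003": 63_000}


def readmission_rate_by_income_band(patients, zip_income, cutoff):
    """Readmission rate for patients living in zips below vs. at/above an income cutoff.

    Patients whose zip has no income proxy are skipped; an empty band maps to None.
    """
    groups = {"below_cutoff": [], "at_or_above_cutoff": []}
    for p in patients:
        income = zip_income.get(p["zip"])
        if income is None:
            continue
        band = "below_cutoff" if income < cutoff else "at_or_above_cutoff"
        groups[band].append(1 if p["readmitted"] else 0)
    return {band: (sum(v) / len(v) if v else None) for band, v in groups.items()}


patients = [
    {"zip": "00001", "readmitted": True},
    {"zip": "00001", "readmitted": False},
    {"zip": "00002", "readmitted": False},
    {"zip": "00003", "readmitted": False},
    {"zip": "00009", "readmitted": True},  # no income proxy for this zip; skipped
]
rates = readmission_rate_by_income_band(patients, ZIP_MEDIAN_INCOME, cutoff=55_000)
```

Because the proxy describes an area rather than an individual, it is best treated as a feature of the neighborhood a patient lives in, not as that patient's own income.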
Furthermore, non-traditional data such as social media data and consumer purchasing behavior can be applied to clinical metrics and outcomes. Just as retailers use your purchasing patterns to predict which other products may interest you, healthcare organizations can use these data to anticipate declining health. These data can be identifiable, such as purchasing patterns tied to a specific household, or aggregated, such as purchasing behavior tied to a zip code or county.
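The difference between identifiable and aggregated purchasing data comes down to the grouping key. The sketch below computes the share of spending that went to fast food, either per household or per zip code, from the same transaction list. The transaction schema, the "fast_food" category label, and the values are hypothetical.

```python
from collections import defaultdict


def fast_food_share(transactions, level):
    """Share of total spending that went to fast food, grouped by `level`.

    level="household_id" yields identifiable, household-level features;
    level="zip" yields aggregated, area-level features.
    """
    total = defaultdict(float)
    fast_food = defaultdict(float)
    for t in transactions:
        key = t[level]
        total[key] += t["amount"]
        if t["category"] == "fast_food":
            fast_food[key] += t["amount"]
    return {key: fast_food[key] / total[key] for key in total if total[key] > 0}


transactions = [
    {"household_id": "H1", "zip": "00001", "category": "fast_food", "amount": 30.0},
    {"household_id": "H1", "zip": "00001", "category": "grocery", "amount": 70.0},
    {"household_id": "H2", "zip": "00001", "category": "grocery", "amount": 100.0},
    {"household_id": "H3", "zip": "00002", "category": "fast_food", "amount": 50.0},
    {"household_id": "H3", "zip": "00002", "category": "grocery", "amount": 50.0},
]
by_household = fast_food_share(transactions, "household_id")
by_zip = fast_food_share(transactions, "zip")
```

The zip-level result blends H1 and H2 into a single figure, which protects individual households but also dilutes the signal for any one patient.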
Returning to the example of readmission after a cardiovascular discharge: if clinical data on hypertension and high cholesterol can be aligned with socioeconomic data suggestive of low income and purchasing data suggestive of frequent fast food purchases, a predictive algorithm built on this diverse data set can be far more sensitive and specific in predicting a cardiovascular event than an algorithm based solely on clinical data.
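A gain in sensitivity and specificity can be checked directly by scoring the same held-out discharges with both models and comparing them at a chosen risk threshold. The helper below is a minimal sketch of that check; the threshold, scores, and outcomes are made-up toy inputs that only exercise the function, not results from any model.

```python
def sensitivity_specificity(scores, outcomes, threshold):
    """Sensitivity and specificity of flagging patients whose score is >= threshold.

    `outcomes` holds True for patients who actually had the event (e.g. readmission).
    A metric whose denominator is zero is returned as NaN.
    """
    tp = fn = tn = fp = 0
    for score, had_event in zip(scores, outcomes):
        flagged = score >= threshold
        if had_event and flagged:
            tp += 1
        elif had_event:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Toy inputs: one model's scores for six held-out patients and their true outcomes.
scores = [0.82, 0.35, 0.61, 0.20, 0.55, 0.10]
outcomes = [True, True, False, False, True, False]
sens, spec = sensitivity_specificity(scores, outcomes, threshold=0.5)
```

Running the same function on the clinical-only model's scores and on the combined model's scores for identical patients is what would substantiate, or refute, the expected improvement.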
The frontier of healthcare analytics no longer ends with the data at our disposal. Instead, by capturing socioeconomic and behavioral data from outside our four walls of care, or identifying proxies for them, predictive algorithms can make the leap into targeted prediction of events, thereby improving the way we care for our population.