Data Structures

Evidation Health Project

The issue that Evidation Health brought to us with this project was a lack of standardization of patient-generated health data (PGHD), which has a significant impact on how effectively this form of data can be applied to formal research. As a company focused on analyzing PGHD in relation to personal health outcomes, Evidation Health has great interest in developing standardized ways to collect, compile, and manipulate this data, especially across different sources. Thus, they tasked the team at Cal Poly with researching, developing, and documenting a software tool to help tackle this problem. From there, we set a goal to take data for a single domain from two different sources, aggregate it, and present a standardized dataset that showcases that data domain holistically. For this project, we chose to focus on daily heart rate readings from Fitbit and Apple HealthKit devices. We created a standardization method that consists primarily of using the Spark SQL module to manipulate the data so that it conforms to a standardized Spark DataFrame. Ideally, this will provide a basis for a more sophisticated, modular, and scalable standardization method that can be applied across multiple domains.
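To make the idea of conforming two sources to one standardized shape more concrete, here is a minimal sketch. It is not the project's implementation: the project used Spark SQL and Spark DataFrames, while this sketch substitutes Python's built-in sqlite3 module so that it runs without a Spark installation. The table names, column names (date_time, bpm, start_date, value), the timestamp format, and the function name standardize_daily_heart_rate are all hypothetical and do not reflect the actual Fitbit or Apple HealthKit export schemas.

```python
import sqlite3


def standardize_daily_heart_rate(fitbit_rows, healthkit_rows):
    """Combine two heart rate sources into rows of (source, day, avg_bpm).

    Both inputs are lists of (timestamp_text, bpm) tuples whose timestamps
    start with a YYYY-MM-DD date. These input shapes are assumptions made
    for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical source schemas: each source names its columns differently.
        conn.execute("CREATE TABLE fitbit (date_time TEXT, bpm REAL)")
        conn.execute("CREATE TABLE healthkit (start_date TEXT, value REAL)")
        conn.executemany("INSERT INTO fitbit VALUES (?, ?)", fitbit_rows)
        conn.executemany("INSERT INTO healthkit VALUES (?, ?)", healthkit_rows)

        # Map each source onto the same three columns, average the readings
        # per calendar day, then stack the two results into one dataset.
        query = """
            SELECT 'fitbit' AS source,
                   substr(date_time, 1, 10) AS day,
                   AVG(bpm) AS avg_bpm
            FROM fitbit
            GROUP BY substr(date_time, 1, 10)
            UNION ALL
            SELECT 'healthkit' AS source,
                   substr(start_date, 1, 10) AS day,
                   AVG(value) AS avg_bpm
            FROM healthkit
            GROUP BY substr(start_date, 1, 10)
            ORDER BY day, source
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

Calling the function with a list of readings from each source returns one list of rows sharing a single schema, ordered by day and then by source. A similar SELECT ... UNION ALL query could plausibly be expressed in Spark SQL, though the exact functions and schema used in the project are not stated here.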
