Healthcare is relying more and more on data, and the growth is poised to become exponential. Sequencing data is predicted to reach up to 40 exabytes by 2025. An exabyte represents one billion gigabytes. The adoption of Electronic Health Records will also contribute to the massive growth of data.
According to Statista, by 2022, more than 1 billion wearables will be in use worldwide, continuously monitoring heart rate, number of steps, blood pressure, and much more. Agrawal & Prabakaran recently published an article about the challenges and opportunities ahead when dealing with healthcare data.
Medical information comes from various sources: health insurers, hospital reports, diagnostic laboratories, medical imaging departments, physicians' notes, fitness watches, etc. These pieces of clinical information are often unstructured, inconsistent, and incomplete.
Amazon aims to securely store this gigantic flow of unstructured raw health information and bring order to it, using Amazon Web Services, machine learning, and artificial intelligence.
What are data lakes?
Data lakes are a way to store unstructured, semi-structured, or structured data. They are designed for low-cost storage and can hold data of any type. They adapt quickly to change and offer a cost-efficient way to store data from many different sources. Once the data are ingested into the lake, machine learning can be used to make predictions, and the information can be redistributed to the appropriate channels.
In a nutshell, a data lake is like a water tank that processes and filters data and delivers them in a meaningful way to the relevant party.
Data lakes break down silos of data while restricting access to authorized parties only. Machine learning algorithms and artificial intelligence can be trained on the large existing datasets to improve their predictions or inferences. When managed effectively, data lakes are a strong solution for leveraging the power of big data in healthcare systems.
What is Amazon HealthLake?
Amazon HealthLake is more than a HIPAA-eligible data lake; it brings meaning to healthcare and life sciences data. Health data come from numerous sources such as life science organizations, wearables, hospitals, etc. Interoperability is becoming a critical issue that needs to be addressed. The new Fast Healthcare Interoperability Resources (FHIR) standard aims to do just that, or at least to try.
Launched at the end of 2020, Amazon HealthLake aims to transform health data. Coming from various silos, the data are uploaded to the HIPAA-eligible service and stored in the cloud. Using machine learning and Natural Language Processing (NLP), the service analyzes and normalizes the data and extracts meaningful medical information.
Organizations such as hospitals or laboratories will be able to easily query the lake in the cloud and find orders and potentially novel, relevant medical information for a patient, a clinical trial, or a scientific experiment.
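To make the idea of querying FHIR data more concrete, here is a minimal sketch of how a FHIR search URL could be put together. The base URL and the patient ID below are hypothetical placeholders, and the helper name build_fhir_search_url is ours, not part of any Amazon library. The sketch only builds the URL; a real request would also need a real endpoint and AWS authentication, both of which are left out here.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: a real data store has its own endpoint URL.
FHIR_BASE_URL = "https://example.invalid/datastore/EXAMPLE_ID/r4"


def build_fhir_search_url(resource_type, params, base_url=FHIR_BASE_URL):
    """Build a FHIR search URL of the form <base>/<ResourceType>?<query>."""
    query = urlencode(params)
    return f"{base_url.rstrip('/')}/{resource_type}?{query}"


# Search the Condition resources of one (hypothetical) patient,
# asking for at most 10 results per page.
url = build_fhir_search_url(
    "Condition",
    {"patient": "example-patient-id", "_count": 10},
)
print(url)
```

The resource type (Condition) and the search parameters (patient, _count) come from the FHIR standard itself, which is what allows the same kind of query to work across systems that follow it.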
In our opinion, Amazon HealthLake can change the way health data are stored, processed, analyzed, and understood. By breaking silos and storing data in the cloud safely, Amazon HealthLake provides a much-needed solution.
Some will argue that the weakness of data lakes lies in their potential lack of strong security. That is true, but Amazon HealthLake is HIPAA eligible, which is already an excellent first move. Technologies such as blockchain are increasingly used in healthcare and may be a way to ensure privacy.
How can HealthLake be used?
According to the Amazon HealthLake website, use cases range from population health management and trend identification to improving the quality of care by predicting disease onset earlier, predicting treatment outcomes for specific groups of patients, optimizing hospital efficiency, and even reducing healthcare costs.
Uploading data to Amazon HealthLake is free, and storage costs start at $0.27 per hour, including 10 GB of data and 3,500 queries per hour. Each additional GB of data is charged at $0.25 per month. Further queries, the use of NLP, and FHIR data export are charged additionally.
For a hospital storing 1 TB of data, running 13,500 queries per hour, and analyzing 5M characters with NLP, the total monthly cost will be in the range of $500. It may seem expensive, but considering the benefits for patients, the smooth operation of the hospital, and the security of the information, this new way to store and analyze medical information will most likely look very appealing.
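As a rough check of that estimate, the part of the bill that follows from the prices quoted above can be worked out in a few lines. This is only a sketch under our own assumptions: a month is counted as 730 hours, 1 TB is counted as 1,000 GB, and the charges for extra queries and NLP are left out because their rates are not given in this article.

```python
# Prices quoted in the article.
BASE_RATE_PER_HOUR = 0.27      # USD per hour, includes 10 GB and 3,500 queries/hour
INCLUDED_GB = 10               # storage included in the hourly rate
EXTRA_GB_PER_MONTH = 0.25      # USD per additional GB per month

# Assumption (not from the article): average number of hours in a month.
HOURS_PER_MONTH = 730


def monthly_base_cost(stored_gb):
    """Hourly data store charge plus extra storage, excluding extra queries and NLP."""
    datastore = BASE_RATE_PER_HOUR * HOURS_PER_MONTH
    extra_gb = max(0, stored_gb - INCLUDED_GB)
    storage = extra_gb * EXTRA_GB_PER_MONTH
    return round(datastore + storage, 2)


# Assumption: 1 TB counted as 1,000 GB.
estimate = monthly_base_cost(1000)
print(estimate)
```

Under these assumptions the base charge comes to about $445 per month, which leaves the rest of the roughly $500 total for the additional queries and the NLP analysis.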
In a Nutshell
With the exponential growth in the quantity of health-related data to be stored and processed, Amazon HealthLake is well placed to shape the future of health data. By leveraging the power of machine learning, previously unseen correlations can be found and better treatments provided.
In a world where personalized medicine is the future, data lakes will likely become a ubiquitous way to process the flow of information, provide better treatments, and help decrease healthcare costs.
Finally, HealthLake could be a much-needed solution for startups that develop health apps and are looking for a secure way to store the data they collect in line with mobile health laws.