Analysing Health Data: The Promises, the Threats and the EU

by Dimitrios Gontzes

An article on the use of Artificial Intelligence on patients’ health records was recently circulated around the office. “That’s great news,” I thought, “at last someone is tapping into this vast dataset for the good of the people”. Not a minute had gone by before I received an office email in reply to the article: “I think this is a very bad thing and destroys trust”. I gasped…

Before I knew it, I was debating with myself: on one hand, I thought, analysing health data can be very rewarding for patients and health organisations, but on the other, there are threats in sharing health data, some hidden and some more obvious. To better understand my own opinion on the subject, I began researching the benefits and the risks of health data mining. My findings shocked me.

The Benefits of Data Analysis in Healthcare

Firstly, I wanted to understand how people defined data mining. Data mining is the process of selecting and exploring data and building models from vast data stores to uncover previously unknown patterns. In healthcare, vast amounts of data are generated quite rapidly. Because of its sheer volume, the data is often too difficult to analyse by traditional means. As a result, data mining has become a popular and necessary method of extracting useful information. This information can shed light on various fields, including treatment effectiveness, management of health delivery and customer experience.

Undeniably, the most important question that we have as patients is “Is this treatment effective?” By using data on causes, symptoms and delivery of different treatments, it is possible to compare and analyse options in order to show what is most effective. This analysis can be supplemented with data on financial costs to classify and predict the cost-effectiveness of a specific treatment strategy. To this end, decision trees are a very popular tool for treatment management, as they guide clinicians through complex strategies and visualise alternative pathways in great detail.
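To make the idea of comparing treatment strategies concrete, here is a minimal Python sketch of a one-level decision tree, in which each strategy has a few possible outcomes with a probability, a cost and a health benefit. Everything in it is an assumption made for illustration: the strategy names, the numbers, the generic "benefit" unit and the helper functions are hypothetical, not taken from any real study or tool.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    probability: float  # chance that this branch of the tree occurs
    cost: float         # total cost if this branch occurs (hypothetical units)
    benefit: float      # health benefit if this branch occurs (hypothetical units)


def expected(outcomes):
    """Return (expected cost, expected benefit) for one treatment strategy."""
    total_probability = sum(o.probability for o in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    cost = sum(o.probability * o.cost for o in outcomes)
    benefit = sum(o.probability * o.benefit for o in outcomes)
    return cost, benefit


def cost_per_benefit(outcomes):
    """Expected cost divided by expected benefit (lower is better)."""
    cost, benefit = expected(outcomes)
    return cost / benefit


# Invented figures: two strategies, each with a "works" and a "fails" branch.
strategies = {
    "treatment_a": [Outcome(0.8, 1000.0, 2.0), Outcome(0.2, 3000.0, 0.5)],
    "treatment_b": [Outcome(0.5, 500.0, 2.0), Outcome(0.5, 2500.0, 1.0)],
}

# Pick the strategy with the lowest expected cost per unit of benefit.
best = min(strategies, key=lambda name: cost_per_benefit(strategies[name]))
print(best)
```

With these made-up numbers the first strategy comes out ahead, even though its successful branch is more expensive, because it succeeds more often. A real analysis would use many more branches and properly validated data.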

Management of health delivery is another important area for data analysis. Health data is collected throughout the life span of a medical condition, from prevention to diagnosis, treatment and chronic management. Healthcare organisations benefit from putting these pieces together to gain insight into the patient journey. For example, organisations can monitor hospital admissions, re-admissions and resource utilisation and compare these data with scientific literature to identify best practices and avoid frequent pitfalls. Moreover, data mining can have a great impact on reducing health inequalities. By studying specific population groups and analysing the way they access and interact with health care, services can be designed to be more impactful through targeted interventions. Taking this further, health care services can build predictive models that identify potential health risks before actual symptoms appear, and design cost-effective and value-driven strategies to combat or avoid the risk altogether. By shifting the management of medical conditions to primary care, specialists’ costs would be avoided.
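As a small illustration of monitoring re-admissions, the sketch below computes the share of hospital stays that were followed by another admission of the same patient within a set number of days. The records, the 30-day window and the function name are all assumptions chosen for the example, not a standard taken from the sources I read.

```python
from datetime import date

# Hypothetical records: (patient id, admission date, discharge date).
admissions = [
    ("p1", date(2023, 1, 1), date(2023, 1, 5)),
    ("p1", date(2023, 1, 20), date(2023, 1, 22)),
    ("p2", date(2023, 2, 1), date(2023, 2, 3)),
    ("p3", date(2023, 3, 1), date(2023, 3, 4)),
    ("p3", date(2023, 6, 1), date(2023, 6, 2)),
]


def readmission_rate(records, window_days=30):
    """Fraction of stays followed by a re-admission within window_days."""
    # Group stays by patient, ordered by admission date.
    stays_by_patient = {}
    for patient, admitted, discharged in sorted(records, key=lambda r: (r[0], r[1])):
        stays_by_patient.setdefault(patient, []).append((admitted, discharged))

    total = 0
    readmitted = 0
    for stays in stays_by_patient.values():
        for i, (_, discharged) in enumerate(stays):
            total += 1
            if i + 1 < len(stays):
                next_admitted = stays[i + 1][0]
                gap = (next_admitted - discharged).days
                if 0 <= gap <= window_days:
                    readmitted += 1
    return readmitted / total if total else 0.0


rate = readmission_rate(admissions)
print(f"{rate:.0%}")
```

In this invented dataset only one of the five stays is followed by a quick return to hospital. An organisation could track a figure like this over time or across sites and compare it with published benchmarks.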

Pattern recognition also plays an important role in improving customer service. By collecting and studying practice patterns and outcomes based on patients’ experiences, carers would be able to quickly determine what works, what doesn’t and what needs to be improved.

The Threats: the Known and the Unknown

Now that I better understood the potential of mining health data, I concentrated on identifying the possible risks. As I was researching health data security, I noticed that articles were vague on the consequences of exposed health data. Undeniably, health data (like all data) can be stolen, but what is the damage of a health data security breach? What can criminals, or organisations in general, do with our health data?

There seem to be two different kinds of health data breaches. The first is an individual attack that gains access to private information, and the second is a systemic leakage or loss of health data.

The first kind of breach, commonly referred to as “hacking”, is the situation where classified data is retrieved by an unauthorised “agent” and subsequently published or sold. Identity theft is the most dangerous threat here, especially in countries where healthcare is not freely available. By stealing patient data that have been meticulously collected by healthcare organisations, thieves can pretend that they are you, receive the health care services they need and send you the bill. According to IBM Security, the Dark Web is full of illegal websites where you can purchase such data. There is also the threat of ransomware attacks. Recently, two US hospitals announced that malicious software had locked users and patients out of important data. Later on, the attackers demanded a payoff to restore the system. Another area of concern is health insurance and the practice of “health condition discrimination.” If sensitive health information gets leaked to insurance companies, individuals may be asked to pay higher premiums or even be denied coverage.

The idea that individual attacks on health data are made only by highly trained and tech-savvy hackers is wrong. Anyone who can access sensitive information and then violate the individual’s privacy rights could be considered a hacker. Employers, family members, friends and partners usually have access to one’s private information. Sharing or otherwise using these data can lead to major issues like health condition discrimination, personal annoyance, shame and slander. If such sensitive data gets shared with a wider audience (via journalism or social media), the consequences can be truly devastating for the victim, damaging their reputation and social status.

The second category of threats is the systemic leakage of health data. By the term “systemic” I mean a constant breach of health data privacy that in some cases happens with the knowledge of the victims/patients. As I mentioned at the beginning of the article, health data is voluminous and as a result it might be “owned” by a number of entities (including primary health care, local and specialised hospitals, health insurers, life insurers, clinical laboratories, pharmacies, employers, national statistics, medical researchers, health apps etc.). These entities might use one’s health data in ways that do not serve the patient’s interests. For example, it is not unusual for some organisations to sell or transfer purchasing data to other companies for commercial purposes. A pharmacist might have a partnership with a pharmaceutical company to help them promote products. While such commercial manoeuvres are not life-threatening, they still use very personal and often life-critical information. For example, with the continuous stream of leaked health data, it is easy for an identity thief as well as marketing organisations to create a patient profile that is profoundly accurate. Beyond the annoyance of targeted marketing, this extremely personal information could be used for nefarious purposes and, as with the two US hospitals, in cyber extortion.

EU – General Data Protection Regulation

At this point, I felt very confused. I personally want to share my health data and get all the benefits from it, but on the other hand I don’t want to put myself in jeopardy by having my data misinterpreted or misused. What I really want is to have more control over who uses my data, and to give permission only to trusted organisations that make me feel confident that my information is securely stored.

The European Union has made a great leap forward by introducing the General Data Protection Regulation (GDPR). Firstly, it explicitly states that we all have rights when it comes to personal data. These include access to the data, the ability and means to have inaccuracies corrected, the option to have information erased, and protection from direct marketing. The GDPR makes it a legal requirement for organisations to have procedures that ensure these rights are covered. Organisations are responsible for documenting what personal data they hold, where it came from and with whom they are allowed to share it. Individuals can always request to see this log, and organisations must respond to their requests within strict timescales.

From a data analysis perspective, the GDPR dictates that organisations confirm a legal basis for collecting and processing personal data, explicitly explaining their intent. This must be communicated to the patients/clients in clear and easy-to-understand language, so gone are the days of the long legal text in tiny fonts. The GDPR also states that data controllers must be able to demonstrate at any time that consent was given, and that consent cannot be inferred from silence or pre-ticked boxes. Instead, consent has to be a “positive indication of agreement to personal data being processed”.
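The consent conditions described above can be pictured as a simple check that an organisation might run before processing someone's data. The sketch below is only an illustration of those conditions: the ConsentRecord fields and the is_valid_consent function are hypothetical names I made up, and the logic is a simplification rather than a complete reading of the regulation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ConsentRecord:
    purpose: str                     # plain-language statement of intent shown to the person
    affirmative_action: bool         # the person actively agreed (e.g. ticked the box themselves)
    box_pre_ticked: bool             # the form arrived with the box already ticked
    recorded_at: Optional[datetime]  # evidence of when consent was captured, if any


def is_valid_consent(record: ConsentRecord) -> bool:
    """Return True only if consent was explained, actively given and evidenced."""
    if not record.purpose.strip():
        return False  # no stated purpose: intent was not explained
    if record.box_pre_ticked:
        return False  # pre-ticked boxes do not count
    if not record.affirmative_action:
        return False  # silence or inaction does not count
    return record.recorded_at is not None  # the controller must be able to demonstrate it


good = ConsentRecord("Use my records for readmission research", True, False, datetime(2018, 5, 25))
silent = ConsentRecord("Use my records for readmission research", False, False, None)
print(is_valid_consent(good), is_valid_consent(silent))
```

The point of keeping a timestamp in the record is that the organisation, not the patient, carries the burden of showing that consent was given.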

The GDPR also tightens the regulations regarding data security by introducing mandatory investigation procedures and reports, with possible fines of up to 4% of annual worldwide turnover for non-compliance. It has also introduced a Data Protection Impact Assessment that urges organisations to review their risk management processes and data security policies. Finally, it requires organisations to designate a Data Protection Officer who will be responsible for data protection compliance and data governance.


There are great benefits from using sophisticated analyses on health data. This has the potential to help patients lead healthier lives and health organisations manage their services better, but there are also threats from cyber-attacks and leaked data. As organisations realise the power of health data analytics and cyber criminals capitalise on their attacks, health data security breaches will become more frequent and destructive. The GDPR is moving in the right direction by strengthening individual rights and tightening data governance rules, but in the end, the choice of whether to share your health data is yours. Personally, I would not share my health data unless organisations convince me that the benefits of doing so will outweigh the risks.



Dimitrios Gontzes is a Consultant at Optimity Advisors in London. His academic background in Engineering (MEng and MSc) and work experience with a number of NHS organisations across the UK have provided him with a comprehensive knowledge of Life Sciences and Data Systems. Dimitrios specialises in Information Management Solutions and enjoys using sophisticated data analytics to deliver value with a competitive edge.