How Security can be a Catalyst for Big Data Analytics

The use cases for big data analytics are nearly endless: from improving how companies market to customers to stopping attempted financial fraud, and from tailoring cancer treatment for better outcomes to reducing the number of traffic accidents.

Not only are the uses and benefits expanding, the environments where big data is collected and analysed are expanding too. In fact, big data is literally everywhere today – on-premises, in the cloud, streaming from sensors and devices, and moving across the internet. Increasingly, some of that data, including sensitive personal information, is ending up in the hands of cybercriminals who sell it to other bad actors for use in myriad malicious activities.

Enterprises need to strike a balance between making data available for data science on the one hand and meeting compliance and risk-reduction requirements on the other.

“So how can data security become a catalyst for analytics? Isn’t it just a hurdle?”

The real hurdles for data analysis are the myriad threats to sensitive data, both known and unknown, coupled with the potential for fines for non-compliance with data privacy laws and standards such as the EU’s GDPR, the UK’s Data Protection Act, PCI DSS, Australia’s Notifiable Data Breaches (NDB) scheme and Canada’s PIPEDA. The list goes on and on. And it seems that almost every dataset you want to analyse contains some form of sensitive data that is protected under one regulation or another. This is where the conflict between data security and data analysis arises.

The fact is that sensitive data has to be protected. Now, there are many different methods to do so, but some are more elegant than others.

Classic Data Protection

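Most classic protection mechanisms, such as encryption, are what give data security its reputation as a hurdle for data analytics, because encrypted data typically can’t be processed in its protected form. The ciphertext usually does not preserve the format of the original value, and referential integrity is often lost, so the same value can no longer be matched reliably across records and datasets.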
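To make the problem concrete, here is a minimal Python sketch. It does not use a real cipher: toy_encrypt is a hypothetical, deliberately simplified stand-in for a randomised encryption scheme, and the key and card number are made-up test values. It only illustrates why randomised ciphertext is awkward for analytics: the same input produces different output each time, and the output no longer looks like the original value.

```python
import hashlib
import os

KEY = b"demo-key-not-for-production"  # hypothetical key, illustration only


def toy_encrypt(plaintext: str) -> bytes:
    """Toy randomised cipher: fresh nonce per call, XOR with a SHA-256 keystream.

    NOT secure and NOT a real algorithm; it only mimics the behaviour of
    randomised encryption for inputs of up to 32 bytes.
    """
    data = plaintext.encode("utf-8")
    nonce = os.urandom(16)
    keystream = hashlib.sha256(KEY + nonce).digest()
    if len(data) > len(keystream):
        raise ValueError("toy cipher only handles inputs up to 32 bytes")
    body = bytes(b ^ k for b, k in zip(data, keystream))
    return nonce + body


def toy_decrypt(ciphertext: bytes) -> str:
    """Reverse toy_encrypt using the nonce stored in the first 16 bytes."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    keystream = hashlib.sha256(KEY + nonce).digest()
    return bytes(b ^ k for b, k in zip(body, keystream)).decode("utf-8")


card = "4111111111111111"  # well-known test card number, not real data
first = toy_encrypt(card)
second = toy_encrypt(card)

# The same value encrypts to different ciphertexts, so joins and
# group-bys on the protected column no longer line up.
same_ciphertext = first == second

# The ciphertext is 32 raw bytes, not a 16-digit string, so it does not
# fit a column or application that expects a card-number format.
fits_card_column = len(first) == len(card) and first.isdigit()
```

Both ciphertexts decrypt back to the same card number, but in their protected form they cannot be compared, grouped or stored in the original field format.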

Furthermore, big data environments present a unique challenge as sensitive data has to be protected while in use for analytics, in motion between on-premises data stores and the cloud, and while it’s at rest. Classic security solutions fail to keep the data protected throughout every stage of this complex lifecycle. With no guarantee that the data is protected, it would seem that the only choice would be to bar access to the data entirely.

Fortunately, there are other, more modern choices.

Data-centric security solutions, such as tokenisation, can be the catalyst for big data analytics because they eliminate the risks and disadvantages of typical security solutions.

With tokenised data, it is possible to pseudonymise datasets and then run analytics on them while they’re still in a protected state. Instead of focusing on protecting the perimeter, network, endpoints or applications, data-centric security prioritises datasets to protect the data itself. This ensures that the data is protected throughout the entire data lifecycle, going wherever the data goes to provide strong protection without affecting usability.
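As a rough illustration of analytics on tokenised data, the Python sketch below replaces customer e-mail addresses with deterministic tokens and then aggregates spending per customer without ever touching the real addresses. Everything here is an assumption made for the example: the tokenise function uses a keyed hash (HMAC-SHA256) as a simple stand-in for a tokenisation service, and the key, field names and records are invented. Commercial tokenisation products work differently and can also return the original value to authorised users, which this sketch cannot.

```python
import hashlib
import hmac
from collections import defaultdict

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key, illustration only


def tokenise(value: str) -> str:
    """Deterministic pseudonym: the same input always yields the same token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


# Invented sample records containing a sensitive identifier.
transactions = [
    {"customer_email": "alice@example.com", "amount": 120.0},
    {"customer_email": "bob@example.com", "amount": 35.5},
    {"customer_email": "alice@example.com", "amount": 80.0},
]

# Pseudonymise the dataset before handing it to analysts.
protected = [
    {"customer": tokenise(t["customer_email"]), "amount": t["amount"]}
    for t in transactions
]

# Analytics runs on the protected data: identical customers share a token,
# so grouping still works although no e-mail address is visible.
spend_per_customer = defaultdict(float)
for row in protected:
    spend_per_customer[row["customer"]] += row["amount"]
```

Because the tokens are consistent, the aggregation finds two distinct customers and the correct totals, which is the referential integrity that randomised ciphertext lacks.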

It also protects the individual data elements wherever possible. That means if a dataset contains a mix of sensitive personal information and other data that is neither sensitive nor regulated, a data-centric security strategy protects the sensitive elements individually rather than locking away the dataset as a whole.
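Element-level protection could look something like the following Python sketch. It is only an illustration under stated assumptions: the list of sensitive fields, the helper names (digit_token, text_token, protect_record) and the sample record are hypothetical, and the digit token is a simple keyed-hash mapping, not a standardised format-preserving algorithm. It is not reversible and can, in principle, map two different inputs to the same token.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key, illustration only
SENSITIVE_FIELDS = {"name", "card_number"}  # hypothetical data classification


def _digest(value: str) -> str:
    """Keyed hash shared by both token helpers."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def digit_token(value: str) -> str:
    """Map a digit string to a token with the same length, digits only."""
    number = int(_digest(value), 16) % 10 ** len(value)
    return str(number).zfill(len(value))


def text_token(value: str) -> str:
    """Map free text to an opaque token."""
    return "tok_" + _digest(value)[:12]


def protect_record(record: dict) -> dict:
    """Tokenise only the sensitive elements and pass the rest through."""
    protected = {}
    for field, value in record.items():
        if field not in SENSITIVE_FIELDS:
            protected[field] = value
        elif isinstance(value, str) and value.isdigit():
            protected[field] = digit_token(value)
        else:
            protected[field] = text_token(str(value))
    return protected


record = {
    "name": "Alice Example",
    "card_number": "4111111111111111",  # well-known test card number
    "country": "DE",
    "amount": 59.9,
}
protected = protect_record(record)
```

In this sketch the country and amount stay readable for analysis, while the name becomes an opaque token and the card number becomes a different 16-digit string that still fits the original field format.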

In a nutshell, the right form of data security, namely data-centric security, can give analysts access to the valuable datasets they seek while allowing risk and security managers to rest assured that the data will be protected throughout its lifecycle.

Published by comforte AG