Published on July 2nd, 2019 | by Ava Barker

How to Secure Your Big Data with Access Controls in the Cloud

Businesses are always looking for ways to turn their potential into results, and big data, with its wide range of applications, can help. Use cases range from customer targeting to fraud analytics, anomaly detection, and much more. The data is generated rapidly from varied sources such as users' browsers, search histories, credit card payments, and mobile phones pinging the nearest network tower.

However, any unauthorized or accidental disclosure of sensitive information can have severe consequences for your enterprise. It can lead to financial losses as well as less tangible ones, such as the loss of brand recognition and user trust. In recent years, several highly scalable and complex processing frameworks have emerged for big data, such as Hadoop, Hive, Spark, and Presto. Keeping these frameworks secure is challenging because of their distributed nature, which involves many touch points, operational processes, and services. As cloud adoption accelerates, monitoring access and data flows for big data becomes even more complicated.

Why Does Access Control Matter?

Most organizations try to secure their data through encryption and perimeter controls, without a comprehensive and granular data-access control strategy. Such a strategy becomes crucial when many employees with different levels of authority and responsibility run different jobs on the platform. For instance, the marketing team may need access to your financial data in order to prepare financial projections.

Coarse-grained permissions that give users "all or nothing" access are no longer sufficient when hundreds or thousands of employees need to access the data for different purposes. Instead, you need scalable, fine-grained, and consistent controls that prevent unnecessary access to sensitive information at every stage of processing.

Further, growing public concern and global compliance requirements make implementing a solution all the more urgent. In this article, we explore the challenges and nuances of administering granular data access for cloud-based big data.

Leveraging Granular Access Controls

In addition to authentication, an adequate access control strategy can be very effective at reducing the attack surface for the most common types of data attacks, such as phishing and social engineering. According to the Cloud Security Alliance, a comprehensive granular access control strategy includes the following elements:

Normalize mutable elements and denormalize immutable elements
Track secrecy requirements and ensure proper implementation
Maintain access labels
Track administrator data
Use SSO (Single Sign-On)
Use a labeling scheme to maintain proper data federation

Generally, big data vendors support all of the features listed above on their platforms. However, the options and configurations will vary according to the cloud service model and the capabilities of the specific service provider.

In short, try to select vendors that strive to deliver a data-access control solution that is machine- and cloud-agnostic and that applies access controls, from data ingestion through to data access, at a minimum of three levels: the infrastructure level, the platform level, and finally the data level.

What happens at the Infrastructure level?

Access controls can be put in place at the infrastructure level to limit access to cloud resources and services. For example, enterprise system administrators in AWS can use IAM roles to restrict access to AWS resources such as S3 storage and EC2 instances. For added granularity, enterprises can create two IAM roles: one acts as a cross-account IAM role with access to the AWS account, and the other has access restricted to the data in AWS S3 buckets.
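As a rough sketch of the two-role setup described above, the roles could be expressed as standard IAM policy documents built in Python. The account ID, bucket name, prefix, and function names are hypothetical placeholders for illustration; only the JSON layout follows the documented IAM policy grammar.

```python
import json

# Hypothetical placeholders; substitute your own values.
TRUSTED_ACCOUNT_ID = "111122223333"
DATA_BUCKET = "example-analytics-data"


def cross_account_trust_policy(account_id: str) -> dict:
    """Trust policy letting principals in another AWS account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def bucket_read_policy(bucket: str, prefix: str = "") -> dict:
    """Permissions policy: list one bucket and read objects under one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


trust = cross_account_trust_policy(TRUSTED_ACCOUNT_ID)
data_access = bucket_read_policy(DATA_BUCKET, prefix="marketing/")
print(json.dumps(data_access, indent=2))
```

The first document would be attached as the trust policy of the cross-account role, and the second as the permissions policy of the data role. Because nothing outside the named bucket is allowed, a job running under the data role cannot read other buckets.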

In Microsoft Azure, system administrators can restrict cloud access by using Azure Active Directory and roles under Azure RBAC. They can register an application under Active Directory and assign it either the built-in Contributor role or a custom role that further limits access to compute resources.
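To make the custom-role option more concrete, here is a minimal sketch of an Azure custom role definition built as a Python dictionary. The role name, description, and subscription ID are hypothetical, and the chosen actions are only one plausible way to narrow access compared with Contributor.

```python
import json

# Hypothetical placeholder; a real definition needs your subscription ID.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"


def vm_operator_role(subscription_id: str) -> dict:
    """Custom role that can view, start, and restart VMs, but nothing else."""
    return {
        "Name": "Example VM Operator",  # hypothetical role name
        "IsCustom": True,
        "Description": "View, start, and restart virtual machines.",
        "Actions": [
            "Microsoft.Compute/virtualMachines/read",
            "Microsoft.Compute/virtualMachines/start/action",
            "Microsoft.Compute/virtualMachines/restart/action",
        ],
        "NotActions": [],
        "AssignableScopes": [f"/subscriptions/{subscription_id}"],
    }


role = vm_operator_role(SUBSCRIPTION_ID)
print(json.dumps(role, indent=2))
```

Unlike Contributor, a role like this lists no create or delete actions, so an application assigned to it could not provision new compute resources.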

Similarly, in Oracle Cloud Infrastructure, system administrators can limit access to storage and compute resources by using users and policies under Oracle Cloud Infrastructure IAM. They can write broad policies that allow access to all resources in a compartment, or define more restrictive policies that limit access to a few specific resources.
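The difference between a broad and a restrictive policy can be illustrated with a small helper that assembles Oracle Cloud Infrastructure policy statements as text. The helper function, group names, compartment name, and bucket name are all hypothetical; only the general statement shape ("Allow group ... to ... in compartment ...") is taken from the OCI policy language.

```python
ALLOWED_VERBS = ("inspect", "read", "use", "manage")


def oci_policy(group: str, verb: str, resource: str,
               compartment: str, condition: str = "") -> str:
    """Build one OCI policy statement as a string (illustrative helper)."""
    if verb not in ALLOWED_VERBS:
        raise ValueError(f"unknown verb: {verb!r}")
    statement = (
        f"Allow group {group} to {verb} {resource} "
        f"in compartment {compartment}"
    )
    if condition:
        statement += f" where {condition}"
    return statement


# Broad: full control over everything in the compartment.
broad = oci_policy("DataAdmins", "manage", "all-resources", "Analytics")

# Restrictive: read-only access to objects in a single bucket.
narrow = oci_policy(
    "ReportReaders", "read", "objects", "Analytics",
    condition="target.bucket.name='reports'",
)

print(broad)
print(narrow)
```

The restrictive statement narrows three things at once: the verb (read rather than manage), the resource type (objects rather than all resources), and the target (one bucket rather than the whole compartment).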

What is inside the Platform Level?

At the big data platform level, system administrators can use built-in Role-Based Access Control (RBAC) capabilities to restrict user access to specific platform artifacts such as clusters, notebooks, and dashboards. They can either use predefined roles or create custom roles that provide granular permissions based on business function, granting users varying degrees of access and permissible actions on the platform.

Hence, RBAC can help protect privacy while improving operational efficiency and reducing the administrative burden of policy management. With RBAC, you also gain better financial governance: administrators can reduce costs and allocate more compute resources to high-value users without disrupting others.
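The role model described in this section could be sketched in a few lines of Python. This is a simplified illustration and not any vendor's actual implementation; the role names, artifact types, and actions are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    name: str
    # Set of (artifact_type, action) pairs this role permits.
    permissions: frozenset


@dataclass
class User:
    name: str
    roles: list = field(default_factory=list)


# Hypothetical roles built around business functions.
ANALYST = Role("analyst", frozenset({
    ("notebook", "read"), ("notebook", "run"), ("dashboard", "read"),
}))
CLUSTER_ADMIN = Role("cluster_admin", frozenset({
    ("cluster", "read"), ("cluster", "start"), ("cluster", "terminate"),
}))


def is_allowed(user: User, artifact_type: str, action: str) -> bool:
    """A user may act only if at least one of their roles grants the pair."""
    return any((artifact_type, action) in role.permissions
               for role in user.roles)


alice = User("alice", [ANALYST])
bob = User("bob", [ANALYST, CLUSTER_ADMIN])

print(is_allowed(alice, "notebook", "run"))       # True
print(is_allowed(alice, "cluster", "terminate"))  # False
print(is_allowed(bob, "cluster", "terminate"))    # True
```

Permissions are attached to roles rather than to individual users, so changing what analysts may do means editing one role instead of every analyst's account. That is where the reduced administrative burden comes from.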

The Data Level

With growing public concern and stiff legal penalties, data-level access controls are crucial for enterprises that process personally identifiable information, protected health data, or any other sensitive data. Big data vendors usually provide granular access controls through Hive authorization, implemented across use cases and engines such as Spark, Hive, and Presto.

You can also extend access control to row filtering and to column-level access control with data masking. The ultimate goal is to offer customers controls granular enough to achieve sound data access governance.
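Row filtering and column masking can be illustrated without reference to any particular engine. The sketch below is plain Python, not Hive authorization syntax; the function names, the masking rule, and the sample records are invented for the example.

```python
def mask_keep_last4(value: str) -> str:
    """Replace all but the last four characters with asterisks."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def apply_policy(rows, row_filter, column_masks):
    """Return only rows passing row_filter, with masked columns rewritten."""
    result = []
    for row in rows:
        if not row_filter(row):
            continue  # row-level control: hide the whole record
        masked = {
            col: column_masks[col](val) if col in column_masks else val
            for col, val in row.items()
        }
        result.append(masked)
    return result


# Made-up sample records.
patients = [
    {"name": "A. Jones", "region": "UK", "national_id": "123456789"},
    {"name": "B. Smith", "region": "US", "national_id": "987654321"},
]

# Hypothetical policy: a UK analyst sees UK rows only, with IDs masked.
visible = apply_policy(
    patients,
    row_filter=lambda r: r["region"] == "UK",
    column_masks={"national_id": mask_keep_last4},
)
print(visible)
# [{'name': 'A. Jones', 'region': 'UK', 'national_id': '*****6789'}]
```

In a real deployment the filter and masks would be enforced by the query engine rather than by application code, so that users cannot bypass them by querying the table directly.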

What can we expect in the future?

As concerns about data privacy and protection escalate, big data vendors are trying to equip enterprises with multi-layer access control solutions that are more data-engine and cloud-agnostic. Data access control therefore plays a vital role in cloud infrastructure. Keep Learning!

Tags: big data, cloud, cloud data, data, privacy

About the Author

Ava Barker works as a Technology Consultant at Tatvasoft UK, a big data company in the UK. Coming from a technology background, she likes to share her insights about development, design, and more. She has also published bylined articles in many online publications.