Big data technologies deliver significant business value today, but information security risks remain one of the main limitations of such projects. Moreover, there is still no generally accepted, unified protection concept.
Too little attention is paid to the security of big data systems. Security must be considered from the very start of a big data project; otherwise, enterprises will gain additional business risks instead of business opportunities.
Big data technologies have become highly valuable to business. Over the past few years, companies have been racing to launch projects and master new methods for extracting the most valuable information from the data sets available to them. Higher sales, lower costs, reduced risks, and improved operational efficiency are just a few of the results achieved by applying big data processing to business problems.
Big data processing technologies are used in many industries, including telecommunications, finance, retail, healthcare, and information technology. At the same time, analysts name information security risks as one of the most significant limitations of big data projects.
Security in big data projects is not only a matter of information availability. The source data for analysis typically contains business-sensitive information such as trade secrets and personal data. A breach of the confidentiality of such data can lead to serious consequences, including regulatory fines, customer churn, and loss of market capitalization.
Another major challenge for big data projects is ensuring the integrity of both the data being analyzed and the results of its processing, which have commercial value.
Current approaches to securing big data technologies usually rely on disparate measures rather than a unified protection concept. There are still no clearly defined methods that set out systematic steps for protecting big data, whether structured or unstructured, given the technological specifics of how it is collected, aggregated, stored, and analyzed. Data protection is needed at every stage of processing, from collection and transfer to analysis and storage.
The “Security and Privacy” section covers various aspects of information security, gives examples of projects in different industries along with their shortcomings, classifies the main areas of protection, and describes roles and operations. The Security and Privacy Fabric is responsible for security and privacy across all major components of the architecture.
Interface between data providers and application providers. A distinctive feature of big data systems is that they import and use a wide variety of data from many internal and external sources, so all incoming data must be checked in real time for integrity and for signs of malicious content.
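The real-time check described above can be sketched in Python. This is a minimal sketch under stated assumptions: each source shares a secret key with the platform and attaches an HMAC-SHA256 tag to every record, and the search for "signs of malicious content" is reduced to a strict schema check. The names `SOURCE_KEYS`, `EXPECTED_FIELDS`, `sign_record`, and `accept_record` are hypothetical and do not belong to any particular big data product.

```python
import hashlib
import hmac
import json

# Hypothetical per-source shared secrets; in practice these would come from a key manager.
SOURCE_KEYS = {"crm": b"crm-secret-key", "billing": b"billing-secret-key"}

# Hypothetical expected schema for incoming records: field name -> required type.
EXPECTED_FIELDS = {"id": int, "amount": float, "currency": str}


def sign_record(source: str, record: dict) -> str:
    """Compute an HMAC-SHA256 tag over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(SOURCE_KEYS[source], payload, hashlib.sha256).hexdigest()


def accept_record(source: str, record: dict, tag: str) -> bool:
    """Accept a record only if its source is known, its tag verifies,
    and its fields match the expected schema exactly."""
    if source not in SOURCE_KEYS:
        return False  # unknown source
    if not hmac.compare_digest(sign_record(source, record), tag):
        return False  # integrity check failed: record altered or signed with another key
    if set(record) != set(EXPECTED_FIELDS):
        return False  # unexpected or missing fields
    return all(isinstance(record[k], t) for k, t in EXPECTED_FIELDS.items())


record = {"id": 1, "amount": 9.99, "currency": "EUR"}
tag = sign_record("crm", record)
print(accept_record("crm", record, tag))                       # True
print(accept_record("crm", dict(record, amount=999.0), tag))   # False (tampered)
```

Canonical JSON encoding (sorted keys, fixed separators) makes the tag independent of field order, and `hmac.compare_digest` avoids leaking timing information during verification.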
Interface between the application provider and data consumers. Consumers in big data systems are end users or other systems that search, analyze, visualize, or otherwise operate on the data. All consumer access interfaces must be protected and must preserve confidentiality in accordance with legal requirements, including the rules governing access to sensitive data by authorities.
Interface between the application provider and the big data platform. Big data platforms usually have a complex, multi-layered structure and often combine different technological approaches to data storage and processing. It is essential to enforce access control at this interface so that data is accessed only in accordance with the access control rules. Data can also be encrypted when it is stored and retrieved.
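A default-deny access check at the platform interface might look like the sketch below. It assumes a simple role-based model in which rules map a (role, dataset) pair to permitted operations; `ACCESS_RULES`, `authorize`, and `platform_read` are hypothetical names, and a real platform would delegate this decision to its own authorization service.

```python
from dataclasses import dataclass

# Hypothetical access rules: (role, dataset) -> permitted operations.
ACCESS_RULES = {
    ("analyst", "sales"): {"read"},
    ("etl_service", "sales"): {"read", "write"},
    ("auditor", "access_log"): {"read"},
}


@dataclass(frozen=True)
class Request:
    user: str
    role: str
    dataset: str
    operation: str


class AccessDenied(Exception):
    pass


def authorize(req: Request) -> None:
    """Raise AccessDenied unless a rule explicitly allows the request (default deny)."""
    allowed = ACCESS_RULES.get((req.role, req.dataset), set())
    if req.operation not in allowed:
        raise AccessDenied(f"{req.user} ({req.role}) may not {req.operation} {req.dataset}")


def platform_read(user: str, role: str, dataset: str, storage: dict) -> list:
    """Hypothetical platform entry point: check access before touching storage."""
    authorize(Request(user, role, dataset, "read"))
    return storage[dataset]


storage = {"sales": [100, 250], "access_log": ["login alice"]}
print(platform_read("alice", "analyst", "sales", storage))  # [100, 250]
```

Any (role, dataset) pair without a rule is denied, so a new dataset is closed until someone grants access explicitly.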
Data protection in the internal interactions between the technologies of a big data platform. A big data platform typically consists of an infrastructure platform, a platform for storing structured and unstructured data, and a data processing platform.
Protecting a big data platform is therefore labor-intensive: processing in distributed software systems must be secured, information in databases managed by different DBMSs must be protected, data and transaction logs must be safeguarded, and key management must be in place to support access control and key tracking.
In addition, to maintain the proper security context for data at each stage, the legitimacy of the data's origin must be guaranteed, and countermeasures against denial-of-service (DoS) attacks must be provided to keep the data available.
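One common countermeasure against application-level request floods is per-client rate limiting. The sketch below uses a token bucket; it is an illustration only, assumes each client can be identified by a `client_id`, and does not address volumetric network attacks, which are usually mitigated upstream of the application. `TokenBucket` and `PerClientLimiter` are hypothetical names, and the clock is injectable so the behavior can be tested deterministically.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """Keep one bucket per client so a single noisy client cannot exhaust the service."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets = {}

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

A request handler would call `limiter.allow(client_id)` first and reject the request (for example with HTTP 429) when it returns `False`. The bucket permits short bursts while capping the sustained rate, which suits bursty analytical workloads better than a fixed per-second counter.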
Protection of big data system management tools. Big data management tools offer ample scope for implementing security mechanisms, such as real-time monitoring of component state, management of access control rules, and identification of data sources. However, the management tools themselves need additional protection, since they are especially valuable targets for attackers.
- Security of infrastructure
Use of technologies and platforms that ensure database performance, scalability, and availability. High availability of resources. Protection of the platforms through which developers and IT operations collaborate (DevOps).
- Confidentiality of data
Analysis of the impact of social data on the security and confidentiality of big data projects. Protection of data regardless of where it is stored or used. Ensuring the confidentiality and governance of big data (data inventory and classification, data masking, and definition of governance policies and data access rules).
- Data management
Protection of data stores (access control lists, API protection, protection of database access mechanisms). Key management and transparency of the data life cycle.
- Integrity and response procedures
Use of big data analytics to detect malicious activity and to assess the state of big data processing systems. Detection of security events and response to detected threats. Detection, analysis, and investigation of incidents. Protection of analytical results.
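Of the measures listed above, data masking is the most straightforward to illustrate. The sketch below assumes that data inventory and classification have already produced a column-to-rule policy (`MASKING_POLICY`, hypothetical) and combines two common techniques: keyed pseudonymization of identifiers and partial masking of card numbers and e-mail addresses. The key and all function names are illustrative assumptions.

```python
import hashlib
import hmac

# Hypothetical pseudonymization key; in practice it would be held in a key manager.
PSEUDONYM_KEY = b"demo-pseudonym-key"


def pseudonymize(value: str) -> str:
    """Replace an identifier with a truncated keyed hash. Equal inputs give equal
    tokens, so masked datasets can still be joined, but the token does not reveal
    the original value."""
    return hmac.new(PSEUDONYM_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]


def mask_card(number: str) -> str:
    """Keep only the last four digits of a card number."""
    digits = [c for c in number if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the domain."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


# Hypothetical classification result: which masking rule applies to which column.
MASKING_POLICY = {"customer_id": pseudonymize, "card": mask_card, "email": mask_email}


def mask_row(row: dict) -> dict:
    """Apply the masking policy; unclassified columns pass through unchanged."""
    return {k: MASKING_POLICY.get(k, lambda v: v)(v) for k, v in row.items()}


row = {"customer_id": "C-1001", "card": "4111 1111 1111 1234",
       "email": "alice@example.com", "amount": 42}
print(mask_row(row)["card"])   # ************1234
print(mask_row(row)["email"])  # a***@example.com
```

Using a keyed HMAC rather than a plain hash keeps tokens consistent across datasets while preventing anyone without the key from confirming guesses for low-entropy identifiers. Unclassified columns pass through here; a stricter policy could drop them instead.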
Projects to protect big data systems should be built on the Data-Centric Security approach, which addresses these issues comprehensively by protecting the data itself rather than only the perimeter around it. Modern business processes already extend beyond a company's infrastructure boundaries: mobile devices used under BYOD (bring your own device) policies, cloud and hybrid services, and the transfer of corporate data to contractors and customers all blur the enterprise perimeter.
Enterprises implementing protection for big data processing systems often face a shortage of specialized solutions. Moreover, big data analytics projects are always complex, and the technology stack, which depends on the project's goals, objectives, and budget, varies widely. Consequently, one should not expect the design of the protection system, or the set of measures needed to reach an acceptable level of security, to be worked out quickly.
Too little attention is paid to the security of big data systems: the vast majority of projects are designed and implemented without regard to information security, which sooner or later leads to a significant increase in the time and cost of implementing security systems, and sometimes to even more serious consequences for the business. Security must be considered from the very start of a big data project; otherwise, the project may turn from a business opportunity into a new business risk.