
Risk Assessment for Big Data in Cloud: Security, Privacy and Trust


Hazirah Bee bt Yusof Ali
Kulliyah of Information and Communication Technology
International Islamic University Malaysia
+6019 2229179
hazirahbee@gmail.com

Lili Marziana bt Abdullah
Kulliyah of Information and Communication Technology
International Islamic University Malaysia
+6016 2928314
lmarziana@iium.edu.my

Mira Kartiwi
Kulliyah of Information and Communication Technology
International Islamic University Malaysia
+6016 2928314
mira@iium.edu.my

Azlin Nordin
Kulliyah of Information and Communication Technology
International Islamic University Malaysia
+6016 2928314
azlinnordin@iium.edu.my

ABSTRACT
The alarming rate of big data usage in the cloud leaves data easily exposed. The cloud, which consists of many servers linked to each other, is used for data storage. Because these servers are owned by third parties, the security of the cloud needs to be examined, and the severity of the risks of storing data in the cloud needs to be checked further. There should be a way to assess the risks. Thus, the objective of this paper is to use a systematic literature review (SLR) to build an extensive background of the literature on risk assessment for big data in the cloud computing environment from the perspective of security, privacy and trust.

CCS Concepts
• Software and its engineering ➝ Software organization and properties ➝ Software system structures ➝ Distributed systems organizing principles ➝ Cloud computing
• Software and its engineering ➝ Software creation and management ➝ Software development process management ➝ Risk management
• Security and privacy ➝ Formal methods and theory of security ➝ Trust frameworks

Keywords
Risk Assessment; Big Data; Cloud; Security; Privacy; Trust

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
AICCC '18, December 21–23, 2018, Tokyo, Japan
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-6623-6/18/12...$15.00
DOI: https://doi.org/10.1145/3299819.3299841

1. INTRODUCTION
The increasing usage of big data in the cloud computing environment today makes us wonder whether the existing security measures can still be applied. Big data is very different from traditional databases in that it is measured by volume, variety, velocity and value [1]. Big data’s architecture of “highly distributed, real time, ad-hoc queries, parallel and power programming language, move the code, nonrelational data, auto-tiering and variety of input data sources” increases the amount of data to be managed on the internet [1].

As big data needs to be stored in the cloud, cloud computing distributes the computing tasks to other servers on the internet [2]. Cloud users use resources and access servers and storage on demand [2]. Cloud computing provides fast, efficient and cheap computing power while meeting users’ storage needs [2]. Using the cloud as storage reduces the cost of operations since companies do not have to buy and manage servers themselves [2].

Besides serving as storage, the cloud increases companies’ performance. The cloud’s features of shared infrastructure, shared platform and shared software enable companies to operate efficiently regardless of storage location. The cloud also improves scalability, availability and techniques, especially when lost information needs to be recovered [3].

Big data and cloud computing together benefit companies and, because of that, the usage of big data in the cloud is increasing drastically. This drastic increase brings a few emerging security concerns. Big data is much more vulnerable than a traditional database [1] because it is now stored on servers owned by the cloud provider. The speed of data creation, the mix of structured and unstructured data formats and the various uses of data make protecting big data in the cloud intolerably difficult [1]. Leakages of big data to unauthorized parties increase by the day and the probability of data leaks becomes higher than expected [4].

Leakages of big data are mainly due to increased risks. Therefore, it is important for us to understand what risk is. The classic definition of the risk associated with an incident I is:

RISK(I) = IMPACT(I) * PROBABILITY(I)
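
As an illustration only, the classic definition can be sketched in a few lines of Python. The function names, the use of a relative frequency as the probability estimate and the example figures are assumptions made for this sketch; they are not taken from [4].

```python
def estimate_probability(occurrences: int, observations: int) -> float:
    """Estimate PROBABILITY(I) as a relative frequency.

    Assumption for this sketch: the incident was seen in `occurrences`
    out of `observations` equally long observation periods.
    """
    if observations <= 0:
        raise ValueError("observations must be positive")
    if occurrences < 0 or occurrences > observations:
        raise ValueError("occurrences must be between 0 and observations")
    return occurrences / observations


def classic_risk(impact: float, probability: float) -> float:
    """RISK(I) = IMPACT(I) * PROBABILITY(I)."""
    if impact < 0:
        raise ValueError("impact must be non-negative")
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return impact * probability


if __name__ == "__main__":
    # Hypothetical figures: a leak seen in 3 of 12 months, 200 victims each time.
    p = estimate_probability(occurrences=3, observations=12)
    print(classic_risk(impact=200, probability=p))  # prints 50.0
```

The sketch only works when both inputs are known in advance: a countable number of occurrences and a countable number of victims.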

The formula is sufficient if we know exactly the number of occurrences, which gives the probability, and the number of victims, which gives the value for impact [4]. Unfortunately, this is not the case for big data in the cloud. The formula is not suitable for big data in the cloud computing environment because security leakages can occur continuously [4]. Moreover, an incident can affect a single victim or a mass of victims. Therefore, calculating risks accurately for big data in the cloud can turn out to be a nightmare.

Our investigation of the literature started with a systematic literature mapping [5] and continued with a systematic literature review to understand previous research on risk assessment of big data in the cloud. Section 1 introduces the research, Section 2 presents the methodology, Section 3 presents the results, Section 4 presents the findings and finally Section 5 concludes the research.

2. METHODOLOGY
2.1 Systematic Literature Mapping
We started our research with a systematic literature mapping (Mapping) to map the research that had been done so that we could identify gaps for further research [6]. Mapping is done prior to the SLR, where it helps to answer the research questions so that we can get a wider scope of the research before continuing with the SLR [6].

We adopted the Mapping process of [7], whose steps are: define the research questions, conduct the search, screen the papers, keyword using abstracts and finally extract the data and map it.

After going through all the processes used by [7], we found that few literatures discussed risk assessment. Therefore, we decided to focus on risk assessment for the SLR.

2.2 Systematic Literature Review
A Systematic Literature Review (SLR) helps to identify, evaluate and interpret all literatures related to the research questions [8]. Research questions that relate to the title of the research lead us to the related literatures. The literatures reviewed using SLR enable us to see the gaps for future research [8]. Besides that, SLR also provides us with an extensive background of the literature so that we really understand the research we are focusing on [8].

[9] explained the differences between Mapping and SLR. Mapping summarizes previous research and describes and classifies what the literatures have produced, while SLR critically examines the contributions of previous research, explains the results and finally clarifies different views of previous research [9].

The SLR process is similar to Mapping: we must define the research questions, conduct the search, screen the papers, keyword using abstracts and extract the data [8].

2.2.1 Review Question
Our questions for this research focus on risk assessment for big data in cloud. The review questions should guide what we research. In order to do that, they should follow the five elements listed below [10]:

Table 1: PICOC Table
No.  Element           Description
1.   Population (P)    Big data offerings on cloud
2.   Intervention (I)  Risk assessment of security, privacy and trust
3.   Comparison (C)    Nil
4.   Outcomes (O)      Effectiveness of risk assessment
5.   Context (C)       Big data in cloud

RQ1: What areas of risk assessment for big data in cloud computing have been addressed?
RQ2: What was developed to achieve advances in risk assessment for big data in cloud computing?
RQ3: What approaches / types were applied in risk assessment in big data and cloud computing?

2.2.2 Search Strategy and Selection Criteria
Through mapping, we reduced the number of literatures from 1802 to 56. Since the search conducted for Mapping is also applicable to the SLR and the same search string is used, the same set of literatures is used for the SLR. The search string focuses on population, intervention and outcome from the PICOC table.

The search string for the SLR is stated below.

(“risk assessment” OR “risk analysis” OR “risk management” OR “risk treatment” OR “risk mitigation”) AND “big data” AND cloud

The inclusion criteria for this research are security, privacy and trust in big data together with the cloud, and risk assessment frameworks in big data and cloud.

The exclusion criteria identify what is to be excluded from the research. Both inclusion and exclusion criteria enable us to focus on risk, big data and cloud together with security, privacy and trust.

2.2.3 Classification Scheme
The classification scheme coincides with the review questions, which focus on risk assessment for big data in cloud. As we are interested in risk assessment for big data in the cloud computing environment, we classify the literatures into 3 categories.

• Areas of risk assessment for big data in cloud computing
• Achievement of literatures in the focus area
• Research approaches or types

The first category is “areas of risk assessment for big data in cloud computing”. Areas include: Storage, Software/System Development, Visualization, Application/Solution, Analytics, Virtualization, Data, Network and Provider. This category will help to answer RQ1.

The second category is the “achievement of the literatures in the focus area”, which includes Processes, Methods, Models, Metrics and Tools [11]. This category will answer RQ2.

Processes describe the flow of activities [12].

Methods describe the blueprint of building models of situations. Methods include framework, modeling approach, modelling
guideline and best practice where they are guidelines to build models [9].

Models originate from constructs and models, enabling situations to be generalized into patterns that can be applied to similar domains [9]. Conceptual models, abstractions and representations are in this category [9].

Metrics measure important variables [12] and, finally, Tools support software variability [12].

The third category is research approaches or types. This will help to answer RQ3.

In total, there are 3 categories, and their objective is to ensure that we can answer RQ1, RQ2 and RQ3 with ease.

2.2.4 Data Extraction
As our interest is in risk assessment for big data in the cloud computing environment, we classify the literatures into 3 categories: areas, achievement and research approaches, as stated in the classification scheme.

3. RESULTS
3.1 Answer to the SLR questions
RQ1: What areas of risk assessment for big data in cloud computing have been addressed?

The literature discussing research on risk assessment for big data in cloud computing can be classified following the asset components of big data and cloud computing, which cover storage, software/system development, visualization, application/solution, analytics, virtualization, data, network and provider.

[Bar Chart 1: SLR literatures for big data in cloud, by area (Storage, Software/System Development, Visualization, Application/Solution, Analytics, Virtualization, Data, Network, Provider)]

Bar Chart 1 displays all 56 literatures that we obtained from the SLR. More literatures focus on Application/Solution than on Storage and Visualization.

Out of 56, only 12 studies focus on risk assessment. Almost half focus on Application/Solution, while the fewest studies focus on Provider.

[Bar Chart 2: Risk assessment for big data in cloud, by area (Application/Solution, Data, Network, Provider)]

Bar Chart 2 shows the 12 literatures on risk assessment. These literatures discuss Application/Solution, Data, Network and Provider: 5 literatures focus on Application/Solution, 4 on Data, 2 on Network and 1 on Provider.

RQ2: What was developed to achieve advances in risk assessment for big data in cloud computing?

The studies’ contributions can be grouped according to Process, Method, Model, Metric and Tool [12]. Process tells us about the flow of activities or actions, Method describes the rules of how certain jobs need to be done, Model resembles a replica of the real situation, Metric measures any variable and, finally, Tool provides a software tool to support software variability [12].

[Bar Chart 3: SLR literatures for big data in cloud, by contribution (Process, Method, Model, Metric, Tool)]

During our segregation of the literatures by contribution, we found that 28 literatures contribute to Model, followed by 13 to Process, 8 to Metric, 6 to Method and finally 1 to Tool.

[Bar Chart 4: Risk assessment for big data in cloud, by contribution (Process, Method, Model, Metric)]
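
The category counts reported for Bar Charts 2 and 3 can be cross-checked against the stated totals of 12 and 56 with a short script. This is only an illustrative sketch: the dictionary and function names are hypothetical, and the numbers are copied from the text describing those two charts.

```python
from typing import Dict

# Counts copied from the text describing Bar Chart 2 (12 risk assessment studies).
risk_assessment_by_area: Dict[str, int] = {
    "Application/Solution": 5,
    "Data": 4,
    "Network": 2,
    "Provider": 1,
}

# Counts copied from the text describing Bar Chart 3 (all 56 SLR studies).
slr_by_contribution: Dict[str, int] = {
    "Model": 28,
    "Process": 13,
    "Metric": 8,
    "Method": 6,
    "Tool": 1,
}


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each category's fraction of the total number of studies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no studies to summarize")
    return {name: count / total for name, count in counts.items()}


if __name__ == "__main__":
    print(sum(risk_assessment_by_area.values()))  # total for Bar Chart 2
    print(sum(slr_by_contribution.values()))      # total for Bar Chart 3
    for name, fraction in shares(risk_assessment_by_area).items():
        print(f"{name}: {fraction:.0%}")
```

Application/Solution accounts for 5 of the 12 risk assessment studies, which is the basis for the "almost half" statement.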

In Bar Chart 4, we found that 6 literatures contribute to Process, 1 to Method, 2 to Model and 3 to Metric. None of the risk assessment literatures discussed Tool.

RQ3: What research approaches / types were applied in risk assessment in big data and cloud computing?

The research approaches that can be applied to risk assessment for big data in cloud computing are Validation, Philosophical, Experience and Evaluation [9][12].

Before we can segregate the literatures into these research approaches, we need to understand their meanings and usage.

Validation research is applied if the technique is novel and yet to be tested and implemented [7][13]. The research technique involves collecting and analyzing data to assess the accuracy of the instrument. Examples of validation research are techniques used for experiments done in the lab [12].

Philosophical research involves looking at new ways of doing things and proposes a framework [7][12].

Evaluation research, on the other hand, evaluates the techniques implemented in a real situation [7][13] so that we know the advantages and disadvantages of the research in an actual environment [13][12].

Experience research is research that has been done before [12]. Usually, it comes from the authors’ own experiences [12].

[Bar Chart 5: Risk assessment for big data in cloud, by research approach (Validation, Philosophical, Experience, Evaluation)]

By analyzing Bar Chart 5, we find that 6 literatures use Validation, 3 use Experience, 2 use Philosophical and 1 uses Evaluation as the research approach.

Many literatures focus on Validation to compare the accuracy of the instruments with the real environment. Validation has the highest number of literatures, followed by Experience, Philosophical and finally Evaluation.

4. FINDINGS AND GAPS
Not many literatures focus on Storage, Software/System Development, Visualization, Analytics and Virtualization. None of the risk assessment literatures looks into Tools as a contribution.

Past research does contribute to risk assessment for big data and cloud. Out of 56 literatures, 12 investigate risk assessment, and below are some of the findings.

[14] proposed an improved cloud matter-element model. Data involved in information security evaluation is processed by calculating the correlation between the sample cloud and the grade limit [14]. Using the comprehensive weight method, the weight vector is obtained [14]. The improved model not only handles the “dynamic and fuzzy” nature of information security evaluation but also solves the problem of expert evaluation and the lack of “objective and comprehensive” evaluation [14]. This is a comprehensive method that can overcome the weaknesses of a single weight method [14]. The analysis of the model shows that lack of control has a positive correlation with risk [14]. The improved cloud matter-element model uses confidentiality, integrity and availability as indicators of asset value to control the risk [14].

While [14] provided a solution to evaluate information security risk, [15] presented a framework to show the various information objects involved in the ISO 27005 risk management standard. [15] classified the information using the guidelines of the UNINETT scheme. The classification helps to segregate information based on its “sensitivity and importance” [15]. It helps to identify the information that needs to be protected and the level of protection required. Besides that, the classification also helps in determining the storage time and serves as a disposal guideline [15]. Finally, the classification helps companies to protect the information’s confidentiality, integrity and availability: personal information is segregated and, therefore, privacy can be preserved [15].

[16], on the other hand, investigated risk assessment and management of information systems. Systematic analysis of threats and vulnerabilities is required so that system security can be assured [16] and the risks to confidentiality, integrity and availability can be easily managed [16]. [16] claimed that continuous risk monitoring is needed so that the risk assessment activity can produce correct information on information security risks. [16] also believed that privacy control is becoming so important that privacy protection needs to be included in the risk assessment.

[17] claimed that current cloud assurance standards can never replace continuous risk monitoring. Therefore, [17] proposed applying a risk assessment procedure to the processes obtained with the deployment of different security controls. The procedure can provide automatic assessment of costs and risk factors [17]. Moreover, the procedure makes risk-aware design and deployment of cloud-based services easier [17].

[18] provided a security risk assessment algorithm for cyber physical power systems. [18] used “rough set and gene expression” programming. [18] claimed that the algorithm offers “high efficiency of function mining, accuracy of security risk level prediction and strong practicality” [18].

[19] demonstrated how trust and control can help to mitigate the risk of cloud computing adoption for highly sensitive data. [19] stated that trust and control are factors for cloud adoption, where the two types of trust, competence trust and goodwill trust, together with formal control can reduce the risk [19]. Increasing goodwill trust and competence trust can bring the perceived risk of cloud adoption to an acceptable level [19].

[20] introduced continuous assessment techniques so that cloud service customers can have security and privacy assurances from cloud service providers. Using metrics as a tool provides security assurance and transparency to the cloud service customer [20]. Metrics improve cloud service customers’ risk evaluation, and the improved evaluation enables cloud service customers to choose the best cloud service providers [20]. This situation resulted in the
increase of cloud service customers’ trust to the chosen cloud service providers [20].

To summarize, all the literatures above discuss risk assessment from various perspectives. 4 literatures contribute to risk assessment model, algorithm, metrics and tool. [14] used a risk evaluation model to handle the dynamic and fuzzy nature of security evaluation and to solve the problem of expert evaluation. [18] used a security risk assessment algorithm to predict the security risk level. [20] used metrics to assess risk and continuous assessment techniques to ensure cloud service customers’ trust in the security and privacy assurances of cloud service providers. [17] applied a risk assessment procedure to the processes obtained with the deployment of different security controls to provide automatic assessment of costs and risk factors. [16] used systematic analysis of threats and vulnerabilities in risk assessment to provide better security. [15] used classification to segregate information based on its importance and the level of protection required to protect privacy. Finally, [19] used trust and control to reduce the risk of cloud adoption.

5. CONCLUSION
The security of using big data in the cloud is a factor for companies in deciding whether or not to use the facilities of big data in the cloud. From the literatures, we found that privacy is discussed everywhere. Even though companies know the benefits of using big data in the cloud and the benefits of the open access infrastructure of the cloud, they still need their privacy. Privacy conflicts with the freedom of using big data in the cloud, and when security and privacy are investigated seriously, trust gradually emerges. Trust enables companies to rely on the cloud and make full use of the cloud facilities without questioning the cloud’s credibility.

Literatures on trust for big data in the cloud are lacking. Thus, we need to put more effort into this area so that we can trust big data even though it resides in the cloud.

6. ACKNOWLEDGMENTS
This research has been funded by the Fundamental Research Grant Scheme (FRGS16-022-0521) supported by the Ministry of Higher Education.

7. REFERENCES
[1] M. Paryasto, A. Alamsyah, and B. Rahardjo, “Big-data security management issues,” 2014 2nd Int. Conf. Inf. Commun. Technol., pp. 59–63, 2014.
[2] D. Zhe, W. Qinghong, S. Naizheng, and Z. Yuhan, “Study on Data Security Policy Based on Cloud Storage,” 2017 IEEE 3rd Int. Conf. Big Data Secur. Cloud (BigDataSecurity), IEEE Int. Conf. High Perform. Smart Comput. IEEE Int. Conf. Intell. Data Secur., pp. 145–149, 2017.
[3] L. R. Techio and M. Misaghi, “EMSCLOUD - An evaluative model of cloud services cloud service management,” 5th Int. Conf. Innov. Comput. Technol. INTECH 2015, no. Intech, pp. 100–105, 2015.
[4] E. Damiani, “Toward big data risk analysis,” 2015 IEEE Int. Conf. Big Data (Big Data), pp. 1905–1909, 2015.
[5] H. B. Yusof Ali, L. M. Abdullah, M. Kartiwi, A. Nordin, N. Salleh, and N. S. A. A. Bakar, “A Systematic Literature Mapping of Risk Analysis of Big Data in Cloud Computing Environment,” J. Phys. Conf. Ser., vol. 1018, no. 1, 2018.
[6] D. Budgen, M. Turner, P. Brereton, and B. Kitchenham, “Using Mapping Studies in Software Engineering,” Proc. PPIG, vol. 2, pp. 195–204, 2008.
[7] S. Zein, N. Salleh, and J. Grundy, “A systematic mapping study of mobile application testing techniques,” J. Syst. Softw., vol. 117, pp. 334–356, 2016.
[8] Kitchenham, “Performing systematic literature reviews in software engineering,” Proceeding 28th Int. Conf. Softw. Eng. - ICSE ’06, vol. 45, no. 4ve, p. 1051, 2007.
[9] Verdonck and F. Gailly, “An Exploratory Analysis on the Comprehension of 3D and 4D Ontology-Driven Conceptual Models,” Concept. Model. - ER 2016, Lect. Notes Comput. Sci., vol. 9975, no. 2, pp. 163–172, 2016.
[10] M. Petticrew and H. Roberts, Systematic Reviews in the Social Sciences: A Practical Guide. 2006.
[11] K. Petersen, R. Feldt, S. Mujtaba, and M. Mattsson, “Systematic mapping studies in software engineering,” EASE’08 Proc. 12th Int. Conf. Eval. Assess. Softw. Eng., pp. 68–77, 2008.
[12] S. Mujtaba, K. Petersen, R. Feldt, and M. Mattsson, “Software product line variability: A systematic mapping study,” Sch. Eng. Blekinge Inst. Technol., no. November 2015, 2008.
[13] K. Petersen, R. Feldt, S. Mujtaba, and M. Mattsson, “Systematic Mapping Studies in Software Engineering,” 12th Int. Conf. Eval. Assess. Softw. Eng., vol. 17, p. 10, 2008.
[14] D. Zong-you, Z. Wen-long, S. Yan-an, and W. Hai-tao, “The application of cloud matter-element in information security risk assessment,” 2017 3rd Int. Conf. Inf. Manag., pp. 218–222, 2017.
[15] V. Agrawal, “A Framework for the Information Classification in ISO 27005 Standard,” Proc. - 4th IEEE Int. Conf. Cyber Secur. Cloud Comput. CSCloud 2017 3rd IEEE Int. Conf. Scalable Smart Cloud, SSC 2017, pp. 264–269, 2017.
[16] Lulu Liang, Wang Ren, Jing Song, Huaming Hu, Qiang He, and Shuo Fang, “The state of the art of risk assessment and management for information systems,” 2013 9th Int. Conf. Inf. Assur. Secur., pp. 66–71, 2013.
[17] V. Bellandi, S. Cimato, E. Damiani, G. Gianini, and A. Zilli, “Toward economic-aware risk assessment on the cloud,” IEEE Secur. Priv., vol. 13, no. 6, pp. 30–37, 2015.
[18] S. Deng, D. Yue, X. Fu, and A. Zhou, “Security risk assessment of cyber physical power system based on rough set and gene expression programming,” IEEE/CAA J. Autom. Sin., vol. 2, no. 4, pp. 431–439, 2015.
[19] A. Khosravani, “A case study analysis of risk, trust and control in cloud computing,” … Conf. (SAI), 2013, pp. 879–887, 2013.
[20] R. Trapero, J. Luna, and N. Suri, “Quantifiably Trusting the Cloud: Putting Metrics to Work,” IEEE Secur. Priv., vol. 14, no. 3, pp. 73–77, 2016.

