Security and Resilience in Intelligent Data-Centric Systems and Communication Networks

About this ebook

Security and Resilience in Intelligent Data-Centric Systems and Communication Networks presents current, state-of-the-art research on theoretical and practical aspects of resilience and security in intelligent data-centric critical systems and networks. The book analyzes concepts and technologies that are successfully used in the implementation of intelligent data-centric critical systems and communication networks, also touching on future developments. In addition, it provides in-demand information for domain experts and developers who want to understand the opportunities and challenges of using emerging technologies to design and develop more secure and resilient intelligent data-centric critical systems and communication networks.

Topics covered include airports, seaports, rail transport systems, plants for the provision of water and energy, and business transactional systems. The book is well suited for researchers and PhD students interested in the use of security and resilient computing technologies.

  • Includes tools and techniques to prevent and avoid both accidental and malicious behaviors
  • Explains the state-of-the-art technological solutions for the main issues hindering the development of monitoring and reaction solutions
  • Describes new methods and technologies, advanced prototypes, systems, tools, and techniques for future directions
Language: English
Release date: September 29, 2017
ISBN: 9780128113745

    Book preview

    Security and Resilience in Intelligent Data-Centric Systems and Communication Networks - Massimo Ficco


    Chapter 1

    Dependability of Container-Based Data-Centric Systems

    Petar Kochovski and Vlado Stankovski, University of Ljubljana, Ljubljana, Slovenia

    Abstract

    Nowadays, many Cloud computing systems and applications have to be designed to be able to cope with the four Vs of the Big Data problem: volume, variety, veracity, and velocity. In this context, an important property in such systems is dependability. Dependability is a measure of the availability, reliability, tractability, response time, durability, security, and other aspects of the system. Dependability has been well established as an important property in services science and systems engineering; however, data-centric systems may have some specific requirements due to a variety of technologies that are still maturing and evolving. The goal of the present work is to establish the main aspects of dependability in data-centric systems and to analyze to what extent various Cloud computing technologies can be used to achieve this property. Particular attention is given to the use of container-based technologies that represent a novel, lightweight form of virtualization. The practical part of the chapter presents examples of highly dependable data-centric systems, such as ENTICE and SWITCH, suitable for the Internet of Things (IoT) and the Big Data era.

    Keywords

    Dependability; Big Data; Container-based systems; Storage

    1 Introduction

    The overall computing capacity at humanity's disposal has been increasing exponentially over the past decades. At the current speed of progress, supercomputers are projected to reach 1 exa floating point operations per second (EFLOPS) in 2018. The Top500.org website presents an interesting overview of various achievements in the area. However, there are some new developments that will result in even greater demand for computing resources in the very near future.

    While exascale computing is certainly the new buzzword in the High Performance Computing (HPC) domain, the trend of Big Data is expected to dramatically influence the design of the world's leading computing infrastructures, supercomputers, and cloud federations. There are various areas where data is increasingly important to applications. These include research, environment, electronics, telecommunication, automotive, weather and climate, biodiversity, geophysics, aerospace, finance, chemistry, logistics, and energy. Applications in these areas have to be designed to process exabytes of data in many different ways. All this poses new requirements on the design of future computing infrastructures.

    The recent rapid development and deployment of the Internet of Things (IoT) has led to the installation of billions of devices that can sense, actuate, and even compute large amounts of data. The data coming from these devices poses a challenge to existing, traditional data management approaches. The IoT produces a continuous stream of data that constantly needs to be stored in data center storage. However, the technology is intended not only for storing the data, but also for processing it and returning the necessary responses to the devices. Big Data is not a new idea; it was first mentioned in an academic paper in 1999 (Bryson et al., 1999). Although much time has passed since then, the active use of Big Data has only recently begun. As the amount of data produced by the IoT kept rising, organizations were forced to adopt technologies able to cope with IoT data. Therefore it can be stated that the rise of the IoT drove the development and implementation of Big Data technologies.

    Big Data has no clear definition, even though it is usually thought of first in relation to size. Big Data is based on four main characteristics, also known as the 4Vs: volume, variety, velocity, and veracity. Volume refers to the size of data, which is growing exponentially in time. No one knows the exact amount of data being generated, but everyone is aware that it is an enormous quantity of information. Although most of the data in the past was structured, over the last decade the amount of data has grown and most of it is now unstructured. IBM has estimated that 2.3 trillion gigabytes of data are created every day and that 40 zettabytes of data will exist by 2020, an increase of 300 times from 2005 (IBM Big Data, 2013). Variety describes the different types of data, which are generated by sources such as financial services, health care, education, high performance computing and life science institutions, social networks, sensors, and smart devices. Because these data types differ from one another, it is impossible to fit such heterogeneous data into a spreadsheet or a single database application. Velocity measures the frequency at which data is generated and must be processed; depending on the requirements, data may be processed in real time or on demand. Veracity concerns the trustworthiness of the data, that is, whether the data is of sufficient quality to enable the right action when it is needed.

    From the previous analysis it is clear that Big Data plays a crucial role in today's society. Due to the different formats and sizes of unstructured data, traditional storage infrastructures cannot achieve the desired Quality of Service (QoS) and can lead to data unavailability, compliance issues, and increased storage expenses. A solution that addresses these data-management problems is the data-centric architecture, on which data-centric systems (DCS) are designed. The philosophy behind DCS is simple: as data size grows, the cost of moving data becomes prohibitive. Therefore data-centric systems offer the opportunity to move the computing to the data instead of vice versa. The key idea of this system design is to separate data from behavior. These systems are designed to organize the interactions between applications in terms of the stateful data, instead of the operations to be performed. As the volume and velocity of unstructured data increase all the time, new data management challenges appear, such as providing service dependability for applications running on the Cloud. According to Kounev et al. (2012), dependability can be defined as the ability of a system to provide dependable services in terms of availability, responsiveness, and reliability. Although dependability has many definitions, in this chapter we focus on dependability in a containerized, component-based, data-centric system.
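
    As a purely illustrative sketch of this separation of data from behavior (the SensorReading and average_value names are invented here, not taken from the chapter), the stateful data can be modeled on its own while the behavior is kept in stateless functions that can be shipped to wherever the data resides:

    ```python
    from dataclasses import dataclass

    # The shared, stateful data is modeled independently of any behavior.
    @dataclass
    class SensorReading:
        device_id: str
        timestamp: float
        value: float

    # Behavior stays in stateless functions that can run next to the storage
    # nodes holding the readings, i.e., the computation moves to the data.
    def average_value(readings: list[SensorReading]) -> float:
        """Stateless computation that can run wherever the data resides."""
        return sum(r.value for r in readings) / len(readings)

    readings = [
        SensorReading("sensor-1", 1.0, 20.5),
        SensorReading("sensor-1", 2.0, 21.0),
    ]
    print(average_value(readings))
    ```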

    While many petascale computing applications are addressed by supercomputers, there is vast potential for wider utilization of computing infrastructures in the form of cloud federations. Cloud federations may relate to both computing and data aspects. This is an emerging area that has not been sufficiently explored in the past and may provide a variety of benefits to data-centric applications.

    The remainder of this chapter is structured as follows. Section 2 describes the component-based software engineering methodology, the architectural approaches for data management, and container interoperability. Section 3 points out the key concepts and relations in dependability, where the dependability attributes, means, and threats are described. Section 4 describes the serving of virtual machine and container images to applications. Section 5 elaborates on QoS management in the software engineering process. Section 6 concludes the chapter.

    2 Component-Based Software Engineering

    In order to be able to build dependable systems and applications we must rely on a well-defined methodology that would help analyze the requirements and the trade-offs in relation to dependability. Therefore it is necessary to relate the development of dependable data-centric systems and applications to modern software engineering practices in all their phases, such as requirements analysis, component development, workflow management, testing, deployment, monitoring, and maintenance.

    With the constant growth in software complexity and size, traditional software development approaches have become ineffective in terms of productivity and cost. Component-Based Software Engineering (CBSE) has emerged to overcome these problems by selecting components and integrating them into one well-defined software architecture, where each component provides a piece of functionality independent of the other components of the system. As a result, during the software engineering process the developer selects and combines appropriate components instead of designing and developing them from scratch; such components are found in various repositories of open-source software. The components can be heterogeneous, written in different programming languages, and integrated into an architecture where they communicate with each other through well-defined interfaces.
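
    As a minimal, hedged illustration of this idea (the StorageComponent interface and its implementations are invented for this sketch), an application can be assembled from interchangeable components that depend only on a well-defined interface:

    ```python
    from abc import ABC, abstractmethod

    class StorageComponent(ABC):
        """Well-defined interface through which other components interact."""
        @abstractmethod
        def put(self, key: str, data: bytes) -> None: ...
        @abstractmethod
        def get(self, key: str) -> bytes: ...

    class InMemoryStorage(StorageComponent):
        def __init__(self) -> None:
            self._items = {}
        def put(self, key: str, data: bytes) -> None:
            self._items[key] = data
        def get(self, key: str) -> bytes:
            return self._items[key]

    class LoggingStorage(StorageComponent):
        """A drop-in replacement illustrating component substitution."""
        def __init__(self, inner: StorageComponent) -> None:
            self._inner = inner
        def put(self, key: str, data: bytes) -> None:
            print(f"storing {key} ({len(data)} bytes)")
            self._inner.put(key, data)
        def get(self, key: str) -> bytes:
            return self._inner.get(key)

    def application(storage: StorageComponent) -> None:
        # The application depends only on the interface, not the implementation.
        storage.put("report", b"contents")
        print(storage.get("report"))

    application(LoggingStorage(InMemoryStorage()))
    ```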

    2.1 Life-Cycle

    Every software development methodology addresses a specific life cycle of the software. Although the life cycles of different methodologies may differ considerably, they can all be described by a set of common phases. These phases represent major product lifecycle periods and are related to the state of the product. The existing CBSE development lifecycle separates component development from system development (Sharp and Ryan, 2009). Although the component development process is in many aspects similar to system development, there are some notable differences; for instance, components are intended for reuse in many different products, many of which have yet to be designed.

    The component-based development lifecycle, shown in Fig. 1, can be described in seven phases (Sharp and Ryan, 2009):

    Fig. 1 Component-based software development lifecycle.

    1. Requirements phase: During this phase the requirements specification for the system and the application development is decided. Also during this phase, the availability of the components is calculated.

    2. Analysis and Design phase: This phase starts with a system analysis and a design providing the overall architecture. Furthermore, a detailed design of the system is developed, and it is decided which components will be used. Because the design phase impacts the QoS, the necessary QoS level to be achieved is also determined during this phase.

    3. Implementation phase: Within this phase the components are selected according to their functional properties. They are verified and tested both independently and together. Some component adaptations may be necessary to ensure compatibility.

    4. Integration phase: Although integration is partially fulfilled during the implementation phase, the final integration of all components within the architecture occurs during this phase.

    5. Test phase: This phase is necessary to ensure component quality within the given architecture, because a component could have been developed for another architecture with different requirements. The components are usually tested individually, after integration into an assembly, and after full integration into the system.

    6. Release phase: This phase includes preparing the software for delivery and installation.

    7. Maintenance phase: As for all software products, maintenance must be provided. The approach in CBSE is to provide maintenance by replacing old components with new ones or by adding new components to the system.

    Cloud computing technologies can be used to provide ubiquitous access to services, anywhere and at any time. Therefore the Cloud architecture needs to be designed to provide the desired QoS. Ramachandran (2011) explains that it is essential to design Cloud applications as Web service components based on well-proven software processes, design methods, and techniques such as component-based software engineering. One of the most common problems during Cloud application development is Service Level Agreement (SLA) support. Since SLAs vary between service providers, this goal can only be reached using components designed with a flexible interface that can link to many different SLAs. CBSE supports the implementation of such design characteristics, allowing components to be designed to support many SLAs.
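
    A hedged sketch of such a flexible SLA interface (the SLATerms structure, the provider adapters, and all figures are hypothetical) could map provider-specific SLA descriptions onto a common representation that the other components consume:

    ```python
    from dataclasses import dataclass
    from typing import Protocol

    @dataclass
    class SLATerms:
        """Common SLA representation consumed by the application's components."""
        availability_pct: float  # e.g., 99.9 means 99.9% uptime
        max_response_ms: int

    class SLAAdapter(Protocol):
        """Flexible interface linking a component to a provider-specific SLA."""
        def terms(self) -> SLATerms: ...

    class ProviderAAdapter:
        # Hypothetical provider A publishes availability and latency directly.
        def terms(self) -> SLATerms:
            return SLATerms(availability_pct=99.95, max_response_ms=200)

    class ProviderBAdapter:
        # Hypothetical provider B publishes allowed downtime minutes per month.
        def terms(self) -> SLATerms:
            downtime_min_per_month = 43.8
            availability = 100.0 * (1.0 - downtime_min_per_month / (30 * 24 * 60))
            return SLATerms(availability_pct=availability, max_response_ms=500)

    def meets_requirements(adapter: SLAAdapter, required_availability_pct: float) -> bool:
        # The rest of the system only ever sees the common SLATerms interface.
        return adapter.terms().availability_pct >= required_availability_pct

    for adapter in (ProviderAAdapter(), ProviderBAdapter()):
        print(type(adapter).__name__, meets_requirements(adapter, 99.9))
    ```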

    2.2 Architectural Approaches for Data Management

    Containerization is a process of distributing and deploying applications in a portable and predictable way. The use of containers is attractive to developers and system administrators alike because of the possibilities they offer, such as abstraction of the host system away from the containerized application, easy scalability, simple dependency management and application versioning, and lightweight execution environments (Paraiso et al., 2016). Compared to VM virtualization, containers provide a lighter execution environment, since they are isolated at the process level and share the host's kernel. This means that the container itself does not include a complete operating system, leading to very quick start-up times and smaller transfer times of container images. Therefore an entire stack of many containers can run on top of a single instance of the host OS. Depending on how functionality and data are placed in containers, we can distinguish three architectural approaches, described in the following.
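
    As a quick, hedged illustration of this lightweight model (assuming a local Docker daemon and the Docker SDK for Python, installed with pip install docker; the nginx:alpine image and host port 8080 are arbitrary choices), a containerized service can be started and discarded in a few lines:

    ```python
    import docker

    client = docker.from_env()

    # Start a lightweight containerized service; it shares the host kernel,
    # so startup typically takes seconds rather than the minutes a VM may need.
    container = client.containers.run(
        "nginx:alpine",          # small image, quick to transfer and start
        detach=True,
        ports={"80/tcp": 8080},  # expose the service on the host
    )

    print(container.status, container.short_id)

    # Containers are cheap to stop and remove, which eases horizontal scaling.
    container.stop()
    container.remove()
    ```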

    2.2.1 Functionality and data in a container

    The use of containers can diminish the otherwise hard barrier between stateful and stateless services, because it makes it possible to configure this aspect as part of the software engineering process. Fig. 2 shows a container that provides an embodiment of both functionality and data. If the service needs to be stopped and restarted somewhere else, it may be possible to use a checkpointing mechanism (Stankovski et al., 2016). For example, a service restart may be needed when network parameters deteriorate, such as the latency between the client and the running service. This approach may also be useful when the service needs to react to various events in the environment, for example, when a user wishes to have a videoconference or to upload a file to a Cloud storage. In these cases, the geographic location of the user may influence the Quality of Experience of the application, and thus it may be necessary to start the container at a location near the user.

    Fig. 2 A single container with functionality and data.

    A container with functionality and data packages the application code, all the libraries required by the target application, and the dataset together. In general terms, the use of container technologies has many advantages over other virtualization approaches. Containers offer better portability, because they can easily be moved to any system that runs the same host OS without additional code changes. It is possible to run many containers on the same infrastructure, that is, they can easily scale horizontally. Performance is also improved, because there is only one OS. The main advantage of containers is that they can be created much faster than VMs. This provides a more agile environment for development and supports continuous integration and delivery.

    However, containers are not perfect, and there are some disadvantages. With containerization there is no full isolation from the host OS; since the containers share the same OS kernel, a security threat could more easily reach the entire system. This threat can be mitigated by combining containers and VMs, where the containers are created within an OS running on a VM. With this approach a potential security breach can only affect the VM's OS, not the physical host. Another containerization disadvantage is that the container and the base host must use the same OS.

    2.2.2 Clean separation between functionality and data in containers

    Another approach that may be useful in relation to containers is to completely separate the functionalities from the actual data needed for their operation. This can be achieved by developing software-defined systems and applications with a clear separation between the two. In such cases, when the functionality is needed along with the data, a minimum of two containers need to be started: the first container provides the functionality, while the second provides the data infrastructure (e.g., a file system, SQL database, or knowledge base, as in the case of the ENTICE project). Fig. 3 graphically presents this situation.

    Fig. 3 Two linked containers provide separation of the functionality from the data.

    While the time required to set up and start the needed functionality for a given user may be slightly increased (as it requires mounting the data container on the container that provides the functionality and using a network for communication between the containers), the separation can be used effectively in situations where privacy and security are very important to the application and single-tenant users are the architectural choice. However, this architecture may also be used to achieve high Quality of Service in multitenant applications, such as those based on the Internet of Things (Taherizadeh et al., 2016b), where both the Web server container and the SQL database container can scale elastically.
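
    A minimal sketch of this separation, using the Docker SDK for Python (assuming a local Docker daemon; the image name myapp:latest, the network and volume names, and the environment variables are hypothetical), might look as follows:

    ```python
    import docker

    client = docker.from_env()

    # Shared network and named volume: the data outlives any single container.
    client.networks.create("app-net", driver="bridge")
    client.volumes.create(name="app-data")

    # Data container: an SQL database whose state lives in the named volume,
    # so the functionality container can be replaced or rescheduled independently.
    db = client.containers.run(
        "postgres:alpine",
        name="app-db",
        detach=True,
        network="app-net",
        environment={"POSTGRES_PASSWORD": "example"},
        volumes={"app-data": {"bind": "/var/lib/postgresql/data", "mode": "rw"}},
    )

    # Functionality container: reaches the data container over the shared
    # network by its container name. "myapp:latest" is a hypothetical image.
    app = client.containers.run(
        "myapp:latest",
        name="app-web",
        detach=True,
        network="app-net",
        environment={"DB_HOST": "app-db"},
    )

    print(db.short_id, app.short_id)
    ```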

    2.2.3 Separate distributed data infrastructure and services

    The previously described approaches work well when there is no need to handle large data sets. However, long-running applications produce large application data sets, which must be stored and managed somewhere. This has led to arrangements with two parallel architectures: one for the management of data, with a variety of approaches (e.g., CEPH, 2016; Amazons3, 2017; Storj.io, 2016), and another for the management of functionalities that are packed into containers.

    Such clearly separated data management infrastructures and services can be established in a virtualized format as shown in Fig. 4. An example of such a system is Cassandra (Apache Cassandra, 2016), which can operate in a distributed and containerized format, trading performance for elegance of implementation. The Cloud application is thus developed to make use of two infrastructures: one handling elasticity and other properties of the containerized application, and another for the application dataset, hosted on a scalable distributed storage that is accessible from any container on the network. The dependability of such an architecture, in terms of availability and reliability, can be improved by fast reinstantiation of database nodes when the application becomes unresponsive. According to Stankovski et al. (2015), this can be done by saving the container's file system and state into disk files, copying them to another server, and restarting the container there without rebooting or restarting the process from the beginning.

    Fig. 4 Geographically distributed containerized data infrastructure based on a specific technology such as Cassandra.
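
    To illustrate how application containers can use such a separate distributed data infrastructure, the following hedged sketch uses the DataStax Python driver for Cassandra (pip install cassandra-driver); the contact-point addresses, keyspace, and table are invented for the example:

    ```python
    from cassandra.cluster import Cluster

    # Any container on the network can reach the distributed storage layer
    # through its contact points; the driver handles node discovery and failover.
    cluster = Cluster(contact_points=["10.0.0.11", "10.0.0.12"], port=9042)
    session = cluster.connect()

    # Replication across nodes is what allows fast reinstantiation of failed nodes.
    session.execute(
        "CREATE KEYSPACE IF NOT EXISTS sensors "
        "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}"
    )
    session.execute(
        "CREATE TABLE IF NOT EXISTS sensors.readings "
        "(device_id text, ts timestamp, value double, PRIMARY KEY (device_id, ts))"
    )
    session.execute(
        "INSERT INTO sensors.readings (device_id, ts, value) "
        "VALUES ('sensor-1', toTimestamp(now()), 21.0)"
    )

    cluster.shutdown()
    ```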

    As previously explained, in order to improve the Quality of Service parameters (aka performability) when handling great amounts of data, it may be necessary to make use of data management systems, such as CEPH, which cannot be effectively virtualized (see Fig. 5).

    Fig. 5 A separate nonvirtualized data management infrastructure such as CEPH.

    2.3 Emerging Container Interoperability Architectures

    CNCF.io is a recently formed organization that promotes an emerging container interoperability architecture. The architecture, shown in Fig. 6, is developed to address several areas of emergent needs for Cloud applications, including the IoT and Big Data application domains. There are three main reasons for developing the CNCF architecture: container packaging and distribution, dynamic scheduling of containers, and microservice implementation. The first part of the CNCF architecture is based on technologies that build, store, and distribute container-packaged software. The second part consists of technologies that dynamically orchestrate container images; examples of orchestration technologies are Kubernetes (2017) and Apache Mesos (2017). The third part of this emerging architecture consists of microservices, which are designed to ensure that a deployed container image is built in a way that it can be easily found, accessed, consumed, and linked to other services. Thus this architecture can be used to build distributed systems and support the discovery and consumption of software (i.e., the overall life-cycle of Cloud applications).

    Fig. 6 A Cloud interoperability architecture of the Cloud Native Computing Foundation.
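
    As a small, hedged example of the dynamic scheduling layer of such an architecture (assuming a reachable Kubernetes cluster, a local kubeconfig, the official Kubernetes Python client installed with pip install kubernetes, and a hypothetical Deployment named web in the default namespace), an application can be inspected and scaled programmatically:

    ```python
    from kubernetes import client, config

    config.load_kube_config()
    apps = client.AppsV1Api()

    # Inspect what the orchestrator is currently running.
    for deployment in apps.list_namespaced_deployment("default").items:
        print(deployment.metadata.name, deployment.status.ready_replicas)

    # Dynamic scheduling in practice: request more replicas and let the
    # orchestrator place the additional containers across the cluster.
    apps.patch_namespaced_deployment_scale(
        name="web",
        namespace="default",
        body={"spec": {"replicas": 3}},
    )
    ```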

    As is evident from this architecture, distributed storage facilities have not been sufficiently considered, nor have they been effectively included in the CNCF architecture. The architecture specifies the need to separate the interfaces between the containers and the Software Defined Network and Software Defined Storage.

    3 Key Concepts and Relations in Dependability

    Based on the previously described use cases and on a review of the literature, it is possible to establish some key concepts and relations in dependability. System failures have devastating effects, resulting in many unsatisfied users affected by the failure and in disruption of business continuity. The downtime results in loss of productivity, loss of revenue, poor financial performance, and damage to the user's reputation. Damage to the reputation could directly affect the user's confidence and credibility with customers, banks, and business partners. Therefore, it is crucial to design a dependable data-centric system. The original definition of dependability is the ability of a system to deliver service that can justifiably be trusted. An alternate definition, based on the definition of failure, states that dependability is the ability of a system to avoid failures that are more frequent or more severe, and outage durations that are longer, than is acceptable to the user (Avizienis et al., 2004). Although many different characteristics could influence the dependability of a system, the three essential characteristics are: attributes, threats, and means.
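
    Availability, one of the dependability attributes mentioned above, is commonly quantified as the steady-state ratio of mean time to failure (MTTF) to the sum of MTTF and mean time to repair (MTTR); the following small Python sketch uses invented figures purely for illustration:

    ```python
    # Steady-state availability from invented MTTF/MTTR figures.
    MTTF_HOURS = 2000.0  # mean time to failure: average time the service runs before failing
    MTTR_HOURS = 0.5     # mean time to repair: average time to restore it (e.g., restart a container)

    availability = MTTF_HOURS / (MTTF_HOURS + MTTR_HOURS)
    downtime_minutes_per_year = (1 - availability) * 365 * 24 * 60

    print(f"availability: {availability:.5f}")
    print(f"expected downtime: {downtime_minutes_per_year:.1f} min/year")
    ```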
