
TABLE OF CONTENTS

ABSTRACT

1. INTRODUCTION
1.1. BUSINESS INTELLIGENCE
1.2. BIG DATA
1.3. BIG DATA ANALYTICS
2. TERM PAPER OUTLINE
3. BIG DATA EVOLUTION
4. BEHAVIORAL TYPES OF BIG DATA
4.1. STRUCTURED DATA
4.2. UNSTRUCTURED DATA
4.3. SEMI-STRUCTURED DATA
5. CHARACTERISTICS OF BIG DATA
5.1. VOLUME
5.2. VARIETY
5.3. VELOCITY
5.4. VERACITY
5.5. COMPLEXITY
5.6. VISUALIZATION
6. FIELDS OF BIG DATA
6.1. SOCIAL NETWORKING SITES
6.2. SEARCH ENGINES
6.3. MEDICAL HISTORY
6.4. ONLINE SHOPPING
6.5. STOCK EXCHANGE
7. COMPARISON OF TRADITIONAL AND BIG DATA
8. TECHNIQUES FOR BIG DATA
8.1. OPTIMIZATION METHODS
8.2. STATISTICS
8.3. DATA MINING
8.4. MACHINE LEARNING
8.5. ARTIFICIAL NEURAL NETWORK
8.6. VISUALIZATION APPROACHES
8.7. SOCIAL NETWORK ANALYSIS
9. BIG DATA TOOLS
9.1. BIG DATA TOOLS BASED ON BATCH PROCESSING
9.1.1. APACHE HADOOP AND MAP/REDUCE
9.1.2. DRYAD
9.1.3. APACHE MAHOUT
9.1.4. JASPERSOFT BI SUITE
9.2. STREAM PROCESSING BIG DATA TOOLS
9.2.1. STORM
9.2.2. S4
9.2.3. SQLSTREAM S-SERVER
9.3. BIG DATA TOOLS BASED ON INTERACTIVE ANALYSIS
9.3.1. GOOGLE'S DREMEL
9.3.2. APACHE DRILL
10. BIG DATA UNDERLYING TECHNOLOGY AND FUTURE RESEARCH
10.1. GRANULAR COMPUTING
10.2. CLOUD COMPUTING
10.3. BIO-INSPIRED COMPUTING
10.4. QUANTUM COMPUTING
11. CONCLUSION

REFERENCES

ABSTRACT
Massive, fast-moving and diverse data is known as "Big Data". This data has become an important source of valuable insights, ultimately helping organizations make more informed business decisions. However, this data has special attributes and cannot be managed or processed by traditional software systems, which has become a real problem nowadays.
This paper explores the evolution, types, characteristics, techniques and technologies connected to BIG data, and aims to give the reader a basic understanding of BIG data.

1. INTRODUCTION
In the modern era of ICT, where data is growing exponentially, businesses are faced with huge data sets generated through various sources such as online sales transactions, social networks, sensors, web logs, videos and telecommunications, and they face major challenges in the collection, storage, retrieval and analysis of such enormous data if they are to realize higher value from it.
The amount of data we produce every day is truly mind-boggling. There are 2.5 quintillion bytes
of data created each day at our current pace, but that pace is only accelerating with the growth of
the Internet of Things (IoT). (Forbes, May 21, 2018).
Hence, data in recent times has transformed into BIG data, which refers to data that exceeds the processing capacity of conventional databases. As a result, BIG Data Analytics has emerged as a core business practice in Business Intelligence as firms seek to use their information assets to improve business outcomes. The evolution of BIG data Analytics to analyze not just raw structured data but also semi-structured and unstructured data from various sources has made BIG data analytics one of the hottest emerging practices in BI today.
1.1. Business Intelligence
The concept of Business Intelligence was introduced by the Gartner Group in 1996. It refers to the applications, infrastructure, tools and best practices for the collection, consolidation, integration, analysis and presentation of business information to optimize decision making and thus increase business competitiveness.
1.2. BIG Data
BIG data is the term for a collection of data sets so huge and complex that it becomes difficult to process them using traditional data mining techniques and tools. Major processes of big data include capture, curation, storage, search, sharing, transfer, analysis and visualization.
1.3. BIG Data Analytics
It is a method for uncovering the hidden designs and patterns in large data in order to extract useful information, and it can be divided into two major segments: data management and data analysis. It helps with optimizing funnel conversion, behavioral analytics, predictive support, market basket analysis, pricing optimization, predictive security, fraud detection, etc.
2. TERM PAPER OUTLINE
The introduction of this term paper has briefly presented the concepts of Business Intelligence, BIG data and BIG data analytics, and their importance and application to data-driven businesses. The next section discusses the evolution of BIG data, followed by sections explaining the types, characteristics and fields of BIG data and its comparison with traditional data. The remaining sections discuss the prominent current practices in BIG data techniques, tools and technologies.

3. BIG DATA EVOLUTION
The revolutionary step in the world of data was the introduction of relational database models, which could store data in the form of tables to be processed whenever needed, and analysis of the stored data was performed to solve different problems. Then came the internet era, in which excessive data growth was experienced at the beginning of the 21st century as the World Wide Web generated large volumes of data with high velocity and variety. Such data sets were difficult to handle, so a new data type was introduced, known as BIG data. Today, we cannot imagine a world where data is not stored, for example a place where people's details, transactions and documentation are immediately lost after use; if this were the case, organizations would lose the ability to produce information and perform analytics.
4. BEHAVIORAL TYPES OF BIG DATA
There are three major types of BIG data, discussed in detail below:
4.1. Structured Data
This is data stored in relational database systems and organized in the form of rows and columns in tables. The system allows the data to be stored, processed and operated on. A programming language called SQL (Structured Query Language) is used to manage this data type. Names, dates and addresses are a few examples of structured data.
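
As a small illustration, the following sketch uses Python's built-in sqlite3 module; the table and column names are hypothetical, chosen only to mirror the examples above:

```python
import sqlite3

# Structured data: a fixed schema of typed columns, managed through SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, date TEXT, address TEXT)")
conn.execute("INSERT INTO customers VALUES (?, ?, ?)",
             ("Alice", "2018-05-21", "12 Main Street"))

# SQL lets us store, retrieve and filter the data by its columns.
for row in conn.execute("SELECT name, address FROM customers WHERE date >= '2018-01-01'"):
    print(row)
```
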
4.2. Unstructured Data
Data that cannot be stored in a row-and-column format is referred to as unstructured data, and it cannot be stored in conventional data banks. It may be textual or non-textual and varies in size and content. Examples of unstructured data include audio and video files, web logs, mobile text messages, etc.
4.3. Semi-structured Data
Semi-structured data resembles structured data but does not fit the formal structure of a relational database or other data tables; instead, it requires particular types of files that hold specific markers or tags to organize its storage. Examples of this data type include XML and JSON files.
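
For example, a JSON record can be parsed with a few lines of Python; the field names here are purely illustrative. The keys act as the markers that give the data its loose, nested structure:

```python
import json

# Semi-structured data: no fixed relational schema, but tags/keys mark the structure.
record = '{"user": "alice", "posts": [{"text": "hello", "likes": 3}]}'
data = json.loads(record)

# Fields can be nested and optional, unlike rows in a relational table.
for post in data.get("posts", []):
    print(data["user"], post["text"], post["likes"])
```
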
5. CHARACTERISTICS OF BIG DATA

5.1. Volume (Scale)
ICT has provided unlimited data collection points, and the quantity of data produced today is huge compared with the past and is growing faster than anything ever seen.
5.2. Variety (Complexity)
The data created through different sources is diverse in nature and includes different data formats,
data semantics and data structure types.

5.3. Velocity
Data formation is very quick nowadays and nearly unstoppable, as most data is now created in real time.
5.4. Veracity
The majority of the data generated from different sources is noisy and suffers from uncleanness and inaccuracy. Hence, it is difficult to be absolutely certain about the veracity of big data.
5.5. Complexity
It refers to the degree of interdependence and interconnectedness of the data structures in BIG data
in a way that a minor change in one of the data components may lead to drastic change or no
change at all in the behavior of the entire data system. Consequently, BIG data is considered
volatile and complex in nature.
5.6. Visualization
Big Data visualization involves the presentation of data of almost any type in a graphical format
that makes it easy to understand and interpret. But it goes beyond traditional corporate graphs,
histograms and pie charts to more complex visual representations like heat maps and fever charts,
enabling the decision makers to explore data sets to identify correlations and unexpected patterns.
6. FIELDS OF BIG DATA
There are two types of data: human-generated and machine-generated. Following are a few prominent fields with the highest contribution to BIG data generation:
6.1. Social Networking sites
Social platforms such as Facebook, Snapchat and Twitter that carry the information, posts, links, etc. of people from all over the world.
6.2. Search Engines
Google, the major player in this category along with a few others, sends its spiders to crawl billions of pages on the web to discover new and updated pages to be added to the Google Index, and this crawling takes place continuously.
6.3. Medical History
Medical records generated by different healthcare institutions and hospitals to record the patients’
medical history.
6.4. Online Shopping
Online shopping records all transaction types, preferences, timings and occasions of purchases, which is useful for generating consumer insights that businesses use to design customer-specific offers and for re-marketing.

6.5. Stock Exchange
Trading the shares of various companies also generates massive financial trading data, which is used to predict market drivers and trends.
7. COMPARISON OF TRADITIONAL AND BIG DATA

Dimension         | Traditional Data                                      | BIG Data                                                   | Advantage
------------------|-------------------------------------------------------|------------------------------------------------------------|-----------------------------
Data Architecture | Centralized database                                  | Distributed database                                       | Cost effective
Data Type         | Structured                                            | Semi-structured and unstructured                           | Improved variety
Volume            | Small amounts of data (GB to TB)                      | Large amounts of data (petabytes and beyond)               | Helps business intelligence
Data Schema       | Fixed schema                                          | Dynamic schema                                             | Preserves the information in the data
Data Relationship | Relationships between data items are easily explored | Relationships between data items are difficult to explore | --
Scaling           | Scales up on a single, more powerful server           | Scales out across many commodity servers                   | Cost effective
Accuracy          | Less accurate results                                 | Highly accurate results                                    | Reliable results

8. TECHNIQUES FOR BIG DATA


To capture the insights and valuable information from Big Data, we need techniques and
technologies for analyzing it.
Until now, scientists have developed a variety of techniques and technologies to capture, curate, analyze and visualize Big Data. Nevertheless, they are far from meeting the variety of needs of BIG data. These techniques and technologies are multidisciplinary, spanning computer science, economics, mathematics, statistics and other fields. The next few sub-sections discuss the current and most common techniques and technologies for exploiting data-intensive applications.
Big Data requires extraordinary techniques to efficiently process large volumes of data within limited run times. Big Data techniques are driven by specific applications. For example, Wal-Mart applies machine learning and statistical techniques to explore patterns in its large volume of transaction data.

8.1. Optimization Methods
These methods are applied to solve quantitative problems in different fields, such as physics, biology, engineering and economics.
Computational strategies for addressing global optimization problems include simulated annealing, adaptive simulated annealing, quantum annealing, and the genetic algorithm, which naturally lends itself to parallelism and can therefore be highly efficient.
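
As a concrete sketch of one of these strategies, the following minimal simulated annealing loop minimizes a simple one-dimensional function; the objective, move size and cooling schedule are illustrative choices, not taken from any particular Big Data system:

```python
import math
import random

def simulated_annealing(f, x, steps=10_000, temp=1.0, cooling=0.999):
    """Minimize f by sometimes accepting worse moves, with probability set by temperature."""
    fx = f(x)
    best_x, best_fx = x, fx
    for _ in range(steps):
        candidate = x + random.uniform(-1, 1)   # random local move
        f_candidate = f(candidate)
        # Always accept improvements; accept worse moves with prob exp(-delta / temp).
        if f_candidate < fx or random.random() < math.exp((fx - f_candidate) / temp):
            x, fx = candidate, f_candidate
            if fx < best_fx:
                best_x, best_fx = x, fx
        temp *= cooling                          # cool down gradually
    return best_x, best_fx

# Illustrative objective with several local minima.
print(simulated_annealing(lambda x: x ** 2 + 10 * math.sin(x), x=5.0))
```
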
8.2. Statistics
Statistics is the science of collecting, organizing and interpreting data. Statistical techniques are used to exploit correlations and causal relationships between different variables, and statistics also provides numerical descriptions of data.
However, standard statistical techniques are not well suited to managing Big Data, so extensions of classical techniques and completely new methods have been developed to apply statistical models to BIG data, such as an efficient approximate algorithm for large-scale multivariate monotonic regression, an approach for estimating functions that are monotonic with respect to input variables.
Another trend in data-driven statistical analysis focuses on scaling and parallel implementation of statistical algorithms.
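
To make the idea of monotonic regression concrete, here is a minimal sketch of its classical one-dimensional building block, the pool-adjacent-violators algorithm; this is the textbook version, not the large-scale multivariate extension referred to above:

```python
def isotonic_regression(y):
    """Least-squares fit to y that is non-decreasing (pool adjacent violators)."""
    blocks = []  # each block is [sum_of_values, count]
    for value in y:
        blocks.append([value, 1])
        # Merge neighbouring blocks while their means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return fitted

print(isotonic_regression([1, 3, 2, 4, 3, 5]))  # -> [1, 2.5, 2.5, 3.5, 3.5, 5]
```
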
8.3. Data Mining
Data mining is a set of techniques to extract valuable information (patterns) from data, including clustering analysis, classification, regression and association rule learning. It draws on a number of methods from machine learning and statistics.
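
As one concrete instance of these techniques, a compact k-means clustering sketch in pure Python follows; the points and the choice of k are illustrative:

```python
import random

def kmeans(points, k, iterations=20):
    """Group n-dimensional points (tuples) around k cluster centers."""
    centers = random.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: each point joins its nearest center.
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centers[i] = tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
    return centers

points = [(1, 1), (1.5, 2), (0.5, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
print(kmeans(points, k=2))
```
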
8.4. Machine Learning (ML)
ML is an important subfield of artificial intelligence that aims to design algorithms allowing computers to evolve behaviors based on empirical data. The most important characteristic of machine learning is that it discovers knowledge and makes intelligent decisions automatically.
Where Big Data is concerned, machine learning algorithms, both supervised and unsupervised, need to be scaled up to cope with it. Deep learning has become a new research frontier in artificial intelligence, and several frameworks, such as Map/Reduce, DryadLINQ and the IBM parallel machine learning toolbox, have the capability to scale up machine learning.
8.5. Artificial Neural Network (ANN)
ANN is a mature technique with a wide range of applications. Its successful applications can be found in pattern recognition, image analysis, adaptive control, etc. Most of the ANNs currently employed for artificial intelligence are based on statistical estimation, classification, optimization and control.

However, the complexity of a neural network increases its learning time; hence the learning process of a neural network over Big Data is severely time- and memory-consuming, and neural processing of large data sets often leads to very large networks. There are two main challenges in this situation: conventional training algorithms perform poorly, and training runs into time and memory limitations. Consequently, two common approaches can be employed. One is to reduce the data size by sampling methods while the structure of the neural network remains the same; the other is to scale up neural networks in parallel and distributed ways.
8.6. Visualization Approaches
These are techniques used to create tables, images, diagrams and other intuitive displays to understand data. Big Data visualization is not as easy as visualizing traditional, relatively small data sets because of the presence of the 6Vs discussed above.
Common practices for large-scale data visualization include extraction and geometric modeling to significantly reduce the data's size.
8.7. Social Network Analysis (SNA)
SNA is an emerging key technique in modern sociology; it views social relationships in terms of network theory, consisting of nodes and ties.
SNA includes social system design, human behavior modeling, social network visualization, social network evolution analysis, and graph query and mining. Recently, online social networks and social media analysis have become popular. One of the main obstacles to SNA is the vastness of Big Data: analyzing a network consisting of millions or billions of connected objects is computationally costly.
Higher-level Big Data technologies include distributed file systems, distributed computational systems, massively parallel-processing (MPP) systems, data mining based on grid computing, and cloud-based storage and computing resources, along with granular computing and biological computing.
9. BIG DATA TOOLS
Next, we need tools and platforms, along with techniques and technologies, to make sense of Big Data. Current tools fall into three classes: batch processing tools, stream processing tools and interactive analysis tools. The following sub-sections discuss several tools in each class.
9.1. Big Data tools based on batch processing
One of the most powerful batch-processing-based Big Data tools is Apache Hadoop. It provides infrastructure and platforms for other specific Big Data applications, and a number of specialized Big Data systems built on Hadoop have particular uses in different domains, for example data mining and machine learning in business and commerce.

9.1.1. Apache Hadoop and Map/Reduce
Apache Hadoop is a well-established software platform that supports data-intensive distributed applications. It implements the computational paradigm named Map/Reduce. The Apache Hadoop platform consists of the Hadoop kernel, Map/Reduce and the Hadoop Distributed File System (HDFS), as well as a number of related projects, including Apache Hive, Apache HBase, and so on.
Map/Reduce, a programming model and execution framework for processing and generating large data sets, was pioneered by Google and further developed by Yahoo! and other web companies. Map/Reduce is based on the divide-and-conquer method: a complex problem is broken down into many sub-problems until these sub-problems are small enough to be solved directly. The sub-problems are then assigned to a cluster of worker nodes and solved in separate and parallel ways, and finally the solutions to the sub-problems are combined to give a solution to the original problem. The divide-and-conquer method is implemented in two steps: the Map step and the Reduce step. In terms of the Hadoop cluster, there are two kinds of nodes in the Hadoop infrastructure: master nodes and worker nodes. In the Map step, the master node takes the input, divides it into smaller sub-problems and distributes them to worker nodes. In the Reduce step, the master node collects the answers to all the sub-problems and combines them to form the output.
With the addition of Map/Reduce, Hadoop works as a powerful software framework that can process vast quantities of data in parallel on large clusters (perhaps thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
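
The two steps can be sketched in a few lines of Python. This toy word count is a hypothetical single-machine illustration rather than actual Hadoop code; in a real Hadoop job, these same two functions are distributed across master and worker nodes:

```python
from collections import defaultdict

def map_step(document):
    # Map: break one input record into (key, value) pairs, here (word, 1).
    return [(word, 1) for word in document.split()]

def reduce_step(pairs):
    # Reduce: group the pairs by key and combine their values, here by summing.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = ["big data is big", "data about data"]
# In Hadoop, each worker node would run map_step on its own slice of the input.
pairs = [pair for doc in documents for pair in map_step(doc)]
print(reduce_step(pairs))  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```
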
9.1.2. Dryad
A Dryad job is expressed as a dataflow graph, and these graphs can also be updated during execution in order to deal with unexpected events in the computation.
Dryad is a self-contained system with complete functionality, including job creation and management, resource management, job monitoring and visualization, fault tolerance and re-execution. Consequently, a good deal of software has been built on top of Dryad, including Microsoft's SQL Server 2005 Integration Services (SSIS).
9.1.3. Apache Mahout
Apache Mahout aims to provide scalable, commercially friendly machine learning techniques for large-scale, intelligent data analysis applications. Many renowned companies, such as Google, Amazon, Yahoo!, IBM, Twitter and Facebook, have projects involving Big Data problems, and Apache Mahout provides a tool for tackling such challenges.
Mahout's core algorithms, including clustering, classification, pattern mining, regression, dimension reduction, evolutionary algorithms and batch-based collaborative filtering, run on top of the Hadoop platform via the Map/Reduce framework.

9.1.4. Jaspersoft BI Suite
The Jaspersoft package is open-source software that produces reports from database columns, and it has already been installed in many business information systems. One important property of Jaspersoft is that it can quickly explore Big Data without extraction, transformation and loading (ETL).
In addition to the above tools, the following are also commonly used batch processing tools:
- Pentaho Business Analytics
- Skytree Server
- Tableau
- Karmasphere Studio and Analyst
- Talend Open Studio

9.2. Stream processing Big Data tools


Hadoop is designed for batch processing. It is a multi-purpose engine but not a real-time, high-performance engine, since its implementations incur high latency. Certain stream data applications, such as processing log files, industrial sensor data, machine-to-machine (M2M) communication and telematics, require real-time responses for processing large amounts of stream data; in those applications, stream processing for real-time analytics is essential. Stream Big Data has high volume, high velocity and complex data types, and when high velocity and the time dimension are concerned in applications involving real-time processing, the Map/Reduce framework faces a number of challenges.
Numerous Big Data tools based on stream processing have been developed. One of the most commonly used platforms is Storm; others include S4, SQLstream, Splunk, Apache Kafka and SAP HANA. A few of these are discussed below.
9.2.1. Storm
Storm is a distributed, fault-tolerant real-time computation system for processing limitless streaming data. It is released as open source and is free to use and modify. It is also very easy to set up and operate, and it guarantees that all the data will be processed. It has many applications, such as real-time analytics, interactive operating systems, online machine learning, continuous computation, distributed RPC and ETL.
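
Storm itself is programmed against its Java spout/bolt API; purely as a language-neutral illustration of the stream processing idea, the following Python sketch handles each record as it arrives instead of waiting for a stored batch:

```python
from collections import Counter

def stream_word_count(stream):
    """Consume an unbounded stream record by record, yielding running counts."""
    counts = Counter()
    for record in stream:            # records arrive one at a time, never as a stored batch
        for word in record.split():
            counts[word] += 1
        yield dict(counts)           # a continuously updated, real-time result

# Illustrative finite stream; a real source would be a socket, log tail or message queue.
for snapshot in stream_word_count(["sensor up", "sensor down", "sensor up"]):
    print(snapshot)
```
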
9.2.2. S4
S4 is a general-purpose, distributed, scalable, fault-tolerant, pluggable computing platform for processing continuous unbounded streams of data. It was initially released by Yahoo! in 2010 and became an Apache Incubator project in 2011. S4 allows programmers to easily develop applications and possesses several competitive properties, including robustness, decentralization, scalability, cluster management and extensibility.

The core platform of S4 is written in Java. S4 has been put to use in production systems at Yahoo! for processing thousands of search queries, and it has shown good performance in other applications as well.
9.2.3. SQLstream s-Server
SQLstream is another Big Data platform designed for processing large-scale streaming data in real time. It focuses on intelligent and automatic operation on streaming Big Data, and it is appropriate for discovering patterns in large amounts of unstructured log file, sensor, network and other machine-generated data. The SQLstream s-Server 3.0 release performs well in real-time data collection, transformation and sharing, which favors real-time Big Data management and analytics. Standard SQL is still used for the underlying operations.
9.3. Big Data tools based on interactive analysis
Interactive analysis presents the data in an interactive environment, allowing users to undertake their own analysis of the information. Users are connected to the computer and can interact with it in real time. The data can be reviewed, compared and analyzed in tabular or graphical format, or both simultaneously. Following are a couple of interactive analysis tools.
9.3.1. Google’s Dremel
In 2010, Google proposed an interactive analysis system named Dremel, which is scalable for processing nested data. It has the capability to run aggregation queries over trillion-row tables in seconds by combining multi-level execution trees with a columnar data layout. The system scales to thousands of CPUs and petabytes of data, and it has thousands of users at Google.
9.3.2. Apache Drill
Apache Drill is another distributed system for interactive analysis of Big Data. It is similar to Google's Dremel, but Drill offers more flexibility by supporting several different query languages, data formats and data sources. Its objective is to scale up to 10,000 servers or more and to reach the capability of processing petabytes of data and trillions of records in seconds.
Every Big Data platform has its own focus: some are designed for batch processing, some for stream processing and some for real-time analytics. Each Big Data platform provides specific functionality and has different operational features for performing those functions.

10. BIG DATA UNDERLYING TECHNOLOGY AND FUTURE RESEARCH
Advanced techniques and technologies for Big Data science are developed with the purpose of inventing more sophisticated and scientific methods of managing, analyzing, visualizing and exploiting informative knowledge from large, diverse, distributed and heterogeneous data sets.
In the following subsections, we discuss several ongoing or underlying techniques and technologies for harnessing Big Data, including granular computing, cloud computing, biological computing and quantum computing.
10.1. Granular computing
When we talk about Big Data, its first property is its size. Granular computing (GrC) is a general computation theory for effectively using granules such as classes, clusters, subsets, groups and intervals to build an efficient computational model for complex applications with huge amounts of data, information and knowledge, so it is very natural to employ granular computing techniques to explore Big Data. Granular computing can reduce the data size to different levels of granularity, and under certain circumstances some Big Data problems can be readily solved this way.
In GrC, processing is carried out at different levels of information granules. The information represented at different levels of granularity exposes distinct knowledge, features and patterns: irrelevant features are hidden while valuable ones are highlighted. Taking satellite images as an example, researchers' interest in low-resolution images may be the cloud patterns that indicate typhoons or other weather phenomena, whereas in high-resolution satellite images the large-scale atmospheric phenomena are ignored and small targets appear, such as a map of a city or a scene of a street. The same is generally true for all data: at different granularities of information, different features and patterns emerge.
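
A small sketch of this idea in Python, with illustrative timestamps and granule levels: the same event data viewed at a coarser granule occupies fewer granules and exposes a different pattern:

```python
from collections import Counter
from datetime import datetime

events = ["2018-05-21 09:15", "2018-05-21 09:40", "2018-05-21 17:05", "2018-05-22 09:10"]
stamps = [datetime.strptime(e, "%Y-%m-%d %H:%M") for e in events]

# Fine granule: events counted per hour; coarse granule: events counted per day.
per_hour = Counter(t.strftime("%Y-%m-%d %H:00") for t in stamps)
per_day = Counter(t.strftime("%Y-%m-%d") for t in stamps)

print(per_hour)  # hourly pattern, more granules
print(per_day)   # daily trend, far fewer granules
```
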
There are different types of granularity adopted in data mining and machine learning, including
variable granulation, variable transformation, variable aggregation, system granulation
(aggregation), concept granulation (component analysis), equivalence class granulation and
component granulation.
The information hidden in Big Data will be partially lost if the data size is reduced. Not all Big Data applications can use GrC techniques in their processing; it depends on the confidence and accuracy required of the results. For example, financial data in banks and government are very sensitive and require high accuracy, and sensor data generated by users may need to be processed record by record. In such cases GrC may not be applicable, and other advanced solutions are required.
10.2. Cloud computing
The development of virtualization technologies has made supercomputing more accessible and affordable. Powerful computing infrastructures hidden behind virtualization software make a system behave like a true physical computer, but with flexible specification of details such as the number of processors, memory and disk size, and the operating system. The use of these virtual computers is known as cloud computing, which has become one of the most robust Big Data techniques.
Cloud computing not only delivers applications and services over the Internet; it has also been extended to Infrastructure as a Service (IaaS), for example Amazon EC2; Software as a Service (SaaS) from a whole crew of companies starting at Salesforce and proceeding through NetSuite, Cloud9, Jobscience and Zuora, a list that is almost never-ending; and Platform as a Service (PaaS), such as Google App Engine and Microsoft Azure. This leads to utility computing, i.e., pay-as-you-go computing.
Another bonus brought by the cloud environment is cloud storage, which provides a possible tool for Big Data storage: it has good extensibility and scalability for storing information.
Cloud computing is a highly feasible technology, and it attracts a large number of businesses to develop it and apply it to Big Data problems. Apart from its flexibility, cloud computing addresses one of the challenges of transferring and sharing BIG data, because the data sets and analysis results held in the cloud can be shared with others. There are, however, a few disadvantages to cloud computing. The obvious one is the time and cost required to upload and download large quantities of data in the cloud environment. Furthermore, there are certain privacy concerns related to hosting sensitive data on publicly accessible servers.
10.3. Bio-inspired computing
Human brains prompt us to rethink the way we interact with Big Data. Our brains do not need to locate and view files with complex information sets; instead, information is partitioned and individually stored as simple data elements in brain tissue, and the processing of information in the human brain is executed in highly distributed and parallel ways. This multi-located storage schema and these synchronous parallel processing approaches make our brains work fast and efficiently.
Biological computing models are attractive alternatives for Big Data because they have high-efficiency mechanisms to organize, access and process data in ways that are more practical for the effectively infinite inputs we deal with every day.
Bio-computers are inspired by and built from biological molecules, such as DNA and proteins, to conduct computational calculations involving storing, retrieving and processing data. A significant feature of a bio-computer is that it integrates biologically derived materials to perform computational functions and achieve intelligent, efficient performance. A bio-computer is composed of a pathway or series of metabolic pathways involving biological materials that are engineered to behave in a certain manner based on the input to the system. The resulting pathway of reactions constitutes an output, which is based on the engineering design of the bio-computer and can be interpreted as a form of computational analysis. There are three distinguishable kinds of bio-computers: biochemical computers, biomechanical computers and bio-electronic computers.
Bio-inspired computing enables human-like analysis of massive quantities of data. The future promised by bio-inspired technologies is so remarkable that large amounts of funding and human resources are being poured into related research activities.

10.4. Quantum computing
A quantum computer has memory that is exponentially larger than its apparent physical size and can manipulate an exponential set of inputs simultaneously. If a practical quantum computer existed now, we could solve problems that are exceptionally difficult on current computers, including today's Big Data problems. Although it is very hard to develop a quantum computer, the main technical difficulty in building one could soon be the very thing that makes it possible to build one. For example, D-Wave Systems developed quantum computers called "D-Wave One", with a 128-qubit processor, and "D-Wave Two", with a 512-qubit processor, in 2011 and 2013 respectively.
Quantum computing refers to harnessing and exploiting the laws of quantum mechanics to process information. In a traditional computer, information is represented by long strings of bits which encode either a zero or a one, whereas a quantum computer uses quantum bits, or qubits. Quantum computational operations have been executed on small numbers of quantum bits in practical experiments, and theoretical research continues to advance. A number of universities, institutes, national governments and military funding research groups are working on quantum computing studies to develop quantum computers for both civilian and national security purposes.
11. CONCLUSION
As we enter the era of Big Data, which is the next frontier for innovation, competition and productivity, a new wave of scientific revolution is about to begin.
This paper also discusses a couple of potential techniques for solving the BIG data problem, including quantum computing and biological computing. Although these technologies are still under development, breakthroughs are expected to come shortly, and today's and the future's Big Data problems will undoubtedly benefit from that progress.
Big Data also means big systems, big challenges and big profits, and current Big Data techniques and technologies remain very limited in their ability to solve real Big Data problems completely. Therefore, more scientific investment from both governments and enterprises should be poured into this scientific paradigm to capture the huge value in Big Data.
Fortunately, we are witnessing the birth and development of Big Data. Human resources, capital investments and creative ideas are fundamental components of the future development of Big Data.

REFERENCES
Chen, C. P., & Zhang, C. Y. (2014). Data-intensive applications, challenges, techniques and technologies: A survey on Big Data. Information Sciences, 275, 314-347.

De Mauro, A., Greco, M., & Grimaldi, M. (2016). A formal definition of Big Data based on its essential features. Library Review, 65(3), 122-135.

Laxmi, P. S. S., & Pranathi, P. S. (2015). Impact of big data analytics on business intelligence: Scope of predictive analytics. Journal of Current Engineering and Technology, 5(2), 856-860.

Lukić, J. (2017). The impact of big data technologies on competitive advantage of companies. Facta Universitatis, Series: Economics and Organization, 255-264.

Nasser, T., & Tariq, R. S. (2015). Big data challenges. Journal of Computer Engineering & Information Technology, 4(3). doi:10.4172/2324-9307

Rajaraman, V. (2016). Big data analytics. Resonance, 21(8), 695-716.

Satyanarayana, L. (2015). A survey on challenges and advantages in Big Data. International Journal of Computer Science and Technology, 6(2), 115-119.

