
Big Data Applications in Management

Individual Assignment-I
Submitted By: Dinesh Kumar
Roll No.: 16PT1-12
1. What is In-Memory Computing?
In-memory computing is a technique in which all processing is performed on data held in the random access memory (RAM) of the computer, rather than on data distributed among databases across various machines. In-memory computing delivers fast results for small and moderately sized data sets; however, it is not a credible solution for very large amounts of data, because the memory required to perform the operation makes the system costlier.

Source: Techopedia.com
In-memory computing is the storage of information in the main random access memory (RAM) of dedicated servers rather than in complicated relational databases operating on comparatively slow disk drives. In-memory computing helps business customers, including retailers, banks and utilities, to quickly detect patterns, analyse massive data volumes on the fly, and perform their operations quickly. The drop in memory prices in the present market is a major factor contributing to the increasing popularity of in-memory computing technology. This has made in-memory computing economical among a wide variety of applications.
Many technology companies are making use of this technology. For example, the in-memory computing technology developed by SAP, called the High-Performance Analytic Appliance (HANA), uses sophisticated data compression techniques to store data in random access memory. HANA's performance is reported to be up to 10,000 times faster than that of standard disk-based systems, which allows companies to analyse data in a matter of seconds instead of long hours.
Some of the advantages of in-memory computing include:
• The ability to cache vast amounts of data constantly, ensuring extremely fast response times for searches (see the sketch below).
• The ability to store session data, allowing for the customization of live sessions and ensuring optimum website performance.
• The ability to process events for improved complex event processing.
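
To make the caching idea concrete, here is a minimal Python sketch: the data set is read from disk once, held in RAM, and all subsequent lookups are served from memory. The file name, column names and row key used below are hypothetical placeholders, and the timing loop is purely illustrative.

```python
import csv
import time

# Minimal sketch (illustrative only): keep a data set in RAM and serve
# repeated lookups from memory instead of re-reading the file on every
# query. "sales.csv" and its columns are hypothetical placeholders.

def disk_lookup(customer_id, path="sales.csv"):
    """Scan the file on disk for every query (slow when repeated)."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["customer_id"] == customer_id:
                return float(row["amount"])
    return None

def build_in_memory_index(path="sales.csv"):
    """Read the file once and hold the whole data set in RAM."""
    with open(path, newline="") as f:
        return {row["customer_id"]: float(row["amount"])
                for row in csv.DictReader(f)}

if __name__ == "__main__":
    print("one disk lookup:", disk_lookup("C1042"))

    index = build_in_memory_index()          # one-time load into RAM
    start = time.perf_counter()
    for _ in range(100_000):
        index.get("C1042")                   # answered from memory
    print("100k in-memory lookups:", time.perf_counter() - start, "s")
```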
2. What is Google Query, pros and cons?
Google Query refers to the Google Visualization API Query Language, which lets us perform various data manipulations by sending queries to a data source. The query language provides the ability to send data manipulation and formatting requests to the data source and to ensure that the returned data structure and contents match the structure we expect.
e.g.: Suppose we need the data in a specific structure, say a text label followed by a numeric value. The data within the data source may not match this structure exactly: it may have multiple columns, and the order of those columns may not match the structure we need. By using Google Query to manipulate the data, we can meet our requirement, as the sketch below illustrates.
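As a hedged illustration (not part of the original sources), the Python sketch below sends such a query to a Google Sheet through the Visualization API's gviz/tq endpoint. The sheet ID is a placeholder and the sheet is assumed to be readable by the caller; the query reshapes the source into exactly the "text label plus numeric value" structure described above.

```python
import json
import requests

# Placeholder: ID of a Google Sheet the caller can read.
SHEET_ID = "YOUR_SHEET_ID"

# SQL-like Visualization API query: one text label (column A) and one
# numeric value (sum of column B), regardless of the sheet's layout.
query = "SELECT A, sum(B) GROUP BY A ORDER BY sum(B) DESC"

url = f"https://docs.google.com/spreadsheets/d/{SHEET_ID}/gviz/tq"
resp = requests.get(url, params={"tq": query})

# The endpoint wraps its JSON in a JavaScript callback; strip the wrapper
# by keeping only the outermost {...} of the response body.
raw = resp.text
payload = json.loads(raw[raw.index("{"): raw.rindex("}") + 1])

# Each row now has exactly two cells: a text label and a numeric value.
for row in payload["table"]["rows"]:
    label, value = (cell["v"] if cell else None for cell in row["c"])
    print(label, value)
```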

Pros:
The syntax of Google Query is similar to SQL syntax, with a few additional features.
Query works well with relational databases.
With Query the code sits closer to the data, so less traffic bandwidth is used.

Cons:
The functionality of Google Query is somewhat limited.
If the underlying data is not well structured, the query will have problems producing the expected output.
We need to be clear in the query about what we want to do with the data.
3. What is Google Big Table, pros and cons?
Google Bigtable is a fully managed, massively scalable NoSQL database service. It offers features such as redundant auto-scaling storage, seamless cluster resizing, low-latency storage and HBase compatibility, and it is fully managed by Google’s own Bigtable operators.

Source: Wikipedia.org
“Bigtable maps two arbitrary string values (row key and column key) and timestamp (hence three-
dimensional mapping) into an associated arbitrary byte array. It is not a relational database and
can be better defined as a sparse, distributed multi-dimensional sorted map. Bigtable is designed
to scale into the petabyte range across "hundreds or thousands of machines, and to make it easy
to add more machines to the system and automatically start taking advantage of those resources
without any reconfiguration”.
Each table has multiple dimensions (one of which is a field for time, allowing
for versioning and garbage collection). Tables are optimized for Google File System (GFS) by being
split into multiple tablets – segments of the table are split along a row chosen such that the tablet
will be ~200 megabytes in size. When sizes threaten to grow beyond a specified limit, the tablets
are compressed using the algorithm BMDiff and the Zippy compression algorithm publicly known
and open-sourced as Snappy, which is a less space-optimal variation of LZ77 but more efficient in
terms of computing time. The locations in the GFS of tablets are recorded as database entries in
multiple special tablets, which are called "META1" tablets. META1 tablets are found by querying
the single "META0" tablet, which typically resides on a server of its own since it is often queried by
clients as to the location of the "META1" tablet which itself has the answer to the question of where
the actual data is located. Like GFS's master server, the META0 server is not generally
a bottleneck since the processor time and bandwidth necessary to discover and transmit META1
locations is minimal and clients aggressively cache locations to minimize queries.”
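
To show how the three-part mapping described above (row key, column key, timestamp mapped to a byte array) looks in practice, here is a hedged sketch using the google-cloud-bigtable Python client. The project, instance, table and column-family names are placeholders, not resources from this assignment.

```python
from google.cloud import bigtable

# Hedged sketch of Bigtable's (row key, column, timestamp) -> bytes model.
# All resource names below are illustrative placeholders.
client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")
table = instance.table("user-events")

# Write: one cell addressed by row key + column family:qualifier; the
# server assigns the timestamp, which is the third dimension of the map.
row = table.direct_row(b"user#1042#2016-11-07")
row.set_cell("events", b"page_view", b"/pricing")
row.commit()

# Read the same row back: cells come grouped by family and qualifier,
# newest version first, so older versions remain until garbage collection.
result = table.read_row(b"user#1042#2016-11-07")
for cell in result.cells["events"][b"page_view"]:
    print(cell.timestamp, cell.value)
```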
Pros:
Fully managed by Google’s own Operators
Redundant Autoscaling Storage
Security and Permissions
Seamless Cluster Sizing
Low Latency
HBase Compatible
Global Availability

Cons:
It is a NoSQL database.
Fewer options for accessing data (lack of built-in secondary indexing).
Bigtable can be classified as SaaS (Software as a Service), and thus scalability can be an issue.
It lacks the free-form nature of JSON documents.

4. What is Data and Storage Virtualization?


Data or storage virtualization is the process of grouping physical (actual) data stored across various machines and networks in such a manner that it appears as a single data store. This means that even though the data is scattered across various machines and networks, a program processing it treats it as a single unit, and processing is done accordingly.
Storage virtualization is also sometimes referred to as cloud storage.

Source: Techopedia.com
“Storage virtualization is the process of grouping the physical storage from multiple network
storage devices so that it looks like a single storage device.
The process involves abstracting and covering the internal functions of a storage device from the
host application, host servers or a general network in order to facilitate the application and
network-independent management of storage.
The management of storage and data is becoming difficult and time consuming. Storage
virtualization helps to address this problem by facilitating easy backup, archiving and recovery tasks
by consuming less time. Storage virtualization aggregates the functions and hides the actual
complexity of the storage area network (SAN).
Storage virtualization can be implemented by using software applications or appliances. There are
three important reasons to implement storage virtualization:
1. Improved storage management in a heterogeneous IT environment
2. Better availability and estimation of down time with automated management
3. Better storage utilization
Storage virtualization can be applied to any level of a SAN. The virtualization techniques can also
be applied to different storage functions such as physical storage, RAID groups, and logical unit
numbers (LUNs), LUN subdivisions, storage zones and logical volumes, etc.

The storage virtualization model can be divided into four main layers:
1. Storage devices
2. Block aggregation layer
3. File/record layer
4. Application layer
Some of the benefits of storage virtualization include automated management, expansion of
storage capacity, reduced time in manual supervision, easy updates and reduced downtime.”
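
As a purely conceptual illustration of the "many physical devices behind one logical device" idea described above, the Python sketch below aggregates several backing stores behind a single interface. It is not how a real SAN virtualization layer is implemented; the class, the dict-based "devices" and the round-robin placement policy are illustrative assumptions.

```python
class VirtualVolume:
    """Conceptual sketch: expose several physical backends as one logical store.

    Each backend is modelled as a plain dict (standing in for a disk or array);
    the virtualization layer hides which backend actually holds each block.
    """

    def __init__(self, backends):
        self.backends = backends          # the "physical" devices
        self.mapping = {}                 # logical block id -> owning backend
        self._next = 0

    def write(self, block_id, data):
        # Simple round-robin placement policy (an illustrative assumption).
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        backend[block_id] = data
        self.mapping[block_id] = backend

    def read(self, block_id):
        # The caller never needs to know which physical device holds the block.
        return self.mapping[block_id][block_id]


# Usage: three "devices" appear to the application as one volume.
volume = VirtualVolume([{}, {}, {}])
volume.write("blk-001", b"sales data")
volume.write("blk-002", b"archive data")
print(volume.read("blk-001"))   # b'sales data'
```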

5. Hyper-V technology?
Hyper-V is a server virtualization technology in which virtualization services are provided through hypervisor-based emulation. Using the Microsoft Hyper-V server hypervisor, a single physical server can be consolidated into many virtual servers, all of which share the hardware resources of the host server and are powered by Hyper-V.

Source: Wikipedia.org
“Microsoft Hyper-V, codenamed Viridian and formerly known as Windows Server Virtualization, is a native hypervisor; it can create virtual machines on x86-64 systems running Windows. Starting with Windows 8, Hyper-V superseded Windows Virtual PC as the hardware virtualization component of the client editions of Windows NT. A server computer running Hyper-V can be configured to expose individual virtual machines to one or more networks.”
