
BIG DATA Final Presentation

By: Hemanth Aroumougam


Friday, April 4, 14

During the first generation...


Employees in companies started entering data into computer systems


As the second generation comes...


But now, as generations move on, there is a third one on this list, and it is...

Nowadays, even machines automatically enter data into computer systems.


BIG DATA is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.


Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured.


Big data is a buzzword, or catch-phrase, used to describe a massive volume of both structured and unstructured data that is so large that it's difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big, or it moves too fast, or it exceeds current processing capacity.


BIG DATA has 3 Vs, which are its defining properties and dimensions.

The 3Vs are...



Volume, Variety, Velocity

Volume-

Big Volume covers both simple SQL analytics and complex non-SQL analytics. In other words, volume refers to the amount of data.


SQL
SQL stands for Structured Query Language. SQL is a standardized query language for requesting information from a database. SQL was first introduced as a commercial database system in 1979 by the Oracle Corporation.


Historically, SQL has been the favorite query language for database management systems running on minicomputers and mainframes.
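
As a minimal sketch of what "requesting information from a database" can look like, the example below uses Python's built-in sqlite3 module with an in-memory database. The table name `employees` and its rows are made up for illustration and are not part of the presentation.

```python
import sqlite3

# In-memory database; the table and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "Sales", 52000), ("Ben", "Sales", 48000), ("Chen", "IT", 61000)],
)

# A SQL query: ask for the average salary per department.
rows = conn.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()
conn.close()

for department, average in rows:
    print(department, average)
```

The query states what information is wanted (an average per department) and leaves it to the database to work out how to compute it.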

Volume
Petabyte (PB), Terabyte (TB), Gigabyte (GB), Megabyte (MB), Kilobyte (KB)
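
To give a feel for how these units relate, here is a small sketch that turns a byte count into the largest fitting unit. It assumes decimal units, where each step up is a factor of 1000; some systems use 1024 instead. The function name `human_readable` is hypothetical.

```python
# Decimal (SI) units are assumed here: each step up is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]

def human_readable(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000

print(human_readable(2_500_000_000_000_000))
```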

Variety-

A large number of diverse data sources to integrate. In other words, variety refers to the number of different types of data.


VARIETY

Structured Data
Semi-structured Data
Unstructured Data


Structured Data

Structured data is data that resides in a fixed field within a record or file. This includes data contained in relational databases and spreadsheets. Structured data has the advantage of being easily entered, stored, queried and analyzed.


EXAMPLES OF STRUCTURED DATA


Library catalogues (date, author, place, subject, etc.)
Census records (birth, income, employment, place, etc.)
Phone numbers (and the phone book)
Economic data (GDP, PPI, ASX, etc.)
XML-TEI (bringing structure to the text through tagging particular elements, like versions of the word "canal" in 17th-century Dutch)
Databases
Data warehouses
Enterprise systems (CRM, ERP, etc.)


Semi-structured Data

Semi-structured data is a form of structured data that does not conform to the formal structure of data models associated with relational databases or other forms of data tables.


EXAMPLES OF SEMI-STRUCTURED DATA

Web pages
Information integration
XML
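
As a rough illustration of why XML counts as semi-structured, the sketch below parses two records that are both tagged but do not share the same set of fields, so they would not fit one fixed table without gaps. The XML content is hypothetical; the parsing uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: both records are tagged, but they do not share the same fields.
DOCUMENT = """
<customers>
  <customer><name>Asha</name><email>asha@example.com</email></customer>
  <customer><name>Ben</name><phone>555-0100</phone><city>Pune</city></customer>
</customers>
""".strip()

root = ET.fromstring(DOCUMENT)
# Each record becomes a dict holding only the fields that record actually has.
records = [{field.tag: field.text for field in customer} for customer in root]

for record in records:
    print(sorted(record))
```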


Unstructured Data
Unstructured data refers to information that either does not have a pre-defined data model or is not organized in a predefined manner. Unstructured information is typically text-heavy. In other words, unstructured data is at the other end of the spectrum. It might be in any form: text, audio, video. We definitely don't know what the data means from looking at it, unless we apply human understanding to it.


EXAMPLES OF UNSTRUCTURED DATA


Book
Story
Heavy text
Audio
Video
RSS feeds
Word documents
Excel spreadsheets
Email messages

Velocity-

Velocity refers to the speed at which the data is processed.


TYPES OF VELOCITY

REAL-TIME ANALYSIS
NEAR REAL TIME
PERIODIC
BATCH

Benefits of Batch Processing

It can shift the time of job processing to when the computing resources are less busy.
It avoids idling the computing resources with minute-by-minute manual intervention and supervision.
By keeping a high overall rate of utilization, it amortizes the cost of the computer, especially an expensive one.
It allows the system to use different priorities for batch and interactive work.
Rather than running one program multiple times to process one transaction each time, batch processes run the program only once for many transactions, reducing system overhead.
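
The last benefit above can be sketched with a toy cost model. The numbers are assumptions chosen only for illustration: every program run pays a fixed start-up cost, plus a small cost per transaction it processes.

```python
# Hypothetical cost model: every program run pays a fixed start-up cost,
# plus a small cost per transaction it processes.
STARTUP_COST = 5
COST_PER_TRANSACTION = 1

def run_program(transactions):
    """Return the total cost of one program run over the given transactions."""
    return STARTUP_COST + COST_PER_TRANSACTION * len(transactions)

transactions = list(range(100))

# One run per transaction: the start-up cost is paid 100 times.
one_at_a_time = sum(run_program([t]) for t in transactions)

# One batch run: the start-up cost is paid once.
batch = run_program(transactions)

print(one_at_a_time, batch)
```

Under these assumed costs, the batch run does the same work for 105 units instead of 600, because the overhead of starting the program is paid only once.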


ORACLE BIG DATA SOLUTION

Oracle is the first vendor to offer a complete and integrated solution to address the full spectrum of enterprise big data requirements. Oracle's big data strategy is centered on the idea that you can extend your current enterprise information architecture to incorporate big data. New big data technologies, such as Hadoop and Oracle NoSQL database, run alongside your Oracle data warehouse to deliver business value and address your big data requirements.


Advantages and Disadvantages of BIG DATA


ADVANTAGES


Data mining makes it easier to find correlations.
Results are more calculated, so accuracy is higher.
Data is now combined into one big mass, which allows links to be found.
For example, a company with decades of information can make use of Big Data and data analysis to create competitive advantages and open new business opportunities.
It started because companies have been finding it hard to manage all their data.
It creates new growth opportunities and lots of jobs.

DISADVANTAGES

Big risks to security and privacy.
It is expensive: you need to spend a lot to get it working.
It takes a lot of analysis: uncovering patterns, applying algorithms, finding connections and relationships.
Analysts still need to be specialized, and it is hard to find the right skill set.


BIG DATA Software


MongoDB - MongoDB, Inc.

Hadoop - Apache Software Foundation


Apache Hadoop is an open source framework for storage and large-scale processing of data sets on clusters of commodity hardware. It is licensed under the Apache License 2.0. The Apache Hadoop framework is composed of the following modules:
Hadoop Common: contains libraries and utilities needed by other Hadoop modules.
Hadoop Distributed File System (HDFS): a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
Hadoop YARN: a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications.
Hadoop MapReduce: a programming model for large-scale data processing.
Written in: Java
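
To make the MapReduce programming model a little more concrete, here is a minimal single-machine sketch of a word count in plain Python. It does not use Hadoop or any Hadoop API; the function names `map_phase`, `shuffle` and `reduce_phase` are hypothetical and only mirror the three conceptual steps.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

lines = ["big data is big", "data is everywhere"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

In a real cluster the map and reduce steps would run in parallel on many machines, with the framework handling the grouping in between; this sketch only shows the shape of the computation.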

MongoDB is big data software whose name comes from the word "humongous". MongoDB is a cross-platform document-oriented database. A document-oriented database is a computer program designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. This is classified as NoSQL. A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. MarkLogic is an American company that makes a NoSQL database.
Written in: C++
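
The idea of a document-oriented store can be sketched in a few lines of plain Python. This is not MongoDB's API: the `documents` list and the `find` helper are hypothetical stand-ins that only show documents with differing fields being queried by field value.

```python
# Hypothetical documents: each is a dict, and fields may differ between them.
documents = [
    {"_id": 1, "name": "Asha", "tags": ["sales"], "city": "Pune"},
    {"_id": 2, "name": "Ben", "email": "ben@example.com"},
    {"_id": 3, "name": "Chen", "city": "Pune", "age": 31},
]

def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]

print([doc["name"] for doc in find(documents, city="Pune")])
```

Unlike rows in a relational table, the documents do not need to agree on a schema; a document that lacks the queried field is simply not matched.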


Enterprise NoSQL Database Technology
Best Big Data Search
Real-time Your Hadoop



Enterprise NoSQL Database Technology

For more than a decade, MarkLogic has delivered a powerful, agile, and trusted enterprise-grade NoSQL (Not Only SQL) database that enables organizations to turn all data into valuable and actionable information. Key features include ACID transactions, horizontal scaling, real-time indexing, high availability, disaster recovery, government-grade security, and more.


Best Big Data Search


MarkLogic's scale-out, real-time platform is more than a search engine linked to a content repository; it is the most complete platform for building search-oriented applications.



Search all data for more value. Bring all relevant content back to users: unstructured and structured, internal and public.
Real-time updates. Real-time results. When documents are updated or inserted, they are available for search immediately.
Able to query all types of data. Structured, semi-structured, and unstructured content are all supported within the same queries.
Real-time alerts for fast response. MarkLogic has the highest performance alerting engine available, capable of running millions of custom queries on each and every change to the document repository, with no polling required.
Search you can bank on. Businesses that count on revenue through paid content search and retrieval trust MarkLogic to deliver.

Real Time your Hadoop


Seamlessly combine the power of MapReduce with MarkLogic's real-time, interactive analysis and indexing on a single, unified platform.

Get more power out of Hadoop. Hadoop and MarkLogic together can allow you to tackle problems that would be difficult or impossible to address by either technology alone.
Save money by leveraging common infrastructure. Using MarkLogic and Hadoop Distributed File System (HDFS) enables common batch-processing infrastructure to be used across many different projects and applications.
Enterprise-class support for Hadoop. Our partnership with Intel provides a strong, supported platform for building secure, enterprise-class Big Data applications with Apache Hadoop.


What can you accomplish with BIG DATA?


Dialogue with Consumers



Today's consumers are a tough nut to crack. They look around a lot before they buy. You want customers to buy your products. Big Data allows you to profile these increasingly vocal and fickle little tyrants in a far-reaching manner so that you can engage in an almost one-on-one, real-time conversation with them. This is not actually a luxury. If you don't treat them the way they want to be treated, they will leave you in the blink of an eye.


Re-develop your Products



Big Data can also help you understand how others perceive your products so that you can adapt them. Analysis of unstructured social media text allows you to uncover the sentiments of your customers and even segment those in different geographical locations or among different demographic groups.
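
A very rough sketch of this idea is shown below: each post gets a score from a tiny word list, and the scores are segmented by region. The word lists, the posts and the region names are all hypothetical, and real sentiment analysis uses far more capable models than word counting.

```python
from collections import defaultdict

# Hypothetical word lists and posts, for illustration only.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "broken"}

posts = [
    ("north", "I love this phone, great battery"),
    ("north", "screen arrived broken"),
    ("south", "bad support and I hate the app"),
]

def score(text):
    """Count positive words minus negative words in one post."""
    words = [word.strip(".,!?").lower() for word in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Segment the scores by region.
by_region = defaultdict(int)
for region, text in posts:
    by_region[region] += score(text)

print(dict(by_region))
```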


Perform Risk Analysis

Success depends not only on how you run your company. Social and economic factors are crucial for your accomplishments as well. Predictive analytics, fueled by Big Data, allows you to scan and analyze newspaper reports or social media feeds so that you permanently keep up to speed on the latest developments in your industry and its environment. Detailed health tests on your suppliers and customers are another goodie that comes with Big Data. This will allow you to take action when one of them is at risk of defaulting.


Keeping your data safe



You can map the entire data landscape across your company with Big Data tools, thus allowing you to analyze the threats that you face internally. You will be able to detect potentially sensitive information that is not protected in an appropriate manner and make sure it is stored according to regulatory requirements.


Where is BIG DATA used, and how?



Big Data is used in many fields, like...


Car Makers

Fault logging and cost predictions- Car makers place hundreds of sensors on components around the car which constantly log data on performance and faults. All of this data can be used to re-engineer designs for more efficient products and to predict how much strain warranty repairs are likely to place on cost and manpower.


TOYOTA
WHERE: From factories and from sensors
DATA CENTER: Headquarters
NEEDS: Safety and quality analysis
BENEFITS: Feedback from design


Finance

B2B supplier profiling- Finance professionals can use big data to check on the health of their suppliers and business partners. They can monitor a variety of indicators including when creditors pay their bills and whether there is any change

Fraud detection- Companies like Visa are using big data to create fraud detection models which can flag up potential fraudsters.
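
As a toy illustration of flagging unusual activity, the sketch below marks a transaction as suspicious when its amount lies far from a card's usual spending. This is not how Visa or any real fraud model works; the amounts, the threshold and the `is_suspicious` helper are assumptions made up for the example.

```python
from statistics import mean, stdev

# Hypothetical purchase history for one card, and new transactions to check.
history = [20.0, 35.0, 25.0, 30.0, 40.0]
new_transactions = [28.0, 450.0, 33.0]

average = mean(history)
spread = stdev(history)

def is_suspicious(amount, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the mean."""
    return abs(amount - average) > threshold * spread

flagged = [amount for amount in new_transactions if is_suspicious(amount)]
print(flagged)
```

Real fraud models combine many more signals, such as location, merchant and timing, but the principle of comparing new behavior against past behavior is the same.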


VISA
WHERE: Wherever customers buy
DATA CENTER: Headquarters
NEEDS: Detect fraud, customer behavior
BENEFITS: Personal recommendations


General Manufacturing

Simulations- Manufacturers can take real data from their products on the market and then run simulations based on what would happen if they changed one particular component or design aspect. They can then find ways to make the product cheaper, more reliable or more environmentally friendly. The Formula 1 racing teams are particularly adept in this area, as are advanced aerospace companies.

Expanded product design modeling- Similarly, with new big-data enabled computer aided design programs, product designers can substitute components or materials from huge databases and then access in-depth information on how this affects the final product, including the ramifications on cost, production processes, environmental effects, legislative requirements, supply chain and so on.


GM
WHERE: Several branches
DATA CENTER: GM Headquarters in Gurgaon
NEEDS: Safety and quality analysis
BENEFITS: Awareness and indication of what to fix


Policing
Suspect tracking- By combining CCTV images, facial recognition software, travel trends and identifiers on travel cards, police forces can capture criminals by automatically linking people to their likely destinations on buses and metro systems. This allows police to catch those that they miss at the scene of the crime and also to control arrest statistics, meeting targets for arrests in one London borough, for instance, as needed.


CBI
WHERE: Several branches
DATA CENTER: CBI Headquarters in Delhi
NEEDS: To identify a person's behavior and actions
BENEFITS: Gives awareness of what that person is going to do next. What is their next plan?


Utilities (oil & gas)

Asset monitoring- As with the machines in manufacturing plants, the utilities companies use big data to keep track of all of their assets spread across a country, continent or the globe. This enables them to fix any broken asset (such as a sewage cleansing plant, a leaking pipe or a gas pump), perform pre-emptive running maintenance or isolate areas in which repair actions have been ineffective.


CHEVRON
WHERE: From the machines in the manufacturing plants
DATA CENTER: Chevron Headquarters
NEEDS: To keep track of what is going on in the manufacturing plant, such as broken pipes, leakage, etc.
BENEFITS: This gives them feedback from designs so they know how to improve the construction of the manufacturing plant, because that is their main source of oil and gas.


Retail and Marketing

Mood mapping- Retailers use feeds from social networks to build an understanding of how their products and company reputation are seen among the public. With the constant streams of opinions from Facebook, Twitter, Google+ and the like, companies are able to cheaply and quickly gather large samples of customer opinion.


Air Jordan
WHERE: From social media networking sites
DATA CENTER: Air Jordan Headquarters
NEEDS: Customer behavior; helps to find out opinions, feelings and feedback about their brand.
BENEFITS: This gives them feedback on what the customers are thinking about their product. Gives feedback from audiences to improve their product.

THANK YOU !!!


