
Big Data

Contents

Introduction to Big Data
Big Data components
Organizing Big Data storage

Introduction to BIG DATA

What is BIG DATA?
A term for extremely large volumes of data, such as customer records, audio, images and text.

Large volumes of data that need to be stored include:
- Traditional data: customer information, transactions
- Data collected automatically by sensors: weather, logs
- Social networks: comments on Facebook, Twitter

Characteristics of Big Data

Volume, Velocity, Variety, Value

Volume

Storage needs keep growing. How can all this data be managed? The larger the data:
- the lower the processing capacity
- the harder the data is to analyze
- the slower the access

2000: 800,000 PB stored worldwide (*)
2020: 35 ZB worldwide (projected) (*)

(*) Figures from IBM. 1 ZB = 10^21 bytes, 1 PB = 10^15 bytes
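To compare the two IBM figures in one unit: with 1 PB = 10^15 bytes and 1 ZB = 10^21 bytes, 800,000 PB is 0.8 ZB, so the 2020 projection is about 44 times the 2000 figure. A minimal sketch of that arithmetic (the helper name `pb_to_zb` is illustrative only):

```python
# SI (power-of-ten) units, as defined on the slide.
PB = 10**15  # bytes in one petabyte
ZB = 10**21  # bytes in one zettabyte


def pb_to_zb(petabytes: float) -> float:
    """Convert petabytes to zettabytes."""
    return petabytes * PB / ZB


storage_2000_zb = pb_to_zb(800_000)  # 800,000 PB stored worldwide in 2000
storage_2020_zb = 35                 # 35 ZB projected for 2020

growth = storage_2020_zb / storage_2000_zb
print(f"2000: {storage_2000_zb} ZB, 2020: {storage_2020_zb} ZB, growth: {growth:.2f}x")
```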

Variety

Data comes from many sources:
- Sensors
- Smart devices
- Social networks
- News

Complex data:
- Traditional and non-traditional
- Structured, semi-structured and unstructured

Velocity

Very large data volumes slow down access. Users expect access that is:
- Fast
- Stable
- Accurate

Importance of Big Data

- It is key to a company's survival
- It gives the business deeper insight

Big Data components

Components

- Data management and storage: the infrastructure that stores the data and the resources used to manipulate it.
- Data analysis: the technologies and tools used to analyze the data and gain insight from it.
- Data use: putting the analyzed big data to work in business intelligence and end-user applications.

Data management

Structured data systems
- Relational database management systems (RDBMS): store and manipulate structured data.
- MPP (massively parallel processing) systems: handle ever-growing volumes of digital data as they keep growing.
- Data warehouses: collect and store data for later reporting.
- Limitations: hard to scale, limited performance.

Unstructured data systems: suited to storing data with a complex structure, and easy to scale.
The data organization must handle data that:
- is both structured and unstructured
- comes from many sources, in different sizes
- is usually very large and requires high processing speed

Data analysis

- The stage where companies begin to extract value from big data.
- Involves developing and using applications to gain deep insight into big data.
- Building data analysis tools.

Data use
The activities performed on data that has already been analyzed.

Organizing Big Data storage

Hadoop

- Introduction to Hadoop
- Hadoop components
- HDFS (Hadoop Distributed File System)

What is Hadoop?
An application platform that supports distributed applications working with very large data:

- terabytes of data
- thousands of nodes

It stores data across many nodes and helps optimize network traffic.

Apache hadoop is a framework that allows for the distributed processing of

Why use Hadoop?
- Open source, with a large user community that provides support
- Stores and processes data in a distributed way and can process large amounts of data in parallel
- Easy to scale, because the system is divided into independent modules
- Its HDFS storage component can hold large amounts of data, and combined with MapReduce it processes data in parallel for higher performance
- MapReduce programs can be written in Java, a simple, widely used language that can meet demanding processing needs

Companies using Hadoop


Yahoo, Google, Facebook, Amazon, AOL, IBM, and many more at

http://wiki.apache.org/hadoop/PoweredBy

Hadoop components

- Processing (MapReduce): a framework that makes it easy to build powerful distributed applications following the MapReduce model.
- Storage (HDFS): a distributed file system that can store huge amounts of data and optimizes bandwidth use between nodes.

Hadoop Distributed File System (HDFS)

HDFS is a file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.

HDFS

HDFS architecture

- NameNode: the master of the HDFS system; it manages file metadata, including the block IDs that make up each file.
- Block: the smallest unit of data storage.
  - By default Hadoop uses 64 MB per block (Hadoop 1.x; Hadoop 2.x and later default to 128 MB).
  - A file is split into multiple blocks.
  - Blocks can be stored on any node in the cluster.
- DataNode: stores the blocks.
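The split of a file into fixed-size blocks, with the NameNode holding only the metadata, can be sketched as a toy in-memory model. This is not Hadoop's API: the `NameNode` class, the DataNode names and the round-robin placement are illustrative assumptions, and the sketch stores a single copy of each block, whereas real HDFS keeps several replicas and considers rack placement.

```python
import math
from itertools import cycle

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default block size cited above


class NameNode:
    """Toy master node: stores only metadata (file -> blocks -> DataNode), never file data."""

    def __init__(self, datanodes):
        self._placement = cycle(datanodes)  # hypothetical round-robin placement
        self._next_block_id = 0
        self.block_map = {}  # file name -> list of (block_id, datanode)

    def add_file(self, name, size_bytes):
        """Split a file into ceil(size / BLOCK_SIZE) blocks and record each block's DataNode."""
        n_blocks = math.ceil(size_bytes / BLOCK_SIZE)
        blocks = []
        for _ in range(n_blocks):
            blocks.append((self._next_block_id, next(self._placement)))
            self._next_block_id += 1
        self.block_map[name] = blocks
        return blocks


nn = NameNode(["dn1", "dn2", "dn3"])
# A 200 MB file needs ceil(200 / 64) = 4 blocks; the last block holds only 8 MB of data.
print(nn.add_file("logs.txt", 200 * 1024 * 1024))
```

The last block is not padded to 64 MB, so a small final block only uses as much disk space as the data it holds.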


- JobTracker (part of the MapReduce layer in Hadoop 1.x): receives requests to run MapReduce jobs.
  - Splits each job into tasks and assigns them to TaskTrackers.
  - Tracks the status of each node.
- TaskTracker:
  - Receives tasks from the JobTracker and executes them.
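The division of labour between JobTracker and TaskTracker can be illustrated with a toy scheduler: the JobTracker splits a job into one task per input split, hands the tasks to TaskTrackers and records each node's status. Everything here (class names, round-robin assignment, status strings, running tasks one after another) is a hypothetical simplification; the real Hadoop 1.x JobTracker also relies on TaskTracker heartbeats and data locality.

```python
from collections import deque


class TaskTracker:
    """Toy worker: receives tasks from the JobTracker and runs them."""

    def __init__(self, name):
        self.name = name
        self.completed = []  # ids of tasks this tracker has finished

    def run(self, task):
        task_id, fn, data = task  # a task is a (task_id, function, input split) triple
        result = fn(data)
        self.completed.append(task_id)
        return result


class JobTracker:
    """Toy master: splits a job into tasks, assigns them and tracks node status."""

    def __init__(self, trackers):
        self.trackers = trackers
        self.status = {t.name: "idle" for t in trackers}

    def submit(self, fn, splits):
        queue = deque((i, fn, split) for i, split in enumerate(splits))
        results = {}
        turn = 0
        while queue:
            tracker = self.trackers[turn % len(self.trackers)]  # round-robin assignment
            task = queue.popleft()
            self.status[tracker.name] = f"running task {task[0]}"
            results[task[0]] = tracker.run(task)
            self.status[tracker.name] = "idle"
            turn += 1
        return [results[i] for i in range(len(splits))]


jt = JobTracker([TaskTracker("tt1"), TaskTracker("tt2")])
# Job: count the words in each input split.
counts = jt.submit(lambda text: len(text.split()), ["a b c", "d e", "f"])
print(counts)
```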

How HDFS works

Read
- The client asks the NameNode to read the data; the NameNode returns the locations of the data's blocks.
- The program then requests the data directly from the nodes that hold those blocks.
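The two-step read can be sketched with a toy cluster: the client gets block locations from the NameNode, then fetches each block directly from a DataNode, so file data never passes through the NameNode. The file path, block IDs and DataNode contents are made up for illustration, and falling back to another replica when a node is down is a simplified version of what the real HDFS client does.

```python
# Toy cluster state (hypothetical): the NameNode knows block locations,
# the DataNodes hold the actual bytes.
NAMENODE_METADATA = {
    "/data/report.txt": [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])],
}
DATANODE_STORAGE = {
    "dn1": {"blk_1": b"Hello, "},
    "dn2": {"blk_1": b"Hello, ", "blk_2": b"HDFS!"},
    "dn3": {"blk_2": b"HDFS!"},
}


def get_block_locations(path):
    """Step 1: the client asks the NameNode where the file's blocks are."""
    return NAMENODE_METADATA[path]


def read_file(path, failed_nodes=frozenset()):
    """Step 2: the client reads each block directly from a DataNode holding it."""
    data = b""
    for block_id, replicas in get_block_locations(path):
        for node in replicas:  # try replicas in order, skipping unavailable nodes
            if node not in failed_nodes:
                data += DATANODE_STORAGE[node][block_id]
                break
        else:
            raise IOError(f"no live replica for {block_id}")
    return data


print(read_file("/data/report.txt"))
print(read_file("/data/report.txt", failed_nodes={"dn1"}))
```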

Write
- Data is written in a pipeline.
- The client sends a write request to the NameNode.
- The NameNode checks write permission and makes sure the file does not already exist.
- The block's replicas form a pipeline through which the data is written sequentially.
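The write steps can be sketched as a toy model: the NameNode checks permission and that the file is new, then picks the DataNodes that form the pipeline; the client sends the block only to the first DataNode, which stores it and relays it to the next, with the acknowledgement flowing back up. The replication factor of 3 matches the HDFS default; the user set, class names and choosing the first nodes in the list are illustrative assumptions.

```python
class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, block_id, data, downstream):
        """Store the block, forward it down the pipeline and return the acknowledgement."""
        self.blocks[block_id] = data
        if downstream:
            # Relay to the next replica; its ack flows back through this node.
            return downstream[0].write(block_id, data, downstream[1:])
        return True  # the last node in the pipeline acknowledges


class NameNode:
    def __init__(self, datanodes, writers, replication=3):
        self.datanodes = datanodes
        self.writers = set(writers)  # hypothetical users allowed to write
        self.replication = replication
        self.files = {}  # path -> DataNode names in the pipeline

    def create(self, user, path):
        """Check write permission and that the file does not exist, then choose a pipeline."""
        if user not in self.writers:
            raise PermissionError(f"{user} may not write {path}")
        if path in self.files:
            raise FileExistsError(path)
        pipeline = self.datanodes[: self.replication]
        self.files[path] = [dn.name for dn in pipeline]
        return pipeline


nodes = [DataNode(f"dn{i}") for i in range(1, 5)]
nn = NameNode(nodes, writers={"alice"})
pipeline = nn.create("alice", "/data/new.txt")
# The client sends the block only to the first DataNode; it relays the data onward.
ack = pipeline[0].write("blk_1", b"payload", pipeline[1:])
print(ack, [dn.name for dn in nodes if "blk_1" in dn.blocks])
```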

Advantages of the Hadoop Distributed File System

- Stores very large files
- Streaming data access
- Simple data coherency model
- Runs on ordinary, heterogeneous commodity hardware
- Automatically detects failures and recovers data quickly

Disadvantages

- High access latency
- Cannot store too many files on the same cluster
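The limit on the number of files comes from the NameNode keeping the metadata of every file and block in memory. A commonly quoted rule of thumb (an approximation, not an exact figure) is about 150 bytes of NameNode memory per file or block object. Under that assumption, a quick estimate:

```python
BYTES_PER_OBJECT = 150  # rule-of-thumb NameNode memory per file or block object (assumption)


def namenode_memory_gb(n_files, blocks_per_file=1):
    """Estimate NameNode memory: one metadata object per file plus one per block."""
    objects = n_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT / 10**9


# 10 million small files of one block each -> 20 million objects -> about 3 GB of memory.
print(namenode_memory_gb(10_000_000))
```

The cost grows with the number of files and blocks, not with the amount of data, which is why many small files are a problem while a few very large files are not.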

Hadoop Common

A set of libraries supporting Hadoop, including a set of commands:

- cat: copies a file to standard output (stdout)
- chmod: changes the read and write permissions of a file
- chown: changes the ownership of a file or a set of files
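In current Hadoop releases these commands are run through the file system shell, for example `hdfs dfs -cat`, `hdfs dfs -chmod` and `hdfs dfs -chown`. HDFS uses a POSIX-like permission model, so a chmod mode such as 644 maps to read/write/execute flags for owner, group and others (HDFS has no notion of executable files, so the x bit matters only for directories). The helper `mode_to_symbolic` below is hypothetical, not part of Hadoop; it only shows that mapping.

```python
def mode_to_symbolic(mode: int) -> str:
    """Render an octal permission mode (e.g. 0o754) as owner/group/other rwx flags."""
    flags = ""
    for shift in (6, 3, 0):  # owner, group, others
        bits = (mode >> shift) & 0o7
        flags += "r" if bits & 4 else "-"
        flags += "w" if bits & 2 else "-"
        flags += "x" if bits & 1 else "-"
    return flags


print(mode_to_symbolic(0o644))
print(mode_to_symbolic(0o754))
```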

MapReduce
- Manages parallel, distributed processes and schedules I/O
- Manages data state
- Manages large amounts of interdependent data
- Handles failures
- Hides these details from the programmer
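The classic word-count job shows the MapReduce model: the map step emits (key, value) pairs, the framework groups the values by key (shuffle and sort), and the reduce step combines each group. This is an in-process Python simulation of that model, not Hadoop itself; Hadoop's Java API expresses the same two functions as `Mapper` and `Reducer` subclasses and runs them in parallel across the cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit (word, 1) for every word in an input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle and sort: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: sum the counts for one word."""
    return key, sum(values)


lines = ["Big Data on Hadoop", "Hadoop stores big files"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each map call looks at one line and each reduce call at one key, both phases can be split across many machines without the programmer handling distribution directly.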


Big Data management platforms in use today

