Академический Документы
Профессиональный Документы
Культура Документы
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-1
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-2
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-3
Chapter
Topics
Mul:-Dataset
Opera:ons
with
Pig
Techniques
for
Combining
Data
Sets
Joining
Data
Sets
in
Pig
Set
OperaEons
SpliMng
Data
Sets
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-4
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-5
Stores
A
B
C
D
E
F
Anchorage
Boston
Chicago
Dallas
Edmonton
Fargo
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-6
A
B
C
D
E
F
Anchorage
Boston
Chicago
Dallas
Edmonton
Fargo
Salespeople
1
2
3
4
5
6
7
8
9
10
Alice
Bob
Carlos
Dieter
tienne
Fredo
George
Hannah
Irina
Jack
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
B
D
F
A
F
C
D
B
C
10-7
This
collects
values
from
both
data
sets
into
a
new
rela:on
As
before,
the
new
relaEon
is
keyed
by
a
eld
named
group
This
group
eld
is
associated
with
one
bag
for
each
input
(group, {bag of records}, {bag of records})
store_id
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-8
Example
of
COGROUP
Stores
A
B
C
D
E
F
Anchorage
Boston
Chicago
Dallas
Edmonton
Fargo
Salespeople
1
2
3
4
5
6
7
8
9
10
Alice
Bob
Carlos
Dieter
tienne
Fredo
George
Hannah
Irina
Jack
B
D
F
A
F
C
D
B
C
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-9
Chapter
Topics
Mul:-Dataset
Opera:ons
with
Pig
Techniques
for
Combining
Data
Sets
Joining
Data
Sets
in
Pig
Set
OperaEons
SpliMng
Data
Sets
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-10
Join
Overview
The
COGROUP
operator
creates
a
nested
data
structure
Pig
La:ns
JOIN
operator
creates
a
at
data
structure
Similar
to
joins
in
a
relaEonal
database
A
JOIN
is
similar
to
doing
a
COGROUP
followed
by
a
FLATTEN
Though
they
handle
null
values
dierently
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-11
Key
Fields
Like
COGROUP,
joins
rely
on
a
eld
shared
by
each
rela:on
joined = JOIN stores BY store_id, salespeople BY store_id;
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-12
Inner
Joins
The
default
JOIN
in
Pig
La:n
is
an
inner
join
joined = JOIN stores BY store_id, salespeople BY store_id;
An
inner
join
outputs
records
only
when
the
key
is
found
in
all
inputs
In
the
above
example,
stores
that
have
at
least
one
salesperson
You
can
do
an
inner
join
on
mul:ple
rela:ons
in
a
single
statement
But
you
must
use
the
same
key
to
join
them
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-13
Anchorage
Boston
Chicago
Dallas
Edmonton
Fargo
salespeople
1
2
3
4
5
6
7
8
9
10
Alice
Bob
Carlos
Dieter
tienne
Fredo
George
Hannah
Irina
Jack
B
D
F
A
F
C
D
B
C
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-14
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-15
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-16
Outer
Joins
Pig
La:n
allows
you
to
specify
the
type
of
join
following
the
eld
name
Inner
joins
do
not
specify
a
join
type
joined = JOIN relation1 BY field [LEFT|RIGHT|FULL] OUTER,
relation2 BY field;
An
outer
join
does
not
require
the
key
to
be
found
in
both
inputs
Outer
joins
require
Pig
to
know
the
schema
for
at
least
one
rela:on
Which
relaEon
requires
schema
depends
on
the
join
type
Full
outer
joins
require
schema
for
both
relaEons
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-17
Anchorage
Boston
Chicago
Dallas
Edmonton
Fargo
salespeople
1
2
3
4
5
6
7
8
9
10
Alice
Bob
Carlos
Dieter
tienne
Fredo
George
Hannah
Irina
Jack
B
D
F
A
F
C
D
B
C
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-18
Anchorage
Boston
Chicago
Dallas
Edmonton
Fargo
salespeople
1
2
3
4
5
6
7
8
9
10
Alice
Bob
Carlos
Dieter
tienne
Fredo
George
Hannah
Irina
Jack
B
D
F
A
F
C
D
B
C
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-19
Anchorage
Boston
Chicago
Dallas
Edmonton
Fargo
salespeople
1
2
3
4
5
6
7
8
9
10
Alice
Bob
Carlos
Dieter
tienne
Fredo
George
Hannah
Irina
Jack
B
D
F
A
F
C
D
B
C
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-20
Chapter
Topics
Mul:-Dataset
Opera:ons
with
Pig
Techniques
for
Combining
Data
Sets
Joining
Data
Sets
in
Pig
Set
Opera:ons
SpliMng
Data
Sets
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-21
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-22
Anchorage
Boston
Dallas
Alice
Bob
Hannah
Jack
B
D
B
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-23
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-24
UNION
Example
june
Adapter
Battery
Cable
DVD
HDTV
549
349
799
1999
79999
july
Fax
GPS
HDTV
Ink
17999
24999
65999
3999
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-25
Chapter
Topics
Mul:-Dataset
Opera:ons
with
Pig
Techniques
for
Combining
Data
Sets
Joining
Data
Sets
in
Pig
Set
OperaEons
SpliBng
Data
Sets
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-26
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-27
SPLIT
Example
Split
customers
into
groups
for
rewards
program,
based
on
life:me
value
customers
Annette
Bruce
Charles
Dustin
Eva
Felix
Glynn
Henry
Ian
Jeff
Kai
Laura
Mirko
9700
23500
17800
21250
8500
9300
27800
8900
43800
29100
34000
7800
24200
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-28
EssenEal
Points
You
can
use
COGROUP
to
group
mul:ple
rela:ons
This
creates
a
nested
data
structure
Pig
supports
common
SQL
join
types
Inner,
leO
outer,
right
outer,
and
full
outer
You
may
need
to
fully-qualify
eld
names
when
using
joined
data
Pigs
CROSS
operator
creates
every
possible
combina:on
of
input
data
This
can
create
huge
amounts
of
data
use
it
carefully!
You
can
use
a
UNION
to
concatenate
data
sets
In
addi:on
to
combining
data
sets,
Pig
supports
spliBng
them
too
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-29
Bibliography
The
following
oer
more
informa:on
on
topics
discussed
in
this
chapter
For
Pig
group
and
cogroup
informa:on:
http://pig.apache.org/docs/r0.10.0/
basic.html#GROUP
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-30
Extending
Pig
Chapter
10.2
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-31
Extending
Pig
How
to
use
parameters
in
your
Pig
La:n
to
increase
its
exibility
How
to
dene
and
invoke
macros
to
improve
the
reusability
of
your
code
How
to
call
user-dened
func:ons
from
your
code
How
to
write
user-dened
func:ons
in
Python
How
to
process
data
with
external
scripts
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-32
Chapter
Topics
Extending
Pig
Adding
Flexibility
with
Parameters
Macros
and
Imports
UDFs
Contributed
FuncEons
Using
Other
Languages
to
Process
Data
with
Pig
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-33
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-34
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-35
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-36
Use
-m filename
opEon
to
tell
Pig
which
le
contains
the
values
Parameter
values
can
be
dened
with
the
output
of
a
shell
command
For
example,
to
set
MONTH
to
the
current
month:
MONTH=`date +'%m'`
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-37
Chapter
Topics
Extending
Pig
Adding
Flexibility
with
Parameters
Macros
and
imports
UDFs
Contributed
FuncEons
Using
Other
Languages
to
Process
Data
with
Pig
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-38
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-39
10-40
Invoking
Macros
To
invoke
a
macro,
call
it
by
name
and
supply
values
in
the
correct
order
define calc_commission (NAME, SPLIT_AMT, LOW_PCT, HIGH_PCT)
returns result {
allsales = LOAD 'sales' AS (name, price);
... (other code removed for brevity) ...
$result = FOREACH grouped GENERATE SUM(commissions.amount);
};
alice_comm = calc_commission('Alice', 1000, 0.07, 0.12);
carlos_comm = calc_commission('Carlos', 2000, 0.08, 0.14);
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-41
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-42
Chapter
Topics
Extending
Pig
Adding
Flexibility
with
Parameters
Macros
and
Imports
UDFs
Contributed
FuncEons
Using
Other
Languages
to
Process
Data
with
Pig
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-43
Java
All
Python
JavaScript (experimental)
Ruby (experimental)
Groovy (experimental)
In
the
next
few
slides,
you
will
see
how
to
use
UDFs
in
Java,
and
how
to
write
and
use
UDFs
in
Python
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-44
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-45
Alice
Bob
Carlos
David
(314) 555-1212
212.555.9753
405-555-3912
(202) 555.8471
We will write a Python UDF that can consistently extract the area code
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-46
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-47
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-48
Chapter
Topics
Extending
Pig
Adding
Flexibility
with
Parameters
Macros
and
Imports
UDFs
Contributed
Func:ons
Using
Other
Languages
to
Process
Data
with
Pig
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-49
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-50
Piggy
Bank
Piggy
Bank
ships
with
Pig
You
will
need
to
register
the
piggybank.jar
le
The
locaEon
may
vary
depending
on
source
and
version
In
CDH
on
our
VMs,
it
is
at
/usr/lib/pig/piggybank.jar
Some
UDFs
in
Piggy
Bank
include
(package
names
omiked
for
brevity)
Class
Name
Descrip:on
ISOToUnix
UnixToISO
LENGTH
HostExtractor
DiffDate
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-51
DataFu
DataFu
does
not
ship
with
Pig,
but
is
part
of
CDH
4.1.0
and
later
You
will
need
to
register
the
DataFu
JAR
le
In
VM,
at
/usr/lib/pig/datafu-0.0.4-cdh4.2.0.jar
Some
UDFs
in
DataFu
include
(package
names
omiked
for
brevity)
Class
Name
Descrip:on
Quantile
Median
Sessionize
HaversineDistInMiles
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-52
-122.401385
40.707555
-74.011679
REGISTER '/usr/lib/pig/datafu-*.jar';
DEFINE DIST datafu.pig.geo.HaversineDistInMiles;
Input data
Pig LaEn
(2564.207116295711)
Output data
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-53
Chapter
Topics
Extending
Pig
Adding
Flexibility
with
Parameters
Macros
and
Imports
UDFs
Contributed
FuncEons
Using
Other
Languages
to
Process
Data
with
Pig
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-54
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-55
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-56
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-57
Output data
Input
data
andy
betty
chuck
debbie
1963-11-15
1985-12-30
1979-02-23
1982-09-19
(andy,49)
(betty,27)
(chuck,34)
(debbie,30)
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-58
EssenEal
Points
Pig
supports
several
extension
mechanisms
Parameters
and
macros
can
help
make
your
code
more
reusable
And
easier
to
maintain
and
share
with
others
Piggy
Bank
and
DataFu
are
two
examples
of
open
source
UDFs
You
can
also
write
your
own
UDFs
It
is
also
possible
to
embed
Pig
within
another
language
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-59
Bibliography
The
following
oer
more
informa:on
on
topics
discussed
in
this
chapter
Documenta:on
on
Parameter
Subs:tu:on
in
Pig
http://tiny.cloudera.com/dac07a
Documenta:on
on
Macros
in
Pig
http://tiny.cloudera.com/dac07b
Documenta:on
on
User-Dened
Func:ons
in
Pig
http://tiny.cloudera.com/dac07c
Documenta:on
on
Piggy
Bank
http://tiny.cloudera.com/dac07d
Introducing
Data
Fu
http://tiny.cloudera.com/dac07e
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-60
Bibliography
(contd)
The
following
oer
more
informa:on
on
topics
discussed
in
this
chapter
Details
about
the
other
supported
languages
can
be
found
in
the
latest
documenta:on
on
the
Pig
Web
site:
http://pig.apache.org/docs/r0.11.1/udf.html
For
details
on
Python
UDFs,
see
http://pig.apache.org/docs/r0.10.0/
udf.html#python-udfs
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-61
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-62
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-63
Chapter
Topics
Pig
Troubleshoo:ng
And
Op:miza:on
Troubleshoo:ng
Pig
Logging
Using
Hadoops
Web
UI
Data
Sampling
and
Debugging
Performance
Overview
Understanding
the
ExecuEon
Plan
Tips
for
Improving
the
Performance
of
Your
Pig
Jobs
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-64
TroubleshooEng
Overview
We
have
now
covered
how
to
use
Pig
for
data
analysis
Unfortunately,
someEmes
your
code
may
not
work
as
you
expect
It
is
important
to
remember
that
Pig
and
Hadoop
are
intertwined
Here
we
will
cover
some
techniques
for
isola:ng
and
resolving
problems
We
will
start
with
a
few
opEons
to
the
Pig
command
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-65
Helping
Yourself
We
will
discuss
some
op:ons
for
the
pig
command
in
this
chapter
You
can
view
all
of
them
by
using
the
-h
(help)
opEon
Keep
in
mind
that
many
opEons
are
advanced
or
rarely
used
One
useful
op:on
is
-c
(check),
which
validates
the
syntax
of
your
code
$ pig -c myscript.pig
myscript.pig syntax OK
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-66
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-67
Chapter
Topics
Pig
Troubleshoo:ng
And
Op:miza:on
TroubleshooEng
Pig
Logging
Using
Hadoops
Web
UI
Data
Sampling
and
Debugging
Performance
Overview
Understanding
the
ExecuEon
Plan
Tips
for
Improving
the
Performance
of
Your
Pig
Jobs
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-68
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-69
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-70
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-71
Chapter
Topics
Pig
Troubleshoo:ng
And
Op:miza:on
TroubleshooEng
Pig
Logging
Using
Hadoops
Web
UI
Data
Sampling
and
Debugging
Performance
Overview
Understanding
the
ExecuEon
Plan
Tips
for
Improving
the
Performance
of
Your
Pig
Jobs
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-72
HDFS
MR1
MR2
Daemon Name
Address
NameNode
http://hostname:50070/
DataNode
http://hostname:50075/
JobTracker
http://hostname:50030/
TaskTracker
http://hostname:50060/
ResourceManager
http://hostname:8088/
NodeManager
http://hostname:8042/
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-73
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-74
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-75
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-76
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-77
Killing
a
Job
A
job
that
processes
a
lot
of
data
can
take
hours
to
complete
SomeEmes
you
spot
an
error
in
your
code
just
aOer
submiMng
a
job
Rather
than
wait
for
the
job
to
complete,
you
can
kill
it
First,
nd
the
Job
ID
on
the
front
page
of
the
JobTracker
Web
UI
Then,
use
the
kill
command
in
Pig
along
with
that
Job
ID
grunt> kill job_201303151454_0028
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-78
Chapter
Topics
Pig
Troubleshoo:ng
And
Op:miza:on
TroubleshooEng
Pig
Logging
Using
Hadoops
Web
UI
Data
Sampling
and
Debugging
Performance
Overview
Understanding
the
ExecuEon
Plan
Tips
for
Improving
the
Performance
of
Your
Pig
Jobs
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-79
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-80
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-81
10-82
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-83
Chapter
Topics
Pig
Troubleshoo:ng
And
Op:miza:on
TroubleshooEng
Pig
Logging
Using
Hadoops
Web
UI
Data
Sampling
and
Debugging
Performance
Overview
Understanding
the
ExecuEon
Plan
Tips
for
Improving
the
Performance
of
Your
Pig
Jobs
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-84
Performance
Overview
We
have
discussed
several
techniques
for
nding
errors
in
Pig
La:n
code
Once
you
get
your
code
working,
youll
oOen
want
it
to
work
faster
Performance
tuning
is
a
broad
and
complex
subject
Requires
a
deep
understanding
of
Pig,
Hadoop,
Java,
and
Linux
Typically
the
domain
of
engineers
and
system
administrators
Most
of
these
topics
are
beyond
the
scope
of
this
course
Well
cover
the
basics
and
oer
several
performance
improvement
Eps
See
Programming
Pig
(chapters
7
and
8)
for
detailed
coverage
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-85
Chapter
Topics
Pig
Troubleshoo:ng
And
Op:miza:on
TroubleshooEng
Pig
Logging
Using
Hadoops
Web
UI
Data
Sampling
and
Debugging
Performance
Overview
Understanding
the
Execu:on
Plan
Tips
for
Improving
the
Performance
of
your
Pig
jobs
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-86
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-87
Anchorage
Boston
Chicago
Dallas
Edmonton
Fargo
sales
A
D
A
B
A
B
E
D
1999
2399
4579
6139
2489
3699
2479
5799
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-88
10-89
Chapter
Topics
Pig
Troubleshoo:ng
And
Op:miza:on
TroubleshooEng
Pig
Logging
Using
Hadoops
Web
UI
Data
Sampling
and
Debugging
Performance
Overview
Understanding
the
ExecuEon
Plan
Tips
for
Improving
the
Performance
of
your
Pig
Jobs
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-90
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-91
10-92
10-93
=
=
=
=
=
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-94
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-95
10-96
10-97
10-98
10-99
EssenEal
Points
You
can
boost
performance
by
elimina:ng
unneeded
data
during
processing
Pigs
error
messages
dont
always
clearly
iden:fy
the
source
of
a
problem
We
recommend
tesEng
your
scripts
with
a
small
data
sample
Looking
at
the
Web
UI,
and
especially
the
log
messages,
can
be
helpful
The
resources
listed
on
the
upcoming
bibliography
slide
may
further
assist
you
in
solving
problems
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-10
0
Bibliography
The
following
oer
more
informa:on
on
topics
discussed
in
this
chapter
Hadoop
Log
Loca:on
and
Reten:on
http://tiny.cloudera.com/dac08a
Pig
Tes:ng
and
Diagnos:cs
http://tiny.cloudera.com/dac08b
Mailing
List
for
Pig
Users
http://tiny.cloudera.com/dac08c
Ques:ons
Tagged
with
Pig
on
StackOverow
http://tiny.cloudera.com/dac08d
Ques:ons
Tagged
with
PigLa:n
on
StackOverow
http://tiny.cloudera.com/dac08e
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-10
1
Bibliography
The
following
oer
more
informa:on
on
topics
discussed
in
this
chapter
Performance
:ps
for
Pig
http://pig.apache.org/docs/r0.10.0/
perf.html#performance-enhancers
Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior wri>en consent.
10-10
2