Apache Pig

What is it?
How does it work?
Why use it?
Pig Latin Data Types
Pig Latin Maths
Pig Latin Example

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz

Pig: What is it?

A high-level language
Used to analyse large data sets
Used to create MapReduce jobs
Abstracts the definition of jobs
Uses Pig Latin to define jobs
Less code needed
Compiles to MapReduce code

Pig: How does it work?

Three ways to use it
  Grunt, Pig's interactive shell
  Write Pig Latin in a script file
  Embed Pig commands in another language

Run modes
  Local mode: runs on a single machine
  Hadoop mode: runs on a Hadoop/MapReduce cluster

Creates MapReduce code automatically
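
As a rough illustration of the usage options and run modes listed above, the sketch below shows a tiny script with its invocation given in comments. The script name hello.pig and the input file input.txt are assumptions made for this example only; the -x local and -x mapreduce flags select the run mode.

```pig
-- hello.pig: a hypothetical script name used only for this sketch.
--
-- Run as a script in local mode (single machine):
--     pig -x local hello.pig
-- Run as a script on a Hadoop cluster (MapReduce mode):
--     pig -x mapreduce hello.pig
-- Or start the Grunt shell with "pig -x local" and type the
-- statements below one at a time.

-- 'input.txt' is an assumed input file with one line of text per row.
lines = LOAD 'input.txt' AS (line:chararray);

-- Pig only builds a plan until it reaches an output statement
-- such as DUMP or STORE; that is when the jobs are actually run.
DUMP lines;
```

The same statements work unchanged in either mode; only the -x flag and the location of the input data differ.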

Pig: Why use it?

It is quicker to develop with than native MapReduce code
It is data omnivorous (it can process data with or without a schema)
It is easy to learn
It is widely used
Minor performance loss compared to native code
It can be extended via user-defined functions (UDFs)
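
To make the UDF point concrete, here is a minimal sketch of how a user-defined function might be called from Pig Latin. The jar myudfs.jar, the class myudfs.UPPER, the alias TO_UPPER and the file names.txt are all hypothetical; the sketch assumes the class has already been written in Java and packaged separately.

```pig
-- Assumption: myudfs.jar contains a Java class myudfs.UPPER
-- that extends org.apache.pig.EvalFunc<String>.
REGISTER myudfs.jar;

-- Optional: give the fully qualified class name a short alias.
DEFINE TO_UPPER myudfs.UPPER();

-- 'names.txt' is an assumed input file with one name per line.
names = LOAD 'names.txt' AS (name:chararray);

-- Call the UDF like any built-in function.
upper_names = FOREACH names GENERATE TO_UPPER(name);
DUMP upper_names;
```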

Pig Latin Data Types

Simple types
  int
  long
  float
  double
  chararray
  bytearray

Complex types
  tuple
  bag
  map
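
The sketch below shows one way these types might be declared in a LOAD schema. The file names and field names are invented for illustration, and the default tab-delimited loader is assumed.

```pig
-- Simple types. 'people.txt' and all field names are hypothetical.
people = LOAD 'people.txt' AS (
    id:int,
    visits:long,
    score:float,
    balance:double,
    name:chararray,
    payload:bytearray
);
DESCRIBE people;

-- Complex types: a tuple, a bag of tuples, and a map.
-- 'nested.txt' is likewise an assumed input file.
nested = LOAD 'nested.txt' AS (
    location:tuple(lat:double, lon:double),
    tags:bag{t:tuple(tag:chararray)},
    attrs:map[]
);
DESCRIBE nested;
```

DESCRIBE prints the schema of a relation, which is a quick way to check that the declared types are the ones Pig is using.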

Pig Latin Maths
Some of the built-in maths functions

ABS
CEIL
EXP
FLOOR
LOG
ROUND
SIN
TAN
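
A short sketch of how the functions above might be applied to a numeric column. The file readings.txt and the field name reading are assumptions; built-in function names are case sensitive and are written in upper case.

```pig
-- 'readings.txt' is an assumed file with one number per line.
readings = LOAD 'readings.txt' AS (reading:double);

-- Apply several built-in maths functions to each row.
results = FOREACH readings GENERATE
    reading,
    ABS(reading)   AS abs_value,
    CEIL(reading)  AS ceil_value,
    FLOOR(reading) AS floor_value,
    ROUND(reading) AS round_value,
    EXP(reading)   AS exp_value,
    LOG(reading)   AS log_value,   -- natural logarithm
    SIN(reading)   AS sin_value,   -- argument in radians
    TAN(reading)   AS tan_value;   -- argument in radians

DUMP results;
```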

Pig Latin Example
Example borrowed from Wikipedia

input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);

-- Extract words from each line and put them into a pig bag
-- datatype, then flatten the bag to get one word on each row
words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- filter out any words that are just white spaces
filtered_words = FILTER words BY word MATCHES '\\w+';

-- create a group for each word
word_groups = GROUP filtered_words BY word;

-- count the entries in each group
word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;

-- order the records by count
ordered_word_count = ORDER word_count BY count DESC;
STORE ordered_word_count INTO '/tmp/number-of-words-on-internet';
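
As a possible follow-up to the word count example, the statements below could be appended to the same script or typed into Grunt to inspect the result. This is a sketch only: the alias top_words is an assumption, and it relies on the relations word_count and ordered_word_count defined in the example.

```pig
-- Show the schema of the counted relation.
DESCRIBE word_count;

-- Keep only the ten most frequent words and print them.
top_words = LIMIT ordered_word_count 10;
DUMP top_words;

-- Print the logical, physical and MapReduce plans that Pig
-- generates for this relation.
EXPLAIN ordered_word_count;
```

EXPLAIN is a convenient way to see how the Pig Latin statements are turned into MapReduce jobs without writing any MapReduce code by hand.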

Contact Us

Feel free to contact us at
  www.semtech-solutions.co.nz
  info@semtech-solutions.co.nz

We offer IT project consultancy
We are happy to hear about your problems
You pay only for the hours you need to solve your problems
