EMP1.txt (id:int, name:chararray, dept:chararray, salary:int)

100,AAA,IT,2000
200,BBB,IT,3000
300,CCC,Admin,2500
400,DDD,Admin,500

EMP2.txt
500,EEE,IT,2000
600,FFF,IT,3000
700,GGG,Admin,2500
800,HHH,Admin,500

EMPDetails.txt (id:int, city:chararray, pincode:int)

100,Pune,20014
200,Delhi,20015
300,Pune,20014
400,Pune,20014
500,Pune,20014
600,Delhi,20015
700,Delhi,20015
800,Delhi,20015

Without Schema
A = load 'emp/*' using PigStorage(',');
B = load 'empdetails/*' using PigStorage(',');
C = foreach A generate $0, $1;
Dump C;

Join
D = join A by $0, B by $0;
E = foreach D generate $1, $2;
Dump E;

With Schema
A = load 'emp/*' using PigStorage(',') AS (id:int, name:chararray, dept:chararray, salary:int);
B = load 'empdetails/*' using PigStorage(',') AS (id:int, city:chararray, pincode:int);
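
Once the schema is declared, fields can be referenced by name instead of by position. Below is a small sketch of the earlier join rewritten with names. The aliases D2 and E2 are illustrative and not part of the original notes.

```pig
-- Assumes A and B were loaded with the schemas shown above.
D2 = join A by id, B by id;

-- After a join, every field carries its source relation as a prefix,
-- so A::id and B::id can be told apart.
E2 = foreach D2 generate A::name, A::dept, B::city;

describe E2;   -- prints the schema of E2
dump E2;
```

Running describe on D2 as well is a quick way to see the prefixed field names that the join produces.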

Group by
C = group A by dept Parallel 2;
dump C;

Flatten
D = foreach C generate group, flatten(A);
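
Besides flattening, a grouped relation is commonly reduced to one row per group with aggregate functions. The sketch below assumes C was built with the Group by statement above. The alias S and the output field names are made up for illustration.

```pig
-- Each row of C has two fields: 'group' (the dept value) and a bag named A
-- holding all rows of A for that dept.
S = foreach C generate
        group          as dept,
        COUNT(A)       as headcount,
        SUM(A.salary)  as total_salary,
        AVG(A.salary)  as avg_salary;

dump S;
```

If the emp directory holds both EMP1.txt and EMP2.txt as listed above (an assumption about the file layout), IT should come out with 4 employees and a total salary of 10000, and Admin with 4 employees and a total of 6000.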

Case Statement (Note: the CASE expression won't work in CDH4)
N = FOREACH A GENERATE name, (
    CASE salary % 2
        WHEN 0 THEN 'even'
        WHEN 1 THEN 'odd'
    END
);

If..ELSE
O = FOREACH A GENERATE name, (salary % 2 == 0 ? 'EVEN' : 'ODD');
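
Where CASE is not available (as noted for CDH4), a multi-way branch can be approximated by nesting the ternary operator. This is only a sketch: the salary bands and the alias Bands are hypothetical and chosen purely to show the nesting.

```pig
-- Hypothetical salary bands, not from the original notes.
-- Each ternary expression must be wrapped in its own parentheses.
Bands = foreach A generate name, salary,
    (salary >= 3000 ? 'high' : (salary >= 2000 ? 'medium' : 'low')) as band;

dump Bands;
```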

Filter
F = filter A by dept == 'IT';

Total Count
E = group A all;
F = foreach E generate SUM(A.salary) as TotalSalary, COUNT(A.name) as TotalCount;
G = foreach A generate name, (float) salary / F.TotalSalary;
Dump G;

Limit
H = limit A 2;

Order by
I = order A by $1 desc;

Split by
Split A into J if (salary >= 2000), K if (salary < 2000);

Store
Store J into 'output' using PigStorage('\t');

Union (Note: relations A and B have different schemas and data types. Is it
correct to perform a union on such relations?)
L = union A, B;
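
One way to explore the question above is to inspect the schema Pig assigns to the result. The sketch below assumes A and B were loaded with the schemas from the With Schema section. The alias L2 is illustrative.

```pig
-- Plain UNION is positional. A has 4 fields and B has 3, so the result
-- is expected to have no usable schema.
L = union A, B;
describe L;    -- expected to report that the schema for L is unknown

-- UNION ONSCHEMA matches fields by name instead of position and needs
-- both inputs to have a declared schema.
L2 = union onschema A, B;
describe L2;   -- fields: id, name, dept, salary, city, pincode
```

With ONSCHEMA, fields that exist in only one input are filled with null for rows coming from the other input.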

Register Jar
Register ./tutorial.jar;
M = foreach A generate org.apache.pig.tutorial.ToLower($2);

Execution Plan
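explain Relation_name;      -- prints the logical, physical and MapReduce plans
illustrate Relation_name;   -- shows a step-by-step run on a small data sample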