Here we will run sample Apache Pig scripts in the Grunt shell. Don't worry if you do not understand all of the syntax yet; the goal is just to see the power of Apache Pig.
In a few lines of code you can write a word count application. (Refer to the video session for a fuller explanation.)

Step 1 : Start the Grunt shell.
    Open a terminal and type pig

Step 1A : Create a file in HDFS at /user/cloudera/Training/pig/hadoopexam.txt
with the following content.

I am learning Pig Using HadoopExam
I am learning Spark Using HadoopExam
I am learning Java Using HadoopExam
I am learning Hadoop Using HadoopExam
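
One possible way to get the file into HDFS without leaving Grunt is sketched below. It assumes (this is not stated above) that you have already saved the four lines in a local file named hadoopexam.txt in the directory from which you started pig.

```pig
-- create the target directory in HDFS (-p also creates missing parent directories)
fs -mkdir -p /user/cloudera/Training/pig
-- copy the local file (assumed name: hadoopexam.txt) into HDFS
fs -put hadoopexam.txt /user/cloudera/Training/pig/hadoopexam.txt
-- print the HDFS copy to confirm the content
fs -cat /user/cloudera/Training/pig/hadoopexam.txt
```

The fs command in Grunt passes its arguments to the Hadoop file system shell, so the same options you would use with hadoop fs apply here.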

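Step 2 : Now load the file stored in HDFS (the words in each line are separated by spaces)
No loader is named, so Pig uses its default loader, PigStorage, which splits fields on tabs. The file contains no tabs, so each whole line is loaded into the single field f1.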
input1  = LOAD '/user/cloudera/Training/pig/hadoopexam.txt' AS (f1:chararray);

DUMP input1;
(I am learning Pig Using HadoopExam)
(I am learning Spark Using HadoopExam)
(I am learning Java Using HadoopExam)
(I am learning Hadoop Using HadoopExam)

Step 3 : Split each line into words and flatten them, so that every word becomes its own row
wordsInEachLine = FOREACH input1 GENERATE FLATTEN(TOKENIZE(f1)) AS word;
DUMP wordsInEachLine;
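
To see what FLATTEN and TOKENIZE each contribute, the same step can be written as two statements. This is only an illustrative sketch; the aliases tokenized, words and flattened are names made up for this example.

```pig
-- TOKENIZE alone produces one bag of word tuples per input line
tokenized = FOREACH input1 GENERATE TOKENIZE(f1) AS words;
DESCRIBE tokenized;

-- FLATTEN un-nests each bag, so every word becomes a separate row
flattened = FOREACH tokenized GENERATE FLATTEN(words) AS word;
DUMP flattened;
```

The relation flattened should hold the same rows as wordsInEachLine: one word per row, 24 rows for the sample file (four lines of six words each).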

Step 4 : Group identical words together
groupedWords = GROUP wordsInEachLine BY word;
DUMP groupedWords;
DESCRIBE groupedWords;

Step 5 : Now count the occurrences of each word.
countedWords = FOREACH groupedWords GENERATE group, COUNT(wordsInEachLine);
DUMP countedWords;
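
As an optional follow-up, the counts can be given column names, sorted, and saved. This is a sketch under assumptions: the aliases namedCounts, sortedCounts and cnt, and the output directory wordcount_out, are hypothetical names chosen for this example.

```pig
-- name the output columns so later statements can refer to them
namedCounts = FOREACH groupedWords GENERATE group AS word, COUNT(wordsInEachLine) AS cnt;

-- sort by frequency, most common words first
sortedCounts = ORDER namedCounts BY cnt DESC;
DUMP sortedCounts;

-- save the result to HDFS (hypothetical directory; it must not already exist)
STORE sortedCounts INTO '/user/cloudera/Training/pig/wordcount_out';
```

For the sample file, the words I, am, learning, Using and HadoopExam each have a count of 4, while Pig, Spark, Java and Hadoop each have a count of 1. The order among words with equal counts is not guaranteed.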

As you can see, we do not have to wait until the whole script is written and run; we can check the results in between. A DUMP statement after each step lets us confirm that the script is correct so far. As we move ahead, we will keep writing more complex applications and build up the concepts.
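
Besides DUMP, a couple of lighter checks can help while building a script step by step. The sketch below reuses the relations from the steps above; the alias firstWords is a name made up for this example.

```pig
-- print only the schema of a relation
DESCRIBE countedWords;

-- look at a handful of rows instead of dumping the whole relation
firstWords = LIMIT wordsInEachLine 5;
DUMP firstWords;
```

On large inputs, a LIMIT before DUMP keeps the printed output short.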

More About Pig Latin
  • Pig scripts can be a linear workflow (as shown above in the word count example).
  • Pig scripts can also branch: for example, multiple data inputs can be joined (de-normalizing), or one data set can be split into several.
  • In Pig Latin scripts you will not find if statements or for loops. (A script simply describes a DAG: Directed Acyclic Graph.)
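
A branching script might look like the sketch below, which continues from groupedWords in the word count example. It is only an illustration: the file topics.txt, its comma-separated layout, and the aliases counts, commonWords, rareWords, topics and joined are all assumptions made for this example.

```pig
-- start from the word counts, with named columns
counts = FOREACH groupedWords GENERATE group AS word, COUNT(wordsInEachLine) AS cnt;

-- splitting: one relation flows into two branches
SPLIT counts INTO commonWords IF cnt > 1, rareWords IF cnt == 1;
DUMP rareWords;

-- joining: a second input (hypothetical file) is merged with the counts
topics = LOAD '/user/cloudera/Training/pig/topics.txt' USING PigStorage(',') AS (word:chararray, category:chararray);
joined = JOIN counts BY word, topics BY word;
DUMP joined;
```

Note that the IF inside SPLIT is a filter condition on rows, not a control-flow statement; the script is still a fixed graph of relations.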