Apache Pig is an abstraction layer for processing large datasets; it is an Apache subproject built on top of Hadoop. Pig Latin, its scripting language, is designed for exploring big datasets: scripts are easy to write, require far less programming skill than writing raw MapReduce code, and give programmers a rich set of built-in operators. Pig can process terabytes of data with only a handful of Pig Latin statements.
MapReduce requires the programmer to write mapper and reducer functions for each problem, and some problems need several chained mappers and reducers, which takes real programming effort. With Pig, the same problem can often be solved by issuing a few commands (whether to choose MapReduce or Pig depends on the problem). Of course, Pig scripts are translated into MapReduce jobs internally, but the programmer need not worry about that conversion; the Pig execution environment takes care of it.
Pig has two main components:
• Pig Latin: the language in which the scripts are written.
• The execution environment: local mode (a single JVM using the local filesystem) or MapReduce mode (a Hadoop cluster with HDFS).
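The execution mode is chosen with the pig command's -x flag. A minimal illustration (script.pig is a hypothetical script name; grunt is Pig's interactive shell):

```
# Start the Grunt shell in local mode (reads and writes the local filesystem)
pig -x local

# Start the Grunt shell in MapReduce mode (the default; reads and writes HDFS)
pig -x mapreduce

# Run a script non-interactively in local mode
pig -x local script.pig
```

Local mode is convenient for developing and testing a script on a small sample before running it against the full dataset on the cluster.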
Pig was originally developed at Yahoo Research around 2006 for researchers to have an ad-hoc way of creating and executing map-reduce jobs on very large data sets. In 2007, it was moved into the Apache Software Foundation.
Pig Execution Flow
Apache Pig is a high-level scripting platform used with Hadoop. Every Pig script is converted internally into MapReduce: the Pig execution environment compiles the script into one or more MapReduce jobs and runs them in the cluster environment. Pig works with all kinds of data, structured and unstructured alike, and stores the results into the specified output location.
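As a concrete illustration of how a few Pig Latin statements stand in for a full MapReduce program, here is a minimal word-count sketch; the input file input.txt, the output directory wordcount_out, and the relation names are all assumed for the example:

```
-- Load each line of the input as a single character array (path is illustrative)
lines   = LOAD 'input.txt' AS (line:chararray);

-- Split each line into words; FLATTEN turns the bag of tokens into one word per record
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Group identical words together and count each group
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS cnt;

-- Write the (word, count) pairs to the output location
STORE counts INTO 'wordcount_out';
```

When this script runs, the Pig execution environment compiles these five statements into the equivalent MapReduce job (tokenizing in the map phase, counting in the reduce phase) without the programmer writing any mapper or reducer code.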