In comparison to SQL, Pig • has a nested relational model, • uses
lazy evaluation, • uses
extract, transform, load (ETL), • is able to store data at any point during a
pipeline, • declares
execution plans, • supports pipeline splits, thus allowing workflows to proceed along
DAGs instead of strictly sequential pipelines. On the other hand, it has been argued
DBMSs are substantially faster than the MapReduce system once the data is loaded, but that loading the data takes considerably longer in the database systems. It has also been argued
RDBMSs offer out of the box support for column-storage, working with compressed data, indexes for efficient random data access, and transaction-level fault tolerance. Pig Latin is
procedural and fits very naturally in the pipeline paradigm while SQL is instead
declarative. In SQL users can specify that data from two tables must be joined, but not what join implementation to use (You can specify the implementation of JOIN in SQL, thus "... for many SQL applications the query writer may not have enough knowledge of the data or enough expertise to specify an appropriate join algorithm."). Pig Latin allows users to specify an implementation or aspects of an implementation to be used in executing a script in several ways. SQL is oriented around queries that produce a single result. SQL handles trees naturally, but has no built in mechanism for splitting a data processing stream and applying different operators to each sub-stream. Pig Latin script describes a
directed acyclic graph (DAG) rather than a pipeline. ==See also==