All widely used Unix shells have a special syntax construct for creating pipelines. In every case, one writes the commands in sequence, separated by the ASCII vertical bar character | (which, for this reason, is often called the "pipe character"). The shell starts the processes and arranges the necessary connections between their standard streams (including some amount of buffer storage).

The pipeline uses anonymous pipes. With anonymous pipes, data written by one process is buffered by the operating system until it is read by the next process, and this unidirectional channel disappears when the processes complete. This differs from named pipes, where messages are passed to or from a pipe that is named by making it a file, and which remains after the processes complete.

The standard
shell syntax for anonymous pipes is to list multiple commands, separated by vertical bars ("pipes" in common Unix parlance):

    command1 | command2 | command3

For example, to list files in the current directory (ls -l), retain only the lines of output containing the string "key" (grep key), and view the result in a scrolling page (less), a user types the following into the command line of a terminal:

    ls -l | grep key | less

The command ls -l is executed as a process, the output (stdout) of which is piped to the input (stdin) of the process for grep key; and likewise for the process for less. Each process takes input from the previous process and produces output for the next process via standard streams. Each | tells the shell to connect the standard output of the command on the left to the standard input of the command on the right by an inter-process communication mechanism called an (anonymous) pipe, implemented in the operating system. Pipes are unidirectional; data flows through the pipeline from left to right.
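
The contrast between anonymous and named pipes described above can be tried out with a short script. The following is a minimal sketch rather than a canonical recipe: the sample input, the temporary directory and the FIFO name demo.fifo are all made up for illustration, and mktemp is assumed to be available (it is on most current systems, although it is not part of POSIX).

```shell
# Anonymous pipes: created by the shell for the duration of the pipeline.
anon=$(printf 'key lime\nbanana\napple key\n' | grep key | sort)

# Named pipe: an entry in the file system (demo.fifo is a hypothetical name).
tmpdir=$(mktemp -d)
fifo="$tmpdir/demo.fifo"
mkfifo "$fifo"

# The writer runs in the background; it blocks until a reader opens the FIFO.
printf 'key lime\nbanana\napple key\n' > "$fifo" &
named=$(grep key < "$fifo" | sort)
wait    # reap the background writer

# Unlike an anonymous pipe, the FIFO still exists after both processes exit.
if [ -p "$fifo" ]; then fifo_survives=yes; else fifo_survives=no; fi
rm -r "$tmpdir"

printf 'anonymous pipe result:\n%s\n' "$anon"
printf 'named pipe result:\n%s\n' "$named"
printf 'FIFO present after use: %s\n' "$fifo_survives"
```

Both variants carry the same data; the difference is that the named pipe has to be created and removed explicitly, whereas the shell sets up and tears down the anonymous pipes by itself.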
===Example===
Below is an example of a pipeline that implements a kind of spell checker for the web resource indicated by a URL. An explanation of what it does follows.

    curl 'https://en.wikipedia.org/wiki/Pipeline_(Unix)' |
    sed 's/[^a-zA-Z ]/ /g' |
    tr 'A-Z ' 'a-z\n' |
    grep '[a-z]' |
    sort -u |
    comm -23 - <(sort /usr/share/dict/words) |
    less

• curl obtains the HTML contents of a web page (wget could be used on some systems).
• sed replaces all characters (from the web page's content) that are not spaces or letters with spaces. (Newlines are preserved.)
• tr changes all of the uppercase letters into lowercase and converts the spaces in the lines of text to newlines (each 'word' is now on a separate line).
• grep includes only lines that contain at least one lowercase alphabetical character (removing any blank lines).
• sort sorts the list of 'words' into alphabetical order, and the -u switch removes duplicates.
• comm compares two sorted files line by line; -23 suppresses the lines unique to the second file and the lines common to both, leaving only those found only in the first file named. The - in place of a filename causes comm to use its standard input (from the pipeline in this case). sort /usr/share/dict/words sorts the contents of the words file alphabetically, as comm expects, and <( ... ) makes the sorted output available to comm under a file name (via process substitution). The result is a list of words (lines) that are not found in /usr/share/dict/words.
• less allows the user to page through the results.
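
Because the pipeline above depends on network access and on a system dictionary, it can be awkward to experiment with. The following is a self-contained sketch of the same stages under some stated assumptions: a one-line sample string stands in for the curl output, a four-word list stands in for /usr/share/dict/words, and a temporary file replaces the process substitution so that the script also runs in shells without that feature. None of these stand-ins come from the original example.

```shell
# Fix the collation order so that sort and comm agree on what "sorted" means.
LC_ALL=C
export LC_ALL

# Hypothetical stand-in for /usr/share/dict/words.
dict=$(mktemp)
printf 'the\npipeline\nis\na\n' | sort > "$dict"

# Hypothetical stand-in for the downloaded HTML.
text='The pipeline is a <b>Pipline</b>, 42 times.'

misspelled=$(printf '%s\n' "$text" |
    sed 's/[^a-zA-Z ]/ /g' |     # non-letters become spaces
    tr 'A-Z ' 'a-z\n' |          # lowercase; one word per line
    grep '[a-z]' |               # drop blank lines
    sort -u |                    # sorted, duplicates removed
    comm -23 - "$dict")          # keep words absent from the word list

rm -f "$dict"
printf '%s\n' "$misspelled"
```

With this sample input, the words reported are the leftover tag name b, the misspelling pipline, and times, which is simply missing from the tiny word list.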
===Error stream===
By default, the standard error streams ("stderr") of the processes in a pipeline are not passed on through the pipe; instead, they are merged and directed to the console. However, many shells have additional syntax for changing this behavior. In the csh shell, for instance, using |& instead of | signifies that the standard error stream should also be merged with the standard output and fed to the next process. The Bash shell can likewise merge standard error into the pipe, either with |& (since version 4.0) or with 2>&1, and can also redirect it to a different file.
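
The behavior described above can be observed with a small experiment. This is an illustrative sketch: emit is a hypothetical helper function that writes one line to each stream, and only the portable 2>&1 form is executed, since |& is not available in every shell.

```shell
# Hypothetical helper: one line on stdout, one line on stderr.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Default: only stdout enters the pipe. stderr goes around it, to wherever
# the enclosing command's stderr points (discarded here to keep the demo quiet).
default_count=$( { emit | grep -c 'to'; } 2>/dev/null )

# 2>&1 placed before the | merges stderr into stdout, so grep sees both lines.
# In Bash 4.0 or later, "emit |& grep -c 'to'" is shorthand for the same thing.
merged_count=$(emit 2>&1 | grep -c 'to')

# stderr can also be sent to a separate file while stdout is piped onward.
errlog=$(mktemp)
emit 2>"$errlog" | grep -c 'to' > /dev/null
logged=$(cat "$errlog")
rm -f "$errlog"

echo "lines through the pipe by default: $default_count"
echo "lines through the pipe with 2>&1:  $merged_count"
echo "captured in the error log:         $logged"
```

The order of redirections matters: 2>&1 must be attached to the command on the left of the | so that its standard error is duplicated onto the pipe rather than onto the terminal.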
===Pipemill===
In the most commonly used simple pipelines, the shell connects a series of sub-processes via pipes and executes external commands within each sub-process. Thus the shell itself does no direct processing of the data flowing through the pipeline.

However, it is possible for the shell to perform processing directly, using a so-called mill or pipemill (since a while command is used to "mill" over the results from the initial command). This construct generally looks something like:

    command | while read -r var1 var2 ...; do
        # process each line, using variables as parsed into var1, var2, etc
        # (note that this may be a subshell: var1, var2 etc will not be available
        # after the while loop terminates; some shells, such as zsh and newer
        # versions of Korn shell, process the commands to the left of the pipe
        # operator in a subshell)
    done

Such a pipemill may not perform as intended if the body of the loop includes commands, such as cat and ssh, that read from stdin: on the loop's first iteration, such a program (let's call it the drain) will read the remaining output from command, and the loop will then terminate (with results depending on the specifics of the drain). There are a couple of ways to avoid this behavior. First, some drains support an option to disable reading from stdin (e.g. ssh -n). Alternatively, if the drain does not need to read any input from stdin to do something useful, it can be given < /dev/null as input.

As all components of a pipe are run in parallel, a shell typically forks a subprocess (a subshell) to handle its contents, making it impossible to propagate variable changes to the outside shell environment. To remedy this issue, the "pipemill" can instead be fed from a here document containing a command substitution, which waits for the pipeline to finish running before milling through the contents. Alternatively, a named pipe or a process substitution can be used for parallel execution.
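
Both pitfalls can be reproduced in a few lines. The sketch below is illustrative only: cat stands in for the drain, the three-word input is made up, and the variable names are hypothetical. It shows the < /dev/null remedy for the drain and the here-document remedy for lost variable changes.

```shell
# Pitfall 1: a drain in the loop body swallows the rest of the input.
drained=$(printf 'one\ntwo\nthree\n' | while read -r word; do
    cat > /dev/null               # the drain reads "two" and "three"
    echo "$word"
done)

# Remedy: give the drain /dev/null as its standard input.
intact=$(printf 'one\ntwo\nthree\n' | while read -r word; do
    cat < /dev/null > /dev/null   # the drain sees end-of-file immediately
    echo "$word"
done)

# Pitfall 2: the loop may run in a subshell, so its variable changes are lost.
# bash (without lastpipe) and dash leave piped_count at 0; zsh and ksh93
# run the last pipeline component in the current shell and would give 3.
piped_count=0
printf 'one\ntwo\nthree\n' | while read -r word; do
    piped_count=$((piped_count + 1))
done

# Remedy: feed the loop from a here document containing a command
# substitution, so the loop runs in the current shell.
heredoc_count=0
while read -r word; do
    heredoc_count=$((heredoc_count + 1))
done <<EOF
$(printf 'one\ntwo\nthree\n')
EOF

echo "with drain:    $drained"
echo "drain tamed:   $(echo $intact)"
echo "piped count:   $piped_count (shell dependent)"
echo "heredoc count: $heredoc_count"
```

The here-document form trades parallelism for simplicity: the command substitution runs to completion before the loop sees its first line, which is acceptable for small outputs but not for long-running or unbounded producers.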
GNU bash also has a lastpipe option to disable forking for the last pipe component.

==Creating pipelines programmatically==