Basic grammar rules of Bash

The shell language is built upon a few basic grammar rules. This article presents a somewhat theoretical view, though some terminology is tweaked a bit in order to avoid discussing too many corner-cases and to present a model that fits with reality most of the time. Having a correct understanding of the basic grammar is extremely important in understanding the language.

In the shell, everything is a command!

Bash is fundamentally a command-oriented language, and it is very pure in this sense. Just as functional or object-oriented languages derive most or sometimes all of their grammar from the primitive notions of functions or objects respectively, Bash and other POSIX-compatible shells build upon the concept of a command. The smallest valid chunk of shell code is always a command, as is every valid line you can enter at a prompt without an error. Likewise, an entire program of thousands of lines can consist of a single (composite) command.

Problems can arise when attempting to draw too many analogies between constructs found in other languages and shell commands. These are some useful ways to look at what a command is.

  • Commands are somewhat like functions found in other languages in that they provide for control flow, code reuse, modularity, and have a "scope" of sorts to provide some degree of encapsulation. A command has what is called an execution environment which consists of a set of properties local to the command and inherited from the parent execution environment. All ksh-like shells additionally support function local variables, though this isn't specified by POSIX and the details can vary drastically between shells.
  • Commands can also be thought of as analogous to programs or processes. This is where the idea of command diverges from common language constructs. Importantly, each command has its own implicit I/O. Just like running a UNIX program, every command has its own file descriptors inherited from some parent environment. The execution environment of a command can include, for example, a local set of environment variables, trapped signals, file descriptors, and positional parameters. A command may or may not execute in a subshell, in which case the command or commands really have their own execution environment by virtue of being a real independent UNIX process.
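
For example, a command group run in a subshell gets its own copy of the execution environment, so variable assignments made inside it don't propagate back to the parent shell (a minimal illustration):

x=1
( x=2; echo "inside the subshell: x=$x" )   # prints x=2
echo "in the parent shell: x=$x"            # still prints x=1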

These are crude descriptions and some liberty is taken with the term execution environment. We can't get too specific about what a command really is because they encompass every construct in the language. Commands can be subdivided into two main types: simple and compound.

Simple Commands

The Bash manual says:

A simple command is a sequence of optional variable assignments followed by blank-separated words and redirections,
and terminated by a control operator. The first word specifies the command to be executed, and is passed as argument
zero. The remaining words are passed as arguments to the invoked command.

A simple command can be any or all of the following:

  • A list of zero or more words, which designate a command name and a list of arguments.
  • Zero or more simple or compound assignments separated by whitespace which must precede the word list.
  • Zero or more redirection operators, which may appear anywhere within the command, but by convention are usually placed after the word list.

Sounds harder than it actually is. It's what you do day-to-day: you enter simple commands with parameters, and the shell executes them. Remember, there are only two kinds of commands, simple and compound. Almost everything that isn't a compound command is a simple command, and many of the more complex constructs in Bash can be broken down into simple commands.

Some examples:

A (very ;-)) simple command

echo "Hello world..."

All of the following are simple commands

x=5
>tmpfile
{x}<"$x" _=${x=<(echo moo)} <&0$(cat <&"$x" >&2)

ls
ls > list.txt
ls -l
LC_ALL=C ls

The last one might not be familiar to you. It simply adds "LC_ALL=C" to the environment of the ls program; it doesn't affect your current shell. This also works when calling functions, unless Bash runs in POSIX mode (in that case, it does affect your current shell).
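
To illustrate (a minimal example, assuming LC_ALL is not already set in your current session):

$ LC_ALL=C bash -c 'echo "inside: $LC_ALL"'
inside: C
$ echo "outside: ${LC_ALL:-not set}"
outside: not set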

Every command has a so-called exit code, a kind of return status. The shell can catch it and react to it. It ranges from 0 to 255, where 0 means success and everything else indicates some kind of failure or issue to report back to the caller.
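
For example, the special parameter $? holds the exit code of the most recently executed command (a minimal illustration; the grep pattern is arbitrary):

grep -q '^root:' /etc/passwd
echo "grep returned exit code $?"   # 0 if the pattern was found, 1 if not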

The simple command construct is the base for all higher constructs. Everything you execute, from pipelines to functions, finally ends up in (many) simple commands. That's why Bash has only one method to finally expand and execute a simple command.

Compound Commands

See also the list of compound commands.

There are two forms of compound commands:

  • those forming a new syntax element using lists as their "body"
  • complete independent syntax elements

Basically, it's everything else that's not described elsewhere in this article. Compound commands have the following characteristics:

  • they begin and end with a specific keyword or operator (e.g. for … done)
  • they can be redirected as a whole

See the following table for a short overview (without details - really just a plain overview!):

( <LIST> )
    Execute <LIST> in an extra subshell ⇒ article
{ <LIST> ; }
    Execute <LIST> as a separate group (but not in a subshell) ⇒ article
(( <EXPRESSION> ))
    Evaluate the arithmetic expression <EXPRESSION> ⇒ article
[[ <EXPRESSION> ]]
    Evaluate the conditional expression <EXPRESSION> (aka "the new test command") ⇒ article
for <NAME> in <WORDS> ; do <LIST> ; done
    Execute <LIST> while setting the variable <NAME> to one of <WORDS> on every iteration (classic for-loop) ⇒ article
for (( <EXPR1> ; <EXPR2> ; <EXPR3> )) ; do <LIST> ; done
    C-style for-loop (driven by arithmetic expressions) ⇒ article
select <NAME> in <WORDS> ; do <LIST> ; done
    Provide simple menus ⇒ article
case <WORD> in <PATTERN>) <LIST> ;; … esac
    Decisions based on pattern matching - executing <LIST> on match ⇒ article
if <LIST> ; then <LIST> ; else <LIST> ; fi
    The if-clause: making decisions based on exit codes ⇒ article
while <LIST1> ; do <LIST2> ; done
    Execute <LIST2> while <LIST1> returns TRUE (exit code) ⇒ article
until <LIST1> ; do <LIST2> ; done
    Execute <LIST2> until <LIST1> returns TRUE (exit code) ⇒ article
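
As mentioned above, a compound command can be redirected as a whole; a single redirection then applies to everything the construct outputs (a minimal sketch, the file name is just an example):

# the single redirection catches the output of the entire loop
for f in /etc/*.conf; do
  echo "$f"
done > conf_list.txt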

Function Definition Command

FIXME Missing an additional extra article about shell functions

A shell function definition basically makes a compound command available under a new name. The special thing is that a function, when run, has its own "private" set of positional parameters and I/O descriptors. It acts like a script within the script. Simply said: you create a new command.

The definition is easy (one of several possibilities):

<NAME> () <COMPOUND_COMMAND> <REDIRECTIONS>

which usually is used with the {…; } compound command, and thus looks like

print_help() { echo "Sorry, no help available"; }

As mentioned above, a function definition can have any compound command as its body. Structures like

countme() for ((x=1;x<=9;x++)); do echo $x; done
are unusual, but perfectly valid since the for-loop construct is a compound command!

If redirections are specified, they are not performed when the function is defined; they are performed when the function is executed:

# this will NOT perform the redirection (at definition time)
f() { echo ok ; } > file

# NOW the redirection will be performed (during EXECUTION of the function)
f

Bash allows three equivalent forms of the function definition:

NAME ()          <COMPOUND_COMMAND> <REDIRECTIONS>
function NAME () <COMPOUND_COMMAND> <REDIRECTIONS>
function NAME    <COMPOUND_COMMAND> <REDIRECTIONS>

The space between NAME and () is optional; usually you see it written without the space.

I suggest using the first form. It's specified in POSIX and all Bourne-like shells seem to support it.

Note: Before version 2.05-alpha1, Bash only recognized the definition using curly braces (name() { … }). Also, other shells allow the definition using any command as the body, not only the set of compound commands.

To execute a function like a regular shell script you would put it together like this:

#!/bin/bash
# Add shebang

mycmd()
{
  # this $1 is the one of the function!
  find / -iname "$1"
}

# this $1 is the one of the script itself!
mycmd "$1" # Execute command immediately after defining function 

exit 0

Just informational(1):

Internally, Bash passes exported function definitions to child processes through environment variables, variables whose content starts with "() ….".

Something like the following works without "officially" declaring a function:

$ export testfn="() { echo test; }"
$ bash -c testfn
test
$

Just informational(2):

It is possible to set function names containing slashes:

/bin/ls() {
  echo LS FAKE
}

The elements of this name aren't subject to path search, of course.

Weird function names like these should not be used, though. Quote from the maintainer:

  • It was a mistake to allow such characters in function names (`unset' doesn't work to unset them without forcing -f, for instance). We're stuck with them for backwards compatibility, but I don't have to encourage their use.

Pipelines

FIXME Missing an additional extra article about pipelines and pipelining

[time [-p]] [ ! ] command [ | command2 … ]

Don't get confused by the name "pipeline" here: it's the grammatical name for this construct. Such a pipeline isn't necessarily a pair of commands whose stdout/stdin are connected through a real pipe.

Basically, pipelines consist of one or more simple commands (when there is more than one, they are separated by the | symbol, which connects their output and input), for example:

ls /etc | wc -l
will execute ls on /etc and pipe the output to wc, which will count lines. That way, you simply count your directory entries in /etc.

The last command in the pipeline sets the exit code for the whole pipeline. This exit code can be "reversed" by prefixing the pipeline with an exclamation mark: an unsuccessful pipeline will end up "successful" and vice versa. In this example, the commands in the if-structure are executed if the pattern "^root:" is not found in /etc/passwd:

if ! grep '^root:' /etc/passwd; then
  echo "No root-user defined... eh?"
fi
Yes, this is also a pipeline (though there is no piping!), because the exclamation mark to reverse the exit code can only be used in a pipeline. If grep's exit code is 1 (FALSE, the text was not found), the leading ! "reverses" the exit code, so the shell sees (and reacts to) exit code 0 (TRUE), and the then-branch of the if-clause is executed. One could say we checked for "not grep "^root" /etc/passwd".

The set option pipefail determines how Bash reports the exit code of a pipeline. If it is set, the exit code ($?) is that of the last command that exits with a non-zero status; if no command fails, it is 0. If it is not set, $? always holds the exit code of the last command (as explained above).
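
A minimal illustration of the difference:

false | true
echo "$?"            # 0 - only the exit code of the last command (true) counts

set -o pipefail
false | true
echo "$?"            # 1 - the exit code of the failing command is reported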

The shell option lastpipe makes the last element of a pipeline run in the current shell environment instead of a separate subshell.
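
A minimal sketch (inside a script, where job control is off so lastpipe can take effect):

#!/bin/bash
shopt -s lastpipe

# read runs in the current shell, so the variable survives the pipeline
echo "hello world" | read -r line
echo "$line"         # prints: hello world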

There's also an array PIPESTATUS[] that is set after a foreground pipeline is executed. Each element of PIPESTATUS[] holds the exit code of the respective command in the pipeline. Note: (1) it is only set for foreground pipelines, and (2) for higher-level structures built from pipelines, such as lists, PIPESTATUS[] holds only the results of the last pipeline executed.
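
For example:

# one entry per pipeline element; here the middle command fails
true | false | true
echo "${PIPESTATUS[@]}"   # prints: 0 1 0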

Another thing you can do with pipelines is log their execution time. Note that time is not a command here; it belongs to the reserved words of the pipeline syntax:

# time updatedb
real    3m21.288s
user    0m3.114s
sys     0m4.744s

Lists

FIXME Missing an additional extra article about the list operators

A list is a sequence of one or more pipelines separated by one of the operators ;, &, &&, or ||, and optionally terminated by one of ;, &, or <newline>.

⇒ It's a bunch of pipelines separated or terminated by tokens that all have different meanings for Bash.

Your whole Bash script technically is one big single list!

<PIPELINE1> <newline> <PIPELINE2>
    Newlines completely separate pipelines. The next pipeline is simply executed, without any checks or specials (Hey! You do that every day! You enter a command and press <RETURN>!)
<PIPELINE1> ; <PIPELINE2>
    The semicolon does what <newline> does: it completely separates the pipelines
<PIPELINE1> & <PIPELINE2>
    The pipeline in front of the & is executed asynchronously ("in the background"); if a pipeline follows, it is executed immediately after the asynchronous one was started
<PIPELINE1> && <PIPELINE2>
    <PIPELINE1> is executed, and only if its exit code was 0 (TRUE) is <PIPELINE2> executed (AND-list)
<PIPELINE1> || <PIPELINE2>
    <PIPELINE1> is executed, and only if its exit code was not 0 (FALSE) is <PIPELINE2> executed (OR-list)

Note: POSIX calls this construct a "compound list".
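
Some minimal examples of these operators (the file and directory names are only placeholders):

mkdir backup && cp file.txt backup/                      # AND-list: copy only if mkdir succeeded
grep -q '^root:' /etc/passwd || echo "no root entry"     # OR-list: echo only if grep failed
sleep 10 & date                                          # async: sleep runs in the background, date runs immediately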

Not much in the way of exact definitions, just some short slogans:

FIXME more…



A common compound command

if [ -d /data/mp3 ]; then
  cp mymusic.mp3 /data/mp3
fi

Let's reverse the exit code of the test command; only one thing changes:

if ! [ -d /data/mp3 ]; then
  cp mymusic.mp3 /data/mp3
fi

Discussion

Jacob, 2010/12/28 01:43, 2010/12/28 06:50

"Basically, pipelines are one or more simple commands"

Actually, the bash man page defines "simple commands" and "compound commands" explicitly, but does not define (plain old) "commands". Presumably they can be simple commands or compound commands. Witness:

for NUM in {1..100}; do echo "$NUM"; done | fgrep 3 | while read NUM; do echo "${NUM/3/three}"; done

The REDIRECTION section also implies that "command" includes "compound commands" by the phrase "The following redirection operators may precede or appear anywhere within a simple command or may follow a command." For example:

while read -a LIST; do echo "${LIST[@]}"; done < <(sed "s/#.*//" /etc/fstab | grep .)

Jan Schampera, 2010/12/28 06:49

Yes, I know the glitch here. command is defined as

  • simple command
  • compound command
  • function definition
  • (Bash) coproc definition

Bash manpage is a bit unlucky here (and this page, too!).

I will make some thoughts about it, the best would be an "advanced syntax description" page that includes (simplified) syntax diagrams based on incomplete (not down to every string and character definition) EBNF to be more accurate. It's just some amount of work and will have mistakes at first.
