Words...

FIXME This article needs a review, it covers two topics (commandline splitting and word splitting) and mixes both a bit too much. But in general, it's still usable to understand this behaviour, it's "wrong but not wrong".

One fundamental principle of Bash is recognizing words, both in the commandline that was entered and in some other circumstances, such as when a variable is expanded.

Splitting the commandline

Bash scans the commandline and splits it into words, usually to put the parameters you enter for a command into the right place in memory (the C argv vector) so that the command can later be called correctly. These words are recognized by splitting the commandline at special characters, Space or Tab (the manual calls them blanks). For example, take the echo program. It displays all its parameters on the terminal, separated by a space. When you enter an echo command at your Bash prompt, Bash does what was described above: it looks for those special characters and uses them to determine which parts are separate parameters.

You don't know what I'm talking about? I'm talking about this:

$ echo Hello little world
Hello little world

In other words, something you do (and Bash does) every day. The characters where Bash splits the commandline (Space and Tab, the blanks) are recognized as described above. No null argument is generated when you have two or more blanks in a row: a sequence of blank characters is treated like one single blank. Let's see what I mean:

$ echo Hello                little                  world
Hello little world

As promised: Bash splits the commandline at the position of a sequence of one or more blanks into words, then it calls echo with each word being a separate parameter to it. In this example, echo is called with exactly 3 parameters: "Hello", "little" and "world"!
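To check the parameter count yourself, here is a minimal sketch. The function count_args is a hypothetical helper (not part of Bash) that only prints how many arguments it received:

```shell
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() {
    printf '%s\n' "$#"
}

# One blank between the words.
n_single=$(count_args Hello little world)

# Many blanks between the words: still three arguments, no empty ones.
n_many=$(count_args Hello      little        world)

printf 'single blanks: %s, many blanks: %s\n' "$n_single" "$n_many"
```

Both calls should report the same count, since a run of blanks acts as one separator.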

Does that mean we can't echo more than one Space now? Of course not! Bash treats blanks specially, but there are ways to tell Bash not to: escaping and quoting.

Escaping a character means taking away its special role. Bash will use an escaped character literally as text, even if it's a special one. Escaping is done by preceding the character with a backslash:

$ echo Hello\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ little\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ world
Hello                little                  world

None of the escaped spaces above is used to split the commandline into words. Only one space is, in fact: the one between echo and the rest, which determines what is the command and what are its arguments. Thus, echo is called with exactly one parameter: "Hello                little                  world", blanks included.

And as we're all lazy and want an easy life, Bash has a mechanism to "mass-escape" a complete area of the commandline: quoting. In the context of commandline splitting, which this section is about, it doesn't matter which kind of quoting you use: weak quoting (double quotes) and strong quoting (single quotes) both make the blanks lose their special meaning:

$ echo "Hello                little                  world"
Hello                little                  world
$ echo 'Hello                little                  world'
Hello                little                  world
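To make the argument boundaries visible, the following sketch wraps every argument in angle brackets. The function show_args is a hypothetical helper introduced only for this illustration:

```shell
# show_args is a hypothetical helper: it prints each argument wrapped in <>.
show_args() {
    printf '<%s>' "$@"
    printf '\n'
}

# Unquoted blanks separate the arguments.
unquoted=$(show_args Hello   little)

# Escaped or quoted blanks stay inside a single argument.
escaped=$(show_args Hello\ \ \ little)
double_quoted=$(show_args "Hello   little")
single_quoted=$(show_args 'Hello   little')

printf '%s\n' "$unquoted" "$escaped" "$double_quoted" "$single_quoted"
```

The first line of output should show two bracketed words; the other three should show one bracketed word that still contains its three spaces.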

Why does all this matter? Well, imagine a program that expects a filename as a parameter, like cat. Filenames can have spaces in them:

$ ls -l
total 4
-rw-r--r-- 1 bonsai bonsai 5 Apr 18 18:16 test file

$ cat test file
cat: test: No such file or directory
cat: file: No such file or directory

$ cat test\ file
m00!

$ cat "test file"
m00!

Nothing special so far, as you usually enter such names on the commandline with Tab completion, which takes care of the spaces. But there's another kind of splitting that Bash does.

Word splitting

For a more technical view, please read the article about word splitting!

The first kind of splitting is done by Bash to parse the commandline into separate tokens. This is what was described above; it is pure commandline parsing.

After the commandline has been split into words, Bash performs expansions where needed: for example, variables that occur in the commandline are expanded (substituted by their value). This is where the second type of splitting comes in: the results of some expansions (unquoted parameter expansion, command substitution and arithmetic expansion) undergo word splitting, while others do not.

Imagine you have a filename stored in a variable (variables will be explained elsewhere, just believe what you see):

MYFILE="test file"

When this variable is used, its occurrence is substituted by its content.

$ cat $MYFILE
cat: test: No such file or directory
cat: file: No such file or directory

This is another step where spaces make life hard, and quoting is again the way to influence it. Double quotes around the expansion suppress the word splitting:

$ cat "$MYFILE"
m00!
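The difference can also be measured without any file involved. A minimal sketch, assuming the default word-splitting settings and using a hypothetical helper count_args that prints its argument count:

```shell
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() {
    printf '%s\n' "$#"
}

MYFILE="test file"

# Unquoted expansion: the result is word-split into "test" and "file".
unquoted_count=$(count_args $MYFILE)

# Quoted expansion: the result stays one word, "test file".
quoted_count=$(count_args "$MYFILE")

printf 'unquoted: %s argument(s), quoted: %s argument(s)\n' \
    "$unquoted_count" "$quoted_count"
```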

Example

Let's follow an unquoted commandline through these steps, assuming that the variable is set:

MYFILE="THE FILE.TXT"

and the commandline under review is:

echo The file is named $MYFILE

The parser will scan for blanks and mark the relevant words ("splitting the commandline"):

Initial commandline splitting:
Word 1   Word 2   Word 3   Word 4   Word 5   Word 6
echo     The      file     is       named    $MYFILE

The commandline contains a parameter (variable) expansion, so Bash performs the substitution and then word splitting on the result:

Word splitting after substitution:
Word 1   Word 2   Word 3   Word 4   Word 5   Word 6   Word 7
echo     The      file     is       named    THE      FILE.TXT

Now let's imagine we quoted $MYFILE. The commandline then looks like this:

echo The file is named "$MYFILE"

Word splitting after substitution (quoted!):
Word 1   Word 2   Word 3   Word 4   Word 5   Word 6
echo     The      file     is       named    THE FILE.TXT
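The word lists of this walk-through can be approximated with a small sketch. The function show_words is a hypothetical helper that prints every word it receives in square brackets; the word "echo" is passed to it as plain text so that it shows up as Word 1:

```shell
# show_words is a hypothetical helper: it prints each word wrapped in [].
show_words() {
    printf '[%s]' "$@"
    printf '\n'
}

MYFILE="THE FILE.TXT"

# Unquoted: the expansion result is split, giving seven words.
unquoted_words=$(show_words echo The file is named $MYFILE)

# Quoted: the expansion result stays one word, giving six words.
quoted_words=$(show_words echo The file is named "$MYFILE")

printf '%s\n' "$unquoted_words" "$quoted_words"
```

This assumes the default word-splitting settings; the bracket pairs in each output line correspond to the columns of the matching word list.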

Discussion

kobu, 2012/03/27 23:25

space, tab AND newline … don't forget newline

syntax/words.txt · Last modified: 2010/03/09 07:33 (external edit)
GNU Free Documentation License 1.3