Differences

This shows you the differences between two versions of the page.

syntax:pe [2012/06/12 09:54]
ormaaj Bugs / portability - oddities with $@ and $*
syntax:pe [2018/06/22 13:49] (current)
csb [Case modification: Arrays] [2] should point to SOME and not to TEXT in the array
Line 26: Line 26:
   * [[#​indirection | Indirection]]   * [[#​indirection | Indirection]]
     * ''​${!PARAMETER}''​     * ''​${!PARAMETER}''​
-  * :V4: [[#​case_modification | Case modification ]]+  * [[#​case_modification | Case modification ]]
     * ''​${PARAMETER^}''​     * ''​${PARAMETER^}''​
     * ''​${PARAMETER^^}''​     * ''​${PARAMETER^^}''​
Line 100: Line 100:
 ''​${!PARAMETER}''​ ''​${!PARAMETER}''​
  
 +In some cases, like for example
  
-**Everywhere** you can name a parameter to expand, like for example 
 <​code>​ <​code>​
 ${PARAMETER} ${PARAMETER}
Line 107: Line 107:
 ${PARAMETER:​0:​3} ${PARAMETER:​0:​3}
 </​code>​ </​code>​
-you can use the form+ 
 +you can instead ​use the form 
 <​code>​${!PARAMETER}</​code>​ <​code>​${!PARAMETER}</​code>​
-which will enter a level of indirection. The referenced parameter is not ''PARAMETER'' itself, but the parameter named by the value of it. If your parameter ''PARAMETER'' has the value "''TEMP''", then ''${!PARAMETER}'' really references ''TEMP'':+ 
 +to enter a level of indirection. The referenced parameter is not ''PARAMETER'' itself, but the parameter whose name is stored as the value of ''PARAMETER''. If the parameter ''PARAMETER'' has the value "''TEMP''", then ''${!PARAMETER}'' will expand to the value of the parameter named ''TEMP'':
 <​code>​ <​code>​
-read -p "Which variable do you want to inspect? ​" ​look_var+read -rep 'Which variable do you want to inspect? ​' ​look_var
  
-echo "The value of \"$look_var\" is: \"​${!look_var}\""+printf 'The value of "%s" is: "​%s"​\n' "​$look_var" ​"​${!look_var}" ​
 </​code>​ </​code>​
  
 Of course the indirection also works with special variables: Of course the indirection also works with special variables:
 +
 <​code>​ <​code>​
 # set some fake positional parameters # set some fake positional parameters
Line 125: Line 129:
 </​code>​ </​code>​
  
-This is also known as "​variable variables"​ or "​indirect reference"​. **Indirect references ​to [[syntax:​arrays|array names]] are not possible (as of Bash 4.1).**+You can think of this mechanism ​as being roughly equivalent ​to taking any parameter expansion that begins with the parameter name, and substituting the ''​!PARAMETER''​ part with the value of PARAMETER.
  
 +<​code>​
 +echo "​${!var^^}"​
 +# ...is equivalent to
 +eval 'echo "​${'"​$var"'​^^}"'​
 +</​code>​
  
-===== Case modification =====+It was an unfortunate design decision to use the ''​!''​ prefix for indirection,​ as it introduces parsing ambiguity with other parameter expansions that begin with ''​!''​. Indirection is not possible in combination with any parameter expansion whose modifier requires a prefix to the parameter name. Specifically,​ indirection isn't possible on the ''​${!var@}'',​ ''​${!var*}'',​ ''​${!var[@]}'',​ ''​${!var[*]}'',​ and ''​${#​var}''​ forms. This means the ''​!''​ prefix can't be used to retrieve the indices of an array, the length of a string, or number of elements in an array indirectly (see [[syntax/​arrays#​indirection]] for workarounds). Additionally,​ the ''​!''​-prefixed parameter expansion conflicts with ksh-like shells which have the more powerful "​name-reference"​ form of indirection,​ where the exact same syntax is used to expand to the name of the variable being referenced.
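For plain string variables, the restriction on ''${#var}'' described above can usually be sidestepped by expanding indirectly into a temporary first. A minimal sketch, assuming Bash; the names ''greeting'', ''ref'', and ''value'' are made up for illustration:

```shell
# Hypothetical variables, for illustration only.
greeting='Hello world'
ref=greeting          # ref holds the NAME of the variable we care about

# No single expansion gives "length of the variable named by ref",
# so expand indirectly into a temporary, then take its length.
value=${!ref}
length=${#value}

printf 'The variable "%s" holds %d characters\n' "$ref" "$length"
```

This covers scalars only. For array indices and element counts, see the workarounds on the arrays page linked above.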
  
-:V4:+Indirect references to [[syntax:arrays|array names]] are also possible since the Bash 3 series (exact version unknown), but undocumented. See [[syntax/​arrays#​indirection]] for details. 
 + 
 +Chet has added an initial implementation of the ksh ''​nameref''​ declaration command to the git devel branch. (''​declare -n'',​ ''​local -n'',​ etc, will be supported). This will finally address many issues around passing and returning complex datatypes to/from functions. 
 +===== Case modification =====
  
 ''​${PARAMETER^}''​ ''​${PARAMETER^}''​
Line 180: Line 192:
     * => ''​THIS IS SOME TEXT''​     * => ''​THIS IS SOME TEXT''​
    * ''​echo "​${array[2]^^}"''​    * ''​echo "​${array[2]^^}"''​
-     * => ''​TEXT''​+     * => ''​SOME''​
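The array results listed above can be checked with a short script. This is a sketch that assumes Bash 4 or later and an array defined as ''array=(This is some Text)''; that definition is inferred from the listed results, it is not shown in this comparison:

```shell
# Assumed array contents, inferred from the results listed above.
array=(This is some Text)

# With [@], the case modification is applied to each element in turn.
all_upper=$(echo "${array[@]^^}")

# With a single index, only that element is expanded and modified.
third_upper=${array[2]^^}     # index 2 is "some"
third_capital=${array[2]^}    # only the first character is uppercased

printf '%s\n' "$all_upper" "$third_upper" "$third_capital"
```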
  
 ===== Variable name expansion ===== ===== Variable name expansion =====
Line 313: Line 325:
 echo ${MYSTRING/​%x/​y} ​ # RESULT: xxxxxxxxxy</​code>​ echo ${MYSTRING/​%x/​y} ​ # RESULT: xxxxxxxxxy</​code>​
  
-If the replacement part is completely omitted, like, the matches are replaced by the nullstring, i.e. they are removed. This is equivalent to specifying an empty replacement:+If the replacement part is completely omitted, the matches are replaced by the nullstring, i.e. they are removed. This is equivalent to specifying an empty replacement:
 <​code>​ <​code>​
 echo ${MYSTRING//​conservative/​} echo ${MYSTRING//​conservative/​}
Line 344: Line 356:
  
 => ''​64''​ => ''​64''​
 +
 +The length is reported in characters, not in bytes. Depending on your locale, the two can differ (multibyte characters, as in UTF-8 encoding).
  
 There'​s not much to say about it, mh? There'​s not much to say about it, mh?
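To see the difference between characters and bytes, you can compare ''${#var}'' with a byte count from ''wc -c''. A small sketch, assuming the script itself is saved as UTF-8. The character count depends on the current locale, so no output is shown for it:

```shell
# 'héllo' is 5 characters; the é occupies 2 bytes in UTF-8.
word='héllo'

char_count=${#word}                                # depends on the locale
byte_count=$(( $(printf '%s' "$word" | wc -c) ))   # wc -c counts bytes

printf 'characters: %d\nbytes: %d\n' "$char_count" "$byte_count"
```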
Line 601: Line 615:
 STRING="​Hello world" STRING="​Hello world"
  
-# only print it +# only print '​Hello'​ 
-echo "​${STRING#?????​}"​+echo "​${STRING%??????​}"​
  
-# store it +# only print '​world'​ 
-STRING=${STRING#?????​}+echo "​${STRING#??????​}"​ 
 + 
 +# store it into the same variable 
 +STRING=${STRING#​??????}
 </​code>​ </​code>​
  
 ===== Bugs and Portability considerations ===== ===== Bugs and Portability considerations =====
  
-  * Bash doesn'​t follow either POSIX or its own documentation when expanding either a quoted ''"​$@"''​ or ''"​${arr[@]}"''​ with an adjacent expansion. ''"​$@$x"''​ expands in the same way as ''"​$*$x"''​ - i.e. all parameters plus the adjacent expansion are concatenated into a single argument. As a workaround, each expansion needs to be quoted separately. Unfortunately,​ this bug took a very long time to notice ​and exists in all versions of Bash through 4.2p29.<​code>​ +  * **Fixed in 4.2.36** ([[ftp://​ftp.cwru.edu/​pub/​bash/​bash-4.2-patches/​bash42-036 | patch]]). ​Bash doesn'​t follow either POSIX or its own documentation when expanding either a quoted ''"​$@"''​ or ''"​${arr[@]}"''​ with an adjacent expansion. ''"​$@$x"''​ expands in the same way as ''"​$*$x"''​ - i.e. all parameters plus the adjacent expansion are concatenated into a single argument. As a workaround, each expansion needs to be quoted separately. Unfortunately,​ this bug took a very long time to notice.<​code>​ 
-~ $ set -- a b c; x=foo; printf '<​%s>​ ' "​$@$x"​ "​$*$x"​ "​$@""​$x"​+~ $ set -- a b c; x=foo; printf '<​%s>​ ' "​$@$x"​ "$*""​$x" "​$@""​$x"​
 <a b cfoo> <a b cfoo> <a> <b> <​cfoo>​ <a b cfoo> <a b cfoo> <a> <b> <​cfoo>​
-</​code>​Bug report and patch: http://​lists.gnu.org/​archive/​html/​bug-bash/​2012-05/​msg00137.html+</​code>​
  
-  * Almost all shells disagree about the treatment of an unquoted ''​$@'',​ ''​${arr[@]}'',​ ''​$*'',​ and ''​${arr[*]}''​ when ''​IFS'' ​is set to null. POSIX is unclear about the expected behavior ​of these parameters when left unquoted, and for reasons I don't understand, a null IFS causes both wordsplitting ​and pathname expansion to behave randomly. Since there are few good reasons to leave ''​IFS''​ set to null for more than the duration of a command or two, and even fewer to expand ''​$@''​ and ''​$*''​ unquoted, this should be a rare issue. **Always quote them**!<​code>​ +  * Almost all shells disagree about the treatment of an unquoted ''​$@'',​ ''​${arr[@]}'',​ ''​$*'',​ and ''​${arr[*]}''​ when [[http://​mywiki.wooledge.org/​IFS | IFS]] is set to null. POSIX is unclear about the expected behavior. A null IFS causes both [[syntax:​expansion:​wordsplit | word splitting]] ​and [[syntax:​expansion:​globs | pathname expansion]] to behave randomly. Since there are few good reasons to leave ''​IFS''​ set to null for more than the duration of a command or two, and even fewer to expand ''​$@''​ and ''​$*''​ unquoted, this should be a rare issue. **Always quote them**!<​code>​ 
-~ $ touch x 'y z'for sh in {{,m}k,{d,b}a,z}sh; do printf '​%s\n' ​"​$sh" ​"$("​$sh"​ -'set -- a "b c" ​d \*IFS=printf ​"<​%s> ​" ​$*echoprintf "<​%s>​ " $@'​)";​ done ) +touch x 'y z' 
-ksh +for sh in bb {{d,b}a,{m,}k,z}sh; do 
-<a> <b c> <d> <x> <y z>  +    echo "​$sh"​ 
-<a> <b c> <d> <x> <y z>  +    ​"​$sh"​ -s a 'b c' ​d \* </​dev/​fd/​0 
-mksh +done <<​\EOF 
-<a b c d *>  +${ZSH_VERSION+:​} false && emulate sh 
-<a b c d *> +IFS= 
 +printf ​'<​%s> ​' ​$* 
 +echo 
 +printf "<​%s>​ " $@ 
 +echo 
 +EOF 
 +</code><code>
 +bb 
 +<ab cd*> 
 +<ab cd*>
 dash dash
-<ab cd*>  +<ab cd*> 
-<ab cd*> ​+<ab cd*>
 bash bash
-<a> <b c> <d> <x> <y z>  +<a> <b c> <d> <x> <y z> 
-<a> <b c> <d> <x> <y z> +<a> <b c> <d> <x> <y z> 
 +mksh 
 +<a b c d *> 
 +<a b c d *> 
 +ksh 
 +<a> <b c> <d> <x> <y z> 
 +<a> <b c> <d> <x> <y z>
 zsh zsh
-<a> <b c> <d> <*>  +<a> <b c> <d> <x> <y z>
-<a> <b c> <d> <*> +<a> <b c> <d> <x> <y z>
-</​code>​As you can see, only Bash and ksh93 produce the same output in this case. When ''​IFS''​ is set to a non-null value, or unset, all shells behave the same - first expanding into separate args, then applying pathname expansion and word-splitting to the results, except for zsh, which doesn'​t do pathname expansion.+</​code>​When ''​IFS''​ is set to a non-null value, or unset, all shells behave the same - first expanding into separate args, then applying pathname expansion and word-splitting to the results, except for zsh, which doesn'​t do pathname expansion ​in its default mode. 
 + 
 +  * Additionally,​ shells disagree about various wordsplitting behaviors, the behavior of inserting delimiter characters from IFS in ''​$*'',​ and the way adjacent arguments are concatenated,​ when IFS is modified in the middle of expansion through side-effects.<​code>​ 
 +for sh in bb {{d,​b}a,​po,​{m,​}k,​z}sh;​ do 
 +    printf '%-4s: ' "​$sh"​ 
 +    "​$sh"​ </​dev/​fd/​0 
 +done <<​\EOF 
 +${ZSH_VERSION+:​} false && emulate sh 
 +set -f -- a b c 
 +unset -v IFS 
 +printf '<​%s>​ ' ${*}${IFS=}${*}${IFS:​=-}"​${*}"​ 
 +echo 
 +EOF 
 +</​code><​code>​ 
 +bb  : <a b cabc> <​a-b-c>​ 
 +dash: <a b cabc> <​a-b-c>​ 
 +bash: <a> <b> <ca> <b> <c-a b c> 
 +posh: <a> <b> <ca b c> <​a-b-c>​ 
 +mksh: <a> <b> <ca b c> <​a-b-c>​ 
 +ksh : <a> <b> <ca> <b> <c> <a b c> 
 +zsh : <a> <b> <ca> <b> <c> <​a-b-c>​ 
 +</​code>​ksh93 and mksh can additionally achieve this side effect (and others) via the ''​${ cmds;​}''​ expansion. I haven'​t yet tested every possible side-effect that can affect expansion halfway through expansion that way. 
 + 
 +  * As previously mentioned, the Bash form of indirection by prefixing a parameter expansion with a ''​!''​ conflicts with the same syntax used by mksh, zsh, and ksh93 for a different purpose. Bash will "​slightly"​ modify this expansion in the next version with the addition of namerefs. 
 + 
 +  * Bash (and most other shells) don't allow .'s in identifiers. In ksh93, dots in variable names are used to reference methods (i.e. "​Discipline Functions"​),​ attributes, special shell variables, and to define the "real value" of an instance of a class. 
 + 
 +  * In ksh93, the ''​_''​ parameter has even more uses. It is used in the same way as ''​self''​ in some object-oriented languages; as a placeholder for some data local to a class; and also as the mechanism for class inheritance. In most other contexts, ''​_''​ is compatible with Bash. 
 + 
 +  * Bash only evaluates the subscripts of the slice expansion (''​${x:​y:​z}''​) if the parameter is set (for both nested expansions and arithmetic). For ranges, Bash evaluates as little as possible, i.e., if the first part is out of range, the second won't be evaluated. ksh93 and mksh always evaluate the subscript parts even if the parameter is unset. <​code>​ 
 + $ bash -c '​n="​y[\$(printf yo >&​2)1]"​ m="​y[\$(printf jo >&​2)1]";​ x=(); echo "​${x[@]:​n,​6:​m}"'​ # No output 
 + $ bash -c '​n="​y[\$(printf yo >&​2)1]"​ m="​y[\$(printf jo >&​2)1]";​ x=([5]=hi); echo "​${x[@]:​n,​6:​m}"'​ 
 +yo 
 + $ bash -c '​n="​y[\$(printf yo >&​2)1]"​ m="​y[\$(printf jo >&​2)1]";​ x=([6]=hi); echo "​${x[@]:​n,​6:​m}"'​ 
 +yojo 
 + $ bash -c '​n="​y[\$(printf yo >&​2)1]"​ m="​y[\$(printf jo >&​2)1]";​ x=12345; echo "​${x:​n,​5:​m}"'​ 
 +yojo 
 + $ bash -c '​n="​y[\$(printf yo >&​2)1]"​ m="​y[\$(printf jo >&​2)1]";​ x=12345; echo "​${x:​n,​6:​m}"'​ 
 +yo 
 +</​code>​ 
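The advice from the first two items of this list (quote ''$@'' and ''$*'', and quote adjacent expansions separately) can be demonstrated by counting arguments. A minimal sketch; ''count_args'' is a helper invented for this example:

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

unset -v IFS            # make sure the default IFS is in effect
set -- a 'b c' d
x=foo

quoted=$(count_args "$@")          # a | b c | d
unquoted=$(count_args $@)          # a | b | c | d  ('b c' gets split)
adjacent=$(count_args "$@""$x")    # a | b c | dfoo

printf 'quoted=%s unquoted=%s adjacent=%s\n' "$quoted" "$unquoted" "$adjacent"
```

Quoting each expansion on its own keeps the result the same whether or not the shell has the ''"$@$x"'' bug described above.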
 + 
 +==== Quote Nesting ==== 
 + 
 +  * In most shells, when dealing with an "​alternate"​ parameter expansion that expands to multiple words, and nesting such expansions, not all combinations of nested quoting are possible. 
 +<​code>​ 
 +# Bash 
 + $ typeset -a a=(meh bleh blerg) b 
 + $ IFS=e 
 + $ printf "<​%s>​ " "​${b[@]-"​${a[@]}"​ "​${a[@]}"​}";​ echo # The entire PE is quoted so Bash considers the inner quotes redundant. 
 +<meh> <​bleh>​ <blerg meh> <​bleh>​ <​blerg>​ 
 + $ printf "<​%s>​ " "​${b[@]-${a[@]} ${a[@]}}";​ echo # The outer quotes cause the inner expansions to be considered quoted. 
 +<meh> <​bleh>​ <blerg meh> <​bleh>​ <​blerg>​ 
 + $ b=(meep beep) 
 + $ printf "<​%s>​ " "​${b[@]-"​${a[@]}"​ "​${a[@]}"​}"​ "​${b[@]-${a[@]} ${a[@]}}";​ echo # Again no surprises. Outer quotes quote everything recursively. 
 +<​meep>​ <​beep>​ <​meep>​ <​beep>​ 
 +</​code>​ 
 + 
 +Now let's see what can happen if we leave the outside unquoted.
 +<​code>​ 
 +# Bash 
 + $ typeset -a a=(meh bleh blerg) b 
 + $ IFS=e 
 + $ printf "<​%s>​ " ${b[@]-"​${a[@]}"​ "​${a[@]}"​};​ echo # Inner quotes make inner expansions quoted. 
 +<meh> <​bleh>​ <blerg meh> <​bleh>​ <​blerg>​ 
 + $ printf "<%s> " ${b[@]-${a[@]} ${a[@]}}; echo # No quotes at all wordsplits / globs, like you'd expect.
 +<m> <h> <bl> <h> <bl> <rg m> <h> <bl> <h> <bl> <​rg>​ 
 +</​code>​ 
 + 
 +This all might be intuitive, and is the most common implementation,​ but this design sucks for a number of reasons. For one, it means Bash makes it absolutely impossible to expand any part of the inner region //​unquoted//​ while leaving the outer region quoted. Quoting the outer forces quoting of the inner regions recursively (except nested command substitutions of course). Word-splitting is necessary to split words of the inner region, which cannot be done together with outer quoting. Consider the following (only slightly far-fetched) code: 
 + 
 +<​code>​ 
 +# Bash (non-working example) 
 + 
 +unset -v IFS # make sure we have a default IFS 
 + 
 +if some crap; then 
 +    typeset -a someCmd=(myCmd arg1 'arg2 yay!' '​third*arg*'​ 4) 
 +fi 
 + 
 +someOtherCmd=mycommand 
 +typeset -a otherArgs=(arg3 arg4) 
 + 
 +# What do you think the programmer expected to happen here? 
 +# What do you think will actually happen... 
 + 
 +"​${someCmd[@]-"​$someOtherCmd"​ arg2 "​${otherArgs[@]}"​}"​ arg5 
 +</​code>​ 
 + 
 +This final line is perhaps not the most obvious, but I've run into cases where this type of logic can be desirable and realistic. We can deduce what was intended:
 + 
 +  * If ''​someCmd''​ is set, then the resulting expansion should run the command: ''"​myCmd"​ "​arg1"​ "arg2 yay!" "​third*arg*"​ "​4"​ "​arg5"''​ 
 +  * Otherwise, if ''​someCmd''​ is not set, expand ''​$someOtherCmd''​ and the inner args, to run a different command: ''"​mycommand"​ "​arg2"​ "​arg3"​ "​arg4"​ "​arg5"''​. 
 + 
 +Unfortunately,​ it is impossible to get the intended result in Bash (and most other shells) without taking a considerably different approach. The only way to split the literal inner parts is through word-splitting,​ which requires that the PE be unquoted. But, the only way to expand the outer expansion correctly without word-splitting or globbing is to quote it. Bash will actually expand the command as one of these: 
 + 
 +<​code>​ 
 +# The quoted PE produces a correct result here... 
 + $ bash -c '​typeset -a someCmd=(myCmd arg1 "arg2 yay!" "​third*arg*"​ 4); printf "<​%s>​ " "​${someCmd[@]-"​$someOtherCmd"​ arg2 "​${otherArgs[@]}"​}"​ arg5; echo'​ 
 +<​myCmd>​ <​arg1>​ <arg2 yay!> <​third*arg*>​ <4> <​arg5>​ 
 + 
 +# ...but in the opposite case the first 3 arguments are glued together. There are no workarounds. 
 + $ bash -c '​typeset -a otherArgs=(arg3 arg4); someOtherCmd=mycommand;​ printf "<​%s>​ " "​${someCmd[@]-"​$someOtherCmd"​ arg2 "​${otherArgs[@]}"​}"​ arg5; echo'​ 
 +<​mycommand arg2 arg3> <​arg4>​ <​arg5>​ 
 + 
 +# UNLESS! we unquote the outer expansion allowing the inner quotes to 
 +# affect the necessary parts while allowing word-splitting to split the literals: 
 + $ bash -c '​typeset -a otherArgs=(arg3 arg4); someOtherCmd=mycommand;​ printf "<​%s>​ " ${someCmd[@]-"​$someOtherCmd"​ arg2 "​${otherArgs[@]}"​} arg5; echo'​ 
 +<​mycommand>​ <​arg2>​ <​arg3>​ <​arg4>​ <​arg5>​ 
 + 
 +# Success!!! 
 + $ bash -c '​typeset -a someCmd=(myCmd arg1 "arg2 yay!" "​third*arg*"​ 4); printf "<​%s>​ " ${someCmd[@]-"​$someOtherCmd"​ arg2 "​${otherArgs[@]}"​} arg5; echo'​ 
 +<​myCmd>​ <​arg1>​ <​arg2>​ <​yay!>​ <​third*arg*>​ <4> <​arg5>​ 
 + 
 +# ...Ah f^^k. (again, no workaround possible.) 
 +</​code>​ 
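One shape the "considerably different approach" mentioned above can take is to stop nesting expansions and build the command in an array with an ordinary ''if''. This is only a sketch: the variable names come from the example above, ''cmd'' is an extra array introduced here, and the command is printed rather than run:

```shell
unset -v IFS someCmd               # default IFS; someCmd deliberately unset
someOtherCmd=mycommand
otherArgs=(arg3 arg4)

# Pick the command with an explicit test instead of ${someCmd[@]-...}.
if (( ${#someCmd[@]} > 0 )); then
    cmd=("${someCmd[@]}")
else
    cmd=("$someOtherCmd" arg2 "${otherArgs[@]}")
fi
cmd+=(arg5)

# Every element stays a separate word, with no splitting or globbing.
printf '<%s> ' "${cmd[@]}"; echo
```

It is more verbose than a single expansion, but each argument is quoted exactly once and nothing relies on word-splitting.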
 + 
 +=== The ksh93 exception === 
 +To the best of my knowledge, ksh93 is the only shell that acts differently. Rather than forcing nested expansions into quoting, a quote at the beginning and end of the nested region will cause the quote state to reverse itself within the nested part. I have no idea whether it's an intentional or documented effect, but it does solve the problem and consequently adds a lot of potential power to these expansions. 
 + 
 +All we need to do is add two extra double-quotes:​ 
 +<​code>​ 
 +# ksh93 passing the two failed tests from above: 
 + 
 + $ ksh -c '​otherArgs=(arg3 arg4); someOtherCmd="​mycommand";​ printf "<​%s>​ " "​${someCmd[@]-""​$someOtherCmd"​ arg2 "​${otherArgs[@]}""​}"​ arg5; echo'​ 
 +<​mycommand>​ <​arg2>​ <​arg3>​ <​arg4>​ <​arg5>​ 
 + 
 + $ ksh -c '​typeset -a someCmd=(myCmd arg1 "arg2 yay!" "​third*arg*"​ 4); printf "<​%s>​ " "​${someCmd[@]-""​$someOtherCmd"​ arg2 "​${otherArgs[@]}""​}"​ arg5; echo'​ 
 +<​myCmd>​ <​arg1>​ <arg2 yay!> <​third*arg*>​ <4> <​arg5>​ 
 +</​code>​
  
 +This can be used to control the quote state of any part of any expansion to an arbitrary depth. Sadly, ksh93 is the only shell that does this, and the difference may introduce compatibility problems.
 ===== See also ===== ===== See also =====
   * Internal: [[syntax:​expansion:​intro | Introduction to expansion and substitution]]   * Internal: [[syntax:​expansion:​intro | Introduction to expansion and substitution]]
   * Internal: [[syntax:​arrays|Arrays]]   * Internal: [[syntax:​arrays|Arrays]]
   * Dictionary, internal: [[dict:​terms:​parameter]]   * Dictionary, internal: [[dict:​terms:​parameter]]
  • syntax/pe.1339494867.txt
  • Last modified: 2012/06/12 09:54
  • by ormaaj