Softpanorama

May the source be with you, but remember the KISS principle ;-)
Home Switchboard Unix Administration Red Hat TCP/IP Networks Neoliberalism Toxic Managers
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

String Operations in Shell

Version 2.1 (May 10, 2021)

News

Bash

Recommended Links

Bash as a scripting language Double square bracket conditionals Pattern Matching Variable Substitution

 

${var:-bar}
(default)
${var:=bar}
(set if not defined)
# ##   % %% KSH Substitutions
Length Index substr function substr function in bash )expanded info Search and Replace Concatenation Trimming from left and right Text processing using regex
BASH Debugging Bash Built-in Variables Annotated List of Bash Enhancements bash Tips and Tricks Sysadmin Horror Stories Tips and Tricks Humor Etc

Those notes are partially based on lecture notes by Professor Nikolai Bezroukov at FDU.

String operators allow you to manipulate the contents of a variable without resorting to AWK or Perl. Modern shells such as bash 3.x or ksh93 supports most of the standard string manipulation functions, but in a very pervert, idiosyncratic way. Anyway, standard functions like length, index, substr are available. Strings can be concatenated by juxtaposition and using double quoted strings. You can ensure that variables exist (i.e., are defined and have non-null values) and set default values for variables and catch errors that result from variables not being set. You can also perform basic pattern matching. There are several basic string operations available in bash, ksh93 and similar shells:

Introduction

Shell string processing capabilities were weak, but recently in bash 4.x they were improved and briught closer to ksh88 capabilities. Most "classic" string  handing function such as index, substr, concatenation, trimming, case conversion, translation of one set of symbols into another, etc, are available.  Directly or indirectly. Regular expression now can be used for matching the string using Perl compatible regex pattern. So if you use bash 4.1+  life is good. Almost ;-) 

One interesting idiosyncrasy of Unix shells including bash is that  many string operators in shell use unique among programming languages curly-bracket syntax. In shell any variable can be displayed as ${name_of_the_variable} instead of $name_of_the_variable. This notation was initially introduced to protect a variable name from merging with string that comes after it, but was extended to allow string operations on variables.  Here is example in which it is used for separation of a variable $var and a string "_string" using curly brackets:

export var='test' 
echo ${var}_string # var is a variable that uses syntax ${var} with the value test
echo $var_string # var_string is a variable that doesn't exist, echo doesn't print anything

In Korn 88 shell this notation was extended to allow expressions inside curvy brackets. For example

${var:=moo}

Each operation is encoded using special symbol or two symbols ( "digraph", for example :- , := , etc). An argument that the operator may need is positioned after the symbol of the operation. Later this notation extended ksh93 and adopted by bash and other shells.  Recently it was also extended to case conversion using ^^ and ,, digrams

This "ksh-originated" group of operators is the most popular and probably the most widely used group of string-handling operators so it makes sense to learn them, if only in order to be able to modify old scripts.

Recent developments

Bash 3.2 introduced =~ operator with "normal" Perl-style regular expressions that can be used instead in many cases and they are definitely preferable in new scripts that you might write. for more about Perl compatible regular expression see Text processing using regex. They are now standard de-fact and are use in Perl, Python, Java, Javascript and other modem languages. So good knowledge of them is necessary for any sysadmin.

Let's say we need to establish whether variable $ip appears to be a valid IP address:

for ip in "255.255.255.255" "10.10.10.10" "400.0.0.0" ; do

   echo "=== testing $ip ==="
   if [[ $ip =~ ^[0-2][0-9]{0,2}\.[0-2][0-9]{0,2}\.[0-2][0-9]{0,2}\.[0-2][0-9]{0,2} ]] ; then
       echo "$ip Looks like a valid IP"
   else
      echo "$ip is unvalid ip"
   fi
done

Running this fragment will get:

=== testing 255.255.255.255 ===
255.255.255.255 Looks like a valid IP
=== testing 10.10.10.10 ===
10.10.10.10 Looks like a valid IP
=== testing 400.0.0.0 ===
400.0.0.0 is unvalid ip

In bash-3.1, a string append operator (+=) was added:

PATH+=":~/bin"
echo "$PATH"

In bash 4.1 negative length specifications in the ${var:offset:length} expansion,   previously errors, are now treated as offsets from the end of the variable.

It also extended printf built-in which now has a new %(fmt)T specifier, which allows time values to use strftime-like formatting.

printf -v can now assign values to array indices.

The read built-in now has a new `-N nchars' option, which reads exactly NCHARS characters, ignoring delimiters like newline.

The {x} operators to the [[ conditional command now do string  comparison according to the current locale if the compatibility level is greater than 40.

Matching pattern substitution

If pattern is a string, then "matching pattern substitution" is the combination of two functions index and substr, Only if index function succeed, substr function is applied. If regular expression is used, this is equivalent to $var=s/regex/string/operation in Perl. See also Search and Replace Unlike Perl only basic regular expressions are allowed

This notation was introduced in ksh88 and still remains very idiosyncratic. We have four operations: #, ##, % and %%. Two perform search/matching from the left of the string, two from the right.

In examples below we will assume that the variable var has value "this is a test" (as produced by execution of statement export var="this is a test")

The # and % operators have a convenient mnemonic if you use the US keyboard. The # key is on the left side of the $ key and operates from the left, while % is to right of $ key.

Here is test Perl program for the example above that illustrates how # operates

#!/usr/bin/perl
   $var="this is a test";
   $pattern='this';
   if ( ($k=index($var,$pattern))>-1 ) {
      substr($var,$k,length($pattern))='';
   }
   print "var=$var\n";
If executed, it will print
var= is a test
Note a space after equal sign -- pattern matches up to letter 't' so space is the first unmatched symbol that gets into the result.

Implementation of classic string operations in shell

Despite shell deficiencies in this area and idiosyncrasies preserved from 1970th most classic string operations can be implemented in shell. You can define functions that behave almost exactly like in Perl or other "more normal" scripting languages. In case shell facilities are not enough you can always add functions written in AWK or Perl. It's actually sad that AWK was not integrated into shell.

Length Operator

There are several ways to get length of the string.

More complex example. Here's the function for validating that that string is less the a given max length. It requires two parameters, the actual string and the maximum length the string should be.

check_length() 
# to call: check_length string max_length_of_string 
{ 
	# check we have the right params 
	if (( $# != 2 )) ; then 
	   echo "check_length need two parameters: a string and max_length" 
	   return 1 
	fi 
	if (( ${#1} > $2 )) ; then 
	   return 1 
	fi 
	return 0 
} 

You could call the function check_length like this:

#!/usr/bin/bash
# test_name 
while : 
do 
  echo -n "Enter customer name :" 
  read NAME 
  [ check_length $NAME 10 ] && break
  echo "The string $NAME is longer then 10 characters"    
done 
echo $NAME

Determining the Length of Matching Substring at Beginning of String

There is rarely used and pretty obscure capability of expr built-in operation called match. Sometimes it can be useful to emulate index function, as bash (shame on developers) does not yet has index function. Only substr function, which was recently implemented (see below)
expr match "$string" : '$substring' 

where:

The arguments are converted to strings and the second is considered to be a (basic, the same as used by GNU grep ) regular expression, with a `^' implicitly prepended. The first argument is then matched against this regular expression.

If the match succeeds and REGEX uses `\(' and `\)', the `:' expression returns the part of STRING that matched the sub-expression; otherwise, it returns the number of characters matched.

If the match fails, the `:' operator returns the null string if `\(' and `\)' are used in REGEX, otherwise 0

Only the first `\( ... \)' pair is relevant to the return value; additional pairs are meaningful only for grouping the regular expression operators.

In the regular expression, `\+', `\?', and `\|' are operators which respectively match one or more, zero or one, or separate alternatives. SunOS and other `expr''s treat these as regular characters. (POSIX allows either behavior.)

For example

my_string=abcABC123ABCabc
#         |------|

echo `expr "$my_string" : 'abc[A-Z]*.2'`       # 8
See shell script - OR in `expr match` - Unix & Linux Stack Exchange for some additional info.

Index

There is no such function. In cases when you use if often and need exact position of the substring in the source string you probably will be better off switching to Perl or any other scripting language that you know well.

If you just need to check that the substring occurs in the string can also use regular expression with the =~ operator

You can use pattern matching for determining whether particular substring is present in string (the most typical usage of index function) but you can't determine its position

string="abba"
[[ $string =∼ *bb*  ]] && echo "bb is contains in string $string" 

Note that you do not need to quote variables inside [[...]]. also if you need space you can escape it with the backslash.

Case statement also will work and in many cases can be used to emulzte index function:

case "$string" in 
  *bb*)
    # Do stuff
    ;;
esac

In discussion String contains in Bash - Stack Overflow the following solution was ranked highest which provides the solution to "space inside the string problem" which hunt previous approaches:

You can use Marcus's answer (* wildcards) outside a case statement, too, if you use double brackets:
string='My long string'
if [[ $string == *"My long"* ]]; then
  echo "It's there!"
fi

The other half-decent way to implement checking if the string contains a substring is to use grep and <<< redirection operator. The option -b prints the byte offset of the string shown in the first position that you can extract from the result, but this is a perversion. You need numeric position you need to switch to Perl or AWK.

if  [[ grep $substr <<< string ]] ; then 
    echo Substring $substr occurs in string
fi 

NOTE: Function index that exists in expr is not what you want -- the second operator in it is the set of characters (like in tr) not a string.

The expr index command searches through your first string looking the first occurrence of any character from your second string. In this case, it is recognizing that the 'o' in the characters 'Mozilla' matches the 4th character in "Info.out..."

This using this as a test to see what happens. It will return 4 as the first match for 'd':

echo `expr index "abcdefghijklmnopqrstuvwxyz" xyzd`
In other words
expr index $string $character_set
returns the numerical position in $string of first character in set of characters defined by $character_set that matches.
stringZ=abcABC123ABCabc
echo `expr index "$stringZ" C12`             # 6
                                             # C position.

echo `expr index "$stringZ" c`              # 3
# 'c' (in #3 position) 

This is the close equivalent of strchr() in C. Moreover:

substr function

Bash provides two implementation of substr function which are not identical:

  1. Via expr function
  2. a part of pattern matching operators in the form ${param:offset[:length}.

I recommend using the second one, as this is more compact notation and does not involves using external function expr. It also looks more modern, as if inspired by Python, although its origin has nothing to do with Python. It is actually more compact notation that in Perl.

Implementation via expr function

Classic substr function is available via expr function:

expr substr $string $position $length
Extracts $length characters from $string starting at $position.. The first character has index one.
stringZ=abcABC123ABCabc
#       123456789......
#       1-based indexing.

echo `expr substr $stringZ 1 2`              # ab
echo `expr substr $stringZ 4 3`              # ABC

Notes:

Implementation of substr function using :: notation in bash 4.1+

Idiosyncratic, but pretty slick implementation of substring function is also available in bash 3.x as a part of pattern matching operators in the form

${param:offset[:length}
Extracts substring with $length characters from $string starting at $position.. Offset is counted from zero like in Perl.

NOTES:

  1. Starting with bash 4.1 negative length specification in the ${var:offset:length} expansion, previously being an error, is now treated as offset from the end of the variable, like in Perl.
  2. If the $string parameter is "*" or "@", then this extracts the positional parameters, starting at $position.
${string:position:length}
If $length is not given the rest of the string starting from position $offset is extracted:
stringZ=abcABC123ABCabc
#       0123456789.....
#       0-based indexing.

echo ${stringZ:0}                            # abcABC123ABCabc
echo ${stringZ:1}                            # bcABC123ABCabc
echo ${stringZ:7}                            # 23ABCabc

echo ${stringZ:7:3}                          # 23A
                                             # Three characters of substring.

NOTES:

Somewhat strange nuance: taking a slice of the array of positional parameters using :: notation

If the $string parameter is "*" or "@", then this is not substr function. Instead it slice the array of parameters extracts a maximum of $length positional parameters, starting at $position.

echo ${*:2}          # Echoes second and following positional parameters.
echo ${@:2}          # Same as above.

echo ${*:2:3}        # Echoes three positional parameters, starting at second. 

Search and Replace

You can search and replace substring in a variable using ksh syntax:

alpha='This is a test string in which the word "test" is replaced.' 
beta="${alpha/test/replace}"

The string "beta" now contains an edited version of the original string in which the first case of the word "test" has been replaced by "replace". To replace all cases, not just the first, use this syntax:

beta="${alpha//test/replace}"

Note the double "//" symbol.

Here is an example in which we replace one string with another in a multi-line block of text:

list="cricket frog cat dog" 
poem="I wanna be a x\n\ A x is what I'd love to be\n\ If I became a x\n\ How happy I would be.\n"
for critter in $list; do
   echo -e ${poem//x/$critter}
done
There are several additional capabilities
${var:pos[:len]} # extract substr from pos (0-based) for len
${var/substr/repl} # replace first match
${var//substr/repl} # replace all matches
${var/#substr/repl} # replace if matches at beginning (non-greedy)
${var/##substr/repl} # replace if matches at beginning (greedy)
${var/%substr/repl} # replace if matches at end (non-greedy)
${var/%%substr/repl} # replace if matches at end (greedy)
${#var} # returns length of $var
${!var} # indirect expansion

Concatenation

In bash-3.1, a string append operator (+=) was added, It is now a preferred solution:

PATH+=":~/bin"
echo "$PATH"

Traditionally in shell strings were concatenated by juxtaposition and using double quoted strings. For example

PATH="$PATH:/usr/games"

Double quoted string in shell is almost identical to double quoted string in Perl and performs macro expansion of all variables in it. The minor differences are the treatment of escaped characters and the newline character. If you want exact match you can use $'string'

#!/bin/bash

# String expansion.Introduced with version 2 of Bash.

#  Strings of the form $'xxx' have the standard escaped characters interpreted. 

echo $'Ringing bell 3 times \a \a \a'
     # May only ring once with certain terminals.
echo $'Three form feeds \f \f \f'
echo $'10 newlines \n\n\n\n\n\n\n\n\n\n'
echo $'\102\141\163\150'   # Bash
                           # Octal equivalent of characters.

exit 0

Trimming from left and right

Using the wildcard character (?), you can imitate Perl chop function (which cuts the last character of the string and returns the rest) quite easily

test="~/bin/"
trimmed_last=${test%?}
trimmed_first=${test#?}
echo "original='$test,timmed_first='$trimmed_first', trimmed_last='$trimmed_last'"

The first character of a string can also be obtained with printf:

printf -v char "%c" "$source"
Conditional chopping line in Perl chomp function or REXX function trim can be done using while loop, for example:
function trim
{
   target=$1
   while : # this is an infinite loop
   do
   case $target in
      ' '*) target=${target#?} ;; ## if $target begins with a space remove it
      *' ') target=${target%?} ;; ## if $target ends with a space remove it
      *) break ;; # no more leading or trailing spaces, so exit the loop
   esac
   done
   return target
}

A more Perl-style method to trim trailing blanks would be

spaces=${source_var##*[! ]} ## get the trailing blanks in var $spaces
trimmed_var=${source_var#$spaces}
The same trick can be used for removing leading spaces.

Assignment of default value for undefined variables

Operator: ${var:-bar} is useful for assigning a variable a default value.

It word the following way: if $var exists and is not null, it returns $var. If it doesn't exist or is null, return bar. This operator does not change the variable $var.

Example:

$ export var=""
$ echo ${var:-one}
one
$ echo $var

More complex example:

sort -nr $1 | head -${2:-10}

A typical usage include situations when you need to check if arguments were passed to the script and if not assign some default values::

#!/bin/bash 
export FROM=${1:-"~root/.profile"}
export TO=${2:-"~my/.profile"}
cp -p $FROM $TO

set variable if it is not defined with the operator ${var:=bar}

It works as following: If $var exists and is not null, return $var. If it doesn't exist or is null, set $var to bar and return bar.

Example:

$ export var=""
$ echo ${var:=one}
one
echo $var
one

String comparison via double square bracket conditionals

The [[ ]] construct was introduced in ksh88 as a way to compensate for multiple shortcomings and limitations of the [ ] (test) solution. Essentially it makes [ ] construct obsolete except for running a program to get a return code. In turn double round brackets ((..)) construct made The [[ ]] construct obsolete for integer comparisons.

There are two types of operators that can be used inside double square bracket construct:

Paradoxically integer comparison operators are represented as strings ( -eq, -ne, -gt, etc) while string comparison operators as delimiters ("=", "==", "!=", "<", ">", etc).

The [[ ]] construct expects expression. In ksh delimiters [[ and ]] serve as single quotes so you do not have macro expansion inside: variable substitution and wildcard expansion aren't done within [[ and ]], making quoting less necessary. In bash this is less true :-).

It can act as independent operator as it produces return code. So constructs like

[[ $string =∼ [aeiou] ]] && exit; 

are legitimate and actually pretty compact way to write if statements without else clause that contain a single statement in then block.

One of the [[ ]] construct warts is that it redefined == as a pattern matching operation, which anybody who programmed in C/C++/Java strongly resent. Latest bash version corrected that and allow using Perl-style =~ operator instead (I think ksh93 allow that too):

string="abba"
[[ $string =∼ [aeiou] ]]
echo $?
0

[[ $string =∼ h[sdfghjkl] ]]
echo $?
1

Like [ ] construct [[ ]] construct can be used as a separate statement that returns an exit status depending upon whether condition is true or not. With && and || constructs discussed above this provides an alternative syntax for if-then and if-else constructs

if [[ -d  $HOME/$user ]] ; then echo " Home for user 
   $user exists..."; fi

can be written simpler as

[[ -d  $HOME/$user ]] && echo " Home for user $user 
   exists..." 

There are several types of expressions that can be used inside [[ ... ]] construct:

One unpleasant reality (and probably the most common gotcha) of using legacy [[...]] integer comparison constructs is that if one of the variable is not initialized it produces syntax error. Various tricks are used to avoid this nasty macro substitution side effect, that came of a legacy of extremely week implementation of comparisons in Borne shell (there are way too many crazy things implemented in Borne shell, anyway ;-).

There are two classic tricks to deal with this gotchas in old [[..]] construct as you will be dealing with scripts infested with those old constructs pretty often. They can be and often are used simultaneously:

Generally, it is better to initialize most variables explicitly. I know it is difficult as old habits die slowly, but this can be done. Here are the most common "legacy integer comparison operators":

-eq
is equal to

if [[ "$a" -eq "$b" ]]

-ne
is not equal to

if [[ "$a" -ne "$b" ]]

-gt
is greater than

if [[ "$a" -gt "$b" ]]

-ge
is greater than or equal to

if [[ "$a" -ge "$b" ]]

-lt
is less than

if [[ "$a" -lt "$b" ]]

-le
is less than or equal to

if [[ "$a" -le "$b" ]]

String comparisons

String comparisons is all what left useful in this construct as for integer comparisons ((..)) construct is better and for file comparisons older [...] construct is equal. This topic is discussed at greater length at String Operations in Shell

Notes:

Operator True if...
str = pat
str == pat
str matches pat. Note that in case of "==" that's not what you logically expect, if you have some experience with C /C++/Java programming !!!
str != pat str does not match pat.
str1 < str2 str1 is less than str2 is collation order used
str1 > str2 str1 is greater than str2.
-n str str is not null (has length greater than 0).
-z str str is null (has length 0).
file1 -ef file2 file1 is another name for file2 (hard or symbolic link)

While we're cleaning up code we wrote in the last chapter, let's fix up the error handling in the highest script The code for that script is:

filename=${1:?"filename missing."}
howmany=${2:-10}
sort -nr $filename | head -$howmany

Recall that if you omit the first argument (the filename), the shell prints the message highest: 1: filename missing. We can make this better by substituting a more standard "usage" message:

if [[ -z $1 ]]; then
    print 'usage: howmany filename [-N]'
else
    filename=$1
    howmany=${2:-10}
    sort -nr $filename | head -$howmany
fi

It is considered better programming style to enclose all of the code in the if-then-else, but such code can get confusing if you are writing a long script in which you need to check for errors and bail out at several points along the way. Therefore, a more usual style for shell programming is this:

if [[ -z $1 ]]; then
    print 'usage: howmany filename [-N]'
    return 1
fi
filename=$1
howmany=${2:-10}
sort -nr $filename | head -$howmany

Pattern Matching

There are two types of pattern matching is shell:

Unless you need to modify old scripts it does not make sense to use old ksh-style regex in bash.

Perl-style regular expressions

(partially borrowed from Bash Regular Expressions | Linux Journal)

Since version 3 of bash (released in 2004) bash implements an extended regular expressions which are mostly compatible with Perl regex. They are also called POSIX regular expressions as they are defined in IEEE POSIX 1003.2. (which you should read and understand to use the full power provided). Extended regular expression are also used in egrep so they are well known by system administrators. Please note that Perl regular expressions are equivalent to extended regular expressions with a few additional features:

Predefined Character Classes

Extended regular expression support set of predefined character classes. When used between brackets, these define commonly used sets of characters. The POSIX character classes implemented in extended regular expressions include:

NOTE: I have problems with GNU bash, version 3.2.25(1)-release (x86_64-redhat-linux-gnu) using those extended classes. It does accept them, but it does not match correctly.

Modifiers are similar to Perl

Extended regex Perl regex
a+ a+
a? a?
a|b a|b
(expression1) (expression1)
{m,n} {m,n}
{,n} {,n}
{m,} {m,}
{m} {m}

It returns 0 (success) if the regular expression matches the string, otherwise it returns 1 (failure).

In addition to doing simple matching, bash regular expressions support sub-patterns surrounded by parenthesis for capturing parts of the match. The matches are assigned to an array variable BASH_REMATCH. The entire match is assigned to BASH_REMATCH[0], the first sub-pattern is assigned to BASH_REMATCH[1], etc..

The following example script takes a regular expression as its first argument and one or more strings to match against. It then cycles through the strings and outputs the results of the match process:

#!/bin.bash

if [[ $# -lt 2 ]]; then
    echo "Usage: $0 PATTERN STRINGS..."
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo

while [[ $1 ]]
do
    if [[ $1 =~ $regex ]]; then
        echo "$1 matches"
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            echo "  capture[$i]: ${BASH_REMATCH[$i]}"
            let i++
        done
    else
        echo "$1 does not match"
    fi
    shift
done

Assuming the script is saved in "bashre.sh", the following sample shows its output:

  # sh bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc
  regex: aa(b{2,3}[xyz])cc

  aabbxcc matches
    capture[1]: bbx
  aabbcc does not match

Old KSH Pattern-matching Operators

Pattern-matching operators were introduced in ksh88 in a very idiosyncratic way. The notation is different from used by Perl or utilities such as grep but it has some internal logic in it and is usable for those who are programming in bash on regular basis. For everybody else there is a need to consult the reference each time you use this capabilities (stack overflow ;-) Life is not perfect.

NOTE: While they are hard to remember, but there is a handy mnemonic tip:

There are two kinds of pattern matching available: matching from the left and matching from the right.

The operators, with their functions and an example, are shown in the following table (note that on keyboard symbol "#" is to the left and symbol "%" is to the right of dollar sign; that might help to memorize them):

Operator Meaning Example
${var#t*is} Deletes the shortest possible match from the left: If the pattern matches the beginning of the variable's value, delete the shortest part that matches and return the rest. export $var="this is a test"
echo ${var#t*is}
is a test
${var##t*is} Deletes the longest possible match from the left: If the pattern matches the beginning of the variable's value, delete the longest part that matches and return the rest. export $var="this is a test"

echo ${var##t*is}

a test

${var%t*st} Deletes the shortest possible match from the right: If the pattern matches the end of the variable's value, delete the shortest part that matches and return the rest. export $var="this is a test"

echo ${var%t*st}

this is a

${var%%t*st} Deletes the longest possible match from the right: If the pattern matches the end of the variable's value, delete the longest part that matches and return the rest. export $var="this is a test" echo ${var%%t*is}

These operators can be used to cu string both from right and left and extract the necessary part. In the example below this is done with uptime:

cores=`grep processor /proc/cpuinfo | wc -l`
cpuload=`uptime`

cpuload=${cpuload#*average: }
cpuload=${cpuload%%.*}
if (( cpuload > cores + 1  )) ;  then 
   echo "Server $HOSTNAME overloaded: $cpuload on $corex cores"
fi

For example, the following script changes the extension of all .html files to .htm.

#!/bin/bash
# quickly convert html filenames for use on a dossy system
# only handles file extensions, not filenames

for i in *.html; do
  if [ -f ${i%l} ]; then
    echo ${i%l} already exists
  else
    mv $i ${i%l}
  fi
done

The classic use for pattern-matching operators is stripping off components of pathnames, such as directory prefixes and filename suffixes. With that in mind, here is an example that shows how all of the operators work. Assume that the variable path has the value /home /billr/mem/long.file.name; then:

Expression         	  Result
${path##/*/}                       long.file.name
${path#/*/}              billr/mem/long.file.name
$path              /home/billr/mem/long.file.name
${path%.*}         /home/billr/mem/long.file
${path%%.*}        /home/billr/mem/long

Operator #: ${var#t*is} deletes the shortest possible match from the left

Example:

$ export var="this is a test"
$ echo ${var#t*is}
is a test

Operator ##: ${var##t*is} deletes the longest possible match from the left

Example:

$ export var="this is a test"
$ echo ${var##t*is}
a test

Operator %: ${var%t*st} Function: deletes the shortest possible match from the right

Example:

$ export var="this is a test" 
$ echo ${var%t*st} 
this is a
for i in *.htm*; do 
   if [ -f ${i%l} ]; then  
      echo "${i%l} already exists" 
   else  
      mv $i ${i%l} 
   fi  
done

Operator %%: ${var%%t*st} deletes the longest possible match from the right

Example:

$ export var="this is a test" 
$ echo ${var%%t*st}

Ksh-style regular expressions

A KSH regular expression are now obsolete. Please use Perl-style regular expression if [[ variable =~ regex ]] instead (available in bash 4.x)

They use idiosyncratic prefix-based notation that is difficult to learn after you got used to regular suffix based notation. In other words each of "quantity metasymbol" or quantifiers should be used as prefix for the string not as suffix as everywhere in Unix (like in ls resolv*, not ls *(resolv).)

This was a huge blunder committed by David Korn and still is was never corrected. In any case attempts to used it looks pretty perverse and if should be abandoned for good. The notes below are just for those unfortunate people who need to understand somebody else scripts which use this notation. Information below was extracted from Learning the Korn Shell, 2nd Edition 4.5. String Operators. I never verified its correctness.

Each such operator has the form x(exp), where x is the particular operator and exp is any regular expression (often simply a regular string). The operator determines how many occurrences of exp a string that matches the pattern can contain.

Operator Meaning
*(exp) 0 or more occurrences of exp
+(exp) 1 or more occurrences of exp
?(exp) 0 or 1 occurrences of exp
@(exp1|exp2|...) exp1 or exp2 or...
!(exp) Anything that doesn't match exp
Expression Matches
x x
*(x) Null string, x, xx, xxx, ...
+(x) x, xx, xxx, ...
?(x) Null string, x
!(x) Any string except x
@(x) x (see below)

The following section compares Korn shell regular expressions to analogous features in awk and egrep. If you aren't familiar with these, skip to the section entitled "Pattern-matching Operators."

shell basic regex vs. awk/egrep regular expressions

Shell egrep/awk Meaning
*(exp) exp* 0 or more occurrences of exp
+(exp) exp+ 1 or more occurrences of exp
?(exp) exp? 0 or 1 occurrences of exp
@(exp1|exp2|...) exp1|exp2|... exp1 or exp2 or...
!(exp) (none) Anything that doesn't match exp

These equivalents are close but not quite exact. Actually, an exp within any of the Korn shell operators can be a series of exp1|exp2|... alternates. But because the shell would interpret an expression like dave|fred|bob as a pipeline of commands, you must use @(dave|fred|bob) for alternates

For example:

It is worth re-emphasizing that shell regular expressions can still contain standard shell wildcards. Thus, the shell wildcard ? (match any single character) is the equivalent to . in egrep or awk, and the shell's character set operator [...] is the same as in those utilities. For example, the expression +([0-9]) matches a number, i.e., one or more digits. The shell wildcard character * is equivalent to the shell regular expression * (?).

A few egrep and awk regex operators do not have equivalents in the Korn shell. These include:

The first two pairs are hardly necessary, since the Korn shell doesn't normally operate on text files and does parse strings into words itself.

Conversion of strings to lower or upper case

Conversion of string to lower and upper case was weak point of shell for along time. Before recent enhancements described below the "standard" way to converting string was to use tr function:

a='Hi all'
a=$(tr '[:upper:]' '[:lower:]' <<< "$a") 
hi all

Please note that this is a better solution then

a=$(tr '[A-Z]' '[a-z]' <<< "$a")

Using A-Z assumed that the text is ASCII. Which may or may not be the case. Note that tr '[A-Z]' '[a-z]; is incorrect in almost all locales. For example, in the en-US locale, A-Z is actually the interval AaBbCcDdEeFfGgHh...XxYyZ

In more complex cases Perl or AWK should be used instead.

If you need to work with very old versions of bash (1.x) and do not have access to tr, sed, awl or Perl (poor you ;-) you can to create two (or more) functions for this purpose. See Converting string to lower case in Bash - Stack Overflow for inspiration. Here is an example from this thread:

lcs="abcdefghijklmnopqrstuvwxyz"
ucs="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
input="Change Me To All Capitals"
for (( i=0; i<"${#input}"; i++ )) ; do :
for (( j=0; j<"${#lcs}"; j++ )) ; do :
if [[ "${input:$i:1}" == "${lcs:$j:1}" ]] ; then
input="${input/${input:$i:1}/${ucs:$j:1}}"
fi
done
done

Using typeset/declare statement for converting (or more correctly casting) string to lower or upper case

The typeset keyword (declare is the alias of the keyword typeset) usually is used for specifying integer type or creating local variables in shell. But recently its functionality was extended and now it allow perform two really useful string operations:

From the moment you used this option the string will cast to the specified case on assignment. It will work until you explicitly turn it off. You can turn off typeset options explicitly by typing typeset +o , where o is the option you turned on before.

In Korn shell there are two additional useful options (-L and -R) that allow also trimming string to fixed length and remove leading blanks. An obvious application for the -those options is one in which you need fixed-width output.

Here are a simple example taken from Leaning the Korn Shell [Chapter 6] 6.3 Arrays

Assume that the variable alpha is assigned the letters of the alphabet, in alternating case, surrounded by three blanks on each side:

alpha="   aBcDeFgHiJkLmNoPqRsTuVwXyZ   "

Table 6.6 shows some typeset statements and their resulting values (assuming that each of the statements are run "independently").

Table 6.6: Examples of typeset String Formatting Options
Statement Value of v
typeset -L v=$alpha "aBcDeFgHiJkLmNoPqRsTuVwXyZ "
typeset -L10 v=$alpha "aBcDeFgHiJ"
typeset -R v=$alpha " aBcDeFgHiJkLmNoPqRsTuVwXyZ"
typeset -R16 v=$alpha "kLmNoPqRsTuVwXyZ"
typeset -l v=$alpha " abcdefghijklmnopqrstuvwxyz"
typeset -uR5 v=$alpha "VWXYZ"
typeset -Z8 v= "123.50" "00123.50"

More examples

$ a="A Few Words"
$ declare -l a=a
$ echo "$a"
a few words
$ a="A Few Words"
$ declare -u a=a
$ echo "$a"
A FEW WORDS

The declation provided will work for subsequent assignments too: all string will be forcefully converted to specific case on assignment. While in many cases very convenient, sometimes it can mange your strings, is you forget about the fact that this declation will be in force until implicitly removed. If such behaviour is not what you want you need to remove particular flag from the variable by using declare the attribute with +l or +u

Converting string to lower case or upper case using ^^ and ,, operators

Another way to convert the string to lower case in bash 4.x, which do not have such a side effect, is ,, operator

$ a=${a,,}

Similarly to convert string to upper case you can use ^^

a=${a^^}

You can also toggle first character by word using ~ (probably inspired by VI)

${a~}

There are also several other options.

Using distance between characters in ASCII character set for conversion

More sophisticated version that works only with ASCII characters uses the fact that the distance between lower and upper characters is the same for all letters of the alphabet. (but not for _ or any other special letter). In this case you can work with the binary representation of each letter, which allow you to implement more complex variants of conversion (for example with partial transliteration of symbols)

97 - 65 = 32

And this is the working version with examples.
Please note the comments in the code, as they explain a lot of stuff:

#!/bin/bash

# lowerupper.sh

# Prints the lowercase version of a char
lowercaseChar(){
    case "$1" in
        [A-Z])
            n=$(printf "%d" "'$1")
            n=$((n+32))
            printf \\$(printf "%o" "$n")
            ;;
        *)
            printf "%s" "$1"
            ;;
    esac
}

# Prints the lowercase version of a sequence of strings
lowercase() {
    word="$@"
    for((i=0;i<${#word};i++)); do
        ch="${word:$i:1}"
        lowercaseChar "$ch"
    done
}

# Prints the uppercase version of a char
uppercaseChar(){
    case "$1" in
        [a-z])
            n=$(printf "%d" "'$1")
            n=$((n-32))
            printf \\$(printf "%o" "$n")
            ;;
        *)
            printf "%s" "$1"
            ;;
    esac
}

# Prints the uppercase version of a sequence of strings
uppercase() {
    word="$@"
    for((i=0;i<${#word};i++)); do
        ch="${word:$i:1}"
        uppercaseChar "$ch"
    done
}

# The functions will not add a new line, so use echo or
# append it if you want a new line after printing

# Printing stuff directly
lowercase "I AM the Walrus!"$'\n'
uppercase "I AM the Walrus!"$'\n'

echo "----------"

# Printing a var
str="A StRing WITH mixed sTUFF!"
lowercase "$str"$'\n'
uppercase "$str"$'\n'

echo "----------"

# Not quoting the var should also work, 
# since we use "$@" inside the functions
lowercase $str$'\n'
uppercase $str$'\n'

echo "----------"

# Assigning to a var
myLowerVar="$(lowercase $str)"
myUpperVar="$(uppercase $str)"
echo "myLowerVar: $myLowerVar"
echo "myUpperVar: $myUpperVar"

echo "----------"

# You can even do stuff like
if [[ 'option 2' = "$(lowercase 'OPTION 2')" ]]; then
    echo "Fine! All the same!"
else
    echo "Ops! Not the same!"
fi

exit 0

And the results after running this:

$ ./lowerupper.sh 
i am the walrus!
I AM THE WALRUS!
----------
a string with mixed stuff!
A STRING WITH MIXED STUFF!
----------
a string with mixed stuff!
A STRING WITH MIXED STUFF!
----------
myLowerVar: a string with mixed stuff!
myUpperVar: A STRING WITH MIXED STUFF!
----------
Fine! All the same!

But this is mostly "art for the sake of art". Moreover it will not work in borne shell as borne shell does not support ${1:$i:1}. But this solution is portable between all version of bash in use (C-style for loop was introduced in bash 2.0 I think).

Here strings

Here string are a special case of shell assignment (From Wikipedia):

In the following example, text is passed to the tr command (transliterating lower to upper-case) using a here document. This could be in a shell file, or entered interactively at a prompt.
$ tr a-z A-Z << END_TEXT
> one two three
> four five six
> END_TEXT
ONE TWO THREE
FOUR FIVE SIX

END_TEXT was used as the delimiting identifier. It specified the start and end of the here document. The redirect and the delimiting identifier do not need to be separated by a space: <<END_TEXT or << END_TEXT both work equally well.

Appending a minus sign to the << has the effect that leading tabs are ignored. This allows indenting here documents in shell scripts (primarily for alignment with existing indentation) without changing their value:[a]

$ tr a-z A-Z <<- END_TEXT
>       one two three
>       four five six
>       END_TEXT
ONE TWO THREE
FOUR FIVE SIX

This yields the same output, notably not indented.

By default, behavior is largely identical to the contents of double quotes: variables are interpolated, commands in backticks are evaluated, etc.[b]

$ cat << EOF
> \$ Working dir "$PWD" `pwd`
> EOF
$ Working dir "/home/user" /home/user

This can be disabled by quoting any part of the label, which is then ended by the unquoted value;[c] the behavior is essentially identical to that if the contents were enclosed in single quotes. Thus for example by setting it in single quotes:

$ cat << 'EOF'
> \$ Working dir "$PWD" `pwd`
> EOF
\$ Working dir "$PWD" `pwd`

Double quotes may also be used, but this is subject to confusion, because expansion does occur in a double-quoted string, but does not occur in a here document with double-quoted delimiter.[3] Single- and double-quoted delimiters are distinguished in some other languages, notably Perl (see below), where behavior parallels the corresponding string quoting.

Here strings

A here string (available in Bash, ksh, or zsh) is syntactically similar, consisting of <<<, and effects input redirection from a word (a sequence treated as a unit by the shell, in this context generally a string literal). In this case the usual shell syntax is used for the word ("here string syntax"), with the only syntax being the redirection: a here string is an ordinary string used for input redirection, not a special kind of string.

Operation <<< was introduced in bash-2.05b

A single word need not be quoted:

$ tr a-z A-Z <<< one
ONE

In case of a string with spaces, it must be quoted:

$ tr a-z A-Z <<< 'one two three'
ONE TWO THREE

This could also be written as:

$ FOO='one two three'
$ tr a-z A-Z <<< $FOO
ONE TWO THREE

Multiline strings are acceptable, yielding:

$ tr a-z A-Z <<< 'one
> two three'
ONE
TWO THREE

Note that leading and trailing newlines, if present, are included:

$ tr a-z A-Z <<< '
> one
> two three
> '

ONE
TWO THREE

$

The key difference from here documents is that, in here documents, the delimiters are on separate lines; the leading and trailing newlines are stripped. Here, the terminating delimiter can be specified.

Here strings are particularly useful for commands that often take short input, such as the calculator bc:

$ bc <<< 2^10
1024

Note that here string behavior can also be accomplished (reversing the order) via piping and the echo command, as in:

$ echo 'one two three' | tr a-z A-Z
ONE TWO THREE

however here strings are particularly useful when the last command needs to run in the current process, as is the case with the read builtin:

$ echo 'one two three' | read a b c
$ echo $a $b $c

yields nothing, while

$ read a b c <<< 'one two three'
$ echo $a $b $c
one two three

This happens because in the previous example piping causes read to run in a subprocess, and as such can not affect the environment of the parent process.


Top Visited
Switchboard
Latest
Past week
Past month

NEWS CONTENTS

Old News :-)

Please visit Heiner Steven's SHELLdorado, the best shell scripting site on the Internet

[May 23, 2021] The Bash String Operators - Kevin Sookocheff

May 23, 2021 | sookocheff.com

The Bash String Operators Posted on December 11, 2014 | 3 minutes | Kevin Sookocheff

A common task in bash programming is to manipulate portions of a string and return the result. bash provides rich support for these manipulations via string operators. The syntax is not always intuitive so I wanted to use this blog post to serve as a permanent reminder of the operators.

The string operators are signified with the ${} notation. The operations can be grouped in to a few classes. Each heading in this article describes a class of operation.

Substring Extraction
Extract from a position
1
${string:position}

Extraction returns a substring of string starting at position and ending at the end of string . string is treated as an array of characters starting at 0.

1
2
3
4
5
> string="hello world"
> echo ${string:1}
ello world
> echo ${string:6}
world

Extract from a position with a length
1
${string:position:length}

Adding a length returns a substring only as long as the length parameter.

1
2
3
4
5
> string="hello world"
> echo ${string:1:2}
el
> echo ${string:6:3}
wor
Substring Removal
Remove shortest starting match
1
${variable#pattern}

If variable starts with pattern , delete the shortest part that matches the pattern.

1
2
3
> string="hello world, hello jim"
> echo ${string#*hello}
world, hello jim

Remove longest starting match
1
${variable##pattern}

If variable starts with pattern , delete the longest match from variable and return the rest.

1
2
3
> string="hello world, hello jim"
> echo ${string##*hello}
jim

Remove shortest ending match
1
${variable%pattern}

If variable ends with pattern , delete the shortest match from the end of variable and return the rest.

1
2
3
> string="hello world, hello jim"
> echo ${string%hello*}
hello world,

Remove longest ending match
1
${variable%%pattern}

If variable ends with pattern , delete the longest match from the end of variable and return the rest.

1
2
3
> string="hello world, hello jim"
> echo ${string%%hello*}

Substring Replacement
Replace first occurrence of word
1
${variable/pattern/string}

Find the first occurrence of pattern in variable and replace it with string . If string is null, pattern is deleted from variable . If pattern starts with # , the match must occur at the beginning of variable . If pattern starts with % , the match must occur at the end of the variable .

1
2
3
> string="hello world, hello jim"
> echo ${string/hello/goodbye}
goodbye world, hello jim

Replace all occurrences of word
1
${variable//pattern/string}

Same as above but finds all occurrences of pattern in variable and replace them with string . If string is null, pattern is deleted from variable .

1
2
3
> string="hello world, hello jim"
> echo ${string//hello/goodbye}
goodbye world, goodbye jim
bash See also

[May 10, 2021] Split a String in Bash

May 10, 2021 | www.xmodulo.com

When you need to split a string in bash, you can use bash's built-in read command. This command reads a single line of string from stdin, and splits the string on a delimiter. The split elements are then stored in either an array or separate variables supplied with the read command. The default delimiter is whitespace characters (' ', '\t', '\r', '\n'). If you want to split a string on a custom delimiter, you can specify the delimiter in IFS variable before calling read .

# strings to split
var1="Harry Samantha Bart   Amy"
var2="green:orange:black:purple"

# split a string by one or more whitespaces, and store the result in an array
read -a my_array <<< $var1

# iterate the array to access individual split words
for elem in "${my_array[@]}"; do
    echo $elem
done

echo "----------"
# split a string by a custom delimter
IFS=':' read -a my_array2 <<< $var2
for elem in "${my_array2[@]}"; do
    echo $elem
done
Harry
Samantha
Bart
Amy
----------
green
orange
black
purple

[May 10, 2021] How to manipulate strings in bash

May 10, 2021 | www.xmodulo.com

Remove a Trailing Newline Character from a String in Bash

If you want to remove a trailing newline or carriage return character from a string, you can use the bash's parameter expansion in the following form.

${string%$var}

This expression implies that if the "string" contains a trailing character stored in "var", the result of the expression will become the "string" without the character. For example:

# input string with a trailing newline character
input_line=$'This is my example line\n'
# define a trailing character.  For carriage return, replace it with $'\r' 
character=$'\n'

echo -e "($input_line)"
# remove a trailing newline character
input_line=${input_line%$character}
echo -e "($input_line)"
(This is my example line
)
(This is my example line)
Trim Leading/Trailing Whitespaces from a String in Bash

If you want to remove whitespaces at the beginning or at the end of a string (also known as leading/trailing whitespaces) from a string, you can use sed command.

my_str="   This is my example string    "

# original string with leading/trailing whitespaces
echo -e "($my_str)"

# trim leading whitespaces in a string
my_str=$(echo "$my_str" | sed -e "s/^[[:space:]]*//")
echo -e "($my_str)"

# trim trailing whitespaces in a string
my_str=$(echo "$my_str" | sed -e "s/[[:space:]]*$//")
echo -e "($my_str)"
(   This is my example string    )
(This is my example string    )      ← leading whitespaces removed
(This is my example string)          ← trailing whitespaces removed

If you want to stick with bash's built-in mechanisms, the following bash function can get the job done.

trim() {
    local var="$*"
    # remove leading whitespace characters
    var="${var#"${var%%[![:space:]]*}"}"
    # remove trailing whitespace characters
    var="${var%"${var##*[![:space:]]}"}"   
    echo "$var"
}

my_str="   This is my example string    "
echo "($my_str)"

my_str=$(trim $my_str)
echo "($my_str)"

[May 10, 2021] String Operators - Learning the bash Shell, Second Edition

May 10, 2021 | www.oreilly.com

Table 4-1. Substitution Operators

Operator Substitution
$ { varname :- word }

If varname exists and isn't null, return its value; otherwise return word .

Purpose :

Returning a default value if the variable is undefined.

Example :

${count:-0} evaluates to 0 if count is undefined.

$ { varname := word }

If varname exists and isn't null, return its value; otherwise set it to word and then return its value. Positional and special parameters cannot be assigned this way.

Purpose :

Setting a variable to a default value if it is undefined.

Example :

$ {count := 0} sets count to 0 if it is undefined.

$ { varname :? message }

If varname exists and isn't null, return its value; otherwise print varname : followed by message , and abort the current command or script (non-interactive shells only). Omitting message produces the default message parameter null or not set .

Purpose :

Catching errors that result from variables being undefined.

Example :

{count :?" undefined! " } prints "count: undefined!" and exits if count is undefined.

$ { varname : + word }

If varname exists and isn't null, return word ; otherwise return null.

Purpose :

Testing for the existence of a variable.

Example :

$ {count :+ 1} returns 1 (which could mean "true") if count is defined.

$ { varname : offset }
$ { varname : offset : length }

Performs substring expansion. a It returns the substring of $ varname starting at offset and up to length characters. The first character in $ varname is position 0. If length is omitted, the substring starts at offset and continues to the end of $ varname . If offset is less than 0 then the position is taken from the end of $ varname . If varname is @ , the length is the number of positional parameters starting at parameter offset .

Purpose :

Returning parts of a string (substrings or slices ).

Example :

If count is set to frogfootman , $ {count :4} returns footman . $ {count :4:4} returns foot .

[ 52 ]

Table 4-2. Pattern-Matching Operators

Operator Meaning
$ { variable # pattern }

If the pattern matches the beginning of the variable's value, delete the shortest part that matches and return the rest.

$ { variable ## pattern }

If the pattern matches the beginning of the variable's value, delete the longest part that matches and return the rest.

$ { variable % pattern }

If the pattern matches the end of the variable's value, delete the shortest part that matches and return the rest.

$ { variable %% pattern }

If the pattern matches the end of the variable's value, delete the longest part that matches and return the rest.

$ { variable / pattern / string }
$ { variable // pattern / string }

The longest match to pattern in variable is replaced by string . In the first form, only the first match is replaced. In the second form, all matches are replaced. If the pattern is begins with a # , it must match at the start of the variable. If it begins with a % , it must match with the end of the variable. If string is null, the matches are deleted. If variable is @ or * , the operation is applied to each positional parameter in turn and the expansion is the resultant list. a

[May 10, 2021] Concatenating Strings with the += Operator

May 10, 2021 | linuxize.com

Another way of concatenating strings in bash is by appending variables or literal strings to a variable using the += operator:

VAR1="Hello, "
VAR1+=" World"
echo "$VAR1"
Copy
Hello, World
Copy

The following example is using the += operator to concatenate strings in bash for loop :

languages.sh
VAR=""
for ELEMENT in 'Hydrogen' 'Helium' 'Lithium' 'Beryllium'; do
  VAR+="${ELEMENT} "
done

echo "$VAR"
Copy
Hydrogen Helium Lithium Beryllium

[May 10, 2021] String Operators (Korn Shell) - Daniel Han's Technical Notes

May 10, 2021 | sites.google.com

4.3 String Operators

The curly-bracket syntax allows for the shell's string operators . String operators allow you to manipulate values of variables in various useful ways without having to write full-blown programs or resort to external UNIX utilities. You can do a lot with string-handling operators even if you haven't yet mastered the programming features we'll see in later chapters.

In particular, string operators let you do the following:

4.3.1 Syntax of String Operators

The basic idea behind the syntax of string operators is that special characters that denote operations are inserted between the variable's name and the right curly brackets. Any argument that the operator may need is inserted to the operator's right.

The first group of string-handling operators tests for the existence of variables and allows substitutions of default values under certain conditions. These are listed in Table 4.1 . [6]

[6] The colon ( : ) in each of these operators is actually optional. If the colon is omitted, then change "exists and isn't null" to "exists" in each definition, i.e., the operator tests for existence only.

Table 4.1: Substitution Operators
Operator Substitution
${ varname :- word } If varname exists and isn't null, return its value; otherwise return word .
Purpose : Returning a default value if the variable is undefined.
Example : ${count:-0} evaluates to 0 if count is undefined.
${ varname := word} If varname exists and isn't null, return its value; otherwise set it to word and then return its value.[7]
Purpose : Setting a variable to a default value if it is undefined.
Example : $ {count:=0} sets count to 0 if it is undefined.
${ varname :? message } If varname exists and isn't null, return its value; otherwise print varname : followed by message , and abort the current command or script. Omitting message produces the default message parameter null or not set .
Purpose : Catching errors that result from variables being undefined.
Example : {count :?" undefined! " } prints "count: undefined!" and exits if count is undefined.
${ varname :+ word } If varname exists and isn't null, return word ; otherwise return null.
Purpose : Testing for the existence of a variable.
Example : ${count:+1} returns 1 (which could mean "true") if count is defined.

[7] Pascal, Modula, and Ada programmers may find it helpful to recognize the similarity of this to the assignment operators in those languages.

The first two of these operators are ideal for setting defaults for command-line arguments in case the user omits them. We'll use the first one in our first programming task.

Task 4.1

You have a large album collection, and you want to write some software to keep track of it. Assume that you have a file of data on how many albums you have by each artist. Lines in the file look like this:

14 Bach, J.S.
1       Balachander, S.
21      Beatles
6       Blakey, Art

Write a program that prints the N highest lines, i.e., the N artists by whom you have the most albums. The default for N should be 10. The program should take one argument for the name of the input file and an optional second argument for how many lines to print.

By far the best approach to this type of script is to use built-in UNIX utilities, combining them with I/O redirectors and pipes. This is the classic "building-block" philosophy of UNIX that is another reason for its great popularity with programmers. The building-block technique lets us write a first version of the script that is only one line long:

sort -nr $1 | head -${2:-10}

Here is how this works: the sort (1) program sorts the data in the file whose name is given as the first argument ( $1 ). The -n option tells sort to interpret the first word on each line as a number (instead of as a character string); the -r tells it to reverse the comparisons, so as to sort in descending order.

The output of sort is piped into the head (1) utility, which, when given the argument - N , prints the first N lines of its input on the standard output. The expression -${2:-10} evaluates to a dash ( - ) followed by the second argument if it is given, or to -10 if it's not; notice that the variable in this expression is 2 , which is the second positional parameter.

Assume the script we want to write is called highest . Then if the user types highest myfile , the line that actually runs is:

sort -nr myfile | head -10

Or if the user types highest myfile 22 , the line that runs is:

sort -nr myfile | head -22

Make sure you understand how the :- string operator provides a default value.

This is a perfectly good, runnable script-but it has a few problems. First, its one line is a bit cryptic. While this isn't much of a problem for such a tiny script, it's not wise to write long, elaborate scripts in this manner. A few minor changes will make the code more readable.

First, we can add comments to the code; anything between # and the end of a line is a comment. At a minimum, the script should start with a few comment lines that indicate what the script does and what arguments it accepts. Second, we can improve the variable names by assigning the values of the positional parameters to regular variables with mnemonic names. Finally, we can add blank lines to space things out; blank lines, like comments, are ignored. Here is a more readable version:

#
#       highest filename [howmany]
#
#       Print howmany highest-numbered lines in file filename.
#       The input file is assumed to have lines that start with
#       numbers.  Default for howmany is 10.
#

filename=$1

howmany=${2:-10}
sort -nr $filename | head -$howmany

The square brackets around howmany in the comments adhere to the convention in UNIX documentation that square brackets denote optional arguments.

The changes we just made improve the code's readability but not how it runs. What if the user were to invoke the script without any arguments? Remember that positional parameters default to null if they aren't defined. If there are no arguments, then $1 and $2 are both null. The variable howmany ( $2 ) is set up to default to 10, but there is no default for filename ( $1 ). The result would be that this command runs:

sort -nr | head -10

As it happens, if sort is called without a filename argument, it expects input to come from standard input, e.g., a pipe (|) or a user's terminal. Since it doesn't have the pipe, it will expect the terminal. This means that the script will appear to hang! Although you could always type [CTRL-D] or [CTRL-C] to get out of the script, a naive user might not know this.

Therefore we need to make sure that the user supplies at least one argument. There are a few ways of doing this; one of them involves another string operator. We'll replace the line:

filename=$1

with:

filename=${1:?"filename missing."}

This will cause two things to happen if a user invokes the script without any arguments: first the shell will print the somewhat unfortunate message:

highest: 1: filename missing.

to the standard error output. Second, the script will exit without running the remaining code.

With a somewhat "kludgy" modification, we can get a slightly better error message. Consider this code:

filename=$1
filename=${filename:?"missing."}

This results in the message:

highest: filename: missing.

(Make sure you understand why.) Of course, there are ways of printing whatever message is desired; we'll find out how in Chapter 5 .

Before we move on, we'll look more closely at the two remaining operators in Table 4.1 and see how we can incorporate them into our task solution. The := operator does roughly the same thing as :- , except that it has the "side effect" of setting the value of the variable to the given word if the variable doesn't exist.

Therefore we would like to use := in our script in place of :- , but we can't; we'd be trying to set the value of a positional parameter, which is not allowed. But if we replaced:

howmany=${2:-10}

with just:

howmany=$2

and moved the substitution down to the actual command line (as we did at the start), then we could use the := operator:

sort -nr $filename | head -${howmany:=10}

Using := has the added benefit of setting the value of howmany to 10 in case we need it afterwards in later versions of the script.

The final substitution operator is :+ . Here is how we can use it in our example: Let's say we want to give the user the option of adding a header line to the script's output. If he or she types the option -h , then the output will be preceded by the line:

ALBUMS  ARTIST

Assume further that this option ends up in the variable header , i.e., $header is -h if the option is set or null if not. (Later we will see how to do this without disturbing the other positional parameters.)

The expression:

${header:+"ALBUMS  ARTIST\n"}

yields null if the variable header is null, or ALBUMS══ARTIST\n if it is non-null. This means that we can put the line:

print -n ${header:+"ALBUMS  ARTIST\n"}

right before the command line that does the actual work. The -n option to print causes it not to print a LINEFEED after printing its arguments. Therefore this print statement will print nothing-not even a blank line-if header is null; otherwise it will print the header line and a LINEFEED (\n).

4.3.2 Patterns and Regular Expressions

We'll continue refining our solution to Task 4-1 later in this chapter. The next type of string operator is used to match portions of a variable's string value against patterns . Patterns, as we saw in Chapter 1 are strings that can contain wildcard characters ( * , ? , and [] for character sets and ranges).

Wildcards have been standard features of all UNIX shells going back (at least) to the Version 6 Bourne shell. But the Korn shell is the first shell to add to their capabilities. It adds a set of operators, called regular expression (or regexp for short) operators, that give it much of the string-matching power of advanced UNIX utilities like awk (1), egrep (1) (extended grep (1)) and the emacs editor, albeit with a different syntax. These capabilities go beyond those that you may be used to in other UNIX utilities like grep , sed (1) and vi (1).

Advanced UNIX users will find the Korn shell's regular expression capabilities occasionally useful for script writing, although they border on overkill. (Part of the problem is the inevitable syntactic clash with the shell's myriad other special characters.) Therefore we won't go into great detail about regular expressions here. For more comprehensive information, the "last word" on practical regular expressions in UNIX is sed & awk , an O'Reilly Nutshell Handbook by Dale Dougherty. If you are already comfortable with awk or egrep , you may want to skip the following introductory section and go to "Korn Shell Versus awk/egrep Regular Expressions" below, where we explain the shell's regular expression mechanism by comparing it with the syntax used in those two utilities. Otherwise, read on.

4.3.2.1 Regular expression basics

Think of regular expressions as strings that match patterns more powerfully than the standard shell wildcard schema. Regular expressions began as an idea in theoretical computer science, but they have found their way into many nooks and crannies of everyday, practical computing. The syntax used to represent them may vary, but the concepts are very much the same.

A shell regular expression can contain regular characters, standard wildcard characters, and additional operators that are more powerful than wildcards. Each such operator has the form x ( exp ) , where x is the particular operator and exp is any regular expression (often simply a regular string). The operator determines how many occurrences of exp a string that matches the pattern can contain. See Table 4.2 and Table 4.3 .

Table 4.2: Regular Expression Operators
Operator Meaning
* ( exp ) 0 or more occurrences of exp
+ ( exp ) 1 or more occurrences of exp
? ( exp ) 0 or 1 occurrences of exp
@ ( exp1 | exp2 |...) exp1 or exp2 or...
! ( exp ) Anything that doesn't match exp [8]

[8] Actually, !( exp ) is not a regular expression operator by the standard technical definition, though it is a handy extension.

Table 4.3: Regular Expression Operator Examples
Expression Matches
x x
* ( x ) Null string, x , xx , xxx , ...
+ ( x ) x , xx , xxx , ...
? ( x ) Null string, x
! ( x ) Any string except x
@ ( x ) x (see below)

Regular expressions are extremely useful when dealing with arbitrary text, as you already know if you have used grep or the regular-expression capabilities of any UNIX editor. They aren't nearly as useful for matching filenames and other simple types of information with which shell users typically work. Furthermore, most things you can do with the shell's regular expression operators can also be done (though possibly with more keystrokes and less efficiency) by piping the output of a shell command through grep or egrep .

Nevertheless, here are a few examples of how shell regular expressions can solve filename-listing problems. Some of these will come in handy in later chapters as pieces of solutions to larger tasks.

  1. The emacs editor supports customization files whose names end in .el (for Emacs LISP) or .elc (for Emacs LISP Compiled). List all emacs customization files in the current directory.
  2. In a directory of C source code, list all files that are not necessary. Assume that "necessary" files end in .c or .h , or are named Makefile or README .
  3. Filenames in the VAX/VMS operating system end in a semicolon followed by a version number, e.g., fred.bob;23 . List all VAX/VMS-style filenames in the current directory.

Here are the solutions:

  1. In the first of these, we are looking for files that end in .el with an optional c . The expression that matches this is * .el ? (c) .
  2. The second example depends on the four standard subexpressions * .c , * .h , Makefile , and README . The entire expression is !( * .c| * .h|Makefile|README) , which matches anything that does not match any of the four possibilities.
  3. The solution to the third example starts with * \ ; : the shell wildcard * followed by a backslash-escaped semicolon. Then, we could use the regular expression +([0-9]) , which matches one or more characters in the range [0-9] , i.e., one or more digits. This is almost correct (and probably close enough), but it doesn't take into account that the first digit cannot be 0. Therefore the correct expression is * \;[1-9] * ([0-9]) , which matches anything that ends with a semicolon, a digit from 1 to 9, and zero or more digits from 0 to 9.

Regular expression operators are an interesting addition to the Korn shell's features, but you can get along well without them-even if you intend to do a substantial amount of shell programming.

In our opinion, the shell's authors missed an opportunity to build into the wildcard mechanism the ability to match files by type (regular, directory, executable, etc., as in some of the conditional tests we will see in Chapter 5 ) as well as by name component. We feel that shell programmers would have found this more useful than arcane regular expression operators.

The following section compares Korn shell regular expressions to analogous features in awk and egrep . If you aren't familiar with these, skip to the section entitled "Pattern-matching Operators."

4.3.2.2 Korn shell versus awk/egrep regular expressions

Table 4.4 is an expansion of Table 4.2 : the middle column shows the equivalents in awk / egrep of the shell's regular expression operators.

Table 4.4: Shell Versus egrep/awk Regular Expression Operators
Korn Shell egrep/awk Meaning
* ( exp ) exp * 0 or more occurrences of exp
+( exp ) exp + 1 or more occurrences of exp
? ( exp ) exp ? 0 or 1 occurrences of exp
@( exp1 | exp2 |...) exp1 | exp2 |... exp1 or exp2 or...
! ( exp ) (none) Anything that doesn't match exp

These equivalents are close but not quite exact. Actually, an exp within any of the Korn shell operators can be a series of exp1 | exp2 |... alternates. But because the shell would interpret an expression like dave|fred|bob as a pipeline of commands, you must use @(dave|fred|bob) for alternates by themselves.

For example:

It is worth re-emphasizing that shell regular expressions can still contain standard shell wildcards. Thus, the shell wildcard ? (match any single character) is the equivalent to . in egrep or awk , and the shell's character set operator [ ... ] is the same as in those utilities. [9] For example, the expression +([0-9]) matches a number, i.e., one or more digits. The shell wildcard character * is equivalent to the shell regular expression * ( ?) .

[9] And, for that matter, the same as in grep , sed , ed , vi , etc.

A few egrep and awk regexp operators do not have equivalents in the Korn shell. These include:

The first two pairs are hardly necessary, since the Korn shell doesn't normally operate on text files and does parse strings into words itself.

4.3.3 Pattern-matching Operators

Table 4.5 lists the Korn shell's pattern-matching operators.

Table 4.5: Pattern-matching Operators
Operator Meaning
$ { variable # pattern } If the pattern matches the beginning of the variable's value, delete the shortest part that matches and return the rest.
$ { variable ## pattern } If the pattern matches the beginning of the variable's value, delete the longest part that matches and return the rest.
$ { variable % pattern } If the pattern matches the end of the variable's value, delete the shortest part that matches and return the rest.
$ { variable %% pattern } If the pattern matches the end of the variable's value, delete the longest part that matches and return the rest.

These can be hard to remember, so here's a handy mnemonic device: # matches the front because number signs precede numbers; % matches the rear because percent signs follow numbers.

The classic use for pattern-matching operators is in stripping off components of pathnames, such as directory prefixes and filename suffixes. With that in mind, here is an example that shows how all of the operators work. Assume that the variable path has the value /home /billr/mem/long.file.name ; then:

Expression                   Result
${path##/*/}                       long.file.name
${path#/*/}              billr/mem/long.file.name
$path              /home/billr/mem/long.file.name
${path%.*}         /home/billr/mem/long.file
${path%%.*}        /home/billr/mem/long

The two patterns used here are /*/ , which matches anything between two slashes, and . * , which matches a dot followed by anything.

We will incorporate one of these operators into our next programming task.

Task 4.2

You are writing a C compiler, and you want to use the Korn shell for your front-end.[10]

[10] Don't laugh-many UNIX compilers have shell scripts as front-ends.

Think of a C compiler as a pipeline of data processing components. C source code is input to the beginning of the pipeline, and object code comes out of the end; there are several steps in between. The shell script's task, among many other things, is to control the flow of data through the components and to designate output files.

You need to write the part of the script that takes the name of the input C source file and creates from it the name of the output object code file. That is, you must take a filename ending in .c and create a filename that is similar except that it ends in .o .

The task at hand is to strip the .c off the filename and append .o . A single shell statement will do it:

objname=${filename%.c}.o

This tells the shell to look at the end of filename for .c . If there is a match, return $filename with the match deleted. So if filename had the value fred.c , the expression ${filename%.c} would return fred . The .o is appended to make the desired fred.o , which is stored in the variable objname .

If filename had an inappropriate value (without .c ) such as fred.a , the above expression would evaluate to fred.a.o : since there was no match, nothing is deleted from the value of filename , and .o is appended anyway. And, if filename contained more than one dot-e.g., if it were the y.tab.c that is so infamous among compiler writers-the expression would still produce the desired y.tab.o . Notice that this would not be true if we used %% in the expression instead of % . The former operator uses the longest match instead of the shortest, so it would match .tab.o and evaluate to y.o rather than y.tab.o . So the single % is correct in this case.

A longest-match deletion would be preferable, however, in the following task.

Task 4.3

You are implementing a filter that prepares a text file for printer output. You want to put the file's name-without any directory prefix-on the "banner" page. Assume that, in your script, you have the pathname of the file to be printed stored in the variable pathname .

Clearly the objective is to remove the directory prefix from the pathname. The following line will do it:

bannername=${pathname##*/}

This solution is similar to the first line in the examples shown before. If pathname were just a filename, the pattern * / (anything followed by a slash) would not match and the value of the expression would be pathname untouched. If pathname were something like fred/bob , the prefix fred/ would match the pattern and be deleted, leaving just bob as the expression's value. The same thing would happen if pathname were something like /dave/pete/fred/bob : since the ## deletes the longest match, it deletes the entire /dave/pete/fred/ .

If we used # * / instead of ## * / , the expression would have the incorrect value dave/pete/fred/bob , because the shortest instance of "anything followed by a slash" at the beginning of the string is just a slash ( / ).

The construct $ { variable ## * /} is actually equivalent to the UNIX utility basename (1). basename takes a pathname as argument and returns the filename only; it is meant to be used with the shell's command substitution mechanism (see below). basename is less efficient than $ { variable ##/ * } because it runs in its own separate process rather than within the shell. Another utility, dirname (1), does essentially the opposite of basename : it returns the directory prefix only. It is equivalent to the Korn shell expression $ { variable %/ * } and is less efficient for the same reason.

4.3.4 Length Operator

There are two remaining operators on variables. One is $ {# varname }, which returns the length of the value of the variable as a character string. (In Chapter 6 we will see how to treat this and similar values as actual numbers so they can be used in arithmetic expressions.) For example, if filename has the value fred.c , then ${#filename} would have the value 6 . The other operator ( $ {# array [ * ]} ) has to do with array variables, which are also discussed in Chapter 6 .

http://docstore.mik.ua/orelly/unix2.1/ksh/ch04_03.htm

[Sep 11, 2019] string - Extract substring in Bash - Stack Overflow

Sep 11, 2019 | stackoverflow.com

Jeff ,May 8 at 18:30

Given a filename in the form someletters_12345_moreleters.ext , I want to extract the 5 digits and put them into a variable.

So to emphasize the point, I have a filename with x number of characters then a five digit sequence surrounded by a single underscore on either side then another set of x number of characters. I want to take the 5 digit number and put that into a variable.

I am very interested in the number of different ways that this can be accomplished.

Berek Bryan ,Jan 24, 2017 at 9:30

Use cut :
echo 'someletters_12345_moreleters.ext' | cut -d'_' -f 2

More generic:

INPUT='someletters_12345_moreleters.ext'
SUBSTRING=$(echo $INPUT| cut -d'_' -f 2)
echo $SUBSTRING

JB. ,Jan 6, 2015 at 10:13

If x is constant, the following parameter expansion performs substring extraction:
b=${a:12:5}

where 12 is the offset (zero-based) and 5 is the length

If the underscores around the digits are the only ones in the input, you can strip off the prefix and suffix (respectively) in two steps:

tmp=${a#*_}   # remove prefix ending in "_"
b=${tmp%_*}   # remove suffix starting with "_"

If there are other underscores, it's probably feasible anyway, albeit more tricky. If anyone knows how to perform both expansions in a single expression, I'd like to know too.

Both solutions presented are pure bash, with no process spawning involved, hence very fast.

A Sahra ,Mar 16, 2017 at 6:27

Generic solution where the number can be anywhere in the filename, using the first of such sequences:
number=$(echo $filename | egrep -o '[[:digit:]]{5}' | head -n1)

Another solution to extract exactly a part of a variable:

number=${filename:offset:length}

If your filename always have the format stuff_digits_... you can use awk:

number=$(echo $filename | awk -F _ '{ print $2 }')

Yet another solution to remove everything except digits, use

number=$(echo $filename | tr -cd '[[:digit:]]')

sshow ,Jul 27, 2017 at 17:22

In case someone wants more rigorous information, you can also search it in man bash like this
$ man bash [press return key]
/substring  [press return key]
[press "n" key]
[press "n" key]
[press "n" key]
[press "n" key]

Result:

${parameter:offset}
       ${parameter:offset:length}
              Substring Expansion.  Expands to  up  to  length  characters  of
              parameter  starting  at  the  character specified by offset.  If
              length is omitted, expands to the substring of parameter  start‐
              ing at the character specified by offset.  length and offset are
              arithmetic expressions (see ARITHMETIC  EVALUATION  below).   If
              offset  evaluates  to a number less than zero, the value is used
              as an offset from the end of the value of parameter.  Arithmetic
              expressions  starting  with  a - must be separated by whitespace
              from the preceding : to be distinguished from  the  Use  Default
              Values  expansion.   If  length  evaluates to a number less than
              zero, and parameter is not @ and not an indexed  or  associative
              array,  it is interpreted as an offset from the end of the value
              of parameter rather than a number of characters, and the  expan‐
              sion is the characters between the two offsets.  If parameter is
              @, the result is length positional parameters beginning at  off‐
              set.   If parameter is an indexed array name subscripted by @ or
              *, the result is the length members of the array beginning  with
              ${parameter[offset]}.   A  negative  offset is taken relative to
              one greater than the maximum index of the specified array.  Sub‐
              string  expansion applied to an associative array produces unde‐
              fined results.  Note that a negative offset  must  be  separated
              from  the  colon  by  at least one space to avoid being confused
              with the :- expansion.  Substring indexing is zero-based  unless
              the  positional  parameters are used, in which case the indexing
              starts at 1 by default.  If offset  is  0,  and  the  positional
              parameters are used, $0 is prefixed to the list.

Aleksandr Levchuk ,Aug 29, 2011 at 5:51

Building on jor's answer (which doesn't work for me):
substring=$(expr "$filename" : '.*_\([^_]*\)_.*')

kayn ,Oct 5, 2015 at 8:48

I'm surprised this pure bash solution didn't come up:
a="someletters_12345_moreleters.ext"
IFS="_"
set $a
echo $2
# prints 12345

You probably want to reset IFS to what value it was before, or unset IFS afterwards!

zebediah49 ,Jun 4 at 17:31

Here's how i'd do it:
FN=someletters_12345_moreleters.ext
[[ ${FN} =~ _([[:digit:]]{5})_ ]] && NUM=${BASH_REMATCH[1]}

Note: the above is a regular expression and is restricted to your specific scenario of five digits surrounded by underscores. Change the regular expression if you need different matching.

TranslucentCloud ,Jun 16, 2014 at 13:27

Following the requirements

I have a filename with x number of characters then a five digit sequence surrounded by a single underscore on either side then another set of x number of characters. I want to take the 5 digit number and put that into a variable.

I found some grep ways that may be useful:

$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]+" 
12345

or better

$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]{5}" 
12345

And then with -Po syntax:

$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d+' 
12345

Or if you want to make it fit exactly 5 characters:

$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d{5}' 
12345

Finally, to make it be stored in a variable it is just need to use the var=$(command) syntax.

Darron ,Jan 9, 2009 at 16:13

Without any sub-processes you can:
shopt -s extglob
front=${input%%_+([a-zA-Z]).*}
digits=${front##+([a-zA-Z])_}

A very small variant of this will also work in ksh93.

user2350426

add a comment ,Aug 5, 2014 at 8:11
If we focus in the concept of:
"A run of (one or several) digits"

We could use several external tools to extract the numbers.
We could quite easily erase all other characters, either sed or tr:

name='someletters_12345_moreleters.ext'

echo $name | sed 's/[^0-9]*//g'    # 12345
echo $name | tr -c -d 0-9          # 12345

But if $name contains several runs of numbers, the above will fail:

If "name=someletters_12345_moreleters_323_end.ext", then:

echo $name | sed 's/[^0-9]*//g'    # 12345323
echo $name | tr -c -d 0-9          # 12345323

We need to use regular expresions (regex).
To select only the first run (12345 not 323) in sed and perl:

echo $name | sed 's/[^0-9]*\([0-9]\{1,\}\).*$/\1/'
perl -e 'my $name='$name';my ($num)=$name=~/(\d+)/;print "$num\n";'

But we could as well do it directly in bash (1) :

regex=[^0-9]*([0-9]{1,}).*$; \
[[ $name =~ $regex ]] && echo ${BASH_REMATCH[1]}

This allows us to extract the FIRST run of digits of any length
surrounded by any other text/characters.

Note : regex=[^0-9]*([0-9]{5,5}).*$; will match only exactly 5 digit runs. :-)

(1) : faster than calling an external tool for each short texts. Not faster than doing all processing inside sed or awk for large files.

codist ,May 6, 2011 at 12:50

Here's a prefix-suffix solution (similar to the solutions given by JB and Darron) that matches the first block of digits and does not depend on the surrounding underscores:
str='someletters_12345_morele34ters.ext'
s1="${str#"${str%%[[:digit:]]*}"}"   # strip off non-digit prefix from str
s2="${s1%%[^[:digit:]]*}"            # strip off non-digit suffix from s1
echo "$s2"                           # 12345

Campa ,Oct 21, 2016 at 8:12

I love sed 's capability to deal with regex groups:
> var="someletters_12345_moreletters.ext"
> digits=$( echo $var | sed "s/.*_\([0-9]\+\).*/\1/p" -n )
> echo $digits
12345

A slightly more general option would be not to assume that you have an underscore _ marking the start of your digits sequence, hence for instance stripping off all non-numbers you get before your sequence: s/[^0-9]\+\([0-9]\+\).*/\1/p .


> man sed | grep s/regexp/replacement -A 2
s/regexp/replacement/
    Attempt to match regexp against the pattern space.  If successful, replace that portion matched with replacement.  The replacement may contain the special  character  &  to
    refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.

More on this, in case you're not too confident with regexps:

All escapes \ are there to make sed 's regexp processing work.

Dan Dascalescu ,May 8 at 18:28

Given test.txt is a file containing "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
cut -b19-20 test.txt > test1.txt # This will extract chars 19 & 20 "ST" 
while read -r; do;
> x=$REPLY
> done < test1.txt
echo $x
ST

Alex Raj Kaliamoorthy ,Jul 29, 2016 at 7:41

My answer will have more control on what you want out of your string. Here is the code on how you can extract 12345 out of your string
str="someletters_12345_moreleters.ext"
str=${str#*_}
str=${str%_more*}
echo $str

This will be more efficient if you want to extract something that has any chars like abc or any special characters like _ or - . For example: If your string is like this and you want everything that is after someletters_ and before _moreleters.ext :

str="someletters_123-45-24a&13b-1_moreleters.ext"

With my code you can mention what exactly you want. Explanation:

#* It will remove the preceding string including the matching key. Here the key we mentioned is _ % It will remove the following string including the matching key. Here the key we mentioned is '_more*'

Do some experiments yourself and you would find this interesting.

Dan Dascalescu ,May 8 at 18:27

similar to substr('abcdefg', 2-1, 3) in php:
echo 'abcdefg'|tail -c +2|head -c 3

olibre ,Nov 25, 2015 at 14:50

Ok, here goes pure Parameter Substitution with an empty string. Caveat is that I have defined someletters and moreletters as only characters. If they are alphanumeric, this will not work as it is.
filename=someletters_12345_moreletters.ext
substring=${filename//@(+([a-z])_|_+([a-z]).*)}
echo $substring
12345

gniourf_gniourf ,Jun 4 at 17:33

There's also the bash builtin 'expr' command:
INPUT="someletters_12345_moreleters.ext"  
SUBSTRING=`expr match "$INPUT" '.*_\([[:digit:]]*\)_.*' `  
echo $SUBSTRING

russell ,Aug 1, 2013 at 8:12

A little late, but I just ran across this problem and found the following:
host:/tmp$ asd=someletters_12345_moreleters.ext 
host:/tmp$ echo `expr $asd : '.*_\(.*\)_'`
12345
host:/tmp$

I used it to get millisecond resolution on an embedded system that does not have %N for date:

set `grep "now at" /proc/timer_list`
nano=$3
fraction=`expr $nano : '.*\(...\)......'`
$debug nano is $nano, fraction is $fraction

> ,Aug 5, 2018 at 17:13

A bash solution:
IFS="_" read -r x digs x <<<'someletters_12345_moreleters.ext'

This will clobber a variable called x . The var x could be changed to the var _ .

input='someletters_12345_moreleters.ext'
IFS="_" read -r _ digs _ <<<"$input"

[Sep 08, 2019] How to replace spaces in file names using a bash script

Sep 08, 2019 | stackoverflow.com

Ask Question Asked 9 years, 4 months ago Active 2 months ago Viewed 226k times 238 127


Mark Byers ,Apr 25, 2010 at 19:20

Can anyone recommend a safe solution to recursively replace spaces with underscores in file and directory names starting from a given root directory? For example:
$ tree
.
|-- a dir
|   `-- file with spaces.txt
`-- b dir
    |-- another file with spaces.txt
    `-- yet another file with spaces.pdf

becomes:

$ tree
.
|-- a_dir
|   `-- file_with_spaces.txt
`-- b_dir
    |-- another_file_with_spaces.txt
    `-- yet_another_file_with_spaces.pdf

Jürgen Hötzel ,Nov 4, 2015 at 3:03

Use rename (aka prename ) which is a Perl script which may be on your system already. Do it in two steps:
find -name "* *" -type d | rename 's/ /_/g'    # do the directories first
find -name "* *" -type f | rename 's/ /_/g'

Based on Jürgen's answer and able to handle multiple layers of files and directories in a single bound using the "Revision 1.5 1998/12/18 16:16:31 rmb1" version of /usr/bin/rename (a Perl script):

find /tmp/ -depth -name "* *" -execdir rename 's/ /_/g' "{}" \;

oevna ,Jan 1, 2016 at 8:25

I use:
for f in *\ *; do mv "$f" "${f// /_}"; done

Though it's not recursive, it's quite fast and simple. I'm sure someone here could update it to be recursive.

The ${f// /_} part utilizes bash's parameter expansion mechanism to replace a pattern within a parameter with supplied string. The relevant syntax is ${parameter/pattern/string} . See: https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html or http://wiki.bash-hackers.org/syntax/pe .

armandino ,Dec 3, 2013 at 20:51

find . -depth -name '* *' \
| while IFS= read -r f ; do mv -i "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done

failed to get it right at first, because I didn't think of directories.

Edmund Elmer ,Jul 3 at 7:12

you can use detox by Doug Harple
detox -r <folder>

Dennis Williamson ,Mar 22, 2012 at 20:33

A find/rename solution. rename is part of util-linux.

You need to descend depth first, because a whitespace filename can be part of a whitespace directory:

find /tmp/ -depth -name "* *" -execdir rename " " "_" "{}" ";"

armandino ,Apr 26, 2010 at 11:49

bash 4.0
#!/bin/bash
shopt -s globstar
for file in **/*\ *
do 
    mv "$file" "${file// /_}"       
done

Itamar ,Jan 31, 2013 at 21:27

you can use this:
    find . -name '* *' | while read fname 

do
        new_fname=`echo $fname | tr " " "_"`

        if [ -e $new_fname ]
        then
                echo "File $new_fname already exists. Not replacing $fname"
        else
                echo "Creating new file $new_fname to replace $fname"
                mv "$fname" $new_fname
        fi
done

yabt ,Apr 26, 2010 at 14:54

Here's a (quite verbose) find -exec solution which writes "file already exists" warnings to stderr:
function trspace() {
   declare dir name bname dname newname replace_char
   [ $# -lt 1 -o $# -gt 2 ] && { echo "usage: trspace dir char"; return 1; }
   dir="${1}"
   replace_char="${2:-_}"
   find "${dir}" -xdev -depth -name $'*[ \t\r\n\v\f]*' -exec bash -c '
      for ((i=1; i<=$#; i++)); do
         name="${@:i:1}"
         dname="${name%/*}"
         bname="${name##*/}"
         newname="${dname}/${bname//[[:space:]]/${0}}"
         if [[ -e "${newname}" ]]; then
            echo "Warning: file already exists: ${newname}" 1>&2
         else
            mv "${name}" "${newname}"
         fi
      done
  ' "${replace_char}" '{}' +
}

trspace rootdir _

degi ,Aug 8, 2011 at 9:10

This one does a little bit more. I use it to rename my downloaded torrents (no special characters (non-ASCII), spaces, multiple dots, etc.).
#!/usr/bin/perl

&rena(`find . -type d`);
&rena(`find . -type f`);

sub rena
{
    ($elems)=@_;
    @t=split /\n/,$elems;

    for $e (@t)
    {
    $_=$e;
    # remove ./ of find
    s/^\.\///;
    # non ascii transliterate
    tr [\200-\377][_];
    tr [\000-\40][_];
    # special characters we do not want in paths
    s/[ \-\,\;\?\+\'\"\!\[\]\(\)\@\#]/_/g;
    # multiple dots except for extension
    while (/\..*\./)
    {
        s/\./_/;
    }
    # only one _ consecutive
    s/_+/_/g;
    next if ($_ eq $e ) or ("./$_" eq $e);
    print "$e -> $_\n";
    rename ($e,$_);
    }
}

Junyeop Lee ,Apr 10, 2018 at 9:44

Recursive version of Naidim's Answers.
find . -name "* *" | awk '{ print length, $0 }' | sort -nr -s | cut -d" " -f2- | while read f; do base=$(basename "$f"); newbase="${base// /_}"; mv "$(dirname "$f")/$(basename "$f")" "$(dirname "$f")/$newbase"; done

ghoti ,Dec 5, 2016 at 21:16

I found around this script, it may be interesting :)
 IFS=$'\n';for f in `find .`; do file=$(echo $f | tr [:blank:] '_'); [ -e $f ] && [ ! -e $file ] && mv "$f" $file;done;unset IFS

ghoti ,Dec 5, 2016 at 21:17

Here's a reasonably sized bash script solution
#!/bin/bash
(
IFS=$'\n'
    for y in $(ls $1)
      do
         mv $1/`echo $y | sed 's/ /\\ /g'` $1/`echo "$y" | sed 's/ /_/g'`
      done
)

user1060059 ,Nov 22, 2011 at 15:15

This only finds files inside the current directory and renames them . I have this aliased.

find ./ -name "* *" -type f -d 1 | perl -ple '$file = $_; $file =~ s/\s+/_/g; rename($_, $file);

Hongtao ,Sep 26, 2014 at 19:30

I just make one for my own purpose. You may can use it as reference.
#!/bin/bash
cd /vzwhome/c0cheh1/dev_source/UB_14_8
for file in *
do
    echo $file
    cd "/vzwhome/c0cheh1/dev_source/UB_14_8/$file/Configuration/$file"
    echo "==> `pwd`"
    for subfile in *\ *; do [ -d "$subfile" ] && ( mv "$subfile" "$(echo $subfile | sed -e 's/ /_/g')" ); done
    ls
    cd /vzwhome/c0cheh1/dev_source/UB_14_8
done

Marcos Jean Sampaio ,Dec 5, 2016 at 20:56

For files in folder named /files
for i in `IFS="";find /files -name *\ *`
do
   echo $i
done > /tmp/list


while read line
do
   mv "$line" `echo $line | sed 's/ /_/g'`
done < /tmp/list

rm /tmp/list

Muhammad Annaqeeb ,Sep 4, 2017 at 11:03

For those struggling through this using macOS, first install all the tools:
 brew install tree findutils rename

Then when needed to rename, make an alias for GNU find (gfind) as find. Then run the code of @Michel Krelin:

alias find=gfind 
find . -depth -name '* *' \
| while IFS= read -r f ; do mv -i "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done

[Mar 25, 2019] Concatenating Strings with the += Operator

Mar 25, 2019 | linuxize.com

https://acdn.adnxs.com/ib/static/usersync/v3/async_usersync.html

https://bh.contextweb.com/visitormatch

Concatenating Strings with the += Operator

Another way of concatenating strings in bash is by appending variables or literal strings to a variable using the += operator:

VAR1="Hello, "
VAR1+=" World"
echo "$VAR1"
Hello, World

The following example is using the += operator to concatenate strings in bash for loop :

languages.sh
VAR=""
for ELEMENT in 'Hydrogen' 'Helium' 'Lithium' 'Beryllium'; do
  VAR+="${ELEMENT} "
done

echo "$VAR"

[Jan 29, 2019] Split string into an array in Bash

May 14, 2012 | stackoverflow.com

Lgn ,May 14, 2012 at 15:15

In a Bash script I would like to split a line into pieces and store them in an array.

The line:

Paris, France, Europe

I would like to have them in an array like this:

array[0] = Paris
array[1] = France
array[2] = Europe

I would like to use simple code, the command's speed doesn't matter. How can I do it?

antak ,Jun 18, 2018 at 9:22

This is #1 Google hit but there's controversy in the answer because the question unfortunately asks about delimiting on , (comma-space) and not a single character such as comma. If you're only interested in the latter, answers here are easier to follow: stackoverflow.com/questions/918886/antak Jun 18 '18 at 9:22

Dennis Williamson ,May 14, 2012 at 15:16

IFS=', ' read -r -a array <<< "$string"

Note that the characters in $IFS are treated individually as separators so that in this case fields may be separated by either a comma or a space rather than the sequence of the two characters. Interestingly though, empty fields aren't created when comma-space appears in the input because the space is treated specially.

To access an individual element:

echo "${array[0]}"

To iterate over the elements:

for element in "${array[@]}"
do
    echo "$element"
done

To get both the index and the value:

for index in "${!array[@]}"
do
    echo "$index ${array[index]}"
done

The last example is useful because Bash arrays are sparse. In other words, you can delete an element or add an element and then the indices are not contiguous.

unset "array[1]"
array[42]=Earth

To get the number of elements in an array:

echo "${#array[@]}"

As mentioned above, arrays can be sparse so you shouldn't use the length to get the last element. Here's how you can in Bash 4.2 and later:

echo "${array[-1]}"

in any version of Bash (from somewhere after 2.05b):

echo "${array[@]: -1:1}"

Larger negative offsets select farther from the end of the array. Note the space before the minus sign in the older form. It is required.

l0b0 ,May 14, 2012 at 15:24

Just use IFS=', ' , then you don't have to remove the spaces separately. Test: IFS=', ' read -a array <<< "Paris, France, Europe"; echo "${array[@]}"l0b0 May 14 '12 at 15:24

Dennis Williamson ,May 14, 2012 at 16:33

@l0b0: Thanks. I don't know what I was thinking. I like to use declare -p array for test output, by the way. – Dennis Williamson May 14 '12 at 16:33

Nathan Hyde ,Mar 16, 2013 at 21:09

@Dennis Williamson - Awesome, thorough answer. – Nathan Hyde Mar 16 '13 at 21:09

dsummersl ,Aug 9, 2013 at 14:06

MUCH better than multiple cut -f calls! – dsummersl Aug 9 '13 at 14:06

caesarsol ,Oct 29, 2015 at 14:45

Warning: the IFS variable means split by one of these characters , so it's not a sequence of chars to split by. IFS=', ' read -a array <<< "a,d r s,w" => ${array[*]} == a d r s wcaesarsol Oct 29 '15 at 14:45

Jim Ho ,Mar 14, 2013 at 2:20

Here is a way without setting IFS:
string="1:2:3:4:5"
set -f                      # avoid globbing (expansion of *).
array=(${string//:/ })
for i in "${!array[@]}"
do
    echo "$i=>${array[i]}"
done

The idea is using string replacement:

${string//substring/replacement}

to replace all matches of $substring with white space and then using the substituted string to initialize a array:

(element1 element2 ... elementN)

Note: this answer makes use of the split+glob operator . Thus, to prevent expansion of some characters (such as * ) it is a good idea to pause globbing for this script.

Werner Lehmann ,May 4, 2013 at 22:32

Used this approach... until I came across a long string to split. 100% CPU for more than a minute (then I killed it). It's a pity because this method allows to split by a string, not some character in IFS. – Werner Lehmann May 4 '13 at 22:32

Dieter Gribnitz ,Sep 2, 2014 at 15:46

WARNING: Just ran into a problem with this approach. If you have an element named * you will get all the elements of your cwd as well. thus string="1:2:3:4:*" will give some unexpected and possibly dangerous results depending on your implementation. Did not get the same error with (IFS=', ' read -a array <<< "$string") and this one seems safe to use. – Dieter Gribnitz Sep 2 '14 at 15:46

akostadinov ,Nov 6, 2014 at 14:31

not reliable for many kinds of values, use with care – akostadinov Nov 6 '14 at 14:31

Andrew White ,Jun 1, 2016 at 11:44

quoting ${string//:/ } prevents shell expansion – Andrew White Jun 1 '16 at 11:44

Mark Thomson ,Jun 5, 2016 at 20:44

I had to use the following on OSX: array=(${string//:/ })Mark Thomson Jun 5 '16 at 20:44

bgoldst ,Jul 19, 2017 at 21:20

All of the answers to this question are wrong in one way or another.

Wrong answer #1

IFS=', ' read -r -a array <<< "$string"

1: This is a misuse of $IFS . The value of the $IFS variable is not taken as a single variable-length string separator, rather it is taken as a set of single-character string separators, where each field that read splits off from the input line can be terminated by any character in the set (comma or space, in this example).

Actually, for the real sticklers out there, the full meaning of $IFS is slightly more involved. From the bash manual :

The shell treats each character of IFS as a delimiter, and splits the results of the other expansions into words using these characters as field terminators. If IFS is unset, or its value is exactly <space><tab><newline> , the default, then sequences of <space> , <tab> , and <newline> at the beginning and end of the results of the previous expansions are ignored, and any sequence of IFS characters not at the beginning or end serves to delimit words. If IFS has a value other than the default, then sequences of the whitespace characters <space> , <tab> , and <newline> are ignored at the beginning and end of the word, as long as the whitespace character is in the value of IFS (an IFS whitespace character). Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter. If the value of IFS is null, no word splitting occurs.

Basically, for non-default non-null values of $IFS , fields can be separated with either (1) a sequence of one or more characters that are all from the set of "IFS whitespace characters" (that is, whichever of <space> , <tab> , and <newline> ("newline" meaning line feed (LF) ) are present anywhere in $IFS ), or (2) any non-"IFS whitespace character" that's present in $IFS along with whatever "IFS whitespace characters" surround it in the input line.

For the OP, it's possible that the second separation mode I described in the previous paragraph is exactly what he wants for his input string, but we can be pretty confident that the first separation mode I described is not correct at all. For example, what if his input string was 'Los Angeles, United States, North America' ?

IFS=', ' read -ra a <<<'Los Angeles, United States, North America'; declare -p a;
## declare -a a=([0]="Los" [1]="Angeles" [2]="United" [3]="States" [4]="North" [5]="America")

2: Even if you were to use this solution with a single-character separator (such as a comma by itself, that is, with no following space or other baggage), if the value of the $string variable happens to contain any LFs, then read will stop processing once it encounters the first LF. The read builtin only processes one line per invocation. This is true even if you are piping or redirecting input only to the read statement, as we are doing in this example with the here-string mechanism, and thus unprocessed input is guaranteed to be lost. The code that powers the read builtin has no knowledge of the data flow within its containing command structure.

You could argue that this is unlikely to cause a problem, but still, it's a subtle hazard that should be avoided if possible. It is caused by the fact that the read builtin actually does two levels of input splitting: first into lines, then into fields. Since the OP only wants one level of splitting, this usage of the read builtin is not appropriate, and we should avoid it.

3: A non-obvious potential issue with this solution is that read always drops the trailing field if it is empty, although it preserves empty fields otherwise. Here's a demo:

string=', , a, , b, c, , , '; IFS=', ' read -ra a <<<"$string"; declare -p a;
## declare -a a=([0]="" [1]="" [2]="a" [3]="" [4]="b" [5]="c" [6]="" [7]="")

Maybe the OP wouldn't care about this, but it's still a limitation worth knowing about. It reduces the robustness and generality of the solution.

This problem can be solved by appending a dummy trailing delimiter to the input string just prior to feeding it to read , as I will demonstrate later.


Wrong answer #2

string="1:2:3:4:5"
set -f                     # avoid globbing (expansion of *).
array=(${string//:/ })

Similar idea:

t="one,two,three"
a=($(echo $t | tr ',' "\n"))

(Note: I added the missing parentheses around the command substitution which the answerer seems to have omitted.)

Similar idea:

string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)

These solutions leverage word splitting in an array assignment to split the string into fields. Funnily enough, just like read , general word splitting also uses the $IFS special variable, although in this case it is implied that it is set to its default value of <space><tab><newline> , and therefore any sequence of one or more IFS characters (which are all whitespace characters now) is considered to be a field delimiter.

This solves the problem of two levels of splitting committed by read , since word splitting by itself constitutes only one level of splitting. But just as before, the problem here is that the individual fields in the input string can already contain $IFS characters, and thus they would be improperly split during the word splitting operation. This happens to not be the case for any of the sample input strings provided by these answerers (how convenient...), but of course that doesn't change the fact that any code base that used this idiom would then run the risk of blowing up if this assumption were ever violated at some point down the line. Once again, consider my counterexample of 'Los Angeles, United States, North America' (or 'Los Angeles:United States:North America' ).

Also, word splitting is normally followed by filename expansion ( aka pathname expansion aka globbing), which, if done, would potentially corrupt words containing the characters * , ? , or [ followed by ] (and, if extglob is set, parenthesized fragments preceded by ? , * , + , @ , or ! ) by matching them against file system objects and expanding the words ("globs") accordingly. The first of these three answerers has cleverly undercut this problem by running set -f beforehand to disable globbing. Technically this works (although you should probably add set +f afterward to reenable globbing for subsequent code which may depend on it), but it's undesirable to have to mess with global shell settings in order to hack a basic string-to-array parsing operation in local code.

Another issue with this answer is that all empty fields will be lost. This may or may not be a problem, depending on the application.

Note: If you're going to use this solution, it's better to use the ${string//:/ } "pattern substitution" form of parameter expansion , rather than going to the trouble of invoking a command substitution (which forks the shell), starting up a pipeline, and running an external executable ( tr or sed ), since parameter expansion is purely a shell-internal operation. (Also, for the tr and sed solutions, the input variable should be double-quoted inside the command substitution; otherwise word splitting would take effect in the echo command and potentially mess with the field values. Also, the $(...) form of command substitution is preferable to the old `...` form since it simplifies nesting of command substitutions and allows for better syntax highlighting by text editors.)


Wrong answer #3

str="a, b, c, d"  # assuming there is a space after ',' as in Q
arr=(${str//,/})  # delete all occurrences of ','

This answer is almost the same as #2 . The difference is that the answerer has made the assumption that the fields are delimited by two characters, one of which being represented in the default $IFS , and the other not. He has solved this rather specific case by removing the non-IFS-represented character using a pattern substitution expansion and then using word splitting to split the fields on the surviving IFS-represented delimiter character.

This is not a very generic solution. Furthermore, it can be argued that the comma is really the "primary" delimiter character here, and that stripping it and then depending on the space character for field splitting is simply wrong. Once again, consider my counterexample: 'Los Angeles, United States, North America' .

Also, again, filename expansion could corrupt the expanded words, but this can be prevented by temporarily disabling globbing for the assignment with set -f and then set +f .

Also, again, all empty fields will be lost, which may or may not be a problem depending on the application.


Wrong answer #4

string='first line
second line
third line'

oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"

This is similar to #2 and #3 in that it uses word splitting to get the job done, only now the code explicitly sets $IFS to contain only the single-character field delimiter present in the input string. It should be repeated that this cannot work for multicharacter field delimiters such as the OP's comma-space delimiter. But for a single-character delimiter like the LF used in this example, it actually comes close to being perfect. The fields cannot be unintentionally split in the middle as we saw with previous wrong answers, and there is only one level of splitting, as required.

One problem is that filename expansion will corrupt affected words as described earlier, although once again this can be solved by wrapping the critical statement in set -f and set +f .

Another potential problem is that, since LF qualifies as an "IFS whitespace character" as defined earlier, all empty fields will be lost, just as in #2 and #3 . This would of course not be a problem if the delimiter happens to be a non-"IFS whitespace character", and depending on the application it may not matter anyway, but it does vitiate the generality of the solution.

So, to sum up, assuming you have a one-character delimiter, and it is either a non-"IFS whitespace character" or you don't care about empty fields, and you wrap the critical statement in set -f and set +f , then this solution works, but otherwise not.

(Also, for information's sake, assigning a LF to a variable in bash can be done more easily with the $'...' syntax, e.g. IFS=$'\n'; .)


Wrong answer #5

countries='Paris, France, Europe'
OIFS="$IFS"
IFS=', ' array=($countries)
IFS="$OIFS"

Similar idea:

IFS=', ' eval 'array=($string)'

This solution is effectively a cross between #1 (in that it sets $IFS to comma-space) and #2-4 (in that it uses word splitting to split the string into fields). Because of this, it suffers from most of the problems that afflict all of the above wrong answers, sort of like the worst of all worlds.

Also, regarding the second variant, it may seem like the eval call is completely unnecessary, since its argument is a single-quoted string literal, and therefore is statically known. But there's actually a very non-obvious benefit to using eval in this way. Normally, when you run a simple command which consists of a variable assignment only , meaning without an actual command word following it, the assignment takes effect in the shell environment:

IFS=', '; ## changes $IFS in the shell environment

This is true even if the simple command involves multiple variable assignments; again, as long as there's no command word, all variable assignments affect the shell environment:

IFS=', ' array=($countries); ## changes both $IFS and $array in the shell environment

But, if the variable assignment is attached to a command name (I like to call this a "prefix assignment") then it does not affect the shell environment, and instead only affects the environment of the executed command, regardless whether it is a builtin or external:

IFS=', ' :; ## : is a builtin command, the $IFS assignment does not outlive it
IFS=', ' env; ## env is an external command, the $IFS assignment does not outlive it

Relevant quote from the bash manual :

If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are added to the environment of the executed command and do not affect the current shell environment.

It is possible to exploit this feature of variable assignment to change $IFS only temporarily, which allows us to avoid the whole save-and-restore gambit like that which is being done with the $OIFS variable in the first variant. But the challenge we face here is that the command we need to run is itself a mere variable assignment, and hence it would not involve a command word to make the $IFS assignment temporary. You might think to yourself, well why not just add a no-op command word to the statement like the : builtin to make the $IFS assignment temporary? This does not work because it would then make the $array assignment temporary as well:

IFS=', ' array=($countries) :; ## fails; new $array value never escapes the : command

So, we're effectively at an impasse, a bit of a catch-22. But, when eval runs its code, it runs it in the shell environment, as if it was normal, static source code, and therefore we can run the $array assignment inside the eval argument to have it take effect in the shell environment, while the $IFS prefix assignment that is prefixed to the eval command will not outlive the eval command. This is exactly the trick that is being used in the second variant of this solution:

IFS=', ' eval 'array=($string)'; ## $IFS does not outlive the eval command, but $array does

So, as you can see, it's actually quite a clever trick, and accomplishes exactly what is required (at least with respect to assignment effectation) in a rather non-obvious way. I'm actually not against this trick in general, despite the involvement of eval ; just be careful to single-quote the argument string to guard against security threats.

But again, because of the "worst of all worlds" agglomeration of problems, this is still a wrong answer to the OP's requirement.


Wrong answer #6

IFS=', '; array=(Paris, France, Europe)

IFS=' ';declare -a array=(Paris France Europe)

Um... what? The OP has a string variable that needs to be parsed into an array. This "answer" starts with the verbatim contents of the input string pasted into an array literal. I guess that's one way to do it.

It looks like the answerer may have assumed that the $IFS variable affects all bash parsing in all contexts, which is not true. From the bash manual:

IFS The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the read builtin command. The default value is <space><tab><newline> .

So the $IFS special variable is actually only used in two contexts: (1) word splitting that is performed after expansion (meaning not when parsing bash source code) and (2) for splitting input lines into words by the read builtin.

Let me try to make this clearer. I think it might be good to draw a distinction between parsing and execution . Bash must first parse the source code, which obviously is a parsing event, and then later it executes the code, which is when expansion comes into the picture. Expansion is really an execution event. Furthermore, I take issue with the description of the $IFS variable that I just quoted above; rather than saying that word splitting is performed after expansion , I would say that word splitting is performed during expansion, or, perhaps even more precisely, word splitting is part of the expansion process. The phrase "word splitting" refers only to this step of expansion; it should never be used to refer to the parsing of bash source code, although unfortunately the docs do seem to throw around the words "split" and "words" a lot. Here's a relevant excerpt from the linux.die.net version of the bash manual:

Expansion is performed on the command line after it has been split into words. There are seven kinds of expansion performed: brace expansion , tilde expansion , parameter and variable expansion , command substitution , arithmetic expansion , word splitting , and pathname expansion .

The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and pathname expansion.

You could argue the GNU version of the manual does slightly better, since it opts for the word "tokens" instead of "words" in the first sentence of the Expansion section:

Expansion is performed on the command line after it has been split into tokens.

The important point is, $IFS does not change the way bash parses source code. Parsing of bash source code is actually a very complex process that involves recognition of the various elements of shell grammar, such as command sequences, command lists, pipelines, parameter expansions, arithmetic substitutions, and command substitutions. For the most part, the bash parsing process cannot be altered by user-level actions like variable assignments (actually, there are some minor exceptions to this rule; for example, see the various compatxx shell settings , which can change certain aspects of parsing behavior on-the-fly). The upstream "words"/"tokens" that result from this complex parsing process are then expanded according to the general process of "expansion" as broken down in the above documentation excerpts, where word splitting of the expanded (expanding?) text into downstream words is simply one step of that process. Word splitting only touches text that has been spit out of a preceding expansion step; it does not affect literal text that was parsed right off the source bytestream.


Wrong answer #7

string='first line
        second line
        third line'

while read -r line; do lines+=("$line"); done <<<"$string"

This is one of the best solutions. Notice that we're back to using read . Didn't I say earlier that read is inappropriate because it performs two levels of splitting, when we only need one? The trick here is that you can call read in such a way that it effectively only does one level of splitting, specifically by splitting off only one field per invocation, which necessitates the cost of having to call it repeatedly in a loop. It's a bit of a sleight of hand, but it works.

But there are problems. First: When you provide at least one NAME argument to read , it automatically ignores leading and trailing whitespace in each field that is split off from the input string. This occurs whether $IFS is set to its default value or not, as described earlier in this post. Now, the OP may not care about this for his specific use-case, and in fact, it may be a desirable feature of the parsing behavior. But not everyone who wants to parse a string into fields will want this. There is a solution, however: A somewhat non-obvious usage of read is to pass zero NAME arguments. In this case, read will store the entire input line that it gets from the input stream in a variable named $REPLY , and, as a bonus, it does not strip leading and trailing whitespace from the value. This is a very robust usage of read which I've exploited frequently in my shell programming career. Here's a demonstration of the difference in behavior:

string=$'  a  b  \n  c  d  \n  e  f  '; ## input string

a=(); while read -r line; do a+=("$line"); done <<<"$string"; declare -p a;
## declare -a a=([0]="a  b" [1]="c  d" [2]="e  f") ## read trimmed surrounding whitespace

a=(); while read -r; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="  a  b  " [1]="  c  d  " [2]="  e  f  ") ## no trimming

The second issue with this solution is that it does not actually address the case of a custom field separator, such as the OP's comma-space. As before, multicharacter separators are not supported, which is an unfortunate limitation of this solution. We could try to at least split on comma by specifying the separator to the -d option, but look what happens:

string='Paris, France, Europe';
a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France")

Predictably, the unaccounted surrounding whitespace got pulled into the field values, and hence this would have to be corrected subsequently through trimming operations (this could also be done directly in the while-loop). But there's another obvious error: Europe is missing! What happened to it? The answer is that read returns a failing return code if it hits end-of-file (in this case we can call it end-of-string) without encountering a final field terminator on the final field. This causes the while-loop to break prematurely and we lose the final field.

Technically this same error afflicted the previous examples as well; the difference there is that the field separator was taken to be LF, which is the default when you don't specify the -d option, and the <<< ("here-string") mechanism automatically appends a LF to the string just before it feeds it as input to the command. Hence, in those cases, we sort of accidentally solved the problem of a dropped final field by unwittingly appending an additional dummy terminator to the input. Let's call this solution the "dummy-terminator" solution. We can apply the dummy-terminator solution manually for any custom delimiter by concatenating it against the input string ourselves when instantiating it in the here-string:

a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string,"; declare -p a;
declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")

There, problem solved. Another solution is to only break the while-loop if both (1) read returned failure and (2) $REPLY is empty, meaning read was not able to read any characters prior to hitting end-of-file. Demo:

a=(); while read -rd,|| [[ -n "$REPLY" ]]; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')

This approach also reveals the secretive LF that automatically gets appended to the here-string by the <<< redirection operator. It could of course be stripped off separately through an explicit trimming operation as described a moment ago, but obviously the manual dummy-terminator approach solves it directly, so we could just go with that. The manual dummy-terminator solution is actually quite convenient in that it solves both of these two problems (the dropped-final-field problem and the appended-LF problem) in one go.

So, overall, this is quite a powerful solution. It's only remaining weakness is a lack of support for multicharacter delimiters, which I will address later.


Wrong answer #8

string='first line
        second line
        third line'

readarray -t lines <<<"$string"

(This is actually from the same post as #7 ; the answerer provided two solutions in the same post.)

The readarray builtin, which is a synonym for mapfile , is ideal. It's a builtin command which parses a bytestream into an array variable in one shot; no messing with loops, conditionals, substitutions, or anything else. And it doesn't surreptitiously strip any whitespace from the input string. And (if -O is not given) it conveniently clears the target array before assigning to it. But it's still not perfect, hence my criticism of it as a "wrong answer".

First, just to get this out of the way, note that, just like the behavior of read when doing field-parsing, readarray drops the trailing field if it is empty. Again, this is probably not a concern for the OP, but it could be for some use-cases. I'll come back to this in a moment.

Second, as before, it does not support multicharacter delimiters. I'll give a fix for this in a moment as well.

Third, the solution as written does not parse the OP's input string, and in fact, it cannot be used as-is to parse it. I'll expand on this momentarily as well.

For the above reasons, I still consider this to be a "wrong answer" to the OP's question. Below I'll give what I consider to be the right answer.


Right answer

Here's a naïve attempt to make #8 work by just specifying the -d option:

string='Paris, France, Europe';
readarray -td, a <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')

We see the result is identical to the result we got from the double-conditional approach of the looping read solution discussed in #7 . We can almost solve this with the manual dummy-terminator trick:

readarray -td, a <<<"$string,"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe" [3]=$'\n')

The problem here is that readarray preserved the trailing field, since the <<< redirection operator appended the LF to the input string, and therefore the trailing field was not empty (otherwise it would've been dropped). We can take care of this by explicitly unsetting the final array element after-the-fact:

readarray -td, a <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")

The only two problems that remain, which are actually related, are (1) the extraneous whitespace that needs to be trimmed, and (2) the lack of support for multicharacter delimiters.

The whitespace could of course be trimmed afterward (for example, see How to trim whitespace from a Bash variable? ). But if we can hack a multicharacter delimiter, then that would solve both problems in one shot.

Unfortunately, there's no direct way to get a multicharacter delimiter to work. The best solution I've thought of is to preprocess the input string to replace the multicharacter delimiter with a single-character delimiter that will be guaranteed not to collide with the contents of the input string. The only character that has this guarantee is the NUL byte . This is because, in bash (though not in zsh, incidentally), variables cannot contain the NUL byte. This preprocessing step can be done inline in a process substitution. Here's how to do it using awk :

readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; }' <<<"$string, "); unset 'a[-1]';
declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")

There, finally! This solution will not erroneously split fields in the middle, will not cut out prematurely, will not drop empty fields, will not corrupt itself on filename expansions, will not automatically strip leading and trailing whitespace, will not leave a stowaway LF on the end, does not require loops, and does not settle for a single-character delimiter.


Trimming solution

Lastly, I wanted to demonstrate my own fairly intricate trimming solution using the obscure -C callback option of readarray . Unfortunately, I've run out of room against Stack Overflow's draconian 30,000 character post limit, so I won't be able to explain it. I'll leave that as an exercise for the reader.

function mfcb { local val="$4"; "$1"; eval "$2[$3]=\$val;"; };
function val_ltrim { if [[ "$val" =~ ^[[:space:]]+ ]]; then val="${val:${#BASH_REMATCH[0]}}"; fi; };
function val_rtrim { if [[ "$val" =~ [[:space:]]+$ ]]; then val="${val:0:${#val}-${#BASH_REMATCH[0]}}"; fi; };
function val_trim { val_ltrim; val_rtrim; };
readarray -c1 -C 'mfcb val_trim a' -td, <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")

fbicknel ,Aug 18, 2017 at 15:57

It may also be helpful to note (though understandably you had no room to do so) that the -d option to readarray first appears in Bash 4.4. – fbicknel Aug 18 '17 at 15:57

Cyril Duchon-Doris ,Nov 3, 2017 at 9:16

You should add a "TL;DR : scroll 3 pages to see the right solution at the end of my answer" – Cyril Duchon-Doris Nov 3 '17 at 9:16

dawg ,Nov 26, 2017 at 22:28

Great answer (+1). If you change your awk to awk '{ gsub(/,[ ]+|$/,"\0"); print }' and eliminate that concatenation of the final ", " then you don't have to go through the gymnastics on eliminating the final record. So: readarray -td '' a < <(awk '{ gsub(/,[ ]+/,"\0"); print; }' <<<"$string") on Bash that supports readarray . Note your method is Bash 4.4+ I think because of the -d in readarraydawg Nov 26 '17 at 22:28

datUser ,Feb 22, 2018 at 14:54

Looks like readarray is not an available builtin on OSX. – datUser Feb 22 '18 at 14:54

bgoldst ,Feb 23, 2018 at 3:37

@datUser That's unfortunate. Your version of bash must be too old for readarray . In this case, you can use the second-best solution built on read . I'm referring to this: a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string,"; (with the awk substitution if you need multicharacter delimiter support). Let me know if you run into any problems; I'm pretty sure this solution should work on fairly old versions of bash, back to version 2-something, released like two decades ago. – bgoldst Feb 23 '18 at 3:37

Jmoney38 ,Jul 14, 2015 at 11:54

t="one,two,three"
a=($(echo "$t" | tr ',' '\n'))
echo "${a[2]}"

Prints three

shrimpwagon ,Oct 16, 2015 at 20:04

I actually prefer this approach. Simple. – shrimpwagon Oct 16 '15 at 20:04

Ben ,Oct 31, 2015 at 3:11

I copied and pasted this and it did did not work with echo, but did work when I used it in a for loop. – Ben Oct 31 '15 at 3:11

Pinaki Mukherjee ,Nov 9, 2015 at 20:22

This is the simplest approach. thanks – Pinaki Mukherjee Nov 9 '15 at 20:22

abalter ,Aug 30, 2016 at 5:13

This does not work as stated. @Jmoney38 or shrimpwagon if you can paste this in a terminal and get the desired output, please paste the result here. – abalter Aug 30 '16 at 5:13

leaf ,Jul 17, 2017 at 16:28

@abalter Works for me with a=($(echo $t | tr ',' "\n")) . Same result with a=($(echo $t | tr ',' ' ')) . – leaf Jul 17 '17 at 16:28

Luca Borrione ,Nov 2, 2012 at 13:44

Sometimes it happened to me that the method described in the accepted answer didn't work, especially if the separator is a carriage return.
In those cases I solved in this way:
string='first line
second line
third line'

oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"

for line in "${lines[@]}"
    do
        echo "--> $line"
done

Stefan van den Akker ,Feb 9, 2015 at 16:52

+1 This completely worked for me. I needed to put multiple strings, divided by a newline, into an array, and read -a arr <<< "$strings" did not work with IFS=$'\n' . – Stefan van den Akker Feb 9 '15 at 16:52

Stefan van den Akker ,Feb 10, 2015 at 13:49

Here is the answer to make the accepted answer work when the delimiter is a newline . – Stefan van den Akker Feb 10 '15 at 13:49

,Jul 24, 2015 at 21:24

The accepted answer works for values in one line.
If the variable has several lines:
string='first line
        second line
        third line'

We need a very different command to get all lines:

while read -r line; do lines+=("$line"); done <<<"$string"

Or the much simpler bash readarray :

readarray -t lines <<<"$string"

Printing all lines is very easy taking advantage of a printf feature:

printf ">[%s]\n" "${lines[@]}"

>[first line]
>[        second line]
>[        third line]

Mayhem ,Dec 31, 2015 at 3:13

While not every solution works for every situation, your mention of readarray... replaced my last two hours with 5 minutes... you got my vote – Mayhem Dec 31 '15 at 3:13

Derek 朕會功夫 ,Mar 23, 2018 at 19:14

readarray is the right answer. – Derek 朕會功夫 Mar 23 '18 at 19:14

ssanch ,Jun 3, 2016 at 15:24

This is similar to the approach by Jmoney38, but using sed:
string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)
echo ${array[0]}

Prints 1

dawg ,Nov 26, 2017 at 19:59

The key to splitting your string into an array is the multi character delimiter of ", " . Any solution using IFS for multi character delimiters is inherently wrong since IFS is a set of those characters, not a string.

If you assign IFS=", " then the string will break on EITHER "," OR " " or any combination of them which is not an accurate representation of the two character delimiter of ", " .

You can use awk or sed to split the string, with process substitution:

#!/bin/bash

str="Paris, France, Europe"
array=()
while read -r -d $'\0' each; do   # use a NUL terminated field separator 
    array+=("$each")
done < <(printf "%s" "$str" | awk '{ gsub(/,[ ]+|$/,"\0"); print }')
declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output

It is more efficient to use a regex you directly in Bash:

#!/bin/bash

str="Paris, France, Europe"

array=()
while [[ $str =~ ([^,]+)(,[ ]+|$) ]]; do
    array+=("${BASH_REMATCH[1]}")   # capture the field
    i=${#BASH_REMATCH}              # length of field + delimiter
    str=${str:i}                    # advance the string by that length
done                                # the loop deletes $str, so make a copy if needed

declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output...

With the second form, there is no sub shell and it will be inherently faster.


Edit by bgoldst: Here are some benchmarks comparing my readarray solution to dawg's regex solution, and I also included the read solution for the heck of it (note: I slightly modified the regex solution for greater harmony with my solution) (also see my comments below the post):

## competitors
function c_readarray { readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); unset 'a[-1]'; };
function c_read { a=(); local REPLY=''; while read -r -d ''; do a+=("$REPLY"); done < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); };
function c_regex { a=(); local s="$1, "; while [[ $s =~ ([^,]+),\  ]]; do a+=("${BASH_REMATCH[1]}"); s=${s:${#BASH_REMATCH}}; done; };

## helper functions
function rep {
    local -i i=-1;
    for ((i = 0; i<$1; ++i)); do
        printf %s "$2";
    done;
}; ## end rep()

function testAll {
    local funcs=();
    local args=();
    local func='';
    local -i rc=-1;
    while [[ "$1" != ':' ]]; do
        func="$1";
        if [[ ! "$func" =~ ^[_a-zA-Z][_a-zA-Z0-9]*$ ]]; then
            echo "bad function name: $func" >&2;
            return 2;
        fi;
        funcs+=("$func");
        shift;
    done;
    shift;
    args=("$@");
    for func in "${funcs[@]}"; do
        echo -n "$func ";
        { time $func "${args[@]}" >/dev/null 2>&1; } 2>&1| tr '\n' '/';
        rc=${PIPESTATUS[0]}; if [[ $rc -ne 0 ]]; then echo "[$rc]"; else echo; fi;
    done| column -ts/;
}; ## end testAll()

function makeStringToSplit {
    local -i n=$1; ## number of fields
    if [[ $n -lt 0 ]]; then echo "bad field count: $n" >&2; return 2; fi;
    if [[ $n -eq 0 ]]; then
        echo;
    elif [[ $n -eq 1 ]]; then
        echo 'first field';
    elif [[ "$n" -eq 2 ]]; then
        echo 'first field, last field';
    else
        echo "first field, $(rep $[$1-2] 'mid field, ')last field";
    fi;
}; ## end makeStringToSplit()

function testAll_splitIntoArray {
    local -i n=$1; ## number of fields in input string
    local s='';
    echo "===== $n field$(if [[ $n -ne 1 ]]; then echo 's'; fi;) =====";
    s="$(makeStringToSplit "$n")";
    testAll c_readarray c_read c_regex : "$s";
}; ## end testAll_splitIntoArray()

## results
testAll_splitIntoArray 1;
## ===== 1 field =====
## c_readarray   real  0m0.067s   user 0m0.000s   sys  0m0.000s
## c_read        real  0m0.064s   user 0m0.000s   sys  0m0.000s
## c_regex       real  0m0.000s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 10;
## ===== 10 fields =====
## c_readarray   real  0m0.067s   user 0m0.000s   sys  0m0.000s
## c_read        real  0m0.064s   user 0m0.000s   sys  0m0.000s
## c_regex       real  0m0.001s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 100;
## ===== 100 fields =====
## c_readarray   real  0m0.069s   user 0m0.000s   sys  0m0.062s
## c_read        real  0m0.065s   user 0m0.000s   sys  0m0.046s
## c_regex       real  0m0.005s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 1000;
## ===== 1000 fields =====
## c_readarray   real  0m0.084s   user 0m0.031s   sys  0m0.077s
## c_read        real  0m0.092s   user 0m0.031s   sys  0m0.046s
## c_regex       real  0m0.125s   user 0m0.125s   sys  0m0.000s
##
testAll_splitIntoArray 10000;
## ===== 10000 fields =====
## c_readarray   real  0m0.209s   user 0m0.093s   sys  0m0.108s
## c_read        real  0m0.333s   user 0m0.234s   sys  0m0.109s
## c_regex       real  0m9.095s   user 0m9.078s   sys  0m0.000s
##
testAll_splitIntoArray 100000;
## ===== 100000 fields =====
## c_readarray   real  0m1.460s   user 0m0.326s   sys  0m1.124s
## c_read        real  0m2.780s   user 0m1.686s   sys  0m1.092s
## c_regex       real  17m38.208s   user 15m16.359s   sys  2m19.375s
##

bgoldst ,Nov 27, 2017 at 4:28

Very cool solution! I never thought of using a loop on a regex match, nifty use of $BASH_REMATCH . It works, and does indeed avoid spawning subshells. +1 from me. However, by way of criticism, the regex itself is a little non-ideal, in that it appears you were forced to duplicate part of the delimiter token (specifically the comma) so as to work around the lack of support for non-greedy multipliers (also lookarounds) in ERE ("extended" regex flavor built into bash). This makes it a little less generic and robust. – bgoldst Nov 27 '17 at 4:28

bgoldst ,Nov 27, 2017 at 4:28

Secondly, I did some benchmarking, and although the performance is better than the other solutions for smallish strings, it worsens exponentially due to the repeated string-rebuilding, becoming catastrophic for very large strings. See my edit to your answer. – bgoldst Nov 27 '17 at 4:28

dawg ,Nov 27, 2017 at 4:46

@bgoldst: What a cool benchmark! In defense of the regex, for 10's or 100's of thousands of fields (what the regex is splitting) there would probably be some form of record (like \n delimited text lines) comprising those fields so the catastrophic slow-down would likely not occur. If you have a string with 100,000 fields -- maybe Bash is not ideal ;-) Thanks for the benchmark. I learned a thing or two. – dawg Nov 27 '17 at 4:46

Geoff Lee ,Mar 4, 2016 at 6:02

Try this
IFS=', '; array=(Paris, France, Europe)
for item in ${array[@]}; do echo $item; done

It's simple. If you want, you can also add a declare (and also remove the commas):

IFS=' ';declare -a array=(Paris France Europe)

The IFS is added to undo the above but it works without it in a fresh bash instance

MrPotatoHead ,Nov 13, 2018 at 13:19

Pure bash multi-character delimiter solution.

As others have pointed out in this thread, the OP's question gave an example of a comma delimited string to be parsed into an array, but did not indicate if he/she was only interested in comma delimiters, single character delimiters, or multi-character delimiters.

Since Google tends to rank this answer at or near the top of search results, I wanted to provide readers with a strong answer to the question of multiple character delimiters, since that is also mentioned in at least one response.

If you're in search of a solution to a multi-character delimiter problem, I suggest reviewing Mallikarjun M 's post, in particular the response from gniourf_gniourf who provides this elegant pure BASH solution using parameter expansion:

#!/bin/bash
str="LearnABCtoABCSplitABCaABCString"
delimiter=ABC
s=$str$delimiter
array=();
while [[ $s ]]; do
    array+=( "${s%%"$delimiter"*}" );
    s=${s#*"$delimiter"};
done;
declare -p array

Link to cited comment/referenced post

Link to cited question: Howto split a string on a multi-character delimiter in bash?

Eduardo Cuomo ,Dec 19, 2016 at 15:27

Use this:
countries='Paris, France, Europe'
OIFS="$IFS"
IFS=', ' array=($countries)
IFS="$OIFS"

#${array[1]} == Paris
#${array[2]} == France
#${array[3]} == Europe

gniourf_gniourf ,Dec 19, 2016 at 17:22

Bad: subject to word splitting and pathname expansion. Please don't revive old questions with good answers to give bad answers. – gniourf_gniourf Dec 19 '16 at 17:22

Scott Weldon ,Dec 19, 2016 at 18:12

This may be a bad answer, but it is still a valid answer. Flaggers / reviewers: For incorrect answers such as this one, downvote, don't delete!Scott Weldon Dec 19 '16 at 18:12

George Sovetov ,Dec 26, 2016 at 17:31

@gniourf_gniourf Could you please explain why it is a bad answer? I really don't understand when it fails. – George Sovetov Dec 26 '16 at 17:31

gniourf_gniourf ,Dec 26, 2016 at 18:07

@GeorgeSovetov: As I said, it's subject to word splitting and pathname expansion. More generally, splitting a string into an array as array=( $string ) is a (sadly very common) antipattern: word splitting occurs: string='Prague, Czech Republic, Europe' ; Pathname expansion occurs: string='foo[abcd],bar[efgh]' will fail if you have a file named, e.g., food or barf in your directory. The only valid usage of such a construct is when string is a glob.gniourf_gniourf Dec 26 '16 at 18:07

user1009908 ,Jun 9, 2015 at 23:28

UPDATE: Don't do this, due to problems with eval.

With slightly less ceremony:

IFS=', ' eval 'array=($string)'

e.g.

string="foo, bar,baz"
IFS=', ' eval 'array=($string)'
echo ${array[1]} # -> bar

caesarsol ,Oct 29, 2015 at 14:42

eval is evil! don't do this. – caesarsol Oct 29 '15 at 14:42

user1009908 ,Oct 30, 2015 at 4:05

Pfft. No. If you're writing scripts large enough for this to matter, you're doing it wrong. In application code, eval is evil. In shell scripting, it's common, necessary, and inconsequential. – user1009908 Oct 30 '15 at 4:05

caesarsol ,Nov 2, 2015 at 18:19

put a $ in your variable and you'll see... I write many scripts and I never ever had to use a single evalcaesarsol Nov 2 '15 at 18:19

Dennis Williamson ,Dec 2, 2015 at 17:00

Eval command and security issuesDennis Williamson Dec 2 '15 at 17:00

user1009908 ,Dec 22, 2015 at 23:04

You're right, this is only usable when the input is known to be clean. Not a robust solution. – user1009908 Dec 22 '15 at 23:04

Eduardo Lucio ,Jan 31, 2018 at 20:45

Here's my hack!

Splitting strings by strings is a pretty boring thing to do using bash. What happens is that we have limited approaches that only work in a few cases (split by ";", "/", "." and so on) or we have a variety of side effects in the outputs.

The approach below has required a number of maneuvers, but I believe it will work for most of our needs!

#!/bin/bash

# --------------------------------------
# SPLIT FUNCTION
# ----------------

F_SPLIT_R=()
f_split() {
    : 'It does a "split" into a given string and returns an array.

    Args:
        TARGET_P (str): Target string to "split".
        DELIMITER_P (Optional[str]): Delimiter used to "split". If not 
    informed the split will be done by spaces.

    Returns:
        F_SPLIT_R (array): Array with the provided string separated by the 
    informed delimiter.
    '

    F_SPLIT_R=()
    TARGET_P=$1
    DELIMITER_P=$2
    if [ -z "$DELIMITER_P" ] ; then
        DELIMITER_P=" "
    fi

    REMOVE_N=1
    if [ "$DELIMITER_P" == "\n" ] ; then
        REMOVE_N=0
    fi

    # NOTE: This was the only parameter that has been a problem so far! 
    # By Questor
    # [Ref.: https://unix.stackexchange.com/a/390732/61742]
    if [ "$DELIMITER_P" == "./" ] ; then
        DELIMITER_P="[.]/"
    fi

    if [ ${REMOVE_N} -eq 1 ] ; then

        # NOTE: Due to bash limitations we have some problems getting the 
        # output of a split by awk inside an array and so we need to use 
        # "line break" (\n) to succeed. Seen this, we remove the line breaks 
        # momentarily afterwards we reintegrate them. The problem is that if 
        # there is a line break in the "string" informed, this line break will 
        # be lost, that is, it is erroneously removed in the output! 
        # By Questor
        TARGET_P=$(awk 'BEGIN {RS="dn"} {gsub("\n", "3F2C417D448C46918289218B7337FCAF"); printf $0}' <<< "${TARGET_P}")

    fi

    # NOTE: The replace of "\n" by "3F2C417D448C46918289218B7337FCAF" results 
    # in more occurrences of "3F2C417D448C46918289218B7337FCAF" than the 
    # amount of "\n" that there was originally in the string (one more 
    # occurrence at the end of the string)! We can not explain the reason for 
    # this side effect. The line below corrects this problem! By Questor
    TARGET_P=${TARGET_P%????????????????????????????????}

    SPLIT_NOW=$(awk -F"$DELIMITER_P" '{for(i=1; i<=NF; i++){printf "%s\n", $i}}' <<< "${TARGET_P}")

    while IFS= read -r LINE_NOW ; do
        if [ ${REMOVE_N} -eq 1 ] ; then

            # NOTE: We use "'" to prevent blank lines with no other characters 
            # in the sequence being erroneously removed! We do not know the 
            # reason for this side effect! By Questor
            LN_NOW_WITH_N=$(awk 'BEGIN {RS="dn"} {gsub("3F2C417D448C46918289218B7337FCAF", "\n"); printf $0}' <<< "'${LINE_NOW}'")

            # NOTE: We use the commands below to revert the intervention made 
            # immediately above! By Questor
            LN_NOW_WITH_N=${LN_NOW_WITH_N%?}
            LN_NOW_WITH_N=${LN_NOW_WITH_N#?}

            F_SPLIT_R+=("$LN_NOW_WITH_N")
        else
            F_SPLIT_R+=("$LINE_NOW")
        fi
    done <<< "$SPLIT_NOW"
}

# --------------------------------------
# HOW TO USE
# ----------------

STRING_TO_SPLIT="
 * How do I list all databases and tables using psql?

\"
sudo -u postgres /usr/pgsql-9.4/bin/psql -c \"\l\"
sudo -u postgres /usr/pgsql-9.4/bin/psql <DB_NAME> -c \"\dt\"
\"

\"
\list or \l: list all databases
\dt: list all tables in the current database
\"

[Ref.: https://dba.stackexchange.com/questions/1285/how-do-i-list-all-databases-and-tables-using-psql]


"

f_split "$STRING_TO_SPLIT" "bin/psql -c"

# --------------------------------------
# OUTPUT AND TEST
# ----------------

ARR_LENGTH=${#F_SPLIT_R[*]}
for (( i=0; i<=$(( $ARR_LENGTH -1 )); i++ )) ; do
    echo " > -----------------------------------------"
    echo "${F_SPLIT_R[$i]}"
    echo " < -----------------------------------------"
done

if [ "$STRING_TO_SPLIT" == "${F_SPLIT_R[0]}bin/psql -c${F_SPLIT_R[1]}" ] ; then
    echo " > -----------------------------------------"
    echo "The strings are the same!"
    echo " < -----------------------------------------"
fi

sel-en-ium ,May 31, 2018 at 5:56

Another way to do it without modifying IFS:
read -r -a myarray <<< "${string//, /$IFS}"

Rather than changing IFS to match our desired delimiter, we can replace all occurrences of our desired delimiter ", " with contents of $IFS via "${string//, /$IFS}" .

Maybe this will be slow for very large strings though?

This is based on Dennis Williamson's answer.

rsjethani ,Sep 13, 2016 at 16:21

Another approach can be:
str="a, b, c, d"  # assuming there is a space after ',' as in Q
arr=(${str//,/})  # delete all occurrences of ','

After this 'arr' is an array with four strings. This doesn't require dealing IFS or read or any other special stuff hence much simpler and direct.

gniourf_gniourf ,Dec 26, 2016 at 18:12

Same (sadly common) antipattern as other answers: subject to word splitting and filename expansion. – gniourf_gniourf Dec 26 '16 at 18:12

Safter Arslan ,Aug 9, 2017 at 3:21

Another way would be:
string="Paris, France, Europe"
IFS=', ' arr=(${string})

Now your elements are stored in "arr" array. To iterate through the elements:

for i in ${arr[@]}; do echo $i; done

bgoldst ,Aug 13, 2017 at 22:38

I cover this idea in my answer ; see Wrong answer #5 (you might be especially interested in my discussion of the eval trick). Your solution leaves $IFS set to the comma-space value after-the-fact. – bgoldst Aug 13 '17 at 22:38

[Nov 08, 2018] How to split one string into multiple variables in bash shell? [duplicate]

Nov 08, 2018 | stackoverflow.com
This question already has an answer here:

Rob I , May 9, 2012 at 19:22

For your second question, see @mkb's comment to my answer below - that's definitely the way to go! – Rob I May 9 '12 at 19:22

Dennis Williamson , Jul 4, 2012 at 16:14

See my edited answer for one way to read individual characters into an array. – Dennis Williamson Jul 4 '12 at 16:14

Nick Weedon , Dec 31, 2015 at 11:04

Here is the same thing in a more concise form: var1=$(cut -f1 -d- <<<$STR) – Nick Weedon Dec 31 '15 at 11:04

Rob I , May 9, 2012 at 17:00

If your solution doesn't have to be general, i.e. only needs to work for strings like your example, you could do:
var1=$(echo $STR | cut -f1 -d-)
var2=$(echo $STR | cut -f2 -d-)

I chose cut here because you could simply extend the code for a few more variables...

crunchybutternut , May 9, 2012 at 17:40

Can you look at my post again and see if you have a solution for the followup question? thanks! – crunchybutternut May 9 '12 at 17:40

mkb , May 9, 2012 at 17:59

You can use cut to cut characters too! cut -c1 for example. – mkb May 9 '12 at 17:59

FSp , Nov 27, 2012 at 10:26

Although this is very simple to read and write, is a very slow solution because forces you to read twice the same data ($STR) ... if you care of your script performace, the @anubhava solution is much better – FSp Nov 27 '12 at 10:26

tripleee , Jan 25, 2016 at 6:47

Apart from being an ugly last-resort solution, this has a bug: You should absolutely use double quotes in echo "$STR" unless you specifically want the shell to expand any wildcards in the string as a side effect. See also stackoverflow.com/questions/10067266/tripleee Jan 25 '16 at 6:47

Rob I , Feb 10, 2016 at 13:57

You're right about double quotes of course, though I did point out this solution wasn't general. However I think your assessment is a bit unfair - for some people this solution may be more readable (and hence extensible etc) than some others, and doesn't completely rely on arcane bash feature that wouldn't translate to other shells. I suspect that's why my solution, though less elegant, continues to get votes periodically... – Rob I Feb 10 '16 at 13:57

Dennis Williamson , May 10, 2012 at 3:14

read with IFS are perfect for this:
$ IFS=- read var1 var2 <<< ABCDE-123456
$ echo "$var1"
ABCDE
$ echo "$var2"
123456

Edit:

Here is how you can read each individual character into array elements:

$ read -a foo <<<"$(echo "ABCDE-123456" | sed 's/./& /g')"

Dump the array:

$ declare -p foo
declare -a foo='([0]="A" [1]="B" [2]="C" [3]="D" [4]="E" [5]="-" [6]="1" [7]="2" [8]="3" [9]="4" [10]="5" [11]="6")'

If there are spaces in the string:

$ IFS=$'\v' read -a foo <<<"$(echo "ABCDE 123456" | sed 's/./&\v/g')"
$ declare -p foo
declare -a foo='([0]="A" [1]="B" [2]="C" [3]="D" [4]="E" [5]=" " [6]="1" [7]="2" [8]="3" [9]="4" [10]="5" [11]="6")'

insecure , Apr 30, 2014 at 7:51

Great, the elegant bash-only way, without unnecessary forks. – insecure Apr 30 '14 at 7:51

Martin Serrano , Jan 11 at 4:34

this solution also has the benefit that if delimiter is not present, the var2 will be empty – Martin Serrano Jan 11 at 4:34

mkb , May 9, 2012 at 17:02

If you know it's going to be just two fields, you can skip the extra subprocesses like this:
var1=${STR%-*}
var2=${STR#*-}

What does this do? ${STR%-*} deletes the shortest substring of $STR that matches the pattern -* starting from the end of the string. ${STR#*-} does the same, but with the *- pattern and starting from the beginning of the string. They each have counterparts %% and ## which find the longest anchored pattern match. If anyone has a helpful mnemonic to remember which does which, let me know! I always have to try both to remember.

Jens , Jan 30, 2015 at 15:17

Plus 1 For knowing your POSIX shell features, avoiding expensive forks and pipes, and the absence of bashisms. – Jens Jan 30 '15 at 15:17

Steven Lu , May 1, 2015 at 20:19

Dunno about "absence of bashisms" considering that this is already moderately cryptic .... if your delimiter is a newline instead of a hyphen, then it becomes even more cryptic. On the other hand, it works with newlines , so there's that. – Steven Lu May 1 '15 at 20:19

mkb , Mar 9, 2016 at 17:30

@KErlandsson: done – mkb Mar 9 '16 at 17:30

mombip , Aug 9, 2016 at 15:58

I've finally found documentation for it: Shell-Parameter-Expansionmombip Aug 9 '16 at 15:58

DS. , Jan 13, 2017 at 19:56

Mnemonic: "#" is to the left of "%" on a standard keyboard, so "#" removes a prefix (on the left), and "%" removes a suffix (on the right). – DS. Jan 13 '17 at 19:56

tripleee , May 9, 2012 at 17:57

Sounds like a job for set with a custom IFS .
IFS=-
set $STR
var1=$1
var2=$2

(You will want to do this in a function with a local IFS so you don't mess up other parts of your script where you require IFS to be what you expect.)

Rob I , May 9, 2012 at 19:20

Nice - I knew about $IFS but hadn't seen how it could be used. – Rob I May 9 '12 at 19:20

Sigg3.net , Jun 19, 2013 at 8:08

I used triplee's example and it worked exactly as advertised! Just change last two lines to <pre> myvar1= echo $1 && myvar2= echo $2 </pre> if you need to store them throughout a script with several "thrown" variables. – Sigg3.net Jun 19 '13 at 8:08

tripleee , Jun 19, 2013 at 13:25

No, don't use a useless echo in backticks . – tripleee Jun 19 '13 at 13:25

Daniel Andersson , Mar 27, 2015 at 6:46

This is a really sweet solution if we need to write something that is not Bash specific. To handle IFS troubles, one can add OLDIFS=$IFS at the beginning before overwriting it, and then add IFS=$OLDIFS just after the set line. – Daniel Andersson Mar 27 '15 at 6:46

tripleee , Mar 27, 2015 at 6:58

FWIW the link above is broken. I was lazy and careless. The canonical location still works; iki.fi/era/unix/award.html#echotripleee Mar 27 '15 at 6:58

anubhava , May 9, 2012 at 17:09

Using bash regex capabilities:
re="^([^-]+)-(.*)$"
[[ "ABCDE-123456" =~ $re ]] && var1="${BASH_REMATCH[1]}" && var2="${BASH_REMATCH[2]}"
echo $var1
echo $var2

OUTPUT

ABCDE
123456

Cometsong , Oct 21, 2016 at 13:29

Love pre-defining the re for later use(s)! – Cometsong Oct 21 '16 at 13:29

Archibald , Nov 12, 2012 at 11:03

string="ABCDE-123456"
IFS=- # use "local IFS=-" inside the function
set $string
echo $1 # >>> ABCDE
echo $2 # >>> 123456

tripleee , Mar 27, 2015 at 7:02

Hmmm, isn't this just a restatement of my answer ? – tripleee Mar 27 '15 at 7:02

Archibald , Sep 18, 2015 at 12:36

Actually yes. I just clarified it a bit. – Archibald Sep 18 '15 at 12:36

[Nov 08, 2018] How to split a string in shell and get the last field

Nov 08, 2018 | stackoverflow.com

cd1 , Jul 1, 2010 at 23:29

Suppose I have the string 1:2:3:4:5 and I want to get its last field ( 5 in this case). How do I do that using Bash? I tried cut , but I don't know how to specify the last field with -f .

Stephen , Jul 2, 2010 at 0:05

You can use string operators :
$ foo=1:2:3:4:5
$ echo ${foo##*:}
5

This trims everything from the front until a ':', greedily.

${foo  <-- from variable foo
  ##   <-- greedy front trim
  *    <-- matches anything
  :    <-- until the last ':'
 }

eckes , Jan 23, 2013 at 15:23

While this is working for the given problem, the answer of William below ( stackoverflow.com/a/3163857/520162 ) also returns 5 if the string is 1:2:3:4:5: (while using the string operators yields an empty result). This is especially handy when parsing paths that could contain (or not) a finishing / character. – eckes Jan 23 '13 at 15:23

Dobz , Jun 25, 2014 at 11:44

How would you then do the opposite of this? to echo out '1:2:3:4:'? – Dobz Jun 25 '14 at 11:44

Mihai Danila , Jul 9, 2014 at 14:07

And how does one keep the part before the last separator? Apparently by using ${foo%:*} . # - from beginning; % - from end. # , % - shortest match; ## , %% - longest match. – Mihai Danila Jul 9 '14 at 14:07

Putnik , Feb 11, 2016 at 22:33

If i want to get the last element from path, how should I use it? echo ${pwd##*/} does not work. – Putnik Feb 11 '16 at 22:33

Stan Strum , Dec 17, 2017 at 4:22

@Putnik that command sees pwd as a variable. Try dir=$(pwd); echo ${dir##*/} . Works for me! – Stan Strum Dec 17 '17 at 4:22

a3nm , Feb 3, 2012 at 8:39

Another way is to reverse before and after cut :
$ echo ab:cd:ef | rev | cut -d: -f1 | rev
ef

This makes it very easy to get the last but one field, or any range of fields numbered from the end.

Dannid , Jan 14, 2013 at 20:50

This answer is nice because it uses 'cut', which the author is (presumably) already familiar. Plus, I like this answer because I am using 'cut' and had this exact question, hence finding this thread via search. – Dannid Jan 14 '13 at 20:50

funroll , Aug 12, 2013 at 19:51

Some cut-and-paste fodder for people using spaces as delimiters: echo "1 2 3 4" | rev | cut -d " " -f1 | revfunroll Aug 12 '13 at 19:51

EdgeCaseBerg , Sep 8, 2013 at 5:01

the rev | cut -d -f1 | rev is so clever! Thanks! Helped me a bunch (my use case was rev | -d ' ' -f 2- | rev – EdgeCaseBerg Sep 8 '13 at 5:01

Anarcho-Chossid , Sep 16, 2015 at 15:54

Wow. Beautiful and dark magic. – Anarcho-Chossid Sep 16 '15 at 15:54

shearn89 , Aug 17, 2017 at 9:27

I always forget about rev , was just what I needed! cut -b20- | rev | cut -b10- | revshearn89 Aug 17 '17 at 9:27

William Pursell , Jul 2, 2010 at 7:09

It's difficult to get the last field using cut, but here's (one set of) solutions in awk and perl
$ echo 1:2:3:4:5 | awk -F: '{print $NF}'
5
$ echo 1:2:3:4:5 | perl -F: -wane 'print $F[-1]'
5

eckes , Jan 23, 2013 at 15:20

great advantage of this solution over the accepted answer: it also matches paths that contain or do not contain a finishing / character: /a/b/c/d and /a/b/c/d/ yield the same result ( d ) when processing pwd | awk -F/ '{print $NF}' . The accepted answer results in an empty result in the case of /a/b/c/d/eckes Jan 23 '13 at 15:20

stamster , May 21 at 11:52

@eckes In case of AWK solution, on GNU bash, version 4.3.48(1)-release that's not true, as it matters whenever you have trailing slash or not. Simply put AWK will use / as delimiter, and if your path is /my/path/dir/ it will use value after last delimiter, which is simply an empty string. So it's best to avoid trailing slash if you need to do such a thing like I do. – stamster May 21 at 11:52

Nicholas M T Elliott , Jul 1, 2010 at 23:39

Assuming fairly simple usage (no escaping of the delimiter, for example), you can use grep:
$ echo "1:2:3:4:5" | grep -oE "[^:]+$"
5

Breakdown - find all the characters not the delimiter ([^:]) at the end of the line ($). -o only prints the matching part.

Dennis Williamson , Jul 2, 2010 at 0:05

One way:
var1="1:2:3:4:5"
var2=${var1##*:}

Another, using an array:

var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
var2=${var2[@]: -1}

Yet another with an array:

var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
count=${#var2[@]}
var2=${var2[$count-1]}

Using Bash (version >= 3.2) regular expressions:

var1="1:2:3:4:5"
[[ $var1 =~ :([^:]*)$ ]]
var2=${BASH_REMATCH[1]}

liuyang1 , Mar 24, 2015 at 6:02

Thanks so much for array style, as I need this feature, but not have cut, awk these utils. – liuyang1 Mar 24 '15 at 6:02

user3133260 , Dec 24, 2013 at 19:04

$ echo "a b c d e" | tr ' ' '\n' | tail -1
e

Simply translate the delimiter into a newline and choose the last entry with tail -1 .

Yajo , Jul 30, 2014 at 10:13

It will fail if the last item contains a \n , but for most cases is the most readable solution. – Yajo Jul 30 '14 at 10:13

Rafael , Nov 10, 2016 at 10:09

Using sed :
$ echo '1:2:3:4:5' | sed 's/.*://' # => 5

$ echo '' | sed 's/.*://' # => (empty)

$ echo ':' | sed 's/.*://' # => (empty)
$ echo ':b' | sed 's/.*://' # => b
$ echo '::c' | sed 's/.*://' # => c

$ echo 'a' | sed 's/.*://' # => a
$ echo 'a:' | sed 's/.*://' # => (empty)
$ echo 'a:b' | sed 's/.*://' # => b
$ echo 'a::c' | sed 's/.*://' # => c

Ab Irato , Nov 13, 2013 at 16:10

If your last field is a single character, you could do this:
a="1:2:3:4:5"

echo ${a: -1}
echo ${a:(-1)}

Check string manipulation in bash .

gniourf_gniourf , Nov 13, 2013 at 16:15

This doesn't work: it gives the last character of a , not the last field . – gniourf_gniourf Nov 13 '13 at 16:15

Ab Irato , Nov 25, 2013 at 13:25

True, that's the idea, if you know the length of the last field it's good. If not you have to use something else... – Ab Irato Nov 25 '13 at 13:25

sphakka , Jan 25, 2016 at 16:24

Interesting, I didn't know of these particular Bash string manipulations. It also resembles to Python's string/array slicing . – sphakka Jan 25 '16 at 16:24

ghostdog74 , Jul 2, 2010 at 1:16

Using Bash.
$ var1="1:2:3:4:0"
$ IFS=":"
$ set -- $var1
$ eval echo  \$${#}
0

Sopalajo de Arrierez , Dec 24, 2014 at 5:04

I would buy some details about this method, please :-) . – Sopalajo de Arrierez Dec 24 '14 at 5:04

Rafa , Apr 27, 2017 at 22:10

Could have used echo ${!#} instead of eval echo \$${#} . – Rafa Apr 27 '17 at 22:10

Crytis , Dec 7, 2016 at 6:51

echo "a:b:c:d:e"|xargs -d : -n1|tail -1

First use xargs split it using ":",-n1 means every line only have one part.Then,pring the last part.

BDL , Dec 7, 2016 at 13:47

Although this might solve the problem, one should always add an explanation to it. – BDL Dec 7 '16 at 13:47

Crytis , Jun 7, 2017 at 9:13

already added.. – Crytis Jun 7 '17 at 9:13

021 , Apr 26, 2016 at 11:33

There are many good answers here, but still I want to share this one using basename :
 basename $(echo "a:b:c:d:e" | tr ':' '/')

However it will fail if there are already some '/' in your string . If slash / is your delimiter then you just have to (and should) use basename.

It's not the best answer but it just shows how you can be creative using bash commands.

Nahid Akbar , Jun 22, 2012 at 2:55

for x in `echo $str | tr ";" "\n"`; do echo $x; done

chepner , Jun 22, 2012 at 12:58

This runs into problems if there is whitespace in any of the fields. Also, it does not directly address the question of retrieving the last field. – chepner Jun 22 '12 at 12:58

Christoph Böddeker , Feb 19 at 15:50

For those that comfortable with Python, https://github.com/Russell91/pythonpy is a nice choice to solve this problem.
$ echo "a:b:c:d:e" | py -x 'x.split(":")[-1]'

From the pythonpy help: -x treat each row of stdin as x .

With that tool, it is easy to write python code that gets applied to the input.

baz , Nov 24, 2017 at 19:27

a solution using the read builtin
IFS=':' read -a field <<< "1:2:3:4:5"
echo ${field[4]}

[Nov 08, 2018] How do I split a string on a delimiter in Bash?

Notable quotes:
"... Bash shell script split array ..."
"... associative array ..."
"... pattern substitution ..."
"... Debian GNU/Linux ..."
Nov 08, 2018 | stackoverflow.com

stefanB , May 28, 2009 at 2:03

I have this string stored in a variable:
IN="[email protected];[email protected]"

Now I would like to split the strings by ; delimiter so that I have:

ADDR1="[email protected]"
ADDR2="[email protected]"

I don't necessarily need the ADDR1 and ADDR2 variables. If they are elements of an array that's even better.


After suggestions from the answers below, I ended up with the following which is what I was after:

#!/usr/bin/env bash

IN="[email protected];[email protected]"

mails=$(echo $IN | tr ";" "\n")

for addr in $mails
do
    echo "> [$addr]"
done

Output:

> [[email protected]]
> [[email protected]]

There was a solution involving setting Internal_field_separator (IFS) to ; . I am not sure what happened with that answer, how do you reset IFS back to default?

RE: IFS solution, I tried this and it works, I keep the old IFS and then restore it:

IN="[email protected];[email protected]"

OIFS=$IFS
IFS=';'
mails2=$IN
for x in $mails2
do
    echo "> [$x]"
done

IFS=$OIFS

BTW, when I tried

mails2=($IN)

I only got the first string when printing it in loop, without brackets around $IN it works.

Brooks Moses , May 1, 2012 at 1:26

With regards to your "Edit2": You can simply "unset IFS" and it will return to the default state. There's no need to save and restore it explicitly unless you have some reason to expect that it's already been set to a non-default value. Moreover, if you're doing this inside a function (and, if you aren't, why not?), you can set IFS as a local variable and it will return to its previous value once you exit the function. – Brooks Moses May 1 '12 at 1:26

dubiousjim , May 31, 2012 at 5:21

@BrooksMoses: (a) +1 for using local IFS=... where possible; (b) -1 for unset IFS , this doesn't exactly reset IFS to its default value, though I believe an unset IFS behaves the same as the default value of IFS ($' \t\n'), however it seems bad practice to be assuming blindly that your code will never be invoked with IFS set to a custom value; (c) another idea is to invoke a subshell: (IFS=$custom; ...) when the subshell exits IFS will return to whatever it was originally. – dubiousjim May 31 '12 at 5:21

nicooga , Mar 7, 2016 at 15:32

I just want to have a quick look at the paths to decide where to throw an executable, so I resorted to run ruby -e "puts ENV.fetch('PATH').split(':')" . If you want to stay pure bash won't help but using any scripting language that has a built-in split is easier. – nicooga Mar 7 '16 at 15:32

Jeff , Apr 22 at 17:51

This is kind of a drive-by comment, but since the OP used email addresses as the example, has anyone bothered to answer it in a way that is fully RFC 5322 compliant, namely that any quoted string can appear before the @ which means you're going to need regular expressions or some other kind of parser instead of naive use of IFS or other simplistic splitter functions. – Jeff Apr 22 at 17:51

user2037659 , Apr 26 at 20:15

for x in $(IFS=';';echo $IN); do echo "> [$x]"; doneuser2037659 Apr 26 at 20:15

Johannes Schaub - litb , May 28, 2009 at 2:23

You can set the internal field separator (IFS) variable, and then let it parse into an array. When this happens in a command, then the assignment to IFS only takes place to that single command's environment (to read ). It then parses the input according to the IFS variable value into an array, which we can then iterate over.
IFS=';' read -ra ADDR <<< "$IN"
for i in "${ADDR[@]}"; do
    # process "$i"
done

It will parse one line of items separated by ; , pushing it into an array. Stuff for processing whole of $IN , each time one line of input separated by ; :

 while IFS=';' read -ra ADDR; do
      for i in "${ADDR[@]}"; do
          # process "$i"
      done
 done <<< "$IN"

Chris Lutz , May 28, 2009 at 2:25

This is probably the best way. How long will IFS persist in it's current value, can it mess up my code by being set when it shouldn't be, and how can I reset it when I'm done with it? – Chris Lutz May 28 '09 at 2:25

Johannes Schaub - litb , May 28, 2009 at 3:04

now after the fix applied, only within the duration of the read command :) – Johannes Schaub - litb May 28 '09 at 3:04

lhunath , May 28, 2009 at 6:14

You can read everything at once without using a while loop: read -r -d '' -a addr <<< "$in" # The -d '' is key here, it tells read not to stop at the first newline (which is the default -d) but to continue until EOF or a NULL byte (which only occur in binary data). – lhunath May 28 '09 at 6:14

Charles Duffy , Jul 6, 2013 at 14:39

@LucaBorrione Setting IFS on the same line as the read with no semicolon or other separator, as opposed to in a separate command, scopes it to that command -- so it's always "restored"; you don't need to do anything manually. – Charles Duffy Jul 6 '13 at 14:39

chepner , Oct 2, 2014 at 3:50

@imagineerThis There is a bug involving herestrings and local changes to IFS that requires $IN to be quoted. The bug is fixed in bash 4.3. – chepner Oct 2 '14 at 3:50

palindrom , Mar 10, 2011 at 9:00

Taken from Bash shell script split array :
IN="[email protected];[email protected]"
arrIN=(${IN//;/ })

Explanation:

This construction replaces all occurrences of ';' (the initial // means global replace) in the string IN with ' ' (a single space), then interprets the space-delimited string as an array (that's what the surrounding parentheses do).

The syntax used inside of the curly braces to replace each ';' character with a ' ' character is called Parameter Expansion .

There are some common gotchas:

  1. If the original string has spaces, you will need to use IFS :
    • IFS=':'; arrIN=($IN); unset IFS;
  2. If the original string has spaces and the delimiter is a new line, you can set IFS with:
    • IFS=$'\n'; arrIN=($IN); unset IFS;

Oz123 , Mar 21, 2011 at 18:50

I just want to add: this is the simplest of all, you can access array elements with ${arrIN[1]} (starting from zeros of course) – Oz123 Mar 21 '11 at 18:50

KomodoDave , Jan 5, 2012 at 15:13

Found it: the technique of modifying a variable within a ${} is known as 'parameter expansion'. – KomodoDave Jan 5 '12 at 15:13

qbolec , Feb 25, 2013 at 9:12

Does it work when the original string contains spaces? – qbolec Feb 25 '13 at 9:12

Ethan , Apr 12, 2013 at 22:47

No, I don't think this works when there are also spaces present... it's converting the ',' to ' ' and then building a space-separated array. – Ethan Apr 12 '13 at 22:47

Charles Duffy , Jul 6, 2013 at 14:39

This is a bad approach for other reasons: For instance, if your string contains ;*; , then the * will be expanded to a list of filenames in the current directory. -1 – Charles Duffy Jul 6 '13 at 14:39

Chris Lutz , May 28, 2009 at 2:09

If you don't mind processing them immediately, I like to do this:
for i in $(echo $IN | tr ";" "\n")
do
  # process
done

You could use this kind of loop to initialize an array, but there's probably an easier way to do it. Hope this helps, though.

Chris Lutz , May 28, 2009 at 2:42

You should have kept the IFS answer. It taught me something I didn't know, and it definitely made an array, whereas this just makes a cheap substitute. – Chris Lutz May 28 '09 at 2:42

Johannes Schaub - litb , May 28, 2009 at 2:59

I see. Yeah i find doing these silly experiments, i'm going to learn new things each time i'm trying to answer things. I've edited stuff based on #bash IRC feedback and undeleted :) – Johannes Schaub - litb May 28 '09 at 2:59

lhunath , May 28, 2009 at 6:12

-1, you're obviously not aware of wordsplitting, because it's introducing two bugs in your code. one is when you don't quote $IN and the other is when you pretend a newline is the only delimiter used in wordsplitting. You are iterating over every WORD in IN, not every line, and DEFINATELY not every element delimited by a semicolon, though it may appear to have the side-effect of looking like it works. – lhunath May 28 '09 at 6:12

Johannes Schaub - litb , May 28, 2009 at 17:00

You could change it to echo "$IN" | tr ';' '\n' | while read -r ADDY; do # process "$ADDY"; done to make him lucky, i think :) Note that this will fork, and you can't change outer variables from within the loop (that's why i used the <<< "$IN" syntax) then – Johannes Schaub - litb May 28 '09 at 17:00

mklement0 , Apr 24, 2013 at 14:13

To summarize the debate in the comments: Caveats for general use : the shell applies word splitting and expansions to the string, which may be undesired; just try it with. IN="[email protected];[email protected];*;broken apart" . In short: this approach will break, if your tokens contain embedded spaces and/or chars. such as * that happen to make a token match filenames in the current folder. – mklement0 Apr 24 '13 at 14:13

F. Hauri , Apr 13, 2013 at 14:20

Compatible answer

To this SO question, there is already a lot of different way to do this in bash . But bash has many special features, so called bashism that work well, but that won't work in any other shell .

In particular, arrays , associative array , and pattern substitution are pure bashisms and may not work under other shells .

On my Debian GNU/Linux , there is a standard shell called dash , but I know many people who like to use ksh .

Finally, in very small situation, there is a special tool called busybox with his own shell interpreter ( ash ).

Requested string

The string sample in SO question is:

IN="[email protected];[email protected]"

As this could be useful with whitespaces and as whitespaces could modify the result of the routine, I prefer to use this sample string:

 IN="[email protected];[email protected];Full Name <[email protected]>"
Split string based on delimiter in bash (version >=4.2)

Under pure bash, we may use arrays and IFS :

var="[email protected];[email protected];Full Name <[email protected]>"
oIFS="$IFS"
IFS=";"
declare -a fields=($var)
IFS="$oIFS"
unset oIFS

IFS=\; read -a fields <<<"$var"

Using this syntax under recent bash don't change $IFS for current session, but only for the current command:

set | grep ^IFS=
IFS=$' \t\n'

Now the string var is split and stored into an array (named fields ):

set | grep ^fields=\\\|^var=
fields=([0]="[email protected]" [1]="[email protected]" [2]="Full Name <[email protected]>")
var='[email protected];[email protected];Full Name <[email protected]>'

We could request for variable content with declare -p :

declare -p var fields
declare -- var="[email protected];[email protected];Full Name <[email protected]>"
declare -a fields=([0]="[email protected]" [1]="[email protected]" [2]="Full Name <[email protected]>")

read is the quickiest way to do the split, because there is no forks and no external resources called.

From there, you could use the syntax you already know for processing each field:

for x in "${fields[@]}";do
    echo "> [$x]"
    done
> [[email protected]]
> [[email protected]]
> [Full Name <[email protected]>]

or drop each field after processing (I like this shifting approach):

while [ "$fields" ] ;do
    echo "> [$fields]"
    fields=("${fields[@]:1}")
    done
> [[email protected]]
> [[email protected]]
> [Full Name <[email protected]>]

or even for simple printout (shorter syntax):

printf "> [%s]\n" "${fields[@]}"
> [[email protected]]
> [[email protected]]
> [Full Name <[email protected]>]
Split string based on delimiter in shell

But if you would write something usable under many shells, you have to not use bashisms .

There is a syntax, used in many shells, for splitting a string across first or last occurrence of a substring:

${var#*SubStr}  # will drop begin of string up to first occur of `SubStr`
${var##*SubStr} # will drop begin of string up to last occur of `SubStr`
${var%SubStr*}  # will drop part of string from last occur of `SubStr` to the end
${var%%SubStr*} # will drop part of string from first occur of `SubStr` to the end

(The missing of this is the main reason of my answer publication ;)

As pointed out by Score_Under :

# and % delete the shortest possible matching string, and

## and %% delete the longest possible.

This little sample script work well under bash , dash , ksh , busybox and was tested under Mac-OS's bash too:

var="[email protected];[email protected];Full Name <[email protected]>"
while [ "$var" ] ;do
    iter=${var%%;*}
    echo "> [$iter]"
    [ "$var" = "$iter" ] && \
        var='' || \
        var="${var#*;}"
  done
> [[email protected]]
> [[email protected]]
> [Full Name <[email protected]>]

Have fun!

Score_Under , Apr 28, 2015 at 16:58

The # , ## , % , and %% substitutions have what is IMO an easier explanation to remember (for how much they delete): # and % delete the shortest possible matching string, and ## and %% delete the longest possible. – Score_Under Apr 28 '15 at 16:58

sorontar , Oct 26, 2016 at 4:36

The IFS=\; read -a fields <<<"$var" fails on newlines and add a trailing newline. The other solution removes a trailing empty field. – sorontar Oct 26 '16 at 4:36

Eric Chen , Aug 30, 2017 at 17:50

The shell delimiter is the most elegant answer, period. – Eric Chen Aug 30 '17 at 17:50

sancho.s , Oct 4 at 3:42

Could the last alternative be used with a list of field separators set somewhere else? For instance, I mean to use this as a shell script, and pass a list of field separators as a positional parameter. – sancho.s Oct 4 at 3:42

F. Hauri , Oct 4 at 7:47

Yes, in a loop: for sep in "#" "ł" "@" ; do ... var="${var#*$sep}" ...F. Hauri Oct 4 at 7:47

DougW , Apr 27, 2015 at 18:20

I've seen a couple of answers referencing the cut command, but they've all been deleted. It's a little odd that nobody has elaborated on that, because I think it's one of the more useful commands for doing this type of thing, especially for parsing delimited log files.

In the case of splitting this specific example into a bash script array, tr is probably more efficient, but cut can be used, and is more effective if you want to pull specific fields from the middle.

Example:

$ echo "[email protected];[email protected]" | cut -d ";" -f 1
[email protected]
$ echo "[email protected];[email protected]" | cut -d ";" -f 2
[email protected]

You can obviously put that into a loop, and iterate the -f parameter to pull each field independently.

This gets more useful when you have a delimited log file with rows like this:

2015-04-27|12345|some action|an attribute|meta data

cut is very handy to be able to cat this file and select a particular field for further processing.

MisterMiyagi , Nov 2, 2016 at 8:42

Kudos for using cut , it's the right tool for the job! Much cleared than any of those shell hacks. – MisterMiyagi Nov 2 '16 at 8:42

uli42 , Sep 14, 2017 at 8:30

This approach will only work if you know the number of elements in advance; you'd need to program some more logic around it. It also runs an external tool for every element. – uli42 Sep 14 '17 at 8:30

Louis Loudog Trottier , May 10 at 4:20

Excatly waht i was looking for trying to avoid empty string in a csv. Now i can point the exact 'column' value as well. Work with IFS already used in a loop. Better than expected for my situation. – Louis Loudog Trottier May 10 at 4:20

, May 28, 2009 at 10:31

How about this approach:
IN="[email protected];[email protected]" 
set -- "$IN" 
IFS=";"; declare -a Array=($*) 
echo "${Array[@]}" 
echo "${Array[0]}" 
echo "${Array[1]}"

Source

Yzmir Ramirez , Sep 5, 2011 at 1:06

+1 ... but I wouldn't name the variable "Array" ... pet peev I guess. Good solution. – Yzmir Ramirez Sep 5 '11 at 1:06

ata , Nov 3, 2011 at 22:33

+1 ... but the "set" and declare -a are unnecessary. You could as well have used just IFS";" && Array=($IN)ata Nov 3 '11 at 22:33

Luca Borrione , Sep 3, 2012 at 9:26

+1 Only a side note: shouldn't it be recommendable to keep the old IFS and then restore it? (as shown by stefanB in his edit3) people landing here (sometimes just copying and pasting a solution) might not think about this – Luca Borrione Sep 3 '12 at 9:26

Charles Duffy , Jul 6, 2013 at 14:44

-1: First, @ata is right that most of the commands in this do nothing. Second, it uses word-splitting to form the array, and doesn't do anything to inhibit glob-expansion when doing so (so if you have glob characters in any of the array elements, those elements are replaced with matching filenames). – Charles Duffy Jul 6 '13 at 14:44

John_West , Jan 8, 2016 at 12:29

Suggest to use $'...' : IN=$'[email protected];[email protected];bet <d@\ns* kl.com>' . Then echo "${Array[2]}" will print a string with newline. set -- "$IN" is also neccessary in this case. Yes, to prevent glob expansion, the solution should include set -f . – John_West Jan 8 '16 at 12:29

Steven Lizarazo , Aug 11, 2016 at 20:45

This worked for me:
string="1;2"
echo $string | cut -d';' -f1 # output is 1
echo $string | cut -d';' -f2 # output is 2

Pardeep Sharma , Oct 10, 2017 at 7:29

this is sort and sweet :) – Pardeep Sharma Oct 10 '17 at 7:29

space earth , Oct 17, 2017 at 7:23

Thanks...Helped a lot – space earth Oct 17 '17 at 7:23

mojjj , Jan 8 at 8:57

cut works only with a single char as delimiter. – mojjj Jan 8 at 8:57

lothar , May 28, 2009 at 2:12

echo "[email protected];[email protected]" | sed -e 's/;/\n/g'
[email protected]
[email protected]

Luca Borrione , Sep 3, 2012 at 10:08

-1 what if the string contains spaces? for example IN="this is first line; this is second line" arrIN=( $( echo "$IN" | sed -e 's/;/\n/g' ) ) will produce an array of 8 elements in this case (an element for each word space separated), rather than 2 (an element for each line semi colon separated) – Luca Borrione Sep 3 '12 at 10:08

lothar , Sep 3, 2012 at 17:33

@Luca No the sed script creates exactly two lines. What creates the multiple entries for you is when you put it into a bash array (which splits on white space by default) – lothar Sep 3 '12 at 17:33

Luca Borrione , Sep 4, 2012 at 7:09

That's exactly the point: the OP needs to store entries into an array to loop over it, as you can see in his edits. I think your (good) answer missed to mention to use arrIN=( $( echo "$IN" | sed -e 's/;/\n/g' ) ) to achieve that, and to advice to change IFS to IFS=$'\n' for those who land here in the future and needs to split a string containing spaces. (and to restore it back afterwards). :) – Luca Borrione Sep 4 '12 at 7:09

lothar , Sep 4, 2012 at 16:55

@Luca Good point. However the array assignment was not in the initial question when I wrote up that answer. – lothar Sep 4 '12 at 16:55

Ashok , Sep 8, 2012 at 5:01

This also works:
IN="[email protected];[email protected]"
echo ADD1=`echo $IN | cut -d \; -f 1`
echo ADD2=`echo $IN | cut -d \; -f 2`

Be careful, this solution is not always correct. In case you pass "[email protected]" only, it will assign it to both ADD1 and ADD2.

fersarr , Mar 3, 2016 at 17:17

You can use -s to avoid the mentioned problem: superuser.com/questions/896800/ "-f, --fields=LIST select only these fields; also print any line that contains no delimiter character, unless the -s option is specified" – fersarr Mar 3 '16 at 17:17

Tony , Jan 14, 2013 at 6:33

I think AWK is the best and efficient command to resolve your problem. AWK is included in Bash by default in almost every Linux distribution.
echo "[email protected];[email protected]" | awk -F';' '{print $1,$2}'

will give

[email protected] [email protected]

Of course your can store each email address by redefining the awk print field.

Jaro , Jan 7, 2014 at 21:30

Or even simpler: echo "[email protected];[email protected]" | awk 'BEGIN{RS=";"} {print}' – Jaro Jan 7 '14 at 21:30

Aquarelle , May 6, 2014 at 21:58

@Jaro This worked perfectly for me when I had a string with commas and needed to reformat it into lines. Thanks. – Aquarelle May 6 '14 at 21:58

Eduardo Lucio , Aug 5, 2015 at 12:59

It worked in this scenario -> "echo "$SPLIT_0" | awk -F' inode=' '{print $1}'"! I had problems when trying to use atrings (" inode=") instead of characters (";"). $ 1, $ 2, $ 3, $ 4 are set as positions in an array! If there is a way of setting an array... better! Thanks! – Eduardo Lucio Aug 5 '15 at 12:59

Tony , Aug 6, 2015 at 2:42

@EduardoLucio, what I'm thinking about is maybe you can first replace your delimiter inode= into ; for example by sed -i 's/inode\=/\;/g' your_file_to_process , then define -F';' when apply awk , hope that can help you. – Tony Aug 6 '15 at 2:42

nickjb , Jul 5, 2011 at 13:41

A different take on Darron's answer , this is how I do it:
IN="[email protected];[email protected]"
read ADDR1 ADDR2 <<<$(IFS=";"; echo $IN)

ColinM , Sep 10, 2011 at 0:31

This doesn't work. – ColinM Sep 10 '11 at 0:31

nickjb , Oct 6, 2011 at 15:33

I think it does! Run the commands above and then "echo $ADDR1 ... $ADDR2" and i get "[email protected] ... [email protected]" output – nickjb Oct 6 '11 at 15:33

Nick , Oct 28, 2011 at 14:36

This worked REALLY well for me... I used it to itterate over an array of strings which contained comma separated DB,SERVER,PORT data to use mysqldump. – Nick Oct 28 '11 at 14:36

dubiousjim , May 31, 2012 at 5:28

Diagnosis: the IFS=";" assignment exists only in the $(...; echo $IN) subshell; this is why some readers (including me) initially think it won't work. I assumed that all of $IN was getting slurped up by ADDR1. But nickjb is correct; it does work. The reason is that echo $IN command parses its arguments using the current value of $IFS, but then echoes them to stdout using a space delimiter, regardless of the setting of $IFS. So the net effect is as though one had called read ADDR1 ADDR2 <<< "[email protected] [email protected]" (note the input is space-separated not ;-separated). – dubiousjim May 31 '12 at 5:28

sorontar , Oct 26, 2016 at 4:43

This fails on spaces and newlines, and also expand wildcards * in the echo $IN with an unquoted variable expansion. – sorontar Oct 26 '16 at 4:43

gniourf_gniourf , Jun 26, 2014 at 9:11

In Bash, a bullet proof way, that will work even if your variable contains newlines:
IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")

Look:

$ in=$'one;two three;*;there is\na newline\nin this field'
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array
declare -a array='([0]="one" [1]="two three" [2]="*" [3]="there is
a newline
in this field")'

The trick for this to work is to use the -d option of read (delimiter) with an empty delimiter, so that read is forced to read everything it's fed. And we feed read with exactly the content of the variable in , with no trailing newline thanks to printf . Note that's we're also putting the delimiter in printf to ensure that the string passed to read has a trailing delimiter. Without it, read would trim potential trailing empty fields:

$ in='one;two;three;'    # there's an empty field
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array
declare -a array='([0]="one" [1]="two" [2]="three" [3]="")'

the trailing empty field is preserved.


Update for Bash≥4.4

Since Bash 4.4, the builtin mapfile (aka readarray ) supports the -d option to specify a delimiter. Hence another canonical way is:

mapfile -d ';' -t array < <(printf '%s;' "$in")

John_West , Jan 8, 2016 at 12:10

I found it as the rare solution on that list that works correctly with \n , spaces and * simultaneously. Also, no loops; array variable is accessible in the shell after execution (contrary to the highest upvoted answer). Note, in=$'...' , it does not work with double quotes. I think, it needs more upvotes. – John_West Jan 8 '16 at 12:10

Darron , Sep 13, 2010 at 20:10

How about this one liner, if you're not using arrays:
IFS=';' read ADDR1 ADDR2 <<<$IN

dubiousjim , May 31, 2012 at 5:36

Consider using read -r ... to ensure that, for example, the two characters "\t" in the input end up as the same two characters in your variables (instead of a single tab char). – dubiousjim May 31 '12 at 5:36

Luca Borrione , Sep 3, 2012 at 10:07

-1 This is not working here (ubuntu 12.04). Adding echo "ADDR1 $ADDR1"\n echo "ADDR2 $ADDR2" to your snippet will output ADDR1 [email protected] [email protected]\nADDR2 (\n is newline) – Luca Borrione Sep 3 '12 at 10:07

chepner , Sep 19, 2015 at 13:59

This is probably due to a bug involving IFS and here strings that was fixed in bash 4.3. Quoting $IN should fix it. (In theory, $IN is not subject to word splitting or globbing after it expands, meaning the quotes should be unnecessary. Even in 4.3, though, there's at least one bug remaining--reported and scheduled to be fixed--so quoting remains a good idea.) – chepner Sep 19 '15 at 13:59

sorontar , Oct 26, 2016 at 4:55

This breaks if $in contain newlines even if $IN is quoted. And adds a trailing newline. – sorontar Oct 26 '16 at 4:55

kenorb , Sep 11, 2015 at 20:54

Here is a clean 3-liner:
in="foo@bar;bizz@buzz;fizz@buzz;buzz@woof"
IFS=';' list=($in)
for item in "${list[@]}"; do echo $item; done

where IFS delimit words based on the separator and () is used to create an array . Then [@] is used to return each item as a separate word.

If you've any code after that, you also need to restore $IFS , e.g. unset IFS .

sorontar , Oct 26, 2016 at 5:03

The use of $in unquoted allows wildcards to be expanded. – sorontar Oct 26 '16 at 5:03

user2720864 , Sep 24 at 13:46

+ for the unset command – user2720864 Sep 24 at 13:46

Emilien Brigand , Aug 1, 2016 at 13:15

Without setting the IFS

If you just have one colon you can do that:

a="foo:bar"
b=${a%:*}
c=${a##*:}

you will get:

b = foo
c = bar

Victor Choy , Sep 16, 2015 at 3:34

There is a simple and smart way like this:
echo "add:sfff" | xargs -d: -i  echo {}

But you must use gnu xargs, BSD xargs cant support -d delim. If you use apple mac like me. You can install gnu xargs :

brew install findutils

then

echo "add:sfff" | gxargs -d: -i  echo {}

Halle Knast , May 24, 2017 at 8:42

The following Bash/zsh function splits its first argument on the delimiter given by the second argument:
split() {
    local string="$1"
    local delimiter="$2"
    if [ -n "$string" ]; then
        local part
        while read -d "$delimiter" part; do
            echo $part
        done <<< "$string"
        echo $part
    fi
}

For instance, the command

$ split 'a;b;c' ';'

yields

a
b
c

This output may, for instance, be piped to other commands. Example:

$ split 'a;b;c' ';' | cat -n
1   a
2   b
3   c

Compared to the other solutions given, this one has the following advantages:

If desired, the function may be put into a script as follows:

#!/usr/bin/env bash

split() {
    # ...
}

split "$@"

sandeepkunkunuru , Oct 23, 2017 at 16:10

works and neatly modularized. – sandeepkunkunuru Oct 23 '17 at 16:10

Prospero , Sep 25, 2011 at 1:09

This is the simplest way to do it.
spo='one;two;three'
OIFS=$IFS
IFS=';'
spo_array=($spo)
IFS=$OIFS
echo ${spo_array[*]}

rashok , Oct 25, 2016 at 12:41

IN="[email protected];[email protected]"
IFS=';'
read -a IN_arr <<< "${IN}"
for entry in "${IN_arr[@]}"
do
    echo $entry
done

Output

[email protected]
[email protected]

System : Ubuntu 12.04.1

codeforester , Jan 2, 2017 at 5:37

IFS is not getting set in the specific context of read here and hence it can upset rest of the code, if any. – codeforester Jan 2 '17 at 5:37

shuaihanhungry , Jan 20 at 15:54

you can apply awk to many situations
echo "[email protected];[email protected]"|awk -F';' '{printf "%s\n%s\n", $1, $2}'

also you can use this

echo "[email protected];[email protected]"|awk -F';' '{print $1,$2}' OFS="\n"

ghost , Apr 24, 2013 at 13:13

If no space, Why not this?
IN="[email protected];[email protected]"
arr=(`echo $IN | tr ';' ' '`)

echo ${arr[0]}
echo ${arr[1]}

eukras , Oct 22, 2012 at 7:10

There are some cool answers here (errator esp.), but for something analogous to split in other languages -- which is what I took the original question to mean -- I settled on this:
IN="[email protected];[email protected]"
declare -a a="(${IN/;/ })";

Now ${a[0]} , ${a[1]} , etc, are as you would expect. Use ${#a[*]} for number of terms. Or to iterate, of course:

for i in ${a[*]}; do echo $i; done

IMPORTANT NOTE:

This works in cases where there are no spaces to worry about, which solved my problem, but may not solve yours. Go with the $IFS solution(s) in that case.

olibre , Oct 7, 2013 at 13:33

Does not work when IN contains more than two e-mail addresses. Please refer to same idea (but fixed) at palindrom's answerolibre Oct 7 '13 at 13:33

sorontar , Oct 26, 2016 at 5:14

Better use ${IN//;/ } (double slash) to make it also work with more than two values. Beware that any wildcard ( *?[ ) will be expanded. And a trailing empty field will be discarded. – sorontar Oct 26 '16 at 5:14

jeberle , Apr 30, 2013 at 3:10

Use the set built-in to load up the $@ array:
IN="[email protected];[email protected]"
IFS=';'; set $IN; IFS=$' \t\n'

Then, let the party begin:

echo $#
for a; do echo $a; done
ADDR1=$1 ADDR2=$2

sorontar , Oct 26, 2016 at 5:17

Better use set -- $IN to avoid some issues with "$IN" starting with dash. Still, the unquoted expansion of $IN will expand wildcards ( *?[ ). – sorontar Oct 26 '16 at 5:17

NevilleDNZ , Sep 2, 2013 at 6:30

Two bourne-ish alternatives where neither require bash arrays:

Case 1 : Keep it nice and simple: Use a NewLine as the Record-Separator... eg.

IN="[email protected]
[email protected]"

while read i; do
  # process "$i" ... eg.
    echo "[email:$i]"
done <<< "$IN"

Note: in this first case no sub-process is forked to assist with list manipulation.

Idea: Maybe it is worth using NL extensively internally , and only converting to a different RS when generating the final result externally .

Case 2 : Using a ";" as a record separator... eg.

NL="
" IRS=";" ORS=";"

conv_IRS() {
  exec tr "$1" "$NL"
}

conv_ORS() {
  exec tr "$NL" "$1"
}

IN="[email protected];[email protected]"
IN="$(conv_IRS ";" <<< "$IN")"

while read i; do
  # process "$i" ... eg.
    echo -n "[email:$i]$ORS"
done <<< "$IN"

In both cases a sub-list can be composed within the loop is persistent after the loop has completed. This is useful when manipulating lists in memory, instead storing lists in files. {p.s. keep calm and carry on B-) }

fedorqui , Jan 8, 2015 at 10:21

Apart from the fantastic answers that were already provided, if it is just a matter of printing out the data you may consider using awk :
awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "$IN"

This sets the field separator to ; , so that it can loop through the fields with a for loop and print accordingly.

Test
$ IN="[email protected];[email protected]"
$ awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "$IN"
> [[email protected]]
> [[email protected]]

With another input:

$ awk -F";" '{for (i=1;i<=NF;i++) printf("> [%s]\n", $i)}' <<< "a;b;c   d;e_;f"
> [a]
> [b]
> [c   d]
> [e_]
> [f]

18446744073709551615 , Feb 20, 2015 at 10:49

In Android shell, most of the proposed methods just do not work:
$ IFS=':' read -ra ADDR <<<"$PATH"                             
/system/bin/sh: can't create temporary file /sqlite_stmt_journals/mksh.EbNoR10629: No such file or directory

What does work is:

$ for i in ${PATH//:/ }; do echo $i; done
/sbin
/vendor/bin
/system/sbin
/system/bin
/system/xbin

where // means global replacement.

sorontar , Oct 26, 2016 at 5:08

Fails if any part of $PATH contains spaces (or newlines). Also expands wildcards (asterisk *, question mark ? and braces [ ]). – sorontar Oct 26 '16 at 5:08

Eduardo Lucio , Apr 4, 2016 at 19:54

Okay guys!

Here's my answer!

DELIMITER_VAL='='

read -d '' F_ABOUT_DISTRO_R <<"EOF"
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS"
NAME="Ubuntu"
VERSION="14.04.4 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.4 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
EOF

SPLIT_NOW=$(awk -F$DELIMITER_VAL '{for(i=1;i<=NF;i++){printf "%s\n", $i}}' <<<"${F_ABOUT_DISTRO_R}")
while read -r line; do
   SPLIT+=("$line")
done <<< "$SPLIT_NOW"
for i in "${SPLIT[@]}"; do
    echo "$i"
done

Why this approach is "the best" for me?

Because of two reasons:

  1. You do not need to escape the delimiter;
  2. You will not have problem with blank spaces . The value will be properly separated in the array!

[]'s

gniourf_gniourf , Jan 30, 2017 at 8:26

FYI, /etc/os-release and /etc/lsb-release are meant to be sourced, and not parsed. So your method is really wrong. Moreover, you're not quite answering the question about spiltting a string on a delimiter.gniourf_gniourf Jan 30 '17 at 8:26

Michael Hale , Jun 14, 2012 at 17:38

A one-liner to split a string separated by ';' into an array is:
IN="[email protected];[email protected]"
ADDRS=( $(IFS=";" echo "$IN") )
echo ${ADDRS[0]}
echo ${ADDRS[1]}

This only sets IFS in a subshell, so you don't have to worry about saving and restoring its value.

Luca Borrione , Sep 3, 2012 at 10:04

-1 this doesn't work here (ubuntu 12.04). it prints only the first echo with all $IN value in it, while the second is empty. you can see it if you put echo "0: "${ADDRS[0]}\n echo "1: "${ADDRS[1]} the output is 0: [email protected];[email protected]\n 1: (\n is new line) – Luca Borrione Sep 3 '12 at 10:04

Luca Borrione , Sep 3, 2012 at 10:05

please refer to nickjb's answer at for a working alternative to this idea stackoverflow.com/a/6583589/1032370 – Luca Borrione Sep 3 '12 at 10:05

Score_Under , Apr 28, 2015 at 17:09

-1, 1. IFS isn't being set in that subshell (it's being passed to the environment of "echo", which is a builtin, so nothing is happening anyway). 2. $IN is quoted so it isn't subject to IFS splitting. 3. The process substitution is split by whitespace, but this may corrupt the original data. – Score_Under Apr 28 '15 at 17:09

ajaaskel , Oct 10, 2014 at 11:33

IN='[email protected];[email protected];Charlie Brown <[email protected];!"#$%&/()[]{}*? are no problem;simple is beautiful :-)'
set -f
oldifs="$IFS"
IFS=';'; arrayIN=($IN)
IFS="$oldifs"
for i in "${arrayIN[@]}"; do
echo "$i"
done
set +f

Output:

[email protected]
[email protected]
Charlie Brown <[email protected]
!"#$%&/()[]{}*? are no problem
simple is beautiful :-)

Explanation: Simple assignment using parenthesis () converts semicolon separated list into an array provided you have correct IFS while doing that. Standard FOR loop handles individual items in that array as usual. Notice that the list given for IN variable must be "hard" quoted, that is, with single ticks.

IFS must be saved and restored since Bash does not treat an assignment the same way as a command. An alternate workaround is to wrap the assignment inside a function and call that function with a modified IFS. In that case separate saving/restoring of IFS is not needed. Thanks for "Bize" for pointing that out.

gniourf_gniourf , Feb 20, 2015 at 16:45

!"#$%&/()[]{}*? are no problem well... not quite: []*? are glob characters. So what about creating this directory and file: `mkdir '!"#$%&'; touch '!"#$%&/()[]{} got you hahahaha - are no problem' and running your command? simple may be beautiful, but when it's broken, it's broken. – gniourf_gniourf Feb 20 '15 at 16:45

ajaaskel , Feb 25, 2015 at 7:20

@gniourf_gniourf The string is stored in a variable. Please see the original question. – ajaaskel Feb 25 '15 at 7:20

gniourf_gniourf , Feb 25, 2015 at 7:26

@ajaaskel you didn't fully understand my comment. Go in a scratch directory and issue these commands: mkdir '!"#$%&'; touch '!"#$%&/()[]{} got you hahahaha - are no problem' . They will only create a directory and a file, with weird looking names, I must admit. Then run your commands with the exact IN you gave: IN='[email protected];[email protected];Charlie Brown <[email protected];!"#$%&/()[]{}*? are no problem;simple is beautiful :-)' . You'll see that you won't get the output you expect. Because you're using a method subject to pathname expansions to split your string. – gniourf_gniourf Feb 25 '15 at 7:26

gniourf_gniourf , Feb 25, 2015 at 7:29

This is to demonstrate that the characters * , ? , [...] and even, if extglob is set, !(...) , @(...) , ?(...) , +(...) are problems with this method! – gniourf_gniourf Feb 25 '15 at 7:29

ajaaskel , Feb 26, 2015 at 15:26

@gniourf_gniourf Thanks for detailed comments on globbing. I adjusted the code to have globbing off. My point was however just to show that rather simple assignment can do the splitting job. – ajaaskel Feb 26 '15 at 15:26

> , Dec 19, 2013 at 21:39

Maybe not the most elegant solution, but works with * and spaces:
IN="bla@so me.com;*;[email protected]"
for i in `delims=${IN//[^;]}; seq 1 $((${#delims} + 1))`
do
   echo "> [`echo $IN | cut -d';' -f$i`]"
done

Outputs

> [bla@so me.com]
> [*]
> [[email protected]]

Other example (delimiters at beginning and end):

IN=";bla@so me.com;*;[email protected];"
> []
> [bla@so me.com]
> [*]
> [[email protected]]
> []

Basically it removes every character other than ; making delims eg. ;;; . Then it does for loop from 1 to number-of-delimiters as counted by ${#delims} . The final step is to safely get the $i th part using cut .

[Oct 20, 2017] Simple logical operators in Bash - Stack Overflow

Notable quotes:
"... Backquotes ( ` ` ) are old-style form of command substitution, with some differences: in this form, backslash retains its literal meaning except when followed by $ , ` , or \ , and the first backquote not preceded by a backslash terminates the command substitution; whereas in the $( ) form, all characters between the parentheses make up the command, none are treated specially. ..."
"... Double square brackets delimit a Conditional Expression. And, I find the following to be a good reading on the subject: "(IBM) Demystify test, [, [[, ((, and if-then-else" ..."
Oct 20, 2017 | stackoverflow.com

Amit , Jun 7, 2011 at 19:18

I have a couple of variables and I want to check the following condition (written out in words, then my failed attempt at bash scripting):
if varA EQUALS 1 AND ( varB EQUALS "t1" OR varB EQUALS "t2" ) then 

do something

done.

And in my failed attempt, I came up with:

if (($varA == 1)) && ( (($varB == "t1")) || (($varC == "t2")) ); 
  then
    scale=0.05
  fi

Best answer Gilles

What you've written actually almost works (it would work if all the variables were numbers), but it's not an idiomatic way at all.

This is the idiomatic way to write your test in bash:

if [[ $varA = 1 && ($varB = "t1" || $varC = "t2") ]]; then

If you need portability to other shells, this would be the way (note the additional quoting and the separate sets of brackets around each individual test):

if [ "$varA" = 1 ] && { [ "$varB" = "t1" ] || [ "$varC" = "t2" ]; }; then

Will Sheppard , Jun 19, 2014 at 11:07

It's better to use == to differentiate the comparison from assigning a variable (which is also = ) – Will Sheppard Jun 19 '14 at 11:07

Cbhihe , Apr 3, 2016 at 8:05

+1 @WillSheppard for yr reminder of proper style. Gilles, don't you need a semicolon after yr closing curly bracket and before "then" ? I always thought if , then , else and fi could not be on the same line... As in:

if [ "$varA" = 1 ] && { [ "$varB" = "t1" ] || [ "$varC" = "t2" ]; }; then

Cbhihe Apr 3 '16 at 8:05

Rockallite , Jan 19 at 2:41

Backquotes ( ` ` ) are old-style form of command substitution, with some differences: in this form, backslash retains its literal meaning except when followed by $ , ` , or \ , and the first backquote not preceded by a backslash terminates the command substitution; whereas in the $( ) form, all characters between the parentheses make up the command, none are treated specially.

Rockallite Jan 19 at 2:41

Peter A. Schneider , Aug 28 at 13:16

You could emphasize that single brackets have completely different semantics inside and outside of double brackets. (Because you start with explicitly pointing out the subshell semantics but then only as an aside mention the grouping semantics as part of conditional expressions. Was confusing to me for a second when I looked at your idiomatic example.) – Peter A. Schneider Aug 28 at 13:16

matchew , Jun 7, 2011 at 19:29

very close
if (( $varA == 1 )) && [[ $varB == 't1' || $varC == 't2' ]]; 
  then 
    scale=0.05
  fi

should work.

breaking it down

(( $varA == 1 ))

is an integer comparison where as

$varB == 't1'

is a string comparison. otherwise, I am just grouping the comparisons correctly.

Double square brackets delimit a Conditional Expression. And, I find the following to be a good reading on the subject: "(IBM) Demystify test, [, [[, ((, and if-then-else"

Peter A. Schneider , Aug 28 at 13:21

Just to be sure: The quoting in 't1' is unnecessary, right? Because as opposed to arithmetic instructions in double parentheses, where t1 would be a variable, t1 in a conditional expression in double brackets is just a literal string.

I.e., [[ $varB == 't1' ]] is exactly the same as [[ $varB == t1 ]] , right? – Peter A. Schneider Aug 28 at 13:21

[Oct 20, 2017] shell script - OR in `expr match`

Notable quotes:
"... ...and if you weren't targeting a known/fixed operating system, using case rather than a regex match is very much the better practice, since the accepted answer depends on behavior POSIX doesn't define. ..."
"... Regular expression syntax, including the use of backquoting, is different for different tools. Always look it up. ..."
Oct 20, 2017 | unix.stackexchange.com

OR in `expr match` up vote down vote favorite

stracktracer , Dec 14, 2015 at 13:54

I'm confused as to why this does not match:

expr match Unauthenticated123 '^(Unauthenticated|Authenticated).*'

it outputs 0.

Charles Duffy , Dec 14, 2015 at 18:22

As an aside, if you were using bash for this, the preferred alternative would be the =~ operator in [[ ]] , ie. [[ Unauthenticated123 =~ ^(Unauthenticated|Authenticated) ]]Charles Duffy Dec 14 '15 at 18:22

Charles Duffy , Dec 14, 2015 at 18:25

...and if you weren't targeting a known/fixed operating system, using case rather than a regex match is very much the better practice, since the accepted answer depends on behavior POSIX doesn't define. Charles Duffy Dec 14 '15 at 18:25

Gilles , Dec 14, 2015 at 23:43

See Why does my regular expression work in X but not in Y?Gilles Dec 14 '15 at 23:43

Lambert , Dec 14, 2015 at 14:04

Your command should be:
expr match Unauthenticated123 'Unauthenticated\|Authenticated'

If you want the number of characters matched.

To have the part of the string (Unauthenticated) returned use:

expr match Unauthenticated123 '\(Unauthenticated\|Authenticated\)'

From info coreutils 'expr invocation' :

STRING : REGEX' Perform pattern matching. The arguments are converted to strings and the second is considered to be a (basic, a la GNU grep') regular expression, with a `^' implicitly prepended. The first argument is then matched against this regular expression.

 If the match succeeds and REGEX uses `\(' and `\)', the `:'
 expression returns the part of STRING that matched the
 subexpression; otherwise, it returns the number of characters
 matched.

 If the match fails, the `:' operator returns the null string if
 `\(' and `\)' are used in REGEX, otherwise 0.

 Only the first `\( ... \)' pair is relevant to the return value;
 additional pairs are meaningful only for grouping the regular
 expression operators.

 In the regular expression, `\+', `\?', and `\|' are operators
 which respectively match one or more, zero or one, or separate
 alternatives.  SunOS and other `expr''s treat these as regular
 characters.  (POSIX allows either behavior.)  *Note Regular
 Expression Library: (regex)Top, for details of regular expression
 syntax.  Some examples are in *note Examples of expr::.

stracktracer , Dec 14, 2015 at 14:18

Thanks escaping the | worked. Weird, normally I'd expect it if I wanted to match the literal |... – stracktracer Dec 14 '15 at 14:18

reinierpost , Dec 14, 2015 at 15:34

Regular expression syntax, including the use of backquoting, is different for different tools. Always look it up.reinierpost Dec 14 '15 at 15:34

Stéphane Chazelas , Dec 14, 2015 at 14:49

Note that both match and \| are GNU extensions (and the behaviour for : (the match standard equivalent) when the pattern starts with ^ varies with implementations). Standardly, you'd do:
expr " $string" : " Authenticated" '|' " $string" : " Unauthenticated"

The leading space is to avoid problems with values of $string that start with - or are expr operators, but that means it adds one to the number of characters being matched.

With GNU expr , you'd write it:

expr + "$string" : 'Authenticated\|Unauthenticated'

The + forces $string to be taken as a string even if it happens to be a expr operator. expr regular expressions are basic regular expressions which don't have an alternation operator (and where | is not special). The GNU implementation has it as \| though as an extension.

If all you want is to check whether $string starts with Authenticated or Unauthenticated , you'd better use:

case $string in
  (Authenticated* | Unauthenticated*) do-something
esac

netmonk , Dec 14, 2015 at 14:06

$ expr match "Unauthenticated123" '^\(Unauthenticated\|Authenticated\).*' you have to escape with \ the parenthesis and the pipe.

mikeserv , Dec 14, 2015 at 14:18

and the ^ may not mean what some would think depending on the expr . it is implied anyway. – mikeserv Dec 14 '15 at 14:18

Stéphane Chazelas , Dec 14, 2015 at 14:34

@mikeserv, match and \| are GNU extensions anyway. This Q&A seems to be about GNU expr anyway (where ^ is guaranteed to mean match at the beginning of the string ). – Stéphane Chazelas Dec 14 '15 at 14:34

mikeserv , Dec 14, 2015 at 14:49

@StéphaneChazelas - i didn't know they were strictly GNU. i think i remember them being explicitly officially unspecified - but i don't use expr too often anyway and didn't know that. thank you. – mikeserv Dec 14 '15 at 14:49

Random832 , Dec 14, 2015 at 16:13

It's not "strictly GNU" - it's present in a number of historical implementations (even System V had it, undocumented, though it didn't have the others like substr/length/index), which is why it's explicitly unspecified. I can't find anything about \| being an extension. – Random832 Dec 14 '15 at 16:13

[Oct 17, 2017] Converting string to lower case in Bash - Stack Overflow

Feb 15, 2010 | stackoverflow.com

assassin , Feb 15, 2010 at 7:02

Is there a way in bash to convert a string into a lower case string?

For example, if I have:

a="Hi all"

I want to convert it to:

"hi all"

ghostdog74 , Feb 15, 2010 at 7:43

The are various ways: tr
$ echo "$a" | tr '[:upper:]' '[:lower:]'
hi all
AWK
$ echo "$a" | awk '{print tolower($0)}'
hi all
Bash 4.0
$ echo "${a,,}"
hi all
Perl
$ echo "$a" | perl -ne 'print lc'
hi all
Bash
lc(){
    case "$1" in
        [A-Z])
        n=$(printf "%d" "'$1")
        n=$((n+32))
        printf \\$(printf "%o" "$n")
        ;;
        *)
        printf "%s" "$1"
        ;;
    esac
}
word="I Love Bash"
for((i=0;i<${#word};i++))
do
    ch="${word:$i:1}"
    lc "$ch"
done

jangosteve , Jan 14, 2012 at 21:58

Am I missing something, or does your last example (in Bash) actually do something completely different? It works for "ABX", but if you instead make word="Hi All" like the other examples, it returns ha , not hi all . It only works for the capitalized letters and skips the already-lowercased letters. – jangosteve Jan 14 '12 at 21:58

Richard Hansen , Feb 3, 2012 at 18:55

Note that only the tr and awk examples are specified in the POSIX standard. – Richard Hansen Feb 3 '12 at 18:55

Richard Hansen , Feb 3, 2012 at 18:58

tr '[:upper:]' '[:lower:]' will use the current locale to determine uppercase/lowercase equivalents, so it'll work with locales that use letters with diacritical marks. – Richard Hansen Feb 3 '12 at 18:58

Adam Parkin , Sep 25, 2012 at 18:01

How does one get the output into a new variable? Ie say I want the lowercased string into a new variable? – Adam Parkin Sep 25 '12 at 18:01

Tino , Nov 14, 2012 at 15:39

@Adam: b="$(echo $a | tr '[A-Z]' '[a-z]')"Tino Nov 14 '12 at 15:39

Dennis Williamson , Feb 15, 2010 at 10:31

In Bash 4:

To lowercase

$ string="A FEW WORDS"
$ echo "${string,}"
a FEW WORDS
$ echo "${string,,}"
a few words
$ echo "${string,,[AEIUO]}"
a FeW WoRDS

$ string="A Few Words"
$ declare -l string
$ string=$string; echo "$string"
a few words

To uppercase

$ string="a few words"
$ echo "${string^}"
A few words
$ echo "${string^^}"
A FEW WORDS
$ echo "${string^^[aeiou]}"
A fEw wOrds

$ string="A Few Words"
$ declare -u string
$ string=$string; echo "$string"
A FEW WORDS

Toggle (undocumented, but optionally configurable at compile time)

$ string="A Few Words"
$ echo "${string~~}"
a fEW wORDS
$ string="A FEW WORDS"
$ echo "${string~}"
a FEW WORDS
$ string="a few words"
$ echo "${string~}"
A few words

Capitalize (undocumented, but optionally configurable at compile time)

$ string="a few words"
$ declare -c string
$ string=$string
$ echo "$string"
A few words

Title case:

$ string="a few words"
$ string=($string)
$ string="${string[@]^}"
$ echo "$string"
A Few Words

$ declare -c string
$ string=(a few words)
$ echo "${string[@]}"
A Few Words

$ string="a FeW WOrdS"
$ string=${string,,}
$ string=${string~}
$ echo "$string"

To turn off a declare attribute, use + . For example, declare +c string . This affects subsequent assignments and not the current value.

The declare options change the attribute of the variable, but not the contents. The reassignments in my examples update the contents to show the changes.

Edit:

Added "toggle first character by word" ( ${var~} ) as suggested by ghostdog74

Edit: Corrected tilde behavior to match Bash 4.3.

ghostdog74 , Feb 15, 2010 at 10:52

there's also ${string~}ghostdog74 Feb 15 '10 at 10:52

Hubert Kario , Jul 12, 2012 at 16:48

Quite bizzare, "^^" and ",," operators don't work on non-ASCII characters but "~~" does... So string="łódź"; echo ${string~~} will return "ŁÓDŹ", but echo ${string^^} returns "łóDź". Even in LC_ALL=pl_PL.utf-8 . That's using bash 4.2.24. – Hubert Kario Jul 12 '12 at 16:48

Dennis Williamson , Jul 12, 2012 at 18:20

@HubertKario: That's weird. It's the same for me in Bash 4.0.33 with the same string in en_US.UTF-8 . It's a bug and I've reported it. – Dennis Williamson Jul 12 '12 at 18:20

Dennis Williamson , Jul 13, 2012 at 0:44

@HubertKario: Try echo "$string" | tr '[:lower:]' '[:upper:]' . It will probably exhibit the same failure. So the problem is at least partly not Bash's. – Dennis Williamson Jul 13 '12 at 0:44

Dennis Williamson , Jul 14, 2012 at 14:27

@HubertKario: The Bash maintainer has acknowledged the bug and stated that it will be fixed in the next release. – Dennis Williamson Jul 14 '12 at 14:27

shuvalov , Feb 15, 2010 at 7:13

echo "Hi All" | tr "[:upper:]" "[:lower:]"

Richard Hansen , Feb 3, 2012 at 19:00

+1 for not assuming english – Richard Hansen Feb 3 '12 at 19:00

Hubert Kario , Jul 12, 2012 at 16:56

@RichardHansen: tr doesn't work for me for non-ACII characters. I do have correct locale set and locale files generated. Have any idea what could I be doing wrong? – Hubert Kario Jul 12 '12 at 16:56

wasatchwizard , Oct 23, 2014 at 16:42

FYI: This worked on Windows/Msys. Some of the other suggestions did not. – wasatchwizard Oct 23 '14 at 16:42

Ignacio Vazquez-Abrams , Feb 15, 2010 at 7:03

tr :
a="$(tr [A-Z] [a-z] <<< "$a")"
AWK :
{ print tolower($0) }
sed :
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/

Sandeepan Nath , Feb 2, 2011 at 11:12

+1 a="$(tr [A-Z] [a-z] <<< "$a")" looks easiest to me. I am still a beginner... – Sandeepan Nath Feb 2 '11 at 11:12

Haravikk , Oct 19, 2013 at 12:54

I strongly recommend the sed solution; I've been working in an environment that for some reason doesn't have tr but I've yet to find a system without sed , plus a lot of the time I want to do this I've just done something else in sed anyway so can chain the commands together into a single (long) statement. – Haravikk Oct 19 '13 at 12:54

Dennis , Nov 6, 2013 at 19:49

The bracket expressions should be quoted. In tr [A-Z] [a-z] A , the shell may perform filename expansion if there are filenames consisting of a single letter or nullgob is set. tr "[A-Z]" "[a-z]" A will behave properly. – Dennis Nov 6 '13 at 19:49

Haravikk , Jun 15, 2014 at 10:51

@CamiloMartin it's a BusyBox system where I'm having that problem, specifically Synology NASes, but I've encountered it on a few other systems too. I've been doing a lot of cross-platform shell scripting lately, and with the requirement that nothing extra be installed it makes things very tricky! However I've yet to encounter a system without sedHaravikk Jun 15 '14 at 10:51

fuz , Jan 31, 2016 at 14:54

Note that tr [A-Z] [a-z] is incorrect in almost all locales. for example, in the en-US locale, A-Z is actually the interval AaBbCcDdEeFfGgHh...XxYyZ . – fuz Jan 31 '16 at 14:54

nettux443 , May 14, 2014 at 9:36

I know this is an oldish post but I made this answer for another site so I thought I'd post it up here:

UPPER -> lower : use python:

b=`echo "print '$a'.lower()" | python`

Or Ruby:

b=`echo "print '$a'.downcase" | ruby`

Or Perl (probably my favorite):

b=`perl -e "print lc('$a');"`

Or PHP:

b=`php -r "print strtolower('$a');"`

Or Awk:

b=`echo "$a" | awk '{ print tolower($1) }'`

Or Sed:

b=`echo "$a" | sed 's/./\L&/g'`

Or Bash 4:

b=${a,,}

Or NodeJS if you have it (and are a bit nuts...):

b=`echo "console.log('$a'.toLowerCase());" | node`

You could also use dd (but I wouldn't!):

b=`echo "$a" | dd  conv=lcase 2> /dev/null`

lower -> UPPER

use python:

b=`echo "print '$a'.upper()" | python`

Or Ruby:

b=`echo "print '$a'.upcase" | ruby`

Or Perl (probably my favorite):

b=`perl -e "print uc('$a');"`

Or PHP:

b=`php -r "print strtoupper('$a');"`

Or Awk:

b=`echo "$a" | awk '{ print toupper($1) }'`

Or Sed:

b=`echo "$a" | sed 's/./\U&/g'`

Or Bash 4:

b=${a^^}

Or NodeJS if you have it (and are a bit nuts...):

b=`echo "console.log('$a'.toUpperCase());" | node`

You could also use dd (but I wouldn't!):

b=`echo "$a" | dd  conv=ucase 2> /dev/null`

Also when you say 'shell' I'm assuming you mean bash but if you can use zsh it's as easy as

b=$a:l

for lower case and

b=$a:u

for upper case.

JESii , May 28, 2015 at 21:42

Neither the sed command nor the bash command worked for me. – JESii May 28 '15 at 21:42

nettux443 , Nov 20, 2015 at 14:33

@JESii both work for me upper -> lower and lower-> upper. I'm using sed 4.2.2 and Bash 4.3.42(1) on 64bit Debian Stretch. – nettux443 Nov 20 '15 at 14:33

JESii , Nov 21, 2015 at 17:34

Hi, @nettux443... I just tried the bash operation again and it still fails for me with the error message "bad substitution". I'm on OSX using homebrew's bash: GNU bash, version 4.3.42(1)-release (x86_64-apple-darwin14.5.0) – JESii Nov 21 '15 at 17:34

tripleee , Jan 16, 2016 at 11:45

Do not use! All of the examples which generate a script are extremely brittle; if the value of a contains a single quote, you have not only broken behavior, but a serious security problem. – tripleee Jan 16 '16 at 11:45

Scott Smedley , Jan 27, 2011 at 5:37

In zsh:
echo $a:u

Gotta love zsh!

Scott Smedley , Jan 27, 2011 at 5:39

or $a:l for lower case conversion – Scott Smedley Jan 27 '11 at 5:39

biocyberman , Jul 24, 2015 at 23:26

Add one more case: echo ${(C)a} #Upcase the first char onlybiocyberman Jul 24 '15 at 23:26

devnull , Sep 26, 2013 at 15:45

Using GNU sed :
sed 's/.*/\L&/'

Example:

$ foo="Some STRIng";
$ foo=$(echo "$foo" | sed 's/.*/\L&/')
$ echo "$foo"
some string

technosaurus , Jan 21, 2012 at 10:27

For a standard shell (without bashisms) using only builtins:
uppers=ABCDEFGHIJKLMNOPQRSTUVWXYZ
lowers=abcdefghijklmnopqrstuvwxyz

lc(){ #usage: lc "SOME STRING" -> "some string"
    i=0
    while ([ $i -lt ${#1} ]) do
        CUR=${1:$i:1}
        case $uppers in
            *$CUR*)CUR=${uppers%$CUR*};OUTPUT="${OUTPUT}${lowers:${#CUR}:1}";;
            *)OUTPUT="${OUTPUT}$CUR";;
        esac
        i=$((i+1))
    done
    echo "${OUTPUT}"
}

And for upper case:

uc(){ #usage: uc "some string" -> "SOME STRING"
    i=0
    while ([ $i -lt ${#1} ]) do
        CUR=${1:$i:1}
        case $lowers in
            *$CUR*)CUR=${lowers%$CUR*};OUTPUT="${OUTPUT}${uppers:${#CUR}:1}";;
            *)OUTPUT="${OUTPUT}$CUR";;
        esac
        i=$((i+1))
    done
    echo "${OUTPUT}"
}

Dereckson , Nov 23, 2014 at 19:52

I wonder if you didn't let some bashism in this script, as it's not portable on FreeBSD sh: ${1:$...}: Bad substitution – Dereckson Nov 23 '14 at 19:52

tripleee , Apr 14, 2015 at 7:09

Indeed; substrings with ${var:1:1} are a Bashism. – tripleee Apr 14 '15 at 7:09

Derek Shaw , Jan 24, 2011 at 13:53

Regular expression

I would like to take credit for the command I wish to share but the truth is I obtained it for my own use from http://commandlinefu.com . It has the advantage that if you cd to any directory within your own home folder that is it will change all files and folders to lower case recursively please use with caution. It is a brilliant command line fix and especially useful for those multitudes of albums you have stored on your drive.

find . -depth -exec rename 's/(.*)\/([^\/]*)/$1\/\L$2/' {} \;

You can specify a directory in place of the dot(.) after the find which denotes current directory or full path.

I hope this solution proves useful the one thing this command does not do is replace spaces with underscores - oh well another time perhaps.

Wadih M. , Nov 29, 2011 at 1:31

thanks for commandlinefu.comWadih M. Nov 29 '11 at 1:31

John Rix , Jun 26, 2013 at 15:58

This didn't work for me for whatever reason, though it looks fine. I did get this to work as an alternative though: find . -exec /bin/bash -c 'mv {} `tr [A-Z] [a-z] <<< {}`' \; – John Rix Jun 26 '13 at 15:58

Tino , Dec 11, 2015 at 16:27

This needs prename from perl : dpkg -S "$(readlink -e /usr/bin/rename)" gives perl: /usr/bin/prenameTino Dec 11 '15 at 16:27

c4f4t0r , Aug 21, 2013 at 10:21

In bash 4 you can use typeset

Example:

A="HELLO WORLD"
typeset -l A=$A

community wiki, Jan 16, 2016 at 12:26

Pre Bash 4.0

Bash Lower the Case of a string and assign to variable

VARIABLE=$(echo "$VARIABLE" | tr '[:upper:]' '[:lower:]') 

echo "$VARIABLE"

Tino , Dec 11, 2015 at 16:23

No need for echo and pipes: use $(tr '[:upper:]' '[:lower:]' <<<"$VARIABLE")Tino Dec 11 '15 at 16:23

tripleee , Jan 16, 2016 at 12:28

@Tino The here string is also not portable back to really old versions of Bash; I believe it was introduced in v3. – tripleee Jan 16 '16 at 12:28

Tino , Jan 17, 2016 at 14:28

@tripleee You are right, it was introduced in bash-2.05b - however that's the oldest bash I was able to find on my systems – Tino Jan 17 '16 at 14:28

Bikesh M Annur , Mar 23 at 6:48

You can try this
s="Hello World!" 

echo $s  # Hello World!

a=${s,,}
echo $a  # hello world!

b=${s^^}
echo $b  # HELLO WORLD!

ref : http://wiki.workassis.com/shell-script-convert-text-to-lowercase-and-uppercase/

Orwellophile , Mar 24, 2013 at 13:43

For Bash versions earlier than 4.0, this version should be fastest (as it doesn't fork/exec any commands):
function string.monolithic.tolower
{
   local __word=$1
   local __len=${#__word}
   local __char
   local __octal
   local __decimal
   local __result

   for (( i=0; i<__len; i++ ))
   do
      __char=${__word:$i:1}
      case "$__char" in
         [A-Z] )
            printf -v __decimal '%d' "'$__char"
            printf -v __octal '%03o' $(( $__decimal ^ 0x20 ))
            printf -v __char \\$__octal
            ;;
      esac
      __result+="$__char"
   done
   REPLY="$__result"
}

technosaurus's answer had potential too, although it did run properly for mee.

Stephen M. Harris , Mar 22, 2013 at 22:42

If using v4, this is baked-in . If not, here is a simple, widely applicable solution. Other answers (and comments) on this thread were quite helpful in creating the code below.
# Like echo, but converts to lowercase
echolcase () {
    tr [:upper:] [:lower:] <<< "${*}"
}

# Takes one arg by reference (var name) and makes it lowercase
lcase () { 
    eval "${1}"=\'$(echo ${!1//\'/"'\''"} | tr [:upper:] [:lower:] )\'
}

Notes:

JaredTS486 , Dec 23, 2015 at 17:37

In spite of how old this question is and similar to this answer by technosaurus . I had a hard time finding a solution that was portable across most platforms (That I Use) as well as older versions of bash. I have also been frustrated with arrays, functions and use of prints, echos and temporary files to retrieve trivial variables. This works very well for me so far I thought I would share. My main testing environments are:
  1. GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
  2. GNU bash, version 3.2.57(1)-release (sparc-sun-solaris2.10)
lcs="abcdefghijklmnopqrstuvwxyz"
ucs="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
input="Change Me To All Capitals"
for (( i=0; i<"${#input}"; i++ )) ; do :
    for (( j=0; j<"${#lcs}"; j++ )) ; do :
        if [[ "${input:$i:1}" == "${lcs:$j:1}" ]] ; then
            input="${input/${input:$i:1}/${ucs:$j:1}}" 
        fi
    done
done

Simple C-style for loop to iterate through the strings. For the line below if you have not seen anything like this before this is where I learned this . In this case the line checks if the char ${input:$i:1} (lower case) exists in input and if so replaces it with the given char ${ucs:$j:1} (upper case) and stores it back into input.

input="${input/${input:$i:1}/${ucs:$j:1}}"

Gus Neves , May 16 at 10:04

Many answers using external programs, which is not really using Bash .

If you know you will have Bash4 available you should really just use the ${VAR,,} notation (it is easy and cool). For Bash before 4 (My Mac still uses Bash 3.2 for example). I used the corrected version of @ghostdog74 's answer to create a more portable version.

One you can call lowercase 'my STRING' and get a lowercase version. I read comments about setting the result to a var, but that is not really portable in Bash , since we can't return strings. Printing it is the best solution. Easy to capture with something like var="$(lowercase $str)" .

How this works

The way this works is by getting the ASCII integer representation of each char with printf and then adding 32 if upper-to->lower , or subtracting 32 if lower-to->upper . Then use printf again to convert the number back to a char. From 'A' -to-> 'a' we have a difference of 32 chars.

Using printf to explain:

$ printf "%d\n" "'a"
97
$ printf "%d\n" "'A"
65

97 - 65 = 32

And this is the working version with examples.
Please note the comments in the code, as they explain a lot of stuff:

#!/bin/bash

# lowerupper.sh

# Prints the lowercase version of a char
lowercaseChar(){
    case "$1" in
        [A-Z])
            n=$(printf "%d" "'$1")
            n=$((n+32))
            printf \\$(printf "%o" "$n")
            ;;
        *)
            printf "%s" "$1"
            ;;
    esac
}

# Prints the lowercase version of a sequence of strings
lowercase() {
    word="$@"
    for((i=0;i<${#word};i++)); do
        ch="${word:$i:1}"
        lowercaseChar "$ch"
    done
}

# Prints the uppercase version of a char
uppercaseChar(){
    case "$1" in
        [a-z])
            n=$(printf "%d" "'$1")
            n=$((n-32))
            printf \\$(printf "%o" "$n")
            ;;
        *)
            printf "%s" "$1"
            ;;
    esac
}

# Prints the uppercase version of a sequence of strings
uppercase() {
    word="$@"
    for((i=0;i<${#word};i++)); do
        ch="${word:$i:1}"
        uppercaseChar "$ch"
    done
}

# The functions will not add a new line, so use echo or
# append it if you want a new line after printing

# Printing stuff directly
lowercase "I AM the Walrus!"$'\n'
uppercase "I AM the Walrus!"$'\n'

echo "----------"

# Printing a var
str="A StRing WITH mixed sTUFF!"
lowercase "$str"$'\n'
uppercase "$str"$'\n'

echo "----------"

# Not quoting the var should also work, 
# since we use "$@" inside the functions
lowercase $str$'\n'
uppercase $str$'\n'

echo "----------"

# Assigning to a var
myLowerVar="$(lowercase $str)"
myUpperVar="$(uppercase $str)"
echo "myLowerVar: $myLowerVar"
echo "myUpperVar: $myUpperVar"

echo "----------"

# You can even do stuff like
if [[ 'option 2' = "$(lowercase 'OPTION 2')" ]]; then
    echo "Fine! All the same!"
else
    echo "Ops! Not the same!"
fi

exit 0

And the results after running this:

$ ./lowerupper.sh 
i am the walrus!
I AM THE WALRUS!
----------
a string with mixed stuff!
A STRING WITH MIXED STUFF!
----------
a string with mixed stuff!
A STRING WITH MIXED STUFF!
----------
myLowerVar: a string with mixed stuff!
myUpperVar: A STRING WITH MIXED STUFF!
----------
Fine! All the same!

This should only work for ASCII characters though .

For me it is fine, since I know I will only pass ASCII chars to it.
I am using this for some case-insensitive CLI options, for example.

nitinr708 , Jul 8, 2016 at 9:20

To store the transformed string into a variable. Following worked for me - $SOURCE_NAME to $TARGET_NAME
TARGET_NAME="`echo $SOURCE_NAME | tr '[:upper:]' '[:lower:]'`"

[Jun 18, 2017] An introduction to parameter expansion in Bash by James Pannacciulli

Notable quotes:
"... parameter expansion ..."
"... var="" ..."
"... var="gnu" ..."
"... parameter expansion ..."
"... offset of 5 length of 4 ..."
"... parameter expansion ..."
"... pattern of string of _ ..."
Jun 18, 2017 | opensource.com
About conditional, substring, and substitution parameter expansion operators Conditional parameter expansion

Conditional parameter expansion allows branching on whether the parameter is unset, empty, or has content. Based on these conditions, the parameter can be expanded to its value, a default value, or an alternate value; throw a customizable error; or reassign the parameter to a default value. The following table shows the conditional parameter expansions-each row shows a parameter expansion using an operator to potentially modify the expansion, with the columns showing the result of that expansion given the parameter's status as indicated in the column headers. Operators with the ':' prefix treat parameters with empty values as if they were unset.

parameter expansion unset var var="" var="gnu"
${var-default} default - gnu
${var:-default} default default gnu
${var+alternate} - alternate alternate
${var:+alternate} - - alternate
${var?error} error - gnu
${var:?error} error error gnu

The = and := operators in the table function identically to - and :- , respectively, except that the = variants rebind the variable to the result of the expansion.

As an example, let's try opening a user's editor on a file specified by the OUT_FILE variable. If either the EDITOR environment variable or our OUT_FILE variable is not specified, we will have a problem. Using a conditional expansion, we can ensure that when the EDITOR variable is expanded, we get the specified value or at least a sane default:

$
echo
${
EDITOR
}

/usr/bin/vi
$
echo
${
EDITOR
:-
$(
which nano
)
}

/usr/bin/vi
$
unset
EDITOR
$
echo
${
EDITOR
:-
$(
which nano
)
}

/usr/bin/nano

Building on the above, we can run the editor command and abort with a helpful error at runtime if there's no filename specified:

$
${
EDITOR
:-
$(
which nano
)
}
${
OUT_FILE
:?
Missing filename
}

bash: OUT_FILE: Missing filename
Substring parameter expansion

Parameters can be expanded to just part of their contents, either by offset or by removing content matching a pattern. When specifying a substring offset, a length may optionally be specified. If running Bash version 4.2 or greater, negative numbers may be used as offsets from the end of the string. Note the parentheses used around the negative offset, which ensure that Bash does not parse the expansion as having the conditional default expansion operator from above:

$
location
=
"
CA 90095
"
$
echo
"
Zip Code: 
${
location
:
3
}
"

Zip Code: 90095
$
echo
"
Zip Code: 
${
location
:
(-5)
}
"

Zip Code: 90095
$
echo
"
State: 
${
location
:
0
:
2
}
"

State: CA

Another way to take a substring is to remove characters from the string matching a pattern, either from the left edge with the # and ## operators or from the right edge with the % and %% operators. A useful mnemonic is that # appears left of a comment and % appears right of a number. When the operator is doubled, it matches greedily, as opposed to the single version, which removes the most minimal set of characters matching the pattern.

var="open source"
parameter expansion offset of 5
length of 4
${var:offset} source
${var:offset:length} sour
pattern of *o?
${var#pattern} en source
${var##pattern} rce
pattern of ?e*
${var%pattern} open sour
${var%%pattern} o

The pattern-matching used is the same as with filename globbing: * matches zero or more of any character, ? matches exactly one of any character, [...] brackets introduce a character class match against a single character, supporting negation ( ^ ), as well as the posix character classes, e.g. . By excising characters from our string in this manner, we can take a substring without first knowing the offset of the data we need:

$
echo
$
PATH

/usr/local/bin:/usr/bin:/bin
$
echo
"
Lowest priority in PATH: 
${
PATH
##
*:
}
"

Lowest priority in PATH: /bin
$
echo
"
Everything except lowest priority: 
${
PATH
%
:*
}
"

Everything except lowest priority: /usr/local/bin:/usr/bin
$
echo
"
Highest priority in PATH: 
${
PATH
%%
:*
}
"

Highest priority in PATH: /usr/local/bin
Substitution in parameter expansion

The same types of patterns are used for substitution in parameter expansion. Substitution is introduced with the / or // operators, followed by two arguments separated by another / representing the pattern and the string to substitute. The pattern matching is always greedy, so the doubled version of the operator, in this case, causes all matches of the pattern to be replaced in the variable's expansion, while the singleton version replaces only the leftmost.

var="free and open"
parameter expansion pattern of
string of _
${var/pattern/string} free_and open
${var//pattern/string} free_and_open

The wealth of parameter expansion modifiers transforms Bash variables and other parameters into powerful tools beyond simple value stores. At the very least, it is important to understand how parameter expansion works when reading Bash scripts, but I suspect that not unlike myself, many of you will enjoy the conciseness and expressiveness that these expansion modifiers bring to your scripts as well as your interactive sessions.

[Nov 04, 2016] Coding Style rear-rear Wiki

Reading rear sources is an interesting exercise. It really demonstrates attempt to use "reasonable' style of shell programming and you can learn a lot.
Nov 04, 2016 | github.com

Relax-and-Recover is written in Bash (at least bash version 3 is needed), a language that can be used in many styles. We want to make it easier for everybody to understand the Relax-and-Recover code and subsequently to contribute fixes and enhancements.

Here is a collection of coding hints that should help to get a more consistent code base.

Don't be afraid to contribute to Relax-and-Recover even if your contribution does not fully match all this coding hints. Currently large parts of the Relax-and-Recover code are not yet in compliance with this coding hints. This is an ongoing step by step process. Nevertheless try to understand the idea behind this coding hints so that you know how to break them properly (i.e. "learn the rules so you know how to break them properly").

The overall idea behind this coding hints is:

Make yourself understood

Make yourself understood to enable others to fix and enhance your code properly as needed.

From this overall idea the following coding hints are derived.

For the fun of it an extreme example what coding style should be avoided:

#!/bin/bash for i in `seq 1 2 $((2*$1-1))`;do echo $((j+=i));done



   

Try to find out what that code is about - it does a useful thing.

Code must be easy to read Code should be easy to understand

Do not only tell what the code does (i.e. the implementation details) but also explain what the intent behind is (i.e. why ) to make the code maintainable.

Here the initial example so that one can understand what it is about:

#!/bin/bash # output the first N square numbers # by summing up the first N odd numbers 1 3 ... 2*N-1 # where each nth partial sum is the nth square number # see https://en.wikipedia.org/wiki/Square_number#Properties # this way it is a little bit faster for big N compared to # calculating each square number on its own via multiplication N=$1 if ! [[ $N =~ ^[0-9]+$ ]] ; then echo "Input must be non-negative integer." 1>&2 exit 1 fi square_number=0 for odd_number in $( seq 1 2 $(( 2 * N - 1 )) ) ; do (( square_number += odd_number )) && echo $square_number done

Now the intent behind is clear and now others can easily decide if that code is really the best way to do it and easily improve it if needed.

Try to care about possible errors

By default bash proceeds with the next command when something failed. Do not let your code blindly proceed in case of errors because that could make it hard to find the root cause of a failure when it errors out somewhere later at an unrelated place with a weird error message which could lead to false fixes that cure only a particular symptom but not the root cause.

Maintain Backward Compatibility

Implement adaptions and enhancements in a backward compatible way so that your changes do not cause regressions for others.

Dirty hacks welcome

When there are special issues on particular systems it is more important that the Relax-and-Recover code works than having nice looking clean code that sometimes fails. In such special cases any dirty hacks that intend to make it work everywhere are welcome. But for dirty hacks the above listed coding hints become mandatory rules:

For example a dirty hack like the following is perfectly acceptable:

# FIXME: Dirty hack to make it work # on "FUBAR Linux version 666" # where COMMAND sometimes inexplicably fails # but always works after at most 3 attempts # see http://example.org/issue12345 # Retries should have no bad effect on other systems # where the first run of COMMAND works. COMMAND || COMMAND || COMMAND || Error "COMMAND failed."

Character Encoding

Use only traditional (7-bit) ASCII charactes. In particular do not use UTF-8 encoded multi-byte characters.

Text Layout Variables Functions Relax-and-Recover functions

Use the available Relax-and-Recover functions when possible instead of re-implementing basic functionality again and again. The Relax-and-Recover functions are implemented in various lib/*-functions.sh files .

test, [, [[, (( Paired parenthesis See also

[Nov 01, 2014] How to determine if a string is a substring of another in bash?

I want to see if a string is inside a portion of another string.e.g.:
'ab' in 'abc' -> true
'ab' in 'bcd' -> false

How can I do this in a conditional of a bash script?

A: You can use the form ${VAR/subs} where VAR contains the bigger string and subs is the substring your are trying to find:

my_string=abc
substring=ab
if [ "${my_string/$substring}" = "$my_string" ] ; then
  echo "${substring} is not in ${my_string}"
else
  echo "${substring} was found in ${my_string}"
fi

This works because ${VAR/subs} is equal to $VAR but with the first occurrence of the string subs removed, in particular if $VAR does not contains the word subs it won't be modified.

I think that you should change the sequence of the echo statements. Because I get ab is not in abc

Mmm.. No, the script is wrong. Like that I get ab was found in abc, but if I use substring=z I get z was found in abcLucio May 25 '13 at 0:08

===

Sorry again I forgot the $ in substring. – edwin May 25 '13 at 0:10

Now I get ab is not in abc. But z was found in abc. This is funny :D – Lucio May 25 '13 at 0:11

===

[[ "ab" =~ "bcd" ]]
[[ "ab" =~ "abc" ]]

the brackets are for the test, and as it is double brackets, it can so some extra tests like =~.

So you could use this form something like

var1="ab"
var2="bcd"
if [[ "$var2" =~ "$var1" ]]; then
    echo "pass"
else
    echo "fail"
fi

Edit: corrected "=~", had flipped.

I get fail with this parameters: var2="abcd"Lucio May 25 '13 at 0:02

===

@Lucio The correct is [[ $string =~ $substring ]]. I updated the answer. – Eric Carvalho May 25 '13 at 0:38

===

@EricCarvalho opps, thanks for correcting it. – demure May 25 '13 at 0:49

===

Using bash filename patterns (aka "glob" patterns)

substr=ab
[[ abc == *"$substr"* ]] && echo yes || echo no    # yes
[[ bcd == *"$substr"* ]] && echo yes || echo no    # no
===

The following two approaches will work on any POSIX-compatible environment, not just in bash:

substr=ab
for s in abc bcd; do
    if case ${s} in *"${substr}"*) true;; *) false;; esac; then
        printf %s\\n "'${s}' contains '${substr}'"
    else
        printf %s\\n "'${s}' does not contain '${substr}'"
    fi
done
substr=ab
for s in abc bcd; do
    if printf %s\\n "${s}" | grep -qF "${substr}"; then
        printf %s\\n "'${s}' contains '${substr}'"
    else
        printf %s\\n "'${s}' does not contain '${substr}'"
    fi
done

Both of the above output:

'abc' contains 'ab'
'bcd' does not contain 'ab'

The former has the advantage of not spawning a separate grep process.

Note that I use printf %s\\n "${foo}" instead of echo "${foo}" because echo might mangle ${foo} if it contains backslashes.

===

Mind the [[ and ":

[[ $a == z* ]]   # True if $a starts with an "z" (pattern matching).
[[ $a == "z*" ]] # True if $a is equal to z* (literal matching).

[ $a == z* ]     # File globbing and word splitting take place.
[ "$a" == "z*" ] # True if $a is equal to z* (literal matching).

So as @glenn_jackman said, but mind that if you wrap the whole second term in double quotes, it will switch the test to literal matching.

Source: http://tldp.org/LDP/abs/html/comparison-ops.html

[Mar 16, 2011] Bash pattern substitution

Here's the actual formal definition from the bash man pages:
 ${parameter/pattern/string}
 ${parameter//pattern/string}

The pattern is expanded to produce a patter t- tern against its value is replaced with string. In the first form, only the first match is replaced. The second form causes all matches of pattern to be replaced with string. If pattern begins with #, it must match at the beginning of the expanded value of parameter. If pattern begins with %, it must match at the end of the expanded value of parameter. If string is null, matches of pattern are deleted and the / following pattern may be omitted. If parameter is @ or *, the substitution operation is applied to each positional parameter in turn, and the expan- sion is the resultant list. If parameter is an array variable subscripted with @ or *, the substitution operation is applied to each member of the array in turn, and the expansion is the resultant list.

[Mar 16, 2011] Bash info, scripting examples, regex parameter substitution, interactive shell, and more

Regular expressions and globbing

Globbing is use of * as a wildcard to glob file name list together. Use of wildcards is not a regular expression.

These following examples should also work inside bash scripts. These may or may not be compatible with sh. These are "interesting" regex or globbing examples. I say "interesting" because they don't seem to follow the path of "true" regular expressions used by Perl.

[mst3k@zeus ~]$ echo ${HOME/\/home\//}
mst3k
[mst3k@zeus ~]$ echo ${HOME##home}
/home/mst3k
[mst3k@zeus ~]$ echo ${HOME##/home}
/mst3k
[mst3k@zeus ~]$ echo ${HOME##/home/}
mst3k
[mst3k@zeus ~]$ echo ${HOME##*}

[mst3k@zeus ~]$ echo ${HOME##*/}

Trouble with the string replacement function in bash

I'm having trouble with the string replacement function in bash. The problem is that I want to replace a printing character, in this case &, by a non-printing character, either new line or null in this case. I don't see how to specify the non-printing character in the string replacement function ${variable//a/b}.

I have a long, URL-encoded-like file name that I would like to parse with grep. I have used & as a delimiter between variables within the long file name. I would like to use the string replacement function in bash to search for all instances of & and replace each one with either the null character or the new line character since grep can recognize either one.

How do I specify a non-printing character in the bash string replacement function ?

Thank you.

Special Syntax
Submitted by Mitch Frazier on Fri, 04/02/2010 - 11:43.
Use the $'\xNN' syntax for the non-printing character. Note though that a NULL character does not work:

$ cat j.sh

v="hello=yes&world=no"

v2=${v/&/$'\x0a'}
# ^^^^^^^ change to newline
echo -n ">>$v2<<" | hexdump -C

v2=${v/&/$'\x00'}
# ^^^^^^^ change to null (doesn't work)
echo -n ">>$v2<<" | hexdump -C
If you run this you can see that the substitution works for a newline but not for a NULL:

$ sh j.sh
00000000 3e 3e 68 65 6c 6c 6f 3d 79 65 73 0a 77 6f 72 6c |>>hello=yes.worl|
00000010 64 3d 6e 6f 3c 3c |d=no<<|
00000016
00000000 3e 3e 68 65 6c 6c 6f 3d 79 65 73 77 6f 72 6c 64 |>>hello=yesworld|
00000010 3d 6e 6f 3c 3c |=no<<|
00000015
Mitch Frazier is an Associate Editor for Linux Journal.

Multiple operations?
Submitted by Anonymous on Wed, 03/24/2010 - 10:56.
Very interesting article.
A question: is it possible to use in the same expression many operators, as:
${var#t*is%t*st} which uses both '#t*is' and '%t*st' which gives 'is a' in the example?
I tried some forms but it doesn't work... Has someone an idea?

Doesn't Work
Submitted by Mitch Frazier on Wed, 03/24/2010 - 14:59.
You can't do multiple operations in one expression.

Mitch Frazier is an Associate Editor for Linux Journal.

Variable
Submitted by First question (not verified) on Tue, 09/01/2009 - 07:07.
I want to do something like this using linux bash script:
a1="Chris Alonso"
i="1"
echo $a$i #I only trying to write: echo $a1 using the variable i

Someone can help me, please?

Eval
Submitted by Mitch Frazier on Tue, 09/01/2009 - 13:41.
Eval will do this for you but you may decide you really don't want to do this after seeing it:

eval echo \$$(echo a$i)
or
eval echo \$`echo a$i`
A slightly less complicated sequence would be something like:

v=a$i
eval echo \$$v
It looks like what you're trying to do here is simulate arrays. If that's the case then you'd be better or using bash's built-in arrays.

Mitch Frazier is an Associate Editor for Linux Journal.

How about v=a$i echo ${!v}
Submitted by Anonymous (not verified) on Fri, 09/25/2009 - 15:09.
How about

v=a$i
echo ${!v}

simplification of indirect reference
Submitted by Anonymous on Fri, 03/12/2010 - 23:16.
Is there any way to rid the statements of the variable assignment? As in, make it so that:


echo ${!a$i}

works? I'm thinking that there has to be a way to escape the "a$i" inside the indirect reference construct. I have a case where I'm trying to do this with the result of a regex match, and am not able to figure out the right syntax:


SRC_FOLDER=/var/website
needleA_FOLDER=/var/www
needleB_FOLDER=/var/htdocs

for item in ${ARRAY[@]; do
[[ "$item" =~ hay(needle)stack ]] &&
DIR=${!${BASH_REMATCH[1]}_FOLDER};
cp -R $SRC_FOLDER/* $DIR;
done;

But the seventh line (with the indirect reference) chokes with a "bad substitution" error. I should be able to do this on one line, without using eval with the right syntax, no?

Sincerely,
Tyler

Yes
Submitted by Mitch Frazier on Fri, 09/25/2009 - 15:36.
That works and is simpler than my solution.

Mitch Frazier is an Associate Editor for Linux Journal.

: or not to :
Submitted by Ash (not verified) on Mon, 01/08/2007 - 07:47.
Interesting, ":" can be ommited for "numeric" variables (script/function arguments).

baz=${var:-bar}
vs.

baz=${1-bar}
First time I thought it is a typo, but it is not.

interesting, this will save a few seds and greps!
Submitted by mangoo (not verified) on Tue, 01/29/2008 - 06:24.
Interesting article, this will save me a few seds, greps and awks!

what if we are to operate on
Submitted by MgBaMa req (not verified) on Mon, 09/03/2007 - 22:44.
what if we are to operate on the param $1 $2, ...?
i mean is it feasible to see a result of ${4%/*}
to get a valule as from
$ export $1="this is a test/none"
$ echo ${$1/*}
> this is a test
$ echo ${$2#*/}
> none
?
thanks

variable contents confusing
Submitted by Paul Archerr (not verified) on Wed, 03/29/2006 - 09:39.
Minor typos not withstanding, I had a bit of a problem with the values of the variables used. assigning the value 'bar' to the variable bar makes it confusing to quickly figure out which is which. (Is that 'bar' another variable? Or a value?)
I would suggest making the simple change of putting your values in uppercase. They would stand out and make the article more readable.
For example:
$ export var=var
$ echo ${var}bar # var exists so this works as expected
varbar
$ echo $varbar # varbar doesn't exist, so this doesn't
$

becomes

$ export var=VAR
$ echo ${var}bar # var exists so this works as expected
VARbar
$ echo $varbar # varbar doesn't exist, so this doesn't
$

You can see how the 'VARbar' on the third line becomes differentiated from the 'varbar' on the fourth line.

var, bar, +varbar: worst things for programming since Microsoft.
Submitted by Anonymous (not verified) on Mon, 09/10/2007 - 14:35.
Using var and bar to try to inform someone is pretty much uniformly bad everywhere it's done, as var and bar explicitly indicate things that don't have any meaning whatsoever. Varbar is even worse, since it is visibly only different from var bar because of a single " " (space).

If you're trying to confuse the reader, use var, bar, and especially varbar.

If you're trying to be informative, please, give your damn variables a short but logically useful name.

Part II?
Submitted by Anonymous (not verified) on Fri, 03/24/2006 - 08:58.
OK but there's a lot more to it than just this. How about some of the following?

${var:pos[:len]} # extract substr from pos (0-based) for len

${var/substr/repl} # replace first match
${var//substr/repl} # replace all matches
${var/#substr/repl} # replace if matches at beginning (non-greedy)
${var/##substr/repl} # replace if matches at beginning (greedy)
${var/%substr/repl} # replace if matches at end (non-greedy)
${var/%%substr/repl} # replace if matches at end (greedy)

${#var} # returns length of $var
${!var} # indirect expansion

...Sorry, those round parens


Submitted by Anonymous (not verified) on Fri, 03/24/2006 - 09:07.
...Sorry, those round parens should be curlies.

Interesting
Submitted by Stephanie (not verified) on Sun, 03/12/2006 - 21:51.
I think they author did a great job explaining the article and am glad that I was able to learn from it and finally found something interesting to read online!

Examples in Table 1 are rubbish
Submitted by Anonymous (not verified) on Sun, 03/12/2006 - 20:56.
In addition to the incorrect $var= (should be var=), the last two examples don't illustrate the use of the construct they are supposed to . Pity the author did not proof-read the first table.

Examples using same operator yet differing results?!
Submitted by really-txtedmacs (not verified) on Fri, 03/10/2006 - 19:46.
${var#t*is} deletes the shortest possible match from the left export $var="this is a test" echo ${var#t*is} is a test

fine, but next in line is supposed to remove the maximum from the left, but uses the same exact operator, how does it get the correct result?

${var##t*is} deletes the longest possible match from the left export $var="this is a test" echo ${var#t*is} a test

Get's worse when going from the right, the original operation from the right is employed. Moreover, on my system an Ubuntu 05.10 descktop, this gave:

txtedmacs@phpserver:~$ export $var="this is a test"
bash: export: `=this is a test': not a valid identifier

Take out the $var, and it works fine.

Much easier to catch someone else's errors than one's own - I hate looking at my articles or emails.

Errors in article
Submitted by Anonymous (not verified) on Mon, 03/13/2006 - 22:49.
As really-txtedmacs tried to politely point out, there are errors in the Pattern Matching table - Example column, as of when he looked at it and as of now. Each instance of "export $var" should be "export var" in bash and most similar shells. Also, the operator in the echo command needs to match exactly the operator in the first column. Interestingly, some but not all of these errors still exist in the original article at http://linuxgazette.net/issue57/eyler.html, which is in issue 57, not 67.
Otherwise, a very good article. I will save the info in my bag of tricks.

You're channelling Larry Wall, dude!
Submitted by Jim Dennis (not verified) on Sat, 03/11/2006 - 18:30.
In Bourne shell and its ilk (like Korn shell and bash) the assignment syntax is:
var=... You only prefix a variable's name with $ when you're "dereferencing" it (expanding it into its value).

So the shell was parsing
export $var="this is a test" as:

export ???="this is a test" (where ??? is whatever "var" was set to before this statement ... probably the empty string if the variable was previously unset).

I know this is confusing because Perl does it completely differently. In Perl the $ is a "sigil" which, on an "lvalue" (a variable name or other assignable token) tells the interpeter what "type" of assignment is occuring. Thus a Perl statement like:
$var="this is a test"; (note the required semicolon, too) is a "scalar" assignment. This also sets the context of the assignment. In Perl a scalar value in scalar context is conceptually the closest to a normal shell variable assignment. However, a list value in a scalar assignment context is a different beast entirely. So a line of Perl like
perl -e '@bar=(1,2,3)]; $var=@bar; print $var ;' will set $var to the number of items in the bar array. (Of course we could use @var for the array name since they are different namespaces in Perl. But I wanted my example to be clear). So an array/list value in scalar context returns an integer (a type of scalar) which represents the number of elements in the list.

Anyway, just remembrer that the shell $ it more like the C programming * operator ... it dereferences the variable into its value.

JimD
The Linux Gazette "Answer Guy"

USA <> World :-)
Submitted by peter.green on Fri, 03/10/2006 - 14:57.
Although the # and % identifiers may not seem obvious, they have a convenient mnemonic. The # key is on the left side of the $ key and operates from the left.
In the USA, perhaps, but my UK keyboard has the # key nestling up against the Enter and right-Shift keys. Not to mention layouts such as Dvorak...!

Other (non-USA-specific?!) Mnemonics
Submitted by Anonymous (not verified) on Sat, 03/18/2006 - 16:29.
Another way to keep track is that we say "#1" and "1%", not "1#" and "%1". That is, unless you're using "#" to mean "pounds", in which case "1#" is correct, but it's antiquated at best in the USA, and presumably a nonissue for other countries that use metric...

C programmers are used to using "#" at the start of lines (#define, #include). LaTeX authors are used to "%" at the end of lines when writing macro definitions, as a comment to keep extraneous whitespace from creeping in--but "%" is comment to end-of-line so it's also likely to show up at the start of a line too...

Mnemonics
Submitted by Island Joe (not verified) on Tue, 03/21/2006 - 04:25.
Thanks for sharing those mnemonic insights, it's most helpful.

[Feb 09, 2011] Pattern matching with replacement

${var:pos[:len]} # extract substr from pos (0-based) for len

${var/substr/repl} # replace first match
${var//substr/repl} # replace all matches
${var/#substr/repl} # replace if matches at beginning (non-greedy)
${var/##substr/repl} # replace if matches at beginning (greedy)
${var/%substr/repl} # replace if matches at end (non-greedy)
${var/%%substr/repl} # replace if matches at end (greedy)

${#var} # returns length of $var
${!var} # indirect expansion

[Feb 09, 2011] : or not to :

Jan 08, 2007

Ash:

Interesting, ":" can be omitted for "numeric" variables (script/function arguments).

baz=${var:-bar}
vs.
baz=${1-bar}
First time I thought it is a typo, but it is not.

bash String Manipulations By Jim Dennis, [email protected]

The bash shell has many features that are sufficiently obscure you almost never see them used. One of the problems is that the man page offers no examples.

Here I'm going t... some of these features to do the sorts of simple string manipulations that are commonly needed on file and path names.

In traditional Bourne shell programming you might see references to the basename and dirname commands. These perform simple string manipulations on their arguments. You'll also see many uses of sed and awk or perl -e to perform simple string manipulations.

Often these machinations are necessary perform on lists of filenames and paths. There are many specialized programs that are conventionally included with Unix to perform these sorts of utility functions: tr, cut, paste, and join. Given a filename like /home/myplace/a.data.directory/a.filename.txt which we'll call $f you could use commands like:

dirname $f 
basename $f 
basename $f.txt

... to see output like:

/home/myplace/a.data.directory
a.filename.txt
a.filename 

Notice that the GNU version of basename takes an optional parameter. This handy for specifying a filename "extension" like .tar.gz which will be stripped off of the output. Note that basename and dirname don't verify that these parameters are valid filenames or paths. They simple perform simple string operations on a single argument. You shouldn't use wild cards with them -- since dirname takes exactly one argument (and complains if given more) and basename takes one argument and an optional one which is not a filename.

Despite their simplicity these two commands are used frequently in shell programming because most shells don't have any built-in string handling functions -- and we frequently need to refer to just the directory or just the file name parts of a given full file specification.

Usually these commands are used within the "back tick" shell operators like TARGETDIR=`dirname $1`. The "back tick" operators are equivalent to the $(...) construct. This latter construct is valid in Korn shell and bash -- and I find it easier to read (since I don't have to squint at me screen wondering which direction the "tick" is slanted).

Although the basename and dirname commands embody the "small is beautiful" spirit of Unix -- they may push the envelope towards the "too simple to be worth a separate program" end of simplicity.

Naturally you can call on sed, awk, TCL or perl for more flexible and complete string handling. However this can be overkill -- and a little ungainly.

So, bash (which long ago abandoned the "small is beautiful" principal and went the way of emacs) has some built in syntactical candy for doing these operations. Since bash is the default shell on Linux systems then there is no reason not to use these features when writing scripts for Linux.

The bash man page is huge. In contains a complete reference to the "readline" libraries and how to write a .inputrc file (which I think should all go in a separate man page) -- and a run down of all the csh "history" or bang! operators (which I think should be replaced with a simple statement like: "Most of the csh work the same way in bash").

However, buried in there is a section on Parameter Substitution which tells us that $var is really a shorthand for ${var} which is really the simplest case of several ${var:operators} and similar constructs.

Are you confused, yet?

Here's where a few examples would have helped. To understand the man page I simply experimented with the echo command and several shell variables. This is what it all means:

Here we notice two different "operators" being used inside the parameters (curly braces). Those are the # and the % operators. We also see them used as single characters and in pairs. This gives us four combinations for trimming patterns off the beginning or end of a string:

${variable%pattern}
Trim the shortest match from the end
${variable##pattern}
Trim the longest match from the beginning
${variable%%pattern}
Trim the shortest match from the end
${variable#pattern}
Trim the shortest match from the beginning

It's important to understand that these use shell "globbing" rather than "regular expressions" to match these patterns. Naturally a simple string like "txt" will match sequences of exactly those three characters in that sequence -- so the difference between "shortest" and "longest" only applies if you are using a shell wild card in your pattern.

A simple example of using these operators comes in the common question of copying or renaming all the *.txt to change the .txt to .bak (in MS-DOS' COMMAND.COM that would be REN *.TXT *.BAK).

This is complicated in Unix/Linux because of a fundamental difference in the programming API's. In most Unix shells the expansion of a wild card pattern into a list of filenames (called "globbing") is done by the shell -- before the command is executed. Thus the command normally sees a list of filenames (like "var.txt bar.txt etc.txt") where DOS (COMMAND.COM) hands external programs a pattern like *.TXT.

Under Unix shells, if a pattern doesn't match any filenames the parameter is usually left on the command like literally. Under bash this is a user-settable option. In fact, under bash you can disable shell "globbing" if you like -- there's a simple option to do this. It's almost never used -- because commands like mv, and cp won't work properly if their arguments are passed to them in this manner.

However here's a way to accomplish a similar result:

for i in *.txt; do cp $i ${i%.txt}.bak; done 

... obviously this is more typing. If you tried to create a shell function or alias for it -- you have to figure out how to pass this parameters. Certainly the following seems simple enough:

function cp-pattern { for i in $1; do cp $i ${i%$1}$2; done

... but that doesn't work like most Unix users would expect. You'd have to pass this command a pair of specially chosen, and quoted arguments like:

cp-pattern '*.txt' .bak 

... note how the second pattern has no wild cards and how the first is quoted to prevent any shell globbing. That's fine for something you might just use yourself -- if you remember to quote it right. It's easy enough to add check for the number of arguments and to ensure that there is at least one file that exists in the $1 pattern. However it becomes much harder to make this command reasonably safe and robust. Inevitably it becomes less "unix-like" and thus more difficult to use with other Unix tools.

I generally just take a whole different approach. Rather than trying to use cp to make a backup of each file under a slightly changed name I might just make a directory (usually using the date and my login ID as a template) and use a simple cp command to copy all my target files into the new directory.

Another interesting thing we can do with these "parameter expansion" features is to iterate over a list of components in a single variable.

For example, you might want to do something to traverse over every directory listed in your path -- perhaps to verify that everything listed therein is really a directory and is accessible to you.

Here's a command that will echo each directory named on your path on it's own line:

p=$PATH until [ $p = $d ]; do d=${p%%:*}; p=${p#*:}; echo $d; 
		done 

... obviously you can replace the echo $d part of this command with anything you like.

Another case might be where you'd want to traverse a list of directories that were all part of a path. Here's a command pair that echos each directory from the root down to the "current working directory":

p=$(pwd) until [ $p = $d ]; do p=${p#*/}; d=${p%%/*}; echo $d; 
		done 

... here we've reversed the assignments to p and d so that we skip the root directory itself -- which must be "special cased" since it appears to be a "null" entry if we do it the other way. The same problem would have occurred in the previous example -- if the value assigned to $PATH had started with a ":" character.

Of course, its important to realize that this is not the only, or necessarily the best method to parse a line or value into separate fields. Here's an example that uses the old IFS variable (the "inter-field separator in the Bourne, and Korn shells as well as bash) to parse each line of /etc/passwd and extract just two fields:

cat /etc/passwd | ( \
   IFS=: ;
   while read lognam pw id gp fname home sh; \
   do echo $home \"$fname\"; done \
)

Here we see the parentheses used to isolate the contents in a subshell -- such that the assignment to IFS doesn't affect our current shell. Setting the IFS to a "colon" tells the shell to treat that character as the separater between "words" -- instead of the usual "whitespace" that's assigned to it. For this particular function it's very important that IFS consist solely of that character -- usually it is set to "space," "tab," and "newline.

After that we see a typical while read loop -- where we read values from each line of input (from /etc/passwd into seven variables per line. This allows us to use any of these fields that we need from within the loop. Here we are just using the echo command -- as we have in the other examples.

My point here has been to show how we can do quite a bit of string parsing and manipulation directly within bash -- which will allow our shell scripts to run faster with less overhead and may be easier than some of the more complex sorts of pipes and command substitutions one might have to employ to pass data to the various external commands and return the results.

Many people might ask: Why not simply do it all in perl? I won't dignify that with a response. Part of the beauty of Unix is that each user has many options about how they choose to program something. Well written scripts and programs interoperate regardless of what particular scripting or programming facility was used to create them. Issue the command file /usr/bin/* on your system and and you may be surprised at how many Bourne and C shell scripts there are in there

In conclusion I'll just provide a sampler of some other bash parameter expansions:

${parameter:-word}
Provide a default if parameter is unset or null.
Example:
echo ${1:-"default"}
Note: this would have to be used from within a functions or shell script -- the point is to show that some of the parameter substitutions can be use with shell numbered arguments. In this case the string "default" would be returned if the function or script was called with no $1 (or if all of the arguments had been shifted out of existence. ${parameter:=word}
Assign a value to parameter if it was previously unset or null.
Example:
echo ${HOME:="/home/.nohome"}
${parameter:?word}
Generate an error if parameter is unset or null by printing word to stdout.
Example:
${HOME:="/home/.nohome"}
${TMP:?"Error: Must have a valid Temp Variable Set"}

This one just uses the shell "null command" (the : command) to evaluate the expression. If the variable doesn't exist or has a null value -- this will print the string to the standard error file handle and exit the script with a return code of one.

Oddly enough -- while it is easy to redirect the standard error of processes under bash -- there doesn't seem to be an easy portable way to explicitly generate message or redirect output to stderr. The best method I've come up with is to use the /proc/ filesystem (process table) like so:

function error { echo "$*" > /proc/self/fd/2 } 

... self is always a set of entries that refers to the current process -- and self/fd/ is a directory full of the currently open file descriptors. Under Unix and DOS every process is given the following pre-opened file descriptors: stdin, stdout, and stderr.

${parameter:+word}
Alternative value. ${TMP:+"/mnt/tmp"}
use /mnt/tmp instead of $TMP but do nothing if TMP was unset. This is a weird one that I can't ever see myself using. But it is a logical complement to the ${var:-value} we saw above.
${#variable}
Return the length of the variable in characters.
Example:

    echo The length of your PATH is ${#PATH}

Manipulating Strings

From Advanced Bash-Scripting Guide: Chapter 10. Manipulating Variables

Bash supports a number of string manipulation operations. Unfortunately, these tools lack a unified focus. Some are a subset of parameter substitution, and others fall under the functionality of the UNIX expr command. This results in inconsistent command syntax and overlap of functionality, not to mention confusion.

expr match "$string" '\($substring\)'
Extracts $substring at beginning of $string, where $substring is a regular expression.
expr "$string" : '\($substring\)'
Extracts $substring at beginning of $string, where $substring is a regular expression.
stringZ=abcABC123ABCabc
#     =======	  echo `expr match "$stringZ" '\(.[b-c]*[A-Z]..[0-9]\)'` # abcABC1
echo `expr "$stringZ" : '\(.[b-c]*[A-Z]..[0-9]\)'`     # abcABC1
echo `expr "$stringZ" : '\(.......\)'`                 # abcABC1
# All of the above forms give an identical result.
expr match "$string" '.*\($substring\)'
Extracts $substring at end of $string, where $substring is a regular expression.
expr "$string" : '.*\($substring\)'
Extracts $substring at end of $string, where $substring is a regular expression.

stringZ=abcABC123ABCabc
#              ======

echo `expr match "$stringZ" '.*\([A-C][A-C][A-C][a-c]*\)'`  # ABCabc
echo `expr "$stringZ" : '.*\(......\)'`                     # ABCabc
Substring Removal
${string#substring}
Strips shortest match of $substring from front of $string.
${string##substring}
Strips longest match of $substring from front of $string.
stringZ=abcABC123ABCabc
#     |----|
#     |----------|

echo ${stringZ#a*C}    # 123ABCabc
# Strip out shortest match between 'a' and 'C'.

echo ${stringZ##a*C}   # abc
# Strip out longest match between 'a' and 'C'.

${string%substring}
Strips shortest match of $substring from back of $string.
${string%%substring}
Strips longest match of $substring from back of $string.

stringZ=abcABC123ABCabc
#                  ||
#      |------------|

echo ${stringZ%b*c}    # abcABC123ABCa
# Strip out shortest match between 'b' and 'c', from back of $stringZ.

echo ${stringZ%%b*c}   # a
# Strip out longest match between 'b' and 'c', from back of $stringZ.
Example 9-10. Converting graphic file formats, with filename change
#!/bin/bash
#  cvt.sh:
#  Converts all the MacPaint image files in a directory to "pbm" format.

#  Uses the "macptopbm" binary from the "netpbm" package,
#+ which is maintained by Brian Henderson ([email protected]).
#  Netpbm is a standard part of most Linux distros.

OPERATION=macptopbm
SUFFIX=pbm        # New filename suffix.

if [ -n "$1" ]
then
  directory=$1    # If directory name given as a script argument...
else
  directory=$PWD  # Otherwise use current working directory.
fi 
 
#  Assumes all files in the target directory are MacPaint image files,
# + with a ".mac" suffix.

for file in $directory/*  # Filename globbing.
do
  filename=${file%.*c}    #  Strip ".mac" suffix off filename
                          #+ ('.*c' matches everything
			  #+ between '.' and 'c', inclusive).
  $OPERATION $file > $filename.$SUFFIX
                          # Redirect conversion to new filename.
  rm -f $file             # Delete original files after converting. echo "$filename.$SUFFIX"  # Log what is happening to stdout.
done

exit 0
Substring Replacement
${string/substring/replacement}
Replace first match of $substring with $replacement.
${string//substring/replacement}
Replace all matches of $substring with $replacement.
stringZ=abcABC123ABCabc

echo ${stringZ/abc/xyz}         # xyzABC123ABCabc
                                # Replaces first match of 'abc' with 'xyz'.

echo ${stringZ//abc/xyz}        # xyzABC123ABCxyz
                                # Replaces all matches of 'abc' with # 'xyz'.

${string/#substring/replacement}
If $substring matches front end of $string, substitute $replacement for $substring.
${string/%substring/replacement}
If $substring matches back end of $string, substitute $replacement for $substring.
stringZ=abcABC123ABCabc

echo ${stringZ/#abc/XYZ}        # XYZABC123ABCabc
                                # Replaces front-end match of 'abc' with 'xyz'.

echo ${stringZ/%abc/XYZ}        # abcABC123ABCXYZ
                                # Replaces back-end match of 'abc' with 'xyz'.

Manipulating strings using awk

A Bash script may invoke the string manipulation facilities of awk as an alternative to using its built-in operations.

Example 9-11. Alternate ways of extracting substrings
#!/bin/bash
# substring-extraction.sh

String=23skidoo1
#    012345678  Bash
#    123456789  awk
# Note different string indexing system:
# Bash numbers first character of string as '0'.
# Awk  numbers first character of string as '1'.

echo ${String:2:4} # position 3 (0-1-2), 4 characters long
                                       # skid

# The awk equivalent of ${string:pos:length} is substr(string,pos,length).
echo | awk '
{ print substr("'"${String}"'",3,4)    # skid
}
'
#  Piping an empty "echo" to awk gives it dummy input,
#+ and thus makes it unnecessary to supply a filename.

exit 0

For more on string manipulation in scripts, refer to Section 9.3 and the relevant section of the expr command listing. For script examples, see:

  1. Example 12-6
  2. Example 9-14
  3. Example 9-15
  4. Example 9-16
  5. Example 9-18

David Korn Tells All

# More to the point, thanks to the way ksh works, you can do this:
# make an array, words, local to the current function

typeset -A words

# read a full line

read line

# split the line into words

echo "$line" | read -A words

# Now you can access the line either word-wise or string-wise - useful if you want to, say, check for a command as the Nth parameter,
# but also keep formatting of the other parameters...

Recommended Links

Google matched content

Softpanorama Recommended

Top articles

Sites



Etc

Society

Groupthink : Two Party System as Polyarchy : Corruption of Regulators : Bureaucracies : Understanding Micromanagers and Control Freaks : Toxic Managers :   Harvard Mafia : Diplomatic Communication : Surviving a Bad Performance Review : Insufficient Retirement Funds as Immanent Problem of Neoliberal Regime : PseudoScience : Who Rules America : Neoliberalism  : The Iron Law of Oligarchy : Libertarian Philosophy

Quotes

War and Peace : Skeptical Finance : John Kenneth Galbraith :Talleyrand : Oscar Wilde : Otto Von Bismarck : Keynes : George Carlin : Skeptics : Propaganda  : SE quotes : Language Design and Programming Quotes : Random IT-related quotesSomerset Maugham : Marcus Aurelius : Kurt Vonnegut : Eric Hoffer : Winston Churchill : Napoleon Bonaparte : Ambrose BierceBernard Shaw : Mark Twain Quotes

Bulletin:

Vol 25, No.12 (December, 2013) Rational Fools vs. Efficient Crooks The efficient markets hypothesis : Political Skeptic Bulletin, 2013 : Unemployment Bulletin, 2010 :  Vol 23, No.10 (October, 2011) An observation about corporate security departments : Slightly Skeptical Euromaydan Chronicles, June 2014 : Greenspan legacy bulletin, 2008 : Vol 25, No.10 (October, 2013) Cryptolocker Trojan (Win32/Crilock.A) : Vol 25, No.08 (August, 2013) Cloud providers as intelligence collection hubs : Financial Humor Bulletin, 2010 : Inequality Bulletin, 2009 : Financial Humor Bulletin, 2008 : Copyleft Problems Bulletin, 2004 : Financial Humor Bulletin, 2011 : Energy Bulletin, 2010 : Malware Protection Bulletin, 2010 : Vol 26, No.1 (January, 2013) Object-Oriented Cult : Political Skeptic Bulletin, 2011 : Vol 23, No.11 (November, 2011) Softpanorama classification of sysadmin horror stories : Vol 25, No.05 (May, 2013) Corporate bullshit as a communication method  : Vol 25, No.06 (June, 2013) A Note on the Relationship of Brooks Law and Conway Law

History:

Fifty glorious years (1950-2000): the triumph of the US computer engineering : Donald Knuth : TAoCP and its Influence of Computer Science : Richard Stallman : Linus Torvalds  : Larry Wall  : John K. Ousterhout : CTSS : Multix OS Unix History : Unix shell history : VI editor : History of pipes concept : Solaris : MS DOSProgramming Languages History : PL/1 : Simula 67 : C : History of GCC developmentScripting Languages : Perl history   : OS History : Mail : DNS : SSH : CPU Instruction Sets : SPARC systems 1987-2006 : Norton Commander : Norton Utilities : Norton Ghost : Frontpage history : Malware Defense History : GNU Screen : OSS early history

Classic books:

The Peter Principle : Parkinson Law : 1984 : The Mythical Man-MonthHow to Solve It by George Polya : The Art of Computer Programming : The Elements of Programming Style : The Unix Hater’s Handbook : The Jargon file : The True Believer : Programming Pearls : The Good Soldier Svejk : The Power Elite

Most popular humor pages:

Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg : IT Slang : C++ Humor : ARE YOU A BBS ADDICT? : The Perl Purity Test : Object oriented programmers of all nations : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : The Most Comprehensive Collection of Editor-related Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor : C Humor : Scripting Humor : Real Programmers Humor : Web Humor : GPL-related Humor : OFM Humor : Politically Incorrect Humor : IDS Humor : "Linux Sucks" Humor : Russian Musical Humor : Best Russian Programmer Humor : Microsoft plans to buy Catholic Church : Richard Stallman Related Humor : Admin Humor : Perl-related Humor : Linus Torvalds Related humor : PseudoScience Related Humor : Networking Humor : Shell Humor : Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : Education Humor : IBM Humor : Assembler-related Humor : VIM Humor : Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor

The Last but not Least Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt. Ph.D


Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...

You can use PayPal to to buy a cup of coffee for authors of this site

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. You you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: May 10, 2021