Advanced Bash-Scripting Guide: An in-depth exploration of the art of shell scripting
Chapter 9. Variables Revisited
Bash supports a surprising number of string manipulation operations. Unfortunately, these tools lack a unified focus. Some are a subset of parameter substitution, and others fall under the functionality of the UNIX expr command. This results in inconsistent command syntax and overlap of functionality, not to mention confusion.
String Length

    stringZ=abcABC123ABCabc

    echo ${#stringZ}                 # 15
    echo `expr length $stringZ`      # 15
    echo `expr "$stringZ" : '.*'`    # 15

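Quoting starts to matter as soon as the string contains whitespace. The following is only a minimal sketch, and the variable name phrase is invented for the illustration. It compares the parameter-expansion form with the portable expr form on a two-word string.

```bash
#!/bin/bash
# Hypothetical example string; any value containing whitespace will do.
phrase="hello world"

len_param=${#phrase}                 # Parameter expansion: no quoting needed.
len_expr=$(expr "$phrase" : '.*')    # expr form: the quotes keep the string as one argument.

echo "$len_param"    # 11
echo "$len_expr"     # 11
```

Without the double quotes, expr would receive two separate words and complain about a syntax error.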
Example 9-10. Inserting a blank line between paragraphs in a text file

    #!/bin/bash
    # paragraph-space.sh

    # Inserts a blank line between paragraphs of a single-spaced text file.
    # Usage: $0 <FILENAME

    MINLEN=45        # May need to change this value.
    #  Assume lines shorter than $MINLEN characters
    #+ terminate a paragraph.

    while read line  # For as many lines as the input file has...
    do
      echo "$line"   # Output the line itself.

      len=${#line}
      if [ "$len" -lt "$MINLEN" ]
        then echo    # Add a blank line after short line.
      fi
    done

    exit 0

Length of Matching Substring at Beginning of String
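$substring is a regular expression. Both forms, expr match "$string" '$substring' and expr "$string" : '$substring', return the length of the match, which is anchored at the beginning of $string.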

    stringZ=abcABC123ABCabc
    #       |------|

    echo `expr match "$stringZ" 'abc[A-Z]*.2'`   # 8
    echo `expr "$stringZ" : 'abc[A-Z]*.2'`       # 8

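One practical use of the match length is counting how many characters of a given class begin a string. What follows is a sketch under stated assumptions, not a canonical recipe. The helper name count_leading_digits is made up here.

```bash
#!/bin/bash
# count_leading_digits is a hypothetical helper name.
count_leading_digits() {
  # expr prints the match length; it exits nonzero when that length is 0,
  # so "|| true" keeps the function usable under "set -e".
  expr "$1" : '[0-9]*' || true
}

n1=$(count_leading_digits "2024report")
n2=$(count_leading_digits "report2024")
echo "$n1 $n2"    # 4 0
```

Because the match is anchored at the start, the digits at the end of the second string do not count.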
Index
expr index $string $substring gives the numerical position in $string of the first character in $substring that matches.

    stringZ=abcABC123ABCabc
    echo `expr index "$stringZ" C12`             # 6
                                                 # C position.

    echo `expr index "$stringZ" 1c`              # 3
    # 'c' (in #3 position) matches before '1'.

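With a single-character $substring, this is the near equivalent of strchr() in C. With several characters it behaves more like strpbrk(), since any one of them may match.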
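The same position can be computed with parameter expansion alone, which may help on systems whose expr lacks the index keyword. The function below is a rough sketch. The name str_index is hypothetical, and it assumes the character set contains none of the characters that are special inside a bracket expression.

```bash
#!/bin/bash
# str_index is a hypothetical helper that mimics "expr index" without expr.
# Assumes the character set contains no ']', '^', '-', or backslash.
str_index() {
  local string=$1 chars=$2
  local prefix=${string%%[$chars]*}   # Everything before the first matching character.
  if [ "$prefix" = "$string" ]; then
    echo 0                            # No character of the set occurs in the string.
  else
    echo $(( ${#prefix} + 1 ))        # 1-based position, like expr index.
  fi
}

stringZ=abcABC123ABCabc
str_index "$stringZ" C12    # 6
str_index "$stringZ" 1c     # 3
str_index "$stringZ" xyz    # 0
```

The trick is that stripping the longest match of [chars]* from the back leaves exactly the text that precedes the first matching character.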
Substring Extraction
${string:position} extracts substring from $string at $position.
If the $string parameter is "*" or "@", then this extracts the positional parameters, [1] starting at $position.
${string:position:length} extracts $length characters of substring from $string at $position.

    stringZ=abcABC123ABCabc
    #       0123456789.....
    #       0-based indexing.

    echo ${stringZ:0}                            # abcABC123ABCabc
    echo ${stringZ:1}                            # bcABC123ABCabc
    echo ${stringZ:7}                            # 23ABCabc

    echo ${stringZ:7:3}                          # 23A
                                                 # Three characters of substring.



    # Is it possible to index from the right end of the string?

    echo ${stringZ:-4}                           # abcABC123ABCabc
    # Defaults to full string, as in ${parameter:-default}.
    # However . . .

    echo ${stringZ:(-4)}                         # Cabc
    echo ${stringZ: -4}                          # Cabc
    # Now, it works.
    # Parentheses or added space "escape" the position parameter.

    # Thank you, Dan Jacobson, for pointing this out.

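A typical application is slicing a fixed-width field into its parts. The sketch below assumes a date stamp in YYYYMMDD form. The variable names are invented for the example.

```bash
#!/bin/bash
# Hypothetical fixed-width timestamp in YYYYMMDD form.
stamp=20240315

year=${stamp:0:4}     # 4 characters starting at offset 0.
month=${stamp:4:2}    # 2 characters starting at offset 4.
day=${stamp: -2}      # Last 2 characters; note the space before the minus sign.

echo "$year-$month-$day"    # 2024-03-15
```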
If the $string parameter is "*" or "@", then this extracts a maximum of $length positional parameters, starting at $position.

    echo ${*:2}          # Echoes second and following positional parameters.
    echo ${@:2}          # Same as above.

    echo ${*:2:3}        # Echoes three positional parameters, starting at second.

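Inside a function, the quoted form "${@:2}" is a convenient way to treat the first argument specially and loop over the rest. This is a sketch only. The function name run_with_label and its arguments are hypothetical.

```bash
#!/bin/bash
# run_with_label is a hypothetical function: its first argument is a label,
# the remaining arguments are treated as a list of items.
run_with_label() {
  local label=$1
  local item
  for item in "${@:2}"; do      # Second and following parameters, one word each.
    echo "$label: $item"
  done
}

run_with_label fruit apple "blood orange" kiwi
# fruit: apple
# fruit: blood orange
# fruit: kiwi
```

The double quotes keep "blood orange" together as a single item, which the unquoted ${*:2} would not.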
expr substr $string $position $length extracts $length characters from $string starting at $position.

    stringZ=abcABC123ABCabc
    #       123456789......
    #       1-based indexing.

    echo `expr substr $stringZ 1 2`              # ab
    echo `expr substr $stringZ 4 3`              # ABC

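The off-by-one difference between the two indexing schemes is easy to trip over. As a sketch, a small wrapper (here given the invented name substr1) can offer 1-based positions on top of the built-in 0-based expansion.

```bash
#!/bin/bash
# substr1 is a hypothetical wrapper giving 1-based indexing (as in expr substr)
# on top of Bash's 0-based ${string:position:length}.
substr1() {
  local string=$1 pos=$2 len=$3
  echo "${string:$((pos - 1)):$len}"   # Shift the position down by one.
}

stringZ=abcABC123ABCabc
substr1 "$stringZ" 1 2    # ab
substr1 "$stringZ" 4 3    # ABC
```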
Extracts $substring at beginning of $string, where $substring is a regular expression. Both expr match "$string" '\($substring\)' and expr "$string" : '\($substring\)' do this.

    stringZ=abcABC123ABCabc
    #       =======

    echo `expr match "$stringZ" '\(.[b-c]*[A-Z]..[0-9]\)'`   # abcABC1
    echo `expr "$stringZ" : '\(.[b-c]*[A-Z]..[0-9]\)'`       # abcABC1
    echo `expr "$stringZ" : '\(.......\)'`                   # abcABC1
    # All of the above forms give an identical result.

Extracts $substring at end of $string, where $substring is a regular expression. Both expr match "$string" '.*\($substring\)' and expr "$string" : '.*\($substring\)' do this.

    stringZ=abcABC123ABCabc
    #                ======

    echo `expr match "$stringZ" '.*\([A-C][A-C][A-C][a-c]*\)'`    # ABCabc
    echo `expr "$stringZ" : '.*\(......\)'`                       # ABCabc

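To make the front/back distinction concrete, here is a sketch that pulls a leading number and a trailing filename extension out of two sample strings. The sample values and variable names are invented for the illustration.

```bash
#!/bin/bash
# Hypothetical sample values.
release="2024report"
archive="backup.tar.gz"

# Anchored at the start: capture the leading run of digits.
year=$(expr "$release" : '\([0-9]*\)')

# The leading ".*" is greedy, so the capture begins after the last dot.
ext=$(expr "$archive" : '.*\.\(.*\)')

echo "$year"    # 2024
echo "$ext"     # gz
```

Note that expr returns a nonzero exit status when the captured text is empty, which matters in scripts running under "set -e".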
Substring Removal
${string#substring} strips shortest match of $substring from front of $string.
${string##substring} strips longest match of $substring from front of $string.

    stringZ=abcABC123ABCabc
    #       |----|
    #       |----------|

    echo ${stringZ#a*C}      # 123ABCabc
    # Strip out shortest match between 'a' and 'C'.

    echo ${stringZ##a*C}     # abc
    # Strip out longest match between 'a' and 'C'.

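Pathnames give a feel for the shortest/longest distinction. The following sketch uses a made-up path. With the pattern '*/', the longest-match form yields the same result as basename for a path like this one.

```bash
#!/bin/bash
# Hypothetical path used only for illustration.
path=/usr/local/bin/tool

echo "${path#*/}"     # usr/local/bin/tool   (shortest match of '*/' is the leading '/')
echo "${path##*/}"    # tool                 (longest match runs through the last '/')
```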
${string%substring} strips shortest match of $substring from back of $string.
For example:

    # Rename all filenames in $PWD with "TXT" suffix to a "txt" suffix.
    # For example, "file1.TXT" becomes "file1.txt" . . .

    SUFF=TXT
    suff=txt

    for i in *."$SUFF"          # Glob directly; parsing "ls" output breaks on whitespace.
    do
      [ -e "$i" ] || continue   # Skip the unexpanded pattern if nothing matches.
      mv -f "$i" "${i%.$SUFF}.$suff"
      #  Leave unchanged everything *except* the shortest pattern match
      #+ starting from the right-hand-side of the variable $i . . .
    done ### This could be condensed into a "one-liner" if desired.

    # Thank you, Rory Winston.

${string%%substring} strips longest match of $substring from back of $string.

    stringZ=abcABC123ABCabc
    #                    ||
    #        |------------|

    echo ${stringZ%b*c}      # abcABC123ABCa
    # Strip out shortest match between 'b' and 'c', from back of $stringZ.

    echo ${stringZ%%b*c}     # a
    # Strip out longest match between 'b' and 'c', from back of $stringZ.

This operator is useful for generating filenames.
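As a quick sketch of filename generation, the lines below strip suffixes from a made-up path with a double extension and build a new name from what remains.

```bash
#!/bin/bash
# Hypothetical filename with a double extension.
file=/tmp/archive.tar.gz

echo "${file%.*}"         # /tmp/archive.tar       (shortest '.*' from the back)
echo "${file%%.*}"        # /tmp/archive           (longest '.*' from the back)
echo "${file%/*}"         # /tmp                   (directory part)
echo "${file%.gz}.bz2"    # /tmp/archive.tar.bz2   (swap the last suffix)
```

The longest-match form is only safe here because no directory name in the path contains a dot.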
Example 9-11. Converting graphic file formats, with filename change

    #!/bin/bash
    #  cvt.sh:
    #  Converts all the MacPaint image files in a directory to "pbm" format.

    #  Uses the "macptopbm" binary from the "netpbm" package,
    #+ which is maintained by Bryan Henderson (bryanh@giraffe-data.com).
    #  Netpbm is a standard part of most Linux distros.

    OPERATION=macptopbm
    SUFFIX=pbm          # New filename suffix.

    if [ -n "$1" ]
    then
      directory=$1      # If directory name given as a script argument...
    else
      directory=$PWD    # Otherwise use current working directory.
    fi

    #  Assumes all files in the target directory are MacPaint image files,
    #+ with a ".mac" filename suffix.

    for file in "$directory"/*   # Filename globbing.
    do
      filename=${file%.*c}       #  Strip ".mac" suffix off filename
                                 #+ ('.*c' matches everything
                                 #+ between '.' and 'c', inclusive).
      $OPERATION "$file" > "$filename.$SUFFIX"
                                 # Redirect conversion to new filename.
      rm -f "$file"              # Delete original files after converting.
      echo "$filename.$SUFFIX"   # Log what is happening to stdout.
    done

    exit 0

    # Exercise:
    # --------
    #  As it stands, this script converts *all* the files in the current
    #+ working directory.
    #  Modify it to work *only* on files with a ".mac" suffix.

Example 9-12. Converting streaming audio files to ogg

    #!/bin/bash
    # ra2ogg.sh: Convert streaming audio files (*.ra) to ogg.

    # Uses the "mplayer" media player program:
    #      http://www.mplayerhq.hu/homepage
    #      Appropriate codecs may need to be installed for this script to work.
    # Uses the "ogg" library and "oggenc":
    #      http://www.xiph.org/


    OFILEPREF=${1%%ra}      # Strip off the "ra" suffix.
    OFILESUFF=wav           # Suffix for wav file.
    OUTFILE="$OFILEPREF""$OFILESUFF"
    E_NOARGS=65

    if [ -z "$1" ]          # Must specify a filename to convert.
    then
      echo "Usage: `basename $0` [filename]"
      exit $E_NOARGS
    fi


    ##########################################################################
    mplayer "$1" -ao "pcm:file=$OUTFILE"
    oggenc "$OUTFILE"  # Correct file extension automatically added by oggenc.
    ##########################################################################

    rm "$OUTFILE"      # Delete intermediate *.wav file.
                       # If you want to keep it, comment out above line.

    exit $?

    #  Note:
    #  ----
    #  On a Website, simply clicking on a *.ram streaming audio file
    #+ usually only downloads the URL of the actual audio file, the *.ra file.
    #  You can then use "wget" or something similar
    #+ to download the *.ra file itself.


    #  Exercises:
    #  ---------
    #  As is, this script converts only *.ra filenames.
    #  Add flexibility by permitting use of *.ram and other filenames.
    #
    #  If you're really ambitious, expand the script
    #+ to do automatic downloads and conversions of streaming audio files.
    #  Given a URL, batch download streaming audio files (using "wget")
    #+ and convert them.

A simple emulation of getopt using substring extraction constructs.
Example 9-13. Emulating getopt

    #!/bin/bash
    # getopt-simple.sh
    # Author: Chris Morgan
    # Used in the ABS Guide with permission.


    getopt_simple()
    {
        echo "getopt_simple()"
        echo "Parameters are '$*'"
        until [ -z "$1" ]
        do
          echo "Processing parameter of: '$1'"
          if [ ${1:0:1} = '/' ]
          then
              tmp=${1:1}               # Strip off leading '/' . . .
              parameter=${tmp%%=*}     # Extract name.
              value=${tmp##*=}         # Extract value.
              echo "Parameter: '$parameter', value: '$value'"
              eval $parameter=$value
          fi
          shift
        done
    }

    # Pass all options to getopt_simple().
    getopt_simple $*

    echo "test is '$test'"
    echo "test2 is '$test2'"

    exit 0

    ---

    sh getopt_example.sh /test=value1 /test2=value2

    Parameters are '/test=value1 /test2=value2'
    Processing parameter of: '/test=value1'
    Parameter: 'test', value: 'value1'
    Processing parameter of: '/test2=value2'
    Parameter: 'test2', value: 'value2'
    test is 'value1'
    test2 is 'value2'

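Since eval executes whatever ends up in the value, a cautious variant might assign with the printf -v builtin instead. The following is a sketch, not a replacement for the author's script. The function name parse_slash_opts is hypothetical, and it assumes option names never collide with its own local variables (arg, name, value).

```bash
#!/bin/bash
# parse_slash_opts is a hypothetical variant of getopt_simple() that avoids eval.
parse_slash_opts() {
  local arg name value
  for arg in "$@"; do
    case $arg in
      /*=*) ;;                 # Only handle arguments of the form /name=value.
      *) continue ;;
    esac
    arg=${arg:1}               # Strip off leading '/'.
    name=${arg%%=*}            # Text before the first '='.
    value=${arg#*=}            # Text after the first '=' (may itself contain '=').
    case $name in
      ''|[0-9]*|*[!A-Za-z0-9_]*) continue ;;   # Skip invalid variable names.
    esac
    printf -v "$name" '%s' "$value"            # Assign without evaluating $value.
  done
}

parse_slash_opts /test=value1 "/test2=two words" "/bad name=x"
echo "test is '$test'"      # test is 'value1'
echo "test2 is '$test2'"    # test2 is 'two words'
```

Quoting "$@" in the loop keeps values with embedded spaces intact, and the name check rejects anything that is not a plain identifier.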
Substring Replacement
${string/substring/replacement} replaces first match of $substring with $replacement.
${string//substring/replacement} replaces all matches of $substring with $replacement.

    stringZ=abcABC123ABCabc

    echo ${stringZ/abc/xyz}           # xyzABC123ABCabc
                                      # Replaces first match of 'abc' with 'xyz'.

    echo ${stringZ//abc/xyz}          # xyzABC123ABCxyz
                                      # Replaces all matches of 'abc' with 'xyz'.

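A common everyday use is sanitizing filenames. This sketch, with an invented filename, shows the difference between the single and the global form when replacing spaces with underscores.

```bash
#!/bin/bash
# Hypothetical filename containing spaces.
name="my holiday photo.jpg"

echo "${name/ /_}"     # my_holiday photo.jpg   (first space only)
echo "${name// /_}"    # my_holiday_photo.jpg   (every space)
```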
${string/#substring/replacement}: if $substring matches front end of $string, substitute $replacement for $substring.
${string/%substring/replacement}: if $substring matches back end of $string, substitute $replacement for $substring.

    stringZ=abcABC123ABCabc

    echo ${stringZ/#abc/XYZ}          # XYZABC123ABCabc
                                      # Replaces front-end match of 'abc' with 'XYZ'.

    echo ${stringZ/%abc/XYZ}          # abcABC123ABCXYZ
                                      # Replaces back-end match of 'abc' with 'XYZ'.

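The anchored forms earn their keep when the pattern also occurs elsewhere in the string. In this sketch the made-up filename contains 'TXT' both at the front and at the back, so each form touches a different occurrence.

```bash
#!/bin/bash
# Hypothetical filename; note that 'TXT' occurs at both ends.
file=TXT-notes.TXT

echo "${file/TXT/txt}"      # txt-notes.TXT   (first match, wherever it is)
echo "${file/%TXT/txt}"     # TXT-notes.txt   (only a match at the back end)
echo "${file/#TXT/txt}"     # txt-notes.TXT   (only a match at the front end)
echo "${file/#notes/x}"     # TXT-notes.TXT   (no front-end match: unchanged)
```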
A Bash script may invoke the string manipulation facilities of awk as an alternative to using its built-in operations.
Example 9-14. Alternate ways of extracting substrings

    #!/bin/bash
    # substring-extraction.sh

    String=23skidoo1
    #      012345678    Bash
    #      123456789    awk
    # Note different string indexing system:
    # Bash numbers first character of string as '0'.
    # Awk  numbers first character of string as '1'.

    echo ${String:2:4} # position 3 (0-1-2), 4 characters long
                                             # skid

    # The awk equivalent of ${string:pos:length} is substr(string,pos,length).
    echo | awk '
    { print substr("'"${String}"'",3,4)      # skid
    }
    '
    #  Piping an empty "echo" to awk gives it dummy input,
    #+ and thus makes it unnecessary to supply a filename.

    exit 0

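An alternative way of handing the string to awk is the -v option, which sidesteps the quote splicing. This is only a sketch. The awk variable name s and the shell variable result are chosen for the illustration.

```bash
#!/bin/bash
String=23skidoo1

# Pass the shell variable with -v; a BEGIN block needs no input at all.
# Caveat: awk interprets backslash escapes in -v assignments.
result=$(awk -v s="$String" 'BEGIN { print substr(s, 3, 4) }')

echo "$result"    # skid
```

Because the action sits in a BEGIN block, awk never reads input, so neither a filename nor a dummy echo is needed.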
For more on string manipulation in scripts, refer to Section 9.3 and the relevant section of the expr command listing. For script examples, see:
[1] This applies to either command-line arguments or parameters passed to a function.