
Better Bashing: 4 Tips to Strengthen DevOps using Bash

You might know bash as the old-fashioned, command-line, lowest-common-denominator fossil built into Linux. It’s actually much more. Here are a handful of reasons to reconsider bash for automation – including, perhaps, things you didn’t know bash could accomplish.

Smart DevOps practitioners and sysadmins have looked for ways to get the computer to do the work for them since… well, long before anyone used the term “DevOps.” In recent years, the vehicles for these automations have been fashionable dynamic languages such as Python, Ruby, JavaScript, or Visual Basic.

Often lost in the shuffle is the Unix shell. Even those who recognize that “… it should still be your first automation choice” sometimes aren’t aware of the many features offered by a modern shell, such as bash 3.0 or higher. Here are four examples where bash may be the right tool.

1. Parse content or a configuration string

One hazard of a long and successful life is that longtime acquaintances might not realize you moved on past your old habits. That’s one of bash’s afflictions: It successfully handled chores 20 years ago that people assume should still be done the same way today.

Not so. Consider a common case: A bash script needs to retrieve a value from within an HTML (or XML) instance. Suppose the document looks like

          ...
        <h1>The Title</h1>
          ...      

and the script needs “The Title” for further calculations.

First, put aside the perception that friends (mostly) don’t let friends parse HTML (or XML). The recommended alternative, of course, is to use a real parser. While there are indeed many available for command-line invocation, including XMLStarlet, xpath, and xmllint, these often aren’t available on “vanilla” Unix or Linux servers. More than that, situations where bash is involved often lack a requirement to parse fully-general HTML or XML; they only need to extract one small value from a restricted universe of document instances. In such a circumstance, it’s more robust to code a simplistic parser in bash than to assume that an external package like XMLStarlet is available.

That sort of analysis has led to years of computations like:

        TITLE=$(grep '<h1>' $INSTANCE | 
                awk -F '>' '{print $2}' | 
                awk -F '<' '{print $1}')     

Quick searches of public code corpora show plenty of uses of formulae like this.

You can do better, though. As comfortable as grep and awk were for those trained in Linux before 2000, sed should be equally familiar. sed lets us write:

        TITLE=$(sed -n -e 's|.*<h1>\(.*\)</h1>.*|\1|p' "$INSTANCE")

Buried in all the quoting and escaping is the regular expression <h1>(.*)</h1>, which nicely captures the idea behind this particular calculation. For more about what sed can do, incidentally, our team likes Peteris Krumins’s Sed One-Liners Explained.

That’s not all, though. If you’re using at least 3.0 of bash, first released in the second half of 2004, you can write:

       REGULAR_EXPRESSION='<h1>(.*)</h1>'
       TITLE=$([[ $(< "$INSTANCE") =~ $REGULAR_EXPRESSION ]] &&
               echo "${BASH_REMATCH[1]}")

While this is a few characters more than the sed version, bash makes the regular expression much easier to read and therefore more maintainable. If content is already in bash variables, use of =~ also saves an external process.

One common complaint about bash, early in its history – and the impetus for many to turn to Perl, Python, and so on – was that effective bash coding required use of so many “helper” utilities, such as awk and sed. As this section illustrates, recent bash versions largely eliminated that particular motivation; many more common scripting calculations can now be done in “pure” bash.

While nearly all Unix installations operate with bash at 3.0 or later, and thus enjoy full regular-expression matching, there remains a version skew related to regular expressions that deserves warning: starting with bash 3.2, any quoted portion of the right-hand side of =~ matches literally rather than as a pattern, where 3.0 and 3.1 still treated it as a regular expression. Don’t let this surprise you. The most portable workaround is the one shown above – keep the pattern in an unquoted variable – and later bashes also provide shopt -s compat31 to restore the older behavior.
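A minimal sketch of the difference, as bash 3.2 and later behave:

```shell
#!/usr/bin/env bash
# In bash 3.2+, quoted parts of the right-hand side of =~ match
# literally; unquoted parts are treated as a regular expression.
[[ "abc" =~ a.c ]] && echo "unquoted: matched"     # "." is a metacharacter here
[[ "abc" =~ "a.c" ]] || echo "quoted: no match"    # "a.c" is taken literally
```

Keeping the pattern in an unquoted variable, as in the TITLE example above, sidesteps the whole issue.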

2. Dispose of variable variables

If you can advance at least to 4.0 of bash, associative arrays become available. What’s interesting about associative arrays (also called hashes, dictionaries, and so on) is that, like regular expression matching, they were often mimicked in a clumsy way in older scripts. Suppose, for instance, you had:

Alabama=Montgomery
Alaska=Juneau
Arizona=Phoenix

for STATE in Alabama Alaska Arizona
do
        # Here are the two most popular ways to doubly
        # dereference "variable variables".
           echo "The capital of $STATE is ${!STATE}."
           eval echo "The capital of $STATE is \$$STATE."
done      

Does this look familiar? It remains common to this day to come across “variable variables,” not just in bash, but also in Perl and PHP (and, less often, in many other languages).

As frequent as these sorts of uses are, they present a number of difficulties. Arkansas requires a little care:

# Unquoted, this fails: bash runs "Rock" as a command with
# Arkansas=Little in its environment.  Quoting is required:
Arkansas='Little Rock'

but the pattern truly breaks down at New Hampshire, because the names of bash variables cannot embed whitespace.

A more satisfying alternative is:

declare -A CAPITAL
CAPITAL[Alabama]=Montgomery
CAPITAL[Alaska]=Juneau
CAPITAL[Arizona]=Phoenix
CAPITAL["New Mexico"]="Santa Fe"
 
for STATE in "${!CAPITAL[@]}"
do
    echo "The capital of $STATE is ${CAPITAL[$STATE]}."
done     

Associative arrays not only gracefully handle whitespace in both the key (like “New Mexico”) and value (“Santa Fe”), but they allow for far more convenient expression of computations on the array as a whole. In this example, there’s no need to re-list the names of the states; we can just iterate over "${!CAPITAL[@]}". Associative arrays are a considerable step forward for bash, and you’ll soon wonder how you got along without them.

This same example also illustrates an ongoing limitation of shell scripting. The previous example was evidence that bash now builds in most common programming operations. “Most” is not “all,” though, and real-world scripts often need to “call out” to the external sort process. As it stands, the last script above does not alphabetize the STATEs. To do so effectively requires sort, and in a clumsy invocation besides. Other popular scripting languages build in convenient sort functions.
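One workable pattern, sketched below under the assumption of bash 4.0 or later, reuses the CAPITAL array from above: emit the keys one per line, sort them externally, and read them back through process substitution, which also survives whitespace in keys.

```shell
#!/usr/bin/env bash
declare -A CAPITAL
CAPITAL[Alabama]=Montgomery
CAPITAL[Alaska]=Juneau
CAPITAL[Arizona]=Phoenix
CAPITAL["New Mexico"]="Santa Fe"

# Print one key per line, hand them to the external sort, and read
# them back; process substitution keeps the loop in the current shell.
while IFS= read -r STATE; do
    echo "The capital of $STATE is ${CAPITAL[$STATE]}."
done < <(printf '%s\n' "${!CAPITAL[@]}" | sort)
```

This assumes no key embeds a newline – a safe bet for state names – and it is exactly the sort of clumsy invocation the paragraph above laments.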

3. Functions as almost-first-class

Other notorious limitations of bash are its lack of built-in support for Unicode, testing, exception handling, and all but the most rudimentary type systems. bash’s type system, for instance, essentially regards everything as a string (a choice distinctive enough that programming culture occasionally reifies it as the slogan EIAS, “everything is a string”), and doesn’t even recognize functions as first-class values.

Clever use of eval, though, makes it possible to model nearly all first-class handling of functions, including currying, use of functions as arguments or return values, and so on. Here is a hint of what’s possible:

       f1() {
           echo "This is the function f1."
       }
 
       f2() {
           echo "This is the function f2."
       }
 
       for COUNTER in 1 2
       do
           FUNCTION=f$COUNTER
           eval $FUNCTION
       done    

This script prints out

      This is the function f1.
      This is the function f2.    

as a demonstration that a function name itself can be variable.
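Going a step further, a function name can itself be passed as an argument and invoked indirectly. The map and shout names below are hypothetical helpers for illustration; shout’s ${1^^} uppercasing requires bash 4.0 or later.

```shell
#!/usr/bin/env bash
# Apply the function named in $1 to each remaining argument.
map() {
    local fn=$1
    shift
    local arg
    for arg in "$@"; do
        "$fn" "$arg"
    done
}

# Uppercase an argument; ${1^^} needs bash 4.0+.
shout() { echo "${1^^}"; }

map shout hello world
# prints:
# HELLO
# WORLD
```

Passing names rather than values is as close as bash comes to higher-order functions, but it covers a surprising share of practical cases.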

4. printf’s power

Along with use of regular expressions and associative arrays, expert bash-ers go far, especially in preparation of reports, with the printf command (a builtin in modern bash, with a stand-alone executable also widely available). As much as practical, the shell’s printf matches the behavior of C’s familiar printf from its run-time library. An example:

      printf '%7d%4d\n' 1 12 123 1234

produces output in columns, because printf reuses its format string until all its arguments are consumed:

            1  12
          1231234
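The same format-string machinery makes quick column reports painless; the field width below is an arbitrary choice for illustration:

```shell
#!/usr/bin/env bash
# %-12s left-justifies each string in a 12-character field,
# so the second column starts at the same offset on every row.
printf '%-12s %s\n' "State" "Capital"
printf '%-12s %s\n' "Alabama" "Montgomery"
printf '%-12s %s\n' "New Mexico" "Santa Fe"
```

The result is a neatly aligned two-column listing with no awk or column invocation in sight.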

How far one can go

Sufficiently motivated bash-ers have achieved surprising results. Here, for example, is a JSON parser that bash scripts can use (it is not unique). Enthusiasts have also demonstrated that bash can play chess, implement the simplex algorithm, and much more.

The point for today is not to encourage such exploits, which frankly smack of “extreme programming.” For me, what’s truly compelling about bash is that, with just an idea or two beyond what are already widely known, many common programming chores become simpler and quicker. For more ideas on what bash can achieve, see its FAQ.


  • http://dev2ops.org Alex Honor

    Enjoyed your post. Bash is often maligned as a bad language, but that’s down to it being used like a general-purpose language. Bash, like other shells, plays two roles: interactive shell and scripting tool. As a scripting tool, bash uses a special language oriented toward running commands; the commands do the real heavy lifting. Another pitfall is bash’s default mode of *not* failing fast. Every script should begin with `set -eu`.
     
    Bash and shell scripts can also suffer from lack of modularity. A shell script can become a big bloated sprawl.  
    [Rerun](http://github.com/rerun/rerun) for example, is a framework that helps keep scripts well encapsulated as named commands and encourages the fail-fast mode of execution through its standard script templates.

  • http://regularexpressions.com Cameron Laird

    Well said, Alex Honor! bash is widely used; with just a few low-cost improvements (like more careful modularization), it can be well used.

  • Cameron Laird

    Thank you, Damon; I’m glad you provided these details and explanation about Rerun. What you write here helps *me*.

  • Paddy3118

    Your use of regexps to parse XML can lead to “XML” config files that are extremely brittle: they have to be written in a minuscule subset of the language, and they become less robust the more scripts rely on them. It is usually better to use an XML parser for an XML file (and generate a DTD too). Using an XML parser would allow you to add XML comments about the contents of the file, safe in the knowledge that if it is valid XML it is likely to work on all parts of the system, and that adding extra whitespace that is insignificant to XML (such as automatic prettyprinting) is more likely to work.
