From  CertCities.com
Column
Inside the Kernel
Using Obscure Command Line Utilities
Emmett solves a child's homework assignment the best way he knows how -- by using a couple of Unix/Linux utilities that rarely see the light of day.

by Emmett Dulaney

1/15/2009 -- One of the great things about the Unix/Linux operating systems is the sheer number of utilities included. If you can get an idea of the necessary steps, you can perform almost any text manipulation imaginable.

This month, I'll illustrate that very fact using some of the more obscure utilities that are included in most distributions.

Pulling Words from a Dictionary
One of my kids came home from school the other day with this assignment: Make a list of words that, when spelled backward, make other words. While he could include palindromes (words that spell the same thing both ways, like "level"), he didn't have to limit his list to just those. For instance, "bats" was fair game.

Even though the teacher required only a handful of words for his assignment, a light bulb went on in my head: What's to stop us from finding all such words that exist in a standard dictionary?

A list of standard dictionary words exists in most distributions as a text file (/usr/share/dict/linux.words), a carryover from the days when vi was a revolutionary text editor and you checked for spelling errors by running the spell utility. This utility would look at the words in your document and pinpoint any that didn't exist in the dictionary file as possible spelling errors. (It was a text-based utility, so you could add words you didn't want to get marked as errors, like your name, into the file.)

The great thing about this file is that it contains standard words (no slang, etc.) and it seemed like a great starting point for our word hunt. Granted, it doesn't contain words recently added to the vernacular, but I was willing to live with that. The file is in alphabetical order and each word is written on a separate line with no duplicate entries.

Since this was a one-time operation, I saw no need to create a shell script. Instead, I ran each command individually at the command line. I also created temporary files as I went along and simply removed them when I was finished. If we were going to perform this kind of operation regularly, it would make sense to write a shell script and to create and remove files only as needed.

After trying to determine the ideal way to approach this task, it occurred to me that the best way to find out if a word is still a word when reversed was to see if the reversed word also appeared in the linux.words file. For example, when "but" is reversed it becomes "tub," which is in the file and so should be kept. But when "buy" is reversed it becomes "yub," which isn't in the file and should be tossed.

The first order of business? Reverse each word in the linux.words file. Since each word is written on a separate line, the rev utility -- the sole purpose of which is to reverse the characters in each line -- was able to do this very task. I ran this utility against the linux.words file and wrote the output to a file called temp1.txt.
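Sketched out, the step looks like this. (The sample words below are illustrative stand-ins; on a real system you'd point rev at /usr/share/dict/linux.words itself.)

```shell
# A small stand-in for /usr/share/dict/linux.words (the real file
# contains hundreds of thousands of entries, one word per line)
printf 'bats\nbut\nbuy\nflow\nlevel\nstab\ntub\nwolf\n' > linux.words

# rev reverses the characters on every line; save the output for later steps
rev linux.words > temp1.txt
```

With this sample, temp1.txt contains "stab" (from "bats") and "tub" (from "but"), but also the non-word "yub" (from "buy").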

At this point, an entry that existed in both files would indeed be a word that was still a word when written backward. On the other hand, a word appearing only once (in either file, logically) would be a word that didn't become another word when written backward. Rather than trying to find entries in both files -- a doable task, but one that would require more work -- I decided to combine the files and look for duplicate entries.

I combined the files using simple concatenation: cat linux.words temp1.txt > temp2.txt. While it's possible to append one file to the other (cat linux.words >> temp1.txt), I wanted to keep the first temp file intact in case what I was trying didn't work out and I needed to try something different.

With all the words now in one file, the next order of business was to sort the file alphabetically (using sort) and then look only for entries that appeared more than once (using uniq -d). The sort operation was necessary because uniq works by moving sequentially through the file, comparing adjacent lines. If two adjacent lines match -- which, for every duplicate, can only be guaranteed after a sort -- they're considered duplicates; otherwise, they're considered unique. By default, uniq prints each line once, collapsing duplicates. Since that's not what we wanted in this case, the -d option is used to print only the lines that are duplicated -- in other words, words that spell other words.

In the end, we got exactly what we needed: a complete list of all the words that spelled other words when reversed. Not only was it an interesting learning experience to read the final list -- I expected "flow" and "wolf," for example, but "decaf" and "faced" not so much -- it was also great to dust off some wonderful utilities that often languish on the hard drive, waiting for just such a use.
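For the record, since each step just feeds the next, the whole hunt can also be run as a single pipeline with no temporary files at all (sketched here with a sample stand-in file; point it at the real linux.words to reproduce the full list):

```shell
# Sample stand-in for /usr/share/dict/linux.words
printf 'bats\nbut\nbuy\nflow\nlevel\nstab\ntub\nwolf\n' > linux.words

# rev feeds the reversed words to cat, where "-" means standard input;
# sort groups matching lines together, and uniq -d keeps only duplicates
rev linux.words | cat linux.words - | sort | uniq -d
```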


Emmett Dulaney is the author of several books on Linux, Unix and certification.

Copyright 2000-2009, 101communications LLC. See our Privacy Policy.