CertCities.com | Column: Fast, Cheap and Partially Effective: Do-It-Yourself Open-Source Tools for System Management

Tuesday, August 05, 2003

James Ervin

Fast, Cheap and Partially Effective: Do-It-Yourself Open-Source Tools for System Management

Got more machines to manage but less money to spend on a wholesale management solution? Our Unix guru James Ervin offers this guide to creating your own.

by James Ervin

1/23/2002 -- According to ZDNet, 2001 was a "grim" year during which anemic hardware sales were partially offset by tremendous growth in rack-optimized servers. By way of their standardized small sizes and interchangeability, rack-optimized servers can dramatically increase a datacenter's capacity -- so "grim" seems to equate to "less profitable" in this sense, and doesn't connote a decrease in the server population. For the system administrator, this is a somber state of affairs: Economies of scale mean more servers to manage.

Managing diverse systems via a single point of entry -- Web page, command line, or standalone application -- is undeniably convenient. However, most vendors provide proprietary management suites to encourage their clients to develop homogeneous computing infrastructures, while standard methods for managing heterogeneous environments remain rare. If you've already downloaded and installed a tool like IBM's Cluster Systems Management (CSM), purchasing additional IBM servers makes sense. Similarly, Sun freely distributes its own Sun Management Center (SMC), a comprehensive suite of management tools for Sun hardware -- which includes everything except the modules that offer the interesting capabilities, such as hardware diagnostics. The distance between these sales tactics and a crack dealer's is not large: In both cases, the first hit is free.

Monitoring a system requires only the ability to tell whether a machine is behaving normally -- for instance, you might ping a machine or request a Web page to see if it responds. Managing a system, on the other hand, requires the ability to initiate corrective action. IBM's CSM and Sun's SMC are typical agent-based management suites, in which a process on the target (client) machine known as the agent listens for commands from a management server and performs any prescribed actions. Agent-based management is simply an extension of the client-server model. Typically, agents also enhance monitoring capabilities. Multi-platform systems management requires a multi-platform agent. Numerous commercial management suites provide such agents, including Tivoli and CA Unicenter. Most commercial suites also imbue their agent software with internal logic, enabling it to take corrective action without human intervention -- imagine a car capable of changing its own oil. These are often called intelligent agents.

Links to Additional Information
(in order of appearance in the article)

"Blades Shine in Grim Server Market" -- ZDNet article.
IBM Cluster Systems Management for AIX 5L Technical Reference -- Contains a good explanation of IBM's dsh command. Useful by comparison with the GPL'ed dsh command for those who might want to add features or construct their own distributed shell. Requires a fake name to login.
OpenSSH Key Management (Parts 1 and 2) -- Excellent two-part series on configuring RSA/DSA authentication for password-free logins.
DSH: The Distributed Shell -- Perl-based replacement for IBM's dsh. Works on any platform with Perl 5.005_03 or higher.
Remote Update -- remote_update.pl; Excellent alternative to dsh.pl with a slightly different feature set.
FSH -- Fast Remote Command Execution; A replacement/add-on for SSH that allows you to reuse a single SSH session for multiple remote commands; can speed things up considerably.
BitCluster -- Emergent toolkit for distributed computing; shows promise. --J.E.

However, the cost of licensing, installing and maintaining a commercial management suite can be prohibitive. Are there any alternatives for the smaller datacenter that is unwilling to invest in a wholesale management solution, but where the number and type of machines is becoming unmanageable? Building a systems management platform from scratch seems like a very unappetizing prospect. However, let's begin from a simple assumption: Whatever management solution we develop, we would like it to:

1) Be Platform-Independent
2) Be Secure
3) Permit the Execution of Arbitrary Commands on a Target Machine
4) Address Multiple Machines Simultaneously.

Remote Command Execution
On some Unix and Unix-like systems the rcmd command, short for "remote command," allows you to execute a command on a remote machine (on which an agent-like process must be running). Some variant of rcmd is available on most versions of Unix. At first glance, rcmd would seem to be a good foundation for a "roll your own" management tool. Unfortunately, rcmd and its relations are now unadvisable, as they do not encrypt their communication. Even if your organization resides behind a firewall, the majority of security breaches originate internally. Therefore, any management system should be protected as if there were nothing between you and the outer dark.

SSH, or the Secure Shell, should be familiar as a secure alternative for the telnet protocol. Though not the original version of SSH, the OpenSSH package is widely regarded as the best version of SSH, if only for its price and cross-platform availability: It has been ported to most of the major Unix and Unix-like platforms and is freely available under the BSD license. A less well-known feature of SSH is that it permits remote command execution, just like rcmd. Thus, the sshd server program will function as the software agent in our patchwork management framework.

In the absence of specific instructions, ssh simply opens an interactive shell on the remote machine, just as if you had opened a telnet session. Given a command, however, ssh will execute the command, return the output to your screen and then exit. Here, I determine the platform type of a remote machine named constantine.

$ ssh constantine "uname -a"
james@constantine's password:
OpenBSD constantine 3.0 GENERIC#94 i386
$

Multiple commands can be executed by separating them with a semicolon, just like on a normal Unix command line. Here, I check to see how many files are in my /tmp directory (since I could determine this with one "ls" command, this is a poor, but utilitarian example):

$ ssh constantine "cd /tmp ; ls -1 | wc -l"
james@constantine's password:
0
$
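
Because ssh exits with the status of the remote command (and with 255 when the connection itself fails), a small wrapper can tell success from failure without parsing any output. The following is only a sketch under that assumption: run_remote and the REMOTE_CMD override are hypothetical names introduced here for illustration, not part of OpenSSH or of any tool discussed in this article.

```shell
#!/bin/sh
# Hypothetical helper, not from the article. REMOTE_CMD lets you swap in a
# different client (or a stub for testing); it defaults to plain ssh.
REMOTE_CMD="${REMOTE_CMD:-ssh}"

# run_remote HOST COMMAND
# Runs COMMAND on HOST, passes its output through unchanged, and reports
# any failure on stderr. Returns the exit status that ssh reported.
run_remote() {
    host="$1"
    shift
    $REMOTE_CMD "$host" "$*"
    status=$?
    if [ "$status" -eq 255 ]; then
        # ssh uses 255 for its own errors; a remote command could in
        # principle return 255 too, so treat this message as a hint.
        echo "$host: connection failed" >&2
    elif [ "$status" -ne 0 ]; then
        echo "$host: command exited with status $status" >&2
    fi
    return "$status"
}
```

With this loaded, something like run_remote constantine "cd /tmp ; ls -1 | wc -l" would behave like the example above, with the addition of a message on stderr if the connection or the remote command fails.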

I can even open up a text editor and edit a file on the remote machine interactively. Some interactive commands like the vi text editor require that I add the "-t" switch to the ssh command line, in order to emulate a real terminal session. This command line would allow me to edit the /etc/passwd file on the machine constantine as the root user:

$ ssh -l root -t constantine "vi /etc/passwd"

So the Secure Shell seems like a good starting point for any homegrown management system: it's platform-independent, uses strong encryption for security, and can execute arbitrary commands. Now, can we access more than one system at a time? Can we make the Secure Shell, in essence, a distributed shell?

Distributed Shells
Efforts to create a true multi-platform, distributed shell are probably bound to fail, since Unix systems, and shells in particular, are so idiosyncratic. Vendors who've attacked this problem have been forced into unusual compromises, even when concerned with a single platform. For instance, Sun provides a "distributed shell" of sorts. It works by opening multiple terminal windows, one per machine, then mirroring mouse movements and keyboard input between a "master" window and several "slaves." Perhaps I'm not familiar enough with X Windows, but this doesn't strike me as particularly reliable or robust, nor scalable beyond a handful of machines. Other vendors, such as Tivoli, in order to assure a standard environment for their software agents, evade the problem by including a customized shell as part of their agent software.

IBM has come closest to what we want to accomplish. A command called dsh, or "distributed shell," is included with their CSM product. Dsh connects to multiple machines, executes commands and returns the output, including any errors. Although by default it uses the rsh ("remote shell") command, a relative of the discredited rcmd, it can be more tightly secured. A script available at http://dsh.sourceforge.net/ replicates most of the functionality of IBM's dsh for other platforms. Though originally intended for Beowulf-style computing clusters on private networks, dsh.pl is written in Perl and is hence easily customizable.

To use this script, we make a few alterations to define SSH as the connection mechanism (these changes are well-documented in the code itself and thus left as an exercise to the reader), and then create a file in which we place the names of the servers we want to manage. Let's call this file "web" to indicate that this is a cluster of Web servers, and show the contents:

$ cat web
web1
web2
web3

Executing the dsh.pl command will allow me to run a command on web1, web2 and web3 simultaneously. The "-N" switch indicates the name of the file I wish to obtain the hostnames from, and single quotes enclose the command I want to issue. Here, I simply check each server's platform type:

$ dsh.pl -N web 'uname -a'
executing 'uname -a'
web1: SunOS web1 5.8 Generic_108528-12 sun4u sparc SUNW,UltraSPARC-IIi-cEngine
web2: SunOS web2 5.8 Generic_108528-12 sun4u sparc SUNW,UltraSPARC-IIi-cEngine
web3: SunOS web3 5.8 Generic_108528-12 sun4u sparc SUNW,UltraAX-i2

From these results, I can tell that I have a cluster of 3 Solaris machines, of two different architectures, and that one of them is slightly out of date on its kernel patch revision (the one with the "108528-08" instead of "108528-12"). This is already useful! If I want to run a quick check to see if my logging partitions are filling up:

$ dsh.pl -N web 'df -k | grep /var/log$'
executing 'df -k | grep /var/log$'
web1: /dev/dsk/d3 4131866 1676089 2414459 41% /var/log
web2: /dev/dsk/d3 4131866 1540986 2549562 38% /var/log
web3: /dev/dsk/d3 4131866 1703579 2386969 42% /var/log
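
Output in this shape is easy to post-process on the management station. As a rough sketch, the filter below reads "host: df line" records on standard input and prints only the file systems at or above a given percentage. The function name over_threshold is hypothetical, and the code assumes the column layout shown above, with the capacity in the sixth whitespace-separated field.

```shell
#!/bin/sh
# over_threshold LIMIT
# Reads dsh.pl-style "host: <df -k line>" records on stdin and prints the
# host, mount point and capacity for entries at or above LIMIT percent.
# Assumes capacity is field 6 and the mount point is field 7.
over_threshold() {
    awk -v limit="$1" '{
        pct = $6
        sub(/%/, "", pct)          # strip the trailing percent sign
        if (pct + 0 >= limit + 0)  # force a numeric comparison
            print $1, $7, $6
    }'
}
```

A possible use would be dsh.pl -N web 'df -k' | over_threshold 90. Header lines and other text have no numeric sixth field, so they count as zero and are dropped for any positive limit.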

The utility of such a tool should be immediately apparent. If you don't specify a command to use, dsh.pl will drop into a limited interactive mode, complete with command history, that looks like this:

dsh>

Any command then typed will be executed on all the machines listed in the "web" file. This tool seems to fulfill our four initial requirements, although there are some gaps we need to fill in.
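
To make the idea concrete, here is a minimal sketch of the same fan-out pattern in plain shell. It is not how dsh.pl is implemented; dsh_lite and the REMOTE_CMD override are hypothetical names, and the use of mktemp -d is an assumption about the management station (it is widely available but not part of POSIX).

```shell
#!/bin/sh
# Hypothetical dsh.pl-style fan-out, for illustration only.
# "ssh -n" keeps each ssh from reading the host list on stdin.
REMOTE_CMD="${REMOTE_CMD:-ssh -n}"

# dsh_lite HOSTFILE COMMAND
# Runs COMMAND on every host named in HOSTFILE (one per line) in parallel,
# then prints each host's output prefixed with "host: ".
dsh_lite() {
    hostfile="$1"
    shift
    tmpdir=$(mktemp -d) || return 1

    # Start one background job per host, capturing output per host.
    while read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        $REMOTE_CMD "$host" "$*" > "$tmpdir/$host" 2>&1 &
    done < "$hostfile"
    wait    # block until every background job has finished

    # Print the results in host-file order so output is predictable.
    while read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        sed "s/^/$host: /" "$tmpdir/$host"
    done < "$hostfile"
    rm -rf "$tmpdir"
}
```

Calling dsh_lite web 'uname -a' would then produce output in the same "host: line" shape as the examples above. Collecting output in temporary files rather than printing it as it arrives is a deliberate choice here: it keeps lines from different hosts from interleaving.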

Authentication
Eagle-eyed readers will notice that I provided no password in the previous example. This is because I previously configured all the machines to use RSA authentication, a public-key authentication scheme supported by OpenSSH that obviates the need for a password. If I disable RSA authentication on all my machines, I'll run into trouble using dsh.pl:

$ dsh.pl -N web 'uname -a'
executing 'uname -a'
james@web1's password: james@web2's password: james@web3's password:

Here, each remote machine is expecting a password, but dsh.pl didn't wait for me to give one. Also, the whole point of this exercise was to manage multiple machines via a single point of entry, hence we don't want to have to log in more than once, if at all.

Configuring RSA authentication is outside the scope of this article, but it is explained in numerous locations on the Web (see the sidebar, "Links"). However, RSA authentication may not be appropriate in your environment, especially if your management station isn't used exclusively by trusted personnel. If RSA authentication isn't an option, I recommend another useful script, remote-update.pl. This Perl-based utility obtains a password for each remote machine from a local file, copies a script to be executed and any corollary files to the remote host (via scp, the secure replacement for rcp provided with the OpenSSH command suite) and then executes the script. However, using a local, unencrypted file to store password information is a sizable trade-off in overall security, and I would not recommend permanently storing such information. Password-less yet secure authentication is probably the biggest hurdle to cross in developing any distributed systems management tool, but it's likely that one of these two approaches will meet your needs.
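
Before pointing a distributed shell at a list of machines, it can help to find out which of them would still prompt for a password. The sketch below relies on OpenSSH's BatchMode option, which makes ssh fail instead of prompting; needs_password and the SSH_PROBE override are hypothetical names used only for this example, and the five-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical pre-flight check, for illustration only.
# BatchMode=yes makes ssh fail rather than prompt for a password;
# -n keeps ssh from consuming the host list on stdin.
SSH_PROBE="${SSH_PROBE:-ssh -n -o BatchMode=yes -o ConnectTimeout=5}"

# needs_password HOSTFILE
# Prints every host in HOSTFILE that does not accept a non-interactive
# login. Unreachable hosts are listed too, since the probe fails for them.
needs_password() {
    while read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        if ! $SSH_PROBE "$host" true > /dev/null 2>&1; then
            echo "$host"
        fi
    done < "$1"
}
```

Running needs_password web would print nothing if every machine in the "web" file accepts key-based logins.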

Interactivity
Complete interactivity via a distributed shell is the Holy Grail of systems management -- i.e., I don't expect to find it. There will always be tasks that any management system is unsuited to. In the case of distributed shells that use SSH behind the scenes, interactive commands are problematic, because the nature of the ssh command is to close a connection after the command is executed. For instance, if we use dsh.pl's interactive mode as we would use a normal shell, we discover a curious fact:

dsh> cd /etc
executing 'cd /etc'
dsh> ls
executing 'ls'
web1: bin
web1: etc
web1: var
…

Wait a minute… didn't we just change to the /etc directory? In fact, the dsh.pl utility closes the SSH session after each command is executed and opens a new session for the next command, so each command is, in effect, starting from scratch. For complex tasks, it's best to consolidate what you want to accomplish in a script, and then use dsh.pl to run that script. This works best if you have a shared file system such as an NFS mount in which to place the script. For instance, here I execute a script that I've placed in my shared NFS file system, /export, and use some basic shell functionality to let me know whether or not the command succeeded:

$ dsh.pl -N web '/export/my_script.sh || echo oops'
executing '/export/my_script.sh || echo oops'
web2: oops
$

Since I happen to know that my script produces no other output, the absence of the servers web1 and web3 in the dsh.pl output indicates that all is well; however, something is not right on web2.
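
Where no shared file system is available, one possible alternative is to feed a local script to a remote sh over the connection's standard input. This is not a dsh.pl feature, just a sketch of the idea; run_script_on and the REMOTE_CMD override are hypothetical names, and the code assumes that each target has a Bourne-compatible sh on its path.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only. No "-n" here: ssh must be
# able to read the script from its standard input.
REMOTE_CMD="${REMOTE_CMD:-ssh}"

# run_script_on HOSTFILE SCRIPT
# Sends the local file SCRIPT to "sh -s" on each host in HOSTFILE, one
# host at a time, and prints "host: ok" or "host: oops" by exit status.
run_script_on() {
    hostfile="$1"
    script="$2"
    while read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        # The explicit "< $script" also keeps ssh away from the host list.
        if $REMOTE_CMD "$host" sh -s < "$script" > /dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: oops"
        fi
    done < "$hostfile"
}
```

Unlike the one-command-per-session behavior described above, everything in the script runs inside a single remote shell, so a cd near the top still applies to the commands that follow it.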

Functionality
An old engineering adage, frequently appropriated, is that you can choose any two of the following: cheap, fast or effective. Our cobbled-together management solution is clearly lacking in the last. Dsh.pl makes any of the native commands on multiple remote systems available to us, but if we truly want to manage a heterogeneous environment, how can we account for differences between Linux, Solaris, AIX, so on and so forth? We can use some simple commands like "uname" and "ls" that exist across Unix variants, but what if we want to do something more complex with commands that don't behave the same way? For instance, to list running processes, OpenBSD's "ps -aux" and Solaris's "ps -ef" do the same thing -- how do I tell my distributed tool which one to use? What about automating graphical tasks? What about automating even the simpler tasks that a proprietary management system will allow us to do immediately?

The first step is the creation of a script library to account for your various system types. For instance, use shell scripts similar to the following to determine variables for your platform and make your scripts more portable. Similar logic can be used to account for just about any difference in your systems:

if [ "`uname -s`" = "SunOS" ] ; then
PS="ps -ef"
fi
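
The same test extends naturally to a small lookup function covering several platforms at once. The sketch below is one way such a library entry might look. The function name ps_command_for is hypothetical, the mapping for platforms other than OpenBSD and Solaris is my assumption, and the fallback to "ps -ef" is a guess that should be checked against each system you actually run.

```shell
#!/bin/sh
# ps_command_for PLATFORM
# Prints the process-listing command to use for a given `uname -s` value.
# Hypothetical library function; extend the table to match your systems.
ps_command_for() {
    case "$1" in
        SunOS|AIX|Linux)
            echo "ps -ef" ;;
        OpenBSD|FreeBSD|NetBSD)
            echo "ps -aux" ;;
        *)
            # Assumption: fall back to the System V style options.
            echo "ps -ef" ;;
    esac
}

# Set PS for the machine this script is running on.
PS=$(ps_command_for "$(uname -s)")
```

A script that sources this file can then run $PS without caring which platform it has landed on, and the same case structure can hold other per-platform differences.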

An Open Source approach to systems management entails re-inventing some wheels, but can be extremely useful.

Future Solutions
I predict that many of the difficulties of heterogeneous systems management will be addressed in the not-too-distant future, one way or another. Another trend noted by ZDNet amid otherwise gloomy forecasting is an increase in sales of "blade" technology. The term "blade" is bandied about indiscriminately at the moment, used to refer to products as different as Scalant's rack-optimized chassis capable of holding multiple independent servers and Egenera's BladeFrame, which pools resources into a "virtual server." At the same time, efforts such as the Globus project are beginning to develop protocols for distributed systems management -- in this case, to enable "grid" computing where distant resources are shared, but which may have other benefits.

An increasingly disjointed computing landscape seems to be emerging, where memory, processors, and storage can be used and shared almost completely independently of their physical confines. Whether or not there will be operating systems left to administer after the landscape shifts is another question.


James Ervin is alone among his coworkers in enjoying Michelangelo Antonioni films, but in his more lucid moments suspects that they're not entirely wrong.


Current user Comments for "Fast, Cheap and Partially Effective: Do-It-Yourself Open-Source Tools for System Management"
5/15/02 - Reenen Kroukamp says:	Also see http://pikt.org
