101communication LLC CertCities.com -- The Ultimate Site for Certified IT Professionals
 Notes from Underground   James Ervin

 OpenAFS
Network file systems, righteous indignation and you.
by James Ervin  
11/29/2000 -- Andrew File System (AFS) is a network file system originally developed at Carnegie Mellon University, spun off into a company called Transarc, and eventually absorbed by IBM. The source code to AFS was "forked," and one version of that code was released under the IBM Public License (IPL) on Halloween (perhaps "pitchforked" would be more apropos?).

AFS is one of only two commercial-quality network file systems of its kind. The other, DFS, is also marketed by IBM. Both have a proven track record in extremely large (terabyte-sized) deployments, although AFS is more popular. The open-source release is a boon to anyone previously hobbled by IBM/Transarc's rather extravagant client pricing, and to the open-source community as a whole. To understand the full implications of the OpenAFS release, though, some technical history is in order.

About NFSs
Network file system (NFS) implies just that: a file system accessible over the network, in contrast to one accessible via local interfaces like SCSI or -- I shudder to think -- ATA. First-generation network file systems should be familiar to anyone who has ever shared files using one of the common protocols:

  • CIFS (Common Internet File System). Previously known as SMB or Server Message Block, this protocol is the foundation of Windows filesharing.
  • AFP (AppleTalk Filing Protocol). The equivalent of CIFS for Macintoshes.
  • NFS (Network File System). First developed by Sun and subsequently spreading like kudzu, NFS is the de facto standard for Unix filesharing.
  • NCP (NetWare Core Protocol). The native filesharing protocol of Novell NetWare.

There are others. These protocols enable filesharing in both client-server and peer-to-peer (P2P) scenarios.

P2P is notorious of late because of its popularity in loose-knit communities. Napster and Gnutella are the best-known examples, although it's misleading to group Napster, which depends on a central server, with true P2P networks such as Gnutella that don't. In general, first-generation network file systems weren't designed for security, redundancy or other things system administrators have to worry about.

Standards grow more stringent over time, though, and administrators who have to deal with serious quantities of data are abandoning the old ways. Today, a true network file system makes provision for reliability, redundancy, security, backups, manageability and so on. Even Microsoft has joined the fray, touting the Windows 2000 version of its Distributed File System (DFS) with next-generation buzzwords like "single-instance storage," "fault tolerance" and "junctions."

Why AFS?
AFS was designed to deal with deficiencies in early implementations of NFS and common to first-generation network file systems in general. Though many of these deficiencies disappeared in later NFS releases, AFS continues to boast certain advantages:

1) Security
Bare-bones NFS lacks a standard security mechanism and relies on the underlying file system's mechanisms: typically, standard Unix user and group permissions. Administrative oversight or ignorance means even this "protection" is frequently nonexistent. Additionally, several NFS bugs are easily exploitable and still linger in the wild, landing NFS at numbers 3 and 7 on the SANS list of "Top Ten Internet Security Threats."

AFS incorporates MIT's Kerberos -- the freely available, shared-secret network authentication protocol notoriously "extended" by Microsoft in Windows 2000. Other than a recent article on "Hacking AFS" in 2600 magazine (www.2600.com), there just aren't many security incidents related to the Kerberos architecture, although like everything networked, it has its share of vulnerabilities. AFS's reliance on Kerberos v4, an outdated and less secure implementation than its successor, Kerberos v5, has been a major gripe of large AFS sites for some time. The open sourcing of the AFS code should rectify this situation.

2) The Naming of Things (Apologies to Lewis Carroll)
Mount points are critical to understanding AFS and advanced file systems in general. While familiar to most Unix users, they've only recently appeared on Windows file systems (after being purchased from Veritas). The concept is simple: "mount a file system" means to attach it to the file system hierarchy at a "mount point." Check the manual page for the Unix "mount" command if you don't believe me. The critical difference -- or at least one of them -- between Unix and Windows is that Unix conceives of a single root directory: / . Windows has lots of root directories called drive letters: C:\, D:\, and so on.

NFS clients can use any mount point they want: For instance, I could mount an NFS share under "/shoes/and/ships" on my machine while you could mount it under "/sealing/wax." Nomenclature differences like this make it difficult to scale or automate an NFS environment by virtue of unpredictability. By analogy, user-defined drive mappings on Windows machines present similar problems.

AFS obviates this problem via a global namespace. Files in AFS can be found under the same path, /afs, on any AFS client. By abstracting the namespace of the file system from the vagaries of individual client machines, AFS eases administration considerably. The Windows AFS client is an exception, unfortunately, requiring AFS to be mounted under a drive letter anyway. To circumvent this, Windows AFS sites generally agree that a certain drive letter will always refer to /afs.

Each AFS site creates a "cell" of its own. Thus, in the following hypothetical directory listing, the originators of AFS at CMU have their cell and IBM/Transarc, the vendor, has its cell:

$ ls /afs
cmu.edu/
transarc.com/

Clients from any cell can get to a given file in any cell via the same path, provided they have permission. The client works anywhere with IP connectivity (performance may vary), making AFS a truly global file system. IBM/Transarc actually distributes AFS to its customers worldwide via AFS.
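
To make the contrast with per-client NFS mount points concrete, here is a minimal Python sketch of how a path in the global namespace splits into a cell name and a path within that cell. The helper name split_afs_path and the sample file paths are hypothetical, made up for illustration; only the /afs root and the two cell names come from the listing above. It is not code from any AFS client.

```python
from pathlib import PurePosixPath
from typing import Tuple

AFS_ROOT = "/afs"  # root of the global namespace, as in the listing above


def split_afs_path(path: str) -> Tuple[str, str]:
    """Split a global AFS path into (cell, path within the cell).

    Hypothetical helper for illustration only.
    """
    parts = PurePosixPath(path).parts  # e.g. ('/', 'afs', 'cmu.edu', 'usr')
    if len(parts) < 3 or parts[0] != "/" or parts[1] != "afs":
        raise ValueError(f"not a path under {AFS_ROOT}/<cell>: {path!r}")
    cell = parts[2]
    # Everything after the cell name is the path inside that cell.
    rest = "/" + "/".join(parts[3:])
    return cell, rest
```

Because the cell name is part of the path itself, the same string identifies the same file on every client, which is exactly what a locally chosen NFS mount point such as /shoes/and/ships cannot promise.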

3) Volume Replication/Redundancy
If an NFS server goes down, files become unavailable until restored. AFS manages data in terms of volumes, letting you manipulate the physical location of the volume independent of the mount point. For instance, a volume called "things" could be mounted in several places, such as /afs/mycell/castles and /afs/mycell/kings. An administrator could then rename, resize or move the "things" volume physically to a new AFS file server, completely transparently.

AFS volumes can also be replicated. Replicas are generally placed on different physical machines to eliminate single points of failure. Generally, three or more AFS servers make up a cell, with critical data replicated on each. Should the server holding the read-write copy spontaneously combust, the data will remain available from the read-only replicas, although unalterable, until a read-write volume is restored. This arrangement has the added benefit of increasing performance by dispersing read requests among the replicas.
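
The separation between mount points and physical location can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how AFS is implemented: the class names ToyCell and Volume, the server names fs1 through fs4, and the rule that reads simply pick any live site are all invented for illustration. The "things" volume and its two mount points are the examples from the text.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Volume:
    """A named volume: one read-write site plus optional read-only replicas."""
    name: str
    rw_server: str
    ro_servers: List[str] = field(default_factory=list)


class ToyCell:
    """Toy cell: mount points refer to volume names, never to servers."""

    def __init__(self) -> None:
        self.volumes: Dict[str, Volume] = {}
        self.mounts: Dict[str, str] = {}  # path -> volume name
        self.down: Set[str] = set()       # servers currently unavailable

    def add_volume(self, volume: Volume) -> None:
        self.volumes[volume.name] = volume

    def mount(self, path: str, volume_name: str) -> None:
        self.mounts[path] = volume_name

    def move(self, volume_name: str, new_server: str) -> None:
        # Only the location record changes; every mount point stays valid.
        self.volumes[volume_name].rw_server = new_server

    def locate(self, path: str, writing: bool = False) -> str:
        """Return a server that can satisfy a read or write on this path."""
        vol = self.volumes[self.mounts[path]]
        if writing:
            if vol.rw_server in self.down:
                raise RuntimeError("read-write site is down; data is read-only")
            return vol.rw_server
        live = [s for s in vol.ro_servers + [vol.rw_server] if s not in self.down]
        if not live:
            raise RuntimeError("no site for this volume is reachable")
        return random.choice(live)  # spread reads across the live sites
```

In this model, mounting "things" at both /afs/mycell/castles and /afs/mycell/kings and then calling move("things", "fs4") changes where writes land without touching either path, and marking the read-write server as down still leaves reads answerable from a replica.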

4) Delegated Administration
Under NFS, if you want to create a new group, someone with sufficient privileges on the NFS server has to add the entry manually to the group file, and then this group file has to be synchronized with client machines. This can be no end of hassle.

AFS layers an authorization database called the PTS (protection server) database atop the Kerberos authentication protocol. First Kerberos _authenticates_ you, verifying that you are who you say you are; then the PTS database _authorizes_ you when you request access to a particular file. The real strength of this scheme lies in the fact that users can create new groups on their own without administrative intervention. If I want to give my friends access to some files and deny my enemies, I simply create groups called james:friends and james:mortalenemies, and adjust my permissions accordingly.
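
The two-step flow can be illustrated with a small Python model. Everything here is a simplified sketch: ToyProtectionDB, ToyDirectoryACL and authorize are hypothetical names, the rights are plain strings rather than real AFS rights, and the Kerberos step is assumed to have already happened, so every user name passed in is treated as authenticated. The owner-prefixed group names follow the james:friends example above.

```python
from typing import Dict, Set


class ToyProtectionDB:
    """Toy stand-in for a protection database: group name -> member names."""

    def __init__(self) -> None:
        self.groups: Dict[str, Set[str]] = {}

    def _check_owner(self, requester: str, group: str) -> None:
        # A user may only manage groups prefixed with their own name.
        owner, sep, _ = group.partition(":")
        if not sep or owner != requester:
            raise PermissionError(f"{requester} does not own {group}")

    def create_group(self, requester: str, group: str) -> None:
        self._check_owner(requester, group)
        self.groups.setdefault(group, set())

    def add_member(self, requester: str, group: str, user: str) -> None:
        self._check_owner(requester, group)
        self.groups[group].add(user)

    def principals_for(self, user: str) -> Set[str]:
        """The user plus every group the user belongs to."""
        return {user} | {g for g, m in self.groups.items() if user in m}


class ToyDirectoryACL:
    """Per-directory access list with both granted and denied rights."""

    def __init__(self) -> None:
        self.granted: Dict[str, Set[str]] = {}
        self.denied: Dict[str, Set[str]] = {}

    def grant(self, principal: str, right: str) -> None:
        self.granted.setdefault(principal, set()).add(right)

    def deny(self, principal: str, right: str) -> None:
        self.denied.setdefault(principal, set()).add(right)

    def allows(self, principals: Set[str], right: str) -> bool:
        granted = any(right in self.granted.get(p, set()) for p in principals)
        denied = any(right in self.denied.get(p, set()) for p in principals)
        return granted and not denied  # a denial always wins


def authorize(db: ToyProtectionDB, acl: ToyDirectoryACL, user: str, right: str) -> bool:
    """Authorization step for an already-authenticated user."""
    return acl.allows(db.principals_for(user), right)
```

Note that no administrator appears anywhere in the model: james creates and populates his own groups, and the only check is that the group name carries his prefix.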

5) Performance
NFS is a stateless protocol, meaning the server retains no information about the state of its clients. This allows it to continue operating after a client or server failure, but also limits its performance in certain respects, and it generates loads of network traffic because data isn't cached on the client.

AFS uses a persistent local-disk caching scheme, using a system of callbacks to check whether or not a file has been modified since its last access. If it hasn't been, and it's still in the cache, the client doesn't bother to read it from the network again. Unix clients can even place this cache in RAM for screaming performance (although clients so configured have to recreate their cache files after a reboot).
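
A rough Python sketch of callback-based caching follows. It assumes the server tracks which clients hold a cached copy and tells them when the file changes; ToyServer, ToyClient and break_callback are invented names, the cache is an in-memory dictionary rather than a disk cache, and the fetches counter exists only to make network round trips visible.

```python
from typing import Dict, Set


class ToyServer:
    """Toy file server that remembers which clients cached which file."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.callbacks: Dict[str, Set["ToyClient"]] = {}
        self.fetches = 0  # counts simulated network reads

    def fetch(self, client: "ToyClient", path: str) -> bytes:
        self.fetches += 1
        # Promise to tell this client if the file changes.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, writer: "ToyClient", path: str, data: bytes) -> None:
        self.files[path] = data
        # Break the promise for every other client holding a copy.
        for client in self.callbacks.pop(path, set()):
            if client is not writer:
                client.break_callback(path)
        # The writer now holds the only valid cached copy.
        self.callbacks[path] = {writer}


class ToyClient:
    """Toy client with a local cache kept valid by server callbacks."""

    def __init__(self, server: ToyServer) -> None:
        self.server = server
        self.cache: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        if path not in self.cache:  # only go to the network on a miss
            self.cache[path] = self.server.fetch(self, path)
        return self.cache[path]

    def write(self, path: str, data: bytes) -> None:
        self.server.store(self, path, data)
        self.cache[path] = data

    def break_callback(self, path: str) -> None:
        self.cache.pop(path, None)  # cached copy is stale; drop it
```

Repeated reads of an unchanged file cost one fetch in this model, and a write by another client costs exactly one more, which is the behavior the stateless NFS design described above gives up.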

Developments and Derivatives
AFS has several derivatives that address some of its deficiencies. In fact, most network file system development of note is indebted in some way to ideas pioneered in AFS:

  • Coda is an ambitious free derivative of AFS that implements disconnected operation and bandwidth adaptation, but isn't ready for prime time.
  • Arla is a free, reverse-engineered AFS implementation (conceptually similar to the Samba team's reverse-engineered implementation of CIFS), boasting a working client and beta-quality server.
  • InterMezzo is a network file system in early beta that wraps itself around native on-disk file systems and exports them, similar to NFS, while retaining some of the global namespace innovations and caching capabilities of AFS. Files on a disk partition used to provide AFS filespace are illegible garbage without the cluster of AFS database servers to interpret them for you; files on an InterMezzo partition are readable all by themselves.
  • DFS is AFS's big brother, also distributed by IBM. It provides several enhancements over AFS, most notably the ability to set permissions (Access Control Lists, or ACLs) on files, not just directories.

Many other strange and esoteric network file systems lurk out there, in the dark: xFS, which uses cooperative caching among all clients; stackable file systems; and more.

Politics: The Mind Reels
So why release OpenAFS, and why now? To fully understand this, you need to cultivate a certain degree of cynicism. Personally, this wasn't a stretch.

Anything IBM does is probably in its own best interest. IBM has treated AFS as an end-of-life product for the past several years -- meaning support is available, but no significant feature development is pending. There's also a time-honored trend of large companies releasing unprofitable projects under pseudo-open-source licenses, garnering free goodwill and support. The OpenAFS release seems to be one of these cases. Conversely, the core Internet protocols also seem to get extended and enhanced as commercial products: QIP produces a commercial version of BIND, Lyris produces a commercial listserver, Sendmail markets a commercial version of… well, Sendmail.

Also realize that the promise and dream of many Storage Area Network (SAN) vendors is to provide what AFS provides, but simplified. If you have global access to an AFS file via this chain:

Client --> (network) --> AFS database server --> (network) --> AFS fileserver

SAN vendors want you to access files like this:

Client --> (network) --> SAN appliance

The middleman is eliminated, technically and commercially. This would reduce second-generation network file systems to hardware appliances. But you'll still need a client on your operating system to access the SAN. Giving away their client software admittedly didn't do much for Netscape; but then again, Netscape doesn't sell hardware.

For More Information

IBM Public License:
http://oss.software.ibm.com/developerworks/opensource/license10.html?dwzone=opensource
IBM Flexes its Supercomputing Muscle:
http://www.forbes.com/2000/11/04/1104supercomp.html
IBM Almaden Research Center (see what Big Blue is up to!):
http://www.almaden.ibm.com/
SANS' "How To Eliminate The Ten Most Critical Internet Security Threats: The Experts' Consensus":
http://www.sans.org/topten.htm
Kerberos:
http://web.mit.edu/kerberos/www/
OpenAFS:
http://oss.software.ibm.com/developerworks/opensource/afs/
AFS and DFS:
http://www.transarc.ibm.com/Product/EFS/index.html
Arla:
http://www.stacken.kth.se/projekt/arla/
Coda:
http://www.coda.cs.cmu.edu/
InterMezzo:
http://inter-mezzo.org/
NFSv4:
http://www.nfsv4.org/

Despite -- or perhaps because of -- its rather staid "Big Blue" image, IBM is the only technology company to land in the top five of the Forbes 500 for the last four years running. At that level, the only competition is insurance companies, banks, big oil, General Electric and occasionally Wal-Mart. By comparison, the companies we usually think of as the technocracy don't even rate: Microsoft didn't even make the top 25 until this year.

You don't get to be a Fortune 5 (sic) company by selling software. IBM is first and foremost a hardware company, and by releasing OpenAFS, they stand to create a huge user community capable of support, pre-trained to buy IBM SAN products and support when they're eventually available. They can eliminate the distasteful prospect of having their commercial file system product run on multiple operating systems, if they choose. If they're lucky, they may kill off Arla and Coda development in the process. Looking more closely at the IBM Public License -- and I'm using admittedly thin legal training and expertise here -- IBM is permitted under the terms of the license to resell AFS, including any improvements made by OpenAFS contributors.

Deploy It Anyway
Political implications aside, AFS/OpenAFS is a solid enterprise file system with undeniable benefits for large deployments, or even small deployments with especially industrious administrators. With the exception of the Windows client, which requires some header files from Microsoft for compilation, building the software on most operating systems should be straightforward. As with Microsoft's Active Directory, it's a good idea to spend a lot of time up front designing your AFS deployment. Once installed, though, the product is designed to delegate administration, giving the local administrator complete control over their portion of the file system, with unparalleled security and flexibility to boot.

I know I left my righteous indignation around here somewhere.


James Ervin is alone among his coworkers in enjoying Michelangelo Antonioni films, but in his more lucid moments suspects that they're not entirely wrong.

All content copyright 2000-03 101communications LLC, unless otherwise noted. All rights reserved.