4/17/2002 -- Microsoft won the browser wars, but Apache maintains a comfortable lead in the Web server market. In March 2002, 64 percent of active Web sites surveyed by Netcraft used Apache. If the comparison is limited to .com sites, however, Apache and Microsoft's Internet Information Server (IIS) are neck-and-neck, with 43.92 percent and 43.65 percent of the market, respectively.
Apache 2.0's recent release won't change this landscape instantly. Existing Apache modules are incompatible with Apache 2.0, though major modules such as the PHP and Perl interpreters are being updated rapidly. Many of Apache 2.0's features are poorly advertised or understood. Most importantly, real-world deployments and benchmarks are still rare. However, Apache 2.0 is the first whole number release of the server since version 1.0 in 1995, implying significance -- correctly, in this case.
Apache's convoluted history is reflected in the derivation of its name. It is a "patchy" server, ported to an ever-expanding list of operating systems without significant departure from its initial design. Consequently, it does not fully exploit the capabilities of modern operating systems, and is deficient on certain platforms for which it was not originally intended. To correct these matters, the Apache developers rewrote Apache's core in 2.0 to provide a foundation for future scalability improvements and improve the flexibility of the server for administrators and developers alike.
Scalability
Multithreading is the most notable new capability of Apache 2.0, and the one most directly related to scalability. Apache 1.3 is not multithreaded, except on the Windows platform.
The multithreaded versus single-threaded distinction merits some explanation. System administration is concerned with the allocation of resources to processes, and tools are provided for this purpose. For instance, the "renice" command can be used to raise or lower the priority of a particular process, giving it a larger or smaller share of CPU time. However, "manually setting process priorities is becoming a thing of the past" (Unix System Administration Handbook, 3rd ed., p. 52). Today, optimizing I/O (input/output), managing data, and interacting with users consume most of an administrator's time. Nevertheless, most administrators are familiar with processes, while threads can seem conceptually alien.
From the administrator's vantage point (not the programmer's), a thread is a process-within-a-process. Multiple threads reside within a single process. Threading has several advantages:
- Resources (memory, etc.) can be shared between threads.
- Multiple threads can execute simultaneously.
- Environment variables can be shared and manipulated by several threads because no thread is the parent of another. By contrast, a child process cannot affect the environment of its parent.
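The contrast in the list above can be seen in a few lines of Python. This is only a sketch, not anything taken from Apache itself: the variable names DEMO_THREAD_VAR and DEMO_CHILD_VAR and the worker function are made up for the demonstration. Four threads append to one shared list and set an environment variable for the whole process, while a child process changes only its own copy of the environment.

```python
import os
import subprocess
import sys
import threading

# Hypothetical names used only for this demonstration.
os.environ.pop("DEMO_CHILD_VAR", None)
shared_hits = []                 # one list, visible to every thread
lock = threading.Lock()

def worker(worker_id):
    # Every thread writes to the same list and the same process environment.
    with lock:
        shared_hits.append(worker_id)
        os.environ["DEMO_THREAD_VAR"] = "set-by-thread-%d" % worker_id

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# A child process gets a copy of the environment; its change stays in that copy.
subprocess.run(
    [sys.executable, "-c",
     "import os; os.environ['DEMO_CHILD_VAR'] = 'set-by-child'"],
    check=True,
)

print(sorted(shared_hits))                  # all four threads wrote to one list
print("DEMO_THREAD_VAR" in os.environ)      # set by a thread, seen process-wide
print(os.environ.get("DEMO_CHILD_VAR"))     # the child's change never arrives
```

The lock is there because shared memory cuts both ways: threads can see each other's data, so they must also take care not to corrupt it.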
More Information on Apache 2.0
- Apache 2.0 at ApacheWeek.com -- An excellent overview of the history of Apache 2.0, with links to useful in-depth articles.
- Apache 2.0: A Look Under the Hood -- From Linux magazine.
- POSIX Threads Explained -- An excellent article, contrasting threads and processes.
- Ryan Bloom's Apache 2.0 Series -- A series of articles on Apache 2.0's advanced features, from a programmer's viewpoint. --J.E.
Since few tools are available to manipulate threads, they are almost completely insulated against administrator intervention, but their presence or absence affects the behavior and design of an application. In Apache 1.3's case, the lack of multiple threads means that a separate process must be used to respond to each incoming request. This approach has an obvious advantage over Web servers that use a single process to respond to all requests: If the Internet Information Server (IIS) process dies on a Windows Web server, no further requests are served until the process is restarted. If a single Apache process dies, only the request being served by that process is affected.
However, multiple processes require more system resources. Twenty Apache 1.3 processes require roughly 20 times as much memory as a single process. Also, if there are too few server processes to handle a sudden spike in the number of requests, additional ones must be created using the fork system call. On some UNIX systems such as AIX (which displays its mainframe heritage in this respect), the creation of additional processes via fork is very expensive and can significantly delay a Web server's response time. For large Apache 1.3 sites, then, performance tuning becomes a matter of balancing the number of available Apache processes. The administrator must ensure that enough processes are available to handle incoming requests without forking new ones, but not so many that the system hits resource limits. Several directives in the Apache configuration file accomplish this:
- The MaxClients setting limits the number of Apache processes that will be created. Typically, memory is the limitation on this setting. If each Apache process takes up 20 MB of memory and you have 1000 MB of free RAM, you could run up to 50 Apache processes (1000 MB / 20 MB = 50).
- The MinSpareServers and MaxSpareServers settings keep a number of processes waiting around, to avoid the delay imposed by forking a new process. New processes are forked continually to keep the number of available servers between these thresholds, but incoming HTTP requests do not have to wait for processes to be forked because spares are available.
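The arithmetic behind these directives is simple enough to sketch in Python. The two functions below are hypothetical helpers, not part of Apache, and the spare-server thresholds of 5 and 10 are illustrative values chosen for the example. The second function is a deliberately simplified model of how the pool of idle processes is kept between the two thresholds; it makes no attempt to mimic the pace at which Apache actually forks.

```python
def max_clients(free_ram_mb, per_process_mb):
    """Largest number of server processes that fits in the available memory."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    return free_ram_mb // per_process_mb

def spare_adjustment(idle, min_spare, max_spare):
    """Positive: processes to fork. Negative: processes to retire. Zero: do nothing."""
    if idle < min_spare:
        return min_spare - idle
    if idle > max_spare:
        return max_spare - idle
    return 0

# The worked example from the text: 1000 MB free, 20 MB per process.
print(max_clients(1000, 20))          # 50

# Assumed thresholds: MinSpareServers 5, MaxSpareServers 10.
print(spare_adjustment(2, 5, 10))     # 3   (fork three more)
print(spare_adjustment(14, 5, 10))    # -4  (retire four)
print(spare_adjustment(7, 5, 10))     # 0   (within bounds)
```

In practice the per-process figure should be measured on the running server, since modules such as PHP or mod_perl can change it considerably.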
Apache 2.0 is multithreaded, allowing a single Apache process to respond to multiple requests. This vastly increases Apache's scalability. Unlike many multithreaded applications, Apache 2.0 even lets the administrator manipulate the threads within each process, to an extent. To account for differences between platforms while retaining the reliability of multiple processes, Apache 2.0 provides several different models for controlling Apache processes and threads in the form of Multi-Processing Modules (MPMs):
- The prefork MPM replicates the single-threaded behavior of Apache 1.3. This is the default MPM for UNIX systems.
- The worker MPM "implements a hybrid multithreaded multi-process Web server." Several processes are started, each with a fixed number of threads. Processes are started or stopped as necessary to regulate the total number of threads.
- The perchild MPM regulates the total number of threads by varying the number of threads in each process. This MPM also allows Apache processes to operate as multiple user IDs, which can be useful for managing several virtual hosts.
- Other MPMs are provided for specific operating systems, including BeOS, Netware and Windows NT.
For any Web server, the parameter you need to regulate is the number of simultaneous incoming requests that will be served. In Apache 1.3, each simultaneous request required its own process, so processes were what the administrator regulated. In Apache 2.0, threads are the metric.
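To make the shift from processes to threads concrete, here is a rough Python model of the worker MPM's bookkeeping as described above: each process carries a fixed number of threads, and the number of processes is adjusted to reach the desired total. The function names, the threads_per_child parameter, and the figures are all assumptions made for illustration; this is not how Apache's scheduler is implemented.

```python
import math

def processes_needed(busy_threads, spare_threads, threads_per_child):
    """Processes required to cover busy threads plus a reserve of idle ones."""
    wanted = busy_threads + spare_threads
    # Keep at least one process alive even when the server is idle.
    return max(1, math.ceil(wanted / threads_per_child))

def capacity(processes, threads_per_child):
    """Simultaneous requests the server can accept with this many processes."""
    return processes * threads_per_child

# Assumed load: 120 requests in flight, 30 idle threads held in reserve,
# 25 threads per process.
procs = processes_needed(120, 30, 25)
print(procs)                  # 6
print(capacity(procs, 25))    # 150

# Under the prefork model, the same 150 slots would take 150 processes.
print(capacity(150, 1))       # 150
```

The point of the comparison is the process count: six processes instead of 150 for the same number of simultaneous requests.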
Flexibility
While multithreading should be an immediate boon to heavily loaded Web servers, some of Apache 2.0's new features are not immediately relevant to the administrator, but provide a flexible architecture for developers.
Content filters can be applied to incoming or outgoing content in Apache 2.0 on a per-directory or per-virtual host basis. Multiple content filters can be applied in an order specified either in the main server configuration or in a content developer's .htaccess file. Filtering allows, for instance, the results of a Server-Side Include (SSI) to be sent to a CGI script for further processing. This was possible in Apache 1.3, but only because the module that handled SSIs, mod_include, contained the code necessary to recognize when a CGI script was being requested. In Apache 2.0, the mod_include module actually works in concert with the mod_cgi module to perform this recognition -- a much more efficient method. Only a few modules use filtering capabilities in the base 2.0 distribution, among them the aforementioned mod_include and mod_ext_filter, an experimental module allowing content to be sent to an external program for processing. Filtering's usefulness will become increasingly apparent as more modules that use it are developed.
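The idea of an ordered filter chain can be illustrated with a toy Python example. Nothing here reflects Apache's actual filter API: the functions are hypothetical stand-ins, and the `<!--#site-->` marker is an invented placeholder rather than real SSI syntax. Each filter takes the content produced by the one before it, so the configured order determines the result.

```python
def expand_placeholder(body):
    # Toy stand-in for an include-style filter (invented marker, not real SSI).
    return body.replace("<!--#site-->", "example.org")

def shout(body):
    # Toy stand-in for a second filter that post-processes the content.
    return body.upper()

def apply_filters(body, filters):
    """Pass the content through each filter in the order given."""
    for content_filter in filters:
        body = content_filter(body)
    return body

page = "Welcome to <!--#site-->"

print(apply_filters(page, [expand_placeholder, shout]))
# WELCOME TO EXAMPLE.ORG

print(apply_filters(page, [shout, expand_placeholder]))
# WELCOME TO <!--#SITE-->   (the marker was upper-cased before it could be expanded)
```

The second call shows why the order of filters is something the administrator or content developer has to specify deliberately.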
Apache 2.0 has also been designed for protocol-independence. In theory, Apache modules could be written to serve protocols such as FTP, SMTP, perhaps even Microsoft's file sharing protocol, SMB. Integration of Web services with file services would put Apache on more equal footing with competing Web servers such as IIS and iPlanet, which provide a single point of administration for multiple Web and file services.
Lastly, the Apache core has been turned into something called the Apache Portable Runtime (APR). This means that the internals of Apache have been abstracted so that they are independent of any particular operating system. While this is largely irrelevant from the administrator's standpoint, it eases the process of porting Apache to other platforms. However, APR is not limited to use in Apache 2.0 -- it has already been used in several of the Apache team's other projects, including an HTTP load testing tool, Flood, and the source code control tool Subversion, a competitor to CVS (Concurrent Versions System).
Apache 2.0: Bloat-Free
Apache 2.0 was in development for well over three years, but the results are well worth the wait. Instead of succumbing to feature bloat, Apache 2.0 concentrates on performing a task more efficiently. This is apparent on even the crudest level -- by my count, there are only six more modules in the default distribution of 2.0 than in 1.3, and some of those are mutually exclusive. The popularity of any software product seems to be proportional to its ability to "play nice" with other programs: for instance, Perl's prevalence is due to its unparalleled ability to glue other programs together. The filtering, portability and protocol independence of Apache 2.0 make it an even more viable choice for Web-based applications. More importantly, they extend its applicability beyond the World Wide Web, into the realm of file services and beyond. Given time for the rest of the world to catch up and think up new uses for the Apache 2.0 architecture, it should prove to be a very nice playmate indeed. 