The following technical overview gives only a few limited examples of features that may be implemented in a web server and of tasks that it may perform, in order to provide a sufficiently broad picture of the topic. A web server program plays the role of a server in a client–server model by implementing one or more versions of the HTTP protocol, often including the HTTPS secure variant and other features and extensions considered useful for its planned usage. The complexity and the efficiency of a web server program may vary a lot depending on its features, the tasks it performs, and its target performance and scalability.

Read request message

To handle a client request, a web server program first has to:
• read an HTTP request message;
• interpret it;
• verify its syntax;
• identify known HTTP headers and extract their values from them.

Once an HTTP request message has been decoded and verified, its values can be used to determine whether that request can be satisfied or not. This requires many other steps, including security checks.
URL normalization

Web server programs usually perform some type of URL normalization (of the URL found in most HTTP request messages) in order to:
• make the resource path a clean, uniform path from the root directory of the website;
• lower security risks (e.g., by more easily intercepting attempts to access static resources outside the root directory of the website, or to access portions of the path below the website root directory that are forbidden or require authorization);
• make the paths of web resources more recognizable by human beings and web log analysis programs (also known as log analyzers or statistical applications).

The term URL normalization refers to the process of modifying and standardizing a URL in a consistent manner. There are several types of normalization that may be performed, including the conversion of the scheme and host to lowercase. Among the most important normalizations are the removal of "." and ".." path segments and the addition of a trailing slash to a non-empty path component.
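The normalizations described above can be sketched as follows. This is a simplified illustration with names of our own choosing; a real server applies further normalizations as well (e.g., of percent-encoding):

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    """Apply a few common URL normalizations (illustrative, not exhaustive)."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Scheme and host are case-insensitive: convert them to lowercase.
    scheme = scheme.lower()
    netloc = netloc.lower()
    # Remove "." and ".." path segments, never escaping above the root;
    # the leading empty segment represents the root itself.
    segments = []
    for seg in path.split("/"):
        if seg == ".":
            continue
        if seg == "..":
            if segments:
                segments.pop()
            continue
        segments.append(seg)
    path = "/".join(segments) or "/"
    if not path.startswith("/"):
        path = "/" + path
    return urlunsplit((scheme, netloc, path, query, fragment))
```

For example, `normalize_url("HTTP://WWW.Example.COM/a/./b/../c")` yields a lowercase host and the clean path `/a/c`.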
URL mapping

URL mapping is the process by which a web server or application framework determines how an incoming URL request is routed to the appropriate resource, handler, or action. Modern URL mapping mechanisms analyse the structure of the requested URL and use routing rules or configuration patterns to deliver static resources, invoke dynamic handlers, or perform rewrites and redirects without directly relying on file system paths. This approach allows clean, human-readable URLs and flexible application architectures.

In practice, web server programs that implement advanced features beyond simple static content serving (e.g., a URL rewrite engine, dynamic content serving) usually have to figure out how a URL has to be handled, as a:
• URL redirection, a redirection to another URL;
• static request of file content;
• dynamic request of:
  • a directory listing of the files or other sub-directories contained in that directory;
  • another type of dynamic request, requiring identification of the program or module processor able to handle that kind of URL path and passing to it the other URL parts (i.e., usually path-info and query string variables).

One or more configuration files of the web server may specify the mapping of parts of the URL path (e.g., initial parts of the file path, filename extension and other path components) to a specific URL handler (file, directory, external program or internal module).

When a web server implements one or more of the above-mentioned advanced features, the path part of a valid URL may not always match an existing file system path under the website directory tree (a file or a directory in the file system), because it can refer to a virtual name of an internal or external module processor for dynamic requests.
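A configuration-driven mapping of URL paths to handlers might be sketched like this. The routing table and handler names here are purely hypothetical, standing in for whatever a server's configuration files would specify; the key idea is an ordered first-match-wins rule list:

```python
import re

# Hypothetical routing rules, as a configuration file might specify them:
# ordered (pattern, handler-name) pairs, first match wins.
ROUTES = [
    (re.compile(r"^/cgi-bin/"),  "external-cgi-program"),
    (re.compile(r"\.php$"),      "php-module"),
    (re.compile(r"^/old-docs/"), "redirect:/docs/"),
]

def map_url(path: str) -> str:
    """Return the name of the handler selected for a URL path."""
    for pattern, handler in ROUTES:
        if pattern.search(path):
            return handler
    # No rule matched: fall back to serving a static file from disk.
    return "static-file"
```

With these rules, `/cgi-bin/forum.php` is routed to the external CGI handler (the first rule wins), `/index.php` to the PHP module, and `/img/logo.png` falls through to static file serving.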
URL path translation to file system

Web server programs are able to translate a URL path (all or part of it) that refers to a physical file system path into an absolute path under the target website's root directory.

Example of a dynamic request using a program file to generate output:

http://www.example.com/cgi-bin/forum.php?action=view&orderby=thread&date=2021-10-15

The client's user agent connects to www.example.com and then sends the following HTTP/1.1 request:

GET /cgi-bin/forum.php?action=view&orderby=thread&date=2021-10-15 HTTP/1.1
Host: www.example.com
Connection: keep-alive

The result is the local file path of the program (in this example, a PHP program):

/home/www/www.example.com/cgi-bin/forum.php

The web server executes that program, passing in the path-info and the query string action=view&orderby=thread&date=2021-10-15 so that the program has the information it needs to run. (In this case, it returns an HTML document containing a view of forum entries ordered by thread, from October 15, 2021.) In addition, the web server reads the data sent back by the external program and resends it to the client that made the request.
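The translation step, together with the security check mentioned earlier (rejecting paths that escape the website root), can be sketched as follows, using the example site's document root:

```python
import os

# Document root from the example above.
DOCUMENT_ROOT = "/home/www/www.example.com"

def translate_path(url_path: str) -> str:
    """Translate a URL path into an absolute file system path under the
    document root, rejecting attempts to escape it (e.g., via "..")."""
    # Interpret the URL path relative to the document root.
    candidate = os.path.normpath(
        os.path.join(DOCUMENT_ROOT, url_path.lstrip("/")))
    # Security check: the resolved path must stay inside the root.
    if candidate != DOCUMENT_ROOT and not candidate.startswith(DOCUMENT_ROOT + os.sep):
        raise PermissionError("path escapes the website root directory")
    return candidate
```

So `/cgi-bin/forum.php` translates to `/home/www/www.example.com/cgi-bin/forum.php`, while a request for `/../etc/passwd` is rejected.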
Manage request message

Once a request has been read, interpreted, and verified, it has to be managed depending on its method, its URL, and its parameters, which may include values of HTTP headers. In practice, the web server has to handle the request by using one of the response paths described below.
Serve dynamic content

If a web server program is capable of serving dynamic content and has been configured to do so, then it is able to communicate with the proper internal module or external program (associated with the requested URL path) in order to pass to it the parameters of the client request. The web server program then reads the data response that the module or program has generated (often on the fly) and resends it to the client program that made the request.

NOTE: when serving static and dynamic content, a web server program usually also has to support the POST HTTP method, in order to safely receive data from clients and so be able to host websites with interactive forms that may send large data sets (e.g., lots of data entry or file uploads) to the web server, external programs or modules.

To be able to communicate with its internal modules or external programs, a web server program must implement one or more of the many available gateway interfaces (see also Web Server Gateway Interfaces used for dynamic content). The three standard and historical gateway interfaces are the following.

CGI: An external CGI program is run by the web server program for each dynamic request; the web server program then reads the generated data response from it and resends it to the client.

SCGI: An external SCGI program (usually a process) is started once by the web server program, or by some other program or process, and then waits for network connections; every time there is a new request for it, the web server program makes a new network connection to it in order to send the request parameters and read its data response, after which the network connection is closed.

FastCGI: An external FastCGI program (usually a process) is started once by the web server program, or by some other program or process, and then waits for a network connection which is established permanently by the web server; through that connection the request parameters are sent and the data responses are read.
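The essence of the CGI contract can be sketched as follows: the server passes request parameters through environment-style variables (such as QUERY_STRING), and the program writes headers, a blank line, and a body to its output. This is a loose illustration, not a full CGI implementation (a real CGI program reads os.environ and writes to sys.stdout):

```python
import io
from urllib.parse import parse_qs

def run_cgi_like_handler(environ: dict) -> bytes:
    """Sketch of what a CGI program does: read request parameters from
    environment-style variables and produce an HTTP-like response."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    action = params.get("action", ["(none)"])[0]
    body = f"<html><body>action = {action}</body></html>"
    out = io.BytesIO()
    # A CGI program emits headers, then a blank line, then the body.
    out.write(b"Content-Type: text/html\r\n\r\n")
    out.write(body.encode())
    return out.getvalue()
```

For the forum example above, the server would set QUERY_STRING to `action=view&orderby=thread&date=2021-10-15` before running the program, and the program would extract each parameter from it.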
Directory listings

A web server program may be capable of managing the dynamic generation (on the fly) of a directory index listing files and sub-directories.

If a web server program is configured to do so, a requested URL path matches an existing directory, access to it is allowed, and no static index file is found under that directory, then a web page (usually in HTML format) containing the list of files or subdirectories of the above-mentioned directory is dynamically generated (on the fly). If it cannot be generated, an error is returned.

Some web server programs allow the customization of directory listings by allowing the usage of a web page template—an HTML document containing placeholders (e.g., $(FILE_NAME), $(FILE_SIZE), etc.) that are replaced with the field values of each file entry found in the directory by the web server (e.g., index.tpl)—or the usage of HTML with embedded source code that is interpreted and executed (e.g., index.asp), or by supporting the usage of dynamic index programs such as CGIs, SCGIs and FCGIs (e.g., index.cgi, index.php, index.fcgi).

Usage of dynamically generated directory listings is usually avoided, or limited to a few selected directories of a website, because that generation takes many more OS resources than sending a static index page. The main usage of directory listings is to allow the download of files as they are (usually when their names, sizes, modification date-times or file attributes may change randomly and frequently), without requiring further information from the requesting user.
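On-the-fly generation from a placeholder template, in the $(FILE_NAME)/$(FILE_SIZE) style mentioned above, might look like this (a minimal sketch; the row template and HTML skeleton are our own, and a real listing would also include dates, attributes and links):

```python
import os

# Hypothetical row template using the placeholder style mentioned above.
ROW_TEMPLATE = "<li>$(FILE_NAME) ($(FILE_SIZE) bytes)</li>"

def directory_listing(directory: str) -> str:
    """Dynamically generate a minimal HTML directory index."""
    rows = []
    for name in sorted(os.listdir(directory)):
        size = os.path.getsize(os.path.join(directory, name))
        # Substitute each placeholder with the entry's field values.
        rows.append(ROW_TEMPLATE
                    .replace("$(FILE_NAME)", name)
                    .replace("$(FILE_SIZE)", str(size)))
    return "<html><body><ul>\n" + "\n".join(rows) + "\n</ul></body></html>"
```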
Program or module processing

An external program or an internal module (processing unit) can execute some sort of application function that may be used to get data from, or to store data to, one or more data repositories, e.g.:
• files (file system);
• databases (DBs);
• other sources located on the local computer or on other computers.

A processing unit can return any kind of web content, also by using data retrieved from a data repository, e.g.:
• a document (e.g., HTML, XML, etc.);
• an image;
• a video;
• structured data, e.g., data that may be used to update one or more values displayed by a dynamic page (DHTML) of a web interface and that may have been requested via an XMLHttpRequest API call (see also: dynamic page).

In practice, whenever there is content that may vary depending on one or more parameters contained in the client request or in configuration settings, it is usually generated dynamically.
Send response message

Web server programs are able to send response messages as replies to client request messages; among these are HTTP error responses due to internal server errors. When an error response or message is received by a client browser, and it is related to the main user request (e.g., a URL of a web resource such as a web page), then usually that error message is shown in a browser window or message box.
URL authorization

A web server program may be able to verify whether the requested URL path:
• can be freely accessed by everybody;
• requires a user authentication (a request for user credentials such as user name and password);
• is forbidden to some or all kinds of users.

If the authorization or access-rights feature has been implemented and enabled, and access to the web resource is not granted, then, depending on the required access rights, a web server program:
• can deny access by sending a specific error message (e.g., access forbidden);
• may deny access by sending a specific error message (e.g., access unauthorized) that usually forces the client browser to ask the human user to provide the required user credentials; if authentication credentials are provided, the web server program verifies and accepts or rejects them.
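The forbidden/unauthorized distinction can be sketched with HTTP Basic authentication (one of the schemes a browser responds to by prompting the user). The credential store here is purely hypothetical; a real server would consult a user database and compare password hashes:

```python
import base64
from typing import Optional

# Hypothetical credential store for illustration only.
USERS = {"alice": "s3cret"}

def check_authorization(auth_header: Optional[str]) -> int:
    """Return an HTTP status code for a request to a protected URL:
    200 if credentials are valid, 401 to make the browser ask the user
    for credentials, 403 if credentials are present but rejected."""
    if not auth_header or not auth_header.startswith("Basic "):
        return 401  # no credentials yet: challenge the client
    try:
        decoded = base64.b64decode(auth_header[len("Basic "):]).decode()
        user, _, password = decoded.partition(":")
    except Exception:
        return 401  # malformed header: challenge again
    return 200 if USERS.get(user) == password else 403
```

A 401 response would also carry a WWW-Authenticate header naming the protection realm, which is what triggers the browser's credentials dialog.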
URL redirection

A web server program may have the capability of performing URL redirections to new URLs (new locations), which consists in replying to a client request message with a response message containing a new URL suited to access a valid or existing web resource; the client should then redo the request with the new URL. URL redirection is used, for example, to fix a directory URL that is missing its trailing slash, to point clients to the new location of a moved resource, or to implement short URLs.

If web resource data is sent back to the client, then it can be static content or dynamic content, depending on how it has been retrieved (from a file or from the output of some program or module).
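A redirect reply is just a small response message whose Location header carries the new URL. A minimal sketch of building one (status line, headers and the URL are the only moving parts):

```python
def redirect_response(new_url: str, permanent: bool = False) -> bytes:
    """Build a minimal HTTP/1.1 redirect response; the Location header
    tells the client which URL to retry the request with."""
    status = "301 Moved Permanently" if permanent else "302 Found"
    return (f"HTTP/1.1 {status}\r\n"
            f"Location: {new_url}\r\n"
            f"Content-Length: 0\r\n"
            f"\r\n").encode()
```

For instance, a request for a directory URL lacking its trailing slash could be answered with `redirect_response("http://www.example.com/docs/", permanent=True)`.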
Content cache

In order to speed up web server responses by lowering average HTTP response times and the hardware resources used, many popular web servers implement one or more content caches, each one specialized in a content category. Content is usually cached by its origin:
• static content: file cache;
• dynamic content: dynamic cache (module or program output).
File cache

Historically, static content found in files which had to be accessed frequently, randomly and quickly has been stored mostly on electro-mechanical disks since the mid-late 1960s and 1970s; unfortunately, reads from and writes to those kinds of devices have always been considered very slow operations when compared to RAM speed, and so, since early OSs, first disk caches and then OS file cache sub-systems were developed to speed up I/O operations on frequently accessed data.

Even with the aid of an OS file cache, the relative or occasional slowness of I/O operations involving directories and files stored on disks soon became a bottleneck to the increase in performance expected from top-level web servers, especially since the mid-late 1990s, when Internet web traffic started to grow exponentially along with the constant increase in the speed of Internet and network lines.

The problem of how to further speed up the serving of static files, thus increasing the maximum number of requests/responses per second (RPS), started to be studied and researched in the mid-1990s, with the aim of proposing useful cache models that could be implemented in web server programs.

In practice, nowadays many web server programs include their own userland file cache, tailored for web server usage and using their own specific implementation and parameters. The widespread adoption of RAID and of fast solid-state drives (storage hardware with very high I/O speed) has slightly reduced, but of course not eliminated, the advantage of having a file cache incorporated in a web server.
Dynamic cache

Dynamic content, output by an internal module or an external program, may not always change very frequently (given a unique URL with keys or parameters), and so, maybe for a while (e.g., from one second to several hours or more), the resulting output can be cached in RAM or even on a fast disk.

The typical usage of a dynamic cache is when a website has dynamic web pages about news, weather, images, maps, etc. that do not change frequently (e.g., every n minutes) and that are accessed by a huge number of clients per minute or per hour; in those cases it is useful to return cached content (without calling the internal module or the external program), because clients often do not have an updated copy of the requested content in their browser caches.

Anyway, in most cases those kinds of caches are implemented by external servers (e.g., a reverse proxy) or by storing dynamic data output on separate computers managed by specific applications (e.g., memcached), in order not to compete for hardware resources (CPU, RAM, disks) with web servers.
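The "cache output for a while, keyed by the full URL" idea can be sketched as a simple time-to-live (TTL) cache. This is a minimal illustration with names of our own; real dynamic caches also bound memory usage, handle concurrent access and support explicit invalidation:

```python
import time

class DynamicCache:
    """Minimal TTL cache for dynamic content, keyed by the full URL
    (path plus query string)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # url -> (expiry_time, content)

    def get(self, url: str):
        entry = self._store.get(url)
        if entry and entry[0] > time.monotonic():
            return entry[1]  # fresh copy: no need to call the program
        return None          # missing or expired: regenerate the content

    def put(self, url: str, content: bytes):
        self._store[url] = (time.monotonic() + self.ttl, content)
```

On a cache miss, the server would call the internal module or external program, store its output with `put`, and serve it; subsequent requests for the same URL within the TTL are served straight from the cache.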
Kernel-mode and user-mode web servers

Web server software can either be incorporated into the OS and executed in kernel space, or be executed in user space (like other regular applications).

Web servers that run in kernel mode (usually called kernel space web servers) can have direct access to kernel resources and so they can be, in theory, faster than those running in user mode; but there are disadvantages in running a web server in kernel mode, e.g., difficulties in developing and debugging the software, and run-time critical errors that may lead to serious problems in the OS kernel.

Web servers that run in user mode have to ask the system for permission to use more memory or more CPU resources. Not only do these requests to the kernel take time, but they might not always be satisfied, because the system reserves resources for its own usage and has the responsibility of sharing hardware resources with all the other running applications. Executing in user mode can also mean using more buffer or data copies (between user space and kernel space), which can lead to a decrease in the performance of a user-mode web server.

Nowadays almost all web server software is executed in user mode, because many of the aforementioned small disadvantages have been overcome by faster hardware, new OS versions, much faster OS system calls and new optimized web server software. See also the comparison of web server software to discover which of them run in kernel mode and which in user mode (also referred to as kernel space and user space).

==Performances==