Back-end developers have access to a lot of great web servers and frameworks. These let us solve business problems through code without having to worry about networking, concurrency, and protocols. That holds true for most typical web applications.
But then there are applications where you have to deal with scalability, concurrency, and performance issues. And it's not possible to deal with these by just optimizing our application logic. This is when we need to look at the nitty-gritty details of how our web server is configured: what concurrency model it's using, whether our application is IO-bound or compute-bound, whether the server is configured to optimize that kind of workload, etc.
To be able to make these choices in an informed manner, we need to understand how these web servers actually work. In my opinion, the best way to do this is to write a web server yourself. We don't need to write a production-grade server. Neither do we need to incorporate all the features these servers have or deal with all the use cases. We just need to develop a bare-bones working server to understand the different parameters and design choices that we have at our disposal.
In this article, we will develop an HTTP server from scratch. Further, we will make the server WSGI compliant (we will discuss what that is and why is it needed). We will also develop a minimal Web Framework that uses our HTTP server. While we'll develop the server in Python, the learnings here will be generic, and one can use any programming language of choice for implementation. Some development might rely on specific features of the Python programming language, but similar features are mostly available in other programming languages as well, either as part of the standard library or through a third-party package.
I will also link to external resources wherever necessary so that we can learn about the technologies used without cluttering the pieces with too many divergences.
With the introduction out of the way, let's get started.
A Common Deployment Scenario
In the Python world, Django is a popular choice of framework for developing web applications. While Django is used to develop applications, we use a WSGI server to serve these applications. Popular WSGI servers are Gunicorn, uWSGI, Apache HTTP Server with mod_wsgi, etc. And generally Nginx is used in front of these servers as a load balancer.
We want to understand how an HTTP request that originates from a client actually reaches our application. To that end, we want to understand the role each of these pieces plays in such a deployment.
This article will discuss the following:
- The very basics of what an HTTP server is
- Implementing a basic thread-based HTTP server
- What WSGI is and making our server WSGI compliant (well, almost)
- Building a bare-bones web framework that can actually talk to our server using WSGI
The complete code is available at Python Sandbox for testing and experimentation.
Before we dive into implementing our own server, let's take some time and understand what an HTTP server actually is.
When we talk about client-server architecture, the server is an application that has access to a resource. The resource may be a file, a database, some compute logic, etc. The client wants access to that resource. Hence the client needs to first connect with the server and then make a request for that resource. During this process, it might also need to authenticate itself, and the server might also check whether this client has the authorization to access the requested resource or not. Also, the server needs to check whether the requested resource is even available. If not, it needs to inform the client.
Hence the client and server need to agree on two things:
- How the client will connect to the server
- What the format of the requests and responses will be
For two applications to communicate, they must be able to send bytes of data to one another. The data may be anything and is application-dependent. But there must be a way to take data from one application process to another. That's the job of the transport. There are a lot of different ways of transporting data, and which ones can be used depends on where the server and client processes reside relative to each other.
For example, if both the client and server processes are running on the same machine, then they can use a file as the transport. The client might write requests to one file and the server might write responses to another. We can even use files when the processes are not running on the same machine, but then both the processes need to have access to the same file somehow. This is possible if there is a network filesystem like NFS. While using a file might not be the best way to orchestrate this kind of communication, it's important to realize that it is entirely doable.
Another way for two processes on the same machine to communicate is through Pipes or Unix sockets. Another option might be shared memory. Basically any IPC mechanism works.
But when we want processes on different machines to communicate, we need to involve the network somehow. And hence we need to use a network-based transport. There are a lot of network-based transport protocols, but the most popular, reliable transport for IP-based networks is the Transmission Control Protocol or TCP. TCP is used by all the browsers to connect to the servers of various websites. Hence if you want to talk to a client that is a browser, your choices are pretty much limited to TCP. Even when a browser is not involved, TCP is the protocol of choice for reliable communication.
Now that we know how to reach the other process, we need to decide on the message format that we'll use for communication. Let's see what data needs to be sent from the client to the server, and vice versa, to be able to request and provide a resource.
One, the client needs to somehow recognise the resource it wants access to. It might also need to provide some more metadata, like authentication details. Also, it might need to pass other information to the server, such as whether it can accept compressed data.
The server, on the other hand, needs to be able to send the actual resource in a way that the client knows its beginning and end. The server should also communicate other metadata, such as whether the content is compressed, and the client should uncompress it before consumption. It also needs to communicate if there was an error because either the client is not permitted to access the resource, the client asked for a resource that doesn't exist, or there was a fault on the server.
This is where HTTP comes in. HTTP is an application-layer protocol. That means it is the format that applications use to interpret the message from their remote peer. While HTTP is not the only application-layer protocol, it's the one that browsers use to access a website. Hence any web application that needs to be reachable by a browser needs to talk HTTP. If the application will only be accessed by other applications, they can choose to talk via whatever protocol they wish.
HTTP has URLs that are used to identify resources. It has methods/verbs that can be used to represent the action one wants to take on a resource. It has headers for all kinds of metadata. There are status codes to represent the success/error scenarios. And finally, there is the body that can hold the actual data represented by the resource.
When we say that we want to build an HTTP server, we mean that we want to create an application that can be connected to, using TCP as the transport layer protocol, and that can interpret the HTTP messages sent to it, parse them, and hand them over to the web framework. When we start building an HTTP server, these are the bare minimums that we'll need to support. Later we can start to worry about other aspects, such as whether the server can handle multiple requests in parallel or whether it supports features like Keep Alive, etc.
Next, let's build a server that supports both of the above requirements and hence would qualify as an HTTP server.
Implementing a Thread-Based HTTP Server
After getting the basic understanding of what an HTTP server must be capable of doing, and knowing the transport and application layer protocols that we want to use, we are now ready to actually start implementing the server. So let's get started.
Let's first get some basic setup done. I will outline the application structure and deal with things like configuration and signal handling here, and then implement the real logic. The setup code is below and is discussed in the following subsections.
All production servers have a lot of different parameters which can be configured, generally through a config file, thereby letting the users tune the server according to their use case. We will also support such a configuration. When running the server, we will require the user to provide a config file location using the -c command-line option. The config file will be in YAML format. We use the pyyaml library to parse YAML files.
The configuration is then passed on to the server and worker components of the server. We will discuss these components in a later section. But it is these components that use the provided configurations to set up their parameters.
The other thing we do is set up signal handlers for SIGINT, SIGTERM and SIGHUP signals. When one of these signals is received, we want to gracefully shut down the server. Hence, we register the shutdown function as the signal handler and that function calls the shutdown method of the server and worker classes.
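A minimal sketch of this setup might look like the following (the function names and config keys here are assumptions of this sketch, not fixed by the article):

```python
import argparse
import signal

import yaml  # from the pyyaml package


def load_config(argv):
    """Read the config file whose path is given via the -c option."""
    parser = argparse.ArgumentParser(description="toy HTTP server")
    parser.add_argument("-c", "--config", required=True,
                        help="path to the YAML config file")
    args = parser.parse_args(argv)
    with open(args.config) as f:
        return yaml.safe_load(f)


def install_signal_handlers(shutdown):
    """Register shutdown() for SIGINT, SIGTERM and SIGHUP (graceful exit)."""
    def handler(signum, frame):
        shutdown()
    for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGHUP):
        signal.signal(sig, handler)
```

At startup, the server would call load_config with the command-line arguments, construct the server and worker components from the resulting dictionary, and then install the signal handlers so that a shutdown signal tears both down gracefully.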
That is essentially all the setup code that we need to have. Now let's discuss the core logic of the server which resides in the server and worker.
Listening for Connections
As we learnt, first we need to set up the transport so that clients can connect to the server and subsequently send and receive data.
We learnt that the transport layer protocol we need to use is TCP. To use TCP, we can use the socket API provided by almost all major operating systems. The socket API provides an abstraction over the transport layer protocols. It lets you access the network just like accessing a file. Hence, you can read and write to a socket, and those reads and writes get translated to network packets in the format of the underlying transport layer protocol being used. In a communication between two processes, both the processes have a socket each that's connected to the other one. The sockets act as the endpoints for the communication.
Let's look at the man page for the socket system call provided by the Linux operating system. The details of the API can be read from there. But briefly, a new socket can be created using the following function
int socket(int domain, int type, int protocol);
There are different domain and type arguments that can be used with the API, and they represent the different transport-layer protocols and their behaviour. We care about the TCP protocol over IPv4. The domain for that is AF_INET. And the type of socket that is needed is SOCK_STREAM. The details of the Python API for the same are also available. To create a socket in Python, we call the following function, which mirrors the C version we saw above.
socket.socket(family=AF_INET, type=SOCK_STREAM, proto=0, fileno=None)
Now that we know what a socket is, let's see how to use it. On the server side, there are two kinds of sockets. One is called the listener socket and the other is called the client socket. Both the server and the client use client sockets, but only the server uses a listener socket.
A listener socket is special. We bind a listener socket to an IP and port, and then wait for clients to connect. This is what happens behind the scenes when you type the address of any website in your browser. The browser creates a client socket on its end and then tries to connect to the web server using the IP address that it gets through DNS resolution of the website domain. By default, port number 80 is used for HTTP and 443 for HTTPS, but you can explicitly give a different port number in the browser.
When the browser is trying to connect to the server, it is actually establishing a connection with the listener socket of the server. When the connection request arrives, the listener socket creates a new client socket on the server and connects the client's client socket to this new client socket. Now both the client sockets are free to have conversations with each other. And the listener socket is free to accept any new connection requests. This is why we need two types of sockets on the server. If the listener socket started talking to the client's client socket, then no one else would be able to connect to the server until that connection terminated.
I mentioned that the listener socket binds to an IP address and port number. Hence, only the data that is sent to this IP and port can be received by the socket. Let's say a machine has multiple ethernet interfaces and each one has a different IP address. If we only want to accept requests from one of those interfaces, we can bind to the IP address of that interface. Similarly, if we bind to the virtual localhost interface (127.0.0.1), we can only accept requests originating from the same machine. If we don't care which IP address the request was destined for, then we can bind to IP address 0.0.0.0.
After binding, one needs to start listening on the socket. To do so, the listen method is invoked. This is when the socket becomes a listener socket and becomes available for connection requests. Next, to actually wait for a connection request and to accept it when it arrives, one needs to call the accept method. This is a blocking function (by default, unless we use a non-blocking socket, which we don't need to worry about yet) that will block until a connection request arrives. When a request does arrive, it will create a new client socket, establish the connection, and return the client socket and the address bound to the socket on the other end of the connection.
In our server, we create a listener socket, bind it, and listen on it in the server class.
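A sketch of what that server class could look like (the class name and the config keys host, port and backlog are assumptions of this sketch):

```python
import socket


class Server:
    """Owns the listener socket: create, bind, and listen."""

    def __init__(self, config):
        self.host = config.get("host", "127.0.0.1")
        self.port = config.get("port", 8080)
        self.backlog = config.get("backlog", 64)
        # TCP over IPv4: AF_INET domain, SOCK_STREAM type
        self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Allow quick restarts without "Address already in use" errors
        self.listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.listener.bind((self.host, self.port))
        self.listener.listen(self.backlog)

    def shutdown(self):
        self.listener.close()
```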
In the server class, we find out the address and the port number to bind the listener socket to, using the config that was passed in. The code is pretty concise and self-explanatory. The only unknown is the backlog parameter that's passed to the listen method.
Backlog specifies how many connection requests the operating system should hold on to in a queue while the server is busy accepting other connections. If the server is under a heavy load, it might not be able to accept all the incoming connections immediately. Once the number of pending connections exceeds the backlog, new connection requests may be refused or silently dropped; the exact behaviour depends on the operating system. This number is provided as a configuration element so that the user of the server gets to decide how big the queue should be.
The larger you keep the queue, the more resources are used by the OS to hold on to those connections. Read this answer on StackOverflow for a more detailed discussion about the backlog.
Now that our Socket is set up to listen for incoming connections, we need to invoke the accept method of the Socket. As discussed before, this method will block until a new connection comes in. Then it will return with the new client socket that's created for communicating with the client.
This is when we need to think about what to do next. We have a client socket on which the client will send us the request message. We need to call the recv method to wait for that message to arrive, and once it does, process it. But if we start waiting for the client to send data and then process it, we cannot accept another connection until then. Remember, to accept any new connections, we need to call the accept method.
In general, a server is an infinite loop where the accept method is called again and again to accept any new connections which arrive. We want to be able to process the request of one connection as quickly as possible so that we can then call accept again and handle the new connection.
But processing the request quickly is not always possible. What if the client is slow in sending the request? What if the processing that we need to do is time-consuming? What if we need to reach out to other systems (such as a database server) over the network to fulfil the request? If we start processing one request, all the other clients will be blocked. Imagine not being able to connect to your favourite website because someone else is connected to it.
Hence we must delegate the handling of the client socket to a separate task that can run concurrently while we try and accept new connections. I have deliberately used the word task because we can use any concurrency model here. Different choices will have different tradeoffs, and the choice should be made depending on the characteristics of the application we're trying to serve using the server. In our case, let's use OS Threads as the concurrency model.
We want each client socket to be serviced by one thread. But that has a couple of problems. If we create a new thread every time we receive a connection, we will be adding latency to the processing of requests since creating threads takes some time. The other issue is more serious. If the server is under a heavy load, or even an attack, we could have thousands of requests coming in per second. If we create as many threads, then the OS is going to stall, thereby preventing us from serving any requests successfully.
To overcome both of these problems, we will have a pool of pre-created threads that will handle the requests. The number of threads available in the pool will be picked up from the config file. This number should be set based on the hardware resources available, the kind of work that needs to be done to process a request, and the expected load on the server. When all the threads are busy, the client socket will be put on a queue and will be serviced once a thread becomes available.
We can't have an unbounded queue here since that can again take down the server in case of a heavy load. Hence, the queue needs to have a fixed size. If the queue is full, the put call to it will block. We don't want it to block indefinitely, so we will put a timeout there. If we are not able to add a connection to the queue before the timeout, then we respond to the client with an error (503 Service Unavailable).
When the request is picked up by a thread for processing, then the process method is called where we need to actually read the request and send a response. Since we don't have an actual application here, for now we will always send back a hard-coded response. Once the response is sent, the communication is done, and therefore, we close the socket. HTTP allows the clients to request the server not to close the connection.
This is because the client might want to make more requests to the same server, so creating a new connection per request would be unnecessary work. The client can indicate its wish through the Keep-Alive header in the HTTP request. The server may choose to honour the request, or it might ignore it as we have done. We have not even looked at the request, and, regardless of what the client asked for, we have sent the same dummy response back.
We did read a part of the request data coming in by calling the recv method. This method is blocking and will not return until at least one byte is available to be read. Hence, it's not a good idea to call recv without a timeout. Imagine what happens if an attacker wants to bring down your application. They can create many connections but then never actually send any data. Your receive call will block indefinitely waiting for the data to arrive, and meanwhile, all the threads in your pool will be blocked, making it impossible to serve any other requests from genuine users. We can set a receive timeout on the socket by setting a socket option (SO_RCVTIMEO). Then, if the timeout happens, we can send a response to the client with the 408 Request Timeout error code.
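A sketch of such a process method. Python's settimeout is used here as the equivalent of setting SO_RCVTIMEO directly, and the response itself is still hard-coded:

```python
import socket

# A fixed dummy response; no application is attached yet
RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"Hello, World!"
)


def process(client, recv_timeout=5.0):
    """Read (part of) the request, then send back the fixed response."""
    # Same effect as SO_RCVTIMEO: don't let recv block forever
    client.settimeout(recv_timeout)
    try:
        data = client.recv(4096)  # blocks until data arrives or timeout
    except socket.timeout:
        client.sendall(b"HTTP/1.1 408 Request Timeout\r\n\r\n")
        return
    if data:
        client.sendall(RESPONSE)
```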
While we have not handled some error/security scenarios and are not actually parsing the HTTP request, what we have built is an actual HTTP server. It's not very useful in the real world, but it does tell you how things work internally and what kind of things we need to consider while building such an application. The next responsibility of the server is to parse the HTTP request coming in and then hand it over to the application, which would then produce the response. Since we don't have an application using this server yet, we will just deal with the request parsing for now.
HTTP Request Parsing
A typical HTTP request looks like the following
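For instance, a request might look like this (the URL, host, and form values are made up):

```
POST /login HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 27

username=jane&password=abcd
```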
The first line contains the HTTP Method/Verb followed by the resource URL and ending with the HTTP version. After this line comes a variable number of lines, each containing a header and value separated by a colon. When all the headers are done, there is a blank line, and whatever follows the blank line is the body. The Content-Length header tells the server how long the body is, so that it can read exactly that many bytes and know it doesn't need to wait for more data to arrive.
All of the above data might not arrive at once. For this reason, one needs to make multiple calls to the recv method of the socket until the complete request is parsed. While the parsing logic is not complicated, it is tedious to write by hand. Let's use a library to do that for us. We'll use the http-parser library. Let's modify the process method so that the request is parsed.
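To see what such a parser has to do, here is a simplified hand-rolled version. It is a stand-in sketch, not the http-parser library's actual API, and it handles only the happy path of the format described above:

```python
def parse_request(client):
    """Read from a socket until a complete HTTP request has arrived."""
    data = b""
    # Keep reading until the blank line that terminates the headers
    while b"\r\n\r\n" not in data:
        chunk = client.recv(4096)
        if not chunk:
            break
        data += chunk
    head, _, body = data.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: METHOD, URL, HTTP version
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Content-Length tells us how many body bytes to expect
    length = int(headers.get("content-length", 0))
    while len(body) < length:
        body += client.recv(4096)
    return method, path, headers, body
```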
So finally we have parsed the request as well. We are still sending back a hard-coded response since we do not have an application yet that can generate a valid response for us. That's the topic of the next section. But now we're in a position to be able to run this server on our machine and access it through the browser or through any other HTTP client, like cURL or Postman.
Next we need to figure out a way so that application code can be loaded by our server so that we can pass on the parsed HTTP requests to it, and it can produce a more useful response for us than the hard-coded one we've had up until now. This is where WSGI comes into the picture.
Make your HTTP server WSGI compliant
Until now, we have built a working HTTP server that could accept connections from a client, parse the request, and send back a dummy response. The only thing that remains is to be able to load an application developed by the end user of our server so that we invoke the request handlers of the application and get a useful response back.
Loading someone's application should not be that hard. We have importlib available in Python, which lets us import the user's application dynamically at run time. The user just needs to tell us where the application code is available on disk. That can be done using the config file. The harder part is to know how to communicate with that application, i.e. which function to invoke, with what parameters, and then what kind of reply to expect. This is where WSGI comes in.
Web Server Gateway Interface
When we develop web applications, we want to be able to deploy them on any server of our choice. And later we might decide to change to a different web server. Doing so should not require us to make any changes to our application. Our application should be oblivious to which server will be used to host it. This is only possible if all the servers decide to communicate with the applications in the same way. And that is what the Web Server Gateway Interface (WSGI) tries to achieve.
WSGI defines how any web server is going to invoke the application, what parameters it will send in, and what outcome it expects. PEP 3333, the specification for WSGI, describes it in detail, along with code samples for implementation. We'll discuss the specification in brief, and interested readers can read all the details from the PEP.
WSGI requires the imported application object to be a callable. Hence, it can be a function, a method, or a class or instance with a __call__ method defined. Every time a request comes in, the server is going to invoke this callable.
result = application(environ, start_response)
While invoking it, it's going to pass in two arguments:
- The environment: This would be a dictionary that contains CGI-style environment variables describing the HTTP request. Some keys of this dictionary would be REQUEST_METHOD, SCRIPT_NAME, PATH_INFO, QUERY_STRING, CONTENT_TYPE, and CONTENT_LENGTH. This is how the parsed HTTP request is sent over to the application. There are also going to be WSGI-specific keys, such as wsgi.version, wsgi.url_scheme, wsgi.input, wsgi.multithread, etc. This extra metadata is for the application to know how it will be invoked. For example, if wsgi.multithread is set to True, then the application will be invoked by multiple threads at the same time, and hence would need to be thread-safe.
- A callback function: The second argument provided is a callable. The application must invoke it once it wants to start sending the response. The first argument the application will send to it will be the HTTP response status, such as 200 OK. The second argument is a list of (header_name, header_value) tuples: the response headers to send back to the client. The third argument has to do with error handling, and we won't be discussing that here.
start_response(status, response_headers, exc_info=None)
As the result of the invocation, the application must return an iterable of bytestrings. This will be the response body sent back to the client. The server needs to iterate over the returned iterable and send each bytestring over the socket. When the iteration completes, the response is over, and the server may close the connection to the client.
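A minimal application satisfying this interface can be just a function:

```python
def application(environ, start_response):
    """A minimal WSGI application: always responds with 200 and a greeting."""
    body = b"Hello from WSGI!"
    status = "200 OK"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    # Hand the status and headers back to the server first
    start_response(status, headers)
    # The return value must be an iterable of bytestrings
    return [body]
```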
Now that we understand the interface we need to implement, let's get to it.
I have changed the worker to have an instance of class WSGI that we're going to create next. This class is going to implement the WSGI interface and load the user's application based on the configuration provided. When a new request arrives, the server is going to parse it and then delegate it to this class. It is the responsibility of the WSGI class to then delegate it to the loaded application. Let's look at the WSGI class implementation.
As can be seen, the WSGI class uses importlib to import the application using the details provided in the config file. There are three things that we need to know to be able to load the application: the path where the module resides on the disk, the name of the module, and the name of the application within the module. Using these, the WSGI class first loads the module from the given path and then extracts the application from the module. When the request arrives, the process method of this class is called with the parsed request and a callable through which any data can be sent to the client. In the process method, we create the parameters according to the WSGI specification and then invoke the loaded application with those parameters.
Once the application returns, we already know the HTTP status code to respond with and all the response headers to send. We send those first, and then we iterate through the result returned by the application and send all of those as the response body.
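A sketch of what such a WSGI class could look like. The config keys, the shape of the parsed-request dictionary, and the send callable are all assumptions of this sketch:

```python
import importlib.util
import io
import os
import sys


class WSGI:
    """Loads the user's application and bridges requests to it."""

    def __init__(self, config):
        # Three pieces of config: directory, module name, application name
        path = config["app_path"]
        module = config["app_module"]
        app_name = config.get("app_name", "application")
        spec = importlib.util.spec_from_file_location(
            module, os.path.join(path, module + ".py"))
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)
        self.application = getattr(mod, app_name)

    def process(self, request, send):
        # Translate the parsed request into CGI-style environ keys
        environ = {
            "REQUEST_METHOD": request["method"],
            "PATH_INFO": request["path"],
            "QUERY_STRING": request.get("query", ""),
            "SERVER_PROTOCOL": "HTTP/1.1",
            "wsgi.version": (1, 0),
            "wsgi.url_scheme": "http",
            "wsgi.input": io.BytesIO(request.get("body", b"")),
            "wsgi.errors": sys.stderr,
            "wsgi.multithread": True,
            "wsgi.multiprocess": False,
            "wsgi.run_once": False,
        }

        def start_response(status, headers, exc_info=None):
            # Status line and headers go out before any body bytes
            send(("HTTP/1.1 %s\r\n" % status).encode())
            for name, value in headers:
                send(("%s: %s\r\n" % (name, value)).encode())
            send(b"\r\n")

        # Everything the application yields becomes the response body
        for chunk in self.application(environ, start_response):
            send(chunk)
```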
That's the end-to-end flow of how a request arrives at the server, is passed on to the application, and then the result produced by the application is sent back as the response.
The Final Frontier
Now that the server is done, how about we write an application that can use this server? But wait, that would mean the application will have to deal with all this WSGI stuff. In our day-to-day work, that is not something that the application developer deals with. Then who does? The web framework.
Why don't we write a simple web framework before we go on to write the application?
Let's call our framework PyDev. PyDev works a lot like Django. It requires you to have a settings module, the path to which is set in an environment variable. The framework loads the module dynamically. This module defines all the stuff that's needed to set up the framework.
In our case, the only thing this module will give us is a dictionary of URL-to-function mappings so that the framework knows which function of the application to invoke when the request comes in for a particular URL. The framework also defines its own request and response classes so that the application does not need to deal with the environment dictionary provided by the server based on the WSGI specification.
The framework has a WSGIHandler that's actually the application that the server will invoke. Its __call__ method would then proxy the request to the appropriate application function based on the request path. It understands the WSGI interface and communicates with the server accordingly, thereby abstracting the entire process from the application developer.
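A sketch of such a handler, with the settings loading elided for brevity (the Request, Response and WSGIHandler shapes here are assumptions, not the framework's actual code):

```python
class Request:
    """Framework-level request object hiding the raw WSGI environ."""

    def __init__(self, environ):
        self.method = environ["REQUEST_METHOD"]
        self.path = environ["PATH_INFO"]
        self.environ = environ


class Response:
    """Framework-level response: status, headers and body in one place."""

    def __init__(self, body, status="200 OK", content_type="text/plain"):
        self.body = body.encode() if isinstance(body, str) else body
        self.status = status
        self.headers = [
            ("Content-Type", content_type),
            ("Content-Length", str(len(self.body))),
        ]


class WSGIHandler:
    """The WSGI callable the server invokes; routes by request path."""

    def __init__(self, urls):
        self.urls = urls  # mapping of URL path -> view function

    def __call__(self, environ, start_response):
        request = Request(environ)
        view = self.urls.get(request.path)
        if view is None:
            response = Response("Not Found", status="404 Not Found")
        else:
            response = view(request)
        start_response(response.status, response.headers)
        return [response.body]
```

In the real framework, the urls mapping would come from the dynamically loaded settings module rather than being passed in directly.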
Let's look at an application built using this framework. The application will have three files: one corresponding to the views (i.e., the functions invoked in response to a URL being requested), one for the settings required by the PyDev framework, and one that will be the application module passed to the server.
We have just one view function here for a health check of the application. Next, we want to bind this function to a particular URL. That will happen in the settings file.
We have mapped both / and /health to the same view. You can imagine an application having a lot of view functions with very complex logic. For the framework, the only thing that matters is the URL and corresponding view function.
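For illustration, the views and settings files might look like this, collapsed into one listing (Response is assumed to be importable from the framework, and the dictionary name URLS is an assumption of this sketch):

```python
# views.py -- the application's view functions
def health(request):
    """A trivial health-check view; Response comes from the framework."""
    return Response("OK")


# settings.py -- loaded by PyDev via the PYDEV_SETTINGS_MODULE env var
URLS = {
    "/": health,        # in real files, health would be imported from views
    "/health": health,
}
```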
WSGI application module
In this module, we set the PYDEV_SETTINGS_MODULE environment variable to point to the right module in our application. Then we create an attribute called application by calling the get_wsgi_application function of the PyDev framework. This function returns an instance of the WSGIHandler class. This application object will be used by the server to interact with our application. In Django, this file is auto-generated when startproject is run.
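Such a module might look like the following sketch (the pydev import path and the settings module name are assumptions, since PyDev is our hypothetical framework):

```python
# wsgi.py -- the module handed to the server via the config file
import os

# Tell PyDev where the settings live before building the application
os.environ.setdefault("PYDEV_SETTINGS_MODULE", "myapp.settings")

from pydev import get_wsgi_application  # hypothetical import path

application = get_wsgi_application()
```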
We can use the following config file to deploy this application on the server.
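A plausible config file, pulling together the knobs discussed earlier (every key name here is an assumption of this sketch, not a fixed schema):

```yaml
# config.yaml -- passed to the server with: python server.py -c config.yaml
host: 0.0.0.0
port: 8080
backlog: 64           # pending-connection queue passed to listen()
threads: 8            # size of the worker thread pool
queue_size: 32        # bounded queue in front of the pool
queue_timeout: 1.0    # seconds to wait before replying 503
recv_timeout: 5.0     # seconds to wait for request data before 408
app_path: /srv/myapp  # where the application module lives on disk
app_module: wsgi      # module that defines the WSGI callable
app_name: application # name of the callable within that module
```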
With that, we have created an HTTP server that implements the WSGI specification and developed a framework that can be used to build applications that would be deployed on this server. That's it, really. That's how servers and frameworks are written. If you go look at the source code of Gunicorn, you'll find a lot of similar code. Similarly, Django has some similar code as well.
I think it should be possible to deploy this application we built on Gunicorn. Similarly, it should be possible to deploy a Django application on our server. There might be a few kinks here and there since we didn't implement the WSGI specification entirely. But this should give you a good enough idea of how things can be made interoperable by defining and implementing standard interfaces.
We have successfully built a Python web server from scratch. And if you look at other web servers, no matter which language they are written in, they all work the same way. They all expose some of the same configuration parameters. So now it is possible to go back to the documentation of one of these famous servers and actually understand what the different configurations mean, and then decide which configuration would be the best fit for our application.