HTTP Protocol

Overview

HTTP (Hypertext Transfer Protocol) is a protocol for fetching resources such as HTML documents. It is the foundation of any data exchange on the Web. HTTP is a client-server protocol: requests are initiated by the recipient of the resource, usually a web browser.

Developed in the early 1990s, HTTP is a flexible protocol that has evolved continuously. It operates at the application layer (Layer 7 of the OSI model) and typically runs over TCP connections, which may be secured with TLS encryption. HTTP is not limited to hypertext documents: it also carries images and videos, submits form data to servers, and can fetch parts of a document to update a web page on demand.

How It Works

System Architecture

HTTP functions as a client-server protocol: requests originate from a user-agent (typically a web browser) and are sent to servers, which process them and return responses. Between these endpoints sit various intermediaries, such as proxies.

The Client Side: The user-agent acts on the user's behalf, with web browsers being the most common example. Browsers initiate all requests in the communication flow. When loading a webpage, a browser first requests the HTML document, then makes subsequent requests for additional resources like scripts, CSS files, images, and videos. These components are assembled to render the complete webpage.
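The "first the HTML, then its subresources" behavior can be sketched with the standard-library HTML parser. This is a minimal illustration, not how any real browser works: the class name `SubresourceCollector`, the small set of tags it watches, and the example URLs are all assumptions. Each URL it collects would turn into one more HTTP request.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class SubresourceCollector(HTMLParser):
    """Collect URLs of scripts, stylesheets, images, and videos a page references (hypothetical helper)."""

    # Tag -> attribute holding the URL; a deliberately small, assumed subset of HTML
    URL_ATTRS = {"script": "src", "img": "src", "video": "src", "source": "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL of the HTML document, used to resolve relative links
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split():
            url = attrs.get("href")
        else:
            attr = self.URL_ATTRS.get(tag)
            url = attrs.get(attr) if attr else None
        if url:
            # Each collected URL would become one more HTTP request from the browser
            self.urls.append(urljoin(self.base_url, url))

page = """<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="app.js"></script>
</head><body><img src="https://cdn.example.com/logo.png"></body></html>"""

collector = SubresourceCollector("https://example.com/docs/index.html")
collector.feed(page)
print(collector.urls)
```

Relative URLs are resolved against the document's own URL, which is why `app.js` becomes `https://example.com/docs/app.js` while the absolute CDN URL is kept as-is.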

The Server Side: The server answers client requests by serving the requested documents. Although it appears to the client as a single entity, a server may actually be a cluster of machines sharing the load, a collection of software components (caches, databases, e-commerce systems), or several sites hosted on the same machine and distinguished by the Host header.

Intermediary Proxies: Between browsers and servers, numerous computers relay HTTP messages. Those functioning at the application layer are called proxies, which can be transparent (forwarding without modifications) or non-transparent (altering requests). Proxies serve purposes including caching, content filtering, load balancing, authentication, and request logging.
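A minimal sketch of the transparent/non-transparent distinction described above. The function name and proxy names are hypothetical. `Via` is a standard header for recording intermediaries, while `X-Forwarded-For` is a widespread de facto convention rather than a standard; which headers a real proxy adds depends on its software and configuration. Headers are modeled as a plain dict, so name matching here is case-sensitive, unlike real HTTP.

```python
def forward_request_headers(headers, client_ip, proxy_name, transparent=False):
    """Return the headers a (hypothetical) proxy would send upstream."""
    forwarded = dict(headers)  # copy so the caller's headers are not mutated
    if transparent:
        # Transparent in the sense used above: pass the request along unchanged
        return forwarded
    # Via lists each intermediary as "<protocol-version> <name>", appended in order
    entry = f"1.1 {proxy_name}"
    via = forwarded.get("Via")
    forwarded["Via"] = f"{via}, {entry}" if via else entry
    # X-Forwarded-For records the address of the client this proxy received the request from
    xff = forwarded.get("X-Forwarded-For")
    forwarded["X-Forwarded-For"] = f"{xff}, {client_ip}" if xff else client_ip
    return forwarded

original = {"Host": "example.com"}
outer = forward_request_headers(original, "203.0.113.7", "edge-proxy")
inner = forward_request_headers(outer, "198.51.100.2", "inner-proxy")
print(inner["Via"])  # 1.1 edge-proxy, 1.1 inner-proxy
```

Chaining two calls shows how each hop appends to the same headers, which is what lets a backend see the path a request took through several proxies.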

HTTP Flow

When a client wants to communicate with a server, it performs these steps:

Open a TCP connection: The connection is used to send one or more requests and receive responses. The client may open a new connection, reuse an existing one, or open several connections to the server.

Send an HTTP message:

GET / HTTP/1.1
Host: developer.mozilla.org
Accept-Language: fr

Read the response sent by the server:

HTTP/1.1 200 OK
Date: Sat, 09 Oct 2010 14:28:02 GMT
Server: Apache
Content-Type: text/html

<!DOCTYPE html... (the requested web page)

Close or reuse the connection for further requests.
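The four steps can be reproduced with a raw TCP socket. To keep the sketch self-contained it talks to a throwaway server started on `127.0.0.1` with Python's `http.server`, standing in for developer.mozilla.org; the handler, page body, and `fetch` helper are assumptions for illustration. Note that HTTP/1.1 lines end with CRLF and an empty line ends the headers.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Tiny local stand-in for a real web server (hypothetical)."""
    protocol_version = "HTTP/1.1"  # answer with an HTTP/1.1 status line

    def do_GET(self):
        body = b"<!DOCTYPE html><title>hi</title>"
        self.send_response(200)  # also adds Server and Date headers
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

def fetch(host, port, path="/"):
    # Step 1: open a TCP connection
    with socket.create_connection((host, port), timeout=5) as sock:
        # Step 2: send an HTTP message; a blank line terminates the headers
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n"
            "Accept-Language: fr\r\n"
            "Connection: close\r\n"  # ask the server to close instead of keeping it alive
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        # Step 3: read the response until the server closes the connection
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    # Step 4: leaving the `with` block closes the socket (no reuse in this sketch)
    return b"".join(chunks)

server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
raw = fetch("127.0.0.1", server.server_address[1])
server.shutdown()
server.server_close()
print(raw.decode("iso-8859-1").split("\r\n", 1)[0])  # HTTP/1.1 200 OK
```

Sending `Connection: close` keeps the read loop simple; a client that reuses the connection would instead rely on `Content-Length` to know where the response ends.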

HTTP Messages

There are two types of HTTP messages: requests and responses.

Requests consist of:

An HTTP method (verb like GET, POST, or noun like OPTIONS, HEAD) defining the operation
The path of the resource to fetch
The HTTP protocol version
Optional headers conveying additional information
Optionally, a body for methods such as POST that send data to the server
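A request with the parts listed above can be serialized as a minimal sketch, assuming HTTP/1.1 text framing. The helper name `build_request` and the example form data are hypothetical, and header names are assumed to be passed in their usual capitalization.

```python
def build_request(method, path, headers=None, body=b"", version="HTTP/1.1"):
    """Serialize a request: start line, header lines, blank line, optional body (hypothetical helper)."""
    headers = dict(headers or {})
    if body:
        # A body needs a length so the server knows where the message ends
        headers.setdefault("Content-Length", str(len(body)))
    start_line = f"{method} {path} {version}\r\n"
    header_lines = "".join(f"{name}: {value}\r\n" for name, value in headers.items())
    return (start_line + header_lines + "\r\n").encode("ascii") + body

msg = build_request(
    "POST",
    "/contact",
    {"Host": "example.com", "Content-Type": "application/x-www-form-urlencoded"},
    b"name=Ada",
)
print(msg.decode("ascii"))
```

A GET request built the same way has no body and therefore no `Content-Length`, only the start line, headers, and the terminating blank line.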

Responses consist of:

The HTTP protocol version
A status code indicating success or failure
A status message (short description of the status code)
HTTP headers
Optionally, a body containing the fetched resource
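Going the other way, a raw HTTP/1.x response splits cleanly into the parts listed above. This is a simplified sketch (the `parse_response` name is hypothetical): it ignores chunked transfer encoding and lets repeated header names overwrite each other. It is run here on the sample response from the HTTP Flow section.

```python
def parse_response(raw):
    """Split a raw HTTP/1.x response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line separates headers from body
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol version, status code, status message (may be empty)
    version, status, *rest = lines[0].split(" ", 2)
    reason = rest[0] if rest else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")  # split on the first colon only
        # Names are case-insensitive, so normalize to lower case; in this sketch
        # repeated names (e.g. several Set-Cookie lines) overwrite each other
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body

sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Sat, 09 Oct 2010 14:28:02 GMT\r\n"
    b"Server: Apache\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<!DOCTYPE html>..."
)
version, status, reason, headers, body = parse_response(sample)
print(status, reason, headers["content-type"])  # 200 OK text/html
```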

Status Codes

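Every response carries a three-digit status code that tells the client the outcome of its request. The first digit indicates the class: 1xx informational, 2xx success, 3xx redirection, 4xx client error, 5xx server error. Common codes include: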

200 — OK
301 — Moved Permanently (redirect)
401 — Unauthorized (client must authenticate)
403 — Forbidden (authenticated but not authorized)
404 — Not Found
405 — Method Not Allowed
500 — Internal Server Error
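Because the first digit determines the class of a status code, a client can handle codes it has never seen by falling back on that class. A small sketch using the standard library's `http.HTTPStatus` enumeration for the status messages; the `describe_status` helper and its output format are assumptions.

```python
from http import HTTPStatus

# Status classes keyed by the first digit of the code
STATUS_CLASSES = {1: "informational", 2: "success", 3: "redirection",
                  4: "client error", 5: "server error"}

def describe_status(code):
    """Return e.g. '404 Not Found (client error)' (hypothetical helper)."""
    status_class = STATUS_CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase  # standard status message, if the code is registered
    except ValueError:
        phrase = "Unrecognized"  # unknown code: only the class is meaningful
    return f"{code} {phrase} ({status_class})"

for code in (200, 301, 401, 403, 404, 405, 500):
    print(describe_status(code))
```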

HTTP Headers

HTTP headers let the client and server pass additional information with a request or response. A header consists of a case-insensitive name followed by a colon, then its value.

Headers can be grouped by context:

Request headers: Information about the resource to be fetched or about the requesting client
Response headers: Additional information about the response or the server
Representation headers: Information about the body (MIME type, encoding)
Payload headers: Representation-independent information about the payload data (content length, encoding used for transport)
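Case-insensitive header names are easy to see with the standard library's `http.client.parse_headers`, which returns a message object whose lookups ignore case. The header block below is an invented example deliberately written with inconsistent capitalization.

```python
import io
from http.client import parse_headers

# Header block as it might arrive on the wire, ending with a blank line
raw_headers = (
    b"content-type: text/html; charset=UTF-8\r\n"
    b"CONTENT-LENGTH: 1234\r\n"
    b"Server: Apache\r\n"
    b"\r\n"
)
headers = parse_headers(io.BytesIO(raw_headers))

# Lookups match the name regardless of how the sender capitalized it
print(headers["Content-Type"])           # text/html; charset=UTF-8
print(headers.get("content-length"))     # 1234
# The representation header can also be broken into MIME type and charset
print(headers.get_content_type())        # text/html
print(headers.get_content_charset())     # utf-8
```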

Important headers include:

Authorization: Basic <credentials> — send basic auth credentials (base64-encoded username:password)
Authorization: Bearer <token> — send a bearer token for token-based authentication
Accept: <MIME_type>/<MIME_subtype> — tell the server which data types the client accepts
Content-Type: text/html; charset=UTF-8 — media type of the message body
Set-Cookie: name=value — server sends cookies to the client for persistent sessions
Host: example.com — specifies which virtual host to serve (critical for name-based virtual hosting)
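Two of these headers are easy to build by hand. The sketch below constructs a Basic `Authorization` value and decodes it again, which shows that Basic credentials are merely base64-encoded, not encrypted (hence the need for HTTPS). It then turns a `Set-Cookie` value into the `Cookie` header a client would send back, using the standard library's `http.cookies.SimpleCookie`. The helper names, credentials, and cookie are illustrative assumptions; the Aladdin example is the one used in RFC 7617.

```python
import base64
from http.cookies import SimpleCookie

def basic_auth_header(username, password):
    """Value for 'Authorization: Basic <credentials>' (hypothetical helper)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

def decode_basic_auth(header_value):
    """Recover (username, password): anyone who sees the header can do this."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password

auth = basic_auth_header("Aladdin", "open sesame")
print(auth)  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

# Server -> client: Set-Cookie with attributes; client -> server: only name=value pairs
jar = SimpleCookie()
jar.load("sessionid=abc123; Path=/; HttpOnly")
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print(f"Cookie: {cookie_header}")  # Cookie: sessionid=abc123
```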

Key Terminology

Stateless: Each HTTP request is independent — the server does not retain information between requests. Session state is managed through cookies, tokens, or other mechanisms.
Idempotent: An HTTP method is idempotent if making the same request multiple times has the same effect as making it once. GET, PUT, and DELETE are idempotent; POST is not.
User-Agent: The client software making the request, most commonly a web browser.
MIME Type: Media type identifier (e.g., text/html, application/json, image/png) used in Content-Type and Accept headers.
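Idempotency concerns the effect on server state, not the response. A toy in-memory model makes the difference concrete; `ToyStore` and its URIs are hypothetical stand-ins for a server's resources, not how any real server stores data.

```python
class ToyStore:
    """In-memory stand-in for a server's resources (hypothetical, for illustration only)."""

    def __init__(self):
        self.items = {}
        self.next_id = 1

    def put(self, uri, value):
        # PUT stores the value at a known URI: repeating it leaves the same state
        self.items[uri] = value

    def post(self, value):
        # POST asks the server to create a new resource: every call adds another one
        uri = f"/items/{self.next_id}"
        self.next_id += 1
        self.items[uri] = value
        return uri

    def delete(self, uri):
        # DELETE removes the resource; repeating it changes nothing further
        self.items.pop(uri, None)

store = ToyStore()
for _ in range(3):
    store.put("/items/config", "v1")
after_puts = dict(store.items)    # still a single resource

for _ in range(3):
    store.post("hello")
after_posts = len(store.items)    # 4: one from PUT, three from POST

store.delete("/items/1")
store.delete("/items/1")          # second DELETE has no additional effect
after_deletes = len(store.items)  # 3
```

A real server may answer the second DELETE differently (for example with 404), but the resulting state is the same, which is all idempotency promises.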

Common Ports and Protocols

Port	Protocol	Purpose
80	TCP	HTTP — unencrypted web traffic
443	TCP	HTTPS — TLS-encrypted web traffic
8080	TCP	Common alternative HTTP port
8443	TCP	Common alternative HTTPS port
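When a URL does not specify a port, the client uses the scheme's default from the table above. A small sketch with `urllib.parse.urlsplit`, whose `.port` attribute is `None` when no port is given; the `effective_port` helper and the example URLs are assumptions.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url):
    """TCP port a client would connect to for an http(s) URL (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port            # explicit port, e.g. :8080 or :8443
    return DEFAULT_PORTS[parts.scheme]  # otherwise the scheme's default

for url in ("http://example.com/", "https://example.com/login",
            "http://localhost:8080/app", "https://internal.example:8443/"):
    print(url, effective_port(url))
```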

Why It Matters

As a system administrator, you will:

Configure web servers to handle HTTP requests and serve content
Set up virtual hosts to serve multiple websites from one server
Configure reverse proxies to route traffic to backend applications
Debug connectivity issues by reading HTTP headers and status codes
Secure HTTP traffic with TLS (HTTPS) on port 443
Analyze access and error logs that record HTTP transactions

Common Pitfalls

Forgetting that HTTP is stateless: sessions require explicit mechanisms (cookies, tokens) to persist state between requests.
Misreading status codes: a 200 response means only that the server returned something, not that the content is correct; a 404 may point to a misconfigured DocumentRoot rather than a genuinely missing file.
Ignoring the Host header: name-based virtual hosting depends entirely on this header. HTTP/1.1 servers must reject a request that lacks it (400 Bad Request), and an unrecognized value typically falls through to the server's default site.
Caching surprises: browsers and proxies cache aggressively. Force a hard reload (Ctrl+F5 or Ctrl+Shift+R in most desktop browsers) to bypass the cache when testing changes.
Mixed content: loading HTTP resources on an HTTPS page triggers browser security warnings, and modern browsers block many such resources outright.
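The Host-header pitfall is easier to reason about with a model of how name-based virtual hosting picks a site. This is a simplified sketch, not Apache's or nginx's actual algorithm: the `select_vhost` helper and the DocumentRoot paths are hypothetical, and IPv6 literal hosts are not handled.

```python
def select_vhost(request_headers, vhosts, default_root):
    """Pick a DocumentRoot from the Host header (hypothetical, simplified model)."""
    host = None
    for name, value in request_headers.items():
        if name.lower() == "host":  # header names are case-insensitive
            host = value
            break
    if host is None:
        return None  # an HTTP/1.1 server would answer 400 Bad Request here
    # Drop an optional ":port" and compare case-insensitively (IPv6 literals not handled)
    hostname = host.split(":", 1)[0].lower()
    # Unknown names fall through to the default site, a common source of confusion
    return vhosts.get(hostname, default_root)

vhosts = {
    "example.com": "/var/www/example",
    "blog.example.com": "/var/www/blog",
}
print(select_vhost({"Host": "blog.example.com"}, vhosts, "/var/www/html"))  # /var/www/blog
print(select_vhost({"host": "EXAMPLE.com:8080"}, vhosts, "/var/www/html"))  # /var/www/example
print(select_vhost({"Host": "typo.example"}, vhosts, "/var/www/html"))      # /var/www/html
```

The fallback in the last call is why a mistyped or unconfigured hostname often shows the wrong site instead of an error.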