HTTP Protocol

Overview

HTTP (Hypertext Transfer Protocol) is a protocol for fetching resources such as HTML documents. It is the foundation of any data exchange on the Web and operates as a client-server protocol: requests are initiated by the party that will receive the resource (the client, usually a web browser), and the server answers them.

Developed in the early 1990s, HTTP is a flexible protocol that has continuously evolved. Operating at the application layer (Layer 7 of the OSI model), it typically runs over TCP connections, which may be secured with TLS encryption. Its versatility extends beyond retrieving hypertext documents — HTTP handles images and videos, processes form submissions to servers, and can fetch specific document segments for dynamic webpage updates.

How It Works

System Architecture

HTTP functions as a client-server protocol where requests originate from a user-agent (typically a web browser) and are sent to servers that process these requests and return responses. Between these endpoints exist various intermediaries called proxies.

The Client Side: The user-agent acts on the user's behalf, with web browsers being the most common example. Browsers initiate all requests in the communication flow. When loading a webpage, a browser first requests the HTML document, then makes subsequent requests for additional resources like scripts, CSS files, images, and videos. These components are assembled to render the complete webpage.

The Server Side: The server responds to client requests by providing the requested documents. While appearing as a single entity, a server may actually be a cluster of machines sharing the workload or a collection of cooperating software components (caches, databases, e-commerce systems). Conversely, several server instances or sites can share one physical machine, and even one IP address, because the Host header tells the server which site a request is for.

Intermediary Proxies: Between browsers and servers, numerous computers relay HTTP messages. Those functioning at the application layer are called proxies, which can be transparent (forwarding without modifications) or non-transparent (altering requests). Proxies serve purposes including caching, content filtering, load balancing, authentication, and request logging.

HTTP Flow

When a client wants to communicate with a server, it performs these steps:

  1. Open a TCP connection: The connection is used to send one or more requests and receive the responses. The client may open a new connection, reuse an existing one, or open several connections.
  2. Send an HTTP message:
    GET / HTTP/1.1
    Host: developer.mozilla.org
    Accept-Language: fr
    
  3. Read the response sent by the server:
    HTTP/1.1 200 OK
    Date: Sat, 09 Oct 2010 14:28:02 GMT
    Server: Apache
    Content-Type: text/html
    
    <!DOCTYPE html... (the requested web page)
    
  4. Close or reuse the connection for further requests.
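
The four steps above can be sketched with Python's standard socket module. This is a minimal illustration, not a full client: the function names build_request and fetch are made up for this example, it sends a Connection: close header so that the end of the response can be detected by the server closing the connection, and it does no TLS, redirect, or chunked-encoding handling.

```python
import socket


def build_request(host: str, path: str = "/", language: str = "fr") -> bytes:
    """Build a minimal HTTP/1.1 GET request like the one shown in step 2."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"Accept-Language: {language}",
        "Connection: close",  # ask the server to close the connection after replying
        "",  # an empty line ends the header section
        "",
    ]
    # HTTP/1.1 separates lines with CRLF, not a bare newline.
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, port: int = 80, path: str = "/") -> bytes:
    """Open a TCP connection, send one request, and read the raw response."""
    with socket.create_connection((host, port), timeout=10) as conn:  # step 1
        conn.sendall(build_request(host, path))  # step 2
        chunks = []
        while True:  # step 3: read until the server closes the connection
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)  # step 4 happens when the "with" block exits
```

Calling fetch("developer.mozilla.org") would return the raw bytes of the status line, headers, and body. In practice a library such as http.client handles these details, including connection reuse.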

HTTP Messages

There are two types of HTTP messages: requests and responses.

Requests consist of:

  • An HTTP method (verb like GET, POST, or noun like OPTIONS, HEAD) defining the operation
  • The path of the resource to fetch
  • The HTTP protocol version
  • Optional headers conveying additional information
  • A body for methods like POST

Responses consist of:

  • The HTTP protocol version
  • A status code indicating success or failure
  • A status message (short description of the status code)
  • HTTP headers
  • Optionally, a body containing the fetched resource
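
As a rough illustration of the response structure listed above, the following sketch splits a raw HTTP/1.1 response into its parts. The Response type and parse_response function are hypothetical names for this example. It assumes the whole response is already in memory, keeps only the last value when a header repeats, and does not decode chunked bodies.

```python
from typing import NamedTuple


class Response(NamedTuple):
    version: str   # e.g. "HTTP/1.1"
    status: int    # e.g. 200
    reason: str    # e.g. "OK"
    headers: dict  # header names lower-cased, since they are case-insensitive
    body: bytes


def parse_response(raw: bytes) -> Response:
    # The header section ends at the first empty line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: protocol version, status code, status message.
    version, _, rest = status_line.partition(" ")
    code, _, reason = rest.partition(" ")

    headers = {}
    for line in header_lines:
        # Split at the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return Response(version, int(code), reason, headers, body)
```

A request can be taken apart the same way; only the first line differs (method, path, version instead of version, status code, status message).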

Status Codes

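Because HTTP is stateless, the server does not remember the outcome of earlier requests. Each response carries a status code that tells the client what happened to that particular request. Common codes include: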

  • 200 — OK
  • 301 — Moved Permanently (redirect)
  • 401 — Unauthorized (client must authenticate)
  • 403 — Forbidden (the server understood the request but refuses to authorize it; re-authenticating will not help)
  • 404 — Not Found
  • 405 — Method Not Allowed
  • 500 — Internal Server Error
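
The first digit of a status code gives its class, which is often enough to decide how to react. The sketch below uses the standard library's http.HTTPStatus enum to look up the status message; describe_status is a hypothetical helper written for this example.

```python
from http import HTTPStatus

# The first digit of a status code identifies its class.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return a short description such as '404 Not Found (client error)'."""
    category = STATUS_CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that the enum does not define still belong to a class.
        phrase = "Unknown"
    return f"{code} {phrase} ({category})"
```

For example, describe_status(301) returns "301 Moved Permanently (redirection)".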

HTTP Headers

HTTP headers let the client and server pass additional information with a request or response. A header consists of a case-insensitive name followed by a colon, then its value.

Headers can be grouped by context:

  • Request headers: Information about the resource to be fetched or about the requesting client
  • Response headers: Additional information about the response or the server
  • Representation headers: Information about the body (MIME type, encoding)
  • Payload headers: Representation-independent information about the payload data (content length, transfer encoding)

Important headers include:

  • Authorization: Basic <credentials> — send basic auth credentials (base64-encoded username:password)
  • Authorization: Bearer <token> — send a bearer token for token-based authentication
  • Accept: <MIME_type>/<MIME_subtype> — tell the server which data types the client accepts
  • Content-Type: text/html; charset=UTF-8 — media type of the message body
  • Set-Cookie: name=value — server sends cookies to the client for persistent sessions
  • Host: example.com — specifies which virtual host to serve (critical for name-based virtual hosting)
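
A few of the headers above can be made concrete with a short sketch. The helper names below (basic_auth_header, get_header, parse_content_type) are invented for this example. Note that base64 is an encoding, not encryption, so Basic credentials are only protected when the connection itself uses TLS.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an 'Authorization: Basic ...' header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def get_header(headers: dict, name: str, default=None):
    """Look up a header by name, ignoring case as HTTP requires."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return default


def parse_content_type(value: str):
    """Split 'text/html; charset=UTF-8' into a media type and its parameters."""
    media_type, *params = [part.strip() for part in value.split(";")]
    parsed = {}
    for param in params:
        key, _, val = param.partition("=")
        parsed[key.strip().lower()] = val.strip().strip('"')
    return media_type.lower(), parsed
```

With these helpers, get_header({"Content-Type": "text/html"}, "content-type") finds the value regardless of how the name is capitalized.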

Key Terminology

Stateless
Each HTTP request is independent — the server does not retain information between requests. Session state is managed through cookies, tokens, or other mechanisms.
Idempotent
An HTTP method is idempotent if making the same request multiple times has the same effect as making it once. GET, PUT, and DELETE are idempotent; POST is not.
User-Agent
The client software making the request, most commonly a web browser.
MIME Type
Media type identifier (e.g., text/html, application/json, image/png) used in Content-Type and Accept headers.
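
The idempotency definition above can be illustrated with a toy in-memory store. NoteStore and its methods are hypothetical and only mimic how a server might apply PUT, DELETE, and POST; no real HTTP is involved.

```python
class NoteStore:
    """Toy resource collection that mimics PUT, DELETE, and POST semantics."""

    def __init__(self):
        self.notes = {}

    def put(self, note_id: int, text: str) -> None:
        # PUT replaces the resource at a known location, so repeating the
        # same call leaves the store in the same state.
        self.notes[note_id] = text

    def delete(self, note_id: int) -> None:
        # Deleting something that is already gone changes nothing further.
        self.notes.pop(note_id, None)

    def post(self, text: str) -> int:
        # POST lets the server pick a new location, so every repeated call
        # creates another resource.
        note_id = max(self.notes, default=0) + 1
        self.notes[note_id] = text
        return note_id
```

Calling put(7, "hello") twice leaves one note, while calling post("hello") twice leaves two. This is why a client can safely retry a timed-out PUT or DELETE but should be careful about retrying a POST.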

Common Ports and Protocols

Port   Protocol   Purpose
80     TCP        HTTP: unencrypted web traffic
443    TCP        HTTPS: TLS-encrypted web traffic
8080   TCP        Common alternative HTTP port
8443   TCP        Common alternative HTTPS port

Why It Matters

As a system administrator, you will:

  • Configure web servers to handle HTTP requests and serve content
  • Set up virtual hosts to serve multiple websites from one server
  • Configure reverse proxies to route traffic to backend applications
  • Debug connectivity issues by reading HTTP headers and status codes
  • Secure HTTP traffic with TLS (HTTPS) on port 443
  • Analyze access and error logs that record HTTP transactions

Common Pitfalls

  1. Forgetting that HTTP is stateless — sessions require explicit mechanisms (cookies, tokens) to persist state between requests.
  2. Not checking status codes — a 200 response doesn't guarantee the content is correct, and a 404 might indicate a misconfigured DocumentRoot rather than a file that is genuinely missing.
  3. Ignoring the Host header — name-based virtual hosting depends entirely on this header. If it is missing or names an unknown site, the server cannot tell which site was intended and typically rejects the request or serves its default site.
  4. Caching surprises — browsers and proxies cache aggressively. Use Ctrl+F5 to bypass cache when testing changes.
  5. Mixed content — serving HTTP resources on an HTTPS page triggers browser security warnings.
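
Pitfall 3 becomes clearer with a sketch of how name-based virtual hosting might pick a site. The site names, directory paths, and the select_document_root function are all hypothetical, and real servers have their own configuration formats and fallback rules. The port is stripped naively, so IPv6 literals in the Host header are not handled.

```python
# Hypothetical mapping from host name to the directory serving that site.
SITES = {
    "example.com": "/var/www/example",
    "blog.example.com": "/var/www/blog",
}
DEFAULT_ROOT = "/var/www/default"  # assumed fallback site


def select_document_root(headers: dict) -> str:
    """Choose a document root from the Host header of a request."""
    host = ""
    for name, value in headers.items():
        if name.lower() == "host":  # header names are case-insensitive
            host = value
            break

    # Host names are case-insensitive and may carry a port ("example.com:8080").
    hostname = host.strip().lower().partition(":")[0]

    # An unknown or missing host falls back to the default site.
    return SITES.get(hostname, DEFAULT_ROOT)
```

When a request unexpectedly lands on the default site, checking the Host value the client actually sent is usually the first debugging step.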

Further Reading