HTTP protocol and working with headers

HTTP protocol
How the WWW (world wide web, web) works in a nutshell:
- the user's browser (client) sends a request to the server with the site address (URL);
- the server receives this request and gives the client the required content.
In other words, the entire modern web is built on a client-server model. For this to work, the server and the browser need a common protocol that both of them understand. That protocol is called HTTP.
How HTTP works and why we need to know it
You can program in PHP without knowing the HTTP protocol, but in a number of situations you need to know how a web server works in order to solve a problem. After all, PHP is first and foremost a server-side programming language.
The HTTP protocol is very simple and essentially consists of two parts:
- Request / response headers;
- Request / response bodies.
First comes a list of headers, then an empty line, and then (if there is one) the body of the request / response.
Both the client and the server send each other headers and a body, but the set of headers available to the client differs from the set available to the server. Let's go step by step through what happens over HTTP when a user wants to load the main page of the phpclassroom.com site.
1. The user's browser establishes a connection to the phpclassroom.com server and sends the following request:
GET / HTTP/1.1
Host: phpclassroom.com
2. The server accepts the request and sends a response:
HTTP/1.1 200 OK
Server: Apache

<html>
<head>
<title>PHPclassroom</title>
</head>
3. The browser accepts the response and shows the finished page.
Most of all, we are interested in the very first step, where the browser sends a request to the phpclassroom.com server.
Let's take a closer look at what happens there. The first line of the request defines several important parameters, namely:
- The method by which the content will be requested;
- Page address;
- Protocol version.
GET is the method (verb) that we use to access the specified page. GET is the most commonly used method because it tells the server that the client only wants to read the specified document. Besides GET there are other methods; we will consider one of them in the next section.
The method is followed by the address of the page, the URI (Uniform Resource Identifier). In our case, we are requesting the main page of the site, so just a slash is used: /
The last item on this line is the protocol version, which will almost always be HTTP/1.1.
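The three parts of the request line can be illustrated with a minimal sketch. The function name parse_request_line() is hypothetical, and the sketch assumes a well-formed line; in practice the web server does this parsing before PHP ever runs.

```php
<?php
// Hypothetical helper for illustration only: the web server normally parses
// the request line for us. Assumes a well-formed line with three parts.
function parse_request_line(string $line): array
{
    // Method, address and protocol version are separated by single spaces.
    [$method, $uri, $version] = explode(' ', trim($line), 3);

    return ['method' => $method, 'uri' => $uri, 'version' => $version];
}

$parts = parse_request_line('GET / HTTP/1.1');
print($parts['method'] . ' ' . $parts['uri'] . ' ' . $parts['version'] . "\n");
```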
The line indicating the main parameters is always followed by a listing of headers that convey additional useful information to the server: the name and version of the browser, language, encoding, caching parameters, and so on.
Among all the headers sent with a request, one is required and is the most important: the Host header. It specifies the domain name that the client's browser is requesting.
Having received the request, the server looks for the site with the domain from the Host header, as well as for the specified page.
If the requested site and page are found, a response is sent to the client:
HTTP/1.1 200 OK
This response means that everything is fine: the document has been found and will be sent to the client. More generally, the start line of the response has the following structure:
HTTP/<version> <status code> <reason phrase>
The most interesting part here is the status code, also known as the server response code.
In this example, the response code is 200, which means: the server is running, the document is found and will be transferred to the client. But things don't always go smoothly.
For example, the requested document may be missing or the server may be overloaded. In that case the client will not receive the content, and the response code will be something other than 200:
- 404 - if the server is available, but the requested document was not found;
- 503 - if the server cannot process requests for technical reasons.
The original HTTP/1.1 specification (RFC 2616) defines 40 different status codes.
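Status codes are grouped by their first digit, which is often all a script needs to know. The sketch below shows that grouping; the function name status_class() and the returned labels are assumptions made for illustration.

```php
<?php
// Hypothetical helper: maps a status code to its standard group.
function status_class(int $code): string
{
    switch (intdiv($code, 100)) {
        case 1:
            return 'informational';
        case 2:
            return 'success';
        case 3:
            return 'redirection';
        case 4:
            return 'client error';
        case 5:
            return 'server error';
        default:
            return 'unknown';
    }
}

foreach ([200, 404, 503] as $code) {
    print($code . ': ' . status_class($code) . "\n");
}
```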
After the start line, there are headers, and then the body of the response.
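To make this layout concrete, here is a minimal sketch that splits a raw response into its start line, headers, and body. The function parse_http_response() is hypothetical and assumes a well-formed message with \r\n line endings; browsers and HTTP libraries do this work for real.

```php
<?php
// Hypothetical helper for illustration; assumes a well-formed message
// that uses \r\n line endings.
function parse_http_response(string $raw): array
{
    // Headers and body are separated by the first empty line.
    [$head, $body] = array_pad(explode("\r\n\r\n", $raw, 2), 2, '');

    $lines = explode("\r\n", $head);
    // Start line: HTTP/<version> <status code> <reason phrase>
    [$version, $code, $reason] = array_pad(explode(' ', array_shift($lines), 3), 3, '');

    $headers = [];
    foreach ($lines as $line) {
        // Each header has the form "Name: value".
        [$name, $value] = array_pad(explode(':', $line, 2), 2, '');
        $headers[trim($name)] = trim($value);
    }

    return [
        'version' => $version,
        'code'    => (int) $code,
        'reason'  => $reason,
        'headers' => $headers,
        'body'    => $body,
    ];
}

$raw = "HTTP/1.1 200 OK\r\nServer: Apache\r\n\r\n<html>...</html>";
$response = parse_http_response($raw);
print($response['code'] . ' ' . $response['headers']['Server'] . "\n");
```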
Working with headers in PHP
PHP provides everything needed to interact with the HTTP protocol:
- Getting the request body;
- Getting request headers;
- Adding / changing response headers;
- Controlling the response body.
Let's go through these in order.
Retrieving the request body
The request body is the information that the browser sent along with its request for the page.
A request body is usually present only if the browser requested the page using the POST method.
POST is a method specifically designed for sending data to a site. Most often, the browser uses the POST method when a form is submitted. In this case, the body of the request is the contents of the form.
In a PHP script, all submitted form data is available in a special array, $_POST. More on this in the next chapter, on forms.
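A minimal sketch of reading one form field from such an array. The helper post_field() and the field name email are hypothetical and are used only for illustration.

```php
<?php
// Hypothetical helper: reads one form field from an array shaped like $_POST.
function post_field(array $post, string $name, string $default = ''): string
{
    // A missing field or a non-string value (e.g. an array) falls back to the default.
    if (isset($post[$name]) && is_string($post[$name])) {
        return trim($post[$name]);
    }

    return $default;
}

// In a real script the first argument would be $_POST;
// 'email' is an assumed field name.
$email = post_field($_POST, 'email');
```

Passing the array as an argument instead of reading $_POST directly inside the function makes the helper easy to try with sample data.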
Retrieving request headers
Recall that request headers are meta information sent by the browser when it requests a script.
PHP automatically extracts these headers and puts them in a special array, $_SERVER.
Note that this array contains other information besides headers. The values of the request headers are stored under keys that start with HTTP_.
An example of how to get the address of the page the user came from:
// The browser may omit the Referer header, so fall back to an empty string.
print($_SERVER['HTTP_REFERER'] ?? '');
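The HTTP_ prefix also makes it possible to collect all request headers at once. The following is a sketch only: the helper request_headers() is hypothetical, and it assumes that every HTTP_ key corresponds to exactly one header.

```php
<?php
// Hypothetical helper: collects request headers from an array shaped like $_SERVER.
function request_headers(array $server): array
{
    $headers = [];
    foreach ($server as $key => $value) {
        if (strpos((string) $key, 'HTTP_') === 0) {
            // HTTP_ACCEPT_LANGUAGE -> Accept-Language
            $words = ucwords(str_replace('_', ' ', strtolower(substr($key, 5))));
            $headers[str_replace(' ', '-', $words)] = $value;
        }
    }

    return $headers;
}

// In a real script: $headers = request_headers($_SERVER);
$headers = request_headers([
    'HTTP_HOST'            => 'phpclassroom.com',
    'HTTP_ACCEPT_LANGUAGE' => 'en',
    'REQUEST_METHOD'       => 'GET',
]);
print_r($headers);
```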
Adding / changing response headers
In a PHP script, you can manage all the response headers that reach the user along with the page content. This is possible because PHP runs on the web server side and has very tight integration with it.
Here are some examples of scenarios when managing response headers comes in handy:
- Caching;
- User redirection;
- Setting cookies;
- Sending files;
- Sending additional information to the browser.
Response headers are needed to accomplish many important tasks.
PHP has a function for sending or changing a header: header().
It takes a string with the header's name and value and adds it to the list of headers sent to the user's browser together with the response.
For example, this is how the user is redirected to another page:
header("Location: /index.php");
The Location header is responsible for the redirection. Its value, given after the colon, is the address of the page to go to.
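In practice a redirect is usually wrapped in a small function that also stops the script, because PHP keeps executing the rest of the code after header() is called. A minimal sketch, in which the function name redirect() and the default status code 302 are assumptions:

```php
<?php
// Hypothetical helper: sends a redirect and stops the script.
function redirect(string $url, int $status = 302): void
{
    // The third argument of header() sets the response status code.
    header('Location: ' . $url, true, $status);
    // Stop here so that no further code runs and no body is produced.
    exit;
}

// Usage (commented out because it ends the script):
// redirect('/index.php');
```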
An important note on the use of headers
There is one limitation: headers cannot be sent once any content has been sent to the user. That is, if you have already output something, for example with print, you can no longer change the headers.
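PHP's built-in headers_sent() function reports whether output has already started. Below is a minimal sketch of a guard built on it; the helper name send_header() and the log message are assumptions.

```php
<?php
// Hypothetical helper: sends a header only while that is still possible.
function send_header(string $header): bool
{
    if (headers_sent($file, $line)) {
        // Output has already started at $file:$line, so header() would
        // fail with a warning. Log the problem instead.
        error_log("Cannot send '$header': output started at $file:$line");
        return false;
    }

    header($header);
    return true;
}

$sent = send_header('Cache-Control: no-store');
```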
Response body control
Everything that PHP outputs is the content of the response. In other words, everything printed with print or echo, as well as text output through short tags, makes up the body of the response that is sent to the user's browser.
Request parameters
We are used to the fact that on our site each PHP script is responsible for one page. A site visitor enters an address in the address bar that consists of the domain name and the name of the PHP script, for example: http://weather-diary.ru/day.php
But what if one page has to show different information?
On the weather diary website, we made a separate page that shows the recorded weather for one specific day. That is, there is a single page, but it shows different data depending on the selected day.
Users also want to bookmark the addresses of the pages for the days they need. Does this mean that a single script cannot power a page that shows the weather diary for any day? Not at all!
What does a URI consist of
A URI is a unique identifier for a resource. In our case the resource is a page of the site, identified by its full address. This is what the address for displaying the weather on a specific day might look like:
http://weather-diary.ru/day.php?date=2017-10-15
Let's see what this URI consists of.
First comes the domain name: weather-diary.ru
Then comes the name of the script: day.php
Everything after that is the request parameters.
Request parameters are like additional attributes of the page address. They are separated from the page name by a question mark. In the example above, there is only one request parameter: date=2017-10-15
The parameter name is date and its value is 2017-10-15.
There can be several request parameters, in which case they are separated by an ampersand:
?date=2017-10-15&tscale=celsius
The example above specifies two parameters: the date and the temperature unit.
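The same breakdown can be reproduced with PHP's built-in parse_url() and parse_str() functions. This is only an illustration of the address structure: inside a running script PHP has already done this work.

```php
<?php
$uri = 'http://weather-diary.ru/day.php?date=2017-10-15&tscale=celsius';

// parse_url() splits the address into its components.
$parts = parse_url($uri);

// parse_str() turns the query string into an associative array.
parse_str($parts['query'], $params);

print($parts['host'] . "\n"); // the domain name
print($parts['path'] . "\n"); // the path to the script
print_r($params);             // the request parameters
```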
Request parameters as external variables
The page URL now uses request parameters, but what use is that to us? When the page name invokes the corresponding PHP script, the request parameters become special external variables in that script. That is, if the address contains such parameters, it is easy to read them inside the script and act on them, for example to show the weather for a specific day in the selected units.
Retrieving request parameters
If there are external variables, how do you read them?
All request parameters are stored in a special associative array, $_GET. This means that a script called with the address day.php?date=2017-10-15&tscale=celsius will have two values in this array, under the keys date and tscale.
The request to get data for the selected day looks like this:
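A minimal sketch of such a read, using the date and tscale parameters from the address above. The helper day_request() and its default values (today's date and celsius) are assumptions made for illustration.

```php
<?php
// Hypothetical helper: reads the parameters of day.php from an array
// shaped like $_GET. The default values are assumptions.
function day_request(array $query): array
{
    return [
        // Use today's date when the parameter is missing.
        'date'   => $query['date'] ?? date('Y-m-d'),
        // Use Celsius when no unit was chosen.
        'tscale' => $query['tscale'] ?? 'celsius',
    ];
}

// In day.php the argument would be $_GET.
$request = day_request($_GET);
print('Weather for ' . $request['date'] . ' in ' . $request['tscale'] . "\n");
```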
Never rely on a parameter being present in the $_GET array. Check for it first, either with isset() or by supplying a default value with the ?? operator.
Building a URI with request parameters
Sometimes you need to perform the opposite operation: build a page address that includes the required request parameters from an array.
Let's say the weather diary page needs links to the next and the previous day. The selected unit of measurement must also be preserved. That is, you need to keep the current request parameters, change the value of one of them (the date), and generate a new link.
Here's how you can do it:
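<?php
// Start from the current parameters so that the others (e.g. tscale) are kept.
$params = $_GET;
// Fall back to today's date if no date was passed.
$date = $params['date'] ?? date('Y-m-d');
// Compute the day after the selected date.
$tomorrow = date('Y-m-d', strtotime('tomorrow', strtotime($date)));
$params['date'] = $tomorrow;
// Script name, then a question mark, then the encoded query string.
$url = basename(__FILE__) . '?' . http_build_query($params);
print($url);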
We have used two functions here:
- basename(__FILE__) gets the name of the current script;
- http_build_query() converts an associative array to a query string.
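The same approach covers both the previous and the next day. Below is a minimal sketch, in which the helper day_link() is hypothetical and the script name is passed in explicitly instead of being taken from __FILE__. Note that DateTimeImmutable throws an exception if the date string cannot be parsed.

```php
<?php
// Hypothetical helper: builds a link to a neighbouring day while keeping
// all other request parameters. $shift is a relative date such as '+1 day'.
function day_link(string $script, array $params, string $shift): string
{
    // Throws an exception if the date string cannot be parsed.
    $date = new DateTimeImmutable($params['date'] ?? 'today');
    $params['date'] = $date->modify($shift)->format('Y-m-d');

    return $script . '?' . http_build_query($params);
}

// In day.php the parameters would come from $_GET.
$params = ['date' => '2017-10-15', 'tscale' => 'celsius'];
print(day_link('day.php', $params, '-1 day') . "\n"); // day.php?date=2017-10-14&tscale=celsius
print(day_link('day.php', $params, '+1 day') . "\n"); // day.php?date=2017-10-16&tscale=celsius
```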