Learn About a Website's Request and Response
What Happens When You Visit a Website?
Step 1: Extracting the Domain Name
Once you enter http://www.google.com/, your browser extracts the domain name from the URL. A domain name identifies which website you’re trying to visit and must follow specific rules defined in RFCs. For example, each label of a hostname (the parts between the dots) may contain only letters, digits, and hyphens.
In this case, the domain is www.google.com. The domain serves as one way to find the server’s address.
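Browsers use their own URL parsers, but Python’s standard urllib.parse module gives a feel for this step. A minimal sketch; the helper name extract_hostname is hypothetical:

```python
from urllib.parse import urlsplit


def extract_hostname(url: str) -> str:
    """Return the domain name part of a URL (urllib lowercases it)."""
    hostname = urlsplit(url).hostname
    if hostname is None:
        raise ValueError(f"no domain name found in {url!r}")
    return hostname


print(extract_hostname("http://www.google.com/"))                   # www.google.com
print(extract_hostname("https://WWW.Google.com:443/search?q=dns"))  # www.google.com
```

The scheme, port, path, and query string are discarded; only the hostname is carried forward to the lookup in the next step.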
Step 2: Resolving an IP Address
After determining the domain name, your browser looks up the IP address associated with the domain. This process is called resolving the domain to an IP address, and every domain on the internet must resolve to an IP address to work.
Two versions of IP addresses exist: Internet Protocol version 4 (IPv4) and Internet Protocol version 6 (IPv6). IPv4 addresses are written as four numbers separated by periods, and each number falls in the range 0 to 255. IPv6 is the newer version of the Internet Protocol; it was designed to solve the problem of the world running out of available IPv4 addresses.
IPv6 addresses are made up of eight groups of four hexadecimal digits separated by colons, but they can be shortened by dropping leading zeros in each group and replacing one run of all-zero groups with ::. For example, 8.8.8.8 is an IPv4 address, and 2001:4860:4860::8888 is a shortened IPv6 address.
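Python’s standard ipaddress module can validate both formats and undo IPv6 shortening. A small sketch; the function names are illustrative:

```python
import ipaddress


def ip_version(address: str) -> int:
    """Return 4 or 6; raises ValueError if address isn't a valid IP address."""
    return ipaddress.ip_address(address).version


def expand_ipv6(address: str) -> str:
    """Rewrite a shortened IPv6 address as eight groups of four hex digits."""
    return ipaddress.IPv6Address(address).exploded


print(ip_version("8.8.8.8"))                # 4
print(ip_version("2001:4860:4860::8888"))   # 6
print(expand_ipv6("2001:4860:4860::8888"))  # 2001:4860:4860:0000:0000:0000:0000:8888
```

Calling ip_version("8.8.8.256") raises ValueError, because 256 is outside the 0 to 255 range allowed for each IPv4 number.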
To look up an IP address from just the domain name, your computer sends a request to Domain Name System (DNS) servers, specialized servers on the internet that map domain names to their matching IP addresses. The IPv4 and IPv6 addresses in the preceding example belong to Google’s public DNS servers.
In this example, the DNS server you connect to would match www.google.com to the IPv4 address 216.58.201.228 and send that back to your computer. To look up a site’s IPv4 address yourself, run the command dig A site.com in your terminal, replacing site.com with the site you’re looking up.
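Programs usually don’t talk to DNS servers directly; they ask the operating system’s resolver, which may also check local files such as /etc/hosts or a cache before querying DNS. A minimal sketch using socket.getaddrinfo; the resolve helper is hypothetical:

```python
import socket
from typing import List


def resolve(hostname: str, port: int = 80) -> List[str]:
    """Ask the OS resolver (normally backed by DNS) for the IPv4 and IPv6
    addresses of hostname, returned in order without duplicates."""
    results = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses: List[str] = []
    # Each result is (family, type, proto, canonname, sockaddr); the IP is sockaddr[0].
    for _family, _type, _proto, _canonname, sockaddr in results:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

resolve("www.google.com") needs network access and returns whatever addresses DNS hands out at that moment, which may differ from 216.58.201.228. Passing an IP literal such as "8.8.8.8" skips DNS entirely and returns it unchanged.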
Step 3: Establishing a TCP Connection
Next, the computer attempts to establish a Transmission Control Protocol (TCP) connection with the IP address on port 80, because you visited the site using http://. The details of TCP aren’t important here, other than to note that it’s another protocol that defines how computers communicate with each other. TCP provides reliable two-way communication: recipients can verify the data they receive, and lost data is retransmitted, so nothing is lost in transmission.
The server you’re sending a request to might be running multiple services (think of a service as a computer program), so it uses ports to identify which process should receive each request. You can think of ports as a server’s doors to the internet. Without ports, services would have to compete for the information being sent to the same place. Ports provide a standard way for services to share one machine: each service listens on its own port, so data meant for one service isn’t delivered to another.
For example, port 80 is the standard port for sending and receiving unencrypted HTTP requests. Another common port is 443, which is used for encrypted HTTPS requests. Although port 80 is standard for HTTP and 443 is standard for HTTPS, TCP communication can happen on any port, depending on how an administrator configures an application. You can establish your own TCP connection to a website on port 80 by opening your terminal and running nc followed by the IP address from Step 2 and the port, for example nc 216.58.201.228 80.
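The port choice described above can be sketched in Python: use the port written in the URL if there is one, otherwise the scheme’s conventional default. The names target_port and open_tcp_connection are hypothetical, and the second function is roughly what nc does when given a host and port:

```python
import socket
from urllib.parse import urlsplit

# Conventional defaults only; a server can be configured to listen on any port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def target_port(url: str) -> int:
    """Return the TCP port a client would connect to for url."""
    parts = urlsplit(url)
    if parts.port is not None:          # an explicit port such as :8080 wins
        return parts.port
    return DEFAULT_PORTS[parts.scheme]  # otherwise use the scheme's default


def open_tcp_connection(url: str, timeout: float = 5.0) -> socket.socket:
    """Open a raw TCP connection to the URL's host and port (no HTTP yet)."""
    host = urlsplit(url).hostname
    return socket.create_connection((host, target_port(url)), timeout=timeout)
```

Calling open_tcp_connection("http://www.google.com/") requires network access; the returned socket is what an HTTP request would then be written to.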
Step 4: Sending an HTTP Request
With the TCP connection open, the browser sends an HTTP request like the following:
➊ GET / HTTP/1.1
➋ Host: www.google.com
➌ Connection: keep-alive
➍ Accept: text/html, */*
➎ User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36
The browser makes a GET request to the / path ➊, which is the website’s root. A website’s content is organized into paths, just like the folders and files on your computer: as you go deeper into each folder, the path records each folder’s name followed by a /. When you visit the first page of a website, you access the root path, which is just a /. The browser also indicates that it’s using version 1.1 of the HTTP protocol. A GET request simply retrieves information; we’ll learn more about it later.
The Host header ➋ holds an additional piece of information sent as part of the request. HTTP/1.1 requires it because a single IP address can host multiple domains, so the server needs to know which site the request is for.
The Connection header ➌ asks the server to keep the connection open, avoiding the overhead of repeatedly opening and closing connections.
You can see the expected response format at ➍. In this case, we’re expecting text/html but will accept any format, as indicated by the wildcard (*/*). There are hundreds of possible content types, but for our purposes, you’ll see text/html, application/json, application/octet-stream, and text/plain most often.
Finally, the User-Agent ➎ denotes the software responsible for sending the request.
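An HTTP/1.1 request is just text sent over the TCP connection. A minimal sketch that builds and sends one; build_get_request and fetch are hypothetical helpers, not a library API:

```python
import socket
from typing import Dict, Optional


def build_get_request(host: str, path: str = "/",
                      extra_headers: Optional[Dict[str, str]] = None) -> bytes:
    """Serialize a minimal HTTP/1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]  # request line + required Host header
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # HTTP/1.1 separates lines with CRLF; an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch(host: str, port: int = 80) -> bytes:
    """Send the request over plain TCP and read until the server closes the connection."""
    request = build_get_request(host, extra_headers={"Connection": "close", "Accept": "*/*"})
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty read: the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


print(build_get_request("www.google.com", extra_headers={"Connection": "close"}))
# b'GET / HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n'
```

The sketch sends Connection: close rather than keep-alive so the read loop can stop when the server hangs up; a browser keeps the connection open and relies on headers such as Content-Length to find the end of each response.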
Step 5: Server Response
In response to our request, the server should send back something like the following:
➊ HTTP/1.1 200 OK
➋ Content-Type: text/html
—snip—
➌ <html>—snip—</html>
Here, we’ve received an HTTP/1.1 response with the status code 200 ➊. The status code is important because it indicates how the server is responding to the request. Status codes are also defined by RFCs; they are three-digit numbers, and the ones you’ll encounter most often begin with 2, 3, 4, or 5. Although there is no strict requirement for servers to use specific codes, 2xx codes typically indicate that a request was successful.
Because there is no strict enforcement of how a server implements its use of HTTP codes, you might see some applications respond with a 200 even though the HTTP message body explains there was an application error.
An HTTP message body is the text associated with a request or response ➌. In this case, we’ve removed the content and replaced it with —snip— because of how big the response body from Google is. This text in a response is usually the HTML for a web page but could be JSON for an application programming interface, file contents for a file download, and so on.
The Content-Type header ➋ informs the browser of the body’s media type, which determines how the browser will render the body’s contents. But browsers don’t always use the value returned by an application; instead, they may perform MIME sniffing, reading the first bytes of the body to determine the media type for themselves. Applications can disable this browser behavior by including the header X-Content-Type-Options: nosniff, which is not included in the preceding example.
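To make the status line, headers, and body concrete, here is a simplified Python parser run on a made-up sample response. The parse_response helper is hypothetical and ignores real-world details such as chunked encoding and repeated header names:

```python
from typing import Dict, Tuple


def parse_response(raw: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Split a raw HTTP/1.1 response into status code, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line separates head from body
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, *_reason = status_line.split(" ", 2)  # e.g. "HTTP/1.1", "200", "OK"
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return int(code), headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=UTF-8\r\n"
       b"\r\n"
       b"<html>...</html>")
status, headers, body = parse_response(raw)
media_type = headers.get("content-type", "").split(";")[0].strip()
nosniff = headers.get("x-content-type-options", "").lower() == "nosniff"
print(status, media_type, nosniff)  # 200 text/html False
```

Here nosniff is False because the sample, like the response above, doesn’t send X-Content-Type-Options, so a browser would be free to sniff the body.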
Other response codes starting with 3 indicate a redirection, which instructs your browser to make an additional request. For example, if Google theoretically needed to permanently redirect you from one URL to another, it could use a 301 response. In contrast, a 302 is a temporary redirect.
When a 3xx response is received, your browser should make a new HTTP request to the URL given in the Location header, as follows:
HTTP/1.1 301 Moved Permanently
Location: https://www.google.com/
Responses starting with a 4 typically indicate a client error, such as 403, which means the server received a valid HTTP request but the client isn’t authorized to access the requested content.
Responses starting with a 5 indicate some type of server error, such as 503, which means the server is currently unable to handle the request.
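The status classes and redirect handling described above can be sketched with Python’s standard http.HTTPStatus and urljoin. The describe_status and next_url helpers are hypothetical, and next_url assumes header names have already been lowercased:

```python
from http import HTTPStatus
from typing import Dict, Optional
from urllib.parse import urljoin

# The first digit of a status code gives its class.
STATUS_CLASSES = {1: "informational", 2: "success", 3: "redirection",
                  4: "client error", 5: "server error"}


def describe_status(code: int) -> str:
    """Return a label such as '301 Moved Permanently (redirection)'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # a code Python's HTTPStatus doesn't know, e.g. 299
        phrase = "Unknown"
    return f"{code} {phrase} ({STATUS_CLASSES.get(code // 100, 'invalid')})"


def next_url(current_url: str, code: int, headers: Dict[str, str]) -> Optional[str]:
    """For a 3xx response with a Location header, return the absolute URL
    the browser should request next; otherwise return None."""
    location = headers.get("location")
    if 300 <= code < 400 and location:
        return urljoin(current_url, location)  # Location may be a relative URL
    return None


print(describe_status(301))  # 301 Moved Permanently (redirection)
print(describe_status(302))  # 302 Found (redirection)
print(next_url("http://google.com/", 301, {"location": "https://www.google.com/"}))
# https://www.google.com/
```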
Step 6: Rendering the Response
Because the server sent a 200 response with the content type text/html, our browser will begin rendering the contents it received. The response body tells the browser what should be presented to the user. For our example, this would include HTML for the page structure; Cascading Style Sheets (CSS) for styles and layout; JavaScript for additional dynamic functionality; and media, such as images or videos. It’s possible for the server to return other content, such as XML, but we’ll stick to the basics for this example.
Because web pages can reference external files such as CSS, JavaScript, and media, the browser might make additional HTTP requests for all of a web page’s required files. While it requests those additional files, the browser continues parsing the response and presenting the body to you as a web page.
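A rough idea of how a browser discovers those extra files: scan the HTML for tags that point at other resources and turn each reference into an absolute URL. This Python sketch uses the standard html.parser module; the class and function names are hypothetical, and it ignores files referenced from CSS or fetched by JavaScript, which real browsers also load:

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

# Tags whose src attribute points at another file the page needs.
SRC_TAGS = {"script", "img", "video", "audio", "source"}


class SubresourceCollector(HTMLParser):
    """Collect absolute URLs of scripts, media, and stylesheets a page references."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag in SRC_TAGS and attributes.get("src"):
            self.urls.append(urljoin(self.base_url, attributes["src"]))
        elif tag == "link" and attributes.get("href"):
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.urls.append(urljoin(self.base_url, attributes["href"]))


def find_subresources(html: str, base_url: str) -> List[str]:
    collector = SubresourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.urls


page = ('<html><head><link rel="stylesheet" href="/style.css">'
        '<script src="app.js"></script></head>'
        '<body><img src="https://cdn.example.com/logo.png"></body></html>')
print(find_subresources(page, "https://www.google.com/"))
# ['https://www.google.com/style.css', 'https://www.google.com/app.js', 'https://cdn.example.com/logo.png']
```

Each URL in the result would become another HTTP request like the one in Step 4.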
In this case, it will render Google’s home page, www.google.com. Note that JavaScript is a scripting language supported by every major browser. JavaScript allows web pages to have dynamic functionality, including the ability to update content on a web page without reloading the page, check whether your password is strong enough (on some websites), and so on. Like other programming languages, JavaScript has built-in functions and can store values in variables and run code in response to events on a web page.
It also has access to various browser application programming interfaces (APIs). These APIs enable JavaScript to interact with other systems, the most important of which may be the document object model (DOM). The DOM allows JavaScript to access and manipulate a web page’s HTML and CSS. This is significant because if an attacker can execute their own JavaScript on a site, they’ll have access to the DOM and can perform actions on the site on behalf of the targeted user.