The Internet Protocol (IP) is the cornerstone of the TCP/IP protocol suite. TCP/IP refers to a combination of two protocols, IP and the Transmission Control Protocol (TCP), which together provide one of the most common network transport services used today. TCP data is encapsulated within IP, as are most of the other protocols in the TCP/IP suite. IP essentially functions as the envelope that delivers TCP/IP data to its destination.
On a TCP/IP internetwork, IP is the protocol responsible for transmitting data from its source to its final destination. IP is a connectionless protocol, meaning that it transmits messages to a destination without first establishing a connection to the receiving system. IP is connectionless because it carries data generated by many other protocols, only some of which require connection-oriented service. TCP/IP supports both connection-oriented and connectionless services at the transport layer, which makes it possible to keep the network layer connectionless, thus reducing the amount of control overhead generated by the protocol stack.
A transport layer protocol like TCP or the User Datagram Protocol (UDP) passes data down to the network layer, and IP encapsulates it by adding a header, creating what's known as a datagram, as shown in Figure 6.1. The datagram is addressed to the computer that will ultimately make use of the data, whether that computer is on the local network or on another network far away. Except for a few minor modifications, the datagram remains intact throughout the packet's journey to its destination. Once it has created the datagram, IP passes it down to a data-link layer protocol for transmission over the network.
During the transportation process, various systems might encapsulate the datagram in different data-link layer protocol headers, but the datagram itself remains intact. The process is similar to the delivery of a letter by the post office, with IP functioning as the envelope. The letter might be placed into different mailbags and transported by various trucks and planes during the course of its journey, but the envelope remains sealed. Only the addressee is permitted to open it and make use of the contents.
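The envelope analogy can be made concrete with a small sketch. The Frame type and its fields below are hypothetical simplifications, not a real data-link frame format. The sketch also ignores the minor header changes, such as the TTL decrement, that routers make along the way. It shows only that each hop builds a new frame while the datagram inside is carried unopened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """A hypothetical, simplified data-link frame (not a real frame format)."""
    dst_hw: str      # hardware address of the next system on this network
    src_hw: str      # hardware address of the sending interface
    payload: bytes   # the IP datagram, carried without being opened


def encapsulate(datagram: bytes, dst_hw: str, src_hw: str) -> Frame:
    """Wrap a datagram in a new frame for a single network hop."""
    return Frame(dst_hw, src_hw, datagram)


def decapsulate(frame: Frame) -> bytes:
    """Strip the frame and hand the datagram back up to IP."""
    return frame.payload


# Placeholder bytes standing in for an IP header plus TCP data.
datagram = b"IP-HEADER|TCP-SEGMENT"

# First hop: the sender frames the datagram for its local network.
hop1 = encapsulate(datagram, "00:00:5e:00:53:01", "00:00:5e:00:53:02")
# A router strips that frame and builds a new one for the next network.
hop2 = encapsulate(decapsulate(hop1), "00:00:5e:00:53:11", "00:00:5e:00:53:12")

print(decapsulate(hop2) == datagram)  # True: the "envelope" is unchanged
```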
The TCP/IP protocols are defined in documents called Requests for Comments (RFCs), which are published by a body called the Internet Engineering Task Force (IETF). Unlike most networking standards, TCP/IP specifications are freely available to the public on the Internet at many different sites, including the IETF's home page at www.ietf.org. The "Internet Protocol" specification was published as RFC 791 in September 1981 and was later designated Internet Standard 5 (STD 5).
IP performs several functions that are essential to the internetworking process, including the following:
These functions are discussed in the following sections.
The header that IP applies to the data it receives from the transport layer protocol is normally 20 bytes long; IP options, when present, can extend it to as many as 60 bytes. The datagram format is shown in Figure 6.2.
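As a rough illustration, the fixed 20-byte portion of the header can be decoded with Python's struct module. The field order and sizes follow RFC 791. The sketch assumes the header carries no options, and the sample values (addresses, identification, TTL) are made up for illustration, with the checksum simply left at 0.

```python
import socket
import struct

# Fixed part of the IPv4 header (RFC 791): 20 bytes, network byte order.
HEADER_FORMAT = "!BBHHHBBH4s4s"


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte IPv4 header; options (if any) are ignored."""
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack(HEADER_FORMAT, raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,    # IHL counts 32-bit words
        "type_of_service": tos,
        "total_length": total_length,             # header + data, in bytes
        "identification": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset": flags_frag & 0x1FFF,   # in 8-byte units
        "ttl": ttl,
        "protocol": protocol,
        "header_checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# Made-up sample: 40-byte datagram, DF set, TTL 64, protocol 6 (TCP),
# checksum left at 0 for simplicity.
sample = struct.pack(HEADER_FORMAT, 0x45, 0, 40, 0x1C46, 0x4000, 64, 6, 0,
                     socket.inet_aton("192.0.2.10"),
                     socket.inet_aton("198.51.100.5"))
hdr = parse_ipv4_header(sample)
print(hdr["source"], "->", hdr["destination"], "TTL", hdr["ttl"])
```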
The datagram fields perform the following functions:
Unlike some other network layer protocols, IP has its own self-contained addressing system that it uses to identify computers on an internetwork of almost any size. IPX, for example, uses hardware addresses to identify computers on a LAN, with a separate address for the network, while NetBEUI assigns a name to each computer on the LAN and has no network address at all. IP addresses are 32 bits long and contain both a network identifier and a host identifier. In TCP/IP parlance, the term "host" refers to a network interface adapter found in a computer or other device. In most cases, each computer on a network has one IP address, but it is actually the network interface adapter (generally a network interface card, or NIC) that the address represents. A computer with two adapters (such as a router), or with one adapter and a modem connection to a network, will actually have two IP addresses, one for each interface.
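The split between the network identifier and the host identifier is simply a division of the 32 bits. In this sketch, the boundary is passed in as a prefix length (the number of leading network bits), which is an assumption on our part; in practice a system learns where the boundary falls from a subnet mask. The addresses are documentation examples.

```python
import ipaddress


def split_address(address: str, prefix_len: int) -> tuple:
    """Split an IPv4 address into (network identifier, host identifier).

    prefix_len is the number of leading bits assumed to form the network
    identifier; in practice the boundary comes from a subnet mask.
    """
    value = int(ipaddress.IPv4Address(address))  # the raw 32-bit number
    host_bits = 32 - prefix_len
    host_mask = (1 << host_bits) - 1
    network_id = value & ~host_mask              # keep only the network bits
    host_id = value & host_mask                  # keep only the host bits
    return str(ipaddress.IPv4Address(network_id)), host_id


print(split_address("192.0.2.10", 24))
```

With a 24-bit network identifier, `split_address("192.0.2.10", 24)` returns `("192.0.2.0", 10)`: the first three bytes name the network and the last byte names the host.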
The IP addresses that a system inserts into the Source IP Address and Destination IP Address fields of the IP header identify, respectively, the system that created the packet and the system that will eventually receive it. If the packet is intended for a system on the local network, the Destination IP Address refers to the same system as the Destination Address in the data-link protocol header. However, if the packet's destination is a system on another network, the two addresses refer to different systems: the data-link header addresses the router that provides the next hop, while the Destination IP Address still identifies the final destination. This is because IP is an end-to-end protocol that deals with the entire journey of the data to its ultimate destination, whereas the data-link layer protocol deals only with a single network hop.
Data-link layer protocols cannot work with IP addresses, however, so in order to actually transmit the datagram, IP has to supply the data-link layer protocol with a hardware address of a system on the local network. To do this, IP uses another TCP/IP protocol called the Address Resolution Protocol (ARP). ARP works by generating broadcast messages that contain an IP address on the local network. The system using that IP address must respond to the broadcast, and the data-link layer protocol header of the reply message contains the system's hardware address. If the datagram's destination system is on the local network, the IP protocol generates an ARP message containing the IP address of that system. If the destination system is located on another network, IP generates an ARP message containing the address of a router on the local network. Once it has received the ARP reply, the IP protocol on the original system can pass the datagram down to the data-link layer protocol and provide it with the hardware address it needs to build the frame.
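The decision IP makes before calling on ARP can be sketched in a few lines. The local network prefix and default router address used here are hypothetical configuration values; a real TCP/IP stack reads them from its interface settings and routing table, and it normally caches ARP replies rather than broadcasting for every datagram.

```python
import ipaddress


def arp_target(destination: str, local_network: str, default_router: str) -> str:
    """Return the IP address that IP would ask ARP to resolve for a datagram.

    local_network (e.g. "192.0.2.0/24") and default_router are hypothetical
    configuration values used only for illustration.
    """
    if ipaddress.IPv4Address(destination) in ipaddress.IPv4Network(local_network):
        return destination      # same network: resolve the destination itself
    return default_router       # another network: resolve the local router


print(arp_target("192.0.2.20", "192.0.2.0/24", "192.0.2.1"))
print(arp_target("198.51.100.7", "192.0.2.0/24", "192.0.2.1"))
```

In both cases the Destination IP Address in the datagram is unchanged; only the address handed to ARP, and therefore the hardware address in the frame, differs.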
Routing is the most important and the most complex function of the IP protocol. When a TCP/IP system has to transmit data to a computer on another network, the packets must travel through the routers that connect the networks together. As explained in Chapter 1, "Networking Basics," the source and final destination computers in a case like this are called end systems and the routers are called intermediate systems (see Figure 6.3). When the packets pass through an intermediate system, they only travel up through the protocol stack as high as the network layer, where IP is responsible for deciding where to send the packet next. If the router is connected to the network where the destination system is located, it can transmit the packet there, and the packet's journey is over. If the destination system is located on another network, the router sends the packet to another router, which brings the packet one hop closer to its destination. Depending on the complexity of the internetwork, a packet might pass through dozens of routers on the way to its destination.
Because packets reach only as high as the network layer in an intermediate system, the datagrams are not opened and used. The router strips off the data-link layer frame and later builds a new one, but the datagram "envelope" remains sealed until it reaches its destination. However, each intermediate system does make some changes to the IP header. The most important of these concerns the Time To Live (TTL) field, which the computer that generates the packet sets to a predetermined value. Each router, as it processes the packet, reduces this value by one. If the TTL value reaches zero, the router discards the packet. This mechanism prevents packets from circulating endlessly around an internetwork in the event of a routing problem.
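A minimal sketch of a router's TTL handling follows, assuming a 20-byte header with no options. Because the TTL is part of the header, RFC 791 requires the Header Checksum to be recomputed at each hop, so the sketch includes the standard ones'-complement checksum. The sample header values are hypothetical, and the sketch does not actually send the ICMP message a real router would generate.

```python
import struct
from typing import Optional


def ipv4_checksum(header: bytes) -> int:
    """Ones'-complement sum of the header's 16-bit words (RFC 791)."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward(header: bytes) -> Optional[bytes]:
    """Apply a router's TTL processing to a 20-byte IPv4 header.

    Returns the updated header, or None if the router would discard the packet.
    """
    new_ttl = header[8] - 1                  # TTL is byte 8 of the header
    if new_ttl <= 0:
        # Discard; a real router would also send an ICMP Time Exceeded message.
        return None
    hdr = bytearray(header)
    hdr[8] = new_ttl
    hdr[10:12] = b"\x00\x00"                 # checksum field is zero while summing
    hdr[10:12] = struct.pack("!H", ipv4_checksum(bytes(hdr)))
    return bytes(hdr)


# Hypothetical header with TTL 2, protocol 17 (UDP), documentation addresses.
h = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 1, 0, 2, 17, 0,
                          bytes([192, 0, 2, 1]), bytes([198, 51, 100, 2])))
h[10:12] = struct.pack("!H", ipv4_checksum(bytes(h)))

after_first = forward(bytes(h))        # TTL 2 -> 1, checksum recomputed
after_second = forward(after_first)    # TTL 1 -> 0, packet discarded
```

A header whose checksum is correct sums to zero under `ipv4_checksum`, which is how a receiving system verifies it.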
When a router discards a packet with a TTL value of zero, it uses the Internet Control Message Protocol (ICMP) to generate an error message called a Time To Live Exceeded In Transit message and sends it to the system where the packet originated. This informs the system that the packet has not reached its destination. Most TCP/IP implementations include a utility program called Traceroute that uses the TTL field to display a list of the routers that packets are using to reach a particular destination system. Traceroute sends a series of packets with successively larger TTL values, starting at 1, so that each router along the path in turn discards one of the packets and returns an ICMP error message identifying that router. The Traceroute program assembles the router addresses from these error messages and displays the entire route to the destination. For more information about Traceroute, see Lesson 2: TCP/IP Utilities, in Chapter 10, "TCP/IP Applications."
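The Traceroute technique can be simulated without touching the network. In this sketch the path is a hypothetical list of router addresses followed by the destination, and send_probe stands in for sending a real packet. Real implementations differ in what they send and in how the destination replies (for example, an ICMP Echo Reply or an ICMP Port Unreachable message), which the sketch does not model.

```python
from typing import List, Tuple


def send_probe(path: List[str], ttl: int) -> Tuple[str, bool]:
    """Simulate one probe along a hypothetical path.

    path lists the routers in order, followed by the destination host.
    Returns (address that replied, whether the destination was reached).
    """
    *routers, destination = path
    for router in routers:
        ttl -= 1                      # each router decrements TTL by one
        if ttl == 0:
            return router, False      # router discards it, sends ICMP Time Exceeded
    return destination, True          # probe survived every router


def traceroute(path: List[str], max_hops: int = 30) -> List[str]:
    """Send probes with TTL 1, 2, 3, ... and collect the addresses that reply."""
    route = []
    for ttl in range(1, max_hops + 1):
        responder, reached = send_probe(path, ttl)
        route.append(responder)
        if reached:
            break
    return route


# Hypothetical path: three routers, then the destination.
path = ["192.0.2.1", "198.51.100.1", "203.0.113.1", "203.0.113.50"]
print(traceroute(path))
```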
Routers can connect networks that use different media types and different data-link layer protocols, but in order to forward packets from one network to another, routers must often repackage the datagrams into different data-link layer frames. In some cases, this is simply a matter of stripping off the old frame and adding a new one, but at other times the data-link layer protocols are different enough to require more extensive repackaging. For example, when a router connects a Token Ring network to an Ethernet network, datagrams arriving from the Token Ring network can be up to 4,500 bytes long, while the datagrams in Ethernet packets can only be as large as 1,500 bytes.
To overcome this problem, the router splits the datagram arriving from the Token Ring network into multiple fragments, as shown in Figure 6.4. Each fragment has its own IP header and is transmitted in a separate data-link layer frame. The size of each fragment is based on the maximum transmission unit (MTU) of the outgoing network. If fragments encounter a network with an even smaller MTU, they can themselves be split into smaller fragments. Once fragmented, the individual parts of a datagram are not reassembled until they reach the end system that is their final destination.
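The fragmentation step can be sketched as follows. RFC 791 measures the Fragment Offset in units of 8 bytes, so every fragment except the last must carry a multiple of 8 data bytes. The sketch assumes 20-byte headers with no options and a datagram that has not already been fragmented; it also assumes the 4,500-byte Token Ring datagram from the example consists of a 20-byte header plus 4,480 bytes of data.

```python
from typing import List, NamedTuple

HEADER_LEN = 20  # assumption: fragment headers carry no options


class Fragment(NamedTuple):
    offset: int            # Fragment Offset field, in 8-byte units
    more_fragments: bool   # More Fragments bit
    total_length: int      # Total Length field: header plus this fragment's data
    data: bytes


def fragment(data: bytes, mtu: int) -> List[Fragment]:
    """Split a datagram's data into fragments that each fit within the MTU."""
    # Offsets count 8-byte units, so every fragment but the last must carry
    # a multiple of 8 data bytes: round the per-fragment capacity down.
    max_data = (mtu - HEADER_LEN) // 8 * 8
    fragments = []
    for start in range(0, len(data), max_data):
        chunk = data[start:start + max_data]
        is_last = start + max_data >= len(data)
        fragments.append(Fragment(start // 8, not is_last,
                                  HEADER_LEN + len(chunk), chunk))
    return fragments


# A 4,500-byte datagram (20-byte header + 4,480 data bytes) leaving on Ethernet.
frags = fragment(bytes(4480), mtu=1500)
for f in frags:
    print(f.offset, f.more_fragments, f.total_length)
```

Under these assumptions the datagram becomes four fragments with offsets 0, 185, 370, and 555 and Total Length values of 1,500, 1,500, 1,500, and 60 bytes, with the More Fragments bit set in all but the last.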
When a router fragments a datagram, it gives each fragment its own IP header. The Identification field in each fragment's header contains the same value as the Identification field of the original datagram, which enables the destination system to associate the fragments of a particular datagram. The router modifies the Total Length field in each fragment's header to reflect the length of that fragment, and also changes the More Fragments bit in the Flags field from 0 to 1 in all of the fragments except the last one. A value of 1 in this bit indicates that more fragments are coming for that datagram. The destination system uses this bit, together with the Fragment Offset values, to determine when it has received all of the fragments and can begin to assemble them back into the whole datagram.
The Fragment Offset field specifies each fragment's place in the original datagram's data, measured in units of 8 bytes. The first fragment has a value of 0 in this field, while the value in the second fragment is the size of the first fragment's data (not counting its IP header) divided by 8. The third fragment's offset value reflects the combined data size of the first two fragments, and so forth. Because offsets count 8-byte units, every fragment except the last must carry a multiple of 8 bytes of data. The destination system uses these values to reassemble the fragments in the proper order. Another bit in the Flags field, called the Don't Fragment bit, instructs routers to discard a datagram rather than fragment it when it is too large for the next network. The router returns an ICMP Destination Unreachable error message (with the code "fragmentation needed and DF set") to the source system when it discards a packet for this reason.
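Reassembly at the destination can be sketched using the three header values involved: Identification, Fragment Offset (in 8-byte units, as RFC 791 defines it), and the More Fragments bit. The tuple layout below is a hypothetical simplification of a parsed header. A real implementation also handles duplicate and overlapping fragments and discards incomplete datagrams after a timeout, none of which this sketch attempts.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (identification, offset_in_8_byte_units, more_fragments, data)
FragmentTuple = Tuple[int, int, bool, bytes]


def reassemble(fragments: List[FragmentTuple], ident: int) -> Optional[bytes]:
    """Rebuild one datagram's data from its fragments, or return None if incomplete."""
    # Keep only this datagram's fragments, ordered by byte position.
    pieces = sorted((off * 8, mf, data)
                    for i, off, mf, data in fragments if i == ident)
    expected = 0
    out = bytearray()
    for start, mf, data in pieces:
        if start != expected:
            return None            # a gap (or overlap): something is missing
        out += data
        expected += len(data)
        if not mf:
            return bytes(out)      # More Fragments = 0 marks the last fragment
    return None                    # the last fragment has not arrived yet


# Fragments from two datagrams (IDs 7 and 9), arriving out of order.
arrived = [
    (7, 2, False, b"CCCC"),
    (7, 0, True, b"AAAAAAAA"),
    (9, 0, False, b"zz"),
    (7, 1, True, b"BBBBBBBB"),
]
print(reassemble(arrived, 7))
```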
In order for the destination system to process the incoming datagram properly, it must know which protocol generated the information carried in the Data field. The Protocol field in the IP header provides this information, using codes that are defined in RFC 1700, called "Assigned Numbers," which contains lists of the many codes used by the TCP/IP protocols. (RFC 1700 has since been replaced by an online registry maintained by the Internet Assigned Numbers Authority, or IANA.) "Assigned Numbers" contains dozens of protocol codes, most of which are for obsolete or seldom-used protocols. The most commonly used values for the Protocol field are as follows:
The protocols you would most expect to see in the list are TCP and UDP, the transport layer protocols that account for much of the IP traffic on a TCP/IP network. However, IP also carries other types of information in its datagrams, including ICMP messages, which notify systems of errors and other network conditions, and messages generated by routing protocols such as GGP and EGP, which TCP/IP systems use to update their routing tables automatically.
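The receiving side's use of the Protocol field amounts to a table lookup. The numbers below are the assigned values for the five protocols named above (ICMP 1, GGP 3, TCP 6, EGP 8, UDP 17). The handler functions are hypothetical stand-ins for a stack's real TCP and UDP modules, and the "dropped" result is a simplification of what a real stack does with an unsupported protocol.

```python
from typing import Callable, Dict

# Protocol field values for the protocols mentioned in the text.
PROTOCOL_NUMBERS = {1: "ICMP", 3: "GGP", 6: "TCP", 8: "EGP", 17: "UDP"}


def protocol_name(value: int) -> str:
    return PROTOCOL_NUMBERS.get(value, f"other ({value})")


def deliver(protocol: int, payload: bytes,
            handlers: Dict[int, Callable[[bytes], str]]) -> str:
    """Hand the Data field to the handler registered for its Protocol value."""
    handler = handlers.get(protocol)
    if handler is None:
        # A real stack would typically drop the datagram and may send an
        # ICMP Destination Unreachable (Protocol Unreachable) message.
        return f"dropped: no handler for {protocol_name(protocol)}"
    return handler(payload)


# Hypothetical handlers standing in for the real TCP and UDP modules.
handlers = {
    6: lambda data: f"TCP received {len(data)} bytes",
    17: lambda data: f"UDP received {len(data)} bytes",
}
print(deliver(6, b"hello", handlers))
print(deliver(8, b"", handlers))
```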
IP options are additional header fields that enable datagrams to carry extra information and, in some cases, accumulate information as they travel through an internetwork on the way to their destinations. Some of the options defined in the IP standard are as follows: