1. Concept of Layering
Purpose of Layering: In networking, layering is an approach to divide the complex task of network communication into manageable sub-tasks, each handled by a different layer (Data Communication and Networking | PPT). Each layer provides a specific set of functions and services to the layer above, while relying on services from the layer below (Data Communication and Networking | PPT). This separation of concerns makes implementations modular and transparent to other components (Data Communication and Networking | PPT). For example, a transport layer protocol can function over different network layer technologies without modification. Key advantages of layering include independent design, development, and testing of each layer, and the ability to update one layer’s implementation without affecting others (Data Communication and Networking | PPT). (Overly fine layering can introduce performance overhead, but the seven-layer model discussed below has proven a useful balance (Data Communication and Networking | PPT).) The layering concept also introduces the idea of protocol data units (PDUs) – each layer encapsulates the data from the layer above by adding its own header (and sometimes trailer), forming a PDU that is passed to the layer below (Additive increase/multiplicative decrease – Wikipedia) (Additive increase/multiplicative decrease – Wikipedia). At the receiving end, each layer processes and removes its header, handing the remaining payload up the stack.
1.1 OSI and TCP/IP Protocol Stacks
Two primary reference models illustrate network layering: the OSI model and the TCP/IP model. The ISO/OSI (Open Systems Interconnection) model defines 7 layers, while the TCP/IP (Internet) model traditionally has 4 layers (sometimes expanded to 5 by separating physical and data link) (Difference Between OSI Model and TCP/IP Model – GeeksforGeeks) (Difference Between OSI Model and TCP/IP Model – GeeksforGeeks). Each layer has specific responsibilities:
- Layer 1: Physical (OSI) – Handles transmission of raw bits over a physical medium. It deals with electrical/optical signaling, voltage levels, timing, and data rates. (The TCP/IP model doesn’t explicitly separate physical and link; both are often encompassed in a single “Link” or “Network Interface” layer.)
- Layer 2: Data Link (OSI) – Responsible for node-to-node data transfer over a single link. It packages bits into frames, does error detection, and manages access to the shared medium. In TCP/IP, this is part of the Link layer.
- Layer 3: Network (OSI) – Handles routing of packets across multiple links (internetworking). It assigns logical addresses (e.g., IP addresses) and determines paths for data. In the Internet model, this is the Internet layer (with IP as the core protocol).
- Layer 4: Transport (OSI) – Provides end-to-end communication between hosts. It ensures data is delivered reliably and in order (for connection-oriented transport like TCP) or provides faster, connectionless delivery (like UDP). The TCP/IP model’s Transport layer corresponds directly.
- Layer 5: Session (OSI) – Manages dialogs (connections) between applications. It handles session establishment, maintenance, and termination. (Not distinctly present in TCP/IP; its functions, if needed, are often handled by applications or transport protocols.)
- Layer 6: Presentation (OSI) – Translates data formats between the network and an application. This can include character encoding, encryption, or compression so that systems with different representations can communicate. (Also not separate in TCP/IP; applications handle these tasks if required.)
- Layer 7: Application (OSI) – Interface for end-user applications and processes to access network services. Protocols like HTTP, SMTP, FTP, DNS operate here. In TCP/IP, the Application layer encompasses any protocols above transport.
The OSI model is a theoretical framework that cleanly separates functions into seven layers, whereas the TCP/IP model was developed from the practical protocols of the early Internet and condenses functionality into fewer layers (Difference Between OSI Model and TCP/IP Model – GeeksforGeeks) (Difference Between OSI Model and TCP/IP Model – GeeksforGeeks). Notably, the TCP/IP model’s Application layer covers OSI’s Application, Presentation, and Session layers; and its Link layer covers OSI’s Data Link and Physical layers. Despite these structural differences, the core idea is similar: each layer communicates with its peer layer on another system using a defined protocol, and provides services to the layer above via a defined interface. For instance, TCP (Transport layer) provides a reliable byte-stream service to an application like HTTP, while using the unreliable packet service of IP (Network layer) beneath it (Andrew S. Tanenbaum – Computer Networks) (Andrew S. Tanenbaum – Computer Networks).
Figure: Comparison of the OSI (7-layer) model and the TCP/IP (4-layer) model. The OSI scheme is a conceptual framework with separate layers, while the Internet (TCP/IP) model has a simplified layer stack reflecting the actual protocol suite used in practice (Difference Between OSI Model and TCP/IP Model – GeeksforGeeks).
It’s important to note that the OSI model was developed as a standardized reference and had its own protocol suite which largely failed to gain traction (for reasons of complexity and timing) (Andrew S. Tanenbaum – Computer Networks). In contrast, the TCP/IP protocol suite became dominant; its model is less of a strict blueprint but rather a description of the protocols in use (Andrew S. Tanenbaum – Computer Networks). Nonetheless, OSI’s layered principles remain very influential. Each layer in either model carries out a well-defined role: for example, the Network layer (IP) is in charge of logical addressing and routing, regardless of whether the underlying link is Ethernet, Wi-Fi, or another technology. This abstraction is what makes the Internet possible – higher layers need not worry whether data travels over fiber, copper, or radio, or through how many intermediate networks. They see a uniform network service provided by the layer below.
Encapsulation and Peer Communication: As data moves down the layers on the sender side, each layer encapsulates the higher-layer data with its own headers (and footers, like a Frame Check Sequence at data link). For example, an HTTP request (Application data) is encapsulated in a TCP segment (with TCP header), which is placed in an IP packet (with IP header), which in turn goes into an Ethernet frame (with Ethernet header and trailer) for transmission. At the receiver, each layer removes the corresponding header and processes the information, delivering the payload to the next layer up (Additive increase/multiplicative decrease – Wikipedia) (Additive increase/multiplicative decrease – Wikipedia). Each layer thus communicates with its peer layer using the protocol rules, but those peer layers exchange information only through the service of lower layers. For instance, two transport-layer TCP peers exchange TCP segments carried inside IP packets; the IP layer doesn’t understand the TCP header, but simply treats the segment as payload.
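To make the encapsulation idea concrete, here is a minimal Python sketch of headers being added on the way down the stack and stripped on the way up. The header strings are placeholders for illustration, not real protocol formats.

```python
# Minimal sketch of layered encapsulation; header contents are placeholders.

def encapsulate(app_data: bytes) -> bytes:
    tcp_segment = b"TCPHDR" + app_data              # transport layer adds its header
    ip_packet = b"IPHDR " + tcp_segment             # network layer wraps the segment
    eth_frame = b"ETHHDR" + ip_packet + b"FCS"      # link layer adds header and trailer
    return eth_frame

def decapsulate(eth_frame: bytes) -> bytes:
    ip_packet = eth_frame[len(b"ETHHDR"):-len(b"FCS")]  # link layer strips header/trailer
    tcp_segment = ip_packet[len(b"IPHDR "):]            # network layer strips its header
    app_data = tcp_segment[len(b"TCPHDR"):]             # transport layer strips its header
    return app_data

assert decapsulate(encapsulate(b"GET / HTTP/1.1")) == b"GET / HTTP/1.1"
```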
In summary, layering provides a modular network architecture: it simplifies design and interoperability by enforcing clear boundaries and standard interfaces between different functionalities (Data Communication and Networking | PPT) (Data Communication and Networking | PPT). This concept is embodied both in the 7-layer OSI reference model and the simplified 4-5 layer TCP/IP model used to describe the Internet.
1.2 Switching Methods
In addition to layered protocols, network design involves how data moves through the network – the switching method. The primary switching methods are circuit switching, packet switching, and virtual circuit switching, each relating to how paths are set up and used through a network.
- Circuit Switching: A dedicated communication circuit (path) is established between the parties for the duration of a communication session (Andrew S. Tanenbaum – Computer Networks) (Andrew S. Tanenbaum – Computer Networks). Classic telephone networks use circuit switching. Before any user data is transmitted, a connection setup phase reserves a continuous end-to-end channel (through possibly many intermediate switches). Once established, all data follows this path, and the full bandwidth of the path is available exclusively to that connection until it is torn down (Andrew S. Tanenbaum – Computer Networks) (Andrew S. Tanenbaum – Computer Networks). Circuit switching provides a steady data rate and low latency once the circuit is up (no routing decisions per packet), making it ideal for real-time voice/video. However, it is inflexible – if the dedicated link is idle, its capacity is wasted, and establishing the circuit incurs setup delay (Andrew S. Tanenbaum – Computer Networks) (Andrew S. Tanenbaum – Computer Networks). For example, in a telephone call, the seconds between dialing and the call connecting correspond to the network finding and reserving a path (circuit) across the switches. Once connected, a continuous circuit exists, and no other call can use those exact resources even if silence is transmitted. Circuit switching guarantees bandwidth but is inefficient for bursty data traffic (Andrew S. Tanenbaum – Computer Networks).
- Packet Switching (Datagram Switching): In a packet-switched network, no dedicated path is reserved. Instead, all data is sent in discrete units called packets that are routed independently, possibly each via different paths, from source to destination (Andrew S. Tanenbaum – Computer Networks) (Andrew S. Tanenbaum – Computer Networks). The Internet is the prime example of a packet-switched network using datagram switching – each IP packet carries the destination address and is forwarded by each router based on its routing table, with no advance reservation. Packet switching is very efficient for bursty traffic: network links are shared by many flows, and if one host isn’t sending, others can use the capacity (Andrew S. Tanenbaum – Computer Networks). It is also more fault-tolerant: if a particular link or node goes down, packets can be rerouted dynamically around the problem (Andrew S. Tanenbaum – Computer Networks). However, packet switching provides no built-in timing or bandwidth guarantees – packets can experience variable delay or be dropped if the network is congested. For example, in the Internet, your data (whether an email or video stream) is broken into packets that may travel different routes and arrive out of order, to be reassembled by the receiver. There is no initial setup phase; the first packet you send can go out immediately, which gives low initial delay, but each router makes a forwarding decision for each packet, which can introduce per-packet overhead and variable queuing delay.
- Virtual Circuit Switching: This is a hybrid approach. Like circuit switching, a connection is first established before data transfer, but the connection is logical – no fixed bandwidth is reserved, and data still travels in packets. When a virtual circuit (VC) is set up, each router along the path creates an entry in a table, and the connection is identified by a virtual circuit identifier (VC ID) rather than by full destination addresses in every packet (Andrew S. Tanenbaum – Computer Networks) (Andrew S. Tanenbaum – Computer Networks). Once the virtual circuit is established, packets (often called cells or frames in this context) carry a short VC ID, and routers forward them along the predetermined path. This avoids having to do a full routing lookup for every packet; it’s as if a temporary circuit exists, but it shares link capacity with other circuits (statistical multiplexing). X.25 and Frame Relay are classic examples of virtual-circuit packet networks, and ATM (Asynchronous Transfer Mode) is a well-known technology that uses small fixed-size packets (cells) with virtual circuits (Andrew S. Tanenbaum – Computer Networks) (Andrew S. Tanenbaum – Computer Networks). The advantage is that you can get some of the benefits of circuit switching (fixed path, possibility of reserving resources for QoS) while retaining the flexibility of packet switching (many virtual circuits can multiplex over one physical link). Virtual circuits ensure all packets for a connection arrive in order (since they follow the same route) and can guarantee quality of service if the network pre-allocates bandwidth for that circuit. However, they share a downside of circuit switching: if a router on the path fails, the virtual circuit is broken (all VCs through that node are lost) (Andrew S. Tanenbaum – Computer Networks), and a new connection setup is needed. Also, the setup phase adds delay for the first packet. Modern MPLS networks (used by ISPs) are a form of virtual circuit switching layered atop IP – an ingress router assigns a label, and subsequent routers forward based on the label through a predefined path.
Key differences and use cases: Circuit switching is well-suited for constant bit rate streams like traditional phone calls. Packet switching (datagram) is well-suited for data that can tolerate variable delays and for bursty communications – it underpins the Internet’s design of sharing infrastructure among many users efficiently (Andrew S. Tanenbaum – Computer Networks) (Andrew S. Tanenbaum – Computer Networks). Virtual circuits often appear in carrier networks or enterprise backbones where one wants the control of a circuit (and possibly resource reservation) with the flexibility of packets – ATM and Frame Relay were used in telecom networks to carry mixed voice/data. The layered architecture of the Internet can accommodate any of these switching methods at the lower layers. For example, the Network layer (IP) can run over a circuit-switched core (as was done in early ARPANET experiments, or IP over optical circuits), or over a virtual-circuit WAN (like MPLS or ATM) – from IP’s perspective, it is just carrying packets, regardless of whether the link between two routers is maintained by a VC or a true circuit.
In summary, circuit switching sets up a fixed path with reserved resources (good for consistent service, bad for efficiency with idle times) (Andrew S. Tanenbaum – Computer Networks), packet switching sends each packet individually (efficient and robust, but variable performance) (Andrew S. Tanenbaum – Computer Networks), and virtual circuits create a logical flow for packets (allowing per-flow routing decisions and potential guarantees, at the cost of setup and less flexibility in rerouting on failure) (Andrew S. Tanenbaum – Computer Networks) (Andrew S. Tanenbaum – Computer Networks). Each method relates to layered design in that circuit or VC setup may occur at a lower layer (e.g., a VC at layer 2, like an ATM virtual circuit, carrying layer 3 packets), while pure packet switching is the norm at the network layer in the Internet (IP). The transport and application layers remain largely unaffected by whether the underlying network is circuit or packet switched – they just see a data pipe with certain performance characteristics. For instance, TCP was originally designed for packet switching and assumes losses can happen; it would not need to handle loss recovery on a true circuit (where losses ideally don’t occur once the circuit is established). Thus, switching is a consideration in the design of the network layer and below, impacting how routes are formed and how resources are allocated across the network.
2. Data Link Layer
The Data Link layer (Layer 2 in OSI) is responsible for reliable transmission of data across a single link or local network segment. It takes raw bits from the Physical layer and packages them into structured frames for delivery to the next directly-connected node. Key functions of the data link layer include framing, addressing (using MAC addresses on LANs), error detection/correction, and medium access control on shared media. It also may provide flow control on the link and segmentation of packets into frame-sized units.
2.1 Framing and Error Detection
Framing: Data link layer protocols break the continuous stream of bits from the physical layer into discrete blocks called frames (Data Communication and Networking | PPT). Framing enables the receiver to detect frame boundaries – where a packet of data begins and ends – so that any errors can be contained to that frame and not corrupt an endless stream. There are several common framing methods:
- Byte-Oriented Framing: Frames may start and end with special byte sequences. For example, the older BISYNC protocol used special control bytes (SYN, STX, and ETX) to delineate frames. If the binary data inside a frame could accidentally contain the special marker, a technique called byte stuffing (or character stuffing) is used: an escape byte is inserted before any occurrence of the marker within the data payload, so the receiver can distinguish data from frame delimiters (Various kind of Framing in Data link layer – GeeksforGeeks). For instance, in PPP (Point-to-Point Protocol), 0x7E is used as a frame delimiter; if that value appears in the payload, PPP inserts an escape byte (0x7D) and XORs the following byte with 0x20 to avoid misinterpretation ([PDF] Link Layer – University of Washington).
- Bit-Oriented Framing: In protocols like HDLC, frames begin and end with a flag bit pattern (e.g., 01111110 in binary). To prevent this pattern from appearing in the data, bit stuffing is used: whenever the sender's data link layer encounters five consecutive 1 bits in the data, it automatically inserts a 0 bit (Various kind of Framing in Data link layer – GeeksforGeeks). This ensures that a flag sequence (01111110) is never falsely formed within the payload. The receiver, after seeing the flag and reading the frame, similarly scans the bits and removes any 0 that follows five consecutive 1s (knowing it was stuffed) (Various kind of Framing in Data link layer – GeeksforGeeks). This way, the genuine flag sequences delineating the frame are recognized. Bit stuffing is more general than byte stuffing because it doesn't depend on byte boundaries and thus can be used in synchronous serial transmissions. (A sketch of the stuffing rule follows this list.)
- Length Counts: Some protocols include a length field in the frame header (e.g., Ethernet). The length field tells how many bytes are in the frame, and the receiver simply reads that many bytes of payload. This avoids needing special sentinel values, but it relies on the length count being correct (if it is corrupted, the frame boundary can be lost until resynchronization). Modern Ethernet actually uses both a length field and special physical-layer coding (preambles) to mark frame start.
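Here is a minimal sketch of the HDLC-style bit-stuffing rule, operating on a string of '0'/'1' characters for readability (real implementations work on the bit stream in hardware):

```python
# Minimal sketch of HDLC bit stuffing; bits are modeled as '0'/'1' characters.
FLAG = "01111110"

def bit_stuff(payload: str) -> str:
    out, run = [], 0
    for bit in payload:
        out.append(bit)
        run = run + 1 if bit == "1" else 0
        if run == 5:            # five consecutive 1s seen:
            out.append("0")     # insert a 0 so the flag pattern cannot appear
            run = 0
    return "".join(out)

def bit_unstuff(stuffed: str) -> str:
    out, run, i = [], 0, 0
    while i < len(stuffed):
        bit = stuffed[i]
        out.append(bit)
        run = run + 1 if bit == "1" else 0
        if run == 5:            # the bit after five 1s was stuffed by the sender
            i += 1              # skip the stuffed 0
            run = 0
        i += 1
    return "".join(out)

payload = "0111111011111010"    # contains the flag pattern and long runs of 1s
frame = FLAG + bit_stuff(payload) + FLAG
assert bit_unstuff(bit_stuff(payload)) == payload
print(frame)
```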
No matter the method, the goal is that the link layer can mark frame boundaries so the receiver can detect where each frame ends. This is crucial for error containment: if a bit error garbles one frame, the error should ideally not propagate into confusion about where the next frame begins.
Error Detection: To ensure reliable communication, data link frames include error detection codes. The most common are checksums and CRC (Cyclic Redundancy Check) codes:
- Checksum: A checksum is typically the arithmetic sum of all data bytes (or words) in the frame, often with some truncation (like taking the ones-complement of the sum in 16-bit slices, as in IP or TCP/UDP checksums). The checksum is computed by the sender and placed in the frame header or trailer; the receiver recomputes it and compares to detect errors. Checksums are simple and catch many errors, but they are not as robust as CRCs for certain patterns of corruption (especially burst errors). For example, IP headers use a 16-bit ones-complement checksum, which will catch single-bit errors and some multiple-bit errors, but certain combinations of errors can escape detection with low probability.
- CRC: A cyclic redundancy check is a powerful error-detection code based on polynomial division (Cyclic Redundancy Check Example | Gate Vidyalay). The data bits of the frame are treated as coefficients of a binary polynomial; the sender divides this polynomial by an agreed-upon generator polynomial, and the remainder of the division is the CRC value appended to the frame. The receiver performs the same division and checks whether the remainder matches (or is zero under a certain formulation) (www.ece.unb.ca). Properly chosen generator polynomials can detect common error patterns with very high reliability. For instance, a standard CRC-32 (used in Ethernet) can detect all single-bit and two-bit errors, and all burst errors up to 32 bits long (the degree of the polynomial) (Cyclic Redundancy Check Example | Gate Vidyalay). In general, a CRC will always detect any burst error no longer than the CRC’s length (e.g., CRC-32 catches any burst of 32 or fewer bits in error) (Cyclic Redundancy Check Example | Gate Vidyalay). It also detects longer burst errors with high probability (the chance a random error goes undetected is about 1 in $2^{n}$ for an n-bit CRC) (Checking the error detection capabilities of CRC polynomials). For example, a typical CRC can guarantee detection of all single-bit errors, all double-bit errors (if the generator has at least three 1s), and all odd numbers of bit errors (if the generator polynomial has $(x+1)$ as a factor) (Cyclic Redundancy Check Example | Gate Vidyalay). Ethernet’s CRC-32 has a Hamming distance of 4, meaning it can detect up to 3 arbitrary bit errors per frame with certainty, and most 4-bit errors as well (Error Detection with the CRC). The CRC is placed in the frame trailer (e.g., Ethernet’s 32-bit Frame Check Sequence) so it covers the header as well as the payload. On reception, if the CRC calculation doesn’t match, the frame is assumed corrupted and is discarded (or a negative acknowledgment can be sent in protocols that support ACK/NACK at the data link layer).
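The mod-2 division can be illustrated with a short sketch. The payload and the small generator polynomial below (x^3 + x + 1) are textbook-style illustrations; real link layers use CRC-32 implemented in hardware.

```python
def crc_remainder(data_bits: str, generator: str) -> str:
    """CRC remainder of data_bits ('0'/'1' string) using mod-2 (XOR) long division."""
    k = len(generator)
    padded = list(data_bits + "0" * (k - 1))   # append space for the CRC bits
    for i in range(len(data_bits)):
        if padded[i] == "1":                   # only divide where the leading bit is 1
            for j in range(k):
                padded[i + j] = str(int(padded[i + j]) ^ int(generator[j]))
    return "".join(padded[-(k - 1):])          # remainder = last k-1 bits

def crc_check(frame_bits: str, generator: str) -> bool:
    """Receiver side: the whole frame (data + CRC) must divide evenly."""
    k = len(generator)
    padded = list(frame_bits)
    for i in range(len(frame_bits) - (k - 1)):
        if padded[i] == "1":
            for j in range(k):
                padded[i + j] = str(int(padded[i + j]) ^ int(generator[j]))
    return all(b == "0" for b in padded[-(k - 1):])

data = "11010011101100"          # illustrative payload
gen = "1011"                     # generator polynomial x^3 + x + 1
crc = crc_remainder(data, gen)
assert crc_check(data + crc, gen)                     # clean frame passes
assert not crc_check(data[:-1] + "1" + crc, gen)      # flipping the last data bit is detected
```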
CRC is a core mechanism for error detection because of its combination of efficiency (implemented with simple shift-XOR circuits in hardware) and robust coverage of likely errors (Cyclic Redundancy Check Example | Gate Vidyalay) (Cyclic Redundancy Check Example | Gate Vidyalay). Higher-layer protocols or transport layers might do additional error checking, but catching errors early at the link layer (and discarding bad frames) prevents wasted effort up the stack.
Reliable Transmission vs Detection: Most link layers (like Ethernet) perform error detection and simply drop bad frames – they rely on higher layers (like TCP) to retransmit if needed. However, some data link protocols (especially older ones or those for unreliable media like wireless) also implement error correction or retransmission at layer 2. For example, Wi-Fi’s link layer uses ARQ (Automatic Repeat reQuest) – it sends ACKs for received frames and will retransmit frames if no ACK is received. Similarly, older point-to-point protocols (like HDLC in reliable mode, or the modem protocol V.42) had their own framing, CRC, and retransmission scheme to assure error-free delivery over a noisy line. These decisions depend on the context: a high-speed LAN like Ethernet chooses to keep link-layer simple (detection only), whereas a link across a very error-prone channel might include link-layer retransmissions to offload that burden from higher layers.
Importance of framing for reliable link communication: Without framing, a single bit error could throw off the alignment of the bit stream, potentially causing the receiver to lose track of packet boundaries entirely. Framing limits the scope of errors: even if a frame is corrupted, the receiver can resynchronize at the next frame boundary (using known delimiters or length). It also allows frame-level recovery – for instance, a single bad frame can be retransmitted rather than the entire stream. In summary, framing delineates messages, and error detection codes (CRC/checksum) ensure the integrity of each frame, together providing the foundation for reliability on each link.
2.2 Medium Access Control (MAC)
When multiple devices share a common transmission medium (as in LANs like Ethernet, Wi-Fi, etc.), the Data Link layer must also address who gets to use the medium when – this is the role of Medium Access Control (MAC). A MAC protocol coordinates access to avoid (or handle) collisions (simultaneous transmissions) on the shared channel.
Ethernet and CSMA/CD: Classic Ethernet (10BASE-5, 10BASE-2, etc., using coaxial cable or a shared hub) is the canonical example of MAC with its CSMA/CD (Carrier Sense Multiple Access with Collision Detection) protocol. In CSMA/CD, each station senses the carrier (the electrical signal on the wire) to see if the medium is free before transmitting (Difference Between OSI Model and TCP/IP Model – GeeksforGeeks). This is the CS (Carrier Sense) part. If the line is idle, a station may begin transmitting its frame. If two stations happen to transmit at nearly the same time (a collision), the signals will interfere. Ethernet NICs can detect the collision by monitoring the signal (the CD part) – the voltage or signal shape will deviate from what it’s sending. Upon detecting a collision, the station immediately stops transmitting and sends a jamming signal (to ensure all colliding parties detect it), then enters a binary exponential backoff algorithm (Difference between Circuit Switching and Packet Switching – GeeksforGeeks) (Difference between Circuit Switching and Packet Switching – GeeksforGeeks). Backoff means each station waits a random time before re-attempting to transmit, doubling the range of the random wait interval after each successive collision for the same frame. This greatly reduces the probability of repeated collisions (Difference between Circuit Switching and Packet Switching – GeeksforGeeks).
CSMA/CD relies on the medium having the property that collisions can be detected and that propagation delay is bounded (Ethernet defines a maximum network diameter so that a sender knows a transmission was successful if no collision is detected within the first 512 bit times – the slot time). If a collision occurs, it happens early (within the first slot time of the frame) (Andrew S. Tanenbaum – Computer Networks). Therefore, Ethernet frames have a minimum size (64 bytes for 10/100/1000 Mbps Ethernet) to ensure even the smallest frame transmission lasts long enough to detect a collision across the network diameter.
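A minimal sketch of the truncated binary exponential backoff rule follows; the constants mirror classic shared Ethernet (exponent capped at 10, give up after 16 consecutive collisions) and should be treated as illustrative.

```python
import random

MAX_ATTEMPTS = 16   # classic Ethernet drops the frame after 16 consecutive collisions

def backoff_slots(collision_count: int) -> int:
    """After the n-th consecutive collision, wait a random number of slot
    times drawn uniformly from [0, 2**min(n, 10) - 1]."""
    if collision_count >= MAX_ATTEMPTS:
        raise RuntimeError("excessive collisions: frame dropped")
    k = min(collision_count, 10)          # the exponent is capped at 10
    return random.randint(0, 2 ** k - 1)

# The waiting range doubles after each successive collision on the same frame.
for n in range(1, 5):
    print(f"collision {n}: wait up to {2 ** min(n, 10) - 1} slots, e.g. {backoff_slots(n)}")
```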
For modern Ethernets, which are usually switched (each link is full-duplex dedicated to one device and a switch port), CSMA/CD is no longer in common use – because full-duplex links don’t have collisions. But the concept remains fundamental and is used in half-duplex links or older networks.
Other MAC examples:
- Wireless (CSMA/CA): Wi-Fi cannot detect collisions reliably (because a node can’t transmit and listen effectively at the same time on a radio channel). Instead, 802.11 uses CSMA/CA (Collision Avoidance): a station waits until the channel is free, then waits an additional random backoff before transmitting to reduce collision probability. It also uses acknowledgments to infer collisions/loss (no ACK means likely collision, so retransmit). Optionally, Wi-Fi can use RTS/CTS handshakes to reserve the medium for a transmission to mitigate the hidden terminal problem.
- Token Passing: Some networks use a token – a special frame circulating that grants permission to transmit. For instance, Token Ring (IEEE 802.5) and Token Bus (802.4) used a token passing MAC. A station can only send when it has the token; after sending it passes the token to the next. This avoids collisions entirely and can provide fair access, but the mechanism is more complex and failures (lost token) need to be handled.
- TDMA/FDMA: In some systems (like cellular networks), MAC is achieved by dividing the medium into time slots or frequency channels allocated to different users (not typically part of “data link” in a LAN sense, but conceptually similar resource sharing).
In the context of typical LANs, Ethernet MAC is most relevant. The original Ethernet’s CSMA/CD made Ethernet a distributed protocol with no central arbiter – any node could attempt to send at will and the algorithm would resolve conflicts statistically. This contributed to Ethernet’s simplicity and success.
MAC addressing: At the data link layer, particularly for LANs, each network interface has a MAC address (Media Access Control address), which is a hardware address (e.g., 48-bit Ethernet address) used to identify the sender and receiver of frames on the local link. The MAC sublayer of the Data Link layer ensures frames are delivered to the correct device. Ethernet uses these addresses in each frame header. A switch or bridge uses MAC addresses to forward frames appropriately (more in next section). In broadcast networks like Ethernet or Wi-Fi, MAC addressing allows filtering: each NIC generally ignores frames not addressed to it (except broadcast or multicast). The MAC protocol and addressing together enable multiple nodes to share a medium while allowing intended recipients to receive the data.
In summary, the MAC sublayer is crucial when a link is shared: it implements rules (like CSMA/CD) so that multiple devices can coordinate their transmissions, and it provides an addressing scheme (MAC addresses) to distinguish senders/receivers on that link (Difference Between OSI Model and TCP/IP Model – GeeksforGeeks) (Differences between TCP and UDP – GeeksforGeeks). In Ethernet’s case, the MAC protocol (CSMA/CD) was fundamental in the era of coaxial and hub-based networks; today, switched Ethernets mean collisions are rare, but the MAC framing and addressing remain, and the concept of multiple access still applies in Wi-Fi, cable networks, etc., albeit with different algorithms.
2.3 Ethernet Bridging
As networks grow, one may connect multiple LAN segments to extend the network. An Ethernet bridge (today commonly called a switch) operates at the Data Link layer to forward frames from one LAN segment to another, effectively joining them into a larger LAN. Bridging is often contrasted with routing (Network layer forwarding). Let’s clarify what bridging does and how it differs from routing:
Conceptual overview: A bridge/switch has multiple network interfaces (ports) each connected to a LAN segment. It learns which MAC addresses reside on which port by observing source addresses of frames (3.1: Switching and Bridging – Engineering LibreTexts) (3.1: Switching and Bridging – Engineering LibreTexts). It then uses this forwarding table to decide how to forward frames:
- If a frame arrives with destination MAC address that the bridge knows is on the same port it came from, the bridge filters (drops) the frame – no need to forward, as it’s destined for the same segment.
- If the destination is known on a different port, the bridge forwards the frame to that specific port.
- If the destination is unknown (not in the table) or is a broadcast/multicast, the bridge floods the frame, i.e., sends it out on all other ports (3.1: Switching and Bridging – Engineering LibreTexts).
Bridges thereby separate traffic so that frames only reach the segments where they need to go. This reduces unnecessary load compared to a single long coax or hub (which would propagate all traffic everywhere). Crucially, an Ethernet bridge operates on MAC addresses and does not modify the frame’s contents (aside from adding forwarding delay; some implementations recompute the frame check sequence). In OSI terms, a bridge is a pure Layer-2 device: it forwards based on Data Link layer addresses.
The learning process is what makes bridges plug-and-play. Initially, a bridge’s forwarding table is empty (3.1: Switching and Bridging – Engineering LibreTexts). As frames come in, the bridge notes the source MAC and the port it came on, adding an entry (“MAC X is reachable via port 1”) (3.1: Switching and Bridging – Engineering LibreTexts) (3.1: Switching and Bridging – Engineering LibreTexts). These entries have a timeout (aging out after some minutes) (3.1: Switching and Bridging – Engineering LibreTexts) to accommodate devices moving around. Over time, the bridge learns the topology. This is known as a learning bridge (3.1: Switching and Bridging – Engineering LibreTexts) (3.1: Switching and Bridging – Engineering LibreTexts).
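The learn-filter-forward-flood behavior can be sketched in a few lines. There is no aging timer or spanning tree here, and the port numbers and MAC strings are made up for illustration.

```python
# Minimal sketch of a learning bridge's forwarding logic.

class LearningBridge:
    def __init__(self, num_ports: int):
        self.table = {}                  # MAC address -> port it was learned on
        self.ports = range(num_ports)

    def handle_frame(self, src_mac: str, dst_mac: str, in_port: int) -> list[int]:
        self.table[src_mac] = in_port    # learn: the source lives on in_port
        out = self.table.get(dst_mac)
        if dst_mac == "ff:ff:ff:ff:ff:ff" or out is None:
            return [p for p in self.ports if p != in_port]   # flood
        if out == in_port:
            return []                    # filter: destination is on the same segment
        return [out]                     # forward to the known port

bridge = LearningBridge(num_ports=4)
print(bridge.handle_frame("aa", "bb", in_port=0))   # unknown dst -> flood [1, 2, 3]
print(bridge.handle_frame("bb", "aa", in_port=2))   # "aa" was learned on port 0 -> [0]
print(bridge.handle_frame("cc", "bb", in_port=2))   # "bb" is on port 2 itself -> filter []
```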
Bridging differs from routing in several key ways (3.1: Switching and Bridging – Engineering LibreTexts) (3.1: Switching and Bridging – Engineering LibreTexts):
- Bridges forward frames based on MAC addresses (Layer 2), whereas routers forward packets based on IP addresses (Layer 3). MAC addresses are flat and of limited scope (unique on a LAN), while IP addresses are hierarchical and routable globally.
- Bridges do not decrement a TTL or perform fragmentation; they just relay frames. A frame’s contents (including the Network layer packet inside) are not inspected except for the MAC header.
- Bridges make a LAN larger (extended LAN) but the whole bridged domain is a single broadcast domain. That means broadcast frames (or unknown unicast frames) will be flooded across the entire extended LAN. In contrast, routers by default do not forward broadcasts, thereby segmenting broadcast domains. This makes bridged networks potentially suffer from broadcast traffic if too large.
- Routing can connect different network-layer protocols or different addressing schemes; bridging requires the segments to share the same protocol at layer 2 (e.g., you can’t directly bridge Ethernet to Token Ring unless you have some adaptation, because their frame formats differ). In practice, bridging is usually used to join identical link layers (like multiple Ethernets).
- Bridges tend to be simpler, making forwarding decisions quickly using a hash table lookup for the MAC. Routing is more complex (longest prefix matching in IP, etc.) and can involve more overhead per packet, but it operates at a higher level with more network topology information.
Ethernet bridging and data link forwarding: Bridges make forwarding decisions using the data link header (the MAC destination). This is why bridging is often called “Layer-2 switching.” It is transparent to higher layers – the IP host doesn’t know whether the other host is on the same segment or across a bridge; if the destination is in the same IP subnet, the frame will reach them via bridges if necessary, without involving IP routing. In a sense, bridges propagate frames as if all the bridged segments were one big Ethernet. This creates transparent bridging: hosts don’t need to be aware of the topology. The Spanning Tree Protocol (STP) is used by switches to avoid loops in the topology that would cause endlessly circulating frames. STP computes a tree that spans all switches, blocking some ports if there is a cycle, ensuring only one active path between any two LAN segments. This is necessary because Ethernet has no hop count in frames and bridges flood broadcasts – a loop would cause broadcast storms. Modern switches use Rapid STP or other enhancements, but the idea is to prevent loops while providing redundancy.
So, bridging relates to data link forwarding in that it extends the reach of link-layer connectivity. Instead of limiting an Ethernet to directly connected nodes on one cable, bridges/switches allow large networks of Ethernet segments. Yet to higher layers (like IP), this looks like one IP subnet. When an IP packet goes from one end of a bridged LAN to another, the IP layer is unaware of how many switches it passed – it just sees one link (in terms of IP hop count, it’s still within a single hop). Thus bridging does not isolate network layer domains, whereas routing does.
In summary, an Ethernet bridge/switch learns MAC addresses and forwards frames selectively, confining traffic to where it needs to go (3.1: Switching and Bridging – Engineering LibreTexts) (3.1: Switching and Bridging – Engineering LibreTexts). This reduces collisions and effectively increases total network throughput by creating multiple “collision domains” (each switch port is its own collision domain in a modern switched Ethernet) (3.1: Switching and Bridging – Engineering LibreTexts). Compared to routing, bridging is a lower-level, plug-and-play form of networking – no need to configure IP addresses for it to work. However, large bridged networks can become inefficient due to broadcast propagation and have less control over traffic flows compared to routed networks. Today’s enterprise networks typically use switches for connecting hosts within a subnet, and routers (or Layer-3 switches) to connect different subnets, combining the strengths of both approaches.
3. Routing Protocols & Network Layer Topics
Moving up to the Network Layer (Layer 3), the focus shifts to managing multi-hop communications across potentially many link-layer networks. Key topics include how routers determine paths (routing protocols), how the network layer handles large packets (fragmentation), the addressing scheme used (e.g., IPv4 addressing), auxiliary network-layer protocols like ARP, DHCP, ICMP, and techniques like NAT that extend or modify IP addressing.
3.1 Routing Protocols
In a network with multiple possible paths (like the Internet or any sizable IP network), routing protocols are used by routers to learn and choose the best routes for packets. Two fundamental classes of routing algorithms are distance-vector and link-state, and there are also simpler concepts like flooding and basic shortest path computation to consider:
- Shortest Path Routing: At the heart of routing is the concept of finding a least-cost path (where “cost” might be hops, latency, etc.) between two nodes. Many routing algorithms attempt to find shortest paths. For example, Dijkstra’s algorithm computes the shortest paths from one source to all others given a complete map of the network (used in link-state routing) (Andrew S. Tanenbaum – Computer Networks), and the Bellman-Ford algorithm computes shortest paths in a distributed manner (used in distance-vector routing) (Andrew S. Tanenbaum – Computer Networks). The term shortest path doesn’t necessarily mean geographically shortest, but minimal according to some metric (hop count, delay, etc.).
- Flooding: Flooding is a primitive method where every incoming packet is sent out on every outgoing link except the one it arrived on (Andrew S. Tanenbaum – Computer Networks). This guarantees the packet will reach all nodes (and thus the destination) by sheer brute force, but it generates an exponential number of packets and can overwhelm the network. Pure flooding is rarely used except in controlled circumstances because of its inefficiency. However, controlled flooding is used for specific tasks – e.g., network discovery or as part of other algorithms. For instance, link-state routing uses a controlled flooding process to disseminate link-state packets to all routers (with sequence numbers to limit duplicates) (Andrew S. Tanenbaum – Computer Networks). Flooding ensures information (like topology updates) reaches every router in the network reliably, albeit with overhead that must be managed. Some protocols (like certain wireless ad-hoc network algorithms) may use flooding for route discovery (e.g., to find a path on-demand, a query is flooded and the reply follows the reverse of the discovered path).
- Distance Vector Routing: In distance-vector (DV) routing, each router maintains a table (vector) of the best known distance to every destination and which neighbor to go through (Andrew S. Tanenbaum – Computer Networks) (Andrew S. Tanenbaum – Computer Networks). Routers periodically exchange their distance vectors with immediate neighbors. Upon receiving neighbors’ information, a router updates its own table using the Bellman-Ford algorithm: “Distance to D via neighbor N” = N’s distance to D + cost to reach N. The router picks the neighbor offering the smallest total distance. This approach is distributed: no router has a complete map initially, but through iterative exchanges, they converge on shortest paths (Andrew S. Tanenbaum – Computer Networks). The classic example is RIP (Routing Information Protocol) which uses distance vector with hop count as the metric (max 15 hops, since 16 is infinity) (Andrew S. Tanenbaum – Computer Networks). Distance-vector is simple and low-overhead, but it has a known issue: the count-to-infinity problem. When a route fails, bad news can propagate slowly and routers can continuously increment distances in a loop (each thinking the other has a route). Techniques like split horizon (don’t report a route back on the interface it was learned), poison reverse (report a failed route with infinite metric immediately), and holddowns are used to alleviate this, but convergence can still be slow in some cases (Andrew S. Tanenbaum – Computer Networks). DV algorithms were used in the early ARPANET and are still used in some interior networks (RIP, IGRP/EIGRP in Cisco, etc.), but for large networks, link-state tends to converge faster. DV is sometimes called the Bellman-Ford algorithm in routing context (Andrew S. Tanenbaum – Computer Networks).
- Link State Routing: In link-state (LS) routing, each router learns the entire network topology by exchanging link-state advertisements (LSAs) that describe its own links (neighbors and the cost to each) (Andrew S. Tanenbaum – Computer Networks) (Andrew S. Tanenbaum – Computer Networks). The algorithm operates in five steps (Andrew S. Tanenbaum – Computer Networks): (1) Each router discovers its neighbors (via hello packets) (Andrew S. Tanenbaum – Computer Networks), (2) measures the cost (distance) to each neighbor (could be a fixed cost or dynamic like based on delay) (Andrew S. Tanenbaum – Computer Networks), (3) constructs a packet with this link state information, (4) floods these packets to all routers in the network (Andrew S. Tanenbaum – Computer Networks), and (5) each router then independently runs Dijkstra’s shortest path algorithm on the collected topology to find the best route to each destination (Andrew S. Tanenbaum – Computer Networks). The outcome is a routing table of next-hops for every destination. Because every router has a complete view (within one area, for instance), link-state tends to converge faster and is not prone to count-to-infinity problems (it can still have transient loops during convergence, but generally short-lived). Prominent examples of link-state protocols are OSPF (Open Shortest Path First) and IS-IS, both widely used in larger enterprise and ISP networks (Andrew S. Tanenbaum – Computer Networks) (Andrew S. Tanenbaum – Computer Networks). OSPF, for instance, floods LSAs reliably and uses Dijkstra’s algorithm to compute routes. OSPF also supports multiple metrics, route aggregation, and dividing the network into hierarchical areas for scalability. IS-IS is similar in operation (it was originally for DECnet/OSI but was adapted for IP, and ISPs often use it) (Andrew S. Tanenbaum – Computer Networks) (Andrew S. Tanenbaum – Computer Networks). Link-state requires more memory and CPU (each router stores the entire link map and runs computations) and more bandwidth for flooding updates (especially on topology changes, LSAs must be sent network-wide) (Andrew S. Tanenbaum – Computer Networks), but it offers fast convergence and more predictable behavior.
- Hybrid Approaches: Some routing protocols combine ideas. For example, EIGRP (Cisco’s Enhanced IGRP) is often described as a hybrid; it’s distance-vector at core but avoids count-to-infinity with diffusing computations and queries (it doesn’t flood link-states globally, but it has more smart loop avoidance than RIP). At the internet scale, we use path-vector (BGP) which is like distance-vector but with the full AS-level path included to avoid loops.
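To make the distance-vector update concrete, here is a minimal sketch of one Bellman-Ford relaxation step; the router names, topology, and link costs are invented for illustration.

```python
INF = float("inf")

def dv_update(router, neighbor_vectors, link_costs):
    """Recompute a router's distance vector from its neighbors' advertisements.
    neighbor_vectors: neighbor -> {destination: advertised distance}
    link_costs:       neighbor -> cost of the direct link to that neighbor"""
    destinations = {d for vec in neighbor_vectors.values() for d in vec} | {router}
    vector, next_hop = {router: 0}, {router: None}
    for dest in destinations - {router}:
        best_cost, best_via = INF, None
        for nbr, vec in neighbor_vectors.items():
            cost = link_costs[nbr] + vec.get(dest, INF)   # Bellman-Ford relaxation
            if cost < best_cost:
                best_cost, best_via = cost, nbr
        vector[dest], next_hop[dest] = best_cost, best_via
    return vector, next_hop

# Router A has neighbors B (cost 1) and C (cost 4); B and C advertise their vectors.
vec, hops = dv_update(
    "A",
    neighbor_vectors={"B": {"B": 0, "C": 2, "D": 3}, "C": {"B": 2, "C": 0, "D": 1}},
    link_costs={"B": 1, "C": 4},
)
print(vec)   # e.g. {'A': 0, 'B': 1, 'C': 3, 'D': 4} (key order may vary)
print(hops)  # D is best reached via B (1 + 3 = 4) rather than via C (4 + 1 = 5)
```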
In practice, within a single organization or autonomous system (AS), an Interior Gateway Protocol (IGP) like OSPF (link-state) or RIP/EIGRP (distance-vector) is run. Between organizations (between ASes), BGP (Border Gateway Protocol) is used, which is a form of path-vector routing (beyond the scope of this summary but essentially a distance-vector that advertises paths and policies, not just raw distances).
Flooding is not used to carry data in modern networks, but interestingly, a form of controlled flooding is used in certain situations like multicast (flood-and-prune algorithms) or discovery protocols. Distance-vector is appreciated for its simplicity in small networks, while link-state scales to larger, dynamic networks due to faster convergence and richer metrics (Andrew S. Tanenbaum – Computer Networks) (Andrew S. Tanenbaum – Computer Networks).
For a concrete example:
- RIP: max 15 hops, updates every 30 seconds (which is slow), uses UDP broadcasts to neighbors. If a route fails, neighbors will find out on next update or via triggered update with infinity metric, but it can take minutes to flush bad routes (hence RIP convergence is slow) (Andrew S. Tanenbaum – Computer Networks).
- OSPF: uses multicast to send LSAs, detects changes quickly (hello packets often every 10 seconds to detect neighbor loss within 40 seconds or faster), and recalculates routes typically within a second or two of a topology change in a well-designed network. It also computes not just shortest path but can incorporate multiple metrics (cost assigned by admin often corresponding to bandwidth). OSPF and IS-IS also support dividing the network into areas to limit the scope of flooding and computation, improving scalability.
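On the link-state side, each router runs Dijkstra's algorithm locally once it has the flooded topology. Below is a minimal sketch on a made-up four-router topology (the graph and costs are illustrative only).

```python
import heapq

GRAPH = {                      # node -> {neighbor: link cost}, an invented topology
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}

def dijkstra(graph, source):
    dist = {source: 0}
    first_hop = {}                         # destination -> first router on the best path
    pq = [(0, source, None)]
    while pq:
        d, node, via = heapq.heappop(pq)
        if d > dist.get(node, float("inf")):
            continue                       # stale queue entry
        if via is not None:
            first_hop.setdefault(node, via)
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # the first hop toward nbr is inherited from node (or nbr itself at the source)
                heapq.heappush(pq, (nd, nbr, via if via else nbr))
    return dist, first_hop

dist, next_hops = dijkstra(GRAPH, "A")
print(dist)       # e.g. {'A': 0, 'B': 1, 'C': 3, 'D': 4}
print(next_hops)  # on this topology every destination is first reached via B
```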
In summary, routing protocols enable routers to collectively compute consistent forwarding tables so that each packet takes an efficient path toward its destination. Distance-vector routing shares minimal info (only distances), making it simple but with slower convergence (Andrew S. Tanenbaum – Computer Networks). Link-state routing shares complete topology info, achieving faster and more robust convergence at the cost of overhead (Andrew S. Tanenbaum – Computer Networks). Both are ways to achieve the goal of distributed shortest-path computation. Flooding on its own is not a routing solution but a mechanism often used within these protocols (link-state uses flooding of LSAs; distance-vector might flood a specific query in case of certain route computations as EIGRP does). Modern networks carefully choose and tune routing protocols to ensure quick adaptation to failures and optimal paths for data.
3.2 Fragmentation and IP Addressing
IPv4 Addressing: The Internet Protocol version 4 (IPv4) uses a 32-bit address space, providing about 4.3 billion possible addresses. These addresses are typically written in dotted-decimal format (e.g., 192.168.10.5). Originally, IPv4 addresses were divided into fixed classes (Class A, B, C, etc.) in a scheme called classful addressing:
- Class A: first bit 0 -> network /8 (255.0.0.0 mask), ~16 million hosts,
- Class B: first two bits 10 -> network /16, ~65k hosts,
- Class C: first three bits 110 -> network /24, 254 hosts, and Class D (1110) for multicast, Class E (1111) reserved.
Classful addressing proved inflexible and wasteful – many organizations with Class A or B addresses were assigned far more addresses than needed, while others ran out. To improve allocation efficiency, CIDR (Classless Inter-Domain Routing) was introduced in 1993, allowing arbitrary length network prefixes (IP fragmentation – Wikipedia). With CIDR, an IPv4 network is identified by an address and a mask length (e.g., 172.19.0.0/20 indicates a 20-bit network prefix). This allows for hierarchical aggregation of routes. For example, instead of advertising 16 contiguous Class C networks, an ISP can advertise a single /20 route that covers all of them. CIDR notation (slash format) is now standard. Internally, addresses are paired with a subnet mask (or prefix length) to delineate the network portion vs host portion of the address.
Devices determine if an address is local (same subnet) or remote by comparing the network portion. If the destination is local, ARP (Address Resolution Protocol) is used to find its MAC and the packet is delivered directly; if remote, the packet is sent to a router (default gateway). Subnetting (dividing a larger network into smaller subnets) and supernetting (combining networks for route aggregation) are both facilitated by CIDR.
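This local-versus-remote decision is just a prefix comparison, as the following sketch shows using Python's standard ipaddress module (the addresses are illustrative RFC 1918 examples):

```python
import ipaddress

iface = ipaddress.ip_interface("172.19.4.17/20")   # my address plus prefix length
for dst in ("172.19.15.200", "172.19.16.1", "8.8.8.8"):
    if ipaddress.ip_address(dst) in iface.network:
        print(dst, "is on-link: ARP for it and deliver directly")
    else:
        print(dst, "is remote: send the packet to the default gateway")
```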
IP Fragmentation: Different network technologies have different MTUs (Maximum Transmission Units) – the largest packet size they can carry. For example, standard Ethernet has an MTU of 1500 bytes for the IP packet (not counting link headers); some networks allow larger packets (jumbo frames), others smaller (PPPoE/DSL links are often 1492 bytes, and 576 bytes was the classic conservative default for packets sent to remote destinations). When an IP packet is forwarded and the next link has a smaller MTU than the packet’s size, IPv4 fragmentation may occur (IP fragmentation – Wikipedia). The router (or sending host) splits the packet into multiple fragments, each fitting within the MTU of the next network (IP fragmentation – Wikipedia). Each fragment becomes its own IP packet, with IP headers largely copied from the original but adjusted:
- The Identification field in the IPv4 header is copied to all fragments to link them together for reassembly (IP fragmentation – Wikipedia). It’s an ID number that was set by the original sender (the IP layer on the source) for that packet.
- The Fragment Offset field indicates the position of this fragment’s data relative to the original packet, in units of 8 bytes (IP Fragmentation and offsets When a packet arrives at a host whose…). (Offsets are multiples of 8 bytes because fragment sizes, except possibly the last, are chosen to be multiples of 8 bytes to keep this alignment.)
- The MF (More Fragments) flag is set on all fragments except the last one (IP fragmentation – Wikipedia). The last fragment has MF=0, indicating it’s the last piece (the “more fragments” flag is not set, meaning no more are coming).
- The total length field in each fragment’s header is set to the fragment’s size (header + data). Typically, all fragments except the final one are the maximum size (MTU of link, minus IP header size).
For example, suppose we have a 3000-byte IP packet (20-byte header + 2980 data) that must traverse an Ethernet with MTU 1500. The IP layer might fragment it into 3 pieces:
- Fragment 0: bytes 0-1479 of data (1480 bytes of data, which is multiple of 8) + 20-byte header = 1500 bytes, offset=0, MF=1.
- Fragment 1: bytes 1480-2959 of data (1480 bytes), header+data =1500 bytes, offset=185 (which is 1480/8), MF=1.
- Fragment 2: bytes 2960-2979 of data (the remaining 20 bytes), header + 20 bytes of data = 40 bytes (IP adds no padding, though the link layer may pad such a small frame up to its own minimum frame size), offset=370 (2960/8), MF=0.
These fragments each travel as independent IP packets.
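The arithmetic above can be sketched as follows. Only the reassembly-relevant fields are modeled, header options are ignored, and the identification value is arbitrary.

```python
IP_HEADER = 20

def fragment(total_length: int, mtu: int, ident: int):
    """Return fragment descriptors: Identification, offset (8-byte units), length, MF flag."""
    data_len = total_length - IP_HEADER
    max_data = (mtu - IP_HEADER) // 8 * 8      # per-fragment data, kept a multiple of 8
    frags, offset = [], 0
    while offset < data_len:
        chunk = min(max_data, data_len - offset)
        more = (offset + chunk) < data_len
        frags.append({"id": ident, "offset": offset // 8,
                      "length": IP_HEADER + chunk, "MF": int(more)})
        offset += chunk
    return frags

for f in fragment(total_length=3000, mtu=1500, ident=0x1234):
    print(f)
# prints offsets 0, 185, 370 with lengths 1500, 1500, 40 and MF = 1, 1, 0
```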
Reassembly is done at the destination host (IPv4 does not require routers to reassemble; routers will forward fragments as needed) (IP fragmentation – Wikipedia) (IP fragmentation – Wikipedia). The receiving IP layer collects all fragments with the same Identification value and source/dest, reconstructs the original data based on offsets, then passes the reassembled packet up to the transport layer. If one or more fragments are lost, the receiver will time-out waiting (and ultimately drop the partial data). IPv4 has no mechanism to ask for a missing fragment; it relies on higher layers (like TCP would time out and retransmit the whole TCP segment, which might span multiple IP packets).
Fragmentation allows interoperability between networks of different MTUs but has downsides: it can lead to inefficient use of bandwidth (small last fragment still carries a full IP header), and if any fragment is lost, the whole packet is lost. It also puts burden on end-hosts to reassemble, requiring memory and complexity. Because of these issues, modern practice tries to avoid fragmentation. One tool is Path MTU Discovery (PMTUD): the sender sets the DF (Don’t Fragment) flag in IP headers and sends packets at what it thinks is the max size. If a router cannot forward because the packet is too large and DF is set, it drops the packet and sends back an ICMP “Fragmentation Needed” message (Type 3 Code 4, an example of an ICMP support protocol) indicating the MTU of the next hop network (IP fragmentation – Wikipedia). The sender can then reduce its segment size and retransmit. This allows discovery of the smallest MTU along the path, so the sender can avoid fragmentation by sending packets no larger than that. PMTUD is commonly used by TCP implementations (which adjust the TCP MSS – Maximum Segment Size – accordingly). IPv6, notably, does not allow routers to fragment packets; IPv6 hosts are expected to use PMTUD to send proper size packets (IPv6 fragmentation, if needed, is only done by source nodes).
IPv4 vs IPv6 fragmentation: In IPv4, routers can fragment packets unless DF is set (IP fragmentation – Wikipedia). In IPv6, routers never fragment; the IPv6 header has no fragmentation fields in normal packets. Instead, if a too-large packet arrives with the “Don’t Fragment” implicit in IPv6 (since all IPv6 are don’t fragment by default), the router drops it and sends ICMPv6 Packet Too Big, similar to PMTUD. IPv6 does have a fragmentation mechanism but only source nodes can originate fragments (with a special Fragment header). This design pushes the responsibility to endpoints and avoids router CPU overhead.
Address Exhaustion and Private IPs: The 32-bit IPv4 space has been largely allocated. Techniques like NAT (discussed next) and reclamation have extended its life. Also, certain address blocks are reserved for private use (not routable on the internet) – e.g., 10.0.0.0/8, 172.16.0.0–172.31.255.255 (172.16/12), 192.168.0.0/16 – per RFC 1918. Organizations use these internally and then use NAT to map to public IPs. This ties into addressing because IP addressing now often involves subnetting for internal networks and NAT to get to the internet.
In summary, IPv4 addressing moved from classful to CIDR to allow flexible prefix lengths and efficient allocation (Difference Between OSI Model and TCP/IP Model – GeeksforGeeks). Fragmentation in IPv4 is a mechanism to cope with MTU differences by splitting packets into smaller pieces (IP fragmentation – Wikipedia). It operates at the network layer and is transparent to transport (except for the potential performance implications). With IPv6 and modern networks, fragmentation is less desired; path MTU discovery is the preferred way to handle MTU heterogeneity, pushing senders to send appropriately sized packets rather than burdening routers with fragmentation tasks.
3.3 IP Support Protocols
Several important protocols support the core Internet Protocol operation, working adjacent to IP in the network layer to handle tasks like address resolution, configuration, and error reporting. Here we highlight ARP, DHCP, and ICMP, which are crucial in IPv4 networking:
- ARP (Address Resolution Protocol): ARP resolves an IP address to a MAC (hardware) address on a local network (Address Resolution Protocol – Simple English Wikipedia, the free encyclopedia). When an IP device wants to send a packet to another IP on the same LAN, it needs the destination’s MAC address (for the Ethernet frame). ARP provides this translation. The process: the sender broadcasts an ARP request: “Who has IP X.X.X.X? Tell Y.Y.Y.Y” (Address Resolution Protocol – Simple English Wikipedia, the free encyclopedia). This request is a link-layer broadcast (MAC ff:ff:ff:ff:ff:ff for Ethernet) carrying the query for the target IP. The host on the LAN with that IP will respond with an ARP reply: “IP X.X.X.X is at MAC ZZ:ZZ:ZZ” (its MAC address) (Address Resolution Protocol – Simple English Wikipedia, the free encyclopedia). The requesting host then caches this mapping in its ARP cache for some time (e.g., a few minutes) to avoid repeated ARPs. ARP is a low-level protocol often considered part of the Link layer-interface to IP (it’s not exactly IP or higher, it sits between Layer 2 and 3). It has no IP headers; ARP messages have their own format and EtherType (0x0806 for ARP). ARP essentially glues IP to Ethernet (or any broadcast link technology). In IPv6, ARP is replaced by Neighbor Discovery Protocol (NDP), which uses ICMPv6 messages. ARP can be a security concern (ARP spoofing attacks, where a malicious host sends fake ARP replies to intercept traffic). But in normal operation, it’s fundamental – e.g., when you ping your local router, your machine ARPs for the router’s MAC, then sends the IP packet.
- DHCP (Dynamic Host Configuration Protocol): DHCP automates the configuration of IP addresses and other network settings for hosts (DHCP (Dynamic Host Configuration Protocol) – CIO Wiki) (DHCP (Dynamic Host Configuration Protocol) – CIO Wiki). When a client (like a laptop or phone) connects to a network, it can use DHCP to obtain an available IP address, the subnet mask, default gateway, DNS servers, etc. The process (assuming IPv4 and the typical “DHCP in 4 packets” known as DORA):
- DHCP Discover: the client, not knowing anything (no IP yet), sends a UDP broadcast (source IP 0.0.0.0, dest IP 255.255.255.255, dest UDP port 67) essentially shouting “I need an IP address.” (DHCP (Dynamic Host Configuration Protocol) – CIO Wiki)
- DHCP Offer: one or more DHCP servers on the network respond with an offer: “I can offer you IP X.Y.Z.W with these parameters, for lease time T” (DHCP (Dynamic Host Configuration Protocol) – CIO Wiki). This is sent from the server’s IP (say 192.168.1.1); since the client has no IP address yet, the offer is typically addressed to the client’s MAC at the Ethernet layer while still using a broadcast destination IP.
- DHCP Request: the client picks one offer (if multiple) and broadcasts a DHCP Request message indicating the server and the offered IP it is accepting (this also serves as a notice to other potential offering DHCP servers that their offer was declined).
- DHCP ACK: the chosen DHCP server sends an ACK, finalizing the lease. It may also NAK if something went wrong (e.g., IP became unavailable).
- ICMP (Internet Control Message Protocol): ICMP is often considered part of the IP layer – it’s used by network devices to send error messages or informational messages regarding IP operations (Internet Control Message Protocol – Wikipedia). ICMP for IPv4 is defined in RFC 792. It is encapsulated in IP packets (protocol number 1 for ICMP). Key uses of ICMP include:
- Destination Unreachable: If a router cannot forward a packet (e.g., no route to host, or administratively prohibited, or fragmentation needed and DF set), it drops the packet and sends an ICMP Destination Unreachable message back to the source indicating the reason (Internet Control Message Protocol – Wikipedia). For example, “Destination network unreachable” or “Port unreachable” (the latter generated by a host if, say, you send a UDP to a closed port – the host will respond with ICMP port unreachable).
- Time Exceeded: This is sent when a packet’s TTL (time-to-live) reaches 0 (meaning it’s been forwarded through so many routers that it expired). Each router decrements the TTL, and if it hits 0, discards the packet and sends ICMP Time Exceeded (TTL exceeded) to the source (Internet Control Message Protocol – Wikipedia). This is the mechanism that traceroute uses: traceroute sends packets with incrementally increasing TTLs to elicit these messages from each hop.
- Echo Request / Echo Reply: These are used by the ping utility. An ICMP Echo Request is sent to a target; if the target is reachable, it should reply with an Echo Reply, containing the same payload (What is ICMP? | Internet Control Message Protocol – Cloudflare). This measures round-trip time and verifies reachability. Ping is essentially an ICMP-level “are you alive?” check. Type 8 is Echo Request, type 0 is Echo Reply.
- Redirect: A router can send an ICMP Redirect to a host to inform it of a better gateway. For example, if host A sends everything to router R1, but R1 notices that for destination B, a closer router R2 would be better, R1 forwards the packet and sends host A an ICMP Redirect message telling it to use R2 for that destination in the future. Hosts receiving redirects will update their routing table for that destination. Redirects are a bit of a security concern and not always honored nowadays.
- Router Advertisement/Solicitation: Part of ICMP (ICMP Router Discovery) though in IPv4 it was optional (in IPv6, ICMPv6 ND performs this function). This allows routers to announce their presence (for hosts to learn default gateway without DHCP). IPv4 Router Advertisements (Type 9) never got as popular, since many used static config or DHCP for gateway. But in IPv6, it’s fundamental.
ICMP is crucial for the network’s self-management – it is how the network layer communicates issues. For example, if your TCP connection is trying to send data and an intermediate network is down, you might get ICMP destination unreachable which signals the TCP stack that delivery is not currently possible. ICMP is considered a control protocol, not meant for user data. As such, it is often rate-limited or filtered to avoid abuse (like ping floods or using ICMP for covert channels). Still, basic tools like ping and traceroute are indispensable diagnostic uses of ICMP (What is ICMP? | Internet Control Message Protocol – Cloudflare) (Internet Control Message Protocol (ICMP) – GeeksforGeeks).
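As a concrete taste of how simple ICMP messages are, the sketch below builds the 8-byte header of an Echo Request (type 8) in pure Python, including the ones’-complement checksum defined in RFC 792. Actually transmitting it would additionally require a raw socket and elevated privileges, which are deliberately omitted here; the identifier and sequence values are arbitrary.
import struct

def icmp_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16-bit words, as used by ICMP (RFC 792)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data)//2}H", data))
    total = (total >> 16) + (total & 0xFFFF)   # fold the carry bits back in
    total += total >> 16
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes = b"ping") -> bytes:
    # type=8 (Echo Request), code=0, checksum placeholder of 0
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

packet = build_echo_request(ident=0x1234, seq=1)
print(packet.hex())
An Echo Reply coming back is the same structure with type 0, which is all that ping needs to measure a round trip.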
Workflow examples:
- When a host boots, it might use DHCP to get an IP (DHCP involves broadcast at link layer, UDP/IP at network layer).
- Once configured, if it wants to talk to a new host on LAN, it uses ARP to get the MAC (link-layer) (Address Resolution Protocol – Simple English Wikipedia, the free encyclopedia).
- If it wants to visit an internet site, it sends to its default gateway’s MAC (after ARPing for it). If the packet is too large for the next hop, the router might fragment it (unless DF) (IP fragmentation – Wikipedia). If DF is set and fragmentation needed, router sends ICMP “Frag needed” back (IP fragmentation – Wikipedia). The host’s TCP might reduce MSS.
- As it travels, if a router is down, the upstream router might send ICMP dest unreachable to source (Internet Control Message Protocol – Wikipedia). If it goes through 30 hops and loops, eventually TTL=0 and ICMP time exceeded comes back (Internet Control Message Protocol – Wikipedia) (traceroute uses that by sending packets with increasing TTLs).
- Meanwhile, if another device comes online in the LAN needing IP, it does its own DHCP. Possibly the DHCP server’s offer avoids the IPs in use (it might ping an address to ensure it’s free before offering).
- ARP is also used occasionally by hosts gratuitously (a host may ARP for its own IP on boot to detect if an IP conflict exists – if it gets a reply, that means someone else has same IP).
These supporting protocols operate mostly behind the scenes, but they are essential: without ARP, IP wouldn’t know how to send on local networks; without DHCP, configuring devices would be a huge manual chore; without ICMP, troubleshooting network problems and the network’s ability to report errors would be severely hampered. They are considered part of the “suite” of TCP/IP protocols and typically implemented in the operating system’s network stack.
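For a feel of what actually travels in the first DORA packet, here is a minimal, hedged sketch that assembles a DHCPDISCOVER payload following the RFC 2131 layout with Python’s struct module. The MAC address and transaction ID are made up, and broadcasting the bytes on UDP port 67 (which needs a broadcast-enabled socket bound to port 68) is not shown.
import struct

def build_dhcp_discover(mac: bytes, xid: int) -> bytes:
    """Assemble a minimal BOOTP/DHCP DISCOVER payload (RFC 2131 layout)."""
    fixed = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1, 1, 6, 0,              # op=BOOTREQUEST, htype=Ethernet, hlen=6, hops=0
        xid, 0, 0x8000,          # transaction id, secs, flags (broadcast bit set)
        b"\x00" * 4,             # ciaddr: client has no IP yet (0.0.0.0)
        b"\x00" * 4,             # yiaddr ("your" address, filled in by the server)
        b"\x00" * 4,             # siaddr
        b"\x00" * 4,             # giaddr
        mac.ljust(16, b"\x00"),  # chaddr: client hardware address, padded to 16 bytes
        b"\x00" * 64,            # sname
        b"\x00" * 128,           # file
    )
    options = bytes([99, 130, 83, 99,   # DHCP magic cookie 0x63825363
                     53, 1, 1,          # option 53: message type = DISCOVER
                     255])              # end of options
    return fixed + options

payload = build_dhcp_discover(mac=bytes.fromhex("020000000001"), xid=0x3903F326)
print(len(payload), "bytes")
The Offer, Request, and ACK that follow reuse the same fixed layout; they mainly differ in the option 53 message type and in which address fields the server has filled in.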
3.4 Network Address Translation (NAT)
Motivation: By the mid-1990s, it was clear that IPv4’s address space would not suffice to give every device a unique public IP. One alleviating strategy (besides IPv6) was Network Address Translation (NAT) (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium) (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium). NAT allows multiple devices in a private network (using private IP addresses) to share one (or a few) public IPv4 addresses for external communication (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium) (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium). NAT breaks the end-to-end addressing model: packets are modified in-transit so that from the perspective of the external internet, all traffic from the private network appears to come from the NAT device’s public address.
How NAT works: A NAT device (typically an Internet gateway router for a home or office) has at least two interfaces – one on the private side (e.g., 192.168.0.1/24) and one on the public side (e.g., ISP assigned IP). Internally, hosts have addresses like 192.168.0.100, .101, etc. When a host 192.168.0.100 sends a packet to some internet server (say 8.8.8.8) (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium) (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium), the NAT router will rewrite the source address of the IP packet to its own public IP (e.g., 203.0.113.5) and send it out. It records this mapping (192.168.0.100:port -> 203.0.113.5:port) in a NAT translation table. The destination (8.8.8.8) sees a packet from 203.0.113.5 and responds to that. When the response comes back, the NAT device looks up the destination port (which was mapped) and translates the destination address back to the internal host, then forwards it to 192.168.0.100. Thus, multiple internal hosts can share one external address by using distinct port numbers for their traffic (this is NAPT – Network Address and Port Translation, the most common form of NAT) (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium).
There are a few types of NAT configurations:
- Static NAT: a one-to-one mapping of one private IP to one public IP (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium). For example, 10.0.0.5 is always mapped to 198.51.100.5 externally. This doesn’t save addresses (it’s basically just a proxy), but it can be used to expose an internal server at a specific public IP. Static NAT doesn’t involve port translation typically (sometimes called DNAT for destination NAT in some contexts).
- Dynamic NAT: a pool of public IPs is available and internal addresses are mapped to one of those public IPs when needed (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium). For instance, 100 internal users share 10 public IPs; at any time up to 10 can go out concurrently and get assigned an IP from the pool (others might be blocked or wait). This is less common now than the port-level NAT.
- PAT (Port Address Translation) aka NAPT (Network Address Port Translation) or colloquially just NAT overload: multiple private IPs share a single public IP by distinguishing flows via port numbers (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium). This is what virtually all home routers do. It is sometimes just called “NAT” because it’s ubiquitous. The router will use unique source port numbers for each connection (or a combination of source IP and port if multiple public IPs) to keep mappings separate. E.g., 192.168.0.100: TCP12345 -> 203.0.113.5: TCP40000 on the outside. 192.168.0.101: TCP12345 could simultaneously be mapped to 203.0.113.5: TCP40001, etc. The NAT table might look like:
- 203.0.113.5:40000 -> 192.168.0.100:12345
- 203.0.113.5:40001 -> 192.168.0.101:12345
This way, replies destined to 203.0.113.5:40000 go to .100, and replies to 40001 go to .101.
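A NAPT device’s state can be pictured as a simple lookup table keyed by the external port. The sketch below uses the illustrative addresses and ports from the table above (the public IP 203.0.113.5 and the two internal hosts are the same hypothetical values) and shows the two directions of the rewrite.
# Outbound: (private IP, private port) -> assigned external port
# Inbound:  external port -> (private IP, private port)
PUBLIC_IP = "203.0.113.5"

nat_out = {("192.168.0.100", 12345): 40000,
           ("192.168.0.101", 12345): 40001}
nat_in = {ext: priv for priv, ext in nat_out.items()}

def rewrite_outbound(src_ip, src_port):
    ext_port = nat_out[(src_ip, src_port)]
    return PUBLIC_IP, ext_port            # packet leaves with the shared public address

def rewrite_inbound(dst_port):
    return nat_in[dst_port]               # reply is forwarded to the mapped internal host

print(rewrite_outbound("192.168.0.100", 12345))  # ('203.0.113.5', 40000)
print(rewrite_inbound(40001))                    # ('192.168.0.101', 12345)
A real router also tracks protocol, timers, and TCP state per entry, but the core idea is exactly this pair of dictionaries: an unsolicited inbound packet simply has no entry to match, which is where the “NAT firewall effect” comes from.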
Impact on connectivity: NAT was a pragmatic solution to address exhaustion and has security side-effects. By default, when NAT is used, external hosts cannot directly initiate connections to internal hosts, because the internal addresses are not globally routable and the NAT router has no mapping until the internal side talks first. This is often seen as providing a basic firewall function – it “hides” the internal network (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium) (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium). Only replies to internal-initiated requests are admitted. To allow an external host to reach an internal server behind NAT, one must set up port forwarding (static NAT mapping of a specific port to an internal IP:port). For example, forward the NAT device’s port 80 to 192.168.0.200:80 to host a webserver. Without such configuration or an external rendezvous (as used by peer-to-peer NAT traversal techniques), internal nodes can’t accept unsolicited incoming connections. This breaks the end-to-end transparency of the internet. Protocols that carry IP/port information inside them (like FTP’s active mode, SIP for VoIP, certain multiplayer games, etc.) also break unless NAT is protocol-aware or extra measures are taken, because NAT will only rewrite the IP headers by default, not payload. Solutions include ALG (Application Layer Gateways) on NAT that inspect and fix payload (e.g., a FTP ALG that sees the PORT command with internal IP and rewrites it) or using protocols like STUN/TURN/ICE for NAT traversal in VoIP.
NAT also complicates protocols like IPSec which sign the IP headers (AH) – because NAT changes them, it invalidates the signature. Hence NAT traversal modes exist for VPNs (encapsulating IPsec in UDP).
Despite issues, NAT has been extremely widely deployed. It effectively created a two-tier internet: the global side with unique IPs and the private side. Most home and business networks are on private IPv4 with NAT at the edge. One positive side effect is a degree of isolation/security: an attacker from outside cannot directly target an internal host unless a port is forwarded or the internal host initiated something (this is often regarded as the NAT firewall effect, although one should still run a proper firewall – but NAT does drop unsolicited packets by default since no mapping).
NAT and the end-to-end argument: The original design of IP assumed any host could contact any other directly. NAT violates this; it requires intermediate translation. This raised concerns: for example, some applications must know their external IP (to advertise to peers). Techniques like UPnP IGD and NAT-PMP were created to let internal devices program the NAT to open ports or learn their external address. Nevertheless, NAT is entrenched.
From the perspective of layered architecture: NAT happens at the network layer (or at the boundary between network and transport, since it alters port numbers too). It acts as a kind of middleware between the internal network’s address space and the external one.
Summary: NAT is a method of remapping address spaces by modifying IP addresses (and port numbers) in transit (Network Address Translation – Teltonika Networks Wiki). It conserves global addresses by allowing networks to use private addressing internally (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium) (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium). Types include static (1:1), dynamic (pool) and PAT (many:1 using ports) (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium) (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium), with PAT being most common. It improves IPv4 longevity and can increase security by hiding hosts, but at the cost of breaking direct end-to-end connectivity and requiring workarounds for certain protocols (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium) (Network Address Translation. Network Address Translation (NAT) is a… | by Tushar Patel | Medium). NAT is a cornerstone of IPv4 networking today, bridging the gap until broader IPv6 adoption can restore end-to-end addressing.
4. Transport Layer
The Transport Layer is responsible for end-to-end communication between applications across the network. It builds on the network layer’s host-to-host delivery to provide additional functionality like reliability, ordering, multiplexing (ports), and flow/congestion control. The primary transport protocols in the internet are UDP (User Datagram Protocol) and TCP (Transmission Control Protocol), which offer different trade-offs. We’ll discuss how transport handles flow and congestion control (mainly in TCP), then differentiate UDP vs TCP and the notion of sockets which is the API for using transport services.
4.1 Flow Control and Congestion Control
Although both involve controlling the rate of data sending, flow control and congestion control address different problems:
- Flow Control is an end-to-end mechanism to prevent the sender from overwhelming the receiver. The classic approach used by TCP is a sliding window protocol with a receive window (rwnd) advertised by the receiver (How Flow Control is Achieved in TCP? – GeeksforGeeks). In every TCP segment acknowledgment (ACK), the receiver includes an advertised window size telling the sender how many more bytes it can send beyond the last acknowledged byte (How Flow Control is Achieved in TCP? – GeeksforGeeks). This ensures the sender does not send data that the receiver’s buffer cannot accept. For example, if a receiver has a 16 KB buffer and currently has 8 KB of unread data, it may advertise rwnd = 8 KB, meaning the sender can send at most 8 KB more. As the receiver application reads data and frees up space, it increases the window in subsequent ACKs (How Flow Control is Achieved in TCP? – GeeksforGeeks). If the receiver’s buffer fills (perhaps the application is slow to read), it might advertise rwnd = 0, telling the sender to pause sending (How Flow Control is Achieved in TCP? – GeeksforGeeks). TCP handles this gracefully (the sender will stop and periodically probe by sending a 1-byte packet to see if the window opens — this is the persist timer mechanism to avoid deadlock if a window update got lost) (How Flow Control is Achieved in TCP? – GeeksforGeeks). Flow control is essential for receiver-side resource management. It operates on a principle of don’t outrun the slowest consumer. It’s not about the network capacity, but the endpoint capacity.
- Congestion Control is about preventing too much data from being injected into the network such that it causes network congestion (i.e., router/switch buffers filling up, causing packet loss and high delays). This is a network-centric issue. TCP’s congestion control (as per algorithms by Van Jacobson et al.) treats packet loss (or substantial delay) as a sign of congestion and slows down, whereas successful receipt of ACKs suggests network capacity is available, so it speeds up gradually (Additive increase/multiplicative decrease – Wikipedia) (Additive increase/multiplicative decrease – Wikipedia). The core behavior of standard TCP congestion control is the AIMD principle – Additive Increase, Multiplicative Decrease (Additive increase/multiplicative decrease – Wikipedia). TCP maintains a congestion window (cwnd), which is an internal limit on how much data (in bytes) can be in flight (unACKed) at any time, governed by perceived network capacity. The rules, roughly:
- Start with cwnd = 1 MSS (maximum segment size) or a small multiple. Each ACK received increases cwnd by 1 MSS (this is the slow start phase, which actually increases cwnd exponentially fast – effectively doubling cwnd each RTT, because for each ACK cwnd += 1 MSS, and if cwnd was N, there are N ACKs per RTT, so cwnd becomes ~2N after one RTT) (Additive increase/multiplicative decrease – Wikipedia).
- Once cwnd surpasses a threshold (ssthresh), TCP enters congestion avoidance phase, where it increases cwnd more slowly: by roughly 1 MSS per RTT (additive increase) (Additive increase/multiplicative decrease – Wikipedia) (Additive increase/multiplicative decrease – Wikipedia). In practice, TCP does cwnd += MSS * (MSS/cwnd) for each ACK, which results in cwnd increasing by 1 MSS per RTT collectively.
- If a packet loss is detected (either by timeout or nowadays often by triple duplicate ACKs), it interprets that as congestion signal and multiplicatively decreases cwnd. Classic TCP Tahoe would set cwnd to 1 MSS (and restart slow start). TCP Reno and newer variants set cwnd to half of its previous value (multiplicative decrease by factor 2) and then enter congestion avoidance (linear increase) from that reduced level (AIMD congestion window halving – Stack Overflow) (Additive increase/multiplicative decrease – Wikipedia). Also, ssthresh is updated to half of cwnd at loss time.
- This leads to the characteristic saw-tooth behavior of TCP: cwnd grows linearly until loss, then drops and repeats (Additive increase/multiplicative decrease – Wikipedia) (Additive increase/multiplicative decrease – Wikipedia). Over time, the average cwnd is roughly half of the maximum reached, which statistically shares bandwidth reasonably fairly between flows.
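The saw-tooth is easy to see in a toy simulation. The sketch below applies a rough Reno-style AIMD rule per RTT (slow start, additive increase, halving on loss, no fast-recovery details), with made-up loss points, and caps the usable window at min(rwnd, cwnd) as the following paragraphs discuss; all the numbers are illustrative.
MSS = 1          # count windows in segments for simplicity
rwnd = 64        # receiver's advertised window (segments), assumed constant here
ssthresh = 32
cwnd = 1.0

def on_rtt(loss: bool):
    """Very rough per-RTT AIMD update (Reno-like)."""
    global cwnd, ssthresh
    if loss:
        ssthresh = max(cwnd / 2, 1)   # multiplicative decrease
        cwnd = ssthresh
    elif cwnd < ssthresh:
        cwnd *= 2                     # slow start: roughly doubles each RTT
    else:
        cwnd += MSS                   # congestion avoidance: about +1 MSS per RTT

loss_rtts = {12, 25}                  # hypothetical RTTs where a drop is detected
for rtt in range(30):
    on_rtt(rtt in loss_rtts)
    effective = min(cwnd, rwnd)       # sender is limited by both rwnd and cwnd
    print(f"RTT {rtt:2d}: cwnd={cwnd:6.1f}  send window={effective:6.1f}")
Plotting the printed cwnd values over time reproduces the classic saw-tooth: exponential ramp, linear climb, sharp halving at each loss.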
In TCP, both flow control and congestion control together determine the effective sending window: effective window = min(rwnd, cwnd). The sender cannot send more than the lower of what the receiver allows (rwnd) and what congestion control allows (cwnd). So if the receiver is very fast but the network is limited, cwnd will limit throughput (network bound). If the network is fine but the receiver is slow (small rwnd), that will limit throughput (receiver bound). TCP also has mechanisms like Nagle’s algorithm to coalesce small sends to avoid too many tiny packets (to improve efficiency and also indirectly avoid some congestion from tinygram flooding), and Silly Window Syndrome avoidance on the receiver side (don’t advertise too small an increase in window).
To illustrate: Suppose a connection starts, cwnd=1 MSS, rwnd=64 KB. TCP will do slow start: cwnd doubles each RTT (1,2,4,8,… MSS) until perhaps a loss occurs or cwnd reaches rwnd or some threshold. If it reaches rwnd=64KB first, then the sender is limited by the receiver (flow control) and will stop increasing even if network could handle more. If instead a loss happens when cwnd=32KB, then maybe ssthresh is set to 16KB, cwnd resets to 1 or 16KB depending on implementation, etc. The exact algorithms have many variants (TCP Tahoe, Reno, NewReno, Vegas, CUBIC, etc.), but AIMD is the common principle for loss-based congestion control (Additive increase/multiplicative decrease – Wikipedia). AIMD is proven to be stable and to converge to fairness between flows under certain assumptions (Multiplicative Decrease – an overview | ScienceDirect Topics) (Additive increase/multiplicative decrease – Wikipedia): each flow probes bandwidth by additive increase, but if they all cause a loss, they all drop roughly in proportion to their rate (multiplicative decrease), which tends to share the link fairly.
Explicit Congestion Notification (ECN): In addition to packet loss as a signal, some networks use ECN where routers mark packets instead of dropping them when they are experiencing congestion (queue growing). TCP receivers then notify the sender of the ECN marks (via ACK), and the sender reacts as if a loss occurred (multiplicative decrease) but without actual loss (Additive increase/multiplicative decrease – Wikipedia). This can improve performance by avoiding retransmissions.
In summary, flow control (like the TCP receive window) protects the receiver (How Flow Control is Achieved in TCP? – GeeksforGeeks), and congestion control (like TCP’s cwnd adjustment) protects the network (Additive increase/multiplicative decrease – Wikipedia). TCP’s implementation of these allows it to be a good citizen in sharing network bandwidth and adapting to varying conditions automatically, while ensuring the receiving host isn’t overrun with data it can’t process. Non-TCP protocols (UDP flows) must implement their own controls if they want to be friendly (e.g., many media streaming protocols implement rate adaptation which is effectively a form of congestion control by adjusting video quality/bitrate based on measured loss or delay).
4.2 UDP, TCP, and Sockets
The two main transport protocols in the Internet suite provide very different services:
UDP (User Datagram Protocol): UDP is a minimal transport protocol atop IP. It provides connectionless, unreliable delivery of individual messages (datagrams). It has no concept of a long-lived connection; each UDP packet (datagram) is sent independently (Differences between TCP and UDP – GeeksforGeeks) (Differences between TCP and UDP – GeeksforGeeks). UDP does not guarantee delivery, ordering, or duplicate protection – it simply adds a small header to identify source and destination ports and a length and checksum, and passes the packet to IP for best-effort delivery. If a UDP packet is lost or corrupted, UDP itself does nothing (there is an optional checksum in UDP header that allows the receiver to detect corruption; if the checksum is bad, the packet is dropped). Because of its simplicity, UDP has low overhead: the header is only 8 bytes (vs 20 bytes for TCP not counting options) (Differences between TCP and UDP – GeeksforGeeks) (Differences between TCP and UDP – GeeksforGeeks), and there’s no handshaking or connection setup delay. This makes UDP suitable for applications that need speed and can tolerate some loss or handle reliability on their own – e.g., live video or voice (where it’s better to drop a packet than spend time retransmitting it, as late data is useless), gaming, or simple query-response protocols like DNS queries (DNS often uses UDP for a single query/response, resending the query if no response in some time). UDP is also often used for broadcast or multicast transmission (TCP cannot do multicast).
However, since UDP does not implement congestion control, applications using UDP should implement their own rate control to avoid flooding the network (for example, many streaming apps adjust quality to network conditions, effectively a form of higher-level congestion control).
TCP (Transmission Control Protocol): TCP provides a connection-oriented, reliable, byte-stream service (Differences between TCP and UDP – GeeksforGeeks) (Differences between TCP and UDP – GeeksforGeeks). Connection-oriented means that two endpoints must first establish a connection (via a handshake, famously the 3-way handshake in TCP) before data flows, and then close it when done. Reliable means TCP ensures that data is delivered to the other end, and if any packets are lost, they are retransmitted; also data is delivered in order and without duplicates – exactly as sent. Byte-stream means TCP presents the data as a continuous stream of bytes with no inherent message boundaries (unlike UDP which preserves message boundaries per datagram). The application writes bytes into the stream, and the receiver reads bytes out in the same order. It’s up to the application to frame the bytes into messages if needed (e.g., using delimiters or length fields within the stream).
To achieve reliability, TCP uses sequence numbers for bytes and acknowledgments (ACKs) from the receiver to confirm receipt. It also has timers to detect packet loss (if an ACK isn’t received in time) and will retransmit lost data. It uses cumulative ACKs (ack number = next byte expected) and as mentioned can use selective ACKs for efficiency. For ordering, if out-of-order segments arrive (due to, say, one segment lost and a later one arrives), the receiver can buffer them but will not deliver to application until the missing bytes are received, ensuring in-order delivery. Flow control and congestion control in TCP were described above; those make TCP adjust its rate to network and receiver conditions.
TCP’s connection orientation implies overhead: there is a handshake at start (at least one round trip: client sends SYN, server replies SYN-ACK, client sends ACK; only after that data can flow) and a handshake at end (FIN/ACK exchange). For very short interactions, that setup cost can be significant, which is why protocols like DNS often prefer UDP (one packet each way, done). But for long sessions or bulk data, TCP’s benefits far outweigh the setup cost.
Ports and multiplexing: Both UDP and TCP use port numbers to allow multiple applications on the same host to use the network simultaneously (Differences between TCP and UDP – GeeksforGeeks). The combination of an IP address and a port is called a socket address or endpoint (also sometimes “IP:port”). A port is a 16-bit number identifying an application process. For example, by convention, TCP port 80 is HTTP, port 25 is SMTP email, port 443 is HTTPS, etc. When a packet arrives at a host, the transport layer looks at the destination port to decide which application socket to deliver the data to. Ports below 1024 are “well-known” ports typically reserved for system or standard services. Ephemeral ports (usually >1024) are used for client side of connections and dynamically allocated.
In TCP, a connection is identified by the tuple (Src IP, Src Port, Dst IP, Dst Port). This 4-tuple must be unique for each TCP connection on the network (Differences between TCP and UDP – GeeksforGeeks). That means a server can handle multiple connections to the same port (say port 80) from different clients because the source IP:port differs for each. In UDP, since it’s connectionless, each packet carries its own 4-tuple, but we often think of an ongoing conversation identified by the same 4-tuple as well.
UDP vs TCP use cases:
- Use TCP when you need reliable delivery, ordered data, and can tolerate the overhead and slight increase in latency due to retransmissions or congestion control. Examples: web pages (HTTP), file transfers (FTP), email (SMTP/IMAP/POP), remote login (SSH/Telnet). In these, correctness is paramount – you’d not want bytes missing or out of order in a file or email.
- Use UDP when you need low latency or have a real-time component and can tolerate some loss, or you want to implement your own specific reliability scheme. Examples: live audio/video streaming (losing a few packets results in slight degradation but waiting for retransmit would cause lag), online gaming (similar reasons), or simple query protocols like DNS where the overhead of a TCP handshake is not worth it for a single small query (DNS will fall back to TCP only if the response is too large for UDP or not received). Also UDP is used for broadcasts or multicasts (like service discovery protocols, routing protocol updates in some cases, etc., where TCP isn’t feasible).
Another difference: TCP is heavy-weight (connection, statefulness, larger header, flow/congestion control) vs UDP is lightweight (no connection, minimal header, send and forget) (Differences between TCP and UDP – GeeksforGeeks) (Differences between TCP and UDP – GeeksforGeeks). This also affects performance – TCP ensures fairness and reliable delivery, which can reduce throughput for one flow if network is shared, whereas UDP flows can hog more (which is why poorly behaved UDP apps can be problematic).
Sockets API: The programming interface for network communication in most operating systems is the sockets API, which was originally from 4.2BSD Unix and has become standard (with variants in many languages). A socket is an abstraction representing one end of a communication link. Applications create a socket (specifying TCP or UDP or another transport), then:
- For UDP: no formal connection; the app can sendto/recvfrom a socket to a target IP:port (the socket can be bound to a local port). Essentially writing and reading datagrams.
- For TCP: the client socket calls connect(), specifying the server’s IP and port – this triggers the TCP handshake under the hood (Understanding Computer Networks Layered Architecture & Protocols – Course Sidekick). If connect succeeds, the socket is now considered “connected” and the app can use send()/recv() (or write/read) as if it’s a continuous stream. The server side creates a listening socket with socket(), bind(port), listen() and then calls accept() to accept incoming connections (Understanding Computer Networks Layered Architecture & Protocols – Course Sidekick). Accept returns a new socket for the established connection (while the listening socket continues to listen for new connections). After that, server and client use their respective sockets to send/receive.
In coding terms, a socket is identified by the tuple mentioned, but once connected (in TCP) you don’t usually worry about the address; you just get a file descriptor to read/write. In UDP, you normally don’t call connect (though you can, which fixes a default peer address); you usually use sendto and provide the destination each time.
The socket API also allows setting options (like timeouts, disabling Nagle’s algorithm, enabling broadcast on UDP, etc.). For TCP you use the listen()/accept() pattern for a passive open, and for UDP you typically bind() to a port to receive data on that port.
For example, a typical TCP server (like a web server) does:
socket() -> returns sockfd
bind(sockfd, port 80)
listen(sockfd)
for(;;) { newsock = accept(sockfd) ; /* returns a new socket for that client */
... handle client on newsock (read request, send response) ...
close(newsock);
}
close(sockfd);
A TCP client does:
socket() -> sockfd
connect(sockfd, server:80)
send(sockfd, "GET / ...")
recv(sockfd, response)
close(sockfd)
This results in behind the scenes: SYN, SYN-ACK, ACK handshake, then data exchange, then FIN/ACK termination.
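The same flow in runnable form, using Python’s standard socket module rather than the C-style pseudocode above; the port number 8080, the loopback address, and the single-connection echo behavior are purely illustrative choices.
import socket

def run_server(host="127.0.0.1", port=8080):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))      # like bind(sockfd, port)
        srv.listen()                # passive open
        conn, addr = srv.accept()   # blocks until a 3-way handshake completes
        with conn:
            data = conn.recv(4096)  # read the client's request bytes
            conn.sendall(b"echo: " + data)

def run_client(host="127.0.0.1", port=8080):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as c:
        c.connect((host, port))     # SYN / SYN-ACK / ACK happen here
        c.sendall(b"GET / ...")
        print(c.recv(4096))         # FIN/ACK teardown happens when the block exits
Run the server in one process (or thread) and the client in another; the OS handles the handshake, retransmission, and teardown underneath these calls.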
A UDP interaction might be:
socket = socket(AF_INET, SOCK_DGRAM)
sendto(socket, data, len, dest_addr:port)
recvfrom(socket, buf, ..., &src_addr)
No handshake; each sendto is one packet out, each recvfrom gets one packet in (one per call if available).
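And the UDP counterpart, again as a minimal standard-library sketch (the loopback address and port 9999 are arbitrary); because both sockets live in one process here, the sender’s datagram is simply queued until recvfrom reads it.
import socket

# Receiver: bind to a local port and read one datagram.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 9999))

# Sender: no connection; each datagram is addressed explicitly.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"hello", ("127.0.0.1", 9999))

data, src = rx.recvfrom(2048)   # one datagram per call, with the sender's address
print(data, src)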
Socket addressing: We mention IP+port forms a socket address. The OS uses this to route inbound data to the correct socket. For TCP, a socket is often characterized by local IP:port and remote IP:port (for connected sockets). For listening sockets, it’s only local port (and maybe local IP if bound to a specific one or INADDR_ANY for all local IPs). For UDP, if you do connect on it, the OS will fix the remote for convenience (allowing use of send instead of sendto). Otherwise, each datagram’s header is consulted.
Summary: TCP – connection-oriented, reliable, ordered, heavy overhead, suitable for the majority of applications needing accuracy (file transfer, web, etc.) (Differences between TCP and UDP – GeeksforGeeks) (Differences between TCP and UDP – GeeksforGeeks). UDP – connectionless, no guarantees, minimal overhead, suitable for apps needing speed or multicast/broadcast or implementing their own control (media streaming, simple queries, etc.) (Differences between TCP and UDP – GeeksforGeeks) (Differences between TCP and UDP – GeeksforGeeks). Sockets provide the programming abstraction to use these protocols, with the OS handling the protocol details. With sockets, network communication becomes akin to reading/writing files or messages, which greatly simplified network application development and became a standard model across platforms. Each application can use multiple sockets to communicate with different peers concurrently (each web browser tab might be a separate set of TCP sockets, etc.), and the transport layer ports and IPs ensure correct demultiplexing of data to each.
5. Application Layer Protocols
The Application Layer is where network applications and protocols reside – everything from web browsing (HTTP) and email (SMTP, IMAP, POP3) to domain name resolution (DNS) and file transfer (FTP). These protocols use the transport layer (TCP or UDP) to provide specific services to users or other programs. We will overview a few important application protocols: DNS (for name resolution), SMTP/Email (for sending email, plus mention of POP3/IMAP for retrieval), HTTP (for the World Wide Web), and FTP (an older but classic file transfer protocol). Understanding these gives insight into common Internet operations.
5.1 DNS
The Domain Name System (DNS) is often called the “phonebook of the Internet”. It translates human-friendly domain names (like www.example.com) into IP addresses (like 93.184.216.34) that networking equipment uses to route traffic (DNS 101 – Cazarin Interactive). DNS is a distributed, hierarchical database. Its key features:
- Hierarchical Namespace: DNS names are arranged in a tree-like hierarchy. The top of the hierarchy is the root (represented by an empty label, often written as a trailing dot). Directly under the root are the top-level domains (TLDs) like com, org, net, country codes like uk, jp, etc. Below TLDs are second-level domains (like example in example.com), and so on (subdomains like www under example.com). The full domain name www.example.com. can be seen as a path in this tree from the root: . (root) -> com -> example -> www (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]). Each node in the DNS tree can theoretically have many subdomains. This hierarchy allows delegation of authority: the com domain is managed by certain authorities, which delegate example.com to whoever registered it, and that owner could delegate sub.example.com to someone else, etc.
- DNS Servers and Zones: The database is distributed across millions of DNS servers worldwide. The namespace is divided into zones. A zone is a contiguous part of the DNS tree that is managed by a particular entity. For example, the .com zone is managed by Verisign (which runs the .com DNS servers). The example.com zone is managed by whoever owns example.com (or their DNS provider). Each zone has one or more authoritative name servers – servers that are the ultimate source of truth for that zone (they contain the DNS records for names in that zone) (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]) (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]). The root zone is managed by ICANN via root servers.
- Resolution Process: DNS resolution typically involves both recursive and iterative queries (DNS Resolution Process | Cycle.io) (Recursive vs. iterative DNS queries: What’s the difference?). A user’s computer is configured to use a local DNS resolver (either on the same machine or more often provided by the ISP or a public DNS like 8.8.8.8). When an application needs to resolve a name, say www.example.com, it asks its configured DNS resolver (a recursive resolver) to find the IP. The recursive resolver will then perform the resolution by querying the hierarchy:
- It asks a root name server: “What is the DNS server for example.com?” Actually, it will ask for www.example.com – but the root likely only knows about TLDs, so the root server replies with a referral: “I don’t know www.example.com, but ask the .com TLD servers; here are their addresses.” Root servers know the authoritative servers for each TLD (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]).
- The resolver then asks one of the .com TLD servers: “What is the DNS server for example.com?” The .com server responds with a referral: “For example.com, the authoritative name servers are ns1.somednsprovider.net at IP x.x.x.x and ns2.somednsprovider.net at IP y.y.y.y” (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]).
- The resolver then asks the provided ns1.somednsprovider.net: “What is the IP for www.example.com?” The authoritative server, if it has a record for www, replies with the answer: e.g., “www.example.com A 93.184.216.34” (A is the record type for an IPv4 address).
- The resolver gets the answer and returns it to the original client.
This process involves iterative querying from the recursive resolver’s perspective (it queries root, then TLD, then SLD…). The client’s query to its local resolver was recursive, meaning “please do all the work and give me the final answer” (DNS Resolution Process | Cycle.io). Most stub clients (like your OS or browser) use recursive queries to a local DNS server. That local server then performs iterative queries up the chain.
Resolvers cache responses to improve performance. For instance, after resolving www.example.com, the resolver will remember the IP for some time (the TTL – time-to-live – field provided in the DNS records, say 300 seconds). It will also likely cache that ns1.somednsprovider.net is authoritative for example.com (and the IP of ns1) for the TTL of those NS records. Next time someone asks for blog.example.com, it can skip straight to ns1.somednsprovider.net without bothering root and .com again until cache entries expire (Iterative DNS query faster than recursive query due to more entries …). This caching significantly reduces DNS traffic and lookup times.
- Resource Records: DNS stores various types of records:
- A record: maps a hostname to an IPv4 address (DNS 101 – Cazarin Interactive).
- AAAA record: maps a hostname to an IPv6 address.
- CNAME record: an alias; maps one name to another canonical name. For example, www.example.com CNAME example.com. means www is an alias of the root domain.
- MX record: mail exchange – specifies the mail server for a domain (e.g., example.com MX 10 mail.example.com, meaning mail for example.com should go to mail.example.com).
- NS record: specifies the authoritative name servers for a domain (used in delegation).
- TXT record: arbitrary text (used for things like SPF info, domain verification tokens, etc.).
- SRV records, PTR (reverse lookup for IP->name), etc.
A query includes a record type and a name. For example, an email program will do a DNS query for MX records of example.com to find where to send mail.
DNS in practice: When you type a URL, say http://www.example.com/, your browser needs to resolve www.example.com:
- It calls the OS’s resolver library. If the OS doesn’t have it cached, it sends a recursive query to the configured DNS server (like your router or ISP).
- That server does the above process and returns an IP.
- The browser then opens a TCP connection to that IP on port 80 (for HTTP). This happens behind the scenes swiftly (and often cached – browsers also cache DNS for a short time, as does the OS).
DNS is designed to be robust. There are multiple root servers (13 root server identities, each anycast to many physical servers worldwide). TLD servers are redundant globally. Most queries can be resolved in a few tens of milliseconds (often faster if cached in a nearby resolver).
DNS primarily uses UDP on port 53 for queries and responses. UDP is used because queries are usually small and it avoids connection overhead. If a response does not fit (the classic limit was 512 bytes; the EDNS extension allows larger UDP responses, but truncation can still occur), the server sets the “TC” (truncated) flag, indicating the client should retry over TCP (ELI5:Differenced between IMAP, POP3, and SMTP? – Reddit). TCP (also on port 53) is always used for zone transfers (when DNS servers synchronize zone data). But normal lookups usually fit in UDP (except for DNSSEC-signed zones with large records, etc.).
This hierarchical, decentralized design (The Ultimate Guide to the Domain Name System (DNS) by IT pros) (DNS 101 – Cazarin Interactive) ensures there is no single point of failure and allows administration to be split: e.g., any company can manage its own DNS for its domain, adding and changing records as needed on its authoritative servers, and the global DNS will refer to them through the chain.
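From an application’s point of view, all of this machinery is hidden behind a single stub-resolver call. The sketch below uses Python’s standard socket.getaddrinfo, which simply hands the recursive query to the configured resolver; the domain is the documentation example used above, and whether you get IPv4, IPv6, or both depends on the zone and your host configuration.
import socket

# Ask the OS stub resolver for A/AAAA records; recursion, referrals, and
# caching all happen in the configured recursive resolver, not here.
for family, _, _, _, sockaddr in socket.getaddrinfo("www.example.com", 80,
                                                    proto=socket.IPPROTO_TCP):
    label = "IPv4" if family == socket.AF_INET else "IPv6"
    print(label, sockaddr[0])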
5.2 SMTP and Email
SMTP (Simple Mail Transfer Protocol) is the standard protocol to send email between servers (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]) (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]). It operates typically over TCP port 25 for server-to-server communication. SMTP is used by a mail client (Mail User Agent, MUA) to send outgoing mail to a mail server (Mail Submission Agent, MSA) – often on port 587 with authentication in modern setups – and by mail servers (Mail Transfer Agents, MTA) to relay mail towards the recipient’s mail server (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]) (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]). SMTP is a text-based protocol with commands like HELO/EHLO (to introduce oneself and start the session), MAIL FROM: (to set the sender address), RCPT TO: (to specify a recipient), and DATA (to begin sending the message content) (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]) (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]). The message (comprising headers like From, To, Subject, and the body) is sent after the DATA command and terminated by a line containing just a period. The server responds with numeric codes (e.g., “250 OK” for command accepted, “354 Start input” before data, “550 Mailbox unavailable” for an error, etc.). SMTP was originally designed for transfer between always-on servers, so it doesn’t require the sender to hold a connection if the recipient server is offline; usually, the sending server will queue and retry later. But nowadays, typically, an MUA sends to its configured MTA (like SMTP submission to Gmail’s server), and that server then does the necessary MX lookup and speaks SMTP to the destination’s server.
Email Delivery Flow: A typical email path:
- A user Alice (alice@sender.com) composes an email to bob@recipient.com.
- Her mail client uses SMTP to send the email to her outgoing mail server (say smtp.sender.com). This may require authentication (SMTP AUTH) and uses port 587 or 465 (submission over SSL) typically.
- Alice’s server now needs to deliver to recipient.com. It does a DNS lookup for MX records of recipient.com. Suppose it finds mail.recipient.com as the mail exchanger.
- Alice’s server opens a TCP connection to mail.recipient.com on port 25 and does the SMTP dialogue: “HELO smtp.sender.com”, “MAIL FROM:alice@sender.com”, “RCPT TO:bob@recipient.com”, etc. (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]) (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]). If all recipients are accepted, it sends the message with DATA. The server (mail.recipient.com) stores the message in Bob’s mailbox.
- Alice’s server disconnects. It may try another MX if first fails, or queue if it can’t connect (with periodic retries).
- Now Bob wants to read the email. Here is where other protocols come in: Bob’s mail client could use POP3 (Post Office Protocol v3) or IMAP (Internet Message Access Protocol) to fetch the mail from his server (mail.recipient.com) (What are SMTP, POP3 & IMAP & How does it Work? – SmartReach.io).
- POP3 (TCP port 110, or 995 for POP3S) is a simple protocol where the client connects, authenticates (USER/PASS), and then can LIST messages, RETR (retrieve/download) them, and DELE (optionally delete from server) (ELI5:Differenced between IMAP, POP3, and SMTP? – Reddit). POP3 is generally used to download and often remove messages, typically suited for single-device access (the older model of email).
- Alternatively, Bob might use a webmail interface, which behind the scenes likely uses IMAP/POP or some proprietary API to fetch the mail, but that’s not an application-layer protocol we define separately (HTTP in that case to the webmail service).
- POP3 (TCP port 110, or 995 for POP3S) is a simple protocol where the client connects, authenticates (USER/PASS), and then can
So to summarize, SMTP handles sending of emails (between servers and from client to server) (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]), while POP3/IMAP handle retrieval of emails by the end user. SMTP is a push protocol (sender initiates the transfer to receiver’s server) (What are SMTP, POP3 & IMAP & How does it Work? – SmartReach.io) (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]), whereas POP3/IMAP are pull (the recipient’s client pulls from their server).
Ports:
- SMTP: 25 (server relay), 587 (submission), 465 (legacy SMTPS).
- POP3: 110 plain, 995 with TLS (POP3S); IMAP: 143 plain, 993 with TLS (IMAPS).
- In modern usage, an end-user typically doesn’t directly use SMTP to other domains; they use SMTP (with auth) to their mail provider’s server, which then uses SMTP to deliver to target domain’s server.
Email message format: SMTP transmits the message which has its own header lines (different from SMTP command lines). For example, the DATA after commands would include:
From: Alice <alice@sender.com>
To: Bob <bob@recipient.com>
Subject: Hello

Hello Bob, this is a test email.
.
The SMTP envelope (MAIL FROM, RCPT TO) could differ from the header “From:” and “To:” which are seen by users. The envelope sender is used for bounces (MAIL FROM is the return-path).
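A hedged sketch of the submission step with Python’s standard smtplib follows; the server name, port, and credentials are placeholders. Note how the explicit from_addr/to_addrs arguments become the envelope MAIL FROM and RCPT TO, while the From:/To: lines inside the message are merely headers shown to the user.
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "Alice <alice@sender.com>"
msg["To"] = "Bob <bob@recipient.com>"
msg["Subject"] = "Hello"
msg.set_content("Hello Bob, this is a test email.")

with smtplib.SMTP("smtp.sender.com", 587) as s:   # submission port; placeholder server
    s.starttls()                                  # upgrade the connection to TLS
    s.login("alice", "app-password")              # placeholder credentials
    # Envelope sender and recipient (MAIL FROM / RCPT TO) passed explicitly:
    s.send_message(msg, from_addr="alice@sender.com", to_addrs=["bob@recipient.com"])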
SMTP is reliable over TCP, but if a server can’t deliver to the next hop, it queues the message and retries periodically (with exponential backoff, often for up to a few days before giving up and sending a bounce message back to the sender). Final delivery can therefore be delayed by network problems. Also, if Bob’s address doesn’t exist, mail.recipient.com would respond with “550 No such user” and Alice’s server would generate a bounce to Alice.
POP3 vs IMAP:
- POP3 is very simple (just download all new messages and optionally delete). It doesn’t keep state on server beyond “seen messages.” It has commands like USER/PASS, STAT (status of mailbox), LIST, RETR n, DELE n, QUIT.
- IMAP is stateful and complex. Clients can fetch message lists, fetch body or headers, search on server, manage folders, etc. It’s more suitable for online access where mail stays on server (which is the norm now).
Email ecosystem summary:
- MUA (Mail User Agent): e.g., Outlook, Thunderbird, iPhone Mail – used by end user. Uses SMTP to send, and POP3/IMAP to receive.
- MTA (Mail Transfer Agent): e.g., postfix, sendmail, Exchange – server that routes mail.
- MX (Mail Exchanger): the DNS entry specifying an MTA for a domain.
- MDA (Mail Delivery Agent): sometimes distinguished as the agent that delivers into mailbox (on the server), e.g., procmail, but often part of MTA.
- Bob’s MUA could also directly speak IMAP to read email that sits on the server from multiple devices (so that the emails remain on the server and sync status across devices).
Security: Originally SMTP was plaintext and servers acted as open relays, which led to spam problems. Now, SMTP servers require authentication for outgoing mail from users, and they do not relay mail from arbitrary outside sources to other outside destinations (anti-relay rules). TLS encryption is also typically used (STARTTLS on port 25/587 to encrypt the connection).
In short, SMTP is for sending mail between servers and initial submission (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]) (Email Protocols Explained: IMAP vs POP3 vs SMTP [2025]), POP3/IMAP for retrieving mail from your mailbox server. All these are application-layer protocols using TCP. SMTP uses a line-oriented text protocol with commands/replies, POP3 also line-oriented (very simple), IMAP text but more complex with tagged commands and multiline responses. Together, they allow the global email system to function: you can send an email to anyone else and the protocols ensure it gets delivered and can be fetched.
5.3 HTTP
The HyperText Transfer Protocol (HTTP) is the foundation of the World Wide Web, used for transferring web pages and other resources. It is a request-response protocol that typically runs over TCP (port 80 for HTTP, port 443 for HTTPS which is HTTP over TLS). In HTTP, a client (usually a web browser) establishes a TCP connection to a server (web server) and sends an HTTP request message; the server then sends back an HTTP response message (HTTP – Wikipedia) (HTTP – Wikipedia).
HTTP Request: An HTTP request message includes:
- A request line: e.g., GET /index.html HTTP/1.1, which specifies the method (GET), the path (“/index.html”) on the server, and the HTTP version (HTTP request methods – HTTP | MDN).
- Request headers: these are key-value pairs sent one per line after the request line, which convey additional information about the request. Examples: Host: www.example.com (specifies which host, since one server might host many domains – required in HTTP/1.1), User-Agent: Mozilla/5.0 ... (identifies the client software), Accept: text/html (what content types the client can accept), etc.
- A blank line, then optionally a message body. For GET requests, typically there is no body (GET is just asking for data). For methods like POST (when submitting form data, uploading, etc.), the request will include a body (e.g., form fields or JSON payload) and a header like Content-Length to indicate size and possibly Content-Type to indicate format (HTTP request methods – HTTP | MDN).
HTTP Response: The server replies with a response message:
- A status line: e.g., HTTP/1.1 200 OK or HTTP/1.1 404 Not Found. It contains the HTTP version, a numeric status code, and a reason phrase (Differences between TCP and UDP – GeeksforGeeks) (Differences between TCP and UDP – GeeksforGeeks).
- Response headers: e.g., Content-Type: text/html, Content-Length: 3421, Date: Tue, 20 Mar 2025 10:00:00 GMT, Server: Apache/2.4.1, etc.
- A blank line, then the response body: e.g., the HTML content of the page, or an image or whatever was requested, if the response has content (some responses like 204 No Content or 304 Not Modified have no body).
HTTP is stateless: the protocol itself doesn’t keep track of previous requests. Each request-response is independent (HTTP – Wikipedia). If state is needed (like logins, shopping carts), it’s done via mechanisms like cookies (headers that client and server exchange to identify sessions), or via embedding state in URLs or hidden form fields.
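Because HTTP/1.1 is just lines of text over TCP, the request/response exchange described above can be reproduced with a plain socket. The sketch below fetches the front page of example.com (any reachable HTTP host would do) and prints the status line and headers; it deliberately asks the server to close the connection so the read loop has a clear end.
import socket

request = (
    "GET / HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Connection: close\r\n"
    "\r\n"
)

with socket.create_connection(("example.com", 80)) as s:
    s.sendall(request.encode("ascii"))
    response = b""
    while chunk := s.recv(4096):       # read until the server closes the connection
        response += chunk

head, _, body = response.partition(b"\r\n\r\n")
print(head.decode("iso-8859-1"))       # status line + response headers
print(f"--- body: {len(body)} bytes ---")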
Methods: Common HTTP methods include:
- GET: retrieve a resource (should not have side effects). The query data, if any, is usually appended in the URL (after a ‘?’ in the URL).
- POST: submit data to the server (e.g., form submission). The data is sent in the request body (e.g., form fields encoded as application/x-www-form-urlencoded or as JSON, etc.) (HTTP Methods GET vs POST – W3Schools) (HTTP request methods – HTTP | MDN). POST often causes changes on server (e.g., adding a database entry).
- HEAD: same as GET but the server returns only headers (no body). Used to check things like content length, modification time, etc., without downloading.
- PUT: upload a representation of a resource (often used in REST APIs to replace or create a resource at a known URL).
- DELETE: delete the specified resource.
- Others like OPTIONS (to query server for supported methods), TRACE, CONNECT (used for proxies/TLS tunneling), PATCH (partial update).
HTTP/1.0 was simple: one request per TCP connection (client closes after each response). HTTP/1.1 introduced persistent connections by default: the TCP connection can be reused for multiple requests in sequence, saving the overhead of reconnecting for each resource (It’s #FrontendFriday – What is HTTP? – doubleSlash Blog). It also introduced chunked transfer encoding to send data in pieces without knowing total length in advance, as well as the Host header to support virtual hosting.
Modern HTTP (HTTP/1.1) still essentially works as above, but HTTP/2 (and now HTTP/3) have come in to improve performance by allowing multiple concurrent requests over one connection (multiplexing) and other features – but the basic request/response semantics remain.
Fundamental status codes groups:
- 1xx: Informational (rarely used in practice, aside from 101 Switching Protocols for WebSocket upgrade).
- 2xx: Success. 200 OK is common. 204 No Content for no body, 206 Partial Content for range requests.
- 3xx: Redirection. 301 Moved Permanently, 302 Found (temporary redirect), 303 See Other, 304 Not Modified (used with conditional requests for caching: if the client’s cached copy is still fresh, the server responds 304 instead of resending the body).
- 4xx: Client errors. 400 Bad Request (malformed request), 401 Unauthorized (needs auth), 403 Forbidden, 404 Not Found (resource doesn’t exist), 405 Method Not Allowed, etc.
- 5xx: Server errors. 500 Internal Server Error (generic when an unexpected error happens), 502 Bad Gateway, 503 Service Unavailable, etc.
URL vs HTTP: A URL like http://www.example.com/path/page.html contains the scheme (http), host (www.example.com), and path (/path/page.html). The browser uses DNS to resolve the host to an IP, then connects to that IP on port 80 (the default for http). It sends an HTTP request “GET /path/page.html HTTP/1.1” with a “Host: www.example.com” header. The server’s HTTP software (such as Apache, Nginx, or IIS) sees the host and path, finds the file or generates it via some program, and sends back the content with the appropriate Content-Type (text/html in this case) and so forth.
Statelessness and cookies: HTTP being stateless means if you do two requests, the server by default doesn’t know they came from the same client or are part of the same session. Cookies (set via a Set-Cookie header from the server, then stored by the client and sent back in a Cookie header on subsequent requests to that domain) are a way to maintain state. For instance, after login, the server sets a session cookie, and that cookie is sent by the client on each request, so the server knows the client is logged in as a certain user.
HTTP is extensible: both requests and responses have flexible headers, and new headers can be added (e.g., Cache-Control for caching, Accept-Language for content negotiation, etc.). The Content-Type header in responses (and in requests that carry a body) gives the MIME type of the content (HTML, JSON, JPEG, etc.) so the client can handle it accordingly.
Security (HTTPS): HTTP itself is plaintext. HTTPS is simply HTTP inside a TLS (SSL) encrypted connection, typically on port 443. The HTTP messages are the same, but they are encrypted and authenticated by TLS, making the exchange secure. From the application’s perspective it is still “speaking HTTP”, just over a secure channel.
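To illustrate that HTTPS is “the same HTTP over TLS”, here is a sketch that wraps a plain socket in TLS before speaking HTTP, using Python's standard ssl module (www.example.com as the host):

import socket
import ssl

host = "www.example.com"
ctx = ssl.create_default_context()                       # verifies the server certificate chain
with socket.create_connection((host, 443)) as raw:       # TCP to port 443, the https default
    with ctx.wrap_socket(raw, server_hostname=host) as tls:   # TLS handshake (SNI + certificate check)
        tls.sendall(
            f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode("ascii")
        )
        print(tls.recv(200))                             # same HTTP reply as before, just carried over TLS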
HTTP usage beyond browsers: Many API calls and web services use HTTP (often with JSON payloads) because it is widely supported and traverses firewalls easily (ports 80/443 are usually open). Tools like curl or programming libraries allow sending custom HTTP requests for various purposes.
In summary, HTTP is the application protocol for the web, following a simple request (with methods like GET/POST) and response (with status codes) format. Its stateless, text-based design made it easy to implement and debug, and additional mechanisms (cookies, sessions) were added on top to manage stateful experiences. Modern developments (HTTP/2, HTTP/3) aim to optimize performance (reducing latency and improving parallelism), but the core concepts remain as defined in HTTP/1.x, which is what we described.
5.4 FTP
The File Transfer Protocol (FTP) is one of the older Internet protocols (dating back to the 1970s), designed for transferring files between systems. It predates HTTP and has more complex connection handling. FTP is distinctive in that it uses two separate TCP connections: a control connection and a data connection.
- Control Connection: This is established from the client (FTP user) to the server’s FTP control port (default port 21). The control connection stays open for the duration of the session and carries commands from client to server and the server’s replies, all in a text command format (similar in style to SMTP/HTTP commands). Example commands: USER username (to log in), PASS password, CWD (change directory), LIST (list files), RETR filename (download a file), STOR filename (upload a file), etc. The control connection is persistent and uses a simple ASCII protocol with three-digit reply codes (like 220 with a welcome message on connection, 331 user name okay need password, 230 login ok, 550 file not found, etc.).
- Data Connection: Whenever data (like directory listings or file contents) needs to be transferred, a separate TCP connection is opened for the data. How this connection is opened defines active vs passive mode:
  - Active Mode FTP: In active mode, the server actively connects back to the client for the data channel. After sending a command that will require a data transfer (like LIST or RETR), the client sends a PORT command over the control connection (Difference between Active and Passive FTP – GeeksforGeeks). The PORT command includes the client’s IP and a port number (chosen by the client’s FTP program) on which the client is listening for the data connection. For example, PORT 192,168,1,5,12,34 instructs the server to connect to IP 192.168.1.5 at port (12*256 + 34) = 3106. The server then initiates a TCP connection from its data port (default 20 for FTP data) to the client’s specified address/port, the data (directory listing or file) is transferred over that connection, and the data connection is then closed (typically one data connection per transfer).
  - Passive Mode FTP: Active mode has issues if the client is behind a NAT or firewall that blocks incoming connections. In passive mode, the roles for data connection setup are reversed: the client initiates the data connection to the server. The client sends a PASV command instead of PORT (Difference between Active and Passive FTP – GeeksforGeeks). The server replies with something like 227 Entering Passive Mode (ip1,ip2,ip3,ip4,p1,p2), giving an IP address and port where it is listening for the data connection. The client then opens a TCP connection from an ephemeral port to that server IP and port, and data transfer occurs. Passive mode is thus firewall/NAT-friendly (the client always initiates connections), so modern clients default to it. (A small sketch for decoding the PORT/PASV port numbers follows the mode summary below.)
Summary of modes:
- Active: client opens a port, server connects back to the client. (Requires that incoming connections to the client are not blocked.)
- Passive: server opens a port, client connects to the server. (Works through client-side NAT; NAT on the server side can still complicate things unless the server knows its external IP address to put in the PASV reply.)
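As a small sketch of how the numbers in PORT and PASV encode an address and port (port = p1*256 + p2), here is a minimal parser in Python; the 227 reply string is taken from the example above:

import re

def parse_pasv(reply: str) -> tuple[str, int]:
    # Pull the six numbers out of e.g. "227 Entering Passive Mode (203,0,113,5,195,100)".
    h1, h2, h3, h4, p1, p2 = map(int, re.findall(r"\d+", reply.partition("(")[2]))
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2          # port = p1*256 + p2

print(parse_pasv("227 Entering Passive Mode (203,0,113,5,195,100)"))  # ('203.0.113.5', 50020)
print(12 * 256 + 34)   # 3106 -- the same arithmetic for the PORT 192,168,1,5,12,34 example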
FTP Data Channel Use: The data channel carries file contents for RETR (download) and STOR (upload), and also directory listings (LIST or the newer MLSD). Each transfer usually opens a new data connection (apart from rarely used transfer modes such as block mode). Typically, in classic FTP, after one file is transferred the data connection closes, and if another file needs to be transferred, a new PORT/PASV negotiation happens.
State and commands: FTP requires per-session state on the server (current working directory, whether the user is logged in, etc., tied to the control connection) – it is not stateless like HTTP. It also predates modern security practices: it sends passwords in plaintext, so the later FTPS extension (FTP over TLS) or SFTP (file transfer over SSH) is commonly used instead for secure file transfer.
Example FTP session (passive mode; an ftplib equivalent follows the transcript):
Client connects to server port 21 (control connection)
Server: "220 Welcome to FTP"
Client: "USER alice"
Server: "331 Password required for alice"
Client: "PASS secret"
Server: "230 User logged in"
Client: "PWD"
Server: "257 "/" is current directory"
Client: "PASV" (client chooses passive for easier NAT)
Server: "227 Entering Passive Mode (203,0,113,5,195,100)"
(meaning host 203.0.113.5, port 195*256+100 = 50020)
Client opens connection to 203.0.113.5:50020
Client: "LIST"
Server: "150 Opening ASCII mode data connection for file list"
(then over data connection, server sends directory listing)
Server: "... directory listing text..."
Server on control: "226 Transfer complete"
(data connection closes)
Client: "RETR file.txt"
Server: "150 Opening data connection for file.txt (xxx bytes)"
(since the previous data connection has closed, the client actually sends another PASV and opens a new data connection before issuing RETR)
(data connection established, server sends file bytes)
Server: "226 Transfer complete"
Client: "QUIT"
Server: "221 Goodbye"
(control connection closed)
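For comparison, here is a sketch of roughly the same session using Python's ftplib, which defaults to passive mode; the host ftp.example.com and the credentials are placeholders:

from ftplib import FTP

ftp = FTP("ftp.example.com")                  # connects to port 21 and reads the "220 ..." greeting
ftp.login("alice", "secret")                  # sends USER / PASS over the control connection
print(ftp.pwd())                              # PWD -> current directory, e.g. "/"
ftp.retrlines("LIST")                         # PASV plus a data connection just for the listing
with open("file.txt", "wb") as f:
    ftp.retrbinary("RETR file.txt", f.write)  # a fresh PASV + data connection for the file itself
ftp.quit()                                    # QUIT -> "221 Goodbye", control connection closes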
FTP was widely used historically, but because of NAT issues and its lack of security, many now use SFTP (a completely different protocol running over SSH), FTPS (FTP over TLS), or simply HTTP(S) for file transfer. Understanding FTP is still valuable, however: it shows a traditional design with separate control/data channels and the complications that arise with firewalls.
Active vs Passive Summary: Passive mode is typically exposed in clients as a “Use Passive Mode (PASV)” setting and is usually on by default nowadays. If logs show the server trying to connect to some port on the client, that is active mode. Many home routers handle active FTP by noticing the PORT command in the control stream and opening a temporary pinhole; this is an example of an Application Layer Gateway, as mentioned earlier in the NAT discussion, which makes active FTP work behind NAT by rewriting the PORT command with the external IP or adding port-forwarding logic.
Anonymous FTP: Many public FTP servers allowed “anonymous” login – user “anonymous” with any password (often your email address by convention). This is how a lot of files (Linux distributions, etc.) were distributed before HTTP took over. Even now, FTP servers exist for large datasets, though they are usually also reachable via HTTP.
FTP in the URL form: ftp://user:pass@host/path is a URL format for FTP accessible by browsers or ftp clients.
Summary: FTP is a stateful file transfer protocol using a separate control connection (on port 21) and data connections (from server port 20 in active mode, or server-chosen ports in passive mode) (Difference between Active and Passive FTP – GeeksforGeeks). It predates secure practices and struggles with NAT, so while it is historically important and still in use in some contexts, it is often replaced by more firewall-friendly or secure methods. Understanding it, however, reveals how application protocols can orchestrate multiple connections and more complex interactions than a single request-response.