TCP protocol operations may be divided into three phases.
Connection establishment is a multi-step handshake process that establishes a connection before entering the
data transfer phase. After data transfer is completed, the
connection termination closes the connection and releases all allocated resources. A TCP connection is managed by an operating system through a resource that represents the local end-point for communications, the
Internet socket. During the lifetime of a TCP connection, the local end-point undergoes a series of
state changes:
Connection establishment Before a client attempts to connect with a server, the server must first bind to and listen at a port to open it up for connections: this is called a passive open. Once the passive open is established, a client may establish a connection by initiating an active open using the three-way (or 3-step) handshake: •
SYN: The active open is performed by the client sending a SYN to the server. The client sets the segment's sequence number to a random value x. •
SYN-ACK: In response, the server replies with a SYN-ACK. The acknowledgment number is set to one more than the received sequence number i.e. x+1, and the sequence number that the server chooses for the packet is another random number, y. •
ACK: Finally, the client sends an ACK back to the server. The sequence number is set to the received acknowledgment value i.e. x+1, and the acknowledgment number is set to one more than the received sequence number i.e. y+1. Steps 1 and 2 establish and acknowledge the sequence number for one direction (client to server). Steps 2 and 3 establish and acknowledge the sequence number for the other direction (server to client). Following the completion of these steps, both the client and server have received acknowledgments and a full-duplex communication is established.
Connection termination The connection termination phase uses a four-way handshake, with each side of the connection terminating independently. When an endpoint wishes to stop its half of the connection, it transmits a FIN packet, which the other end acknowledges with an ACK. Therefore, a typical tear-down requires a pair of FIN and ACK segments from each TCP endpoint. After the side that sent the first FIN has responded with the final ACK, it waits for a timeout before finally closing the connection, during which time the local port is unavailable for new connections; this state lets the TCP client resend the final acknowledgment to the server in case the ACK is lost in transit. The time duration is implementation-dependent, but some common values are 30 seconds, 1 minute, and 2 minutes. After the timeout, the client enters the CLOSED state and the local port becomes available for new connections. It is also possible to terminate the connection by a 3-way handshake, when host A sends a FIN and host B replies with a FIN & ACK (combining two steps into one) and host A replies with an ACK. Some operating systems, such as
Linux implement a half-duplex close sequence. If the host actively closes a connection, while still having unread incoming data available, the host sends the signal RST (losing any received data) instead of FIN. This assures that a TCP application is aware there was a
data loss. A connection can be in a
half-open state, in which case one side has terminated the connection, but the other has not. The side that has terminated can no longer send any data into the connection, but the other side can. The terminating side should continue reading the data until the other side terminates as well.
Resource usage Most implementations allocate an entry in a table that maps a session to a running operating system process. Because TCP packets do not include a session identifier, both endpoints identify the session using the client's address and port. Whenever a packet is received, the TCP implementation must perform a lookup on this table to find the destination process. Each entry in the table is known as a Transmission Control Block or TCB. It contains information about the endpoints (IP and port), status of the connection, running data about the packets that are being exchanged and buffers for sending and receiving data. The number of sessions in the server side is limited only by memory and can grow as new connections arrive, but the client must allocate an
ephemeral port before sending the first SYN to the server. This port remains allocated during the whole conversation and effectively limits the number of outgoing connections from each of the client's IP addresses. If an application fails to properly close unrequired connections, a client can run out of resources and become unable to establish new TCP connections, even from other applications. Both endpoints must also allocate space for unacknowledged packets and received (but unread) data.
Data transfer The Transmission Control Protocol differs in several key features compared to the
User Datagram Protocol: • Ordered data transfer: the destination host rearranges segments according to a sequence number Some TCP implementations use
selective acknowledgements (SACKs) to provide explicit feedback about the segments that have been received. This greatly improves TCP's ability to retransmit the right segments. Retransmission ambiguity can cause spurious fast retransmissions and congestion avoidance if there is reordering beyond the duplicate acknowledgment threshold. In the last two decades more packet reordering has been observed over the Internet which led TCP implementations, such as the one in the Linux Kernel to adopt heuristic methods to scale the duplicate acknowledgment threshold. Recently, there have been efforts to completely phase out duplicate-ACK-based fast-retransmissions and replace them with timer based ones. (Not to be confused with the classic RTO discussed below). The time based loss detection algorithm called Recent Acknowledgment (RACK) has been adopted as the default algorithm in Linux and Windows.
Timeout-based retransmission When a sender transmits a segment, it initializes a timer with a conservative estimate of the arrival time of the acknowledgment. The segment is retransmitted if the timer expires, with a new timeout threshold of twice the previous value, resulting in
exponential backoff behavior. Typically, the initial timer value is , where is the clock granularity. This guards against excessive transmission traffic due to faulty or malicious actors, such as
man-in-the-middle denial of service attackers. Accurate RTT estimates are important for loss recovery, as it allows a sender to assume an unacknowledged packet to be lost after sufficient time elapses (i.e., determining the RTO time). Retransmission ambiguity can lead a sender's estimate of RTT to be imprecise. In an environment with variable RTTs, spurious timeouts can occur: if the RTT is under-estimated, then the RTO fires and triggers a needless retransmit and slow-start. After a spurious retransmission, when the acknowledgments for the original transmissions arrive, the sender may believe them to be acknowledging the retransmission and conclude, incorrectly, that segments sent between the original transmission and retransmission have been lost, causing further needless retransmissions to the extent that the link truly becomes congested; selective acknowledgement can reduce this effect. specifies that implementations must not use retransmitted segments when estimating RTT.
Karn's algorithm ensures that a good RTT estimate will be produced—eventually—by waiting until there is an unambiguous acknowledgment before adjusting the RTO. After spurious retransmissions, however, it may take significant time before such an unambiguous acknowledgment arrives, degrading performance in the interim. TCP timestamps also resolve the retransmission ambiguity problem in setting the RTO, though they do not necessarily improve the RTT estimate.
Error detection Sequence numbers allow receivers to discard duplicate packets and properly sequence out-of-order packets. Acknowledgments allow senders to determine when to retransmit lost packets. To assure correctness a checksum field is included; see for details. The TCP checksum is a weak check by modern standards and is normally paired with a
CRC integrity check at
layer 2, below both TCP and IP, such as is used in
PPP or the
Ethernet frame. However, introduction of errors in packets between CRC-protected hops is common and the 16-bit TCP checksum catches most of these.
Flow control TCP uses an end-to-end
flow control protocol to avoid having the sender send data too fast for the TCP receiver to receive and process it reliably. Having a mechanism for flow control is essential in an environment where machines of diverse network speeds communicate. For example, if a PC sends data to a smartphone that is slowly processing received data, the smartphone must be able to regulate the data flow so as not to be overwhelmed.
TCP timestamps TCP timestamps, defined in in 1992, can help TCP determine in which order packets were sent. TCP timestamps are not normally aligned to the system clock and start at some random value. Many operating systems will increment the timestamp for every elapsed millisecond; however, the RFC only states that the ticks should be proportional. There are two timestamp fields: • a 4-byte sender timestamp value (my timestamp) • a 4-byte echo reply timestamp value (the most recent timestamp received from you). TCP timestamps are used in an algorithm known as
Protection Against Wrapped Sequence numbers, or
PAWS. PAWS is used when the receive window crosses the sequence number wraparound boundary. In the case where a packet was potentially retransmitted, it answers the question: "Is this sequence number in the first 4 GB or the second?" And the timestamp is used to break the tie. Also, the Eifel detection algorithm uses TCP timestamps to determine if retransmissions are occurring because packets are lost or simply out of order. TCP timestamps are enabled by default in Linux, and disabled by default in Windows Server 2008, 2012 and 2016. Recent Statistics show that the level of TCP timestamp adoption has stagnated, at ~40%, owing to Windows Server dropping support since Windows Server 2008.
Out-of-band data It is possible to interrupt or abort the queued stream instead of waiting for the stream to finish. This is done by specifying the data as
urgent. This marks the transmission as
out-of-band data (OOB) and tells the receiving program to process it immediately. When finished, TCP informs the application and resumes the stream queue. An example is when TCP is used for a remote login session where the user can send a keyboard sequence that interrupts or aborts the remotely running program without waiting for the program to finish its current transfer. Since the feature is not frequently used, it is not well tested on some platforms and has been associated with
vulnerabilities,
WinNuke for instance.
Forcing data delivery Normally, TCP waits for 200 ms for a full packet of data to send (
Nagle's Algorithm tries to group small messages into a single packet). This wait creates small, but potentially serious delays if repeated constantly during a file transfer. For example, a typical send block would be 4 KB, a typical MSS is 1460, so 2 packets go out on a 10 Mbit/s Ethernet taking ~1.2 ms each followed by a third carrying the remaining 1176 after a 197 ms pause because TCP is waiting for a full buffer. In the case of telnet, each user keystroke is echoed back by the server before the user can see it on the screen. This delay would become very annoying. Setting the
socket option TCP_NODELAY overrides the default 200 ms send delay. Application programs use this socket option to force output to be sent after writing a character or line of characters. The defines the PSH push bit as "a message to the receiving TCP stack to send this data immediately up to the receiving application". ==Vulnerabilities==