Contents
- Service Model
- Segment Structure
  - Header
- Connection Management
  - Connection Establishment - Three-Way Handshake
  - Connection Termination - Four-Way Handshake
- Reliable Data Transfer
  - Retransmission Timeout Mechanism
  - Adaptive Timeout Mechanism
  - Retransmission Ambiguity
- Flow Control
- Congestion Control
  - Congestion Window
  - Bandwidth Probing
  - TCP Congestion Control Algorithm
Service Model
- Endpoint-to-endpoint
- Full duplex
  - Maintains independent sequencing info for each direction
- Connection-oriented
  - Logical connection exists only in the end systems (not in intermediate devices)
  - Connection setup & release
    - Handshaking to establish shared state info
    - Connection state is kept until close
- Reliable in-order byte stream
  - Reliability
    - Checksum, sequence number, acknowledgment, retransmission, timeout
  - Stream-oriented
    - Packs application data (from one or more send() calls) into segments no larger than the Maximum Segment Size (MSS)
- Multiplexing/demultiplexing
  - Each connection is identified by the 4-tuple (my IP, my port, peer IP, peer port)
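Demultiplexing by 4-tuple can be sketched as a dictionary lookup. This is a minimal Python sketch, not how any real stack stores connections; `Demultiplexer`, `Connection`, `open` and `deliver` are hypothetical names. Unknown tuples are simply rejected here (a real stack would also consult listening sockets and may answer with RST).

```python
from dataclasses import dataclass, field

# (my IP, my port, peer IP, peer port), seen from this host's side
FourTuple = tuple[str, int, str, int]


@dataclass
class Connection:
    tuple4: FourTuple
    received: bytearray = field(default_factory=bytearray)


class Demultiplexer:
    """Hypothetical demultiplexer routing incoming segments to connections."""

    def __init__(self) -> None:
        self.connections: dict[FourTuple, Connection] = {}

    def open(self, my_ip: str, my_port: int, peer_ip: str, peer_port: int) -> Connection:
        key = (my_ip, my_port, peer_ip, peer_port)
        conn = Connection(key)
        self.connections[key] = conn
        return conn

    def deliver(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                payload: bytes) -> bool:
        # For an incoming segment, the destination is "my" side and the source is the peer.
        key = (dst_ip, dst_port, src_ip, src_port)
        conn = self.connections.get(key)
        if conn is None:
            return False  # no matching connection in this sketch
        conn.received += payload
        return True
```

Two connections to the same server port (say 80) from the same client IP stay separate because their peer ports differ, so a segment from port 5001 reaches only the connection whose tuple contains 5001.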
Segment Structure
Header
- Sequence number
  - Each byte has its own sequence #
  - Indicates the first byte in this segment
  - Each end generates its own initial sequence number (ISN) during connection setup and informs the peer during the three-way handshake
- Acknowledgement number field
  - Valid when the ACK bit is set in the header
  - Next byte expected to be received
  - Cumulative ACK (same as GBN)
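- Fixed 20-byte header + up to 40 bytes of options
- Each side determines its MSS and informs its peer during connection setup
  - Limited by the lower (link) layer's MTU (maximum transmission unit)
- Header length
  - In units of 4 bytes (32-bit words); the 4-bit field allows at most 15 x 4 = 60 bytes, i.e. the 20-byte fixed header plus 40 bytes of options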
- Flag field
  - URG: urgent data
  - ACK: segment carries an acknowledgement
  - PSH: receiver should pass the data to the upper layer immediately
  - RST: reset connection
  - SYN: used in connection setup
  - FIN: used in connection teardown
- Receive window
  - For flow control
  - Free buffer space remaining at the receiver's side, in bytes
  - Limits the size of the sender's send window
- Internet checksum
  - Same algorithm as UDP
- Options
  - MSS: largest segment a node wants to receive, announced during connection establishment
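The header fields above can be decoded with Python's `struct` module. This is a minimal sketch: `parse_tcp_header` and `internet_checksum` are hypothetical helpers, options are returned raw, and no validation beyond length is done. The checksum routine is the RFC 1071 one's-complement sum; note that the real TCP checksum is computed over a pseudo-header (IP addresses, protocol, length) plus the segment, which this sketch does not build.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 .. bit 5


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header; options and payload are returned raw."""
    if len(segment) < 20:
        raise ValueError("segment shorter than the fixed 20-byte header")
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # header length field counts 4-byte words
    flags = offset_flags & 0x3F             # low 6 bits: URG ACK PSH RST SYN FIN
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": {name for i, name in enumerate(FLAG_NAMES) if flags & (1 << i)},
        "window": window, "checksum": checksum, "urg_ptr": urg_ptr,
        "options": segment[20:header_len],
        "payload": segment[header_len:],
    }


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement of the one's-complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                     # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                      # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

For example, a header packed with a data offset of 5 and only the SYN bit set decodes to `header_len == 20` and `flags == {"SYN"}`.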
Connection Management
Connection Establishment - Three-Way Handshake
- If no process is listening on the server port, a reset segment (RST bit set) is sent back
- The receiver of a SYN must send back an ACK, since the SYN consumes one seq #
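The sequence and acknowledgement numbers of the handshake follow directly from "a SYN consumes one seq #". A minimal Python sketch, assuming no data is carried on the SYNs; `Segment`, `seq_add` and `three_way_handshake` are hypothetical names, and arithmetic wraps modulo 2^32 because sequence numbers are 32-bit.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


def seq_add(seq: int, n: int) -> int:
    return (seq + n) % SEQ_MOD


@dataclass
class Segment:
    flags: str
    seq: int
    ack: Optional[int] = None  # only meaningful when the ACK bit is set


def three_way_handshake(client_isn: int, server_isn: int) -> list[Segment]:
    """Return the SYN, SYN+ACK and ACK segments exchanged during setup."""
    syn = Segment("SYN", seq=client_isn)
    # The client's SYN consumes one sequence number, so the server ACKs client_isn + 1.
    syn_ack = Segment("SYN+ACK", seq=server_isn, ack=seq_add(client_isn, 1))
    # The server's SYN also consumes one, so the client ACKs server_isn + 1.
    ack = Segment("ACK", seq=seq_add(client_isn, 1), ack=seq_add(server_isn, 1))
    return [syn, syn_ack, ack]
```

With ISNs 100 and 300, the SYN+ACK carries ack = 101 and the final ACK carries seq = 101, ack = 301.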
Connection Termination - Four-Way Handshake
- Treats the bidirectional channel as 2 unidirectional channels; each direction is released independently
- Timed wait state
  - Lets the side that sent the last ACK receive a retransmitted FIN segment if that ACK was lost, and acknowledge it again
- Each FIN consumes 1 seq #
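The seq/ack values of the four-way close can be worked out the same way, since each FIN consumes one seq #. A minimal Python sketch, assuming A initiates the close and B sends no further data before its own FIN; `four_way_close` is a hypothetical helper returning `(sender, flags, seq, ack)` tuples.

```python
SEQ_MOD = 2 ** 32  # 32-bit sequence space


def four_way_close(a_next_seq: int, b_next_seq: int) -> list[tuple[str, str, int, int]]:
    """(sender, flags, seq, ack) for a close initiated by A; B sends no more data."""
    a_fin_acked = (a_next_seq + 1) % SEQ_MOD   # A's FIN consumes one sequence number
    b_fin_acked = (b_next_seq + 1) % SEQ_MOD   # and so does B's FIN
    return [
        ("A", "FIN+ACK", a_next_seq, b_next_seq),  # A has no more data to send
        ("B", "ACK", b_next_seq, a_fin_acked),     # A -> B direction is now closed
        ("B", "FIN+ACK", b_next_seq, a_fin_acked), # B closes its own direction later
        ("A", "ACK", a_fin_acked, b_fin_acked),    # A then enters the timed wait state
    ]
```

The two directions close independently: between the second and third segments B could still send data, in which case its FIN would carry a larger seq #.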
Reliable Data Transfer
- TCP window
  - Not a traditional sliding window
  - Used by the remote peer to limit the number of unACKed bytes that can be sent
- Handling of out-of-order segments is left to the implementation
- Cumulative acknowledgement
- Piggybacked acknowledgement
  - Data + acknowledgement sent together to save bandwidth
- Delayed ACK
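Since out-of-order handling is up to the implementation, one common choice is to buffer such segments and keep sending cumulative ACKs. A minimal Python sketch, assuming no sequence-number wraparound and no partially overlapping segments; `Receiver` and `on_segment` are hypothetical names.

```python
class Receiver:
    """Buffers out-of-order segments and returns cumulative ACK numbers."""

    def __init__(self, initial_seq: int) -> None:
        self.next_expected = initial_seq          # value placed in the ACK number field
        self.out_of_order: dict[int, bytes] = {}  # seq -> payload, held until the gap fills
        self.delivered = bytearray()              # in-order bytes handed to the application

    def on_segment(self, seq: int, payload: bytes) -> int:
        if seq >= self.next_expected:             # ignore data that was already delivered
            self.out_of_order.setdefault(seq, payload)
        # Deliver every buffered segment that starts exactly at the next expected byte.
        while self.next_expected in self.out_of_order:
            data = self.out_of_order.pop(self.next_expected)
            self.delivered += data
            self.next_expected += len(data)
        return self.next_expected                 # cumulative ACK: next byte expected
```

If the segment starting at byte 1100 is missing, a later segment at 1200 is buffered but still ACKed with 1100 (a duplicate ACK); once 1100 arrives, the ACK jumps straight to 1300.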
Retransmission Timeout Mechanism
- TCP may or may not transmit data immediately after receiving it from the application; transmission is triggered when:
  - The amount of buffered data reaches the MSS
  - There is URGENT data
  - TCP decides it is time to send
- Single retransmission timer (same as GBN)
- The timeout cannot simply be set to the measured RTT - RTT is too unstable!
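The single retransmission timer can be sketched as follows. This is a minimal Python sketch with a fixed timeout (adaptation and backoff are left out) and integer times in milliseconds; `SingleTimerSender` and its methods are hypothetical names. It follows the common textbook simplification in which a timeout retransmits only the oldest unACKed segment.

```python
from typing import Optional


class SingleTimerSender:
    """One timer covering the oldest unACKed segment (fixed timeout for simplicity)."""

    def __init__(self, timeout: int) -> None:
        self.timeout = timeout
        self.unacked: dict[int, bytes] = {}        # seq -> payload still awaiting ACK
        self.timer_deadline: Optional[int] = None  # None means the timer is stopped

    def send(self, seq: int, payload: bytes, now: int) -> None:
        self.unacked[seq] = payload
        if self.timer_deadline is None:            # start the timer only if it is not running
            self.timer_deadline = now + self.timeout

    def on_ack(self, ack: int, now: int) -> None:
        # Cumulative ACK: every segment ending at or before `ack` is acknowledged.
        acked = [s for s, p in self.unacked.items() if s + len(p) <= ack]
        for seq in acked:
            del self.unacked[seq]
        if acked:                                  # new data ACKed: restart or stop the timer
            self.timer_deadline = now + self.timeout if self.unacked else None

    def on_tick(self, now: int) -> Optional[int]:
        """Return the seq # to retransmit if the timer has expired, else None."""
        if self.timer_deadline is None or now < self.timer_deadline:
            return None
        self.timer_deadline = now + self.timeout   # restart for the retransmission
        return min(self.unacked)                   # oldest unACKed segment
```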
Adaptive Timeout Mechanism
SampleRTT = time from segment transmission to ACK receipt
EstimatedRTT_(i+1) = (1 - α) EstimatedRTT_i + α SampleRTT_(i+1)   (typical smoothing factor α = 0.125)
Difference = SampleRTT_(i+1) - EstimatedRTT_i
DevRTT_(i+1) = (1 - β) DevRTT_i + β |Difference|   (recommended β = 0.25)
TimeoutInterval = EstimatedRTT + 4 x DevRTT   (safety margin to absorb variations)
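The update rules above translate directly into a small estimator. A minimal Python sketch; `RttEstimator` is a hypothetical name, and the first-sample initialisation (EstimatedRTT = sample, DevRTT = sample / 2) is an assumption borrowed from RFC 6298, since the formulas above do not say how to start. Difference is computed from the old EstimatedRTT, matching the formulas.

```python
from typing import Optional


class RttEstimator:
    """EWMA estimates of EstimatedRTT and DevRTT, and the resulting TimeoutInterval."""

    def __init__(self, alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt: Optional[float] = None
        self.dev_rtt = 0.0

    def update(self, sample_rtt: float) -> float:
        if self.estimated_rtt is None:
            # First sample: initialisation borrowed from RFC 6298 (assumption).
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            difference = sample_rtt - self.estimated_rtt  # uses the old estimate
            self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * abs(difference)
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt
```

With samples of 100 ms and then 140 ms: the first gives EstimatedRTT = 100, DevRTT = 50, TimeoutInterval = 300; the second gives Difference = 40, DevRTT = 0.75 x 50 + 0.25 x 40 = 47.5, EstimatedRTT = 0.875 x 100 + 0.125 x 140 = 105, TimeoutInterval = 105 + 4 x 47.5 = 295.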
Retransmission Ambiguity
When an ACK comes back for a retransmitted segment, does it acknowledge the original transmission or the retransmission?
- Don't update the timeout from retransmitted segments
  - TCP doesn't derive the timeout interval for a retransmitted segment from EstimatedRTT and DevRTT
  - Instead, it doubles the timeout interval on each timeout (exponential backoff)
    - Data loss is mostly due to congestion, so backing off reduces the load on the network
  - For the next transmission, reuse this timeout value until the ACK for this segment is received
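The rule above (Karn's algorithm) plus exponential backoff can be sketched as follows. A minimal Python sketch; `KarnTimer` and its methods are hypothetical, and the `estimate` callback stands in for an EstimatedRTT/DevRTT estimator that maps an unambiguous sample to a new timeout.

```python
from typing import Callable


class KarnTimer:
    """Skips ambiguous RTT samples and doubles the timeout on every expiry."""

    def __init__(self, initial_timeout: float) -> None:
        self.timeout = initial_timeout
        self.retransmitted: set[int] = set()    # seq #s of segments sent more than once

    def on_timeout(self, seq: int) -> float:
        """Segment `seq` timed out: it is retransmitted with a doubled timeout."""
        self.retransmitted.add(seq)
        self.timeout *= 2                       # exponential backoff
        return self.timeout

    def on_ack(self, seq: int, sample_rtt: float,
               estimate: Callable[[float], float]) -> float:
        if seq in self.retransmitted:
            # Ambiguous: the ACK may answer either transmission, so the sample is
            # discarded and the backed-off timeout is kept.
            self.retransmitted.discard(seq)
            return self.timeout
        self.timeout = estimate(sample_rtt)     # unambiguous sample: recompute normally
        return self.timeout
```

Two consecutive timeouts on the same segment take a 1000 ms timeout to 2000 ms and then 4000 ms; the ACK for that segment leaves 4000 ms in place, and only the next ACK for a segment sent once feeds the estimator again.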
Flow Control
Regulates the sender's rate so that it does not send faster than the receiver can consume, preventing the receive buffer from overflowing.
RecvWin = RecvBuffer – (LastByteRcvd – LastByteRead)
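- Deadlock may occur if the receiver has advertised WIN = 0 and its later window update (ACK = 4096 WIN = 2048) is lost: the sender waits for a window update while the receiver waits for data
  - Solution: when RecvWin = 0, the sender periodically sends a 1-byte segment to probe the receiver, whose reply carries the current window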
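The RecvWin formula and the zero-window probe can be sketched as two small functions. A minimal Python sketch, assuming nothing is currently in flight; `receive_window` and `next_send_size` are hypothetical helpers, and `probe_timer_fired` stands in for a periodic probe timer that is not modelled here.

```python
def receive_window(recv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """RecvWin = RecvBuffer - (LastByteRcvd - LastByteRead)."""
    return recv_buffer - (last_byte_rcvd - last_byte_read)


def next_send_size(recv_win: int, unsent: int, probe_timer_fired: bool) -> int:
    """Bytes the sender transmits at its next opportunity (nothing in flight assumed)."""
    if recv_win > 0:
        return min(recv_win, unsent)  # never exceed the advertised window
    if unsent > 0 and probe_timer_fired:
        return 1                      # zero window: a 1-byte probe breaks the deadlock
    return 0                          # zero window: wait for the probe timer
```

With a 4096-byte buffer that is full and unread, RecvWin is 0 and the sender only probes; after the application reads 2048 bytes, RecvWin becomes 2048 and normal sending resumes.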
Congestion Control
Prevents senders from sending more data than the network resources can carry.
- Congestion effects
  - Longer delays
  - Packet loss
- Principles of congestion control
  - No explicit feedback from the network
  - Congestion is inferred from observed losses (timeouts & duplicate ACKs)
Congestion Window
Reflects the sender's estimate of the network capacity.
- Max. unACKed data at a time: min{RcvWin, CongWin}
  - (LastByteSent – LastByteAcked) <= min{RcvWin, CongWin}
- The sender sends no faster than the slowest component (receiver or network) allows
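The inequality above gives the number of bytes the sender may still put on the wire. A minimal Python sketch; `usable_window` is a hypothetical helper.

```python
def usable_window(rcv_win: int, cong_win: int,
                  last_byte_sent: int, last_byte_acked: int) -> int:
    """Bytes that may still be sent: min{RcvWin, CongWin} minus unACKed data in flight."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(rcv_win, cong_win) - in_flight)
```

With RcvWin = 8000, CongWin = 4000 and 2000 bytes in flight, the network is the bottleneck and 2000 more bytes may be sent; if RcvWin drops to 1000, the receiver becomes the bottleneck and nothing more may be sent until ACKs arrive.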
Bandwidth Probing
TCP Congestion Control Algorithm
- Slow start
  - Initially, set CongWin = 1 MSS, ssthresh = 64 KiB
  - Double CongWin every RTT (in practice, increase CongWin by 1 MSS for every ACK received)
  - When timeout
    - Set ssthresh = CongWin / 2
    - Set CongWin = 1 MSS
    - Increase exponentially until ssthresh
    - Increase linearly from then on -> congestion avoidance
- Congestion avoidance - additive increase, multiplicative decrease
  - Increment CongWin by 1 MSS per RTT (in practice, increase it a little for each ACK received: CongWin += MSS * MSS / CongWin)
  - When triple duplicate ACKs: likely data loss
    - Drop transmission rate by half -> fast recovery
  - When timeout
    - Set ssthresh = CongWin / 2
    - Set CongWin = 1 MSS -> slow start
- Fast recovery (+ fast retransmit)
  - Retransmit the segment that appears to be missing without waiting for a timeout
  - Drop transmission rate by half
    - Set ssthresh = CongWin / 2
    - Set CongWin = ssthresh + 3 * MSS (the 3 duplicate ACKs mean 3 segments have left the network)
    - When another duplicate ACK is received
      - Increment CongWin by 1 MSS
      - Transmit a new segment if allowed by the new CongWin
    - When a non-duplicate ACK is received
      - Set CongWin = ssthresh -> congestion avoidance
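The three phases above can be combined into one event-driven state machine. This is a minimal Python sketch of Reno-style behaviour as described in these notes, not a full implementation. `RenoWindow` and its methods are hypothetical names, windows are in bytes, and ssthresh is halved from CongWin as in the notes (RFC 5681 instead uses max(FlightSize / 2, 2 * SMSS)).

```python
class RenoWindow:
    """Tracks CongWin and ssthresh across slow start, congestion avoidance, fast recovery."""

    def __init__(self, mss: int = 1460, ssthresh: float = 64 * 1024) -> None:
        self.mss = mss
        self.cong_win: float = mss      # CongWin = 1 MSS
        self.ssthresh = ssthresh        # 64 KiB by default, as in the notes
        self.phase = "slow_start"       # or "congestion_avoidance" / "fast_recovery"
        self.dup_acks = 0

    def on_new_ack(self) -> None:
        self.dup_acks = 0
        if self.phase == "fast_recovery":
            self.cong_win = self.ssthresh            # deflate, then congestion avoidance
            self.phase = "congestion_avoidance"
        elif self.phase == "slow_start":
            self.cong_win += self.mss                # +1 MSS per ACK: doubles every RTT
            if self.cong_win >= self.ssthresh:
                self.phase = "congestion_avoidance"
        else:
            self.cong_win += self.mss * self.mss / self.cong_win  # about +1 MSS per RTT

    def on_dup_ack(self) -> bool:
        """Return True when the missing segment should be fast-retransmitted."""
        if self.phase == "fast_recovery":
            self.cong_win += self.mss                # each further duplicate: +1 MSS
            return False
        self.dup_acks += 1
        if self.dup_acks == 3:                       # triple duplicate ACK
            self.ssthresh = self.cong_win / 2
            self.cong_win = self.ssthresh + 3 * self.mss
            self.phase = "fast_recovery"
            return True
        return False

    def on_timeout(self) -> None:
        self.ssthresh = self.cong_win / 2
        self.cong_win = self.mss
        self.phase = "slow_start"
        self.dup_acks = 0
```

With MSS = 1000 and ssthresh = 4000, three new ACKs grow CongWin 1000 -> 2000 -> 3000 -> 4000 and switch to congestion avoidance; the next ACK adds 1000 * 1000 / 4000 = 250. A triple duplicate ACK then sets ssthresh = 2125 and CongWin = 2125 + 3000 = 5125, and the next new ACK deflates CongWin to 2125.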