# Jumbo Frames

## Contents

- [Overview](#overview)
- [Configuration](#configuration)
  - [RHEL/CentOS](#rhelcentos)
- [Basic Testing](#basic-testing)
  - [Node A to B](#node-a-to-b)
  - [Node B to A](#node-b-to-a)
- [Performance Testing](#performance-testing)
  - [Node A to B](#node-a-to-b-1)
  - [Node B to A](#node-b-to-a-1)
- [Protocol Overhead](#protocol-overhead)

## Overview

[Jumbo frames](http://en.wikipedia.org/wiki/Jumbo_frames) raise the Ethernet MTU so that large packets can be sent without fragmentation; this is a common need on a private GigE switched network carrying the Oracle RAC Interconnect between nodes. The accepted standard is an MTU of 9000, large enough for an 8k payload plus packet overhead. Once your GigE switches have been configured for the new MTU, configure and test your servers.

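Before changing anything, it is worth confirming the starting MTU on each node. A minimal check, assuming `bond1` is the interconnect interface from the example setup below:

```
# ip link show bond1
# cat /sys/class/net/bond1/mtu
```
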
## Configuration

**Example Setup**:

- Node **A**: 192.168.100.101 (bond1, eth1 + eth5)
- Node **B**: 192.168.100.102 (bond1, eth1 + eth5)

### RHEL/CentOS

On both nodes, change the running MTU and make it persistent:

```
# ip link set dev bond1 mtu 9000
# vi /etc/sysconfig/network-scripts/ifcfg-bond1
add: MTU=9000
```
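
For reference, a sketch of what `ifcfg-bond1` might look like after the edit. The addresses follow the example setup and the bonding options are assumptions (the doc's use of `ifenslave -c` implies an active-backup bond); the only actual change is the `MTU=9000` line:

```
# /etc/sysconfig/network-scripts/ifcfg-bond1 (sketch)
DEVICE=bond1
IPADDR=192.168.100.101
NETMASK=255.255.255.0
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=active-backup miimon=100"
# jumbo frames for the RAC Interconnect
MTU=9000
```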
## Basic Testing

Test each bond slave in turn: force it active with `ifenslave -c`, confirm the route and path MTU, then send full-size pings with the don't-fragment bit set.

### Node A to B

```
# ifenslave -c bond1 eth5
# ip route get 192.168.100.102
# tracepath -n 192.168.100.102
# ping -c 5 -s 8972 -M do 192.168.100.102

# ifenslave -c bond1 eth1
# ip route get 192.168.100.102
# tracepath -n 192.168.100.102
# ping -c 5 -s 8972 -M do 192.168.100.102
```
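
The payload size of 8972 falls straight out of the MTU; since `-M do` forbids fragmentation, the ping only succeeds if the full 9000-byte packet fits end to end:

```
9000 (MTU) - 20 (IPv4 header) - 8 (ICMP header) = 8972 bytes of ping payload
```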

### Node B to A

```
# ifenslave -c bond1 eth5
# ip route get 192.168.100.101
# tracepath -n 192.168.100.101
# ping -c 5 -s 8972 -M do 192.168.100.101

# ifenslave -c bond1 eth1
# ip route get 192.168.100.101
# tracepath -n 192.168.100.101
# ping -c 5 -s 8972 -M do 192.168.100.101
```

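After each `ifenslave -c`, you can confirm which slave actually went active by reading the bonding status from procfs (assuming an active-backup bond):

```
# grep "Currently Active Slave" /proc/net/bonding/bond1
```
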
## Performance Testing

Use `iperf` (available via EPEL) for throughput measurements; the runs below push UDP across each bond slave in both directions.

### Node A to B

```
Node B: (receiver)
# ifenslave -c bond1 eth5
# iperf -B 192.168.100.102 -s -u -l 8972 -w 768k

Node A: (sender)
# ifenslave -c bond1 eth5
# iperf -B 192.168.100.101 -c 192.168.100.102 -u \
    -b 10G -l 8972 -w 768k -i 2 -t 30

Node B: (receiver)
# ifenslave -c bond1 eth1
# iperf -B 192.168.100.102 -s -u -l 8972 -w 768k

Node A: (sender)
# ifenslave -c bond1 eth1
# iperf -B 192.168.100.101 -c 192.168.100.102 -u \
    -b 10G -l 8972 -w 768k -i 2 -t 30
```
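
For comparison against the TCP theoretical maximum in the [Protocol Overhead](#protocol-overhead) section, the same pair can run a TCP pass. A sketch: drop `-u` and the UDP datagram size, keeping the same window:

```
Node B: (receiver)
# iperf -B 192.168.100.102 -s -w 768k

Node A: (sender)
# iperf -B 192.168.100.101 -c 192.168.100.102 -w 768k -i 2 -t 30
```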

### Node B to A

```
Node A: (receiver)
# ifenslave -c bond1 eth5
# iperf -B 192.168.100.101 -s -u -l 8972 -w 768k

Node B: (sender)
# ifenslave -c bond1 eth5
# iperf -B 192.168.100.102 -c 192.168.100.101 -u \
    -b 10G -l 8972 -w 768k -i 2 -t 30

Node A: (receiver)
# ifenslave -c bond1 eth1
# iperf -B 192.168.100.101 -s -u -l 8972 -w 768k

Node B: (sender)
# ifenslave -c bond1 eth1
# iperf -B 192.168.100.102 -c 192.168.100.101 -u \
    -b 10G -l 8972 -w 768k -i 2 -t 30
```

## Protocol Overhead

Reference: [Theoretical Maximums and Protocol Overhead](http://sd.wareonearth.com/~phil/net/overhead/):

```
Theoretical maximum TCP throughput on GigE using jumbo frames:

  (9000-20-20-12)/(9000+14+4+7+1+12)*1000000000/1000000 = 990.042 Mbps

  where, term by term:
    9000       = MTU
    20         = IP header
    20         = TCP header
    12         = TCP options (timestamp)
    9000       = MTU
    14         = Ethernet header
    4          = FCS
    7          = preamble
    1          = start frame delimiter (SFD)
    12         = interframe gap (IFG), aka interpacket gap (IPG);
                 a minimum of 96 bit times from the last bit of the
                 FCS to the first bit of the preamble
    1000000000 = GigE bit rate; dividing by 1000000 converts to Mbps

Theoretical maximum UDP throughput on GigE using jumbo frames:
  (9000-20-8)/(9000+14+4+7+1+12)*1000000000/1000000 = 992.697 Mbps

Theoretical maximum TCP throughput on GigE without using jumbo frames:
  (1500-20-20-12)/(1500+14+4+7+1+12)*1000000000/1000000 = 941.482 Mbps

Theoretical maximum UDP throughput on GigE without using jumbo frames:
  (1500-20-8)/(1500+14+4+7+1+12)*1000000000/1000000 = 957.087 Mbps

Ethernet frame format:
  * 6 byte dest addr
  * 6 byte src addr
  * [4 byte optional 802.1q VLAN Tag]
  * 2 byte length/type
  * 46-1500 byte data (payload)
  * 4 byte CRC

Ethernet overhead bytes:
  12 gap + 8 preamble + 14 header + 4 trailer = 38 bytes/packet w/o 802.1q
  12 gap + 8 preamble + 18 header + 4 trailer = 42 bytes/packet with 802.1q

Ethernet payload data rates are thus:
  1500/(38+1500) = 97.5293 % w/o 802.1q tags
  1500/(42+1500) = 97.2763 % with 802.1q tags

TCP over Ethernet:
  Assuming no header compression (e.g. not PPP)
  Add 20 IPv4 header or 40 IPv6 header (no options)
  Add 20 TCP header
  Add 12 bytes optional TCP timestamps
  Max TCP payload data rates over Ethernet are thus:
    (1500-40)/(38+1500) = 94.9285 % IPv4, minimal headers
    (1500-52)/(38+1500) = 94.1482 % IPv4, TCP timestamps
    (1500-52)/(42+1500) = 93.9040 % 802.1q, IPv4, TCP timestamps
    (1500-60)/(38+1500) = 93.6281 % IPv6, minimal headers
    (1500-72)/(38+1500) = 92.8479 % IPv6, TCP timestamps
    (1500-72)/(42+1500) = 92.6070 % 802.1q, IPv6, TCP timestamps

UDP over Ethernet:
  Add 20 IPv4 header or 40 IPv6 header (no options)
  Add 8 UDP header
  Max UDP payload data rates over Ethernet are thus:
    (1500-28)/(38+1500) = 95.7087 % IPv4
    (1500-28)/(42+1500) = 95.4604 % 802.1q, IPv4
    (1500-48)/(38+1500) = 94.4083 % IPv6
    (1500-48)/(42+1500) = 94.1634 % 802.1q, IPv6
```
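
The arithmetic is easy to sanity-check from the shell, for example with awk (the factor of 1000 is just 1000000000/1000000 collapsed):

```
# awk 'BEGIN { printf "%.3f Mbps\n", (9000-20-20-12)/(9000+14+4+7+1+12)*1000 }'
990.042 Mbps
```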