IPv6: Getting rid of the dreaded "Neighbour table overflow"
IPv6 is hard. It has many, many design flaws, and the decade we all spent ignoring it and hoping for the best hasn't helped. So now we're all in on the protocol. Yeah.
One of the design principles is that it tries to be rather stateless in the configuration and "plug and play". But just like P&P in the good old ISA times, it just doesn't always work.
One of the common issues is that Linux bridges just don't work well with the IPv6 router advertisements that try to discover and configure the IPv6 neighbourhood.
The result is a seemingly endless stream of "kernel: Neighbour table overflow." lines flooding dmesg and syslog (or the journal for those on systemd).
Oct 4 16:26:06 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:11 host-260 kernel: __ratelimit: 1832 callbacks suppressed
Oct 4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:16 host-260 kernel: __ratelimit: 887 callbacks suppressed
Oct 4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct 4 16:26:23 host-260 kernel: __ratelimit: 803 callbacks suppressed
Lovely. Welcome to a storage DoS waiting to happen.
So first tip:
cat /proc/sys/kernel/printk_ratelimit
shows you the number of seconds for which the rate limiter suppresses messages. The default is 5 seconds, and you can raise it to a more reasonable value if you get flooded as heavily as in the example above. Note that this makes your dmesg rather useless, as the kernel is not very selective about which messages it suppresses.
Now when you google "Neighbour table overflow", you'll find thousands of pages suggesting that you increase the ARP / lladdr caches and garbage collection (gc) times like so:
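The interval is only half of the story: the kernel also has a burst setting, which is why exactly ten lines slip through between the "callbacks suppressed" notices in the log above. Here is a minimal sketch that prints both values. It assumes the usual procfs paths; the function name show_printk_ratelimit and the PROC_ROOT override are made up for illustration and are not part of any tool.

```shell
# Print the printk rate limit as "burst per interval".
# PROC_ROOT is a hypothetical override (handy for dry runs); it defaults to /proc.
show_printk_ratelimit() {
    proc_root="${PROC_ROOT:-/proc}"
    interval=$(cat "$proc_root/sys/kernel/printk_ratelimit")
    burst=$(cat "$proc_root/sys/kernel/printk_ratelimit_burst")
    echo "up to ${burst} messages per ${interval}-second window"
}

# To change the interval on a live system (needs root), something like:
#   sysctl -w kernel.printk_ratelimit=30
```

Whatever value you pick applies to every rate-limited kernel message, not just the neighbour table ones.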
# Set ARP cache garbage collection interval
net.ipv4.neigh.default.gc_interval = 3600
net.ipv6.neigh.default.gc_interval = 3600
# Set ARP cache entry timeout
net.ipv4.neigh.default.gc_stale_time = 3600
net.ipv6.neigh.default.gc_stale_time = 3600
# Setup cache threshold for ARP
net.ipv4.neigh.default.gc_thresh1 = 1024
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh3 = 4096
# And the same for IPv6
net.ipv6.neigh.default.gc_thresh1 = 1024
net.ipv6.neigh.default.gc_thresh2 = 2048
net.ipv6.neigh.default.gc_thresh3 = 4096
That helps if and only if you really have 500+ IPv6 neighbours. Unless you have a badly segmented network or run in a university lab, you don't.
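Before touching any thresholds, it is worth checking how many neighbours you actually have. The following is a rough sketch, not a polished tool: neigh_headroom is a hypothetical helper name, and it simply compares a count against gc_thresh3, the hard upper limit of the table.

```shell
# Compare a neighbour count with the hard limit (gc_thresh3).
# Usage: neigh_headroom <entries> <gc_thresh3>
neigh_headroom() {
    entries="$1"
    limit="$2"
    if [ "$entries" -ge "$limit" ]; then
        echo "table full: $entries entries, hard limit $limit"
    else
        echo "ok: $entries of $limit entries used"
    fi
}

# On a live system (needs iproute2 and sysctl):
#   neigh_headroom "$(ip -6 neigh show | wc -l)" \
#                  "$(sysctl -n net.ipv6.neigh.default.gc_thresh3)"
```

If this reports a few dozen entries while the kernel still complains, bigger caches are unlikely to be the real fix.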
Now ... you may be seeing messages like "kernel: vmbr0: Multicast hash table maximum of 512 reached, disabling snooping: eth0" or "kernel: vmbr0: Multicast hash table chain limit reached: eth0" in your dmesg / syslog / journal.
That hints at what is really happening here: the bridge confuses the link-local router negotiation, so you get endless ff02:: neighbour routing entries added to your caches until they overflow. Increasing the caches as in the sysctl entries above just puts a band-aid on the problem.
ip route show cache table all
will show you the tables. With all entries. See if you have too many ff02:: neighbours in there.
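Eyeballing a long table gets old quickly, so here is a small sketch that counts the suspicious entries for you. count_ff02 is a hypothetical helper; it just counts input lines that mention an ff02: address, which is a crude match and an assumption about what the output looks like on your system.

```shell
# Count lines on stdin that mention an ff02: (link-local multicast) address.
count_ff02() {
    awk '/ff02:/ { n++ } END { print n + 0 }'
}

# On a live system:
#   ip route show cache table all | count_ff02
```

A handful of matches is normal. Hundreds or thousands point at the bridge problem described above.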
If so, try changing your /etc/network/interfaces on Debian / Ubuntu to something like this:
iface vmbr0 inet6 static
    address 2a02:0100:1:1::500:1
    netmask 64
    gateway 2a02:0100:1:1::1
    post-up echo 2048 > /sys/class/net/vmbr0/bridge/hash_max
    post-up echo 1 > /sys/class/net/vmbr0/bridge/multicast_snooping
    post-up echo 0 > /proc/sys/net/ipv6/conf/vmbr0/accept_ra
This obviously assumes your bridge is called vmbr0.
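After bringing the bridge up, you may want to confirm that the three post-up lines actually took effect. This is a minimal sketch under the assumption that the sysfs and procfs paths match the ones in the config above; show_bridge_settings, SYS_ROOT and PROC_ROOT are hypothetical names used only for this example.

```shell
# Print the three bridge settings touched by the post-up lines.
# Usage: show_bridge_settings [bridge]   (defaults to vmbr0)
# SYS_ROOT and PROC_ROOT are hypothetical overrides; they default to /sys and /proc.
show_bridge_settings() {
    bridge="${1:-vmbr0}"
    sys_root="${SYS_ROOT:-/sys}"
    proc_root="${PROC_ROOT:-/proc}"
    echo "hash_max=$(cat "$sys_root/class/net/$bridge/bridge/hash_max")"
    echo "multicast_snooping=$(cat "$sys_root/class/net/$bridge/bridge/multicast_snooping")"
    echo "accept_ra=$(cat "$proc_root/sys/net/ipv6/conf/$bridge/accept_ra")"
}

# On a live system:
#   show_bridge_settings vmbr0
```

With the config above applied you would expect hash_max=2048, multicast_snooping=1 and accept_ra=0.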
Red Hat/CentOS users will need to adjust the config spread throughout multiple files in /etc/sysconfig/network-scripts. The ifup-ipv6 script is a good one to look at and amend.
Increasing hash_max lets your bridge survive the initial storm of (useless) router solicitations.
multicast_snooping is usually off when routing, but you may need it to make sure your VMs on the bridge can be reached.
Finally, we make sure the bridge does not accept router advertisements, because that is what the host system should handle.
Sometimes you may need to throw in a static route or two to reach the VMs. P&P, you remember ...
ip -6 neigh add nud permanent proxy <VM:IPv6:goes::here> dev vmbr0
is your friend.
Unfortunately the antidote for the dreaded "Neighbour table overflow" depends on the specific cause. So you'll have to poke around a bit.
tcpdump -i eth0 -v ip6
will show you what is on the wire and
tcpdump -i vmbr0 -v ip6
what's visible on the bridge.
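A plain ip6 capture can be noisy on a busy host. One possible way to narrow it down is to filter on the ICMPv6 message types used for router and neighbour discovery (133 to 136). The sketch below only builds the filter string; nd_filter is a hypothetical helper name, and the ip6[40] offset assumes the packet carries no IPv6 extension headers.

```shell
# Build a pcap filter for ICMPv6 types 133-136:
# router solicitation, router advertisement,
# neighbour solicitation, neighbour advertisement.
# ip6[40] is the ICMPv6 type byte when there are no extension headers.
nd_filter() {
    echo 'icmp6 and ip6[40] >= 133 and ip6[40] <= 136'
}

# On a live system (needs root):
#   tcpdump -i vmbr0 -n -v "$(nd_filter)"
```

Running this once on eth0 and once on vmbr0 makes it easier to see which side the solicitation storm is coming from.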