
IPv6: Getting rid of the dreaded "Neighbour table overflow"

Internet

IPv6 is hard. It has many, many design flaws, and the decade in which we all ignored it and hoped for the best hasn't helped. So now we're all in on the protocol. Yeah.

One of its design principles is that configuration should be rather stateless and "plug and play". But just like P&P in the good old ISA days, it doesn't always work.

One of the common issues is that Linux bridges just don't work well with the IPv6 router advertisements that are meant to discover and configure the IPv6 neighbourhood.

The result is a seemingly endless stream of "kernel: Neighbour table overflow." lines flooding dmesg and syslog (or the journal, for those on systemd):

Oct  4 16:26:06 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:11 host-260 kernel: __ratelimit: 1832 callbacks suppressed
Oct  4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:11 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:16 host-260 kernel: __ratelimit: 887 callbacks suppressed
Oct  4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:16 host-260 kernel: Neighbour table overflow.
Oct  4 16:26:23 host-260 kernel: __ratelimit: 803 callbacks suppressed

Grep -c(ount) on syslog

Lovely. Welcome to a storage DoS waiting to happen.
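The lines that make it into the log are only the tip of the iceberg, because every "callbacks suppressed" line stands for many more. A minimal sketch that adds both up; count_neigh_flood is a made-up name, and it assumes a classic syslog file or the output of journalctl -k on stdin. The pattern also matches the "neighbor table overflow!" wording of newer kernels.

```shell
#!/bin/bash
# count_neigh_flood: hypothetical helper, not part of any package.
# Counts logged "Neighbour table overflow" lines plus the messages the
# kernel rate limiter suppressed. Reads a file, or stdin if none is given.
count_neigh_flood() {
    awk '
        /[Nn]eighbou?r table overflow/ { logged++ }
        /callbacks suppressed/ {
            # the counter is the field right before "callbacks"
            for (i = 2; i <= NF; i++)
                if ($i == "callbacks") suppressed += $(i - 1)
        }
        END { printf "logged=%d suppressed=%d total=%d\n", logged, suppressed, logged + suppressed }
    ' "${1:--}"
}
```

Run it as count_neigh_flood /var/log/syslog or journalctl -k | count_neigh_flood. For the excerpt above that is 21 logged lines plus 1832 + 887 + 803 = 3522 suppressed ones, all within about 17 seconds.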

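So, first tip: cat /proc/sys/kernel/printk_ratelimit shows you the length, in seconds, of the window the rate limiter uses to suppress messages (default: 5), and /proc/sys/kernel/printk_ratelimit_burst shows how many messages it lets through before it kicks in (default: 10). Networking messages such as this one are rate limited via net_ratelimit(), whose equivalent knobs are net.core.message_cost and net.core.message_burst (same defaults). You can adjust these to more reasonable values if you get heavily flooded, as in the example above. Note that this makes your dmesg rather useless, as the kernel is not very selective about which messages it suppresses.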
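To see what your kernel currently uses, both the printk knobs and the networking ones (net.core.message_cost / message_burst) can be read in one go. A small sketch; show_ratelimits is a made-up name, and PROC_SYS exists only so the function can be pointed at a copy of /proc/sys for testing.

```shell
#!/bin/bash
# show_ratelimits: hypothetical helper printing the kernel's message
# rate-limit settings. PROC_SYS defaults to the live /proc/sys tree.
show_ratelimits() {
    local root="${PROC_SYS:-/proc/sys}" f name
    for f in kernel/printk_ratelimit kernel/printk_ratelimit_burst \
             net/core/message_cost net/core/message_burst; do
        # sysctl name, e.g. net/core/message_cost -> net.core.message_cost
        name=$(printf '%s' "$f" | tr / .)
        if [ -r "$root/$f" ]; then
            printf '%s = %s\n' "$name" "$(cat "$root/$f")"
        else
            printf '%s = (not available)\n' "$name"
        fi
    done
}
```

To change a value at runtime, sysctl -w net.core.message_cost=30 (as root) does the trick; just remember that a longer window means more messages get dropped.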

Now, when you google "Neighbour table overflow", you'll find thousands of pages suggesting you increase the ARP / lladdr caches and the garbage collection (gc) times, like so:

# Set ARP cache garbage collection interval
net.ipv4.neigh.default.gc_interval = 3600
net.ipv6.neigh.default.gc_interval = 3600

# Set ARP cache entry timeout
net.ipv4.neigh.default.gc_stale_time = 3600
net.ipv6.neigh.default.gc_stale_time = 3600

# Setup cache threshold for ARP
net.ipv4.neigh.default.gc_thresh1 = 1024
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh3 = 4096

# And the same for IPv6
net.ipv6.neigh.default.gc_thresh1 = 1024
net.ipv6.neigh.default.gc_thresh2 = 2048
net.ipv6.neigh.default.gc_thresh3 = 4096

That helps if and only if you really have 500+ IPv6 neighbours. Unless you have a badly segmented network or run in a university lab, you don't.
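Before raising any thresholds, it's worth checking how close you actually are. On a stock kernel gc_thresh1/2/3 default to 128, 512 and 1024. A minimal sketch, assuming ip from iproute2; neigh_pressure is a hypothetical name, and it reads the neighbour list on stdin so you can feed it ip -6 neigh show.

```shell
#!/bin/bash
# neigh_pressure: hypothetical helper. Counts IPv6 neighbour entries read
# from stdin and compares them with the gc_thresh2/gc_thresh3 settings.
neigh_pressure() {
    local root="${PROC_SYS:-/proc/sys}/net/ipv6/neigh/default"
    local count t2 t3
    if [ ! -r "$root/gc_thresh3" ]; then
        echo "neigh_pressure: cannot read $root" >&2
        return 1
    fi
    count=$(grep -c . || true)          # number of non-empty input lines
    t2=$(cat "$root/gc_thresh2")
    t3=$(cat "$root/gc_thresh3")
    echo "entries=$count gc_thresh2=$t2 gc_thresh3=$t3"
    if [ "$count" -ge "$t3" ]; then
        echo "at or above gc_thresh3: the table is full"
    elif [ "$count" -ge "$t2" ]; then
        echo "above gc_thresh2: garbage collection is under pressure"
    else
        echo "well below the limits: bigger tables are unlikely to be the fix"
    fi
}
```

Usage: ip -6 neigh show | neigh_pressure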

Now ... you may be seeing messages like "kernel: vmbr0: Multicast hash table maximum of 512 reached, disabling snooping: eth0" or "kernel: vmbr0: Multicast hash table chain limit reached: eth0" in your dmesg / syslog / journal.

That hints at what is really happening here: the bridge confuses link-local router discovery, so you get endless ff02:: (link-local multicast) neighbour entries added to your caches until they overflow. Increasing the caches with the sysctl entries above is basically sticking a band-aid over the problem.
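To put a number on it, you can count how many cached entries are ff02:: addresses compared to everything else. A sketch; count_ff02 is a made-up name, it reads ip output on stdin, and it assumes entries from the local table may be prefixed with a route type such as "multicast".

```shell
#!/bin/bash
# count_ff02: hypothetical helper. Reads `ip -6 route show cache table all`
# (or `ip -6 neigh show`) output on stdin and counts link-local multicast
# (ff02::) entries versus everything else.
count_ff02() {
    awk '
        NF == 0 || /^[ \t]/ { next }      # skip blank and continuation lines ("    cache")
        {
            addr = $1
            # entries from the local table start with a route type keyword
            if (addr ~ /^(unicast|local|broadcast|multicast|anycast|unreachable|blackhole|prohibit|throw)$/)
                addr = $2
            if (tolower(addr) ~ /^ff02:/) mcast++
            else other++
        }
        END { printf "ff02=%d other=%d\n", mcast, other }
    '
}
```

Usage: ip -6 route show cache table all | count_ff02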

ip route show cache table all will show you the tables, with all entries (use ip -6 route show cache table all for the IPv6 ones). See whether you have too many ff02:: neighbours in there. If so, try changing your /etc/network/interfaces on Debian / Ubuntu along these lines:

iface vmbr0 inet6 static
   address 2a02:0100:1:1::500:1
   netmask 64
   gateway 2a02:0100:1:1::1
   post-up echo 2048 > /sys/class/net/vmbr0/bridge/hash_max
   post-up echo 1 > /sys/class/net/vmbr0/bridge/multicast_snooping
   post-up echo 0 > /proc/sys/net/ipv6/conf/vmbr0/accept_ra

This obviously assumes your bridge is called vmbr0.
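If you'd rather try the settings on a running system before touching /etc/network/interfaces, the same three writes can be done by hand (as root). A minimal sketch; tame_bridge, SYSFS and PROC_SYS are made-up names, the latter two only being overrides for testing.

```shell
#!/bin/bash
# tame_bridge: hypothetical helper applying the same three settings as the
# post-up lines above to a running bridge, without an ifdown/ifup cycle.
# SYSFS and PROC_SYS exist only as overrides for testing; leave them unset.
tame_bridge() {
    local br="${1:-vmbr0}"
    local sys="${SYSFS:-/sys}" proc="${PROC_SYS:-/proc/sys}"
    local brdir="$sys/class/net/$br/bridge"
    if [ ! -d "$brdir" ]; then
        echo "tame_bridge: $br is not a bridge" >&2
        return 1
    fi
    echo 2048 > "$brdir/hash_max" || return 1                 # bigger multicast hash table
    echo 1 > "$brdir/multicast_snooping" || return 1          # keep snooping on
    echo 0 > "$proc/net/ipv6/conf/$br/accept_ra" || return 1  # ignore router advertisements
}
```

Usage: tame_bridge vmbr0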

Red Hat/CentOS users will need to adjust the config spread throughout multiple files in /etc/sysconfig/network-scripts. The ifup-ipv6 script is a good one to look at and amend.

Increasing hash_max makes your bridge survive the initial storm of (useless) router solicitations.
multicast_snooping is usually off when routing, but you may need it to make sure your VMs on the bridge can be reached.
Finally, we make sure the bridge does not accept router advertisements, because that is for the host system to handle.

Sometimes you may need to throw in a static route or a proxy neighbour entry or two to reach the VMs. P&P, remember... ip -6 neigh add nud permanent proxy <VM:IPv6:goes::here> dev vmbr0 is your friend. Unfortunately, the antidote to the dreaded "Neighbour table overflow" depends on the specific cause, so you'll have to poke around a bit. tcpdump -i eth0 -v ip6 shows you what is on the wire, and tcpdump -i vmbr0 -v ip6 what is visible on the bridge.
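As far as I know, the kernel only answers on behalf of such proxy entries when proxy NDP (net.ipv6.conf.<interface>.proxy_ndp) and IPv6 forwarding are enabled. A sketch that turns on proxy NDP and adds one entry per VM address; add_vm_proxies and DRY_RUN are hypothetical names, forwarding is assumed to be on already, and 2001:db8:: is the documentation prefix.

```shell
#!/bin/bash
# add_vm_proxies: hypothetical helper. Enables proxy NDP on an interface
# and adds one proxy neighbour entry per VM address given.
# With DRY_RUN=1 it only prints the commands instead of running them.
add_vm_proxies() {
    [ $# -ge 1 ] || { echo "usage: add_vm_proxies dev addr..." >&2; return 2; }
    local dev="$1" run="" addr
    shift
    [ "${DRY_RUN:-0}" = 1 ] && run=echo
    # proxy entries are only honoured with proxy_ndp (and forwarding) enabled
    $run sysctl -w "net.ipv6.conf.${dev}.proxy_ndp=1" || return 1
    for addr in "$@"; do
        $run ip -6 neigh add proxy "$addr" dev "$dev" || return 1
    done
}
```

Try DRY_RUN=1 add_vm_proxies vmbr0 2001:db8::10 first to see what it would do, then run it as root without DRY_RUN.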

iTunes starts but does not show the main application window

IT

Every once in a while iTunes on Windows decides to start but does not show the main application window when trying to sync an iPhone (or any other iDevice).
Maybe it pops up after half an hour, maybe not.

This behavior is caused by iTunes waiting for its Bonjour zeroconf network service. Unfortunately, although iTunes is updated very frequently, this bug has persisted for years now.
The workaround is easy, though:

Open up CMD as Administrator and type

net stop "bonjour service"

and iTunes should pop up its window a few moments later. It will complain about Bonjour not running, but Bonjour is only needed for network self-discovery, which you usually don't need anyway.

If you do, you can start Bonjour again with

net start "bonjour service"

at any time, even while iTunes is running. Once iTunes has shown its main window, you're fine for the rest of the session.

Screenshot of iTunes and the CMD window

Keeping IRC nicks active

IRC

Typical IRC services allow you to register with NickServ and link a number of nicks to a personal account. It's quite common to have nick, nick_ and nick__, because many IRC clients auto-append underscores if the primary nickname is already in use when connecting. Obviously you can set these alternate nicknames to almost anything you like in a decent client.

Some folks also group a "vanity" nickname or two for whatever reason. To keep these active, people do the "nick shuffle" (/nick newnick, /nick oldnick) all the time:

nick shuffle on freenode

People who forget the occasional nick shuffle may end up losing a grouped nick because it became inactive. While freenode staff try to contact people before dropping linked nicks, there are occasional prunes of "old data" from the services database, and then nobody can really ask beforehand.

So before the next big purge comes along, I wrote a small bash script that logs into a NickServ account and cycles through the linked nicks. A few friends and I have used it successfully for many months now.

Grab a copy of keepnick (2.4kB) and drop it into /usr/local/bin.

Keepnick expects an account name, the corresponding password and then a sequence of linked nicks on its command line.

Something like

/usr/local/bin/keepnick accountname passw0rd linked_nick linked_nick_ vanity_nick MyOtherNick

should work.
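I'm not reproducing keepnick here, but the conversation it has with the server is simple enough to sketch. The following is a hypothetical illustration, not the actual script: irc_keepnick_lines is a made-up name, and it assumes Atheme-style services (as on freenode) that accept IDENTIFY <account> <password>. It only prints the raw IRC lines.

```shell
#!/bin/bash
# irc_keepnick_lines: hypothetical sketch, NOT the actual keepnick script.
# Prints the raw IRC protocol lines (CRLF-terminated, as IRC requires)
# for logging into a NickServ account and cycling through linked nicks.
irc_keepnick_lines() {
    [ $# -ge 3 ] || { echo "usage: irc_keepnick_lines account password nick..." >&2; return 2; }
    local account="$1" password="$2" nick
    shift 2
    printf 'NICK %s\r\n' "$1"                                # connect as the first linked nick
    printf 'USER %s 0 * :%s\r\n' "$account" "keepnick sketch"
    printf 'PRIVMSG NickServ :IDENTIFY %s %s\r\n' "$account" "$password"
    for nick in "$@"; do
        printf 'NICK %s\r\n' "$nick"                         # the actual nick shuffle
    done
    printf 'QUIT :nick shuffle done\r\n'
}
```

A real client also has to read the server's replies: wait for the 001 welcome before talking to NickServ and answer every PING with a PONG. And keep in mind that passwords given on the command line show up in ps output.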

For regular use, you need a cron job that calls keepnick, e.g. every week. So put something like the following script into /etc/cron.weekly/keepnicks_irc (make it executable, and readable by root only, as it contains passwords), or create a corresponding crontab entry for keepnicks_irc if you do not have the convenient cron.* directories set up:

#!/bin/bash
#
# run keepnick for user(s) irc account(s)
# intended to be run from cron, e.g. through /etc/cron.weekly
#

KEEPNICK="/usr/local/bin/keepnick"
# better safe than sorry
PATH="/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin"
export PATH

"$KEEPNICK" accountname1 passw0rd1 linked_nick1 linked_nick1_ linked_nick1__
"$KEEPNICK" accountname2 passw0rd2 linked_nick2 linked_nick2_ linked_nick2__

You should now see keepnick in action every week, like this:

keepnick in action

What happens here is that the IRC services tell you that keepnick has just authenticated to your account and will now shuffle through all the nicks you asked it to. The big advantage is that it does this outside of channels, so it doesn't annoy any users. The cron job makes sure you don't forget the nick shuffle anymore.

Making sure your bash supports network connections

Stock bash supports network connections, but on Debian and old (pre-Karmic) Ubuntu that capability was disabled at compile time.

If you need to check whether your bash is compiled with network support, type cat < /dev/tcp/time.nist.gov/13 into a bash terminal.

If that gives you an RFC 867 time string, you're all fine. If not, recompile your bash with --enable-net-redirections.
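If the machine has no Internet access, you can still tell the two cases apart locally: a bash without network redirections treats /dev/tcp/... as an ordinary (non-existent) file, while one with them attempts a real connection. A sketch; has_bash_net and BASH_BIN are made-up names, and port 1 on localhost is simply assumed to be closed.

```shell
#!/bin/bash
# has_bash_net: hypothetical check. Returns 0 if bash (or $BASH_BIN) was
# built with network redirections, 1 otherwise. No Internet access needed.
has_bash_net() {
    local bash_bin="${BASH_BIN:-$(command -v bash)}" err
    [ -n "$bash_bin" ] || return 1
    # Any error other than ENOENT ("Connection refused", or even a successful
    # connect) means bash handled /dev/tcp/... itself. LC_ALL=C keeps the
    # error message in English so the match below works.
    err=$(LC_ALL=C "$bash_bin" -c 'exec 3<>/dev/tcp/127.0.0.1/1' 2>&1)
    case "$err" in
        *"No such file or directory"*) return 1 ;;
    esac
    return 0
}
```

Usage: has_bash_net && echo "bash can talk TCP"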

Now for something more advanced (but entirely optional):

Continue reading "Keeping IRC nicks active"

Binding applications to a specific IP

Linux

These days many systems are multi-homed in the sense that they have more than one IP address bound at the same time,
e.g. for different network cards, virtual IPs on shared servers, or simply using WiFi and a wired network connection at the same time on a laptop.

Murphy, of course, makes sure that your system chooses the worst IP (e.g. the one on slow WiFi or the one reserved for admin access) when an application does not specifically support binding to a selected IP address. Mozilla Firefox, for example, doesn't.

The kernel chooses an outgoing IP from those in the routing table with the same metric:

daniel@server:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.0.2.1       0.0.0.0         UG    0      0        0 eth0
0.0.0.0         192.0.2.2       0.0.0.0         UG    0      0        0 eth1
0.0.0.0         192.0.2.3       0.0.0.0         UG    0      0        0 eth2
0.0.0.0         192.0.2.4       0.0.0.0         UG    0      0        0 eth3

You can obviously play around with the metrics and make the kernel prefer the desired interface over the others. This will affect all applications, though. Some people use the firewall to NAT all packets to port 80 onto the network interface desired for web browsing. Gee, beware of http://somewebsite.tld:8080 links...
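Before fiddling with metrics, it helps to know which source address the kernel currently picks for a given destination: ip route get prints it in the src field of its answer. A tiny sketch that extracts it; route_src is a made-up name and reads ip's output on stdin.

```shell
#!/bin/bash
# route_src: hypothetical helper. Prints the "src" address from the output
# of `ip route get <destination>` read on stdin.
route_src() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}
```

Usage: ip route get 198.51.100.7 | route_src (198.51.100.7 being a documentation address; use a host you actually want to reach).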

Thankfully, Daniel Ryde has solved the problem with an LD_PRELOAD shim. With his code you can run

daniel@laptop:~$ BIND_ADDR="192.0.2.100" LD_PRELOAD=/usr/lib/bind.so firefox (*)

and happily surf away.

To compile his code (3.3kB, local copy, see note 1) you need to run

gcc -nostartfiles -fpic -shared bind.c -o bind.so -ldl -D_GNU_SOURCE
strip bind.so
cp -i bind.so /usr/lib/

and you're set to go.
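To save some typing, the invocation can be wrapped in a small function. A sketch, assuming bind.so sits in /usr/lib as above; with_bind_addr and BIND_SO are made-up names. Because the shims are IPv4 only (see note 2), it refuses anything that does not look like a dotted quad.

```shell
#!/bin/bash
# with_bind_addr: hypothetical wrapper around the bind.so LD_PRELOAD shim.
# Usage: with_bind_addr 192.0.2.100 firefox
with_bind_addr() {
    [ $# -ge 2 ] || { echo "usage: with_bind_addr IPv4 command [args...]" >&2; return 2; }
    local addr="$1" so="${BIND_SO:-/usr/lib/bind.so}"
    shift
    # the shims are IPv4 only; this is a loose dotted-quad check
    # (it would still accept 999.1.1.1)
    if ! printf '%s\n' "$addr" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
        echo "with_bind_addr: '$addr' is not an IPv4 address" >&2
        return 2
    fi
    if [ ! -r "$so" ]; then
        echo "with_bind_addr: $so not found" >&2
        return 1
    fi
    BIND_ADDR="$addr" LD_PRELOAD="$so" "$@"
}
```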

If you don't have gcc available (and trust me), you can download pre-compiled 32-bit and 64-bit (glibc 2) bind.so libraries here (4.5kB).

I guess because Daniel Ryde hid his code so well on his webpage, Robert J. McKay wrote another LD_PRELOAD shim called Bindhack (4.5kB, local mirror). As is, it will only compile on 32-bit machines, but YMMV.

Run the command marked (*) above in bash with your desired (and locally bound) IP address and visit MyIP.dk, DNStools.ch or any of the other services that show your external IP to see whether you've succeeded.

Notes:

  1. Daniel Ryde did not specify -D_GNU_SOURCE in the comments section of bind.c. Modern glibc/gcc need it because he uses RTLD_NEXT, which is not POSIX and which glibc only defines when _GNU_SOURCE is set. I amended the local copy of bind.c and sent him an email so he can update his.
  2. Both are IPv4 only, no IPv6 support.

Updates:

19.03.15 madmakz wrote in to clarify that all of the bind LD_PRELOAD shims only work with TCP connections, not with UDP.
I'm not aware of a shim that manipulates UDP sockets.

14.01.14 Christian Pellegrin wrote a superb article on how to achieve per-application routing with the help of Linux network namespaces.

16.06.13 showip.be seems to be gone, so I replaced it with dnstools.ch in the text above. There are plenty of others as well.

22.06.12 Lennart Poettering has an IPv4-only version of a shim and a rather good README available on his site.

29.11.10 Catalin M. Boie wrote another LD_PRELOAD shim, force_bind. I have not tested this one. It can handle IPv6 binds.

11.01.09 Daniel Ryde has replied to my email and has now updated his copy as well.

Ubuntu Karmic 9.10 Bluetooth UMTS Dial-up (DUN)

Linux

Using a mobile phone's Bluetooth dial-up networking (DUN) to connect to the Internet (UMTS/GPRS) while on the road is quite convenient for me. Sadly, this is not supported out of the box in Ubuntu Karmic 9.10 (Netbook Remix), as it uses Network-Manager to handle - well - network connections, and that is not quite there on Bluetooth-managed devices yet.

While the default solution (rfcomm and Gnome-PPP) still works, it's ugly to set up. Sadly, zillions of Ubuntu Forums threads and blog entries still detail this solution, or the issues encountered with it along the way.

The much better solution is Blueman, an improved Gnome-Bluetooth primarily developed by Valmantas Palikša. It brings along the right udev magic to teach Network-Manager about the Bluetooth devices it handles.

Blueman Screenshot on Ubuntu Karmic 9.10 Netbook Edition

Just follow the steps on their downloads page to set up the Blueman PPA (Personal Package Archive) to get things working.

Windows Vista dial-up networking slow to establish connection

IT

If you find that Microsoft Windows Vista is slow to establish a dial-up network connection (DUN) ("register with the network"), that may be caused by it trying to also get an IPv6 address from an IPv4-only ISP. In that case, remove the IPv6 protocol from the Properties -> Network tab of the DUN. This worked for me when dialing into an ISP via Bluetooth / mobile phone. YMMV.

Disabling a group policy'd screensaver on Windows

IT

I guess many people know the issue of having a screen saver forced on after some time through a group policy in a corporate environment. This is usually done to make sure systems are locked during breaks if people forget to press Win+L (or Ctrl+Alt+Del and then Enter). While that may well help IT security, it becomes a problem when giving presentations for extended periods of time. Having to move the mouse via the presentation pointer every few minutes, or dash back to the PC once the screen saver has kicked in again, is simply annoying. On your company's systems you may be able to get the system admins to allow configuring the interval or disabling the screen saver, but on foreign systems you're often lost. But...

Continue reading "Disabling a group policy'd screensaver on Windows"

Apache fails to start at boot, but works when started manually

Apache, Gentoo

Since a baselayout update, Apache fails to start on Gentoo at (re)boot if the server has unused Ethernet interfaces.

The symptom is that Apache fails to start on boot although it has been added to the default runlevel with
rc-update add apache2 default

This is caused by recent baselayouts not working properly when there is more than one eth interface and not all of them are up.

Thus changing depend() { need net ... } into
depend() { need net.eth0 ... } at the top of /etc/init.d/apache2 will help.

While you're at it, you could also add a nice "after urandom" to the existing depend() construct
and make sure apr and apache are emerged with the urandom USE flag set. Otherwise, reading from /dev/random to initialize the digest authentication mechanism (or SSL, for that matter) might cause Apache to block if there is not enough entropy in the random pool.
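Put together, the depend() at the top of /etc/init.d/apache2 might end up looking roughly like this. A sketch only: the two lines shown are the changes described above, the rest of your existing depend() (the "..." part) stays as it is, and eth0 is assumed to be the interface Apache serves on.

```shell
# excerpt of /etc/init.d/apache2 (sketch): only depend() is shown
depend() {
    need net.eth0    # instead of "need net": wait only for the interface that matters
    after urandom    # let the urandom init script restore the saved random seed first
    # ... keep the remaining lines of your original depend() here
}
```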