[Users] Intermittent ARP failures using veth
Gregor Mosheh
gregor at hostgis.com
Fri Sep 12 11:42:17 EDT 2008
Hiya, all. I have 3 HNs running a total of some 30 VEs. Three specific
VEs have a problem that seems ARP-related. Any ideas on it would be much
appreciated.
Symptom:
Every now and then (sometimes every minute, sometimes not for a whole
hour), the affected VE becomes unreachable over IP for 10-30 seconds.
During that window, other hosts on the network can't ping it either. The
"arp" command shows (incomplete) for the VE's entry.
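To pin down exactly when each dropout happens, a client could poll its ARP table and log the VEs that show up as (incomplete). A minimal sketch, assuming the net-tools `arp -n` output layout (address in the first column, "(incomplete)" in place of the MAC); both function names are hypothetical helpers, not part of any OpenVZ tooling:

```python
import subprocess


def incomplete_entries(arp_output: str) -> list[str]:
    """Return the IP addresses whose ARP entry is shown as '(incomplete)'.

    Assumes the net-tools `arp -n` table layout: the address is the first
    column, and an unresolved entry shows '(incomplete)' instead of a MAC.
    The header row never contains that token, so it is skipped naturally.
    """
    hosts = []
    for line in arp_output.splitlines():
        fields = line.split()
        if fields and "(incomplete)" in fields:
            hosts.append(fields[0])
    return hosts


def snapshot() -> str:
    # Run `arp -n` (numeric output, no DNS lookups) and return its text.
    result = subprocess.run(["arp", "-n"], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `incomplete_entries(snapshot())` every few seconds on an affected client and logging the result with a timestamp would show whether the dropouts line up with the HN's cron schedule or with something else.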
Temporary fix:
Running arpsend on the HN ("/usr/sbin/arpsend -U -i $ip -c1 bond0") fixes
the issue until the client expires its ARP entry. I have a cronjob that
runs this every minute, but even that doesn't stop the outages entirely.
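cron can't start a job more often than once a minute, so one way to shorten the gap is to have the once-a-minute job loop internally. A minimal sketch, assuming the affected VE IPs are passed in by the caller and reusing the arpsend path and flags from the command above; `announce` and `arpsend_cmd` are hypothetical names, and this only hides the symptom rather than explaining it:

```python
import subprocess
import time
from typing import Callable, Sequence

ARPSEND = "/usr/sbin/arpsend"  # path and flags as used in the command above
IFACE = "bond0"


def arpsend_cmd(ip: str, iface: str = IFACE) -> list[str]:
    # One unsolicited (-U) ARP announcing `ip`, sent once (-c1), out of `iface`.
    return [ARPSEND, "-U", "-i", ip, "-c1", iface]


def announce(ips: Sequence[str], rounds: int = 6, interval: float = 10.0,
             run: Callable[[list[str]], object] = subprocess.run) -> int:
    """Send one gratuitous ARP per IP, `rounds` times, `interval` seconds apart.

    With the defaults, a run lasts just under a minute, so starting it from a
    once-a-minute cron job re-announces each IP roughly every 10 seconds.
    Non-zero exit codes from arpsend are not checked. Returns the number of
    arpsend invocations made.
    """
    calls = 0
    for i in range(rounds):
        for ip in ips:
            run(arpsend_cmd(ip))
            calls += 1
        if i < rounds - 1:  # no need to sleep after the last round
            time.sleep(interval)
    return calls
```

For example, `announce(["192.0.2.10", "192.0.2.11", "192.0.2.12"])` (placeholder addresses) from a per-minute cron entry would re-announce each address about every 10 seconds; passing `run=print` instead gives a dry run that only prints the commands.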
Other IP and routing info:
* There are 5 IP blocks, /27 and /28 in size. IPs from all blocks are
arbitrarily distributed across the machines.
* We have 2 GigE switches. Each HN has dual GigE NICs, and uses Linux
bonding. The 2 NICs go to the 2 switches, for fault tolerance.
* At the border we have a router that we don't control. Traffic between
IP blocks, even when both ends are on the local network, is
double-transited through that router.
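One piece of further information worth gathering is the bonding mode on each HN, since the two slave NICs go to two different switches. The Linux bonding driver reports its state in /proc/net/bonding/<bond>; a minimal sketch that pulls out the mode, the currently active slave and the slave list, assuming the usual "Key: value" layout of that file (`bonding_summary` and `read_bond` are hypothetical helpers):

```python
from pathlib import Path


def bonding_summary(text: str) -> dict:
    """Extract mode, active slave and slave names from /proc/net/bonding/* text."""
    summary = {"mode": None, "active_slave": None, "slaves": []}
    for line in text.splitlines():
        # Split on the first colon only, so MAC addresses in values stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            summary["mode"] = value
        elif key == "Currently Active Slave":
            summary["active_slave"] = value
        elif key == "Slave Interface":
            summary["slaves"].append(value)
    return summary


def read_bond(name: str = "bond0") -> dict:
    # Read the driver's status file for the given bond interface.
    return bonding_summary(Path("/proc/net/bonding", name).read_text())
```

Running `read_bond()` on each HN and posting the results side by side would show whether all three HNs use the same mode and which switch each is currently sending through.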
The problem affects only these three VEs and none of the others, and it
has stayed with those same three even as VEs have been moved between
HNs. As such, I have ruled out bad cables or switch ports, hardware
overload, system load, and differences in the HNs' OS and sysctl params.
All VEs are created using the same script; the only differences in their
VZ configs should be the auto-generated MACs and veths.
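To check that claim mechanically, the config of an affected VE and that of an unaffected one can be diffed key by key. A minimal sketch, assuming the configs are the usual shell-style KEY="value" files (typically /etc/vz/conf/<VEID>.conf); `parse_vz_conf` and `diff_confs` are hypothetical names:

```python
import shlex


def parse_vz_conf(text: str) -> dict[str, str]:
    """Parse shell-style KEY="value" lines; blank lines and comments are skipped."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex removes the shell quoting and drops any trailing "# comment".
        conf[key.strip()] = " ".join(shlex.split(raw, comments=True))
    return conf


def diff_confs(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}
```

Apart from the expected differences (MACs and veth names, e.g. in NETIF, plus per-VE values such as HOSTNAME), any key reported by `diff_confs(parse_vz_conf(good_text), parse_vz_conf(bad_text))` would be worth a closer look.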
I've been over these pages but couldn't find anything that helps:
http://wiki.openvz.org/Multiple_network_interfaces_and_ARP_flux
http://wiki.openvz.org/Virtual_Ethernet_device
Any thoughts on troubleshooting this? Any further information I should
provide?
--
Gregor Mosheh / Greg Allensworth, BS, A+
System Administrator
HostGIS cartographic development & hosting services
http://www.HostGIS.com/
"Remember that no one cares if you can back up,
only if you can restore." - AMANDA