[Users] Intermittent ARP failures using veth

Gregor Mosheh gregor at hostgis.com
Fri Sep 12 11:42:17 EDT 2008


Hiya, all. I have 3 HNs running a total of some 30 VEs. There are three 
specific VEs which have a problem which seems ARP-related. Any ideas on 
it would be much appreciated.

Symptom:
Every now and then, sometimes every minute and sometimes not for a whole 
hour, the VE becomes inaccessible via IP for some 10-30 seconds. During 
this time, other hosts in the network also can't ping the host. The 
"arp" command shows (incomplete) for the host's entry.

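For anyone wanting to catch the (incomplete) state as it happens, a filter along these lines could be run periodically on another host in the network. This is only a sketch: it assumes the column layout printed by the Linux net-tools "arp -n" command, and the function name incomplete_arp_entries is made up for illustration.

```shell
#!/bin/sh
# Print the addresses that arp reports as (incomplete).
# Reads arp output on stdin, so it works on live output or a saved capture.
# Assumption: net-tools "arp -n" layout, with the address in the first column.
incomplete_arp_entries() {
    awk '/\(incomplete\)/ { print $1 }'
}

# Possible use (left commented out because it needs a live ARP table):
#   arp -n | incomplete_arp_entries
```

Logging the output together with a timestamp would show whether the outages line up with anything else happening on the network.
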
Temporary fix:
Running arpsend on the HN ("/usr/sbin/arpsend -U -i $ip -c1 bond0") fixes 
the issue until the client expires its ARP entry. I have a cron job that 
runs this every minute, but even that isn't enough.
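
The workaround above, wrapped so that one cron entry can cover several addresses, might look roughly like this. It is a sketch, not the actual cron script: the function name announce_ips and the ARPSEND override variable are made up for illustration, and only the arpsend flags and the bond0 interface come from the command quoted above.

```shell
#!/bin/sh
# Command used for the announcement. Overridable (hypothetical variable)
# so the loop can be dry-run without sending any ARP traffic.
ARPSEND=${ARPSEND:-/usr/sbin/arpsend}

# announce_ips IFACE IP...
# Sends one unsolicited ARP update per address, using the same flags
# as the command quoted above.
announce_ips() {
    iface=$1
    shift
    for ip in "$@"; do
        "$ARPSEND" -U -i "$ip" -c1 "$iface" \
            || echo "arpsend failed for $ip" >&2
    done
}

# Possible use from cron (placeholder addresses, not the real VE IPs):
#   announce_ips bond0 192.0.2.10 192.0.2.11 192.0.2.12
```

A failure for one address is reported on stderr and the loop carries on, so one bad entry does not stop the remaining VEs from being announced.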

Other IP and routing info:
* There are 5 IP blocks, /27 and /28 in size. IPs from all blocks are 
arbitrarily distributed around the machines.
* We have 2 GigE switches. Each HN has dual GigE NICs, and uses Linux 
bonding. The 2 NICs go to the 2 switches, for fault tolerance.
* At the border we have a router which we don't control. Traffic between 
IP blocks, even if destined for the local network, is double-transited.

The problem affects only these three VEs and none of the others, and it 
follows these same three even when VEs are moved between HNs. As such, I 
have ruled out bad cables or switch ports, overloading of the hardware, 
system load, and differences in the HNs' OS and sysctl params. All VEs 
are created using the same script; the only differences in their VZ 
configs would be the auto-generated MACs and veths.

I've been over these pages, and couldn't find information to help:
http://wiki.openvz.org/Multiple_network_interfaces_and_ARP_flux
http://wiki.openvz.org/Virtual_Ethernet_device

Any thoughts on troubleshooting this? Any further information I should 
provide?

-- 
Gregor Mosheh / Greg Allensworth, BS, A+
System Administrator
HostGIS cartographic development & hosting services
http://www.HostGIS.com/

"Remember that no one cares if you can back up,
  only if you can restore." - AMANDA

