[Devel] Re: [RFC][PATCH] IP address restricting cgroup subsystem
Guenter Roeck
groeck at redback.com
Fri Jan 9 09:43:35 PST 2009
I have tried something similar, but with CLONE_FILES|CLONE_FS|CLONE_VM|CLONE_NEWNET,
actually creating a virtual interface and a controlling socket or thread in each new
network namespace. This scales to a couple of thousand interfaces, although interface
creation becomes slow once more than about 1,000 interfaces have been created.
Problems I have seen:
- The kernel's interface name hash distributes poorly. A test program with similar
names (e.g. eth0 to eth1000) shows that only about every 17th bucket is used at all.
- The current sysfs implementation doesn't scale to thousands of interfaces.
A sequential search through file names, especially one using strcmp, performs
poorly when a directory holds thousands of entries.
- Using sockets to control network namespaces starts to fail after a couple hundred
namespaces and attached interfaces have been created. There is no error message;
the socket<->interface/namespace relationship simply isn't always created, and some
interfaces stay in the initial network namespace.
- The idea of attaching/associating network namespaces with sockets and/or threads
doesn't really work well unless it is used strictly for virtualization. For other
applications (e.g. per-customer network namespaces in switches), one cannot afford
to "lose" a network namespace just because a controlling process dies.
I can send you the code if you like.
Guenter
On Fri, Jan 09, 2009 at 08:54:13AM -0800, Dan Smith wrote:
> SH> Does anyone else (Eric? Pavel?) have experience with hundreds or
> SH> thousands of network namespaces?
>
> I just gave it a shot on linux-next-20090108 with the following test
> case:
>
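> #define _GNU_SOURCE
> #include <sched.h>
> #include <signal.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> int flags = CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWUSER |
>             CLONE_NEWIPC | SIGCHLD | CLONE_NEWNET;
>
> int clone_child(void *data)
> {
>     printf("Child %i\n", (int)(intptr_t)data);
>     sleep(30);
>     exit(0);
> }
>
> int main(void)
> {
>     int i;
>
>     for (i = 0; i < 100; i++) {
>         char *stack;
>         unsigned int stacksize = getpagesize() * 4;
>
>         stack = malloc(stacksize);
>         if (stack == NULL) {
>             printf("Failed to allocate %u\n", stacksize);
>             return 1;
>         }
>
>         printf("Clone %i\n", i);
>         fflush(stdout);  /* children must not inherit unflushed output */
>         /* the stack grows down on x86, so pass the top of the allocation */
>         if (clone(clone_child, stack + stacksize, flags, (void *)(intptr_t)i) == -1)
>             perror("clone");
>     }
>
>     sleep(40);
>     return 0;
> }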
>
> The loop runs to completion, but only 18 children ever print their
> message. After the test completes, doing something else (like
> bringing up a man page) consistently results in this panic:
>
> BUG: unable to handle kernel paging request at 00c85788
> IP: [<c0252af8>] rb_insert_color+0x28/0x100
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host0/target0:0:1/0:0:1:0/block/sr0/size
> Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc nfs lockd nfs_acl auth_rpcgss sunrpc af_packet ipv6 binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod uinput virtio_balloon virtio_net evbug evdev pcspkr virtio_pci virtio_ring virtio i2c_piix4 i2c_core sr_mod cdrom sg thermal button processor ata_generic pata_acpi piix ide_core sd_mod crc_t10dif ext3 jbd mbcache
>
> Pid: 2865, comm: man Not tainted (2.6.28-next-20090108 #5)
> EIP: 0060:[<c0252af8>] EFLAGS: 00010202 CPU: 0
> EIP is at rb_insert_color+0x28/0x100
> EAX: c8578088 EBX: c8578088 ECX: c8578090 EDX: 00c85780
> ESI: c8578088 EDI: 00c85780 EBP: cd93be28 ESP: cd93be14
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process man (pid: 2865, ti=cd93a000 task=cb75bd90 task.ti=cd93a000)
> Stack:
> cb56cc00 c8578087 c857807f c8578088 c857809f cd93be40 d0afe76d cb56cc00
> c8ccb00c 0001fe43 cf785f00 cd93be74 d0b04f79 c8ccb00c 0000000c 00000000
> cf7e0e28 c845d180 c8ccbff8 00000001 00000000 cb56cc00 cf7e0e28 cf7e0e28
> Call Trace:
> [<d0afe76d>] ? ext3_htree_store_dirent+0xbd/0x110 [ext3]
> [<d0b04f79>] ? htree_dirblock_to_tree+0x109/0x180 [ext3]
> [<d0b07a11>] ? ext3_htree_fill_tree+0x61/0x210 [ext3]
> [<c01b77e3>] ? nameidata_to_filp+0x53/0x70
> [<d0afe684>] ? ext3_readdir+0x6d4/0x700 [ext3]
> [<d0afe532>] ? ext3_readdir+0x582/0x700 [ext3]
> [<c01bc8b4>] ? cp_new_stat64+0xe4/0x100
> [<c01c6690>] ? filldir+0x0/0xd0
> [<c01bcd52>] ? sys_fstat64+0x22/0x30
> [<c01c68c8>] ? vfs_readdir+0x88/0xa0
> [<c01c6690>] ? filldir+0x0/0xd0
> [<c01c69f8>] ? sys_getdents+0x68/0xb0
> [<c0103762>] ? syscall_call+0x7/0xb
> Code: 8d 76 00 55 89 e5 57 56 53 83 ec 08 89 45 f0 89 55 ec 90 8b 55 f0 8b 02 89 c3 83 e3 fc 74 3c 8b 13 f6 c2 01 75 35 89 d7 83 e7 fc <8b> 77 08 39 de 74 59 85 f6 74 35 8b 06 a8 01 75 2f 83 c8 01 89
> EIP: [<c0252af8>] rb_insert_color+0x28/0x100 SS:ESP 0068:cd93be14
> ---[ end trace 5af0fea6439f26a1 ]---
>
> --
> Dan Smith
> IBM Linux Technology Center
> email: danms at us.ibm.com
>
> _______________________________________________
> Containers mailing list
> Containers at lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel mailing list