[Devel] [RFC rh7 v2] ve/tty: vt -- Implement per VE support for console and terminals

Vladimir Davydov vdavydov at parallels.com
Tue Aug 4 09:12:27 PDT 2015


On Mon, Aug 03, 2015 at 10:08:19PM +0300, Cyrill Gorcunov wrote:
> Previously in commit 8674c044330fad1458bd59b02f9037fb97e8b7af stubs for
> virtual terminals have been added, they support writes from kernel side
> which simply drops into the void.
> 
> In the patch the code has been moved from kernel/ve/console.c
> to drivers/tty/pty.c to reuse a couple of pty helpers.
> 
> Now we support up to MAX_NR_VTTY_CONSOLES virtual consoles inside container.
> For /dev/console we reserve first virtual terminal. Terminals represent
> the following mapping
> 
> 	 | master					slave
> 	 | /dev/cosole|/dev/tty0|/dev/vttym0		/dev/vttys0
> 	 | ...
> 	 | /dev/tty11|/dev/vttym11			/dev/vttys11
> 
> major:minor numbers for peers are allocated dynamically by the kenel,
> thus if we need to connect to container's slave peer from the node
> we have to
> 
>  - scan node's /proc/tty/drivers and find the numbers for
>    "vtty_slave" driver, for example
> 
> 	 | [root at pcs7 ~]# cat /proc/tty/drivers
> 	 | ...
> 	 | vtty_slave            /dev/vttys     252 0-11 console
> 	 | vtty_master           /dev/vttym     253 0-11 console
> 	 | ...
> 
>  - add major:minor pairs into allowed devices, for example
> 
> 	 | echo 'c 253:* rwm' > /sys/fs/cgroup/devices/$ctid/devices.allow
> 
> Once the bullets above is done the userspace utility may
> open up slave peer and read/write data to/from. Note the
> slave may be opened if only if a master peer been previously
> opened.
> 
> Some details on the driver itself:
> 
>  - The drivers carries per-VE tty instances in @vtty_idr map, once
>    VE tries to open a terminal we allocate tty map internally and
>    keep it intact until VE destructed, this allow us to not bind
>    into device namespaces (ie not rely on tty_class);
> 
>  - Unlike buffered IO to unix98 driver once internal port buffer
>    get full we don't block write operations if there is no reader
>    assigned yet but zap them. This is done intentionally to behave
>    closely to native consoles;
> 
>  - The kernel choose which VE request terminal using get_exec_env
>    helper thus to connect to a slave peer from a node one have to
>    "enter" VE cgroup first.

Why do you add this functionality if it is not supposed to be used? IMO,
if you only allowed opening vtty-slave using ioctl, the implementation
would be much simpler.

BTW, a side question. In PCS6 /dev/console = vtty slave, while in your
code it is master. Why did you decide to reverse things? Why not just
port what we have in PCS6?

> 
>    Same time here is vtty_open_slave() helper provided which should
>    allow to connect to a slave peer from a node directly using ioctl
>    call (this is to implement yet).
> 
> https://jira.sw.ru/browse/PSBM-34533
> https://jira.sw.ru/browse/PSBM-34532
> https://jira.sw.ru/browse/PSBM-34107
> https://jira.sw.ru/browse/PSBM-32686
> https://jira.sw.ru/browse/PSBM-32685
> 
> N.B.: I've done some manual testing with python help
> 
>  1) Run container (ctid=200)
>  2) Run the script on the node
> 
> 	| #!/bin/sh
> 	| master=`cat /proc/tty/drivers | grep vtty_master | awk '{ print $3; }'`
> 	| slave=`cat /proc/tty/drivers | grep vtty_slave | awk '{ print $3; }'i`
> 	| echo "c $master:* rwm" > /sys/fs/cgroup/devices/$1/devices.allow
> 	| echo "c $slave:* rwm" > /sys/fs/cgroup/devices/$1/devices.allow
> 	| vzctl exec $1 mknod /dev/vttym5 c $master 5
> 	| vzctl exec $1 mknod /dev/vttys5 c $slave 5
>  3) In container run two instances of python where each does
>     a)
>         | >>> f=open("/dev/vttym5", "r+", 0)
>         | >>> f.write("aaa")
>     b)
>         | >>> f=open("/dev/vttys5", "r+", 0)
>         | >>> f.read(3)
>         | >>> 'aaa'
> 
> v2:
>  - Rename terminals from vtz to vtty
>  - Merge code into /drivers/tty/pty.c to reuse some of
>    pty functionality
>  - Get rid of two array of indices, use one for master
>    peers and fetch slaves via @link
>  - Drop TTY_VT_OPEN and wait() on it
>  - Add vtty_open_slave helper
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov at virtuozzo.com>
> CC: Vladimir Davydov <vdavydov at virtuozzo.com>
> CC: Konstantin Khorenko <khorenko at virtuozzo.com>
> CC: Pavel Emelyanov <xemul at virtuozzo.com>
> CC: Andrey Vagin <avagin at virtuozzo.com>
> CC: Igor Sukhih <igor at parallels.com>
> ---
>  drivers/tty/pty.c    |  543 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/tty/tty_io.c |   29 +-
>  include/linux/tty.h  |    3 
>  include/linux/ve.h   |   28 +-
>  kernel/ve/Makefile   |    2 
>  kernel/ve/console.c  |  382 -----------------------------------
>  6 files changed, 572 insertions(+), 415 deletions(-)
> 
> FWIW I didn't test yet the opening slave from the node  because
> it requires vzctl patching together with the kernel. Will do
> that on top.
> 
> Index: linux-pcs7.git/drivers/tty/pty.c
> ===================================================================
> --- linux-pcs7.git.orig/drivers/tty/pty.c
> +++ linux-pcs7.git/drivers/tty/pty.c
> @@ -26,12 +26,21 @@
>  
>  #include <bc/misc.h>
>  
> +#include <linux/ve.h>
> +#include <linux/file.h>
> +#include <linux/anon_inodes.h>
> +
>  #ifdef CONFIG_UNIX98_PTYS
>  static struct tty_driver *ptm_driver;
>  static struct tty_driver *pts_driver;
>  static DEFINE_MUTEX(devpts_mutex);
>  #endif
>  
> +#if defined(CONFIG_VE) && defined(CONFIG_UNIX98_PTYS)
> +static int __init vtty_init(void);
> +static struct file_operations vtty_fops;
> +#endif
> +
>  static void pty_close(struct tty_struct *tty, struct file *filp)
>  {
>  	BUG_ON(!tty);
> @@ -824,6 +833,540 @@ static int __init pty_init(void)
>  {
>  	legacy_pty_init();
>  	unix98_pty_init();
> +#ifdef CONFIG_VE
> +	vtty_init();
> +#endif
>  	return 0;
>  }
>  module_init(pty_init);
> +
> +#if defined(CONFIG_VE) && defined(CONFIG_UNIX98_PTYS)
> +
> +/*
> + * VTTY architecture overview.
> + *
> + * With VTTY we make /dev/console and /dev/tty[X] virtualized
> + * per container (note the real names may vary because the
> + * kernel itself uses major:minor numbers to distinguish
> + * devices and doesn't care how they are named inside /dev.
> + * /dev/console stands for TTYAUX_MAJOR:1 while /dev/tty[X]
> + * stands for TTY_MAJOR:[0:12]. That said from inside of
> + * VTTY /dev/console is the same as /dev/tty0.
> + *
> + * For every container here is a tty map represented by
> + * vtty_map_t. It carries id of VE and associated masters
> + * tty peers. Upon any master peer open we create
> + * associated slave peer and mark master inside tty map
> + * which allows to reopen it on user request, unlike traditional
> + * Unix98 terminals where only one master is allowed. In turn
> + * when user asks to open a slave peer we lookup for master
> + * first and then use @link member to fetch a slave.
> + *
> + * The major number for vtty is allocated dynamically upon
> + * kernel start and should be found via /proc/tty/drivers output
> + * using vtty[m|s]_driver->driver_name. While the primary purpose
> + * to containerize console and terminals the kernel creates real
> + * vttym[0-11] and vttys[0-11] devices for master and slave peers
> + * appropriately. Thus to read console output one have to open
> + * vttys0 device from inside of container's context.
> + *
> + * Same time to be able to open slave peer from a node we
> + * provide vtty_open_slave helper.
> + */
> +static struct tty_driver *vttym_driver;
> +static struct tty_driver *vttys_driver;
> +static DEFINE_IDR(vtty_idr);
> +
> +#undef VTTY_DEBUG
> +#ifdef VTTY_DEBUG
> +#define vtty_printk(fmt, ...)								\
> +	printk("%20s: %4d: " fmt,							\
> +	       __func__, __LINE__, ##__VA_ARGS__)
> +#define vtty_printk_one(__tty)								\
> +	vtty_printk("tty %p count %4d flags 0x%-8lx\n",					\
> +		    __tty, __tty->count, __tty->flags)
> +#define vtty_printk_pair(__tty)								\
> +	vtty_printk("tty %p count %4d flags 0x%-8lx link %p count %4d flags 0x%-8lx\n",	\
> +		    __tty, __tty->count, __tty->flags,					\
> +		    __tty->link, __tty->link->count, __tty->link->flags)
> +#else
> +#define vtty_printk(fmt, ...)
> +#define vtty_printk_one(__tty)
> +#define vtty_printk_pair(__tty)
> +#endif
> +
> +#define driver_is_master(d)	((d) == vttym_driver)

Call it vtty_driver_is_master to avoid conflicts.

> +#define tty_is_master(t)	driver_is_master((t)->driver)

vtty_is_master

> +
> +typedef struct {
> +	envid_t			veid;
> +	struct tty_struct	*vttys[MAX_NR_VTTY_CONSOLES];
> +	spinlock_t		lock;
> +} vtty_map_t;
> +
> +static vtty_map_t *vtty_map_lookup(envid_t veid)
> +{
> +	return idr_find(&vtty_idr, veid);
> +}
> +
> +static void vtty_map_remove(vtty_map_t *map)
> +{
> +	vtty_printk("map %p id %d\n", map, map ? map->veid : -1);
> +	if (map) {
> +		idr_remove(&vtty_idr, map->veid);
> +		kfree(map);
> +	}
> +}
> +
> +/*
> + * While vtty_install and vtty_shutdown are ordered by tty_mutex,
> + * vtty_cleanup can be called in any order so use spinlock to
> + * not throttle in a midle of zapping.
> + */

Can't you call vtty_map_zap from ->shutdown?

> +static void vtty_map_set(vtty_map_t *map, struct tty_struct *tty, int index)
> +{
> +	vtty_printk("map %p id %d tty %p index %d\n",
> +		    map, map->veid, tty, index);
> +	spin_lock(&map->lock);
> +	map->vttys[index] = tty;
> +	spin_unlock(&map->lock);
> +}
> +
> +static void vtty_map_zap(vtty_map_t *map, struct tty_struct *tty, int index)
> +{
> +	vtty_printk("map %p id %d tty %p index %d\n", map, map->veid, tty, index);
> +	spin_lock(&map->lock);
> +	if (map->vttys[index] == tty)

Why is this check?

> +		map->vttys[index] = NULL;
> +	spin_unlock(&map->lock);
> +}
> +
> +static vtty_map_t *vtty_map_alloc(envid_t veid)
> +{
> +	vtty_map_t *map = kzalloc(sizeof(*map), GFP_KERNEL);
> +
> +	if (map) {
> +		map->veid = veid;
> +		veid = idr_alloc(&vtty_idr, map, veid, veid + 1, GFP_KERNEL);
> +		if (veid < 0) {
> +			kfree(map);
> +			return ERR_PTR(veid);
> +		}
> +		spin_lock_init(&map->lock);
> +	} else
> +		map = ERR_PTR(-ENOMEM);
> +
> +	vtty_printk("map %p id %d\n", map, veid);
> +	return map;
> +}
> +
> +static inline int tty_is_exiting(struct tty_struct *tty)

vtty_is_exiting

BTW, why is testing TTY_CLOSING not enough?

> +{
> +	return test_bit(TTY_CLOSING, &tty->flags)		||
> +		test_bit(TTY_HUPPING, &tty->flags)		||
> +		test_bit(TTY_LDISC_CHANGING, &tty->flags)	||
> +		tty->count < 1;
> +}
> +
> +static struct tty_struct *vtty_lookup(struct tty_driver *driver,
> +				      struct inode *inode, int idx)
> +{
> +	vtty_map_t *map = vtty_map_lookup(get_exec_env()->veid);
> +	struct tty_struct *tty;
> +
> +	if (idx < 0 || idx >= MAX_NR_VTTY_CONSOLES)
> +		return ERR_PTR(-EIO);
> +
> +	vtty_printk("driver %s index %d\n", driver->driver_name, idx);
> +	/*
> +	 * Nothing ever been opened yet, allocate a new
> +	 * tty map together with both peers from the scratch
> +	 * in install procedure.
> +	 */
> +	if (!map)
> +		return NULL;
> +
> +	tty = map->vttys[idx];
> +	if (tty) {
> +		/*
> +		 * Fetching existing tty is a bit tricky:
> +		 * the pair may exist but one of the peer
> +		 * may start exiting so the caller should
> +		 * wait until exit complete and re-ask for
> +		 * a new peer pair. For this we check the
> +		 * master peer to be alive first and then
> +		 * the slave peer state if been requested.
> +		 */
> +		if (tty_is_exiting(tty))
> +			tty = ERR_PTR(-ENXIO);

tty = NULL ?

> +		else if (!driver_is_master(driver)) {
> +			tty = tty->link;
> +			if (tty_is_exiting(tty))
> +				tty = ERR_PTR(-ENXIO);

ditto

> +		}
> +	}
> +	vtty_printk("tty %p count %4d\n",
> +		    tty, IS_ERR_OR_NULL(tty) ? -1 : tty->count);
> +	return tty;
> +}
> +
> +static int vtty_standard_install(struct tty_driver *driver, struct tty_struct *tty)
> +{
> +	int ret = tty_init_termios(tty);
> +	if (ret)
> +		return ret;
> +
> +	tty_driver_kref_get(driver);
> +	tty->count++;

> +	if (driver_is_master(driver))
> +		vtty_map_set(tty->driver_data, tty, tty->index);

It's not that standard then. Move this to vtty_install?

> +	return 0;
> +}
> +
> +static struct tty_struct *vtty_install_slave(vtty_map_t *map, int index)
> +{
> +	struct tty_struct *tty;
> +
> +	tty = alloc_tty_struct();
> +	if (!tty)
> +		return ERR_PTR(-ENOMEM);
> +	initialize_tty_struct(tty, vttys_driver, index);
> +

> +	tty->port = kzalloc(sizeof(*tty->port), GFP_KERNEL);
> +	if (!tty->port) {
> +		deinitialize_tty_struct(tty);
> +		free_tty_struct(tty);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +	tty->driver_data = map;

You do exactly the same thing is in vtty_install. Move this to
vtty_standard_install?

> +	WARN_ON(vtty_standard_install(vttys_driver, tty));

Make vtty_standard_install return void and wrap this WARN_ON around
tty_init_termios please.

> +	tty_port_init(tty->port);
> +	tty->port->itty = tty;

Again, this could live in vtty_standard_install.

> +
> +	set_bit(TTY_EXTRA_REF, &tty->flags);
> +	return tty;
> +}
> +
> +static int vtty_install(struct tty_driver *driver, struct tty_struct *tty)
> +{
> +	envid_t veid = get_exec_env()->veid;
> +	struct tty_struct *peer;
> +	vtty_map_t *map;
> +
> +	map = vtty_map_lookup(veid);
> +	if (!map) {
> +		map = vtty_map_alloc(veid);
> +		if (IS_ERR(map))
> +			return PTR_ERR(map);
> +	}
> +	tty->driver_data = map;
> +
> +	tty->port = kzalloc(sizeof(*tty->port), GFP_KERNEL);
> +	if (!tty->port)
> +		return -ENOMEM;
> +
> +	peer = vtty_install_slave(map, tty->index);
> +	if (IS_ERR(peer)) {
> +		vtty_map_zap(map, tty, tty->index);
> +		kfree(tty->port);
> +		return PTR_ERR(peer);
> +	}
> +	WARN_ON(vtty_standard_install(driver, tty));
> +	tty_port_init(tty->port);
> +	tty->port->itty = tty;
> +
> +	tty->link = peer;
> +	peer->link = tty;
> +
> +	vtty_printk_pair(tty);
> +	return 0;
> +}
> +
> +static int vtty_open(struct tty_struct *tty, struct file *filp)
> +{
> +	if (test_bit(TTY_OTHER_CLOSED, &tty->flags))
> +		return -EIO;
> +
> +	clear_bit(TTY_IO_ERROR, &tty->flags);

> +	clear_bit(TTY_OTHER_CLOSED, &tty->flags);

You check that this bit is unset a couple of lines above.

> +	clear_bit(TTY_OTHER_CLOSED, &tty->link->flags);
> +	set_bit(TTY_THROTTLED, &tty->flags);
> +
> +	vtty_printk_pair(tty);
> +	return 0;
> +}
> +
> +static void vtty_close(struct tty_struct *tty, struct file *filp)
> +{
> +	struct tty_struct *peer = tty->link;
> +
> +	if (tty->count > 1)
> +		return;
> +
> +	if (test_bit(TTY_IO_ERROR, &tty->flags))
> +		return;
> +
> +	wake_up_interruptible(&tty->read_wait);
> +	wake_up_interruptible(&tty->write_wait);
> +
> +	wake_up_interruptible(&peer->read_wait);
> +	wake_up_interruptible(&peer->write_wait);
> +
> +	set_bit(TTY_IO_ERROR, &tty->flags);
> +	set_bit(TTY_OTHER_CLOSED, &peer->flags);
> +
> +	/*
> +	 * TTY_EXTRA_REF set for the peers which
> +	 * carries additional @count (ie peers
> +	 * which were created as a pair to a master
> +	 * one, thus when master is closed we drop
> +	 * the reference so slave will be closed
> +	 * as well).
> +	 */
> +	if (test_bit(TTY_EXTRA_REF, &peer->flags)) {
> +		clear_bit(TTY_EXTRA_REF, &peer->flags);
> +		peer->count--;
> +	}
> +
> +	if (tty_is_master(tty)) {
> +		tty_unlock(tty);
> +		tty_vhangup(peer);
> +		tty_lock(tty);
> +	}
> +
> +	vtty_printk_pair(tty);
> +}
> +
> +static void vtty_cleanup(struct tty_struct *tty)
> +{
> +	cancel_work_sync(&tty->port->buf.work);

Why? Isn't this supposed to be done in tty_port_put?

> +
> +	/*
> +	 * Make sure line discipline is off already.
> +	 * It is just for debug reasons.
> +	 */
> +	WARN_ON_ONCE(!test_bit(TTY_LDISC_HALTED, &tty->flags));
> +	WARN_ON_ONCE(test_bit(TTY_LDISC, &tty->flags));
> +
> +	if (tty_is_master(tty))
> +		vtty_map_zap(tty->driver_data, tty, tty->index);
> +
> +	tty_port_put(tty->port);
> +	vtty_printk_one(tty);
> +}
> +
> +static int vtty_write(struct tty_struct *tty, const unsigned char *buf, int count)
> +{
> +	struct tty_struct *peer = tty->link;
> +
> +	if (tty->stopped)
> +		return 0;
> +
> +	if (count > 0) {
> +		count = tty_insert_flip_string(peer->port, buf, count);
> +		if (count) {
> +			tty_flip_buffer_push(peer->port);
> +			tty_wakeup(tty);
> +		} else {
> +			int _count = test_bit(TTY_EXTRA_REF, &tty->flags) ? 2 : 1;
> +			/*
> +			 * Flush the slave reader if noone
> +			 * is actually hooked on. Otherwise
> +			 * wait until reader fetch all data.
> +			 */
> +			if (peer->count < _count)

How can peer->count be less than 1 here?

> +				tty_perform_flush(peer, TCIFLUSH);
> +		}
> +	}
> +
> +	return count;
> +}
> +
> +static const struct tty_operations vtty_ops = {
> +	.lookup		= vtty_lookup,
> +	.install	= vtty_install,
> +	.open		= vtty_open,
> +	.close		= vtty_close,
> +	.cleanup	= vtty_cleanup,
> +	.write		= vtty_write,

> +	.write_room	= pty_space,

That's wrong. There must always be enough room if slave is not open.

> +	.set_termios	= pty_set_termios,
> +	.unthrottle	= pty_unthrottle,
> +	.remove		= pty_unix98_remove,
> +};
> +
> +struct tty_driver *vtty_console_driver(int *index)
> +{
> +	*index = 0;
> +	return vttym_driver;
> +}
> +
> +int vtty_match(dev_t device)
> +{
> +	return MAJOR(device) == TTY_MAJOR && MINOR(device) < MAX_NR_VTTY_CONSOLES;
> +}
> +
> +struct tty_driver *vtty_driver(dev_t dev, int *index)
> +{
> +	BUG_ON(MINOR(dev) >= MAX_NR_VTTY_CONSOLES);
> +
> +	*index = MINOR(dev);
> +	return vttym_driver;
> +}
> +
> +static void ve_vtty_fini(void *data)
> +{
> +	struct ve_struct *ve = data;
> +	vtty_map_remove(vtty_map_lookup(ve->veid));

Making vtty_map_remove take veid would make the code more accurate IMO.

> +}
> +
> +static struct ve_hook vtty_hook = {
> +	.fini           = ve_vtty_fini,
> +	.priority       = HOOK_PRIO_DEFAULT,
> +	.owner          = THIS_MODULE,
> +};
> +
> +static int __init vtty_init(void)
> +{
> +#define TTY_DRIVER_ALLOC_FLAGS			\

Please zap this macro and pass the flags as is.

> +	(TTY_DRIVER_REAL_RAW		|	\
> +	 TTY_DRIVER_RESET_TERMIOS	|	\
> +	 TTY_DRIVER_DEVPTS_MEM)
> +
> +	vttym_driver = tty_alloc_driver(MAX_NR_VTTY_CONSOLES, TTY_DRIVER_ALLOC_FLAGS);
> +	if (IS_ERR(vttym_driver))
> +		panic(pr_fmt("Can't allocate master vtty driver\n"));
> +
> +	vttys_driver = tty_alloc_driver(MAX_NR_VTTY_CONSOLES, TTY_DRIVER_ALLOC_FLAGS);
> +	if (IS_ERR(vttys_driver))
> +		panic(pr_fmt("Can't allocate slave vtty driver\n"));
[...]



More information about the Devel mailing list