[Devel] [RFC rh7 v2] ve/tty: vt -- Implement per VE support for console and terminals

Tue Aug 4 13:20:20 PDT 2015

On 08/04/2015 07:12 PM, Vladimir Davydov wrote:
...
>>
>> Some details on the driver itself:
>>
>>  - The drivers carries per-VE tty instances in @vtty_idr map, once
>>    VE tries to open a terminal we allocate tty map internally and
>>    keep it intact until VE destructed, this allow us to not bind
>>    into device namespaces (ie not rely on tty_class);
>>
>>  - Unlike buffered IO to unix98 driver once internal port buffer
>>    get full we don't block write operations if there is no reader
>>    assigned yet but zap them. This is done intentionally to behave
>>    closely to native consoles;
>>
>>  - The kernel choose which VE request terminal using get_exec_env
>>    helper thus to connect to a slave peer from a node one have to
>>    "enter" VE cgroup first.
> 
> Why do you add this functionality if it is not supposed to be used? IMO,
> if you only allowed opening vtty-slave using ioctl, the implementation
> would be much simpler.

The ability of accessing slave peers from inside of VE using cgroups
mechanism has been an initial idea of all this code. It happened later
when we deside to provide ioctls instead. And no, implementation won't
be anyhow simplier because you have to find out _which_ VE you're installing
terminal from is currently active.

New terminal allocation is happening from inside of lookup/install helpers
which have no other data under the hands than driver+index. Since we are
going to rip off driver::ve, we can't longer rely on this.

Moreover, currently I'm testing vzctl console CTID index functionality
and as you might imagine this routine is called from inside of ve0 context
so I added additional helpers which allow to lookup into specified VEs
map, instead of get_exec_env.

> BTW, a side question. In PCS6 /dev/console = vtty slave, while in your
> code it is master. Why did you decide to reverse things? Why not just
> port what we have in PCS6?

Actually I don't understand why it's a slave peers assigned to /dev/console
and /dev/tty's. Maybe it was done for some historical reason? I don't know
the background. The vtty code is close to pty code in pcs6 and I presume
some ideas have been taken: in particular several "readers/writers" might
be opening console and ttys, but in terms of pty code only one "master"
is allowed (iow to make a pair one have to open ptmx device, next open
of the same device cause new pair to generate). So I assume to be somehow
close to the ideas of unix98 terminals we provided "slave" peers to emulate
console and ttys, because slave peers may be opened several times.

But I decided to make it straight -- there is many to many mapping:
if someone has opened a master peer (/dev/console|/dev/tty),
then it's allowed to open it again while it's alive. And because
the primary users of these devices are the container itself
(agetty service running inside container) they are the master peers.

As to why not simply port pcs6 code -- the code is close to one
we use in pcs6, except

 - console and ttys are the masters as they should be I think
 - we don't patch tty_struct itself to carry pointers to owner_env,
   all VEs are strictly isolated (in turn in pcs6 we carry a global
   list of master peers which makes me wonder: every master is
   unique in its name, why on earth we keep them in a global list,
   this is a mystery to me. having a global list make lookup to be a way
   slower as it could be if there are high number of containers
   currently running). In the patch each VE has own vtty map and lookup
   is going in O(1) time in compare to O(n) as in pcs6
 - the rest is due to changes in tty layer itself

>> +
>> +#define driver_is_master(d)	((d) == vttym_driver)
> 
> Call it vtty_driver_is_master to avoid conflicts.
> 
>> +#define tty_is_master(t)	driver_is_master((t)->driver)
> 
> vtty_is_master
> 

OK

>> +/*
>> + * While vtty_install and vtty_shutdown are ordered by tty_mutex,
>> + * vtty_cleanup can be called in any order so use spinlock to
>> + * not throttle in a midle of zapping.
>> + */
> 
> Can't you call vtty_map_zap from ->shutdown?

No, imagine a master and slave peer get opened, but then master peer
get closed -- shutdown will be called only when slave peer is closed
and only for slave tty, for master peer only cleanup will be called.
So I have to cleanup it lately. Actually this brings some delay into
reuse of tty operation so I'm thinking of improve it and move into
shutdown (since we have per map spinlock this gonna be possible).

In turn in pcs6 we indeed clean up the things in shutdown routine
simply because there are _two_ trackers: one tracking is done
via vtty_masters, while for slaves peer ve::vtty array is used,
while in pcs7 I think we may use only one map and access slaves
via @link member. As I said I think I'll update this routine
to cleanup tty map earlier.

> 
>> +static void vtty_map_set(vtty_map_t *map, struct tty_struct *tty, int index)
>> +{
>> +	vtty_printk("map %p id %d tty %p index %d\n",
>> +		    map, map->veid, tty, index);
>> +	spin_lock(&map->lock);
>> +	map->vttys[index] = tty;
>> +	spin_unlock(&map->lock);
>> +}
>> +
>> +static void vtty_map_zap(vtty_map_t *map, struct tty_struct *tty, int index)
>> +{
>> +	vtty_printk("map %p id %d tty %p index %d\n", map, map->veid, tty, index);
>> +	spin_lock(&map->lock);
>> +	if (map->vttys[index] == tty)
> 
> Why is this check?

Oh, sorry, this is left from my attempt to rework cleanup/shouldown/install
to work locklessly. It doesn't hurt but confusing a reader, will drop.

...
>> +
>> +static inline int tty_is_exiting(struct tty_struct *tty)
> 
> vtty_is_exiting

OK

> 
> BTW, why is testing TTY_CLOSING not enough?

Strictly speaking, no, it's not enough: tty can be reused (or
treated as alive) if it's not closing/hupping or changing
line discipline (see tty_reopen helper). I think I miss
additional tty_lock here ;)

>> +{
>> +	return test_bit(TTY_CLOSING, &tty->flags)		||
>> +		test_bit(TTY_HUPPING, &tty->flags)		||
>> +		test_bit(TTY_LDISC_CHANGING, &tty->flags)	||
>> +		tty->count < 1;
>> +}
>> +
...
>> +	tty = map->vttys[idx];
>> +	if (tty) {
>> +		/*
>> +		 * Fetching existing tty is a bit tricky:
>> +		 * the pair may exist but one of the peer
>> +		 * may start exiting so the caller should
>> +		 * wait until exit complete and re-ask for
>> +		 * a new peer pair. For this we check the
>> +		 * master peer to be alive first and then
>> +		 * the slave peer state if been requested.
>> +		 */
>> +		if (tty_is_exiting(tty))
>> +			tty = ERR_PTR(-ENXIO);
> 
> tty = NULL ?
> 
>> +		else if (!driver_is_master(driver)) {
>> +			tty = tty->link;
>> +			if (tty_is_exiting(tty))
>> +				tty = ERR_PTR(-ENXIO);
> 
> ditto

No, if it's exiting, we must wait until exit procedure
complete, and tty map get a free position to assign
new terminal into.

>> +static int vtty_standard_install(struct tty_driver *driver, struct tty_struct *tty)
>> +{
>> +	int ret = tty_init_termios(tty);
>> +	if (ret)
>> +		return ret;
>> +
>> +	tty_driver_kref_get(driver);
>> +	tty->count++;
> 
>> +	if (driver_is_master(driver))
>> +		vtty_map_set(tty->driver_data, tty, tty->index);
> 
> It's not that standard then. Move this to vtty_install?

OK

>> +	tty->port = kzalloc(sizeof(*tty->port), GFP_KERNEL);
>> +	if (!tty->port) {
>> +		deinitialize_tty_struct(tty);
>> +		free_tty_struct(tty);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +	tty->driver_data = map;
> 
> You do exactly the same thing is in vtty_install. Move this to
> vtty_standard_install?

OK

>> +	WARN_ON(vtty_standard_install(vttys_driver, tty));
> 
> Make vtty_standard_install return void and wrap this WARN_ON around
> tty_init_termios please.
> 
>> +	tty_port_init(tty->port);
>> +	tty->port->itty = tty;
> 
> Again, this could live in vtty_standard_install.

OK

>> +
>> +static int vtty_open(struct tty_struct *tty, struct file *filp)
>> +{
>> +	if (test_bit(TTY_OTHER_CLOSED, &tty->flags))
>> +		return -EIO;
>> +
>> +	clear_bit(TTY_IO_ERROR, &tty->flags);
> 
>> +	clear_bit(TTY_OTHER_CLOSED, &tty->flags);
> 
> You check that this bit is unset a couple of lines above.

yeah, thanks

> 
>> +	clear_bit(TTY_OTHER_CLOSED, &tty->link->flags);
>> +	set_bit(TTY_THROTTLED, &tty->flags);
>> +
>> +	vtty_printk_pair(tty);
>> +	return 0;
>> +}
>> +
>> +
...
>> +static void vtty_cleanup(struct tty_struct *tty)
>> +{
>> +	cancel_work_sync(&tty->port->buf.work);
> 
> Why? Isn't this supposed to be done in tty_port_put?

Not sure I follow this moment. Canceling pending work
is done in tty_release (ie when file is closed) so
we move it here for case when peer is closed on kref-put
without tty_release call (otherwise for one peer cancel_work_sync
will be called while for other -- not).

>> +static int vtty_write(struct tty_struct *tty, const unsigned char *buf, int count)
>> +{
>> +	struct tty_struct *peer = tty->link;
>> +
>> +	if (tty->stopped)
>> +		return 0;
>> +
>> +	if (count > 0) {
>> +		count = tty_insert_flip_string(peer->port, buf, count);
>> +		if (count) {
>> +			tty_flip_buffer_push(peer->port);
>> +			tty_wakeup(tty);
>> +		} else {
>> +			int _count = test_bit(TTY_EXTRA_REF, &tty->flags) ? 2 : 1;
>> +			/*
>> +			 * Flush the slave reader if noone
>> +			 * is actually hooked on. Otherwise
>> +			 * wait until reader fetch all data.
>> +			 */
>> +			if (peer->count < _count)
> 
> How can peer->count be less than 1 here?

Should be < 2, thanks.

>> +
>> +static const struct tty_operations vtty_ops = {
>> +	.lookup		= vtty_lookup,
>> +	.install	= vtty_install,
>> +	.open		= vtty_open,
>> +	.close		= vtty_close,
>> +	.cleanup	= vtty_cleanup,
>> +	.write		= vtty_write,
> 
>> +	.write_room	= pty_space,
> 
> That's wrong. There must always be enough room if slave is not open.

yup, thanks

>> +static void ve_vtty_fini(void *data)
>> +{
>> +	struct ve_struct *ve = data;
>> +	vtty_map_remove(vtty_map_lookup(ve->veid));
> 
> Making vtty_map_remove take veid would make the code more accurate IMO.

Don't think so, but no problem, will do.

> 
>> +}
>> +
>> +static struct ve_hook vtty_hook = {
>> +	.fini           = ve_vtty_fini,
>> +	.priority       = HOOK_PRIO_DEFAULT,
>> +	.owner          = THIS_MODULE,
>> +};
>> +
>> +static int __init vtty_init(void)
>> +{
>> +#define TTY_DRIVER_ALLOC_FLAGS			\
> 
> Please zap this macro and pass the flags as is.

I pass them into two place, so no, not going to do that:
if I need a new flag I'll update one macro instead of
walking the code and update two places.

> 
>> +	(TTY_DRIVER_REAL_RAW		|	\
>> +	 TTY_DRIVER_RESET_TERMIOS	|	\
>> +	 TTY_DRIVER_DEVPTS_MEM)
>> +
>> +	vttym_driver = tty_alloc_driver(MAX_NR_VTTY_CONSOLES, TTY_DRIVER_ALLOC_FLAGS);
>> +	if (IS_ERR(vttym_driver))
>> +		panic(pr_fmt("Can't allocate master vtty driver\n"));
>> +
>> +	vttys_driver = tty_alloc_driver(MAX_NR_VTTY_CONSOLES, TTY_DRIVER_ALLOC_FLAGS);
>> +	if (IS_ERR(vttys_driver))
>> +		panic(pr_fmt("Can't allocate slave vtty driver\n"));
> [...]
> 

Vladimir, thanks a *huge* for review!

-- 
    Cyrill