[Devel] [PATCH RFC] vhost/vsock: Refuse the connection immediately when guest isn't ready
Konstantin Khorenko
khorenko at virtuozzo.com
Tue May 12 14:14:38 MSK 2026
On 5/7/26 22:13, Polina Vishneva wrote:
> From: "Denis V. Lunev" <den at openvz.org>
>
> When the host initiates an AF_VSOCK connect() to a guest that has not
> yet loaded the virtio-vsock transport (i.e. still booting), the caller
> blocks for VSOCK_DEFAULT_CONNECT_TIMEOUT (2 seconds), because
> vhost_transport_do_send_pkt() silently exits when
> vhost_vq_get_backend(vq) returns NULL.
>
> If the guest doesn't start listening within this timeout, connect()
> returns ETIMEDOUT.
>
> This delay is usually pointless and it doesn't well align with our
> behavior at other initialization stages: for example, if a connection is
> attempted when the guest driver is already loaded, but when nothing is
> listening yet, it returns ECONNRESET immediately without any wait.
>
> Fix this by checking the RX virtqueue backend in
> vhost_transport_send_pkt() before queuing. If the backend is NULL,
> return -ECONNREFUSED immediately.
>
> Signed-off-by: Denis V. Lunev <den at openvz.org>
> Co-authored-by: Polina Vishneva <polina.vishneva at virtuozzo.com>
> Signed-off-by: Polina Vishneva <polina.vishneva at virtuozzo.com>
> ---
> drivers/vhost/vsock.c | 17 ++++++++++++++---
> 1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 1d8ec6bed53e..e6de1e23121b 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -302,6 +302,20 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
>  		return -ENODEV;
>  	}
>  
> +	/* If the guest has not yet initialized the RX virtqueue, fail
> +	 * immediately rather than queueing the packet and letting the
> +	 * caller wait for VSOCK_DEFAULT_CONNECT_TIMEOUT.
> +	 *
> +	 * Reading private_data without vq->mutex is a deliberate racy
> +	 * check: if the backend is NULL the guest driver is definitely
> +	 * not ready; if it becomes NULL right after, the worker
> +	 * (do_send_pkt) rechecks under the mutex. */
> +	if (!READ_ONCE(vsock->vqs[VSOCK_VQ_RX].private_data)) {
> +		rcu_read_unlock();
> +		kfree_skb(skb);
> +		return -ECONNREFUSED;
I'm a bit hesitant about the error code returned here.
Who eventually receives this error code, and how does it handle it?
I mean, we are in the middle of a VM start and the guest has not been fully initialized yet.
But we believe it will be initialized soon, so I'd expect the attempt to be repeated after a while.
On the other hand, I'm not sure that a process which gets -ECONNREFUSED will definitely retry.
Maybe use -EAGAIN here: that error code clearly signals that a new attempt is expected.
AI also suggests -EHOSTUNREACH (and, by the way, AI does not recommend EAGAIN, he-he :))) ).
EHOSTUNREACH as the error code for "guest transport not ready"

Semantics: EHOSTUNREACH means "the destination host cannot be reached" - the peer exists
conceptually but the communication path to it is currently unavailable. This maps precisely to the
situation: the guest VM exists, QEMU has opened the vhost-vsock device and assigned a CID, but the
guest has not yet loaded its virtio-vsock driver, so the transport path is not established.

Existing usage in vsock subsystem:
 • vmci_transport.c:95 - VMCI_ERROR_INVALID_RESOURCE is mapped to EHOSTUNREACH. This is the case
   where the VMCI endpoint for the peer cannot be located - the peer's transport resource does not
   exist yet or has been destroyed.
 • vmci_transport_notify.c:436,525 - returned when send_waiting_read() / send_waiting_write()
   fails, meaning the notification could not reach the peer. The peer is considered unreachable.
Both cases share the same pattern: the peer is known to exist (has a CID, was previously connected,
etc.) but the transport layer cannot deliver data to it right now.

Why it fits better than ECONNREFUSED:
 • ECONNREFUSED implies the peer received the request and actively rejected it (e.g., nothing
   listening on that port). Here the guest never sees the request at all - the virtqueue backend
   is NULL, so the packet cannot even enter the guest.
 • EHOSTUNREACH implies the packet could not be routed/delivered to the destination. This is
   exactly what happens - the RX virtqueue has no backend, so delivery is impossible.

Userspace behavior:
 • Programs and retry frameworks commonly treat EHOSTUNREACH as a transient condition worth
   retrying (the host may come up), whereas ECONNREFUSED is typically treated as "service does
   not exist at this address" and not retried.
 • For the specific use case (host connecting to a guest that is still booting), retry is the
   correct behavior - the guest will eventually load its driver and become reachable.
It is a standard connect() error code - unlike EAGAIN, which is not expected from connect() and
would confuse most userspace socket code.
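To make the "who receives this error and does it retry?" question concrete, here is a minimal
userspace sketch of a caller-side retry policy. Everything in it is hypothetical: the names
connect_error_is_transient(), connect_with_retry() and fake_attempt() are made up for
illustration, and the choice of which errno values count as transient is an assumption that
mirrors the argument above, not the behavior of any particular existing program. fake_attempt()
stands in for a real connect() on an AF_VSOCK socket so the sketch needs no guest to run.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy (an assumption, not an existing API): which
 * connect() errors are worth another attempt while a guest boots. */
static bool connect_error_is_transient(int err)
{
	switch (err) {
	case EHOSTUNREACH:	/* code proposed in this review */
	case ETIMEDOUT:		/* what the caller sees before the patch */
		return true;
	default:		/* includes ECONNREFUSED and ECONNRESET */
		return false;
	}
}

/* One connection attempt: returns 0 on success or a negative errno. */
typedef int (*connect_attempt_fn)(void *ctx);

/* Call attempt() up to max_attempts times, stopping on success or on
 * the first error the policy does not consider transient. */
static int connect_with_retry(connect_attempt_fn attempt, void *ctx,
			      int max_attempts)
{
	int err = -EINVAL;

	assert(attempt != NULL && max_attempts > 0);
	for (int i = 0; i < max_attempts; i++) {
		err = attempt(ctx);
		if (err == 0 || !connect_error_is_transient(-err))
			return err;
		/* A real caller would sleep or back off here. */
	}
	return err;
}

/* Stand-in for a booting guest: fails a fixed number of times with a
 * chosen errno, then succeeds. */
struct fake_guest {
	int attempts_until_ready;
	int err_while_booting;
	int calls;
};

static int fake_attempt(void *ctx)
{
	struct fake_guest *g = ctx;

	g->calls++;
	if (g->calls <= g->attempts_until_ready)
		return -g->err_while_booting;
	return 0;
}
```

With such a policy, a guest that answers EHOSTUNREACH twice and then comes up is reached on the
third attempt, while the same guest answering ECONNREFUSED makes the caller give up after the
first attempt. Whether real callers behave like this is exactly the open question.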
> +	}
> +
>  	if (virtio_vsock_skb_reply(skb))
>  		atomic_inc(&vsock->queued_replies);
>  
> @@ -624,9 +638,6 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
>  		mutex_unlock(&vq->mutex);
>  	}
>  
> -	/* Some packets may have been queued before the device was started,
> -	 * let's kick the send worker to send them.
> -	 */
>  	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], &vsock->send_pkt_work);
I think the vhost_vq_work_queue() call should be removed here as well, not only the comment.
Before the patch: packets accumulate while backend is NULL
Timeline from the QEMU/host perspective:
1. QEMU opens /dev/vhost-vsock - struct vhost_vsock is created, but virtqueue backend
(private_data) is still NULL.
2. QEMU issues ioctl(VHOST_VSOCK_SET_GUEST_CID) - sets vsock->guest_cid, inserts vsock into
vhost_vsock_hash. From this point vhost_vsock_get(cid) can find it.
3. Guest is still booting, virtio-vsock driver not loaded yet. But the vsock is already
discoverable by CID lookup.
4. Host calls connect() - the packet gets queued but cannot be delivered:
   connect(fd, {AF_VSOCK, guest_cid, port})
     vsock_connect()                               [af_vsock.c:1650]
       transport->connect(vsk)                     [af_vsock.c:1730]
         virtio_transport_connect()                [virtio_transport_common.c:1076]
           virtio_transport_send_pkt_info()        [virtio_transport_common.c:328]
             t_ops->send_pkt(skb, net)
               vhost_transport_send_pkt()          [vsock.c:289]
                 vhost_vsock_get(dst_cid) -> found (CID already in hash)
                 virtio_vsock_skb_queue_tail()     ← PACKET QUEUED
                 vhost_vq_work_queue()             ← WORKER KICKED
                 return len                        ← SUCCESS (positive)
Worker wakes up but cannot deliver:
   vhost_transport_send_pkt_work()
     vhost_transport_do_send_pkt(vsock, vq)        [vsock.c:107]
       mutex_lock(&vq->mutex)
       vhost_vq_get_backend(vq) == NULL            ← guest not ready
       goto out                                    ← PACKET STAYS IN QUEUE
       mutex_unlock(&vq->mutex)
Back in vsock_connect() - transport->connect() returned success (len > 0), so the code enters the
wait loop:
   sk->sk_state = TCP_SYN_SENT;
   err = transport->connect(vsk);                  → returns len (success)
   if (err < 0) goto out;                          → NOT taken
   ...
   while (sk->sk_state != TCP_ESTABLISHED && ...) {
       timeout = schedule_timeout(timeout);        ← SLEEPS 2 SECONDS
       if (timeout == 0) {
           err = -ETIMEDOUT;                       ← GIVES UP
       }
   }
The guest never receives the CONNECT request (it is stuck in the queue), so no response arrives,
and connect() returns ETIMEDOUT after 2 seconds.
5. Later the guest finishes booting, loads the virtio-vsock driver, negotiates virtqueues. QEMU
issues ioctl(VHOST_VSOCK_SET_RUNNING, 1) which calls vhost_vsock_start():
   vhost_vsock_start()                             [vsock.c:609]
     for each vq:
       mutex_lock(&vq->mutex)
       vhost_vq_set_backend(vq, vsock)             ← backend becomes NON-NULL
       mutex_unlock(&vq->mutex)
     vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], ← KICKS WORKER AGAIN
                         &vsock->send_pkt_work)
Worker wakes up, now vhost_vq_get_backend(vq) != NULL, delivers the queued packet to the guest. But
it is too late - connect() on the host side already timed out.
Why the kick in vhost_vsock_start() is essential here: between steps 4 and 5 nobody else will wake
the worker. The kick from step 4 already fired and did nothing (backend was NULL). No new packets are
coming - the only connect() caller is sleeping. Without this kick the packet would remain in the queue
forever.
────────────────────────────────────────
After the patch: packets no longer accumulate
Same initial conditions - QEMU has set the CID, guest is still booting.
Host calls connect():
   connect(fd, {AF_VSOCK, guest_cid, port})
     vsock_connect()                               [af_vsock.c:1650]
       transport->connect(vsk)                     [af_vsock.c:1730]
         virtio_transport_connect()                [virtio_transport_common.c:1076]
           virtio_transport_send_pkt_info()        [virtio_transport_common.c:328]
             t_ops->send_pkt(skb, net)
               vhost_transport_send_pkt()          [vsock.c:289]
                 vhost_vsock_get(dst_cid) -> found
                 READ_ONCE(vsock->vqs[VSOCK_VQ_RX].private_data) == NULL
                 kfree_skb(skb)                    ← PACKET FREED
                 return -ECONNREFUSED              ← ERROR RETURNED
The error propagates back immediately:
   virtio_transport_send_pkt_info():
     ret = t_ops->send_pkt(skb, net)               → -ECONNREFUSED
     if (ret < 0) break                            → breaks out
   virtio_transport_connect() returns -ECONNREFUSED
   vsock_connect():
     err = transport->connect(vsk)                 → -ECONNREFUSED
     if (err < 0) goto out                         → TAKEN, skips wait loop
   connect() returns ECONNREFUSED to userspace immediately
The packet never enters send_pkt_queue. When vhost_vsock_start() runs later, the queue is
guaranteed to be empty - there is nothing for the worker kick to flush.
────────────────────────────────────────
Summary: SET_GUEST_CID makes the vsock discoverable, SET_RUNNING actually enables the virtqueues.
Between these two ioctls there is a window where packets are accepted into the queue but cannot be
delivered. The kick in vhost_vsock_start() existed to drain this backlog. The patch closes the window
at the entry point instead - refusing packets outright - so the backlog can never form.
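The two timelines can be condensed into a toy userspace model. This is only a sketch under
loud assumptions: struct toy_vsock and the toy_*() functions are invented for illustration and
do not match the kernel structures, locking and RCU are ignored, and only the first start is
modeled (not a stop/restart cycle, and not the racy window the patch comment mentions). The
"patched" flag selects between the old and the new behavior of the send path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct vhost_vsock; all names are illustrative. */
struct toy_vsock {
	bool cid_set;		/* after VHOST_VSOCK_SET_GUEST_CID */
	bool backend_set;	/* after VHOST_VSOCK_SET_RUNNING(1) */
	int queued;		/* packets waiting in the send queue */
	int delivered;		/* packets handed to the guest */
};

/* Send worker: can only deliver once the backend is set. */
static void toy_worker(struct toy_vsock *v)
{
	if (!v->backend_set)
		return;		/* guest not ready, packets stay queued */
	v->delivered += v->queued;
	v->queued = 0;
}

/* Send path: returns len on success or a negative errno. */
static int toy_send_pkt(struct toy_vsock *v, int len, bool patched)
{
	assert(v != NULL && len > 0);
	if (!v->cid_set)
		return -ENODEV;		/* CID lookup failed */
	if (patched && !v->backend_set)
		return -ECONNREFUSED;	/* new early check, nothing queued */
	v->queued++;
	toy_worker(v);			/* kick the worker */
	return len;
}

/* Device start: set the backend, then kick the worker once. */
static void toy_start(struct toy_vsock *v)
{
	v->backend_set = true;
	toy_worker(v);			/* the kick under discussion */
}
```

In this model the unpatched path leaves one packet queued until toy_start() flushes it, while
the patched path rejects the packet up front, so toy_start() finds an empty queue and its kick
has nothing to do. The model says nothing about packets queued while the device is running and
then stopped, which it does not cover.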
>
>  	mutex_unlock(&vsock->dev.mutex);