[Devel] [PATCH RFC] vhost/vsock: Refuse the connection immediately when guest isn't ready

Tue May 12 14:14:38 MSK 2026

On 5/7/26 22:13, Polina Vishneva wrote:
> From: "Denis V. Lunev" <den at openvz.org>
> 
> When the host initiates an AF_VSOCK connect() to a guest that has not
> yet loaded the virtio-vsock transport (i.e. still booting), the caller
> blocks for VSOCK_DEFAULT_CONNECT_TIMEOUT (2 seconds), because
> vhost_transport_do_send_pkt() silently exits when
> vhost_vq_get_backend(vq) returns NULL.
> 
> If the guest doesn't start listening within this timeout, connect()
> returns ETIMEDOUT.
> 
> This delay is usually pointless and it doesn't well align with our
> behavior at other initialization stages: for example, if a connection is
> attempted when the guest driver is already loaded, but when nothing is
> listening yet, it returns ECONNRESET immediately without any wait.
> 
> Fix this by checking the RX virtqueue backend in
> vhost_transport_send_pkt() before queuing. If the backend is NULL,
> return -ECONNREFUSED immediately.
> 
> Signed-off-by: Denis V. Lunev <den at openvz.org>
> Co-authored-by: Polina Vishneva <polina.vishneva at virtuozzo.com>
> Signed-off-by: Polina Vishneva <polina.vishneva at virtuozzo.com>
> ---
>   drivers/vhost/vsock.c | 17 ++++++++++++++---
>   1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 1d8ec6bed53e..e6de1e23121b 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -302,6 +302,20 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
>   		return -ENODEV;
>   	}
>   
> +	/* If the guest has not yet initialized the RX virtqueue, fail
> +	 * immediately rather than queueing the packet and letting the
> +	 * caller wait for VSOCK_DEFAULT_CONNECT_TIMEOUT.
> +	 *
> +	 * Reading private_data without vq->mutex is a deliberate racy
> +	 * check: if the backend is NULL the guest driver is definitely
> +	 * not ready; if it becomes NULL right after, the worker
> +	 * (do_send_pkt) rechecks under the mutex. */
> +	if (!READ_ONCE(vsock->vqs[VSOCK_VQ_RX].private_data)) {
> +		rcu_read_unlock();
> +		kfree_skb(skb);
> +		return -ECONNREFUSED;

I'm a bit hesitant about the proper error code to return here.
Who eventually receives this error code, and how is it handled?

I mean, we are in the middle of VM start-up and the guest has not been fully initialized yet.
But we believe it will be initialized soon, so I'd expect the attempt to be repeated after a while.

On the other hand, I'm not sure that a process which gets -ECONNREFUSED will definitely retry the attempt.

Maybe use -EAGAIN here: that error code clearly signals that a new attempt is expected.

AI also suggests -EHOSTUNREACH (and by the way, AI does not recommend EAGAIN, he-he :)))  ).

   EHOSTUNREACH as the error code for "guest transport not ready"

   Semantics: EHOSTUNREACH means "the destination host cannot be reached" -
   the peer exists conceptually but the communication path to it is
   currently unavailable. This maps precisely to the situation: the guest
   VM exists, QEMU has opened the vhost-vsock device and assigned a CID,
   but the guest has not yet loaded its virtio-vsock driver, so the
   transport path is not established.

   Existing usage in vsock subsystem:

   • vmci_transport.c:95 - VMCI_ERROR_INVALID_RESOURCE is mapped to
     EHOSTUNREACH. This is the case where the VMCI endpoint for the peer
     cannot be located - the peer's transport resource does not exist yet
     or has been destroyed.

   • vmci_transport_notify.c:436,525 - returned when send_waiting_read() /
     send_waiting_write() fails, meaning the notification could not reach
     the peer. The peer is considered unreachable.

   Both cases share the same pattern: the peer is known to exist (has a
   CID, was previously connected, etc.) but the transport layer cannot
   deliver data to it right now.

   Why it fits better than ECONNREFUSED:

   • ECONNREFUSED implies the peer received the request and actively
     rejected it (e.g., nothing listening on that port). Here the guest
     never sees the request at all - the virtqueue backend is NULL, so the
     packet cannot even enter the guest.

   • EHOSTUNREACH implies the packet could not be routed/delivered to the
     destination. This is exactly what happens - the RX virtqueue has no
     backend, so delivery is impossible.

   Userspace behavior:

   • Programs and retry frameworks commonly treat EHOSTUNREACH as a
     transient condition worth retrying (the host may come up), whereas
     ECONNREFUSED is typically treated as "service does not exist at this
     address" and not retried.

   • For the specific use case (host connecting to a guest that is still
     booting), retry is the correct behavior - the guest will eventually
     load its driver and become reachable.

   It is a standard connect() error code - unlike EAGAIN, which is not
   expected from connect() and would confuse most userspace socket code.
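
   To make the retry argument concrete, here is a rough userspace sketch of
   a caller that retries only on errors it considers transient. Everything
   in it is an assumption for illustration: the policy itself (which errno
   values count as transient), the names connect_errno_is_transient(),
   connect_with_retry() and the fake_guest stub are all hypothetical and
   not taken from any existing tool. A real caller would invoke connect()
   on an AF_VSOCK socket and sleep between attempts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical policy: errors worth retrying while the guest may still boot. */
static bool connect_errno_is_transient(int err)
{
	switch (err) {
	case EHOSTUNREACH:	/* path to the peer is not available yet */
	case ETIMEDOUT:		/* no answer within the connect timeout */
		return true;
	default:
		return false;	/* ECONNREFUSED, ENODEV, ... treated as final */
	}
}

/* Stub standing in for a booting guest; purely illustrative. */
struct fake_guest {
	int not_ready_errno;	/* errno reported while the guest is not ready */
	int ready_after;	/* number of failed attempts before success */
	int calls;		/* attempts made so far */
};

/* Returns 0 on success or a negative errno, like the kernel-side helpers. */
static int fake_connect(void *ctx)
{
	struct fake_guest *g = ctx;

	g->calls++;
	return g->calls > g->ready_after ? 0 : -g->not_ready_errno;
}

typedef int (*connect_fn)(void *ctx);

/* Retry try_connect() up to max_attempts times, but only on transient errors. */
static int connect_with_retry(connect_fn try_connect, void *ctx, int max_attempts)
{
	int ret = -EINVAL;

	assert(max_attempts > 0);
	for (int i = 0; i < max_attempts; i++) {
		ret = try_connect(ctx);
		if (ret == 0 || !connect_errno_is_transient(-ret))
			break;
		/* A real caller would sleep or back off here. */
	}
	return ret;
}
```

   Under this policy a caller that sees EHOSTUNREACH keeps trying until the
   guest comes up, while one that sees ECONNREFUSED gives up after the
   first attempt, which is the behavioral difference discussed above.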

> +	}
> +
>   	if (virtio_vsock_skb_reply(skb))
>   		atomic_inc(&vsock->queued_replies);
>   
> @@ -624,9 +638,6 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
>   		mutex_unlock(&vq->mutex);
>   	}
>   
> -	/* Some packets may have been queued before the device was started,
> -	 * let's kick the send worker to send them.
> -	 */
>   	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], &vsock->send_pkt_work);

I think the vhost_vq_work_queue() call should be removed here as well, not only the comment.

   Before the patch: packets accumulate while backend is NULL

   Timeline from the QEMU/host perspective:

   1. QEMU opens /dev/vhost-vsock - struct vhost_vsock is created, but
      virtqueue backend (private_data) is still NULL.

   2. QEMU issues ioctl(VHOST_VSOCK_SET_GUEST_CID) - sets vsock->guest_cid,
      inserts vsock into vhost_vsock_hash. From this point
      vhost_vsock_get(cid) can find it.

   3. Guest is still booting, virtio-vsock driver not loaded yet. But the
      vsock is already discoverable by CID lookup.

   4. Host calls connect() - the packet gets queued but cannot be delivered:

   connect(fd, {AF_VSOCK, guest_cid, port})
     vsock_connect()                                [af_vsock.c:1650]
       transport->connect(vsk)                      [af_vsock.c:1730]
         virtio_transport_connect()                 [virtio_transport_common.c:1076]
           virtio_transport_send_pkt_info()         [virtio_transport_common.c:328]
             t_ops->send_pkt(skb, net)
               vhost_transport_send_pkt()           [vsock.c:289]
                 vhost_vsock_get(dst_cid) -> found  (CID already in hash)
                 virtio_vsock_skb_queue_tail()      ← PACKET QUEUED
                 vhost_vq_work_queue()              ← WORKER KICKED
                 return len                         ← SUCCESS (positive)

   Worker wakes up but cannot deliver:

   vhost_transport_send_pkt_work()
     vhost_transport_do_send_pkt(vsock, vq)         [vsock.c:107]
       mutex_lock(&vq->mutex)
       vhost_vq_get_backend(vq) == NULL             ← guest not ready
       goto out                                     ← PACKET STAYS IN QUEUE
       mutex_unlock(&vq->mutex)

   Back in vsock_connect() - transport->connect() returned success
   (len > 0), so the code enters the wait loop:

       sk->sk_state = TCP_SYN_SENT;
       err = transport->connect(vsk);     → returns len (success)
       if (err < 0) goto out;             → NOT taken
       ...
       while (sk->sk_state != TCP_ESTABLISHED && ...) {
           timeout = schedule_timeout(timeout);     ← SLEEPS 2 SECONDS
           if (timeout == 0) {
               err = -ETIMEDOUT;                    ← GIVES UP
           }
       }

   The guest never receives the CONNECT request (it is stuck in the queue),
   so no response arrives, and connect() returns ETIMEDOUT after 2 seconds.

   5. Later the guest finishes booting, loads the virtio-vsock driver,
      negotiates virtqueues. QEMU issues ioctl(VHOST_VSOCK_SET_RUNNING, 1)
      which calls vhost_vsock_start():

   vhost_vsock_start()                              [vsock.c:609]
     for each vq:
       mutex_lock(&vq->mutex)
       vhost_vq_set_backend(vq, vsock)              ← backend becomes NON-NULL
       mutex_unlock(&vq->mutex)
     vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX],  ← KICKS WORKER AGAIN
                         &vsock->send_pkt_work)

   Worker wakes up, now vhost_vq_get_backend(vq) != NULL, delivers the
   queued packet to the guest. But it is too late - connect() on the host
   side already timed out.

   Why the kick in vhost_vsock_start() is essential here: between steps 4
   and 5 nobody else will wake the worker. The kick from step 4 already
   fired and did nothing (backend was NULL). No new packets are coming -
   the only connect() caller is sleeping. Without this kick the packet
   would remain in the queue forever.

   ────────────────────────────────────────

   After the patch: packets no longer accumulate

   Same initial conditions - QEMU has set the CID, guest is still booting.

   Host calls connect():

   connect(fd, {AF_VSOCK, guest_cid, port})
     vsock_connect()                                [af_vsock.c:1650]
       transport->connect(vsk)                      [af_vsock.c:1730]
         virtio_transport_connect()                 [virtio_transport_common.c:1076]
           virtio_transport_send_pkt_info()         [virtio_transport_common.c:328]
             t_ops->send_pkt(skb, net)
               vhost_transport_send_pkt()           [vsock.c:289]
                 vhost_vsock_get(dst_cid) -> found
                 READ_ONCE(vsock->vqs[VSOCK_VQ_RX].private_data) == NULL
                 kfree_skb(skb)                     ← PACKET FREED
                 return -ECONNREFUSED               ← ERROR RETURNED

   The error propagates back immediately:

           virtio_transport_send_pkt_info():
             ret = t_ops->send_pkt(skb, net)  → -ECONNREFUSED
             if (ret < 0) break               → breaks out
         virtio_transport_connect() returns -ECONNREFUSED
       vsock_connect():
         err = transport->connect(vsk)        → -ECONNREFUSED
         if (err < 0) goto out                → TAKEN, skips wait loop
     connect() returns ECONNREFUSED to userspace immediately

   The packet never enters send_pkt_queue. When vhost_vsock_start() runs
   later, the queue is guaranteed to be empty - there is nothing for the
   worker kick to flush.

   ────────────────────────────────────────

   Summary: SET_GUEST_CID makes the vsock discoverable, SET_RUNNING actually
   enables the virtqueues. Between these two ioctls there is a window where
   packets are accepted into the queue but cannot be delivered. The kick in
   vhost_vsock_start() existed to drain this backlog. The patch closes the
   window at the entry point instead - refusing packets outright - so the
   backlog can never form.
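
   The two timelines can be replayed with a small userspace toy model. This
   is only a sketch under simplifying assumptions, not kernel code: the
   model_* names are hypothetical, the "worker" runs synchronously instead
   of on a vhost worker thread, there is no locking, and only the window
   between SET_GUEST_CID and SET_RUNNING is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for struct vhost_vsock; field names are illustrative. */
struct model_vsock {
	void *rx_backend;	/* models vqs[VSOCK_VQ_RX].private_data */
	int queued;		/* packets sitting in send_pkt_queue */
	int delivered;		/* packets handed to the guest */
	bool early_check;	/* true = behaviour with the patch applied */
};

/* Models vhost_transport_do_send_pkt(): drains only if a backend is set. */
static void model_worker(struct model_vsock *v)
{
	if (!v->rx_backend)
		return;		/* guest not ready, packets stay queued */
	v->delivered += v->queued;
	v->queued = 0;
}

/* Models vhost_transport_send_pkt(): returns len or a negative errno. */
static int model_send_pkt(struct model_vsock *v, int len)
{
	assert(len > 0);
	if (v->early_check && !v->rx_backend)
		return -ECONNREFUSED;	/* patched path: refuse, never queue */
	v->queued++;
	model_worker(v);		/* the kick after queueing */
	return len;
}

/* Models vhost_vsock_start(): set the backend, then optionally kick. */
static void model_start(struct model_vsock *v, bool kick)
{
	static int backend_token;	/* any non-NULL address will do */

	v->rx_backend = &backend_token;
	if (kick)
		model_worker(v);
}
```

   With early_check == false, a packet sent before model_start() stays in
   the queue until the start-time kick drains it, and stays there
   indefinitely if start is called with kick == false. With
   early_check == true the send is refused up front, so in this window the
   queue is empty when model_start() runs and the kick has nothing to
   flush.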

>   
>   	mutex_unlock(&vsock->dev.mutex);