s3:smbd: fix multichannel connection passing race
If a client opens multiple connection with the same
client guid in parallel, our connection passing is likely
to hit a race.
Assume we have 3 processes:
smbdA: This process already handles all connections for
a given client guid
smbdB: This just received a new connection with an
SMB2 neprot for the same client guid
smbdC: This also received a new connection with an
SMB2 neprot for the same client guid
Now both smbdB and smbdC send a MSG_SMBXSRV_CONNECTION_PASS
message to smbdA. These messages contain the socket fd
for each connection.
While waiting for a MSG_SMBXSRV_CONNECTION_PASSED message
from smbdA, both smbdB and smbdC watch the smbXcli_client.tdb
record for changes (that also verifies smbdA stays alive).
Once one of them say smbdB received the MSG_SMBXSRV_CONNECTION_PASSED
message, the dbwrap_watch logic will wakeup smbdC in order to
let it recheck the smbXcli_client.tdb record in order to
handle the case where smbdA died or deleted its record.
Now smbdC rechecks the smbXcli_client.tdb record, but it
was not woken because of a problem with smbdA. It meant
that smbdC sends a MSG_SMBXSRV_CONNECTION_PASS message
including the socket fd again.
As a result smbdA got the socket fd from smbdC twice (or even more),
and creates two (or more) smbXsrv_connection structures for the
same low level tcp connection. And it also sends more than one
SMB2 negprot response. Depending on the tevent logic, it will
use different smbXsrv_connection structures to process incoming
requests. And this will almost immediately result in errors.
The typicall error is:
smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
But other errors would also be possible.
The detail that leads to the long delays on the client side is
that our smbd_server_connection_terminate_ex() code will close
only the fd of a single smbXsrv_connection, but the refcount
on the socket fd in the kernel is still not 0, so the tcp
connection is still alive...
Now we remember the server_id of the process that we send
the MSG_SMBXSRV_CONNECTION_PASS message to. And just keep
watching the smbXcli_client.tdb record if the server_id
don't change. As we just need more patience to wait for
the MSG_SMBXSRV_CONNECTION_PASSED message.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15346
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>
Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org>
Autobuild-Date(master): Tue Aug 8 13:59:58 UTC 2023 on atb-devel-224