Make corresponding device active before each TestDevice is created

When populating the list of GPU devices for tests, a DeviceContext and a
DeviceStream object are created for each TestDevice. In CUDA, a
DeviceStream is created for the currently active device, so the active
device has to be set explicitly through the API before each stream is
created. Otherwise, all streams are created for the first device, and
when they are later used with any other device, the CUDA API returns an
invalid argument error. The bug only affects unit tests in CUDA builds
on systems with more than one GPU.
OpenCL and SYCL builds are not affected, because they use the
DeviceContext to attach the device to the stream (queue).
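
The failure mode can be illustrated with a small toy model. This is a
sketch only, not the actual test code: MockRuntime, MockTestDevice and
the populate functions are hypothetical names, and the mock merely
imitates the CUDA rule that a stream belongs to whichever device is
active when the stream is created (in the real runtime these would be
calls such as cudaSetDevice and cudaStreamCreate).

```cpp
#include <vector>

// Hypothetical stand-in for the CUDA runtime: a stream is bound to
// whichever device is active at the moment the stream is created.
struct MockRuntime
{
    int  activeDevice = 0;
    void setDevice(int id) { activeDevice = id; }
    // Returns the id of the device that owns the newly created stream.
    int createStream() const { return activeDevice; }
};

// Hypothetical stand-in for a test device entry.
struct MockTestDevice
{
    int deviceId;    // device this entry is meant to represent
    int streamOwner; // device the stream was actually created on
};

// Buggy variant: the active device is never switched, so every stream
// ends up on device 0.
std::vector<MockTestDevice> populateWithoutActivation(MockRuntime& rt, int numDevices)
{
    std::vector<MockTestDevice> devices;
    for (int id = 0; id < numDevices; ++id)
    {
        devices.push_back({ id, rt.createStream() });
    }
    return devices;
}

// Fixed variant: make the corresponding device active before creating
// its stream.
std::vector<MockTestDevice> populateWithActivation(MockRuntime& rt, int numDevices)
{
    std::vector<MockTestDevice> devices;
    for (int id = 0; id < numDevices; ++id)
    {
        rt.setDevice(id);
        devices.push_back({ id, rt.createStream() });
    }
    return devices;
}

// Counts entries whose stream lives on a different device than intended.
int countMismatchedStreams(const std::vector<MockTestDevice>& devices)
{
    int mismatched = 0;
    for (const auto& d : devices)
    {
        if (d.deviceId != d.streamOwner)
        {
            ++mismatched;
        }
    }
    return mismatched;
}
```

In this model, every device other than the first gets a mismatched
stream in the buggy variant, which corresponds to the case where the
real CUDA API reports an invalid argument error. With a single device
there is no mismatch, which matches the bug being visible only on
multi-GPU systems.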
Fixes #3781, #3782 and #3805.