Possible race condition in Vulkan swapchain recreation

There is a long-standing bug in mpv where the player hangs while the window is being resized and reports VK_ERROR_DEVICE_LOST.

Recently I decided to look into the problem and noticed some Vulkan validation failures in the GPU debug log.

Attached is the log (gpu-debug.txt), produced with `mpv --no-config --log-file=gpu-debug.txt --gpu-debug --vo=gpu --gpu-api=vulkan ~/Japan\ in\ 8K-m1jY2VLCRmY.mkv`.

The backtrace points at pl_swapchain_resize, and after looking into it I found that inserting a vkDeviceWaitIdle call during swapchain recreation makes the problem go away. (A deadlock seems to be introduced when the mpv option vulkan-queue-count is set to anything larger than 1, but that is to be expected with vkDeviceWaitIdle anyway.)

Is this a potential bug in libplacebo's Vulkan code or in NVIDIA's Vulkan driver?

My operating system: Gentoo Linux

My mpv player: upstream master branch, 0.34.0-301-gdefb02daa4

My libplacebo version: master branch, commit b4867541

My GPU driver: NVIDIA proprietary 515.43.04

The video I used to reproduce the problem can be downloaded from https://www.youtube.com/watch?v=m1jY2VLCRmY with yt-dlp. On my machine the problem reproduces quite consistently when I resize the mpv window with the mouse.

The `vkDeviceWaitIdle` workaround mentioned above:
diff --git a/src/vulkan/common.h b/src/vulkan/common.h
index 339da6a..be99a45 100644
--- a/src/vulkan/common.h
+++ b/src/vulkan/common.h
@@ -226,4 +226,5 @@ struct vk_ctx {
     PL_VK_FUN(GetMemoryWin32HandleKHR);
     PL_VK_FUN(GetSemaphoreWin32HandleKHR);
 #endif
+    PL_VK_FUN(DeviceWaitIdle);
 };
diff --git a/src/vulkan/context.c b/src/vulkan/context.c
index 5be9bd7..72d3057 100644
--- a/src/vulkan/context.c
+++ b/src/vulkan/context.c
@@ -336,6 +336,7 @@ static const struct vk_fun vk_dev_funs[] = {
     PL_VK_DEV_FUN(SetDebugUtilsObjectNameEXT),
     PL_VK_DEV_FUN(UpdateDescriptorSets),
     PL_VK_DEV_FUN(WaitForFences),
+    PL_VK_DEV_FUN(DeviceWaitIdle),
 };
 
 static void load_vk_fun(struct vk_ctx *vk, const struct vk_fun *fun)
diff --git a/src/vulkan/swapchain.c b/src/vulkan/swapchain.c
index 3b66ee6..8ff443f 100644
--- a/src/vulkan/swapchain.c
+++ b/src/vulkan/swapchain.c
@@ -570,6 +570,8 @@ static bool vk_sw_recreate(pl_swapchain sw, int w, int h)
     while (p->old_swapchain)
         vk_poll_commands(vk, UINT64_MAX);
 
+    vk->DeviceWaitIdle(vk->dev);
+
     VkSwapchainCreateInfoKHR sinfo = p->protoInfo;
     sinfo.oldSwapchain = p->swapchain;
 
Edited by Ocean Shen