云解相

[Q&A]

Random system lock-ups with K2 vGPU on ESXi 6.0 and HV 6.2, driver 354.97, Win 10 x64

We have deployed a Horizon View setup on a Dell R730 with a GRID K2. Our Win 10 Ent x64 pool is experiencing a few issues that make the user experience unreliable.

Summary of environment

ESXi version 6.0
Horizon View 6.2
Dell PE R730
NVIDIA K2; running 354.97 driver both on Host and vGPU
Client pool running Win 10 Ent 1511
WYSE P45 Zero Client running PCoIP - Tera2 chip - 5.2 firmware from Teradici
Dual monitor setup, 1920x1200


Issue 1.

Full-screen video within a web browser (any web browser) freezes within seconds of going full screen. The only way to restore it is to escape out of full screen. The workaround is to disable hardware acceleration within the browser, but this defeats the purpose of investing in the GRID infrastructure.

Issue 2.

At least once a day, at random, the Win 10 VDI client experiences a major failure of the graphics driver. This manifests itself at first as a lock-up of the screen. The audio then fails, followed by the session ending, and the session cannot be reinitialized through the Zero Client menu. The workaround is to initiate a restart of the VM from the Zero Client menu, which reboots the entire machine and loses all unsaved work. It seems that the OS itself has not crashed, because the reboot, when initiated, is a clean reboot of the OS and not a hard reset of the virtual hardware.

Can anyone please help us address this issue or point us to a known stable build of the GRID drivers?

Replies (8)

李桂芝

2018-9-12 16:27:08
Hi Mohb60,

I don't know of any known issue like this, and the drivers should be stable. The best thing you can do is to raise a support ticket. Because this is a GRID 1.0 product (K2/K1 boards), you need to do this via the OEM who supplied the board (Dell in this case), and they in turn can escalate it to NVIDIA engineering if it's a driver issue.

With GRID 2.0, SUMS support is available, so there is a process to raise tickets directly with NVIDIA. Under the older hardware sales model, I'm afraid you do need to go via the OEM who sold you the card.

Best wishes,
Rachel

许佳

2018-9-12 16:34:27
Hi Rachel,

Thanks for your feedback. I have some good news: we managed to find a workaround for issue #1 in our environment. From what I can tell, the video freezing during full-screen playback in a browser was due to the image quality tolerance settings. Counter-intuitively, it seems that setting the lower threshold of the image quality tolerance too low on the PCoIP client produces the issue. Moving the bar from 80% to perceptually lossless seems to be the best setting on our setup and no longer results in the screen locking up during full-screen playback. I hope this helps others who have a similar setup and are facing the same issue.

We are still looking into issue #2.

王若鸿

2018-9-12 16:53:00
This issue still persists in our environment. We have taken the following steps and concluded that the newer NVIDIA drivers are unstable.

Our environment is the same as what is listed above.

To investigate, our initial step was to look into the logs to determine why VMs were randomly crashing and rebooting. The logs show very little. The Windows logs show that Windows had recovered from an unclean shutdown after a failure of the VM's OS. The VM logs show an error with approximately the same timestamp, which correlates the failure with what is seen in the Windows logs.
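
For anyone repeating this kind of check, the timestamp matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `correlate`, the two-minute window, and the sample events are all made up for the example and are not taken from our logs. It also assumes both sets of timestamps have already been converted to the same time zone.

```python
from datetime import datetime, timedelta


def correlate(guest_events, host_events, window=timedelta(minutes=2)):
    """Pair each guest event with every host event within `window` of it.

    Both arguments are lists of (timestamp, message) tuples whose
    timestamps are assumed to be in the same time zone.
    """
    matches = []
    for g_time, g_msg in guest_events:
        for h_time, h_msg in host_events:
            if abs(g_time - h_time) <= window:
                matches.append((g_time, g_msg, h_time, h_msg))
    return matches


# Hypothetical sample data, not real log entries.
guest_events = [
    (datetime(2016, 3, 1, 10, 15, 30), "recovered from unclean shutdown"),
    (datetime(2016, 3, 1, 14, 2, 0), "recovered from unclean shutdown"),
]
host_events = [
    (datetime(2016, 3, 1, 10, 14, 50), "error in VM log"),
    (datetime(2016, 3, 1, 12, 0, 0), "unrelated entry"),
]

matches = correlate(guest_events, host_events)
for g_time, g_msg, h_time, h_msg in matches:
    print(f"{g_time} guest: {g_msg} <-> {h_time} host: {h_msg}")
```

A wide window makes coincidental matches more likely, so it is worth tightening it once the clock offset between guest and host is known.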

This instability was introduced into our environment after upgrading both the VIB and vGPU drivers from 348.27 to the newer 354.97. This was confirmed by rolling back the VIB on one of our hosts and creating a new, identical vGPU pool with the 348.27 drivers installed. The new pool has been stable.

Can anyone from NVIDIA give us a reason for the instability in the newer drivers?

林立银

2018-9-12 17:12:09
Thanks for the feedback, very helpful. I'll see what is known internally.
Rachel