We have been running vSGA for a couple of years off of Nvidia K1s. We upgraded to Horizon View 6.2 and are testing vGPU profiles. The initial testing went very well, but once I scaled my pool out, several VMs failed customization. However, no error occurred in Horizon; they just remained in a customization status. If I forced a power reset and then responded to the Windows recovery prompt to boot normally, it would usually continue customization and finish. The best part is that because it never boots, I can't VNC into it, as the console is disabled with the vGPU K100 attached to the VM. It is very odd behavior and I have an open ticket with VMware.

As I dug through the logs, I found this interesting item in the vmware.log for the VM:

2016-04-27T01:44:09.329Z| mks| W110: GLWindow: Unable to reserve host GPU resources
2016-04-27T01:44:09.339Z| vmx| I120: [msg.mks.noGPUResourceFallback] Hardware GPU resources are not available. The virtual machine will use software rendering.

This lines up with the workaround: when I power reset the VM, it eventually works. It seems like some VMs fail to get assigned a GPU core on the K1s during power-on. I haven't found ANYTHING online referring to this issue. I'll keep this post updated.

System Environment:
SuperMicro hosts with dual K1s
ESXi 6.0 U2
Nvidia VIB 361.40
Windows Nvidia driver 362.13
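For anyone hitting the same thing, a quick way to see which VMs logged the failure is to search every vmware.log on the datastore for the two messages above. The Python sketch below is only an illustration and not from the original post; the datastore path is an assumption, so adjust it for your environment.

import glob
import re

DATASTORE = "/vmfs/volumes/datastore1"   # assumed mount point, not from this thread
PATTERN = re.compile(r"Unable to reserve host GPU resources|noGPUResourceFallback")

def scan_logs(root):
    # Yield (log path, matching line) for every hit under <root>/<vm folder>/vmware.log
    for log_path in glob.glob(root + "/*/vmware.log"):
        with open(log_path) as log:
            for line in log:
                if PATTERN.search(line):
                    yield log_path, line.rstrip()

if __name__ == "__main__":
    for path, line in scan_logs(DATASTORE):
        print("%s  %s" % (path, line))

Run it from anywhere the datastore is mounted; any VM it lists fell back to software rendering at power-on.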
7 Replies
Can you check the vBIOS of the K1 cards installed and ensure they're at the latest version? You may need to request this update from SuperMicro.

Also, why use K100? K120Q is a better choice: more graphics memory and exactly the same density, since each GPU only supports a maximum of 8 vGPU sessions (so that's 32 on a K1).
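To check the vBIOS version Jason is asking about, nvidia-smi -q on the ESXi host reports it per GPU. The small wrapper below is just a sketch of that query (the script itself is not from the thread):

import subprocess

def vbios_versions():
    # Collect every "VBIOS Version" line reported by nvidia-smi -q
    output = subprocess.check_output(["nvidia-smi", "-q"])
    if isinstance(output, bytes):
        output = output.decode("utf-8", "replace")
    return [line.strip() for line in output.splitlines() if "VBIOS Version" in line]

if __name__ == "__main__":
    for entry in vbios_versions():
        print(entry)   # e.g. VBIOS Version : 80.07.BE.00.04

One line should be printed for each of the four GPUs on a K1 board.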
Thanks Jason. I contacted SuperMicro, but they are not aware of any "authorized" BIOS updates for the Nvidia GRID cards. I also tried looking online but didn't find a BIOS version history for the GRID cards. Running the nvidia-smi command, it reports that they are running:

VBIOS Version : 80.07.BE.00.04
MultiGPU Board : Yes
Board ID : 0x8300
GPU Part Number : 900-52401-0020-000
Inforom Version
    Image Version : 2401.0502.00.02

Do you know where I could find that info?

Also, regarding the K100 choice: I agree. We just wanted to test the K100 and K120Q separately to understand the performance gains on the applications being used. I plan to go K120Q for production since we get the same user density.

Thanks for the quick response!
You're on the latest vBIOS, so no update is required.

I would avoid the K100 profile; it's only there for legacy support and I would recommend that all new projects/deployments not use it.

Out of interest, how many VMs do you have in the pool you're creating, and how many K1s are available in those hosts?
Thanks for checking on the vBIOS. I will test out the K120Q and report back.

Regarding the pool size, it was planned to be 55 VMs, with the target host having two K1s. A second host with two K1s would be a standby in case of a host failure in the cluster (I know vMotion isn't supported).
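For reference, the seat math behind that plan, using the per-GPU limit Jason quoted (8 sessions per GPU for both K100 and K120Q) and the four GPUs on each K1 board; this is just a back-of-the-envelope check, not something from the original thread:

GPUS_PER_K1 = 4          # a K1 board carries four GPUs
SESSIONS_PER_GPU = 8     # max vGPU sessions per GPU for K100 / K120Q
K1_BOARDS = 2            # two K1s in the target host
POOL_SIZE = 55           # planned pool size

seats = K1_BOARDS * GPUS_PER_K1 * SESSIONS_PER_GPU
print("vGPU seats on the target host: %d" % seats)                  # 64
print("headroom over the planned pool: %d" % (seats - POOL_SIZE))   # 9

So a single host with two K1s covers the 55-VM pool with nine seats to spare.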
We are good to go: the K100 was the issue. Once I switched over to the K120Q and tested re-provisioning, all VMs came up normally. We had users on the new vGPU profile today without issues.

I think Nvidia should drop the K100 from their deployment documentation, as it definitely impacted us. I realize the K100 and K120Q have the same user density, but a good POC means you test up in complexity. I would have avoided the K100 if it had been marked as legacy.

Thanks Jason for being very responsive and informative! That was the exact info I needed to help root-cause the issue.
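One way to double-check that every clone in the pool really picked up the new profile is to look at the vGPU entry in each VM's .vmx file, which normally appears as pciPassthru0.vgpu = "grid_k120q" on GRID-on-vSphere setups. The sketch below is an illustration under those assumptions (the datastore path and key name are not taken from this thread):

import glob

DATASTORE = "/vmfs/volumes/datastore1"   # assumed mount point, not from this thread
EXPECTED = "grid_k120q"

for vmx_path in glob.glob(DATASTORE + "/*/*.vmx"):
    profile = None
    with open(vmx_path) as vmx:
        for line in vmx:
            # the vGPU profile normally appears as: pciPassthru0.vgpu = "grid_k120q"
            if line.strip().startswith("pciPassthru0.vgpu"):
                profile = line.split("=", 1)[1].strip().strip('"')
    if profile != EXPECTED:
        print("%s -> %s" % (vmx_path, profile or "no vGPU profile set"))

Anything it prints is a VM that is still on a different profile or has no vGPU device attached at all.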
Good to know it's resolved!

I'll raise the point about dropping the K100. There are some reasons for it to persist, but we can always ask...