Hi,

Maybe someone has experienced a similar issue. We use:

HP ProLiant DL380 Gen9 servers, latest firmware (881936_001_spp-2017.07.2-SPP2017072.2017_0922.6)
Tesla M60, latest driver (NVIDIA-vGPU-xenserver-7.1-384.73.x86_64.rpm)
Windows 7 Enterprise 64-bit, latest driver (385.41_grid_win8_win7_server2012R2_server2008R2_64bit_international.exe)
XenDesktop (Win7) and XenApp (Windows Server 2012 R2), 7.13
XenServer 7.1, latest updates applied
GRID M60-0B profile, 512MB

Since we updated to the latest driver, NVIDIA-GRID-XenServer-7.1-384.73-385.41, we see various VMs simply freezing while people are working on them. The Win7 OS crashes. We also see the following error in Citrix XenCenter when the Delivery Controller tries to boot new VMs: "An emulator required to run this VM failed to start." The same applies to the XenApp servers: they freeze, hang, and finally crash.

In the console of the XenServer, nvidia-smi shows that one GPU is at 100% vGPU utilization:

Mon Oct  2 11:54:15 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.73                 Driver Version: 384.73                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 00000000:86:00.0 Off |                  Off |
| N/A   45C    P8    25W / 150W |   3066MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           On   | 00000000:87:00.0 Off |                  Off |
| N/A   48C    P0    58W / 150W |     18MiB /  8191MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

One could think that this is some kind of memory exhaustion, but we see that it happens out of the blue, when neither memory nor GPU is fully loaded. Here is the state just before it happens:

timestamp, name, pci.bus_id, driver_version, pstate, pcie.link.gen.max, pcie.link.gen.current, temperature.gpu, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
02.10.2017 09:03, Tesla M60, 00000000:87:00.0, 384.73, P0, 3, 3, 40, 1 %, 0 %, 8191 MiB, 3093 MiB, 5098 MiB
02.10.2017 09:03, Tesla M60, 00000000:87:00.0, 384.73, P0, 3, 3, 40, 3 %, 0 %, 8191 MiB, 3093 MiB, 5098 MiB
02.10.2017 09:03, Tesla M60, 00000000:87:00.0, 384.73, P0, 3, 3, 41, 16 %, 1 %, 8191 MiB, 3093 MiB, 5098 MiB
02.10.2017 09:03, Tesla M60, 00000000:87:00.0, 384.73, P0, 3, 3, 41, 100 %, 0 %, 8191 MiB, 3093 MiB, 5098 MiB
02.10.2017 09:04, Tesla M60, 00000000:87:00.0, 384.73, P0, 3, 3, 43, 100 %, 0 %, 8191 MiB, 3093 MiB, 5098 MiB
02.10.2017 09:04, Tesla M60, 00000000:87:00.0, 384.73, P0, 3, 3, 43, 100 %, 0 %, 8191 MiB, 3093 MiB, 5098 MiB
02.10.2017 09:04, Tesla M60, 00000000:87:00.0, 384.73, P0, 3, 3, 44, 100 %, 0 %, 8191 MiB, 3093 MiB, 5098 MiB

As you can see, the load was low just before this happened.

The sad thing is that users lose their work when the VMs crash. On top of that, the VMs cannot be started again; the only thing that resolves the issue temporarily is rebooting the XenServer. Sadly, that does not help for long, since the problem quickly happens again. We had to remove all GPUs from the VMs.

Citrix claims this issue is not their problem. Everything points to Nvidia at the moment. We first saw this issue on 27.09.2017.
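The pattern in the log above (utilization jumping from a few percent to a sustained 100% while memory use stays flat) is easy to spot automatically. Below is a minimal Python sketch, assuming the log is available as comma-separated text with the same column names as in the post. The function name, the threshold and the `min_samples` parameter are illustrative choices, not part of any NVIDIA tool, and the sample data is trimmed to the columns the check needs.

```python
import csv
import io

# Trimmed copy of the log from the post: only the columns the check needs.
SAMPLE_LOG = """\
timestamp, name, pci.bus_id, utilization.gpu [%], memory.used [MiB]
02.10.2017 09:03, Tesla M60, 00000000:87:00.0, 1 %, 5098 MiB
02.10.2017 09:03, Tesla M60, 00000000:87:00.0, 3 %, 5098 MiB
02.10.2017 09:03, Tesla M60, 00000000:87:00.0, 16 %, 5098 MiB
02.10.2017 09:03, Tesla M60, 00000000:87:00.0, 100 %, 5098 MiB
02.10.2017 09:04, Tesla M60, 00000000:87:00.0, 100 %, 5098 MiB
02.10.2017 09:04, Tesla M60, 00000000:87:00.0, 100 %, 5098 MiB
02.10.2017 09:04, Tesla M60, 00000000:87:00.0, 100 %, 5098 MiB
"""


def parse_percent(text):
    """Turn '16 %' or '16%' into the integer 16."""
    return int(text.strip().rstrip("%").strip())


def first_sustained_saturation(csv_text, threshold=100, min_samples=3):
    """Return (row_index, timestamp) of the first sample that starts a run of
    at least `min_samples` consecutive samples at or above `threshold`
    percent GPU utilization, or None if there is no such run."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    run_start = None
    run_length = 0
    for index, row in enumerate(reader):
        if parse_percent(row["utilization.gpu [%]"]) >= threshold:
            if run_length == 0:
                run_start = (index, row["timestamp"])
            run_length += 1
            if run_length >= min_samples:
                return run_start
        else:
            # Utilization dropped again, so the run is broken.
            run_start, run_length = None, 0
    return None


if __name__ == "__main__":
    print(first_sustained_saturation(SAMPLE_LOG))
```

Requiring several consecutive samples avoids flagging a short, legitimate burst of GPU work; how many samples count as "stuck" depends on the logging interval and is a judgment call.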
11 replies
As an addendum: we don't use HDX 3D Pro; we use standard VDA deployments for our XenDesktop environment.
Hi,

Have you tried a different vGPU profile size? Maybe the 1B profile?

I take it you have SUMS; have you raised it with NVIDIA?

Failing both of the above, can you not roll back to the previous driver that was working, to give you stability whilst you troubleshoot on a dev platform?

Regards
Thanks for your reply.

Yes, we have SUMS, and yes, we've raised the issue with NVIDIA (no solution so far).

Rolling back could be an option; for now we simply removed the GPUs, since we needed a quick solution. We could try the 1GB profile as well, but then I can run only 64 users, so I would need more M60s for that. Since we use Win7 in standard VDA mode, we thought the 512MB profile would be just right.

Our test environment does not have any M60 in it for the moment; those cards are quite expensive :-).
Something else to consider after my PM: you may be better off investigating M10s rather than M60s. They are cheaper than M60s but have twice the framebuffer and twice the number of GPUs, so you would be able to give your users 1GB whilst maintaining your current VM / server density.

The M10 offers less performance than the M60, but if you're only allocating 512MB, then these are clearly not high-performance users. Also, if you're only allocating 512MB, then you're not even using NVENC, as this is only available on 1GB profiles and higher.

If you want better density per physical server, then you might want to look at a XenApp model (again using the M10), obviously depending on the applications being used, security requirements and so on.

Have a look at an M10 on your dev platform and see what you think. Use my PM as guidance for locating one for testing.

Regards
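The density argument can be checked with a little arithmetic. The Python sketch below assumes that the number of vGPU VMs per physical GPU is simply the framebuffer divided by the profile size, with a nominal 8192 MiB per GPU (nvidia-smi in the first post reports 8191MiB), two GPUs on an M60 and four on an M10. The `Board` class and function names are illustrative only; the authoritative per-profile limits are the ones in NVIDIA's vGPU documentation.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Board:
    name: str
    gpus: int                      # physical GPUs on one board
    framebuffer_mib_per_gpu: int   # nominal framebuffer per physical GPU


# Assumed nominal figures: 2 x 8 GB for the M60, 4 x 8 GB for the M10.
M60 = Board("Tesla M60", gpus=2, framebuffer_mib_per_gpu=8192)
M10 = Board("Tesla M10", gpus=4, framebuffer_mib_per_gpu=8192)


def vms_per_board(board, profile_mib):
    """VMs one board can host if every VM gets `profile_mib` of framebuffer."""
    return board.gpus * (board.framebuffer_mib_per_gpu // profile_mib)


def boards_needed(users, board, profile_mib):
    """Boards required to give each of `users` one vGPU of `profile_mib`."""
    return ceil(users / vms_per_board(board, profile_mib))


if __name__ == "__main__":
    for board in (M60, M10):
        for profile in (512, 1024):
            print(board.name, profile, "MiB:", vms_per_board(board, profile))
```

Under these assumptions an M60 hosts 32 VMs at 512MB and 16 at 1GB, while an M10 hosts 32 VMs at 1GB, which matches the point about keeping density while doubling the profile. The "only 64 users" figure quoted earlier for the 1GB profile would be consistent with four M60 boards, though the thread never states the board count.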
Yes, I get your point. For the test environment we will go for the M10, I think, although since it is not the same card we might not see the same issue there.

We will assign the 1GB profile to some test users now, just to see whether we can reproduce the issue.

We also use XenApp to push apps to XenDesktop, but XenApp alone will not work for our users, I'm afraid; we need to keep both XenDesktop and XenApp.

You are right, our users are not high-end users in that sense. With the Nvidia cards we can flatten CPU usage in general, and users certainly have a better GUI experience. We also use Bloomberg, Thomson Reuters and so on, which benefit from the cards as well.

What is a bit disappointing is that such a severe issue is happening in the first place, and that Nvidia support has been rather limited so far.

Best regards
When did you raise the call with NVIDIA Support (date / time)? Have you had any response back yet?

I take it you have a case number? Feel free to PM me that if you like.

Regards
09/27/2017, 03:37 AM
Ticket ID: 170927-000048

Guess what: no solution yet. I must say I am very, very disappointed by Nvidia's support.

Regards
I'll ask someone to take a look and see what's happening with the ticket.

Regards
Here is a short update.

Nvidia did not solve the issue; support is very, very limited, meaning non-existent.

We know by now that the issue also happens with the 1 GB profile.

The good news so far: testing showed that if we disable Windows 7 Aero by disabling the service "Desktop Window Manager Session Manager" (service name: UxSms), the issue does not occur at all.

Best regards
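For anyone who wants to try the same workaround across several Windows 7 images, here is a minimal Python sketch of how the UxSms service could be stopped and disabled with the built-in `sc` command. It is an illustration only, not the procedure used in the thread: the function names are made up for this example, it defaults to a dry run that only prints the commands, and really applying it requires an elevated (administrator) session inside the guest.

```python
import subprocess

# Service name given in the post above.
SERVICE = "UxSms"


def build_disable_commands(service=SERVICE):
    """Return the `sc` invocations that stop a service and disable its startup.

    `sc config` expects the option and its value as separate tokens
    ("start=" followed by "disabled").
    """
    return [
        ["sc", "stop", service],
        ["sc", "config", service, "start=", "disabled"],
    ]


def disable_service(service=SERVICE, dry_run=True):
    """Print the commands (dry run) or execute them on the local machine."""
    commands = build_disable_commands(service)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=False: `sc stop` reports an error if the service is
            # already stopped, which is harmless here.
            subprocess.run(command, check=False)
    return commands


if __name__ == "__main__":
    disable_service()  # dry run: only prints what would be executed
```

Disabling this service turns off desktop composition, so users lose the Aero look; by the poster's own description that trade-off avoided the freezes in their tests.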