李件杰

[Q&A]

GRID K2 does not work with XenServer 6.5 SP1

We have two XenServer hosts in a K2 pool, but one of them does not work. Only K2P machines can start on it; K200, K220, K240, K260 and K280 machines cannot. The error message from XenServer is: Internal error: xenopsd internal error: Device.Ioemu_failed("vgpu exited unexpectedly"). I tried two driver versions, 367.43 and 367.64, and neither of them worked. The log files show the following:

Jan 17 11:24:18 xxxxxxxxxxen032 kernel: [417949.279318] block tda: sector-size: 512/512 capacity: 629145600
Jan 17 11:24:18 xxxxxxxxxxen032 kernel: [417949.478882] device vif428.0 entered promiscuous mode
Jan 17 11:24:18 xxxxxxxxxxen032 kernel: [417949.538774] IPv6: ADDRCONF(NETDEV_UP): vif428.0: link is not ready
Jan 17 11:24:49 xxxxxxxxxxen032 kernel: [417980.526518] xapi7: port 2(vif428.0) entered disabled state
Jan 17 11:24:49 xxxxxxxxxxen032 kernel: [417980.526628] device vif428.0 left promiscuous mode
Jan 17 11:24:49 xxxxxxxxxxen032 kernel: [417980.526630] xapi7: port 2(vif428.0) entered disabled state
Jan 17 11:24:56 xxxxxxxxxxen032 kernel: [417987.488401] NVRM: RmInitAdapter failed! (0x24:0x40:1035)

Has anyone encountered this kind of issue before?
Thank you.
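
In a log like the excerpt above, the driver message is easy to miss among the vif/xapi lines. A minimal Python sketch for picking it out follows. It assumes the log has already been read into a list of lines; the function names are made up for illustration, and the three fields of the error triple are captured as raw strings without any attempt to interpret them.

```python
import re

# Case-insensitive match for lines that mention the NVIDIA kernel driver.
TAGS = re.compile(r"NVRM|nvidia", re.IGNORECASE)

# Captures the parenthesised triple after "RmInitAdapter failed!",
# e.g. (0x24:0x40:1035). The fields are kept as raw strings.
RM_INIT = re.compile(
    r"RmInitAdapter failed!\s*\((0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(\d+)\)"
)


def driver_lines(lines):
    """Return only the log lines that carry an NVRM/nvidia tag."""
    return [line for line in lines if TAGS.search(line)]


def rm_init_failures(lines):
    """Return one (field1, field2, field3) tuple per RmInitAdapter failure."""
    failures = []
    for line in lines:
        match = RM_INIT.search(line)
        if match:
            failures.append(match.groups())
    return failures


if __name__ == "__main__":
    # Two lines copied from the excerpt in the post.
    sample = [
        "Jan 17 11:24:49 xxxxxxxxxxen032 kernel: [417980.526628] device vif428.0 left promiscuous mode",
        "Jan 17 11:24:56 xxxxxxxxxxen032 kernel: [417987.488401] NVRM: RmInitAdapter failed! (0x24:0x40:1035)",
    ]
    for line in driver_lines(sample):
        print(line)
    print(rm_init_failures(sample))
```

Running the same filter over the logs of both hosts would show whether the working host ever logs this failure.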

Replies (5)

田晴

2018-9-30 10:51:21
I suspect that the host/Dom0 driver is not starting correctly. The host/Dom0 driver is needed for vGPU virtualization (e.g. Kxxx).
- NVIDIA should be able to decode the "(0x24:0x40:1035)" error.
- You can try "nvidia-smi" or "nvidia-smi --debug=logfile".
- You can "grep" the relevant system logs for the "NVRM" or "nvidia" tags.
- Google finds a similar problem (the same error) here: https://devtalk.nvidia.com/default/topic/957827/geforce-maxwell-titan-x-and-pascal-titan-x-in-same-machine-/ where it was attributed to a hardware incompatibility.
- You can compare the PCI resource assignment between the host machines with "lspci -nv | more" (or "lspci -nvv | more"): search for the "10de:11bf" K2 GPU chips and compare the "Memory"/"Region" lines. For example, you may have forgotten to enable "64bit" PCI resource assignment in the BIOS, but without knowing your server hardware, configuration and BIOS version I cannot give a more specific hint.
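
The lspci comparison in the last point can be done by eye, but a short script makes the differences easier to spot. The Python sketch below assumes the `lspci -nv` output from each host has been saved as text. The two sample outputs are invented for illustration only and are not taken from the poster's machines; the function names are hypothetical too. Addresses and slot numbers are ignored in the comparison because they can differ between healthy hosts, so only the width, type, size and assigned/unassigned state of each region are compared.

```python
import re

K2_ID = "10de:11bf"  # vendor:device ID of the K2 GPU chip, as quoted in the reply

MEM = re.compile(r"Memory at (\S+) \((\d+)-bit, ((?:non-)?prefetchable)\)")
SIZE = re.compile(r"\[size=([^\]]+)\]")


def k2_regions(lspci_text):
    """Map each K2 device's PCI slot to the memory regions listed under it."""
    found, slot = {}, None
    for line in lspci_text.splitlines():
        if not line.strip():
            slot = None  # a blank line ends the current device block
        elif not line[0].isspace():
            # Header line of a device block: "<slot> <class>: <vendor>:<device> ..."
            slot = line.split()[0] if K2_ID in line else None
            if slot is not None:
                found[slot] = []
        elif slot is not None:
            m = MEM.search(line)
            if m:
                addr, bits, kind = m.groups()
                size = SIZE.search(line)
                found[slot].append({
                    "addr": addr,
                    # lspci prints a placeholder such as <unassigned> instead of hex
                    "assigned": re.fullmatch(r"[0-9a-f]+", addr) is not None,
                    "bits": int(bits),
                    "kind": kind,
                    "size": size.group(1) if size else "",
                })
    return found


def signature(regions):
    """Slot- and address-independent view, so two hosts can be compared."""
    return sorted(
        tuple((r["bits"], r["kind"], r["size"], r["assigned"]) for r in regs)
        for regs in regions.values()
    )


# Invented sample outputs (assumption: shaped like `lspci -nv` device blocks).
GOOD_HOST = """\
05:00.0 0300: 10de:11bf (rev a1)
\tMemory at f9000000 (32-bit, non-prefetchable) [size=16M]
\tMemory at b0000000 (64-bit, prefetchable) [size=128M]
"""
BAD_HOST = """\
05:00.0 0300: 10de:11bf (rev a1)
\tMemory at f9000000 (32-bit, non-prefetchable) [size=16M]
\tMemory at <unassigned> (64-bit, prefetchable)
"""

if __name__ == "__main__":
    good, bad = k2_regions(GOOD_HOST), k2_regions(BAD_HOST)
    print("layouts match:", signature(good) == signature(bad))
    for slot, regs in bad.items():
        for r in regs:
            if not r["assigned"]:
                print(slot, "has an unassigned", r["bits"], "bit region")
```

A mismatch between the two signatures, or any unassigned region on the failing host, would point toward the BIOS resource-assignment setting mentioned above rather than the driver version.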

刘嘉佳

2018-9-30 10:56:40
I think it might be best to post on the Citrix forums or raise a support case with them as the errors are all xapi...


http://discussions.citrix.com/forum/523-gpu-technologies/ might be a good place as the XS team monitor it.

Beyond that, checking the host license and host driver and updating the host BIOS are the best suggestions I can think of.

李昕羿

2018-9-30 11:11:10
It is worth checking your hypervisor version: on 32-bit versions (XS 6.2 and lower), >4GB MMIO needs to be disabled: http://discussions.citrix.com/topic/351335-every-4th-guest-fails-with-error-vgpu-exited-unexpectedly/

陈鹏

2018-9-30 11:24:56
This error can also arise from a power cable issue: https://support.citrix.com/article/CTX210153