刘京

[Q&A]

Failed to initialize NVML: Unknown Error

Hello,
I'm pretty new at this, so please be patient :)

We have a 3-node VMware cluster running VMware 6.0 U1a. We have just installed an NVIDIA GRID K1 in one of our hosts.

The host is an IBM 3850 X6, and we have installed the card in slot 4, which is a PCIe x16 slot belonging to CPU3.

I have followed the deployment guide and have changed the BIOS settings accordingly:
* Memory Mapped Config Base memory window: changed from Auto to 2 GB (I think it's supposed to be below 4 GB)
* 64-bit PCI Resource: changed from Enabled to Disabled


I have installed the Virtual GPU manager:
esxcli software vib list | grep -i nvidia
NVIDIA-vgx-VMware_ESXi_6.0_Host_Driver  346.42-1OEM.600.0.0.2159203           NVIDIA  VMwareAccepted    2015-12-14
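A minimal sketch of the VIB check above as a reusable POSIX shell function, assuming the column layout shown in the output (name, version, vendor, acceptance level, install date). The function name `nvidia_vib_version` is hypothetical and not part of ESXi; it reads `esxcli software vib list` output on stdin and prints the version of each VIB whose name contains "nvidia".

```shell
#!/bin/sh
# Hypothetical helper (not an ESXi command): print the version of every
# installed VIB whose name contains "nvidia" (case-insensitive).
# Expects `esxcli software vib list` output on stdin, where column 1 is
# the VIB name and column 2 is its version.
# Exits 1 if no matching VIB is found.
nvidia_vib_version() {
    awk 'tolower($1) ~ /nvidia/ { print $2; found = 1 } END { exit !found }'
}
```

On the host this would be run as `esxcli software vib list | nvidia_vib_version`; a non-zero exit status means no NVIDIA VIB is installed.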


The module is loaded:
esxcfg-module -l | grep nvidia
nvidia                   0    8420
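The module check can be made stricter in the same way. A minimal sketch, assuming (as in the output above) that the module name is the first whitespace-separated column of `esxcfg-module -l`; the function name is hypothetical. This only confirms that the driver module is loaded; it says nothing about whether the module found a GPU, which the `lspci` output further down suggests it did not.

```shell
#!/bin/sh
# Hypothetical helper (not an ESXi command): exit 0 only if a module named
# exactly "nvidia" appears in the first column of `esxcfg-module -l`
# output read from stdin. Comparing the whole first field, rather than
# grepping for a substring, avoids matching unrelated module names.
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}
```

On the host: `esxcfg-module -l | nvidia_module_loaded && echo loaded`.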


When I run the nvidia-smi command:
nvidia-smi
Failed to initialize NVML: Unknown Error


There's no output in vmkernel.log:
cat /var/log/vmkernel.log | grep NVRM
[root@ESX-F-1:/var/log]
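An empty `grep` result gives no feedback at all: the prompt simply comes back. A minimal sketch that always reports a count, assuming (as the search above does) that messages from the NVIDIA driver carry the tag NVRM; the function name `nvrm_summary` is hypothetical. `grep NVRM /var/log/vmkernel.log` also works without the extra `cat`.

```shell
#!/bin/sh
# Hypothetical helper: read vmkernel.log text on stdin, print the number of
# lines containing "NVRM" and, if there are any, the first such line.
# Unlike grep, it always prints something ("0" when nothing matches)
# and always exits 0.
nvrm_summary() {
    awk '/NVRM/ { n++; if (n == 1) first = $0 }
         END { print n + 0; if (n) print first }'
}
```

On the host: `nvrm_summary < /var/log/vmkernel.log`. A count of 0 alongside a loaded module is consistent with the driver never having attached to a GPU.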


VMware doesn't seem to be aware of the Nvidia card. It only finds the onboard graphics card:
lspci | grep -i display
0000:1b:00.0 Display controller: Matrox Electronics Systems Ltd. G200eR2
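This is the key observation in the thread: the driver is installed and loaded, but the host does not enumerate the card. A minimal sketch that turns the `lspci` check into a pass/fail test, assuming each of the card's GPUs shows up as one `lspci` line mentioning NVIDIA. A GRID K1 carries four GPUs, so four is the default; both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count lines of `lspci` output (read from stdin) that
# mention NVIDIA, case-insensitively. grep -c prints the count even when it
# is 0; "|| true" discards grep's non-zero exit status in that case.
count_nvidia_devices() {
    grep -ci 'nvidia' || true
}

# Hypothetical helper: compare the count with the number of GPUs expected.
# Defaults to 4 (the GRID K1 has four GPUs); pass another number for other
# cards. Prints a one-line verdict and returns 1 if devices are missing.
check_nvidia_count() {
    expected=${1:-4}
    found=$(count_nvidia_devices)
    if [ "$found" -ge "$expected" ]; then
        echo "OK: $found NVIDIA device(s) visible"
    else
        echo "MISSING: $found of $expected NVIDIA device(s) visible"
        return 1
    fi
}
```

On the host: `lspci | check_nvidia_count`. With only the Matrox controller visible, as above, it would be expected to report a count of 0.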


I have struggled with this issue for quite some time now, so I really hope you can help.

/Michael

Replies (6)

李桂香

2018-9-7 16:57:04
Do you have the K1s configured for PCI passthrough in vSphere?

If you do, you need to undo that.

王淑英

2018-9-7 17:04:36
Hi Jason,
No, it's not configured for passthrough in VMware.

李兆水

2018-9-7 17:13:40
"we have installed the card in slot 4 which is a PCIe x16 slot belonging to CPU3"

Is that CPU socket populated?

陈桂平

2018-9-7 17:26:55
Hi Jason,

Yes, the CPU socket is populated, but thanks.
