We have a clean SuperMicro server and installed VMware ESXi 6.0.2 build 360759. We entered "Maintenance Mode", followed the steps in the latest guide docs to install the NVIDIA host driver, rebooted, and turned off "Maintenance Mode".

Once it came back up we SSH'd in to verify the install via various commands. Everything checked out except the nvidia-smi command, which returns:

Failed to initialize NVML: Unknown Error

NOTE: The same hardware worked properly with GRID 2.0.

Hardware/Software list:
•Supermicro chassis 1028GQ-TRT
•Dual Xeon E5-2600v3 @ 2.60 GHz
•256 GiB memory
•(4) NVIDIA M60 cards installed and in graphics mode
•(2) 480 GB SSD drives
•ESXi 6.0.2 build 360759
•NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver_361.40-1OEM.600.0.0.2494585.vib

Steps taken to install:
•Installed ESXi 6 on a clean system
•Enabled SSH
•No VMs or datastores set up yet
•Set the clock on the server (ntp.org.pool) via vSphere
•Checked "VMkernel.Boot.disableACSCheck" under Configuration/Software/Advanced Settings/VMkernel/Boot and clicked "OK"
•Entered Maintenance Mode
•Downloaded the GRID software from the NVIDIA Licensing Center under Recent Product Releases: https://nvidia.flexnetoperations.com/control/nvda/viewRecentProductReleases
•Grabbed the April 4th release of GRID 3.0 for vSphere 6.0
•Copied NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver_361.40-1OEM.600.0.0.2494585.vib to the /tmp folder
•SSH'd into the server
•Ran: esxcli software vib install -v /tmp/NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver_361.40-1OEM.600.0.0.2494585.vib

Result:

Installation Result
Message: Operation finished successfully.
Reboot Required: false
VIBs Installed: NVIDIA_bootbank_NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver_361.40-1OEM.600.0.0.2494585
VIBs Removed:
VIBs Skipped:

•Rebooted the server and turned off Maintenance Mode
•SSH'd into the server and verified the install

Verify results:

[root@localhost:~] esxcli software vib list | grep -i nvidia
NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver 361.40-1OEM.600.0.0.2494585 NVIDIA VMwareAccepted 2016-05-04
[root@localhost:~] vmkload_mod -l | grep nvidia
nvidia 0 10012
[root@localhost:~] esxcfg-module -l | grep nvidia
nvidia 0 10012
[root@localhost:~] nvidia-smi
Failed to initialize NVML: Unknown Error

No passthrough is set up on ESXi; it's just a clean server with no VMs or datastores.

If we go forward anyway and try to create VMs and add the M60 card via vCenter, none of the vGPU profiles are listed.

We've gone through this several times on 2 other servers with different CPUs but the same software and M60 cards, with exactly the same results.

Please help. Thanks!

Alex
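For reference, a minimal first-pass check when nvidia-smi reports an NVML initialization failure on an ESXi host is to confirm the module really loaded and then look for NVRM messages from the driver. This is only a sketch, run from the same ESXi shell as the commands above, and it assumes the default ESXi 6.0 log location:

# confirm the VIB and the vmkernel module are actually present and loaded
esxcli software vib list | grep -i nvidia
vmkload_mod -l | grep nvidia

# look for NVRM messages from the driver around boot / module load
grep -i nvrm /var/log/vmkernel.log | tail -n 50
dmesg | grep -i nvrm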
20 replies
Hi Alex,

I'm afraid I'm not a VMware expert myself, but I'm checking for known issues with the support and product teams. You are entitled to full support with the M60 and GRID 3.0 - have you raised a support case yet?

Best wishes,
Rachel
No, I haven't raised a support case. Should I go ahead and create one?
Yes, that would be good - raise a support case and PM (personal message) me the number and I'll keep an eye on it. One of our engineers is already looking into this. Please add my name to the ticket so frontline doesn't have to chase info, and I'll fill them in.

Rachel
At the ESXi host CLI, please run:

lspci -n | grep 10de

then post the result here.
Hi Jason, here are the results:

[root@localhost:~] lspci -n | grep 10de
0000:04:00.0 Class 0300: 10de:13f2 [vmgfx6]
0000:05:00.0 Class 0300: 10de:13f2 [vmgfx7]
0000:08:00.0 Class 0300: 10de:13f2 [vmgfx4]
0000:09:00.0 Class 0300: 10de:13f2 [vmgfx5]
0000:83:00.0 Class 0300: 10de:13f2 [vmgfx2]
0000:84:00.0 Class 0300: 10de:13f2 [vmgfx3]
0000:87:00.0 Class 0300: 10de:13f2 [vmgfx0]
0000:88:00.0 Class 0300: 10de:13f2 [vmgfx1]
These GPUs were provided by NVIDIA directly, as we are an NVIDIA partner, and we installed them ourselves. The same GPUs just successfully completed NVQUAL on this server.
This shows the M60 / M6 GPU is correctly set in graphics mode - anyone else experiencing M60 / M6 issues can double-check this easily by following this advice: http://nvidia.custhelp.com/app/answers/detail/a_id/4106/

I'm afraid I haven't got any suggestions for this case though.
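For anyone wanting to run that graphics-mode check from the ESXi shell, here is a minimal sketch using NVIDIA's gpumodeswitch utility. It assumes the gpumodeswitch package for your GRID release is installed on the host and the host is in maintenance mode; confirm the options against the gpumodeswitch user guide:

# show whether each supported board is in graphics or compute mode
gpumodeswitch --listgpumodes

# switch a board that is still in compute mode, then reboot the host
gpumodeswitch --gpumode graphics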
More things I've tried:

I took out 3 of the 4 GPUs and disabled "Above 4G Decoding" in the SuperMicro BIOS.

With (1) GPU installed and "Above 4G Decoding" disabled, nvidia-smi returns properly:

[root@localhost:~] nvidia-smi
Thu May 5 19:18:01 2016
+------------------------------------------------------+
| NVIDIA-SMI 361.40     Driver Version: 361.40         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 0000:83:00.0     Off |                  Off |
| N/A   38C    P8    24W / 150W |     19MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           On   | 0000:84:00.0     Off |                  Off |
| N/A   33C    P8    24W / 150W |     19MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+---------------------------------------

I then enabled "Above 4G Decoding", left the (1) card in, and got "Failed to initialize NVML: Unknown Error" when running nvidia-smi.

I installed a 2nd GPU, disabled "Above 4G Decoding", booted, and got the BIOS error: "Insufficient PCI Resources Detected". So it won't boot with more than (1) GPU installed unless "Above 4G Decoding" is enabled.

So I rebooted with "Above 4G Decoding" enabled, tried nvidia-smi, and got the same "Failed to initialize NVML: Unknown Error".

Basically it only works with (1) card installed and "Above 4G Decoding" disabled.
https://gridforums.nvidia.com/default/topic/526/necessary-to-disable-quot-above-4g-decoding-quot-for-view-with-vgpu-/
https://gridforums.nvidia.com/default/topic/546/nvidia-grid-vgpu/mmio-above-4-gb-esxi-6-0u1-vgpu/
(http://www.supermicro.com/support/faqs/faq.cfm?faq=20016)

NVQUAL: "For NVIDIA vGPU application, the GPUs should be mapped below the 4GB address space (BAR1<32b)."

JS: "In ESXi you have to have MMIO set to below 4G. The VMware article is correct. Although ESXi is a 64bit hypervisor it still has this restriction."

Are these statements still valid today?

You can probably use the GPU as passthrough (vDGA) with "Above 4G Decoding" enabled (https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2139299).
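To make that passthrough suggestion concrete, here is a hypothetical .vmx excerpt based on VMware's guidance for passing through devices with large 64-bit BARs; this is my own assumption, not something tested in this thread, so verify the exact option names against the VMware passthrough KB for your ESXi build:

firmware = "efi"
pciPassthru.use64bitMMIO = "TRUE"
pciPassthru.64bitMMIOSizeGB = "64"

With these set, the guest maps the passthrough GPU's large BAR above 4GB in its own address space, which is consistent with the suggestion above that vDGA can work with "Above 4G Decoding" enabled.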
Now we're getting somewhere.

Are there any other PCI devices installed? Are you at the latest BIOS revision for the chassis? Have you checked the VMware HCL for this configuration? Do you have the means to try a hypervisor that is not limited by the below-4G decoding restriction? (XenServer is the easiest.)
Yes, ESXi still requires below 4G.

This requirement means only a few cards can be enabled (e.g. one card needs > 2x 128MB + 16MB + 32MB). A 512MB-1.5GB MMIO hole is standard, but I do not see any option to configure the MMIO hole size in the BIOS. My Supermicro X9DR3-F has a 2GB MMIO hole and I cannot add more than 1-2 cards (under Xen dom0 with "Above 4G Decoding" disabled).

I use XenServer (>=6.5) and it works fine with "Above 4G Decoding" enabled.

Try to analyze the output of these programs for the memory PCI BAR mappings, if available in ESXi:

# lspci -vv | egrep "^[a-f0-9]|Memory at"
# cat /proc/iomem
# dmesg | grep "available for PCI devices"
# dmesg | grep "pci_bus"
# dmesg | grep "[mem "
After hearing back from SuperMicro and doing some tweaking here, we now have 2 GPUs working. They suggested we change the MMIOHBase setting to "2T". The setting is located under Advanced PCIe/PCI/PnP Configuration.

We also changed the following in the BIOS: https://dl.dropboxusercontent.com/u/4009063/PCIe_PCI_PnP_Part_2%20Changes.jpg
The above BIOS settings work with 3 cards, but when you add the 4th card only 3 are recognized. I tested the one card by itself, with all the others removed, and it works on its own. I just can't get all 4 to be seen via nvidia-smi and lspci -n | grep 10de:

[root@localhost:~] lspci -n | grep 10de
0000:04:00.0 Class 0300: 10de:13f2 [vmgfx4]
0000:05:00.0 Class 0300: 10de:13f2 [vmgfx5]
0000:83:00.0 Class 0300: 10de:13f2 [vmgfx2]
0000:84:00.0 Class 0300: 10de:13f2 [vmgfx3]
0000:87:00.0 Class 0300: 10de:13f2 [vmgfx0]
0000:88:00.0 Class 0300: 10de:13f2 [vmgfx1]

More fun stuff to figure out.
I think those settings activate "64bit" PCI MMIO BAR addressing starting from 2TB (e.g. below the 4TB 32bit address limit). The first few cards get lucky and their MMIO BARs sit under that limit and are visible to the kernel/driver/nvidia-smi, but the 4th card is beyond this limit and invisible.

Can you try to study the memory assignments of the MMIO BARs of your cards (10de:13f2) with the following command?

# lspci -nvv | egrep "^[a-f0-9]|Memory at"
以上来自于谷歌翻译 以下为原文 Here's what I get with the 4 cards installed: [root@localhost:~] lspci -nvv | egrep "^[a-f0-9]|Memory at" 0000:00:00.0 Class 0600: 8086:2f00 [PCIe RP[0000:00:00.0]] 0000:00:01.0 Class 0604: 8086:2f02 [PCIe RP[0000:00:01.0]] 0000:00:02.0 Class 0604: 8086:2f04 [PCIe RP[0000:00:02.0]] 0000:00:03.0 Class 0604: 8086:2f08 [PCIe RP[0000:00:03.0]] 0000:00:04.0 Class 0880: 8086:2f20 0000:00:04.1 Class 0880: 8086:2f21 0000:00:04.2 Class 0880: 8086:2f22 0000:00:04.3 Class 0880: 8086:2f23 0000:00:04.4 Class 0880: 8086:2f24 0000:00:04.5 Class 0880: 8086:2f25 0000:00:04.6 Class 0880: 8086:2f26 0000:00:04.7 Class 0880: 8086:2f27 0000:00:05.0 Class 0880: 8086:2f28 0000:00:05.1 Class 0880: 8086:2f29 0000:00:05.2 Class 0880: 8086:2f2a 0000:00:05.4 Class 0800: 8086:2f2c 0000:00:11.0 Class ff00: 8086:8d7c 0000:00:11.4 Class 0106: 8086:8d62 [vmhba0] 0000:00:14.0 Class 0c03: 8086:8d31 0000:00:16.0 Class 0780: 8086:8d3a 0000:00:16.1 Class 0780: 8086:8d3b 0000:00:1a.0 Class 0c03: 8086:8d2d 0000:00:1c.0 Class 0604: 8086:8d10 [PCIe RP[0000:00:1c.0]] 0000:00:1c.4 Class 0604: 8086:8d18 [PCIe RP[0000:00:1c.4]] 0000:00:1d.0 Class 0c03: 8086:8d26 0000:00:1f.0 Class 0601: 8086:8d44 0000:00:1f.2 Class 0106: 8086:8d02 [vmhba1] 0000:00:1f.3 Class 0c05: 8086:8d22 0000:02:00.0 Class 0604: 10b5:8747 0000:03:08.0 Class 0604: 10b5:8747 0000:03:10.0 Class 0604: 10b5:8747 0000:04:00.0 Class 0300: 10de:13f2 [vmgfx4] 0000:05:00.0 Class 0300: 10de:13f2 [vmgfx5] 0000:07:00.0 Class 0200: 8086:1528 [vmnic0] 0000:07:00.1 Class 0200: 8086:1528 [vmnic1] 0000:08:00.0 Class 0604: 1a03:1150 0000:09:00.0 Class 0300: 1a03:2000 0000:7f:08.0 Class 0880: 8086:2f80 0000:7f:08.2 Class 1101: 8086:2f32 0000:7f:08.3 Class 0880: 8086:2f83 0000:7f:09.0 Class 0880: 8086:2f90 0000:7f:09.2 Class 1101: 8086:2f33 0000:7f:09.3 Class 0880: 8086:2f93 0000:7f:0b.0 Class 0880: 8086:2f81 0000:7f:0b.1 Class 1101: 8086:2f36 0000:7f:0b.2 Class 1101: 8086:2f37 0000:7f:0c.0 Class 0880: 8086:2fe0 0000:7f:0c.1 Class 0880: 8086:2fe1 0000:7f:0c.2 Class 0880: 8086:2fe2 0000:7f:0c.3 Class 0880: 8086:2fe3 0000:7f:0c.4 Class 0880: 8086:2fe4 0000:7f:0c.5 Class 0880: 8086:2fe5 0000:7f:0c.6 Class 0880: 8086:2fe6 0000:7f:0c.7 Class 0880: 8086:2fe7 0000:7f:0d.0 Class 0880: 8086:2fe8 0000:7f:0d.1 Class 0880: 8086:2fe9 0000:7f:0d.2 Class 0880: 8086:2fea 0000:7f:0d.3 Class 0880: 8086:2feb 0000:7f:0d.4 Class 0880: 8086:2fec 0000:7f:0d.5 Class 0880: 8086:2fed 0000:7f:0f.0 Class 0880: 8086:2ff8 0000:7f:0f.1 Class 0880: 8086:2ff9 0000:7f:0f.2 Class 0880: 8086:2ffa 0000:7f:0f.3 Class 0880: 8086:2ffb 0000:7f:0f.4 Class 0880: 8086:2ffc 0000:7f:0f.5 Class 0880: 8086:2ffd 0000:7f:0f.6 Class 0880: 8086:2ffe 0000:7f:10.0 Class 0880: 8086:2f1d 0000:7f:10.1 Class 1101: 8086:2f34 0000:7f:10.5 Class 0880: 8086:2f1e 0000:7f:10.6 Class 1101: 8086:2f7d 0000:7f:10.7 Class 0880: 8086:2f1f 0000:7f:12.0 Class 0880: 8086:2fa0 0000:7f:12.1 Class 1101: 8086:2f30 0000:7f:12.4 Class 0880: 8086:2f60 0000:7f:12.5 Class 1101: 8086:2f38 0000:7f:13.0 Class 0880: 8086:2fa8 0000:7f:13.1 Class 0880: 8086:2f71 0000:7f:13.2 Class 0880: 8086:2faa 0000:7f:13.3 Class 0880: 8086:2fab 0000:7f:13.6 Class 0880: 8086:2fae 0000:7f:13.7 Class 0880: 8086:2faf 0000:7f:14.0 Class 0880: 8086:2fb0 0000:7f:14.1 Class 0880: 8086:2fb1 0000:7f:14.2 Class 0880: 8086:2fb2 0000:7f:14.3 Class 0880: 8086:2fb3 0000:7f:14.4 Class 0880: 8086:2fbc 0000:7f:14.5 Class 0880: 8086:2fbd 0000:7f:14.6 Class 0880: 8086:2fbe 0000:7f:14.7 Class 0880: 8086:2fbf 0000:7f:16.0 Class 0880: 8086:2f68 0000:7f:16.1 Class 0880: 8086:2f79 0000:7f:16.2 
Class 0880: 8086:2f6a 0000:7f:16.3 Class 0880: 8086:2f6b 0000:7f:16.6 Class 0880: 8086:2f6e 0000:7f:16.7 Class 0880: 8086:2f6f 0000:7f:17.0 Class 0880: 8086:2fd0 0000:7f:17.1 Class 0880: 8086:2fd1 0000:7f:17.2 Class 0880: 8086:2fd2 0000:7f:17.3 Class 0880: 8086:2fd3 0000:7f:17.4 Class 0880: 8086:2fb8 0000:7f:17.5 Class 0880: 8086:2fb9 0000:7f:17.6 Class 0880: 8086:2fba 0000:7f:17.7 Class 0880: 8086:2fbb 0000:7f:1e.0 Class 0880: 8086:2f98 0000:7f:1e.1 Class 0880: 8086:2f99 0000:7f:1e.2 Class 0880: 8086:2f9a 0000:7f:1e.3 Class 0880: 8086:2fc0 0000:7f:1e.4 Class 0880: 8086:2f9c 0000:7f:1f.0 Class 0880: 8086:2f88 0000:7f:1f.2 Class 0880: 8086:2f8a 0000:80:02.0 Class 0604: 8086:2f04 [PCIe RP[0000:80:02.0]] 0000:80:03.0 Class 0604: 8086:2f08 [PCIe RP[0000:80:03.0]] 0000:80:04.0 Class 0880: 8086:2f20 0000:80:04.1 Class 0880: 8086:2f21 0000:80:04.2 Class 0880: 8086:2f22 0000:80:04.3 Class 0880: 8086:2f23 0000:80:04.4 Class 0880: 8086:2f24 0000:80:04.5 Class 0880: 8086:2f25 0000:80:04.6 Class 0880: 8086:2f26 0000:80:04.7 Class 0880: 8086:2f27 0000:80:05.0 Class 0880: 8086:2f28 0000:80:05.1 Class 0880: 8086:2f29 0000:80:05.2 Class 0880: 8086:2f2a 0000:80:05.4 Class 0800: 8086:2f2c 0000:81:00.0 Class 0604: 10b5:8747 0000:82:08.0 Class 0604: 10b5:8747 0000:82:10.0 Class 0604: 10b5:8747 0000:83:00.0 Class 0300: 10de:13f2 [vmgfx2] 0000:84:00.0 Class 0300: 10de:13f2 [vmgfx3] 0000:85:00.0 Class 0604: 10b5:8747 0000:86:08.0 Class 0604: 10b5:8747 0000:86:10.0 Class 0604: 10b5:8747 0000:87:00.0 Class 0300: 10de:13f2 [vmgfx0] 0000:88:00.0 Class 0300: 10de:13f2 [vmgfx1] 0000:ff:08.0 Class 0880: 8086:2f80 0000:ff:08.2 Class 1101: 8086:2f32 0000:ff:08.3 Class 0880: 8086:2f83 0000:ff:09.0 Class 0880: 8086:2f90 0000:ff:09.2 Class 1101: 8086:2f33 0000:ff:09.3 Class 0880: 8086:2f93 0000:ff:0b.0 Class 0880: 8086:2f81 0000:ff:0b.1 Class 1101: 8086:2f36 0000:ff:0b.2 Class 1101: 8086:2f37 0000:ff:0c.0 Class 0880: 8086:2fe0 0000:ff:0c.1 Class 0880: 8086:2fe1 0000:ff:0c.2 Class 0880: 8086:2fe2 0000:ff:0c.3 Class 0880: 8086:2fe3 0000:ff:0c.4 Class 0880: 8086:2fe4 0000:ff:0c.5 Class 0880: 8086:2fe5 0000:ff:0c.6 Class 0880: 8086:2fe6 0000:ff:0c.7 Class 0880: 8086:2fe7 0000:ff:0d.0 Class 0880: 8086:2fe8 0000:ff:0d.1 Class 0880: 8086:2fe9 0000:ff:0d.2 Class 0880: 8086:2fea 0000:ff:0d.3 Class 0880: 8086:2feb 0000:ff:0d.4 Class 0880: 8086:2fec 0000:ff:0d.5 Class 0880: 8086:2fed 0000:ff:0f.0 Class 0880: 8086:2ff8 0000:ff:0f.1 Class 0880: 8086:2ff9 0000:ff:0f.2 Class 0880: 8086:2ffa 0000:ff:0f.3 Class 0880: 8086:2ffb 0000:ff:0f.4 Class 0880: 8086:2ffc 0000:ff:0f.5 Class 0880: 8086:2ffd 0000:ff:0f.6 Class 0880: 8086:2ffe 0000:ff:10.0 Class 0880: 8086:2f1d 0000:ff:10.1 Class 1101: 8086:2f34 0000:ff:10.5 Class 0880: 8086:2f1e 0000:ff:10.6 Class 1101: 8086:2f7d 0000:ff:10.7 Class 0880: 8086:2f1f 0000:ff:12.0 Class 0880: 8086:2fa0 0000:ff:12.1 Class 1101: 8086:2f30 0000:ff:12.4 Class 0880: 8086:2f60 0000:ff:12.5 Class 1101: 8086:2f38 0000:ff:13.0 Class 0880: 8086:2fa8 0000:ff:13.1 Class 0880: 8086:2f71 0000:ff:13.2 Class 0880: 8086:2faa 0000:ff:13.3 Class 0880: 8086:2fab 0000:ff:13.6 Class 0880: 8086:2fae 0000:ff:13.7 Class 0880: 8086:2faf 0000:ff:14.0 Class 0880: 8086:2fb0 0000:ff:14.1 Class 0880: 8086:2fb1 0000:ff:14.2 Class 0880: 8086:2fb2 0000:ff:14.3 Class 0880: 8086:2fb3 0000:ff:14.4 Class 0880: 8086:2fbc 0000:ff:14.5 Class 0880: 8086:2fbd 0000:ff:14.6 Class 0880: 8086:2fbe 0000:ff:14.7 Class 0880: 8086:2fbf 0000:ff:16.0 Class 0880: 8086:2f68 0000:ff:16.1 Class 0880: 8086:2f79 0000:ff:16.2 Class 0880: 8086:2f6a 0000:ff:16.3 Class 
0880: 8086:2f6b 0000:ff:16.6 Class 0880: 8086:2f6e 0000:ff:16.7 Class 0880: 8086:2f6f 0000:ff:17.0 Class 0880: 8086:2fd0 0000:ff:17.1 Class 0880: 8086:2fd1 0000:ff:17.2 Class 0880: 8086:2fd2 0000:ff:17.3 Class 0880: 8086:2fd3 0000:ff:17.4 Class 0880: 8086:2fb8 0000:ff:17.5 Class 0880: 8086:2fb9 0000:ff:17.6 Class 0880: 8086:2fba 0000:ff:17.7 Class 0880: 8086:2fbb 0000:ff:1e.0 Class 0880: 8086:2f98 0000:ff:1e.1 Class 0880: 8086:2f99 0000:ff:1e.2 Class 0880: 8086:2f9a 0000:ff:1e.3 Class 0880: 8086:2fc0 0000:ff:1e.4 Class 0880: 8086:2f9c 0000:ff:1f.0 Class 0880: 8086:2f88 0000:ff:1f.2 Class 0880: 8086:2f8a [root@localhost:~] nvidia-smi Fri May 6 20:03:28 2016 +------------------------------------------------------+ | NVIDIA-SMI 361.40 Driver Version: 361.40 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 Tesla M60 On | 0000:04:00.0 Off | Off | | N/A 38C P8 24W / 150W | 19MiB / 8191MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 1 Tesla M60 On | 0000:05:00.0 Off | Off | | N/A 34C P8 23W / 150W | 19MiB / 8191MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 2 Tesla M60 On | 0000:83:00.0 Off | Off | | N/A 33C P8 24W / 150W | 19MiB / 8191MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 3 Tesla M60 On | 0000:84:00.0 Off | Off | | N/A 30C P8 23W / 150W | 19MiB / 8191MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 4 Tesla M60 On | 0000:87:00.0 Off | Off | | N/A 33C P8 25W / 150W | 19MiB / 8191MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 5 Tesla M60 On | 0000:88:00.0 Off | Off | | N/A 30C P8 23W / 150W | 19MiB / 8191MiB | 0% Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | No running processes found | |
lspci on ESXi probably has different output formatting - try to modify the regexp.

My output (XenServer 6.5, not the full output, but includes K1, K2, K2200, with "Above 4G Decoding" enabled, with 64-bit MMIO BARs, not forced to start @ 2GB (MMIOHBase)):

06:00.0 0300: 10de:0ff2 (rev a1) (prog-if 00 [VGA controller])
        Region 0: Memory at dd000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 380ff0000000 (64-bit, prefetchable) [size=128M]
        Region 3: Memory at 380ff8000000 (64-bit, prefetchable) [size=32M]
07:00.0 0300: 10de:0ff2 (rev a1) (prog-if 00 [VGA controller])
        Region 0: Memory at db000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 380fe0000000 (64-bit, prefetchable) [size=128M]
        Region 3: Memory at 380fe8000000 (64-bit, prefetchable) [size=32M]
08:00.0 0300: 10de:0ff2 (rev a1) (prog-if 00 [VGA controller])
        Region 0: Memory at d9000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 380fd0000000 (64-bit, prefetchable) [size=128M]
        Region 3: Memory at 380fd8000000 (64-bit, prefetchable) [size=32M]
09:00.0 0300: 10de:0ff2 (rev a1) (prog-if 00 [VGA controller])
        Region 0: Memory at d7000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 380fc0000000 (64-bit, prefetchable) [size=128M]
        Region 3: Memory at 380fc8000000 (64-bit, prefetchable) [size=32M]
82:00.0 0300: 10de:13ba (rev a2) (prog-if 00 [VGA controller])
        Region 0: Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 381fc0000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at 381fd0000000 (64-bit, prefetchable) [size=32M]
85:00.0 0302: 10de:11bf (rev a1)
        Region 0: Memory at f8000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 381fe8000000 (64-bit, prefetchable) [size=128M]
        Region 3: Memory at 381ff0000000 (64-bit, prefetchable) [size=32M]
86:00.0 0302: 10de:11bf (rev a1)
        Region 0: Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 381fd8000000 (64-bit, prefetchable) [size=128M]
        Region 3: Memory at 381fe0000000 (64-bit, prefetchable) [size=32M]

It is expected that 3 of your cards have their MMIO BARs mapped under 4GB (the 32-bit boundary) and the last one over 4GB.
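As a rough way to spot which BARs land above the 32-bit boundary in that kind of output, here is a small sketch; it assumes a busybox/GNU-style awk is available, and as noted above lspci on ESXi may format things differently, so the patterns may need adjusting:

lspci -vv | awk '
  /^[0-9a-f]/ { dev = $1 }                        # remember the current PCI device
  /Memory at/ {
      for (i = 1; i <= NF; i++)
          if ($i == "at") addr = $(i + 1)         # hex base address of this BAR
      flag = (length(addr) > 8) ? "above 4GB" : "below 4GB"
      printf "%s  BAR at 0x%s (%s)\n", dev, addr, flag
  }'

Anything longer than 8 hex digits is above the 4GB (32-bit) boundary, which is where the 4th card's BARs would be expected to end up.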
Thank you for taking the time to update everyone on SuperMicro's recommendations. Our support org will look to improve the documentation around BIOS requirements for MMIO on hypervisors, so that your experience and time go toward improving the experience for others. I'll update the thread when support writes this up.

Thank you,
Rachel