
陈晨

[Q&A]

GRID vGPU-enabled desktops won't start on an ESXi 6.5 host


I'll try not to make this long-winded. Yesterday I reinstalled ESXi 6.0 (Dell R730 customized version) on one of our GRID (M10/M60) enabled servers to see how it behaved with our new vSphere 6.5 server appliance. After reinstalling the software, patching the server, and attempting to start an M10- or M60-enabled VM on it, I found that the VM would move to one of the other GRID servers in the cluster to power on. My first thought was to check the graphics, and I found that on the "fresh install" server the active type for the graphics was set to Basic and the configured type was Shared. On the 2 functioning servers in the cluster, the active type was Shared and the configured type was blank. I also found that the xorg service would not stay started for more than a couple of seconds before stopping. There is no error when powering on a desktop (my guess was that this is because the VM moves itself to another server in the cluster).

I needed to interact with the graphics in an attempt to change the active type and hopefully get the xorg service to start, so I upgraded the host to ESXi 6.5. That allowed me to interact with the graphics and change their type to Shared / Shared Direct, but the active type is still Basic. Also, the xorg service will not stay on; it stops itself as soon as I refresh the screen.

The VMs still jump off the server like it's the plague when powered on. I have compared every other setting and they match up. The biggest problem is that xorg won't stay running. If I run nvidia-smi on the non-functioning server, I get "nvidia-smi: not found", but I get information from the other servers. I'm not really sure what to look at next, as no graphics-related errors are appearing, but I feel like there is a checkbox or a setting somewhere that I am missing. Any help would be greatly appreciated.
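The "compare every setting" step in the post can be made systematic by recording each host's observed state and listing only the keys that differ. Below is a minimal sketch, assuming the state is transcribed by hand from the symptoms described above; the field names are hypothetical and are not returned by any vSphere or esxcli API.

```python
# Hypothetical, hand-transcribed host state based on the symptoms in the post;
# these keys are illustrative and not fields of any vSphere or esxcli API.
working_host = {
    "graphics_active_type": "Shared",
    "graphics_configured_type": "",
    "xorg_running": True,
}
fresh_host = {
    "graphics_active_type": "Basic",
    "graphics_configured_type": "Shared",
    "xorg_running": False,
}


def diff_hosts(reference: dict, candidate: dict) -> dict:
    """Return {key: (reference_value, candidate_value)} for every key whose values differ."""
    keys = sorted(reference.keys() | candidate.keys())
    return {
        key: (reference.get(key), candidate.get(key))
        for key in keys
        if reference.get(key) != candidate.get(key)
    }


if __name__ == "__main__":
    for key, (ref, cand) in diff_hosts(working_host, fresh_host).items():
        print(f"{key}: working={ref!r} fresh={cand!r}")
```

New observations (loaded kernel module, installed host driver) can be added as extra keys to both dictionaries without changing the function.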

Replies (4)

郝汉

2018-9-18 16:50:18

Just another note. When I run esxcli hardware pci list -c 0x0300 -m 0xf on a working host, the module is listed as nvidia, but on the host with the issue, the module is none.

I got the command from this KB: https://kb.vmware.com/s/article/2064775. It says it's for 5.0, but I figured that it might mean something.
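The check in this reply comes down to reading the module name reported for each NVIDIA display device. Here is a minimal Python sketch, assuming the command output is a series of "Key: Value" lines with one blank-line-separated block per device and fields named "Vendor Name" and "Module Name"; the two sample outputs are hypothetical and trimmed, not captured from either host.

```python
from __future__ import annotations

# Hypothetical, trimmed samples of per-device output; real output has many more fields.
WORKING_SAMPLE = """\
0000:84:00.0
   Address: 0000:84:00.0
   Vendor Name: NVIDIA Corporation
   Module Name: nvidia
"""
PROBLEM_SAMPLE = """\
0000:84:00.0
   Address: 0000:84:00.0
   Vendor Name: NVIDIA Corporation
   Module Name: None
"""


def parse_pci_blocks(text: str) -> list[dict[str, str]]:
    """Split 'Key: Value' output into one dict per blank-line-separated block."""
    devices, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                devices.append(current)
                current = {}
            continue
        key, sep, value = line.strip().partition(": ")
        if sep:  # skip lines without "Key: Value", e.g. a bare address header
            current[key] = value.strip()
    if current:
        devices.append(current)
    return devices


def nvidia_module_loaded(text: str) -> bool:
    """True if any NVIDIA device block reports the 'nvidia' kernel module."""
    return any(
        "NVIDIA" in dev.get("Vendor Name", "") and dev.get("Module Name") == "nvidia"
        for dev in parse_pci_blocks(text)
    )


if __name__ == "__main__":
    print("working host:", nvidia_module_loaded(WORKING_SAMPLE))
    print("problem host:", nvidia_module_loaded(PROBLEM_SAMPLE))
```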

李海

2018-9-18 17:06:52

Okay. Now I'm almost certain that it's the VIB. When I run esxcli software vib list on the working server, I can see the NVIDIA vGPU ESXi host driver. I'm going to install it on the fresh server and see what happens.

EDIT: Okay. That removed the Basic active type and got the xorg service started. I rebooted, but the desktops I start on the server still jump to another host.
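Comparing the installed VIBs between a working host and the fresh install can also be scripted. Below is a minimal sketch, assuming the esxcli software vib list output is a whitespace-separated table with a "Name ..." header row, a dashed separator row, and Name, Version, Vendor as the first three columns; the VIB names and versions in the samples are hypothetical placeholders, not real driver package names.

```python
from __future__ import annotations

# Hypothetical placeholder tables; real VIB names and versions will differ.
WORKING_VIBS = """\
Name                       Version      Vendor  Acceptance Level  Install Date
-------------------------  -----------  ------  ----------------  ------------
NVIDIA-EXAMPLE-HostDriver  0.0-EXAMPLE  NVIDIA  VMwareAccepted    2018-09-18
esx-base                   0.0-EXAMPLE  VMware  VMwareCertified   2018-09-18
"""
FRESH_VIBS = """\
Name                       Version      Vendor  Acceptance Level  Install Date
-------------------------  -----------  ------  ----------------  ------------
esx-base                   0.0-EXAMPLE  VMware  VMwareCertified   2018-09-18
"""


def parse_vib_list(text: str) -> list[tuple[str, str, str]]:
    """Return (name, version, vendor) for each data row of a 'vib list'-style table."""
    rows = []
    for line in text.splitlines():
        cols = line.split()
        # Skip blank lines, the header row and the dashed separator row.
        if len(cols) < 3 or cols[0] == "Name" or set(cols[0]) == {"-"}:
            continue
        rows.append((cols[0], cols[1], cols[2]))
    return rows


def nvidia_vibs(text: str) -> list[str]:
    """Names of installed VIBs whose vendor column is NVIDIA."""
    return [name for name, _version, vendor in parse_vib_list(text) if vendor.upper() == "NVIDIA"]


if __name__ == "__main__":
    missing = set(nvidia_vibs(WORKING_VIBS)) - set(nvidia_vibs(FRESH_VIBS))
    print("NVIDIA VIBs missing on fresh host:", sorted(missing))
```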

王雷

2018-9-18 17:16:27

Well, using the forum search would probably help. It's not the VIB; as I suspect from your description of xorg, it's a bug in ESXi 6.0 U3...
https://gridforums.nvidia.com/default/topic/1207/nvidia-virtual-gpu-drivers/vmware-esxi-6-0-update-3-support/

Regards

Simon

张文婷

2018-9-18 17:34:04

Thanks for the reply. I saw that post and ran the gauntlet on that issue last year, but I'm on 6.5 now and the xorg service is started and holds. I could try that process again, but it seems I wouldn't need it if the service no longer has an issue, correct?

Edit: I think I'm 99% of the way there. I turned off DRS on the cluster and the VMs didn't move off the fresh install host. So I'm guessing something needs to be tweaked in DRS.
