Hi,

It would be useful to have a guide for using Horovod with popular frameworks such as Keras or PyTorch.

When I run "import horovod.keras as hvd", I get the following error:

    OSError: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /home/u13882/.local/lib/python3.6/site-packages/horovod/common/mpi_lib.cpython-36m-x86_64-linux-gnu.so)

I'm not sure whether this is an OpenMPI issue. Thanks in advance!

On a tangential note, pip fails with "AttributeError: '_NamespacePath' object has no attribute 'sort'". When I use conda or virtualenv, the problem goes away. I would just like to know what is wrong with pip outside those environments.
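The `CXXABI_1.3.8' not found` message says that the system libstdc++ does not carry that ABI version tag. A minimal sketch for checking which tags a given libstdc++ contains is shown below; it simply scans the file's bytes for `CXXABI_x.y.z` strings, much like running `strings` on the library would. The helper names `cxxabi_versions` and `provides` are made up for this illustration, and the library path is the one quoted in the error message.

```python
import re
from pathlib import Path

# Version tags look like CXXABI_1.3 or CXXABI_1.3.8 inside the shared object.
_CXXABI_TAG = re.compile(rb"CXXABI_(\d+(?:\.\d+)*)")


def cxxabi_versions(blob: bytes) -> list:
    """Return the sorted, de-duplicated CXXABI version tuples found in blob."""
    found = {
        tuple(int(part) for part in match.group(1).split(b"."))
        for match in _CXXABI_TAG.finditer(blob)
    }
    return sorted(found)


def provides(blob: bytes, wanted: str) -> bool:
    """True if the tag `wanted` (for example "CXXABI_1.3.8") appears in blob."""
    version = tuple(int(part) for part in wanted.split("_", 1)[1].split("."))
    return version in cxxabi_versions(blob)


if __name__ == "__main__":
    lib = Path("/lib64/libstdc++.so.6")  # path taken from the error message
    if lib.exists():
        data = lib.read_bytes()
        print(cxxabi_versions(data))
        print("CXXABI_1.3.8 present:", provides(data, "CXXABI_1.3.8"))
```

If the tag is absent from the system library but present in a newer GCC's copy, making the loader pick up the newer copy first is the usual remedy.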
15 replies
Hi Amlaan,

Thanks for reaching out to us. We are very happy to help you. Please follow the steps below.

1. Set all the environment variables

    export PATH=$PATH:$HOME/.local/bin:$HOME/bin
    source /glob/development-tools/parallel-studio/bin/compilervars.sh intel64
    export INTEL_LICENSE_FILE=/usr/local/licenseserver/psxe.lic
    export PATH=/glob/intel-python/python3/bin/:/glob/intel-python/python2/bin/:${PATH}
    export LD_LIBRARY_PATH=/glob/development-tools/mklml/lib/:${LD_LIBRARY_PATH}
    export CC=/glob/development-tools/versions/gcc-6.4.0/bin/gcc
    export LD_LIBRARY_PATH=/glob/development-tools/versions/gcc-6.4.0/lib64/:$LD_LIBRARY_PATH
    source /glob/development-tools/parallel-studio/compilers_and_libraries/linux/mpi/bin64/mpivars.sh
    source /glob/development-tools/parallel-studio/impi/2018.3.222/bin64/mpivars.sh

2. Create your virtual environment (the environment name after -n is missing in the source)

    conda create -n -c intel

3. Activate the environment (the environment name is missing in the source)

    source activate

4. Install TensorFlow

    conda install tensorflow

5. Install Keras

    conda install keras

6. Install Horovod

    pip install horovod --user

7. Import Horovod

    import horovod.keras as hvd

Please let us know if you face any further issues importing Horovod.

When do you get the error "AttributeError: '_NamespacePath' object has no attribute 'sort'"? Can you describe exactly what you are trying to do or run with pip, so that we can look into it more precisely?

Thanks & Regards
Ratheesh A

Attachment: horovod_solution.JPG (26.2 K)
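The exports above prepend the GCC 6.4.0 `lib64` directory to `LD_LIBRARY_PATH`, so a newer `libstdc++.so.6` can be found before the system copy. The sketch below shows the "first directory wins" idea in a simplified form; `first_dir_with` is a hypothetical helper, and the real dynamic loader also consults RPATH/RUNPATH entries, the loader cache and default directories, which this sketch ignores.

```python
import os
from typing import Optional


def first_dir_with(lib_name: str, search_path: str) -> Optional[str]:
    """Return the first directory in a colon-separated path that holds lib_name."""
    for directory in search_path.split(":"):
        # Empty entries are skipped here; this is a simplification.
        if directory and os.path.isfile(os.path.join(directory, lib_name)):
            return directory
    return None


if __name__ == "__main__":
    # Shows which LD_LIBRARY_PATH entry, if any, supplies libstdc++.so.6.
    print(first_dir_with("libstdc++.so.6", os.environ.get("LD_LIBRARY_PATH", "")))
```

Running it before and after sourcing the profile gives a quick indication of whether the GCC 6.4.0 directory is actually in front.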
jerry1978 posted on 2018-11-21 14:27

Hi Ratheesh,

Thanks for the reply! I followed the instructions for Horovod and completed all the steps. However, when I try to import it, I get the following error:

    >>> import horovod.keras as hvd
    Using TensorFlow backend.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/u13882/.local/lib/python3.6/site-packages/horovod/keras/__init__.py", line 19, in <module>
        import horovod.tensorflow as hvd
      File "/home/u13882/.local/lib/python3.6/site-packages/horovod/tensorflow/__init__.py", line 42, in <module>
        from horovod.tensorflow.mpi_ops import allgather
      File "/home/u13882/.local/lib/python3.6/site-packages/horovod/tensorflow/mpi_ops.py", line 56, in <module>
        ['HorovodAllgather', 'HorovodAllreduce'])
      File "/home/u13882/.local/lib/python3.6/site-packages/horovod/tensorflow/mpi_ops.py", line 43, in _load_library
        library = load_library.load_op_library(filename)
      File "/home/u13882/.local/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
        lib_handle = py_tf.TF_LoadLibrary(library_filename)
    tensorflow.python.framework.errors_impl.NotFoundError: /home/u13882/.local/lib/python3.6/site-packages/horovod/tensorflow/mpi_lib.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZTIN10tensorflow13AsyncOpKernelE

Any help with that?
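The undefined symbol `_ZTIN10tensorflow13AsyncOpKernelE` is a mangled C++ name. As a rough illustration, the sketch below decodes only the very simple form seen here (a `_ZTI`, `_ZTV` or `_ZTS` prefix followed by a nested name of length-prefixed identifiers); `demangle_simple` is a made-up helper, not a general demangler, and a tool such as `c++filt` is the proper choice for anything else.

```python
def demangle_simple(symbol: str) -> str:
    """Decode _ZTI/_ZTV/_ZTS + N<len><name>...E symbols; raise ValueError otherwise."""
    prefixes = {
        "_ZTI": "typeinfo for ",
        "_ZTV": "vtable for ",
        "_ZTS": "typeinfo name for ",
    }
    for prefix, label in prefixes.items():
        if symbol.startswith(prefix):
            rest = symbol[len(prefix):]
            break
    else:
        raise ValueError("unsupported symbol: " + symbol)
    if not (rest.startswith("N") and rest.endswith("E")):
        raise ValueError("only nested names (N...E) are handled")

    body, parts, i = rest[1:-1], [], 0
    while i < len(body):
        j = i
        while j < len(body) and body[j].isdigit():
            j += 1  # consume the decimal length prefix
        if j == i:
            raise ValueError("expected a length prefix at offset %d" % i)
        length = int(body[i:j])
        name = body[j:j + length]
        if len(name) != length:
            raise ValueError("truncated identifier")
        parts.append(name)
        i = j + length
    return label + "::".join(parts)


if __name__ == "__main__":
    print(demangle_simple("_ZTIN10tensorflow13AsyncOpKernelE"))
```

The symbol decodes to the typeinfo of `tensorflow::AsyncOpKernel`. A missing symbol of this kind often suggests that the Horovod extension was built against a different TensorFlow build than the one loaded at run time, although the thread itself does not confirm the cause.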
For the "AttributeError: '_NamespacePath' object has no attribute 'sort'" error, here is a snippet:

    [u13882@c009-n001 ~]$ pip list
    Traceback (most recent call last):
      File "/glob/intel-python/python3/bin/pip", line 4, in <module>
        import pip
      File "/glob/intel-python/versions/2018u2/intelpython3/lib/python3.6/site-packages/pip/__init__.py", line 26, in <module>
        from pip.utils import get_installed_distributions, get_prog
      File "/glob/intel-python/versions/2018u2/intelpython3/lib/python3.6/site-packages/pip/utils/__init__.py", line 27, in <module>
        from pip._vendor import pkg_resources
      File "/glob/intel-python/versions/2018u2/intelpython3/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 3018, in <module>
        @_call_aside
      File "/glob/intel-python/versions/2018u2/intelpython3/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 3004, in _call_aside
        f(*args, **kwargs)
      File "/glob/intel-python/versions/2018u2/intelpython3/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 3046, in _initialize_master_working_set
        dist.activate(replace=False)
      File "/glob/intel-python/versions/2018u2/intelpython3/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 2578, in activate
        declare_namespace(pkg)
      File "/glob/intel-python/versions/2018u2/intelpython3/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 2152, in declare_namespace
        _handle_ns(packageName, path_item)
      File "/glob/intel-python/versions/2018u2/intelpython3/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 2092, in _handle_ns
        _rebuild_mod_path(path, packageName, module)
      File "/glob/intel-python/versions/2018u2/intelpython3/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 2121, in _rebuild_mod_path
        orig_path.sort(key=position_in_sys_path)
    AttributeError: '_NamespacePath' object has no attribute 'sort'

This pip error does not occur inside conda or virtualenv, where pip works perfectly. It only occurs outside them.

Best,
Amlaan
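The last frame of the traceback calls `.sort()` on an object that is iterable but is not a list. The sketch below reproduces that situation with a stand-in class and shows that `sorted()`, which accepts any iterable, does not have the same limitation. `NamespacePathStandIn` and `order_by_sys_path` are hypothetical names for this illustration; they are not the real `_NamespacePath` or the real pkg_resources logic, which orders entries in a more involved way.

```python
class NamespacePathStandIn:
    """Hypothetical stand-in: iterable and sized like a path list, but has no sort()."""

    def __init__(self, entries):
        self._entries = list(entries)

    def __iter__(self):
        return iter(self._entries)

    def __len__(self):
        return len(self._entries)


def order_by_sys_path(path_like, sys_path):
    """Return entries ordered by their position in sys_path; unknown entries go last."""

    def position(entry):
        try:
            return sys_path.index(entry)
        except ValueError:
            return len(sys_path)

    # sorted() works on any iterable, unlike the list-only .sort() method.
    return sorted(path_like, key=position)


if __name__ == "__main__":
    ns = NamespacePathStandIn(["/b/pkg", "/x/pkg", "/a/pkg"])
    print(hasattr(ns, "sort"))
    print(order_by_sys_path(ns, ["/a/pkg", "/b/pkg"]))
```

This only explains the shape of the failure. Whether a newer pip or setuptools resolves it on this particular system is something the thread leaves open.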
Hi,

Could you please confirm that you have added all the recommended paths to .bash_profile and sourced it? If not, please follow the steps below.

1. Copy all the environment paths into .bash_profile

    vi ~/.bash_profile

    export PATH=$PATH:$HOME/.local/bin:$HOME/bin
    source /glob/development-tools/parallel-studio/bin/compilervars.sh intel64
    export INTEL_LICENSE_FILE=/usr/local/licenseserver/psxe.lic
    export PATH=/glob/intel-python/python3/bin/:/glob/intel-python/python2/bin/:${PATH}
    export LD_LIBRARY_PATH=/glob/development-tools/mklml/lib/:${LD_LIBRARY_PATH}
    export CC=/glob/development-tools/versions/gcc-6.4.0/bin/gcc
    export LD_LIBRARY_PATH=/glob/development-tools/versions/gcc-6.4.0/lib64/:$LD_LIBRARY_PATH
    export PATH=/glob/development-tools/versions/gcc-6.4.0/bin/:$PATH
    #if you want to use open-mpi, uncomment the below line
    #source /glob/development-tools/parallel-studio/compilers_and_libraries/linux/mpi/bin64/mpivars.sh
    #if you want to use Intel-mpi, uncomment the below line
    #source /glob/development-tools/parallel-studio/impi/2018.3.222/bin64/mpivars.sh

2. Source the .bash_profile

    source ~/.bash_profile

3. Follow all the remaining steps from the previous response [step 2 to step 7] on the compute node

    #This will redirect to compute node
    qsub -I

It would be helpful if you could also share a screenshot once you are done.

Regarding "AttributeError: '_NamespacePath' object has no attribute 'sort'": we recommend that you try updating pip as follows.

    pip install --upgrade pip setuptools --user

Please reply if you still face an issue.

Thanks & Regards
Ratheesh A
jerry1978 posted on 2018-11-21 14:55

Hi,

Yes, I've added all the paths to ~/.bash_profile and sourced it. I've attached a screenshot of my .bash_profile. One extra thing I see is a message saying "Found a swap file by the name...". I've attached a screenshot of that as well.

I've also attached a screenshot of the result of "import horovod.keras as hvd". In that screenshot, several packages print requirement warnings. I'm not sure whether that's an issue. I hope a solution comes out of this.

For the pip error, I get the same error as before. I've attached a screenshot of that as well.

Thank you.

Best,
Amlaan

Attachments: pip_error.png (130.7 K), horovod.png (211.1 K), bash_profile.png (87.9 K), bash_error.png (72.4 K)
Hi,

Thanks for sharing the screenshots. A few suggestions are below.

1. Remove the .swp file

    rm ~/.bash_profile.swp

2. Source the .bash_profile again [we recommend using Intel MPI]

    source ~/.bash_profile

3. Remove the packages installed with pip in the environment

    pip uninstall horovod

4. Create the environment on the compute node (the environment name after -n is missing in the source)

    qsub -I
    conda create -n -c intel python=3

5. Install Horovod again as mentioned earlier

Regarding "AttributeError: '_NamespacePath' object has no attribute 'sort'": this is related to a pip error. We will get back to you soon. Meanwhile, as an alternative, we recommend that you run your experiment inside the conda environment itself. It would be great if you could share your observations once you are done.

Regards,
Ratheesh A
jerry1978 posted on 2018-11-21 15:19

Hi,

We hope the solution provided for Horovod solved the issue.

Pip issue: Please find attached a screenshot of our .bash_profile and cross-check your own .bash_profile against it. There is a chance that your account has been left in a bad state by some earlier problem. If required, we will provide a new account, which should resolve all your issues. Please check and get back to us.

Regards,
Ratheesh A

Attachment: bash_profile.JPG (68.0 K)
cd340823 posted on 2018-11-21 15:34

Hi Ratheesh,

Thanks for the reply! I followed all the steps for the Horovod import problem, and I can now successfully run "import horovod.keras as hvd". However, when I call hvd.init() to initialize Horovod, I get the following error:

    2018-08-30 08:14:00.860585: F ./tensorflow/core/common_runtime/mkl_cpu_allocator.h:96] Non-OK-status: s status: Unimplemented: Unimplemented case for hooking MKL function.
    Aborted

(I'm using Intel MPI in ~/.bash_profile.) Any help?

One more thing: following the install procedure for Keras (conda install keras) automatically downgrades TensorFlow to 1.3.1. Is this an issue? I've attached a screenshot accompanying the error.

For the pip error, I double-checked my ~/.bash_profile and made it identical to the contents in your screenshot. I still get the same error. Maybe there is an issue with the account after all.

I look forward to your reply. Thanks.

Best,
Amlaan

Attachment: hvd.png (105.6 K)
Hi,

Thanks for your reply.

hvd.init(): We are looking into the issue and will get back to you soon.

Pip error: Please try the command below. If it does not work for you, we will help by creating a new account.

    python -m pip install --upgrade pip==9.0.3

Regards
Ratheesh A
JIWENJIE posted on 2018-11-21 16:11

Hi Ratheesh,

Looking forward to a solution for Horovod.

For pip, I tried "python -m pip install --upgrade pip==9.0.3" and got an error. I've attached a screenshot of it.

Best,
Amlaan

Attachment: pip.png (137.6 K)
Hi Amlaan,

Thanks for sharing the screenshot and for the confirmation.

hvd.init() issue: We strongly recommend that you create a new conda environment to solve the issue. Please follow the steps below. They will install Intel-optimized TensorFlow 1.9.0, Keras 2.2.2 and Horovod 0.14.0.

1. Create the symbolic link as shown below

    mkdir ~/lib
    cd ~/lib
    ln -s /glob/supplementary-software/versions/glibc/glibc_2_28/lib/libm.so.6

2. Copy all the environment paths into .bash_profile

    vi ~/.bash_profile

    export PATH=$PATH:$HOME/.local/bin:$HOME/bin
    source /glob/development-tools/parallel-studio/bin/compilervars.sh intel64
    export INTEL_LICENSE_FILE=/usr/local/licenseserver/psxe.lic
    export PATH=/glob/intel-python/python3/bin/:/glob/intel-python/python2/bin/:${PATH}
    export LD_LIBRARY_PATH=/glob/development-tools/mklml/lib/:${LD_LIBRARY_PATH}
    export CC=/glob/development-tools/versions/gcc-6.4.0/bin/gcc
    export LD_LIBRARY_PATH=/glob/development-tools/versions/gcc-6.4.0/lib64/:$LD_LIBRARY_PATH
    export PATH=/glob/development-tools/versions/gcc-6.4.0/bin/:$PATH
    source /glob/development-tools/parallel-studio/impi/2018.3.222/bin64/mpivars.sh
    export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH

3. Source the .bash_profile [close the terminal and re-open it if sourcing doesn't work]

    source ~/.bash_profile

4. Redirect to the compute node

    qsub -I

5. Create your virtual environment (the environment name after -n is missing in the source)

    conda create -n -c intel python=3

6. Activate the environment (the environment name is missing in the source)

    source activate

7. Install TensorFlow 1.9 from the whl file

    pip install https://storage.googleapis.com/intel-optimized-tensorflow/tensorflow-1.9.0-cp36-cp36m-linux_x86_64.whl --no-cache

8. Install Keras

    conda install keras

9. Install Horovod

    pip install horovod --no-cache

10. Check the installation

    python
    import tensorflow as tf
    import keras
    import horovod
    import horovod.keras as hvd
    hvd.init()

We have attached a screenshot; please take a look at it. We hope this helps. Please reply if you still face an issue with this.

Pip error: We are going to create a new user account. Please share your user_ID.

Thanks & Regards
Ratheesh A

Attachment: hvd.JPG (36.4 K)
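Because the thread reports TensorFlow being silently downgraded, it may help to compare what is actually installed against the versions these steps aim for (TensorFlow 1.9.0, Keras 2.2.2, Horovod 0.14.0). The following is a minimal sketch with made-up helper names; it relies on `importlib.metadata`, which needs Python 3.8 or newer and is therefore an assumption here, since the environment in the thread uses Python 3.6.

```python
from importlib import metadata  # Python 3.8+; an assumption, the thread uses 3.6

# Target versions as stated in the reply above.
EXPECTED = {"tensorflow": "1.9.0", "keras": "2.2.2", "horovod": "0.14.0"}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


def mismatches(installed, expected):
    """Return {name: (installed, expected)} for every pin that is not met."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in expected.items()
        if installed.get(name) != wanted
    }


if __name__ == "__main__":
    report = mismatches(installed_versions(EXPECTED), EXPECTED)
    print(report or "all pins satisfied")
```

Running this after step 8 would show whether installing Keras changed the TensorFlow version.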
Hi Ratheesh,

Thanks for the steps. I followed all your instructions step by step. Everything below was done in a brand new conda environment.

I got an error at hvd.init(), as shown in the screenshot (see hvd1.png). I realized that one of the export statements in ~/.bash_profile caused the TensorFlow version to change from 1.9.0 (as installed using pip) to 1.3.1 (see bash1.png or bash2.png, where I've marked the line with the comment "# problematic import"). I removed that line, sourced ~/.bash_profile, and ran the "import horovod.keras as hvd" statement again. This time tf.__version__ stayed at 1.9.0, but I ran into an error on the "import horovod.keras as hvd" statement (see hvd2.png). The respective ~/.bash_profile contents are attached as bash1.png and bash2.png. Any help?

For pip, the user_id is: u13882@c009

Thank you for the support. I look forward to your reply.

Best,
Amlaan

Attachments: hvd2.png (170.3 K), hvd1.png (99.1 K), bash2.png (94.7 K), bash1.png (94.4 K)
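When a profile export changes which TensorFlow gets imported, one way to see which installation wins is to ask Python where it would load the module from. The sketch below uses the standard `importlib.util.find_spec`; `module_origin` is a hypothetical helper name. For a top-level module, `find_spec` locates it without importing it.

```python
import importlib.util


def module_origin(name: str):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    # origin can also be None for namespace packages, which have no single file.
    return None if spec is None else spec.origin


if __name__ == "__main__":
    for name in ("tensorflow", "keras", "horovod"):
        print(name, "->", module_origin(name))
```

Comparing the printed paths with and without the suspect export line shows whether a different site-packages directory is being picked up.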
Hi Amlaan,

Thank you very much for the detailed response. We completely understand your concern.

hvd.init() issue: We have sent you an email. Please respond to it.

Pip error: As discussed, we have already initiated a request to create a new user account. We will get back to you soon.

Thanks & Regards
Ratheesh A
jerry1978 posted on 2018-11-21 17:04

Hi,

Thanks for confirming the time. We have replied to your email; please check your mailbox.

Pip error: We have created a new user account for you. We hope you received the confirmation email. Please confirm.

Thanks & Regards
Ratheesh A
Providing an update on this thread based on the conversation that took place by email. Below is the last response from Amlaan:

Good news! Horovod is finally working! Here is what happened today:

1. I logged in to my new account.
2. I followed all your instructions to run Horovod (I wanted to see if it was an account issue).
3. This time, TensorFlow stayed at version 1.9.0.
4. However, hvd.init() was still giving the error "F ./tensorflow/core/common_runtime/mkl_cpu_allocator.h:96] Non-OK-status: s status: Unimplemented: Unimplemented case for hooking MKL function. Aborted".
5. I then suspected an issue with the TensorFlow version. I installed Intel Optimized TensorFlow 1.10.0 (the same link you had mentioned, with the version number changed).
6. This time, it gave another error. I decided to downgrade back to TensorFlow 1.9.0.
7. I ran the same code again, and this time it worked! I can successfully import Horovod, and hvd.init() works.

I tested with Horovod's Keras MNIST tutorial (https://github.com/uber/horovod/blob/master/examples/keras_mnist.py) and it works! I don't really know why or how this worked. I looked at the mkl_cpu_allocator.h file in TensorFlow's repository (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/mkl_cpu_allocator.h), but it didn't give me insight into the main problem.

Once again, thank you for your continued support. I appreciate your time and help with everything over the past few days.
Hi Amlaan,

Since the solution provided worked and has been confirmed, we are closing this discussion thread. Feel free to open a new thread for any further issues.

Thanks and Regards
Ratheesh A