Encountered error while running real_nmm.exe

Submitted by lwj on Tue, 07/20/2021 - 08:48
Forum: Users | Forecast

Dear all,

I ran "/opt/mpi/intel-18.0/mvapich2-2.3/bin/mpiexec -f /home/lwj/jobs/MODL/HWRF/HWRF_v3.9a/hwrfrun_ckcd/wrappers/hosts -np 4 ./real_nmm.exe"

after geogrid.exe, ungrib.exe, and metgrid.exe had completed successfully.

I got the error message below while running real_nmm.exe:

--------------------------------------------------

taskid: 0 hostname: node10
*** The MPI_Comm_f2c() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[node10:413169] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
---------------------------------------------------

Do you have any idea what I am doing wrong?

I am looking forward to your reply.

Thanks,

Woojeong

Hi Woojeong,

Are you trying to run this command at the login prompt? I think you should run the executable through a batch script; your system administrator can tell you how to do that on your machine. You can also run it in an interactive queue if one is available. Please let me know if you run into issues again.
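A minimal sketch of such a batch script, assuming the cluster uses SLURM (the scheduler is site-specific, so check with your administrator). The `#SBATCH` values, the `MPIEXEC` and `RUN_DIR` variables, and the file name `run_real_nmm.sh` are hypothetical placeholders. The script only launches inside a SLURM allocation, so it cannot accidentally run real_nmm.exe on the login node:

```shell
#!/bin/sh
# run_real_nmm.sh: hypothetical SLURM batch script for real_nmm.exe.
# ASSUMPTION: the site uses SLURM; the node/task counts and time limit
# below are placeholders to adapt to your machine.
#SBATCH --job-name=real_nmm
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:30:00
#SBATCH --output=real_nmm.%j.log

set -eu

# Launcher: point MPIEXEC at the mpiexec of the MPI library that
# real_nmm.exe was compiled against (default: first mpiexec in PATH).
MPIEXEC="${MPIEXEC:-mpiexec}"
# Hypothetical run directory holding real_nmm.exe, namelist.input and the
# metgrid output files; defaults to the directory the job starts in.
RUN_DIR="${RUN_DIR:-$PWD}"
# SLURM exports SLURM_NTASKS inside a job; fall back to 4 otherwise.
NTASKS="${SLURM_NTASKS:-4}"

run_real_nmm() {
  cd "$RUN_DIR"
  "$MPIEXEC" -np "$NTASKS" ./real_nmm.exe
}

# Launch only inside a SLURM allocation, never directly on a login node.
if [ -n "${SLURM_JOB_ID:-}" ]; then
  run_real_nmm
else
  echo "Not inside a SLURM job; submit this script with: sbatch $0" >&2
fi
```

Submit it with `sbatch run_real_nmm.sh`. Because `MPIEXEC` otherwise falls back to whichever `mpiexec` is first in `PATH`, it is safer to set it explicitly to the launcher of the MPI library real_nmm.exe was built with, e.g. `MPIEXEC=/path/to/that/mpi/bin/mpiexec sbatch run_real_nmm.sh`. This relies on sbatch passing the submitting shell's environment to the job, which is the default on most SLURM setups.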

Thanks

Biswas

Hi Biswas,

I recompiled with intel18/icc18, openmpi-3.1.1, hdf5-1.10.4, netcdf-4.4.4, and pnetcdf-1.10.0, and submitted init_gfs_wrapper.

There were some warning messages in my log file, such as:

-----------------------------------------

07/21 16:30:28.720 hwrf.gfsinit/wrfghost (fcsttask.py:889) INFO: Final result is exe('/opt/mpi/intel-18.0/openmpi-3.1.1/bin/mpiexec')['-machinefile','/home/lwj/jobs/MODL/HWRF/HWRF_v3.9a/hwrfrun_ckcd/wrappers/hosts','-np','12','/home/lwj/jobs/MODL/HWRF/HWRF_v3.9a/hwrfrun_ck/sorc/WRFV3/main/wrf.exe']
07/21 16:30:28.720 hwrf.gfsinit/wrfghost (run.py:296) INFO: Starting: exe('/opt/mpi/intel-18.0/openmpi-3.1.1/bin/mpiexec')['-machinefile','/home/lwj/jobs/MODL/HWRF/HWRF_v3.9a/hwrfrun_ckcd/wrappers/hosts','-np','12','/home/lwj/jobs/MODL/HWRF/HWRF_v3.9a/hwrfrun_ck/sorc/WRFV3/main/wrf.exe']
ibv_exp_query_device: invalid comp_mask !!! (comp_mask = 0x1010012389ab valid_mask = 0x1)
[node10][[45496,1],9][btl_openib_component.c:1670:init_one_device] error obtaining device attributes for mlx4_0 errno says Invalid argument
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   node10
  Local device: mlx4_0
--------------------------------------------------------------------------
ibv_exp_query_device: invalid comp_mask !!! (comp_mask = 0x1010012389ab valid_mask = 0x1)
[node10][[45496,1],3][btl_openib_component.c:1670:init_one_device] error obtaining device attributes for mlx4_0 errno says Invalid argument
ibv_exp_query_device: invalid comp_mask !!! (comp_mask = 0x1010012389ab valid_mask = 0x1)
[node10][[45496,1],2][btl_openib_component.c:1670:init_one_device] error obtaining device attributes for mlx4_0 errno says Invalid argument
ibv_exp_query_device: invalid comp_mask !!! (comp_mask = 0x1010012389ab valid_mask = 0x1)
[node10][[45496,1],6][btl_openib_component.c:1670:init_one_device] error obtaining device attributes for mlx4_0 errno says Invalid argument
ibv_exp_query_device: invalid comp_mask !!! (comp_mask = 0x1010012389ab valid_mask = 0x1)
[node10][[45496,1],1][btl_openib_component.c:1670:init_one_device] error obtaining device attributes for mlx4_0 errno says Invalid argument

-------------------------------------------------

But it works!

I can find the correct output files from init_gfs_wrapper.

Can I run HWRF despite these warning messages, or is something wrong?

Thanks

Woojeong