the hybrid run

Submitted by huihuojia on Wed, 06/09/2021 - 00:20

Hi,

I have tested GSIv3.7 on a new machine with Intel 2021 (I know it is not one of the supported Intel versions, but it is what I have right now). GSI runs fine in 3D-Var mode, but in hybrid mode it stops with a segmentation fault.

I traced the error to "ensctl2state.f90", at the following lines:

   call gsi_bundlegetvar     (wbundle_c, clouds(ic),sv_rank3,istatus)

   call gsi_bundlegetvar ( wbundle_c, 'sst', sv_sst, istatus )

This means it has trouble getting "cw" and "sst" from the bundle. I could not understand why, because similar calls are used in "control2state.f90". Why do they cause problems in "ensctl2state.f90" but not in "control2state.f90"?

Since "cw" and "sst" cause the problem during the hybrid run, I commented out all lines related to "cw" and "sst" in "fix/anavinfo_arw_netcdf" (the file is attached at the end). GSI could then run in hybrid mode without problems.

1. Could you explain why this problem occurs?

2. Are there any undesirable consequences of what I did in "fix/anavinfo_arw_netcdf"?

Thanks,

Jia

met_guess::
!var     level    crtm_use    desc              orig_name
  ps        1      -1         surface_pressure     ps
  z         1      -1         geopotential_height  phis
  u        30       2         zonal_wind           u
  v        30       2         meridional_wind      v
  div      30      -1         zonal_wind           div
  vor      30      -1         meridional_wind      vor
  tv       30       2         virtual_temperature  tv
  q        30       2         specific_humidity    sphu
# oz       30       2         ozone                ozone
# cw       30      10         cloud_condensate     cw
# ql       30      10         cloud_liquid         ql
# qi       30      10         cloud_ice            qi
# qr       30      10         rain                 qr
# qs       30      10         snow                 qs
# qg       30      10         graupel              qg
::

state_derivatives::
!var  level  src
 ps   1      met_guess
 u    30     met_guess
 v    30     met_guess
 tv   30     met_guess
 q    30     met_guess
#oz   30     met_guess
#cw   30     met_guess
 prse 31     met_guess
::

state_tendencies::
!var  levels  source
 u    30      met_guess
 v    30      met_guess
 tv   30      met_guess
 q    30      met_guess
#cw   30      met_guess
#oz   30      met_guess
 prse 31      met_guess
::

state_vector::  
!var     level  itracer source     funcof
 u        30      0     met_guess    u
 v        30      0     met_guess    v
 tv       30      0     met_guess    tv
 tsen     30      0     met_guess    tv,q
 q        30      1     met_guess    q
#oz       30      1     met_guess    oz
#cw       30      1     met_guess    cw
 prse     31      0     met_guess    prse
 ps        1      0     met_guess    prse
#sst       1      0     met_guess    sst
::

control_vector_enkf::
!var     level  itracer as/tsfc_sdv  an_amp0   source  funcof
 u        30      0       1.00        -1.0     state    u,v
 v        30      0       1.00        -1.0     state    u,v
 ps        1      0       0.50        -1.0     state    prse
 tv       30      0       0.70        -1.0     state    tv
 q        30      1       0.70        -1.0     state    q
::

control_vector::
!var     level  itracer as/tsfc_sdv  an_amp0   source  funcof
 sf       30      0       1.00        -1.0     state    u,v
 vp       30      0       1.00        -1.0     state    u,v
 ps        1      0       0.50        -1.0     state    prse
 t        30      0       0.70        -1.0     state    tv
 q        30      1       0.70        -1.0     state    q
#oz       30      1       0.50        -1.0     state    oz
#sst       1      0       1.00        -1.0     state    sst
#cw       30      1       1.00        -1.0     state    cw
 stl       1      0       1.00        -1.0     motley   sst
 sti       1      0       1.00        -1.0     motley   sst
::
 

Hi Jia,

The problem may be due to the higher memory needs of hybrid mode. Can you try with more memory per core? An alternative would be to change the analysis grid ratio to reduce the size of these arrays. That said, I think your approach should be fine for these variables in a regional application.

Will

In reply to wmayfield

Hi, Will,

What do you mean by more memory per core? I thought the memory per core is a fixed value.

I will try to use more CPUs to see whether this problem still occurs or not.

For the state variables I commented out ("cw", "sst", "oz"), where does "cw" come into use? I know "qi,qc" are used in the cloud analysis and are initialized from the background, but I didn't see how "cw" is initialized or where it is applied.

Jia

Hi, Will,

Regarding the segmentation fault I encountered when running GSIv3.7 in hybrid mode with Intel 2021: I mentioned in my first post that it could be fixed by deleting the entries related to "cw"/"oz"/"sst" in "fix/anavinfo_arw_netcdf".

Today I found out that if I delete all the OpenMP lines (the lines starting with "!$omp") in "ensctl2state.f90" and "ensctl2state_ad.f90", with no modifications to fix/anavinfo_arw_netcdf, GSI also runs in hybrid mode without problems.

I have also tried running GSI with more CPUs, but without one of the changes above the segmentation fault still shows up.
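
For reference, here is a minimal sketch of how the "!$omp" lines in the two files can be listed before removing them. The function name list_omp_lines is only illustrative, and the example call assumes it is run from the directory that holds the GSI source files.

```shell
#!/bin/bash
# Print every OpenMP directive line (lines starting with the "!$omp"
# sentinel, in any letter case) together with its line number.
# list_omp_lines is a hypothetical helper name, not part of GSI.
list_omp_lines() {
  grep -n -i '^[[:space:]]*!\$omp' "$@"
}

# Example call (file locations are an assumption):
# list_omp_lines ensctl2state.f90 ensctl2state_ad.f90
```

With more than one file, grep prefixes each match with the file name, so the directives from both routines can be reviewed in one listing.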

Jia

In reply to huihuojia

Hi Jia,

To give each processor more memory, try using fewer processors per node. You can also try adding options such as "ulimit -s unlimited" to your run script. However, given what you found with your modifications to ensctl2state.f90, OpenMP is likely the problem. Can you try adding "OMP_NUM_THREADS=1" to your run script?
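
As a rough sketch, the two settings could sit near the top of the run script like this. The commented launch line is only a placeholder: the launcher, task count, and executable name are assumptions and depend on your system.

```shell
#!/bin/bash
# Raise the per-process stack limit. The shell may refuse this if the
# hard limit on the machine is lower, so report the failure and continue.
ulimit -s unlimited 2>/dev/null || echo "warning: could not raise the stack limit" >&2

# Use a single OpenMP thread per MPI task.
export OMP_NUM_THREADS=1

# Print the settings so they appear in the job log.
echo "stack limit: $(ulimit -s), OMP_NUM_THREADS=${OMP_NUM_THREADS}"

# Hypothetical launch line (launcher, task count, and executable name are assumptions):
# mpirun -np 4 ./gsi.exe > stdout 2>&1
```

The variable has to be exported before the launch line so that the GSI processes inherit it.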

Will