FAMOUS

This page describes how to run FAMOUS using reconfiguration, although it may help in more general cases. Please read the general and introductory guides and notes first if you haven’t done so.

Much of the information on this page is based on answers to my questions from Robin Smith and Annette Osprey. Many thanks to both of them.

This page is meant to be as foolproof as possible and so inevitably lengthy. Don’t tell me to tidy it up.

0. set environmental constants

You do not have to do these at all, but you will need to know that I did these to understand the rest of this page.

on puma and quest

jobid=xxxxx (this page is based on tcmid and tcmie)

your login name is already set as $USER (I’m ggxmy)

on quest

mkdir -p $DUMP2HOLD/um/dumps    #create a job-independent dir to store dump files
ln -s $DUMP2HOLD/um/dumps MY_DUMPS

UMUI: [Submodel indep]-[File&directory namings] “Define other environmental variables” add;
```
MY_DUMPS    $DUMP2HOLD/um/dumps
```

1. Reconfigure Atmosphere

1.1. UMUI: [submodel indep]-[compil+modif]-[Compile Options]

Either “compile and build then stop” as in compiling OR “Run from existing exec” as in running should be selected

1.2. UMUI: [submodel indep]-[compil+modif]-[mod for reconfig]

Select “Compile and build the executable named below”. This setting matters for trouble shooting 1.T (2) and (3) below.

1.3. UMUI: [submodel indep]-[Job submission, resources…]

Either “at for other unix” as in compiling OR “qsub for supporting platforms” as in running should be selected

1.4. UMUI: [submodel indep]-[Gen config control]

CHECK “perform reconfig step only” (this is important)

1.5. UMUI: [Atmos]-[Ancil+input]-[start dump]

use a regular start dump eg: $MY_DUMPS/xbyvqa#da000002541c1+
CHECK “using the reconfiguration”
CHECK “Resetting data time to verification time” (and maybe “override…” also) if starting date given in UMUI (above in the same page) is different from the date in atmospheric start dump
leave other selections unchanged if you are not sure (originally selected: using spiral coastal…, bilinear)
specify output as $MY_DUMPS/$RUNID.astart

1.6. UMUI: [Ocean]-[input files]-[start dump]

use a proper start dump ($MY_DUMPS/xbyvqo#da000002541c1+, qrparm.ocean.restart etc.) (this only matters in the next step though)
“using the reconfiguration” should be UNCHECKED (this is probably important)

1.7. UMUI: press <save> and then <process>

1.8. manually post process on puma if it is necessary for you (see section [2] of NoteFamousQuest)

  jobid=xxxxx	#do not forget to set jobid first!!!!
  echo $jobid	#chk jobid!

  /home/famous/bin/he_namelist_new_phase5 $jobid
  /home/famous/bin/vfdrift_pp.sh $jobid
  /home/famous/bin/quest_queue.sh $jobid
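
Since forgetting to set jobid is the easy mistake here, the three commands above can be wrapped in a small guard. This is only a sketch: the function names valid_jobid and postprocess_job are made up for illustration, and the check assumes a job id is exactly five lowercase letters or digits, as tcmid and tcmie suggest.

```shell
# Return 0 if the argument looks like a UMUI job id, 1 otherwise.
# Assumption: a job id is exactly five lowercase letters or digits (e.g. tcmid).
valid_jobid() {
  case "$1" in
    [a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Run the three post-processing scripts from step 1.8, stopping at the first failure.
postprocess_job() {
  valid_jobid "$1" || { echo "jobid '$1' is not set or looks wrong" >&2; return 1; }
  /home/famous/bin/he_namelist_new_phase5 "$1" &&
  /home/famous/bin/vfdrift_pp.sh "$1" &&
  /home/famous/bin/quest_queue.sh "$1"
}
```

On puma it would be called as postprocess_job "$jobid"; with an empty or mistyped jobid it stops before touching anything.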

1.9. Submit

  UMUI: press [SUBMIT] ,
  puma; ~jeff/bin/umsubmit -h quest-hpc.bris.ac.uk -u $USER -r scp $jobid , OR
  puma; clustersubmit -c n -s y -r quest $jobid

This takes a short but noticeable time (a few seconds to a few tens of seconds).
The output file is created as ~/umui_out/*.leave. It is ~300KB if successful.
The $jobid.astart file is created where you specified in [Atmos]-[Ancil+input]-[start dump]
(in my case it is $MY_DUMPS/$jobid.astart = ~/um/dumps/$jobid.astart).
The size should be 2080768B when successful. In my first attempts for new jobs, however, I never get a file of this size but one of 2064384B, which is not quite right. If this happens to you, look carefully at the .leave file and check whether it is the case described in trouble shooting (2) below. I have not found a way to avoid this problem in advance. So what you do is submit, get a wrong .astart file, go through trouble shooting (2) to fix the problem, and try again. I get a file of the right size on my second attempt, after going through the trouble shooting.
~/DUMP2HOLD/um/$jobid/dataw/$RUNID.recon is newly created in the first run that crashes (LET IT CRASH AND DO TROUBLE SHOOTING (2) UNTIL WE FIND A BETTER WAY TO RUN!!!). It will not be created again after going through trouble shooting (2); in other words, the one created manually in the trouble shooting must not be overwritten.
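
The size check can be scripted so that the wrong-sized file is not missed. A minimal sketch follows; file_size and check_dump_size are hypothetical helper names, and the expected byte count is whatever this page quotes for your configuration.

```shell
# Print the size of a file in bytes (wc -c is portable across Linux and other Unix).
file_size() {
  wc -c < "$1" | tr -d ' '
}

# Compare a dump file with the size expected for a successful reconfiguration.
# Usage: check_dump_size FILE EXPECTED_BYTES
# Returns 0 on a match, 1 on a size mismatch, 2 if the file does not exist.
check_dump_size() {
  [ -f "$1" ] || { echo "$1: not found" >&2; return 2; }
  actual=$(file_size "$1")
  if [ "$actual" -eq "$2" ]; then
    echo "$1: OK ($actual bytes)"
  else
    echo "$1: $actual bytes, expected $2" >&2
    return 1
  fi
}
```

For the atmosphere dump in this section the call would be check_dump_size $MY_DUMPS/$jobid.astart 2080768.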

1.10. Save the original astart file

quest:$MY_DUMPS/
```
cp $jobid.astart $jobid.astart.ini
```

1.T. trouble shooting

(1) In my initial attempts, a $jobid.astart file was not created. The .leave file looked normal in terms of size, but looking inside it revealed some problems. The “Completion code” was 134, and I suppose anything other than 0 indicates an error. Just above that there are lines like;

Abort
  /exports/gpfsbig/um/PUM64/um/vn4.5/scripts/qsprelim: Error in dump reconfiguration - see OUTPUT

So clearly there was an error. Near the end of the file, I found what was wrong;

ERROR : Reconfiguration CONTROL
  No of land points in output Land-sea mask     =    770
  No of land points specified in namelist RECON =    836
  Please reprocess the job with the correct number of land points in UMUI panel

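This can be fixed in UMUI: [Atmos]-[Model resol+domain]-[horizontal] by setting the number of land points to the value reported for the output land-sea mask (770 in this example).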
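
Because the size of the .leave file tells you nothing, it helps to pull the completion codes out of it directly. The sketch below assumes the codes always appear on lines of the form "Completion code : N", as in the excerpts on this page, and that any non-zero code means failure (my supposition above). The function names are made up.

```shell
# Print every "Completion code" value found in a .leave file, one per line.
completion_codes() {
  sed -n 's/^[[:space:]]*Completion code[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1"
}

# Succeed only if the file reports at least one completion code and all of them are 0.
leave_ok() {
  codes=$(completion_codes "$1")
  [ -n "$codes" ] || return 1
  for c in $codes; do
    [ "$c" -eq 0 ] || return 1
  done
}
```

If leave_ok fails, grep -n "ERROR" on the same file is a quick way to find messages like the land point mismatch above.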

(2) With (1) fixed, a larger .astart file is created (2064384B; I later found that this is not quite the right size, which is 2080768B). It looks ‘mostly’ alright, but the .leave file still doesn’t seem quite happy;

Abort
  /exports/gpfsbig/um/PUM64/um/vn4.5/scripts/qsprelim: Error in dump reconfiguration - see OUTPUT
     Completion code :   134
  …………<skipped>…………
    No Sea Ice Temperature in input dump
    Sea Ice Temperature being initialised.
  Processing Field   119 Stash Code=  408 : SEA-ICE SURFACE TEMP AFTER TIMESTEP
   Warning - non-constant polar row for field                       119
    Problem with reading T* field.
   Error detected in subroutine CONTRO Lcontrol 1?.f^@^@^@^@^@^@^@^@^@^@^@^@^@TRANSPLANTING DATA               ^T^@
  ^@^@�^G^@^@�^G^@^@�^G^@^@�^G
   while doing I/O on unit 21

This problem can at least be worked around (it may or may not be truly fixed) by the following procedure;

modify quest:~/DUMP2HOLD/um/$jobid/code/exec_build/qxrecon_dump_dir/control1.f (lines near 5760) as below;

original:

              IF (ICODE.GT.0) THEN
                WRITE (6,*) ' Problem with reading T* field.'
                CALL ABORT_IO ('CONTROL',CMESSAGE,ICODE,NFTOUT)
              ENDIF

changed:

              IF (ICODE.GT.0) THEN
                if (ICODE.eq.1501) then !!!
                  write(6,*) ' Polar rows not constant in T*.' !!!
     &                     //' This is probably not a problem.' !!!
                else !!!
                  WRITE (6,*) ' Problem with reading T* field.'
                  CALL ABORT_IO ('CONTROL',CMESSAGE,ICODE,NFTOUT)
                end if !!!
              ENDIF

Type make in that directory to build a new qxrecon_dump executable. I think it is ok to let it overwrite the original one, but you may want to rename the original first to keep it.
Copy qxrecon_dump to /exports/gpfsbig/home/$USER/DUMP2HOLD/um/$jobid/dataw/ and rename it $jobid.recon

quest1:/exports/gpfsbig/home/$USER/DUMP2HOLD/um/$jobid/code/exec_build/qxrecon_dump_dir% cp qxrecon_dump qxrecon_dump.org
quest1:/exports/gpfsbig/home/$USER/DUMP2HOLD/um/$jobid/code/exec_build/qxrecon_dump_dir% make
quest1:/exports/gpfsbig/home/$USER/DUMP2HOLD/um/$jobid/code/exec_build/qxrecon_dump_dir% ls -ltr
…………………………………………
-rwxr-xr-x  1 $USER users 1380231 Feb  8 17:08 qxrecon_dump.org
-rwxr-xr-x  1 $USER users 1380423 Feb  8 17:15 qxrecon_dump
quest1:/exports/gpfsbig/home/$USER/DUMP2HOLD/um/$jobid/code/exec_build/qxrecon_dump_dir% cp qxrecon_dump /exports/gpfsbig/home/$USER/DUMP2HOLD/um/$jobid/dataw/
quest1:/exports/gpfsbig/home/$USER/DUMP2HOLD/um/$jobid/code/exec_build/qxrecon_dump_dir% cd /exports/gpfsbig/home/$USER/DUMP2HOLD/um/$jobid/dataw
quest1:/exports/gpfsbig/home/$USER/DUMP2HOLD/um/$jobid/dataw% mv $jobid.recon $jobid.recon.old
quest1:/exports/gpfsbig/home/$USER/DUMP2HOLD/um/$jobid/dataw% mv qxrecon_dump $jobid.recon
quest1:/exports/gpfsbig/home/$USER/DUMP2HOLD/um/$jobid/dataw% ls -l
…………………………………………
-rwxr-xr-x  1 $USER users  1380423 Feb  8 17:15 $jobid.recon
-rwxr-xr-x  1 $USER users  1380231 Feb  7 17:36 $jobid.recon.old

Then go to UMUI: [submodel indep]-[comp+mod]-[mod for reconfig] and select “Run from the existing reconfiguration executable”; otherwise the $jobid.recon you just created will be overwritten by one built from the standard set of mods.
Now go back and do 1.7–1.10.

Doing this seems to fix the problem: a trouble-free .astart file is created, and I have confirmed that a job using this .astart file runs successfully.
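
The copy-and-rename part of this procedure (after control1.f has been edited and make has been run) can be sketched as one function. install_recon is a hypothetical name, and the two directories are passed in rather than hard-coded because the paths differ between accounts.

```shell
# Install a rebuilt qxrecon_dump as JOBID.recon, keeping any existing one as JOBID.recon.old.
# Usage: install_recon BUILD_DIR DATAW_DIR JOBID
install_recon() {
  build=$1 dataw=$2 id=$3
  [ -x "$build/qxrecon_dump" ] || { echo "no executable qxrecon_dump in $build" >&2; return 1; }
  if [ -e "$dataw/$id.recon" ]; then
    # Keep the executable from the crashed run, as in the transcript above.
    mv "$dataw/$id.recon" "$dataw/$id.recon.old" || return 1
  fi
  cp -p "$build/qxrecon_dump" "$dataw/$id.recon"
}
```

In my case the call would be install_recon ~/DUMP2HOLD/um/$jobid/code/exec_build/qxrecon_dump_dir ~/DUMP2HOLD/um/$jobid/dataw $jobid. The UMUI setting described above still has to be changed by hand.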

(3) I was successful in the previous job, but in a new job the .astart file is not created. This seems to be a different problem from (1) or (2). The .leave file says;

/exports/gpfsbig/um/PUM64/um/vn4.5/scripts/qsprelim[779]: /exports/gpfsbig/work/bristol/$USER/um/$jobid/dataw/$jobid.recon: not found
/exports/gpfsbig/um/PUM64/um/vn4.5/scripts/qsprelim: Error in dump reconfiguration - see OUTPUT
*****************************************************************
   Ending script   :   qsprelim
   Completion code :   127

This happened because $jobid.recon was not created, probably because you copied the previous job to make the current one and did not undo the setting made in (2). Now that I have added step 1.2 to the procedure, you are less likely to hit this problem. Anyway, here is how to tackle it;

UMUI: [submodel indep]-[comp+mod]-[mod for reconfig] and select “Compile and build the executable named below” just as in 1.2
go back and do 1.7–1.9, let it crash just as the first time, do (2), and repeat following the instructions

[If you had a different kind of trouble in atmospheric reconfiguration and resolved it, please add the information about it here (or anywhere else) and share it with other users.]

2. Reconfigure Ocean

If you started from the atmospheric reconfiguration, 2.1–2.3 should already be set correctly. In that case start from 2.4.

2.1. UMUI: [submodel indep]-[compil+modif]-[Compile Options]

Either “compile and build then stop” as in compiling OR “Run from existing exec” as in running should be selected

2.2. UMUI: [submodel indep]-[Job submission, resources…]

Either “at for other unix” as in compiling OR “qsub for supporting platforms” as in running should be selected

2.3. UMUI: [submodel indep]-[Gen config control]

“perform reconfig step only” should be checked (this is important!)

2.4. UMUI: [Atmos]-[Ancil+input]-[start dump]

specify the created .astart file: $MY_DUMPS/$RUNID.astart.ini as the “initial dump”
UNCHECK “using the reconfiguration”

2.5. UMUI: [Ocean]-[input files]-[start dump]

use a proper start dump ($MY_DUMPS/xbyvqo#da000002541c1+, qrparm.ocean.restart etc.)
CHECK “using the reconfiguration”
CHECK “Resetting data time to verification time” (and maybe “override…” also) if starting date given in UMUI (above in the same page) is different from the date in the oceanic start dump
specify output eg: $MY_DUMPS/$RUNID.ostart

2.6. UMUI: press <save> and then <process>

2.7. manually post process on puma if necessary (see 1.8)

2.8. Submit

  UMUI: press [SUBMIT] ,
  puma; ~jeff/bin/umsubmit -h quest-hpc.bris.ac.uk -u $USER -r scp $jobid , OR
  puma; clustersubmit -c n -s y -r quest $jobid

This takes a short but noticeable time (probably a few tens of seconds).
The output file is created as ~/umui_out/*.leave. It is ~240KB if successful.
The $jobid.ostart file is created where you set in [Ocean]-[input files]-[start dump] (in my case it is $MY_DUMPS/$jobid.ostart = ~/um/dumps/$jobid.ostart). If successful, the size should be 15253504B when the ocean chemistry is turned on, or 7274496B when it is turned off and extra tracers are disabled (see trouble shooting 2.T.(1)).
Check the trouble shooting if you have a problem here.

2.9. Save the original ostart file

  echo $jobid	#chk jobid!
  cp $jobid.ostart $jobid.ostart.ini

2.T. trouble shooting in ocean reconfiguration

(1) $jobid.ostart is only about 2.8MB and the completion code in the .leave file is 134.
The completion code shows that the run was unsuccessful. Near the bottom of the file I found a line indicating what was wrong.

  *ERROR* Stash code  103 not found on input file

If you look at stash (UMUI:[Ocean]-[STASH]-[STASH. Specification…]) you will see what is 103 for (it is ‘OCN EXTRASER 1: CONVEN TCO2′).

This error occurred because the ocean chemistry was turned on in the model but its fields were not present in the ocean start dump file. If it is ok to turn off ocean chemistry, do so in UMUI:[Ocean]-[Scientific Parameters]-[Carbon Cycle]. Also disable extra tracers in UMUI:[Ocean]-[Tracers]-[User Defined Tracers]. Then go back to 2.6 and try again. I got a .ostart of 7274496B, which is much smaller than the case with ocean chemistry turned on (~15MB), but this turned out to be correct.

If you do need the chemistry, you’ll need some fields compatible with the rest of the restart.
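
To see which fields the reconfiguration could not find, the stash codes can be extracted from the .leave file. This is a sketch under the assumption that every missing field is reported in the same wording as the 103 message above; I have only seen that one line myself, and missing_stash_codes is a made-up name.

```shell
# List the distinct stash codes reported as "not found on input file" in a .leave file.
missing_stash_codes() {
  sed -n 's/^.*\*ERROR\* Stash code[[:space:]]*\([0-9][0-9]*\) not found on input file.*$/\1/p' "$1" |
    sort -n -u
}
```

Each code printed can then be looked up in UMUI:[Ocean]-[STASH] as described above.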

[If you had a different kind of trouble in ocean reconfiguration and resolved it, please add the information about it here (or anywhere else) and share it with other users.]

3. Compile

3.1. UMUI: [submodel indep]-[compil+modif]-[Compile Options]

Select “compile and build then stop”

3.2. UMUI: [submodel indep]-[Job submission, resources…]

Select “at for other unix”

3.3. UMUI: [submodel indep]-[Gen config control]

UNCHECK “perform reconfig step only”

3.4. UMUI: [Atmos]-[Ancil+input]-[start dump]

The created .astart file: $MY_DUMPS/$RUNID.astart.ini should be specified as the “initial dump”
“using the reconfiguration” should be checked or unchecked depending on your purpose.
so far I have never turned on atmospheric reconfiguration.

3.5. UMUI: [Ocean]-[input files]-[start dump]

specify the created .ostart file: $MY_DUMPS/$RUNID.ostart.ini as the “initial dump”
UNCHECK “Resetting data time to verification time” (I am not sure about this one)
“using the reconfiguration” should be checked or unchecked depending on your purpose.
I now turn ON ocean reconfiguration so that the freshwater ancillary file and the vfdrift modification take effect.

3.6. UMUI: press <save> and then <process>

3.7. manually post process on puma if necessary (see 1.8)

3.8. Submit

  UMUI: press [SUBMIT] ,
  puma: ~jeff/bin/umsubmit -h quest-hpc.bris.ac.uk -u $USER -r scp $jobid , OR
  puma; clustersubmit -c n -s y -r quest $jobid

this will take several minutes
output file is created as ~/umui_out/*.leave. this is usually 150–300KB when successful
an executable ~/DUMP2HOLD/um/$jobid/dataw/$RUNID.exec should be created as specified in [submodel indep]-[compil+modif]-[Compile Options]

4. Run

4.1. UMUI: [submodel indep]-[compil+modif]-[Compile Options]

select “Run from existing exec”

4.2. UMUI: [submodel indep]-[Job submission, resources…]

select “qsub for supporting platforms”

4.3. (same as 3.3) UMUI: [submodel indep]-[Gen config control]

“perform reconfig step only” should be unchecked

4.4. (same as 3.4) UMUI: [Atmos]-[Ancil+input]-[start dump]

the created .astart file: $MY_DUMPS/$RUNID.astart.ini should be specified as the “initial dump”
“using the reconfiguration” should be checked or unchecked depending on your purpose. so far I have never turned on atmospheric reconfiguration.

4.5. (same as 3.5) UMUI: [Ocean]-[input files]-[start dump]

the created .ostart file: $MY_DUMPS/$RUNID.ostart.ini should be specified as the “initial dump”
“using the reconfiguration” should be checked or unchecked depending on your purpose. I now turn ON ocean reconfiguration so that the freshwater ancillary file and the vfdrift modification take effect.

4.6. UMUI: press <save> and then <process>

4.7. manually post process on puma if necessary (see 1.8)

4.8. Submit

  UMUI: press [SUBMIT] ,
  puma: ~jeff/bin/umsubmit -h quest-hpc.bris.ac.uk -u $USER -r scp $jobid , OR
  puma; clustersubmit -c n -s y -r quest $jobid

In Bristol clustersubmit is said to be preferable. It generally gives more control, and the advantage in this particular case is that you can skip step 4.9.
Messages like the following may show up, but it turned out that this is actually ok.

  qsub: invalid option ? x
  qsub: invalid option ? s
  usage: qsub [-a date_time] [-A account_string] [-b secs]
  [-c  c[=‹INTERVAL?] ] [-C directive_prefix] [-d path] [-D path]
  [-e path] [-h] [-I] [-j oe] [-k {oe}] [-l resource_list] [-m {abe}]
  [-M user_list] [-N jobname] [-o path] [-p priority] [-q queue] [-r y|n]
  [-S path] [-u user_list] [-X] [-W otherattributes=value?] [-v variable_list]
  [-V ] [-z] [script]

If UMUI [SUBMIT] button is clicked or umsubmit is used

output file (quest:~/umui_out/***.leave) is not created.
THIS WILL NOT START THE RUN. SO GO ON TO 4.9!!!

If clustersubmit is used

it should start the run so you can SKIP 4.9!!!
the output file (quest:~/umui_out/***.leave) will be created after the run is stopped or finished.

4.9. Run

on quest cd to the most recent directory in $HOME/umui_runs

cd $HOME/umui_runs
ls -lrt
cd $jobid-012345678   (go to the latest directory)
ls -l

submit the job with

~um/bin/qsub-um qsubmit.quest#   (# = 1 or 2; look in the directory)

Output is not created until the run ends in some way
(either finished successfully or stopped due to an error).
It will be ~/umui_runs/[jobid]-012345678/[jobid]000.o1234
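
Picking the latest directory by eye from ls -lrt can be replaced by a small helper. This is a sketch: latest_run_dir is a hypothetical name, and it assumes the run directories are named $jobid-<digits> directly under ~/umui_runs as shown above.

```shell
# Print the most recently modified run directory for a job (with a trailing slash).
# Usage: latest_run_dir JOBID [BASE_DIR]   (BASE_DIR defaults to $HOME/umui_runs)
latest_run_dir() {
  base=${2:-$HOME/umui_runs}
  # ls -dt lists newest first; the trailing / in the pattern restricts it to directories.
  ls -dt "$base/$1"-*/ 2>/dev/null | head -n 1
}
```

For example, cd "$(latest_run_dir $jobid)" followed by ls -l shows which qsubmit.quest# file is there.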

4.10. check the status of the run on quest

  qstat		#check all jobs running on quest
  qstat -u $USER	#check your jobs only

4.T. trouble shooting in running the model

(1) Run stopped after a few tens of seconds. Near (not quite at) the end of the output (~/umui_runs/[jobid]-012345678/[jobid]000.o1234) it said “LSEGF NOT LARGE ENOUGH.”

Robin says: This comes from the subroutine that sorts out some basic stuff for the fourier filtering of the high latitude lines. It looks like the relevant loop goes through the map line by line, looking for separate areas. The maximum number of areas allowed for this procedure is set in the UMUI - it was 6, but the map has one line that needs 7.

In UMUI:[Ocean]-[Scientific Parameters]-[Fourier Filtering], change “Maximum number of start and end indices” to 7 or more.
Then I just repeated this section and the model started to run.
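
To illustrate what Robin describes, here is a rough sketch that counts separate sea segments on each row of a mask and reports the largest count. It is not the model's own routine. It assumes a hypothetical text file with one latitude row per line, '1' for land and '0' for sea, and it ignores wrap-around at the ends of a row, which the real code may treat differently.

```shell
# Print the largest number of separate sea segments found on any row of a text mask.
# Assumption: one latitude row per line, '1' = land, '0' = sea, no separators.
max_sea_segments() {
  awk '{
    n = 0; prev = "1"
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "0" && prev != "0") n++   # a new run of sea points starts here
      prev = c
    }
    if (n > max) max = n
  } END { print max + 0 }' "$1"
}
```

If the number printed is larger than the “Maximum number of start and end indices” set in UMUI, that would be consistent with the error above.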

(2) Run stopped after tens of years of simulation time
As always look in the output file ([jobid]000.o1234 or [jobid]***.leave). I got the following message;

  Model aborted with error code -    1 Routine and message:-
                          P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED.

This is a common error and I have had it many times. It seems the simulated climate went unstable and the simulation died.

Robin says: Often this means that something in the boundary conditions has pushed the climate past the model’s capability - either too hot, or too cold somewhere. Carefully check the model output fields and look for things that are out of place. Sometimes, however, this seems to happen in FAMOUS with perfectly normal climates - we’ve never worked out why.

In previous versions of FAMOUS, this problem could be overcome simply by resubmitting the job. However, this version is bit-reproducing, so simply resubmitting the job will result in exactly the same error.

Robin continues: (If there is nothing wrong in the climate,) make a small perturbation to the climate and restart, and it’ll run fine. The easiest way to do this is to reconfigure the atmosphere dump (this adjusts some of the coastal tiling fields), although sometimes I change the date on the previous year’s ocean dump (again, by reconfiguring) and use that to restart.

So basically what you can do is use the latest atmospheric and ocean dumps as initial dumps, reconfigure, and run again;

Copy the latest atmosphere and ocean dumps (quest1:~dump2hold/$jobid/${jobid}[a|o]#da00000yyyyc1+) to the directory where you save your dump files (it is MY_DUMPS in my case)
Go back and do steps 1, 2 and 4
- Skip 1.2 and you don’t have to worry about the trouble shooting 1.T.(2). I get a proper .astart file in my first attempt.
- Specify these dumps as initial dumps in 1.5 and 2.5.
- Change the start year to the year of the dump files (and the run length accordingly).
- Don’t worry about steps 3 and 5. The same executable can be used. Simply submit the job as a new run with the same jobid.
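
Finding the latest dump of each submodel can be sketched as below. latest_dump is a made-up helper, the directory holding the dumps is passed in as an argument, and the sort relies on the assumption that the date stamp after "#da" orders chronologically as plain text (which looks true for names like xbyvqa#da000002541c1+, but I have not checked every case).

```shell
# Print the newest dump of one submodel (a = atmosphere, o = ocean) in a directory.
# Usage: latest_dump DIR JOBID a|o
# Assumption: the date stamp after "#da" sorts chronologically as plain text.
latest_dump() {
  dir=$1 id=$2 sub=$3
  for f in "$dir/${id}${sub}#da"*; do
    # Skip the unexpanded pattern when nothing matches.
    [ -e "$f" ] && printf '%s\n' "$f"
  done | LC_ALL=C sort | tail -n 1
}
```

The result can be copied with cp "$(latest_dump DIR $jobid a)" $MY_DUMPS/ and likewise with o for the ocean dump, where DIR is wherever your run writes its dumps.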

[If you had a different kind of trouble in running the model and resolved it, please add the information about it here (or anywhere else) and share it with other users.]

5. Submit a continuation run

I have heard of a few ways to submit a continuation run.

Common for all methods

Do 5.1 and 5.2 if necessary.

5.1. UMUI: [submodel indep]-[start date + run length options]

set a longer length as you like

5.2. repeat 4.1~4.5.

Then do one of (1)~(4).

(1) Use `clustersubmit`

  puma; clustersubmit -c y -s y -r quest $jobid

Here the flag -c is to specify whether this is a continuation run (y) or not (n). (-s is to specify whether to submit (y) or copy files over (n), and -r is to specify the target machine (e.g. quest, ormen, etc.))

(2) Modify `qsubmit.quest#`

5.3. modify qsubmit.quest#

just like 4.9, on quest cd to the most recent directory in $HOME/umui_runs

cd $HOME/umui_runs
ls -lrt
cd $jobid-012345678   (go to the latest directory)
ls -l

open qsubmit.quest# (# = 1 or 2; look in the directory) with a text editor and replace “TYPE=NRUN” with “TYPE=CRUN”.
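
The same edit can be made without opening an editor. A minimal sketch, assuming the script contains the literal text TYPE=NRUN as described above; set_crun is a hypothetical name.

```shell
# Switch a qsubmit script from a new run (NRUN) to a continuation run (CRUN).
# The original is kept as FILE.bak; writing back into FILE preserves its permissions.
set_crun() {
  grep -q 'TYPE=NRUN' "$1" || { echo "$1: no TYPE=NRUN found" >&2; return 1; }
  cp -p "$1" "$1.bak" &&
  sed 's/TYPE=NRUN/TYPE=CRUN/' "$1.bak" > "$1"
}
```

Run it in the latest run directory as set_crun qsubmit.quest1 (or qsubmit.quest2). It refuses to do anything if the file has already been switched.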

5.4. submit to run

just like 4.9, submit the job with
```
~um/bin/qsub-um qsubmit.quest#
```

(3) Do the same thing but on puma

cd to ~/umui_jobs/$jobid, edit SUBMIT as shown in 5.3 above and then run umsubmit to copy the new scripts across. Then submit using qsub-um on quest as 5.4.

(4) Add a post processing script

Include the following in the post processing.
```
/home/annette/famous/bin/change_crun_vn4.5
```

By doing this UMUI will automatically do the same thing as (3).

There is also some information about resubmission in the NCAS-CMS website.

Page last modified on May 01, 2008, at 04:48 PM by Masaru Yoshioka
