D2D Operations Info

v AWIPS OB8/OB9

Audience

This document is directed at GSD and WFO staff who may be called upon to diagnose problems with the WFO-Advanced data ingest, internal communications, and display software at WFO Denver/Boulder. The operational staff at the WFO has higher-level monitoring and restart tools available that are not described here.

Support

Primary support is from the NCF (301-713-9344). GSD staff are also "on call" (informally) to help with problems. WFO staff are authorized to call Joe Wakefield, Darien Davis, or Carl Bullock at home for help, if NCF can't. They also have Gregg Phillips' cell phone number.

Note for GSD folks: to get hold of WFO staff, use 303-494-4454 (this is the national coordination number, the one other WFOs use to call in). The Admin number there is 303-494-3210, and the external coordination number (media, etc.) is 303-494-4479. GSD operators are on duty 24x7, and can be reached at 303-497-6887/303-230-3454 (pager)/opers.its.gsd@noaa.gov.

Environment

The Boulder computers carry the suffix -bou, which is the official ID of the Denver/Boulder WFO. The complement of servers includes ds1-bou (database, radar ingest), dx1-bou (primary ingest), dx2-bou, px1-bou (SCAN/FFMP, notification, purging), and px2-bou. (Name aliases allow these to be referred to locally as ds1, dx1, dx2, px1, and px2.)

When you log into one of the -bou subnet machines as fxa or an individual user, several scripts are automatically run to set a number of environment variables, etc. The .cshrc script sets this process in motion. Settings of interest include FXA_DATA (/data/fxa), LOG_DIR (/data/logs/fxa), FXA_HOME (/awips/fxa), and TZ (GMT).

All D2D processes are found in ~fxa/bin, and data files (tables, menus, WarnGen templates, etc.) are in ~fxa/data. Most ingest logs are in $LOG_DIR/yyyymmdd (type "logs" to get there), with a few in $LOG_DIR; these are on local disks.

The log file for a user interface process is $LOG_DIR/display/<displayName>/<date>/fxaWish<pid> where <displayName> is :0.0 for the center display, :0.1 for the left display, and :0.2 for the right display; <date> is the UTC date, in YYYYMMDD format, when the user interface process was started; and <pid> is the process ID.

The log files for the IGC processes, the application manager, the applications, the extensions, and all descendants of the user interface process are in the directory $LOG_DIR/display/<displayName>/<date>/fxaWish<pid>.children and have the format <programName><pid> where <programName> is the name of the executable.
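
The path rules above can be sketched in a few lines of Python. This is only an illustration of the naming convention, not a tool that exists on the system; the function names are made up here, and the default LOG_DIR is the /data/logs/fxa value quoted earlier.

```python
import os
from datetime import datetime, timezone

# Assumption: fall back to the LOG_DIR value quoted in this document.
LOG_DIR = os.environ.get("LOG_DIR", "/data/logs/fxa")


def display_log_paths(display_name, started_utc, pid, log_dir=LOG_DIR):
    """Return (ui_log, children_dir) for one user interface process."""
    date = started_utc.strftime("%Y%m%d")  # UTC date the process was started
    ui_log = f"{log_dir}/display/{display_name}/{date}/fxaWish{pid}"
    return ui_log, ui_log + ".children"


def child_log_path(children_dir, program_name, pid):
    """Logs of descendants are named <programName><pid>."""
    return f"{children_dir}/{program_name}{pid}"


if __name__ == "__main__":
    started = datetime(2009, 5, 1, 23, 59, tzinfo=timezone.utc)
    ui_log, children = display_log_paths(":0.1", started, 4321)
    print(ui_log)
    print(children)
```

Here :0.1 is the left display; the date and pid are invented for the example.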

Since FXA_HOME/bin is in fxa's PATH, there's no need to include that when entering process commands, and that's reflected in the commands included in these instructions. All commands that you'll need to enter are shown in bold type. Except as noted, all will be run from the fxa account.

You can get to today's ingest log directory simply by typing logs, and up will get you to its parent, where some logs live. The naming convention for ingest logs is <processName><pid><hostname><hhmmss>.

Overview

The diagram below outlines the flow of messages through the WFO-A ingest system. (It's way out of date, obviously - heck, by now, it's a museum piece - but still instructive.) Following sections discuss each interface in detail.

General data ingest

All ingest processes are started automatically at boot time. dx1/dx2, dx3/dx4, and px1/px2 are heartbeat pairs that are monitored and started with the hb_ software. As root, use hb_stat to see if (and where) the packages are running, and hb_swap to start one up (e.g., hb_swap px1apps px1-bou). Michael Vrencur recommends running hb_swap on the node from which the package is being swapped.

A bigger hammer is service heartbeat restart. If, for example, you run this on dx3, expect that dx3apps will swap over to dx4. Then you can jump on the latter and swap the package back to dx3.

As long as the package is up, you can run the start/stop scripts by hand. Should it be necessary to restart, use stopIngest and startIngest.[dx1|dx2|dx3|dx4|px1|px2].

Processes included in stop/startIngest for OB8.1, in the order they are started: (Note that the scripts use $FXA_HOME, which resolves to /awips/fxa. What's shown here is the text that appears in a ps listing.)

For ds1:
[if $mhs_host=ds]
/awips/fxa/bin/MhsServer
/awips/fxa/bin/MhsRequestServer

For dx1:
/awips/fxa/bin/DataController COMMS_ROUTER PDCservcontrol.co
  /awips/fxa/bin/PDCserver
[next two if $mhs_host=dx1f]
/awips/fxa/bin/MhsServer
/awips/fxa/bin/MhsRequestServer
/awips/fxa/bin/textNotificationServer
/awips/fxa/bin/NWWSProduct
/usr/bin/perl /awips/fxa/bin/ingProcMon.pl -c DX1

For dx2:
/awips/fxa/bin/RadarServer
/awips/fxa/bin/DialServer
/awips/fxa/bin/RMR_Server
/awips/fxa/bin/DataController COMMS_ROUTER TextDB2_Controller.config
  /awips/fxa/bin/RadarTextDecoder
/awips/fxa/bin/RadarMsgHandler
/awips/fxa/bin/DataController COMMS_ROUTER RadarController.config
  /awips/fxa/bin/RadarStorage
  /awips/fxa/bin/HandleGenericMsg
/awips/fxa/bin/ORPGCommsMgr KFTG
/usr/bin/perl /awips/fxa/bin/ingProcMon.pl -c DX2

For dx3:
/awips/fxa/bin/acqserver 21600
  /awips/fxa/bin/acqserver 8 21600
  <13 more of these>
/awips/fxa/bin/CommsRouter COMMS_ROUTER
/awips/fxa/bin/CommsRouter GRID_ROUTER
/awips/fxa/bin/DataController GRID_ROUTER GribController.con
  /awips/fxa/bin/GribDecoder
  /awips/fxa/bin/Grib2Decoder
/awips/fxa/bin/DataController GRID_ROUTER GribImgController.
  /awips/fxa/bin/GribImgDecoder
/awips/fxa/bin/DataController COMMS_ROUTER SatelliteControll
  /awips/fxa/bin/Satdecoder
/awips/fxa/bin/DataController COMMS_ROUTER TextCont2.config
  /awips/fxa/bin/RaobBufrDecoder
  /awips/fxa/bin/AircraftDecoder
  /awips/fxa/bin/MaritimeDecoder
  /awips/fxa/bin/profilerDecoder
  /awips/fxa/bin/RedbookStorage
/awips/fxa/bin/DataController COMMS_ROUTER TextDB_Controller
  /awips/fxa/bin/CollDB_Decoder
  /awips/fxa/bin/StdDB_Decoder
/awips/fxa/bin/DataController COMMS_ROUTER WarnDB_Controller
  /awips/fxa/bin/WarnDBDecoder
/awips/fxa/bin/DataController COMMS_ROUTER TextCont.config
  /awips/fxa/bin/MetarDecoder
/awips/fxa/bin/DataController COMMS_ROUTER BufrDriverContr.c
  /awips/fxa/bin/BufrDriver model
  /awips/fxa/bin/BufrDriver goes
  /awips/fxa/bin/BufrDriver acars
  /awips/fxa/bin/BufrDriver poes,quikscat
  /awips/fxa/bin/BufrDriver hdw
/awips/fxa/bin/DataController COMMS_ROUTER GFSdriverContr.c
  /awips/fxa/bin/gfsDriver
/awips/fxa/bin/DataController COMMS_ROUTER SSMIdriverContr.config
  /awips/fxa/bin/SSMIdriver
/awips/fxa/bin/DataController COMMS_ROUTER BufrMOScontr.config
  /awips/fxa/bin/BufrMosDecoder
/awips/fxa/bin/DataController COMMS_ROUTER TextCont3.config
  /awips/fxa/bin/binLightningDecoder
/awips/fxa/bin/DataController COMMS_ROUTER TextCont4.config
  /awips/fxa/bin/SynopticDecoder
/usr/bin/perl /awips/fxa/bin/ingProcMon.pl -c DX3

For dx4:
/awips/fxa/bin/notifyTextProd COMMS_ROUTER /awips/GFESuite/primary...
/awips/fxa/bin/notifyTextProd COMMS_ROUTER /awips/GFESuite/svcbu...
/usr/bin/perl /awips/fxa/bin/ingProcMon.pl -c DX4

For px1:
/awips/fxa/bin/DataController COMMS_ROUTER SCANcontroller.config
  /awips/fxa/bin/SCANprocessor
/awips/fxa/bin/DataController COMMS_ROUTER FFMPcontroller.config
  /awips/fxa/bin/FFMPprocessor
/awips/fxa/bin/DataController COMMS_ROUTER SRUcontroller.config
  /awips/fxa/bin/SRUprocessor
/awips/fxa/bin/DataController COMMS_ROUTER FMcontroller.config
  /awips/fxa/bin/FMprocessor
/awips/fxa/bin/DataController COMMS_ROUTER SNOWcontroller.config
  /awips/fxa/bin/SNOWprocessor
[at marine WFOs]
/awips/fxa/bin/DataController COMMS_ROUTER SScontroller.config
  /awips/fxa/bin/SSprocessor
/awips/fxa/bin/asyncScheduler
/awips/fxa/bin/hmMonitorServer
/awips/fxa/bin/NWWSSchedule
/usr/bin/perl /awips/fxa/bin/ingProcMon.pl -c PX1

For px2:
/awips/fxa/bin/ldadServer
/usr/bin/perl /awips/fxa/bin/ingProcMon.pl -c PX2

And processes in start/stopTextDB:
For dx1:
/awips/fxa/bin/TextDB_Server -Read
/awips/fxa/bin/TextDB_Server -Write

For px1:
/awips/fxa/bin/textNotificationServer

The stop/start scripts handle the non-indented items in the list. Indented items are children spawned by the process listed immediately above.
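
For anyone scripting against these listings, the parent/child convention can be read mechanically. The following is a rough sketch only (the function name is ours, and the excerpt is copied from the dx2 list above); it groups each indented line under the non-indented line that precedes it and skips bracketed notes.

```python
def group_children(listing):
    """Map each non-indented (parent) line to its indented children."""
    groups = {}
    parent = None
    for line in listing.splitlines():
        if not line.strip() or line.lstrip().startswith("["):
            continue  # skip blank lines and notes such as "[at marine WFOs]"
        if line[0].isspace():
            if parent is not None:
                groups[parent].append(line.strip())  # child of the last parent
        else:
            parent = line.strip()
            groups[parent] = []
    return groups


DX2_EXCERPT = """\
/awips/fxa/bin/RadarMsgHandler
/awips/fxa/bin/DataController COMMS_ROUTER RadarController.config
  /awips/fxa/bin/RadarStorage
  /awips/fxa/bin/HandleGenericMsg
/awips/fxa/bin/ORPGCommsMgr KFTG
"""

if __name__ == "__main__":
    for parent, children in group_children(DX2_EXCERPT).items():
        print(parent, "->", children)
```

The keys of the result are the items the stop/start scripts handle directly; the values are what each DataController is expected to spawn.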

Other persistent items started by cron:

on ds1:
/awips/fxa/bin/ingProcMon.pl -c DS
/awips/fxa/bin/ctrlCpu
on dx1:
/awips/fxa/bin/purgeProcess
/awips/fxa/bin/ingProcMon.pl -c DX1
/awips/fxa/bin/ctrlCpu
on dx2:
/awips/fxa/bin/ingProcMon.pl -c DX2
/awips/fxa/bin/ctrlCpu
on dx3:
/awips/fxa/bin/ingProcMon.pl -c DX3
/awips/fxa/bin/ctrlCpu
on dx4:
/awips/fxa/bin/ingProcMon.pl -c DX4
/awips/fxa/bin/ctrlCpu
on px1:
/awips/fxa/bin/ingProcMon.pl -c PX1
/awips/fxa/bin/processSummary.pl
/awips/fxa/htdocs/ldadMon/bin/MakeSUMMpage
/awips/fxa/htdocs/ldadMon/bin/MakePROCpage
/awips/fxa/bin/ctrlCpu
on px2:
/awips/fxa/bin/ctrlCpu

Also started separately:

on dx1:
/awips/fxa/bin/notificationServer

Rarely, the GRIB decoder will hang on bad grids (can't remember the last time it happened). You'll see this by the GribDecoder process using lots of CPU time for extended periods, and a check on the log will show nothing happening. Issue kill -10 <pid> to force a crash. The signal handler will remove the bad grid and the controller will start a new decoder.

If radar is not auto-updating, you'll probably need to restart the notificationServer. When you use stopNotificationServer to kill the server, it may take some time to update its client list, which is found in $FXA_DATA/workFiles/notificationServerClientListState.txt. It will do a kill -9 after 20 seconds, if necessary. Use startNotificationServer to get it running again. The textNotificationServer has a similar feature; its client list is in $FXA_DATA/workFiles/textNotificationServerClientList.txt.

SBN ingest

The bulk of our datasets are received over the Satellite Broadcast Network (SBN) via the SBN communications processors. Please note that cpsbn1 and cpsbn2 are monitored by the AWIPS Network Control Facility (NCF), which is also responsible for their maintenance. There is a switch box near cpsbn1 that must usually be in `Modem' position, so NCF operators can check on its operation.

If SBN data (satellite, METARs, text, grids) are not arriving, check the CP operation, to see if it's hung. ssh cpsbn1 for TG data, or cpsbn2 for NESDIS, as user root. (Note: if you need to log in at the console, you'll need to move the CP switch to the Monitor position.) Type inmon to run the ingest monitor and outmon to run the dissemination monitor. In the former, all lines in the lower section will show, e.g., cpsbn1-bou, since the data are coming from the CP and being stored locally on disk. In the latter, you'll see connections to dx3f-bou on each CP. (If both CPs have stopped at the same time, it's likely that there's an uplink problem at the NCF, or there could be a downlink problem. Check with NCF (301-713-9344) before restarting.) Other problem indicators are lots of buffers or distribution headers in use. If times are not up to date in the xfr column, you can restart using stop_cpsbn_all and start_cpsbn_all. A lot of text will scroll by as the software starts up. Monitor the system again with inmon and outmon; you should see the TG line connect within a few seconds, though the NESDIS line may take several minutes. Log out (exit) (and switch back to Modem if at the console). Child acqservers will go down when you stop the CP, then will come back as data are sent.

If this doesn't work, have the forecasters check the Sync and Signal green lights on the demod. If these are out, have them contact the NCF for information. (This is unlikely, as the NCF monitors that portion of the system.) If the signal looks good, but you can't connect, you may need to reboot the CP. Log in and enter /etc/reboot. Ingest processes start automatically. (If you can't log in, you can press the reset button that's just above the CP's power switch at lower right. The system will reboot itself. Using the reboot command is preferred.)

If necessary, either CP can be configured to send both data streams to the server. Call the NCF and tell them which of the CPs has failed. They will perform the failover. [The split of data is set using config_dvb. Issue this without argument to see which channels are enabled on each CP. To enable a channel, use, e.g., config_dvb -a -c GOES; to disable one use config_dvb -r -c NMC3. Then restart the CP ingest.]

Data: As data are received, they are first written to circular buffers on the CP disks, in /data/co/<DDIR>/d<FILE>/[nn]/<FILE>.nnnn, where DDIR and FILE are from STORE lines of /awips/data/acq_send_parms.sbn, nn is 00..mm, and nnnn is 0000..9999 (or 000..999 or less, in accordance with the MAX_PER_DIR and MAX_FILE settings in the STORE lines). The number of directories used, mm, is determined by dividing MAX_FILE by MAX_PER_DIR. If all of the files will fit in one directory, the nn part of the directory path is not needed.

Next, the files are copied to distribution staging areas in /data/co/<DDIR>/l<FILE>_g<GROUP>_h<HOST>/nn, where DDIR, FILE, and nn are as above, and GROUP and HOST are from SEND lines of acq_send_parms.sbn. Example:

SEND[13]="STORE_ID=13 LINK_ID=2 LABEL=nmc2_misc      GROUP=2 HOST=1"

STORE[13]="ID=13 WMO=*                  PROD_TYP=NWSTG    SBN_CHAN=NMC2 \
          DDIR=NMC2   FILE=nmc2_misc      MAX_FILE=100000  MAX_PER_DIR=10000"

These entries result in datasets being staged in /data/co/NMC2/lnmc2_misc_g2_h1/nn, where nn runs 00..09. From here, an acq_send process sends the data to a corresponding acqServer on the receiving host, and removes the files from the l... directories. The CP will queue files if the receiver is not up (up to the limits specified in the STORE line) and refresh them after a connection is established.
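
The arithmetic behind that example can be checked with a small Python sketch. It is an illustration of the rules described above, not part of the CP software: the function names are invented, rounding up when MAX_FILE is not an exact multiple of MAX_PER_DIR is our assumption, and dropping the nn level for a single directory is carried over from the circular-buffer description.

```python
def parse_entry(value):
    """Turn the quoted 'KEY=VAL KEY=VAL ...' part of a STORE or SEND line into a dict."""
    return dict(token.split("=", 1) for token in value.split())


def staging_dirs(store, send):
    """List the distribution staging directories for one STORE/SEND pair."""
    max_file = int(store["MAX_FILE"])
    per_dir = int(store["MAX_PER_DIR"])
    n_dirs = -(-max_file // per_dir)  # ceiling division (rounding up is an assumption)
    base = f"/data/co/{store['DDIR']}/l{store['FILE']}_g{send['GROUP']}_h{send['HOST']}"
    if n_dirs <= 1:
        return [base]  # everything fits in one directory, so no nn level
    return [f"{base}/{i:02d}" for i in range(n_dirs)]


if __name__ == "__main__":
    send = parse_entry("STORE_ID=13 LINK_ID=2 LABEL=nmc2_misc GROUP=2 HOST=1")
    store = parse_entry(
        "ID=13 WMO=* PROD_TYP=NWSTG SBN_CHAN=NMC2 "
        "DDIR=NMC2 FILE=nmc2_misc MAX_FILE=100000 MAX_PER_DIR=10000"
    )
    for path in staging_dirs(store, send):
        print(path)
```

With the STORE[13]/SEND[13] values shown, 100000 / 10000 gives ten directories, 00 through 09.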

Logs: The CPs write their logs locally to /awips/logs/, /data/co/logs/Products/cpsbni-bou/sbn_procm/mcProduct.log, .../acq_clntm_hn/mcProduct.log, and .../acq_sendm.ho.gp_m5/mcProduct.log, where m is the link number that you see in an acq_stats display (defined in acq_send_parms.sbn), n is a subchannel number that's seen in the inmon acq_stats display, o is the host number (defined in acq_send_parms.sbn), and p is the group ID, again from acq_send_parms.sbn. The system breaks these logs when they hit 1MB size, putting older logs in ARCHIVE/mmmdd/mcProduct* files; these can be useful in diagnosing missing data. If a cp gets overloaded, it's logged in the mcProduct logs, tagged STATUS LOST n products. We saw this periodically on cpsbn1-fsld when it was serving four hosts.

An ingest note: Twice on cpsbn1-fsld, we've seen IUAX02 (MDCRS) files fail to get to our dxs. Investigation showed numerous short files in /data/co/NMC/lbufr_g5_h0/00/tmp0h0g5m5 (e.g.) with content Key file not accessible! It seems that this occurred after reboot, and the fix (thanks to Maureen Tankersley) is to make a link /.key -> /root/.key. Apparently, upon reboot, $HOME is not defined as /root, so the system looks in / for this file. (I suppose with the link in place, the problem won't happen any more, but, just in case...) And oh, by the way, /usr/local/bin/cruft is the actual decryption program. Make sure that's in place; it can be copied from another CP if necessary.

Use of WMO headers: In acq_wmo_parms.sbn, we exclude certain datasets (chiefly, AK/HI/PR grids and satellite images). The "codes" used with those are not at all obvious. Here is a list provided 3/00 by Leroy Klet.

/* Table of T1 letters to codes */
 
T1=A    PC=13   ASCII Analysis
T1=B    PC=19   ASCII Admin Msg
T1=C    PC=14   ASCII Climatic
T1=D    PC=44   GRID
T1=E    PC=51   Satellite Imagery
T1=F    PC=15   ASCII Forecast
T1=G    PC=45   GRID
T1=H    PC=46   GRID
T1=I    PC=31   BUFR Obs
T1=J    PC=32   BUFR Forecast
T1=K    PC=71   Unused
T1=L    PC=72   Unused
T1=M    PC=73   Unused
T1=N    PC=16   ASCII Notices
T1=O    PC=43   Grid
T1=P    PC=10   Graphic
T1=Q    PC=11   Graphic
T1=R    PC=74   Unused
T1=S    PC=17   ASCII Surface
T1=T    PC=52   Satellite Imagery (same as GOES) 
T1=U    PC=61   Upper Air
T1=V    PC=62   National Data
T1=W    PC=18   ASCII Warnings
T1=X    PC=47   GRID
T1=Y    PC=41   GRID
T1=Z    PC=42   GRID
      
This definition is in the AWIPS baseline (NCF Comms workset) under 
.../src/co/include/cp_product_code.h
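
For quick lookups, the table above translates directly into a Python dictionary. This is just a transcription for convenience (the function name is ours); the authoritative definition remains cp_product_code.h.

```python
# T1 letter -> (product code, description), transcribed from the table above.
T1_CODES = {
    "A": (13, "ASCII Analysis"),   "B": (19, "ASCII Admin Msg"),
    "C": (14, "ASCII Climatic"),   "D": (44, "GRID"),
    "E": (51, "Satellite Imagery"), "F": (15, "ASCII Forecast"),
    "G": (45, "GRID"),             "H": (46, "GRID"),
    "I": (31, "BUFR Obs"),         "J": (32, "BUFR Forecast"),
    "K": (71, "Unused"),           "L": (72, "Unused"),
    "M": (73, "Unused"),           "N": (16, "ASCII Notices"),
    "O": (43, "Grid"),             "P": (10, "Graphic"),
    "Q": (11, "Graphic"),          "R": (74, "Unused"),
    "S": (17, "ASCII Surface"),
    "T": (52, "Satellite Imagery (same as GOES)"),
    "U": (61, "Upper Air"),        "V": (62, "National Data"),
    "W": (18, "ASCII Warnings"),   "X": (47, "GRID"),
    "Y": (41, "GRID"),             "Z": (42, "GRID"),
}


def product_code(wmo_header):
    """Return (PC, description) for the T1 (first) letter of a WMO header, or None."""
    header = wmo_header.strip()
    if not header:
        return None
    return T1_CODES.get(header[0].upper())


if __name__ == "__main__":
    print(product_code("TIGE01"))  # a satellite image ID from the list in this document
```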

Satellite data file IDs

raw satellite name    channel
4-sat composite (grid201)
TICF01 vis
TICF03 iwv
TICF04 i11
Alaska
TIGA06 i12 (13μ from GOES E during eclipse)
TIDB17 SSM/I TPW
TIDB29 SSM/I rain rate
TITB17 AMSU TPW
TITB29 AMSU rain rate
TITB61 POES vis
TITB63 POES 3.74μ
TITB64 POES 11μ
eastCONUS
TIGE01 vis
TIGE02 i11
TIGE03 i12 (12μ from GOES W during eclipse)
TIGE04 i39
TIGE05 iwv
TIGE06 i12 (13μ)
TITE61 POES vis
TITE63 POES 3.74μ
TITE64 POES 11μ
northern hemi
TIGF01 vis
TIGF02 i11
TIGF03 i12
TIGF04 i39
TIGF05 iwv
Pacific
TIGI43 i14
TIGI48 i11
TIGI50 wv74
TIGI51 wv70
TIGI52 wv65
TIGI55 i45
TIGI57 i40
TIGI59 vis
Hawaii
TIDI17 SSM/I TPW
TIDI29 SSM/I rain rate
TITI17 AMSU TPW
TITI29 AMSU rain rate
TITI61 POES vis
TITI63 POES 3.74μ
TITI64 POES 11μ
superNational
TIGN01 vis
TIGN02 i11
TIGN03 i12
TIGN04 i39
TIGN05 iwv
TIGN16 sli (LI)
TIGN17 spw (precip H2O)
TIGN18 sst (skin temp)
TIGN27 scp (cloud top pressure)
TIDN17 SSM/I TPW
TIDN29 SSM/I rain rate
TITN17 AMSU TPW
TITN29 AMSU rain rate
Puerto Rico
TIGP01 vis (Mercator - regional)
TIGP02 i11
TIGP04 i39
TIGP05 iwv
TIGP06 i12 (13μ)
TIGQ01 vis (polar stereo - national("prBig"))
TIGQ02 i11
TIGQ05 iwv
TIDQ17 SSM/I TPW
TIDQ29 SSM/I rain rate
TITQ17 AMSU TPW
TITQ29 AMSU rain rate
TITQ61 POES vis
TITQ63 POES 3.74μ
TITQ64 POES 11μ
Atlantic
TIGQ43 i14
TIGQ48 i11
TIGQ50 wv74
TIGQ51 wv70
TIGQ52 wv65
TIGQ55 i45
TIGQ57 i40
TIGQ59 vis
westCONUS
TIGW01 vis
TIGW02 i11
TIGW03 i12
TIGW04 i39
TIGW05 iwv
TIGW06 i12 (13μ from GOES E during eclipse)
TITW61 POES vis
TITW63 POES 3.74μ
TITW64 POES 11μ
westCONUS sounder images
TIGW43 i14
TIGW48 i11
TIGW50 wv74
TIGW51 wv70
TIGW52 wv65
TIGW55 i45
TIGW57 i40
TIGW59 vis

More information on satellite sectors, including mapping and geographic coverage, is available on the NOAAPORT User's Page and in the AWIPS-NESDIS ICD (PDF).

Radar ingest

Radar products come from the ORPG box, via the ORPGCommsMgr process. (For fsld and alps, the data flow is wideband raw data via LDM from GSD's Central Facility to cx-alps, where products are generated then sent to dx2. fslc connects to BOU's ORPG over the WAN. Note: After ORPG 9 was installed at BOU, our connection caused them problems. Evidently, ORPG 9 cares that there are two connections from the same PUP. (Since we run fslc with a BOU localization, it was using the BOU setting.) We're now using the FSL number in a customFiles version of pupId.txt, and all is copasetic.)

Files are stored temporarily in $FXA_DATA/radar/raw and /text. Files in /raw are moved by RadarStorage to the appropriate product directories, e.g., /kftg/Z/ or V/. The text/ files are processed by the RadarTextDecoder process; output goes to the text database (e.g., WSRVWPFTG). We have had problems occasionally on fsld/alps where each local product caused a 3s timeout while trying to connect to MHS. We set MHS_SERVER to localhost in ipc.config to avoid that.

A comms status file is maintained in /data/fxa/workFiles/wfoApi.StateInfo (so named for historical reasons). Every time a connection is received from an ORPGCommsMgr process, information about the radar and the process is recorded. The entries include radar ID and name, max number of products, ORPGCommsMgr 'target string,' current VCP, operations mode, scan interval, connection state (1=connected), and 'firstGsm' (1 when started, then changes to 0 after the first GSM product is received). (This information can be gleaned from ipc/radar/RadarServerClient.C.)

The ORPGCommsMgr not only receives data from the ORPG, but is also responsible for sending data out on the WAN. Whenever ORPGCommsMgr is stopped (or if the line goes down), the connection state tag is changed from 1 to 0 in wfoApi.StateInfo. As long as this is set to 1, then RadarStorage will not store SBN-received products for the radar in question. (Though I've not seen it, I'm told that wfoApi.StateInfo can have multiple lines.)

On dx2, a cron job runs $FXA_HOME/bin/restartRadar about every four minutes, checking whether the ingest (ORPGCommsMgr) is up and starting it if necessary. Data come from host rpg-kftg. (restartRadar gets the port number out of ~fxa/data/orpgDedicated.txt, then executes ORPGCommsMgr.)

Radar ingest processes also include the RadarServer and the DataController/RadarStorage pair. The former communicates via the ORPGCommsMgr process with the ORPG, while the latter are responsible for storing radar products as they are received.

General Status Messages (GSMs) can be used to check on the status of the 88D. This can be checked from the workstation `radar status' window or the Unit Status Message graphic, the last entry in the top section of kftg>Graphics> menu. You can also tail -30 $FXA_DATA/workFiles/RADAR_Announcer to see what's what.

Please note that the RadarServer process must be running in order to send the RPS list and get data. The radar ingest (ORPGCommsMgr) will start but will not stay up if the RadarServer is down. RadarServer is started as part of startIngest.

If no data are reaching fsld/alps, check the ingest on cx-alps. Log on and become user fxa.

First, check to see if LDM is running: proc ldm. You should see a couple of rpc.ldmd processes, a pqact, and a read_ldm. Check to see if data are coming by using /usr/local/ldm/bin/ldmadmin watch. You should see CRAFT files coming in every few seconds. Check /scratch/data/ldm/nexradII/KFTG; read_ldm should be updating a yyyymmddhhmmss.raw file, again every few seconds.

If LDM is not running, start it with /usr/local/ldm/bin/ldmadmin start. To get radar data, ldmd.conf includes a line like

request CRAFT   ^L2.*KFTG   137.75.129.113

NEXRAD2 works in place of CRAFT. For multiple radars, use (KFTG|KPUX) form.

As for ORPG, proc fxa should show a host of processes with names like rpgdbm -v and swp -v.

If you see Connection refused messages in the ORPGCommsMgr log, you'll need to restart the mrpg software on cx-alps. As user fxa, cd ~/orpg_build9 and source .cshrc. Type site KFTG to restart the mrpg suite. You can run HCI (see below) to watch what's happening.

Among other things, 'site' stops and starts the mrpg suite. If you just want to stop it, use:

  1. mrpg shutdown
  2. mrpg cleanup

ORPG configuration notes

From the ORPG home directory on cx-alps, cd cfg. comms_link.conf includes settings for 1..6 TCP links, and tcp.conf relates these to ports 4489..4494. We arbitrarily have decided to use 4490 (2) on fslc, 4492 (4) on fsld, and 4493 (5) on alps. The line numbers and connections can be checked in the HCI Comms display. (See below.)
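
The link/port pairing is easy to get wrong when editing orpgDedicated.txt, so here is a tiny Python sketch of it. It assumes the six links map to ports 4489..4494 in order, which is consistent with the three assignments quoted above; the names are ours.

```python
TCP_BASE_PORT = 4489   # port for link 1, from the 4489..4494 range above
LINKS = range(1, 7)    # comms_link.conf defines TCP links 1..6

# Link numbers we have chosen for each system (from the text above).
HOST_LINKS = {"fslc": 2, "fsld": 4, "alps": 5}


def port_for_link(link):
    """Return the TCP port for an ORPG link number (assumes in-order mapping)."""
    if link not in LINKS:
        raise ValueError(f"link must be 1..6, got {link}")
    return TCP_BASE_PORT + link - 1


if __name__ == "__main__":
    for host, link in HOST_LINKS.items():
        print(host, "link", link, "port", port_for_link(link))
```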

Access to data from a receiving host (dx2-xxx) is configured in ~fxa/data/orpgDedicated.txt. Using the TCP port and link numbers noted above, enter the IP address of the host (cx-alps or whatever), and the appropriate radar name and ID, which can be found in nationalData/radarInfoMaster.txt.

To check on the connections from cx-alps, use netstat -a|grep 44 to see the status of these ports. If ORPGCommsMgr is not running, you should see a LISTEN entry for the port. If it is running, you'll see two ESTABLISHED entries. A WAIT notation is not a good sign. Experience suggests you need to stop the ORPGCommsMgr and wait for the port to disappear (back to just LISTEN), then restart.
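
The rule of thumb in the previous paragraph can be written down as a small Python function. This is a sketch of our reading of the netstat states, not a monitoring tool; it takes the TCP state strings seen for one port (the last column of each netstat line) and the labels it returns are our own.

```python
def classify_port(states):
    """Classify one ORPG port from the TCP states shown by netstat -a."""
    if any("WAIT" in state for state in states):
        # e.g. TIME_WAIT or CLOSE_WAIT: stop ORPGCommsMgr, wait until only
        # LISTEN remains, then restart.
        return "stuck"
    if sum(state == "ESTABLISHED" for state in states) >= 2:
        return "connected"   # ORPGCommsMgr is up and talking to the ORPG
    if "LISTEN" in states:
        return "idle"        # port is open but ORPGCommsMgr is not connected
    return "unknown"


if __name__ == "__main__":
    print(classify_port(["LISTEN"]))
    print(classify_port(["LISTEN", "ESTABLISHED", "ESTABLISHED"]))
    print(classify_port(["LISTEN", "CLOSE_WAIT"]))
```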

cfg/product_generation_tables has the list of products and parameters. You'll need to restart mrpg after this table has been edited. cfg/site_info.dea also needs to be set up for the radar to be ingested.

Logs are in $ORPGDIR/logs, but they must be read with lelb_mon <name> (leave off the .log).

ORPG/LDM configuration notes

As noted above, data are delivered via LDM, which runs under user fxa. This includes the standard set of LDM processes. pqact directs the data files to /scratch/data/ldm/nexradII/<radar>/, where they are picked up by the read_ldm process started by the site script. To start LDM, cd /usr/local/ldm/bin, then
ldmadmin start

We used to use a n2bz/ldml2server/ldml2client process set to handle the LDM data. These processes were written by Warren Blanchard, who has retired. Although the suite still works, we have moved to the ROC method, which uses read_ldm (incorporated in the 'site' scripts). To support this, /usr/local/ldm/etc/pqact.conf is simple, including

NEXRAD2 ^L2-([^/]*)/(....)/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9][0-5][0-9][0-9][0-9])     FILE    /scratch/orpg_build8/ldm/nexradII/\2/\3.raw

(Don't make the mistake Joe made in Feb/Mar 06, where during debugging of the montana ingest he enabled both the NEXRAD2 and CRAFT lines in pqact.conf. Both get exercised, resulting in double storage of the data, which does not play well downstream.)
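
To see what that pqact.conf pattern captures, it can be tried out in Python. Treat this as a sketch: pqact uses POSIX extended regular expressions rather than Python's re module (the two agree for a pattern this simple), and the product ID in the example is invented for illustration, not copied from a real feed.

```python
import re

# The pattern from the pqact.conf line above, split only for readability.
PATTERN = re.compile(
    r"^L2-([^/]*)/(....)/"
    r"([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9][0-5][0-9][0-9][0-9])"
)
# The FILE target from the same line; \2 is the radar ID, \3 the timestamp.
TEMPLATE = r"/scratch/orpg_build8/ldm/nexradII/\2/\3.raw"


def target_path(product_id):
    """Return the file pqact would write for this product ID, or None if no match."""
    match = PATTERN.search(product_id)
    return match.expand(TEMPLATE) if match else None


if __name__ == "__main__":
    # Hypothetical product ID, shaped to fit the pattern.
    print(target_path("L2-BZIP2/KFTG/20060301123456/437/11"))
```

Since every product of a volume carries the same timestamp, all of them land in one yyyymmddhhmmss.raw file per radar, which is the file read_ldm follows.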

read_ldm puts the data in resp.0, whence they're read by mrpg. read_ldm writes a standard ORPG log file, which you can read with lelb_mon read_ldm. It typically logs 300 more "messages written to LB" every few seconds. At the end of each scan, it does final processing on and removes the file from /scratch... (An advantage of read_ldm is that it will read the current volume's file when started. Thus, we don't have to wait for the beginning of a volume to sync up and get data, unlike ldml2server/client.)

Note that 'site' starts up mrpg with -p. Among other things, this creates a new resp.0, thus requiring a restart of read_ldm. Conveniently, this is built into the script.

ORPG Human-Computer Interface (HCI)

Refer to the EPSS support manual (current version, ORPG 7) for information on HCI. We can run HCI by following these steps:

  1. On cx-alps, become user fxa and set the ORPG environment as above.
  2. Newer X configurations probably won't allow you to export the hci output to your desktop. I've been using xt1-avs for this purpose, with xhost + cx-alps and then setenv DISPLAY xt1-avs:0
  3. hci
  4. Try Status, Products -> Products in DB, Base Data Display.

Snowfall accumulation is reset using HCI. See the EPSS how-to for information. (Note: If you get a "password data not available" message, run hci_init_config as user v1.14.)

Here's one of those 'may not happen again' items. Perhaps due to work done at kftg, we lost connection from montana to dx2-fsld/alps. Although the hci display showed the line connected, there was in fact none. A restart using bin/site.montana was successful. After the fact, it was apparent that the Comms display indicated a problem - Delay showed a high percentage and Rate was on the order of 700k, vice the normal 2200k or so.

We have installed ORPG 10 on cx-alps and ORPG 11 on dx5-alps. We currently are using cx-alps for fsld, alps, a2dp, and RSA radar, but plan to move the first three to dx5-alps. The code and installation info are accessible from the NWS WSR-88D CODE Web site.

To get the mrpg suite going on cx-alps,

  1. sudo su - fxa
  2. cd orpg_build10
  3. source .cshrc
  4. site KFTG

On dx5-alps:

  1. sudo su - fxa
  2. cd orpg_build11
  3. source .cshrc
  4. site KFTG

To bring in data:

On cx-alps or dx5-alps:

  1. as user fxa, cd /usr/local/ldm
  2. bin/ldmadmin start

Again, you can run bin/lnux_x86/hci to see what's happening.

To configure for another radar, only a couple of changes are necessary.

  1. Make a cfg/site_info.dea.kxxx. Edit the lat/lon/elev/ID using information from nationalData/radarInfoMaster.txt. (Make sure to use only .001 precision on the lat/lon - mrpg won't start if you go to .0001, as in radarInfoMaster.)
  2. Link that file to cfg/site_info.dea.
  3. Change /misc/linux/apps-OB6/ldm-6.0.14/etc/ldmd.conf to request the radar data of interest, then restart LDM.
  4. Restart the mrpg suite (bin/site.dylan.kxxx).

For RSA...

We run the ORPG (Build 10) software on cx-alps, under user fxa. The data feed is via LDM from a remote radar (currently KPTR) that we disguise as KVBX. [No longer true as of 5/09. We were using dx5-alps for kftg data and cx-alps for RSA, as described, but dx5-alps died and we moved kftg onto cx-alps, so RSA doesn't have an ingest at present - except that the kftg level II data are fed into the kvbx wideband stream on dx1-avs.]

To get things set up and going...

As user ldm,

As user fxa,

  1. cd /data/orpg
  2. source .cshrc
  3. site.rsa.ax-avs KMLB

Look at bin/site.rsa.ax-avs. There are two lines near the bottom, to wit:

 read_ldm -a -d $ORPG_HOME/ldm/nexradII/KMLB $ORPGDIR/ingest/resp.0 &
 #orpg_client -v $ORPGDIR/ingest/resp.0 &

In the configuration shown, we're using LDM. For direct ingest (such as at the Ranges), one would swap the commented-out line and issue command "site.rsa.ax-avs KMLB".

To enable the LDM ingest, as user fxa, make sure this line in /usr/local/ldm/etc/pqact.conf is active:

NEXRAD2   ^L2-([^/]*)/(....)/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9][0-5][0-9][0-9][0-9])     FILE    /data/orpg/ldm/nexradII/\2/\3.raw

Also, make sure that /data/orpg/ldm/nexradII/KMLB.lbz has 777 permission (and its parent 775). (fxa is the owner.)

The CRAFT data are compressed Level II files, which ITS brings in from CONDUIT at U. Maryland. The pqact line above writes them to disk, and read_ldm reads and decompresses the data, writing output to resp.0, which is where the mrpg suite looks for its data. (Note that orpg_client puts its data in the same place, as you would expect.)

A few words on RPS lists

When the radar changes VCPs, the RPS list sent is based on /data/fxa/radar/lists/KFTG.[storm|clear-air].VCPxx. This information is merged with the appropriate national list, and the resulting list is stored in /data/fxa/radar/lists/KFTG.current. A user can edit this list, or create a list from scratch, and save it in /data/fxa/rps-lists. KXXX.current is recalled and sent out whenever a "Connection Up" message is received from the radar, or whenever a GSM comes in. If the mode, as specified in the GSM, has changed, the appropriate RPS list gets sent out and is saved in KXXX.current.

The national RPS lists, those containing required products and whose contents are merged with user requests (in RadarServer, module WanRpsManager.C), are found in $FXA_NATL_CONFIG_DATA/nationalData/. There are six of these, for clear air and storm modes, and for X.25 (wfoApi) and TCP (ORPGCommsMgr - LAN) connections, and associated radars. Names are rps-RPGOP.clear-air, rps-RPGOP.storm, rps-RPGOP-tcp.clear-air, rps-RPGOP-tcp.storm, rps-assoc.clear-air, and rps-assoc.storm. The choice of which national list to use is a bit arcane. The essential source of information is portInfo.txt, which includes a max number of products value. If no LLL-portInfo.txt is supplied (most sites have one), this defaults to 65 via localization. If the max prods is greater than 50 (a value set in Radar.H, applied in RadarStatus.C, and used in WanRpsManager::getList), then the -tcp version of the national list is used. Otherwise, the standard version is used.
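
Since the choice is "a bit arcane," here is the decision as we understand it, written as a Python sketch. It is a paraphrase of the description above, not the WanRpsManager code; the function name is ours, and the rps-assoc lists for associated radars are left out because the text doesn't say how they are selected.

```python
TCP_THRESHOLD = 50       # per the text, set in Radar.H
DEFAULT_MAX_PRODS = 65   # localization default when no LLL-portInfo.txt is supplied


def national_rps_list(mode, max_prods=None):
    """Pick the national RPS list name for a dedicated radar.

    mode is "clear-air" or "storm"; max_prods is the max number of
    products value from portInfo.txt (None if the file is absent).
    """
    if mode not in ("clear-air", "storm"):
        raise ValueError(f"unknown mode: {mode}")
    if max_prods is None:
        max_prods = DEFAULT_MAX_PRODS
    suffix = "-tcp" if max_prods > TCP_THRESHOLD else ""
    return f"rps-RPGOP{suffix}.{mode}"


if __name__ == "__main__":
    print(national_rps_list("storm"))          # no portInfo.txt: default of 65 applies
    print(national_rps_list("clear-air", 50))  # exactly 50 is not "greater than 50"
```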

SBN Radar

Data for all radars are available on the SBN. Products are stored for the RPGs listed in dx2's localizationDataSets/xxx/radarsInUse.txt. There are 21 products sent on the SBN: CZ, STP, SRM 0.5, VIL, V 0.5, Z 0.5 res 1, DHR, DSP (SDUS5); DPA (SDUS8); GSM (NXUS6); SRM 1.5, SRM 2.4, Z 1.5, Z 2.4 (SDUS2); OHP, SRM 3.4, VWP, Z 3.4 (SDUS3); ET, V 1.5, Z 0.5 res 2 (SDUS7) [see the NWS 88D list and 88D/TDWR list [PDF] for more info]. DHR and DSP were added site by site with the installation of AWIPS OB8.2.

LDM ingest

While most datasets used by D2D are delivered by the SBN or LDAD or generated locally, NOWrad data are relayed from GSD to ls1-bou via Unidata's Local Data Manager (LDM). LDM processes include
rpc.ldmd -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ld
  (1 identically-named child)
  pqact
  pqexpire

LDM files are in /usr/local/ldm. The admin process is bin/ldmadmin.in. LDM ingest is managed with ldmadmin.in start and ldmadmin.in stop. The control file is etc/ldmd.conf. By default, no log is written; in order to do so, a line like this must be in /etc/syslog.conf:

local0.debug            /usr/local/ldm/logs/ldmd.log
Also, for logging and 'ldmadmin newlog' to work, /var/run/syslogd.pid must be world readable and /usr/local/ldm/bin/hupsyslog must have the setuid bit set (chmod u+s):
---s--x--x   1 root   6100    20534 Jan 16  1997 /usr/local/ldm/bin/hupsyslog*
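
The two permission requirements above can be checked from the mode bits. Here is a minimal Python sketch; the function names are hypothetical, and the modes would come from something like os.stat(path).st_mode on the two files.

```python
import stat


def is_world_readable(mode):
    """True if the 'other' read bit is set in a st_mode value."""
    return bool(mode & stat.S_IROTH)


def has_setuid(mode):
    """True if the set-user-ID bit is set in a st_mode value."""
    return bool(mode & stat.S_ISUID)


def ldm_logging_problems(pid_mode, hupsyslog_mode):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    if not is_world_readable(pid_mode):
        problems.append("/var/run/syslogd.pid is not world readable")
    if not has_setuid(hupsyslog_mode):
        problems.append("hupsyslog lacks the setuid bit (chmod u+s)")
    return problems
```

The listing shown above (---s--x--x) corresponds to mode 4111, which passes the setuid check.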

If you can't get LDM working, check to see if syslogd is running. If it's not, as root, run /usr/sbin/syslogd -D, then try starting LDM again. Another likely possibility is a bad pattern in a request line. Unfortunately, I don't know the rules. From experience, I can tell you that "*Graphic.*" is not a winner. I've made other mistakes, but don't recall what. A less likely possibility is an open socket. With LDM shut down, enter rpcinfo -p. If you see one or more lines beginning 300029 at the bottom, type sudo rpcinfo -d 300029 5 to remove this open socket, then startLdm again.

Another possibility is that portmap isn't running. You'll know this if rpcinfo returns an error message. Run sudo /usr/sbin/portmap, and you'll probably be in business.
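
On the bad-pattern possibility: LDM request patterns are regular expressions, so one plausible explanation for the "*Graphic.*" failure is that a leading "*" has nothing to repeat. The Python sketch below is a rough pre-check only. Python's re module is not the regex library LDM uses, so passing this check doesn't prove LDM will accept a pattern, and the function name is hypothetical.

```python
import re


def pattern_compiles(pattern):
    """Return True if Python's regex compiler accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True
```

With this check, "*Graphic.*" is rejected while ".*Graphic.*" is accepted.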

etc/pqact.conf includes this line to move the files to where they need to go:

FSL3    ^FSL\.CompressedNetCDF\.(.*)    FILE    -close  /data/Incoming/\1

Note that it is very important that the fields are tab-, not space-, delimited! This puts the files in /data/Incoming, whence LDAD will pull them over to dx1. Once there, an entry in LDADinfo.txt calls a simple script that moves the files to /data/fxa/nowrad/nowradZ.
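
Since a space-delimited entry is easy to create by accident, here is a small Python sketch of a sanity check for a pqact.conf line. It is a heuristic, not part of LDM: the function name is hypothetical, and the set of action keywords is my assumption of the common ones, to be checked against the installed LDM.

```python
# Assumed list of common pqact actions; verify against the local LDM version.
KNOWN_ACTIONS = {"FILE", "STDIOFILE", "DBFILE", "PIPE", "EXEC", "NOOP"}


def looks_tab_delimited(line):
    """True if feedtype, pattern, and action appear to be separated by tabs."""
    # Split on tabs only, dropping empty fields produced by repeated tabs.
    fields = [f for f in line.rstrip("\n").split("\t") if f]
    if len(fields) < 3:
        return False
    feedtype, action = fields[0], fields[2]
    # A space in the feedtype field, or a third field that is not an action,
    # suggests that spaces were used somewhere as separators.
    return " " not in feedtype and action in KNOWN_ACTIONS
```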

In order for the NOWrad files to be processed from there, a line in the dx1 fxa crontab must be uncommented. This is the first one under "Denver/Boulder-specific items" in ingest.crontab.dx1, below. The second line there should be uncommented, too.

GSD LDM info

The LDMs on ls1-fslc and dx1-fsld receive NOWrad data from prism.fsl.noaa.gov. The same dx1 cron items need to be enabled as mentioned above. Also on dx1-fsld, we pull in MADIS data from eldm. On both dx1-fsld and dx1-alps, we use LDM to get MODIS imagery from NWS CRH. (These are processed on px1 via a script controlled by SITEpx1cron.)

To support this, LDM needs to run on ls1-fslc, dx1-alps, and dx1-fsld. (Note that the new LDAD boxes on fslc are called ls2 and ls3, but are accessed by logging in to ls1.)

Request, by host:
  ls1-fslc: FSL3 (NOWrad) from prism.fsl.noaa.gov and MADIS files of various sorts from eldm.fsl.noaa.gov
  dx1-alps: EXP (MODIS) from ldm.crh.noaa.gov
  dx1-fsld: FSL3 (NOWrad) from prism.fsl.noaa.gov, MADIS files of various sorts from eldm.fsl.noaa.gov, and EXP (MODIS) from ldm.crh.noaa.gov
Send: ECMWF files to borg.fsl.noaa.gov

As outlined in the Radar section, we also run LDM on cx-alps, dx5-alps, and ax-avs to pull in local radar.

Now for a few words about Nowrad. We bring Nowrad files in from the Central Facility using LDM, as noted here, and they're processed on dx1. We had an occasion where our LDM was down for a while, but were able to recover the files:

  1. Copy files from /public/data/radar/fsl-conus/nowrad/netcdf/ to /data/fxa/nowrad/nowradZ.
  2. Compress them (compress *).
  3. Do a batch rename:
       foreach filen (07*)
       foreach? mv $filen ConusNowrad.${filen}
       foreach? end
  4. Run the conversion script that is in SITEdx1cron:
       (cd ${FXA_HOME}/xfer/nowrad; ./xferNowrad_v3.com ${FXA_HOME}/xfer/nowrad)

And by the way, all xferNowrad does is reformat the netCDF files received from /public into D2D-compatible form - it's a CDL change.
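
For anyone more comfortable outside csh, the batch rename in the recovery procedure above can be sketched in Python. This is an illustration under assumptions: the function name is hypothetical, and the "07*" pattern and "ConusNowrad." prefix are simply copied from the csh loop.

```python
from pathlib import Path


def add_nowrad_prefix(directory, pattern="07*", prefix="ConusNowrad."):
    """Rename files matching pattern so each name gains the given prefix.

    Returns the new file names in sorted order.
    """
    renamed = []
    for path in sorted(Path(directory).glob(pattern)):
        # Skip directories and anything that already carries the prefix.
        if path.is_file() and not path.name.startswith(prefix):
            target = path.with_name(prefix + path.name)
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

Run it against /data/fxa/nowrad/nowradZ after the compress step; files that don't match the pattern are left alone.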

Text ingest and database

The text database system is also managed separately from the general startIngest and stopIngest. Text products are stored in PostgreSQL databases.

The database runs under user postgres. A number of processes are normally running on dx1, which you can see in a proc post listing. The main process consumes the most CPU time, with the postgres: writer using a fair amount, as well.

The main thing to do for postgres is to check the log (/var/log/postgres). Database problems should be referred to NCF for resolution.

The workstation uses three processes to communicate with the text database, to wit:

  $FXA_HOME/bin/TextDB_Server -Write
  $FXA_HOME/bin/TextDB_Server -Read
  $FXA_HOME/bin/textdb
The first two of these are started and stopped by the startTextDB.dx1 and stopTextDB.dx1 scripts. Another script, stopTextNotification, will stop the textNotificationServer (it's started, if necessary, by startTextDB.px1). It's generally left running. textdb runs as needed to read/write the database.

Since it's not safe simply to kill the write server, as it may be in the middle of a transaction and the text database could get corrupted, stopTextDB issues a KILLSERV command to the text database to let it down gracefully.

If stopTextDB/startTextDB does not clear up text storage/retrieval problems, try restarting PostgreSQL.

  1. Shut down the Read and Write servers with stopTextDB.dx1.
  2. As root, issue /etc/init.d/postgresql stop, then
  3. /etc/init.d/postgresql start.
  4. Use /etc/init.d/postgresql status to check that the server is up.
  5. As fxa again, bring the Read and Write servers back up with startTextDB.dx1.

Note: When the database is down while the data ingest is running, text messages will queue up inside the TextDB DataController process. Once the database is back up and accepting messages, this queue will be processed. It may take a long time to catch up, however. (To see what's being processed, tail the CollDecoder or StdDecoder logs.) If it's necessary to empty the queue (due to excessive length), use the "CollDB, StdDB" section of startIngest.dx1 to restart the DataController (most easily done by using X to copy the lines out of ~fxa/bin/startIngest.dx1).

Text database maintenance

Periodically, it's a good idea to do some database cleanup. There are three vacuumdb runs daily, but those don't take care of all of the space cleanup. Wayne Martin recommends a monthly full cleanup, to wit:

  1. stopTextDB.dx1
  2. as user postgres, /usr/bin/vacuumdb -v --analyze --full -d fxatext
  3. as fxa, startTextDB.dx1

There are also some scripts in /home/awipsadm/scripts to check on and maintain the text database. One handy one is purge-by-time.sh, which you can use if you have some old stuff hanging around. Another is fxatext-delete.ksh to completely remove an obsolete NNN.

Hydro decoder & database

A SHEF decoder runs on ds1 as part of the hydrology package. /awips/hydroapps/shefdecode/bin/shefdecode runs under oper, and is started at boot time. If it is down, you must sudo su - oper, then /awips/hydroapps/shefdecode/bin/start_shefdecode &. Data are stored in an Informix database, separate from the text database. Other hydro cron jobs are run (under user oper) to manage the database, to wit:
  01 0,4,8,12,16,20 * * * /awips/hydroapps/whfs/standard/bin/CleanWFO
  27 7 * * * /awips/hydroapps/whfs/standard/bin/run_db_cleanup 
  37 7 * * * /awips/hydroapps/whfs/standard/bin/run_db_tuneup
  15 * * * * /awips/hydroapps/whfs/standard/bin/run_precip_accum
And as fxa
  3,8,13,18,23,28,33,38,43,48,53,58 * * * * csh -c
    '${FXA_HOME}/bin/moveProds.ksh /data/fxa/ispan/text/hydro/
    /awips/hydroapps/whfs/local/data/shef_input/'
Decoder logs are found in /awips/hydroapps/whfs/local/data/log/shef/decoder.

Interprocess communication

Messages are passed between processes using TCP sockets. The software runs essentially flawlessly and requires no maintenance.

Cron

Many fxa activities are managed by cron. For workstations, the currently-running cron lists are found in /var/spool/cron/crontabs/<username>. Servers are set up as heartbeat pairs - dx1/dx2, dx3/dx4, and px1/px2. Crons are managed by the heartbeat system. The crontabs are all owned by root, with all component items merged into one host-specific file, in /etc/ha.d/cron.d. When the package is activated, the appropriate crontab is placed in /etc/cron.d. Information on these files is shown below.

host   cron.d file   our tree file
dx1    dx1cron       ingest.crontab.dx1
dx2    dx2cron       ingest.crontab.dx2
dx3    dx3cron       ingest.crontab.dx3
dx4    dx4cron       ingest.crontab.dx4
px1    px1cron       ingest.crontab.px1
px2    px2cron       ingest.crontab.px2

Local crontab additions may be put in files SITE<host>cron, which are also kept in /etc/ha.d/cron.d and /etc/cron.d. For manual update, modify the file(s) in /etc/cron.d. Note that duplicate copies of both <host>cron and SITE<host>cron need to be kept on both hosts in a pair, in /etc/ha.d/cron.d so they'll be available during failover.
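
The naming scheme above is regular enough to spell out in code. The Python sketch below just assembles the file names and locations described in the text for a given host; the function name and dictionary keys are hypothetical.

```python
import posixpath

HA_CRON_DIR = "/etc/ha.d/cron.d"  # copies kept on both hosts of a pair
ACTIVE_CRON_DIR = "/etc/cron.d"   # where the active crontabs are placed


def cron_files(host):
    """Return the cron file names and locations the text associates with a host."""
    names = [host + "cron", "SITE" + host + "cron"]
    return {
        "tree_file": "ingest.crontab." + host,
        "failover_copies": [posixpath.join(HA_CRON_DIR, n) for n in names],
        "active_copies": [posixpath.join(ACTIVE_CRON_DIR, n) for n in names],
    }
```

For example, cron_files("dx1") lists dx1cron and SITEdx1cron under both directories, which is a handy checklist when verifying that both hosts in a pair carry the failover copies.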

The fxa lists are shown here:

(Can you say "outdated"?)

ingest.crontab.ds1

# Crontab file for starting transient data ingest processes.

# This file, ingest.crontab.ds1, contains the items that run on the primary
# data server. It is to be installed as
#  ds1:/etc/cmcluster/crons/fxa/ds1.dsswap
#  ds2:/etc/cmcluster/crons/fxa/ds2.dsswap
# under root ownership.

# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be include in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.

#-------------------------------------------------------------------------------

# Break ingest log and announcer files daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'
0 0 * * * csh -c '${FXA_HOME}/bin/breakAnnouncementFiles >&! ${LOG_DIR}/breakAnnouncementFiles.log'

# Run scour daily to clean up log files and a few items not hit by master.purge.
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'

# Radar ingest
2,6,11,14,18,22,26,31,35,39,43,47,51,56 * * * * csh -c '${FXA_HOME}/bin/restartRadar' > /dev/null 2>&1

# Process monitor/CPU monitor start-up
36 * * * * csh -c '${FXA_HOME}/bin/DS_startProcMon.sh'
37 * * * * /awips/fxa/bin/startCtrlCpu.sh

# Break ctrlCpu log daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogCtrlCpu >&! ${LOG_DIR}/breakLogCtrlCpu.log'

# Data Monitor Scripts
0,10,20,30,40,50 * * * * /awips/fxa/bin/diskUsage.pl -c /awips/fxa/data/disk.cfg -o diskUsage_data.html

# Data archiving and archive purging
55 * * * * csh -c '${FXA_HOME}/bin/legalArchiver.sh'

# Get the RUC model data for the tstorm decoder (MDL)
20,40 0,3,6,9,12,15,18,21 * * * csh -c '${FXA_HOME}/bin/getModelData >& ${LOG_DIR}/getModelData.log'

# Scheduled radar distribution
25,55 * * * * csh -c '${FXA_HOME}/bin/startRadarDist.pl RCM >& /dev/null'
17,34 * * * * csh -c '${FXA_HOME}/bin/startRadarDist.pl THP >& /dev/null'

# Scheduled radar requests to the RadarServer
#23,53 * * * * csh -c '${FXA_HOME}/bin/sendOTR.sh 74 >& /dev/null'
#15 * * * * csh -c '${FXA_HOME}/bin/sendOTR.sh 79 >& /dev/null'
#35 * * * * csh -c '${FXA_HOME}/bin/waitUpTo.pl 600 >& /dev/null' ; csh -c '${FXA_HOME}/bin/sendOTR.sh 136 >& /dev/null'
#5 0,8,16 * * * csh -c '${FXA_HOME}/bin/waitUpTo.pl 600 >& /dev/null' ; csh -c '${FXA_HOME}/bin/sendOTR.sh 152 >& /dev/null'
#1 0,6,12,18 * * * csh -c '${FXA_HOME}/bin/sendOTR.cfc.sh >& /dev/null'

# send radar precipitation bias table data to ORPG via the RadarServer
#26,46 * * * * csh -c '${FXA_HOME}/bin/sendEnvData.pl'

# Watch to make sure nwrTrans.pl has not died, and restart if has
* * * * * /awips/fxa/bin/nwrWatchDog.sh > /dev/null 2>&1

ingest.crontab.dx1

# Crontab file for starting dx1apps data ingest processes for fxa.
#
# MODIFICATION HISTORY:
# ---------------------------------------------------------------------------
#    NAME           DATE       CHANGES
#    M. Huang       05/26/05   - Moved NWWSKeepAliveMsg to DX (DR_16193)
#    M. Huang       05/27/05   - Moved mhs-data.purge into DX (DR_16194)
#-----------------------------------------------------------------------------

# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be include in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.

# INGEST SCRIPTS
# ACARS profiles
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startAcarsProfiles.sh >&! ${LOG_DIR}/acarsProfiles.log'

# Scheduled radar requests to the RadarServer 
# RCM
23,53 * * * * csh -c '${FXA_HOME}/bin/sendOTR.sh 74 >& /dev/null'
# THP
15 * * * * csh -c '${FXA_HOME}/bin/sendOTR.sh 79 >& /dev/null' 
# SO
35 * * * * csh -c '${FXA_HOME}/bin/waitUpTo.pl 600 >& /dev/null' ; csh -c '${FXA_HOME}/bin/sendOTR.sh 136 >& /dev/null' 
# RSS
5 0,8,16 * * * csh -c '${FXA_HOME}/bin/waitUpTo.pl 600 >& /dev/null' ; csh -c '${FXA_HOME}/bin/sendOTR.sh 152 >& /dev/null'
# CFC
1 0,6,12,18 * * * csh -c '${FXA_HOME}/bin/sendOTR.cfc.sh >& /dev/null'
# NWWSKeepAliveMsg - test uplink status
13,28,43,58 * * * * csh -c '${FXA_HOME}/bin/NWWSKeepAliveMsg >& ${LOG_DIR}/nwwsKeepAlive.log'
 
# send radar precipitation bias table data to ORPG via the RadarServer
26,46 * * * * csh -c '${FXA_HOME}/bin/sendEnvData.pl'

# MONITOR SCRIPTS
# Process Monitor start-up
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/DX_startProcMon.sh'

# Disk usage monitor
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/diskUsage.pl -c ${FXA_HOME}/data/disk.cfg -o ${FXA_HOME}/data/diskUsage_data.html'

# CtrlCpu (CPU monitor) start-up
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startCtrlCpu.sh'

# Purge MHS data  once per day.
20 1 * * * csh -c '${FXA_HOME}/bin/mhs-data.purge'

# CLEAN-UP ITEMS
# Run scour daily to clean up log files and a few items not hit by other purgers
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'

# Break ingest log files daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'

# Break ctrlCpu log daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogCtrlCpu >&! ${LOG_DIR}/breakLogCtrlCpu.log'

# Restart Grib2Decoder every week for memory leak workaround
41 0 * * 0 csh -c '${FXA_HOME}/bin/RestartGribSatDecoders.sh >& /data/logs/fxa/RestartGribSatDecoders.log'

ingest.crontab.dx2

# Crontab file for starting dx2apps data ingest processes for fxa.

# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be include in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.

# Break ingest log files daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'

# PURGER/SCOUR...
# Run scour daily to clean up log files and a few items not hit by master.purge.
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'

# MONITOR SCRIPTS
# Process Monitor start-up script
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/DX_startProcMon.sh'

# CtrlCpu Monitor start-up script (CPU monitor)
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startCtrlCpu.sh'

# Break ctrlCpu log daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogCtrlCpu >&! ${LOG_DIR}/breakLogCtrlCpu.log'

ingest.crontab.px1

# Crontab file for starting px1apps data ingest processes for fxa.

# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be include in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.

# Break ingest log and announcer files daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'

# PURGER/SCOUR...
# Keep purgeProcess running
*/10 * * * * csh -c '${FXA_HOME}/bin/startPurgeProcess >& /dev/null'

# Run the master purger twice hourly, to pare data back to necessary levels.
15,45 * * * * csh -c '${FXA_HOME}/bin/master.purge >&! ${LOG_DIR}/master.purge.log'
# Run the radar purger every hour
#30 * * * * csh -c '${FXA_HOME}/bin/fxa-radar.purge >&! ${LOG_DIR}/fxa-radar.purge'

# Run scour daily to clean up log files and a few items not hit by master.purge.
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'

# MONITOR SCRIPTS
# Process monitor/summary monitor/LDAD monitor/CPU monitor start-up
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/PX_startProcMon.sh >&! ${LOG_DIR}/procmon.log'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startProcSum.sh'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startLdadMon.sh'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startCtrlCpu.sh'

# Break ctrlCpu log daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogCtrlCpu >&! ${LOG_DIR}/breakLogCtrlCpu.log'

# Data Monitor scripts
4,14,24,34,44,54 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/grid.cfg  -o ${FXA_HOME}/data/grid_data.html -h "Grid Data"'
5,15,25,35,45,55 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/graphic.cfg  -o ${FXA_HOME}/data/graphic_data.html -h "Redbook Graphics Products"'
6,16,26,36,46,56 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/radar.cfg -o ${FXA_HOME}/data/radar_data.html -h "Radar Data"'
7,17,27,37,47,57 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/point.cfg -o ${FXA_HOME}/data/point_data.html -h "Point Data"'
8,18,28,38,48,58 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/sat.cfg -o ${FXA_HOME}/data/sat_data.html -h "Satellite Data"'
9,19,29,39,49,59 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/local.cfg -o ${FXA_HOME}/data/local_data.html -h "Local Data"'

# Disk Usage Monitor
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/diskUsage.pl -c ${FXA_HOME}/data/disk.cfg -o ${FXA_HOME}/data/diskUsage_data.html'

# Data Monitor summary page
3,13,23,33,43,53 * * * * csh -c '${FXA_HOME}/bin/monitorSummary.pl'

# Climate (MDL)
# morning climate
25 12 * * *  csh -c '/awips/adapt/climate/bin/Linux/climate.sh auto am>& /dev/null'
# evening climate
25 22 * * *  csh -c '/awips/adapt/climate/bin/Linux/climate.sh auto pm>& /dev/null'

# Run Mtr_scd_dvr at hh:07. Controls the MTR decoder feeding Climate (MDL).
7 * * * * csh -c '/awips/adapt/verification/bin/Linux/launch_AEV.csh Mtr_scd_dvr >&! ${LOG_DIR}/Mtr_scd_drv.log'

# Purge the MTR decoder tables about once a week (MDL).
40 3 1,8,15,22 * * csh -c '/awips/adapt/verification/bin/Linux/clean_FSS_tables.sh >&! /dev/null'

# HWR crons (MDL)
10 * * * * csh -c '/awips/adapt/hwr/bin/hwrnwr -t >&! ${LOG_DIR}/hwrnwr.log'
10 * * * * csh -c '/awips/adapt/hwr/bin/hwrnwws -t >&! ${LOG_DIR}/hwrnwws.log'

# MSAS - The MAPS/RUC Surface Assimilation System #
# ----------------------------------------------- #
#  In PVCS at ldad/src/MSAS/WFOA_scripts/WFOA_MSAS_cron_file

# Ingest the NCEP surface grids every 6 hours 
# Programs    =  sfcnmc 
# Valid Times =  00Z 06Z 12Z 18Z
# Runtime Z   =  05:37, 11:37, 17:37, 23:37
37 5,11,17,23 * * * /bin/csh -c '${FXA_HOME}/ldad/MSAS/WFOA_MSAS_Sfcnmc.run  >&! ${FXA_HOME}/ldad/MSAS/logs/sfcnmclog &' > /dev/null 2>&1

# Run the surface cycle every hour at 18 minutes after the hour.
# Programs    =  sfcing  sfchqc  sfcanl  sfcncdf  sfcver  srcplot
18 * * * * /bin/csh -c '${FXA_HOME}/ldad/MSAS/WFOA_MSAS_Surface.run  >&! ${FXA_HOME}/ldad/MSAS/logs/sfclog &' > /dev/null 2>&1

# Compile the surface QC stats at the end of the day
# Programs    =  asos
# Valid Times =  00Z
# Runtime Z   =  23:53
53 23 * * * /bin/csh -c '${FXA_HOME}/ldad/MSAS/WFOA_MSAS_Asos.run  >&! ${FXA_HOME}/ldad/MSAS/logs/asoslog &' > /dev/null 2>&1

# QCMS processing
#################
# Run the stage 1 & 2 QC on current hour's data 
3,8,13,18,23,28,33,38,43,48,53,58 * * * * /bin/csh -c '${FXA_HOME}/ldad/MSAS/WFOA_MSAS_QCstage1_2.run >&! ${FXA_HOME}/ldad/MSAS/logs/qcstg1_2log &' > /dev/null 2>&1

# Run the stage 1 & 2 QC on previous hour's data
3,8,13,18,23,28,33,38,43,48,53,58 * * * * /bin/csh -c '${FXA_HOME}/ldad/MSAS/WFOA_MSAS_QCstage1_2_late.run >&! ${FXA_HOME}/ldad/MSAS/logs/qclatelog &' > /dev/null 2>&1

# Get yesterday's QC stage 1, 2 & 3 daily summaries
35 0 * * * /bin/csh -c '${FXA_HOME}/ldad/MSAS/WFOA_MSAS_QCday.run  >&! ${FXA_HOME}/ldad/MSAS/logs/qcdaylog &' > /dev/null 2>&1

# LAPS #
# ---- #
20 * * * * /usr/local/perl/bin/perl /awips/laps/etc/sched.pl /awips/laps /data/fxa/laps
03,19,34,49 * * * * /usr/local/perl/bin/perl /awips/laps/etc/LapsRadar.pl /awips/laps /data/fxa/laps
08,14,23,29,38,45,53,59 * * * * /usr/local/perl/bin/perl /awips/laps/etc/laps_driver.pl lvd_sat_ingest.exe /awips/laps /data/fxa/laps

ingest.crontab.px2

# Crontab file for starting px2apps data ingest processes for fxa.

# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be include in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.

# Run scour daily to clean up log files and other leftovers
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'

# MONITOR SCRIPTS
# CPU monitor start-up
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startCtrlCpu.sh'

# Break ctrlCpu log daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogCtrlCpu >&! ${LOG_DIR}/breakLogCtrlCpu.log'

# Disk Usage Monitor Script
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/diskUsage.pl -c ${FXA_HOME}/data/disk.cfg -o ${FXA_HOME}/data/diskUsage_data.html'
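
A note on reading the minute fields in these listings: most entries spell out the minutes (0,10,20,30,40,50), while the purgeProcess entry on px1 uses the step form (*/10). The two are equivalent, as the Python sketch below shows. It is a hypothetical helper covering only the minute syntax that appears in these crontabs, not a full cron parser.

```python
def expand_minutes(field):
    """Expand a crontab minute field into a sorted list of minutes (0-59).

    Handles '*', 'a-b' ranges, comma lists, and '/step' suffixes only.
    """
    minutes = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = 0, 59
        elif "-" in rng:
            low, high = rng.split("-")
            start, end = int(low), int(high)
        else:
            start = end = int(rng)
        if not (0 <= start <= end <= 59) or step < 1:
            raise ValueError("bad minute field: %r" % field)
        minutes.update(range(start, end + 1, step))
    return sorted(minutes)
```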

Data purging

The main purger is purgeProcess, running on px1. See the localization write-up (OB6 version).

Legacy purgers are still run by cron, as noted in the previous section. The first, master.purge, runs twice an hour on px1. It in turn runs ~fxa/bin/fxa-data.purge, plus an optional, site-supplied, ~fxa/bin/fxa-data-addons.purge. The second, startScour, runs daily at 0030Z on each server. It starts ~fxa/bin/scour, which reads ~fxa/data/scour.conf.[ds|dx|px] for the list of directories to clear out. Finally, mhs-data.purge runs daily to clean out files in $MHS_DATA and /data/x400 areas. Logs for these processes are $LOG_DIR/master.purge.log and $LOG_DIR/startScour.log (there is no log for the MHS purger). Each log is overwritten on every run.
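
To make the scour.conf.[ds|dx|px] notation concrete, here is a small Python sketch that picks the variant from a server name. It assumes the suffix simply follows the host class (ds, dx, px), which is how I read the notation; the function name is hypothetical.

```python
import re


def scour_conf_for(host):
    """Map a server name such as 'dx1' or 'px2-bou' to its scour.conf variant."""
    match = re.match(r"(ds|dx|px)\d", host)
    if match is None:
        raise ValueError("no scour.conf variant known for host %r" % host)
    return "~fxa/data/scour.conf." + match.group(1)
```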

Data and process monitoring

The data monitor comprises a series of perl scripts that run via cron on px1. These scripts build HTML pages that are then copied to $SERVER_DIRECTORY/dataMon/html/, where SERVER_DIRECTORY is defined in ~fxa/data/dataMon.cfg. (The files are also retrieved by an http process on www-sdd (quasar), for use in the summary monitor.) Cron entries are as shown above.

The ingest process monitor is started via cron on each server, also as shown above. The XX_startProcMon.sh script starts ~fxa/bin/ingProcMon.pl, which checks processes in /data/fxa/data/fxa_monitor/monitorProcesses.txt, and builds an HTML file (XXX_ingestProcMon.html, in the same directory) showing what's up and down. These are copied to $FXA_WWW_SERVER_HOST:$SERVER_DIRECTORY/dataMon/, where SERVER_DIRECTORY is defined in ~fxa/data/dataMon.cfg. (On quasar, the document root is defined in /opt/apache/httpd.conf as /opt/apache/lib/htdocs; this is SERVER_DIRECTORY.)

The restart mechanism

Included at the bottom of the process monitor Web page is a link to bring up a restart menu, pointing to /awips/fxa/htdocs/cgi-bin/restart-setup.sh. (Note that the link just says /cgi-bin..., which one would think points to /awips/fxa/htdocs/dataMon/cgi-bin, since .../dataMon is the document root. However, an alias for /cgi-bin is set in the Web server configuration to point to ~fxa/htdocs/cgi-bin.) This runs ~fxa/bin/restart-ingest.sh on as1, which in turn runs ~fxa/bin/restart-ingest-display.tcl. That finally runs ~fxa/bin/restart-ingest.tcl, which puts up a menu and takes action based on the user's selection. Except for radar, this tcl script runs restartIngest.pl, using information from $FXA_DATA/data/fxa-monitor/monitorProcesses.txt to decide what to do. Specific radar actions depend on the contents of ~fxa/data/localizationDataSets/<siteID>/portInfo.txt on as1, but /awips/fxa/bin/stopRadarProc.pl and icpReset[01] are used. A write-up of the process is in the header block of ~fxa/bin/restart-ingest.tcl.

Text workstation

Procedures are stored in $FXA_DATA/scripts/<username>. Each procedure is in a file, and consists of a list of commands. The usernames are found in ~fxa/data/fxa-users.

Each text Xterm is hosted by its associated workstation. Text `stuff' is stored in $FXA_DATA/textWSwork/xtn-bou:0. Subdirectories include saved (copies of all products that have been created on this station), journals (in-progress editing, saved for crash recovery), and archived (permanent copies of products sent out over the WAN). Also here is textAlarmAlertProducts.txt, the list of alarm/alert products specific to this workstation. (Site-wide products are in ~fxa/data/textAlarmAlertProducts.txt.)

Log files are in $LOG_DIR/display/xtn-bou:0/yymmdd/textWish<pid>. Logs exist for the text windows, but not the parent textWS.tcl process.

If an Xterm gets mis-configured, the title window will come up, but the individual text windows will not. (You'll get a tcl error when you try to start one.) Press F12 on the keyboard for a second or two, then select Server. Press the Access Control button (middle button in second panel) `on' and click OK (upper right). Answer OK in the dialog box, wait for the reset, log in, and you should be ready to roll.

Local LAPS processing

LAPS (analysis) runs on as2, hourly by cron. In Build 4.3, LAPS is moved onto the new fxa_local partition (will be a separate disk in 5.0). For now, this link is critical to successful LAPS runs:

lrwxr-xr-x   1 fxa  fxalpha  /data/fxa/laps@ -> /data/fxa_local/laps

As noted earlier, four LAPS processes run via cron:

  20 * * * * /usr/local/bin/perl /awips/laps/etc/sched.pl /awips/laps/ /awips/laps/data
  03,19,34,49 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl vrc_driver.x /awips/laps /awips/laps/data
  08,14,23,29,38,45,53,59 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lvd_sat_ingest.exe /awips/laps /awips/laps/data
  22,30 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lsr_driver.exe /awips/laps /awips/laps/data
  1. The sched.pl script runs the analysis starting at 20 past the hour. This script runs processes that ingest various datasets, run the analysis, purge analysis and intermediate files, and write the results into /data/fxa/Grid/FSL/netCDF/LAPS_Grid/LAPS. Part of this process is a set of grid notifications, logged in $LOG_DIR//GridNotify*, at about 22 past the hour.
    Logfiles for the individual processes (named *.log.<hhmm>) are written to $LOG_DIR//laps. Analyses and intermediate ingest files are written to /awips/laps/data/lapsprd/*, in which the `*' refers to the appropriate product subdirectory. (This is directed via soft link to /data/fxa/lapsprd.)
  2. The second entry runs the radar (NOWrad) ingest process for LAPS.
  3. The third crontab entry activates the satellite ingest process (called lvd) 8 times an hour (to accommodate rapid scan operations; in many cases it does nothing, since it requires appropriate satellite files). This puts GOES data on the LAPS grid, creating files in .../lapsprd/lvd/. As with the other processes, logs are written to log/lvd.log.hhmm and log/lvd.err.hhmm.
  4. The final entry ingests satellite sounder data. (None is available on AWIPS, so this is essentially a no-op.)

The entire LAPS ingest/analysis generally completes in approximately 5 minutes. Run times longer than 15 minutes or shorter than 2 minutes may indicate a problem. Run completion times are logged in runtime.log.
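
The run-time rule of thumb above is easy to encode. This Python sketch uses the 2- and 15-minute thresholds from the text; the function name and return strings are hypothetical, and parsing runtime.log is left out because its format isn't described here.

```python
def laps_runtime_status(minutes):
    """Classify a LAPS run time (in minutes) using the thresholds from the text."""
    if minutes < 0:
        raise ValueError("run time cannot be negative")
    if minutes > 15:
        return "suspect: too long"
    if minutes < 2:
        return "suspect: too short"
    return "ok"
```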

LAPS localization is effected by

cd /awips/laps/etc
perl laps_localization > local.out

More information about LAPS run-time details is available in the LAPS README file, http://laps.noaa.gov/software/README.html.

LDAD processes

LDAD runs partly on ds1 ("internal") and partly on ls1 ("external"). The internal part includes these processes:

The listener process gets data through the firewall, storing files in $FXA_DATA/LDAD/Raw. There is a listener log in /data/logs/ldad, but it's not at all easy to read. (I did on one occasion find a permissions problem writing the raw data by looking at the listener log.) You'll also see there a LDADdecoder.log file, which is the log of the current decoder. The watchDogInternal script checks every 30 seconds to see if the listener and decoder are running.  Decoder logs are also written to the usual spot along with other ingest logs. Those files include PID in the name, so there are lots of 'em. (The LDADdecoder.log file includes time stamps on the messages, but those in $LOG_DIR/<date> do not.)

Sometimes, both decoder and listener are up, but no data are coming through. This suggests a problem on the external side. You can restart the whole LDAD system:

This procedure starts both internal and external processes, and may shake things loose.

The LDAD monitors run on as1 (summary and internal) and ls1 (external, acquisition, and dissemination). If one of the pages is more than 5 minutes old (the time limit is in one of the config files), the monitor won't show it. Also, we've seen problems where the obj.conf file on ls1 or as1 had the wrong data root. It's in /etc/opt/ns-fasttrack/httpd-default.
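
If a monitor page goes missing, the age rule above is the first thing to check. Here is a minimal Python sketch of that test. The 5-minute default is taken from the text, but the real limit lives in a config file, so treat the constant and the function name as assumptions.

```python
import time

MAX_AGE_SECONDS = 5 * 60  # assumed default; the real limit is in a config file


def page_is_stale(mtime, now=None, max_age=MAX_AGE_SECONDS):
    """True if a page last modified at mtime (epoch seconds) is older than max_age."""
    if now is None:
        now = time.time()
    return (now - mtime) > max_age
```

The mtime would come from something like os.path.getmtime on the HTML file in question.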

For the record, here are the steps needed to set passwords for LDAD admin access to the fsli Web server. I imagine that a quite similar procedure is used for others.

  1. go to URL as1-fsli.fsl.noaa.gov:17482 (this is the server administrator)
  2. login as root (passwd is the same as on firewalls)
  3. click on Default
  4. click on Access Control tab
  5. click on List Users and make edits as necessary.

Some other stuff

Data sources and storage

Data are stored on a NAS device, on a volume known to the data ingest software as $FXA_DATA.

Use df (bdf on ds1) to check on disk space.

Click here for data storage information.

Mike Graf wrote a tutorial on model grid WMO headers back in early 1999. Although the file is no longer available on the Web, much of the information can be found at http://www.nws.noaa.gov/tg/awips.html#ccc. There's also a nice summary of NCEP grid information available.


This page is maintained by Joe Wakefield.
Last modified: 11 Sep 09