This document is directed at GSD and WFO staff who may be called upon to diagnose problems with the WFO-Advanced data ingest, internal communications, and display software at WFO Denver/Boulder. The operational staff at the WFO has higher-level monitoring and restart tools available that are not described here.
Primary support is from the NCF (301-713-9344). GSD staff are also "on call" (informally) to help with problems. WFO staff are authorized to call Joe Wakefield, Darien Davis, or Carl Bullock at home for help, if NCF can't. They also have Gregg Phillips' cell phone number.
Note for GSD folks: to get hold of WFO staff, use 303-494-4454 (this is the national coordination number, the one other WFOs use to call in). The Admin number there is 303-494-3210, and the external coordination number (media, etc.) is 303-494-4479. GSD operators are on duty 24x7, and can be reached at 303-497-6887/303-230-3454 (pager)/opers.its.gsd@noaa.gov.
The Boulder computers carry the suffix -bou, which is the official ID of the Denver/Boulder WFO. The complement of servers includes ds1-bou (database, radar ingest), dx1-bou (primary ingest), dx2-bou, px1-bou (SCAN/FFMP, notification, purging), and px2-bou. (Name aliases allow these to be referred to locally as ds1, dx1, dx2, px1, and px2.)
When you log into one of the -bou subnet machines as fxa or an individual user, several scripts are automatically run to set a number of environment variables, etc. The .cshrc script sets this process in motion. Settings of interest include FXA_DATA (/data/fxa), LOG_DIR (/data/logs/fxa), FXA_HOME (/awips/fxa), and TZ (GMT).
All D2D processes are found in ~fxa/bin, and data files (tables, menus, WarnGen templates, etc.) are in ~fxa/data. Most ingest logs are in $LOG_DIR/yyyymmdd (type "logs" to get there), with a few in $LOG_DIR; these are on local disks.
The log file for a user interface process is $LOG_DIR/display/<displayName>/<date>/fxaWish<pid> where <displayName> is :0.0 for the center display, :0.1 for the left display, and :0.2 for the right display; <date> is the UTC date when the user interface process was started in YYYYMMDD format; and <pid> is the process ID.
The log files for the IGC processes, the application manager, the applications, the extensions, and all descendants of the user interface process are in the directory $LOG_DIR/display/<displayName>/<date>/fxaWish<pid>.children and have the format <programName><pid> where <programName> is the name of the executable.
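The path rules above can be sketched in a few lines of Python. This is only an illustration: the function names, the screen labels used as dictionary keys, and the sample date and process IDs are made up for the example.

```python
import posixpath

# Display names as described above: center, left, and right screens.
DISPLAYS = {"center": ":0.0", "left": ":0.1", "right": ":0.2"}


def ui_log_path(log_dir, screen, date_utc, pid):
    """Path of the fxaWish log for one user interface process."""
    return posixpath.join(log_dir, "display", DISPLAYS[screen], date_utc,
                          "fxaWish%d" % pid)


def child_log_path(log_dir, screen, date_utc, ui_pid, program, pid):
    """Path of the log for a descendant (IGC, application, extension...)."""
    children = ui_log_path(log_dir, screen, date_utc, ui_pid) + ".children"
    return posixpath.join(children, "%s%d" % (program, pid))


# Hypothetical values, with LOG_DIR as set for the fxa account.
print(ui_log_path("/data/logs/fxa", "left", "20090512", 12345))
print(child_log_path("/data/logs/fxa", "left", "20090512", 12345, "IGC_Process", 12400))
```

The program name IGC_Process is a placeholder; use whatever executable name appears in the ps listing.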
Since FXA_HOME/bin is in fxa's PATH, there's no need to include that when entering process commands, and that's reflected in the commands included in these instructions. All commands that you'll need to enter are shown in bold type. Except as noted, all will be run from the fxa account.
You can get to today's ingest log directory simply by typing logs, and up will get you to its parent, where some logs live. The naming convention for ingest logs is <processName><pid><hostname><hhmmss>.
The diagram below outlines the flow of messages through the WFO-A ingest system. (It's way out of date, obviously - heck, by now, it's a museum piece - but still instructive.) Following sections discuss each interface in detail.

All ingest processes are started automatically at boot time. dx1/dx2, dx3/dx4, and px1/px2 are heartbeat pairs that are monitored and started with the hb_ software. As root, use hb_stat to see if (and where) the packages are running, and hb_swap to start one up (e.g., hb_swap px1apps px1-bou). Michael Vrencur recommends running hb_swap on the node from which the package is being swapped.
A bigger hammer is service heartbeat restart. If, for
example, you run this on dx3, expect that dx3apps will swap over to dx4.
Then you can jump on the latter and swap the package back to dx3.
As long as the package is up, you can run the start/stop scripts by hand. Should it be necessary to restart, use stopIngest and startIngest.[dx1|dx2|dx3|dx4|px1|px2].
Processes included in stop/startIngest for OB8.1, in the order they are started: (Note that the scripts use $FXA_HOME, which resolves to /awips/fxa. What's shown here is the text that appears in a ps listing.)
For ds1:
[if $mhs_host=ds] /awips/fxa/bin/MhsServer
/awips/fxa/bin/MhsRequestServer

For dx1:
/awips/fxa/bin/DataController COMMS_ROUTER PDCservcontrol.co
/awips/fxa/bin/PDCserver
[next two if $mhs_host=dx1f]
/awips/fxa/bin/MhsServer
/awips/fxa/bin/MhsRequestServer
/awips/fxa/bin/textNotificationServer
/awips/fxa/bin/NWWSProduct
/usr/bin/perl /awips/fxa/bin/ingProcMon.pl -c DX1

For dx2:
/awips/fxa/bin/RadarServer
/awips/fxa/bin/DialServer
/awips/fxa/bin/RMR_Server
/awips/fxa/bin/DataController COMMS_ROUTER TextDB2_Controller.config
/awips/fxa/bin/RadarTextDecoder
/awips/fxa/bin/RadarMsgHandler
/awips/fxa/bin/DataController COMMS_ROUTER RadarController.config
/awips/fxa/bin/RadarStorage
/awips/fxa/bin/HandleGenericMsg
/awips/fxa/bin/ORPGCommsMgr KFTG
/usr/bin/perl /awips/fxa/bin/ingProcMon.pl -c DX2

For dx3:
/awips/fxa/bin/acqserver 21600
/awips/fxa/bin/acqserver 8 21600
<13 more of these>
/awips/fxa/bin/CommsRouter COMMS_ROUTER
/awips/fxa/bin/CommsRouter GRID_ROUTER
/awips/fxa/bin/DataController GRID_ROUTER GribController.con
/awips/fxa/bin/GribDecoder
/awips/fxa/bin/Grib2Decoder
/awips/fxa/bin/DataController GRID_ROUTER GribImgController.
/awips/fxa/bin/GribImgDecoder
/awips/fxa/bin/DataController COMMS_ROUTER SatelliteControll
/awips/fxa/bin/Satdecoder
/awips/fxa/bin/DataController COMMS_ROUTER TextCont2.config
/awips/fxa/bin/RaobBufrDecoder
/awips/fxa/bin/AircraftDecoder
/awips/fxa/bin/MaritimeDecoder
/awips/fxa/bin/profilerDecoder
/awips/fxa/bin/RedbookStorage
/awips/fxa/bin/DataController COMMS_ROUTER TextDB_Controller
/awips/fxa/bin/CollDB_Decoder
/awips/fxa/bin/StdDB_Decoder
/awips/fxa/bin/DataController COMMS_ROUTER WarnDB_Controller
/awips/fxa/bin/WarnDBDecoder
/awips/fxa/bin/DataController COMMS_ROUTER TextCont.config
/awips/fxa/bin/MetarDecoder
/awips/fxa/bin/DataController COMMS_ROUTER BufrDriverContr.c
/awips/fxa/bin/BufrDriver model
/awips/fxa/bin/BufrDriver goes
/awips/fxa/bin/BufrDriver acars
/awips/fxa/bin/BufrDriver poes,quikscat
/awips/fxa/bin/BufrDriver hdw
/awips/fxa/bin/DataController COMMS_ROUTER GFSdriverContr.c
/awips/fxa/bin/gfsDriver
/awips/fxa/bin/DataController COMMS_ROUTER SSMIdriverContr.config
/awips/fxa/bin/SSMIdriver
/awips/fxa/bin/DataController COMMS_ROUTER BufrMOScontr.config
/awips/fxa/bin/BufrMosDecoder
/awips/fxa/bin/DataController COMMS_ROUTER TextCont3.config
/awips/fxa/bin/binLightningDecoder
/awips/fxa/bin/DataController COMMS_ROUTER TextCont4.config
/awips/fxa/bin/SynopticDecoder
/usr/bin/perl /awips/fxa/bin/ingProcMon.pl -c DX3

For dx4:
/awips/fxa/bin/notifyTextProd COMMS_ROUTER /awips/GFESuite/primary...
/awips/fxa/bin/notifyTextProd COMMS_ROUTER /awips/GFESuite/svcbu...
/usr/bin/perl /awips/fxa/bin/ingProcMon.pl -c DX4

For px1:
/awips/fxa/bin/DataController COMMS_ROUTER SCANcontroller.config
/awips/fxa/bin/SCANprocessor
/awips/fxa/bin/DataController COMMS_ROUTER FFMPcontroller.config
/awips/fxa/bin/FFMPprocessor
/awips/fxa/bin/DataController COMMS_ROUTER SRUcontroller.config
/awips/fxa/bin/SRUprocessor
/awips/fxa/bin/DataController COMMS_ROUTER FMcontroller.config
/awips/fxa/bin/FMprocessor
/awips/fxa/bin/DataController COMMS_ROUTER SNOWcontroller.config
/awips/fxa/bin/SNOWprocessor
[at marine WFOs]
/awips/fxa/bin/DataController COMMS_ROUTER SScontroller.config
/awips/fxa/bin/SSprocessor
/awips/fxa/bin/asyncScheduler
/awips/fxa/bin/hmMonitorServer
/awips/fxa/bin/NWWSSchedule
/usr/bin/perl /awips/fxa/bin/ingProcMon.pl -c PX1

For px2:
/awips/fxa/bin/ldadServer
/usr/bin/perl /awips/fxa/bin/ingProcMon.pl -c PX2

And processes in start/stopTextDB:
For dx1:
/awips/fxa/bin/TextDB_Server -Read
/awips/fxa/bin/TextDB_Server -Write

For px1:
/awips/fxa/bin/textNotificationServer
The stop/start scripts handle the non-indented items in the list. Indented items are children spawned by the process listed immediately above.
Other persistent items started by cron:
on ds1:
/awips/fxa/bin/ingProcMon.pl -c DS
/awips/fxa/bin/ctrlCpu

on dx1:
/awips/fxa/bin/purgeProcess
/awips/fxa/bin/ingProcMon.pl -c DX1
/awips/fxa/bin/ctrlCpu

on dx2:
/awips/fxa/bin/ingProcMon.pl -c DX2
/awips/fxa/bin/ctrlCpu

on dx3:
/awips/fxa/bin/ingProcMon.pl -c DX3
/awips/fxa/bin/ctrlCpu

on dx4:
/awips/fxa/bin/ingProcMon.pl -c DX4
/awips/fxa/bin/ctrlCpu

on px1:
/awips/fxa/bin/ingProcMon.pl -c PX1
/awips/fxa/bin/processSummary.pl
/awips/fxa/htdocs/ldadMon/bin/MakeSUMMpage
/awips/fxa/htdocs/ldadMon/bin/MakePROCpage
/awips/fxa/bin/ctrlCpu

on px2:
/awips/fxa/bin/ctrlCpu
Also started separately:
on dx1: /awips/fxa/bin/notificationServer
Rarely, the GRIB decoder will hang on bad grids (can't remember the last time it happened). You'll see this by the GribDecoder process using lots of CPU time for extended periods, and a check on the log will show nothing happening. Issue kill -10 <pid> to force a crash. The signal handler will remove the bad grid and the controller will start a new decoder.
If radar is not auto-updating, you'll probably need to restart the notificationServer. When you use stopNotificationServer to kill the server, it may take some time to update its client list, which is found in $FXA_DATA/workFiles/notificationServerClientListState.txt. It will do a kill -9 after 20 seconds, if necessary. Use startNotificationServer to get it running again. The textNotificationServer has a similar feature; its client list is in $FXA_DATA/workFiles/textNotificationServerClientList.txt.
The bulk of our datasets are received over the Satellite Broadcast Network (SBN) via the SBN communications processors. Please note that cpsbn1 and cpsbn2 are monitored by the AWIPS Network Control Facility (NCF), which is also responsible for their maintenance. There is a switch box near cpsbn1 that must usually be in `Modem' position, so NCF operators can check on its operation.
If SBN data (satellite, METARs, text, grids) are not arriving,
check the CP operation, to see if it's hung. ssh cpsbn1 for
TG data, or cpsbn2 for NESDIS, as user root. (Note: if you need to log
in at the console, you'll need to move the CP switch to the Monitor
position.) Type inmon to run the ingest monitor and outmon
to run the dissemination monitor. In the former, all lines in the lower
section will show, e.g., cpsbn1-bou, since the data are coming from the
CP and being stored locally on disk. In the latter, you'll see
connections to dx3f-bou on each CP.
(If both CPs have stopped at the same time, it's likely that
there's an uplink problem at the NCF, or there could be a downlink
problem. Check with NCF (301-713-9344) before restarting.) Other
problem indicators are lots of buffers or distribution headers in use.
If times are not up to date in the xfr column, you can restart using
stop_cpsbn_all and start_cpsbn_all. A lot of
text will scroll by as the software starts up.
Monitor the system again with inmon and outmon; you
should see the TG line connect within a few seconds, though the NESDIS
line may take several minutes. Log out
(exit) (and switch back to Modem if at the console). Child
acqservers will go down when you stop the CP, then will come back as
data are sent.
If this doesn't work, have the forecasters check the Sync and Signal green lights on the demod. If these are out, have them contact the NCF for information. (This is unlikely, as the NCF monitors that portion of the system.) If the signal looks good, but you can't connect, you may need to reboot the CP. Log in and enter /etc/reboot. Ingest processes start automatically. (If you can't log in, you can press the reset button that's just above the CP's power switch at lower right. The system will reboot itself. Using the reboot command is preferred.)
If necessary, either CP can be configured to send both data streams
to the server. Call the NCF and tell them which of the CPs has failed.
They will perform the failover. [The split of data is set using
config_dvb. Issue this without argument to see which channels are enabled
on each CP. To enable a channel, use, e.g., config_dvb -a -c
GOES; to disable one use config_dvb -r -c NMC3. Then
restart the CP ingest.]
Data: As data are received, they are first written to circular buffers on the CP disks, in /data/co/<DDIR>/d<FILE>/[nn]/<FILE>.nnnn, where DDIR and FILE are from STORE lines of /awips/data/acq_send_parms.sbn, nn is 00..mm-1, and nnnn is 0000..9999 (or 000..999 or less, in accordance with the MAX_PER_DIR and MAX_FILE settings in the STORE lines). The number of directories used, mm, is determined by dividing MAX_FILE by MAX_PER_DIR. If all of the files will fit in one directory, the nn part of the directory path is not needed.
Next, the files are copied to distribution staging areas in /data/co/<DDIR>/l<FILE>_g<GROUP>_h<HOST>/nn, where DDIR, FILE, and nn are as above, and GROUP and HOST are from SEND lines of acq_send_parms.sbn. Example:
SEND[13]="STORE_ID=13 LINK_ID=2 LABEL=nmc2_misc GROUP=2 HOST=1"
STORE[13]="ID=13 WMO=* PROD_TYP=NWSTG SBN_CHAN=NMC2 \
DDIR=NMC2 FILE=nmc2_misc MAX_FILE=100000 MAX_PER_DIR=10000"
entries result in datasets being staged in
/data/co/NMC2/lnmc2_misc_g2_h1/nn, where nn runs 00..09. From here,
an acq_send process sends the data to a corresponding acqServer on the
receiving host, and removes the files from the l... directories. The CP
will queue files if the receiver is not up (up to the limits specified in
the STORE line) and refresh them after a connection is established.
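To make the directory arithmetic concrete, here is a rough Python sketch that derives the staging directories from the STORE and SEND entries quoted above. The function names are invented for the example, rounding up when MAX_FILE is not an exact multiple of MAX_PER_DIR is an assumption, and so is dropping the nn level for a single staging directory (the text states that only for the circular buffers).

```python
# The two entries from the example above, with the line continuation joined.
SEND = "STORE_ID=13 LINK_ID=2 LABEL=nmc2_misc GROUP=2 HOST=1"
STORE = ("ID=13 WMO=* PROD_TYP=NWSTG SBN_CHAN=NMC2 "
         "DDIR=NMC2 FILE=nmc2_misc MAX_FILE=100000 MAX_PER_DIR=10000")


def parse_parms(entry):
    """Split 'KEY=value KEY=value ...' into a dictionary."""
    return dict(field.split("=", 1) for field in entry.split())


def staging_dirs(store, send):
    """List the l<FILE>_g<GROUP>_h<HOST> staging directories."""
    s, d = parse_parms(store), parse_parms(send)
    max_file, per_dir = int(s["MAX_FILE"]), int(s["MAX_PER_DIR"])
    n_dirs = max(1, -(-max_file // per_dir))  # ceiling division (assumed)
    base = "/data/co/%s/l%s_g%s_h%s" % (s["DDIR"], s["FILE"],
                                        d["GROUP"], d["HOST"])
    if n_dirs == 1:
        return [base]  # everything fits in one directory, no nn level (assumed)
    return ["%s/%02d" % (base, n) for n in range(n_dirs)]


for path in staging_dirs(STORE, SEND):
    print(path)
```

For the entries shown, this lists ten directories, /data/co/NMC2/lnmc2_misc_g2_h1/00 through .../09.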
Logs: The CPs write their logs locally to
/awips/logs/,
/data/co/logs/Products/cpsbni-bou/sbn_procm/mcProduct.log,
.../acq_clntm_hn/mcProduct.log, and
.../acq_sendm.ho.gp_m5/mcProduct.log, where m
is the link number that you see in an acq_stats display (defined in
acq_send_parms.sbn), n is a subchannel number that's seen in the
inmon acq_stats display, o is the host number (defined in
acq_send_parms.sbn), and p is the group ID, again from
acq_send_parms.sbn. The system breaks these logs when they hit 1MB size,
putting older logs in ARCHIVE/mmmdd/mcProduct* files; these can be useful
in diagnosing missing data. If a CP gets overloaded, it's logged in the
mcProduct logs, tagged STATUS LOST n products. We saw
this periodically on cpsbn1-fsld when it was serving four hosts.
An ingest note: Twice on cpsbn1-fsld, we've seen IUAX02
(MDCRS) files fail to get to our dxs. Investigation showed numerous short
files in /data/co/NMC/lbufr_g5_h0/00/tmp0h0g5m5 (e.g.) with content
Key file not accessible! It seems that this occurred after
reboot, and the fix (thanks to Maureen Tankersley) is to make a link
/.key -> /root/.key. Apparently, upon reboot,
$HOME is not defined as /root, so the system looks in / for this file.
(I suppose with the link in place, the problem won't happen any more,
but, just in case...) And oh, by the way, /usr/local/bin/cruft is the
actual decryption program. Make sure that's in place; it can be copied from
another CP if necessary.
Use of WMO headers: In acq_wmo_parms.sbn, we exclude certain datasets (chiefly, AK/HI/PR grids and satellite images). The "codes" used with those are not at all obvious. Here is a list provided 3/00 by Leroy Klet.
/* Table of T1 letters to codes */
T1=A PC=13 ASCII Analysis
T1=B PC=19 ASCII Admin Msg
T1=C PC=14 ASCII Climatic
T1=D PC=44 GRID
T1=E PC=51 Satellite Imagery
T1=F PC=15 ASCII Forecast
T1=G PC=45 GRID
T1=H PC=46 GRID
T1=I PC=31 BUFR Obs
T1=J PC=32 BUFR Forecast
T1=K PC=71 Unused
T1=L PC=72 Unused
T1=M PC=73 Unused
T1=N PC=16 ASCII Notices
T1=O PC=43 Grid
T1=P PC=10 Graphic
T1=Q PC=11 Graphic
T1=R PC=74 Unused
T1=S PC=17 ASCII Surface
T1=T PC=52 Satellite Imagery (same as GOES)
T1=U PC=61 Upper Air
T1=V PC=62 National Data
T1=W PC=18 ASCII Warnings
T1=X PC=47 GRID
T1=Y PC=41 GRID
T1=Z PC=42 GRID
This definition is in the AWIPS baseline (NCF Comms workset) under
.../src/co/include/cp_product_code.h
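For quick lookups, the table can be carried around as a small Python dictionary. This is just a convenience sketch transcribed from the list above, not the baseline header file; the function name is invented.

```python
# T1 letter -> (product code, description), transcribed from the table above.
T1_CODES = {
    "A": (13, "ASCII Analysis"),   "B": (19, "ASCII Admin Msg"),
    "C": (14, "ASCII Climatic"),   "D": (44, "GRID"),
    "E": (51, "Satellite Imagery"), "F": (15, "ASCII Forecast"),
    "G": (45, "GRID"),             "H": (46, "GRID"),
    "I": (31, "BUFR Obs"),         "J": (32, "BUFR Forecast"),
    "K": (71, "Unused"),           "L": (72, "Unused"),
    "M": (73, "Unused"),           "N": (16, "ASCII Notices"),
    "O": (43, "Grid"),             "P": (10, "Graphic"),
    "Q": (11, "Graphic"),          "R": (74, "Unused"),
    "S": (17, "ASCII Surface"),
    "T": (52, "Satellite Imagery (same as GOES)"),
    "U": (61, "Upper Air"),        "V": (62, "National Data"),
    "W": (18, "ASCII Warnings"),   "X": (47, "GRID"),
    "Y": (41, "GRID"),             "Z": (42, "GRID"),
}


def product_code(wmo_header):
    """Return (code, description) for a WMO header, keyed on its T1 letter."""
    return T1_CODES[wmo_header[0].upper()]


print(product_code("IUAX02"))  # the MDCRS header mentioned earlier
```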
More information on satellite sectors, including mapping and geographic coverage, is available on the NOAAPORT User's Page and in the AWIPS-NESDIS ICD (PDF).
Radar products come from the ORPG box, via the ORPGCommsMgr process. (For fsld and alps, the data flow is wideband raw data via LDM from GSD's Central Facility to cx-alps, where products are generated then sent to dx2. fslc connects to BOU's ORPG over the WAN. Note: After ORPG 9 was installed at BOU, our connection caused them problems. Evidently, ORPG 9 cares that there are two connections from the same PUP. (Since we run fslc with a BOU localization, it was using the BOU setting.) We're now using the FSL number in a customFiles version of pupId.txt, and all is copasetic.)
Files are stored temporarily in $FXA_DATA/radar/raw and /text. Files in /raw are moved by RadarStorage to the appropriate product directories, e.g., /kftg/Z/ or V/. The text/ files are processed by the RadarTextDecoder process; output goes to the text database (e.g., WSRVWPFTG). We have had problems occasionally on fsld/alps where each local product caused a 3s timeout while trying to connect to MHS. We set MHS_SERVER to localhost in ipc.config to avoid that.
A comms status file is maintained in /data/fxa/workFiles/wfoApi.StateInfo (so named for historical reasons). Every time a connection is received from an ORPGCommsMgr process, information about the radar and the process is recorded. The entries include radar ID and name, max number of products, ORPGCommsMgr 'target string,' current VCP, operations mode, scan interval, connection state (1=connected), and 'firstGsm' (1 when started, then changes to 0 after the first GSM product is received). (This information can be gleaned from ipc/radar/RadarServerClient.C.)
The ORPGCommsMgr not only receives data from the ORPG, but is also responsible for sending data out on the WAN. Whenever ORPGCommsMgr is stopped (or if the line goes down), the connection state tag is changed from 1 to 0 in wfoApi.StateInfo. As long as this is set to 1, then RadarStorage will not store SBN-received products for the radar in question. (Though I've not seen it, I'm told that wfoApi.StateInfo can have multiple lines.)
On dx2, a cron job runs $FXA_HOME/bin/restartRadar about every four minutes, checking whether the ingest (ORPGCommsMgr) is up and starting it if necessary. Data come from host rpg-kftg. (restartRadar gets the port number out of ~fxa/data/orpgDedicated.txt, then executes ORPGCommsMgr.)
Radar ingest processes also include the RadarServer and the DataController/RadarStorage pair. The former communicates via the ORPGCommsMgr process with the ORPG, while the latter are responsible for storing radar products as they are received.
General Status Messages (GSMs) can be used to check on the status of the 88D. This can be checked from the workstation `radar status' window or the Unit Status Message graphic, the last entry in the top section of kftg>Graphics> menu. You can also tail -30 $FXA_DATA/workFiles/RADAR_Announcer to see what's what.
Please note that the RadarServer process must be running in order to send the RPS list and get data. The radar ingest (ORPGCommsMgr) will start but will not stay up if the RadarServer is down. RadarServer is started as part of startIngest.
If no data are reaching fsld/alps, check the ingest on cx-alps. Log on and become user fxa.
First, check to see if LDM is running: proc ldm. You
should see a couple of rpc.ldmd processes, a pqact, and a read_ldm.
Check to see if data are coming by using /usr/local/ldm/bin/ldmadmin
watch. You should see CRAFT files coming in every few seconds.
Check /scratch/data/ldm/nexradII/KFTG; read_ldm should be updating a
yyyymmddhhmmss.raw file, again every few seconds.
If LDM is not running, start it with /usr/local/ldm/bin/ldmadmin
start. To get radar data, ldmd.conf includes a line like
request CRAFT ^L2.*KFTG 137.75.129.113
NEXRAD2 works in place of CRAFT. For multiple radars, use (KFTG|KPUX)
form.
As for ORPG, proc fxa should show a host of processes
with names like rpgdbm -v and swp -v.
If you see Connection refused messages in the ORPGCommsMgr log,
you'll need to restart the mrpg software on cx-alps. As user fxa,
cd ~/orpg_build9 and source .cshrc. Type
site KFTG to restart the mrpg suite. You can run HCI
(see below) to watch what's happening.
Among other things, 'site' stops and starts the mrpg suite. If you just want to stop it, use:
mrpg shutdown
mrpg cleanup
From the ORPG home directory on cx-alps, cd cfg.
comms_link.conf includes settings for 1..6 TCP links, and tcp.conf
relates these to ports 4489..4494. We arbitrarily have decided to use
4490 (2) on fslc, 4492 (4) on fsld, and 4493 (5) on alps. The line
numbers and connections can be checked in the HCI Comms display. (See
below.)
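The link-to-port relationship is simple enough to sketch in Python. The sketch assumes the mapping is strictly sequential (link 1 on 4489 through link 6 on 4494), which agrees with the three assignments above; tcp.conf remains the authority, and the names here are invented.

```python
FIRST_LINK, LAST_LINK = 1, 6   # links defined in comms_link.conf
FIRST_PORT = 4489              # port paired with link 1 (assumed sequential)

# Our arbitrary per-system link choices, as noted above.
HOST_LINKS = {"fslc": 2, "fsld": 4, "alps": 5}


def port_for_link(link):
    """TCP port for an ORPG comms link number."""
    if not FIRST_LINK <= link <= LAST_LINK:
        raise ValueError("link must be in 1..6, got %r" % link)
    return FIRST_PORT + (link - FIRST_LINK)


for host, link in sorted(HOST_LINKS.items()):
    print("%s: link %d, port %d" % (host, link, port_for_link(link)))
```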
Access to data from a receiving host (dx2-xxx) is configured in ~fxa/data/orpgDedicated.txt. Using the TCP port and link numbers noted above, enter the IP address of the host (cx-alps or whatever), and the appropriate radar name and ID, which can be found in nationalData/radarInfoMaster.txt.
To check on the connections from cx-alps, use netstat -a|grep
44 to see the status of these ports. If ORPGCommsMgr is not
running, you should see a LISTEN entry for the port. If it is running,
you'll see two ESTABLISHED entries. A WAIT notation is not a good
sign. Experience suggests you need to stop the ORPGCommsMgr and wait
for the port to disappear (back to just LISTEN), then restart.
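If you'd rather not eyeball the netstat output, the check can be roughed out in Python. The sample text below is hypothetical (real netstat -a columns vary by platform), the function names are invented, and the three outcomes simply mirror the rules of thumb above.

```python
# Hypothetical netstat -a excerpt; host names and ephemeral ports are made up.
SAMPLE = """\
tcp        0      0 *:4492               *:*                  LISTEN
tcp        0      0 cx-alps:4492         dx2-fsld:40112       ESTABLISHED
tcp        0      0 cx-alps:4492         dx2-fsld:40113       ESTABLISHED
tcp        0      0 cx-alps:4493         dx2-alps:51877       TIME_WAIT
"""

ORPG_PORTS = range(4489, 4495)  # ports 4489..4494


def port_states(netstat_text):
    """Map each ORPG port seen in the listing to its connection states."""
    states = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        for address in fields[3:5]:  # local address, then foreign address
            port = address.rsplit(":", 1)[-1]
            if port.isdigit() and int(port) in ORPG_PORTS:
                states.setdefault(int(port), []).append(fields[5])
                break
    return states


def diagnose(states):
    """Apply the rules of thumb: WAIT is bad, two ESTABLISHED is good."""
    if any("WAIT" in state for state in states):
        return "wait"        # stop ORPGCommsMgr, let the port clear, restart
    if states.count("ESTABLISHED") >= 2:
        return "connected"   # ORPGCommsMgr is running
    if "LISTEN" in states:
        return "listening"   # ORPGCommsMgr is not running
    return "unknown"


for port, seen in sorted(port_states(SAMPLE).items()):
    print(port, diagnose(seen))
```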
cfg/product_generation_tables has the list of products and parameters. You'll need to restart mrpg after this table has been edited. cfg/site_info.dea also needs to be set up for the radar to be ingested.
Logs are in $ORPGDIR/logs, but they must be read with lelb_mon
<name> (leave off the .log).
As noted above, data are delivered via LDM, which runs under user
fxa. This includes the standard set of LDM processes. pqact directs the
data files to /scratch/data/ldm/nexradII/<radar>/, where they are
picked up by the read_ldm process started by the site script. To start
LDM, cd /usr/local/ldm/bin, then
ldmadmin start
We used to use a n2bz/ldml2server/ldml2client process set to handle the LDM data. These processes were written by Warren Blanchard, who has retired. Although the suite still works, we have moved to the ROC method, which uses read_ldm (incorporated in the 'site' scripts). To support this, /usr/local/ldm/etc/pqact.conf is simple, including
NEXRAD2 ^L2-([^/]*)/(....)/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9][0-5][0-9][0-9][0-9]) FILE /scratch/orpg_build8/ldm/nexradII/\2/\3.raw
(Don't make the mistake Joe made in Feb/Mar 06, where during debugging of the montana ingest he enabled both the NEXRAD2 and CRAFT lines in pqact.conf. Both get exercised, resulting in double storage of the data, which does not play well downstream.)
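A pattern like this can be tried offline before LDM is restarted. The Python sketch below applies the same regular expression to a product ID and fills in the \2 and \3 substitutions; the sample product ID is made up for illustration, and Python's re module is standing in for LDM's own regex matching, which should behave the same for this simple pattern.

```python
import re

# Pattern and output template copied from the pqact.conf line above.
PATTERN = re.compile(
    r"^L2-([^/]*)/(....)/"
    r"([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9][0-5][0-9][0-9][0-9])")
TEMPLATE = "/scratch/orpg_build8/ldm/nexradII/{radar}/{stamp}.raw"


def output_file(product_id):
    """Return the file pqact would write for this product ID, or None."""
    match = PATTERN.search(product_id)
    if match is None:
        return None
    return TEMPLATE.format(radar=match.group(2), stamp=match.group(3))


# Hypothetical product ID in the L2-<compression>/<radar>/<timestamp> form.
print(output_file("L2-BZIP2/KFTG/20060301123456"))
```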
read_ldm puts the data in resp.0, whence they're read by mrpg. read_ldm
writes a standard ORPG log file, which you can read with lelb_mon
read_ldm. It typically logs 300 more "messages written to LB" every
few seconds. At the end of each scan, it does final processing on and
removes the file from /scratch... (An advantage of read_ldm is that it
will read the current volume's file when started. Thus, we don't have to
wait for the beginning of a volume to sync up and get data, unlike
ldml2server/client.)
Note that 'site' starts up mrpg with -p. Among other things, this creates a new resp.0, thus requiring a restart of read_ldm. Conveniently, this is built into the script.
Refer to the EPSS support
manual (current version, ORPG 7) for information on HCI. We can run
HCI by following these steps:
1. On cx-alps, become user fxa and set the ORPG environment as above.
2. Newer X configurations probably won't allow you to export the hci output to your desktop. I've been using xt1-avs for this purpose, with xhost + cx-alps and then setenv DISPLAY xt1-avs:0.
3. Run hci.
Try Status, Products -> Products in DB, Base Data Display.
Snowfall accumulation is reset using HCI. See the EPSS how-to for information. (Note: If you get a "password data not available" message, run hci_init_config as user v1.14.)
Here's one of those 'may not happen again' items. Perhaps due to work done at kftg, we lost connection from montana to dx2-fsld/alps. Although the hci display showed the line connected, there was in fact none. A restart using bin/site.montana was successful. After the fact, it was apparent that the Comms display indicated a problem - Delay showed a high percentage and Rate was on the order of 700k, vice the normal 2200k or so.
We have installed ORPG 10 on cx-alps and ORPG 11 on dx5-alps. We currently are using cx-alps for fsld, alps, a2dp, and RSA radar, but plan to move the first three to dx5-alps. The code and installation info are accessible from the NWS WSR-88D CODE Web site.
To get the mrpg suite going on cx-alps,
On dx5-alps:
To bring in data:
On cx-alps or dx5-alps:
Again, you can run bin/lnux_x86/hci to see what's happening.
To configure for another radar, only a couple of changes are necessary.
We run the ORPG (Build 10) software on cx-alps, under user fxa. The data feed is via LDM from a remote radar (currently KPTR) that we disguise as KVBX. [No longer true as of 5/09. We were using dx5-alps for kftg data and cx-alps for RSA, as described, but dx5-alps died and we moved kftg onto cx-alps, so RSA doesn't have an ingest at present - except that the kftg level II data are fed into the kvbx wideband stream on dx1-avs.]
To get things set up and going...
As user ldm,
As user fxa,
Look at bin/site.rsa.ax-avs. There are two lines near the bottom, to wit:
read_ldm -a -d $ORPG_HOME/ldm/nexradII/KMLB $ORPGDIR/ingest/resp.0 &
#orpg_client -v $ORPGDIR/ingest/resp.0 &
In the configuration shown, we're using LDM. For direct ingest (such as at the Ranges), one would swap the commented-out line and issue command "site.rsa.ax-avs KMLB".
To enable the LDM ingest, as user fxa, make sure this line in /usr/local/ldm/etc/pqact.conf is active:
NEXRAD2 ^L2-([^/]*)/(....)/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9][0-5][0-9][0-9][0-9]) FILE /data/orpg/ldm/nexradII/\2/\3.raw
Also, make sure that /data/orpg/ldm/nexradII/KMLB.lbz has 777 permission (and its parent 775). (fxa is the owner.)
The CRAFT data are compressed Level II files, which ITS brings in from CONDUIT at U. Maryland. The pqact line above writes them to disk, and read_ldm reads and decompresses the data, writing output to resp.0, which is where the mrpg suite looks for its data. (Note that orpg_client puts its data in the same place, as you would expect.)
When the radar changes VCPs, the RPS list sent is based on /data/fxa/radar/lists/KFTG.[storm|clear-air].VCPxx. This information is merged with the appropriate national list, and the resulting list is stored in /data/fxa/radar/lists/KFTG.current. A user can edit this list, or create a list from scratch, and save it in /data/fxa/rps-lists. KXXX.current is recalled and sent out whenever a "Connection Up" message is received from the radar, or whenever a GSM comes in. If the mode, as specified in the GSM, has changed, the appropriate RPS list gets sent out and is saved in KXXX.current.
The national RPS lists, those containing required products and whose contents are merged with user requests (in RadarServer, module WanRpsManager.C), are found in $FXA_NATL_CONFIG_DATA/nationalData/. There are six of these, for clear air and storm modes, and for X.25 (wfoApi) and TCP (ORPGCommsMgr - LAN) connections, and associated radars. Names are rps-RPGOP.clear-air, rps-RPGOP.storm, rps-RPGOP-tcp.clear-air, rps-RPGOP-tcp.storm, rps-assoc.clear-air, and rps-assoc.storm. The choice of which national list to use is a bit arcane. The essential source of information is portInfo.txt, which includes a max number of products value. If no LLL-portInfo.txt is supplied (most sites have one), this defaults to 65 via localization. If the max prods is greater than 50 (a value set in Radar.H, applied in RadarStatus.C, and used in WanRpsManager::getList), then the -tcp version of the national list is used. Otherwise, the standard version is used.
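The selection rule reads more easily as code. This Python sketch covers only the RPGOP lists and the max-products threshold described above; it says nothing about when the rps-assoc lists are chosen, and the function and constant names are invented rather than taken from WanRpsManager.C.

```python
TCP_THRESHOLD = 50         # the value set in Radar.H, per the text above
DEFAULT_MAX_PRODUCTS = 65  # localization default when no LLL-portInfo.txt exists


def national_rps_list(mode, max_products=None):
    """Name of the national RPS list for a dedicated radar.

    mode is "clear-air" or "storm"; max_products is the value from
    portInfo.txt, or None if the site supplies no override.
    """
    if mode not in ("clear-air", "storm"):
        raise ValueError("unknown mode: %r" % mode)
    if max_products is None:
        max_products = DEFAULT_MAX_PRODUCTS
    suffix = "-tcp" if max_products > TCP_THRESHOLD else ""
    return "rps-RPGOP%s.%s" % (suffix, mode)


print(national_rps_list("storm"))          # default of 65 selects the -tcp list
print(national_rps_list("clear-air", 50))  # 50 is not greater than 50
```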
Data for all radars are available on the SBN. Products are stored for the RPGs listed in dx2's localizationDataSets/xxx/radarsInUse.txt. There are 21 products sent on the SBN: CZ, STP, SRM 0.5, VIL, V 0.5, Z 0.5 res 1, DHR, DSP (SDUS5); DPA (SDUS8); GSM (NXUS6); SRM 1.5, SRM 2.4, Z 1.5, Z 2.4 (SDUS2); OHP, SRM 3.4, VWP, Z 3.4 (SDUS3); ET, V1.5, Z 0.5 res2 (SDUS7) [see the NWS 88D list and 88D/TDWR list [PDF] for more info]. DHR and DSP were added site by site with the installation of AWIPS OB8.2.
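As a cross-check on the count, the SBN product list above can be written out by WMO header in Python. The product labels are copied as they appear in the text; the dictionary and function names are invented for the example.

```python
# SBN radar products grouped by WMO header, as listed above.
SBN_RADAR_PRODUCTS = {
    "SDUS5": ["CZ", "STP", "SRM 0.5", "VIL", "V 0.5", "Z 0.5 res 1",
              "DHR", "DSP"],
    "SDUS8": ["DPA"],
    "NXUS6": ["GSM"],
    "SDUS2": ["SRM 1.5", "SRM 2.4", "Z 1.5", "Z 2.4"],
    "SDUS3": ["OHP", "SRM 3.4", "VWP", "Z 3.4"],
    "SDUS7": ["ET", "V1.5", "Z 0.5 res2"],
}


def header_for(product):
    """Return the WMO header under which a product arrives, or None."""
    for header, products in SBN_RADAR_PRODUCTS.items():
        if product in products:
            return header
    return None


total = sum(len(products) for products in SBN_RADAR_PRODUCTS.values())
print(total, header_for("VWP"))
```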
rpc.ldmd -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ld
(1 identically-named child)
pqact
pqexpire
LDM files are in /usr/local/ldm. The admin process is bin/ldmadmin.in. LDM ingest is managed with ldmadmin.in start and ldmadmin.in stop. The control file is etc/ldmd.conf. By default, no log is written; in order to do so, a line like this must be in /etc/syslog.conf:
local0.debug /usr/local/ldm/logs/ldmd.log
Also, for logging and 'ldmadmin newlog' to work, /var/run/syslogd.pid must be world readable and /usr/local/ldm/bin/hupsyslog must have the suid bit set (chmod u+s):
---s--x--x 1 root 6100 20534 Jan 16 1997 /usr/local/ldm/bin/hupsyslog*
If you can't get LDM working, check to see if syslogd is running. If it's not, as root, run /usr/sbin/syslogd -D, then try starting LDM again. Another likely possibility is a bad pattern in a request line. Unfortunately, I don't know the rules. From experience, I can tell you that "*Graphic.*" is not a winner. I've made other mistakes, but don't recall what. A less likely possibility is an open socket. With LDM shut down, enter rpcinfo -p. If you see one or more lines beginning 300029 at the bottom, type sudo rpcinfo -d 300029 5 to remove this open socket, then startLdm again.
Another possibility is that portmap isn't running. You'll know this if rpcinfo returns an error message. Run sudo /usr/sbin/portmap, and you'll probably be in business.
etc/pqact.conf includes this line to move the files to where they need to go:
FSL3 ^FSL\.CompressedNetCDF\.(.*) FILE -close /data/Incoming/\1
Note that it is very important that the fields are tab-, not space-, delimited! This puts the files in /data/Incoming, whence LDAD will pull them over to dx1. Once there, an entry in LDADinfo.txt calls a simple script that moves the files to /data/fxa/nowrad/nowradZ.
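Since tabs and spaces look alike in most editors, a quick check can save some head-scratching. This Python sketch is a rough test only: it assumes a usable entry has at least three tab-separated fields (feed type, pattern, action), and the function name is invented. It is not a full pqact.conf parser.

```python
# The FSL3 entry from above, written with explicit tab separators.
GOOD = "FSL3\t^FSL\\.CompressedNetCDF\\.(.*)\tFILE\t-close\t/data/Incoming/\\1"


def looks_tab_delimited(line):
    """True if the entry has feed type, pattern, and action split by tabs."""
    fields = line.rstrip("\n").split("\t")
    return len(fields) >= 3 and all(field.strip() for field in fields[:3])


print(looks_tab_delimited(GOOD))                     # tabs in place
print(looks_tab_delimited(GOOD.replace("\t", " ")))  # spaces slipped in
```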
In order for the NOWrad files to be processed from there, a line in the dx1 fxa crontab must be uncommented. This is the first one under "Denver/Boulder-specific items" in ingest.crontab.dx1, below. The second line there should be uncommented, too.
The LDMs on ls1-fslc and dx1-fsld receive NOWrad data from prism.fsl.noaa.gov. The same dx1 cron items need to be enabled as mentioned above. Also on dx1-fsld, we pull in MADIS data from eldm. On both dx1-fsld and dx1-alps, we use LDM to get MODIS imagery from NWS CRH. (These are processed on px1 via a script controlled by SITEpx1cron.)
To support this, LDM needs to run on ls1-fslc, dx1-alps, and dx1-fsld. (Note that the new LDAD boxes on fslc are called ls2 and ls3, but are accessed by logging in to ls1.)
| | ls1-fslc | dx1-alps | dx1-fsld |
|---|---|---|---|
| Request | FSL3 (NOWrad) from prism.fsl.noaa.gov and MADIS files of various sorts from eldm.fsl.noaa.gov | EXP (MODIS) from ldm.crh.noaa.gov | FSL3 (NOWrad) from prism.fsl.noaa.gov, MADIS files of various sorts from eldm.fsl.noaa.gov, and EXP (MODIS) from ldm.crh.noaa.gov |
| Send | ECMWF files to borg.fsl.noaa.gov | | |
As outlined in the Radar section, we also run LDM on cx-alps, dx5-alps, and ax-avs to pull in local radar.
Now for a few words about NOWrad. We bring NOWrad files in from the Central Facility using LDM, as noted here, and they're processed on dx1. We had an occasion where our LDM was down for a while, but were able to recover the files. Copy files from /public/data/radar/fsl-conus/nowrad/netcdf/ to /data/fxa/nowrad/nowradZ, compress them (compress *), and do a batch rename in csh:
foreach filen (07*)
mv $filen ConusNowrad.${filen}
end
Then run the conversion script that is in SITEdx1cron:
(cd ${FXA_HOME}/xfer/nowrad; ./xferNowrad_v3.com ${FXA_HOME}/xfer/nowrad)
And by the way, all xferNowrad does is reformat the netCDF files received from /public into D2D-compatible form - it's a CDL change.
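For anyone more at home in sh than csh, the batch rename step can be sketched as a small function. This is only an illustration: the name prefix_nowrad is hypothetical, and the glob (07* in the example above) depends on the dates of the files being recovered.

```shell
#!/bin/sh
# Hypothetical helper: prefix every regular file matching a glob in a
# directory with "ConusNowrad.", skipping files that already have it.
prefix_nowrad() {
    # $1 = directory, $2 = shell glob for the files to rename (e.g. '07*')
    dir=$1
    pattern=$2
    for path in "$dir"/$pattern; do
        [ -f "$path" ] || continue      # glob matched nothing, or not a regular file
        name=${path##*/}
        case $name in
            ConusNowrad.*) continue ;;  # already renamed
        esac
        mv -- "$path" "$dir/ConusNowrad.$name"
    done
}
```

Usage would look like prefix_nowrad /data/fxa/nowrad/nowradZ '07*', with the glob quoted so that the function, not the calling shell, expands it.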
The text database system is also managed separately from the general startIngest and stopIngest. Text products are stored in PostgreSQL databases.
The database runs under user postgres. A number of processes are normally running on dx1, which you can see in a proc post listing. The main process consumes the most CPU time, with the postgres: writer using a fair amount, as well.
The main thing to do for postgres is to check the log (/var/log/postgres). Database problems should be referred to NCF for resolution.
The workstation uses three processes to communicate with the text database, to wit:
$FXA_HOME/bin/TextDB_Server -Write
$FXA_HOME/bin/TextDB_Server -Read
$FXA_HOME/bin/textdb
The first two of these are started and stopped by the startTextDB.dx1 and stopTextDB.dx1 scripts. Another script, stopTextNotification, will stop the textNotificationServer (it's started, if necessary, by startTextDB.px1). It's generally left running. textdb runs as needed to read/write the database.
Since it's not safe simply to kill the write server, as it may be in the middle of a transaction and the text database could get corrupted, stopTextDB issues a KILLSERV command to the text database to let it down gracefully.
If stopTextDB/startTextDB does not clear up text storage/retrieval problems, try restarting PostgreSQL:
1. Run stopTextDB.dx1.
2. Run /etc/init.d/postgresql stop, then /etc/init.d/postgresql start.
3. Run /etc/init.d/postgresql status to check that the server is up.
4. Run startTextDB.dx1.
Note: When the database is down while the data ingest is running, text messages will queue up inside the TextDB DataController process. Once the database is back up and accepting messages, this queue will be processed. It may take a long time to catch up, however. (To see what's being processed, tail the CollDecoder or StdDecoder logs.) If it's necessary to empty the queue (due to excessive length), use the "CollDB, StdDB" section of startIngest.dx1 to restart the DataController (most easily done by using X to copy the lines out of ~fxa/bin/startIngest.dx1).
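The PostgreSQL restart sequence above can be sketched as a script for rehearsal. This is a sketch under assumptions: the run wrapper, the DRY_RUN variable, and the function name are hypothetical, the .dx1 script variants named earlier are assumed to be on the PATH, and the init script is assumed to need root. By default it only prints what it would do.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command unless DRY_RUN is set to 0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop the text servers, bounce PostgreSQL, check it, restart the servers.
# Each step runs only if the previous one succeeded.
restart_textdb_postgres() {
    run stopTextDB.dx1 &&
    run /etc/init.d/postgresql stop &&
    run /etc/init.d/postgresql start &&
    run /etc/init.d/postgresql status &&
    run startTextDB.dx1
}
```

Calling restart_textdb_postgres with DRY_RUN unset lists the five steps in order; setting DRY_RUN=0 would actually execute them, which should only be done by someone prepared to refer database problems to NCF.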
Periodically, it's a good idea to do some database cleanup. There are three vacuumdb runs daily, but those don't take care of all of the space cleanup. Wayne Martin recommends a monthly full cleanup, to wit:
stopTextDB.dx1
/usr/bin/vacuumdb -v --analyze --full -d fxatext
startTextDB.dx1
There are also some scripts in /home/awipsadm/scripts to check on and maintain the text database. One handy one is purge-by-time.sh, which you can use if you have some old stuff hanging around. Another is fxatext-delete.ksh to completely remove an obsolete NNN.
01 0,4,8,12,16,20 * * * /awips/hydroapps/whfs/standard/bin/CleanWFO
27 7 * * * /awips/hydroapps/whfs/standard/bin/run_db_cleanup
37 7 * * * /awips/hydroapps/whfs/standard/bin/run_db_tuneup
15 * * * * /awips/hydroapps/whfs/standard/bin/run_precip_accum
And as fxa:
3,8,13,18,23,28,33,38,43,48,53,58 * * * * csh -c '${FXA_HOME}/bin/moveProds.ksh /data/fxa/ispan/text/hydro/ /awips/hydroapps/whfs/local/data/shef_input/'
Decoder logs are found in /awips/hydroapps/whfs/local/data/log/shef/decoder.
Messages are passed between processes using TCP sockets. The software runs essentially flawlessly and requires no maintenance.
Many fxa activities are managed by cron. For workstations, the currently-running cron lists are found in /var/spool/cron/crontabs/<username>. Servers are set up as heartbeat pairs - dx1/dx2, dx3/dx4, and px1/px2. Crons are managed by the heartbeat system. The crontabs are all owned by root, with all component items merged into one host-specific file, in /etc/ha.d/cron.d. When the package is activated, the appropriate crontab is placed in /etc/cron.d. Information on these files is shown below.
| host | cron.d file | our tree file |
|---|---|---|
| dx1 | dx1cron | ingest.crontab.dx1 |
| dx2 | dx2cron | ingest.crontab.dx2 |
| dx3 | dx3cron | ingest.crontab.dx3 |
| dx4 | dx4cron | ingest.crontab.dx4 |
| px1 | px1cron | ingest.crontab.px1 |
| px2 | px2cron | ingest.crontab.px2 |
Local crontab additions may be put in files SITE<host>cron, which are also kept in /etc/ha.d/cron.d and /etc/cron.d. For manual update, modify the file(s) in /etc/cron.d. Note that duplicate copies of both <host>cron and SITE<host>cron need to be kept on both hosts in a pair, in /etc/ha.d/cron.d so they'll be available during failover.
The fxa lists are shown here:
(Can you say "outdated"?)
ingest.crontab.ds1
# Crontab file for starting transient data ingest processes.
# This file, ingest.crontab.ds1, contains the items that run on the primary
# data server. It is to be installed as
# ds1:/etc/cmcluster/crons/fxa/ds1.dsswap
# ds2:/etc/cmcluster/crons/fxa/ds2.dsswap
# under root ownership.
# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be include in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.
#-------------------------------------------------------------------------------
# Break ingest log and announcer files daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'
0 0 * * * csh -c '${FXA_HOME}/bin/breakAnnouncementFiles >&! ${LOG_DIR}/breakAnnouncementFiles.log'
# Run scour daily to clean up log files and a few items not hit by master.purge.
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'
# Radar ingest
2,6,11,14,18,22,26,31,35,39,43,47,51,56 * * * * csh -c '${FXA_HOME}/bin/restartRadar' > /dev/null 2>&1
# Process monitor/CPU monitor start-up
36 * * * * csh -c '${FXA_HOME}/bin/DS_startProcMon.sh'
37 * * * * /awips/fxa/bin/startCtrlCpu.sh
# Break ctrlCpu log daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogCtrlCpu >&! ${LOG_DIR}/breakLogCtrlCpu.log'
# Data Monitor Scripts
0,10,20,30,40,50 * * * * /awips/fxa/bin/diskUsage.pl -c /awips/fxa/data/disk.cfg -o diskUsage_data.html
# Data archiving and archive purging
55 * * * * csh -c '${FXA_HOME}/bin/legalArchiver.sh'
# Get the RUC model data for the tstorm decoder (MDL)
20,40 0,3,6,9,12,15,18,21 * * * csh -c '${FXA_HOME}/bin/getModelData >& ${LOG_DIR}/getModelData.log'
# Scheduled radar distribution
25,55 * * * * csh -c '${FXA_HOME}/bin/startRadarDist.pl RCM >& /dev/null'
17,34 * * * * csh -c '${FXA_HOME}/bin/startRadarDist.pl THP >& /dev/null'
# Scheduled radar requests to the RadarServer
#23,53 * * * * csh -c '${FXA_HOME}/bin/sendOTR.sh 74 >& /dev/null'
#15 * * * * csh -c '${FXA_HOME}/bin/sendOTR.sh 79 >& /dev/null'
#35 * * * * csh -c '${FXA_HOME}/bin/waitUpTo.pl 600 >& /dev/null' ; csh -c '${FXA_HOME}/bin/sendOTR.sh 136 >& /dev/null'
#5 0,8,16 * * * csh -c '${FXA_HOME}/bin/waitUpTo.pl 600 >& /dev/null' ; csh -c '${FXA_HOME}/bin/sendOTR.sh 152 >& /dev/null'
#1 0,6,12,18 * * * csh -c '${FXA_HOME}/bin/sendOTR.cfc.sh >& /dev/null'
# send radar precipitation bias table data to ORPG via the RadarServer
#26,46 * * * * csh -c '${FXA_HOME}/bin/sendEnvData.pl'
# Watch to make sure nwrTrans.pl has not died, and restart if has
* * * * * /awips/fxa/bin/nwrWatchDog.sh > /dev/null 2>&1
ingest.crontab.dx1
# Crontab file for starting dx1apps data ingest processes for fxa.
#
# MODIFICATION HISTORY:
# ---------------------------------------------------------------------------
# NAME DATE CHANGES
# M. Huang 05/26/05 - Moved NWWSKeepAliveMsg to DX (DR_16193)
# M. Huang 05/27/05 - Moved mhs-data.purge into DX (DR_16194)
#-----------------------------------------------------------------------------
# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be include in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.
# INGEST SCRIPTS
# ACARS profiles
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startAcarsProfiles.sh >&! ${LOG_DIR}/acarsProfiles.log'
# Scheduled radar requests to the RadarServer
# RCM
23,53 * * * * csh -c '${FXA_HOME}/bin/sendOTR.sh 74 >& /dev/null'
# THP
15 * * * * csh -c '${FXA_HOME}/bin/sendOTR.sh 79 >& /dev/null'
# SO
35 * * * * csh -c '${FXA_HOME}/bin/waitUpTo.pl 600 >& /dev/null' ; csh -c '${FXA_HOME}/bin/sendOTR.sh 136 >& /dev/null'
# RSS
5 0,8,16 * * * csh -c '${FXA_HOME}/bin/waitUpTo.pl 600 >& /dev/null' ; csh -c '${FXA_HOME}/bin/sendOTR.sh 152 >& /dev/null'
# CFC
1 0,6,12,18 * * * csh -c '${FXA_HOME}/bin/sendOTR.cfc.sh >& /dev/null'
# NWWSKeepAliveMsg - test uplink status
13,28,43,58 * * * * csh -c '${FXA_HOME}/bin/NWWSKeepAliveMsg >& ${LOG_DIR}/nwwsKeepAlive.log'
# send radar precipitation bias table data to ORPG via the RadarServer
26,46 * * * * csh -c '${FXA_HOME}/bin/sendEnvData.pl'
# MONITOR SCRIPTS
# Process Monitor start-up
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/DX_startProcMon.sh'
# Disk usage monitor
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/diskUsage.pl -c ${FXA_HOME}/data/disk.cfg -o ${FXA_HOME}/data/diskUsage_data.html'
# CtrlCpu (CPU monitor) start-up
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startCtrlCpu.sh'
# Purge MHS data once per day.
20 1 * * * csh -c '${FXA_HOME}/bin/mhs-data.purge'
# CLEAN-UP ITEMS
# Run scour daily to clean up log files and a few items not hit by other purgers
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'
# Break ingest log files daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'
# Break ctrlCpu log daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogCtrlCpu >&! ${LOG_DIR}/breakLogCtrlCpu.log'
# Restart Grib2Decoder every week for memory leak workaround
41 0 * * 0 csh -c '${FXA_HOME}/bin/RestartGribSatDecoders.sh >& /data/logs/fxa/RestartGribSatDecoders.log'
ingest.crontab.dx2
# Crontab file for starting dx2apps data ingest processes for fxa.
# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be include in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.
# Break ingest log files daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'
# PURGER/SCOUR...
# Run scour daily to clean up log files and a few items not hit by master.purge.
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'
# MONITOR SCRIPTS
# Process Monitor start-up script
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/DX_startProcMon.sh'
# CtrlCpu Monitor start-up script (CPU monitor)
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startCtrlCpu.sh'
# Break ctrlCpu log daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogCtrlCpu >&! ${LOG_DIR}/breakLogCtrlCpu.log'
ingest.crontab.px1
# Crontab file for starting px1apps data ingest processes for fxa.
# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be include in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.
# Break ingest log and announcer files daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'
# PURGER/SCOUR...
# Keep purgeProcess running
*/10 * * * * csh -c '${FXA_HOME}/bin/startPurgeProcess >& /dev/null'
# Run the master purger twice hourly, to pare data back to necessary levels.
15,45 * * * * csh -c '${FXA_HOME}/bin/master.purge >&! ${LOG_DIR}/master.purge.log'
# Run the radar purger every hour
#30 * * * * csh -c '${FXA_HOME}/bin/fxa-radar.purge >&! ${LOG_DIR}/fxa-radar.purge'
# Run scour daily to clean up log files and a few items not hit by master.purge.
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'
# MONITOR SCRIPTS
# Process monitor/summary monitor/LDAD monitor/CPU monitor start-up
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/PX_startProcMon.sh >&! ${LOG_DIR}/procmon.log'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startProcSum.sh'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startLdadMon.sh'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startCtrlCpu.sh'
# Break ctrlCpu log daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogCtrlCpu >&! ${LOG_DIR}/breakLogCtrlCpu.log'
# Data Monitor scripts
4,14,24,34,44,54 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/grid.cfg -o ${FXA_HOME}/data/grid_data.html -h "Grid Data"'
5,15,25,35,45,55 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/graphic.cfg -o ${FXA_HOME}/data/graphic_data.html -h "Redbook Graphics Products"'
6,16,26,36,46,56 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/radar.cfg -o ${FXA_HOME}/data/radar_data.html -h "Radar Data"'
7,17,27,37,47,57 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/point.cfg -o ${FXA_HOME}/data/point_data.html -h "Point Data"'
8,18,28,38,48,58 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/sat.cfg -o ${FXA_HOME}/data/sat_data.html -h "Satellite Data"'
9,19,29,39,49,59 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/local.cfg -o ${FXA_HOME}/data/local_data.html -h "Local Data"'
# Disk Usage Monitor
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/diskUsage.pl -c ${FXA_HOME}/data/disk.cfg -o ${FXA_HOME}/data/diskUsage_data.html'
# Data Monitor summary page
3,13,23,33,43,53 * * * * csh -c '${FXA_HOME}/bin/monitorSummary.pl'
# Climate (MDL)
# morning climate
25 12 * * * csh -c '/awips/adapt/climate/bin/Linux/climate.sh auto am>& /dev/null'
# evening climate
25 22 * * * csh -c '/awips/adapt/climate/bin/Linux/climate.sh auto pm>& /dev/null'
# Run Mtr_scd_dvr at hh:07. Controls the MTR decoder feeding Climate (MDL).
7 * * * * csh -c '/awips/adapt/verification/bin/Linux/launch_AEV.csh Mtr_scd_dvr >&! ${LOG_DIR}/Mtr_scd_drv.log'
# Purge the MTR decoder tables about once a week (MDL).
40 3 1,8,15,22 * * csh -c '/awips/adapt/verification/bin/Linux/clean_FSS_tables.sh >&! /dev/null'
# HWR crons (MDL)
10 * * * * csh -c '/awips/adapt/hwr/bin/hwrnwr -t >&! ${LOG_DIR}/hwrnwr.log'
10 * * * * csh -c '/awips/adapt/hwr/bin/hwrnwws -t >&! ${LOG_DIR}/hwrnwws.log'
# MSAS - The MAPS/RUC Surface Assimilation System #
# ----------------------------------------------- #
# In PVCS at ldad/src/MSAS/WFOA_scripts/WFOA_MSAS_cron_file
# Ingest the NCEP surface grids every 6 hours
# Programs = sfcnmc
# Valid Times = 00Z 06Z 12Z 18Z
# Runtime Z = 05:37, 11:37, 17:37, 23:37
37 5,11,17,23 * * * /bin/csh -c '${FXA_HOME}/ldad/MSAS/WFOA_MSAS_Sfcnmc.run >&! ${FXA_HOME}/ldad/MSAS/logs/sfcnmclog &' > /dev/null 2>&1
# Run the surface cycle every hour at 18 minutes after the hour.
# Programs = sfcing sfchqc sfcanl sfcncdf sfcver srcplot
18 * * * * /bin/csh -c '${FXA_HOME}/ldad/MSAS/WFOA_MSAS_Surface.run >&! ${FXA_HOME}/ldad/MSAS/logs/sfclog &' > /dev/null 2>&1
# Compile the surface QC stats at the end of the day
# Programs = asos
# Valid Times = 00Z
# Runtime Z = 23:53
53 23 * * * /bin/csh -c '${FXA_HOME}/ldad/MSAS/WFOA_MSAS_Asos.run >&! ${FXA_HOME}/ldad/MSAS/logs/asoslog &' > /dev/null 2>&1
# QCMS processing
#################
# Run the stage 1 & 2 QC on current hour's data
3,8,13,18,23,28,33,38,43,48,53,58 * * * * /bin/csh -c '${FXA_HOME}/ldad/MSAS/WFOA_MSAS_QCstage1_2.run >&! ${FXA_HOME}/ldad/MSAS/logs/qcstg1_2log &' > /dev/null 2>&1
# Run the stage 1 & 2 QC on previous hour's data
3,8,13,18,23,28,33,38,43,48,53,58 * * * * /bin/csh -c '${FXA_HOME}/ldad/MSAS/WFOA_MSAS_QCstage1_2_late.run >&! ${FXA_HOME}/ldad/MSAS/logs/qclatelog &' > /dev/null 2>&1
# Get yesterday's QC stage 1, 2 & 3 daily summaries
35 0 * * * /bin/csh -c '${FXA_HOME}/ldad/MSAS/WFOA_MSAS_QCday.run >&! ${FXA_HOME}/ldad/MSAS/logs/qcdaylog &' > /dev/null 2>&1
# LAPS #
# ---- #
20 * * * * /usr/local/perl/bin/perl /awips/laps/etc/sched.pl /awips/laps /data/fxa/laps
03,19,34,49 * * * * /usr/local/perl/bin/perl /awips/laps/etc/LapsRadar.pl /awips/laps /data/fxa/laps
08,14,23,29,38,45,53,59 * * * * /usr/local/perl/bin/perl /awips/laps/etc/laps_driver.pl lvd_sat_ingest.exe /awips/laps /data/fxa/laps
ingest.crontab.px2
# Crontab file for starting px2apps data ingest processes for fxa.
# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be include in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.
# Run scour daily to clean up log files and other leftovers
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'
# MONITOR SCRIPTS
# CPU monitor start-up
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startCtrlCpu.sh'
# Break ctrlCpu log daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogCtrlCpu >&! ${LOG_DIR}/breakLogCtrlCpu.log'
# Disk Usage Monitor Script
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/diskUsage.pl -c ${FXA_HOME}/data/disk.cfg -o ${FXA_HOME}/data/diskUsage_data.html'
The main purger is purgeProcess, running on px1. See the localization write-up (OB6 version).
Legacy purgers are still run by cron, as noted in the previous section. The first, master.purge, runs twice an hour on px1. It in turn runs ~fxa/bin/fxa-data.purge, plus an optional, site-supplied, ~fxa/bin/fxa-data-addons.purge. The second, startScour, runs daily at 0030Z on each server. It starts ~fxa/bin/scour, which reads ~fxa/data/scour.conf.[ds|dx|px] for the list of directories to clear out. Finally, mhs-data.purge runs daily to clean out files in $MHS_DATA and /data/x400 areas. Logs for these processes are $LOG_DIR/master.purge.log and $LOG_DIR/startScour.log (no log for the MHS purger). Each log is overwritten on every run.
The data monitor comprises a series of perl scripts that run via cron on px1. These scripts build HTML pages that are then copied to $SERVER_DIRECTORY/dataMon/html/, where SERVER_DIRECTORY is defined in ~fxa/data/dataMon.cfg. (The files are also retrieved by an http process on www-sdd (quasar), for use in the summary monitor.) Cron entries are as shown above.
The ingest process monitor is started via cron on each server, also as shown above. The XX_startProcMon.sh script starts ~fxa/bin/ingProcMon.pl, which checks processes in /data/fxa/data/fxa_monitor/monitorProcesses.txt, and builds an HTML file (XXX_ingestProcMon.html, in the same directory) showing what's up and down. These are copied to $FXA_WWW_SERVER_HOST:$SERVER_DIRECTORY/dataMon/, where SERVER_DIRECTORY is defined in ~fxa/data/dataMon.cfg. (On quasar, the document root is defined in /opt/apache/httpd.conf as /opt/apache/lib/htdocs; this is SERVER_DIRECTORY.)
Included at the bottom of the process monitor Web page is a link to bring up a restart menu, pointing to /awips/fxa/htdocs/cgi-bin/restart-setup.sh. (Note that the link just says /cgi-bin..., which one would think points to /awips/fxa/htdocs/dataMon/cgi-bin, since .../dataMon is the document root. However, an alias for /cgi-bin is set in the Web server configuration to point to ~fxa/htdocs/cgi-bin.) This runs ~fxa/bin/restart-ingest.sh on as1, which in turn runs ~fxa/bin/restart-ingest-display.tcl. That finally runs ~fxa/bin/restart-ingest.tcl, which puts up a menu and takes action based on the user's selection. Except for radar, this tcl script runs restartIngest.pl, using information from $FXA_DATA/data/fxa-monitor/monitorProcesses.txt to decide what to do. Specific radar actions depend on the contents of ~fxa/data/localizationDataSets/<siteID>/portInfo.txt on as1, but /awips/fxa/bin/stopRadarProc.pl and icpReset[01] are used. A write-up of the process is in the header block of ~fxa/bin/restart-ingest.tcl.
Procedures are stored in $FXA_DATA/scripts/<username>. Each procedure is in a file, and consists of a list of commands. The usernames are found in ~fxa/data/fxa-users.
Each text Xterm is hosted by its associated workstation. Text "stuff" is stored in $FXA_DATA/textWSwork/xtn-bou:0. Subdirectories include saved (copies of all products that have been created on this station), journals (in-progress editing, saved for crash recovery), and archived (permanent copies of products sent out over the WAN). Also here is textAlarmAlertProducts.txt, the list of alarm/alert products specific to this workstation. (Site-wide products are in ~fxa/data/textAlarmAlertProducts.txt.)
Log files are in $LOG_DIR/display/xtn-bou:0/yymmdd/textWish<pid>. Logs exist for the text windows, but not the parent textWS.tcl process.
If an Xterm gets mis-configured, the title window will come up, but the individual text windows will not. (You'll get a tcl error when you try to start one.) Press F12 on the keyboard for a second or two, then select Server. Press the Access Control button (middle button in second panel) `on' and click OK (upper right). Answer OK in the dialog box, wait for the reset, log in, and you should be ready to roll.
LAPS (analysis) runs on as2, hourly by cron. In Build 4.3, LAPS is moved onto the new fxa_local partition (will be a separate disk in 5.0). For now, this link is critical to successful LAPS runs:
lrwxr-xr-x 1 fxa fxalpha /data/fxa/laps@ -> /data/fxa_local/laps
As noted earlier, four LAPS processes run via cron:
20 * * * * /usr/local/bin/perl /awips/laps/etc/sched.pl /awips/laps/ /awips/laps/data
03,19,34,49 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl vrc_driver.x /awips/laps /awips/laps/data
08,14,23,29,38,45,53,59 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lvd_sat_ingest.exe /awips/laps /awips/laps/data
22,30 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lsr_driver.exe /awips/laps /awips/laps/data
The entire LAPS ingest/analysis generally completes in approximately 5 minutes. Run times longer than 15 minutes or shorter than 2 minutes may indicate a problem. Run completion times are logged in runtime.log.
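The run-time rule of thumb can be written down as a small check. This is a sketch only: the function name laps_runtime_status is hypothetical, and because the layout of runtime.log isn't described here, the function simply takes a run duration in seconds rather than parsing the log.

```shell
#!/bin/sh
# Hypothetical helper: classify a LAPS run duration (in seconds) using the
# thresholds quoted in the text: over 15 minutes or under 2 minutes is suspect.
laps_runtime_status() {
    secs=$1
    if [ "$secs" -gt 900 ]; then
        echo "too long"
    elif [ "$secs" -lt 120 ]; then
        echo "too short"
    else
        echo "ok"
    fi
}
```

For example, laps_runtime_status 300 reports a typical 5-minute run as ok.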
LAPS localization is effected by
cd /awips/laps/etc
perl laps_localization > local.out
More information about LAPS run-time details is available in the LAPS README file, http://laps.noaa.gov/software/README.html.
LDAD runs partly on ds1 ("internal") and partly on ls1 ("external"). The internal part includes these processes:
/awips/ldad/bin/watchDogInternal.sh
/awips/ldad/bin/listener
/awips/ldad/bin/pollForData.pl
/awips/fxa/bin/CommsRouter LDAD_ROUTER
/awips/fxa/bin/DataController LDAD_ROUTER LdadController.config
/awips/fxa/bin/routerStoreText
/awips/fxa/bin/routerShefEncoder
/awips/fxa/bin/routerStoreNetcdf
/awips/fxa/bin/routerLdadDecoder
/awips/ldad/bin/LDADdecoder
The listener process gets data through the firewall, storing files in $FXA_DATA/LDAD/Raw. There is a listener log in /data/logs/ldad, but it's not at all easy to read. (I did on one occasion find a permissions problem writing the raw data by looking at the listener log.) You'll also see there a LDADdecoder.log file, which is the log of the current decoder. The watchDogInternal script checks every 30 seconds to see if the listener and decoder are running. Decoder logs are also written to the usual spot along with other ingest logs. Those files include PID in the name, so there are lots of 'em. (The LDADdecoder.log file includes time stamps on the messages, but those in $LOG_DIR/<date> do not.)
Sometimes, both decoder and listener are up, but no data are coming through. This suggests a problem on the external side. You can restart the whole LDAD system:
This procedure starts both internal and external processes, and may shake things loose.
The LDAD monitors run on as1 (summary and internal) and ls1 (external, acquisition, and dissemination). If one of the pages is more than 5 minutes old (the time limit is set in one of the config files), it won't be shown. Also, we've seen problems where the obj.conf file on ls1 or as1 had the wrong data root. It's in /etc/opt/ns-fasttrack/httpd-default.
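When a monitor page is missing from the display, it can be worth checking whether the page file itself has gone stale. A minimal sketch follows, with a hypothetical function name; it relies on find's -mmin test, which is a common extension (GNU and BSD find) rather than POSIX, and the location of the monitor pages is not given here, so the path is left to the caller.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a file's modification time is older than
# about N minutes, as judged by find's -mmin test (default N = 5).
page_is_stale() {
    # $1 = file to check, $2 = age limit in minutes
    [ -e "$1" ] || return 2    # missing file: neither fresh nor stale
    [ -n "$(find "$1" -prune -mmin +"${2:-5}")" ]
}
```

A call such as page_is_stale somepage.html && echo "page is stale" would flag a page that hasn't been rewritten recently.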
For the record, here are the steps needed to set passwords for LDAD admin access to the fsli Web server. I imagine that a quite similar procedure is used for others.
On 13 Oct 98, we received a call that all of the workstations at Denver had "locked up." What they were seeing was that displays could be zoomed and panned and the pop-up menus worked, but no menus could be used. Further, logging out of the workstation and then logging in and starting D2D resulted in the main pane only coming up. Investigation showed that only 2 IGCs were starting, and that the startup halted when trying to access the system announcer. (This was seen by adding "all all file all" to displayLogPref.) Further, we saw that rpc.lockd was using lots of CPU time on as1.
Darien tried all of her tricks, but we were unable to come up with anything that was causing the problem. The work-around was to run two workstations on ds1, using xhost + ds1-bou on the workstation and setenv DISPLAY wsn-bou:0.0 on the ds, then running ~fxa/bin/d2d. (We tried to do the same on as1 and as2, but in both cases, the startup hung as before.)
In the morning, Bob Ladd rebooted both as1 and as2, but the same problem surfaced, including rpc.lockd's CPU usage.
Finally, Bob found a page in his SMM that he'd extracted from the Build 3 SMM, which said to check ds1 for rpc.statd. Indeed it was down, and as soon as he started it (using '/sbin/init.d/nfs.server start' as root), everything was copacetic again. (The hung d2d starts proceeded to bring up the other IGCs at that point.) Evidently, rpc.lockd on the remote systems communicates with rpc.statd on the server to effect NFS transfers. If we'd rebooted ds1, the problem would have been solved, as well, since rpc.statd comes up as part of the boot process.
What caused rpc.statd to go down remains a mystery.
A similar event occurred 10 Dec 98 on fsli. In that case, the RaobBufrDecoder and profilerDecoder were not working, either.
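A quick way to check for this condition from another host is to look at the server's RPC registrations. The sketch below is an illustration with a hypothetical function name; it assumes the usual rpcinfo -p listing, in which the last column is the service name, rpc.statd appears as "status" (program 100024), and rpc.lockd as "nlockmgr" (program 100021).

```shell
#!/bin/sh
# Hypothetical helper: read `rpcinfo -p <host>` output on stdin and succeed
# if the named service appears in the last column of any line.
rpc_service_registered() {
    # $1 = service name, e.g. "status" for rpc.statd
    awk -v svc="$1" '$NF == svc { found = 1 } END { exit found ? 0 : 1 }'
}
```

For example: rpcinfo -p ds1 | rpc_service_registered status || echo "rpc.statd is not registered on ds1".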
We couldn't start D2D on lx5-fsli. It appeared to be getting going
OK, but then crashed with this error:
Error in startup script: invalid command name ".imageProp.colorTable1.menu"
Susan found this in the fxaWish log:
FAC-LOCK cannot open /awips/fxa/data/colorMaps.nc: permission denied
and the problem was that colorMaps.nc was not group writable. The
first error message was not very illuminating!
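Since the startup error gives no hint of the cause, checking the file's permissions directly is quicker than reading logs. A minimal sketch, with a hypothetical function name, that tests the group write bit from the ls -ld mode string:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file's group write bit is set.
# In a mode string such as "-rw-rw-r--", the sixth character is group write.
is_group_writable() {
    case $(ls -ld -- "$1" 2>/dev/null) in
        ?????w*) return 0 ;;
        *)       return 1 ;;
    esac
}
```

For example: is_group_writable /awips/fxa/data/colorMaps.nc || echo "colorMaps.nc is not group writable". If that is the problem, chmod g+w on the file (by its owner or root) sets the missing bit.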
Data are stored on a NAS device, on a volume known to the data ingest software as $FXA_DATA.
Use df (bdf on ds1) to check on disk space.
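To pick out nearly full filesystems from a long listing, the df output can be filtered. This sketch assumes the POSIX df -P layout (capacity in the fifth column, mount point in the sixth, no spaces in mount points); bdf output on ds1 may be laid out differently. The function name and the 90% default are arbitrary choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print the mount
# point and capacity of every filesystem at or above a usage threshold.
full_filesystems() {
    # $1 = threshold in percent (default 90)
    awk -v limit="${1:-90}" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)                      # strip the percent sign
        if (pct + 0 >= limit + 0) print $6, $5
    }'
}
```

For example: df -P | full_filesystems 95.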
Mike Graf wrote a tutorial on model grid WMO headers back in early 1999. Although the file is no longer available on the Web, much of the information can be found at http://www.nws.noaa.gov/tg/awips.html#ccc. There's also a nice summary of NCEP grid information available.