Primary support is from the NCF (301-713-9344). FSL staff are also "on call" (informally) to help with problems. WFO staff are authorized to call Joe Wakefield, Darien Davis, or Carl Bullock at home for help if the NCF can't resolve a problem. They also have Gregg Phillips' cell phone number.
Note for FSL folks: to get hold of WFO staff, use 303-494-4454 (this is the national coordination number, the one other WFOs use to call in). The Admin number there is 303-494-3210, and the external coordination number (media, etc.) is 303-494-4479. FSL operators are on duty daily from 7 a.m. to 10:30 p.m. and can be reached at 303-497-6887.
Our goal is to make the Denver office operate in an AWIPS-like fashion as far as system support is concerned. Of course, we almost always have something beyond the currently fielded AWIPS software running there, so the NCF is not going to be able to handle everything.
All D2D processes are found in ~fxa/bin, and data files (tables, menus, WarnGen templates, etc.) are in ~fxa/data. Most ingest logs are in $LOG_DIR/yymmdd (type "logs" to get there), with a few in $LOG_DIR; these are on local disks.
The log file for the user interface process is $LOG_DIR/display/<displayName>/<date>/fxaWish<pid> where <displayName> is :0.0 for the left display and :0.1 for the right display in a one-mouse configuration and :1.0 for the right display in a two-mouse configuration; <date> is the UTC date when the user interface process was started in YYMMDD format; and <pid> is the process ID.
The log files for the IGC processes, the application manager, the applications, the extensions, and all descendants (children, grandchildren, great-grandchildren, and so on) of the user interface process are in the directory $LOG_DIR/display/<displayName>/<date>/fxaWish<pid>.children and have the format <programName><pid>, where <programName> is the name of the executable.
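The path rules above can be sketched as a pair of Python helpers. This is an illustration only: the function names are hypothetical, LOG_DIR is passed in rather than read from the environment, and "/data/logs/fxa" in the usage line is a made-up value. The date directory is derived from the process start time in UTC, as described.

```python
import time
from pathlib import PurePosixPath


def ui_date_dir(start_epoch_seconds: float) -> str:
    """YYMMDD directory name: the UTC date the user interface process started."""
    return time.strftime("%y%m%d", time.gmtime(start_epoch_seconds))


def fxa_wish_log(log_dir: str, display_name: str, yymmdd: str, pid: int) -> PurePosixPath:
    """Log of the user interface process itself, e.g. display ':0.0' (left, one-mouse)."""
    return PurePosixPath(log_dir) / "display" / display_name / yymmdd / f"fxaWish{pid}"


def descendant_log(log_dir: str, display_name: str, yymmdd: str,
                   ui_pid: int, program_name: str, child_pid: int) -> PurePosixPath:
    """Log of any descendant (IGC process, application, extension, ...) of fxaWish<ui_pid>."""
    ui_log = fxa_wish_log(log_dir, display_name, yymmdd, ui_pid)
    return ui_log.parent / f"fxaWish{ui_pid}.children" / f"{program_name}{child_pid}"
```

For example, descendant_log("/data/logs/fxa", ":1.0", "000315", 4321, "someProgram", 4400) points at .../display/:1.0/000315/fxaWish4321.children/someProgram4400.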
Since $FXA_HOME/bin is in fxa's PATH, there's no need to include it when entering process commands, and that's reflected in the commands included in these instructions. All commands that you'll need to enter are shown in bold type. Except as noted, all will be run from the fxa account.
You can get to today's ingest log directory simply by typing logs,
and up will get you to its parent, where some logs live. The naming
convention for ingest logs is <processName><pid><hostname><hhmmss>.
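Because the ingest log name runs its four parts together, splitting it takes some guesswork. The sketch below assumes the process name contains no digits, the hostname starts with a non-digit, and the name ends in exactly six HHMMSS digits; these are assumptions, not documented rules, and the sample file name in the test is invented.

```python
import re
from typing import NamedTuple, Optional


class IngestLogName(NamedTuple):
    process: str
    pid: int
    host: str
    hhmmss: str


# Assumed layout: non-digit process name, then the pid, then a hostname that
# starts with a non-digit, then a six-digit HHMMSS stamp at the very end.
_INGEST_LOG = re.compile(r"^(?P<process>\D+)(?P<pid>\d+)(?P<host>\D.*)(?P<hhmmss>\d{6})$")


def parse_ingest_log_name(name: str) -> Optional[IngestLogName]:
    """Split <processName><pid><hostname><hhmmss>, or return None if it doesn't fit."""
    m = _INGEST_LOG.match(name)
    if m is None:
        return None
    return IngestLogName(m["process"], int(m["pid"]), m["host"], m["hhmmss"])
```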
You can run these scripts by hand. Should it be necessary to restart, use stopIngest and startIngest.[ds1|as1|as2].
Processes included in stop/startIngest, in the order they are started: (Note that the scripts use $FXA_HOME/bin, which resolves to /awips/fxa/bin. What's shown here is the text that appears in a ps listing.)
For ds1:
/awips/fxa/bin/MhsServer
/awips/fxa/bin/acqserver 1800
/awips/fxa/bin/acqserver 1800
/awips/fxa/bin/acqserver 1800
/awips/fxa/bin/acqserver 1800
/awips/fxa/bin/CommsRouter COMMS_ROUTER
/awips/fxa/bin/CommsRouter GRID_ROUTER
/awips/fxa/bin/DataController GRID_ROUTER GribController.config
/awips/fxa/bin/GribDecoder
/awips/fxa/bin/DataController COMMS_ROUTER SatelliteController.config
/awips/fxa/bin/Satdecoder
/awips/fxa/bin/RadarServer
/awips/fxa/bin/DialServer
/awips/fxa/bin/RMR_Server
/awips/fxa/bin/hmMonitorServer
/awips/fxa/bin/wwaServer
/awips/fxa/bin/wwaMonServer
/awips/fxa/bin/MhsRequestServer
/awips/fxa/bin/NWWSProduct
/awips/fxa/bin/caseArchiveServer
/awips/fxa/bin/ldadServer
For as1:
/awips/fxa/bin/DataController COMMS_ROUTER TextCont.config
/awips/fxa/bin/RaobBufrDecoder
/awips/fxa/bin/profilerDecoder
/awips/fxa/bin/MetarDecoder
/awips/fxa/bin/MaritimeDecoder
/awips/fxa/bin/DataController COMMS_ROUTER TextCont2.config
/awips/fxa/bin/binLightningDecoder
/awips/fxa/bin/shefEncoder
/awips/fxa/bin/RedbookStorage
/awips/fxa/bin/DataController COMMS_ROUTER RadarController.config
/awips/fxa/bin/RadarStorage
/awips/fxa/bin/RadarMsgHandler
/awips/fxa/bin/DataController COMMS_ROUTER tStormController.config
/awips/fxa/bin/tStormDecoder
/awips/fxa/bin/NWWSSchedule
For as2:
/awips/fxa/bin/DataController COMMS_ROUTER TextDB_Controller.config
/awips/fxa/bin/CollDB_Decoder
/awips/fxa/bin/StdDB_Decoder
/awips/fxa/bin/RadarTextDecoder
And processes in start/stopTextDB:
For ds1:
/awips/fxa/bin/TextDB_Server -Read
/awips/fxa/bin/TextDB_Server -Write
For as1:
/awips/fxa/bin/afoscommsrv
/awips/fxa/bin/textNotificationServer
For as2: (none)
The stop/start scripts handle the non-indented items in the list. Indented items are children spawned by the process listed immediately above.
Other persistent items started by cron:
on ds1:
./syncComms
./syncComms cs_config0 0
./wfoApi cs_config0 0
/awips/fxa/bin/ingProcMon.pl
/awips/fxa/bin/ctrlCpu
on as1:
/awips/fxa/bin/ctrlCpu ingProcMon.pl
/awips/fxa/bin/ingProcMon.pl -c AS1
/awips/fxa/bin/processSummary.pl
/awips/fxa/htdocs/ldadMon/bin/MakeSUMMpage
/awips/fxa/htdocs/ldadMon/bin/MakePROCpage
on as2:
/awips/fxa/bin/ctrlCpu ingProcMon.pl
/awips/fxa/bin/ingProcMon.pl -c AS2
Also started separately:
on as1:
/awips/fxa/bin/notificationServer
/awips/fxa/bin/asyncScheduler
Rarely, the GRIB decoder will hang on bad grids. You'll see this by the GribDecoder process using lots of CPU time for extended periods, and a check on the log will show nothing happening. Issue kill -10 <pid> to force a crash. The signal handler will remove the bad grid and the controller will start a new decoder.
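The "lots of CPU, quiet log" symptom can be expressed as a test over two samples taken some minutes apart. This is a hypothetical helper, not part of the FXA tools: the 80 percent CPU share and 10-minute quiet period are arbitrary thresholds, and collecting the CPU time (from ps) and the log's modification time is left to the caller. It deliberately stops short of sending the signal, because numeric signal meanings differ between Unix flavors; the kill -10 step stays manual.

```python
from dataclasses import dataclass


@dataclass
class DecoderSample:
    taken_at: float      # wall-clock time of the sample (epoch seconds)
    cpu_seconds: float   # cumulative CPU time of the GribDecoder process, e.g. from ps
    log_mtime: float     # last-modified time of its log file (epoch seconds)


def looks_hung(first: DecoderSample, second: DecoderSample,
               min_cpu_share: float = 0.8, min_quiet_seconds: float = 600.0) -> bool:
    """True if the decoder used most of a CPU between samples while its log stayed silent."""
    elapsed = second.taken_at - first.taken_at
    if elapsed <= 0:
        raise ValueError("second sample must be taken after the first")
    cpu_share = (second.cpu_seconds - first.cpu_seconds) / elapsed
    quiet_for = second.taken_at - second.log_mtime
    return cpu_share >= min_cpu_share and quiet_for >= min_quiet_seconds
```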
If you get a call that radar is not auto-updating, you'll probably need
to restart the notificationServer. When you use stopNotificationServer
to kill the server, it may take some time to write out its client list,
which is found in $FXA_DATA/workFiles/notificationServerClientListState.txt.
Make sure you give it a chance (check the log to see if it's heard the
signal 15) before using kill -9. Otherwise, when the server is restarted,
the workstations won't receive green time and auto-update messages until
they, in turn, are restarted. After it's stopped, use startNotificationServer
to get it running again. The textNotificationServer has a similar feature;
its client list is in $FXA_DATA/workFiles/textNotificationServerClientList.txt.
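The stop procedure above (send signal 15, let the server acknowledge it and save its client list, and only then consider kill -9) can be sketched as follows. Everything here is hypothetical: the caller supplies how to send a signal (os.kill would do), how to tell whether the process is still running, and how to tell from the log that signal 15 was heard, since the exact log wording isn't given here. Signals 15 (SIGTERM) and 9 (SIGKILL) have the same numbers on all common Unix systems.

```python
import time
from typing import Callable

SIGTERM, SIGKILL = 15, 9


def stop_gracefully(pid: int,
                    send_signal: Callable[[int, int], None],
                    is_running: Callable[[int], bool],
                    log_shows_sigterm: Callable[[], bool],
                    grace_seconds: float = 120.0,
                    poll_seconds: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Send signal 15, give the server time to save its client list, escalate only if it heard it."""
    send_signal(pid, SIGTERM)
    for _ in range(max(1, int(grace_seconds // poll_seconds))):
        if not is_running(pid):
            return "exited"
        sleep(poll_seconds)
    if not is_running(pid):
        return "exited"
    if not log_shows_sigterm():
        # No sign the server received signal 15; check the log by hand before kill -9.
        return "needs-attention"
    send_signal(pid, SIGKILL)
    return "killed"
```

After an "exited" result, startNotificationServer brings the server back; "needs-attention" is the case the text warns about, where a premature kill -9 would leave workstations without green time and auto-update messages.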
If SBN data (satellite, METARs, text, grids) are not arriving, check the CP operation, to see if it's hung. rlogin cpsbn1 for TG data, or cpsbn2 for NESDIS, as user root. (Note: if you need to log in at the console, you'll need to move the CP switch to the Monitor position.) Type acq_stats -k0 -k1 to run the acquisition monitor. If the line beginning "0 ds-bou" is not up to date (the other one will read
0 null 0 0 - n/a n/a 0 0 0 0), you'll need to restart. (If both CPs have stopped at the same time, it's likely that there's an uplink problem at the NCF, or there could be a downlink problem. Check with NCF (301-713-9344) before restarting.) Other problem indicators are lots of buffers or distribution headers in use. First, stop the system with acq_ctl -A -S -f. Type ps -xaf to see what processes are running. Kill any /awips/bin/acq* that's running, then start with start_cpsbn_acq. A lot of text will scroll by as the software is downloaded from ds1-bou. In many cases, you'll need to push Enter to get your prompt back. Monitor the system again with acq_stats -k0 -k1 -i10; you should see the TG line connect within a few seconds, though the NESDIS line may take several minutes (use ctrl-C to exit). Log out (exit) (and switch back to Modem if at the console). The child acqserver may go down when you stop the CP, then will come back as data are sent. Check for the old child (back on ds), and kill it if necessary. Otherwise, it will periodically log a PROBLEM: CP connection has timed out :header message, and also send notification to the NCF - something we'd rather not do.
If this doesn't work, have the forecasters check the Sync and Signal green lights on the demod. If these are out, have them contact the NCF for information. (This is unlikely, as the NCF monitors that portion of the system.) If the signal looks good, but you can't connect, you may need to reboot the CP. Log in and enter /etc/reboot. Ingest processes start automatically. (If you can't log in, you can press the reset button that's just above the CP's power switch at lower right. The system will reboot itself. Using the reboot command is preferred.)
If necessary, either CP can be configured to send both data streams to the server. Call the NCF and tell them which of the CPs has failed. They will perform the failover.
Should you restart the ingest and still receive no SBN data, check the acqserver processes (proc acq) on ds1. One child process handles the TG data, another NESDIS, and a third the encrypted radar data (in concert with as2). The first and third usually connect almost immediately, while the second may take a few minutes. If there are not four of them (the parent plus these three children), the system is not connecting to the SBN CPs. Check the acqserver logs, then log in to the appropriate CP (cpsbn1 for TG data and cpsbn2 for NESDIS), per the SBN section. (In a partial-failure situation you might see only text (and thus METAR) or only satellite data arriving, and maybe only two of the acqserver processes. If restarts fail, it will be necessary to fail over to a single CP.)
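A quick way to apply the "four acqserver processes" rule is to count them in a ps listing. This sketch assumes the listing shows the full command path, as in the process lists earlier; the function names are hypothetical and the ps text in the test is invented.

```python
EXPECTED_ACQSERVERS = 4  # the parent plus the TG, NESDIS, and encrypted-radar children
ACQSERVER_PATH = "/awips/fxa/bin/acqserver"


def count_acqservers(ps_text: str) -> int:
    """Count acqserver command lines in ps output, ignoring any grep for them."""
    return sum(1 for line in ps_text.splitlines()
               if ACQSERVER_PATH in line and "grep" not in line)


def acqserver_verdict(ps_text: str) -> str:
    n = count_acqservers(ps_text)
    if n == EXPECTED_ACQSERVERS:
        return "ok"
    if n < EXPECTED_ACQSERVERS:
        return (f"only {n} of {EXPECTED_ACQSERVERS} acqserver processes running: "
                "check the acqserver logs and the SBN CPs")
    return f"{n} acqserver processes running: look for an old child that should be killed"
```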
Over on as2, you'll find three root-owned acqserver processes, along with two uplink_sends and a dwb_nfe. (If these are missing, you'll see a "fail" notation in the acq_stats log.) To fix, go to as2 as root, and enter
cd /awips/ops/bin
. ./awips.profile
./start_sbn_decrypt
The diskless SBN CPs boot off of the data server. One side effect of this is that ingest log files are available on the data server disks, in directory /awips/hprt/logs/Products/cpsbnn-bou/acq_clntm_h0/mcProduct.log, where m is 0 for TG data and 1 for NESDIS. (From the cp, these are found in /awips/logs/Products/...) The system breaks these logs when they hit 1MB size (keeping a previous version called mcProduct.old), so there may not be a whole lot of history available (particularly for the TG side), but these can be useful in diagnosing missing data.
In acq_wmo_parms.sbn, we exclude certain datasets (chiefly, AK/HI/PR grids and satellite images). The "codes" used with those are not at all obvious. Here is a list provided 3/00 by Leroy Klet.
/* Table of T1 letters to codes */
T1=A PC=13 ASCII Analysis
T1=B PC=19 ASCII Admin Msg
T1=C PC=14 ASCII Climatic
T1=D PC=44 GRID
T1=E PC=51 Satellite Imagery
T1=F PC=15 ASCII Forecast
T1=G PC=45 GRID
T1=H PC=46 GRID
T1=I PC=31 BUFR Obs
T1=J PC=32 BUFR Forecast
T1=K PC=71 Unused
T1=L PC=72 Unused
T1=M PC=73 Unused
T1=N PC=16 ASCII Notices
T1=O PC=43 Grid
T1=P PC=10 Graphic
T1=Q PC=11 Graphic
T1=R PC=74 Unused
T1=S PC=17 ASCII Surface
T1=T PC=52 Satellite Imagery (same as GOES)
T1=U PC=61 Upper Air
T1=V PC=62 National Data
T1=W PC=18 ASCII Warnings
T1=X PC=47 GRID
T1=Y PC=41 GRID
T1=Z PC=42 GRID
This definition is in the AWIPS baseline under
../src/co/include/cp_product_code.h
(I don't find this file in our AWIPS tree.)
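For looking up exclusion codes, the table can be carried as a small Python mapping. The values are transcribed from the list above (T1 being the first letter of the WMO heading); the helper names are hypothetical, and nothing here reads or writes acq_wmo_parms.sbn itself, whose format isn't described here.

```python
from typing import Dict, List, Tuple

# T1 letter -> (product code, description), transcribed from the table above.
T1_CODES: Dict[str, Tuple[int, str]] = {
    "A": (13, "ASCII Analysis"), "B": (19, "ASCII Admin Msg"),
    "C": (14, "ASCII Climatic"), "D": (44, "GRID"),
    "E": (51, "Satellite Imagery"), "F": (15, "ASCII Forecast"),
    "G": (45, "GRID"), "H": (46, "GRID"),
    "I": (31, "BUFR Obs"), "J": (32, "BUFR Forecast"),
    "K": (71, "Unused"), "L": (72, "Unused"), "M": (73, "Unused"),
    "N": (16, "ASCII Notices"), "O": (43, "Grid"),
    "P": (10, "Graphic"), "Q": (11, "Graphic"), "R": (74, "Unused"),
    "S": (17, "ASCII Surface"), "T": (52, "Satellite Imagery (same as GOES)"),
    "U": (61, "Upper Air"), "V": (62, "National Data"),
    "W": (18, "ASCII Warnings"), "X": (47, "GRID"),
    "Y": (41, "GRID"), "Z": (42, "GRID"),
}


def product_code(t1: str) -> int:
    """Product code for a T1 letter (case-insensitive)."""
    return T1_CODES[t1.upper()][0]


def codes_for(kind: str) -> List[int]:
    """Sorted product codes whose description starts with `kind`, ignoring case."""
    kind = kind.upper()
    return sorted(code for code, desc in T1_CODES.values() if desc.upper().startswith(kind))
```

For instance, codes_for("GRID") gathers the seven grid codes, which is the set one would look for when checking the AK/HI/PR grid exclusions.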
syncComms is a script that runs wfoApi, which handles the transfer of data between the Freeway and the DS. Files are stored temporarily in $FXA_DATA/radar/raw and /text. Files in /raw are moved by RadarStorage to the appropriate product directories, e.g., /kftg/Z or V. The /text files are processed by the RadarTextDecoder process; output goes to the text database (e.g., WSRVWPFTG).
A comms status file is maintained in /data/fxa/workFiles/wfoApi.StateInfo. Every time a connection is received from a wfoApi process, information about the radar and the wfoApi is recorded. I'm not sure what all the numbers mean, but at least the radar ID and wfoApi pid are in there.
Radar ingest processes also include the RadarServer and the DataController/RadarStorage pair. The former communicates via the wfoApi process with the RPG over an X.25 link, while the latter are responsible for storing radar products as they are received.
System Reboot in Progress...You must hit Enter on the keyboard to return to your session.
cd /usr/local/freeway/bin
x25_manager < fw_init
If the following lines do not appear, you will need to repeat the above command until buffers and circuits are configured.
Now for a few words on RPS lists.
A user can edit the current RPS list and send that out. This RPS list is saved in /data/fxa/radar/lists/KXXX.current. A user can also edit the current RPS list, or any other (except the default clear-air and storm mode) RPS list and save it in the /data/fxa/rps-lists directory. These RPS lists including .current can be changed at will. Any RPS list that gets sent out gets saved in KXXX.current, and is recalled and sent out whenever a "Connection Up" message is received from wfoApi, or whenever a GSM comes in. If the mode, as specified in the GSM, has changed, that RPS list gets sent out and is saved in KXXX.current.
In order for a user to put the RPS list back to what it was, one of two things is done: manually copy the current-mode RPS list into KXXX.current and send that out, or use the RPS list application editor to edit the current RPS list for that RPG.
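The RPS list rules above amount to a small state machine. The sketch models them under stated assumptions: the class and method names are hypothetical, lists are plain Python lists of product names, "sending" is whatever callable the caller supplies, and the mode names are placeholders for the default clear-air and storm mode lists. It is not how the FXA code is written, only a way to check one's reading of the rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RpsListModel:
    """Hypothetical model of how KXXX.current is maintained for one RPG."""
    mode_defaults: Dict[str, List[str]]    # default list per mode, e.g. clear-air / storm
    send: Callable[[List[str]], None]      # whatever actually transmits a list
    mode: str                              # operating mode from the last GSM
    current: List[str] = field(default_factory=list)  # contents of KXXX.current

    def send_list(self, rps_list: List[str]) -> None:
        """Any list that is sent out becomes the new KXXX.current."""
        self.send(rps_list)
        self.current = list(rps_list)

    def on_connection_up(self) -> None:
        """wfoApi reported "Connection Up": resend KXXX.current."""
        self.send_list(self.current)

    def on_gsm(self, gsm_mode: str) -> None:
        """A GSM arrived: switch to the new mode's list if the mode changed, else resend."""
        if gsm_mode != self.mode:
            self.mode = gsm_mode
            self.send_list(self.mode_defaults[gsm_mode])
        else:
            self.send_list(self.current)

    def restore_mode_default(self) -> None:
        """Put the current mode's list back into KXXX.current and send it."""
        self.send_list(self.mode_defaults[self.mode])
```

restore_mode_default corresponds to the first of the two recovery options: copying the current-mode list into KXXX.current and sending it out.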
And now for a few words on localization on FSL systems.
When you run a DEN localization, you get Denver's PUP ID and port ID numbers. The correct numbers for Boulder are in the FSL files. On ds1-fsli, copy ~fxa/data/localization/FSL/FSL-portInfo.txt and FSL-pupId.txt to ~fxa/data/localizationDataSets/DEN/portInfo.txt and pupId.txt.
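The two copies can be scripted. Below is a sketch that assumes ~fxa/data on ds1-fsli is passed in explicitly; the function names are hypothetical. It builds the copy plan first, so the plan can be checked, uses shutil.copy2 (which also preserves timestamps), and refuses to create the DEN directory if it is missing.

```python
import shutil
from pathlib import Path

# Source -> destination, relative to ~fxa/data, per the instructions above.
FSL_RADAR_ID_FILES = (
    ("localization/FSL/FSL-portInfo.txt", "localizationDataSets/DEN/portInfo.txt"),
    ("localization/FSL/FSL-pupId.txt", "localizationDataSets/DEN/pupId.txt"),
)


def copy_plan(fxa_data: Path):
    """List the (source, destination) pairs without touching the disk."""
    return [(fxa_data / src, fxa_data / dst) for src, dst in FSL_RADAR_ID_FILES]


def install_fsl_radar_ids(fxa_data: Path) -> None:
    """Overwrite DEN's portInfo.txt and pupId.txt with the Boulder (FSL) versions."""
    for src, dst in copy_plan(fxa_data):
        if not dst.parent.is_dir():
            raise FileNotFoundError(f"missing localization directory {dst.parent}")
        shutil.copy2(src, dst)
```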
rpc.ldmd -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ld (1 identically-named child) pqact pqexpire
LDM files are in /usr/local/ldm. The admin process is bin/ldmadmin.in. LDM ingest is managed with ldmadmin.in start and ldmadmin.in stop. The control file is etc/ldmd.conf. At present, no log is written; in order to do so, a line like this must be in /etc/syslog.conf:
local0.debug /usr/local/ldm/logs/ldmd.log
Also, /usr/local/ldm/bin/hupsyslog must have the suid bit set
---s--x--x 1 root 6100 20534 Jan 16 1997 /usr/local/ldm/bin/hupsyslog*
in order for 'ldmadmin newlog' to work, and apparently to allow writing to the log.
If you can't get LDM working, check to see if syslogd is running. If it's not, as root, run /usr/sbin/syslogd -D, then try starting LDM again. Another likely possibility is a bad pattern in a request line. Unfortunately, I don't know the rules. From experience, I can tell you that "*Graphic.*" is not a winner. I've made other mistakes, but don't recall what. A less likely possibility is an open socket. With LDM shut down, enter rpcinfo -p. If you see one or more lines beginning 300029 at the bottom, type sudo rpcinfo -d 300029 5 to remove this open socket, then startLdm again.
etc/pqact.conf includes this line to move the files to where they need to go:
FSL3 ^FSL\.CompressedNetCDF\.(.*) FILE -close /data/Incoming/\1
Note that it is very important that the fields are tab-, not space-, delimited! This puts the files in /data/Incoming, whence LDAD will pull them over to ds1. Once there, an entry in LDADinfo.txt calls a simple script that moves the files to /data/fxa/nowrad/nowradZ.
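Both points above (tab-separated fields, and a pattern whose \1 becomes part of the output path) can be checked with a short Python sketch. It is an approximation: it assumes fields are separated by one or more tabs, handles only a single-line FILE action, and uses Python's re module, whose syntax matches the POSIX regular expressions pqact uses for a pattern this simple but not in general. The product name in the test is invented.

```python
import re


def pqact_fields(line: str):
    """Split a one-line pqact.conf entry into feedtype, pattern and action.

    Assumes fields are separated by one or more tabs; spaces inside a field
    (such as the FILE action) are allowed.
    """
    fields = [f for f in line.rstrip("\n").split("\t") if f]
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated fields, got {len(fields)}; "
                         "were spaces used instead of tabs?")
    return fields


def file_destination(pattern: str, action: str, product_id: str):
    """Where a FILE action would write a product, or None if the pattern doesn't match."""
    match = re.search(pattern, product_id)
    if match is None:
        return None
    path_template = action.split()[-1]  # last word, e.g. /data/Incoming/\1
    return match.expand(path_template)  # substitutes the captured group for \1


# The NOWrad entry from above, with real tab characters between the fields.
ENTRY = "FSL3\t" + r"^FSL\.CompressedNetCDF\.(.*)" + "\t" + r"FILE -close /data/Incoming/\1"
```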
In order for the NOWrad files to be processed from there, a line in the ds1 fxa crontab must be uncommented. This is the first one under "Denver/Boulder-specific items" in ingest.crontab.ds1, below. The second line there should be uncommented, too.
kftg radar ingest is on fsld. Data are sent from as1-fsld to ds1-fsli, ls1-fslc, and an FD machine, jarts. This works because RadarStorage calls the LDM queue insertion routine.
LDAD data are sent from as1-fsld to FD via the LDM. This is done via cron, with a line found in /awips/fxa/etc/sendLDAD.cron on as1-fsld. This needs to be added locally to fxa's cron there.
The LDMs on ls1-fslc and ds1-fsld receive NOWrad data from ldm.fsl.noaa.gov, and ds1-fsli receives it from as1-fsld-59, a special name known to the fsli firewall. The same ds1 cron items need to be enabled as mentioned above.
To support all of this, LDM needs to run on ls1-fslc, as1-fsld, ds1-fsld, and ds1-fsli.
| | ls1-fslc | as1-fsld | ds1-fsld | ds1-fsli |
|---|---|---|---|---|
| Request | FSL3 (NOWrad) from ldm.fsl.noaa.gov; FSL5 (radar) from as1-fsld | FSL3 (NOWrad) from ds1-fsld, to pass to ds1-fsli | FSL3 (NOWrad) from ldm.fsl.noaa.gov; FSL3 (AWC radar) from ldm.fsl.noaa.gov | FSL3 (NOWrad) from as1-fsld-59; FSL5 (radar) from as1-fsld-59 |
| Send | | FSL5 to ds1-fsli, ls1-fslc, jarts\|foxtail, shylock\|gobbo | FSL3 to as1-fsld | |
The text database system is also managed separately from the general startIngest and stopIngest. Text products are stored in an Informix database.
The database runs under user root. A number of processes are normally running on ds1, which you can see in a proc oninit listing. Usually, only the parent process will consume significant amounts of CPU time.
The main things to do for Informix are to check the log (onstat -m) and replication (onstat -g dri). Informix problems should be referred to the NCF for resolution. You can follow the procedures in the 4.1 System Manager's Manual, if needed.
The workstation uses 4 processes to communicate with the text database, to wit:
$FXA_HOME/bin/TextDB_Server -Write
$FXA_HOME/bin/TextDB_Server -Read
$FXA_HOME/bin/textdb
$FXA_HOME/bin/textdbRemote
The first two of these are started and stopped by the startTextDB and stopTextDB scripts. Another script, stopTextNotification, will stop the textNotificationServer (it's started, if necessary, by startTextDB.as1). We prefer not to stop it, because doing so necessitates restarting all text workstations to get alarm/alert notices. The others, textdb and textdbRemote, run as needed to read/write the database. (The former communicates directly with the database, while the latter goes through the read/write server.)
Managing the text database requires care, because of the nature of the database software. In particular, it's not safe simply to kill the write server, as it may be in the middle of a transaction, and the text database could get corrupted. Thus, stopTextDB issues a KILLSERV command to the text database.
If stopTextDB/startTextDB does not clear up text storage/retrieval problems, there may be something wrong with Informix. In that case,
On ds1-fslt, the instructions are somewhat different:
(There's a script, informix_linux, that's supposed to do this, but it's not working - worth a fix, I'm sure. Also, this is not set up in the boot startup files, and probably should be.)
You may find database errors (things like `database update error: -346' or `database insert error -239') in the TextDB_Server logs. (Use finderr <nnn> to see information on these error codes.)
Note: When the database is down while the data ingest is running, text
messages will queue up inside the TextDB DataController process. Once the
database is back up and accepting messages, this queue will be processed.
It may take a long time to catch up, however. (To see what's being processed,
look at the end of the CollDecoder or StdDecoder logs.) If it's necessary
to empty the queue (due to excessive length), you must kill the TextDB
DataController (use proc TextDB to get the pid) and restart it using
DataController COMMS_ROUTER TextDB_Controller.config & (most
easily done by using X to copy this line out of ~fxa/bin/startIngest).
01 0,4,8,12,16,20 * * * /awips/hydroapps/whfs/standard/bin/CleanWFO
27 7 * * * /awips/hydroapps/whfs/standard/bin/run_db_cleanup
37 7 * * * /awips/hydroapps/whfs/standard/bin/run_db_tuneup
15 * * * * /awips/hydroapps/whfs/standard/bin/run_precip_accum
And as fxa
3,8,13,18,23,28,33,38,43,48,53,58 * * * * csh -c '${FXA_HOME}/bin/moveProds.ksh /data/fxa/ispan/text/hydro/ /awips/hydroapps/whfs/local/data/shef_input/'
Decoder logs are found in /awips/hydroapps/whfs/local/data/log/shef/decoder.
Service: afoscommsrv host ds1-bou connect failed: connection refused error sending to AFOS
On as1, use startAFOS to restart it. Logs are written to $LOG_DIR/afoscommsrv.*.
| Host | Mode | Installed as (in /etc/cmcluster/crons/fxa/) | Source file |
|---|---|---|---|
| ds1 | normal ops | ds1.dsswap | ingest.crontab.ds1 |
| ds2 | failover ops | ds2.dsswap | ingest.crontab.ds1 |
| as1 | normal ops | as1.as1swap | ingest.crontab.as1 |
| as2 | normal ops | as2.as2swap | ingest.crontab.as2 |
| as1 | failover ops | as1.as1swap.as2swap | ingest.crontab.as1-as2 |
| as2 | failover ops | as2.as2swap.as1swap | ingest.crontab.as1-as2 |
ingest.crontab.ds1
# Crontab file for starting transient data ingest processes.
# This file, ingest.crontab.ds1, contains the items that run on the primary
# data server. It is to be installed as
# ds1:/etc/cmcluster/crons/fxa/ds1.dsswap
# ds2:/etc/cmcluster/crons/fxa/ds2.dsswap
# under root ownership.
# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be included in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.
# Break ingest log and announcer files daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'
0 0 * * * csh -c '${FXA_HOME}/bin/breakAnnouncementFiles >&! ${LOG_DIR}/breakAnnouncementFiles.log'
# Purgers...
# Run the master purger twice hourly, to pare data back to necessary levels.
15,45 * * * * csh -c '${FXA_HOME}/bin/master.purge >&! ${LOG_DIR}/master.purge.log'
# Purge excess Redbook graphics hourly
0 * * * * csh -c '${FXA_HOME}/bin/purgeAllRedbook >&! ${LOG_DIR}/purgeAllRedbook.log'
# Run scour daily to clean up log files and a few items not hit by master.purge.
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'
# Purge MHS data once per day.
20 1 * * * csh -c '${FXA_HOME}/bin/mhs-data.purge'
# Radar ingest
* * * * * csh -c '${FXA_HOME}/bin/restartRadar'
0 0 * * * csh -c '${FXA_HOME}/bin/breakLog pingFreeway0.log'
0 0 * * * csh -c '${FXA_HOME}/bin/breakLog pingFreeway1.log'
# hydro scripts - something different will exist in Build 4
#00 0 * * * /usr/local/hydro/wfo/bin/CleanBad.scr
#01 20,0,4,8,12,16 * * * /usr/local/hydro/wfo/bin/CleanWFO
#03 9 * * * /usr/local/hydro/wfo/bin/run_db_cleanup
#03 11 * * * /usr/local/hydro/wfo/bin/run_db_tuneup
#15 * * * * /usr/local/hydro/wfo/bin/run_precip_accum
#2,7,12,17,22,27,32,37,42,47,52,57 * * * * csh -c '/usr/bin/perl ${FXA_HOME}/bin/renameHydroFiles.pl'
3,8,13,18,23,28,33,38,43,48,53,58 * * * * csh -c '${FXA_HOME}/bin/moveProds.ksh /data/fxa/ispan/text/hydro/ /awips/hydroapps/whfs/local/data/shef_input/'
# Process Monitor start-up script
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/DS_startProcMon.sh'
# Data archiving and archive purging
50 * * * * csh -c '${FXA_HOME}/bin/textArchiver.sh'
# Denver/Boulder-specific items
#14,29,44,59 * * * * csh -c '(cd ${FXA_HOME}/xfer/nowrad; ./xferNowrad_v3.com ${FXA_HOME}/xfer/nowrad) >&! ${LOG_DIR}/xfer_nowrad.log'
#0 0 * * * /usr/local/ldm/bin/ldmadmin newlog
#0,15,30,45 * * * * csh -c '${FXA_HOME}/bin/ldmBridgeRestart >&! ${LOG_DIR}/ldmBridgeRestart.log'
# 40 km MAPS ingest - Boulder only (uncomment on ds1-fsla only)
#5 0,6,9,12,18,21 * * * csh -c '${FXA_HOME}/bin/maps40.script >&! ${LOG_DIR}/maps40.log'
#30 3,15 * * * csh -c '${FXA_HOME}/bin/maps40.script >&! ${LOG_DIR}/maps40.log'
#30 * * * * csh -c '${FXA_HOME}/bin/gridWatchdog >>& ${LOG_DIR}/gridWatchdog.log'
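The csh -c convention in the file header can be captured in a helper that builds such a line. This is a hypothetical convenience, not an FXA tool. It refuses anything containing a single quote (which would end the quoted csh -c argument early) and uses csh's >&! form, which sends both standard output and standard error to the file and overwrites it even when noclobber is set. The first test case reproduces the breakLogIngest line above.

```python
from typing import Optional


def csh_cron_entry(schedule: str, command: str, log_file: Optional[str] = None) -> str:
    """Build a crontab line that runs `command` (and any redirection) under csh -c."""
    if len(schedule.split()) != 5:
        raise ValueError("schedule needs five fields: minute hour day month weekday")
    body = command if log_file is None else f"{command} >&! {log_file}"
    if "'" in body:
        raise ValueError("single quotes would terminate the csh -c argument")
    # ${FXA_HOME}, ${LOG_DIR}, etc. are left unexpanded inside the quotes for csh to expand.
    return f"{schedule} csh -c '{body}'"
```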
ingest.crontab.as1
# Crontab file for starting transient data ingest processes.
# This file, ingest.crontab.as1, contains the items that run on the "#1"
# application server. It is to be installed as
# as1:/etc/cmcluster/crons/fxa/as1.as1swap
# under root ownership.
# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be included in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.
# Break ingest logs daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'
# Run scour daily to clean up log files
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'
# Data Monitor scripts
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/grid.cfg -o ${FXA_HOME}/data/grid_data.html -h "Grid Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/graphic.cfg -o ${FXA_HOME}/data/graphic_data.html -h "Redbook Graphics Products"'
#0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/text.cfg -o ${FXA_HOME}/data/text_data.html -h "SBN Text Products"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/radar.cfg -o ${FXA_HOME}/data/radar_data.html -h "Radar Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/point.cfg -o ${FXA_HOME}/data/point_data.html -h "Point Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/sat.cfg -o ${FXA_HOME}/data/sat_data.html -h "Satellite Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/local.cfg -o ${FXA_HOME}/data/local_data.html -h "Local Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/diskUsage.pl -c ${FXA_HOME}/data/disk.cfg -o ${FXA_HOME}/data/diskUsage_data.html'
# Data Monitor summary page script
3,13,23,33,43,53 * * * * csh -c '${FXA_HOME}/bin/monitorSummary.pl'
# Process Monitor start-up scripts
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/AS1_startProcMon.sh'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startProcSum.sh'
[we also run sar on as1, though it's not in ingest.crontab.as1]
# SAR system performance monitor data collection. -JSW 3 Mar 98
5 * * * * /awips/fxa/htdocs/perfMon/bin/updatesar.pl
ingest.crontab.as2
# Crontab file for starting transient data ingest processes.
# This file, ingest.crontab.as2, contains the items that run on the "#2"
# application server. It is to be installed as
# as2:/etc/cmcluster/crons/fxa/as2.as2swap
# under root ownership.
# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be included in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.
# Break ingest logs daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'
# Run scour daily to clean up log files
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'
[the above has been interactively modified to read 30 2, for GribDecoder logs]
# Process Monitor start-up scripts
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/AS2_startProcMon.sh'
#0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startProcSum.sh'
# LAPS #
# ---- #
20 * * * * /usr/local/bin/perl /awips/laps/etc/sched.pl /awips/laps /awips/laps/data
03,19,34,49 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl vrc_driver.x /awips/laps /awips/laps/data
08,14,23,29,38,45,53,59 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lvd_sat_ingest.exe /awips/laps /awips/laps/data
22,30 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lsr_driver.exe /awips/laps /awips/laps/data
# MSAS - The MAPS/RUC Surface Analysis System #
# ------------------------------------------- #
# Ingest the NCEP surface grids every 12 hours
# Programs = sfcnmc & prsnmc
# Valid Times = 00Z 12Z
# Runtime Z = 06:57 & 18:57, to catch late arriving NGM211 grids
57 6,18 * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Sfcnmc.run >&! /awips/fxa/ldad/MSAS/logs/sfcnmclog'
# Run the surface cycle every hour at 20 minutes after the hour.
# Programs = sfcing sfchqc sfcanl sfcncdf sfcver srcplot
20 * * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Surface.run >&! /awips/fxa/ldad/MSAS/logs/sfclog'
# Compile the surface QC stats at the end of the day
# Programs = asos
# Valid Times = 00Z
# Runtime Z = 23:53
53 23 * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Asos.run >&! /awips/fxa/ldad/MSAS/logs/asoslog'
ingest.crontab.as1-as2
# Crontab file for starting transient data ingest processes.
# This file, ingest.crontab.as1-as2, contains items that run on either
# application server when it is running as both in failover mode. In
# general, it is a union of ingest.crontab.as1 and ingest.crontab.as2,
# though some items may be dropped due to loading considerations. It is
# to be installed as
# as1:/etc/cmcluster/crons/fxa/as1.as1swap.as2swap
# as2:/etc/cmcluster/crons/fxa/as2.as2swap.as1swap
# under root ownership.
# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be included in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.
# Break ingest logs daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'
# Run scour daily to clean up log files
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'
# Data Monitor scripts
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/grid.cfg -o ${FXA_HOME}/data/grid_data.html -h "Grid Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/graphic.cfg -o ${FXA_HOME}/data/graphic_data.html -h "Redbook Graphics Products"'
#0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/text.cfg -o ${FXA_HOME}/data/text_data.html -h "SBN Text Products"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/radar.cfg -o ${FXA_HOME}/data/radar_data.html -h "Radar Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/point.cfg -o ${FXA_HOME}/data/point_data.html -h "Point Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/sat.cfg -o ${FXA_HOME}/data/sat_data.html -h "Satellite Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/local.cfg -o ${FXA_HOME}/data/local_data.html -h "Local Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/diskUsage.pl -c ${FXA_HOME}/data/disk.cfg -o ${FXA_HOME}/data/diskUsage_data.html'
# Data Monitor summary page script
3,13,23,33,43,53 * * * * csh -c '${FXA_HOME}/bin/monitorSummary.pl'
# Process Monitor start-up scripts
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/AS1_startProcMon.sh'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/AS2_startProcMon.sh'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startProcSum.sh'
# LAPS #
# ---- #
20 * * * * /usr/local/bin/perl /awips/laps/etc/sched.pl /awips/laps /awips/laps/data
03,19,34,49 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl vrc_driver.x /awips/laps /awips/laps/data
08,14,23,29,38,45,53,59 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lvd_sat_ingest.exe /awips/laps /awips/laps/data
22,30 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lsr_driver.exe /awips/laps /awips/laps/data
# MSAS - The MAPS/RUC Surface Analysis System #
# ------------------------------------------- #
# Ingest the NCEP surface grids every 12 hours
# Programs = sfcnmc & prsnmc
# Valid Times = 00Z 12Z
# Runtime Z = 06:57 & 18:57, to catch late arriving NGM211 grids
57 6,18 * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Sfcnmc.run >&! /awips/fxa/ldad/MSAS/logs/sfcnmclog'
# Run the surface cycle every hour at 20 minutes after the hour.
# Programs = sfcing sfchqc sfcanl sfcncdf sfcver srcplot
20 * * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Surface.run >&! /awips/fxa/ldad/MSAS/logs/sfclog'
# Compile the surface QC stats at the end of the day
# Programs = asos
# Valid Times = 00Z
# Runtime Z = 23:53
53 23 * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Asos.run >&! /awips/fxa/ldad/MSAS/logs/asoslog'
The ingest process monitor is started via cron on ds1, as1, and as2,
as shown above. The XXX_startProcMon.sh script (XXX is the host prefix,
as in AS1_startProcMon.sh) starts ~fxa/bin/ingProcMon.pl, which checks
the processes listed in ~fxa/data/XXX_ingestProcesses.txt and builds an
HTML file (XXX_ingestProcMon.html) showing which of them are up and which
are down. These files are copied to
$FXA_WWW_SERVER_HOST:$SERVER_DIRECTORY/dataMon/, where SERVER_DIRECTORY
is defined in ~fxa/data/dataMon.cfg. (On jailbird, the document root is
defined in /opt/apache/httpd.conf as /opt/apache/lib/htdocs; this is
SERVER_DIRECTORY.)
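The up/down bookkeeping that ingProcMon.pl performs can be sketched as follows. This is an illustration in Python, not a transcription of the Perl script. The one-name-per-line layout with '#' comments assumed for XXX_ingestProcesses.txt and the HTML layout are both assumptions. In practice the list of running process names would come from ps.

```python
import html


def load_expected(text: str) -> list[str]:
    """Parse a process list: one name per line, blank lines and '#' comments ignored (assumed format)."""
    names = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            names.append(name)
    return names


def process_status(expected: list[str], running: list[str]) -> dict[str, bool]:
    """Map each expected process name to True (up) or False (down)."""
    running_set = set(running)
    return {name: name in running_set for name in expected}


def status_html(host: str, status: dict[str, bool]) -> str:
    """Render a minimal up/down table, in the spirit of XXX_ingestProcMon.html."""
    rows = "\n".join(
        f"<tr><td>{html.escape(name)}</td><td>{'UP' if up else 'DOWN'}</td></tr>"
        for name, up in status.items()
    )
    title = html.escape(f"{host} ingest processes")
    return (f"<html><head><title>{title}</title></head>\n"
            f"<body><table>\n{rows}\n</table></body></html>\n")


if __name__ == "__main__":
    # Sample data only; the real list lives in ~fxa/data/XXX_ingestProcesses.txt.
    sample_list = "# sample list\nLDADdecoder\nprofilerDecoder\n"
    status = process_status(load_expected(sample_list), ["LDADdecoder", "csh"])
    print(status_html("as1", status))
```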
Each text Xterm is hosted by its associated workstation. Text workstation files are stored in $FXA_DATA/textWSwork/xtn-bou:0. Subdirectories include saved (copies of all products that have been created on this station), journals (in-progress edits, saved for crash recovery), and archived (permanent copies of products sent out over the WAN). Also here is textAlarmAlertProducts.txt, the list of alarm/alert products specific to this workstation. (Site-wide products are in ~fxa/data/textAlarmAlertProducts.txt.)
Log files are in $LOG_DIR/display/xtn-bou:0/yymmdd/textWish<pid>. Logs exist for the text windows, but not the parent textWS.tcl process.
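To find the text-window logs for a workstation, a small helper like the one below builds the directory name and lists the textWish<pid> files, oldest first. It is a sketch that assumes the yymmdd directory uses the UTC date, as the user interface logs do. xtn-bou:0 is the display name used above; substitute the actual display name for the workstation in question.

```python
import glob
import os
from datetime import datetime, timezone


def text_log_dir(log_dir: str, display: str, when: datetime) -> str:
    """$LOG_DIR/display/<display>/<yymmdd> for the given date (assumed UTC)."""
    return os.path.join(log_dir, "display", display, when.strftime("%y%m%d"))


def text_logs(log_dir: str, display: str, when: datetime) -> list[str]:
    """textWish<pid> logs for that display and day, oldest first by modification time."""
    pattern = os.path.join(text_log_dir(log_dir, display, when), "textWish*")
    return sorted(glob.glob(pattern), key=os.path.getmtime)


if __name__ == "__main__":
    log_dir = os.environ.get("LOG_DIR", "/tmp")
    for path in text_logs(log_dir, "xtn-bou:0", datetime.now(timezone.utc)):
        print(path)
```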
If an Xterm gets misconfigured, the title window will come up, but
the individual text windows will not. (You'll get a tcl error when you
try to start one.) Hold down F12 on the keyboard for a second or two,
then select Server. Turn the Access Control button (middle button in the
second panel) on and click OK (upper right). Answer OK in the dialog box,
wait for the reset, log in, and you should be ready to roll.
LAPS (analysis) runs on as2, hourly by cron. In Build 4.3, LAPS moves onto the new fxa_local partition (which is to become a separate disk in 5.0). For now, this link is critical to successful LAPS runs:
lrwxr-xr-x 1 fxa fxalpha /data/fxa/laps@ -> /data/fxa_local/laps
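Because successful LAPS runs depend on that link, a quick check can rule it out early. A minimal Python sketch, using only the two paths from the listing above. It compares the link's recorded destination, so it does not verify that /data/fxa_local/laps itself exists or is populated.

```python
import os
import sys


def link_points_to(link: str, target: str) -> bool:
    """True if `link` is a symbolic link whose recorded destination is `target`."""
    if not os.path.islink(link):
        return False
    dest = os.readlink(link)
    if not os.path.isabs(dest):  # relative links are resolved against the link's own directory
        dest = os.path.join(os.path.dirname(link), dest)
    return os.path.normpath(dest) == os.path.normpath(target)


if __name__ == "__main__":
    link, target = "/data/fxa/laps", "/data/fxa_local/laps"
    if link_points_to(link, target):
        print(f"OK: {link} -> {target}")
    else:
        print(f"PROBLEM: {link} does not point to {target}; LAPS runs are likely to fail")
        sys.exit(1)
```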
As noted earlier, four LAPS processes run via cron:
20 * * * * /usr/local/bin/perl /awips/laps/etc/sched.pl /awips/laps/ /awips/laps/data
03,19,34,49 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl vrc_driver.x /awips/laps /awips/laps/data
08,14,23,29,38,45,53,59 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lvd_sat_ingest.exe /awips/laps /awips/laps/data
22,30 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lsr_driver.exe /awips/laps /awips/laps/data
LAPS localization is effected by:
cd /awips/laps/etc
perl laps_localization > local.out
More information about LAPS run-time details is available in the
LAPS README file, http://laps.fsl.noaa.gov/frd/laps/software/README.html.
/awips/ldad/bin/listener
/awips/ldad/bin/LDADdecoder
./watchDogInternal.sh
The listener process receives data through the firewall and stores files in $FXA_DATA/LDAD/Raw. There is a listener log in /data/logs/ldad, but it's not at all easy to read. (On one occasion I did find a permissions problem writing the raw data by looking at the listener log.) You'll also see there an LDADdecoder.log file, which is the log of the current decoder. The watchDogInternal script checks every 30 seconds to see whether the listener and decoder are running. Decoder logs are also written to the usual spot along with the other ingest logs. Those file names include the PID, so there are lots of 'em. (The LDADdecoder.log file includes time stamps on its messages, but the logs in $LOG_DIR/<date> do not.)
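The watchdog behavior described above amounts to a periodic check for two process names. The Python sketch below shows that pattern; it is not the contents of watchDogInternal.sh. The text says only that the script checks for the listener and decoder every 30 seconds, so the restart action and the way the running-process list is obtained are assumptions, supplied here by the caller as callbacks.

```python
import time
from typing import Callable, Iterable, Optional

REQUIRED = ("listener", "LDADdecoder")   # the two processes the watchdog looks for


def check_once(running: Iterable[str], restart: Callable[[str], None],
               required: Iterable[str] = REQUIRED) -> list[str]:
    """Call restart() for each required process missing from `running`; return the missing names."""
    running_set = set(running)
    missing = [name for name in required if name not in running_set]
    for name in missing:
        restart(name)
    return missing


def watch(list_running: Callable[[], Iterable[str]], restart: Callable[[str], None],
          interval: float = 30.0, cycles: Optional[int] = None) -> None:
    """Repeat check_once every `interval` seconds; run forever when `cycles` is None."""
    done = 0
    while cycles is None or done < cycles:
        check_once(list_running(), restart)
        done += 1
        if cycles is None or done < cycles:
            time.sleep(interval)


if __name__ == "__main__":
    # Demo with a fake process list: reports twice that LDADdecoder would be restarted.
    watch(lambda: ["listener"], lambda name: print("would restart", name),
          interval=1.0, cycles=2)
```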
Sometimes, both decoder and listener are up, but no data are coming through. This suggests a problem on the external side. You can restart the whole LDAD system:
The LDAD monitors run on as1 (summary and internal) and ls1 (external, acquisition, and dissemination). If one of the pages is more than 5 minutes old (the limit is set in one of the config files), the monitor won't display it. We have also seen problems where the obj.conf file on ls1 or as1 had the wrong data root. That file is in /etc/opt/ns-fasttrack/httpd-default.
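The 5-minute rule amounts to a modification-time comparison. A minimal Python sketch, assuming staleness is judged from the HTML file's modification time (the monitors might instead use a time stamp inside the page) and treating a missing page as stale. Any page paths passed to it are site-specific.

```python
import os
import sys
import time
from typing import Optional

MAX_AGE_SECONDS = 5 * 60   # the 5-minute limit quoted above; the real value lives in a config file


def is_stale(mtime: float, now: float, max_age: float = MAX_AGE_SECONDS) -> bool:
    """True if a page last written at `mtime` is older than `max_age` seconds at time `now`."""
    return now - mtime > max_age


def page_is_stale(path: str, now: Optional[float] = None) -> bool:
    """Staleness of an HTML file on disk; a missing or unreadable file counts as stale."""
    now = time.time() if now is None else now
    try:
        return is_stale(os.path.getmtime(path), now)
    except OSError:
        return True


if __name__ == "__main__":
    for page in sys.argv[1:]:
        print(page, "STALE" if page_is_stale(page) else "fresh")
```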
For the record, here are the steps needed to set passwords for LDAD admin access to the fsli Web server. I imagine that a similar procedure is used for the others.
On 13 Oct 98, we received a call that all of the workstations at Denver had "locked up." Displays could be zoomed and panned and the pop-up menus worked, but none of the other menus could be used. Further, logging out of the workstation, logging back in, and starting D2D resulted in only the main pane coming up. Investigation showed that only 2 IGCs were starting and that startup halted when trying to access the system announcer. (This was seen by adding "all all file all" to displayLogPref.) We also saw that rpc.lockd was using lots of CPU time on as1.
Darien tried all of her tricks, but we were unable to find the cause of the problem. The work-around was to run two workstations from ds1: run xhost + ds1-bou on the workstation and setenv DISPLAY wsn-bou:0.0 on ds1, then run ~fxa/bin/d2d. (We tried the same on as1 and as2, but in both cases the startup hung as before.)
In the morning, Bob Ladd rebooted both as1 and as2, but the same problem surfaced, including rpc.lockd's CPU usage.
Finally, Bob found a page in his SMM that he'd extracted from the Build 3 SMM, which said to check ds1 for rpc.statd. Indeed it was down, and as soon as he started it (using '/sbin/init.d/nfs.server start' as root), everything was copacetic again. (The hung d2d starts then proceeded to bring up the other IGCs.) Evidently, rpc.lockd on the client systems works with rpc.statd on the server to handle NFS file locking. Rebooting ds1 would also have solved the problem, since rpc.statd is started as part of the boot process.
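In hindsight, rpc.statd on ds1 could have been checked from any of the other hosts before rebooting anything. The Python sketch below wraps the standard rpcinfo utility. `rpcinfo -p HOST` lists the RPC programs registered with the portmapper (rpc.statd registers as program 100024, rpc.lockd as 100021), and `rpcinfo -u HOST 100024` actually calls the status service over UDP. The ping is the more telling test, since a daemon that dies without unregistering can leave a stale entry in the -p listing. The host name ds1-bou is taken from the work-around above; the helper names are illustrative.

```python
import subprocess

STATD = 100024   # ONC RPC program number of the status monitor, rpc.statd
LOCKD = 100021   # ONC RPC program number of the lock manager (nlockmgr), rpc.lockd


def registered_programs(rpcinfo_p_output: str) -> set[int]:
    """Program numbers listed in `rpcinfo -p HOST` output (header and blank lines skipped)."""
    programs = set()
    for line in rpcinfo_p_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            programs.add(int(fields[0]))
    return programs


def statd_answers(host: str, timeout: float = 30.0) -> bool:
    """Ping rpc.statd on `host` with `rpcinfo -u HOST 100024`; True if it replies."""
    try:
        result = subprocess.run(["rpcinfo", "-u", host, str(STATD)],
                                capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    host = "ds1-bou"   # the NFS server in the incident above
    print("rpc.statd on", host, "answers" if statd_answers(host) else "does NOT answer")
```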
What caused rpc.statd to go down remains a mystery.
A similar event occurred 10 Dec 98 on fsli. In that case, the RoabBufrDecoder and profilerDecoder were also not working.
Use bdf to check on disk space.
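When scanning bdf output by eye gets tedious, a short script can flag nearly full file systems. A minimal Python sketch: the 90% threshold is an arbitrary choice. The parser also handles a long device name printed on a line by itself with its numbers on the next line, as bdf can do.

```python
import subprocess


def parse_bdf(text: str) -> dict[str, int]:
    """Map mount point -> percent used from `bdf` output, rejoining wrapped device names."""
    usage = {}
    pending: list[str] = []
    for line in text.splitlines()[1:]:          # first line is the column header
        fields = pending + line.split()
        pending = []
        if len(fields) == 1:                    # device name alone; its numbers follow on the next line
            pending = fields
            continue
        if len(fields) >= 6 and fields[4].endswith("%"):
            usage[" ".join(fields[5:])] = int(fields[4].rstrip("%"))
    return usage


def nearly_full(usage: dict[str, int], threshold: int = 90) -> list[str]:
    """Mount points at or above `threshold` percent used, sorted."""
    return sorted(mount for mount, pct in usage.items() if pct >= threshold)


if __name__ == "__main__":
    out = subprocess.run(["bdf"], capture_output=True, text=True, check=True).stdout
    for mount in nearly_full(parse_bdf(out)):
        print("nearly full:", mount)
```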