Note: The current version of this document is for Build 4.2.
Primary support is from the NCF (301-713-1284). FSL staff are also "on call" (informally) to help with problems. WFO staff are authorized to call Joe Wakefield, Darien Davis, or Carl Bullock at home for help, if NCF can't. They also have Gregg Phillips' cell phone number.
Note for Boulder folks: use 303-494-4454 (this is the national coordination number, the one other WFOs use to call in) to get hold of WFO staff. The Admin number there is 303-494-3210, and the external coordination number (media, etc.) is 303-494-4479. FSL operators are on duty from 7 a.m. to 10:30 p.m. daily and can be reached at 303-497-6887.
Our goal is to make the Denver office operate in an AWIPS-like fashion as far as system support is concerned. Of course, we aren't using AWIPS, and recognize that this goal will be difficult to reach.
All D2D processes are found in ~fxa/bin, and data files (tables, menus, WarnGen templates, etc.) are in ~fxa/data. Most ingest logs are in $LOG_DIR/yymmdd (type "logs" to get there), with a few in $LOG_DIR; these are on local disks.
The log file for the user interface process is $LOG_DIR/display/<displayName>/<date>/fxaWish<pid> where <displayName> is :0.0 for the left display and :0.1 for the right display in a one-mouse configuration and :1.0 for the right display in a two-mouse configuration; <date> is the UTC date when the user interface process was started in YYMMDD format; and <pid> is the process ID.
The log files for the IGC processes, the application manager, the applications, the extensions, and all children, grandchildren, great grandchildren, great great grandchildren, etc., of the user interface process are in the directory $LOG_DIR/display/<displayName>/<date>/fxaWish<pid>.children and have the format <programName><pid> where <programName> is the name of the executable.
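As an illustration of the path layout just described, here is a minimal Python sketch. The function names are hypothetical, the log directory is passed in rather than read from $LOG_DIR, and the sample directory in the usage lines is made up.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def ui_log_path(log_dir, display_name, started_utc, pid):
    """Build <log_dir>/display/<displayName>/<date>/fxaWish<pid>."""
    date = started_utc.strftime("%y%m%d")  # UTC start date in YYMMDD form
    return PurePosixPath(log_dir) / "display" / display_name / date / f"fxaWish{pid}"


def children_log_dir(log_dir, display_name, started_utc, pid):
    """Directory holding the logs of the UI process's descendants."""
    ui_log = ui_log_path(log_dir, display_name, started_utc, pid)
    return ui_log.with_name(ui_log.name + ".children")


if __name__ == "__main__":
    # Hypothetical values: left display, UI process 1234 started 13 Oct 98 UTC.
    start = datetime(1998, 10, 13, 23, 5, tzinfo=timezone.utc)
    print(ui_log_path("/tmp/logs", ":0.0", start, 1234))
    print(children_log_dir("/tmp/logs", ":0.0", start, 1234))
```

Taking the start time as an argument keeps the date in UTC regardless of the local time zone of whoever runs it.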
Since FXA_HOME/bin is in fxa's PATH, there's no need to include that when entering process commands, and that's reflected in the commands included in these instructions. All commands that you'll need to enter are shown in bold type. Except as noted, all will be run from the fxa account.
You can get to today's ingest log directory simply by typing logs,
and up will get you to its parent, where some logs live. The naming
convention for ingest logs is <processName><pid><hostname><hhmmss>.
You can run these scripts by hand. Should it be necessary to restart, use stopIngest and startIngest.[ds1|as1|as2].
Processes included in stop/startIngest, in the order they are started: (Note that the scripts use $FXA_HOME, which resolves to /awips/fxa/bin. What's shown here is the text that appears in a ps listing.)
For ds1:
/awips/fxa/bin/acqserver 900
/awips/fxa/bin/acqserver 900
/awips/fxa/bin/acqserver 900
/awips/fxa/bin/CommsRouter COMMS_ROUTER
/awips/fxa/bin/CommsRouter GRID_ROUTER
/awips/fxa/bin/RadarServer
/awips/fxa/bin/DialServer
/awips/fxa/bin/MhsServer
/awips/fxa/bin/pingFreeway 0
/awips/fxa/bin/pingFreeway 1
For as1:
/awips/fxa/bin/DataController COMMS_ROUTER TextCont.config
/awips/fxa/bin/MetarDecoder
/awips/fxa/bin/RaobBufrDecoder
/awips/fxa/bin/profilerDecoder
/awips/fxa/bin/MaritimeDecoder
/awips/fxa/bin/DataController COMMS_ROUTER TextCont2.config
/awips/fxa/bin/AlertDecoder
/awips/fxa/bin/binLightningDecoder
/awips/fxa/bin/CdotDecoder
/awips/fxa/bin/shefEncoder
/awips/fxa/bin/DataController COMMS_ROUTER SatelliteController.config
/awips/fxa/bin/Satdecoder
/awips/fxa/bin/DataController COMMS_ROUTER RadarController.config
/awips/fxa/bin/RadarStorage
/awips/fxa/bin/notificationServer
/awips/fxa/bin/RadarMsgHandler
For as2:
/awips/fxa/bin/DataController COMMS_ROUTER TextDB_Controller.config
/awips/fxa/bin/CollDB_Decoder
/awips/fxa/bin/StdDB_Decoder
/awips/fxa/bin/RadarTextDecoder
/awips/fxa/bin/DataController GRID_ROUTER GribController.config
/awips/fxa/bin/GribDecoder
And processes in start/stopTextDB:
For ds1:
/awips/fxa/bin/TextDB_Server -Read
/awips/fxa/bin/TextDB_Server -Write
For as1:
/awips/fxa/bin/afoscommsrv
/awips/fxa/bin/textNotificationServer
For as2: (none)
The stop/start scripts handle the non-indented items in the list. Indented items are children spawned by the process listed immediately above.
Should you restart the ingest and still receive no SBN data, check on the acqserver processes (proc acq) on ds1. There should be three: a parent and two children, one child handling the TG data and the other, NESDIS. The former usually connects almost immediately, while the latter may take a few minutes. If there are not 3 of them, the system is not connecting to the SBN CPs. Check the acqserver logs, then log in to the appropriate CP (cpsbn1 for TG data and cpsbn2 for NESDIS), per the SBN section. (In a partial-failure situation you might see only text (and thus METAR) or only satellite data arriving, and maybe only two of the acqserver processes. If restarts fail, it will be necessary to fail over to a single CP.)
Rarely, the GRIB decoder will hang on bad grids. You'll see this by the GribDecoder process using lots of CPU time for extended periods, and a check on the log will show nothing happening. Issue kill -10 <pid> to force a crash. The signal handler will remove the bad grid and the controller will start a new decoder.
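The "nothing happening in the log" half of that check can be approximated by looking at the log file's modification time. The Python sketch below is only an illustration: the function name and the idle threshold are assumptions, and it does not look at CPU usage at all.

```python
import os
import time


def log_is_stale(log_path, max_idle_seconds, now=None):
    """Return True if the log has not been written to for max_idle_seconds."""
    if now is None:
        now = time.time()
    # getmtime gives the last modification time in seconds since the epoch.
    return (now - os.path.getmtime(log_path)) > max_idle_seconds


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python stale.py <decoder log> <idle seconds>
    path, idle = sys.argv[1], float(sys.argv[2])
    print("stale" if log_is_stale(path, idle) else "active")
```

A stale log by itself does not prove a hang (there may simply be no grids arriving), so it only makes sense alongside the CPU check described above.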
If you get a call that radar is not auto-updating, you'll probably need
to restart the notificationServer. When you use stopNotificationServer
to kill the server, it may take some time to write out its client list,
which is found in $FXA_DATA/workFiles/notificationServerClientListState.txt.
Make sure you give it a chance (check the log to see if it's heard the
signal 15) before using kill -9. Otherwise, when the server is restarted,
the workstations won't receive green time and auto-update messages until
they, in turn, are restarted. After it's stopped, use startNotificationServer
to get it running again. The textNotificationServer has a similar feature;
its client list is in $FXA_DATA/workFiles/textNotificationServerClientList.txt.
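The ordering described above (ask politely, wait, and only then force) can be sketched in Python for a process that the script itself started. This is not how stopNotificationServer works internally; the function name and the grace period are assumptions made for the illustration, and it assumes a POSIX system.

```python
import signal
import subprocess
import sys


def stop_gracefully(proc, grace_seconds):
    """Send SIGTERM and wait; use SIGKILL only after the grace period.

    Returns True if the process exited after SIGTERM, False if SIGKILL
    was needed.
    """
    proc.send_signal(signal.SIGTERM)  # the "signal 15" mentioned in the text
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()  # on POSIX this sends SIGKILL, i.e. kill -9
        proc.wait()
        return False


if __name__ == "__main__":
    # Stand-in for a long-running server: a child that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("clean exit" if stop_gracefully(child, 10) else "had to force")
```

The point of the grace period is the same as in the text: a forced kill gives the server no chance to write out its client list.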
If SBN data (satellite, METARs, text, grids) are not arriving, check the CP operation, to see if it's hung. rlogin cpsbn1 for TG data, or cpsbn2 for NESDIS, as user root. (Note: if you need to log in at the console, you'll need to move the CP switch to the Monitor position.) Type acq_stats -k0 -k1 to run the acquisition monitor. If the line beginning "0 ds-bou" is not up to date (the other one will read 0 null 0 0 - n/a n/a 0 0 0 0), you'll need to restart. (If both CPs have stopped at the same time, it's likely that there's an uplink problem at the NCF, or there could be a downlink problem. Check with NCF (301-713-1284) before restarting.) Other problem indicators are lots of buffers or distribution headers in use.
First, stop the system with acq_ctl -A -S -f. Type ps -xaf to see what processes are running. Kill any /awips/bin/acq* that's running, then start with start_cpsbn_acq. A lot of text will scroll by as the software is downloaded from ds1-bou. In many cases, you'll need to push Enter to get your prompt back.
Monitor the system again with acq_stats -k0 -k1 -i10; you should see the TG line connect within a few seconds, though the NESDIS line may take several minutes (use ctrl-C to exit). Log out (exit) (and switch back to Modem if at the console).
The child acqserver may go down when you stop the CP, then will come back as data are sent. Check for the old child (back on ds), and kill it if necessary. Otherwise, it will periodically log a PROBLEM: CP connection has timed out :header message, and also send notification to the NCF, something we'd rather not do.
If this doesn't work, have the forecasters check the Sync and Signal green lights on the demod. If these are out, have them contact the NCF for information. (This is unlikely, as the NCF monitors that portion of the system.) If the signal looks good, but you can't connect, you may need to reboot the CP. Log in and enter /etc/reboot. Ingest processes start automatically. (If you can't log in, you can press the reset button that's just above the CP's power switch at lower right. The system will reboot itself. Using the reboot command is preferred.)
If necessary, either CP can be configured to send both data streams to the server. Call the NCF and tell them which of the CPs has failed. They will perform the failover.
The diskless SBN CPs boot off of the data server. One side effect of
this is that ingest log files are available on the data server disks,
in directory
/awips/hprt/logs/Products/cpsbnn-bou/acq_clntm_h0/mcProduct.log,
where m is 0 for TG data and 1 for NESDIS. (From the cp, these
are found in /awips/logs/Products/...) The system breaks these logs
when they hit 1MB size (keeping a previous version called
mcProduct.old), so there may not be a whole lot of history available
(particularly for the TG side), but these can be useful in diagnosing
missing data.
syncComms is a script that runs wfoApi, which handles the transfer of data between the Freeway and the DS. Files are stored temporarily in $FXA_DATA/radar/raw and /text. Files in /raw are moved by RadarStorage to the appropriate product directories, e.g., /kftg/Z or V. The /text files are processed by the RadarTextDecoder process; output goes to the text database (e.g., WSRVWPFTG).
Radar ingest processes also include the RadarServer and the DataController/RadarStorage pair. The former communicates via the wfoApi process with the RPG over an X.25 link, while the latter are responsible for storing radar products as they are received.
System Reboot in Progress...You must hit Enter on the keyboard to return to your session.
cd /usr/local/freeway/bin
x25_manager < fw_init
If the following lines do not appear, you will need to repeat the above command until buffers and circuits are configured.
Now for a few words on RPS lists.
A user can edit the current RPS list and send that out. This RPS list is saved in /data/fxa/radar/lists/KXXX.current. A user can also edit the current RPS list, or any other RPS list (except the default clear-air and storm mode lists), and save it in the /data/fxa/rps-lists directory. These RPS lists, including KXXX.current, can be changed at will. Any RPS list that gets sent out is saved in KXXX.current, and is recalled and sent out whenever a "Connection Up" message is received from wfoApi, or whenever a GSM comes in. If the mode, as specified in the GSM, has changed, the RPS list for the new mode gets sent out and is saved in KXXX.current.
In order for a user to put the RPS list back to what it was, one of two things is done: manually copy the current-mode RPS list into KXXX.current and send that out, or use the RPS list application editor to edit the current RPS list for that RPG.
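The RPS list behavior described above can be modeled as a small state machine. The Python sketch below is one reading of the text, not the actual implementation: the class and method names are hypothetical, lists are plain Python lists, and it assumes that on a mode change the list sent is the default list for the new mode.

```python
class RpsListState:
    """Tracks which RPS list is current for one RPG (hypothetical model)."""

    def __init__(self, mode, mode_defaults):
        self.mode = mode
        self.mode_defaults = mode_defaults   # mode name -> default RPS list
        self.current = mode_defaults[mode]   # stands in for KXXX.current

    def send(self, rps_list):
        """Any list that is sent out becomes the current list."""
        self.current = rps_list
        return rps_list

    def on_connection_up(self):
        """'Connection Up' from wfoApi: resend the current list."""
        return self.send(self.current)

    def on_gsm(self, gsm_mode):
        """GSM received: switch lists if the mode changed, else resend."""
        if gsm_mode != self.mode:
            self.mode = gsm_mode
            return self.send(self.mode_defaults[gsm_mode])
        return self.send(self.current)


if __name__ == "__main__":
    state = RpsListState("clear-air", {"clear-air": ["A"], "storm": ["B"]})
    state.send(["edited"])          # user edits and sends a list
    print(state.on_gsm("storm"))    # mode change replaces the edited list
```

The usage lines show why a user's edited list disappears after a mode change and has to be put back by hand.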
And now for a few words on localization in Boulder.
When you run a DEN localization, you get Denver's PUP ID and port ID numbers. The correct numbers for Boulder are in the FSL files. On ds1-fsli, copy ~fxa/data/localization/FSL/FSL-portInfo.txt and FSL-pupId.txt to ~fxa/data/localizationDataSets/DEN/portInfo.txt and pupId.txt.
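If this copy has to be repeated after every DEN localization, it could be scripted. Here is a minimal Python sketch; the function name is hypothetical and the two directories are passed in as arguments, so nothing here assumes where ~fxa actually lives.

```python
import shutil
from pathlib import Path

# Source file name in the FSL directory -> name expected in the DEN data set.
FILES = {"FSL-portInfo.txt": "portInfo.txt", "FSL-pupId.txt": "pupId.txt"}


def install_boulder_ids(fsl_dir, den_dir):
    """Copy the FSL port and PUP ID files over the DEN localization copies."""
    copied = []
    for src_name, dst_name in FILES.items():
        dst = Path(den_dir) / dst_name
        shutil.copyfile(Path(fsl_dir) / src_name, dst)
        copied.append(dst)
    return copied


if __name__ == "__main__":
    import sys

    # Usage: python install_ids.py <FSL directory> <DEN data set directory>
    for path in install_boulder_ids(sys.argv[1], sys.argv[2]):
        print("installed", path)
```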
rpc.ldmd -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ld
(3 identically-named children)
pqact
pqexpire -a .5
LDM ingest is managed with stopLdm and startLdm. The control file is /usr/local/ldm/etc/ldmd.conf, and the log is $LOG_DIR/ldmd.log.
If you can't get LDM working, check to see if syslogd is running. If it's not, as root, run /usr/sbin/syslogd -D, then try starting LDM again. Another, less likely, possibility is an open socket. With LDM shut down, enter rpcinfo -p. If you see one or more lines beginning 300029 at the bottom, type sudo rpcinfo -d 300029 5 to remove this open socket, then startLdm again.
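Spotting the leftover 300029 registrations in an rpcinfo -p listing is easy to automate. The Python sketch below parses saved rpcinfo output and prints the matching removal commands; the function names are hypothetical, and the sample listing is made up for illustration (it assumes the usual program/vers/proto/port column layout).

```python
def stale_ldm_registrations(rpcinfo_output, program=300029):
    """Return sorted (program, version) pairs registered for the LDM program."""
    found = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data lines start with the program number, then the version.
        if len(fields) >= 2 and fields[0] == str(program) and fields[1].isdigit():
            found.add((program, int(fields[1])))
    return sorted(found)


def removal_commands(pairs):
    """Build one 'rpcinfo -d' command per (program, version) pair."""
    return [f"rpcinfo -d {prog} {vers}" for prog, vers in pairs]


if __name__ == "__main__":
    # Illustrative listing only; real output will differ.
    sample = """   program vers proto   port  service
    100000    2   tcp    111  portmapper
    300029    5   tcp    388
    300029    5   udp    388
"""
    for command in removal_commands(stale_ldm_registrations(sample)):
        print(command)
```

The sketch only prints the commands; as noted above, they need to be run with root privileges (via sudo) and with LDM shut down.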
LDAD data are sent from as1-fsli to ds1-fsld, frisbee, and jarts via the LDM. This is done via cron, with a line found in /awips/fxa/etc/sendLDAD.cron on as1-fsli. This needs to be added locally to fxa's cron there.
SFM data are sent to as2-bou, as2-fsld, and as2-fslc via the LDM. The SFM (on as3-fsli) calls the script /awips/fxa/etc/sendLAPS.sh to send the data. No crons are necessary; the LDM on the as2 machines has a pqact action to create the "bigfile" from the individual grids being sent.
The LDMs on ds1-fsli and ds1-fsld receive nowrad data from ldm.fsl.noaa.gov.
To support all of this, LDM needs to run on ds1-fsld, as2-fsld, ds1-fsli, as1-fsli, as3-fsli, and as2-fslc.
The text database system is also managed separately from the general startIngest and stopIngest. Text products are stored in an Informix database.
The database runs under user root. A number of processes are normally running on ds1, which you can see in a proc oninit listing. Usually, only the parent process will consume significant amounts of CPU time.
The main things to do for Informix are to check the log (onstat -m) and replication (onstat -g dri). Informix problems should be referred to NCF for resolution. You can follow the procedures in the 4.1 System Manager's Manual, if needed.
The workstation uses 4 processes to communicate with the text database, to wit:
$FXA_HOME/bin/TextDB_Server -Write
$FXA_HOME/bin/TextDB_Server -Read
$FXA_HOME/bin/textdb
$FXA_HOME/bin/textdbRemote
The first two of these, along with the AFOS comms server, are started and stopped by the startTextDB and stopTextDB scripts. Another script, stopTextNotification, will stop the textNotificationServer (it's started, if necessary, by startTextDB). We prefer not to stop it, because doing so necessitates restarting all text workstations to get alarm/alert notices. The others, textdb and textdbRemote, run as needed to read/write the database. (The former communicates directly with the database, while the latter goes through the read/write server.)
Managing the text database requires care, because of the nature of the database software. In particular, it's not safe simply to kill the write server, as it may be in the middle of a transaction, and the text database could get corrupted. Thus, stopTextDB issues a KILLSERV command to the text database.
If stopTextDB/startTextDB does not clear up text storage/retrieval problems, there may be something wrong with Informix. In that case,
If the database is corrupted (usually as a result of a system crash), it will be necessary to restore it from a backup or another database.
Method 1: The Informix database (text and hydro) is backed up daily. If it's not too long after archive time, the easiest thing to do is restore from archive:
02:30:12 Level 0 Archive started on rootdbs, textblobspace, ldadblobs, textdbs, textdbs2, textdbs3, textdbs4, textdbs5, ldad, wfodendbs
02:39:16 Archive on rootdbs, textblobspace, ldadblobs, textdbs, textdbs2, textdbs3, textdbs4, textdbs5, ldad, wfodendbs Completed.
(indicating a clean archive) before any lines that read
15:27:48 Assert Failed: WARNING! Incorrect BLOB stamps.
15:27:48 Who:Session(8, fxa@fsldata1.fsl.noaa.gov, 1821, -1059350808) Thread(31, sqlexec, c0d98948, 1)
15:27:48 Results: BLOBSpace textblobspace, BLOB addr: 0xa0be14, BLOB stamp 25317
(These latter are the indication of your corrupted database. Note: If you see errors other than textblobspace here, the failure is related to the hydro database. In this case, the text database is OK, and you need only stopTextDB, restoreHydro, and startTextDB (with appropriate becomes) to get going again.) If you don't have a clean archive, or if it's been many hours and you don't want to lose intervening data, you'll have to use one of the other methods. Skip past steps 3 & 4 for more fun!
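The check for a clean archive ahead of the first BLOB-stamp error can be sketched in Python. This is only an illustration based on the two message shapes quoted here; the function name is hypothetical, and real Informix logs may contain other variants that this simple matching would miss.

```python
import re

# Matches the "Archive on <spaces> Completed." line, not the "started" line.
ARCHIVE_DONE = re.compile(r"Archive on .* Completed\.")
BLOB_ERROR = "Incorrect BLOB stamps"


def clean_archive_precedes_corruption(log_lines):
    """True if a completed archive is logged before the first BLOB-stamp error.

    If the log contains no BLOB-stamp error at all, the result simply says
    whether a completed archive was seen.
    """
    seen_clean_archive = False
    for line in log_lines:
        if BLOB_ERROR in line:
            return seen_clean_archive
        if ARCHIVE_DONE.search(line):
            seen_clean_archive = True
    return seen_clean_archive


if __name__ == "__main__":
    import sys

    # Usage: onstat -m > informix.log; python check_archive.py informix.log
    with open(sys.argv[1]) as log:
        ok = clean_archive_precedes_corruption(log)
    print("restore from archive looks possible" if ok else "use another method")
```

It does not check how old the archive is, which matters just as much when deciding whether to restore from it.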
01 0,4,8,12,16,20 * * * /awips/hydroapps/whfs/standard/bin/CleanWFO
27 7 * * * /awips/hydroapps/whfs/standard/bin/run_db_cleanup
37 7 * * * /awips/hydroapps/whfs/standard/bin/run_db_tuneup
15 * * * * /awips/hydroapps/whfs/standard/bin/run_precip_accum
And as fxa
3,8,13,18,23,28,33,38,43,48,53,58 * * * * csh -c '${FXA_HOME}/bin/moveProds.ksh /data/fxa/ispan/text/hydro/ /awips/hydroapps/whfs/local/data/shef_input/'
Decoder logs are found in /awips/hydroapps/whfs/local/data/log/shef/decoder.
Service: afoscommsrv host ds1-bou connect failed: connection refused error sending to AFOS
On as1, use startAFOS to restart it. Logs are written to $LOG_DIR/afoscommsrv.*.
| Host | Mode | Installed cron file | Source crontab |
| --- | --- | --- | --- |
| ds1 | normal ops | ds1.dsswap | ingest.crontab.ds1 |
| ds2 | failover ops | ds2.dsswap | ingest.crontab.ds1 |
| as1 | normal ops | as1.as1swap | ingest.crontab.as1 |
| as2 | normal ops | as2.as2swap | ingest.crontab.as2 |
| as1 | failover ops | as1.as1swap.as2swap | ingest.crontab.as1-as2 |
| as2 | failover ops | as2.as2swap.as1swap | ingest.crontab.as1-as2 |
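The table amounts to a lookup from host and operating mode to a pair of file names. A minimal Python sketch of that lookup follows; the dictionary and function names are hypothetical, and the entries are copied straight from the table rows.

```python
# (host, mode) -> (installed cron file, source crontab), per the table rows.
CRON_FILES = {
    ("ds1", "normal"): ("ds1.dsswap", "ingest.crontab.ds1"),
    ("ds2", "failover"): ("ds2.dsswap", "ingest.crontab.ds1"),
    ("as1", "normal"): ("as1.as1swap", "ingest.crontab.as1"),
    ("as2", "normal"): ("as2.as2swap", "ingest.crontab.as2"),
    ("as1", "failover"): ("as1.as1swap.as2swap", "ingest.crontab.as1-as2"),
    ("as2", "failover"): ("as2.as2swap.as1swap", "ingest.crontab.as1-as2"),
}


def cron_files_for(host, mode):
    """Return (installed file, source crontab); KeyError if not in the table."""
    return CRON_FILES[(host, mode)]


if __name__ == "__main__":
    installed, source = cron_files_for("as1", "failover")
    print(f"as1 in failover runs {installed}, built from {source}")
```

Combinations the table does not list (for example ds2 in normal ops) raise KeyError rather than guessing.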
The fxa lists are shown here:
ingest.crontab.ds1
# Crontab file for starting transient data ingest processes.
# This file, ingest.crontab.ds1, contains the items that run on the primary
# data server. It is to be installed as
# ds1:/etc/cmcluster/crons/fxa/ds1.dsswap
# ds2:/etc/cmcluster/crons/fxa/ds2.dsswap
# under root ownership.
# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be include in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.
# Break ingest log and announcer files daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'
0 0 * * * csh -c '${FXA_HOME}/bin/breakAnnouncementFiles >&! ${LOG_DIR}/breakAnnouncementFiles.log'
# Purgers...
# Run the master purger twice hourly, to pare data back to necessary levels.
15,45 * * * * csh -c '${FXA_HOME}/bin/master.purge >&! ${LOG_DIR}/master.purge.log'
# Purge excess Redbook graphics hourly
0 * * * * csh -c '${FXA_HOME}/bin/purgeAllRedbook >&! ${LOG_DIR}/purgeAllRedbook.log'
# Run scour daily to clean up log files and a few items not hit by master.purge.
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'
# Purge MHS data once per day.
20 1 * * * csh -c '${FXA_HOME}/bin/mhs-data.purge'
# Radar ingest
* * * * * csh -c '${FXA_HOME}/bin/restartRadar'
0 0 * * * csh -c '${FXA_HOME}/bin/breakLog pingFreeway0.log'
0 0 * * * csh -c '${FXA_HOME}/bin/breakLog pingFreeway1.log'
# hydro scripts - something different will exist in Build 4
#00 0 * * * /usr/local/hydro/wfo/bin/CleanBad.scr
#01 20,0,4,8,12,16 * * * /usr/local/hydro/wfo/bin/CleanWFO
#03 9 * * * /usr/local/hydro/wfo/bin/run_db_cleanup
#03 11 * * * /usr/local/hydro/wfo/bin/run_db_tuneup
#15 * * * * /usr/local/hydro/wfo/bin/run_precip_accum
#2,7,12,17,22,27,32,37,42,47,52,57 * * * * csh -c '/usr/bin/perl ${FXA_HOME}/bin/renameHydroFiles.pl'
3,8,13,18,23,28,33,38,43,48,53,58 * * * * csh -c '${FXA_HOME}/bin/moveProds.ksh /data/fxa/ispan/text/hydro/ /awips/hydroapps/whfs/local/data/shef_input/'
# Process Monitor start-up script
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/DS_startProcMon.sh'
# Data archiving and archive purging
50 * * * * csh -c '${FXA_HOME}/bin/textArchiver.sh'
# Denver/Boulder-specific items
#14,29,44,59 * * * * csh -c '(cd ${FXA_HOME}/xfer/nowrad; ./xferNowrad_v3.com ${FXA_HOME}/xfer/nowrad) >&! ${LOG_DIR}/xfer_nowrad.log'
#0 0 * * * /usr/local/ldm/bin/ldmadmin newlog
#0,15,30,45 * * * * csh -c '${FXA_HOME}/bin/ldmBridgeRestart >&! ${LOG_DIR}/ldmBridgeRestart.log'
# 40 km MAPS ingest - Boulder only (uncomment on ds1-fsla only)
#5 0,6,9,12,18,21 * * * csh -c '${FXA_HOME}/bin/maps40.script >&! ${LOG_DIR}/maps40.log'
#30 3,15 * * * csh -c '${FXA_HOME}/bin/maps40.script >&! ${LOG_DIR}/maps40.log'
#30 * * * * csh -c '${FXA_HOME}/bin/gridWatchdog >>& ${LOG_DIR}/gridWatchdog.log'
ingest.crontab.as1
# Crontab file for starting transient data ingest processes.
# This file, ingest.crontab.as1, contains the items that run on the "#1"
# application server. It is to be installed as
# as1:/etc/cmcluster/crons/fxa/as1.as1swap
# under root ownership.
# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be include in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.
# Break ingest logs daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'
# Run scour daily to clean up log files
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'
# Data Monitor scripts
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/grid.cfg -o ${FXA_HOME}/data/grid_data.html -h "Grid Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/graphic.cfg -o ${FXA_HOME}/data/graphic_data.html -h "Redbook Graphics Products"'
#0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/text.cfg -o ${FXA_HOME}/data/text_data.html -h "SBN Text Products"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/radar.cfg -o ${FXA_HOME}/data/radar_data.html -h "Radar Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/point.cfg -o ${FXA_HOME}/data/point_data.html -h "Point Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/sat.cfg -o ${FXA_HOME}/data/sat_data.html -h "Satellite Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/local.cfg -o ${FXA_HOME}/data/local_data.html -h "Local Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/diskUsage.pl -c ${FXA_HOME}/data/disk.cfg -o ${FXA_HOME}/data/diskUsage_data.html'
# Data Monitor summary page script
3,13,23,33,43,53 * * * * csh -c '${FXA_HOME}/bin/monitorSummary.pl'
# Process Monitor start-up scripts
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/AS1_startProcMon.sh'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startProcSum.sh'
[we also run sar on as1, though it's not in ingest.crontab.as1]
# SAR system performance monitor data collection. -JSW 3 Mar 98
5 * * * * /awips/fxa/htdocs/perfMon/bin/updatesar.pl
ingest.crontab.as2
# Crontab file for starting transient data ingest processes.
# This file, ingest.crontab.as2, contains the items that run on the "#2"
# application server. It is to be installed as
# as2:/etc/cmcluster/crons/fxa/as2.as2swap
# under root ownership.
# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be include in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.
# Break ingest logs daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'
# Run scour daily to clean up log files
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'
[the above has been interactively modified to read 30 2, for GribDecoder logs]
# Process Monitor start-up scripts
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/AS2_startProcMon.sh'
#0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startProcSum.sh'
# LAPS #
# ---- #
20 * * * * /usr/local/bin/perl /awips/laps/etc/sched.pl /awips/laps /awips/laps/data
03,19,34,49 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl vrc_driver.x /awips/laps /awips/laps/data
08,14,23,29,38,45,53,59 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lvd_sat_ingest.exe /awips/laps /awips/laps/data
22,30 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lsr_driver.exe /awips/laps /awips/laps/data
# MSAS - The MAPS/RUC Surface Analysis System #
# ------------------------------------------- #
# Ingest the NCEP surface grids every 12 hours
# Programs = sfcnmc & prsnmc
# Valid Times = 00Z 12Z
# Runtime Z = 06:57 & 18:57, to catch late arriving NGM211 grids
57 6,18 * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Sfcnmc.run >&! /awips/fxa/ldad/MSAS/logs/sfcnmclog'
# Run the surface cycle every hour at 20 minutes after the hour.
# Programs = sfcing sfchqc sfcanl sfcncdf sfcver srcplot
20 * * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Surface.run >&! /awips/fxa/ldad/MSAS/logs/sfclog'
# Compile the surface QC stats at the end of the day
# Programs = asos
# Valid Times = 00Z
# Runtime Z = 23:53
53 23 * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Asos.run >&! /awips/fxa/ldad/MSAS/logs/asoslog'
ingest.crontab.as1-as2
# Crontab file for starting transient data ingest processes.
# This file, ingest.crontab.as1-as2, contains items that run on either
# application server when it is running as both in failover mode. In
# general, it is a union of ingest.crontab.as1 and ingest.crontab.as2,
# though some items may be dropped due to loading considerations. It is
# to be installed as
# as1:/etc/cmcluster/crons/fxa/as1.as1swap.as2swap
# as2:/etc/cmcluster/crons/fxa/as2.as2swap.as1swap
# under root ownership.
# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be include in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.
# Break ingest logs daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'
# Run scour daily to clean up log files
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'
# Data Monitor scripts
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/grid.cfg -o ${FXA_HOME}/data/grid_data.html -h "Grid Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/graphic.cfg -o ${FXA_HOME}/data/graphic_data.html -h "Redbook Graphics Products"'
#0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/text.cfg -o ${FXA_HOME}/data/text_data.html -h "SBN Text Products"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/radar.cfg -o ${FXA_HOME}/data/radar_data.html -h "Radar Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/point.cfg -o ${FXA_HOME}/data/point_data.html -h "Point Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/sat.cfg -o ${FXA_HOME}/data/sat_data.html -h "Satellite Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/local.cfg -o ${FXA_HOME}/data/local_data.html -h "Local Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/diskUsage.pl -c ${FXA_HOME}/data/disk.cfg -o ${FXA_HOME}/data/diskUsage_data.html'
# Data Monitor summary page script
3,13,23,33,43,53 * * * * csh -c '${FXA_HOME}/bin/monitorSummary.pl'
# Process Monitor start-up scripts
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/AS1_startProcMon.sh'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/AS2_startProcMon.sh'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startProcSum.sh'
# LAPS #
# ---- #
20 * * * * /usr/local/bin/perl /awips/laps/etc/sched.pl /awips/laps /awips/laps/data
03,19,34,49 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl vrc_driver.x /awips/laps /awips/laps/data
08,14,23,29,38,45,53,59 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lvd_sat_ingest.exe /awips/laps /awips/laps/data
22,30 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lsr_driver.exe /awips/laps /awips/laps/data
# MSAS - The MAPS/RUC Surface Analysis System #
# ------------------------------------------- #
# Ingest the NCEP surface grids every 12 hours
# Programs = sfcnmc & prsnmc
# Valid Times = 00Z 12Z
# Runtime Z = 06:57 & 18:57, to catch late arriving NGM211 grids
57 6,18 * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Sfcnmc.run >&! /awips/fxa/ldad/MSAS/logs/sfcnmclog'
# Run the surface cycle every hour at 20 minutes after the hour.
# Programs = sfcing sfchqc sfcanl sfcncdf sfcver srcplot
20 * * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Surface.run >&! /awips/fxa/ldad/MSAS/logs/sfclog'
# Compile the surface QC stats at the end of the day
# Programs = asos
# Valid Times = 00Z
# Runtime Z = 23:53
53 23 * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Asos.run >&! /awips/fxa/ldad/MSAS/logs/asoslog'
The ingest process monitor is started via cron on ds1, as1, and as2,
also as shown above. The XXX_startProcMon.sh script starts ~fxa/bin/ingProcMon.pl,
which checks processes in ~fxa/data/XXX_ingestProcesses.txt, and builds
an HTML file (XXX_ingestProcMon.html) showing what's up and down. These
are copied to $FXA_WWW_SERVER_HOST:$SERVER_DIRECTORY/dataMon/, where SERVER_DIRECTORY
is defined in ~fxa/data/dataMon.cfg.
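The up/down page idea is simple enough to sketch. The Python below is not ingProcMon.pl (which is Perl, and whose input file format isn't shown here); it assumes the expected process names and the set of running names have already been collected, and the function name and HTML layout are made up for the illustration.

```python
import html


def build_status_page(expected, running, title="Ingest Process Monitor"):
    """Return an HTML page with one table row per expected process."""
    rows = []
    for name in expected:
        state = "UP" if name in running else "DOWN"
        rows.append(f"<tr><td>{html.escape(name)}</td><td>{state}</td></tr>")
    return (
        f"<html><head><title>{html.escape(title)}</title></head><body>"
        f"<table>{''.join(rows)}</table></body></html>"
    )


if __name__ == "__main__":
    # Hypothetical snapshot: one decoder running, one missing.
    print(build_status_page(["MetarDecoder", "GribDecoder"], {"MetarDecoder"}))
```

Keeping the process check separate from the page building makes it easy to feed the same function from a ps listing or from a test.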
Each text Xterm is hosted by its associated workstation. Text "stuff" is stored in $FXA_DATA/textWSwork/xtn-bou:0. Subdirectories include saved (copies of all products that have been created on this station), journals (in-progress editing, saved for crash recovery), and archived (permanent copies of products sent out over the WAN). Also here is textAlarmAlertProducts.txt, the list of alarm/alert products specific to this workstation. (Site-wide products are in ~fxa/data/textAlarmAlertProducts.txt.)
Log files are in $LOG_DIR/display/xtn-bou:0/yymmdd/textWish<pid>. Logs exist for the text windows, but not the parent textWS.tcl process.
If an Xterm gets mis-configured, the title window will come up, but
the individual text windows will not. (You'll get a tcl error when you
try to start one.) Press F12 on the keyboard for a second or two, then
select Server. Press the Access Control button (middle button in second
panel) `on' and click OK (upper right). Answer OK in the dialog box, wait
for the reset, log in, and you should be ready to roll.
LAPS (analysis) runs on as2, hourly by cron.
As noted earlier, four LAPS processes run via cron:
20 * * * * /usr/local/bin/perl /awips/laps/etc/sched.pl /awips/laps/ /awips/laps/data
03,19,34,49 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl vrc_driver.x /awips/laps /awips/laps/data
08,14,23,29,38,45,53,59 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lvd_sat_ingest.exe /awips/laps /awips/laps/data
22,30 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lsr_driver.exe /awips/laps /awips/laps/data
More information about LAPS run-time details is available in the
LAPS README file, http://laps.fsl.noaa.gov/frd/laps/software/README.html.
/awips/ldad/bin/listener
/awips/ldad/bin/LDADdecoder
./watchDogInternal.sh
The listener process gets data through the firewall, storing files in $FXA_DATA/LDAD/Raw. There is a listener log in /data/logs/ldad, but it's not at all easy to read. (I did on one occasion find a permissions problem writing the raw data by looking at the listener log.) You'll also see there a LDADdecoder.log file, which is the log of the current decoder. The watchDogInternal script checks every 30 seconds to see if the listener and decoder are running. Decoder logs are also written to the usual spot along with other ingest logs. Those files include PID in the name, so there are lots of 'em. (The LDADdecoder.log file includes time stamps on the messages, but those in $LOG_DIR/<date> do not.)
Sometimes, both decoder and listener are up, but no data are coming through. This suggests a problem on the external side. You can restart the whole LDAD system:
On 13 Oct 98, we received a call that all of the workstations at Denver had "locked up." What they were seeing was that displays could be zoomed and panned and the pop-up menus worked, but no menus could be used. Further, logging out of the workstation and then logging in and starting D2D resulted in the main pane only coming up. Investigation showed that only 2 IGCs were starting, and that the startup halted when trying to access the system announcer. (This was seen by adding "all all file all" to displayLogPref.) Further, we saw that rpc.lockd was using lots of CPU time on as1.
Darien tried all of her tricks, but we were unable to come up with anything that was causing the problem. The work-around was to run two workstations on ds1, using xhost + ds1-bou on the workstation and setenv DISPLAY wsn-bou:0.0 on the ds, then running ~fxa/bin/d2d. (We tried to do the same on as1 and as2, but in both cases, the startup hung as before.)
In the morning, Bob Ladd rebooted both as1 and as2, but the same problem surfaced, including rpc.lockd's CPU usage.
Finally, Bob found a page in his SMM that he'd extracted from the Build 3 SMM, which said to check ds1 for rpc.statd. Indeed it was down, and as soon as he started it (using '/sbin/init.d/nfs.server start' as root), everything was copacetic again. (The hung d2d starts proceeded to bring up the other IGCs at that point.) Evidently, rpc.lockd on the remote systems communicates with rpc.statd on the server to effect NFS transfers. If we'd rebooted ds1, the problem would have been solved, as well, since rpc.statd comes up as part of the boot process.
What caused rpc.statd to go down remains a mystery.
Use bdf to check on disk space.