D2D Operations Info

v AWIPS 4.0/4.1

Note: The current version of this document is for Build 4.2.

Audience

This document is directed at FSL and WSFO staff who may be called upon to diagnose problems with the WFO-Advanced data ingest, internal communications, and display software at WFO Denver/Boulder. The operational staff at the WFO has higher-level monitoring and restart tools available that are not described here.

Support

Primary support is from the NCF (301-713-1284). FSL staff are also "on call" (informally) to help with problems. WFO staff are authorized to call Joe Wakefield, Darien Davis, or Carl Bullock at home for help, if NCF can't. They also have Gregg Phillips' cell phone number.

Note for Boulder folks: use 303-494-4454 (this is the national coordination number, the one other WFOs use to call in) to get hold of WFO staff. The Admin number there is 303-494-3210, and the external coordination number (media, etc.) is 303-494-4479. FSL operators are on duty 7 a.m. to 10:30 p.m. daily, and can be reached at 303-497-6887.

Our goal is to make the Denver office operate in an AWIPS-like fashion as far as system support is concerned. Of course, we aren't using AWIPS, and recognize that this goal will be difficult to reach.

Environment

The Denver subnet computers carry the suffix -bou, which is the official ID of the Denver/Boulder WFO. The complement of servers includes ds1-bou (the primary data server), ds2-bou (backup, normally idle), as1-bou (application server 1), and as2-bou (application server 2). (Name aliases allow these to be referred to locally as ds1, ds2, as1, and as2; further, since only one ds is active at a time, the name "ds" will suffice in most cases.) Data ingest, Informix database, and disk serving are handled by the ds, while most of the decoders are on one or the other of the as machines.

When you log into one of the -bou subnet machines as fxa, awipsusr, or textdemo, several scripts are automatically run to set a number of environment variables, etc. The .cshrc script sets this process in motion. Settings of interest include FXA_DATA (/data/fxa), LOG_DIR (/data/logs/fxa), FXA_HOME (/awips/fxa), and TZ (GMT).

All D2D processes are found in ~fxa/bin, and data files (tables, menus, WarnGen templates, etc.) are in ~fxa/data. Most ingest logs are in $LOG_DIR/yymmdd (type "logs" to get there), with a few in $LOG_DIR; these are on local disks.

The log file for the user interface process is $LOG_DIR/display/<displayName>/<date>/fxaWish<pid> where <displayName> is :0.0 for the left display and :0.1 for the right display in a one-mouse configuration and :1.0 for the right display in a two-mouse configuration; <date> is the UTC date when the user interface process was started in YYMMDD format; and <pid> is the process ID.

The log files for the IGC processes, the application manager, the applications, the extensions, and all children, grandchildren, great grandchildren, great great grandchildren, etc., of the user interface process are in the directory $LOG_DIR/display/<displayName>/<date>/fxaWish<pid>.children and have the format <programName><pid> where <programName> is the name of the executable.
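
The two display-log conventions above can be sketched in Python as follows. This is only an illustration: the helper names ui_log_path and children_log_dir are made up here, and the default LOG_DIR is simply the value quoted in the Environment section.

```python
import os
from datetime import datetime, timezone

# Default taken from the Environment section; the real value comes from .cshrc.
LOG_DIR = os.environ.get("LOG_DIR", "/data/logs/fxa")


def ui_log_path(display_name, start_time_utc, pid, log_dir=LOG_DIR):
    """Path of the fxaWish log for one user interface process (hypothetical helper)."""
    date = start_time_utc.strftime("%y%m%d")  # YYMMDD, UTC date the process started
    return f"{log_dir}/display/{display_name}/{date}/fxaWish{pid}"


def children_log_dir(display_name, start_time_utc, pid, log_dir=LOG_DIR):
    """Directory holding the logs of the IGC processes and all other descendants."""
    return ui_log_path(display_name, start_time_utc, pid, log_dir) + ".children"


if __name__ == "__main__":
    started = datetime(1999, 3, 7, 23, 59, tzinfo=timezone.utc)
    print(ui_log_path(":0.0", started, 1234))
    print(children_log_dir(":0.0", started, 1234))
```

Inside the .children directory, each file is named <programName><pid>, so a listing of that directory shows which executables the user interface process spawned.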

Since $FXA_HOME/bin is in fxa's PATH, there's no need to include that when entering process commands, and that's reflected in the commands included in these instructions. All commands that you'll need to enter are shown in bold type. Except as noted, all will be run from the fxa account.

You can get to today's ingest log directory simply by typing logs, and up will get you to its parent, where some logs live. The naming convention for ingest logs is <processName><pid><hostname><hhmmss>.

Overview

The diagram below outlines the flow of messages through the WFO-A ingest system. (It's way out of date, obviously, but still instructive.) Following sections discuss each interface in detail.


General data ingest

All ingest processes are started automatically at boot time. MCServiceGuard looks in /etc/cmcluster/[ds|as1|as2]swap/[ds|as1|as2]swap.control to find out what start script to run. What it finds is /etc/cmcluster/[ds|as1|as2]swap/[ds|as1|as2]swap.run.cmds, which says to run /etc/cmcluster/shared/start.[ds|as1|as2]. (Whew!) There, we find our old friends startLdm, startRadar, startIngest.[ds1|as1|as2], startTextDB.[ds1|as1|as2]. On shutdown, a similar trail leads to /etc/cmcluster/shared/stop.[ds|as1|as2].
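
For quick reference, the trail of files can be written out per package. The sketch below assumes the same package name (ds, as1, or as2) fills both bracketed slots in each path, which is how the paths above read; the function name serviceguard_paths is made up for this illustration.

```python
def serviceguard_paths(package):
    """Return the files consulted at start and stop for 'ds', 'as1', or 'as2'."""
    if package not in ("ds", "as1", "as2"):
        raise ValueError(f"unknown package: {package!r}")
    swap_dir = f"/etc/cmcluster/{package}swap"
    return {
        "control": f"{swap_dir}/{package}swap.control",    # names the run script
        "run_cmds": f"{swap_dir}/{package}swap.run.cmds",  # calls the start script
        "start": f"/etc/cmcluster/shared/start.{package}",
        "stop": f"/etc/cmcluster/shared/stop.{package}",
    }


if __name__ == "__main__":
    for name, path in serviceguard_paths("as1").items():
        print(f"{name:9s} {path}")
```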

You can run these scripts by hand. Should it be necessary to restart, use stopIngest and startIngest.[ds1|as1|as2].

Processes included in stop/startIngest, in the order they are started: (Note that the scripts use $FXA_HOME/bin, which resolves to /awips/fxa/bin. What's shown here is the text that appears in a ps listing.)

For ds1:
/awips/fxa/bin/acqserver 900
  /awips/fxa/bin/acqserver 900
  /awips/fxa/bin/acqserver 900
/awips/fxa/bin/CommsRouter COMMS_ROUTER
/awips/fxa/bin/CommsRouter GRID_ROUTER
/awips/fxa/bin/RadarServer
/awips/fxa/bin/DialServer
/awips/fxa/bin/MhsServer
/awips/fxa/bin/pingFreeway 0
/awips/fxa/bin/pingFreeway 1

For as1:
/awips/fxa/bin/DataController COMMS_ROUTER TextCont.config
  /awips/fxa/bin/MetarDecoder
  /awips/fxa/bin/RaobBufrDecoder
  /awips/fxa/bin/profilerDecoder
  /awips/fxa/bin/MaritimeDecoder
/awips/fxa/bin/DataController COMMS_ROUTER TextCont2.config
  /awips/fxa/bin/AlertDecoder
  /awips/fxa/bin/binLightningDecoder
  /awips/fxa/bin/CdotDecoder
  /awips/fxa/bin/shefEncoder
/awips/fxa/bin/DataController COMMS_ROUTER SatelliteController.config
  /awips/fxa/bin/Satdecoder
/awips/fxa/bin/DataController COMMS_ROUTER RadarController.config
  /awips/fxa/bin/RadarStorage
/awips/fxa/bin/notificationServer
/awips/fxa/bin/RadarMsgHandler

For as2:
/awips/fxa/bin/DataController COMMS_ROUTER TextDB_Controller.config
  /awips/fxa/bin/CollDB_Decoder
  /awips/fxa/bin/StdDB_Decoder
  /awips/fxa/bin/RadarTextDecoder
/awips/fxa/bin/DataController GRID_ROUTER GribController.config
  /awips/fxa/bin/GribDecoder

And processes in start/stopTextDB:

For ds1:
/awips/fxa/bin/TextDB_Server -Read
/awips/fxa/bin/TextDB_Server -Write

For as1:
/awips/fxa/bin/afoscommsrv
/awips/fxa/bin/textNotificationServer

For as2:
(none)

The stop/start scripts handle the non-indented items in the list. Indented items are children spawned by the process listed immediately above.

Should you restart the ingest and still receive no SBN data, check on the acqserver processes (proc acq) on ds1. There should be three of them: the parent and two children. One child handles the TG data and the other, NESDIS; the former usually connects almost immediately, while the latter may take a few minutes. If there are not three, the system is not connecting to the SBN CPs. Check the acqserver logs, then log in to the appropriate CP (cpsbn1 for TG data and cpsbn2 for NESDIS), per the SBN section. (In a partial-failure situation you might see only text (and thus METAR) or only satellite data arriving, and maybe only two of the acqserver processes. If restarts fail, it will be necessary to fail over to a single CP.)
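
The acqserver head count is easy to script. The following is a rough sketch, not part of the fxa tools: it counts lines of captured ps output that name the acqserver executable. The function names are invented here, and the count of three reflects the parent plus two children shown in the ds1 process list.

```python
EXPECTED_ACQSERVERS = 3  # parent plus one child each for TG and NESDIS


def count_processes(ps_output, executable="/awips/fxa/bin/acqserver"):
    """Count ps lines that contain the executable path as a separate word."""
    count = 0
    for line in ps_output.splitlines():
        if executable in line.split():
            count += 1
    return count


def sbn_connected(ps_output):
    """True when all three acqserver processes appear in the ps listing."""
    return count_processes(ps_output) == EXPECTED_ACQSERVERS
```

Feed it the text of a ps listing captured on ds1. A count of two suggests the partial-failure case described above, where only one of the data streams is connected.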

Rarely, the GRIB decoder will hang on bad grids. You'll see this by the GribDecoder process using lots of CPU time for extended periods, and a check on the log will show nothing happening. Issue kill -10 <pid> to force a crash. The signal handler will remove the bad grid and the controller will start a new decoder.

If you get a call that radar is not auto-updating, you'll probably need to restart the notificationServer. When you use stopNotificationServer to kill the server, it may take some time to write out its client list, which is found in $FXA_DATA/workFiles/notificationServerClientListState.txt. Make sure you give it a chance (check the log to see if it's heard the signal 15) before using kill -9. Otherwise, when the server is restarted, the workstations won't receive green time and auto-update messages until they, in turn, are restarted. After it's stopped, use startNotificationServer to get it running again. The textNotificationServer has a similar feature; its client list is in $FXA_DATA/workFiles/textNotificationServerClientList.txt.

SBN ingest

The bulk of our datasets are received over the Satellite Broadcast Network (SBN) via the SBN communications processors. Please note that cpsbn1 and cpsbn2 are monitored by the AWIPS Network Control Facility (NCF), which is also responsible for their maintenance. There is a switch box near cpsbn1 that should normally be left in the `Modem' position, so NCF operators can check on its operation.

If SBN data (satellite, METARs, text, grids) are not arriving, check whether the CP is hung. rlogin cpsbn1 for TG data, or cpsbn2 for NESDIS, as user root. (Note: if you need to log in at the console, you'll need to move the CP switch to the Monitor position.) Type acq_stats -k0 -k1 to run the acquisition monitor. The line beginning "0 ds-bou" should be up to date; the other line will read

 0 null         0   0  -       n/a       n/a        0     0         0        0

If the "0 ds-bou" line is not up to date, you'll need to restart. Other problem indicators are lots of buffers or distribution headers in use. (If both CPs have stopped at the same time, it's likely that there's an uplink problem at the NCF, or there could be a downlink problem. Check with NCF (301-713-1284) before restarting.)

To restart the CP software:

  1. Stop the system with acq_ctl -A -S -f.
  2. Type ps -xaf to see what processes are running. Kill any /awips/bin/acq* that's running.
  3. Start with start_cpsbn_acq. A lot of text will scroll by as the software is downloaded from ds1-bou. In many cases, you'll need to push Enter to get your prompt back.
  4. Monitor the system again with acq_stats -k0 -k1 -i10 (use ctrl-C to exit); you should see the TG line connect within a few seconds, though the NESDIS line may take several minutes.
  5. Log out (exit), and switch back to Modem if at the console.
  6. The child acqserver may go down when you stop the CP, then will come back as data are sent. Back on ds, check for the old child and kill it if necessary. Otherwise, it will periodically log a PROBLEM: CP connection has timed out :header message, and also send notification to the NCF, which we'd rather not do.

If this doesn't work, have the forecasters check the Sync and Signal green lights on the demod. If these are out, have them contact the NCF for information. (This is unlikely, as the NCF monitors that portion of the system.) If the signal looks good, but you can't connect, you may need to reboot the CP. Log in and enter /etc/reboot. Ingest processes start automatically. (If you can't log in, you can press the reset button that's just above the CP's power switch at lower right. The system will reboot itself. Using the reboot command is preferred.)

If necessary, either CP can be configured to send both data streams to the server. Call the NCF and tell them which of the CPs has failed. They will perform the failover.

The diskless SBN CPs boot off of the data server. One side effect of this is that ingest log files are available on the data server disks, as /awips/hprt/logs/Products/cpsbnn-bou/acq_clntm_h0/mcProduct.log, where n is the CP number (1 or 2) and m is 0 for TG data and 1 for NESDIS. (From the cp, these are found in /awips/logs/Products/...) The system breaks these logs when they reach 1 MB in size (keeping a previous version called mcProduct.old), so there may not be a whole lot of history available (particularly for the TG side), but these can be useful in diagnosing missing data.
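
As a small aid, the data server path can be assembled from the CP number and the data stream. This sketch covers only the data server side of the path given above; the function name cp_product_log is invented for the example.

```python
STREAM_CLIENT = {"TG": 0, "NESDIS": 1}  # the "m" in acq_clntm_h0


def cp_product_log(cp_number, stream):
    """Path on the data server of the product log for one CP and data stream."""
    if cp_number not in (1, 2):
        raise ValueError("cp_number must be 1 (cpsbn1) or 2 (cpsbn2)")
    client = STREAM_CLIENT[stream]
    return (f"/awips/hprt/logs/Products/cpsbn{cp_number}-bou/"
            f"acq_clnt{client}_h0/mcProduct.log")


if __name__ == "__main__":
    print(cp_product_log(1, "TG"))
    print(cp_product_log(2, "NESDIS"))
```

Both arguments are kept separate because, after a failover, one CP carries both streams.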

Radar ingest

A Simpact Freeway box, cpsync1-bou, handles the low-level comms from the RPG, making the data understandable by ds1-bou. A second box, cpsync2-bou, stands ready as a backup, but normally is not used.

syncComms is a script that runs wfoApi, which handles the transfer of data between the Freeway and the DS. Files are stored temporarily in $FXA_DATA/radar/raw and /text. Files in /raw are moved by RadarStorage to the appropriate product directories, e.g., /kftg/Z or V. The /text files are processed by the RadarTextDecoder process; output goes to the text database (e.g., WSRVWPFTG).

Radar ingest processes also include the RadarServer and the DataController/RadarStorage pair. The former communicates via the wfoApi process with the RPG over an X.25 link, while the latter are responsible for storing radar products as they are received.

  1. On ds1-bou, a cron job runs $FXA_HOME/bin/restart_radar every minute, checking whether the ingest (syncComms & wfoApi) is up and starting it if necessary. (restart_radar gets the port number out of ~fxa/data/localizationDataSets/DEN/portInfo.txt, then executes x25_restart$portNum. It logs to $LOG_DIR/x25_restart$portNum.log.)
    If a clean shutdown of the WFO-A connection (at the UCP) is performed, this process will work properly, restarting the ingest once the connection is re-established. If, however, the RPG dies or the connection to WFO-A is pulled without doing a clean shutdown, it will be necessary to run x25_stop0.
  2. If the PUP is getting data, but WFO-Advanced is not, you can stop the radar ingest and let it restart itself. Execute x25_stop0.
    General Status Messages (GSMs) can be used to check on the status of the 88D. This can be checked from the workstation `radar status' window or the Unit Status Message graphic, the last entry in the kftg>Graphics> menu. Remotely, you can tail -30 $FXA_DATA/workFiles/RADAR_Announcer to see what's what.
  3. If Step 2 does not work, first make sure that the RadarServer is running. If it is, and you've restarted the radar ingest, then the freeway may need to be rebooted, using icpReset0. If that procedure does not succeed, and tells you to reboot manually, follow these steps:
    1. rlogin to cpsync1-bou
      user: freeway
      password: password
    2. Select 1 (shutdown options)
    3. Select 2 (reboot server)
      You will see the message
      System Reboot in Progress...
      You must hit Enter on the keyboard to return to your session.
    Wait for about 60 seconds and run icpReset0. This will do the following:
    1. kill wfoApi and syncComms running on all ports on ICP0
    2. reset ICP0
    3. run the x25_manager to reconfigure buffers and circuits
If for some reason the x25_manager cannot configure buffers, it will try five times, printing out a message each time saying `Buffers not yet initialized, so retry'. If the x25_manager fails after five times, you can run the x25_manager yourself as follows:

cd /usr/local/freeway/bin
x25_manager < fw_init

If the following lines do not appear, you will need to repeat the above command until buffers and circuits are configured.
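
The "try five times, then give up" behavior described for the x25_manager is an ordinary bounded retry. The sketch below shows the pattern only; it is not the x25_manager code, and the names retry, action, and report are invented for the illustration.

```python
import time


def retry(action, attempts=5, delay_seconds=0.0, report=print):
    """Call action() until it returns True, at most `attempts` times."""
    for attempt in range(1, attempts + 1):
        if action():
            return True
        report(f"attempt {attempt} of {attempts} failed, so retry")
        if attempt < attempts:
            time.sleep(delay_seconds)  # pause before the next try
    return False
```

When the automatic retries are exhausted, the manual x25_manager command above plays the same role: repeat it until the buffers and circuits are configured.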

Please note that the RadarServer process must be running in order to send the RPS list and get data. The radar ingest (syncComms & wfoApi) will start but will not stay up if the RadarServer is down. RadarServer is started as part of startIngest.

Now for a few words on RPS lists.

A user can edit the current RPS list and send that out. This RPS list is saved in /data/fxa/radar/lists/KXXX.current. A user can also edit the current RPS list, or any other (except the default clear-air and storm mode) RPS list and save it in the /data/fxa/rps-lists directory. These RPS lists including .current can be changed at will. Any RPS list that gets sent out gets saved in KXXX.current, and is recalled and sent out whenever a "Connection Up" message is received from wfoApi, or whenever a GSM comes in. If the mode, as specified in the GSM, has changed, that RPS list gets sent out and is saved in KXXX.current.

In order for a user to put the RPS list back to what it was, one of two things is done: manually copy the current-mode RPS list into KXXX.current and send that out, or use the RPS list application editor to edit the current RPS list for that RPG.
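
One way to read the RPS list rules above is as a small state machine. The sketch below is an interpretation, not the RadarServer implementation: the class and method names are invented, the `current` attribute stands in for the contents of KXXX.current, and the product names in the usage note are placeholders.

```python
class RpsState:
    """Tracks which RPS list would be sent, following the rules described above."""

    def __init__(self, current_list, mode, mode_lists):
        self.current = list(current_list)  # stands in for KXXX.current
        self.mode = mode                   # mode reported by the last GSM
        self.mode_lists = mode_lists       # mode name -> list for that mode
        self.sent = []                     # record of every list sent to the RPG

    def send(self, rps_list):
        """Any list that is sent out is also saved as the current list."""
        self.current = list(rps_list)
        self.sent.append(list(rps_list))

    def on_connection_up(self):
        """A "Connection Up" message causes the current list to be re-sent."""
        self.send(self.current)

    def on_gsm(self, gsm_mode):
        """A GSM re-sends the current list, or the new mode's list on a mode change."""
        if gsm_mode != self.mode:
            self.mode = gsm_mode
            self.send(self.mode_lists[gsm_mode])
        else:
            self.send(self.current)
```

For example, a state created with current list ["prodA", "prodB"] in clear-air mode keeps re-sending that edited list on every clear-air GSM, but a GSM reporting storm mode replaces it with the storm list. That replacement is why a user's edits vanish after a mode change and have to be restored by one of the two methods just described.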

And now for a few words on localization in Boulder.

When you run a DEN localization, you get Denver's PUP ID and port ID numbers. The correct numbers for Boulder are in the FSL files. On ds1-fsli, copy ~fxa/data/localization/FSL/FSL-portInfo.txt and FSL-pupId.txt to ~fxa/data/localizationDataSets/DEN/portInfo.txt and pupId.txt.
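
If you do this often, the two copies can be scripted. A minimal sketch, assuming it is run on ds1-fsli with the path of ~fxa/data passed in; the function name install_boulder_ids is made up here.

```python
import shutil
from pathlib import Path

# Source name under localization/FSL -> target name under localizationDataSets/DEN.
ID_FILES = (("FSL-portInfo.txt", "portInfo.txt"),
            ("FSL-pupId.txt", "pupId.txt"))


def install_boulder_ids(fxa_data_dir):
    """Overwrite the DEN port and PUP ID files with the FSL versions."""
    data = Path(fxa_data_dir)
    source_dir = data / "localization" / "FSL"
    target_dir = data / "localizationDataSets" / "DEN"
    copied = []
    for source_name, target_name in ID_FILES:
        shutil.copyfile(source_dir / source_name, target_dir / target_name)
        copied.append(target_dir / target_name)
    return copied
```

The copy has to be repeated after each DEN localization run, since the localization puts Denver's numbers back.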

LDM ingest

While most datasets used by D2D are delivered by the SBN or LDAD or generated locally, NOWrad and SFM (see next section) data are relayed from FSL via Unidata's Local Data Manager (LDM). LDM processes (on ds1 and as2) include
rpc.ldmd -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ld
  (3 identically-named children)
  pqact
  pqexpire -a .5
LDM ingest is managed with stopLdm and startLdm. The control file is /usr/local/ldm/etc/ldmd.conf, and the log is $LOG_DIR/ldmd.log.

If you can't get LDM working, check to see if syslogd is running. If it's not, as root, run /usr/sbin/syslogd -D, then try starting LDM again. Another, less likely, possibility is an open socket. With LDM shut down, enter rpcinfo -p. If you see one or more lines beginning 300029 at the bottom, type sudo rpcinfo -d 300029 5 to remove this open socket, then startLdm again.
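
The open-socket check can be scripted against saved rpcinfo -p output. This sketch assumes the usual column layout of that listing (program number first, version second); the function names are invented here.

```python
LDM_PROGRAM = 300029  # RPC program number to look for, per the text above


def stale_ldm_registrations(rpcinfo_output, program=LDM_PROGRAM):
    """Return sorted (program, version) pairs still registered for the LDM."""
    found = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == str(program) and fields[1].isdigit():
            found.add((program, int(fields[1])))
    return sorted(found)


def removal_commands(rpcinfo_output):
    """Build one 'rpcinfo -d' command per leftover registration."""
    return [f"sudo rpcinfo -d {prog} {vers}"
            for prog, vers in stale_ldm_registrations(rpcinfo_output)]
```

Run it only with LDM shut down. While LDM is running, the same lines are expected and should not be removed.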

Boulder LDM info

The radar is ingested on as1-fsli, and is sent via LDM to as1-fsld and two FD machines, frisbee and jarts. sendPing runs via cron on ds1-fsld to get these stored on fsld. The necessary cron line is in /awips/fxa/etc/sendRadarPing.cron; it needs to be added locally to fxa's cron on ds1-fsld.

LDAD data are sent from as1-fsli to ds1-fsld, frisbee, and jarts via the LDM. This is done via cron, with a line found in /awips/fxa/etc/sendLDAD.cron on as1-fsli. This needs to be added locally to fxa's cron there.

SFM data are sent to as2-bou, as2-fsld, and as2-fslc via the LDM. The SFM (on as3-fsli) calls the script /awips/fxa/etc/sendLAPS.sh to send the data. No crons are necessary; the LDM on the as2 machines has a pqact action to create the "bigfile" from the individual grids being sent.

The LDMs on ds1-fsli and ds1-fsld receive NOWrad data from ldm.fsl.noaa.gov.

To support all of this, LDM needs to run on ds1-fsld, as2-fsld, ds1-fsli, as1-fsli, as3-fsli, and as2-fslc.


Text ingest and database

The text database system is also managed separately from the general startIngest and stopIngest. Text products are stored in an Informix database.

The database runs under user root. A number of processes are normally running on ds1, which you can see in a proc oninit listing. Usually, only the parent process will consume significant amounts of CPU time.

The main things to do for Informix are to check the log (onstat -m) and replication (onstat -g dri). Informix problems should be referred to NCF for resolution. You can follow the procedures in the 4.1 System Manager's Manual, if needed.

The workstation uses 4 processes to communicate with the text database, to wit:

  $FXA_HOME/bin/TextDB_Server -Write
  $FXA_HOME/bin/TextDB_Server -Read
  $FXA_HOME/bin/textdb
  $FXA_HOME/bin/textdbRemote
The first two of these, along with the AFOS comms server, are started and stopped by the startTextDB and stopTextDB scripts. Another script, stopTextNotification, will stop the textNotificationServer (it's started, if necessary, by startTextDB). We prefer not to stop it, because doing so necessitates restarting all text workstations to get alarm/alert notices. The others, textdb and textdbRemote, run as needed to read/write the database. (The former communicates directly with the database, while the latter goes through the read/write server.)

Managing the text database requires care, because of the nature of the database software. In particular, it's not safe simply to kill the write server, as it may be in the middle of a transaction, and the text database could get corrupted. Thus, stopTextDB issues a KILLSERV command to the text database.

If stopTextDB/startTextDB does not clear up text storage/retrieval problems, there may be something wrong with Informix. In that case,

  1. Shut down the Read and Write servers with stopTextDB.
  2. As root, issue /sbin/init.d/informix stop, then /sbin/init.d/informix start. Then check the log (onstat -m) to see if it started up OK.
  3. As fxa again bring the Read and Write servers back up with startTextDB.
You may find database errors (things like `database update error: -346' or `database insert error -239') in the TextDB_Server logs. (Use finderr <nnn> to see information on these error codes.) This can probably be cleared up by issuing onmode -l (as informix), then become fxa and stop/startTextDB.

If the database is corrupted (usually as a result of a system crash), it will be necessary to restore it from a backup or another database.

Method 1: The Informix database (text and hydro) is backed up daily. If it's not too long after archive time, the easiest thing to do is restore from archive:

  1. Shut down the Read and Write servers with stopTextDB.
  2. become informix and type logs. Look at the end of online.log1. You should see lines like this:
      02:30:12 Level 0 Archive started on rootdbs, textblobspace,
        ldadblobs, textdbs, textdbs2, textdbs3, textdbs4, textdbs5, ldad,
        wfodendbs
      02:39:16 Archive on rootdbs, textblobspace, ldadblobs,
        textdbs, textdbs2, textdbs3, textdbs4, textdbs5, ldad, wfodendbs
        Completed.
    (indicating a clean archive) before any lines that read
      15:27:48 Assert Failed: WARNING! Incorrect BLOB stamps.
      15:27:48 Who:Session(8, fxa@fsldata1.fsl.noaa.gov, 1821, -1059350808)
        Thread(31, sqlexec, c0d98948, 1)
      15:27:48 Results: BLOBSpace
        textblobspace, BLOB addr: 0xa0be14, BLOB stamp 25317
    (These latter are the indication of your corrupted database. Note: If you see errors other than textblobspace here, the failure is related to the hydro database. In this case, the text database is OK, and you need only stopTextDB, restoreHydro, and startTextDB (with appropriate becomes) to get going again.) If you don't have a clean archive, or if it's been many hours and you don't want to lose intervening data, you'll have to use one of the other methods. Skip past steps 3 & 4 for more fun!
  3. cd, then ./restore to restore the database. Press Enter when asked to mount tape 1, answer y to the continue restore? question, and n to the rest. This will take around 15 minutes, at the end of which you should see On-Line status echoed on your display.
  4. become fxa and startTextDB.
  5. If you have difficulty, check the Informix log ($FXA_DATA/logs/informix/online.log1). If you see mention of quiescent mode, become informix and stop/startInformix.
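
The "clean archive before any assert failure" test in Method 1 can be approximated by scanning the text of online.log1. This is a sketch that matches only the message wording shown in the sample lines above; real logs may word things differently, and the function name has_clean_archive is invented.

```python
import re

# Matches an archive completion message, even when it wraps across lines.
ARCHIVE_DONE = re.compile(r"Archive on .*?Completed\.", re.DOTALL)
ASSERT_FAILED = "Assert Failed"


def has_clean_archive(log_text):
    """True if a completed archive is logged before the first 'Assert Failed'."""
    done = ARCHIVE_DONE.search(log_text)
    if done is None:
        return False  # no archive completion recorded at all
    failure_at = log_text.find(ASSERT_FAILED)
    return failure_at == -1 or done.end() <= failure_at
```

The check is deliberately conservative: an assert failure logged before the archive completes counts as not clean, since the archive may then contain the corruption.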
Method 2: If you don't have a clean archive, but do have a good database on another machine (e.g., ds2-bou, ds1-fsli), you can make a backup, copy, and restore. (Note that you can't use the same archive method or an archive from another system, because the archive carries host information with it and is non-transferable.) Plan on spending an hour on this procedure.
  1. Start unload
    1. Go to system that is not corrupted, become informix, and find a disk that has at least 50 MB free.
    2. Enter touch <path>/fxatext.out. (If you get an error, become fxa and set appropriate protections.)
    3. Enter onunload -t <path>/fxatext.out fxatext.
    4. You will get a prompt of
        Please mount tape and press Return to continue ...
      Just press Return.
  2. Prepare corrupted system while onunload is running
    1. Make sure that the TextDB_Server -Read and -Write are shut down with the stopTextDB script as fxa.
    2. become informix and go into dbaccess by typing dbaccess
    3. Do the following commands:
      1. Type d to go to the Database Menu option.
      2. Type d again to select Drop. You should see fxatext, sysmaster, and wfoden in the list. If not, you'll need to rebuild the database from scratch, as outlined in `Database Restoration Instructions' (~fxa/doc/userGuides/DBrestoration).
      3. Choose fxatext@ONLINE with the arrow keys or type in fxatext at the prompt.
      4. Informix will give you ONE chance to verify that this is the database that you want to drop. If fxatext is chosen, hit y for yes; if any other database is chosen, hit n for no and start again from step 2 (select Drop).
      5. Hit e until dbaccess exits.
    4. At the command line type onspaces -d textblobspace. You will get a prompt for verification of the blobspace to be dropped and, after pressing y, a statement that there will have to be a Level 0 archive before any of the space can be reused.
    5. cd ~/etc and vi onconfig. Find the TAPEDEV variable and comment out the currently active TAPEDEV. Uncomment the line that says: #TAPEDEV /dev/null
    6. Save and exit and enter ontape -s. This will run a Level 0 archive and send the archive to /dev/null.
    7. Re-enter the onconfig file and change the TAPEDEV variable back to the old value.
    8. Issue the command onmode -l (letter l) to move to the next logical log file and make it possible to connect to the instance of Informix.
    9. cd and grep blob spacesetup. You'll get
        onspaces -c -b textblobspace -g 2 -p /dev/informix-1 -o 550000 -s 200000
    10. Run that command to recreate the blobspace.
  3. Move fxatext.out file and restore database
    1. When onunload is finished running on the uncorrupted system, ftp the file to the corrupted system, wherever you find room for it.
    2. cd and run the command
        onload -t <path>/fxatext.out -d textdbs5 fxatext
      Note: be sure to use the full path name of the file, even if you're in the directory. Informix will prompt you to mount the tape and press Return, so just press Return. Then it will ask you if you want to relocate any of the blob spaces; answer n.
    3. If you get errors like this:
        ISAM error: illegal argument to ISAM function.
        Error building TBLspace.
      in step 2, do the onload again, but this time answer y, then enter textblobspace. This will take a bit longer, but should work.
  4. Run ./archive.sh to create a clean archive of the restored database. (Archive requires the pre-existence of the archive.tape file. If it's missing, you'll get an error. A simple touch will do it.)
  5. When the load is finished, restart the Read and Write servers and delete the fxatext.out files on both the good and bad machines.
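
The onconfig edit in "Prepare corrupted system" (swap the active TAPEDEV for the /dev/null entry, run the archive, then put the old value back) can be sketched as a text transformation. This assumes the file holds one active TAPEDEV line and one commented "#TAPEDEV /dev/null" line, as the text describes; the function name and the device name in the usage note are illustrative only.

```python
def point_tapedev_at_null(onconfig_text):
    """Comment out the active TAPEDEV line and enable 'TAPEDEV /dev/null'."""
    out = []
    for line in onconfig_text.splitlines():
        fields = line.split()
        if fields[:1] == ["TAPEDEV"]:
            out.append("#" + line)                # disable the active device
        elif fields[:2] == ["#TAPEDEV", "/dev/null"]:
            out.append(line.replace("#", "", 1))  # enable the /dev/null entry
        else:
            out.append(line)                      # leave everything else alone
    return "\n".join(out) + "\n"
```

Given a file containing "TAPEDEV /dev/rmt/0m" and "#TAPEDEV /dev/null", the result has the comment marks swapped. Keep the original text so it can be written back once ontape -s has finished.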
Method 3: If you don't have a clean archive, and your only good database is on a machine (such as bluejay) with a different database structure, then you're going to have to extract the data and insert it into an empty database you build. Once again, plan to spend an hour on this procedure.
  1. Start unload on uncorrupted system
    1. become informix on uncorrupted system and type dbaccess to run dbaccess.
    2. Enter the following commands:
      1. Hit q for Query Language.
      2. Select fxatext@ONLINE with arrow keys or type fxatext.
      3. Hit c for choose and select `unloadtext'.
      4. If the system being unloaded does not indicate the correct path, hit u for Use Editor, then hit return to get to vi and change the directory to the appropriate corresponding name. Save and exit vi to get back to dbaccess.
      5. Hit r for Run to start the data unload.
      6. Once the rows are unloaded, hit e twice to exit.
  2. Prepare corrupted system
    1. Do all of the steps from Method 2, section 2 (Prepare corrupted system).
    2. Go back into dbaccess to recreate the database.
    3. Run the following commands:
      1. Hit d for Database.
      2. Hit c for Create.
      3. Type in fxatext at the prompt and hit return.
      4. Hit d to choose the dbspace to put the database.
      5. Type textdb at the prompt and hit return.
      6. Hit e to exit dbspace section.
      7. Hit c to choose Create Database.
      8. Hit e to exit Database section.
      9. Hit q to go to Query Language.
      10. Hit c for Choose.
      11. Select fxatext with the arrow keys or type fxatext at the prompt.
      12. Hit r to run SQL.
      13. Hit e until dbaccess exits.
  3. Move data files and reload database
    1. On corrupted system, go to /data/fxa-2 and ftp to non-corrupted system and get the stdTextProd.out, lrgTextProd.out and textInfo.out files.
    2. Go back into dbaccess and run the following commands:
      1. Hit q for Query Language
      2. Select fxatext@ONLINE or type fxatext at the prompt.
      3. Hit c for choose and select loadtext.sql.
      4. If the data files are not in /data/fxa-2 hit u for Use Editor and modify the directory names.
      5. Hit r to Run the SQL and reload the database.
      6. Hit e until dbaccess exits.
  4. Run ./archive.sh to create a clean archive of the restored database.
  5. Re-start the Read and Write servers.
Note: When the database is down while the data ingest is running, text messages will queue up inside the TextDB DataController process. Once the database is back up and accepting messages, this queue will be processed. It may take a long time to catch up, however. (To see what's being processed, look at the end of the CollDecoder or StdDecoder logs.) If it's necessary to empty the queue (due to excessive length), you must kill the TextDB DataController (use proc TextDB to get the pid) and restart it using DataController COMMS_ROUTER TextDB_Controller.config & (most easily done by using X to copy this line out of ~fxa/bin/startIngest).

Hydro decoder & database

A SHEF decoder runs on ds1 as part of the hydrology package. /awips/hydroapps/shefdecode/bin/shefdecode runs under oper, and is started at boot time. If it is down, you must sudo su - oper, then /awips/hydroapps/shefdecode/bin/start_shefdecode &. Data are stored in an Informix database, separate from the text database. Other hydro cron jobs are run (under user oper) to manage the database, to wit:
  01 0,4,8,12,16,20 * * * /awips/hydroapps/whfs/standard/bin/CleanWFO
  27 7 * * * /awips/hydroapps/whfs/standard/bin/run_db_cleanup 
  37 7 * * * /awips/hydroapps/whfs/standard/bin/run_db_tuneup
  15 * * * * /awips/hydroapps/whfs/standard/bin/run_precip_accum
And as fxa
  3,8,13,18,23,28,33,38,43,48,53,58 * * * * csh -c
    '${FXA_HOME}/bin/moveProds.ksh /data/fxa/ispan/text/hydro/
    /awips/hydroapps/whfs/local/data/shef_input/'
Decoder logs are found in /awips/hydroapps/whfs/local/data/log/shef/decoder.
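
The long minute field in the fxa entry is just "every five minutes, starting at minute 3", spelled out value by value. A short sketch for generating such a field (the function name offset_minutes is invented here):

```python
def offset_minutes(offset, step=5):
    """Comma-separated minute field: every `step` minutes starting at `offset`."""
    if not 0 <= offset < step:
        raise ValueError("offset must be at least 0 and less than step")
    return ",".join(str(minute) for minute in range(offset, 60, step))


if __name__ == "__main__":
    print(offset_minutes(3))
```

Choosing a different offset for each five-minute job keeps them from all firing on the same minute.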

AFOS product storage

Products created on the text workstations are stored in the local Informix database and are sent to AFOS for dissemination. Process afoscommsrv handles this connection and is started as part of the startTextDB suite. Should this process go down, a message like this will be seen when trying to `Save & Exit' a text product:
Service: afoscommsrv host ds1-bou connect failed: connection refused error sending to AFOS
On as1, use startAFOS to restart it. Logs are written to $LOG_DIR/afoscommsrv.*.

Interprocess communication

Messages are passed between processes using TCP sockets. The software runs essentially flawlessly and requires no maintenance.

Cron

A host of fxa activities are managed by cron. The currently-running cron lists are found in /var/spool/cron/crontabs/<username>. The data ingest part of AWIPS is managed by ServiceGuard; fxa's crons are maintained in /etc/cmcluster/crons/fxa. Information on these files is shown below.
 
  host  function      cmcluster file        our tree file
  ds1   normal ops    ds1.dsswap            ingest.crontab.ds1
  ds2   failover ops  ds2.dsswap            ingest.crontab.ds1
  as1   normal ops    as1.as1swap           ingest.crontab.as1
  as2   normal ops    as2.as2swap           ingest.crontab.as2
  as1   failover ops  as1.as1swap.as2swap   ingest.crontab.as1-as2
  as2   failover ops  as2.as2swap.as1swap   ingest.crontab.as1-as2

We have taken the approach to install the files in /etc/cmcluster/crons/fxa under our "tree" names, then make soft links to these under the names ServiceGuard expects.
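
The linking step might look like the sketch below. The mapping is copied from the table above; the function name link_crontabs and the idea of selecting entries by host prefix are assumptions made for the illustration, and the directory to use on a real system is /etc/cmcluster/crons/fxa.

```python
import os

# ServiceGuard file name -> "tree" file name, from the table above.
CRON_LINKS = {
    "ds1.dsswap": "ingest.crontab.ds1",
    "ds2.dsswap": "ingest.crontab.ds1",
    "as1.as1swap": "ingest.crontab.as1",
    "as2.as2swap": "ingest.crontab.as2",
    "as1.as1swap.as2swap": "ingest.crontab.as1-as2",
    "as2.as2swap.as1swap": "ingest.crontab.as1-as2",
}


def link_crontabs(cron_dir, host, links=CRON_LINKS):
    """Create soft links in cron_dir for the ServiceGuard names belonging to host."""
    made = []
    for sg_name, tree_name in links.items():
        if not sg_name.startswith(host + "."):
            continue
        link_path = os.path.join(cron_dir, sg_name)
        if os.path.islink(link_path):
            os.remove(link_path)          # replace a stale link
        os.symlink(tree_name, link_path)  # relative link within cron_dir
        made.append(sg_name)
    return made
```

Relative links keep working if the directory is copied to the other member of a failover pair.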

The fxa lists are shown here:

ingest.crontab.ds1

# Crontab file for starting transient data ingest processes.

# This file, ingest.crontab.ds1, contains the items that run on the primary
# data server. It is to be installed as
#  ds1:/etc/cmcluster/crons/fxa/ds1.dsswap
#  ds2:/etc/cmcluster/crons/fxa/ds2.dsswap
# under root ownership.

# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be included in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.

# Break ingest log and announcer files daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'
0 0 * * * csh -c '${FXA_HOME}/bin/breakAnnouncementFiles >&! ${LOG_DIR}/breakAnnouncementFiles.log'

# Purgers...
# Run the master purger twice hourly, to pare data back to necessary levels.
15,45 * * * * csh -c '${FXA_HOME}/bin/master.purge >&! ${LOG_DIR}/master.purge.log'
# Purge excess Redbook graphics hourly
0 * * * * csh -c '${FXA_HOME}/bin/purgeAllRedbook >&! ${LOG_DIR}/purgeAllRedbook.log'
# Run scour daily to clean up log files and a few items not hit by master.purge.
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'
# Purge MHS data  once per day.
20 1 * * * csh -c '${FXA_HOME}/bin/mhs-data.purge'

# Radar ingest
* * * * * csh -c '${FXA_HOME}/bin/restartRadar'
0 0 * * * csh -c '${FXA_HOME}/bin/breakLog pingFreeway0.log'
0 0 * * * csh -c '${FXA_HOME}/bin/breakLog pingFreeway1.log'

# hydro scripts - something different will exist in Build 4
#00              0 * * * /usr/local/hydro/wfo/bin/CleanBad.scr
#01 20,0,4,8,12,16 * * * /usr/local/hydro/wfo/bin/CleanWFO
#03  9             * * * /usr/local/hydro/wfo/bin/run_db_cleanup 
#03 11             * * * /usr/local/hydro/wfo/bin/run_db_tuneup
#15 *              * * * /usr/local/hydro/wfo/bin/run_precip_accum
#2,7,12,17,22,27,32,37,42,47,52,57  * * * * csh -c '/usr/bin/perl ${FXA_HOME}/bin/renameHydroFiles.pl'
3,8,13,18,23,28,33,38,43,48,53,58  * * * * csh -c '${FXA_HOME}/bin/moveProds.ksh /data/fxa/ispan/text/hydro/ /awips/hydroapps/whfs/local/data/shef_input/'

# Process Monitor start-up script
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/DS_startProcMon.sh'

# Data archiving and archive purging
50 * * * * csh -c '${FXA_HOME}/bin/textArchiver.sh'

# Denver/Boulder-specific items
#14,29,44,59 * * * * csh -c '(cd ${FXA_HOME}/xfer/nowrad; ./xferNowrad_v3.com ${FXA_HOME}/xfer/nowrad) >&! ${LOG_DIR}/xfer_nowrad.log'
#0 0 * * * /usr/local/ldm/bin/ldmadmin newlog
#0,15,30,45 * * * * csh -c '${FXA_HOME}/bin/ldmBridgeRestart >&! ${LOG_DIR}/ldmBridgeRestart.log'

# 40 km MAPS ingest - Boulder only (uncomment on ds1-fsla only)
#5 0,6,9,12,18,21 * * * csh -c '${FXA_HOME}/bin/maps40.script >&! ${LOG_DIR}/maps40.log'
#30 3,15 * * * csh -c '${FXA_HOME}/bin/maps40.script >&! ${LOG_DIR}/maps40.log'
#30 * * * * csh -c '${FXA_HOME}/bin/gridWatchdog >>& ${LOG_DIR}/gridWatchdog.log'

ingest.crontab.as1

# Crontab file for starting transient data ingest processes.

# This file, ingest.crontab.as1, contains the items that run on the "#1"
# application server. It is to be installed as
#  as1:/etc/cmcluster/crons/fxa/as1.as1swap
# under root ownership.

# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be included in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.

# Break ingest logs daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'

# Run scour daily to clean up log files
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'

# Data Monitor scripts
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/grid.cfg  -o ${FXA_HOME}/data/grid_data.html -h "Grid Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/graphic.cfg  -o ${FXA_HOME}/data/graphic_data.html -h "Redbook Graphics Products"'
#0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/text.cfg  -o ${FXA_HOME}/data/text_data.html -h "SBN Text Products"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/radar.cfg -o ${FXA_HOME}/data/radar_data.html -h "Radar Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/point.cfg -o ${FXA_HOME}/data/point_data.html -h "Point Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/sat.cfg -o ${FXA_HOME}/data/sat_data.html -h "Satellite Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/local.cfg -o ${FXA_HOME}/data/local_data.html -h "Local Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/diskUsage.pl -c ${FXA_HOME}/data/disk.cfg -o ${FXA_HOME}/data/diskUsage_data.html'

# Data Monitor summary page script
3,13,23,33,43,53 * * * * csh -c '${FXA_HOME}/bin/monitorSummary.pl'

# Process Monitor start-up scripts
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/AS1_startProcMon.sh'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startProcSum.sh'

[We also run sar on as1, though the entry is not in ingest.crontab.as1.]

# SAR system performance monitor data collection.  -JSW 3 Mar 98
5 * * * * /awips/fxa/htdocs/perfMon/bin/updatesar.pl

ingest.crontab.as2

# Crontab file for starting transient data ingest processes.

# This file, ingest.crontab.as2, contains the items that run on the "#2"
# application server. It is to be installed as
#  as2:/etc/cmcluster/crons/fxa/as2.as2swap
# under root ownership.

# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be included in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.

# Break ingest logs daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'

# Run scour daily to clean up log files
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'
[On as2, the running copy of the entry above has been edited interactively to read 30 2, for GribDecoder logs.]

# Process Monitor start-up scripts
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/AS2_startProcMon.sh'
#0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startProcSum.sh'

# LAPS #
# ---- #
20 * * * * /usr/local/bin/perl /awips/laps/etc/sched.pl /awips/laps /awips/laps/data
03,19,34,49 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl vrc_driver.x /awips/laps /awips/laps/data
08,14,23,29,38,45,53,59 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lvd_sat_ingest.exe /awips/laps /awips/laps/data
22,30 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lsr_driver.exe /awips/laps /awips/laps/data

# MSAS - The MAPS/RUC Surface Analysis System #
# ------------------------------------------- #

# Ingest the NCEP surface grids every 12 hours 
# Programs    =  sfcnmc & prsnmc
# Valid Times =  00Z 12Z
# Runtime Z   =  06:57 & 18:57, to catch late arriving NGM211 grids
57 6,18 * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Sfcnmc.run  >&! /awips/fxa/ldad/MSAS/logs/sfcnmclog'

# Run the surface cycle every hour at 20 minutes after the hour.
# Programs    =  sfcing  sfchqc  sfcanl  sfcncdf  sfcver  srcplot
20 * * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Surface.run  >&! /awips/fxa/ldad/MSAS/logs/sfclog'

# Compile the surface QC stats at the end of the day
# Programs    =  asos
# Valid Times =  00Z
# Runtime Z   =  23:53
53 23 * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Asos.run  >&! /awips/fxa/ldad/MSAS/logs/asoslog'

ingest.crontab.as1-as2

# Crontab file for starting transient data ingest processes.

# This file, ingest.crontab.as1-as2, contains items that run on either
# application server when it is running as both in failover mode. In
# general, it is a union of ingest.crontab.as1 and ingest.crontab.as2,
# though some items may be dropped due to loading considerations. It is
# to be installed as
#  as1:/etc/cmcluster/crons/fxa/as1.as1swap.as2swap
#  as2:/etc/cmcluster/crons/fxa/as2.as2swap.as1swap
# under root ownership.

# Any entry that needs to use ${FXA_HOME} or ${FXA_DATA} should use "csh -c"
# to run the command. The command and any output redirection to a file must
# all be included in single quotes after the "-c". The output redirection
# will then be done by the csh so it must use csh syntax.

# Break ingest logs daily
0 0 * * * csh -c '${FXA_HOME}/bin/breakLogIngest >&! ${LOG_DIR}/breakLogIngest.log'

# Run scour daily to clean up log files
30 0 * * * csh -c '${FXA_HOME}/bin/startScour >&! ${LOG_DIR}/startScour.log'

# Data Monitor scripts
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/grid.cfg  -o ${FXA_HOME}/data/grid_data.html -h "Grid Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/graphic.cfg  -o ${FXA_HOME}/data/graphic_data.html -h "Redbook Graphics Products"'
#0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/text.cfg  -o ${FXA_HOME}/data/text_data.html -h "SBN Text Products"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/radar.cfg -o ${FXA_HOME}/data/radar_data.html -h "Radar Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/point.cfg -o ${FXA_HOME}/data/point_data.html -h "Point Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/sat.cfg -o ${FXA_HOME}/data/sat_data.html -h "Satellite Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/http.pl -c ${FXA_HOME}/data/local.cfg -o ${FXA_HOME}/data/local_data.html -h "Local Data"'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/diskUsage.pl -c ${FXA_HOME}/data/disk.cfg -o ${FXA_HOME}/data/diskUsage_data.html'

# Data Monitor summary page script
3,13,23,33,43,53 * * * * csh -c '${FXA_HOME}/bin/monitorSummary.pl'

# Process Monitor start-up scripts
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/AS1_startProcMon.sh'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/AS2_startProcMon.sh'
0,10,20,30,40,50 * * * * csh -c '${FXA_HOME}/bin/startProcSum.sh'

# LAPS #
# ---- #
20 * * * * /usr/local/bin/perl /awips/laps/etc/sched.pl /awips/laps /awips/laps/data
03,19,34,49 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl vrc_driver.x /awips/laps /awips/laps/data
08,14,23,29,38,45,53,59 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lvd_sat_ingest.exe /awips/laps /awips/laps/data
22,30 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lsr_driver.exe /awips/laps /awips/laps/data

# MSAS - The MAPS/RUC Surface Analysis System #
# ------------------------------------------- #

# Ingest the NCEP surface grids every 12 hours 
# Programs    =  sfcnmc & prsnmc
# Valid Times =  00Z 12Z
# Runtime Z   =  06:57 & 18:57, to catch late arriving NGM211 grids
57 6,18 * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Sfcnmc.run  >&! /awips/fxa/ldad/MSAS/logs/sfcnmclog'

# Run the surface cycle every hour at 20 minutes after the hour.
# Programs    =  sfcing  sfchqc  sfcanl  sfcncdf  sfcver  srcplot
20 * * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Surface.run  >&! /awips/fxa/ldad/MSAS/logs/sfclog'

# Compile the surface QC stats at the end of the day
# Programs    =  asos
# Valid Times =  00Z
# Runtime Z   =  23:53
53 23 * * * /bin/csh -c '/awips/fxa/ldad/MSAS/WFOA_MSAS_Asos.run  >&! /awips/fxa/ldad/MSAS/logs/asoslog'

Data purging

There are three purgers, all run by cron, as noted in the previous section. The first, master.purge, runs twice an hour (at 15 and 45 past) on ds1. It in turn runs ~fxa/bin/fxa-data.purge and ~fxa/bin/laps-data.purge. The second, startScour, runs daily at 0030Z on each server (except that the as2 entry has been changed to 0230Z, as noted above). It starts ~fxa/bin/scour, which reads ~fxa/data/scour.conf.[ds|as] for the list of directories to clear out. The third, purgeAllRedbook, runs hourly on ds1 and manages Redbook graphics. Logs for these processes are in $LOG_DIR/master.purge.log, $LOG_DIR/scour.log, and $LOG_DIR/purgeAllRedbook.log, respectively. Each log is overwritten on every run.
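To see when a purger should have last run (and so how fresh its log ought to be), the minute and hour fields of its cron entry can be expanded into run times. The sketch below is a rough aid only: the function names are made up, and it handles just "*", single values, and comma lists, which is all the fxa entries above use. Ranges and step values are not supported. Times come out in Z, since TZ is GMT.

```python
def expand(field: str, low: int, high: int) -> list:
    """Expand a cron field made of '*', a number, or a comma list."""
    if field == "*":
        return list(range(low, high + 1))
    return sorted(int(part) for part in field.split(","))


def daily_runs(minute: str, hour: str) -> list:
    """Return the HHMM times at which a 'minute hour * * *' entry fires."""
    return [f"{h:02d}{m:02d}"
            for h in expand(hour, 0, 23)
            for m in expand(minute, 0, 59)]
```

For example, daily_runs("15,45", "*") gives the 48 daily master.purge times, and daily_runs("30", "0") gives the single startScour time.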

Data and process monitoring

The data monitor comprises a series of perl scripts that run via cron on as1. These scripts build HTML pages that are then copied to $SERVER_DIRECTORY/dataMon/html/, where SERVER_DIRECTORY is defined in ~fxa/data/dataMon.cfg. (The files are also retrieved by an http process on www-sdd (raptor), for use in the summary monitor.) Cron entries are as shown above.

The ingest process monitor is started via cron on ds1, as1, and as2, also as shown above. The XXX_startProcMon.sh script starts ~fxa/bin/ingProcMon.pl, which checks processes in ~fxa/data/XXX_ingestProcesses.txt, and builds an HTML file (XXX_ingestProcMon.html) showing what's up and down. These are copied to $FXA_WWW_SERVER_HOST:$SERVER_DIRECTORY/dataMon/, where SERVER_DIRECTORY is defined in ~fxa/data/dataMon.cfg.
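The idea behind the process monitor can be sketched in a few lines. This is not the ingProcMon.pl code, just an illustration of the check it performs. The function names are made up, and the HTML layout is an assumption; the real page format is not described here.

```python
from html import escape


def check_processes(expected: list, running: list) -> dict:
    """Map each expected process name to True (up) or False (down)."""
    active = set(running)
    return {name: name in active for name in expected}


def status_page(title: str, status: dict) -> str:
    """Render the up/down table as a small HTML page."""
    rows = "\n".join(
        f"<tr><td>{escape(name)}</td><td>{'up' if up else 'DOWN'}</td></tr>"
        for name, up in status.items()
    )
    return (f"<html><head><title>{escape(title)}</title></head>\n"
            f"<body><table>\n{rows}\n</table></body></html>")
```

Here expected would come from a file like XXX_ingestProcesses.txt and running from the process table on that host.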

The restart mechanism

Included at the bottom of the process monitor Web page is a link to bring up a restart menu, pointing to /awips/fxa/htdocs/cgi-bin/restart-setup.sh. This runs ~fxa/bin/restart-ingest.sh on as1, which in turn runs ~fxa/bin/restart-ingest-display.tcl. That finally runs ~fxa/bin/restart-ingest.tcl, which puts up a menu and takes action based on the user's selection. This tcl script calls the various *.tclProg shell scripts to stop and start processes. A write-up of this is found in ~fxa/doc/userGuides/IngestRestart.fm.

Text workstation

Procedures are stored in $FXA_DATA/scripts/<username>. Each procedure is in a file, and consists of a list of commands. The usernames are found in ~fxa/data/fxa-users.

Each text Xterm is hosted by its associated workstation. Text `stuff' is stored in $FXA_DATA/textWSwork/xtn-bou:0. Subdirectories include saved (copies of all products that have been created on this station), journals (in-progress editing, saved for crash recovery), and archived (permanent copies of products sent out over the WAN). Also here is textAlarmAlertProducts.txt, the list of alarm/alert products specific to this workstation. (Site-wide products are in ~fxa/data/textAlarmAlertProducts.txt.)

Log files are in $LOG_DIR/display/xtn-bou:0/yymmdd/textWish<pid>. Logs exist for the text windows, but not the parent textWS.tcl process.
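A small helper for finding a day's text window logs might look like the sketch below. The function names are made up, "xt1-bou:0" in the usage note is a hypothetical display name (taking n to be the workstation number), and the yymmdd directory is assumed to follow GMT, since TZ is GMT on these machines.

```python
import datetime
import glob
import os


def text_log_dir(log_dir: str, display: str, day: datetime.date) -> str:
    """Build $LOG_DIR/display/<display>/yymmdd for one day."""
    return os.path.join(log_dir, "display", display, day.strftime("%y%m%d"))


def text_logs(log_dir: str, display: str, day: datetime.date) -> list:
    """List the textWish<pid> logs for one display and day, sorted by name."""
    pattern = os.path.join(text_log_dir(log_dir, display, day), "textWish*")
    return sorted(glob.glob(pattern))
```

For example, text_logs("/data/logs/fxa", "xt1-bou:0", datetime.date(1998, 10, 15)) looks in /data/logs/fxa/display/xt1-bou:0/981015 and returns an empty list if no text windows were started that day.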

If an Xterm gets mis-configured, the title window will come up, but the individual text windows will not. (You'll get a tcl error when you try to start one.) Press F12 on the keyboard for a second or two, then select Server. Press the Access Control button (middle button in second panel) `on' and click OK (upper right). Answer OK in the dialog box, wait for the reset, log in, and you should be ready to roll.

Local LAPS processing

LAPS (analysis) runs on as2, hourly by cron.

As noted earlier, four LAPS processes run via cron:

  20 * * * * /usr/local/bin/perl /awips/laps/etc/sched.pl /awips/laps /awips/laps/data
  03,19,34,49 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl vrc_driver.x /awips/laps /awips/laps/data
  08,14,23,29,38,45,53,59 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lvd_sat_ingest.exe /awips/laps /awips/laps/data
  22,30 * * * * /usr/local/bin/perl /awips/laps/etc/laps_driver.pl lsr_driver.exe /awips/laps /awips/laps/data
  1. The sched.pl script runs the analysis starting at 20 past the hour. This script runs processes that ingest various datasets, run the analysis, purge analysis and intermediate files, and write the results into /data/fxa/Grid/FSL/netCDF/LAPS_Grid/LAPS. Part of this process is a set of grid notifications, logged in $LOG_DIR//GridNotify*, at about 22 past the hour.
    Logfiles for the individual processes (named *.log.<hhmm>) are written to $LOG_DIR//laps. Analyses and intermediate ingest files are written to /awips/laps/data/lapsprd/*, in which the `*' refers to the appropriate product subdirectory. (This is directed via soft link to /data/fxa/lapsprd.)
  2. The second entry runs the radar (NOWrad) ingest process for LAPS.
  3. The third crontab entry activates the satellite ingest process (called lvd) 8 times an hour, to accommodate rapid scan operations. In many cases it does nothing, since it requires appropriate satellite files. This puts GOES data on the LAPS grid, creating files in .../lapsprd/lvd/. As with the other processes, logs are written to log/lvd.log.hhmm and log/lvd.err.hhmm.
  4. The final entry ingests satellite sounder data. (None is available on AWIPS, so this is essentially a no-op.)
The entire LAPS ingest/analysis generally completes in approximately 5 minutes. Run times longer than 15 minutes or shorter than 2 minutes may indicate a problem. Run completion times are logged in runtime.log.
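The run-time rule of thumb above can be written as a simple check. This is a sketch only: the function name is made up, and since the layout of runtime.log is not described here, the run time is passed in as a number of minutes rather than parsed from the log.

```python
def classify_runtime(minutes: float) -> str:
    """Classify a LAPS run time using the 2- and 15-minute limits above."""
    if minutes > 15:
        return "too long"
    if minutes < 2:
        return "too short"
    return "ok"
```

A typical 5-minute run comes back "ok". Either of the other two answers is only a hint to go look at the logs, not proof of a failure.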

 More information about LAPS run-time details is available in the LAPS README file, http://laps.fsl.noaa.gov/frd/laps/software/README.html.

LDAD processes

LDAD runs partly on ds1 ("internal") and partly on ls1 ("external"). The internal part includes three processes: the listener, the decoder, and the watchDogInternal script. The listener gets data through the firewall, storing files in $FXA_DATA/LDAD/Raw. There is a listener log in /data/logs/ldad, but it's not at all easy to read. (I did on one occasion find a permissions problem writing the raw data by looking at the listener log.) You'll also see there an LDADdecoder.log file, which is the log of the current decoder. The watchDogInternal script checks every 30 seconds to see if the listener and decoder are running. Decoder logs are also written to the usual spot along with other ingest logs. Those files include the PID in the name, so there are lots of 'em. (The LDADdecoder.log file includes time stamps on the messages, but those in $LOG_DIR/<date> do not.)

Sometimes, both decoder and listener are up, but no data are coming through. This suggests a problem on the external side. You can restart the whole LDAD system:

This procedure starts both internal and external processes, and may shake things loose. Via cron, this command is issued daily at 1010Z (at least partly because that's the only way at present that the LDADdecoder log gets broken).

Some other stuff


Data sources and storage

Data are stored on a set of three 2-GB disks. The 3 disks are configured as one logical volume, known to the data ingest software as $FXA_DATA.

Use bdf to check on disk space.
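Where a scripted check is wanted instead of reading bdf output by eye, something like the following sketch would do. The function names and the 90 percent limit are made up for illustration, and the percentage is computed as used/total, which may differ slightly from the figure bdf prints.

```python
import shutil


def percent_used(path: str) -> float:
    """Return the percentage of the file system holding path that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path: str, limit: float = 90.0) -> bool:
    """Return True if usage on path's file system is at or above limit percent."""
    return percent_used(path) >= limit
```

For example, is_nearly_full("/data/fxa") would flag the $FXA_DATA volume once it passes the chosen limit.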


This page is maintained by Joe Wakefield.
Last modified: 15 Oct 98