Enhancements to LogStream

Gerry Murray, 3/2000

This document describes three additional capabilities that have been implemented as a proof of concept. My hope is that all three will be adopted for AWIPS 5.1. These capabilities are enhancements; they are backward compatible since they don't affect the existing functionality or performance. This document just describes what these enhancements, why they were done, and how to use them. For implementation details, use the source, Luke!

Questions or comments: email me

Overview

Thread Safe Logging Collective Logging Stack Traces

Thread Safe Logging

What is Thread Safe?

One of the big advantages of using multiple threads is also one of its disadvantages. Multiple threads can execute independently over the same address space in memory, while multiple processes use distinct address spaces (unless they are using shared memory.). Each thread has its own calling stack (and local variables) but it shares the process's heap memory, static and global variables with other threads. On the plus side, this cuts down the amount of memory used in a system. On the down side, one thread can be reading a location in memory while another thread is writing to that location simultaneously, causing some very subtle and hard to debug errors. One solution is to use mutex locks (sometimes called semaphores) to protect memory locations that are prone to multiple thread access. This technique is often called synchronization or "making a process thread safe". Many Unix system calls and C/C++ library functions are thread safe but some are not.

LogStream was implemented using the C++ IO Stream package which is not thread safe (at least in the Linux implementation.). Even if it was, LogStream still uses static data structures which can easily be corrupted by two threads logging a message at the same time. One solution is to only use LogStream in the main thread of execution. This approach is OK for simple threads that don't need to log anything useful, but could be problematic for more complex threads. A better approach is to make LogStream thread safe by wrapping mutex locks around the critical logging code.

Why Now?

When LogStream was originally developed, all AWIPS executables contained a single thread of execution. Today, this is no longer the case. The inter-process communication library now spawns an additional thread while waiting to write to a full socket. The asyncProdSchedluer developed by the Office of System Development (OSD) also uses multiple threads. As hosts with multiple processors become more prevalent, I predict many more AWIPS executables will be multi-threaded.

Will All Executables Use Thread Safe Log Stream?

At first I thought, why not? After further investigation, I decided against it. For every log entry, there is the overhead of two additional system calls. The performance impact is probably only significant if excessive logging was taking place. More of a concern is that all client executables will have to link with the POSIX thread library. On most platforms, this is a shared library so it won't affect executable size. However, linking with the thread library does affect signal delivery in strange ways. Some of our executables link with third party libraries that handle signals (SIMPACT's Freeway library is a good example.). For these executables, linking with the thread library could cause serious and even fatal errors. In my testing, I even found some simple localization executables that crashed when linked with the thread library. Therefore, by default, executables are linked with a thread unsafe version of LogStream.

How is Thread Safe Log Stream Enabled?

Even though by default, executables use a thread unsafe implementation of LogStream, it is relatively easy to tell an executable to use a thread safe implementation. Simply use a different macro for LogStream when specifying the libraries to link with. Check out this example. Here's a link list of an executable using the default log stream:
LOG_STREAM_TEST_OBJECTS =           \
       logStreamTest.o              \
       ${SIGNAL_CATCHER}            \
       ${LIB_PROC_MGMT}             \
       ${LIB_IPC}                   \
       ${LIB_CONFIG}                \
       ${LIB_LOG_STREAM}            \
       ${LIB_UTIL}                  \
       ${LIB_FOUNDATION}

Here's a link list of an executable using the thread safe log stream:

LOG_STREAM_TEST_OBJECTS =           \
       logStreamTest.o              \
       ${SIGNAL_CATCHER}            \
       ${LIB_PROC_MGMT}             \
       ${LIB_IPC}                   \
       ${LIB_CONFIG}                \
       ${LIB_THREAD_SAFE_LOG_STREAM} \
       ${LIB_UTIL}                  \
       ${LIB_FOUNDATION}

As with all of our macros, LIB_THREAD_SAFE_LOG_STREAM is defined in ${STAGING}/Makefile.aliases

Collective Logging

What is Collective Logging?

LogStream was originally designed for three sinks (locations where output would end up.) Up until now, only the tty and file sinks have been implemented, although it has always been possible to specify the collective sink in a log preferences file.

The collective sink could be useful for collecting the logs of different runs of the same program into one file. Or maybe collecting all the output of one of the heads of a two headed display in one file. Or maybe it might be useful to have all the programs from a particular host all log to the same file.

How is the Collective Filename Specified?

As mentioned earlier, the filename is controlled by the contents of the COLLECTIVE_FILE environment variable. This variable can have an explicit pathname, such as $FXA_DATA/logs/collective. Inside the pathname, it's possible to specify directives that substitute some information about the time, process, or host. Here are the directives that are allowed for collective logging:
%L LOG_DIR environment variable
%H Hostname with domain name
%D UTC date, YYYYMMDD
%N Program name (just the leaf, not the full path)
%d DISPLAY environment variable

For example, suppose you were to run the D-2D today (3/20/2000) on ws1 and the LOG_PREF environment variable is set to $FXA_HOME/logs. If you set the COLLECTIVE_FILE environment variable to %L/%D/%N-%H, all the five of the IGC processes would log to the following file: $FXA_HOME/logs/20000320/IGC_Process-ws1.fsl.noaa.gov.

How is Collective Logging Enabled?

Collective logging is enabled through the log preferences file. The log preference file specifies filters (what gets logged where), prefixes (text that appears in front of every log entry.), and other special log stream features. Both filters and prefixes have influences on collective logging.

A filter has the following syntax:


Here's an example. These three filters will turn on collective logging for the program "asyncProdScheduler" for all categories except debug or verbose.

asyncProdScheduler all collective all = on
asyncProdScheduler all collective verbose = off
asyncProdScheduler all collective debug = off

A prefix has the following syntax.

For collective logging, it makes a lot of sense to enable the process ID, since multiple processes will be logging to the same file.
This line in a log preferences file will do just that:

collective processID = on

Are Collective Log Files Automagically Broken At Midnight?

Short answer: no, although that would be a great idea. Could have easily added the code to do this, but since single process log files have to be explicitly broken by a USR1 signal, collective log files also must be explicitly broken. Now, a process can potentially have two separate log files open (single process and collective). After the receipt of a USR1 signal, both log files will be broken the next time a log entry is made.

Stack Traces

Enhancement? Hasn't this Feature Always Been Part of LogStream?

Stack traces have been working and used quite extensively right up to release 4.3. Prior to 5.0, stack traces were implemented by a function called U_STACK_TRACE() included in the HP Stack Unwind library. Unfortunately, this function relies on some code generated by the HP compilers. Thus, this function is not available to executables built with gcc, and stack traces will not be available for release 5.0.

It is possible to get a stack trace on a HP gcc executable since gdb (the GNU debugger) can do these flawlessly. After much head scratching and studying the gdb source code, I was able to write code that produces stack traces even through virtual functions, shared libraries, signal handlers, and long jumps. On Linux, the task was much simpler. There is already a function in the libc library that gets the addresses of each frame in the calling sequence. All that was left to do was translate those addresses into function names.

How are Stack Traces Different Now?

Stack traces are still enabled in the log preferences file for the debug, verbose, bug and fatal categories and they look basically the same as before with only a few subtle differences.
Filenames and line numbers.
U_STACK_TRACE only provided the function name for each frame. It's also very useful to know exactly where in the function that the previous frame was called. Both the filename and line number describing that location is now provided for Linux platforms. Unfortunately, for HPs, I couldn't figure out how to get at this information. The filename is just the leaf. The full pathname could have easily been shown, but would have cluttered the stack trace, making it harder to read. If the executable is partially stripped, the filename and line number information can not be retrieved because it's no longer in the executable.
Demangling of C++ function names.
Prior to this enhancement, examining stack traces could be confusing since some of the function names appearing in the output looked very different than the corresponding name in the source code. This was because the C++ compiler mangles all function names, and that's how they appear in the executable symbol table. U_STACK_TRACE does not decipher these mangled names. Now stack traces will print out function names as they appear in the source code. In addition, the parameter types of the function will also be displayed. This is useful for distinguishing overloaded C++ functions. In order to save space, the parameters will not be displayed if the filename and line number information is displayed.
Suppressing recent stack traces.
A very useful feature of LogStream is that it doesn't continually display repeated stack traces which can really clutter the log. Prior to this enhancement, LogStream would suppress a traceback if the previous stack trace originated from the same location (file and line) even if the inner frames of the trace were very different. With this strategy, useful stack traces were being suppressed. Now, only stack traces that are completely identical to recent stack traces are suppressed. See the examples below for further clarification.
Limiting the stack depth.
The programmer can now specify the maximum number of frames for a stack trace by calling the static method, LogStream::setTracebackMaxDepth(). Since each frame has a trace line associated with it, this feature can be useful for reducing the size of the logs.
Limitations
Quite often, stack traces are produced whenever the process generates a fatal signal, like a segmentation violation. On HP-UX, these stack traces are accurate, but not so with Linux. Unfortunately, it's not possible to report the frame where the signal occurred, (which is the most interesting frame). Even gdb fails to do this properly. That frame is replaced with a message indicating this frame is missing. Fortunately, it's possible to infer where the signal occurred from the previous frame.
Examples:
Here is an actual stack trace of an IGC_Process. This was caused by loading an IR Satellite product. I purposely put a segmentation violation in the depictable code that ingests the data to illustrate how stack traces work with signals. Here is the stack trace running on Linux:
IGC_Process 4118 IGCSignalClient.C 55 BUG: Signal 11 (Segmentation violation) received.
0x8106950 in IGCSignalClient::shutdown() at IGCSignalClient.C:57
0x831c864 in SignalCatcher::handleSignal() at SignalCatcher.C:131
0x831d83f in ThreadSignalCatcher::handleSignal() at ThreadSignalCatcher.C:257
0x831c798 in handleSignal() at SignalCatcher.C:94
<Cannot determine where signal occurred>
0x81f7c5e in PVImageDepict::paint() at PVImageDepict.C:119
0x8110885 in ImageDepictTuple::getPsuedoData() at DepictTuple.C:1280
0x811055d in ImageDepictTuple::render() at DepictTuple.C:1191
0x810eb39 in DepictTuple::displayDrawable() at DepictTuple.C:289
0x811ad9f in Frame::display() at Frame.C:355
0x8123281 in FrameSeq::refresh() at FrameSeq.C:1758
0x81202ac in FrameSeq::load() at FrameSeq.C:421
0x8137540 in DisplayMgr::handleKeyEvent() at KeyPressHandler.C:872
0x8116710 in DisplayPanel::handleDisplayAreaEvent() at DisplayPanel.C:1172
0x812e29d in GraphicsDispatcher::dispatchEvent() at GraphicsDispatcher.C:113
0x812e180 in GraphicsDispatcher::handleEvent() at GraphicsDispatcher.C:84
0x831c250 in dispatchFileEvent() at TclDispatchEngine.C:63
0x84029d2 in FileHandlerEventProc() at tclUnixNotfy.c:407
0x83f20d0 in Tcl_ServiceEvent() at tclNotify.c:444
0x83f236e in Tcl_DoOneEvent() at tclNotify.c:747
0x831c69a in TclDispatchEngine::dispatch() at TclDispatchEngine.C:204
0x8320f7b in EventDispatcher::enterDispatchLoop() at EventDispatcher.C:76
0x81066eb in main() at IGC_Process.C:559

Notice that the signal occurred in the fifth frame, but we can't determine where in Linux. Here's the same stack trace on HP-UX. Instead of file names and line numbers, the parameters of each function are displayed.

IGC_Process 1800 IGCSignalClient.C 55 BUG: Signal 11 (Segmentation violation) received.
0x13c9dc in IGCSignalClient::shutdown(bool, int)
0x925630 in SignalCatcher::handleSignal(int)
0x927054 in ThreadSignalCatcher::handleSignal(int)
0x925440 in handleSignal(int)
0x484e10 in SatPVImageDepict::ingestData(void)
0x400acc in PVImageDepict::paint(GraphicsTarget const *, CartCoord2D<int>, Depictor const *, ZoomAndDensityInfo const &, Depictable::ComboType)
0x178658 in ImageDepictTuple::getPsuedoData(CartCoord2D<int> const &)
0x178228 in ImageDepictTuple::render(bool)
0x17570c in DepictTuple::displayDrawable(bool, bool)
0x1cc208 in Frame::display(void)
0x1e4a70 in FrameSeq::refresh(short)
0x1e0acc in FrameSeq::load(unsigned long const &, AbsTime const &, long, LoadMode, bool, long)
0x241c58 in DisplayMgr::handleKeyEvent(unsigned long, int)
0x19bcf4 in DisplayPanel::handleDisplayAreaEvent(_XEvent const *)
0x21ad5c in GraphicsDispatcher::dispatchEvent(_XEvent const &)
0x21ab98 in GraphicsDispatcher::handleEvent(void)
0x922094 in dispatchFileEvent(void *, int)
0xa4a234 in FileHandlerEventProc()
0xa3acb0 in Tcl_ServiceEvent()
0xa3b004 in Tcl_DoOneEvent()
0x9226e0 in TclDispatchEngine::dispatch(EventDispatcher *)
0x939284 in EventDispatcher::enterDispatchLoop(EventDispatcher::DispatchType)
0x13743c in main()

Here's a stack trace caused by the same segmentation violation. However this one was triggered by a frame being rendered in the background. The current version of logStream would have suppressed this stack trace, but this version displays it.

IGC_Process 4118 IGCSignalClient.C 55 BUG: Signal 11 (Segmentation violation) received.
0x8106950 in IGCSignalClient::shutdown() at IGCSignalClient.C:57
0x831c864 in SignalCatcher::handleSignal() at SignalCatcher.C:131
0x831d83f in ThreadSignalCatcher::handleSignal() at ThreadSignalCatcher.C:257
0x831c798 in handleSignal() at SignalCatcher.C:94
<Cannot determine where signal occurred>
0x81f7c5e in PVImageDepict::paint() at PVImageDepict.C:119
0x8110885 in ImageDepictTuple::getPsuedoData() at DepictTuple.C:1280
0x811055d in ImageDepictTuple::render() at DepictTuple.C:1191
0x810eb39 in DepictTuple::displayDrawable() at DepictTuple.C:289
0x811af42 in Frame::prepareATuple() at Frame.C:423
0x812be07 in FrameSeq::handleTimerExpiration() at FrameSeq.C:4797
0x831c27d in dispatchTimerEvent() at TclDispatchEngine.C:83
0x83f8b98 in TimerHandlerEventProc() at tclTimer.c:520
0x83f20d0 in Tcl_ServiceEvent() at tclNotify.c:444
0x83f236e in Tcl_DoOneEvent() at tclNotify.c:747
0x831c69a in TclDispatchEngine::dispatch() at TclDispatchEngine.C:204
0x8320f7b in EventDispatcher::enterDispatchLoop() at EventDispatcher.C:76
0x81066eb in main() at IGC_Process.C:559

You Have to be Kidding; More COTS to do Stack Traces?

Yes it's true, I used two freeware libraries to implement this functionality. Hey, at least they're free and portable. I could have used DMQ or OI. I know this is gonna raise some eyebrows for the powers that be, and I honestly tried to do this without using these libraries. I know this sounds like a cop-out but this was very difficult code to write even with the libraries.

For the name demangling, I used a library that comes with gcc called libiberty.a. Since this library always comes with gcc, hopefully this should not be a configuration issue for PRC.

The other library is a GNU library called libbfd.a and comes with the BinUtils package. To get it via ftp, click here. We have this package for HP-UX installed on our Auspex file server. For Linux, it comes bundled with Red Hat 6.

I set up the logStream Makefile so that these two libraries are linked directly with the StackTracer module. The pathname of where to find libbfd.a is in the environment variable, GNU_DIR while the pathname of where to find libiberty.a is in the env var, GCC_DIR. It doesn't appear that linking with these two libraries increases the executable sizes substantially.

Are Stack Traces Possible with a Stripped Executable?

When an executable is fully stripped, all debugging and symbol information is removed. As a result, the stack trace will only contain frame addresses. However, these can be slightly useful since these addresses can be used to set breakpoints when debugging with gdb.

Fortunately, an executable can be partially stripped, removing the debugging information but keeping the symbol information. A partial strip can net substantial savings in the executable size. And the good news is that function names are still available. Filenames and line numbers are only available with executables that still have complete debugging info. To partially strip an executable with HP gcc, use the argument, -Wl,-G when linking. (For Linux, use, -Wl,-S)

The following table shows the size in megabytes of the IGC_Process on HP-UX after linked with the various strip options.
Complete Strip (-s) 12.92
Partial Strip (-Wl,-G) 16.46
No strip 61.91

If these enhancements are adopted for build 5.1, I strongly urge that executables running in a production environment be partially stripped.