Peter A. Mandics, Chief Information Officer
(303-497-6854)
Web Homepage: http://www-fd.fsl.noaa.gov/
Mark D. Andersen, Senior Database Analyst, 303-497-6518
Jonathon B. Auerbach, Computer Operator, 303-497-3760
Joan M. Brundage, Deputy Chief Info. Officer, 303-497-6895
Joseph R. Carlson, Professional Research Asst., 303-497-6794
Lee M. Cohen, Professional Research Asst., 303-497-6052
Michael A. Doney, FSL Network Manager, 303-497-6364
Steve J. Ennis, Network Engineer, 303-497-6372
Leslie A. Ewy, Systems Analyst, 303-497-6018
Joaquin Felix, Systems Administrator, 303-497-5267
Paul Hamer, Systems Analyst, 303-497-6342
Huming Han, Computer Operator, 303-497-6862
Chris Harrop, Associate Scientist, 303-497-6808
Leslie B. Hart, Jet Management Team Lead, 303-497-7253
Yeng Her, Computer Operator, 303-497-7339
Patrick D. Hildreth, Computer Operator, 303-497-7359
Forrest Hobbs, HPTi Program Manager, 303-497-3821
Keith G. Holub, Systems Administrator, 303-497-6774
Ara T. Howard, Professional Research Asst., 303-497-7238
Paul Hyder, Professional Research Asst., 303-497-6656
Peter Lannigan, Systems Administrator, 303-497-4639
Robert C. Lipschutz, Production Control Mgr., 303-497-6636
Chris MacDermaid, Data Systems Group Lead, 303-497-6987
Debra J. Martinez, Secretary OA, 303-497-6109
Chuck Morrison, Systems Engineer, 303-497-6486
Ed Moxley, Systems Administrator, 303-497-6844
Scott T. Nahman, Logistics Mgt. Specialist, 303-497-5349
Glen F. Pankow, Systems Analyst, 303-497-7028
John V. Parker, FSL IT Security Officer, 303-497-5124
Gregory M. Phillips, Lead Systems Admin., 303-497-7685
Peter Rahm-Coffey, Computer Operator, 303-497-7341
Richard Ryan, Systems Analyst, 303-497-6991
Robert Sears, Network Engineer, 303-497-4226
Amenda B. Stanley, Systems Analyst, 303-497-6964
Sarah E. Thompson, Systems Administrator, 303-497-6024
Dr. Craig C. Tierney, Systems Engineer, 303-497-3112
Cristel Van Leer, Computer Operator, 303-497-7537
(The above roster, current when document is published, includes
government, cooperative agreement, and commercial affiliate staff.)
Address: NOAA Forecast Systems Laboratory, Mail Code: FST
David Skaggs Research Center
325 Broadway
Boulder, Colorado 80305-3328
The group designs, develops, upgrades, administers, operates, and maintains the FSL Central Computer Facility. For the past 22 years, the facility has undergone
continual enhancements and upgrades in response to changing and expanding FSL project requirements and new advances in computer and communications
technology. In addition, ITS lends technical support and expertise to other federal agencies and research laboratories in meteorological data acquisition,
processing, storage, distribution, and telecommunications.
The Central Facility acquires and stores a large variety of conventional (operational) and advanced (experimental) meteorological observations in real time.
The ingested data encompass almost all available meteorological observations in the Front Range of Colorado and much of the available data in the entire
United States. Data are also received from Canada and Mexico, along with some observations from around the world. The richness of this meteorological database is
illustrated by such diverse datasets as advanced automated aircraft, wind and temperature profiler, satellite imagery and soundings, Global Positioning
System (GPS) moisture, Doppler radar measurements, and hourly surface observations. The Central Facility computer systems are used to analyze and
process these data into meteorological products in real time, store the results, and make the data and products available to researchers, systems developers,
and forecasters. The resultant meteorological products cover a broad range of complexity, from simple plots of surface observations to meteorological analyses
and model prognoses generated by sophisticated mesoscale computer models.
Figure 9. Monitoring screen showing the status of HPCS systems activity.
ITS continued its support of at least 40 projects on FSL's supercomputer Jet. The High-Performance Computing System (HPCS) provides computational capability for numerous modeling efforts
related to the atmosphere, ocean, climate, and air quality, which are carried out by FSL and non-FSL researchers. For example, several Joint Institutes,
OAR (NOAA's Office of Oceanic and Atmospheric Research) laboratories including the Environmental Technology Laboratory (ETL), Aeronomy Laboratory (AL),
and National Severe Storms Laboratory (NSSL), and the NWS National Centers for Environmental Prediction (NCEP) all take advantage of the HPCS.
FSL Mass Store System (MSS)
Major upgrades were made to the Mass Store System to correct reliability and performance problems. This was
necessary primarily because the large database maintained by Advanced Digital Information Corporation's (ADIC) FileServ Hierarchical Storage Management
(HSM) software had compromised performance, and Sony's Advanced Intelligent Tape (AIT-2) drives and cassettes had become unreliable. First, steps were taken to
stabilize the MSS by upgrading the ADIC FileServ/VolServ Hierarchical File System (HFS) software and server operating system. A major upgrade was
implemented later that included installation of an additional, completely new HFS, which logically split the ADIC AML/J automated storage library robot into
two virtual robots. The original FileServ/VolServ-based system continues to function in a read-only mode with 1,232 Sony AIT-2 tape slots and 4 AIT-3 tape drives.
The new HFS, based on a Sun SunFire 480 server running ADIC's StorNext software, features 1,040 Linear Tape-Open (LTO) tape slots and 8 IBM LTO tape drives.
Two 600-Gigabyte managed file systems (caches) were also provided, one dedicated to real-time data ingested by the Central Facility and the other available for
user data. The new HFS has significantly increased speed and reliability. Major enhancements were also made to the FSL-developed tools for accessing the MSS.
Central Facility Systems Enhancements and Cost Savings
A major ongoing project in ITS involves defining ways to cut costs in the FSL Central
Facility. Toward this end, ITS system administrators have decommissioned several older systems with high maintenance costs after moving to newer, less
expensive systems. Central administration processes are being implemented for most Unix systems to cut system management costs. The printing systems
have been reconfigured to increase reliability and offer better service to users. These activities allow system administrators more time to address other important
issues.
System administrators became familiar with Sun's Solaris 9 operating system (OS) before moving systems to the newer OS. A used testbed system was procured
and configured, and standard Solaris 9 installation procedures were defined and implemented. With the exception of systems running software that requires
Solaris 8, new (replacement) Sun systems and rebuilds of current Sun systems have been placed on the more secure Solaris 9 platform, increasing security
and decreasing system administrator time.
Another effective cost-cutting measure included developing more efficient use of existing resources. FSL's central data repository employing a Network
Appliance, Inc. filer (NFS server) is a good example. This filer had become excessively overloaded, and often failed to respond to real-time data-access needs.
An intensive mitigation project was implemented to reduce unnecessary load on this costly resource, avoiding (or at least postponing) the need to procure
a new system.
FSL system administrators have been applying an unending stream of security-related patches and upgrades. It is a major task to keep multiple versions of six
different operating systems (Sun Solaris, Linux, SGI IRIX, Microsoft Windows, etc.) patched and up to date.
The FSL mail lists were converted to NOAA Enterprise Messaging System (NEMS) groups. The names and descriptions of these groups are now visible in the
NEMS directory, and conform to the NOAA enterprise mail strategy. Also, most of the laboratory was transferred to the main FSL mail server, eliminating
miscellaneous mail servers and improving mail-handling reliability.
FSL PC Administration
The FSL Windows 2000 network was stabilized. Errors and configuration problems related to Domain Name System (DNS) issues, identified in the
server logs, were corrected. Prior to these upgrades, users were experiencing logon failures and connectivity outages.
FSL's domain servers were rebuilt and patched with all known fixes and service packs, and are now running smoothly.
Network maintenance on the server level also included an upgrade to the antivirus software and a full rollout of the updated software to all PCs on the FSL
network running the Windows operating system.
An additional 25 machines from the FSL International Division were transferred to the FSL PC Administrator. Network management software suites were
evaluated to help manage the increasing number of PCs. The IBM Tivoli suite was chosen for its ability to control, update, and administer Windows computers
remotely.
PC security and systems patching remained a high priority throughout the year. Systems were kept up to date using Microsoft's Windows Update Utility. Also,
the Microsoft Baseline Security Analyzer was used to constantly monitor for security holes on all Windows networked machines.
The PC administrators' day-to-day tasks included support for various problems involving hardware and software, failed logons, password changing, disk problems,
printing errors, drive failures, RAM issues, program errors, security updates, E-mail, OS reloads, backup configurations, dial-up accounts, data recovery, and
network connectivity.
Systems Support and Computer Operations
The Systems Support Group (SSG) maintains a log (utilizing the FSLHelp System) that provides
effective communication among the SSG staff, ITS Data Systems Group (DSG), system and network administrators, and other essential staff. The SSG log
helps provide a higher level of service to FSL users by tracking the numerous, varied issues handled each day. This log also offers, among other things,
a means for recording the history of events and tracking the procedures used to correct problems. During the year, about 2,170 log tickets were initiated and
resolved. In addition, approximately 154 customer FSLHelp requests were processed for data compilations, file restoration, account management, video
conferencing, and other requests requiring operator assistance.
The Web database used to document the procedures for maintaining the Central Facility has grown to 131 documents. New procedures and updated information
require continual refinements, corrections, and updates to the documents. Good documentation, in turn, provides operators the means to troubleshoot and resolve
issues involving real-time data, Central Facility equipment, and customer queries. The improved efficiency and consistency resulted in shorter downtimes and faster
response to users.
SSG staff renewed efforts to provide assistance to system administrators, when feasible, in user account maintenance (such as adding/removing accounts) and other
special projects on an as-needed basis.
The SSG weekly schedule was adjusted so that the lead operator could be more available during busier days. Also, overlap days, when three operators were on
duty at once, were more spread out. This allows more time for special projects, facilitates flexibility in group training, and helps reduce overtime when operators
take leave.
To accommodate 24-hour/7-day onsite support and augment staffing during emergencies, an emergency operator coverage plan was implemented which outlines
the course of action to be taken when emergency coverage is required. Also, because of staff departures, and to ensure shift coverage, two full-time operators were
hired and trained.
The SSG oversaw and monitored the daily laboratorywide computer system backups, with ~300 GB of information written each night for ~260 FSL client systems.
Quarterly offsite backups were successfully completed on time. The tape rotation for quarterly offsite backups was increased to provide individual machine backups
for up to one year.
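As a rough check on the backup figures above, the nightly volume works out to a little over 1 GB per client system, and a quarterly rotation that keeps one year of history needs four tape sets if one set is written each quarter. The short Python sketch below only restates that arithmetic. The function names are hypothetical, the inputs are the approximate values quoted in the text, and the one-set-per-quarter rotation is an assumption rather than a description of the actual SSG procedure.

```python
def average_gb_per_client(total_gb: float, clients: int) -> float:
    """Average nightly backup volume per client system, in GB."""
    if clients <= 0:
        raise ValueError("clients must be positive")
    return total_gb / clients


def tape_sets_needed(retention_months: int, interval_months: int) -> int:
    """Tape sets needed to retain `retention_months` of history when one
    set is written every `interval_months` (rounded up)."""
    if interval_months <= 0:
        raise ValueError("interval_months must be positive")
    return -(-retention_months // interval_months)  # ceiling division


# Approximate values quoted in the text (~300 GB, ~260 clients).
NIGHTLY_GB = 300
CLIENTS = 260

per_client = average_gb_per_client(NIGHTLY_GB, CLIENTS)
sets = tape_sets_needed(retention_months=12, interval_months=3)  # assumed rotation

print(f"{per_client:.2f} GB per client per night")
print(f"{sets} quarterly tape sets for one year of retention")
```

Because both inputs are approximate, the per-client figure is an average only; individual systems may write far more or far less on a given night.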
In coordination with the Data Systems Group, numerous new products and critical systems (such as Fire Weather data servers, Temperature and Air Quality
(TAQ) systems, and the RUC/RSAS (Rapid Update Cycle and RUC Surface Assimilation System) backup) were added to the Facility Information and Control
System (FICS). To support these additions, several critical support documents and SSG Help documentation were updated so that the basic functions of the
SSG (monitoring, troubleshooting, and discussing real-time data issues) are properly maintained.
A renewed emphasis on proper procedures for notifying data end-users (customers) resulted in updated documentation and other assistance tools
(e.g., flow diagrams) to ensure consistency within the SSG in this important area of customer service. The FSL Central Facility Data Availability Status Webpage
was updated, and so was the tool that creates updates to this important customer information source.
A new feature was added to FICS that monitors product delivery to the NWS Telecommunications Gateway servers, in support of continued FSL backup of
RUC/RSAS products for NWS/NCEP. The SSG online documentation was updated, and other assistance materials and tools were developed and implemented.
These improvements ensured that SSG is more proactive and responsive in monitoring and communicating about FSL RUC/RSAS production and delivery to
NCEP.
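The report does not describe how FICS implements delivery monitoring. As a minimal sketch of the general idea, assuming each product has a known time of last confirmed delivery and a maximum allowed gap, a monitor could flag late products as follows. The class, function, product labels, timestamps, and thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Product:
    name: str                 # hypothetical product label
    last_delivered: datetime  # time of the most recent confirmed delivery
    max_age: timedelta        # allowed gap before the product counts as late


def late_products(products, now):
    """Return the names of products whose last delivery is older than max_age."""
    return [p.name for p in products if now - p.last_delivered > p.max_age]


# Illustrative values only; the date and thresholds are arbitrary.
now = datetime(2004, 1, 1, 12, 0)
products = [
    Product("RUC", now - timedelta(minutes=50), timedelta(hours=1)),
    Product("RSAS", now - timedelta(minutes=95), timedelta(hours=1)),
]

print(late_products(products, now))
```

In this example only the second product exceeds its allowed gap, so it is the one an operator would be alerted to. A real monitor would also need to confirm delivery on the receiving servers, which this sketch does not attempt.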
To keep well informed of computer security issues and maintain compliance with DOC, NOAA, and OAR security guidance, SSG staff took the NOAA IT online
Security Awareness training, and also completed the online, in-depth SANS (SysAdmin, Audit, Network, Security) Institute Security training course. All SSG staff
received ongoing, in-depth training on the main computer room VESDA Smoke Detection System and FM-200 Fire Suppression System.
Facility Infrastructure Upgrades
FSL underwent two substantial infrastructure upgrades to address the power, cooling, and space requirements of
the final upgrade to the High-Performance Computing System. Every effort was made to implement the infrastructure upgrades with minimal downtime to
existing equipment and FSL users.
The first infrastructure upgrade involved the expansion of the Central Facility Annex. An office and a storage room were relocated to add space for the computer
room next door. The walls surrounding the new computer room required extensive sound mitigation work to meet the American Society of Heating, Refrigerating,
and Air-Conditioning Engineers (ASHRAE) noise protection criteria for private offices. Surrounding walls were extended deck to deck, and an Uninterruptible
Power Supply (UPS, Figure 10) was installed to provide short-term backup power. A ramp was installed to raise the floor to 12 inches in support of a new
dedicated CRAC (Computer Room Air Conditioner) unit.
Figure 10. Uninterruptible Power Supply (cabinets to the left) and the GOES ground-station rack (tall, black unit toward the back) in the new Central Facility Annex.
To create space for the final upgrade, older racks and equipment were moved from the main computer room to the new annex. The finished Central Facility
Annex (Figure 11) was then fully certified in accordance with National Fire Protection Association standards.
Figure 11. (top) Forecast Research Division compute servers, (middle) Network equipment row, (bottom) one of four CRAC (Computer Room Air
Conditioner) units in the Central Facility.
The second infrastructure upgrade brought the Central Computer Facility up to original specifications by increasing the cooling capacity to 120 tons and
emergency UPS electric power to 300 kVA. Four 15-ton CRAC units were replaced with four 30-ton units. Chilled water piping modifications and leak
detection upgrades were required as well as floor tile cutouts and stronger underfloor supports. Additional power distribution panels and larger power
transformers were also installed to support the increased electrical requirements. The Emergency Power Off (EPO) bypass capability was separated from
the FM-200 Fire Suppression bypass switch in order to perform functional FM-200 Fire Suppression testing and maintenance without powering down the
entire computer room. Finally, 28 legacy HPCS computer racks were removed and 48 new HPCS final upgrade racks were installed. The implemented
specifications for the main computer room and the annex are shown in Table 1.
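The cooling and rack figures quoted above are consistent with one another: four 30-ton units give the stated 120 tons, double the capacity of the four 15-ton units they replaced. The Python sketch below restates that arithmetic under the standard definition of a ton of refrigeration (12,000 BTU per hour, roughly 3.517 kW). The variable and function names are hypothetical, and the kilowatt figure is a derived approximation that does not appear in the report.

```python
BTU_PER_HOUR_PER_TON = 12_000  # definition of one ton of refrigeration
KW_PER_TON = 3.517             # approximate conversion to kilowatts


def total_cooling_tons(units: int, tons_per_unit: float) -> float:
    """Combined cooling capacity of identical CRAC units, in tons."""
    return units * tons_per_unit


before_tons = total_cooling_tons(4, 15)  # four 15-ton units (replaced)
after_tons = total_cooling_tons(4, 30)   # four 30-ton units (installed)
after_kw = after_tons * KW_PER_TON       # derived, approximate
after_btu_per_hour = after_tons * BTU_PER_HOUR_PER_TON

net_racks = 48 - 28                      # new racks installed minus legacy racks removed

print(f"Cooling before: {before_tons:.0f} tons, after: {after_tons:.0f} tons")
print(f"After upgrade: about {after_kw:.0f} kW ({after_btu_per_hour:,.0f} BTU/h)")
print(f"Net change in HPCS racks: +{net_racks}")
```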
Table 1. Specifications for Upgraded FSL Central Computer Facility and Annex
Main Computer Room
The ability of two Cisco 6509s to perform hardware-based routing represents a substantial improvement over the previous configuration of 5 Marconi PowerHub
software-based routers for the 35 active networks at FSL. Figure 12 shows the upgraded network configuration.