September 2006
Features

Alarm systems greatly affect offshore facilities amid high oil prices

Vol. 227 No. 9 

Automation And Control


By following seven key steps, operators can create highly effective, reliable alarm systems to optimize offshore production facilities’ operation.

Eddie Habibi, Founder and CEO, and Bill Hollifield, Principal Alarm Management Consultant, PAS, Houston

Reliability of oil and gas production facilities has never been more important. Poorly performing alarm systems negatively impact reliability and production. They can interfere with, rather than assist, the operator in handling an abnormal situation. This article covers the problem’s origin, its nature and a seven-step methodology for significant improvement of alarm systems.

BUTTERFLIES AND OIL PRICES

Small changes can have drastic consequences. Chaos Theory’s “Butterfly Effect” proposes that the fluttering of a butterfly’s wings in the South China Sea could cause tiny changes in atmospheric conditions, which, over time, can propagate into hurricanes in the Gulf of Mexico. These are followed by offshore oil production losses and oil price increases. This effect applies to all sorts of processes that are sensitive to initial conditions and small changes, including hydrocarbon production processes and the systems that control them.

Oil production facilities deal with this effect every day – in the alarm system. A quick, accurate response to a well-designed alarm results in continued production and profitability. But overlooking an important alarm in a poorly designed and overloaded alarm system can result in equipment shutdown, damage and loss. This effect has been demonstrated many times.

ABNORMAL SITUATIONS THREATEN RELIABILITY

More than ever, reliability of production platforms is playing a key role in supplying the most essential resources for sustaining today’s quality of life for people around the world: crude oil and natural gas. Demand is high, supply is limited, new sources are few and refining capacity is strained. Any interruption in production directly impacts downstream processing plants and the price of hydrocarbon-based consumer products. Abnormal situations that impact production vary from minor upsets to full shutdowns. They can be caused by a number of factors, including equipment failure, human error and severe weather conditions.

Proper abnormal situation mitigation can make the difference between a low-impact upset and a catastrophic accident, and a well-tuned alarm system is central to it. Timely detection, assessment and response to an alarm are critical to an operator successfully keeping a platform up and running. With available technologies and proven work processes, steps can be taken today to significantly improve the robustness of a production platform during an abnormal situation.

THE SYSTEM: A PROBLEM AND A SOLUTION

The distributed control system (DCS) alarm system is a well-recognized problem area. A proper, effective alarm system should always act to assist the operator in handling an abnormal situation. It is a key technology for that purpose. Instead, we consistently find overloaded, poorly performing alarm systems that would better serve the operator if they were actually turned off during abnormal conditions. These systems often become a source of confusion and exacerbate the situation, lessening an operator’s ability to focus on crisis recovery.

Poorly performing alarm systems have been identified by regulatory agencies as significant contributing factors to several major accidents. Poor alarm systems equally impact the operation of offshore platforms.

Distributed control system (DCS) alarm problems appear in various forms. Some of the most prevalent alarm problems include the following.

  • Alarm floods are a prevalent problem, wherein an operator may experience hundreds of alarms within a few minutes of a minor upset, and consequently miss critical alarms.
  • High, continuous alarm rates are common. Rates are often far above an operator’s ability to handle. Hundreds to thousands of alarms must be ignored each week in such a system, with no guarantee that the “right” ones are always ignored.
  • Improperly suppressed alarms, without records or notifications, are a common problem and pose a risk to operations. We often find improperly suppressed, critical alarms.
  • Chattering and similar nuisance alarms make valid alarms much harder to detect and more likely to be missed. This can make an upset condition become much worse or last much longer.
  • Stale or long-standing alarms clutter an operator’s view of the overall situation.

HOW WE GOT HERE

The alarm problem is a symptom of a broader issue related to human factors in the modern control room. The advent of the DCS, for all its known benefits, had an unintended consequence: it limited the operator’s ability to effectively manage abnormal situations. The people who implemented these systems often lacked knowledge of the human factors involved in designing an effective alarm management system. Often, alarms were set arbitrarily and inconsistently.

In the DCS, the addition of an alarm has no direct cost. Alarms are often configured and enabled by default. The result is massive over-configuration. Additionally, many alarm systems lack proper management-of-change control. In many control rooms, the operators are allowed to change alarm settings at will, without documentation or proper engineering design considerations. This issue is overlooked in most operating companies.

Prior to the DCS, and with the wall-mounted instrument panel, an operator typically had around 50 “lightbox” alarms to inform him of events requiring corrective action. The alarm panel was always visible, and the light arrangement became practically seared into the operator’s memory. Being expensive to implement, alarms were chosen carefully. Various upsets produced recurring patterns in the instruments and the lightbox that were identifiable to an experienced operator. Situation awareness was attained at a glance.

The migration to the DCS overlooked the big picture and pattern recognition benefits of a wall-mounted instrument panel. Today, it is not at all unusual to see a single operator console configured to have 3,500-plus alarms, Fig. 1. These alarms are generally presented in a tabular alarm summary display and are often confusing, due to the limited amount of space provided on a DCS screen. It is not unusual to have to call up four different screens to investigate an alarm and take corrective action. Without careful and proper design of the human-machine interface, the DCS cannot provide an operator with the big picture.


Fig. 1. Exponential growth of alarm counts per operator. 

The operator interface is one of the most critical human factors in the safe operation of a production platform, and a major topic deserving of its own extensive analysis. This article focuses primarily on alarm management. All of the topics mentioned are discussed in depth in the recently published book, The Alarm Management Handbook – A Comprehensive Guide.

SEVEN STEPS TO A HIGHLY EFFECTIVE ALARM SYSTEM

The improvement of a poorly performing, overloaded alarm system can be accomplished using a straightforward, seven-step plan outlined below.

Step 1: Create and adopt an alarm philosophy. An alarm philosophy is a comprehensive document providing best-practice guidelines for proper definition, design, reengineering, implementation and ongoing maintenance of a screen-based alarm system. It covers both new systems and modifications to existing systems. It is a critical success factor for an effective system.

An alarm philosophy should be developed at each company and for each operating group through a rigorous process that engages a number of stakeholders – primarily operators, engineers and management. A single operating company is unlikely to have the wide-ranging experience and expertise needed to formulate a comprehensive philosophy, so having an experienced, knowledgeable facilitator in the development effort is crucial. The facilitator will provide guidance and understanding of industry’s best practices, and create a common operational definition of terms.

Management endorsement of the final philosophy document is essential. It should be taken as seriously as a safety standards document.

Step 2: Alarm performance benchmarking. Existing DCS-based alarm systems should be benchmarked against industry best practices. A benchmark:

  • Establishes current performance, essential to defining an improvement plan.
  • Provides for a data-driven decision process for management approval and funding for an improvement project.
  • Provides a baseline against which project gains can be measured, Fig. 2.
  • Identifies “bad actor” alarms that often provide significant improvement opportunities with relatively minimal cost and effort.

Fig. 2. Rate of alarms per day far exceeds manageable levels.

Benchmarking a system requires analysis of the alarm events and system configuration for a recent two-to-six-month period. The following analyses are typically included in a benchmark:

  • Overall alarm rates per operating position.
  • Alarm flood periods, magnitudes and characteristics.
  • Nuisance alarms, such as chattering, fleeting and stale alarms.
  • Controlled and uncontrolled alarm suppression.
  • Alarm priority distribution. 
  • Alarm configuration analysis, including Management of Change (MOC).
  • Operator action analysis.
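
As a rough illustration of the first two analyses, the sketch below counts alarms per 10-minute window for one operating position and flags the windows that exceed a flood threshold. The list-of-timestamps input, the function names and the threshold of 10 alarms per window are assumptions made for this example; they are not taken from this article or from any particular DCS.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)

def alarms_per_window(timestamps, window=WINDOW):
    """Count alarm events in consecutive fixed windows, starting at the first event."""
    if not timestamps:
        return {}
    start = min(timestamps)
    # timedelta // timedelta gives the integer index of the window.
    counts = Counter((ts - start) // window for ts in timestamps)
    return {start + i * window: n for i, n in sorted(counts.items())}

def flood_windows(counts, threshold=10):
    """Return start times of windows whose count exceeds the (assumed) threshold."""
    return [t for t, n in counts.items() if n > threshold]

# Invented event log: 3 alarms in the first window, 12 in the second.
t0 = datetime(2006, 9, 1, 8, 0)
events = [t0 + timedelta(minutes=m) for m in (0, 2, 5)]
events += [t0 + timedelta(minutes=10, seconds=20 * s) for s in range(12)]
counts = alarms_per_window(events)
print(flood_windows(counts))  # one flooded window, starting at 08:10
```

A real benchmark would read months of journal data per console and would normally use the flood definition agreed in the alarm philosophy rather than a fixed number.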

Step 3: Bad actor alarm resolution. Nuisance or “bad actor” alarms are a common problem in most systems. With enough bad actors, an alarm system is rendered virtually useless, because important or critical alarms are lost in the sea of bad actor alarms. This situation can have adverse economic, environmental or safety consequences.

Usually, the top 10 most frequent alarms comprise anywhere from 25% to 95% of the entire system load, Fig. 3. If just these alarms are dealt with successfully, then major system improvement will result, and with comparatively little effort.


Fig. 3. Resolving top 10 bad actor alarms often leads to significant improvement.
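
The share of the system load carried by the most frequent alarms can be estimated directly from an alarm log. The following is a minimal sketch, assuming the log has already been reduced to a list of alarm tag names; the tags and counts are invented for illustration.

```python
from collections import Counter

def bad_actor_share(alarm_tags, top_n=10):
    """Return the top_n most frequent tags and their fraction of all alarm events."""
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    top = counts.most_common(top_n)
    share = sum(n for _, n in top) / total if total else 0.0
    return top, share

# Invented log: four tags, 100 alarm events in total.
log = ["LI-101"] * 60 + ["PI-202"] * 25 + ["TI-303"] * 10 + ["FI-404"] * 5
top, share = bad_actor_share(log, top_n=2)
print(top, f"{share:.0%}")  # the two worst tags account for 85% of events here
```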

There are well-defined methods for solving bad actor alarms, but many control engineers do not know about them. Use of these methods usually results in a more than 50% reduction in a system’s alarm rate, with comparatively low cost and low effort. These methods involve the application of:

  • Proper alarm settings.
  • Proper alarm deadbands (for both analog and digital sensors).
  • Alarm time delays (both ON-delay and OFF-delay).
  • Process value filtering.
  • Proper point ranging and measurement clamping.
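
To show why deadbands and time delays reduce chattering, the sketch below evaluates a high alarm on a noisy signal with and without them. It is an illustrative model only: delays are counted in samples rather than seconds, and the function names, set point and signal values are assumptions, not settings recommended by the authors.

```python
def high_alarm_states(values, setpoint, deadband=0.0, on_delay=0, off_delay=0):
    """Return the alarm state (True/False) after each sample.

    The alarm sets at value >= setpoint and clears at value < setpoint - deadband.
    on_delay / off_delay are in samples: the condition must hold for that many
    additional consecutive samples before the state changes.
    """
    active = False
    pending = 0  # consecutive samples requesting a state change
    states = []
    for v in values:
        if not active:
            wants_change, needed = v >= setpoint, on_delay
        else:
            wants_change, needed = v < setpoint - deadband, off_delay
        if wants_change:
            if pending >= needed:
                active = not active
                pending = 0
            else:
                pending += 1
        else:
            pending = 0
        states.append(active)
    return states

def activations(states):
    """Count False -> True transitions, i.e. new alarm annunciations."""
    return sum(1 for prev, cur in zip([False] + states, states) if cur and not prev)

noisy = [99, 101, 99, 101, 99, 101, 99]  # signal hovering around a set point of 100
print(activations(high_alarm_states(noisy, 100)))              # 3
print(activations(high_alarm_states(noisy, 100, deadband=3)))  # 1
print(activations(high_alarm_states(noisy, 100, on_delay=1)))  # 0
```

The deadband keeps the alarm from clearing and re-annunciating on small fluctuations, while the ON-delay ignores excursions shorter than the delay. Whether ignoring a short excursion is acceptable depends on the process and has to be decided per alarm.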

Step 4: Alarm documentation and rationalization. Alarm documentation and rationalization (D&R) is an effective, consistent and logical methodology for determining, prioritizing and documenting alarms. D&R involves a thorough re-examination of the alarm system. This solves many problems by removing the guesswork, and making the design and engineering of an alarm system deterministic.

The basic methodology of a D&R is simple. For each point on the system, a team of knowledgeable people does the following:

  • Discuss each configured and possible alarm on that point.
  • Verify whether any alarm should exist at all.
  • Verify that all alarms represent abnormal situations that require operator action.
  • Verify that an alarm does not duplicate another similar alarm.
  • Determine the proper priority of each alarm as a combination of 1) the severity of the consequences if the alarm receives no response; and 2) the time available for the operator to successfully respond to the alarm.
  • Document the alarm causes, verification steps, consequences and operator response.
  • Determine the proper trip point for the alarm, based on examination of: 1) process history; 2) relevant operational procedures; and 3) equipment and safety system specifications.
  • Determine and highlight the need for special handling of an alarm, e.g., special logic, graphical interface, etc.

A proper D&R is a significant effort involving, at minimum, representatives from operations, engineering and safety groups. For maximum economy and effectiveness, it is important to complete the first three steps of the seven-step methodology prior to beginning D&R. Proper software and an experienced, knowledgeable facilitator significantly improve the D&R team’s productivity.


Fig. 4. Best practice recommendation for alarm priority determination.
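
Determining priority from consequence severity and available response time amounts to a table lookup. The sketch below shows the mechanics only; the severity classes, time bands and grid values are hypothetical placeholders and do not reproduce the grid in Fig. 4.

```python
# Assumed response-time bands in minutes: (upper bound, label).
TIME_BANDS = ((3, "short"), (15, "medium"), (float("inf"), "long"))

# Hypothetical grid: PRIORITY[severity][time band]. Not the grid from Fig. 4.
PRIORITY = {
    "minor":  {"short": "medium", "medium": "low",    "long": "low"},
    "major":  {"short": "high",   "medium": "medium", "long": "low"},
    "severe": {"short": "high",   "medium": "high",   "long": "medium"},
}

def time_band(minutes_available):
    """Map the time available for operator response to a band label."""
    if minutes_available < 0:
        raise ValueError("time available cannot be negative")
    for upper_bound, label in TIME_BANDS:
        if minutes_available <= upper_bound:
            return label

def alarm_priority(severity, minutes_available):
    """Look up priority from consequence severity and time to respond."""
    return PRIORITY[severity][time_band(minutes_available)]

print(alarm_priority("severe", 2))   # high
print(alarm_priority("major", 10))   # medium
print(alarm_priority("minor", 60))   # low
```

Keeping the grid in one table, rather than assigning priorities alarm by alarm, is what makes the result consistent across the D&R team.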

Step 5: Alarm system audit and enforcement. DCS alarm systems are notoriously easy to change. Lack of proper MOC is one of the root causes for poorly performing alarm systems, Fig. 5. The significant investments expended in redesigning an alarm system must be protected through a rigorous MOC process.


Fig. 5. Alarm systems degrade over time without proper MOC.

Paper-based MOC systems have generally failed to effectively maintain alarm configuration integrity. Ideally, alarm Audit and Enforcement is a software function that will periodically and automatically check for changes from the proper settings, as contained in a Master Alarm Database. Software can report such changes and optionally restore the system to proper settings.
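
The audit-and-enforce idea can be sketched as a comparison between the settings recorded in a master database and the settings currently found in the control system. In the example below both are plain dictionaries; the tag names, setting names and values are invented, and reading from or writing to a real DCS is outside the scope of the sketch.

```python
def audit(master, live):
    """Return (tag, setting, expected, found) for every deviation from the master."""
    deviations = []
    for tag, expected_settings in master.items():
        found_settings = live.get(tag, {})
        for setting, expected in expected_settings.items():
            found = found_settings.get(setting)
            if found != expected:
                deviations.append((tag, setting, expected, found))
    return deviations

def enforce(live, deviations):
    """Optionally restore the master value for each reported deviation (in place)."""
    for tag, setting, expected, _found in deviations:
        live.setdefault(tag, {})[setting] = expected

# Invented example: one trip point was changed without authorization.
master = {
    "LI-101": {"high_trip": 80.0, "priority": "high"},
    "PI-202": {"high_trip": 12.5, "priority": "low"},
}
live = {
    "LI-101": {"high_trip": 95.0, "priority": "high"},
    "PI-202": {"high_trip": 12.5, "priority": "low"},
}
found = audit(master, live)
print(found)          # [('LI-101', 'high_trip', 80.0, 95.0)]
enforce(live, found)
print(audit(master, live))  # []
```

This sketch only checks alarms that exist in the master; alarms configured in the control system but absent from the master would need a separate check.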

Step 6: Real-time alarm management. Production platforms and processing plants are dynamic, and have multiple operating states requiring different alarm settings for each state. Alarms in control systems are inherently designed to support a single operating state and, therefore, become useless outside the normal steady state. For optimum performance, alarm systems should be altered in real time under certain defined, controlled conditions to support each operating state. The need is expressed in three related functions:

  • Alarm shelving.
  • State-based alarming.
  • Alarm flood suppression.

Alarm shelving. Malfunctioning alarms may need to be suppressed for temporary periods. Alarm shelving supports this need in a controlled manner, where shelved alarms are always known and never forgotten. Improper, uncontrolled alarm suppression is a rampant problem on DCSs throughout the industry. Alarm shelving software must work in proper coordination with other dynamic alarm techniques, such as audit and enforcement software, state-based alarming and alarm flood suppression.
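
A minimal sketch of controlled shelving follows, assuming that every shelved alarm carries a record of who shelved it, why, and when it must return. The class name, the 12-hour maximum and the tag are hypothetical and stand in for whatever the site's alarm philosophy specifies.

```python
from datetime import datetime, timedelta

class AlarmShelf:
    """Keeps a record of every shelved alarm and when it must return to service."""

    def __init__(self, max_duration=timedelta(hours=12)):
        self.max_duration = max_duration  # assumed site limit
        self._shelved = {}                # tag -> (who, reason, expires_at)

    def shelve(self, tag, who, reason, now, duration):
        if duration > self.max_duration:
            raise ValueError("shelving period exceeds the allowed maximum")
        self._shelved[tag] = (who, reason, now + duration)

    def release_expired(self, now):
        """Unshelve alarms whose period has ended and return their tags."""
        expired = sorted(t for t, (_, _, exp) in self._shelved.items() if exp <= now)
        for tag in expired:
            del self._shelved[tag]
        return expired

    def is_shelved(self, tag):
        return tag in self._shelved

    def report(self):
        """List every shelved alarm so that none is forgotten."""
        return sorted((t, who, why, exp) for t, (who, why, exp) in self._shelved.items())

shelf = AlarmShelf()
t0 = datetime(2006, 9, 1, 8, 0)
shelf.shelve("TI-303", "operator A", "failed transmitter", t0, timedelta(hours=4))
print(shelf.report())
print(shelf.release_expired(t0 + timedelta(hours=4)))  # ['TI-303']
```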

State-based alarming. Process equipment often has several normal, but differing, operating states. DCS alarm capabilities do not generally accommodate this fact.

A few common state examples include:

  • Running/Not Running.
  • Startup/Shutdown.
  • Full Rates/Half Rates.
  • Both Trains Running/Single Train Operation.

State-based alarm software determines the current operating state from various process values read from the DCS, together with operator confirmation. It then makes the desired, predetermined modifications to alarm trip points, priorities and other settings to match the current operating state.
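
The logic can be illustrated with a two-state example for a single compressor. Everything specific here is an assumption made for the sketch: the tags, the 5 A running threshold, the settings table, and the choice to stay in the "running" state until the operator confirms the stop.

```python
# Hypothetical predetermined settings per state; None means the alarm is disabled.
STATE_SETTINGS = {
    "running":     {"PI-202.low_trip": 4.0,  "FI-404.low_trip": 50.0},
    "not_running": {"PI-202.low_trip": None, "FI-404.low_trip": None},
}

def detect_state(motor_amps, operator_confirmed_stop=False, running_amps=5.0):
    """Infer the operating state from a process value plus operator confirmation.

    Assumption: without confirmation the equipment is treated as running,
    so its alarms stay active after an unexpected stop.
    """
    if motor_amps >= running_amps:
        return "running"
    return "not_running" if operator_confirmed_stop else "running"

def settings_for(state):
    """Return a copy of the predetermined alarm settings for the given state."""
    return dict(STATE_SETTINGS[state])

state = detect_state(motor_amps=0.0, operator_confirmed_stop=True)
print(state, settings_for(state))
```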

Alarm flood suppression. Alarm floods are sustained periods of high alarm rates. Alarm floods can make a difficult operating situation much worse, and they are common. In a severe alarm flood, the system becomes a hindrance to the operator rather than being a useful tool, Fig. 6. Risks associated with a major process upset or an accident are much higher during an alarm flood.


Fig. 6. Alarm floods render the alarm system useless to the operator.

Flood suppression is the dynamic management of pre-defined groups of alarms based on triggering events. The most common cause of an alarm flood is the inadvertent shutdown of a piece of equipment.
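
A minimal sketch of that idea, assuming one pre-defined group keyed by a compressor trip: while the trigger alarm is active, the consequential alarms in its group are withheld from the operator's list and everything else is shown. The tags and the group are invented for illustration.

```python
# Hypothetical group: alarms expected as a direct consequence of a compressor trip.
SUPPRESSION_GROUPS = {
    "K-100.TRIP": {"PI-202.LOW", "FI-404.LOW", "TI-303.LOW"},
}

def filter_alarms(active_alarms, groups=SUPPRESSION_GROUPS):
    """Split active alarms into (shown, suppressed) given the triggers present."""
    active = set(active_alarms)
    suppressed = set()
    for trigger, consequential in groups.items():
        if trigger in active:
            suppressed |= consequential & active
    shown = [a for a in active_alarms if a not in suppressed]
    return shown, sorted(suppressed)

shown, suppressed = filter_alarms(
    ["K-100.TRIP", "PI-202.LOW", "LI-101.HIGH", "FI-404.LOW"]
)
print(shown)       # ['K-100.TRIP', 'LI-101.HIGH']
print(suppressed)  # ['FI-404.LOW', 'PI-202.LOW']
```

The trigger alarm itself and any unrelated alarm remain visible, and the suppressed list is returned rather than discarded so that the suppression stays on record.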

Step 7: Control and maintain alarm system performance. Processes and sensors change over time. Alarms that have never been a problem may become nuisances. An effective process for ongoing monitoring of alarm system performance is needed.

Every control system should have a named Alarm Management System Champion, whose role will be to maintain the alarm system’s integrity and performance. Alarm System Key Performance Indicators (KPIs) should be developed and routinely reported to key stakeholders, such as operators, engineers and managers. Modern alarm analysis software, such as PAS’s PlantState Suite, makes this easy, automatically and periodically publishing results that are accessible from a company intranet. Reports should contain the following:

  • Alarm rates per console (per day and per 10 min.) vs. target levels.
  • Percentage of time alarm system exceeds target rates.
  • Frequency analysis for the most frequent alarms and chattering alarms, showing the top 10.
  • Alarm flood analysis, frequency and magnitude.
  • Alarm priority distribution.
  • Nuisance alarms, such as chattering.
  • Uncontrolled alarm suppression.
  • List of long-standing (stale) alarms.
  • Nuisance Alarm List and progress made on those alarms.
  • Action plans to improve performance.
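
Two of the report items above reduce to simple arithmetic on data the alarm journal already holds. The sketch below computes the percentage of 10-minute windows above a target rate and the alarm priority distribution; the input lists and the target of two alarms per window are invented, and the actual targets belong in the alarm philosophy.

```python
from collections import Counter

def percent_windows_over_target(window_counts, target_per_window):
    """Percentage of 10-minute windows whose alarm count exceeds the target."""
    if not window_counts:
        return 0.0
    over = sum(1 for n in window_counts if n > target_per_window)
    return 100.0 * over / len(window_counts)

def priority_distribution(priorities):
    """Percentage of alarm events at each priority."""
    counts = Counter(priorities)
    total = sum(counts.values())
    return {p: 100.0 * n / total for p, n in counts.items()}

# Invented data: alarm counts for eight consecutive 10-minute windows.
window_counts = [0, 1, 3, 0, 12, 2, 0, 1]
print(percent_windows_over_target(window_counts, target_per_window=2))  # 25.0

events = ["low"] * 8 + ["medium"] + ["high"]
print(priority_distribution(events))
```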

It is advisable to create special reports after major upsets, or incidents such as spills, that detail alarm system performance before, during and after the incident. Frequency of the alarm performance reports should be tailored to the specific culture of the company and individual stakeholders.

CONCLUSIONS

Overloaded and malfunctioning alarm systems are common in the hydrocarbon industries. Poorly functioning alarm systems continue to negatively impact the profitability, safety and environmental performance of production and processing plants worldwide. The good news is that alarm problems can be identified, isolated and solved through proven alarm management methodologies. Meteorologists may embrace Chaos Theory, but chaos has no place in the control room.

ACKNOWLEDGEMENT

The authors thank Edward Lorenz for the theory of the “Butterfly Effect” from his presentation, “Predictability: Does the flap of a butterfly’s wings in Brazil set off a tornado in Texas?” They also thank Pierre Grosdidier, Ph.D., P.E.; Patrick Connor, P.E.; Bill Hollifield; and Samir Kulkarni for their article, “A path forward for DCS alarm management,” available at www.pas.com. Particular thanks go to the employees of PAS, without whom none of this would be possible.


THE AUTHORS

Habibi

Eddie Habibi is the founder and CEO of Houston-based PAS. He has led the growth of PAS to its present standing as a global provider of advanced automation solutions to the processing industries worldwide. He is a recognized industry thought leader in the areas of critical condition management, including alarm management and operator effectiveness, as well as automation asset management. Mr. Habibi is co-author, with Bill Hollifield, of The Alarm Management Handbook – A Comprehensive Guide, available at www.pas.com and at www.amazon.com.


Hollifield

Bill R. Hollifield is the principal consultant at PAS, responsible for Alarm Management work processes and products, intellectual property and software product directions. He is a voting member of the ISA SP-18 Alarm Management committee. Mr. Hollifield has international, multi-company experience in all aspects of Alarm Management, along with 30 years of industry experience that focused on project management, production and control systems.


      


ABOUT THE COMPANY

      

PAS (www.pas.com) is a Houston-based company that provides a full portfolio of process automation software solutions and services. These include Critical Condition Management, Advanced Process Control, and Automation Asset Management. PAS is a recognized industry leader in Alarm Management, with a comprehensive set of offerings that include Alarm Analysis, Documentation and Rationalization, Real-Time Alarm Management, and Control Loop Performance Optimization software.


      
