Using Data Science to Improve the Natural Gas Pipeline System 

Executive Summary

  • A local distribution company (LDC) in the natural gas industry needed to predict which parts of its pipeline posed the greatest risk, both in likelihood of failure and in consequences for nearby residents and businesses
  • A consultancy working with the company’s Asset Integrity Management department used data science to predict pipeline failures
  • By overlaying the potential consequences of each predicted failure, the consultants helped the company adopt a monetized risk approach to managing the pipeline
  • As a result, the company’s pipeline operations are now safer, and its spending decisions are more efficient

About the Company and the RapidMiner User Group

The focus of this case study is a local distribution company (LDC) in the United States natural gas industry. Like most LDCs, the company receives gas at the city gate, the point where gas leaves the long-distance transportation system for the lower-pressure, more diffuse LDC systems that deliver the gas locally. The company buys gas from the long-distance transporter, then delivers and sells it to homes and businesses in the territories in which it operates.

The gas pipeline network is a critical part of the company’s operations, as with any LDC. The company’s network is typical of LDCs, in terms of its age and health. A large portion of the nation’s local distribution network was installed in the 1950s and 1960s as consumer demand for natural gas more than doubled following World War II. In the 1990s, over 200,000 miles of new local distribution pipelines were installed to provide service to new commercial facilities and housing developments. Over 30,000 more miles of distribution pipelines were added from 2000 to 2014. There are, in total, more than two million miles of gas distribution pipeline in the US.

The company has an Asset Integrity Management department, which is responsible for ensuring the well-being of the pipeline network, including identifying sections of the network needing repair and overseeing repairs when they happen. To aid in this effort, the company has contracted with a specialist consulting firm to lend its data science and machine learning expertise and help the company increase the precision and effectiveness of its asset management process. The consulting firm is a certified partner of RapidMiner, and the consultants are the primary RapidMiner users on behalf of the company.

The Company’s Need

Better predict which parts of the pipeline need repair

The company hired the consultants to use their data science expertise to help it better identify the parts of the pipeline that represented the highest overall risk, in terms of need of repair and potential consequences should a leak occur.

Local distribution companies have an obligation to their customers and their communities to ensure their pipeline networks are safe and reliable. Small leaks in the gas pipelines are common, especially considering how old some of the infrastructure is. In fact, the LDC industry finds and repairs over 500,000 leaks per year. Mercaptan, the odorant added at the city gate which gives natural gas its distinctive rotten egg smell, means gas leaks are often detected and fixed quickly. Further, the federal Pipeline and Hazardous Materials Safety Administration (PHMSA) promulgates strict regulations on pipeline safety and leak prevention, which state agencies implement.

And yet, despite the frequency of leak detection, and the best efforts of industry and government to ensure pipeline integrity, the risk of not finding or fixing a critical leak remains high. The ramifications of failure to proactively address a pipeline network issue can range from costly to cataclysmic. For example, in September 2018, a series of explosions and fires occurred in natural gas lines of the Merrimack Valley region of Massachusetts, damaging dozens of houses, causing the evacuation of thousands of people, and killing one person.

The Solution

RapidMiner enables building multiple predictive models and weighing the unique, pipeline-related trade-offs of each

The company has a large volume of data available for use in building predictive models. Data about the pipeline itself is the starting point – the specific design, the diameter of pipes, pipe wall thickness, the metallurgy of the pipes, how exactly they were welded, and much more. Geo-spatial data is also important, as it describes the areas the pipeline networks run through, including buildings, houses, roads, waterways, etc. This affects not only the likelihood of failure, but also the damage that might be done should an incident occur. Data on the performance of the pipeline rounds out the set, covering leak surveys, pressure upsets, measures of water introduction into pipelines (which causes corrosion), and corrosion surveys. This data also includes the history of pipeline failures, which serves as the dependent (target) variable when building models.

But the data is not always ready for use in building models. In some cases, the company collects data to satisfy regulatory requirements – but the regulations don’t ask the company to do much with it. So, the consultants made heavy use of RapidMiner’s data prep functions to get the data in shape for modeling.

Once the data has been prepped, RapidMiner puts over a hundred different modeling approaches at users’ fingertips. They can quickly build multiple models and compare results and predictive power. The consultants make heavy use of RapidMiner’s confusion matrices to provide transparency into the performance of the different models, which is important given the unique nature of the work they’re doing. When trying to predict potentially catastrophic gas leaks, the company needs to make very deliberate choices when weighing the trade-offs between false positives, which might result in digging a hole to find a leak that isn’t there (costly, but not a big deal), and false negatives, where the model misses a leak that then goes unanticipated and unfixed (which could be very dangerous).
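The trade-off described above can be made concrete with a small sketch. The Python below is not the consultants' RapidMiner process; it is a minimal illustration, using made-up labels, scores, and dollar figures, of how a confusion matrix and asymmetric error costs might guide the choice of a decision threshold. Every name and number in it is an assumption.

```python
# Illustrative sketch only: data, costs, and function names are assumptions,
# not values or methods taken from the case study.

def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, fn, tn) when segments scoring >= threshold are flagged."""
    tp = fp = fn = tn = 0
    for actual, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and actual:
            tp += 1   # leak correctly anticipated
        elif flagged and not actual:
            fp += 1   # dug a hole, found no leak
        elif not flagged and actual:
            fn += 1   # missed leak
        else:
            tn += 1   # healthy segment left alone
    return tp, fp, fn, tn


def expected_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Total cost of the errors made at a given threshold."""
    _, fp, fn, _ = confusion_counts(y_true, scores, threshold)
    return fp * cost_fp + fn * cost_fn


# Toy history: 1 = segment later failed, 0 = it did not (hypothetical).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
# Hypothetical model scores for the same segments.
scores = [0.9, 0.6, 0.2, 0.4, 0.1, 0.7, 0.5, 0.3]

COST_FP = 5_000     # assumed cost of an unnecessary excavation
COST_FN = 500_000   # assumed cost of a missed leak

thresholds = [0.25, 0.45, 0.65]
best = min(
    thresholds,
    key=lambda t: expected_cost(y_true, scores, t, COST_FP, COST_FN),
)

for t in thresholds:
    print(t, confusion_counts(y_true, scores, t),
          expected_cost(y_true, scores, t, COST_FP, COST_FN))
print("lowest-cost threshold:", best)
```

Because a missed leak is assumed to cost far more than an unnecessary dig, the lowest threshold wins in this toy data even though it produces the most false positives. With different cost assumptions the preferred threshold could change.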

The Results

RapidMiner has helped pinpoint the right repairs to make proactively

The impact of the consultants’ work using RapidMiner on the company’s asset integrity management is substantial. Better identifying the sections of the pipeline needing repair increases the chance of addressing weak points before leaks occur. This reduces the risk of significant property damage and, most importantly, loss of life.

But the impact of data science goes beyond these top-line concerns. It also helps the company with more day-to-day issues, such as optimizing the allocation of repair budget to the parts of the pipeline that need it most. This allows the company’s asset integrity management department to have a bigger impact without needing additional resources.
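One way to picture budget allocation under a monetized risk approach is to treat each segment's risk as its predicted failure probability multiplied by the estimated dollar consequence of a failure, then fund repairs in order of risk removed per repair dollar. The case study does not describe the company's actual allocation method, so the sketch below is an assumption throughout: the segment data, the Segment class, and the greedy ranking rule are all hypothetical.

```python
# Illustrative sketch only: segment values and the greedy rule are assumptions,
# not the company's actual allocation method.
from dataclasses import dataclass


@dataclass
class Segment:
    segment_id: str
    failure_probability: float  # hypothetical model output, 0..1
    consequence_cost: float     # assumed dollar impact if the segment fails
    repair_cost: int            # assumed dollar cost to repair proactively

    @property
    def monetized_risk(self) -> float:
        """Expected loss: probability of failure times consequence."""
        return self.failure_probability * self.consequence_cost


def plan_repairs(segments, budget):
    """Greedily fund repairs with the most risk removed per repair dollar."""
    ranked = sorted(
        segments,
        key=lambda s: s.monetized_risk / s.repair_cost,
        reverse=True,
    )
    chosen, remaining = [], budget
    for seg in ranked:
        if seg.repair_cost <= remaining:
            chosen.append(seg.segment_id)
            remaining -= seg.repair_cost
    return chosen, remaining


segments = [
    Segment("A", 0.02, 10_000_000, 50_000),  # rare failure, severe consequence
    Segment("B", 0.10, 500_000, 20_000),     # likelier failure, modest consequence
    Segment("C", 0.01, 1_000_000, 40_000),
    Segment("D", 0.05, 2_000_000, 60_000),
]

chosen, remaining = plan_repairs(segments, budget=100_000)
print("repair:", chosen, "unspent:", remaining)
```

In this toy data, segment A ranks first despite having a low failure probability, because its consequence is so large. A greedy ranking is simple to explain but is not guaranteed to be optimal; a real program might also weigh crew scheduling, regulatory deadlines, and other constraints.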

Looking beyond just regulatory compliance to assessing monetized risk in the pipeline, and applying data science to that need, means more efficient repair operations, more leaks prevented and fixed, a better bottom line, and less risk of catastrophic damage. It’s a win on every front. Every company with a network of any kind – other kinds of fuel pipelines, power grids, telecommunications networks, even roads and bridges – should take note, as they could use data science in the same way with similar benefits.