March 2019
Special Focus

Neural network-derived model accurately predicts oil recovery in water-drive reservoirs

Artificial neural networks are information processing systems built to mimic the way the human brain processes information. They learn procedures from training patterns or data, and they excel at pattern matching, classification, data clustering and forecasting.
Don LeBlanc / Eastex Petroleum Consultants

Artificial neural networks are well-suited for modeling complex non-linear relationships, including the modeling of oil recovery factors in water-drive reservoirs. Image: Apache Corp.

The universal approximation theorem states that any continuous function that maps one set of real numbers to another can be approximated to a desired degree of accuracy by a feed-forward ANN with a single hidden layer and a finite number of hidden units that contain non-linear transfer functions.
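One common way to write this statement in symbols is sketched below. The notation is introduced here for illustration and is not taken from the article: N is the number of hidden units, w_j and b_j are the weights and bias of hidden unit j, v_j is its output weight, σ is the non-linear transfer function, K is a closed and bounded set of inputs, and ε is the accuracy requested.

$$\hat{f}(\mathbf{x}) = \sum_{j=1}^{N} v_j\,\sigma\!\left(\mathbf{w}_j^{\top}\mathbf{x} + b_j\right), \qquad \sup_{\mathbf{x}\in K}\left|f(\mathbf{x}) - \hat{f}(\mathbf{x})\right| < \varepsilon$$

The theorem says that a suitable N and set of parameters exist; it does not say how to find them, which is the job of training.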

The first publicly available, general-purpose ANN designed specifically for the oil industry was called Neuro3 and was released by the DOE in 2001.2 It was rudimentary, but is still relevant.

An ANN generates a predictive model as a multidimensional function containing elements with adjustable parameters. The most important elements of an ANN model are the neurons, which are contained in a “hidden layer,” Fig. 1. The neurons receive information in the form of numerical data input. This information is then combined with a set of parameters within the neural network to produce a result in the form of a numerical output.

Fig. 1. Artificial neural network model.

Most common ANN models use a feed-forward neural network trained by back-propagation. The calculation procedure is feed-forward; in other words, it runs from the input layer through the hidden layers to the output layer. Back-propagation occurs during the training phase, where the calculated outputs are compared with the desired values and the errors are used to modify the weights in the ANN for the next iteration.
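The feed-forward and back-propagation steps can be illustrated with a very small network. The Python sketch below is not the software used for the ANN Oil RF Model; it assumes one input, one hidden layer of tanh neurons, a linear output and plain batch gradient descent on mean squared error. The function name, learning rate and toy data are all hypothetical.

```python
import math
import random

def train_tiny_ann(xs, ys, n_hidden=4, lr=0.05, epochs=500, seed=0):
    """Train a 1-input, 1-output network with one tanh hidden layer.

    Returns the parameters and the mean-squared-error history.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # input -> hidden weights
    b1 = [0.0] * n_hidden                                   # hidden biases
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden -> output weights
    b2 = 0.0                                                # output bias
    n = len(xs)
    losses = []
    for _ in range(epochs):
        g_w1 = [0.0] * n_hidden
        g_b1 = [0.0] * n_hidden
        g_w2 = [0.0] * n_hidden
        g_b2 = 0.0
        loss = 0.0
        for x, y in zip(xs, ys):
            # Feed-forward: input layer -> hidden layer -> output layer
            hidden = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
            y_hat = sum(w2[j] * hidden[j] for j in range(n_hidden)) + b2
            error = y_hat - y
            loss += error ** 2 / n
            # Back-propagation: push the output error back to every weight
            d_out = 2.0 * error / n
            g_b2 += d_out
            for j in range(n_hidden):
                g_w2[j] += d_out * hidden[j]
                d_hidden = d_out * w2[j] * (1.0 - hidden[j] ** 2)  # tanh derivative
                g_w1[j] += d_hidden * x
                g_b1[j] += d_hidden
        losses.append(loss)
        # Modify the weights for the next iteration
        for j in range(n_hidden):
            w1[j] -= lr * g_w1[j]
            b1[j] -= lr * g_b1[j]
            w2[j] -= lr * g_w2[j]
        b2 -= lr * g_b2
    return (w1, b1, w2, b2), losses

# Toy data: 40 points of a smooth non-linear curve
xs = [-1.0 + 2.0 * i / 39 for i in range(40)]
ys = [math.sin(2.0 * x) for x in xs]
params, losses = train_tiny_ann(xs, ys)
print(f"MSE before training: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

Production ANN packages typically use more refined optimizers, but the loop has the same two phases on every iteration.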

The neural network parameters are made up of two components: a “combination function,” which takes all the inputs and produces a set of net input values by calculating the combination of each neuron using a weight factor and a bias; and a “transfer function” which produces the neuron output. Figure 2 presents a single neuron feed-forward neural network process, where ∑ is the combination function and ∫ is the transfer function. It should be noted that the most common transfer function in predictive neural networks is the hyperbolic tangent (tanh).

Fig. 2. Single neuron artificial neural network model.
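A single neuron of this kind can be sketched in a few lines of Python. The function name and the sample weights are hypothetical; the combination function is taken to be a weighted sum plus a bias, and the transfer function is tanh, as described above.

```python
import math

def neuron_output(inputs, weights, bias):
    """Single neuron: weighted-sum combination function, then tanh transfer."""
    # Combination function: net input = sum(weight * input) + bias
    net_input = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Transfer function: squashes the net input into the range (-1, 1)
    return math.tanh(net_input)

# Hypothetical weights and bias, for illustration only
print(neuron_output([1.0, 2.0], weights=[0.5, -0.25], bias=0.1))
```

Because tanh is bounded between -1 and 1, inputs are usually scaled before training so that the net input does not saturate the neuron.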

Full-featured neural network programs are normally combined with genetic algorithms, statistics/linear regression, and fuzzy logic to automatically find optimal or near-optimal solutions for the problem.3,4

The validity of any ANN model is dependent on how well the system has been trained. Normally, a large dataset of well-behaved data is used for training, and a different set of data is used as a blind test after the model is built. ANN systems will subdivide the training data set into a training set and a validating subset. This is not the same as using a data set as a blind test. The validating subset is still part of the training subset and is used in the back-propagation algorithms, so it is not an independent test of the model.

There are two steps in the development of a neural network: training and prediction. Training establishes the network structure and the weight factors for the network and therefore always occurs before the prediction step. Once trained, the ANN can be used to predict output data from new input.


Artificial neural networks are particularly well-suited for modeling complex non-linear relationships, which cannot be easily patterned by traditional linear regression methods. SPE has published numerous papers on the use of neural networks in the oil industry. A search using the keywords “neural networks” yields over 3,900 references.

The ANN oil recovery factor model was developed using an open-source ANN system. Free and commercial ANN programs are available that are based on the open-source OpenANN system, which comprises freely distributed neural network libraries.5

Building the ANN Oil RF model. There are several empirical relationships that have been published for recovery factor predictions. The most common are the equations for recovery efficiency published by the API Subcommittee on Recovery Efficiency, developed from a statistical study of 70 water drive reservoirs.6,7

The empirical API oil recovery factor equation for water drive (WD) oil reservoirs, for a given porosity (Ф), water saturation (Sw), formation volume factor (Boi), permeability (k), water viscosity (µwi), oil viscosity (µoi), initial pressure (Pi) and abandonment pressure (Pab) is as follows:

$$\mathrm{WD\ RF} = 54.898 \left(\frac{\phi\,(1 - S_w)}{B_{oi}}\right)^{0.0422} \left(\frac{1000\,k\,\mu_{wi}}{\mu_{oi}}\right)^{0.077} S_w^{-0.1903} \left(\frac{P_i}{P_{ab}}\right)^{-0.2159}$$
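The API correlation is simple enough to evaluate directly. The Python sketch below assumes that the viscosity term is the ratio of water viscosity to oil viscosity; the function name, the argument names and the sample reservoir are hypothetical. Units are those required by the API correlation, which this article does not restate, so the sample inputs are placeholders rather than a worked field case.

```python
def api_water_drive_rf(phi, sw, boi, k, mu_wi, mu_oi, p_i, p_ab):
    """API water-drive recovery factor correlation (result in percent).

    Assumption: the viscosity group is 1000 * k * mu_wi / mu_oi.
    """
    pore_term = (phi * (1.0 - sw) / boi) ** 0.0422
    mobility_term = (1000.0 * k * mu_wi / mu_oi) ** 0.077
    saturation_term = sw ** -0.1903
    pressure_term = (p_i / p_ab) ** -0.2159
    return 54.898 * pore_term * mobility_term * saturation_term * pressure_term

# Placeholder inputs, for illustration only
rf = api_water_drive_rf(phi=0.20, sw=0.30, boi=1.2, k=0.1,
                        mu_wi=0.5, mu_oi=1.0, p_i=3000.0, p_ab=1000.0)
print(f"API water-drive RF: {rf:.1f}%")
```

The negative exponent on the pressure ratio means that, other inputs being equal, a lower abandonment pressure gives a lower calculated recovery factor.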

The API RF equation, and how it can be converted into a probabilistic determination of recovery factor, is described in the literature.7

Based on general reservoir and production engineering principles, the parameters used to build the ANN Oil RF model were: sandstone reservoir, water drive, original oil in place (STOOIP), porosity (Ф), permeability (k), oil viscosity (µ), oil gravity (API) and net pay (h). Although these are not the only parameters that impact recovery, they were considered to be the major components impacting long-term production and recovery of oil from a reservoir.

For the sandstone reservoir ANN Oil RF model, 264 sandstone/clastic water drive/combination drive oil reservoirs were available with complete data sets, including recovery factors. A random subset of 46 reservoirs was removed from the full data set, to be used as a blind test. Of the remaining 218 reservoirs, 20% were used as validating data for training. Figure 3 presents the data for the sandstone reservoir data set.
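The three-way split described above can be sketched as follows. This is an illustration rather than the procedure used by the ANN software: the function name and random seed are hypothetical, reservoirs are represented only by their index, and rounding 20% of 218 to a whole number of reservoirs is an assumption.

```python
import random

def split_reservoirs(n_total=264, n_blind=46, validation_fraction=0.20, seed=42):
    """Split reservoir indices into training, validating and blind-test sets."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    # Blind-test reservoirs are set aside first and never seen during training
    blind = indices[:n_blind]
    remaining = indices[n_blind:]
    # A fraction of the remaining reservoirs is used for validation during training
    n_validation = round(validation_fraction * len(remaining))
    validation = remaining[:n_validation]
    training = remaining[n_validation:]
    return training, validation, blind

training, validation, blind = split_reservoirs()
print(len(training), len(validation), len(blind))
```

With these defaults the split is 174 training, 44 validating and 46 blind-test reservoirs.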

After a visual inspection, it becomes apparent that the data have a large scatter and a wide range. This is not unexpected when considering oil reservoir recovery factors. The scatter is one of the problems with trying to generate an oil recovery factor correlation that is universally valid. The scatter is predominantly caused by using average reservoir parameters, as well as the accuracy of the data.

To make the data more conducive to neural network use, the actual inputs to the ANN were: Log(STOOIP), Log(kh), Log(viscosity), Log(phi-h) and Oil API. Taking the logarithm compresses the wide range of the input data, places the maximum and minimum in a reasonable range, and linearizes the data before use in the ANN.
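A minimal Python sketch of this pre-processing step follows. The article does not state the logarithm base or the units behind each input, so base-10 logarithms, STOOIP in MMbbl and porosity in percent are assumptions made here for illustration; the function and argument names are also hypothetical.

```python
import math

def ann_inputs(stooip_mmbbl, k_md, h_ft, phi_pct, visc_cp, oil_api):
    """Build the five ANN inputs from raw reservoir parameters.

    Assumptions: base-10 logs, STOOIP in MMbbl, porosity in percent.
    """
    return [
        math.log10(stooip_mmbbl),    # Log(STOOIP)
        math.log10(k_md * h_ft),     # Log(kh)
        math.log10(visc_cp),         # Log(viscosity)
        math.log10(phi_pct * h_ft),  # Log(phi-h)
        oil_api,                     # Oil API, used without transformation
    ]

# Hypothetical reservoir: 10 MMbbl, 10 md, 50 ft, 20% porosity, 0.13 cp, 35 API
print(ann_inputs(10.0, 10.0, 50.0, 20.0, 0.13, 35.0))
```

Whatever convention is chosen, the same base and units must be used for training and for prediction.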

Figure 3 shows that each input data set exhibits an approximate linear trend in terms of recovery factor. These apparent trends help the neural network system find an optimal solution. The original data set included 38 carbonate/dolomite oil reservoirs. These data were removed from the gross data set of 302 reservoirs, because the recovery factors calculated for the carbonate reservoirs were always the outliers in the resulting ANN model. This was the result of carbonate reservoirs containing natural fractures, which will impact the ultimate recovery from these reservoirs.

Fig. 3. Sandstone reservoir data set with 264 reservoirs (218 training, 46 blind test).

The significant unknown when building ANNs is the number of hidden layers and the number of neurons to include in the model. The generally accepted procedure is to make the ANN as simple as possible. For the majority of problems, one hidden layer is sufficient. With well-behaved data, it may be possible to minimize the neurons so that they are less than or equal to the number of inputs to the model (five in the oil recovery factor case).

Unfortunately, when a small number of neurons (five or fewer) was used, the resulting ANN could not handle the recovery factor problem. Specifically, with a lower neuron count, the neural network was unable to model the high and low recovery factor trends, Fig. 4.

Fig. 4. Example ANN with four neurons.

Since the neural network had trouble finding an optimum solution covering the entire range of recovery factors at low neuron counts, multiple ANN models were built and evaluated. The objective was to find the number of neurons that would yield a network whose predictions were optimized over the entire range of expected recovery factors.

Figure 5 presents the slope and R² correlation factor (actual RF vs. ANN RF) for ANNs built using different numbers of neurons. It appears that maximum accuracy of the ANN-calculated recovery factors occurs when the neural network contains ten neurons.

Fig. 5. ANN model construction – evaluation of number of neurons.
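The two diagnostics plotted in Fig. 5 can be computed for any candidate network from its actual and predicted recovery factors. The Python sketch below assumes an ordinary least-squares line with actual RF on the x-axis and ANN RF on the y-axis, and takes R² as the squared Pearson correlation; the article does not spell out either choice, and the function name and sample values are hypothetical.

```python
def slope_and_r2(actual_rf, ann_rf):
    """Least-squares slope of ANN RF vs. actual RF, and squared correlation."""
    n = len(actual_rf)
    mean_actual = sum(actual_rf) / n
    mean_ann = sum(ann_rf) / n
    s_aa = sum((a - mean_actual) ** 2 for a in actual_rf)
    s_pp = sum((p - mean_ann) ** 2 for p in ann_rf)
    s_ap = sum((a - mean_actual) * (p - mean_ann)
               for a, p in zip(actual_rf, ann_rf))
    slope = s_ap / s_aa               # 1.0 for a perfect model
    r2 = s_ap ** 2 / (s_aa * s_pp)    # 1.0 when points fall on a straight line
    return slope, r2

# Made-up values, for illustration only
print(slope_and_r2([20.0, 30.0, 40.0, 50.0], [24.0, 31.0, 38.0, 47.0]))
```

A slope well below 1 with a reasonable R² is the signature of a network that compresses the high and low ends of the recovery factor range.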

One caveat that should be kept in mind when considering the number of neurons to include is the potential of over-fitting the data. Neural network training is an exercise in fitting data to a complex mathematical function. If the function is made to match the training data too precisely by using a large number of neurons, then when independent data (blind test data) are run through the network, there is a risk that the results may not be acceptable.

The reason for this is that rather than generalizing patterns in the training data, the network has become a look-up table, because the neural network is too complex. In this situation, the ANN has essentially regenerated the input-output results used in training and cannot accurately process new inputs that are not within the look-up table. Keeping the caveat in mind, the final ANN Oil RF Model for sandstone reservoirs was built and trained with ten neurons.


Figure 6 presents the actual recovery factors compared to the ANN-generated recovery factors for both the training and blind test data sets for sandstone reservoirs:

  • Sandstone reservoirs
    • 10 non-linear neurons in ANN
    • Dataset: 264 reservoirs
    • Training/blind: 218/46
    • A random 20% of the training data set was used for validation checks during the training stage
    • The blind data set was not part of the training
    • Lines representing +/– 25% variation
  • Correlation coefficient is 0.59 (the outliers cause the correlation coefficient to be low).
Fig. 6. ANN Oil RF Model training; ANN modeled oil recovery factor - sandstone.

After training the ANN Oil RF Model, 70% of the ANN-calculated recovery factors were within +/–25% of the actual value.
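The +/–25% statistic is straightforward to reproduce for any set of predictions. A minimal Python sketch, with a hypothetical function name and made-up values, is:

```python
def fraction_within_tolerance(actual_rf, ann_rf, tolerance=0.25):
    """Fraction of predictions within +/- tolerance of the actual value."""
    hits = sum(
        1 for actual, predicted in zip(actual_rf, ann_rf)
        if abs(predicted - actual) <= tolerance * abs(actual)
    )
    return hits / len(actual_rf)

# Made-up values: two predictions inside the band, two outside
print(fraction_within_tolerance([40.0, 40.0, 40.0, 40.0],
                                [30.0, 50.0, 29.0, 51.0]))
```

Because the band is relative to the actual value, it is narrower in absolute terms for low recovery factors than for high ones.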

The resulting equation that represents the ANN Oil RF Model for sandstone reservoirs is given in Fig. 7. The data ranges for the parameters used in the sandstone reservoir ANN Oil RF Model are as follows:

  • Oil in place: 10 MMbbl to 55,000 MMbbl
  • Permeability: 0.6 md to 7,000 md
  • Net pay: 10 ft to 1,800 ft
  • Porosity: 5% to 35%
  • Oil gravity: 15° API to 55° API
  • Oil viscosity: 0.1 cp to 88 cp.
Fig. 7. ANN Oil RF model equation – sandstone reservoirs.

Using input data outside the ranges presented above may yield unreasonable results. It should be kept in mind that even using input data within the ranges presented above may still yield results that are unreasonable, since there was not enough training data to cover all possible input combinations.
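A simple guard against out-of-range inputs can be placed in front of any implementation of the model. The Python sketch below uses the data ranges listed above; the dictionary keys and function name are hypothetical. Passing the check does not guarantee a reasonable result, for the reason just given.

```python
# Training data ranges for the sandstone ANN Oil RF Model, from the list above
VALID_RANGES = {
    "stooip_mmbbl": (10.0, 55000.0),
    "permeability_md": (0.6, 7000.0),
    "net_pay_ft": (10.0, 1800.0),
    "porosity_pct": (5.0, 35.0),
    "oil_gravity_api": (15.0, 55.0),
    "oil_viscosity_cp": (0.1, 88.0),
}

def out_of_range(params):
    """Return the names of inputs that fall outside the training ranges."""
    flagged = []
    for name, (low, high) in VALID_RANGES.items():
        if not low <= params[name] <= high:
            flagged.append(name)
    return flagged

reservoir = {
    "stooip_mmbbl": 10.0, "permeability_md": 10.0, "net_pay_ft": 50.0,
    "porosity_pct": 20.0, "oil_gravity_api": 35.0, "oil_viscosity_cp": 0.13,
}
print(out_of_range(reservoir))  # an empty list means every input is in range
```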

As an example of an ANN Oil RF model calculation, a sandstone water drive oil reservoir containing the following: STOOIP=10,000,000 bbl; permeability=10 md; net pay=50 ft; porosity=20%; Oil API=35°, and viscosity=0.13 cp is calculated to have an ANN recovery factor of 34.5%.

If the ANN Oil RF model is going to be used with missing data, then it may be valid to use an average value for the missing parameter. For the sandstone reservoir data set, the averages are as follows: Log(STOOIP) = 2.6524, Log(kh) = 4.3573, Log(viscosity) = 0.2167, Log(phi-h) = 3.2745 and Oil API = 33.5. If more than one of the parameters is missing, the ANN Oil RF Model should not be used.
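The missing-data rule can be sketched in Python as follows. The averages are the values quoted above; the dictionary keys, the use of None to mark a missing input and the function name are hypothetical.

```python
# Average ANN inputs for the sandstone reservoir data set, as quoted above
SANDSTONE_AVERAGES = {
    "log_stooip": 2.6524,
    "log_kh": 4.3573,
    "log_viscosity": 0.2167,
    "log_phi_h": 3.2745,
    "oil_api": 33.5,
}

def fill_missing(inputs):
    """Replace at most one missing ANN input (None or absent) with its average."""
    missing = [name for name in SANDSTONE_AVERAGES if inputs.get(name) is None]
    if len(missing) > 1:
        # More than one missing parameter: the model should not be used
        raise ValueError(f"too many missing inputs: {missing}")
    filled = dict(inputs)
    for name in missing:
        filled[name] = SANDSTONE_AVERAGES[name]
    return filled

# Hypothetical case: viscosity unknown, all other inputs available
print(fill_missing({"log_stooip": 1.0, "log_kh": 2.7, "log_viscosity": None,
                    "log_phi_h": 3.0, "oil_api": 35.0}))
```

A substituted average should be flagged in any report of the result, since it removes that parameter's influence on the calculated recovery factor.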


An artificial neural network was built to generate recovery factors for sandstone oil reservoirs. The resulting model predicted the actual recovery factors within +/–25% for 70% of the data.

Although the results appear to be reasonable, there is still large scatter in the results. This is common for all recovery factor correlations and is the result of the variability, and averaging, of the input data.

As is the case for all recovery factor calculations, the results generated by the ANN Oil RF Model should be used with caution and checked against other techniques.

The ANN Oil RF Model represents the first step in generating a generic oil recovery factor model using artificial neural networks. The next step is to increase the size of the training data set and to further refine the training data.


  6. Arps, J.J., et al., “Statistical analysis of crude oil recovery and recovery efficiency,” Bulletin D14, American Petroleum Institute, Washington D.C., October 1967.
  7. LeBlanc, D., “Petroleum engineering and economics essentials: Tools and techniques to evaluate unconventional (and conventional) wells and reservoirs,” Section 4, self-published book available online at 
About the Authors
Don LeBlanc
Eastex Petroleum Consultants
Don LeBlanc is president and principal engineer of Eastex Petroleum Consultants Inc. His experience includes reservoir engineering, production engineering, field development, reservoir management, formation evaluation, reservoir/production modeling, and reserve calculations. During his 38-year career, he has worked with supermajors, national oil companies, small independents and consulting firms worldwide. He is the author of the self-published book, PE Essentials, and the PE Essentials software. Mr. LeBlanc holds a BS degree (magna cum laude) in engineering-physics from Dalhousie University. He can be contacted at