# Neural network-derived model accurately predicts oil recovery in water-drive reservoirs

Artificial neural networks are well-suited for modeling complex non-linear relationships, including the modeling of oil recovery factors in water-drive reservoirs. Image: Apache Corp.

The universal approximation theorem states that any continuous function that maps one set of real numbers to another can be approximated to any desired degree of accuracy by a feed-forward ANN with a single hidden layer and a finite number of hidden units, each containing a non-linear transfer function.^{1}
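In symbols, the theorem can be written roughly as follows. The notation is introduced here for illustration and is not taken from the article: $f$ is the continuous target function on a closed, bounded set of inputs $K$, $N$ is the number of hidden units, $\sigma$ is the non-linear transfer function, and $w_i$, $b_i$ and $v_i$ are the adjustable weights and biases.

$$
F(x) = \sum_{i=1}^{N} v_i \, \sigma\!\left(w_i^{T} x + b_i\right), \qquad \left| F(x) - f(x) \right| < \varepsilon \quad \text{for all } x \in K
$$

For any tolerance $\varepsilon > 0$, some finite $N$ and some choice of parameters satisfy the inequality. The theorem does not say how large $N$ must be or how to find the parameters.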

The first publicly available, general-purpose ANN designed specifically for the oil industry was called Neuro3 and was released by the DOE in 2001.^{2} It was rudimentary, but is still relevant.

An ANN generates a predictive model as a multidimensional function containing elements with adjustable parameters. The most important elements of an ANN model are the neurons, which are contained in a “hidden layer,” **Fig. 1.** The neurons receive information in the form of numerical data input. This information is then combined with a set of parameters within the neural network to produce a result in the form of a numerical output.

Most common ANN models use a back-propagation, feed-forward neural network system. The calculation procedure is feed-forward; in other words, it runs from the input layer through the hidden layers to the output layer. The back-propagation occurs during the training phase, where the calculated outputs are compared with the desired values, and the errors are fed back to modify the weights used in the ANN for the next iteration.
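The feed-forward and back-propagation steps can be sketched in a few lines of Python. This is a minimal illustration, not the system used for the model in this article: the single input, the linear output unit, the learning rate, the toy target function and all function names are assumptions chosen to keep the example short.

```python
import math
import random

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Feed-forward pass: input -> tanh hidden layer -> linear output."""
    hidden = [math.tanh(w * x + b) for w, b in zip(w_hidden, b_hidden)]
    y = sum(v * h for v, h in zip(w_out, hidden)) + b_out
    return hidden, y

def train(samples, n_hidden=4, rate=0.05, epochs=2000, seed=0):
    """Adjust weights by back-propagating the output error, one sample at a time."""
    rng = random.Random(seed)
    w_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = 0.0
    for _ in range(epochs):
        for x, target in samples:
            hidden, y = forward(x, w_hidden, b_hidden, w_out, b_out)
            error = y - target  # calculated output minus desired value
            for i in range(n_hidden):
                # Error passed back through tanh: d tanh(u)/du = 1 - tanh(u)^2
                delta = error * w_out[i] * (1.0 - hidden[i] ** 2)
                w_out[i] -= rate * error * hidden[i]
                w_hidden[i] -= rate * delta * x
                b_hidden[i] -= rate * delta
            b_out -= rate * error
    return w_hidden, b_hidden, w_out, b_out

def mean_squared_error(samples, params):
    """Average squared difference between network output and target."""
    return sum((forward(x, *params)[1] - t) ** 2 for x, t in samples) / len(samples)

# Toy data (hypothetical): learn y = x^2 on the interval [-1, 1]
samples = [(i / 10.0, (i / 10.0) ** 2) for i in range(-10, 11)]
untrained = train(samples, epochs=0)   # initial random weights only
trained = train(samples)               # same starting weights, then trained
print(mean_squared_error(samples, untrained), mean_squared_error(samples, trained))
```

Calling `train` with `epochs=0` returns the starting weights, so the two printed errors show the effect of training alone.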

The neural network parameters are made up of two components: a “combination function,” which takes all the inputs and produces a net input value for each neuron by combining them using a weight factor and a bias; and a “transfer function,” which produces the neuron output. **Figure 2** presents a single-neuron feed-forward neural network process, where ∑ is the combination function and ∫ is the transfer function. The most common transfer function in predictive neural networks is the hyperbolic tangent (tanh).
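A single neuron of this kind can be written out directly. The sketch below assumes the combination function is a weighted sum plus a bias and the transfer function is tanh; the input values, weights and bias are hypothetical.

```python
import math

def neuron_output(inputs, weights, bias):
    """Single neuron: weighted sum plus bias (combination), then tanh (transfer)."""
    net_input = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(net_input)

# Hypothetical values for illustration only
inputs = [1.0, 2.0, -0.5]
weights = [0.5, -0.25, 1.0]
bias = 0.5
print(neuron_output(inputs, weights, bias))
```

Because tanh is bounded, the neuron output always lies between -1 and 1, whatever the size of the net input.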

Full-featured neural network programs are normally combined with genetic algorithms, statistics/linear regression, and fuzzy logic to automatically find optimal or near-optimal solutions for the problem.^{3,4}

The validity of any ANN model is dependent on how well the system has been trained. Normally, a large data set of well-behaved data is used for training, and a different set of data is used as a blind test after the model is built. ANN systems typically subdivide the training data set into a training subset and a validating subset. This is not the same as using a data set as a blind test. The validating subset is still part of the training data set and is used in the back-propagation algorithms, so it is not an independent test of the model.

There are two steps in the development of a neural network: training and prediction. Training establishes the network structure and the weight factors for the network and, therefore, always occurs before the prediction step. Once trained, the ANN can be used to predict output data from new input.

## OIL RECOVERY FACTOR

Artificial neural networks are particularly well-suited for modeling complex non-linear relationships, which cannot be easily patterned by traditional linear regression methods. SPE has published numerous papers on the use of neural networks in the oil industry. A search in www.onepetro.org, using the keywords “neural networks,” yields over 3,900 references.

The ANN oil recovery factor model was developed using an open-source ANN system. Free and commercial ANN programs are available that are based on the open-source OpenNN system, which consists of freely distributed neural network libraries.^{5}

**Building the ANN Oil RF model.** There are several empirical relationships that have been published for recovery factor predictions. The most common are the equations for recovery efficiency published by the API Subcommittee on Recovery Efficiency, developed from a statistical study of 70 water drive reservoirs.^{6,7}

The empirical API oil recovery factor equation for water drive (WD) oil reservoirs, for a given porosity ($\phi$), water saturation ($S_w$), formation volume factor ($B_{oi}$), permeability ($k$), water viscosity ($\mu_{wi}$), oil viscosity ($\mu_{oi}$), initial pressure ($P_i$) and abandonment pressure ($P_{ab}$), is as follows:

$$
\text{WD RF} = 54.898 \left(\frac{\phi\,(1 - S_w)}{B_{oi}}\right)^{0.0422} \left(\frac{1000\,k\,\mu_{wi}}{\mu_{oi}}\right)^{0.077} S_w^{-0.1903} \left(\frac{P_i}{P_{ab}}\right)^{-0.2159}
$$
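The API equation above translates directly into a short function. The function and parameter names are hypothetical, and the article does not state the units of each term, so the input values below (porosity and water saturation as fractions, for example) are assumptions for illustration only.

```python
def api_water_drive_rf(porosity, sw, boi, k, mu_wi, mu_oi, p_i, p_ab):
    """API water-drive recovery factor, coded term by term from the equation as printed.

    Units are not stated in the article; callers must supply values in the
    units the original API correlation expects.
    """
    return (54.898
            * (porosity * (1.0 - sw) / boi) ** 0.0422
            * (1000.0 * k * mu_wi / mu_oi) ** 0.077
            * sw ** -0.1903
            * (p_i / p_ab) ** -0.2159)

# Hypothetical reservoir values, chosen only to exercise the function
rf = api_water_drive_rf(porosity=0.2, sw=0.3, boi=1.2, k=0.1,
                        mu_wi=0.5, mu_oi=2.0, p_i=3000.0, p_ab=1000.0)
print(rf)
```

The small exponents mean that no single input dominates: doubling the pressure ratio, for example, changes the result only by a factor of 2^(-0.2159).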

The API RF equation, and how it can be converted into a probabilistic determination of recovery factor, is described in the literature.^{7}

Based on general reservoir and production engineering principles, the parameters used to build the ANN Oil RF model were: sandstone reservoir, water drive, original oil in place (STOOIP), porosity (Ф), permeability (k), oil viscosity (µ), oil gravity (API) and net pay (h). Although these are not the only parameters that impact recovery, they were considered to be the major components impacting long-term production and recovery of oil from a reservoir.

For the sandstone reservoir ANN Oil RF model, 264 sandstone/clastic water drive/combination drive oil reservoirs were available with complete data sets, including recovery factors. A random subset of 46 reservoirs was removed from the full data set, to be used as a blind test. Of the remaining 218 reservoirs, 20% were used as validation data during training. **Figure 3** presents the data for the sandstone reservoir data set.
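The split described here can be reproduced with a random shuffle of reservoir indices. This is a sketch only: the seed, the function name and the rounding of the 20% validation share to a whole number of reservoirs are assumptions, since the article does not give the exact validation count.

```python
import random

def split_reservoirs(n_total=264, n_blind=46, validation_fraction=0.20, seed=0):
    """Return (training, validation, blind) index lists with no overlap."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    blind = indices[:n_blind]            # held out entirely from training
    remaining = indices[n_blind:]        # 218 reservoirs in the article's case
    n_validation = round(len(remaining) * validation_fraction)
    validation = remaining[:n_validation]
    training = remaining[n_validation:]
    return training, validation, blind

training, validation, blind = split_reservoirs()
print(len(training), len(validation), len(blind))  # 174 44 46
```

Only the blind indices are an independent test; the validation indices still influence training, as noted earlier in the article.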

After a visual inspection, it becomes apparent that the data have a large scatter and a wide range. This is not unexpected when considering oil reservoir recovery factors. The scatter is one of the problems with trying to generate an oil recovery factor correlation that is universally valid. It is predominantly caused by the use of average reservoir parameters, as well as by the limited accuracy of the data.

To make the data more conducive to neural network use, the actual inputs to the ANN were: Log(STOOIP), Log(kh), Log(viscosity), Log(phi-h) and Oil API. Taking the logarithm compresses the wide range of each input, placing its maximum and minimum in a reasonable range, and linearizes the data before they are used in the ANN.
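The input transformation can be sketched as below. Several details are assumptions, because the article does not state them: base-10 logarithms, STOOIP in MMbbl, permeability in md, net pay in ft, porosity in percent and viscosity in cp. The function name is hypothetical.

```python
import math

def ann_inputs(stooip_mmbbl, k_md, h_ft, porosity_pct, viscosity_cp, oil_api):
    """Build the five ANN inputs in the order listed in the article."""
    return [
        math.log10(stooip_mmbbl),         # Log(STOOIP)
        math.log10(k_md * h_ft),          # Log(kh)
        math.log10(viscosity_cp),         # Log(viscosity)
        math.log10(porosity_pct * h_ft),  # Log(phi-h)
        oil_api,                          # Oil API, used untransformed
    ]

# Example reservoir quoted later in the article:
# 10 MMbbl, 10 md, 50 ft, 20% porosity, 0.13 cp, 35 deg API
print(ann_inputs(10.0, 10.0, 50.0, 20.0, 0.13, 35.0))
```

Under these unit assumptions, the example reservoir gives inputs of roughly 1.0, 2.70, -0.89, 3.0 and 35.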

**Figure 3** shows that each input data set exhibits an approximate linear trend in terms of recovery factor. These apparent trends help the neural network system find an optimal solution. The original data set included 38 carbonate/dolomite oil reservoirs. These data were removed from the gross data set of 302 reservoirs, because the recovery factors calculated for the carbonate reservoirs were always the outliers in the resulting ANN model. This was the result of carbonate reservoirs containing natural fractures, which will impact the ultimate recovery from these reservoirs.

The significant unknown when building ANNs is the number of hidden layers and the number of neurons to include in the model. The generally accepted procedure is to make the ANN as simple as possible. For the majority of problems, one hidden layer is sufficient. With well-behaved data, it may be possible to keep the number of neurons less than or equal to the number of inputs to the model (five in the oil recovery factor case).

Unfortunately, it was found that when using a small number of neurons (five or fewer), the resulting ANN could not handle the recovery factor problem. Specifically, with a lower neuron count, the neural network was unable to model the high and low recovery factor trends, **Fig. 4.**

Since the neural network had trouble covering the entire range of recovery factors at low neuron counts, multiple ANN models were built and evaluated. The objective was to find the number of neurons that would yield the best match over the entire range of expected recovery factors.

**Figure 5** presents the slope and R^{2} correlation factor (actual RF vs ANN RF) for ANNs built using different numbers of neurons. It appears that maximum accuracy of the ANN-calculated recovery factors occurs when the neural network contains ten neurons.
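One way to compute the two measures plotted in Figure 5 is an ordinary least-squares fit of ANN RF against actual RF. Whether the original study fitted the line with an intercept or forced it through the origin is not stated, so the version below (with an intercept) is an assumption, as are the function name and sample values.

```python
def slope_and_r2(actual, predicted):
    """Least-squares slope of predicted vs actual, and squared correlation R^2."""
    n = len(actual)
    mean_x = sum(actual) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in actual)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, predicted))
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, r2

# Hypothetical recovery factors (%), for illustration only
actual_rf = [10.0, 20.0, 30.0, 40.0]
ann_rf = [15.0, 20.0, 25.0, 30.0]   # compressed toward the mean
print(slope_and_r2(actual_rf, ann_rf))
```

A slope near 1 together with a high R^{2} indicates that the network reproduces both the low and the high ends of the recovery factor range. A slope well below 1, as in the sample values, is the compressed behavior described for low neuron counts.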

One caveat that should be kept in mind when considering the number of neurons to include is the potential of over-fitting the data. Neural network training is an exercise in fitting data to a complex mathematical function. If the function is made to match the training data too precisely by using a large number of neurons, then when independent data (blind test data) are run through the network, there is a risk that the results may not be acceptable.

The reason for this is that rather than generalizing patterns in the training data, the network has become a look-up table, because the neural network is too complex. In this situation, the ANN has essentially regenerated the input-output results used in training and cannot accurately process new inputs that are not within the look-up table. Keeping the caveat in mind, the final ANN Oil RF Model for sandstone reservoirs was built and trained with ten neurons.

## SANDSTONE RESERVOIRS

**Figure 6** presents the actual recovery factors compared to the ANN-generated recovery factors for both the training and blind test data sets for sandstone reservoirs:

- Sandstone reservoirs
- 10 non-linear neurons in ANN
- Dataset: 264 reservoirs
- Training/blind: 218/46
- A random 20% of the training data set was used for validation checks during the training stage
- The blind data set was not part of the training
- Lines representing +/– 25% variation
- Correlation coefficient is 0.59 (the outliers cause the correlation coefficient to be low)

After training the ANN Oil RF Model, 70% of the ANN-calculated recovery factors were within +/–25% of the actual value.
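The "within +/–25% of the actual value" measure can be computed as shown below. The function name and the sample values are hypothetical; they are not data from the study.

```python
def fraction_within(actual, predicted, tolerance=0.25):
    """Fraction of predictions within +/- tolerance (relative) of the actual value."""
    hits = sum(1 for a, p in zip(actual, predicted)
               if abs(p - a) <= tolerance * a)
    return hits / len(actual)

# Hypothetical recovery factors (%), for illustration only
actual_rf = [20.0, 30.0, 40.0, 50.0]
ann_rf = [24.0, 40.0, 31.0, 49.0]
print(fraction_within(actual_rf, ann_rf))  # 0.75
```

Because the band is relative to the actual value, it is narrower in absolute terms for low-recovery reservoirs than for high-recovery ones.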

The resulting equation that represents the ANN Oil RF Model for sandstone reservoirs is given in **Fig. 7.** The data ranges for the parameters used in the sandstone reservoir ANN Oil RF Model are as follows:

- Oil in place: 10 MMbbl to 55,000 MMbbl
- Permeability: 0.6 md to 7,000 md
- Net pay: 10 ft to 1,800 ft
- Porosity: 5% to 35%
- Oil gravity: 15° API to 55° API
- Oil viscosity: 0.1 cp to 88 cp.

Using input data outside the ranges presented above may yield unreasonable results. It should be kept in mind that even using input data within the ranges presented above may still yield results that are unreasonable, since there was not enough training data to cover all possible input combinations.
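A simple guard against extrapolation is to check each input against the training ranges listed above before using the model. The dictionary keys and function name below are hypothetical. Passing the check does not guarantee a reasonable result, for the reason given in the preceding paragraph.

```python
# Training data ranges as listed in the article
TRAINING_RANGES = {
    "oil_in_place_mmbbl": (10.0, 55000.0),
    "permeability_md": (0.6, 7000.0),
    "net_pay_ft": (10.0, 1800.0),
    "porosity_pct": (5.0, 35.0),
    "oil_gravity_api": (15.0, 55.0),
    "oil_viscosity_cp": (0.1, 88.0),
}

def out_of_range(reservoir):
    """Return the names of inputs that fall outside the training ranges."""
    return [name for name, (low, high) in TRAINING_RANGES.items()
            if not low <= reservoir[name] <= high]

example = {
    "oil_in_place_mmbbl": 10.0,
    "permeability_md": 10.0,
    "net_pay_ft": 50.0,
    "porosity_pct": 20.0,
    "oil_gravity_api": 35.0,
    "oil_viscosity_cp": 0.13,
}
print(out_of_range(example))  # []
```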

As an example of an ANN Oil RF Model calculation, a sandstone water drive oil reservoir with STOOIP = 10,000,000 bbl, permeability = 10 md, net pay = 50 ft, porosity = 20%, oil gravity = 35° API and viscosity = 0.13 cp is calculated to have an ANN recovery factor of 34.5%.

If the ANN Oil RF model is going to be used with missing data, then it may be valid to use an average value for the missing parameter. For the sandstone reservoir data set, the averages are as follows: Log(STOOIP) = 2.6524, Log(kh) = 4.3573, Log(viscosity) = 0.2167, Log(phi-h) = 3.2745 and Oil API = 33.5. If more than one of the parameters is missing, the ANN Oil RF Model should not be used.
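The missing-data rule can be sketched as follows, using the averages quoted above. The dictionary keys and function name are hypothetical, and a missing input is represented here by `None` or an absent key.

```python
# Sandstone data set averages as quoted in the article
SANDSTONE_AVERAGES = {
    "log_stooip": 2.6524,
    "log_kh": 4.3573,
    "log_viscosity": 0.2167,
    "log_phi_h": 3.2745,
    "oil_api": 33.5,
}

def fill_missing(inputs):
    """Replace at most one missing input with the data set average."""
    missing = [name for name in SANDSTONE_AVERAGES if inputs.get(name) is None]
    if len(missing) > 1:
        raise ValueError("more than one input is missing: " + ", ".join(missing))
    filled = dict(inputs)
    for name in missing:
        filled[name] = SANDSTONE_AVERAGES[name]
    return filled

partial = {"log_stooip": 1.0, "log_kh": 2.7, "log_viscosity": None,
           "log_phi_h": 3.0, "oil_api": 35.0}
print(fill_missing(partial)["log_viscosity"])  # 0.2167
```

Raising an error when two or more inputs are missing mirrors the article's advice that the model should not be used in that case.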

## CONCLUSIONS

An artificial neural network was built to generate recovery factors for sandstone oil reservoirs. The resulting model predicted the actual recovery factors within +/–25% for 70% of the data.

Although the results appear to be reasonable, there is still large scatter in the results. This is common for all recovery factor correlations and is the result of the variability, and averaging, of the input data.

As is the case for all recovery factor calculations, the results generated by the ANN Oil RF Model should be used with caution and checked against other techniques.

The ANN Oil RF Model represents the first step in generating a generic oil recovery factor model using artificial neural networks. The next step is to increase the size of the training data set and to further refine the training data.

**REFERENCES**

1. http://en.wikipedia.org/wiki/Universal_approximation_theorem
2. https://www.netl.doe.gov/research/oil-and-gas/Software/ep-tools#RA
3. https://en.wikipedia.org/wiki/Genetic_algorithm
4. https://en.wikipedia.org/wiki/Fuzzy_logic
5. https://www.opennn.net
6. Arps, J.J., et al., “Statistical analysis of crude oil recovery and recovery efficiency,” Bulletin D14, American Petroleum Institute, Washington D.C., October 1967.
7. LeBlanc, D., “Petroleum engineering and economics essentials: Tools and techniques to evaluate unconventional (and conventional) wells and reservoirs,” Section 4, self-published book available online at http://www.eastexpetroleum.com/PE_Essentials/Book.htm
