SPE 168628

Application of Multivariate Statistical Modeling and Geographic Information Systems Pattern-Recognition Analysis to Production Results in the Eagle Ford Formation of South Texas

Randy F. LaFollette, Dr. Ghazal Izadi, Dr. Ming Zhong, SPE, Baker Hughes

Copyright 2014, Society of Petroleum Engineers This paper was prepared for presentation at the SPE Hydraulic Fracturing Technology Conference held in The Woodlands, Texas, USA, 4–6 February 2014. This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.

Abstract

As of this date, approximately 5,000 horizontal Eagle Ford wells have been completed in South Texas. Still, geologists and engineers question whether their companies are using the most appropriate operating practices. Side-by-side case studies may or may not demonstrate value, given the challenges of small sample size and hidden influences on outcome. Multivariate statistical analysis of larger data sets offers sound interpretation across larger geographic areas, with the caveat that correlations need to be scaled to local conditions. The purpose of this paper is to apply multivariate statistical modeling, in conjunction with Geographic Information Systems (GIS) pattern-recognition work, to the Eagle Ford. The investigation began by acquiring Eagle Ford data from both proprietary and public sources. The different data sets were loaded into a common database and put through quality-control sanity checks. Production proxies, such as maximum oil rate in the first 12 producing months and normalized 12-month cumulative production, were selected and merged with the other data. The final data sets were then analyzed with both an open-source multivariate statistical analysis and visualization code and a commercial GIS application. As in other studies of unconventional reservoirs, integrating the two analysis and interpretation methods highlighted the importance of using well location as a proxy for reservoir quality when working with data sets that lack such measurements. Multivariate statistical analysis made it possible to model the impact of particular well architecture, completion, and stimulation parameters on production outcome by integrating out the impact of the other variables in the system. This work continues prior work designed to address well optimization in unconventional reservoirs. It is significant in that it takes full advantage of GIS map-based methods and multivariate statistical methods to capitalize on the volume of data available in the public domain.

Introduction

With approximately 5,000 horizontal Eagle Ford wells now completed, large sets of public and proprietary data are available for production and completion/stimulation optimization studies. However, gathering the data, controlling its quality, learning what questions to ask of the data sets, and learning how to ask those questions in a statistically robust fashion all pose challenges for practitioners in the industry. Data sets involving large well counts contain many variables that are not ideally distributed and that include missing or bad values. This work uses specific data mining methods, namely GIS mapping and Boosted Regression Tree Modeling, in an attempt to overcome some of these challenges and to better understand the impact of key well, completion, and stimulation parameters on productivity and production efficiency. The Eagle Ford Formation of South Texas, USA, was divided into three major producing areas, each of which was studied with mapping techniques and modeled individually using boosted trees. The results are discussed both by area and in aggregate.


Formation, Study Area, and Goals

The Eagle Ford Formation is a Late Cretaceous sedimentary rock formation that underlies much of South Texas. The rocks are mainly organic-matter-rich, fossiliferous marine shales of the Lower Eagle Ford interval. The play covers approximately 11 million acres, and its main body stretches from the Texas/Mexico border to the eastern borders of Gonzales and Lavaca Counties [1] (Figure 1, U.S. Energy Information Administration). The northern part of the play (highlighted in green) is in the oil maturation window; in addition to crude oil, it produces lesser amounts of natural gas and natural gas liquids (NGLs). To the south and southeast of the oil window, the wet-gas region (highlighted in yellow) produces gas along with high volumes of NGLs. The southernmost region (highlighted in red) contains mostly dry natural gas. Because oil and NGLs command a higher price than natural gas, producers have mostly focused on extracting the formation’s oil and NGL resources [2].

Figure 1: Location map showing the general study area of Eagle Ford production, South Texas, United States. Source: U.S. Energy Information Administration.

The goal of this study is to apply GIS and multivariate statistical data mining methods to Eagle Ford data sets in order to assess the impacts of well location, well architecture, completion, and stimulation on production results in three geographic areas (Figure 2, Areas 1, 2 and 3).


Figure 2: Map of the three study areas, South Texas.

Data Sources, Quality Control and Methodology

The data sets forming the basis of this work were taken from both public and proprietary sources. Basic well header data, location, key well dates, producing formation, actual directional surveys, well test and treatment data, and monthly production stream values were obtained through a subscription to a commercial database. Well Treatment Reports internal to the authors’ company were also used to sanity-check the public stimulation data, particularly fluid volumes and proppant quantities, for wells treated by the authors’ company. After the initial data gathering, additional quality control checks were performed as appropriate, and suspect data values were flagged to limit their influence on study results.

Well data were subdivided into reservoir quality, well architecture, completion, stimulation, and production classes. Reservoir quality was proxied by X-Y surface location, because the data sources and resources for petrophysical analysis are not available at the scale required when working with thousands of wells and with public data. Different geological/geographic areas may exhibit different production rate drivers; grouping wells into subsets and modeling by area can help assess the relative importance of the possible productivity drivers.

Well architecture data included completed lateral length (CLAT), well azimuth, and well dip angle. CLAT was calculated as the measured depth (MD) of the bottom perforation or sleeve minus the MD of the top perforation or sleeve. Average azimuth was calculated from the actual directional survey over the completed lateral section of the well. Special care was taken when survey points crossed the 0/360 degree mark between the Northwest and Northeast quadrants: these wells were flagged, their azimuths were projected onto the Southeast/Southwest quadrants, the average azimuth was calculated, and the result was projected back to the original direction. Well dip angle was averaged from the actual directional survey over the completed section of the lateral. Wells with dip angles less than 90 degrees are toe down, wells with dip angles greater than 90 degrees are toe up, and 90 degree wells are flat.

Stage counts from the public data indicate that over 2,000 wells (~60%) are single-stage completions. This is questionable, given that the study wells are all horizontal. Stage counts from the in-house database were used to overwrite public stage data where the in-house data were shown to be more reliable. To address concerns about stage-count quality, the authors evaluated the impact of stage count on production using a more reliable subset of the data: preliminary boosting models were run using only wells reliably known to be multi-stage. In these models, stage count was outranked in relative influence by geographic location, gas-oil ratio, and well stimulation parameters. Given that stage count held lower importance, and that sample size is key to the quality of statistical analyses, the final models were run using all wells, including those reported as single-stage completions.
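The azimuth-wrapping step is easy to get wrong, so a minimal Python sketch of the well-architecture calculations follows. It assumes the survey azimuths and inclinations have already been restricted to the completed lateral, and the function names are hypothetical. It follows the 180-degree projection described above; a circular (vector) mean would be an alternative, not the method stated in the text.

```python
# Hypothetical helpers for the well-architecture variables; names and input
# formats are illustrative assumptions, not the authors' code.

def completed_lateral_length(top_perf_md_ft, bottom_perf_md_ft):
    """CLAT: MD of the bottom perforation/sleeve minus MD of the top one."""
    return bottom_perf_md_ft - top_perf_md_ft


def average_azimuth(azimuths_deg):
    """Average lateral azimuth, handling surveys that straddle 0/360 degrees.

    If stations fall in both the NW (270-360) and NE (0-90) quadrants, every
    azimuth is rotated by 180 degrees into the SE/SW quadrants, averaged there,
    and the mean is rotated back to the original direction.
    """
    crosses_north = (any(a >= 270.0 for a in azimuths_deg)
                     and any(a < 90.0 for a in azimuths_deg))
    if not crosses_north:
        return sum(azimuths_deg) / len(azimuths_deg)
    rotated = [(a + 180.0) % 360.0 for a in azimuths_deg]
    return (sum(rotated) / len(rotated) - 180.0) % 360.0


def classify_dip(inclinations_deg):
    """Average dip angle over the lateral plus a toe-down/toe-up/flat label."""
    dip = sum(inclinations_deg) / len(inclinations_deg)
    if dip < 90.0:
        return dip, "toe down"
    if dip > 90.0:
        return dip, "toe up"
    return dip, "flat"


print(completed_lateral_length(10_500.0, 16_000.0))  # 5500.0
print(average_azimuth([350.0, 10.0]))                # 0.0 (a naive mean gives 180.0)
print(classify_dip([89.0, 89.5, 90.0]))              # (89.5, 'toe down')
```

The usage lines show why the projection matters: averaging 350 and 10 degrees directly points the well due south, while the rotated average correctly returns due north.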


The well stimulation treatment data analyzed in this study focused on generic fracturing fluid type and volume, and on proppant type and mass. Stage-by-stage data were aggregated to the well level for analysis. As part of the data validation process, public treatment data were cross-checked for internal consistency, even within the same data model, and inconsistencies were flagged. This cross-check allowed more of the inconsistent records to be retained when obvious errors, such as incorrectly reported units, could be identified and corrected.

Production Trend Analysis

Different hydrocarbon types in the Eagle Ford may be characterized by their cumulative gas-oil ratio (GOR). Fluid types evolve basinward from black oil to volatile oil, condensate, and finally dry gas, and vary with increasing formation depth, pore pressure, API gravity, and thermal maturity [3]. Eagle Ford production is found at depths between 4,000 and 14,000 ft TVD. Porosity ranges from about 6% to 11%, and the pressure gradient ranges from 0.5 to more than 0.8 psi/ft [3, 6, 7].
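The stage-to-well aggregation and the internal consistency check can be sketched in Python as follows. The record layout, the 5% tolerance, and the barrel-to-gallon test for a units error are hypothetical choices made for illustration; the authors' actual validation rules are not given in the text.

```python
# Hypothetical sketch of stage-to-well aggregation with a consistency check.
# Record layout, 5% tolerance and the bbl-vs-gal test are illustrative.
from collections import defaultdict

GAL_PER_BBL = 42.0


def aggregate_stages(stage_rows):
    """Sum (well_id, fluid_gal, proppant_lb) stage rows to well-level totals."""
    totals = defaultdict(lambda: {"fluid_gal": 0.0, "proppant_lb": 0.0, "stages": 0})
    for well_id, fluid_gal, proppant_lb in stage_rows:
        well = totals[well_id]
        well["fluid_gal"] += fluid_gal
        well["proppant_lb"] += proppant_lb
        well["stages"] += 1
    return dict(totals)


def check_fluid_total(stage_sum_gal, reported_total, tol=0.05):
    """Compare a reported well total with the stage sum.

    Returns ("consistent", value), ("corrected_units", value_in_gal) when the
    reported total matches only after converting barrels to gallons, or
    ("flagged", None) otherwise.
    """
    if abs(reported_total - stage_sum_gal) <= tol * stage_sum_gal:
        return "consistent", reported_total
    converted = reported_total * GAL_PER_BBL
    if abs(converted - stage_sum_gal) <= tol * stage_sum_gal:
        return "corrected_units", converted
    return "flagged", None


wells = aggregate_stages([("A", 300_000.0, 400_000.0),
                          ("A", 310_000.0, 410_000.0),
                          ("B", 250_000.0, 300_000.0)])
print(wells["A"])                              # {'fluid_gal': 610000.0, 'proppant_lb': 810000.0, 'stages': 2}
print(check_fluid_total(610_000.0, 14_524.0))  # ('corrected_units', 610008.0)
```

Correcting a recognizable units error, rather than discarding the record, is what lets more of the inconsistent data be retained.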

Figure 3: Eagle Ford shale fluid types characterized by GOR.

An average GOR is calculated from Equation 1:

$$\mathrm{GOR} = \frac{\sum_{i=1}^{n} \mathrm{Cum\ Gas}_i}{\sum_{i=1}^{n} \mathrm{Cum\ Oil}_i} \qquad \text{(Equation 1)}$$
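Equation 1 is a ratio of sums rather than an average of per-period ratios, so each period is effectively weighted by its oil volume and a period with very little oil cannot inflate the result. The Python sketch below computes it and applies the 15,000 and 5,000 scf/bbl thresholds used for well screening; gas in scf, oil in bbl, and the function names are assumptions for illustration.

```python
# Equation 1 and the GOR screening thresholds from the text. Gas is assumed to
# be in scf and oil in bbl; function names are illustrative only.

GOR_EXCLUDE_ABOVE = 15_000.0  # scf/bbl: wells above this were excluded
GOR_LOW_BREAK = 5_000.0       # scf/bbl: below this, production was generally better


def average_gor(gas_scf, oil_bbl):
    """Ratio of summed gas to summed oil over the same n periods (Equation 1)."""
    total_oil = sum(oil_bbl)
    if total_oil <= 0:
        raise ValueError("GOR is undefined without oil production")
    return sum(gas_scf) / total_oil


def screen_well(gas_scf, oil_bbl):
    """Return (GOR, label) where label is 'excluded', 'low GOR' or 'high GOR'."""
    gor = average_gor(gas_scf, oil_bbl)
    if gor > GOR_EXCLUDE_ABOVE:
        return gor, "excluded"
    return gor, ("low GOR" if gor < GOR_LOW_BREAK else "high GOR")


print(screen_well([3_000_000, 2_000_000], [1_000, 1_500]))  # (2000.0, 'low GOR')
print(screen_well([20_000_000], [1_000]))                   # (20000.0, 'excluded')
```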

Special care was taken when calculating GOR and when selecting wells for inclusion in or exclusion from the analysis. For the purposes of this paper, wells with GOR values greater than 15,000 scf/bbl were excluded, because the study focused primarily on liquids production. In each study area, production rates were better when GOR values were less than 5,000 scf/bbl. The top part of Figure 4 shows the GOR for all Eagle Ford wells in Areas 1, 2 and 3; wells with GOR less than 5,000 scf/bbl are bubbled in red and wells with GOR greater than 5,000 scf/bbl are bubbled in green. The lower part of Figure 4 shows that top-10% producers occur in each of the studied producing areas (1, 2 and 3). Production rate was plotted versus GOR for Areas 1, 2 and 3 (Figure 5). These two-dimensional cross plots show a different trend in each area. In Area 1, most wells have GOR values less than 5,000 scf/bbl, and production rate clearly decreases when GOR exceeds 5,000 scf/bbl. In Area 2, the majority of wells again have GOR values less than 5,000 scf/bbl; production rate is generally higher at lower GOR, but the trend is flatter than in Areas 1 and 3. In Area 3, production falls more steeply once GOR exceeds 5,000 scf/bbl than in Areas 1 and 2; note also that a larger number of Area 3 wells have GOR values greater than 5,000 scf/bbl.

Two production metrics were used in the project: barrels of oil in the best producing month within the first 12 producing months (BO), and barrels of oil per foot of completed lateral length (CLAT) (BO/ft). The first proxy indicates which of the studied factors can drive overall well productivity; the second measures the efficiency of the completed well length. Because peak monthly well production varies significantly from Gonzales County to Webb and Dimmit counties, productivity was analyzed separately in three areas, designated Areas 1, 2 and 3. Table 1 summarizes selected variable ranges in each area. Because of outliers, the 5th and 95th percentiles are reported instead of minimum and maximum values. Figure 6 shows histograms of selected variables, with density on the vertical axis so that the total area under each fitted curve is 1. Figure 6 shows that GOR and proppant amount are right-skewed and that well azimuth and X-location have bimodal distributions. The scatterplot matrix in Figure 7 shows no clear pairwise linear patterns between production and the predictors. Together, these observations suggest that direct multiple regression analysis would be inappropriate; either transforming the variables or using an alternative method is a more robust choice for modeling this data set.
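The two production proxies and the summary statistics behind Table 1 and Figure 6 can be sketched in Python as follows. This assumes monthly oil volumes are listed in calendar order and that months with zero oil are skipped when counting the first 12 producing months; the percentile uses linear interpolation, one of several conventions, so values may differ slightly from those produced by the software used in the study. All names are hypothetical.

```python
# Hypothetical sketch: BO, BO/ft, percentile and skewness summaries.
# Assumption: monthly oil volumes (bbl) are in calendar order, and months with
# zero oil are skipped when counting the "first 12 producing months".

def peak_oil_first_12(monthly_oil_bbl):
    """BO: best month within the first 12 producing months."""
    producing = [v for v in monthly_oil_bbl if v > 0]
    if not producing:
        raise ValueError("well has no producing months")
    return max(producing[:12])


def bo_per_ft(monthly_oil_bbl, clat_ft):
    """BO/ft: BO normalized by completed lateral length."""
    return peak_oil_first_12(monthly_oil_bbl) / clat_ft


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def skewness(values):
    """Moment coefficient of skewness; positive means right-skewed."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


oil = [0, 0, 5_000, 8_000, 7_000] + [1_000] * 12
print(peak_oil_first_12(oil), bo_per_ft(oil, 5_000.0))  # 8000 1.6
```

Reporting the 5th and 95th percentiles, as Table 1 does, keeps a handful of extreme or erroneous records from defining the apparent range of each variable.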

Figure 4: The map at top shows GOR for all Eagle Ford wells. Wells with GOR < 5,000 scf/bbl are bubbled in red and wells with GOR > 5,000 scf/bbl are bubbled in green. The map at bottom shows that top-10% producers occur in all three producing areas and almost invariably have GOR values < 5,000 scf/bbl. Area 2: Again, the majority of wells have GOR < 5,000 scf/bbl; production rate is higher at lower GOR, but the trend is flatter than in Areas 1 and 3. Area 3: When GOR is > 5,000 scf/bbl, production is reduced with a steeper slope than in Areas 1 and 2, and more wells have GOR values > 5,000 scf/bbl.

Boosted Tree Regression Models

Considering the complex, nonlinear nature of the data set, the authors adopted a tree boosting method, specifically gradient boosting, for this study. This machine-learning method generates a sequence of simple decision trees, each fitted to the prediction residuals left by the preceding trees, and produces a predictive model in the form of a weighted additive ensemble of simple trees. Compared with traditional multivariate modeling methods, tree boosting is more resistant to common data issues such as missing data and outliers. Because of the hierarchical structure of decision trees, it also handles nonlinearity and variable interactions well. Boosted regression tree models were built to predict three production and production-efficiency target variables: BO (barrels of oil, maximum oil rate), BO/ft (barrels of oil per foot of CLAT), and BO/lbm (barrels of oil per lbm of proppant pumped). The predictors include GOR, geographic surface location, well azimuth, well drift angle, number of fracture stages, CLAT, fracturing fluid volume, proppant amount, and proppant concentration.

Results and Discussion

Maximum Oil Rate (MMO) Models

Area 1 covers Eagle Ford production in Atascosa, DeWitt, Gonzales, Karnes, Live Oak and Wilson counties. Figure 8 shows log10 peak monthly production and the best- and worst-producing wells in these counties. The best wells in Figure 8 confirm the influence of well location on productivity: most of the best producers lie in a narrow northeast-southwest trending band in Karnes County and along the border of Gonzales and DeWitt counties. Figure 9 displays the relative influence of each study variable in the area. When multiple variables influence the target variable simultaneously, the goal is to learn which ones are the key drivers. The relative influence is essentially a weighted average of the frequency with which a variable is used to split the trees; a higher value on the influence plot suggests a stronger impact on the target variable. The influence value is proportional to the length of the blue horizontal bar, and the values are scaled to sum to 100. For producing Eagle Ford wells in Area 1, GOR stands out as the most influential predictor, followed by proppant amount, X-Y location, and CLAT. The remaining variables are somewhat less influential.
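To make the boosting mechanics and the relative-influence measure concrete, a deliberately small pure-Python sketch follows. It is not the open-source code the authors used: it assumes squared-error loss, single-split trees (stumps), and an arbitrary learning rate and tree count, whereas practical implementations typically use deeper trees and tune the number of trees. Relative influence is accumulated as the squared-error reduction credited to each variable's splits and scaled to sum to 100.

```python
# Illustrative gradient boosting (squared-error loss) with depth-1 trees.
# Not the authors' implementation; n_trees and learning_rate are arbitrary.

def fit_stump(X, r):
    """Best single split of residuals r on any feature.

    Returns (feature, threshold, left_mean, right_mean, sse_reduction),
    or None if no feature has two distinct values.
    """
    n, total = len(r), sum(r)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo_x, hi_x = X[order[k - 1]][j], X[order[k]][j]
            if lo_x == hi_x:
                continue  # no threshold separates identical values
            right_sum = total - left_sum
            # Squared-error reduction from splitting the node into k and n-k rows
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[4]:
                best = (j, (lo_x + hi_x) / 2.0, left_sum / k, right_sum / (n - k), gain)
    return best


def boost(X, y, n_trees=50, learning_rate=0.1):
    """Fit the ensemble; returns (intercept, stumps, relative_influence)."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps, influence = [], [0.0] * len(X[0])
    for _ in range(n_trees):
        # For squared-error loss the negative gradient is the residual.
        resid = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, resid)
        if stump is None:
            break
        j, thr, left, right, gain = stump
        stumps.append((j, thr, learning_rate * left, learning_rate * right))
        influence[j] += gain
        pred = [p + learning_rate * (left if x[j] <= thr else right)
                for p, x in zip(pred, X)]
    total = sum(influence) or 1.0
    return f0, stumps, [100.0 * v / total for v in influence]


def predict(model, x):
    """Intercept plus the shrunken contribution of every stump."""
    f0, stumps, _ = model
    return f0 + sum(left_val if x[j] <= thr else right_val
                    for j, thr, left_val, right_val in stumps)


# Toy data: the target depends only on feature 0.
X = [[float(i), float((i * 7) % 10)] for i in range(20)]
y = [0.0 if i < 10 else 10.0 for i in range(20)]
model = boost(X, y)
print([round(v, 1) for v in model[2]])  # [100.0, 0.0]
```

In the toy data, feature 1 carries no information about the target, so it is never chosen for a split and its relative influence is zero, which is the kind of ranking Figure 9 summarizes for the real predictors.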
As another output of the boosting model, partial dependence plots show the marginal effect of the chosen variable(s) on the target variable and provide useful clues for interpretation. Figure 10 shows the partial dependence plots of the six highest-ranked predictors in the relative influence list. The upper-left plot in Figure 10 highlights GOR as a major driver of peak-month oil productivity in Area 1. The partial dependence plot of proppant quantity shows that increased proppant quantity is generally associated with increased productivity, at least up to the maximum 8+ million lb treatments in the data set. Figure 10 also shows that well location is a key driver, as is CLAT (bottom center), with increased peak oil rate associated with longer laterals over the range of approximately 3,000 to 6,000 ft. Area 2 covers Eagle Ford production in Atascosa, Dimmit, Frio, La Salle, McMullen and Zavala counties. Figure 11 shows peak monthly production and best/worst producing wells in Area 2. Figure 11 indicates that the best wells are generally located in the northern half of La Salle and McMullen counties and are typically located geographically apart from the poorest wells, an indication of the importance of reservoir quality to production outcomes. The boosting model results indicate that the most important variables in Area 2 are similar to those in Area 1, in the following ranking order: Y-location, GOR, proppant amount, X-location, CLAT, and well azimuth (Figure 12). The partial responses for the six most influential variables in Figure 13 indicate that mid-west wells toward the north side with low GOR, larger fracturing treatment sizes (up to 8+ million lb), and 6,000 to 7,000 ft CLATs along the optimum azimuth show the best production performance in Area 2. Area 3 covers Dimmit, Maverick, Webb and Zavala counties. Figure 14 shows peak monthly production and best/worst producing wells in Area 3.
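A partial dependence curve of the kind plotted in Figures 10, 13 and 16 can be sketched as follows, using the standard definition: fix the chosen variable at a grid value in every row, average the model's predictions, and repeat across the grid. The decile grid and zero-centering mirror the Figure 10 caption, but whether the authors' software used exactly this grid is an assumption, and `toy_predict` is a hypothetical stand-in for a fitted boosted-tree model.

```python
# Partial dependence of a fitted model on feature j (illustrative sketch).
# For each grid value v, feature j is set to v in every row, predictions are
# averaged, and the curve is centred at zero as in the Figure 10 caption.

def deciles(values):
    """Approximate 10th..90th percentiles (nearest lower rank)."""
    xs = sorted(values)
    n = len(xs)
    return [xs[p * (n - 1) // 10] for p in range(1, 10)]


def partial_dependence(predict, X, j, grid):
    curve = []
    for v in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[j] = v  # overwrite only the variable of interest
            total += predict(modified)
        curve.append(total / len(X))
    centre = sum(curve) / len(curve)
    return [c - centre for c in curve]


# Hypothetical model standing in for a fitted boosted-tree ensemble.
def toy_predict(x):
    return 2.0 * x[0] + x[1] ** 2


X = [[0.0, 1.0], [5.0, 2.0], [10.0, 3.0]]
print(partial_dependence(toy_predict, X, 0, [1.0, 2.0, 3.0]))  # approximately [-2.0, 0.0, 2.0]
```

Because the other variables keep their observed values while one is varied, the curve reflects that variable's average marginal effect with the others integrated out, which is the sense in which the abstract describes integrating out the impact of other variables.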
Figure 14 shows a northeast-southwest trend of best wells more or less diagonally centered in Dimmit County, and the best wells again tend to be geographically separated from the poorest wells. Figure 15 suggests that the most influential drivers of production in Area 3 are GOR, Y-location, CLAT, X-location, and proppant amount. These top drivers again overlap with those in Areas 1 and 2, although their order differs somewhat. The partial dependence plots in Figure 16 indicate that more western wells at mid-north sites with low GOR, 7,000+ ft CLATs, and larger fracture treatment sizes (7+ million lb) are more productive with respect to maximum oil rate in Area 3.

Efficiency (BO/CLAT) Models

In addition to production performance, operators are often concerned with production efficiency for economic analysis. To investigate well efficiency, boosting models were constructed for production per foot of completed lateral (BO/ft). Figure 17 shows the relative influence of predictors on BO/ft in Area 1. CLAT tops the list, and GOR, well azimuth, proppant amount, and X-location also have a strong impact on efficiency. Figure 18 indicates that CLAT is negatively associated with production efficiency; that is, longer CLAT is related to lower BO/CLAT, and efficiency is highest when CLAT is less than 2,000 ft. Wells with GOR between 1,000 and 5,000 scf/bbl show better production efficiency. In addition, optimum well azimuth, well location, and larger fracture treatment sizes (up to 8+ million lb) have positive impacts on production efficiency. Figures 19 and 20 show the relative influence and partial dependence plots for Area 2, and Figures 21 and 22 show those for Area 3. Across all three areas, the single most influential variable on BO/ft is CLAT, the denominator variable.
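Part of CLAT's prominence in the BO/ft models may be arithmetic: dividing by a variable builds a dependence on it into the target. The short Python check below uses made-up numbers (not study data) in which BO has zero correlation with CLAT, yet BO/CLAT is still strongly negatively correlated with CLAT.

```python
# Illustrative check: even when BO is uncorrelated with lateral length,
# BO/CLAT is negatively correlated with CLAT simply because CLAT is the
# denominator. The numbers are made up for illustration only.

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


clat_ft = [3_000, 4_000, 5_000, 6_000, 7_000]
bo_bbl = [10_000, 12_000, 10_000, 12_000, 10_000]   # no trend with CLAT
bo_per_ft = [bo / clat for bo, clat in zip(bo_bbl, clat_ft)]

print(round(pearson(clat_ft, bo_bbl), 3))     # 0.0
print(round(pearson(clat_ft, bo_per_ft), 3))  # about -0.965
```

This does not by itself explain the partial dependence shapes in Figures 18, 20 and 22, but it is a reason to read a denominator's high relative influence with some caution.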


Because CLAT and proppant amount are both associated with substantial well cost, a subsequent economic analysis, beyond the scope of this paper, is needed.

Parameter                          Area 1 (n1 = 1,908)       Area 2 (n2 = 1,103)       Area 3 (n3 = 855)
                                   5%          95%           5%          95%           5%          95%
CLAT (ft)                          3,017       6,288         3,551       7,466         3,330       7,321
Proppant (lb)                      1,600,210   7,644,200     530,045     8,956,052     1,915,940   7,538,200
Fluid (gal)                        1,865,875   7,332,679     260,034     9,453,051     2,408,469   10,249,256
Stage count                        1           19            1           20            1           16
Proppant concentration (lb/gal)    0.45        1.73          0.35        1.57          0.43        1.78
Proppant/lateral length (lb/ft)    487         1,555         165         1,355         581         1,544

Table 1: Typical completion and stimulation parameters for horizontal Eagle Ford wells (5th and 95th percentiles)

Figure 6: Histograms of selected variables. The vertical axis is density, and the total area under each fitted curve is 1.


Figure 7: Scatterplot matrix of selected variables. Each off-diagonal cell contains a 2-variable scatterplot.

Figure 8: Peak monthly oil production distribution and best/worst producing wells in Area 1.


Figure 9: Relative influence of predictors on BO in Area 1. The relative influence is proportional to the blue bar length. GOR has the strongest impact on the BO, followed by proppant amount and geographic location. The sum of relative influence is scaled to 100.

Figure 10: Partial dependence plots of the most influential predictors on BO in Area 1. The vertical axes (in units of 1,000) have been centered at zero and put on the same range for comparison. The tick marks on the horizontal axes mark the deciles of each predictor variable.


Figure 11: Peak monthly oil production and best/worst producing wells in Area 2

Figure 12: Relative influence of predictors on BO in Area 2


Figure 13: Partial dependence plots of most influential predictors on BO in Area 2

Figure 14: Peak monthly oil production and best/worst producing wells in Area 3


Figure 15: Relative influence of predictors on BO in Area 3

Figure 16: Partial dependence plots of the most influential predictors on BO in Area 3


Figure 17: Relative influence of predictors on BO/ft in Area 1

Figure 18: Partial dependence plots of most influential predictors on BO/ft in Area 1


Figure 19: Relative influence of predictors on BO/ft in Area 2

Figure 20: Partial dependence plots of most influential predictors on BO/ft in Area 2


Figure 21: Relative influence of predictors on BO/ft in Area 3

Figure 22: Partial dependence plots of most influential predictors on BO/ft in Area 3


Conclusions

• Many variables in well architecture, completion, stimulation, and production are not normally distributed; they may be strongly skewed or bimodally distributed.
• For these data sets, Boosted Regression Tree Models may serve better than more classical Multiple Linear Regression Models.
• Well location in each of the three study areas is a strong predictor of productivity. Location is interpreted to carry with it a relatively systematic variation in fundamental reservoir parameters such as matrix and system permeability, reservoir pressure, thickness, and reservoir fluid viscosity.
• GOR is also a strong predictor of productivity across the range of GOR values studied (