Petroleum Data Management

Uploaded by

Homayoun Najafi
Copyright
© All Rights Reserved

Petroleum Data

Management
Titles
❑Chapter 1: Data in Petroleum Engineering Field
Introduction
• Data is a key company asset
• Large volumes of data need to be integrated for information,
knowledge and innovation
• An essential component of artificial intelligence applications
Chapter 1:
Data in Petroleum Engineering Field
Data Sources in Upstream Oil Industry
Big Data
Distributed Computing
Big Data Like Crude!
Big Data

Data Analytics
Chapter 2:
Statistics
Descriptive Statistics
Bivariate Data

Pearson correlation coefficient

Spearman correlation coefficient


Quantile
The word “quantile” comes from the word quantity. In simple terms, quantiles divide a sample into equal-sized, adjacent subgroups (that’s why a quantile is sometimes called a “fractile”). The term can also refer to dividing a probability distribution into areas of equal probability.

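The median is the 50th percentile, or the 0.5 quantile: 50% of the data lie below it and 50% above it.

▪ The quantile of the i-th data point (in ascending order) is approximately qi = i/(N+1)

▪ Conversely, the rank of the data point at quantile q is approximately i = q(N+1)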

❑ A quantile plot can be used to detect the symmetry of a distribution; it is a plot of qi vs. xi:
✓ A symmetrical distribution is characterized by an S-shaped quantile plot, where the distance on the horizontal axis between the median (50th percentile) and any percentile P below the median is equal to the distance from the median to the (100−P)th percentile. Symmetrical distributions are characterized by mean = median = mode.
✓ If the distribution has positive skewness, the portion of the quantile plot corresponding to q > 0.9 will usually be longer and flatter than the rest of the plot.
✓ Conversely, distributions with negative skewness have a long, flat portion on the quantile plot corresponding to q < 0.1.
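The quantile plot and the symmetry check can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names are hypothetical, and the percentiles are computed with the "exclusive" method of `statistics.quantiles`, which uses the same q(N+1) rank convention.

```python
import statistics

def quantile_plot_points(data):
    """Return (x_i, q_i) pairs with q_i = i / (N + 1) for the sorted sample."""
    xs = sorted(data)
    n = len(xs)
    return [(x, i / (n + 1)) for i, x in enumerate(xs, start=1)]

def symmetry_gaps(data, percents=(10, 25)):
    """For each percentile P, return the distances median-to-P and median-to-(100-P)."""
    cuts = statistics.quantiles(data, n=100, method="exclusive")  # 99 cut points
    median = cuts[49]                                             # 50th percentile
    return {p: (median - cuts[p - 1], cuts[99 - p] - median) for p in percents}
```

For a symmetrical sample the two distances in each pair are roughly equal; a much larger second value suggests positive skewness, a much larger first value negative skewness.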
Vertical Heterogeneity in Permeability Using Q Plot

Dykstra and Parsons used the log-normal distribution of permeability to define the coefficient of permeability variation.

In a normal distribution, the value of k is such that 84.1% of the permeability values are less than k̄ + s and 15.9% of the k values are less than k̄ − s, where k̄ is the mean and s is the standard deviation.

For a log-normal permeability distribution, the Dykstra–Parsons coefficient can be estimated from
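A commonly cited form of the coefficient is V = (k50 − k84.1)/k50, where k50 is the median permeability and k84.1 is the permeability one standard deviation below the median on the ln(k) scale. Under the log-normal assumption this reduces to V = 1 − exp(−s), with s the standard deviation of ln(k). The sketch below assumes this form; the function name and the sample-based estimate of s are illustrative choices, not taken from the slides.

```python
import math
import statistics

def dykstra_parsons(perms):
    """Estimate the Dykstra-Parsons coefficient, assuming log-normal permeability."""
    log_k = [math.log(k) for k in perms]
    mu = statistics.mean(log_k)
    s = statistics.stdev(log_k)      # sample standard deviation of ln(k)
    k50 = math.exp(mu)               # median permeability
    k84 = math.exp(mu - s)           # one standard deviation below the median in ln(k)
    return (k50 - k84) / k50         # algebraically equal to 1 - exp(-s)
```

A value of 0 corresponds to a perfectly homogeneous layer set, and values approaching 1 indicate strong vertical heterogeneity.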
Parametric Models
• Uniform Distribution

The uniform distribution is useful as a rough model for representing low states of knowledge when only the upper and lower bounds are known.
Parametric Models
• Triangular Distribution
Parametric Models
• Normal Distribution
CDF: F(x) has no closed-form solution but is often expressed using the complementary error function.
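As a minimal sketch, the normal CDF can be written with the complementary error function as F(x) = ½·erfc(−(x − μ)/(σ√2)). The function name and default parameters below are illustrative.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF expressed through the complementary error function."""
    z = (x - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(-z)
```

Evaluating `normal_cdf(mu + sigma, mu, sigma)` gives about 0.841, the 84.1% figure used for the permeability distribution.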
Parametric Models
• Lognormal Distribution
Parametric Models
• Poisson Distribution
When events occur as a purely random (Poisson) process, the number of independent events occurring
within a fixed time interval follows a Poisson distribution.

1. Events are independent of each other. The occurrence of one event does not affect the probability that another event will occur.
2. The average rate (events per time period) is constant.
3. Two events cannot occur at the same time.
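Under these assumptions, the probability of observing k events in an interval is P(k) = e^(−λ)·λ^k / k!, where λ is the expected number of events in that interval. A minimal sketch, with `rate` and `interval` as illustrative parameter names:

```python
import math

def poisson_pmf(k, rate, interval=1.0):
    """Probability of exactly k events, given a constant average rate per unit time."""
    lam = rate * interval            # expected number of events in the interval
    return math.exp(-lam) * lam ** k / math.factorial(k)
```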
Parametric Models
• Binomial Distribution
A binomial distribution is the distribution of the number of successes k in a sequence of n independent trials, where
the probability of success p is constant from trial to trial. Each trial with two outcomes (success or failure) is also
called a Bernoulli experiment:
The binomial distribution can be approximated by the normal distribution if n is large and p approaches 0.5 such that:
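A minimal sketch for checking this approximation numerically, assuming the standard binomial mass function C(n, k)·p^k·(1 − p)^(n − k) and a normal density with mean np and variance np(1 − p). The function names are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_approx_pmf(k, n, p):
    """Normal density with the binomial mean and variance, evaluated at k."""
    mean = n * p
    var = n * p * (1 - p)
    return math.exp(-(k - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def max_abs_error(n, p):
    """Largest pointwise gap between the binomial pmf and its normal approximation."""
    return max(abs(binomial_pmf(k, n, p) - normal_approx_pmf(k, n, p))
               for k in range(n + 1))
```

For the same n, `max_abs_error` is noticeably smaller when p is close to 0.5 than when p is close to 0 or 1.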
Parametric Models
• Weibull Distribution
The Weibull distribution is a commonly used tool for modeling growth (or decline) in biological, clinical,
population, and natural resource studies. It has also been used to analyze production decline from
unconventional reservoirs.
With k as the shape parameter:
k < 1: rate decreases with time
k = 1: rate is constant with time (exponential decline)
k > 1: rate increases with time
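These three regimes can be illustrated with the Weibull hazard function h(t) = (k/λ)·(t/λ)^(k − 1), assuming "rate" here refers to the hazard rate. The scale parameter λ (`lam`) is not named in the slides and is introduced only for this sketch.

```python
def weibull_hazard(t, k, lam=1.0):
    """Weibull hazard rate at time t > 0 for shape k and scale lam."""
    return (k / lam) * (t / lam) ** (k - 1)
```

Comparing `weibull_hazard(1.0, k)` with `weibull_hazard(2.0, k)` for k = 0.5, 1 and 2 shows a decreasing, constant and increasing rate respectively.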
Parametric Models
• Beta Distribution
(Figure: example beta distributions.) The beta distribution does not have a mechanistic basis but can be very useful for fitting empirical data because of its flexible mathematical form. This is particularly relevant for uncertainty quantification using Monte Carlo simulation.
Central Limit Theorem
• When independent random variables are summed up, their properly normalized sum tends toward a normal
distribution (informally a bell curve) even if the original variables themselves are not normally distributed.
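The theorem can be checked with a small simulation. The sketch below sums Uniform(0, 1) variables (an arbitrary choice of a non-normal distribution), normalizes the sum by its known mean and standard deviation, and measures the fraction of values within one standard deviation, which is about 0.683 for a normal distribution.

```python
import random

def normalized_sums(n_terms, n_samples, seed=0):
    """Standardized sums of n_terms independent Uniform(0, 1) draws."""
    rng = random.Random(seed)
    mu, sigma = 0.5, (1 / 12) ** 0.5          # mean and std of Uniform(0, 1)
    scale = sigma * n_terms ** 0.5            # std of the sum
    return [
        (sum(rng.random() for _ in range(n_terms)) - n_terms * mu) / scale
        for _ in range(n_samples)
    ]

def fraction_within_one_sigma(values):
    """Share of values in [-1, 1]; about 0.683 for a standard normal."""
    return sum(abs(v) <= 1 for v in values) / len(values)
```

With `n_terms=1` the fraction is close to 0.577 (a uniform distribution); with `n_terms=30` it moves close to the normal value.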
Q-Q plot
• Two distributions are compared by plotting their corresponding quantiles against each other.
• A Q-Q plot of two identical distributions will be a straight line with unit slope (i.e., x = y). If the Q-Q plot is a straight line with a non-unit slope, then the two distributions have the same shape but their location and spread may differ.
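A minimal sketch of a Q-Q comparison between two samples: matching quantiles are paired, and a least-squares line through the pairs gives the slope and intercept. The function names and the choice of 19 quantiles are illustrative.

```python
import statistics

def qq_points(sample_x, sample_y, n_quantiles=19):
    """Pair the matching quantiles of two samples."""
    qx = statistics.quantiles(sample_x, n=n_quantiles + 1)
    qy = statistics.quantiles(sample_y, n=n_quantiles + 1)
    return list(zip(qx, qy))

def fit_line(points):
    """Least-squares slope and intercept through the Q-Q points."""
    xs, ys = zip(*points)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx
```

A slope near 1 with an intercept near 0 indicates matching distributions; a different slope or intercept points to a change in spread or location.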
Normal Score Transformation
• Often, it is useful to transform a sample distribution into the space of
an equivalent normal distribution, where many statistical operations
can be easily performed and visualized.
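One common way to carry out this transformation is rank-based: each value is replaced by the standard normal quantile of its plotting position i/(N+1). The sketch below assumes that approach; ties are broken by original order, which is a simplification.

```python
from statistics import NormalDist

def normal_scores(data):
    """Map each value to the standard normal quantile of its rank position i/(N+1)."""
    n = len(data)
    order = sorted(range(n), key=lambda i: data[i])   # indices in ascending data order
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = NormalDist().inv_cdf(rank / (n + 1))
    return scores
```

The transformed values keep the ordering of the original data, so a back-transformation is possible by matching ranks.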
First Moments Distributions Fitting Method

From Sample
Chapter 3:

Prediction
Linear Regression
Estimating Confidence Intervals for the Mean Response and Forecast
A good fit leads to approximately normally distributed residuals.
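For simple linear regression, both intervals follow from the residual standard error s: the half-width is t·s·√(1/n + (x0 − x̄)²/Sxx) for the mean response and t·s·√(1 + 1/n + (x0 − x̄)²/Sxx) for a forecast of a new observation. The sketch below assumes ordinary least squares with one predictor; the function names are illustrative, and the Student-t critical value `t_crit` (n − 2 degrees of freedom) must be supplied by the caller, for example from a table.

```python
import math

def fit_line_with_errors(x, y):
    """Least-squares fit returning what is needed for interval estimates."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))   # residual standard error
    return slope, intercept, s, mx, sxx, n

def intervals(x0, fit, t_crit):
    """Prediction at x0 with half-widths for the mean response and for a forecast."""
    slope, intercept, s, mx, sxx, n = fit
    y_hat = intercept + slope * x0
    leverage = 1 / n + (x0 - mx) ** 2 / sxx
    half_mean = t_crit * s * math.sqrt(leverage)              # mean response
    half_forecast = t_crit * s * math.sqrt(1 + leverage)      # new observation
    return y_hat, half_mean, half_forecast
```

The forecast interval is always wider than the interval for the mean response, and both widen as x0 moves away from the mean of x.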
Chapter 4:
Classification
Unsupervised Classification

❑K-Means Classification
❑Model-Based Classification
❑Hierarchical Classification
❑Random Forest Classification
K-Means Classification

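1. Choose random initial centers.
2. Assign each point to its nearest center; minimizing the error E in this way defines the clusters.
3. Compute a new center for each cluster.
4. Repeat the previous steps until the clusters stop changing.

Normalization is needed when the variables have different types or scales.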
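The procedure can be sketched in plain Python. This is a minimal illustration, not an optimized implementation: it assumes squared Euclidean distance for E and z-score normalization, and the function names are hypothetical.

```python
import random

def normalize(points):
    """Rescale every column to zero mean and unit standard deviation."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def k_means(points, k, n_iter=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)                  # random initial centers
    labels = None
    for _ in range(n_iter):
        # assign each point to its nearest center (minimizes E for fixed centers)
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:                     # assignments are stable: stop
            break
        labels = new_labels
        for c in range(k):                           # new center for each cluster
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers
```

Calling `k_means(normalize(points), 2)` on data whose columns have very different magnitudes keeps the larger column from dominating the distance.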
Hierarchical Classification
• Agglomerative
• Divisive
HOW DO WE CALCULATE THE SIMILARITY BETWEEN TWO CLUSTERS?

• MIN: sim(C1,C2) = Min sim(Pi,Pj) such that Pi ∈ C1 & Pj ∈ C2

• The MIN approach cannot separate clusters properly if there is noise between clusters.
• MAX: sim(C1,C2) = Max sim(Pi,Pj) such that Pi ∈ C1 & Pj ∈ C2

• The MAX approach does well in separating clusters if there is noise between clusters.

• The MAX approach is biased towards globular clusters.

• The MAX approach tends to break large clusters.
• Group Average: sim(C1,C2) = ∑ sim(Pi, Pj) / (|C1|·|C2|), where Pi ∈ C1 & Pj ∈ C2

• The Group Average approach does well in separating clusters if there is noise between clusters.

• The Group Average approach is biased towards globular clusters.
• Distance between centroids: less popular

• Ward’s Method: this approach to calculating the similarity between two clusters is exactly the same as Group Average, except that Ward’s method uses the sum of the squares of the distances between Pi and Pj:
sim(C1,C2) = ∑ (dist(Pi, Pj))² / (|C1|·|C2|)

• Ward’s method also does well in separating clusters if there is noise between clusters.

• Ward’s method is also biased towards globular clusters.
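The four measures can be sketched as follows, assuming that sim(Pi, Pj) is measured as the Euclidean distance between two points, so the two clusters with the smallest value are merged first. The function names are illustrative. The last function follows the simplified formula given here; Ward's criterion is usually stated as the increase in within-cluster variance caused by a merge.

```python
import math

def pairwise(c1, c2):
    """All point-to-point Euclidean distances between two clusters."""
    return [math.dist(p, q) for p in c1 for q in c2]

def linkage_min(c1, c2):
    return min(pairwise(c1, c2))

def linkage_max(c1, c2):
    return max(pairwise(c1, c2))

def linkage_average(c1, c2):
    return sum(pairwise(c1, c2)) / (len(c1) * len(c2))

def linkage_ward_like(c1, c2):
    """Mean squared distance, as in the simplified Ward formula above."""
    return sum(d ** 2 for d in pairwise(c1, c2)) / (len(c1) * len(c2))
```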
Dendrogram
The number of clusters is determined after inspecting the dendrogram.
Model-Based Clustering
Random Forest Clustering
• Random forest is a supervised method, but it can be used as an unsupervised one by adding trivial (random) data.
• The property matrix is combined with a random matrix, and the result is then clustered.
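One common way to build such a random matrix, which may differ from the procedure intended in these slides, is to shuffle each column of the property matrix independently. The real rows are then labeled 1 and the shuffled rows 0, so a supervised classifier has something to learn. The sketch below covers only this data-preparation step; the function name is hypothetical.

```python
import random

def real_vs_synthetic(property_matrix, seed=0):
    """Stack the real rows with a column-shuffled copy and label them 1 and 0."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*property_matrix)]
    for col in columns:
        rng.shuffle(col)                 # breaks the relationships between properties
    random_matrix = [list(row) for row in zip(*columns)]
    rows = [list(r) for r in property_matrix] + random_matrix
    labels = [1] * len(property_matrix) + [0] * len(random_matrix)
    return rows, labels
```

A random forest trained on `rows` and `labels` can then provide a proximity between real samples (how often two samples fall in the same leaf), and those proximities can be clustered; that step is not shown here.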
Figure 1 Raw Data
Figure 2 K-Means
Figure 3 AgglomerativeClustering
Figure 4 GaussianMixture
Figure 5 Random Forest
