Data Validation
Harben
Exploration is a business not a science
Regolith Exploration Geochemistry
Primary Dispersion
Applied Exploration
Geochemistry
All data in this presentation are the intellectual property of IMEx Consulting (except where
obtained from the public domain) and must not be reproduced, copied, shared or forwarded
in any form without the express permission of IMEx Consulting
Merge Field Information
Before interpreting geochemical results think about:
Data sources
What is the target? Cu? Au? Lithology? Alteration?
Sample types / fractions / splits
Sample sites / Field observations and comments
Geology and regolith
Analytical techniques and quality
If you don’t know – check the report, source or ASK!
Where available, this information should be merged with the
analytical results BEFORE you begin interpretation
Drill hole geochemistry should have this information attached to
every sample interval
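A minimal sketch of the merge step, assuming field observations and analytical results share a sample ID. The sample IDs, field names and values are hypothetical, for illustration only.

```python
# Hypothetical field records keyed by sample ID.
field_info = {
    "S001": {"sample_type": "soil", "regolith": "residual", "lithology": "GRANITE"},
    "S002": {"sample_type": "soil", "regolith": "transported", "lithology": "BASALT"},
}

# Hypothetical analytical results from the lab.
assays = [
    {"sample_id": "S001", "Cu_ppm": 45.0},
    {"sample_id": "S002", "Cu_ppm": 250.0},
    {"sample_id": "S003", "Cu_ppm": 12.0},  # no field record exists for this one
]


def merge_field_info(assays, field_info):
    """Attach field observations to each assay row and list IDs with no match."""
    merged, unmatched = [], []
    for row in assays:
        info = field_info.get(row["sample_id"])
        if info is None:
            unmatched.append(row["sample_id"])
            info = {}
        merged.append({**row, **info})
    return merged, unmatched


merged, unmatched = merge_field_info(assays, field_info)
print("Samples with no field information:", unmatched)
```

Samples listed as unmatched are the ones to chase up in the report or with the source before interpretation starts.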
IMEx Consulting 5
Check Locations
Avoid unlocated geochemical data – unlocated drill holes (DH) are not uncommon
Sort by Easting/Longitude and Northing/Latitude in Excel/database and look at the top &
bottom of the spreadsheet for missing or spurious locations, or use “Filter”.
Highlight Easting/Northing columns & plot scatter plot to check for dodgy co-ordinates.
Import the data into GIS and confirm everything is correctly located.
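The same check can be sketched in code, assuming the expected Easting/Northing range of the project area is known. The coordinates and ranges below are hypothetical.

```python
def flag_bad_locations(rows, e_range, n_range):
    """Return sample IDs whose coordinates are missing, non-numeric
    or outside the expected Easting/Northing ranges."""
    flagged = []
    for row in rows:
        try:
            easting = float(row["easting"])
            northing = float(row["northing"])
        except (TypeError, ValueError, KeyError):
            flagged.append(row["sample_id"])
            continue
        inside = (e_range[0] <= easting <= e_range[1]
                  and n_range[0] <= northing <= n_range[1])
        if not inside:
            flagged.append(row["sample_id"])
    return flagged


# Hypothetical samples: one good, one blank, one zero, one with E and N swapped.
rows = [
    {"sample_id": "A1", "easting": 512340, "northing": 7654320},
    {"sample_id": "A2", "easting": "", "northing": 7654400},
    {"sample_id": "A3", "easting": 0, "northing": 0},
    {"sample_id": "A4", "easting": 7654500, "northing": 512400},
]
flagged = flag_bad_locations(rows, (500000, 530000), (7640000, 7670000))
print(flagged)
```

This does not replace plotting the data in GIS; it only catches the obvious blanks, zeros and swapped columns.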
Coded Observation Clean-up
Many observations attached to
geochemical data are in a coded /
categorical format (lithology,
alteration, sample type, lab, batch
number, sampler, date etc.)
These codes will be used to split the
data into subsets so they should be
consistent
e.g. GNT, Gnt, Grnt, Granit, granite, Ggnt,
fg. GNT, minor GNT = GRANITE
Excel “Data Filter” & “Sort” can be
used to help clean up codes
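A rough sketch of code clean-up with a lookup table, using the granite variants listed above. The mapping table and the "UNRESOLVED" label are illustrative assumptions; anything not in the table is flagged rather than guessed.

```python
# Lower-case variants mapped to one consistent code (variants from the list above).
CODE_MAP = {
    "gnt": "GRANITE",
    "grnt": "GRANITE",
    "granit": "GRANITE",
    "granite": "GRANITE",
    "ggnt": "GRANITE",
    "fg. gnt": "GRANITE",
    "minor gnt": "GRANITE",
}


def clean_code(raw):
    """Normalise case and spacing, then look the code up.
    Unknown codes are returned flagged so they can be checked by hand."""
    key = " ".join(raw.strip().lower().split())
    return CODE_MAP.get(key, "UNRESOLVED: " + raw)


raw_codes = ["GNT", "Gnt", "Grnt", "Granit", "granite", "Ggnt", "fg. GNT", "minor GNT", "IFMG"]
cleaned = [clean_code(code) for code in raw_codes]
print(cleaned)
```

Writing the cleaned codes to a new column keeps the original column intact in case detail such as "fg." or "minor" is needed later.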
Coded Observation Clean-up
Make a copy of the file or column you are cleaning up in case you make a
mistake or lose useful information when simplifying codes.
Can replace “CODES” with real words e.g. Gnt = GRANITE
Can change blank cells to “NO DATA” or “NIL” – or why would you not?
Often “-7777” or another term is used for no data (ioGAS can handle this in
Data Doctor)
Numbers can be stored as text – Terra Search data in Access
Ensure spelling and capitalization are consistent within groups
Merge small groups into larger groups where it makes geol/geochem sense
Limestone (n = 300), dolomitic limestone (n = 15), Calcarenite (n = 3) = CARBONATES (n = 318)
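The carbonate example can be sketched as a group merge with sample counts. The carbonate counts are those given above; the granite count and the grouping table are hypothetical.

```python
from collections import Counter

# Sample counts per lithology code (Granite count is hypothetical).
counts = Counter({
    "Limestone": 300,
    "Dolomitic limestone": 15,
    "Calcarenite": 3,
    "Granite": 120,
})

# Small groups to merge where it makes geological/geochemical sense.
GROUPS = {
    "Limestone": "CARBONATES",
    "Dolomitic limestone": "CARBONATES",
    "Calcarenite": "CARBONATES",
}


def merge_groups(counts, groups):
    """Sum sample counts into merged groups; unmapped codes are kept, upper-cased."""
    merged = Counter()
    for code, n in counts.items():
        merged[groups.get(code, code.upper())] += n
    return merged


merged_counts = merge_groups(counts, GROUPS)
print(merged_counts)
```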
Things that annoy me….
Illogical names for projects: WN: Winglet; WL: Willton North
Strange names for elements: As = Ars; In = Ind
Hunting for elements: alphabetical order assists
Multiple columns of the same data: Cu% & Cu ppm; Au ppb & Au ppm; Au1_ppm = Au_ppm
Au Repeats & Duplicates: Au, Au1, Au2, Au3; Au, AuR, AuD, AuDR
Projections: “East”, “East_MGA55”
Unintelligible Geological Codes: IFMG ???
Nb, Ni, P, Sn, V in %: S in ppm, DL is 0.01; P in % but DL is 10
Mindless QA/QC: Res Def ≠ Grass roots exploration
Drillhole Surveys: Azimuth – mag or local grid – label!
Data export “macros”: element stuck in database; ioGAS has “allergies”; csv can produce errors
The reality: errors are detected when the data is used, so use the data!
Geochemical Data Clean-up
Common Errors
Cells missing results contain “0” or other character ( ., /,
NR, *, !, x). Convert to empty cells.
Below detection results as text field. Replace (<, bld) with
“-” detection limit, i.e. <5 becomes -5. Not half the detection
limit nor zero please!
% or ppb units in a ppm column or vice versa
Data entry errors; truncated results – lat/long as “float”
from MapInfo; decimal – assign decimal places
A block of cells shifted to the right, putting results into the
wrong columns. A common error when importing text or csv
files
Headers – compile & cull to one line
Excel “Data Filter” & “Sort” can be used to assist clean up
the data
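A minimal sketch of the first two fixes: missing-result markers become empty values and below-detection text becomes the negative detection limit. The function name and the "bdl" spelling variant are assumptions added for illustration.

```python
# Markers used for missing results (from the list above); zeros are handled below.
MISSING_MARKERS = {"", ".", "/", "NR", "*", "!", "x"}
# Text flags for below detection; "bdl" is an assumed spelling variant.
BELOW_DL_TEXT = {"bld", "bdl"}


def clean_result(raw, detection_limit=None):
    """Return a float, the negative detection limit for below-detection
    results, or None where the result is missing."""
    text = str(raw).strip()
    if text in MISSING_MARKERS:
        return None
    if text.startswith("<"):
        # "<5" becomes -5.0, not half the detection limit and not zero.
        return -float(text[1:])
    if text.lower() in BELOW_DL_TEXT:
        if detection_limit is None:
            raise ValueError("detection limit needed for " + repr(text))
        return -float(detection_limit)
    value = float(text)
    # A zero in an assay column is treated as a missing result.
    return None if value == 0 else value


raw_column = ["12.5", "<5", "NR", "0", "*", "340"]
cleaned_column = [clean_result(value) for value in raw_column]
print(cleaned_column)
```

Text that is neither a number nor a known marker raises an error from `float`, which is deliberate: unknown entries should be looked at, not silently converted.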
Are these data “fit for purpose”?
Arsenic
Detection Limit = 1ppm
Analytical repeats as Mean Percent
Difference
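A sketch of how repeat precision might be summarised. The slide does not define Mean Percent Difference, so the formula here (absolute difference of a pair as a percentage of the pair mean, averaged over pairs) is an assumption, and the As values are hypothetical.

```python
def percent_difference(original, repeat):
    """Absolute difference of a pair as a percentage of the pair mean."""
    pair_mean = (original + repeat) / 2.0
    return 100.0 * abs(original - repeat) / pair_mean


def mean_percent_difference(pairs):
    """Average percent difference over (original, repeat) pairs."""
    values = [percent_difference(a, b) for a, b in pairs]
    return sum(values) / len(values)


# Hypothetical As repeats in ppm, with a detection limit of 1 ppm.
near_dl = [(1.0, 2.0), (2.0, 1.0), (3.0, 2.0)]
well_above_dl = [(100.0, 105.0), (250.0, 240.0), (80.0, 82.0)]

mpd_near = mean_percent_difference(near_dl)
mpd_above = mean_percent_difference(well_above_dl)
print(round(mpd_near, 1), round(mpd_above, 1))
```

In this made-up example a 1 ppm disagreement is a large percentage near the detection limit and a small one well above it, which is the sort of pattern to look for when asking whether data are fit for purpose.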
What to expect
What is wrong here?
The samples were analysed by XXX Laboratory in Perth. Aqua regia digestion is
generally suitable for the determination of gold in soil samples. Method Description:
Au; Up to 25g aqua regia digest, ICPMS finish, with a lower detection limit of 0.1
ppb. ME; aqua regia digest, ICPAES and ICPMS finish for full package (51
elements).
The samples are digested and refluxed with a mixture of Acids including
Hydrofluoric, Nitric, Hydrochloric and Perchloric Acids. This extended digest
approaches a Total digest for many elements however some refractory minerals are
not completely attacked. Fe, Mg, Mn, Ni, Ca, Zn, Cu are determined by Inductively
Coupled Plasma (ICP) Optical Emission Spectrometry. Ag, As, Bi, Co, Mo, Pb, Sb,
Te, W are determined by Inductively Coupled Plasma (ICP) Mass Spectrometry.
Digestion variation
Primary rocks
Four acid digest near total - most zircons will dissolve, especially in younger rocks.
Aqua regia (HNO3, HCl) will dissolve: CO3, sulfides, Fe oxide & chlorite
NOT quartz, feldspar, illite or most accessory minerals.
Volcanic rocks: Al - 6 to 9%
Aqua regia - Al results around 1 to 3%.
Scatterplot of Al v Zr shows two distinct populations (four acid and aqua regia).
Separates 4-acid digest from aqua regia data.
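A rough sketch of splitting mixed data by digest using Al alone, assuming the 6 to 9% range is the near-total (four acid) Al in volcanic rocks. The 4.5% cut-off is an assumed midpoint between the two ranges, not a value from the slide, and the sample values are hypothetical.

```python
# Assumed cut between ~1-3% Al (aqua regia) and ~6-9% Al (four acid) in volcanic rocks.
AL_SPLIT_PCT = 4.5


def guess_digest(al_pct):
    """Guess the digest from the Al result; only a first-pass flag."""
    return "four acid" if al_pct >= AL_SPLIT_PCT else "aqua regia"


# Hypothetical Al results (%) from a merged data set with no digest recorded.
al_results = {"V001": 7.8, "V002": 1.9, "V003": 6.4, "V004": 2.7}
digest_guess = {sample: guess_digest(al) for sample, al in al_results.items()}
print(digest_guess)
```

The guess should be confirmed against the lab report and an Al v Zr scatterplot; other rock types will need a different cut-off.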
Batch Shifts
Upper - Complete mis-match from
one line to the next.
Mo steel
Tungsten carbide puck in ring mill
Ratios
Basic Statistics Summary
Percentiles
A relative measure of the significance of a result compared to the rest of the
data.
Useful for reporting and discussing results - 250ppm Cu which is >98th
percentile out of 3,800 samples
Define cut-off levels for the display of geochemical data as dot plots or
colour contoured images.
Can be absolutely meaningless!
Use with caution – best for less mobile elements
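A short sketch of why a percentile is only a relative measure: the same 250 ppm Cu result ranks very differently in two data sets. Both data sets are made up for illustration.

```python
def percentile_rank(value, data):
    """Percentage of results in the data set that are below the given value."""
    below = sum(1 for x in data if x < value)
    return 100.0 * below / len(data)


# Two hypothetical Cu data sets (ppm), 100 samples each.
low_background = [3 * i for i in range(1, 101)]    # 3 to 300 ppm
high_background = [10 * i for i in range(1, 101)]  # 10 to 1000 ppm

rank_low = percentile_rank(250, low_background)
rank_high = percentile_rank(250, high_background)
print(rank_low, rank_high)
```

A result that looks significant in one survey can be unremarkable in another, so percentiles should always be quoted with the data set they come from.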
Log Transformation
Geochemical data distributions are generally strongly skewed - relatively few
very high or high results.
Highly skewed data distributions can be difficult to graph, display and
interpret.
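A minimal sketch of the effect of a log transform on a skewed data set. The Cu values are hypothetical and the skewness formula is the simple moment-based one.

```python
import math


def skewness(data):
    """Moment-based skewness: mean cubed deviation divided by the cubed standard deviation."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    third_moment = sum((x - mean) ** 3 for x in data) / n
    return third_moment / variance ** 1.5


# Hypothetical Cu results (ppm): mostly background with a few high values.
cu_ppm = [10, 12, 15, 18, 20, 25, 30, 45, 80, 400, 2500]
log_cu = [math.log10(x) for x in cu_ppm]

raw_skew = skewness(cu_ppm)
log_skew = skewness(log_cu)
print(round(raw_skew, 2), round(log_skew, 2))
```

The transform needs positive values, so below-detection results stored as negative numbers must be handled before taking logs.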
Log Transform Histograms
Is the background defined adequately – are the data “OK”?
Histograms and Anomalous results
NO ANOMALOUS RESULTS: Natural, random background variation generates a normal distribution.
ANOMALOUS RESULTS: Elements with anomalous or unusual results compared to the natural background variation generate strongly skewed distributions.
Box Plots
Percentiles - A measure (0-100%) of the magnitude of results compared to
the remainder of the data set e.g. If 250 ppm Cu is equivalent to the 90th
Percentile, then 10% of the data will be >250ppm and 90% will be <250ppm
Box Plots - A graphical representation of the 10th, 25th, 50th, 75th and 90th
percentile levels
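The five box plot levels can be sketched with the Python standard library. The data are hypothetical, and the "inclusive" interpolation method is one choice among several; other software may give slightly different values.

```python
import statistics

BOX_LEVELS = (10, 25, 50, 75, 90)


def box_plot_levels(data):
    """Return the 10th, 25th, 50th, 75th and 90th percentile levels."""
    # quantiles(n=100) returns the 99 cut points between percentiles 1 and 99.
    cuts = statistics.quantiles(data, n=100, method="inclusive")
    return {level: cuts[level - 1] for level in BOX_LEVELS}


# Hypothetical results: 101 evenly spaced values from 0 to 100.
data = list(range(0, 101))
levels = box_plot_levels(data)
print(levels)
```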
Box Plots - Useful Data Summary /
Where are the “long tails”?
SECOFI Stream
Sediments, Mexico,
n = 15,864 samples
Volume Variance effect
The volume/variance effect is fundamental. The smaller the
samples the higher the variability… all other things being equal
As samples get larger, common elements tend to normal
distributions while rare elements pass through strongly skewed
distributions before approximating normal when samples are very
large with respect to the frequency of the element host mineral.
e.g. Au in soil/rock samples compared with Au in Ore Blocks
The more individual samples analysed the closer they will average
the “real” grade.
At the limit of smallness all measurements are present/absent.
They are binary and have maximum variance.
Concentrations are always non-negative and <= 100%
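A simplified sketch of the volume/variance effect, assuming a sample is a set of particles that are each, independently, a grain of the host mineral with probability p (a binomial model). The model and the value of p are illustrative assumptions, not taken from the slide.

```python
def proportion_variance(p, n_particles):
    """Variance of the measured host-mineral proportion in a sample of
    n_particles, under the binomial assumption stated above."""
    return p * (1.0 - p) / n_particles


# Hypothetical rare host mineral: 1 particle in 100.
p = 0.01
sample_sizes = (1, 100, 10_000, 1_000_000)
variances = [proportion_variance(p, n) for n in sample_sizes]

for n, var in zip(sample_sizes, variances):
    print(n, var)
```

With a single particle the measurement is present/absent and the variance is at its largest for that p; it falls in proportion to sample size, all other things being equal.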
Domaining & Mixing
A sampling grid may straddle geological units
(domains). The shape of the distribution will
change as the position of the array changes
relative to the unit domains (Extensive Effect)
Single Element Distributions
Geochemical distributions are not normal or lognormal. They are symmetric to variably skewed and
may be both
Simple normal distribution models may sometimes fit geochemical data and facilitate statistical
calculations but it is an imposition of the interpreter, NOT a characteristic of the data
They cannot be normal (Gaussian) or lognormal as they cannot be less than zero nor greater than 100%
Geochemical data are symmetric to skewed and, if useful to exploration, often polymodal
They are a function of:
Volume/variance effects (intensive effect)
Non constant population domains (extensive effect)
Mixing effects
This means geology + sample size/area
They are also “closed”. That is, the sum of components must be <= the whole
They violate the assumptions of most classical statistical techniques
QA / QC
Must be appropriate to the stage of exploration program and element targeted
Difference between 1ppm (e.g. Au) and 1% (e.g. Cu)?
Example: First drilling program – 1,400 samples, 400 QA/QC – 30% of assay costs. Really????
Make QA/QC appropriate to stage of exploration
First pass exploration
Follow up
Res Def
JORC compliance
Standards – Au STD analyse for Au only.
When do you assess QC?
Stanley et al, 2010. “Determining the Magnitude of True Analytical Error in Geochemical
Analysis”, Geochemistry Exploration Environment Analysis, Vol. 10(4), p355-364.