The document discusses modeling multivariate time series data for applications like screening, forecasting, and explanation. It describes challenges like analyzing short multivariate time series and coping with large search spaces for explanation. It proposes using a genetic algorithm to help with vector autoregressive modeling and forecasting of visual field deterioration data. It also discusses generating explanations using dynamic Bayesian networks learned with evolutionary methods.

Bridging the Gap between Applications and Tools: Modeling Multivariate Time Series

X Liu, S Swift & A Tucker
Department of Computer Science
Birkbeck College
University of London
MTS Applications at Birkbeck

- Screening
- Forecasting
- Explanation
Forecasting

- Predicting Visual Field Deterioration of Glaucoma Patients
- Function Prediction for Novel Proteins from Multiple Sequence/Structure Data
Explanation

Input (observations):
- t-0 : Tail Gas Flow in_state 0
- t-3 : Reboiler Temperature in_state 1

Output (explanation):
- t-7 : Top Temperature in_state 0 with probability = 0.92
- t-54 : Feed Rate in_state 1 with probability = 0.71
- t-75 : Reactor Temperature in_state 0 with probability = 0.65
The Gaps

- Screening: Automatic / Semi-Automatic Analysis of Outliers
- Forecasting: Analysing Short Multivariate Time Series
- Explanation: Coping with Huge Search Spaces
The Problem - What/Why/How

- Short-Term Forecasting of Visual Field Progression
- Using a Statistical MTS Model
  - The Vector Auto-Regressive Process - VAR(P)
- There Could be Problems if the MTS is Short
- A Modified Genetic Algorithm (GA) can be Used
  - VARGA

The Prediction of Visual Field Deterioration Plays an Important Role in the Management of the Condition
Background - The Dataset

- The interval between tests is about 6 months
- Typically, 76 points are measured
- Values range between 60 (= very good) and 0 (= blind)
- The number of tests per patient ranges between 10 and 44

[Figure: map of the 76 numbered visual field test points for the right eye, marking the points used in this paper and the usual position of the blind spot]
Background - The VAR Process

Vector Auto-Regressive Process of Order P: VAR(P)

x(t) = A_1 x(t-1) + A_2 x(t-2) + ... + A_P x(t-P) + e(t)

- x(t): VF Test for Data Points at Time t (K x 1)
- A_i: Parameter Matrix at Lag i (K x K)
- x(t-i): VF Test for Data Points at Lag i from t (K x 1)
- e(t): Observational Noise at Time t (K x 1)
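The VAR(P) recursion above can be sketched in plain Python (a minimal illustration; the function name and list-of-lists matrix layout are our own, not the paper's implementation):

```python
def var_forecast(history, A):
    """One-step-ahead VAR(P) forecast: x_hat(t) = sum_i A_i x(t-i).

    history: the P most recent K-dimensional observations,
             oldest first, i.e. [x(t-P), ..., x(t-1)]
    A:       the P parameter matrices A_1..A_P, each K x K,
             where A[i-1] multiplies x(t-i)
    The noise e(t) has zero mean, so it drops out of the forecast.
    """
    K = len(history[-1])
    x_hat = [0.0] * K
    for i in range(1, len(A) + 1):      # lag i = 1..P
        x_lag = history[-i]             # x(t-i)
        for r in range(K):
            x_hat[r] += sum(A[i - 1][r][c] * x_lag[c] for c in range(K))
    return x_hat

# VAR(1) toy example with K = 2: A_1 halves each component
print(var_forecast([[2.0, 4.0]], [[[0.5, 0.0], [0.0, 0.5]]]))  # [1.0, 2.0]
```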
The Genetic Algorithm

"A Search/Optimisation method that solves a problem through maintaining and improving a population of suitable candidate solutions using biological metaphors"

Generate a Population of random Chromosomes (Solutions)
Repeat for a number of Generations:
  Cross Over the current Population
  Mutate the current Population
  Select the Fittest for the next Population
Loop

The best solution to the problem is the Chromosome in the last generation which has the highest Fitness
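The loop above can be sketched as a toy GA (our own minimal sketch, not the paper's implementation; truncation selection stands in for roulette-wheel selection, and the fitness function is simply the count of 1-bits):

```python
import random

def genetic_algorithm(fitness, n_bits=8, pop_size=20,
                      generations=50, mp=0.05, seed=0):
    """Minimal GA: generate, then loop crossover -> mutation ->
    selection, returning the fittest chromosome found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # single-point crossover between random parent pairs
        children = []
        for _ in range(pop_size // 2):
            a, b = rng.sample(pop, 2)
            x = rng.randrange(1, n_bits)
            children += [a[:x] + b[x:], b[:x] + a[x:]]
        # bit-flip mutation: each gene inverts with probability mp
        for chrom in children:
            for i in range(n_bits):
                if rng.random() < mp:
                    chrom[i] ^= 1
        # keep the fittest for the next population (elitist truncation)
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

# fitness = number of 1-bits (a stand-in objective)
best = genetic_algorithm(sum)
```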
GAs - Chromosome Example

        X                    Y
Range:  0-127                0-31
Bits:   0000000-1111111      00000-11111

Combined chromosome: 0000000.00000 - 1111111.11111
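The 7-bit / 5-bit encoding above can be sketched as follows (the function names are ours, for illustration only):

```python
def encode(x, y):
    """Pack X (0-127, 7 bits) and Y (0-31, 5 bits) into a single
    12-bit chromosome string, X's bits first."""
    return format(x, '07b') + format(y, '05b')

def decode(chrom):
    """Recover (X, Y) from the chromosome; the same bits can also be
    read as the fixed-point value X + Y/32 (0000000.00000 style)."""
    x, y = int(chrom[:7], 2), int(chrom[7:], 2)
    return x, y, x + y / 32

print(encode(5, 16))            # '000010110000'
print(decode('000010110000'))   # (5, 16, 5.5)
```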
GAs - Mutation

- Each Bit (gene) of a Chromosome is Given a Chance MP of Inverting
- A '1' becomes a '0', and a '0' becomes a '1'

01101101 -> 00101111   (bits 2 and 7 inverted)
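Bit-flip mutation can be sketched as (the helper name is ours):

```python
import random

def mutate(chrom, mp, rng=random):
    """Give each bit (gene) an independent chance mp of inverting:
    a '1' becomes a '0', and a '0' becomes a '1'."""
    return ''.join(('0' if b == '1' else '1') if rng.random() < mp else b
                   for b in chrom)

print(mutate('01101101', 0.0))  # mp = 0: unchanged, '01101101'
print(mutate('01101101', 1.0))  # mp = 1: every bit flipped, '10010010'
```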
GAs - Crossover (2)

Parents:   A = 01011101    B = 11101010
Crossover point X = 4
Children:  C = 11101101    D = 01011010
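Single-point crossover at position X = 4 reproduces the children above (the function name is ours):

```python
def crossover(a, b, x):
    """Single-point crossover: cut both parent bit-strings at
    position x and swap the tails, producing two children."""
    return b[:x] + a[x:], a[:x] + b[x:]

c, d = crossover('01011101', '11101010', 4)
print(c, d)  # 11101101 01011010
```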
VARGA - Representation

Chromosome: the entries of the parameter matrices A_1, A_2, ..., A_m, ..., A_P laid end to end:

a1_11 ... a1_ij ... a1_KK | a2_11 ... a2_ij ... a2_KK | ... | am_11 ... am_ij ... am_KK | ... | ap_11 ... ap_ij ... ap_KK

(am_ij denotes entry (i, j) of matrix A_m)
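The layout above — the entries of A_1..A_P laid end to end — can be sketched as a pair of helpers (names are ours):

```python
def matrices_to_chromosome(A):
    """Flatten the P parameter matrices (each K x K) into one
    chromosome: A_1's entries row by row, then A_2's, ..., then A_P's."""
    return [entry for matrix in A for row in matrix for entry in row]

def chromosome_to_matrices(chrom, K):
    """Inverse mapping: cut the chromosome back into K x K matrices."""
    step = K * K
    return [[chrom[m + r * K:m + (r + 1) * K] for r in range(K)]
            for m in range(0, len(chrom), step)]

A = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # A_1, A_2 with K = 2, P = 2
chrom = matrices_to_chromosome(A)
print(chrom)                               # [1, 2, 3, 4, 5, 6, 7, 8]
print(chromosome_to_matrices(chrom, 2) == A)  # True
```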
VARGA - The Genetic Algorithm

- GA With Extra Mutation
  - Order Mutation After Gene Mutation
  - Parents and Children Both Mutate
- Genes are Bounded Natural Numbers
- Fitness is the Negative Forecast Error
  - Minimisation Problem - Roulette Wheel
- Run for EACH Patient
Evaluation - Methods for Comparison

- S-Plus: Yule-Walker Equations, AIC and Whittle's Recursion, NK(P+1), Standard Package
- Holt-Winters Univariate Forecasting Method: Is the Data Univariate? (GA Solution)
- Pure Noise Model, VAR(0): Worst-Case Forecast (Non-Differenced = 0)
- 54 of the possible 82 patients' VF records could not be used with the S-Plus implementation
Results - Graph Comparison

[Chart: one-step-ahead forecast scores for cases 0 to 6, comparing HW, S-Plus, VARGA and Noise; scores range from 0 to about 2000]

- The Lower the Score - the Better
- The Score is the One-Step-Ahead Forecast Error
Results - Table Summary

Method   Order (number of each order)        Average Score
VARGA    26 of 1, 2 of 2                     559.82
S-Plus   12 of 0, 14 of 1, 1 of 2, 1 of 3    616.12
HW       N/A                                 683.79
Noise    28 of 0                             816.53

Average = The Average One-Step Forecast Error for the 28 Patients (Both GA's Fitness)
(The Lower - The Better)
Conclusion - Results

- VARGA Has a Better Performance
- VARGA Can Model Short MTS
- The Visual Field Data is Definitely Multivariate
- The Data Has a High Proportion of Noise
Conclusion - Remarks

- Non-Linear Methods and Transformations
- Performance Enhancements for the GA
  - Improve Crossover
- Irregularly Spaced Methods
- Space-Time Series Methods
- Time-Dependent Relationships Between Variables
Generating Explanations in MTS

- Useful to know probable explanations for a given set of observations within a time series
- E.g. Oil Refinery: 'Why has a temperature become high whilst a pressure has fallen below a certain value?'
- A possible paradigm which facilitates Explanation is the Bayesian Network
- Evolutionary Methods to learn BNs
- Extend the work to Dynamic Bayesian Networks
Dynamic Bayesian Networks

- Static BNs repeated over t time slices
- Contemporaneous / Non-Contemporaneous Links
- Used for Prediction / Diagnosis within dynamic systems

P(X_1, ..., X_n) = prod_{i=1}^{n} P(X_i | pi_i)
Assumptions - 1

- Assume all variables take at least one time slice to impose an effect on another
- The more frequently a system generates data, the more likely this is to be true
- Contemporaneous Links can therefore be excluded from the DBN
- The variables at time t are considered independent of one another
Representation

- P pairs of the form (ParentVar, TimeLag)
- Each pair represents a link from a node at a previous time slice to the node in question at time t

Examples:
Variable 1: { (1,1); (2,2); (0,3) }
Variable 4: { (4,1); (2,5) }
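The (ParentVar, TimeLag) representation can be sketched as follows (variable numbering copies the examples above; the helper function is ours):

```python
# Each variable's parent list, exactly as in the examples above
dbn = {
    1: [(1, 1), (2, 2), (0, 3)],
    4: [(4, 1), (2, 5)],
}

def parents_at(dbn, var, t):
    """Resolve each (ParentVar, TimeLag) pair into a concrete
    (variable, time-slice) parent of variable var's node at time t."""
    return [(parent, t - lag) for parent, lag in dbn.get(var, [])]

print(parents_at(dbn, 1, 10))  # [(1, 9), (2, 8), (0, 7)]
print(parents_at(dbn, 4, 10))  # [(4, 9), (2, 5)]
```

With N variables and lags up to MaxLag there are N x MaxLag candidate pairs per variable, so each variable's set of parent lists has 2^(N x MaxLag) members.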
Search Space

Given the first assumption and the proposed representation, the Search Space for each variable will be:

2^(N x MaxLag)
Algorithm

Multivariate Time Series
  -> Structure Search (Evolutionary Algorithms, Hill Climbing etc.)
  -> Parameter Calculation given structure
  -> Dynamic Bayesian Network Library for Different Operating States
  -> Explanation Algorithm (e.g. using Stochastic Simulation)
  -> User
Generating Synthetic Data

[Figures (1) and (2)]
Oil Refinery Data

- Data recorded every minute
- Hundreds of variables
- Selected 11 interrelated variables
- Discretised each variable into k states
- Large Time Lags (up to 120 minutes between some variables)
- Different Operating States
Results

[Figure: results for the refinery variables SOT, FF, TGF, TT and RinT]
Explanations - using Stochastic Simulation

[Chart: P(y=1) against time for the variables SOF-SP, SOT, TT, BPF-SP and BPF, over roughly 100 time steps]
Explanation

Input (observations):
- t-0 : Tail Gas Flow in_state 0
- t-3 : Reboiler Temperature in_state 1

Output (explanation):
- t-7 : Top Temperature in_state 0 with probability = 0.92
- t-54 : Feed Rate in_state 1 with probability = 0.71
- t-75 : Reactor Temperature in_state 0 with probability = 0.65
Future Work

- Exploring the use of different searches and metrics
- Improving accuracy (e.g. different discretisation policies, continuous DBNs)
- Using the library of DBNs in order to quickly classify the current state of a system
- Automatically Detecting Changing Dependency Structure
Acknowledgements

BBSRC
BP-AMOCO
British Council for Prevention of Blindness
EPSRC
Honeywell Hi-Spec Solutions
Honeywell Technology Center
Institute of Ophthalmology
Moorfields Eye Hospital
MRC
Intelligent Data Analysis
X Liu
Department of Computer Science
Birkbeck College
University of London
Intelligent Data Analysis

- An interdisciplinary study concerned with the effective analysis of data
- Intelligent application of data analytic tools
- Application of "intelligent" data analytic tools
IDA Requires

- Careful thinking at every stage of an analysis process (strategic aspects)
- Intelligent application of relevant domain knowledge
- Assessment and selection of appropriate analysis methods
IDA Conferences

- IDA-95, Baden-Baden
- IDA-97, London
- IDA-99, Amsterdam
- IDA-2001, Lisbon
IDA in Medicine and Pharmacology

- IDAMAP-96, Budapest
- IDAMAP-97, Nagoya
- IDAMAP-98, Brighton
- IDAMAP-99, Washington DC
- IDAMAP-2000, Berlin
Other IDA Activities

- IDA Journal (Elsevier 1997)
- Journal Special Issues (1997 -)
- Introductory Books (Springer 1999)
- The Dagstuhl Seminar (Germany 2000)
- European Summer School (Italy 2000)
- Special Sessions at Conferences
Concluding Remarks

- Strategies for data analysis and mining
- Strategies for human-computer collaboration in IDA
- Principles for exploring and analysing "big data"
- Benchmarking interesting real-world datasets as well as computational methods
- A long-term interdisciplinary effort
The Screening Architecture

[Figure]

Results from a GP Clinic

[Figure]
