AI Notes
A data-driven model is a type of model that relies on empirical data to understand, predict, or make
decisions about a system or process, rather than relying on predefined theories or assumptions. These
models are built by analyzing large datasets and identifying patterns, trends, and relationships within the
data.
1. Data as the Foundation: The model's parameters, structures, and behavior are inferred directly
from the data rather than being imposed by human expertise or prior assumptions.
2. Learning from Data: These models use machine learning algorithms or statistical methods to
"learn" the underlying patterns in the data. Examples include regression models, decision trees,
neural networks, and clustering algorithms.
3. Adaptability: Data-driven models can improve as more data becomes available. This makes them
highly flexible and suitable for complex or dynamic environments where human-designed
models may fall short.
4. Types of Models:
o Predictive Models: These predict future outcomes based on historical data (e.g.,
demand forecasting, stock price prediction).
o Prescriptive Models: These recommend actions based on the analysis of data (e.g.,
optimization models for decision-making).
5. Applications: Data-driven models are widely used in fields like business analytics, finance,
healthcare, and engineering. Examples include recommendation systems, fraud detection,
predictive maintenance, and natural language processing systems.
In essence, data-driven models represent a shift from traditional rule-based models to models derived
directly from the data.
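As a minimal illustration of the idea, the sketch below fits a linear regression whose parameters are inferred entirely from observed (input, output) pairs rather than imposed by a predefined theory. The data here is synthetic, and all numbers are assumptions for the example.

```python
# A data-driven model in miniature: the slope and intercept of a linear
# regression are learned from (x, y) pairs, not specified in advance.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))              # 200 observed inputs
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, 200)    # hidden relationship + noise

model = LinearRegression().fit(X, y)               # parameters inferred from data
print(model.coef_, model.intercept_)               # recovers ~3.0 and ~2.0
```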
AR, MA, ARMA, ARIMA, SARIMA
These terms—AR, MA, ARMA, ARIMA, and SARIMA—refer to statistical models used primarily for time
series analysis. These models aim to predict future values of a variable based on its past values and can
help in forecasting, identifying trends, or understanding seasonality in data. Here's a breakdown of each:
1. AR (AutoRegressive) Model
• Concept: In an AR model, the value of a variable at time t is regressed on its own previous values.
This means the model uses past observations to predict future ones.
• Use Case: AR models are useful when past values have a linear relationship with future values.
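As a hedged sketch of the idea, the snippet below simulates a simple AR(1) series and fits it with statsmodels' AutoReg; the coefficient 0.8, the series length, and the lag order are illustrative assumptions.

```python
# Fit an autoregressive model: regress y_t on its own past values.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(1)
y = np.zeros(500)
for t in range(1, 500):                 # simulate y_t = 0.8 * y_{t-1} + e_t
    y[t] = 0.8 * y[t - 1] + rng.normal()

res = AutoReg(y, lags=1).fit()          # regress y_t on y_{t-1}
print(res.params)                       # intercept and phi_1 (close to 0.8)
print(res.predict(start=len(y), end=len(y) + 4))   # 5-step-ahead forecast
```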
2. MA (Moving Average) Model
• Concept: In an MA model, the current value of a time series is expressed as a linear combination
of past error terms (random shocks).
• Use Case: Useful when past errors (noise) influence future observations.
3. ARMA (AutoRegressive Moving Average) Model
• Concept: ARMA combines both AR and MA components. It models a time series using both past
values (AR) and past errors (MA).
• Use Case: ARMA is effective when both the past values of the series and past random shocks are
significant in explaining future values.
4. ARIMA (AutoRegressive Integrated Moving Average) Model
• Concept: ARIMA extends ARMA by adding a differencing step to make the time series stationary
(i.e., constant mean and variance over time). This model is suited for non-stationary time series.
• Formula: ARIMA(p, d, q), where:
o p: the order of the AR component (number of lagged values),
o d: the degree of differencing applied to make the series stationary,
o q: the order of the MA component (number of lagged errors).
• Use Case: ARIMA is widely used for forecasting time series that are non-stationary but can be
made stationary through differencing.
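A minimal sketch of fitting an ARIMA model with statsmodels; the toy non-stationary series and the order (1, 1, 1) are illustrative assumptions, and in practice the orders would be chosen from ACF/PACF plots or information criteria.

```python
# Fit ARIMA(1, 1, 1): d=1 means the series is differenced once internally.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
trend = np.cumsum(rng.normal(0.5, 1.0, 300))   # random walk with drift (non-stationary)

res = ARIMA(trend, order=(1, 1, 1)).fit()
print(res.summary())
print(res.forecast(steps=10))                  # forecast the next 10 points
```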
5. SARIMA (Seasonal ARIMA) Model
• Concept: SARIMA extends ARIMA with seasonal autoregressive, differencing, and moving
average terms to capture repeating patterns.
• Formula: SARIMA(p, d, q)(P, D, Q, s), where:
o P, D, Q: the seasonal counterparts of the non-seasonal orders p, d, q,
o s: the number of periods in each season (e.g., s = 12 for monthly data with
yearly seasonality).
• Use Case: Used for time series with both trend and seasonal patterns, like sales data with yearly
cycles.
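A minimal sketch of a seasonal fit via statsmodels' SARIMAX, assuming monthly data (s = 12) with a yearly cycle; the simulated series and the orders are illustrative assumptions.

```python
# Fit SARIMA(1,1,1)(1,1,1,12) to a series with trend + yearly seasonality.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(3)
t = np.arange(240)                                       # 20 years of monthly data
y = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, 240)

res = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
print(res.forecast(steps=12))                            # one full seasonal cycle ahead
```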
When to Use Each Model:
• AR: Use when past values of the series are important for predicting future values.
• MA: Use when past errors (random shocks) are important for prediction.
• ARMA: Use when both past values and past errors are important for prediction.
• ARIMA: Use for non-stationary data that needs to be differenced to become stationary.
• SARIMA: Use when both trend and seasonality are present in the data.
These models are widely used in finance, economics, environmental science, and other fields requiring
time series forecasting.
TRAINING AND VALIDATION
1. Training in AI
Training is the process where an AI model learns from data. It involves feeding the model a dataset
(called the training set) and adjusting the model’s parameters based on this data to minimize the error
between the model’s predictions and the actual values.
• Data Feeding: The model is provided with labeled data (input and corresponding output). For
supervised learning, this input-output pair is crucial for learning.
• Learning: The model uses algorithms (e.g., gradient descent) to iteratively adjust its internal
parameters (e.g., weights in a neural network) to improve predictions.
• Error Calculation: The model makes predictions on the training data, and the difference between
predicted and actual values (the loss or error) is computed.
• Optimization: The model’s parameters are adjusted to minimize the loss. Optimizers like
stochastic gradient descent (SGD) or Adam are used for this process.
• Repetition: This process repeats for several epochs (iterations over the dataset) until the model
converges (the error is minimized sufficiently or the performance plateaus).
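The loop below is a minimal sketch of these steps on a one-feature linear model: data feeding, error calculation, and gradient-based optimization repeated over epochs. The learning rate, epoch count, and synthetic data are illustrative assumptions.

```python
# Batch gradient descent minimizing mean squared error on y = 2x - 1 + noise.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 100)
y = 2.0 * x - 1.0 + rng.normal(0, 0.1, 100)   # labeled training data

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(200):                       # repetition over the dataset
    pred = w * x + b                           # prediction on training data
    err = pred - y                             # error calculation
    w -= lr * (2 * err * x).mean()             # optimization: gradient step on w
    b -= lr * (2 * err).mean()                 # optimization: gradient step on b
print(w, b)                                    # converges near (2.0, -1.0)
```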
2. Validation in AI
Validation is the process of evaluating the model's performance on a separate dataset (called the
validation set) during training. The key idea is to test the model on data it has not seen before to assess
how well it generalizes to unseen data. It helps prevent overfitting, where the model performs well on
training data but poorly on new data.
• Hold-out Validation Set: A portion of the data is set aside and not used for training. The model is
validated on this set during training.
• Evaluation Metrics: After training for a certain number of epochs, the model’s performance is
measured on the validation set using metrics like accuracy, precision, recall, F1-score, or loss.
• Early Stopping: If the model performs well on training data but starts performing poorly on the
validation set (i.e., validation loss increases), training may be stopped early. This prevents
overfitting.
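Below is a runnable sketch of a hold-out split plus early stopping on a tiny numpy model: training continues only while validation loss keeps improving within a patience window. The data, model, and hyperparameters (patience of 5 epochs) are illustrative assumptions.

```python
# Early stopping: halt training when held-out loss stops improving.
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, 120)
y = 2.0 * x + rng.normal(0, 0.2, 120)
x_tr, y_tr, x_val, y_val = x[:96], y[:96], x[96:], y[96:]   # 80/20 split

w, lr = 0.0, 0.05
best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(500):
    err = w * x_tr - y_tr
    w -= lr * (2 * err * x_tr).mean()              # gradient step on training set
    val_loss = ((w * x_val - y_val) ** 2).mean()   # evaluate on held-out data
    if val_loss < best_val - 1e-6:
        best_val, bad_epochs = val_loss, 0         # improved: reset the counter
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                 # no improvement for 5 epochs
            break
print(epoch, best_val)
```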
Why Validation Matters:
1. Preventing Overfitting: Validation helps detect overfitting, where the model memorizes the
training data but fails to generalize.
2. Model Selection: Different models (or variations of a model) can be trained and compared based
on their validation performance to choose the best one.
Common Practices:
• Train/Validation Split: A typical dataset might be split into 80% for training and 20% for
validation.
• Cross-Validation: Instead of using a single validation set, k-fold cross-validation involves splitting
the dataset into k subsets, using k-1 for training and 1 for validation in turns, to get more robust
performance estimates.
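A minimal sketch of 5-fold cross-validation with scikit-learn's KFold; the dataset and model are illustrative assumptions. Each fold serves once as the validation set while the other four train the model, and the scores are averaged.

```python
# 5-fold cross-validation: average held-out R^2 across folds.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(6)
X = rng.uniform(0, 1, size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 100)

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))   # R^2 on held-out fold
print(np.mean(scores))   # more robust estimate than a single split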
Together, training and validation work to ensure the AI model performs well both on the data it has seen
(training set) and the data it hasn’t (validation set), striking a balance between learning patterns and
maintaining generalizability.
AKAIKE INFORMATION CRITERION (AIC)
The Akaike Information Criterion (AIC) is a widely used metric in statistics and machine learning to
evaluate and compare different models. It helps to identify the model that best explains the data while
penalizing overfitting, balancing the trade-off between goodness of fit and model complexity.
1. Goodness of Fit: This refers to how well the model fits the data. Typically, models that have a
lower error or loss (such as residual sum of squares in regression) are considered to have a
better fit.
2. Model Complexity: More complex models (e.g., models with more parameters) can often fit the
data better but may lead to overfitting, where the model performs well on the training data but
poorly on unseen data. AIC penalizes models that are unnecessarily complex.
AIC = 2k − 2 ln(L)
Where:
• k is the number of estimated parameters in the model,
• L is the maximum likelihood of the model (how likely the model is given the data).
Explanation of Terms:
• Maximum Likelihood: L is a measure of how well the model explains the observed data. A
higher likelihood means the model fits the data better.
• Penalty for Complexity: The term 2k penalizes models with more parameters, preventing
overfitting by discouraging overly complex models.
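The formula translates directly to code. The helper below is a hypothetical convenience (most statistics libraries report AIC themselves), and the log-likelihood values are made up for illustration; note how the more complex model wins on fit but loses after the penalty.

```python
# Direct translation of AIC = 2k - 2 ln(L), taking ln(L) as input since
# software typically reports the log-likelihood directly.
def aic(k: int, log_likelihood: float) -> float:
    return 2 * k - 2 * log_likelihood

print(aic(k=3, log_likelihood=-120.5))   # 247.0
print(aic(k=6, log_likelihood=-118.0))   # 248.0: better fit, but penalized more
```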
Interpretation of AIC:
• Lower AIC values indicate a better model. The AIC score can only be interpreted relative to other
models: the model with the lowest AIC is generally considered the best.
• Model Comparison: AIC is primarily used to compare different models fitted to the same
dataset. It doesn't give an absolute measure of model quality, only a relative one.
AIC and Model Selection:
• Overfitting Prevention: By penalizing the number of parameters, AIC helps in selecting simpler
models that generalize better to unseen data, reducing the risk of overfitting.
• Trade-off: AIC tries to find a balance between goodness of fit and model complexity, but it does
not guarantee that the selected model is the most accurate for future predictions.
Applications of AIC:
• Time Series Models: AIC is often used to compare models like AR, MA, ARMA, ARIMA, and
SARIMA to select the best model for forecasting (see the sketch after this list).
• Regression Models: In linear regression, AIC can be used to compare models with different sets
of predictors.
• Machine Learning: AIC can be applied in model selection when dealing with probabilistic
models, though modern machine learning frameworks often rely on cross-validation for more
robust evaluation.
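As referenced in the time series bullet above, here is a hedged sketch of AIC-based order selection: fit several candidate ARIMA orders on the same series and keep the one with the lowest AIC. The series and the candidate grid are assumptions for the example.

```python
# Grid-search ARIMA orders by AIC; lower AIC = preferred model.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)
y = np.cumsum(rng.normal(0, 1, 300))       # toy non-stationary series

best = None
for p in range(3):
    for q in range(3):
        res = ARIMA(y, order=(p, 1, q)).fit()
        if best is None or res.aic < best[1]:
            best = ((p, 1, q), res.aic)    # keep the lowest-AIC candidate
print(best)
```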
Limitations of AIC:
• Sample Size: AIC might perform poorly for small sample sizes. In such cases, the corrected AIC
(AICc) is preferred, as it adjusts for small samples.
• Non-Nested Models: AIC is more effective when comparing nested models (i.e., models that can
be obtained by adding or removing parameters). It may not work as well for comparing
fundamentally different types of models.
Conclusion:
The Akaike Information Criterion is a powerful tool for model selection in both statistics and machine
learning. It balances goodness of fit against model simplicity, helping to avoid overfitting. It is
widely used in time series forecasting, regression, and probabilistic modeling.
MA(2)
In the context of time series analysis, MA(2) stands for a Moving Average model of order 2. This is a
specific instance of the broader Moving Average (MA) model, commonly used in statistics and AI for
time series forecasting.
Breakdown of MA(2):
1. Moving Average (MA) Model: The MA model expresses the value of a time series as a linear
combination of past error terms (random shocks or noise). Unlike the AutoRegressive (AR)
model, which depends on past values of the series itself, the MA model depends on past errors.
2. MA(q): In general, an MA(q) model is defined by the number of lagged error terms included in
the model; q is the order of the model. For example, in MA(2), the current value of the
series depends on the current error and the two most recent past errors.
Formula for MA(2):
Y_t = μ + ε_t + θ_1 ε_{t−1} + θ_2 ε_{t−2}
Where:
• μ is the mean of the series,
• ε_t is the current error term (random shock),
• θ_1 and θ_2 are the coefficients of the lagged error terms,
• ε_{t−1} and ε_{t−2} are the errors at times t−1 and t−2, respectively.
• The current value Y_t depends on the current random shock ε_t and the previous
two shocks (errors) ε_{t−1} and ε_{t−2}.
• The coefficients θ_1 and θ_2 control the influence of the past errors on the
current value.
Example of MA(2):
Suppose we are trying to model a time series of daily temperature fluctuations. An MA(2) model could
account for today's temperature Y_t by considering not only today's weather-related random
factor ε_t but also the random effects from the previous two days, ε_{t−1} and ε_{t−2}.
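To see the model in practice, the sketch below simulates an MA(2) process with known coefficients (θ_1 = 0.6, θ_2 = 0.3, chosen arbitrarily) and recovers them by fitting a pure MA(2), i.e., ARIMA(0, 0, 2), with statsmodels.

```python
# Simulate an MA(2) process and recover theta_1, theta_2 by fitting.
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

# statsmodels lag-polynomial convention: ma = [1, theta_1, theta_2]
process = ArmaProcess(ar=[1.0], ma=[1.0, 0.6, 0.3])
y = process.generate_sample(nsample=1000)

res = ARIMA(y, order=(0, 0, 2)).fit()    # p=0, d=0, q=2 -> pure MA(2)
print(res.params)                        # estimates near theta_1=0.6, theta_2=0.3
```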
Applications of MA(2):
• Time Series Forecasting: MA models like MA(2) are used to predict future values of a series
based on past errors. They can capture short-term dependencies and noise in the data.
• Stationary Time Series: MA models are best suited for time series that are stationary, meaning
they have a constant mean and variance over time.
• Financial Markets: In stock market analysis, moving average models can help in predicting price
movements based on random past fluctuations.
Advantages of MA(2):
• Simple Model: MA models are easier to interpret and apply compared to more complex models.
• Capturing Noise: MA(2) can capture the short-term correlations in the errors that affect the
series over time.
Limitations:
• Lag Dependency: The MA(2) model only considers the two most recent errors. For time series
with longer-term dependencies, higher-order models or different models (e.g., ARMA, ARIMA)
may be more appropriate.
• Stationarity Requirement: Like other moving average models, MA(2) assumes that the time
series is stationary.
Conclusion:
An MA(2) model in time series analysis expresses the current value of a series as a linear combination of
the last two error terms, plus a current error. It is useful for modeling short-term noise or fluctuations in
stationary time series data.
AR(3)
In the context of time series analysis, AR(3) stands for an AutoRegressive model of order 3. It is a
specific case of the more general AutoRegressive (AR) model, which predicts future values of a time
series based on its own past values.
Breakdown of AR(3):
1. AutoRegressive (AR) Model: The AR model assumes that the current value of a time series is a
linear combination of its previous values. This model is often used for forecasting and analyzing
time series data that exhibits autocorrelation (i.e., the values are correlated with past values).
2. AR(p): In general, an AR(p) model of order p predicts the current value of the time series
from its p previous values. In the case of AR(3), the current value depends on the three
most recent past values.
Formula for AR(3):
Y_t = c + φ_1 Y_{t−1} + φ_2 Y_{t−2} + φ_3 Y_{t−3} + ε_t
Where:
• Y_{t−1}, Y_{t−2}, Y_{t−3} are the three most recent past values of the series,
• φ_1, φ_2, φ_3 are the coefficients that quantify the impact of the three lagged values
on Y_t, i.e., the weight given to each past value in predicting the current one,
• ε_t is the error term capturing any random fluctuations not explained by the past values.
Example of AR(3):
Suppose you are modeling daily stock prices, and today's price Y_t depends on the prices from the
last three days, Y_{t−1}, Y_{t−2}, and Y_{t−3}. The AR(3) model would estimate
today's stock price using the influence of the previous three days' prices, weighted by their respective
coefficients φ_1, φ_2, φ_3.
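A hedged sketch of the same idea with synthetic data: simulate an AR(3) process with known coefficients and recover them with statsmodels' AutoReg. The φ values (0.5, 0.2, 0.1) are arbitrary illustrative choices that keep the series stationary.

```python
# Simulate an AR(3) process and recover phi_1, phi_2, phi_3 by fitting.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(8)
y = np.zeros(800)
for t in range(3, 800):   # true process: phi = (0.5, 0.2, 0.1)
    y[t] = 0.5 * y[t-1] + 0.2 * y[t-2] + 0.1 * y[t-3] + rng.normal()

res = AutoReg(y, lags=3).fit()
print(res.params)                                  # const, phi_1, phi_2, phi_3
print(res.predict(start=len(y), end=len(y) + 2))   # next 3 predicted values
```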
Applications of AR(3):
• Time Series Forecasting: AR(3) can be used to predict future values based on past observations.
It is useful for financial data, sales forecasting, weather prediction, etc.
• Data with Short-Term Memory: If the current value of a series is heavily influenced by a few
previous observations, an AR(3) model can capture these dependencies.
• Stationary Time Series: AR models generally assume that the time series is stationary, meaning
it has a constant mean and variance over time.
Advantages of AR(3):
• Capturing Temporal Dependencies: AR(3) effectively captures the influence of the last three
observations, making it suitable for time series with short-term autocorrelation.
• Simple Interpretation: The model is easy to interpret because it directly shows how previous
values affect the current value.
Limitations:
• Fixed Lag Order: AR(3) only accounts for the three most recent values. If longer-term
dependencies are present in the data, a higher-order model or a different model (like ARMA or
ARIMA) may be needed.
• Stationarity Requirement: The AR(3) model assumes the time series is stationary, so it might not
perform well on non-stationary data without additional transformations like differencing.
Conclusion:
An AR(3) model in time series analysis predicts the current value of a series based on the three most
recent past values. It is effective for short-term forecasting and capturing autocorrelation in stationary
time series data. By using the previous three values, the model can reveal how recent history impacts
current outcomes, making it useful in various applications, from finance to weather prediction.
ARMA(2,1)
ARMA(2,1) is a combined AutoRegressive Moving Average model used for time series forecasting and
analysis. It incorporates both the AutoRegressive (AR) component and the Moving Average (MA)
component, where the numbers refer to the order of each component:
• The AR(2) part represents an AutoRegressive model of order 2, meaning it uses the last two
past values of the time series to predict the current value.
• The MA(1) part represents a Moving Average model of order 1, meaning it uses the most recent
past error term to model the random shocks influencing the current value.
Breakdown of ARMA(2,1):
1. AutoRegressive (AR) Component: In AR(2), the current value of the series is a linear
combination of the previous two values. This part accounts for the dependence on the past
values of the series itself.
2. Moving Average (MA) Component: In MA(1), the model depends on the current error term
(random noise) and the error from the previous time step. This part helps capture shocks or
random noise in the system that impacts future values.
Formula for ARMA(2,1):
Y_t = c + φ_1 Y_{t−1} + φ_2 Y_{t−2} + ε_t + θ_1 ε_{t−1}
Where:
• Y_{t−1}, Y_{t−2} are the values of the time series at times t−1 and t−2
(previous observations),
• φ_1, φ_2 are the autoregressive coefficients that quantify the influence of past
values on the current value,
• θ_1 is the moving average coefficient that represents the influence of the past error on
the current value,
• ε_t and ε_{t−1} are the current and previous error terms (random shocks).
• AR(2) Component: The model predicts the current value Y_t based on the two most recent
past values Y_{t−1} and Y_{t−2}.
• MA(1) Component: The model also incorporates the current random noise ε_t and
the random shock from the previous time step ε_{t−1}, which allows it to account
for random fluctuations in the data.
Example of ARMA(2,1):
Suppose you're analyzing a time series of monthly sales data. An ARMA(2,1) model would predict the
sales for the current month, Y_t, based on the sales from the previous two months, Y_{t−1}
and Y_{t−2}, as well as the random noise in the current and previous months, ε_t and
ε_{t−1}.
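A minimal sketch along the same lines: since ARMA(2,1) is ARIMA with d = 0, it can be fit as ARIMA with order=(2, 0, 1) in statsmodels. The simulated coefficients (φ_1 = 0.6, φ_2 = 0.2, θ_1 = 0.4) are illustrative assumptions.

```python
# Simulate an ARMA(2,1) process and fit it as ARIMA(2, 0, 1).
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

# Lag-polynomial convention: ar = [1, -phi_1, -phi_2], ma = [1, theta_1]
process = ArmaProcess(ar=[1.0, -0.6, -0.2], ma=[1.0, 0.4])
y = process.generate_sample(nsample=1000)

res = ARIMA(y, order=(2, 0, 1)).fit()   # AR order 2, no differencing, MA order 1
print(res.params)                       # estimates near phi=(0.6, 0.2), theta=0.4
print(res.forecast(steps=3))            # predict the next three "months"
```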
Applications of ARMA(2,1):
• Financial Forecasting: ARMA(2,1) models are commonly used in financial time series, such as
stock prices or currency exchange rates, where both the recent past and random fluctuations are
important.
• Demand Forecasting: Retailers can use ARMA(2,1) models to predict future demand based on
past sales and market trends.
• Environmental Data: ARMA models can be applied to time series data like temperature, air
quality, or rainfall, where the past values and recent random variations both play a role in
determining the current outcome.
Advantages of ARMA(2,1):
• Capturing Both Patterns and Noise: The ARMA model combines both autoregressive and
moving average components, allowing it to capture relationships between past values and also
account for the noise in the system.
• Flexibility: ARMA(2,1) offers a good balance of model complexity while capturing important
dynamics in the data.
Limitations:
• Stationarity Requirement: ARMA models assume the time series is stationary (constant mean
and variance over time). If the series is non-stationary, transformations like differencing may be
needed before applying ARMA.
Conclusion:
An ARMA(2,1) model combines both the AR(2) (AutoRegressive of order 2) and MA(1) (Moving Average
of order 1) components to forecast time series data. It uses the last two observations and the last error
term to predict future values, making it useful for capturing both patterns in the data and random noise.