Quantitative Framework for Liquidity Imbalance Detection and Counterspoofing
α Futures Analytica
γ Quantis Lindell Capital
θ Statista Q
This paper presents an integrated framework for detecting and exploiting market manipulation, focusing specifically on spoofing and liquidity imbalances in high-frequency trading environments. We begin by defining key spoofing detection metrics, such as High Quoting Activity, Unbalanced Quoting, and Abnormal Cancellations, and introduce a Spoofing Identification Score built on machine learning models, including random forest classifiers. The next component introduces the Microprobability Metric, a Bayesian framework that combines limit order book data with forecasting techniques to predict small factor market movements.
Part II explores market imbalance scanners, starting with the Counterspoof Scanner, which classifies the different phases of spoofing cycles and determines optimal trade points based on court-validated spoofing activity data. The DEX-Array Scanner identifies aggressive order placement between the bid and ask levels, refining signals to detect shifts in market sentiment. Finally, the Trespass Scanner focuses on liquidity imbalances, leveraging proprietary noise filtering to identify high-confidence trade signals in volatile environments.
Our findings indicate that by integrating probabilistic models, machine learning, and proprietary noise filtering mechanisms, these tools provide a comprehensive solution for navigating high-frequency trading markets. The paper concludes by discussing implications for market transparency and proposing avenues for future research, including the integration of additional data sources and scalability in decentralized markets.
Acknowledgments
We would like to express our sincere gratitude to the following individuals for
their groundbreaking contributions to the field of market microstructure, which
have greatly influenced and inspired our research:
Dr. Martin David Gould and Dr. Julius Bonart: Their seminal work, “Queue Imbalance as a One-Tick-Ahead Price Predictor in a Limit Order Book” (December 2015), provided essential insights into price prediction using order book data. This research served as a key framework for the development of our microforecasting metric.
Dr. Bao Linh Do and Professor Tālis J. Putniņš: We are deeply indebted to their pivotal study, “Detecting Layering and Spoofing in Markets” (August 2023). Their contributions to spoofing detection have directly informed the creation of our Counterspoofing Scanner and the spoofing identification methodologies outlined in this paper.
Without the pioneering research of these individuals, the techniques and
advancements presented in this paper would not have been possible. We are
grateful for their contributions to the academic community and for laying the
foundation for further exploration in this vital area of market analysis.
Contents
2.16 Final Output: 0-100 Score for Long and Short
Part I
Market Manipulation Filters and Prediction Scoring Systems
Chapter 1
1.1 Introduction
Spoofing in the financial markets refers to the practice of placing large orders
on one side of the market to create a false impression of demand or supply, with
the intent of canceling those orders before they are executed. Spoofing distorts
market prices and creates unfair trading conditions. Detecting spoofing behavior
is vital for maintaining a fair market and for improving trading strategies.
$$HQ_{i,s}(d) = \max_{t \in s(d)} \frac{\lvert \mathrm{EntryAskSize}_{i,t} - \mathrm{EntryBidSize}_{i,t} \rvert}{\mathrm{AskSize}_{i,t} + \mathrm{BidSize}_{i,t}}$$
This metric is particularly useful in identifying spoofing because spoofers
frequently submit large orders to create an imbalance in the order book, which
is later canceled once their manipulation achieves the desired market reaction.
1.5 Metric 3: Abnormal Cancellations
Abnormal cancellations occur when large orders are rapidly withdrawn from one
side of the market, usually after the manipulator has achieved their intended
price movement. This metric captures that behavior:
$$AC_{i,s}(d) = \max_{t \in s(d)} \frac{\lvert \mathrm{CancelAskSize}_{i,t} - \mathrm{CancelBidSize}_{i,t} \rvert}{\mathrm{AskSize}_{i,t} + \mathrm{BidSize}_{i,t}}$$
By monitoring abnormal cancellations, we can detect when spoofers pull
their orders after influencing the market.
$$LE_{i,s}(d) = \max_{t \in s(d)} \frac{\lvert \mathrm{AskSizeLevel2to5}_{i,t} - \mathrm{BidSizeLevel2to5}_{i,t} \rvert}{\mathrm{AskSize}_{i,t} + \mathrm{BidSize}_{i,t}}$$
This metric helps pinpoint the moments when spoofing is likely occurring.
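High Quoting Activity, Abnormal Cancellations, and the level 2-5 metric $LE$ share one shape: the maximum, over snapshots $t \in s(d)$, of an absolute ask/bid difference divided by the total resting size $\mathrm{AskSize}_{i,t} + \mathrm{BidSize}_{i,t}$. The following is a minimal Python sketch under that reading. The snapshot field names (`entry_ask`, `cancel_bid`, `ask_l2to5`, and so on) are hypothetical stand-ins for whatever the data feed provides, and snapshots with zero resting size are simply skipped.

```python
from typing import Iterable, Mapping

Snapshot = Mapping[str, float]


def window_max_ratio(snapshots: Iterable[Snapshot], ask_key: str, bid_key: str) -> float:
    """max over t in s(d) of |ask - bid| / (AskSize + BidSize)."""
    best = 0.0
    for snap in snapshots:
        total = snap["ask_size"] + snap["bid_size"]
        if total <= 0:
            continue  # no resting liquidity: ratio undefined, skip this snapshot
        best = max(best, abs(snap[ask_key] - snap[bid_key]) / total)
    return best


def spoofing_metrics(window: Iterable[Snapshot]) -> dict:
    """Compute HQ, AC and LE for one instrument over one window s(d)."""
    snaps = list(window)  # materialize so the window can be scanned three times
    return {
        "HQ": window_max_ratio(snaps, "entry_ask", "entry_bid"),
        "AC": window_max_ratio(snaps, "cancel_ask", "cancel_bid"),
        "LE": window_max_ratio(snaps, "ask_l2to5", "bid_l2to5"),
    }


# Two illustrative snapshots (made-up sizes, not market data).
window = [
    {"ask_size": 100, "bid_size": 100, "entry_ask": 50, "entry_bid": 10,
     "cancel_ask": 40, "cancel_bid": 0, "ask_l2to5": 80, "bid_l2to5": 60},
    {"ask_size": 50, "bid_size": 150, "entry_ask": 0, "entry_bid": 100,
     "cancel_ask": 0, "cancel_bid": 20, "ask_l2to5": 30, "bid_l2to5": 110},
]
metrics = spoofing_metrics(window)  # HQ = 0.5, AC = 0.2, LE = 0.4
```

Because each metric keeps only the window maximum, a single extreme snapshot is enough to flag the window, which matches the intent of catching brief bursts of one-sided quoting or cancelling.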
Spoofers will often cancel their orders immediately after trades occur on the
opposite side, making this a key indicator.
1.11 Metric Integration with Machine Learning Classifiers
The Spoofing Detection Model combines traditional financial metrics with machine learning techniques to improve the accuracy of identifying manipulation patterns. In this section, we integrate two machine learning classifiers, Random Forest (RF) and Boosted Tree (BT), and combine their outputs into a final probability index.
Step 1: Data Preprocessing
Before training the classifiers, the input metrics must undergo a preprocessing phase. This includes:
• Normalization: Each metric is normalized to ensure that features have
a mean of 0 and a standard deviation of 1, reducing potential bias from
larger-scaled metrics.
• Feature Selection: Important metrics such as High Quoting Activity
(HQ), Unbalanced Quoting (UQ), and Abnormal Cancellations (AC) are
selected based on their predictive power. Correlation analysis is performed
to avoid multicollinearity.
• Class Labeling: Historical data is labeled as either “spoofing” or “non-spoofing” based on manual inspection or previous algorithmic models. This labeled dataset serves as the ground truth for training.
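The normalization and multicollinearity steps can be sketched with the Python standard library alone. The greedy rule below, which drops any feature whose absolute Pearson correlation with an already-kept feature exceeds `max_abs_corr = 0.9`, is a hypothetical choice for illustration; the cutoff is not specified in the text. In practice the means and standard deviations would be fitted on the training folds only.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Scale a feature column to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)  # a constant column carries no information
    return [(v - mu) / sigma for v in values]


def pearson(x, y):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def preprocess(features, max_abs_corr=0.9):
    """Z-score every column, then keep columns in order, skipping any column
    that is too correlated with a column already kept (hypothetical rule)."""
    normalized = {name: zscore(col) for name, col in features.items()}
    kept = {}
    for name, col in normalized.items():
        if all(abs(pearson(col, other)) <= max_abs_corr for other in kept.values()):
            kept[name] = col
    return kept


# UQ is an exact multiple of HQ in this toy data, so it is dropped as redundant.
raw = {"HQ": [1, 2, 3, 4], "UQ": [2, 4, 6, 8], "AC": [4, 1, 3, 2]}
selected = preprocess(raw)  # keeps "HQ" and "AC"
```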
Let the preprocessed feature vector for observation $i$ at time $s$ be denoted $X_{i,s}$.
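The boosted classifier aggregates its weak learners additively, $F(X_{i,s}) = \sum_{k=1}^{K} \alpha_k f_k(X_{i,s})$, where $K$ is the number of boosting iterations, $\alpha_k$ is the weight of the $k$-th model, and $f_k(X_{i,s})$ is the output of the $k$-th weak learner.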
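As a small illustration of the additive form $F(X_{i,s}) = \sum_{k=1}^{K} \alpha_k f_k(X_{i,s})$, the Python sketch below sums weighted weak-learner outputs. The two decision stumps, their thresholds, and the weights are hypothetical, and mapping $F$ to a probability with the logistic function is one common convention assumed here, not necessarily the calibration step used in this work.

```python
import math
from typing import Callable, Mapping, Sequence

Features = Mapping[str, float]


def boosted_score(x: Features, learners: Sequence[Callable[[Features], float]],
                  alphas: Sequence[float]) -> float:
    """F(x) = sum_k alpha_k * f_k(x) over K weak learners."""
    if len(learners) != len(alphas):
        raise ValueError("need exactly one weight alpha_k per weak learner f_k")
    return sum(a * f(x) for f, a in zip(learners, alphas))


def to_probability(score: float) -> float:
    """Logistic link (an assumed calibration) mapping F(x) into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical stumps voting +1 (spoofing) or -1 (non-spoofing).
stumps = [
    lambda x: 1.0 if x["HQ"] > 0.3 else -1.0,
    lambda x: 1.0 if x["AC"] > 0.2 else -1.0,
]
weights = [0.8, 0.4]
obs = {"HQ": 0.5, "AC": 0.1}
F = boosted_score(obs, stumps, weights)  # 0.8 * 1 + 0.4 * (-1) = 0.4
p_bt = to_probability(F)                 # about 0.599
```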
Step 4: Cross-Validation and Model Evaluation
Both models are evaluated using k-fold cross-validation on the training set. Among the evaluation metrics is the area under the ROC curve (AUC-ROC); the model with the higher AUC-ROC score receives a higher weight during the final integration step.
Step 5: Probability Integration
After training and evaluating both classifiers, their outputs are integrated into a combined probability index. The final probability of spoofing is computed as a weighted combination of the outputs from the Random Forest and Boosted Tree classifiers.
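The exact weighting formula is not reproduced in this section, so the following Python sketch assumes one plausible rule: each classifier's weight is proportional to its cross-validated AUC-ROC margin over chance ($\mathrm{AUC} - 0.5$), and the final spoofing probability is the weighted average of the two model probabilities. The AUC is computed with the Mann-Whitney formulation; `auc_weights` and its margin rule are hypothetical names and choices for illustration.

```python
from typing import Sequence, Tuple


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic (tied pairs count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one spoofing and one non-spoofing label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_weights(auc_rf: float, auc_bt: float) -> Tuple[float, float]:
    """Hypothetical rule: weight each model by its AUC margin over chance (0.5)."""
    m_rf, m_bt = max(auc_rf - 0.5, 0.0), max(auc_bt - 0.5, 0.0)
    total = m_rf + m_bt
    if total == 0.0:
        return 0.5, 0.5  # neither model beats chance: fall back to equal weights
    return m_rf / total, m_bt / total


def combined_spoofing_probability(p_rf: float, p_bt: float,
                                  w_rf: float, w_bt: float) -> float:
    """Weighted average of the Random Forest and Boosted Tree probabilities."""
    return w_rf * p_rf + w_bt * p_bt


# Illustrative numbers only: cross-validated AUCs of 0.90 (RF) and 0.70 (BT).
w_rf, w_bt = auc_weights(0.90, 0.70)                           # (2/3, 1/3)
p_spoof = combined_spoofing_probability(0.8, 0.5, w_rf, w_bt)  # 0.7
```

Using the margin over 0.5 rather than the raw AUC keeps a model that performs at chance from receiving any weight, while still giving the stronger model the larger share, as the text requires.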
1.12 Conclusion on the Spoofing Detection Model
The Spoofing Detection Model incorporates metrics like High Quoting Activity,
Unbalanced Quoting, and Abnormal Cancellations to calculate a spoofing probability score. A logit model is applied to produce this score based on real-time
data. The model relies on these combined metrics to identify spoofing patterns
across various timeframes, enabling the system to differentiate between organic
market behavior and manipulation. This structured approach allows for the
integration of multiple data points to observe market conditions during known
spoofing cycles and refine detection accuracy.
Chapter 2
2.2 Limit Order Book Data: Input to the Model
The LOB represents all outstanding buy and sell orders in the market at different
price levels. For this model, we focus on the top five levels on both sides of the
book—buy (bid) and sell (ask). These levels give insights into market liquidity
and supply-demand imbalances, key factors in short-term price movements. The
data from these levels, particularly the quantity and price, serve as the primary
inputs for our Bayesian model.
$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)} \tag{2.1}$$
Where:
• P (H|D) is the posterior probability, or the probability of the hypothesis H
(in this case, a price increase or decrease) given the data D (LOB inputs).
• P (H) is the prior probability, which represents our belief in a price move-
ment before observing new LOB data.
• P (D|H) is the likelihood, representing the probability of observing the
LOB data given that the hypothesis is true.
2.5 Step 1: Defining Prior Probabilities
To begin, we define the prior probabilities for both the long (price increase) and
short (price decrease) directions. These priors can be derived from historical
price movement patterns, where we observe the frequency of upward and downward price shifts over time. For example, if in a given market session prices increased 60% of the time and decreased 40% of the time, we can set the priors accordingly: $P(\text{Price Increase}) = 0.6$ and $P(\text{Price Decrease}) = 0.4$.
$$I_i = b_i - a_i$$
Summing across all five levels gives the total imbalance:
$$I_{\text{total}} = \sum_{i=1}^{5} I_i$$
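The per-level and total imbalance can be sketched in a few lines of Python. Here `bid_sizes[k]` and `ask_sizes[k]` are assumed to hold the resting quantities $b_i$ and $a_i$ at level $i = k + 1$ (index 0 being the best quote); those names are illustrative only.

```python
from typing import List, Sequence


def level_imbalances(bid_sizes: Sequence[float], ask_sizes: Sequence[float],
                     depth: int = 5) -> List[float]:
    """I_i = b_i - a_i for the top `depth` levels of the book."""
    if len(bid_sizes) < depth or len(ask_sizes) < depth:
        raise ValueError(f"need at least {depth} levels on each side")
    return [b - a for b, a in zip(bid_sizes[:depth], ask_sizes[:depth])]


def total_imbalance(bid_sizes: Sequence[float], ask_sizes: Sequence[float],
                    depth: int = 5) -> float:
    """I_total = sum of I_i over i = 1..depth."""
    return sum(level_imbalances(bid_sizes, ask_sizes, depth))


bids = [120, 80, 60, 40, 30]
asks = [100, 90, 50, 50, 20]
per_level = level_imbalances(bids, asks)  # [20, -10, 10, -10, 10]
i_total = total_imbalance(bids, asks)     # 20
```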
$$P(\text{Price Decrease} \mid D) = \frac{P(D \mid \text{Price Decrease})\,P(\text{Price Decrease})}{P(D)}$$
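Here, $P(D)$ is the marginal likelihood, computed by summing the prior-weighted likelihoods over both hypotheses:
$$P(D) = P(D \mid \text{Price Increase})\,P(\text{Price Increase}) + P(D \mid \text{Price Decrease})\,P(\text{Price Decrease})$$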
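Putting the prior, likelihoods, and marginal likelihood together, a minimal Python sketch of the two-hypothesis posterior follows. How $P(D \mid H)$ is estimated is left to the caller; the likelihood values in the example (0.3 and 0.1, as if read from a historical histogram of the observed imbalance bucket) are purely illustrative.

```python
from typing import Tuple


def posterior_up_down(prior_up: float, lik_up: float, lik_down: float) -> Tuple[float, float]:
    """Return (P(Price Increase | D), P(Price Decrease | D)) via Bayes' rule.

    prior_up -- P(Price Increase); P(Price Decrease) is taken as 1 - prior_up
    lik_up   -- P(D | Price Increase)
    lik_down -- P(D | Price Decrease)
    """
    if not 0.0 <= prior_up <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    prior_down = 1.0 - prior_up
    evidence = lik_up * prior_up + lik_down * prior_down  # P(D), the marginal likelihood
    if evidence == 0.0:
        raise ValueError("P(D) is zero: the observation is impossible under both hypotheses")
    return lik_up * prior_up / evidence, lik_down * prior_down / evidence


# Priors from the 60/40 session example; likelihoods are hypothetical.
p_up, p_down = posterior_up_down(prior_up=0.6, lik_up=0.3, lik_down=0.1)
# p_up = 0.18 / 0.22 (about 0.818), p_down about 0.182
```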
2.11 Step-by-Step Explanation of the 1-Tick Forecasting Metric
1. Market and Limit Orders in an LOB:
- Market orders are orders that are immediately matched with existing limit orders.
- Limit orders are orders that do not immediately match and instead become active orders in the LOB.
- The LOB $L(t)$ at any given time $t$ consists of all active buy and sell limit orders for a given asset.
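2. Bid Price and Ask Price:
- The bid price $b(t)$ is the highest price among all active buy orders:
$$b(t) = \max\{p_x : x \in L(t) \text{ is a buy order}\}$$
- The ask price $a(t)$ is the lowest price among all active sell orders:
$$a(t) = \min\{p_x : x \in L(t) \text{ is a sell order}\}$$
- The bid and ask prices are collectively referred to as the best quotes.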
3. Queue Sizes at the Best Quotes:
- The size of the buy queue at the bid price $b(t)$ is denoted $n_b(b(t), t)$, representing the total volume of buy orders at the bid:
$$n_b(b(t), t) = \sum_{x \in L(t),\, p_x = b(t)} \lvert \omega_x \rvert$$
- Similarly, the size of the sell queue at the ask price $a(t)$ is denoted $n_a(a(t), t)$, representing the total volume of sell orders at the ask:
$$n_a(a(t), t) = \sum_{x \in L(t),\, p_x = a(t)} \lvert \omega_x \rvert$$
where $p_x$ and $\omega_x$ denote the price and size of order $x$.
4. Queue Imbalance:
- The queue imbalance $I(t)$ compares the two best queues:
$$I(t) = \frac{n_b(b(t), t) - n_a(a(t), t)}{n_b(b(t), t) + n_a(a(t), t)}$$
- If $I(t)$ is close to 1, this suggests strong buying pressure (the buy queue is much larger than the sell queue).
- If $I(t)$ is close to -1, this suggests strong selling pressure (the sell queue is much larger than the buy queue).
- If $I(t)$ is close to 0, the buying and selling pressures are balanced.
5. Predicting the 1-Tick Move:
- The core hypothesis is that the queue imbalance $I(t)$ provides predictive power for the direction of the next mid-price movement (the mid-price being the average of the bid and ask prices).
- Positive Queue Imbalance: If $I(t) > 0$, indicating net buying pressure, the next price movement is more likely to be an upward tick (an increase in the mid-price).
- Negative Queue Imbalance: If $I(t) < 0$, indicating net selling pressure, the next price movement is more likely to be a downward tick (a decrease in the mid-price).
6. Scoring the 1-Tick Forecasting Metric:
- The 1-Tick Forecasting Metric assigns a probability score for both upward and downward price movements based on the queue imbalance.
- The probability of an upward 1-tick movement at time $t$ is given by:
$$MP_{\text{1-tick long}}(i, t) = \frac{I(t) + 1}{2}$$
- The probability of a downward 1-tick movement at time $t$ is given by:
$$MP_{\text{1-tick short}}(i, t) = \frac{1 - I(t)}{2}$$
- These probabilities rescale the queue imbalance from the range $[-1, 1]$ to the range $[0, 1]$, providing a score that represents the likelihood of a one-tick price movement in the long or short direction.
- A score close to 1 for $MP_{\text{1-tick long}}$ suggests a strong likelihood of an upward price movement, while a score close to 1 for $MP_{\text{1-tick short}}$ suggests a strong likelihood of a downward price movement.
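Steps 2 through 6 can be traced end to end in a short Python sketch. Each active limit order is represented by a hypothetical `Order` record with an integer tick price, an unsigned size $\lvert \omega_x \rvert$, and a side; integer ticks avoid floating-point equality problems when matching $p_x = b(t)$ or $p_x = a(t)$. The long score is $(I(t) + 1)/2$ and the short score is taken as its complement, $(1 - I(t))/2$.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Order:
    price: int   # price in integer ticks
    size: float  # unsigned order size |omega_x|
    side: str    # "buy" or "sell"


def best_quotes(book: Sequence[Order]) -> Tuple[int, int]:
    """b(t) = highest active buy price, a(t) = lowest active sell price."""
    bids = [o.price for o in book if o.side == "buy"]
    asks = [o.price for o in book if o.side == "sell"]
    if not bids or not asks:
        raise ValueError("both sides of the book must be non-empty")
    return max(bids), min(asks)


def queue_imbalance(book: Sequence[Order]) -> float:
    """I(t) = (n_b - n_a) / (n_b + n_a) using the queues at the best quotes."""
    b, a = best_quotes(book)
    n_b = sum(o.size for o in book if o.side == "buy" and o.price == b)
    n_a = sum(o.size for o in book if o.side == "sell" and o.price == a)
    return (n_b - n_a) / (n_b + n_a)


def one_tick_scores(imbalance: float) -> Tuple[float, float]:
    """Map I(t) in [-1, 1] to (MP_1tick_long, MP_1tick_short) in [0, 1]."""
    if not -1.0 <= imbalance <= 1.0:
        raise ValueError("queue imbalance must lie in [-1, 1]")
    return (imbalance + 1.0) / 2.0, (1.0 - imbalance) / 2.0


book = [
    Order(100, 30, "buy"), Order(100, 10, "buy"), Order(99, 50, "buy"),
    Order(101, 10, "sell"), Order(102, 40, "sell"),
]
imbalance = queue_imbalance(book)          # (40 - 10) / (40 + 10) = 0.6
mp_long, mp_short = one_tick_scores(imbalance)  # (0.8, 0.2)
```

Only the queues at the best quotes enter $I(t)$; the deeper orders at ticks 99 and 102 are ignored, which is what distinguishes this metric from the five-level imbalance used in the Bayesian model.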
• Measurement 1, $MP_{\text{Bayesian}}(i, t)$: The Bayesian ML Microprobability at time $t$, providing a predicted probability of an upward or downward price movement.
• Measurement 2, $MP_{\text{1-tick}}(i, t)$: The 1-Tick Forecasting Metric at time $t$, offering a complementary probability prediction based on queue imbalance.
The goal of the Kalman filter is to combine these two measurements to obtain a more accurate estimate of $x_t$, the true probability of a price movement.
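1. Prediction Step: The state at time $t$ is first predicted from the previous estimate:
$$\hat{x}_{t|t-1} = A\,\hat{x}_{t-1|t-1} + B u_{t-1}$$
Where:
• $\hat{x}_{t|t-1}$ is the predicted state at time $t$, given the state at time $t-1$.
• $A$ is the state transition matrix (in our case, this can be taken as 1, assuming a steady system).
• $B u_{t-1}$ represents any control input, which can be 0 if no external forces are applied.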
We also predict the uncertainty $P_t$ associated with the state:
$$P_{t|t-1} = A P_{t-1|t-1} A^\top + Q$$
Where:
• $P_{t|t-1}$ is the predicted uncertainty of the state at time $t$.
• $Q$ is the process noise covariance matrix, representing the uncertainty in the prediction model.
2. Update Step: In the update step, we correct the prediction using the actual measurements from both metrics, combining them based on their uncertainties.
$$K_t = P_{t|t-1} H^\top \left(H P_{t|t-1} H^\top + R\right)^{-1}$$
Where:
• $K_t$ is the Kalman gain, which adjusts how much weight to give to the measurements.
• $H$ is the measurement matrix, mapping the state space to the measurement space.
• $R$ is the measurement noise covariance matrix, representing the uncertainty in the measurements from the two metrics.
The Kalman filter then updates the state estimate based on the new measurements:
$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t\left(z_t - H\hat{x}_{t|t-1}\right)$$
Where:
• $z_t$ is the measurement vector containing the two metrics: $[MP_{\text{Bayesian}}(i, t), MP_{\text{1-tick}}(i, t)]^\top$.
• $\hat{x}_{t|t}$ is the updated estimate of the state, combining both metrics.
Finally, the uncertainty estimate is updated:
$$P_{t|t} = (I - K_t H) P_{t|t-1}$$
Where:
• $I$ is the identity matrix.
• $P_{t|t}$ is the updated uncertainty in the state estimate.
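For a scalar state (the probability of an upward move) observed by both metrics, $H = [1, 1]^\top$ and the filter reduces to a few scalar operations. The Python sketch below assumes $A = 1$, no control input, a diagonal $R$, and hypothetical noise values `q`, `r1`, `r2`; in practice these would be tuned on historical data. It inverts the $2 \times 2$ innovation covariance explicitly to stay dependency-free.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman2:
    """Scalar state x_t fused from two measurements z_t = [MP_Bayesian, MP_1tick]^T.

    Assumes A = 1, B u_{t-1} = 0, H = [1, 1]^T and R = diag(r1, r2) with r1, r2 > 0.
    All default values are hypothetical placeholders, not calibrated settings.
    """
    x: float = 0.5    # current state estimate x_{t-1|t-1}
    p: float = 1.0    # current state variance P_{t-1|t-1}
    q: float = 1e-4   # process noise Q
    r1: float = 0.02  # noise variance of MP_Bayesian
    r2: float = 0.05  # noise variance of MP_1tick

    def step(self, z1: float, z2: float) -> float:
        # Prediction step (A = 1, no control input).
        x_pred = self.x
        p_pred = self.p + self.q
        # Innovation covariance S = H P H^T + R (2 x 2, symmetric).
        s11, s12, s22 = p_pred + self.r1, p_pred, p_pred + self.r2
        det = s11 * s22 - s12 * s12
        # Kalman gain K = P H^T S^{-1}, with H^T S^{-1} = [s22 - s12, s11 - s12] / det.
        k1 = p_pred * (s22 - s12) / det
        k2 = p_pred * (s11 - s12) / det
        # Update step: state, then variance (K H = k1 + k2 for this H).
        self.x = x_pred + k1 * (z1 - x_pred) + k2 * (z2 - x_pred)
        self.p = (1.0 - (k1 + k2)) * p_pred
        return self.x


kf = ScalarKalman2()
for z_bayes, z_tick in [(0.70, 0.62), (0.74, 0.66), (0.71, 0.69)]:
    estimate = kf.step(z_bayes, z_tick)
```

With this $H$, $k_1, k_2 \ge 0$ and $k_1 + k_2 \le 1$, so each update is a convex combination of the prior estimate and the two measurements; the fused estimate therefore stays in $[0, 1]$ whenever its inputs do, and the less noisy metric automatically receives the larger gain.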
• $MP_{\text{final short}}(i, t)$ is the probability score for a short (downward) price movement, constrained between 0 and 100.
• $\hat{x}_{\max}$ is a normalization factor to scale the output to the 0-100 range.
These scores represent the combined and filtered predictions of the two met-
rics, offering a dynamic and adaptive tool for short-term market predictions.
Part II
Market Imbalance Scanners: Counterspoofing, Liquidity Imbalances, and Aggressive Order Identification
Chapter 3
Classification of Spoofing Phases and Determination of Optimal Trade Timing (Counterspoof Scanner)
spoofing trade to trigger, the spoofing score must exceed a threshold of 0.95, meaning the model assigns a probability above 95% that spoofing is occurring.
Once spoofing is detected, the next step is to check the **Microprobability
Congruity Metric** (if enabled). This further refines the decision by ensuring
that the expected price direction aligns with the spoofing behavior.
The model was trained using a supervised learning approach in which each phase of the spoofing cycle was labeled. After training and cross-validation, the model predicted spoofing phases with high accuracy based solely on order book behavior observable in real time.
• If the spoofer is building large orders on the ask side, a long trade is
triggered (expecting an upward movement once the spoof is removed).
• If the spoofer is stacking orders on the bid side, a short trade is initiated
(anticipating a downward price move).
• If the current part of the cycle is ambiguous or suggests minimal movement, the system may choose to wait until a clearer signal is generated.
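The trigger and direction rules can be summarized as a small decision function. In this Python sketch, the 0.95 threshold and the ask-to-long and bid-to-short mapping follow the text; representing the optional Microprobability Congruity check as "the microprobability in the trade direction must exceed that of the opposite direction" is an assumption, as are the parameter names.

```python
from typing import Optional


def counterspoof_signal(spoof_score: float, spoof_side: Optional[str],
                        mp_long: Optional[float] = None,
                        mp_short: Optional[float] = None,
                        threshold: float = 0.95) -> int:
    """Return +1 (long), -1 (short) or 0 (no trade / wait).

    spoof_side is "ask" or "bid" for the side where spoof orders are being
    built, or None when the cycle phase is ambiguous. Passing both mp_long and
    mp_short enables the (assumed) congruity check.
    """
    if spoof_score <= threshold or spoof_side not in ("ask", "bid"):
        return 0  # spoofing not confident enough, or ambiguous phase: wait
    direction = 1 if spoof_side == "ask" else -1  # ask-side spoof -> long, bid-side -> short
    if mp_long is None or mp_short is None:
        return direction  # congruity filter disabled
    agrees = mp_long > mp_short if direction == 1 else mp_short > mp_long
    return direction if agrees else 0


signal = counterspoof_signal(0.97, "ask", mp_long=0.7, mp_short=0.3)  # 1 (long)
```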
Figure 3.1: Impact of Spoofing Detection and Microforecasting Score on Sharpe Ratio
microforecasting significantly enhances its effectiveness. This synergy raises the Sharpe ratio markedly and demonstrates the importance of blending detection and forecasting mechanisms for optimal market performance.
Chapter 4
The scanner is designed to identify these rare occurrences precisely by filtering out the noise typically present in high-frequency trading environments. Although these events are infrequent, their detection is critical because of the significant price impact they can create. Isolating these aggressive orders allows the system to capture moments of market imbalance, which are often followed by rapid price changes.
By concentrating on these specific market conditions, the DEX-Array Scanner processes only high-confidence signals. This approach minimizes false positives and relies on careful filtering of market data to identify key market shifts reliably.
• For a buy order: We check whether the new bid price ($p_{\text{bid new}}$) has risen to the level of the aggressive order or higher.
• For a sell order: We check whether the new ask price ($p_{\text{ask new}}$) has fallen to the level of the aggressive order or lower.
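$$g_{\text{buy}}(p_{\text{order}}, p_{\text{bid new}}) = \begin{cases} 1 & \text{if } p_{\text{bid new}} \ge p_{\text{order}} \\ 0 & \text{otherwise} \end{cases}$$
$$g_{\text{sell}}(p_{\text{order}}, p_{\text{ask new}}) = \begin{cases} 1 & \text{if } p_{\text{ask new}} \le p_{\text{order}} \\ 0 & \text{otherwise} \end{cases}$$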
If the aggressive order causes the bid (for a buy) or the ask (for a sell) to
move to the order’s price, the algorithm confirms that the aggressive action
influenced the market.
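The verification rule amounts to two comparisons: a buy is confirmed when the new bid is at or above the order price, and a sell when the new ask is at or below it. A minimal Python sketch, with prices in integer ticks and the dispatch helper `confirms_aggression` introduced here for illustration:

```python
def g_buy(p_order: int, p_bid_new: int) -> int:
    """1 if the new best bid has risen to the aggressive buy price or higher."""
    return 1 if p_bid_new >= p_order else 0


def g_sell(p_order: int, p_ask_new: int) -> int:
    """1 if the new best ask has fallen to the aggressive sell price or lower."""
    return 1 if p_ask_new <= p_order else 0


def confirms_aggression(side: str, p_order: int, p_new_quote: int) -> bool:
    """Dispatch on order side; p_new_quote is the new bid (buy) or new ask (sell)."""
    if side == "buy":
        return g_buy(p_order, p_new_quote) == 1
    if side == "sell":
        return g_sell(p_order, p_new_quote) == 1
    raise ValueError("side must be 'buy' or 'sell'")


# A buy placed at tick 10051 inside a 10050/10053 spread is confirmed
# once the best bid moves up to 10051.
confirmed = confirms_aggression("buy", 10051, 10051)  # True
```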
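$$h(MP_{\text{long}}, MP_{\text{short}}, D_{\text{trade}}, SS, HF_{\Omega}) = \begin{cases} 1 & \text{if } SS < 0.4 \text{ and } MP_{\max} > 0.82 \text{ and } HF_{\Omega} > 0 \text{ and } D_{\text{trade}} = \begin{cases} 1 & \text{if } MP_{\text{long}} = MP_{\max} \\ -1 & \text{if } MP_{\text{short}} = MP_{\max} \end{cases} \\ 0 & \text{otherwise} \end{cases}$$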
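Read as code, the gate requires a low spoofing score $SS$, a confident microprobability, and a positive $HF_{\Omega}$ reading, with the trade direction taken from whichever microprobability is larger. This Python sketch assumes $MP_{\max} = \max(MP_{\text{long}}, MP_{\text{short}})$ and treats an exact tie as no trade; both are illustrative choices, and `hf_omega` is simply passed through as a number because its computation is not described here.

```python
from typing import Tuple


def dex_array_decision(mp_long: float, mp_short: float, ss: float, hf_omega: float,
                       mp_threshold: float = 0.82,
                       ss_threshold: float = 0.4) -> Tuple[int, int]:
    """Return (h, D_trade): h = 1 to trade, D_trade = +1 long / -1 short / 0 none."""
    mp_max = max(mp_long, mp_short)  # assumed definition of MP_max
    if mp_long == mp_short:
        return 0, 0  # no dominant direction (tie handling is an assumption)
    d_trade = 1 if mp_long == mp_max else -1
    if ss < ss_threshold and mp_max > mp_threshold and hf_omega > 0:
        return 1, d_trade
    return 0, 0


decision = dex_array_decision(mp_long=0.9, mp_short=0.1, ss=0.2, hf_omega=1.0)  # (1, 1)
```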
4.6 Conclusion on DEX-Array Scanner
In conclusion, the DEX-Array Scanner demonstrates a notable capacity to detect and act upon subtle yet impactful market phenomena. The combination of aggressive order detection, verification mechanisms, and the integration of advanced metrics such as the Microprobability Metric and Spoofing Score enables it to identify rare but meaningful shifts in the order book. While the scanner activates relatively infrequently, its precision in capturing these moments, supported by robust filtering processes, makes it a reliable component in the broader scope of algorithmic trading strategies. The efficacy of the DEX-Array Scanner lies not in its frequency of activation but in its ability to discern high-conviction trading opportunities within a complex and dynamic market structure.
Chapter 5
Liquidity Imbalance Detection and Scalability in High-Frequency Environments (Trespass Scanner)
imbalance detection is flexible, allowing traders to combine multiple tick levels in cases where the tick size is smaller (e.g., NQ or traditional stocks), making it adaptable to various market conditions.
The stacked liquidity imbalance between the bid and ask sides can be quan-
tified as:
$$I_{LOB} = \frac{\sum_{i=1}^{L} V_{\text{bid},i} - \sum_{i=1}^{L} V_{\text{ask},i}}{\sum_{i=1}^{L} V_{\text{bid},i} + \sum_{i=1}^{L} V_{\text{ask},i}}$$
This imbalance ratio measures the relative strength of the buy and sell sides
across multiple levels of the LOB. The logic behind tick stacking allows the user
to smooth out noise, particularly in instruments with smaller tick sizes, where
single-level imbalances may not provide reliable signals. By aggregating these
levels, we obtain a clearer picture of true liquidity conditions.
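A minimal Python sketch of the stacked ratio follows, where `levels` plays the role of $L$ and the volume lists start at the best quote. The example volumes are invented to show how stacking can change the reading: the best level alone leans toward the ask, while three stacked levels lean toward the bid.

```python
from typing import Sequence


def stacked_imbalance(bid_volumes: Sequence[float], ask_volumes: Sequence[float],
                      levels: int) -> float:
    """I_LOB over the first `levels` price levels on each side, in [-1, 1]."""
    if levels < 1 or levels > min(len(bid_volumes), len(ask_volumes)):
        raise ValueError("levels must be between 1 and the available book depth")
    v_bid = sum(bid_volumes[:levels])
    v_ask = sum(ask_volumes[:levels])
    total = v_bid + v_ask
    return 0.0 if total == 0 else (v_bid - v_ask) / total


bids = [10, 40, 30]
asks = [30, 10, 10]
single = stacked_imbalance(bids, asks, levels=1)   # (10 - 30) / 40 = -0.5
stacked = stacked_imbalance(bids, asks, levels=3)  # (80 - 50) / 130, about 0.231
```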
5.5 Trade Validation and Filtering
After detecting and filtering the liquidity imbalance, the Trespass Scanner validates the signal through a series of checks. First, the signal must align with
both the **Microprobability Metric** and the **Spoofing Probability Score**.
Let:
• $MP_{\text{dir}}$ be the Microprobability Metric in the detected trade direction (either long or short),
• $SP$ be the Spoofing Probability Score.
For a trade to be executed, the following conditions must be met:
1. $MP_{\text{dir}}$ must confirm the detected trade signal, i.e., the predicted probability of price movement in the desired direction must exceed a preset threshold,
2. $SP < 0.2$, meaning the likelihood of spoofing behavior must be less than 20%.
The decision to proceed with a trade is defined as:
$$h(I_{\text{filtered}}, MP_{\text{dir}}, SP) = \begin{cases} 1 & \text{if } MP_{\text{dir}} > 0.8 \text{ and } SP < 0.2 \\ 0 & \text{otherwise} \end{cases}$$
This rigorous validation ensures that only trades aligned with favorable probabilities are executed, thereby enhancing the accuracy of the strategy.
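The validation gate can be sketched in Python as follows. Choosing $MP_{\text{dir}}$ from the sign of the filtered imbalance (bid-heavy means long) and treating a zero imbalance as "no signal" are assumptions made for illustration; the 0.8 and 0.2 thresholds come from the decision rule $h(I_{\text{filtered}}, MP_{\text{dir}}, SP)$.

```python
def trespass_decision(i_filtered: float, mp_long: float, mp_short: float, sp: float,
                      mp_threshold: float = 0.8, sp_threshold: float = 0.2) -> int:
    """h = 1 if MP_dir > 0.8 and SP < 0.2 for the detected direction, else 0."""
    if i_filtered == 0:
        return 0  # no detected imbalance, nothing to validate
    mp_dir = mp_long if i_filtered > 0 else mp_short  # bid-heavy -> long, ask-heavy -> short
    return 1 if (mp_dir > mp_threshold and sp < sp_threshold) else 0


go = trespass_decision(i_filtered=0.3, mp_long=0.85, mp_short=0.15, sp=0.1)  # 1
```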
For users who have disabled the Microprobability filter, the exit conditions
rely solely on the spoofing score, which acts as the primary exit trigger when
market manipulation is detected.
Part III
Chapter 6
Conclusion
This paper has explored various methodologies for detecting market manipulation, predicting small factor movements, and classifying spoofing cycles in real-time trading environments. The integration of probabilistic models, machine learning, and proprietary filtering mechanisms provides an advanced toolkit for algorithmic trading strategies. Each chapter presented in this study builds upon existing financial research, with a focus on improving detection accuracy and trade validation across high-frequency trading scenarios.
The Spoofing Detection Model introduced in Part I emphasizes the identification of spoofing behavior through key market metrics such as High Quoting Activity, Unbalanced Quoting, and Cyclical Patterns in Depth. These metrics were shown to be effective at both the intraday and second-interval levels. The use of machine learning techniques, such as random forests and boosted trees, was demonstrated to significantly enhance the detection of spoofing, enabling the identification of manipulation with high accuracy.
The Microprobability Metric, utilizing Bayesian techniques and Kalman filtering, provided a robust framework for predicting market shifts at the tick level. This metric’s ability to refine trade entry points and validate signal strength demonstrates the power of combining probabilistic modeling with real-time limit order book data. The integration of 1-tick and 2-tick forecasting offered further layers of predictive insight, proving useful in both small and large factor environments.
In Part II, the Counterspoofing Scanner built on the insights from the Spoofing Detection Model by incorporating a machine learning classifier to detect phases of the spoofing cycle. This allowed for a more precise determination of trade timing, based on real-time spoofing activity. The preprocessing of court case data for model training provided the framework necessary to align spoofing cycle phases with optimal trade points. The interaction between market liquidity and spoofers’ forced market exits further validated the effectiveness of this model in high-frequency environments.
The DEX-Array Scanner capitalized on rare occurrences of aggressive limit orders between the bid and ask, providing a unique tool for detecting shifts in market sentiment. By isolating these aggressive orders, this scanner minimized noise and provided high-confidence signals that were validated through the Microprobability Metric and Spoofing Score.
Finally, the Trespass Scanner focused on detecting liquidity imbalances, using proprietary noise filtering techniques to stack tick levels and identify opportunities even in volatile environments. This scanner was responsible for a high volume of trades, leveraging its focus on imbalances to generate consistent alpha, with built-in risk management mechanisms for early trade exits.