SHUBHANGI
QUERYING OVER DISTRIBUTED TABLES IN CITUS
QUERYING DISTRIBUTED TABLES IN CITUS
CONTENTS
▸ Query execution Overview
▸ Typical Features
▸ Joins
▸ Limit pushdown
▸ Aggregates
▸ topN
▸ Views
▸ Query Processing
QUERY EXECUTION
▸ Standard PostgreSQL SELECT queries can be submitted to the co-ordinator
▸ Co-ordinator tasks
▸ Partition the query into query fragments to parallelise joins, aggregates, ordering, etc.
▸ Parallel execution speeds up the query
▸ Distribute fragments to worker nodes
▸ Monitor query execution on the worker nodes
▸ Collect and merge the results of the query fragments
▸ Return the result to the client
QUERY EXECUTION
▸ Extends PostgreSQL for distributed execution
[Figure: The co-ordinator receives the query, fragments it and sends Query Fragments 1, 2 and 3 to Workers 1, 2 and 3, which execute joins, grouping, ordering, etc. in parallel; the co-ordinator gathers and merges Results 1, 2 and 3 from the workers and returns the result to the client.]
TYPICAL FEATURES - JOINS
▸ Equi-joins on any number of tables are supported
▸ Joins over co-located tables
▸ Efficient join
▸ Executes in parallel
▸ Prunes away shards that cannot contribute to the result
▸ Joins with reference tables
▸ Repartition join
▸ Joins two tables on a non-distribution column
▸ Dynamic re-partitioning is performed
▸ The query optimiser determines which table to re-partition
▸ Re-partitioning requires shuffling data between nodes
▸ The table to re-partition is chosen to reduce network traffic
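To make the difference between a co-located join and a repartition join concrete, here is a minimal Python sketch. It assumes simple CRC32-modulo hash partitioning (Citus itself assigns hash ranges to shards) and in-memory lists as shards; `shard_of`, `distribute`, `colocated_join` and `repartition_join` are hypothetical names. For simplicity it always re-partitions the right-hand table, whereas according to the slide the Citus optimiser chooses which table to re-partition.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # illustrative shard count


def shard_of(value, num_shards=NUM_SHARDS):
    """Stand-in hash partitioning: stable CRC32 of the value, modulo the shard count."""
    return zlib.crc32(str(value).encode("utf-8")) % num_shards


def distribute(rows, column, num_shards=NUM_SHARDS):
    """Split rows (dicts) into shards by hashing the given distribution column."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_of(row[column], num_shards)].append(row)
    return shards


def local_join(left_rows, right_rows, left_key, right_key):
    """Ordinary hash join: the work one worker does for one pair of shards."""
    index = defaultdict(list)
    for right in right_rows:
        index[right[right_key]].append(right)
    return [{**left, **right} for left in left_rows for right in index.get(left[left_key], [])]


def colocated_join(left_shards, right_shards, left_key, right_key, num_shards=NUM_SHARDS):
    """Both tables are distributed on the join key, so shard i only needs shard i.
    Each iteration is independent and could run on a different worker in parallel."""
    result = []
    for i in range(num_shards):
        result.extend(local_join(left_shards.get(i, []), right_shards.get(i, []), left_key, right_key))
    return result


def repartition_join(left_shards, right_shards, left_key, right_key, num_shards=NUM_SHARDS):
    """The right table is distributed on another column: re-hash it on the join key
    first (the data shuffle); the join then proceeds like a co-located join."""
    all_right = [row for shard in right_shards.values() for row in shard]
    reshuffled = distribute(all_right, right_key, num_shards)
    return colocated_join(left_shards, reshuffled, left_key, right_key, num_shards)


# Hypothetical data: orders and customers, joined on customer_id.
orders = [{"order_id": i, "customer_id": i % 5} for i in range(20)]
customers = [{"customer_id": c, "name": f"c{c}"} for c in range(5)]

orders_by_customer = distribute(orders, "customer_id")
joined = colocated_join(orders_by_customer, distribute(customers, "customer_id"),
                        "customer_id", "customer_id")

# Same join, but customers are distributed on name, so they must be re-partitioned.
joined_repartitioned = repartition_join(orders_by_customer, distribute(customers, "name"),
                                        "customer_id", "customer_id")
print(len(joined), len(joined_repartitioned))  # 20 20
```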
TYPICAL FEATURES - LIMIT PUSHDOWN
▸ The SQL LIMIT clause
▸ Limits the number of rows returned in the result
▸ Citus pushes the LIMIT clause down to the worker nodes
▸ Reduces data transfer over the network
▸ Approximate LIMIT
▸ citus.limit_clause_row_fetch_count
▸ Limits the number of rows returned by each task
▸ Default: disabled
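A minimal sketch of both cases, assuming each shard is a plain Python list and using hypothetical function names. `pushdown_order_limit` shows why ORDER BY ... LIMIT k can be pushed down exactly: every worker returns only its k smallest rows and the co-ordinator merges them. `top_groups` imitates the idea behind citus.limit_clause_row_fetch_count for a query whose LIMIT cannot be pushed down exactly (grouping on a non-distribution column): capping the rows each task returns cuts network traffic but can change the answer.

```python
import heapq
from collections import Counter


def pushdown_order_limit(shards, k):
    """ORDER BY value LIMIT k: each worker returns only its k smallest values,
    and merging those sorted partial lists gives the exact global answer."""
    partials = [sorted(rows)[:k] for rows in shards]   # runs on the workers
    return list(heapq.merge(*partials))[:k]            # runs on the co-ordinator


def top_groups(shards, k, row_fetch_count=None):
    """GROUP BY item ORDER BY count(*) DESC LIMIT k, grouped on a non-distribution column.

    With row_fetch_count=None every worker sends all of its partial counts (exact).
    With a number, each worker sends only its top row_fetch_count groups:
    less data over the network, but the result may be approximate.
    """
    totals = Counter()
    for rows in shards:
        partial = Counter(rows).most_common(row_fetch_count)  # runs on each worker
        totals.update(dict(partial))
    return totals.most_common(k)


values_by_shard = [[5, 1, 9, 3], [8, 2, 7], [6, 4, 0]]
print(pushdown_order_limit(values_by_shard, 3))  # [0, 1, 2]

items_by_shard = [
    ["a"] * 8 + ["b"] * 6 + ["c"] * 1,
    ["c"] * 7 + ["b"] * 6 + ["d"] * 5 + ["a"] * 1,
]
print(top_groups(items_by_shard, 2))                     # [('b', 12), ('a', 9)]
print(top_groups(items_by_shard, 2, row_fetch_count=1))  # [('a', 8), ('c', 7)]
```

With one row per task, item b is never reported because it is second on both shards, even though it has the highest total.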
TYPICAL FEATURES - AGGREGATE FUNCTIONS
▸ Query planner
▸ Transforms aggregates into an associative and commutative form
▸ Enables aggregates to be computed in parallel
▸ count(distinct)
▸ On the distribution column
▸ The count operation is pushed down to the worker nodes
▸ e.g. if sales is distributed on the city column,
▸ then for the query SELECT count(DISTINCT item_sold), city FROM sales GROUP BY city
▸ each worker node computes the count over its shards and returns it to the co-ordinator
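The sketch below, under simplified assumptions (shards as lists of dicts, hypothetical function names), shows both ideas on this slide: avg() rewritten as a combinable (sum, count) pair, and count(distinct) grouped by the distribution column, where each shard's answer is already final. Averaging the per-shard averages in the example would give 6.0 instead of the correct 4.0, which is why the rewrite is needed.

```python
from collections import defaultdict


def avg_partial(shard_values):
    """Worker side: avg(x) is rewritten as (sum(x), count(x)), which can be combined
    associatively and commutatively across shards."""
    return sum(shard_values), len(shard_values)


def avg_combine(partials):
    """Co-ordinator side: add up the partial sums and counts, then divide once."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


def count_distinct_grouped_by_dist_column(shards, group_col, value_col):
    """GROUP BY the distribution column: each group lives in exactly one shard, so every
    worker's count(distinct) is already final and the co-ordinator only collects them."""
    result = {}
    for rows in shards:  # one iteration = one worker task
        seen = defaultdict(set)
        for row in rows:
            seen[row[group_col]].add(row[value_col])
        result.update({group: len(values) for group, values in seen.items()})
    return result


print(avg_combine([avg_partial([1, 2, 3]), avg_partial([10])]))  # 4.0

# Hypothetical sales shards, distributed on city (each city sits in one shard).
sales_shards = [
    [{"city": "Pune", "item_sold": "pen"}, {"city": "Pune", "item_sold": "pen"},
     {"city": "Pune", "item_sold": "ink"}],
    [{"city": "Delhi", "item_sold": "pen"}],
]
print(count_distinct_grouped_by_dist_column(sales_shards, "city", "item_sold"))  # {'Pune': 2, 'Delhi': 1}
```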
TYPICAL FEATURES - AGGREGATE FUNCTIONS CONT…
▸ count(distinct)
▸ On a non-distribution column
▸ Each worker node runs SELECT DISTINCT
▸ Workers return the distinct lists to the co-ordinator
▸ The co-ordinator computes the final count
▸ e.g. if sales is distributed on the city column,
▸ then for the query SELECT count(DISTINCT orders), item_sold FROM sales GROUP BY item_sold
▸ each worker selects the distinct (item_sold, orders) rows and sends them to the co-ordinator, which computes the count over the received rows
▸ If the distinct lists on the worker nodes are large, queries are slow because of the volume of data transferred
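When grouping on a non-distribution column, the same (group, value) pair can occur on several shards, so per-shard counts cannot simply be added. A minimal sketch of the co-ordinator-side union, assuming shards are lists of dicts and using hypothetical function names; in the example, adding per-shard counts would report 3 distinct orders for pen instead of 2.

```python
from collections import defaultdict


def worker_distinct_pairs(rows, group_col, value_col):
    """Worker side: SELECT DISTINCT group_col, value_col over the local shard."""
    return {(row[group_col], row[value_col]) for row in rows}


def coordinator_count_distinct(shards, group_col, value_col):
    """Co-ordinator side: union the distinct pairs from every worker, then count per group.

    Summing per-shard counts would be wrong here, because the same
    (group, value) pair can appear on several shards.
    """
    pairs = set()
    for rows in shards:
        pairs |= worker_distinct_pairs(rows, group_col, value_col)  # the network transfer
    counts = defaultdict(int)
    for group, _ in pairs:
        counts[group] += 1
    return dict(counts)


# Hypothetical sales shards, distributed on city.
sales_shards = [
    [{"city": "Pune", "item_sold": "pen", "orders": 1},
     {"city": "Pune", "item_sold": "pen", "orders": 2}],
    [{"city": "Delhi", "item_sold": "pen", "orders": 2},
     {"city": "Delhi", "item_sold": "ink", "orders": 3}],
]
result = coordinator_count_distinct(sales_shards, "item_sold", "orders")
print(sorted(result.items()))  # [('ink', 1), ('pen', 2)]
```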
TYPICAL FEATURES - AGGREGATE FUNCTIONS CONT…
▸ count(distinct)
▸ On a non-distribution column
▸ Multiple aggregates in a single query
▸ The SELECT DISTINCT statement on each worker produces a cross product of the distinct values, lowering performance
▸ Resolution: approximate count
▸ HyperLogLog extension (hll)
▸ citus.count_distinct_error_rate
▸ Lower value: higher accuracy but more computation time
▸ Recommended value: 0.005
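To illustrate why HyperLogLog removes the need to ship distinct lists, here is a from-scratch Python sketch of the HyperLogLog idea; it is not the hll extension's implementation. Each worker builds a small fixed-size sketch of its shard, and the co-ordinator merges sketches by taking the register-wise maximum. In this sketch accuracy is set by `p` (2^p registers, standard error roughly 1.04/sqrt(2^p)) rather than by citus.count_distinct_error_rate; the SHA-1 hashing and the class name are assumptions.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog: 2**p registers, each holding the longest run of
    leading zeros (plus one) seen among the hashes routed to it."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value):
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")  # first 64 bits of SHA-1

    def add(self, value):
        h = self._hash64(value)
        bits = 64 - self.p
        index = h >> bits                     # top p bits choose the register
        rest = h & ((1 << bits) - 1)          # remaining bits
        rank = bits - rest.bit_length() + 1   # position of the first 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """Combine two sketches (e.g. from two workers): register-wise maximum."""
        if other.p != self.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting for small sets
        return raw


worker1, worker2 = HyperLogLog(p=10), HyperLogLog(p=10)
for i in range(6000):
    worker1.add(i)           # shard on worker 1 holds values 0..5999
for i in range(4000, 10000):
    worker2.add(i)           # shard on worker 2 holds values 4000..9999 (overlap)
merged = worker1.merge(worker2)   # co-ordinator: register-wise max
print(round(merged.estimate()))   # close to the true 10000 distinct values
```

Only 2^p small integers per worker cross the network, regardless of how many distinct values each shard holds.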
TYPICAL FEATURES - TOPN
▸ topN
▸ The topn extension computes approximate top-N results
▸ Data is materialised into JSON
▸ Function: topn_add
▸ Updates the JSON with the item being added and its count
▸ The number of items tracked is controlled by topn.number_of_counters
▸ Default value: 1000
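A minimal Python sketch of the topN idea, assuming a JSON object of item counters as the materialised summary. `topn_add`, `topn_union` and `topn` here are Python analogues written for illustration, not the extension's API, and the pruning threshold (twice the counter limit) is an assumption; `NUMBER_OF_COUNTERS` stands in for topn.number_of_counters and is kept tiny for the demo.

```python
import json
from collections import Counter

NUMBER_OF_COUNTERS = 3  # hypothetical stand-in for topn.number_of_counters (default 1000)


def _prune(counts):
    """Keep the summary bounded; the 2x threshold is an illustrative assumption."""
    if len(counts) > 2 * NUMBER_OF_COUNTERS:
        counts = Counter(dict(counts.most_common(NUMBER_OF_COUNTERS)))
    return counts


def topn_add(summary_json, item):
    """Return a new JSON summary with the item's counter incremented."""
    counts = Counter(json.loads(summary_json))
    counts[item] += 1
    return json.dumps(_prune(counts))


def topn_union(a_json, b_json):
    """Merge two summaries (e.g. one per worker) by adding their counters."""
    counts = Counter(json.loads(a_json)) + Counter(json.loads(b_json))
    return json.dumps(_prune(counts))


def topn(summary_json, n):
    """Return the n most frequent items as (item, count) pairs."""
    return Counter(json.loads(summary_json)).most_common(n)


worker1_summary = "{}"
for item in ["a"] * 5 + ["b"] * 3 + ["c"]:
    worker1_summary = topn_add(worker1_summary, item)
worker2_summary = "{}"
for item in ["b"] * 4 + ["d"] * 2:
    worker2_summary = topn_add(worker2_summary, item)

combined = topn_union(worker1_summary, worker2_summary)
print(topn(combined, 2))  # [('b', 7), ('a', 5)]
```

Because the summary is pruned, counts for rarely seen items can be dropped, which is what makes the result approximate.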
TYPICAL FEATURES - VIEWS
▸ Views are treated as subqueries
▸ Materialised views
▸ Stored on the co-ordinator node as local tables
QUERY PROCESSING
▸ Data: sharded and replicated across the worker nodes
▸ Metadata: stored on the co-ordinator node
▸ Query processing pipeline
▸ Distributed Query Planner and Executor
▸ PostgreSQL Planner and Executor
QUERY PROCESSING
▸ Distributed Query Planner
▸ Plans SQL for distributed execution
▸ Two types of query fragments
▸ Fragments that execute on the co-ordinator node
▸ Fragments that execute on the worker nodes
▸ Tasks
▸ Creates the plan tree
▸ Transforms the tree into a commutative and associative form for parallel execution
▸ Decides which shards the query fragments are routed to
▸ Rewrites the query to reference shard tables instead of the original table
▸ Assigns fragments to worker nodes for efficient resource usage
▸ Optimises the query to reduce network I/O
▸ Passes the distributed plan to the Distributed Query Executor
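Shard pruning and query rewriting can be sketched as follows. The metadata (shard ids and their worker placements), the CRC32-based routing and the function names are all hypothetical; the sketch only assumes that shard tables are named `<table>_<shardid>`, as in Citus, and handles just single-table queries.

```python
import zlib

# Hypothetical co-ordinator metadata: shard id -> worker node holding it.
SHARD_PLACEMENTS = {102008: "worker1", 102009: "worker2", 102010: "worker3", 102011: "worker1"}
SHARD_IDS = sorted(SHARD_PLACEMENTS)


def shard_for_value(value):
    """Stand-in routing: CRC32 modulo shard count (Citus itself assigns hash ranges to shards)."""
    return SHARD_IDS[zlib.crc32(str(value).encode("utf-8")) % len(SHARD_IDS)]


def plan_fragments(select_list, table, where_sql, dist_value=None):
    """Return (worker, fragment_sql) pairs for a single-table query.

    If the WHERE clause fixes the distribution column to one value, every other shard
    is pruned; otherwise there is one fragment per shard. Each fragment references the
    shard table <table>_<shardid> instead of the original table.
    """
    shard_ids = SHARD_IDS if dist_value is None else [shard_for_value(dist_value)]
    return [
        (SHARD_PLACEMENTS[sid], f"SELECT {select_list} FROM {table}_{sid} WHERE {where_sql}")
        for sid in shard_ids
    ]


for worker, sql in plan_fragments("count(*)", "sales", "amount > 100"):
    print(worker, sql)  # four fragments, one per shard
print(plan_fragments("count(*)", "sales", "city = 'Pune'", dist_value="Pune"))  # one fragment
```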
QUERY PROCESSING
▸ Distributed Query Executor
▸ Resides on the co-ordinator
▸ Performs the following tasks
▸ Runs the distributed query plan
▸ Connects to the worker nodes
▸ Sends tasks to the workers
▸ Oversees execution
▸ Handles failures
▸ If a task fails on a particular worker node, the executor reassigns it to another worker node where a replica exists
▸ Only the failed sub-query is re-executed
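A minimal sketch of the failover rule, assuming each shard has a list of placements (replicas) and simulating an unavailable worker with an exception. Tasks run one after another here for readability, whereas the executor would run them in parallel; all names are hypothetical.

```python
class WorkerUnavailable(Exception):
    """Raised by the simulated connection when a worker node is down."""


def run_on_worker(worker, shard_id, down_workers):
    """Hypothetical stand-in for sending one task to one shard placement."""
    if worker in down_workers:
        raise WorkerUnavailable(worker)
    return f"result of shard {shard_id} from {worker}"


def execute_with_failover(tasks, down_workers):
    """tasks maps shard id -> list of workers holding a replica of that shard.

    Each task goes to its first placement. Only a task whose worker fails is
    retried, on the next replica; tasks that succeeded are never re-run.
    """
    results, attempts = {}, []
    for shard_id, placements in tasks.items():
        for worker in placements:
            attempts.append((shard_id, worker))
            try:
                results[shard_id] = run_on_worker(worker, shard_id, down_workers)
                break
            except WorkerUnavailable:
                continue  # fall through to the next replica
        else:
            raise RuntimeError(f"no healthy placement for shard {shard_id}")
    return results, attempts


tasks = {1: ["worker1", "worker2"], 2: ["worker2", "worker3"], 3: ["worker3", "worker1"]}
results, attempts = execute_with_failover(tasks, down_workers={"worker2"})
print(attempts)  # [(1, 'worker1'), (2, 'worker2'), (2, 'worker3'), (3, 'worker3')]
```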
QUERY PROCESSING
▸ Distributed Query Executor
▸ Executor Types
▸ Real Time (simple SELECT, INSERT, UPDATE and DELETE), Router (data co-located on a single worker node), Task Tracker (larger SELECT queries)
▸ Selected dynamically based on the structure of the query
▸ A single input query can use more than one executor, since subqueries are examined recursively
▸ EXPLAIN output shows the executor type used
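The selection logic can be illustrated with a deliberately simplified Python model. The rules in `choose_executor` mirror the slide's descriptions and are assumptions, not Citus's actual planner code; `QueryNode` is a hypothetical stand-in for a parsed query, and the recursive walk shows how one query can end up using several executors.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryNode:
    """Hypothetical, heavily simplified description of a query or subquery."""
    shards_touched: int              # shards remaining after pruning
    needs_repartition: bool = False  # e.g. a join on a non-distribution column
    subqueries: List["QueryNode"] = field(default_factory=list)


def choose_executor(node):
    """Illustrative rules based on the slide, not Citus's actual selection logic."""
    if node.needs_repartition:
        return "task-tracker"
    if node.shards_touched == 1:
        return "router"
    return "real-time"


def executors_for(node):
    """Examine subqueries recursively, so one input query may use several executors."""
    chosen = [choose_executor(node)]
    for sub in node.subqueries:
        chosen.extend(executors_for(sub))
    return chosen


query = QueryNode(
    shards_touched=32,
    subqueries=[QueryNode(shards_touched=1), QueryNode(shards_touched=32, needs_repartition=True)],
)
print(executors_for(query))  # ['real-time', 'router', 'task-tracker']
```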
QUERY PROCESSING
▸ Distributed Query Executor
▸ Real Time Executor (default executor)
▸ Suited to fast responses for queries containing aggregations, co-located joins and filters
▸ Steps
▸ Opens one connection to the worker node per shard
▸ Sends the query fragments
▸ Fetches the results of all fragments and merges them
▸ Sends the result back to the user
▸ Drawback: file descriptor (FD) and connection limits can be reached because of the many open connections to shards
▸ Increase OS limits
▸ INSERT/UPDATE/DELETE: if a modification fails on a shard, the executor marks that replica as invalid
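The Real Time executor's steps map onto a scatter-gather pattern. In this sketch each thread stands in for one connection per shard, and the shard contents are made up; sizing the pool to the number of shards also shows why this approach can exhaust file descriptors and connections on clusters with many shards.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shard contents, keyed by (worker, shard id).
SHARD_DATA = {
    ("worker1", 1): [3, 5], ("worker1", 2): [7],
    ("worker2", 3): [1, 1, 1], ("worker2", 4): [],
    ("worker3", 5): [10], ("worker3", 6): [2, 4],
}


def run_fragment(placement):
    """Stand-in for one connection running a fragment such as
    SELECT sum(x), count(x) FROM <shard> and returning one partial row."""
    values = SHARD_DATA[placement]
    return sum(values), len(values)


def real_time_execute(placements):
    """Open one 'connection' (thread) per shard, run all fragments concurrently,
    then merge the partial results on the co-ordinator."""
    with ThreadPoolExecutor(max_workers=len(placements)) as pool:
        partials = list(pool.map(run_fragment, placements))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total, count


print(real_time_execute(list(SHARD_DATA)))  # (34, 9)
```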
QUERY PROCESSING
▸ Distributed Query Executor
▸ Real Time Executor (Default executor)
[Figure: The Real Time Executor on the co-ordinator drives Shard1, Shard2 and Shard3 executions on each of Workers 1, 2 and 3.]
QUERY PROCESSING
▸ Distributed Query Executor
▸ Router Executor
▸ When all the required data exists on a single node, the query is sent to that single worker node for execution
▸ Advantage: 100% SQL coverage
▸ Drawback: loss of parallelism
QUERY PROCESSING
▸ Distributed Query Executor
▸ Task Tracker Executor
▸ Suited to long-running, complex data warehouse (DW) queries that require re-partitioning and shuffling of intermediate data
▸ Opens one connection per worker node and sends all query fragments to the task tracker daemon on that node
▸ The task tracker daemon on each worker node schedules new tasks and monitors their execution
▸ The task tracker executor on the co-ordinator checks with the daemons on the worker nodes about fragment execution
▸ citus.max_running_tasks_per_node: maximum number of tasks that can execute concurrently on any worker node
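A minimal Python sketch of the task tracker arrangement, assuming threads in place of worker processes. Each hypothetical `TaskTrackerDaemon` queues the tasks it receives and runs at most `max_running_tasks` concurrently, in the spirit of citus.max_running_tasks_per_node, while the co-ordinator polls every daemon for task status until all fragments finish. Round-robin assignment and the polling interval are illustrative assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class TaskTrackerDaemon:
    """Hypothetical worker-side daemon: queues tasks and runs at most
    max_running_tasks of them at once."""

    def __init__(self, max_running_tasks):
        self._pool = ThreadPoolExecutor(max_workers=max_running_tasks)
        self._lock = threading.Lock()
        self._status = {}
        self._running = 0
        self.peak_running = 0

    def submit(self, task_id, fn):
        with self._lock:
            self._status[task_id] = "queued"
        return self._pool.submit(self._run, task_id, fn)

    def _run(self, task_id, fn):
        with self._lock:
            self._running += 1
            self.peak_running = max(self.peak_running, self._running)
            self._status[task_id] = "running"
        state = "failed"
        try:
            result = fn()
            state = "done"
            return result
        finally:
            with self._lock:
                self._running -= 1
                self._status[task_id] = state

    def status(self, task_id):
        with self._lock:
            return self._status[task_id]

    def shutdown(self):
        self._pool.shutdown()


def make_task(i):
    """A stand-in query fragment that takes a little time."""
    def task():
        time.sleep(0.01)
        return i * i
    return task


def task_tracker_execute(tasks, num_workers, max_running_tasks_per_node):
    """Co-ordinator side: hand every task to a worker's daemon, poll the daemons
    until nothing is queued or running, then collect the results."""
    daemons = [TaskTrackerDaemon(max_running_tasks_per_node) for _ in range(num_workers)]
    submitted = {}
    for n, (task_id, fn) in enumerate(tasks):
        daemon = daemons[n % num_workers]  # round-robin assignment
        submitted[task_id] = (daemon, daemon.submit(task_id, fn))
    while any(d.status(t) in ("queued", "running") for t, (d, _) in submitted.items()):
        time.sleep(0.005)  # periodic status check
    results = {t: future.result() for t, (_, future) in submitted.items()}
    for d in daemons:
        d.shutdown()
    return results, max(d.peak_running for d in daemons)


results, peak = task_tracker_execute(
    [(f"task{i}", make_task(i)) for i in range(8)], num_workers=2, max_running_tasks_per_node=2
)
print(results["task3"], peak <= 2)  # 9 True
```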
QUERY PROCESSING
▸ Distributed Query Executor
▸ Task Tracker Executor
[Figure: The Task Tracker Executor on the co-ordinator communicates with a Task Tracker Daemon on each of Workers 1, 2 and 3; each daemon runs Task executions 1 to n.]
QUERY PROCESSING
▸ Subquery / CTE push-pull execution
▸ Citus gathers the results of subqueries from the worker nodes and sends them to the worker nodes executing the outer query
▸ Different executor types can be mixed within one query
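Push-pull execution can be sketched with an `IN (subquery)` filter whose subquery (the k most frequent items) cannot be answered shard by shard. The data and function names are hypothetical: the subquery is pulled to the co-ordinator and finished there, and its result is pushed back to every worker as a filter for the outer query.

```python
from collections import Counter

# Hypothetical sales shards: each list holds the item_sold values on one worker.
SALES_SHARDS = [["pen", "ink", "pen", "pad"], ["ink", "ink", "cap"], ["pen", "cap", "ink"]]


def pull_subquery(shards, k):
    """Pull: workers compute partial counts, the co-ordinator merges them and
    finishes the subquery (the k most frequent items overall)."""
    totals = Counter()
    for rows in shards:
        totals.update(Counter(rows))  # partial result from one worker
    return {item for item, _ in totals.most_common(k)}


def push_outer_query(shards, intermediate):
    """Push: the intermediate result is sent to every worker, which uses it to
    filter its part of the outer query (here: count matching rows)."""
    partials = [sum(1 for item in rows if item in intermediate) for rows in shards]
    return sum(partials)


top_items = pull_subquery(SALES_SHARDS, 2)
print(sorted(top_items), push_outer_query(SALES_SHARDS, top_items))  # ['ink', 'pen'] 7
```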
QUERY PROCESSING
▸ PostgreSQL Query Planner and Executor
▸ Query fragments received by a worker node are processed as regular PostgreSQL queries
▸ The PostgreSQL planner on the worker node selects an optimised plan for the query
▸ The PostgreSQL executor runs the query and returns the result to the co-ordinator
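To show that a worker treats a fragment as an ordinary SQL query, this sketch uses in-memory SQLite databases from Python's standard library as stand-ins for the PostgreSQL instances on two workers; the shard table names and data are hypothetical. Each worker plans and runs its fragment independently, and the co-ordinator only combines the partial results.

```python
import sqlite3


def make_worker(shard_table, rows):
    """Hypothetical worker: an in-memory SQLite database standing in for the
    worker's PostgreSQL instance, holding one shard table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {shard_table} (city TEXT, amount INTEGER)")
    conn.executemany(f"INSERT INTO {shard_table} VALUES (?, ?)", rows)
    return conn


workers = {
    "sales_102008": make_worker("sales_102008", [("Pune", 10), ("Pune", 5)]),
    "sales_102009": make_worker("sales_102009", [("Delhi", 7)]),
}

# Each worker receives a plain SQL fragment; its own planner chooses the plan.
partials = [
    conn.execute(f"SELECT sum(amount), count(*) FROM {table}").fetchone()
    for table, conn in workers.items()
]
total = sum(s for s, _ in partials)  # co-ordinator merges the partial rows
count = sum(c for _, c in partials)
print(total, count)  # 22 3
```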
THANK YOU
