The Azure Data Lakehouse Toolkit
Building and Scaling Data Lakehouses on Azure with Delta Lake, Apache Spark, Databricks, Synapse Analytics, and Snowflake
Ron L’Esteve
Chicago, IL, USA
Table of Contents
Introduction
Advanced Analytics
  Cognitive Services
  Machine Learning
Continuous Integration, Deployment, and Governance
  DevOps
  Purview
Summary
Chapter 3: Databricks
Workspaces
  Data Science and Engineering
  Machine Learning
  SQL
Compute
Storage
  Mount Data Lake Storage Gen2 Account
Delta Lake
Reporting
Real-Time Analytics
Advanced Analytics
Security and Governance
Continuous Integration and Deployment
Integration with Synapse Analytics
Dynamic Data Encryption
Data Profile
Query Profile
Constraints
Identity
Index
About the Author
Ron L’Esteve is a professional author, trusted technology
leader, and digital innovation strategist residing in Chicago,
IL, USA. He is well known for his impactful books and
award-winning article publications about Azure Data and AI
architecture and engineering. He possesses deep technical
skills and experience in designing, implementing, and
delivering modern Azure Data and AI projects for numerous
clients around the world.
Having several Azure Data, AI, and Lakehouse
certifications under his belt, Ron has been a trusted and go-to
technical advisor for some of the largest and most impactful Azure implementation
projects on the planet. He has been responsible for scaling key data architectures,
defining the roadmap and strategy for the future of data and business intelligence needs,
and challenging customers to grow by thoroughly understanding the fluid business
opportunities and enabling change by translating them into high-quality and sustainable
technical solutions that solve the most complex challenges and promote digital
innovation and transformation.
Ron is a gifted presenter and trainer, known for his innate ability to clearly articulate
and explain complex topics to audiences of all skill levels. He applies a practical and
business-oriented approach by taking transformational ideas from concept to scale. He
is a true enabler of positive and impactful change by championing a growth mindset.
About the Technical Reviewer
Diego Poggioli is an analytics architect with over ten years of
experience managing big data and machine learning projects.
At Avanade, he is currently helping clients to identify,
document, and translate business strategy and requirements
into solutions and services that help clients to achieve their
business outcomes using analytics.
With a passion for software development, cloud, and
serverless innovations, he has worked on various product
development initiatives spanning IaaS, PaaS, and SaaS.
He is also interested in the emerging topics of artificial intelligence, machine
learning, and large-scale data processing and analytics.
He holds several Microsoft Azure and AWS certifications.
His hobbies are handcraft with driftwood, playing with his two kids, and learning
new technologies. He currently lives in Bologna, Italy.
Acknowledgments
The journey to completing this book has inspired me to learn more, dream more, and do
more. I am thankful to my family, friends, and supporters who have fueled me with the
determination and inspiration to write this book.
Introduction
With the rise of data volume, velocity, and variety in the modern data and analytics
platform on Azure, there is an ever-growing demand for innovative low-cost storage and
on-demand compute options that are centered around their decoupled capabilities.
This book will enhance your understanding of some of the practical methods for
designing and implementing the Data Lakehouse paradigm on Azure by demonstrating
the capabilities of Apache Spark and Delta Lake to build cutting-edge modern
Lakehouse solutions on Databricks, Synapse Analytics, and Snowflake. You will gain an
understanding of these various technologies and how they fit into the modern data and
analytics Lakehouse paradigm by supporting the needs of ingestion, processing, storing,
serving, reporting, and consumption. You’ll also gain a better understanding of how
machine learning, data governance, and continuous integration and deployment play a
role in the Lakehouse.
The Data Lakehouse paradigm on Azure, which leverages Apache Spark and Delta
Lake heavily, has become a popular choice for big data engineering, ELT (extraction,
loading, and transformation), AI/ML, real-time data processing, reporting, and querying
use cases. In some scenarios of the Lakehouse paradigm, Spark coupled with MPP is
great for big data reporting and BI Analytics platforms that require heavy querying and
performance capabilities since MPP is based on the traditional RDBMS and brings
with it the best features of SQL Server, such as automated query tuning, data shuffling,
ease of analytics platform management, even data distribution based on a primary
key, and much more. As the Lakehouse matures, specifically with Delta Lake, it begins
to demonstrate its capabilities of supporting many critical features, such as ACID
(atomicity, consistency, isolation, and durability)-compliant transactions for batch and
streaming jobs, data quality enforcement, and highly optimized performance tuning.
In the upcoming chapters of this book, we will unravel the many complexities
of understanding the technologies used within the Lakehouse paradigm along with
their capabilities through hands-on, scenario-based exercises. You will learn how
to implement advanced performance optimization tools and patterns for Spark
performance improvement in the Lakehouse by using partitioning, indexing, and other
tuning options. You will also learn about the capabilities of Delta Lake which include
schema evolution, change feed, Live Tables, sharing, and clones. Finally, you will
gain more knowledge about some of the advanced capabilities within the Lakehouse,
such as building and installing custom Python libraries, implementing security and
controls, and working with event-driven autoloading data on the Lakehouse platform.
The chapters presented within this book are intended to equip you with the right skills
and knowledge to design and implement a modern Lakehouse by serving as your Data
Lakehouse Toolkit.
CHAPTER 1
Background
Prior to the introduction of massively parallel processing (MPP) architectures in the early
1990s, the analytics database market had been dominated by symmetric multiprocessing
(SMP) architectures since around the 1970s. SMP had drawbacks around sizing,
scalability, workload management, resilience, and availability, and the MPP architecture
addressed many of these drawbacks related to performance, scalability, high
availability, and read/write throughput. MPP, in turn, had drawbacks of its own: cost; a critical
need for data distribution; downtime for adding new nodes and redistributing data;
limited ability to scale up compute resources on demand for real-time processing needs;
and potential for overcapacity given the limited ability to isolate storage from compute.
A Resilient Distributed Dataset (RDD) in Spark is similar to a distributed table in
MPP in that many of the RDD operations have an equivalent MPP operation. Spark RDDs do,
however, offer better options for real-time processing needs and the ability to scale up nodes
for batch processing, while storage (the Data Lake) scales independently of compute and
cost-effectively. Spark is also recommended over MPP for highly unstructured
data processing (text, images, video, and more), and it offers the capability for
large-scale advanced analytics (AI, ML, text/sentiment analysis, and more). Seeing the many
benefits that Spark and the modern Data Lakehouse platform have to offer, customers
are interested in understanding and getting started with the Data Lakehouse paradigm.
© Ron L’Esteve 2022
R. L’Esteve, The Azure Data Lakehouse Toolkit, https://doi.org/10.1007/978-1-4842-8233-5_1
Chapter 1: The Data Lakehouse Paradigm
Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure each provide
cloud-based offerings for building out a Lakehouse, and many large organizations leverage multi-cloud
platforms and product offerings as part of their technology stack. This book will focus on
the Azure Data Lakehouse paradigm since it integrates well with a variety of other cloud
platforms and service providers.
The Data Lakehouse paradigm on Azure leverages Apache Spark and Delta Lake
heavily. Apache Spark is an open source unified analytics engine for large-scale data
processing which provides an interface for programming clusters which includes data
parallelism and fault tolerance. Similarly, Delta Lake is also open source and provides a
reliable Data Lake Storage layer which runs on top of an existing Data Lake. It provides
ACID-compliant transactions, scalable metadata handling, and unified streaming and
batch data processing, and it is fully compatible with Apache Spark–based APIs. Both
Spark and Delta Lake have become popular choices for big data engineering, ELT,
AI/ML, real-time data processing, reporting, and querying use cases. In some scenarios
of the Lakehouse paradigm, Spark coupled with MPP is great for big data reporting
and BI Analytics platforms that require heavy querying and performance capabilities
since MPP is based on the traditional RDBMS and brings with it the best features of
SQL Server, such as automated query tuning, data shuffling, ease of analytics platform
management, even data distribution based on a primary key, and much more. This
may be a more common scenario as the Lakehouse paradigm is in its infancy. As the
Lakehouse matures, specifically with Delta Lake, it can support critical features such as
ACID transactions for batch and streaming jobs, data quality enforcement, and highly
optimized performance tuning.
While Spark was originally designed for large-scale data processing, its continued
advancement is a hot industry topic, since it could help align the
Lakehouse paradigm with the best features of traditional RDBMS and address Lakehouse
performance issues.
Architecture
As numerous modern Data Lakehouse technologies become generally available on the
Azure data platform with demonstrated capabilities of outperforming traditional on-
premises and cloud databases and warehouses, it is important to begin understanding
this Lakehouse architecture, the typical components utilized in the Lakehouse paradigm,
and how they all come together and contribute to realizing the modern Data Lakehouse
architecture. Azure’s modern resource-based consumption model for PaaS and SaaS
services empowers developers, engineers, and end users to use the platforms, tools, and
technologies that best serve their needs. That being said, there are many Azure resources
that serve various purposes in the Lakehouse architecture. The capabilities of these tools
will be covered in greater detail in subsequent sections. From a high level, they serve the
purpose of ingesting, storing, processing, serving, and consuming data.
From a compute perspective, Apache Spark is the gold standard for all things
Lakehouse. It is a multi-language engine for executing data engineering, data science,
and machine learning on single-node machines or clusters and is prevalent in Data
Factory’s Mapping Data Flows (DF), Databricks, and Synapse Analytics and typically
powers compute of the Lakehouse. As we grow and evolve the Lakehouse paradigm,
Photon (https://databricks.com/product/photon), a new execution engine on the
Databricks Lakehouse platform, provides even better performance than Apache Spark
and is fully compatible with Apache Spark APIs. When there is an opportunity to utilize
Photon for large data processing initiatives in supported platforms such as Databricks, it
would certainly be a sound option.
With storage being cheap within the Data Lake comes the idea of the Lakehouse;
however, there is a lack of ACID-compliant features within Data Lakes that persist files
in parquet format. Delta Lake is an open source storage layer that guarantees data
atomicity, consistency, isolation, and durability in the lake. In short, Delta Lake is ACID
compliant. In addition to providing ACID transactions, scalable metadata handling,
and more, Delta Lake runs on an existing Data Lake and is compatible with Apache
Spark APIs. There are a few methods of getting started with Delta Lake. Databricks
offers notebooks along with compatible Apache Spark APIs to create and manage Delta
Lakes. Alternatively, Azure Data Factory’s (ADF) Mapping Data Flows, which uses scaled
out Apache Spark clusters, can be used to perform ACID-compliant CRUD operations
through GUI-designed ETL pipelines.
From a data serving and consumption perspective, there are a variety of tools on
the Azure platform including Synapse Analytics Serverless and Dedicated Pools for
storage and ad hoc querying, along with Power BI (PBI) for robust reporting dashboards
and visualizations. Outside of the Azure platform, Snowflake also serves as a strong
contender to the Synapse Dedicated Pool (SQL DW) as a dedicated data warehouse. All
of these various tools and technologies serve a purpose in the Lakehouse architecture,
which makes it necessary to include each for the particular use case that it best serves
rather than choosing one over another. The Lakehouse architecture, shown
in Figure 1-1, also supports deep advanced analytics and AI use cases with cognitive
services, Azure ML, Databricks ML, and Power BI AI/ML capabilities. Finally, it is
critical to build in DevOps best practices within any data platform, and the Lakehouse
supports multi-environment automated continuous integration and delivery (CI/CD)
best practices using Azure DevOps. All these modern Lakehouse components can be
automated with CI/CD, scheduled and monitored for health and performance with
Azure Monitor and App Insights, secured with Key Vault, and governed with Purview.
Data Factory
Azure Data Factory is a cloud-based ELT platform used for data integration and
transformation within its graphical user interface, much like Microsoft’s traditional
SQL Server Integration Services (SSIS) on-premises toolset. Data Factory supports tight
integration with on-premises data sources using the self-hosted Integration Runtime (IR).
The IR is the compute infrastructure used by Data Factory and Synapse Pipelines to
provide data integration capabilities, such as data flows and data movement, across
public, private, and hybrid network environments, including on-premises. This capability
has positioned this service as a tool of choice for integrating
a combination of on-premises and cloud sources using reusable and metadata-driven
ELT patterns. ADF has the following three IR types:
• Azure IR: recommended for cloud sources.
• Self-hosted IR: used for tight integration with on-premises data sources.
• SSIS IR: runs the SSIS engine, which allows you to natively execute SSIS packages.
Data Factory also supports complex transformations with its Mapping Data Flow
service which can be used to build transformations, Slowly Changing Dimensions (SCD),
and more. Mapping Data Flow utilizes Databricks Apache Spark clusters under the hood
and has a number of mechanisms for optimizing and partitioning data.
Mapping Data Flows are visually designed data transformations in Azure Data
Factory. Data flows allow data engineers to develop data transformation logic without
writing code. The resulting data flows are executed as activities within Azure Data
Factory pipelines that use scaled out Apache Spark clusters. We will explore some of
the capabilities of Mapping Data Flows in Chapters 11 through 15 for data warehouse
ETL using Slowly Changing Dimensions (SCD) Type I, big Data Lake aggregations,
incremental upserts, and Delta Lake. There are many other use cases around the
capabilities of Mapping Data Flows that will not be covered in this book (e.g.,
dynamically splitting big files into multiple small files, managing small file problems, Parse
transformations, and more), and I would encourage you to research these other
capabilities of Mapping Data Flows.
Additionally, data can be transformed through Stored Procedure activities in the
regular Copy Data activity of ADF.
There are three different cluster types available in Mapping Data Flows: general
purpose, memory optimized, and compute optimized. The following is a description
of each:
ADF also has a number of built-in and custom activities which integrate with other
Azure services, including Databricks, Functions, Logic Apps, Synapse, and more.
Data Factory also has connectors for other cloud sources including Oracle Cloud,
Snowflake, and more. Figure 1-2 shows a list of the various activities that are supported
by ADF. When expanded, each activity contains a long list of customizable activities
for ELT. While Data Factory is typically designed for batch ELT, its robust event-driven
scheduling triggers can also support event-driven real-time processes, although
Databricks Structured Streaming is typically recommended for all things streaming.
Data Factory also has robust scheduling and monitoring capabilities with verbose
error logging and alerting to support traceability and restart capabilities of pipelines.
Pipelines that are built in ADF can be dynamic and parameterized, which contribute to
the reusability of pipelines that are driven by robust audit, balance, and control (ABC)
ingestion frameworks. Data Factory also securely integrates with Key Vault for secret and
credential management. Synapse Pipelines within the Synapse Analytics workspace has a
very similar UI to Data Factory as it continues to evolve into the Azure standard unified
analytics platform for the Lakehouse. Many of the same ADF pipelines can be built with
Synapse Pipelines, with a few exceptions. For features and capabilities that are available
only in the Data Factory V2 toolbox, choose Data Factory V2 as the ELT tool.
Data Factory supports over 90 sources and sinks as part of its ingestion and load
process. In the subsequent chapters, you will learn how to create pipelines, datasets,
linked services, and activities in ADF. The following list shows and defines the various
components of a standard ADF pipeline:
• Datasets are named views of data that simply point to or reference the
data you want to use in your activities within the pipeline.
Currently, there are a few limitations with ADF. Some of these limitations include
1. Inability to add a For Each activity or Switch activity to an If
activity.
Some of these limitations are either on the ADF product team’s roadmap for future
enhancement, or there are custom solutions and workarounds. For example, the lack
of modular scheduling within a single pipeline can be offset by leveraging tools such as
Azure Functions, Logic Apps, Apache Airflow, and more. Ensure that you have a modular
design with many pipelines to work around other limitations such as the 40 activities per
pipeline limit. Additional limitations include 100 queued runs per pipeline and 1,000
concurrent pipeline activity runs per subscription per Azure Integration Runtime region.
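The modular design suggested above can be sketched in a few lines of Python. This is only an illustration of the planning step, not an ADF API call: the function name split_into_pipelines, the pl_ingest naming convention, and the copy_table activity names are all hypothetical, and the limit of 40 is simply the figure quoted in the text, kept configurable because service limits can change.

```python
from typing import Dict, List

# Limit quoted in the text above; kept configurable since service limits can change.
MAX_ACTIVITIES_PER_PIPELINE = 40


def split_into_pipelines(
    activities: List[str],
    base_name: str = "pl_ingest",  # hypothetical naming convention
    limit: int = MAX_ACTIVITIES_PER_PIPELINE,
) -> Dict[str, List[str]]:
    """Group activity names into pipelines holding at most `limit` activities each."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    pipelines: Dict[str, List[str]] = {}
    for start in range(0, len(activities), limit):
        # Pipelines are numbered from 01 in the order the activities were listed.
        name = f"{base_name}_{start // limit + 1:02d}"
        pipelines[name] = activities[start:start + limit]
    return pipelines


if __name__ == "__main__":
    # 95 hypothetical copy activities, one per source table.
    copy_activities = [f"copy_table_{i:03d}" for i in range(95)]
    for name, members in split_into_pipelines(copy_activities).items():
        print(name, len(members))
```

A parent pipeline could then call each generated child pipeline in turn, which keeps every individual pipeline under the activity limit.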
Databricks
Databricks can also be used for data ingestion and is typically well suited for cloud and
streaming sources. While Databricks can certainly be connected to an on-premises
network using an architecture similar to Figure 1-3, it is an unnecessarily complex path
to access on-premises data sources given the robust capabilities of ADF’s self-hosted
IR. When connecting to on-premises sources, try to use ADF as much as possible. On
the other hand, from a streaming perspective, Databricks leverages Apache Spark to
ingest and transform real-time big data sources with ease. It can store the data in a
variety of file formats including parquet, Avro, delta, and more. These storage formats
will be covered in more detail within the storage section. When combined with ADF for
ingestion, Databricks can be a powerful customizable component in the Lakehouse data
ingestion framework.
SELECT
    YEAR(pickup_datetime) AS year,
    passenger_count,
    COUNT(*) AS cnt
FROM
    OPENROWSET(
        -- For FORMAT='DELTA', BULK references the root Delta Lake folder
        -- (the one containing _delta_log), not a *.parquet wildcard.
        BULK 'https://adls2.blob.core.windows.net/delta-lake/data/',
        FORMAT='DELTA'
    ) nyc
WHERE
    nyc.year = 2021
    AND nyc.month IN (1, 2, 3)
    AND pickup_datetime BETWEEN CAST('1/1/2021' AS datetime) AND CAST('12/31/2021' AS datetime)
GROUP BY
    passenger_count,
    YEAR(pickup_datetime)
ORDER BY
    YEAR(pickup_datetime),
    passenger_count;
Stream Analytics
Azure Stream Analytics is an event-processing engine which allows examining high
volumes of data streaming from devices, sensors, websites, social media feeds,
applications, and more. It is easy to use and is based on a simple SQL-based query language. Additionally,
it is a fully managed (PaaS) offering on Azure that can run large-scale analytics jobs that
are optimized for cost since users only pay for streaming units that are consumed.
Azure Stream Analytics also offers built-in machine learning functions that can be
wrapped in SQL statements to detect anomalies. This anomaly detection capability
coupled with Power BI’s real-time streaming service makes it a powerful real-time
anomaly detection service. While there are a few use cases that can benefit from
Stream Analytics, the Lakehouse architecture has been adopting Databricks Structured
Streaming as the stream processing technology of choice since it can leverage Spark
for compute of big delta and parquet format datasets and has other valuable features
including advanced schema evolution, support for ML use cases, and more. As we wrap
up this section on Stream Analytics, the following script demonstrates just how easy it
is to write a Stream Analytics SQL query using the anomaly detection function. Stream
Analytics is a decent choice for a stream processing technology that is needed to process
real-time data from an IoT Hub which can then also be read in real time by a Power BI
dashboard.
WITH AnomalyDetectionStep AS
(
    SELECT
        EVENTENQUEUEDUTCTIME AS time,
        CAST(temperature AS float) AS temp,
        AnomalyDetection_SpikeAndDip(CAST(temperature AS float), 95, 120, 'spikesanddips')
            OVER(LIMIT DURATION(second, 120)) AS SpikeAndDipScores
    FROM IoTHub
)
SELECT
    time,
    temp,
    CAST(GetRecordPropertyValue(SpikeAndDipScores, 'Score') AS float) AS SpikeAndDipScore,
    CAST(GetRecordPropertyValue(SpikeAndDipScores, 'IsAnomaly') AS bigint) AS IsSpikeAndDipAnomaly
INTO IoTPowerBIOutput
FROM AnomalyDetectionStep
• Snapshot window groups events that have the same timestamp. You
can apply a snapshot window by adding System.Timestamp() to the
GROUP BY clause.
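To make the snapshot window idea concrete, here is a minimal Python sketch of the grouping it describes: events that share exactly the same timestamp fall into the same window, and an aggregate is then computed per window. This is a conceptual model only, not how Stream Analytics is implemented; the event shape (timestamp, device ID, temperature) and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (timestamp, device_id, temperature).
Event = Tuple[str, str, float]


def snapshot_windows(events: Iterable[Event]) -> Dict[str, List[Event]]:
    """Group events so that each window holds only events with an identical timestamp."""
    windows: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        timestamp = event[0]
        windows[timestamp].append(event)
    return dict(windows)


def average_temperature(windows: Dict[str, List[Event]]) -> Dict[str, float]:
    """Aggregate per window, much as a GROUP BY over the timestamp would."""
    return {
        ts: sum(e[2] for e in group) / len(group)
        for ts, group in windows.items()
    }


if __name__ == "__main__":
    sample = [
        ("2021-01-01T00:00:00Z", "device-1", 20.0),
        ("2021-01-01T00:00:00Z", "device-2", 22.0),
        ("2021-01-01T00:00:05Z", "device-1", 30.0),
    ]
    for ts, avg in average_temperature(snapshot_windows(sample)).items():
        print(ts, avg)
```

Unlike fixed-size windows, the number and size of snapshot windows depend entirely on the timestamps present in the data.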
Messaging Hubs
Azure’s IoT Hub and Event Hub are cloud service offerings which can ingest
large volumes of real-time device data into the Lakehouse with low latency and high
reliability. Azure IoT Hub connects IoT devices to Azure resources and supports
bidirectional communication between devices and the cloud. IoT Hub uses Event Hubs
for its telemetry flow path. Event Hub is designed for high-throughput data streaming of
billions of requests per day. Both IoT and Event Hub fit well within both Stream Analytics
and Databricks Structured Streaming architectures to support the event-driven real-time
ingestion of device data for further processing and storage in the Lakehouse, as shown in
Figure 1-4.
Table 1-1 lists the differences in capabilities between Event Hubs and IoT Hubs.
Delta Lake
Delta Lake is an open source storage layer within the Lakehouse which guarantees
data atomicity, consistency, isolation, and durability (ACID compliance) in the
lake. Delta Lake runs on an existing Data Lake and is compatible with Apache Spark
APIs. Numerous Azure resources are Delta compatible including Synapse Analytics,
Databricks, Snowflake, Data Factory, and others. Databricks and Synapse Analytics
workspaces support queries that can be run on delta format files within the lake through
a variety of languages within notebooks. Additionally, Delta supports Apache Spark
APIs to create and manage Delta Lakes. Azure Data Factory’s Mapping Data Flows
can perform ACID-compliant Delta Lake CRUD operations through GUI-designed
ETL pipelines. Many of the ACID-compliant features of Delta Lake are included in the
following list:
When designing a Data Lake, it is important to design the appropriate zones and
folder structures, as shown in Figure 1-5. Typically, the Lakehouse can contain multiple
zones including raw, staging, and curated. In its simplest form, there exists a raw
and a curated zone. The raw zone is basically the landing zone
where all disparate data sources including structured, semi-structured, and unstructured
data can land. The data in this zone is typically stored in parquet format but can also
be JSON, CSV, XML, images, and much more. Parquet files, in columnar format,
can be further compressed with what is called snappy compression, which can offer a
97.5% compression ratio. As data moves toward curation and consumption, there can
be various other zones in between, ranging from a data science zone to a staging zone and
more. Databricks typically labels its zones as Bronze, Silver, and Gold. Once the data
is ready for final curation, it moves to a curated zone, which is typically in
delta format and also serves as a consumption layer within the Lakehouse. It is typically
in this zone that the Lakehouse stores and serves its dimensional Lakehouse
models to consumers. A Data Lake can have multiple containers to segregate access, and
each Data Lake Storage account has up to five petabytes of storage capacity. There are a
variety of security-level access controls ranging from role-based access control (RBAC)
to shared access signature (SAS) that need to be considered when designing a Data Lake.
For more information on designing a Data Lake, check out my MSSQLTips article here:
www.mssqltips.com/sqlservertip/6807/design-azure-data-lake-store-gen2/.
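As a rough illustration of zone and folder design, the following Python sketch builds folder paths for the raw, staging, and curated zones named above. The zone/source/entity/yyyy/mm/dd layout, the function name zone_path, and the example source and entity names are assumptions for the sake of the example; they are not taken from Figure 1-5, and a real design should follow your own access and partitioning requirements.

```python
from datetime import date
from pathlib import PurePosixPath

# Zone names taken from the text; the folder layout below is a hypothetical convention.
ZONES = ("raw", "staging", "curated")


def zone_path(zone: str, source: str, entity: str, load_date: date) -> PurePosixPath:
    """Build a folder path of the form zone/source/entity/yyyy/mm/dd."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    return PurePosixPath(
        zone,
        source,
        entity,
        f"{load_date:%Y}",
        f"{load_date:%m}",
        f"{load_date:%d}",
    )


if __name__ == "__main__":
    # Hypothetical source system and entity names.
    for zone in ZONES:
        print(zone_path(zone, "sales", "orders", date(2021, 1, 15)))
```

Keeping the same source and entity structure in every zone makes it straightforward to trace a curated dataset back to the raw files it was built from.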
Once curated data is available in the Delta Lake, it can be accessed via a variety of BI
tools including Synapse Analytics workspace for further analysis and insights, all within
the Lakehouse.
Power BI and non-Azure tools such as Tableau and more. Dedicated SQL Pools are
similar to Snowflake data warehouse and oftentimes are compared with Snowflake as
a data warehouse choice. Both fit certain use cases, and Dedicated SQL Pools are well
integrated with Azure and Synapse Analytics workspaces. The diagram in Figure 1-6
illustrates the architecture of Synapse Analytics Dedicated SQL Pools, specifically as it
relates to the control and compute nodes.
The two main types of data distribution used in Dedicated SQL Pools are
hash and round-robin distributed tables. Round robin is the default distribution type
for a table in a Dedicated SQL Pool, and the data for a round-robin distributed table is
distributed evenly across all the distributions. As data gets loaded, each row is simply
sent to the next distribution to balance the data across distributions. Usually, common
dimension tables, staging tables, or tables that have no column that would distribute the
data evenly are good candidates for round-robin distributed tables. A hash distributed
table's data is spread across the distributions according to a hash of one column, so that
it can be processed by multiple compute nodes in parallel. Fact tables or large tables are
good candidates for hash distributed tables. You select one of the columns from the
table to use as the distribution key column when creating a hash distributed table, and
then Dedicated SQL Pools automatically distribute the rows across all 60 distributions
based on the distribution key column value. Replicated tables eliminate the need to transfer
data across compute nodes by replicating a full copy of the data of the specified table
to each compute node. The best candidates for replicated tables are small dimension
tables and tables smaller than 2 GB compressed. For a deeper technical treatment of
this topic, see my MSSQLTips article:
www.mssqltips.com/sqlservertip/4889/design-and-manage-azure-sql-data-warehouse/
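The difference between round-robin and hash distribution can be sketched with a small simulation. This is a minimal illustration under stated assumptions, not how Dedicated SQL Pools work internally: the hash function the service uses is not described in the text, so SHA-256 is used here purely as a stand-in, and the table, column names, and row counts are hypothetical.

```python
import hashlib
from collections import Counter
from itertools import cycle

NUM_DISTRIBUTIONS = 60  # the number of distributions stated in the text


def round_robin(rows) -> Counter:
    """Send each row to the next distribution in turn, ignoring its content."""
    counts = Counter()
    for _row, distribution in zip(rows, cycle(range(NUM_DISTRIBUTIONS))):
        counts[distribution] += 1
    return counts


def hash_distribution(key_value) -> int:
    """Map a distribution key value to a distribution (SHA-256 is a stand-in)."""
    digest = hashlib.sha256(str(key_value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS


def hash_distribute(rows, key: str) -> Counter:
    """Place each row according to the hash of its distribution key column."""
    counts = Counter()
    for row in rows:
        counts[hash_distribution(row[key])] += 1
    return counts


# Hypothetical fact table: 60,000 orders spread over 1,000 customers.
rows = [{"order_id": i, "customer_id": i % 1000} for i in range(60_000)]

rr = round_robin(rows)
hashed = hash_distribute(rows, "customer_id")

print("round robin rows per distribution:", min(rr.values()), "to", max(rr.values()))
print("hash rows per distribution:       ", min(hashed.values()), "to", max(hashed.values()))
```

Round robin balances row counts regardless of content, while hashing guarantees that all rows sharing a key value land in the same distribution. That property is what lets joins and aggregations on the key avoid data movement, and it is also why a skewed key can unbalance the distributions.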
Relational Database
Interestingly, relational databases still play a role in modern Data Lakehouse architectures
because of their reliability and vast battle-hardened feature set from decades of iterations
and product releases. Since the Lakehouse is still in its infancy and many fully native
Lakehouse technologies are still evolving, organizations often prefer a hybrid
architectural model where data can be served from both the Lakehouse and a traditional
relational database. This approach eases the learning curve of ramping up on the
Lakehouse's capabilities and works around the limitations existing BI tools have in
connecting seamlessly to parquet and/or delta format data in the Lakehouse. There are various
options for relational databases on the Azure platform, including SQL options, MariaDB,
PostgreSQL, and MySQL, so the database of choice will depend on a variety
of factors driven by the particular need and use case. This section will focus on Azure SQL
Database, specifically its purchasing models, service tiers, and deployment models.
Azure SQL Database is a cloud database service (database as a service)
offered on the Microsoft Azure platform that lets you host and use a relational
SQL database in the cloud without requiring any hardware or software installation. We
will be using the standard Azure SQL Database in many of the chapters in this book as a
source, sink, and metadata-driven control database.
A database transaction unit (DTU) is a blended measure of CPU, memory, and read and
write operations and can be increased when more power is needed. It is a great solution
if you have a preconfigured resource configuration where the consumption of resources
is balanced across CPU, memory, and IO. You can increase the number of DTUs reserved
once you reach a limit on the allocated resources and experience throttling, which
translates into slower performance or timeouts.
The disadvantage of DTUs is that you don’t have the flexibility to scale only a specific
resource type, like the memory or the CPU. Because of this, you can end up paying for
additional resources without needing or using them.
The vCore model allows you to scale compute and storage
independently. You can scale the database's storage up and down based on
how many GB are needed, and you can also scale the number of cores (vCores).
The disadvantage is that you can’t control the size of memory independently of the
vCore count. Also, it is important to note that vCore serverless compute resources are
twice the price of provisioned compute resources, so a constant high load would cost
more in serverless than it would in provisioned. With the vCore model, you can also apply
the SQL Server licenses that you have from your on-premises environment.
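The serverless versus provisioned trade-off described above reduces to simple arithmetic. The sketch below is a minimal illustration that takes the "twice the price" ratio from the text at face value. The per-vCore hourly price, the 730-hour month, and the function names are hypothetical, and the model ignores storage charges and any other billing details.

```python
HOURS_IN_MONTH = 730  # assumed average month length in hours


def provisioned_monthly_cost(vcores: int, price_per_vcore_hour: float) -> float:
    """Provisioned compute is billed for every hour, whether used or not."""
    return vcores * price_per_vcore_hour * HOURS_IN_MONTH


def serverless_monthly_cost(
    vcores: int,
    price_per_vcore_hour: float,
    active_hours: float,
    serverless_multiplier: float = 2.0,  # "twice the price", per the text
) -> float:
    """Serverless compute is billed only while active, at a higher unit rate."""
    return vcores * price_per_vcore_hour * serverless_multiplier * active_hours


def break_even_active_hours(serverless_multiplier: float = 2.0) -> float:
    """Active hours per month at which both models cost the same."""
    return HOURS_IN_MONTH / serverless_multiplier


# Hypothetical workload: 4 vCores at an invented price of 0.50 per vCore hour.
for active_hours in (100, 365, 730):
    serverless = serverless_monthly_cost(4, 0.5, active_hours)
    provisioned = provisioned_monthly_cost(4, 0.5)
    print(f"{active_hours:>3} active hours: serverless {serverless:8.2f}, provisioned {provisioned:8.2f}")

print("break-even at", break_even_active_hours(), "active hours per month")
```

Under these assumptions serverless is cheaper only while the database is active for less than half of the month, which matches the statement that a constant high load costs more in serverless than in provisioned.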
Service Tiers
There are three available service tiers for Azure SQL database:
Deployment Models
There are two available deployment models for Azure SQL database:
Non-relational Databases
NoSQL databases also play a role in the Lakehouse paradigm due to their flexibility in
handling unstructured data, millisecond response time, and high availability. Cosmos
DB is one such fully managed NoSQL database service, which offers automatic and
instant scalability, data replication, fast multi-region reads/writes, and open source APIs
for MongoDB and Cassandra. With Cosmos DB’s Azure Synapse Link, users can get near
real-time insights on data stored in a transactional system. Azure Synapse Link for Azure
Cosmos DB is a cloud-native hybrid transactional and analytical processing (HTAP)
capability that allows users to run near real-time analytics over operational data in Azure
Cosmos DB. Data engineers, business analysts, and data scientists now have the ability
to use Spark or SQL Pools to get near real-time insights into their data without impacting
the performance of their transactional workloads in Cosmos DB.
There are numerous advantages to Azure Synapse Link for Azure Cosmos DB,
including reduced complexity, since a near real-time analytical store either reduces
or eliminates the need for complex ETL or change feed job processes. Additionally,
there will be little to no impact on operational workloads since the analytical workloads
are served independently of the transactional workloads and do not consume the
provisioned operational throughput. It is also optimized for large-scale analytics
workloads by leveraging the power of Serverless Spark and SQL Pools, which makes
it cost effective due to the highly elastic Azure Synapse Analytics compute engines.
With a column-oriented analytical store for workloads on operational data including
aggregations and more, along with decoupled performance for analytical workloads,
Azure Synapse Link for Azure Cosmos DB enables and empowers self-service, near real-
time insights on transactional data. Figure 1-7 illustrates the role Azure Synapse Link plays
between a transactional Cosmos DB and the analytical Synapse Analytics workspace.
Figure 1-7. Azure Synapse Link connected to Cosmos DB and Synapse Analytics
When choosing between a SQL and a NoSQL database for your data
solution, take the comparison factors shown in Table 1-2 into consideration.
| Factor | SQL | NoSQL |
| --- | --- | --- |
| Definition | SQL databases are primarily called RDBMS or relational databases | NoSQL databases are primarily called non-relational or distributed databases |
| Design for | Traditional RDBMS uses SQL syntax and queries to analyze and get the data for further insights. They are used for OLAP systems | NoSQL database systems consist of various kinds of database technologies. These databases were developed in response to the demands presented for the development of the modern application |
| Query language | Structured query language (SQL) | No declarative query language |
| Type | SQL databases are table-based databases | NoSQL databases can be document based, key-value pairs, graph databases |
| Schema | SQL databases have a predefined schema | NoSQL databases use dynamic schema for unstructured data |
| Ability to scale | SQL databases are vertically scalable | NoSQL databases are horizontally scalable |
| Examples | Oracle, Postgres, and MS-SQL | MongoDB, Redis, Neo4j, Cassandra, HBase |
| Best suited for | An ideal choice for the complex query-intensive environment | It is not a good fit for complex queries |
| Hierarchical data storage | SQL databases are not suitable for hierarchical data storage | More suitable for the hierarchical data store as it supports key-value pair methods |
| Variations | One type with minor variations | Many different types which include key-value stores, document databases, and graph databases |
| Development year | It was developed in the 1970s to deal with issues with flat file storage | Developed in the late 2000s to overcome issues and limitations of SQL databases |
| Consistency | It should be configured for strong consistency | It depends on the DBMS, as some offer strong consistency, like MongoDB, whereas others offer only eventual consistency, like Cassandra |
| Best used for | RDBMS database is the right option for solving ACID problems | NoSQL is best used for solving data availability problems |
| Importance | It should be used when data validity is super important | Use when it’s more important to have fast data than correct data |
| Best option | When you need to support dynamic queries | Use when you need to scale based on changing requirements |
| ACID vs. BASE model | ACID (atomicity, consistency, isolation, and durability) is a standard for RDBMS | BASE (basically available, soft state, eventually consistent) is a model of many NoSQL systems |
• SQL: Provides capabilities for data users who are comfortable with
SQL queries. Even though the data is stored in JSON format, it can
easily be queried by using SQL-like queries.
Snowflake
Snowflake is a modern cloud data warehouse platform which integrates well with the
Azure platform and does not require dedicated resources for setup, maintenance, and
support. Snowflake’s architecture, shown in Figure 1-8, runs on cloud infrastructure
and uses a central data repository for persisted data that is accessible from all compute
nodes in the platform. This approach, known as shared disk architecture, is beneficial
for data management. Snowflake also processes queries using MPP compute clusters where
each node in the cluster stores a portion of the entire dataset locally, also known as
shared nothing architecture, which is beneficial for performance management. Multiple
compute clusters are grouped together to form virtual warehouses. Multiple virtual
warehouses can be created to access the same storage layer without needing multiple
copies of the data in each warehouse. These virtual warehouses can be scaled up or
down with minimal downtime or impact to storage. Snowflake also provides a variety
of services including infrastructure, security and metadata management, optimizers,
and robust authentication and access control methods. These services manage the
storage and compute layers and ensure security and high availability of the platform.
Snowflake’s architecture also includes and supports zero-copy cloning, time travel,
and data sharing. With zero-copy cloning, the CLONE command allows users to create
copies of their schemas, databases, and tables without copying the underlying data while
having access to close to real-time production data in various environments. With time
travel, historical data that has been changed or deleted can be retrieved within a defined
time period. With data sharing, Snowflake producers and consumers can easily share
and consume data from a variety of unique avenues without having to physically copy or
move the data. Snowflake costs are based on usage; therefore, users only pay for storage
and compute resources that are used.
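The ideas behind zero-copy cloning and time travel can be pictured with a toy model in which a table is only a list of references to immutable partitions. This is a conceptual sketch under that assumption and is not Snowflake's implementation or API. The `ToyTable` class, its methods, and the sample rows are all hypothetical names invented for illustration.

```python
class ToyTable:
    """Toy model: a table is a list of references to immutable partitions."""

    def __init__(self, partitions=()):
        # Only references are stored; the tuples themselves are never modified.
        self.partitions = list(partitions)
        # Every version of the partition list is kept for "time travel".
        self.history = [tuple(self.partitions)]

    def insert(self, rows):
        """Add rows as a new immutable partition and record a new version."""
        self.partitions.append(tuple(rows))
        self.history.append(tuple(self.partitions))

    def clone(self):
        """Create a new table that points at the same partitions (no data copied)."""
        return ToyTable(self.partitions)

    def rows(self, version=-1):
        """Read the rows as of a given version (latest by default)."""
        return [row for partition in self.history[version] for row in partition]


prod = ToyTable()
prod.insert(["order-1", "order-2"])

dev = prod.clone()            # shares the existing partition with prod
dev.insert(["test-order"])    # writes go to a new partition owned by dev

print(dev.partitions[0] is prod.partitions[0])  # the shared partition is one object
print(prod.rows())            # prod does not see dev's change
print(dev.rows())
print(prod.rows(version=0))   # the table as it was before the first insert
```

Because partitions are never changed in place, a clone costs only a new list of references, and earlier versions stay readable for as long as their partition lists are retained.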
Snowflake provides a number of capabilities including the ability to scale storage and
compute independently, data sharing through a Data Marketplace, seamless integration
with custom-built applications, batch and streaming ELT capabilities, complex data
manipulation functions and features, support for a variety of file formats, and more.
Snowflake provides a variety of connectivity options including command-line clients
such as SnowSQL, ODBC/JDBC drivers, Python/Spark connectors, and a list of third-party
connectors. These capabilities make Snowflake a strong complementary choice
during the design and implementation of your Lakehouse because it offers unique
capabilities which include reasonably priced big data storage and compute resources,
multiple secure data sharing offerings, and secure and protected integration with Azure
services, along with data encryption and compression capabilities.
Consumption
Storing and serving data is a pivotal step in the end-to-end life cycle from sourcing
data to consuming it. It is within the storage and serving layer where the cleansed,
transformed, and curated data is made available for consumption by the end users.
Consumption can come in many different variations such as through Azure-native
BI tools including Power BI, Analysis Services, Power Platform, etc. Data within the
Lakehouse can also be consumed by a variety of non-Azure BI tools such as Tableau,
Informatica, and more. A good Lakehouse serving layer offers flexibility for consuming
the data from a variety of BI tools. In this section, our efforts will focus on understanding
the capabilities of some of these Lakehouse consumption options within Azure.
Analysis Services
Azure Analysis Services (AAS) is a fully managed platform as a service (PaaS) that
provides enterprise-grade data models in the cloud. With Analysis Services, users can
create secured KPIs, data models, metrics, and more from multiple data sources with a
tabular semantic data model. Analysis Services integrates well with a variety of Azure
services and can connect to Synapse Analytics Dedicated Pools. Figure 1-9 shows a
sample AAS workspace which can be accessed through Visual Studio (VS).
“He that receives them is called universally a sojourning proselyte.”
And a little lower down it says plainly
ויש לו חלק, כל המקכל שבע מצוות ונזהר לעשותן הרי זה מחסידי אומות העולם
לעולם הבא ׃
“Whosoever receives the seven commandments, and is careful to
observe them, he is one of the pious of the nations of the world, and
has a share in the world to come.” (Hilchoth Melachim, c. viii. 10.)
From these two declarations, then, we learn that “the pious of the
nations of the world” are the same, as “the sojourning proselytes,”
who were allowed to reside in the land of Israel, and that their piety
consisted in receiving and practising the seven commandments.
What these commandments were, we are informed in the next
chapter of the same treatise.
ועל שפיכת, ועל ברכת השם, על ע׳׳ז, על ששה דברים נצטוה אדם הראשון
אף על פי שכולן הן קבלה בידנו, ועל הדינים, ועל הגזל, ועל גלוי עריות, דמים
, מכלל דברי תורה יראה שעל אלה נצטוה, והדעת נוטה להן, ממשה רבינו
נמצאו שבע, הוסיף לנח אבר מן החי שנאמר אך בשר בנפשו דמו לא תאכלו
וכן היה הדבר בכל העולם עד אברהם ׃, מצוות
“The first Adam was commanded concerning six things—idolatry,
blasphemy, shedding of blood, incest, robbery, and administration of
justice. Although we have all these things as a tradition from Moses,
our master, and reason naturally inclines to them, yet, from the
general tenour of the words of the law, it appears that he was
commanded concerning these things. Noah received an additional
command concerning the limb of a living animal, as it is said, ‘But
flesh in the life thereof, which is the blood thereof, ye shall not eat.’
(Gen. ix. 4.) Here are the seven commandments, and thus the
matter was in all the world until Abraham.” (Ibid. ix. 1.)
Now, without stopping to dispute about the command given to Noah,
we cannot help saying that the above tradition is very defective, and
certainly not derived from Moses, for it is opposed to the history
which he himself has given us. In the first place, that command, on
which the oral law lays such stress, “Be fruitful and multiply,” was
originally given to Adam (Gen. i. 28,), and was renewed to Noah,
after the deluge. If the Rabbies reckon this as a separate command
in the case of the Jews, as may be seen in the Hilchoth Priah
Ureviah, it is only fair to reckon it as a separate command in the
case of the Gentiles, and thus we get an eighth command. In the
second place, God ordained marriage as a holy state. “The Lord God
said, It is not good that man should be alone; I will make him an help
meet for him.” “And the rib which the Lord God had taken from man
made he a woman, and brought her unto the man.” Here is God’s
holy institution, and in the following verses we have the obligations of
marriage distinctly acknowledged. “And Adam said, This is now bone
of my bones, and flesh of my flesh; she shall be called Woman,
because she was taken out of man. Therefore shall a man leave his
father and his mother, and shall cleave unto his wife, and they shall
be one flesh.” Here, then, is a ninth commandment. We know,
indeed, that the oral law gives a different account, but its doctrine is
false and pernicious. In the face of the above plain narrative, it
teaches as follows:—
קודם מתן תורה היה אדם פוגע אשה בשוק אם רצה הוא והיא לישא אותה
מכניסה למוך ביתו ובועלה בינו לבין עצמו ומהיה לו לאשה ׃
“Before the giving of the law, a man might happen to meet a woman
in the street; if they both agreed on marriage, he took her to his
house, and cohabited with her, and she became his wife.” (Hilchoth
Ishuth, c. i. 1.) Now, not to speak of profane history, there is not in
the law of Moses a single passage to give colour to this statement,
unless it be the following:—“And it came to pass, when men began
to multiply on the face of the earth, and daughters were born unto
them, that the sons of God saw the daughters of men that they were
fair; and they took them wives of all which they chose.” But,
whatever is meant by “Sons of God,” it is plain that this conduct is
mentioned, not as having the sanction or approval of God, but as a
proof of antediluvian wickedness, for it is immediately added, “And
the Lord said, My Spirit shall not always strive with man, for that he
also is flesh.” But it is not simply an error of judgment, it is most
pernicious as it regards both Gentiles and Jews, for it completely
annuls the sanctity and obligation of the marriage tie. It teaches that
as the marriage of Noahites is contracted without solemn espousals,
so it may be dissolved without the formality of a divorce.
, ומאימתי תהיה אשת חברו כגרושה שלנו ? משיוציאנה מביתו וישלחנה לעצמה
ואין הדבר תלוי, שאין להן גירושין בכתב, או שתצא היא מתחת רשותו ותלך לה
אלא כל זמן שירצה הוא או היא לפרוש זה מזה פורשין ׃, בו לבד
“When is his (the Noahite’s) neighbour’s wife to be considered in the
same light, as a divorced woman with us?
From the time that he sends her forth from his house, and leaves her
to herself. Or from the time that she goes forth from under his power,
and goes her way; for they have no divorces in writing, neither does
the matter depend upon that alone;[15] but whenever he or she
please to separate one from the other, they separate.” (Hilchoth
Melachim, c. ix. 8.) We Gentiles have great reason to be thankful
that Jesus of Nazareth has taught us a different doctrine, according
with the original institution of marriage. What would have been the
state of the world, if the oral law had attained supreme power, and
the Gentiles had been instructed in the above law as Divine? What
would result from the doctrine that every man may turn out his wife,
and every woman leave her husband, whenever they like? The
peace and well-being of Gentile society would be at an end. The
frightful state of disorder and misery that would ensue, as well as the
words of the original institution, plainly show that this doctrine is not
from God. But the effect upon the believers in the oral law is still
worse. With reference to them, the marriage of Gentiles is no
marriage at all. The oral law says distinctly—
אין אישות לגוים.
“There is no matrimony to the Gentiles.” (Hilchoth Melachim, viii. 3.)
And again,
אין אישות אלא לישראל או לגוים על הגוים אבל לא לעבדים על עבדים ולא
לעבדים על ישראל ׃
“There is no matrimony except to Israel, or to Gentiles with respect
to Gentiles; but not to slaves with respect to slaves, nor to slaves
with respect to Israel.” (Hilchoth Issure Biah, c. xiv. 19.) Here, then,
the oral law directly makes void the law of God, and pronounces that
a command given to Adam in Paradise, and therefore equally
binding on all his descendants, is in particular cases of no force at
all. The oral law, therefore, is certainly not from God.
We have already made out nine commandments; in sacrifice we find
a tenth. Cain and Abel brought sacrifices, and the only reason that
can be assigned is, that they had received a command to that effect.
Sacrifice was either a Divine command or the dictate of their own
reason. But it was not the dictate of reason, for reason says, that the
Creator of all things has no need of gifts, and, least of all, such gifts
as imply the slaughter of an innocent animal. It must, therefore, have
been of Divine command. The reason why the Rabbies excluded this
command is plain. They did not choose that there should be
acceptable sacrifices offered anywhere but amongst themselves. But
that this doctrine is altogether of a recent date is plain. It was not
known to Job. He says not a word about the seven commandments,
and he was in the habit of offering sacrifices. “And it was so when
the day of their feasting was gone about, that Job sent and sanctified
them, and rose up early in the morning, and offered burnt-offerings
according to the number of them all.” (Job i. 5.) And the Lord himself
expressly commanded Job’s friends to do so likewise. “And it was
so, that after the Lord had spoken these words unto Job, the Lord
said to Eliphaz the Temanite, My wrath is kindled against thee, and
against thy two friends.... Therefore, take unto you now seven
bullocks and seven rams, and go to my servant Job, and offer up for
yourselves a burnt-offering, and my servant Job shall pray for you,
for him will I accept.” (Job xlii. 7, 8.) It was not known to Elisha.
When Naaman said, “Shall there not then, I pray thee, be given to
thy servant two mules’ burden of earth? For thy servant will
henceforth offer neither burnt-offering nor sacrifice unto other gods,
but unto the Lord.” (2 Kings v. 17.) Elisha made no objection. He did
not tell him that he had only seven commandments to attend to.
Neither had Isaiah any idea that, when Judaism triumphed, the
whole world was to be compelled to adhere to the seven
commandments, for he plainly predicts the contrary. “And the Lord
shall be known to Egypt, and the Egyptians shall know the Lord in
that day, and shall do sacrifice and oblation: yea, they shall vow a
vow unto the Lord and perform it.” (Isaiah xix. 21.) Here again, then,
the oral law contradicts the Word of God.
But the law of God points out to us an eleventh commandment, in
the distinction between clean and unclean animals. The Lord
commanded Noah to take of the former by sevens and of the latter
by pairs. (Gen. vii. 2.) And when Noah came forth from the ark “he
builded an altar unto the Lord; and took of every clean beast, and of
every clean fowl, and offered burnt-offerings on the altar.” (Gen. viii.
20.) It is plain, from the command, that a greater number of clean than
unclean animals was required. Noah’s conduct shows that the rite of
sacrifice was the cause of the requirement. We have a twelfth
commandment in the appointment of a priesthood. “Melchizedek was
the priest of the Most High God,” (Gen. xiv. 18,) which he most
certainly could not have been, if he had not been Divinely appointed.
From the law itself, then, we have made out twelve distinct
commandments. Eight would have been sufficient to overthrow the
oral tradition. But we appeal to the common sense of every
Talmudist. We ask him to look over the meagre list of the seven
commandments, in which neither love to God nor man is included,
and to tell us whether it be at all probable that “the God of the spirits
of all flesh” would leave all mankind, excepting the small company of
Rabbinists, without any better rule for time, and any better guide to
eternity? Is it possible that the God of love and mercy should leave
the majority of his reasonable creatures in doubt as to his love, and
tell them that he requires no love from them? Yet this is what the oral
law says. The Gentiles are, according to it, left without any direction
as to the worship of God, and are pronounced guilty of death if they
study the law. Nay, they are expressly told that God does not require
them to glorify him by their obedience.
אפילו נאנס לעבוד, מותר לו לעבור, בן נח שאנסו אנס לעבור על אחת ממצוותיו
לפי שאינן מצווין על קדוש השם ׃, ע׳׳ז עובד
“A Noahite who is forced to transgress one of his commandments, it
is lawful for him to do so. Even if he be compelled to commit idolatry
he may commit it, for they are not commanded to sanctify God.”
(Hilchoth Melachim, c. x. 2.) So that, according to the Rabbies, the
Noahite who is compelled to commit murder, adultery, or even to
deny his God, may do it with impunity; he still belongs “to the pious
of the nations of the world,” and may have a share in the world to
come. We confess that we cannot see in this doctrine either charity
or toleration. We can discover only that narrowness of heart which
characterizes the oral law. In order to magnify themselves, and
depreciate the other nations, the Rabbies first swell out their own
commandments to 613, and reduce the commandments of the
nations to seven. But not content with that, they also strive to confine
the glories of martyrdom to themselves, and tell the Gentiles that
God does not require them to sanctify His name. Can such doctrine
come from God? Is God the God of the Rabbinists only? We grant
that the Jews are his “peculiar people.” We acknowledge that “they
have much advantage every way”—that “they are beloved for the
fathers’ sakes”—that the time is coming when “all that see them shall
acknowledge them that they are the seed whom the Lord has
blessed.” But we still think that God’s heart is large enough to
comprehend us Gentiles too in his love. We know that we are the
work of His hand, and we trust that, as He is our Father, he requires,
and is pleased to see even in Gentiles, the feelings of children, love
and filial fear. And we found this our faith on your Scriptures as well
as ours. The Word of God tells us that, long before there were any
Rabbies in the world, He had a gracious and tender care for all
mankind. He promised to our first parents a Saviour who should
“bruise the serpent’s head.” He saved Noah and his family, not one
of whom was a Rabbi, from the deluge; and when they came forth
from the ark, He made a gracious covenant not with one nation only,
but “with all flesh,” and hung up on high a lovely and glittering arch,
from one end of the heavens to the other, that all the habitants of
earth might have a token of their Father’s love and learn to look up
to Him with humble confidence. When he chose Abraham and his
seed, it was not an act of partiality, but that in his seed all the
families of the earth might be blessed. He did not leave himself
without witness to the nations. He manifested himself to Job, and
taught him “that his Redeemer liveth,” and moved even the prophets
of Israel to predict again and again the happy times when, “from the
rising of the sun to the going down of the same, His name should be
great among the Gentiles, and in every place incense should be
offered to his name, and a pure offering; for my name shall be great
among the heathen, saith the Lord of hosts.” (Mal. i. 11.) Having this
word, we reject the oral law which contradicts it, and would make
God the God of the Rabbinists only: and we believe in the New
Testament, which exactly agrees with your written law, and asks, “Is
he the God of the Jews only? Is he not also of the Gentiles?”—and
answers, “Yes, of the Gentiles also” (Rom. iii. 29)—and which also
declares that, in the sight of God, “There is no difference between
the Jew and the Greek; for the same Lord over all is rich unto all that
call upon him, for whosoever shall call upon the name of the Lord
shall be saved.” (Rom. x. 12, 13.)
In the fixing of the commandments, then, for the sons of Noah, we
have detected an intolerant and uncharitable spirit very different from
that of the Old and New Testament. But we have further to inquire,
what was the extent of toleration conceded to them? We do not stop
to prove that they were not allowed to possess land, nor to be
judges, nor members of the Sanhedrin, nor to hold any office, nor to
intermarry with the Jews. From all that, they were excluded by the
law of God himself. They were allowed to sojourn in the land, and
hence their name “sojourning proselytes.” Further, “They were to be
treated with the same courtesy and benevolence as the Israelites.”
(See No. 4, p. 26.) But further than this the toleration did not extend.
The oral law, though it commands “courtesy and benevolence,” does
not administer even-handed justice to the “pious of the nations of the
world,” as may be seen from the following specimens:—
ישראל שהרג בשגגה את העבד או את גר תושב גולה.
וכן גר תושב שהרג את גר תושב או את העבד בשגגה גולה.
גר תושב שהרג את ישראל בשגגה אף על פי שהיה שוגג הרי זה נהרג.
“An Israelite who unintentionally kills a slave, or a sojourning
proselyte, is imprisoned (in one of the cities of refuge).”
“And so a sojourning proselyte who unintentionally kills a sojourning
proselyte, or a slave, is imprisoned.”
“A sojourning proselyte who unintentionally kills an Israelite, although
he did it unintentionally, is to be put to death.” (Hilchoth Rotzeach, c.
v. 3.) The written law, on the contrary, says, “These six cities shall be
a refuge, both for the children of Israel and for the stranger, and for
the sojourner among them: that any one that killeth any person
unawares may flee thither.” (Numbers xxxv. 15.) Again, the oral law
says—
שנאמר וכי יזיד איש על רעהו ׃, ישראל שהרג גר תושב אינו נהרג עליו בבית דין
“An Israelite who kills a sojourning proselyte, is not put to death on
his account by the tribunal, for it is said, ‘But if a man come
presumptuously upon his neighbour.’ (Exodus xxi. 14.)” The law of
God says, “Whoso sheddeth man’s blood, by man shall his blood be
shed: for in the image of God made he man.” (Gen. ix. 6.) And to this
law the New Testament commands us Christians to adhere, rejecting
the oral traditions; and in consequence the laws of Christian
countries make no difference between the murderer of a Jew, a
Christian, Turk, Infidel, or Heretic. Short as all Christian nations
confessedly come of the pure morality of the New Testament, their
laws direct the administration of impartial justice, and are a terror to
all evil doers of every creed and sect. The liberality of the Talmud
then, in allowing a share of salvation to the pious of the world is not
so very great, nor its toleration of a very comprehensive character. It
not only withholds justice from the pious of the world, but gives as
the reason, because they are not considered as neighbours. Want of
room prevents us from pursuing this subject further at present. We
therefore ask, Is this law from God? Can God, in an oral law, directly
contradict his written law? Can you point out anything similar in the
New Testament? Is this law just or unjust? You will grant that it is
unjust and erroneous. Then your fathers have been mistaken about
one of the first principles of the administration of justice, for many
centuries. And your brethren who adhere to this system as Divine, as
on the Barbary coast, for instance, are still mistaken. Why do you not
protest aloud against such error? Why not endeavour to convince
your brethren that they are wrong? In England there is nothing to
prevent you. There is full liberty, free toleration. You may lift up your
voice like a trumpet against the errors of the Talmud. You may
expunge all acknowledgment of its authority from your prayers—you
may return to Moses and the prophets, and no man will say nay.
No. IX.
CHRISTIANS CANNOT BE RECKONED AMONGST
THE “PIOUS OF THE NATIONS OF THE WORLD.”
We said, in our last number, that “the pious of the nations of the
world” are, according to the oral law, those who have received the
seven commandments of the sons of Noah. We said that of the laws
laid down for their own conduct, some, as for instance that
respecting divorces, are such as would introduce confusion and
misery into Gentile society—and that others, referring to the
administration of justice by Rabbinical tribunals, are extremely
unjust. But the advocates of the oral law think, nevertheless, that it is
very tolerant, more tolerant than the New Testament, because it says
that “the pious of the nations of the world have a share in the world
to come.” Now we cannot help feeling a curiosity to know how great
or how small that share will be. And this our curiosity is excited by
the following information, which the oral law commands to be
communicated to a Gentile who wishes to turn Jew:—
וכשם שמודיעין אותו עונשן של מצוות כך מודיעין אותו שכרן של מצוות, ומודיעין
אותו שבעשית מצוות אלו יזכה לחיי העולם הבא, ושאין שום צדיק גמור אלא בעל
החכמה שעושה ויודען ׃ ואומרים לו הוי יודע שהעולם הבא אינו צפון אלא לצדיקים
והם ישראל, וזה שתראה ישראל בצער בעולם הזה טובה היא צפונה להם שאין
יכולין לקבל רוב טובה בעולם הזה כאומות, שמא ירום לבם ויתעו ויפסידו שכר
העולם הבא כענין שנאמר וישמן ישורון ויבעט ׃ ואין הקדוש ברוך הוא מביא עליהן
רוב פורענות כדי שלא יאבדו אלא כל האומות כלין והן עומדין וכו׳ ׃
“As they are to make known to him the punishments attached to the
commandments, so they are also to inform him of the rewards for
keeping them. They should inform him, that, by the doing of these
commandments, he will be worthy of everlasting life; and that there
is no perfectly righteous man, except that possessor of wisdom who
does and knows them. And they are to say to him, Be assured that
the world to come is laid up for none but the righteous, and they are
Israel; and as to this that thou seest Israel in trouble in this world,
their good things are laid up for them, for they cannot receive an
abundance of good things in this world, like the nations. Their heart
might, perchance, be lifted up, and they might go astray, and lose
the reward of the world to come, as it is said, ‘Jeshurun waxed fat
and kicked.’ The Holy One, blessed be he, brings upon them the
abundance of afflictions for no other reason than this, that they may
not be lost. All the nations shall be utterly destroyed, but they shall
abide.” (Hilchoth Issure Biah., c. xiv. 3-5.) To us this sounds very
much like a flat contradiction to the above declaration, that “the pious
of the nations of the world have a share in the world to come.” Here,
on the contrary, it is stated that the blessings of that state are
reserved “for none but the righteous, and they are Israel;” and again,
“All the nations shall be utterly destroyed.” And it is even implied that
the nations get their good things in this world, and do not suffer
affliction, as they are not to have that blessedness, which is reserved
for the righteous. How, then, are we to reconcile these two sayings?
There are only two ways which occur to us, either by saying that this
is not strictly true, but only a fair speech in order to catch proselytes;
or, if it be strictly true, that then “the pious of the world” are to have a
much smaller share in the blessedness to come. In any case the
spirit is far from charitable or tolerant. It represents God as an
accepter of persons, saving Israelites simply because they are
Israelites, and destroying the other nations because they are not
Israelites. The New Testament representation is very different, and
far more worthy of “the Judge of all the earth.” It does indeed say,
“He that believeth shall be saved, and he that believeth not shall be
damned.” But in this very declaration, we have an impartial rule
applied to all mankind. “He that believeth,” of whatsoever nation,
kindred, or tongue—Jew or Gentile, white or black—“shall be saved.”
“He that believeth not,” whether he be called a Jew or a Christian,
whether he be a son of Japhet, of Shem, or of Ham, “shall be
damned.” The New Testament asserts no monopoly of salvation for
one favoured family. It excludes none because he had not the
happiness to be descended from a privileged stock. It lays down a
general and impartial rule to be applied to all the children of men.
The oral law says,
כל ישראל יש להם חלק לעולם הבא ׃
“All Israel has a share in the world to come.” The New Testament
says, “Not every one that saith unto me, Lord, Lord, shall enter the
kingdom of heaven, but he that doeth the will of my Father which is
in heaven.” (Matt. vii. 21.) The oral law says, “The world to come is
laid up for none but the righteous, and they are Israel.” The New
Testament says, “God is no respecter of persons; but in every nation
he that feareth him, and worketh righteousness, is accepted with
him.” (Acts x. 34, 35.) Now then we appeal to the good sense of
every Jew, even of the Talmudists, to tell us which of these two
statements is most just, impartial, and worthy of the Just Judge?
But the reasoning employed in the above extract from the oral law is
as false as the principles which it is intended to support. When it
says, “As to this that thou seest Israel in trouble in this world, their
good things are laid up for them, for they cannot receive an
abundance of good things in this world like the nations,” it directly
contradicts the law of Moses, which everywhere promises an
abundance of temporal blessings to Israel, if obedient. “It shall come
to pass, if thou shalt hearken diligently unto the voice of the Lord thy
God, to observe and to do all the commandments which I command
thee this day, that the Lord thy God will set thee on high above all
nations of the earth, and all these blessings shall come upon thee,
and overtake thee, if thou shalt hearken unto the voice of the Lord
thy God. Blessed shalt thou be in the city, and blessed shalt thou be
in the field. Blessed shall be the fruit of thy body, and the fruit of thy
ground, and the fruit of thy cattle, the increase of thy kine, and the
flocks of thy sheep.... The Lord shall cause thine enemies that rise
up against thee to be smitten before thy face; they shall come out
against thee one way, and flee before thee seven ways. The Lord
shall command the blessing upon thee in thy store-houses, and in all
that thou settest thine hand unto; and he shall bless thee in the land
which the Lord thy God giveth thee.” (Deut. xxviii. 1-8, &c.) Here,
then, is temporal blessing in abundance, promised to obedience; and
the afflictions which have come upon Israel are not because of their
piety, but because of their disobedience. In this case, then, the oral
law speaks utter falsehood. God has not two ways of dealing with
nations, but one way. He gives every nation a fair trial, and if they
refuse to hearken to his voice, he pours out upon them his wrath.
The rise, and growth, and trial, of a nation is slower, and requires
more time than the growth and trial of individual men. The life of a
nation is, so to speak, longer than the life of a man. Centuries are
required as the time of a nation’s trial, but all history, sacred and
profane, testifies the truth of the general rule given in the Old
Testament, “Righteousness exalteth a nation, but sin is a reproach to
any people.” The only difference which God makes between Israel
and the other nations, is with regard to their national existence in this
world. He has crumbled the mighty empires of Assyria, Babylon,
Greece, and Rome into dust, but he still preserves the independent
existence of the family of Abraham, according to his covenant; and
when, as a nation, they repent and return to him, He will remove the
rod of his anger, and give them the temporal prosperity which He has
promised by the mouth of Moses his servant. But this promise of
temporal blessing will not justify any impenitent Jew at the tribunal of
God’s judgment. The hopes held out by the oral law are utterly
fallacious, and dishonouring to God, inasmuch as he is represented
as unduly favouring one nation, and unjustly condemning all others.
An advocate of the oral law may, however, find out some other way
of evading the evident intolerance of the above statement, and still
insist upon it, that as the Talmud says, “The pious of the nations of
the world have a share in the world to come,” it is a very tolerant
book. We therefore proceed to inquire what pains the Rabbies have
taken to add to the number of those who are to be saved. They
believe, as we are told, that every one, who receives and observes
the seven commandments of the sons of Noah, will be saved; they
believe that all others must be lost; have they then taken any pains
to make known this important information to the world? Or, if that
was not to be expected during the captivity, did they during the days
of their power and dominion? Or, at least, did they offer every facility
to those Gentiles who might come to renounce idolatry, to receive
the necessary instruction? Did they command all their disciples to be
ready day and night to open their doors at the knock of the penitent
idolater, and by receiving rescue him from everlasting destruction?
Not one of all these things. They commanded that, when there was
no jubilee, such converts should be refused, and that if they did not
choose to be circumcised and observe the whole Mosaic law, they
should be left to perish.
אי זה הוא גר תושב זה גוי שקבל עליו שלא יעבוד עכו׳׳ם עם שאר המצוות שנצטוו
בני נח ולא מל ולא טבל הרי זה מקבלין אותו והוא מחסידי אומות העולם, ולמה
נקרא שמו תושב לפי שמותר לנו להושיבו בינינו בארץ ישראל כמו שבארנו
בהלכות עכו׳׳ם, ואין מקבלין גר תושב אלא בזמן שהיובל נוהג ׃
“What is meant by a sojourning proselyte? Such an one is a Gentile,
who has taken upon himself not to commit idolatry, together with the
remaining commandments given to the sons of Noah, but is not
circumcised nor baptized. Such an one is received, and is of the
pious of the nations of the world. And why is he called a sojourner?
Because it is lawful for us to let him dwell amongst us in the land of
Israel, as we have explained in the laws concerning idolatry. But a
sojourning Proselyte is not received WHEN THE JUBILEE CANNOT BE
OBSERVED.” (Hilchoth Issure Biah., c. xiv. 7, 8.) At all other times the
unfortunate heathen might perish, if they did not choose to become
Jews altogether. Now what will be thought of the charity of this law if
we add, that there has been no jubilee, and consequently no pious
amongst the nations for two thousand seven hundred years and
more? Yet this is what the oral law tells us.
משגלו שבט ראובן ושבט גד וחצי שבט מנשה בטלו היובלות שנאמר וקראתם
דרור בארץ לכל יושביה, בזמן שכל יושביה עליה, והוא שלא יהיו מעורבבין שבט
בשבט אלא כולן יושבים כתקונן ׃
“Since the time that the tribe of Reuben, and the tribe of Gad and the
half-tribe of Manasseh were led away captive, the jubilees have
ceased, for it is said, ‘And ye shall proclaim liberty throughout the
land unto all the inhabitants thereof’ (Lev. xxv. 10); that means, when
all its inhabitants are upon it, and, moreover, when the tribes are not
mixed one with another, but all dwelling according as they were
appointed.” (Hilchoth Shemitah, c. x. 8.) We have the account of this
captivity in the following words, “In those days the Lord began to cut
Israel short: and Hazael smote them in all the coasts of Israel: from
Jordan eastward, all the land of Gilead, the Gadites, and the
Reubenites, and the Manassites.” (2 Kings x. 32, 33.) That was,
according to the common chronology, about 884 years before the
Christian era. If to this we add 1836, we have 2720 years since the
time that there could be a jubilee, and consequently 2720 years
since any Gentiles were converted from the errors of idolatry to the
religion of the sons of Noah. What is it then but solemn mockery, in
any one acquainted with the oral law, to tell us that the Talmud is
tolerant, and admits “that the pious of the nations of the world may
be saved;” when according to that same book seven-and-twenty
centuries have elapsed, since any such converts were received? We
believe that those who make this defence are unacquainted with the
principles of the system which they undertake to defend. The truth is,
that the authors of the oral law, finding that they could not altogether
deny salvation to the pious of other nations, were determined not to
add to their number, and therefore limited the possibility of this mode
of conversion to times that had elapsed long before they were born.
But in their own times they would not receive any one who was not
willing to be circumcised and to receive the whole law. And hence we
see how exactly the New Testament represents the state of the case,
when Christianity was first propagated amongst the Gentiles, and
free salvation was proclaimed to all who believed, without becoming
Jewish proselytes. The Rabbinists opposed with all their might. “And
certain men which came down from Judea taught the brethren and
said, Except ye be circumcised after the manner of Moses, ye
cannot be saved.” And again, “There rose up certain of the sect of
the Pharisees which believed, saying that it was needful to
circumcise them, and to command them to keep the law of Moses.”
(Acts xv. 1-5.) There was no year of jubilee, and therefore
renunciation of idolatry was not sufficient in the eyes of these
traditionists, who believed that at such a time there was no salvation
except for those who observed the whole law. But how is it now? If a
Gentile should desire now to become one of the pious of the nations,
could the Jews receive him? According to the above general
principles, certainly not. The tribes are still scattered and mixed up
together. The land has not got “all its inhabitants.” There can be no
jubilee, and therefore those that wish to be saved, must, according to
the oral law, turn Jews, or take their chance of living to a year of
jubilee. But we are not necessitated to argue from the principles. The
thing is expressly laid down in the oral law. After explaining, as we
have quoted above, who are the pious of the world, and that when
the jubilee is possible, is the only time for receiving them, it adds—
אבל בזמן הזה אפילו קבל עליו כל התורה כולה חוץ מדקדוק אחד אין מקבלין
אותו ׃
“But in the present time, though a man should be willing to take upon
him the whole law, with the exception of only one of its least
requirements, he is not to be received.” Now then what becomes of
the boasted toleration of the Talmud? It says, that “the pious of the
nations of the world may be saved.” But it says, first, that such
converts can only be received when the jubilee can be celebrated. It
says, secondly, that this only opportunity has not occurred for the
last 2,700 years; and, lastly, it positively forbids the Jews in the
present time to give the Gentiles a chance of salvation, unless they
are willing to receive the whole law. What use is it then to talk of the
pious of the world, or to say that people of other religions may be
saved? According to the Talmud, there are no pious of the nations,
unless perchance there may be some descendants of those who
were received 2,700 years ago. But all history that we have ever
seen is silent on the subject. We do not know of a single
congregation of Noahites in the whole world. The forefathers of the
Christians were not received during the usage of jubilee. They were
idolaters received against the wishes of the Rabbinists. The Britons
and the Saxons were converted to Christianity long after the final
dispersion of the Jews, that is, at a time when, according to the
Talmud, it was unlawful to add to the pious amongst the nations.
Neither were they received according to the Talmudic condition, in
the presence of three learned Jews.
וצריך לקבל עליו בפני שלשה חברים ׃
“And it is necessary for such an one to take the seven
commandments on him in the presence of three learned men, who
are qualified to be Rabbies.” (Hilchoth Melachim, c. viii. 10.)
According to the oral law, then, there are no such persons now
existing as “the pious of the nations of the world.” It is, therefore, idle
to talk of the liberality with which they would be treated, were they
forthcoming. Thus the only appearance of an argument in favour of
the Talmud vanishes into thin air, and mocks our grasp, as soon as
we endeavour to lay hold of it. Those who caught at this phantom of
charity, no doubt meant it sincerely. They thought that the oral law
was misrepresented. They were told that it was charitable, and they
therefore nobly came forward in its defence. If they had known its
true principles, they would have renounced them. Their advocacy
went on a false supposition. But now that we have set forth the true
bearings of the case, and given them chapter and verse to which
they may refer, and convince themselves, we call upon them to do
so: and then, as they hate intolerance, to join with us in protesting
against it, even though it should be found in that system, which
hitherto they have believed, on the testimony of others, to be Divine.
At the same time we would seriously ask of them to compare this
system, which has been for more than 1,700 years the religion of the
majority of the Jewish nation, with the system laid down in the New
Testament, and to decide which is most agreeable to the character of
God, as revealed in the law and the prophets, and most beneficial to
the world. The oral law says, that God has commanded the heathen
to be left for 2,700 years without the means of instruction, and that
when the days of Israel’s prosperity come, the nations are to be
converted by force; but that even then, they will not be raised to the
rank of brethren, but only be sojourning proselytes. The oral law
looks forward to no reunion of all the sons of Adam into one happy
family. The New Testament has, on the contrary, commanded its
disciples to afford the means of instruction “to every creature.” It
speaks to us Gentiles, who were once regarded as poor outcasts, in
the language of love, and says, “Now, therefore, ye are no more
strangers and foreigners, but fellow-citizens with the saints, and of
the household of God.” (Ephes. ii. 19.) It takes nothing from you. It
asserts your privileges as the peculiar people of God; but it reveals
that great, and to us, most comfortable truth, “That the Gentiles
should be fellow-heirs, and of the same body;” and it promises a
happy time, when there shall be one fold and one Shepherd. It does,
indeed, tell us not to forget what we once were, “aliens from the
commonwealth of Israel, and strangers from the covenant of
promise, having no hope, and without God in the world.” (Eph. ii. 12.)
It reminds us that the olive-tree is Jewish, and that you are the
natural branches, and warns us against all boasting. (Rom. xi. 16-24.)
And we desire to remember these admonitions, and to
acknowledge with thankfulness, that all that we have received, is
derived from the Jewish nation. We ask you not to compare the oral
law with any Gentile speculations, or systems, or inventions, but with
doctrines essentially and entirely Jewish. Christianity has effected
great and glorious changes in the world, but we take not the glory to
ourselves. We give it to God, who is the author of all good, and
under Him, to the people of Israel. We ask you, then, to compare
these two Jewish systems, Rabbinism, which has done no good to
the Gentiles, and perpetuated much error amongst the Jews; and
Christianity, which has diffused over the world the knowledge of the
one true God—disseminated the writings of Moses and the prophets,
and increased the happiness of a large portion of mankind. The
comparison may require time, and ought to be conducted with
calmness and seriousness. But we think that, even without instituting
that comparison, you must acknowledge that the principles of the
oral law, discussed in this paper, are contrary to the law of Moses;
and that, therefore, a decided and solemn protest against these
Rabbinical additions, is an immediate and imperative duty.
No. X.
RABBINIC WASHING OF HANDS.