Teradata To Snowflake Migration Guide
MIGRATION GUIDE
Don’t let your past determine your future
Why migrate?
Strategy—Thinking about your migration
Migrating your existing Teradata warehouse
Need help migrating?
Appendix A—Migration tools
Appendix B—Data type conversion table
Appendix C—SQL considerations
CHAMPION GUIDES
WHY MIGRATE?

Decades ago, Teradata identified the need to manage and analyze large volumes of data. But just as the volume, velocity and variety of data have since changed, the cloud has enabled what’s possible today with modern data analytics. For example, by separating compute from storage, Snowflake has developed a modern cloud data warehouse solution that automatically and instantly scales in a way not possible with Teradata, whether the current Teradata system is on-premises or hosted in the cloud. Snowflake accomplishes this with its multi-cluster, shared data architecture.

THE CORE OF SNOWFLAKE
Snowflake delivers the performance, concurrency and simplicity needed to store and analyze all data available to an organization in one location. Snowflake’s technology combines the power of data warehousing, the flexibility of big data platforms, the elasticity of the cloud and live data sharing at a fraction of the cost of traditional solutions.

YOUR MOTIVATION TO MIGRATE
Some of the key reasons enterprises migrate off of Teradata:

1. Legacy platform: Traditional technology fails to meet the needs of today’s business users, such as the increasing requirement for unlimited concurrency and performance.

2. Cloud: A strategy to move from on-premises to cloud implies a move away from traditional IT delivery models to a more ‘on-demand,’ ‘as-a-service’ model with minimal management intervention.

3. New data sources and workloads: Key data sources for the modern enterprise are already in the cloud. The cloud also allows for new types of analytics to be assessed and refined without a long-term commitment to infrastructure or specific tools.

4. Cost: Snowflake allows true pay-as-you-go cloud scalability without the need for complex reconfiguration as your data or workloads grow.

WHY SNOWFLAKE?
Snowflake’s innovations break down the technology and architecture barriers that organizations still experience with other data warehouse vendors. Only Snowflake has achieved all six of the defining qualities of a data warehouse built for the cloud:

• ZERO MANAGEMENT: Snowflake reduces complexity with built-in performance, so there’s no infrastructure to tweak, no knobs to turn and no tuning required.

• PAY ONLY FOR WHAT YOU USE: Snowflake’s built-for-the-cloud solution scales storage separate from compute, up and down, transparently and automatically.

• ALL OF YOUR DATA: Create a single source of truth to easily store, integrate and extract critical insight from petabytes of structured and semi-structured data (JSON, XML, AVRO).

• ALL OF YOUR USERS: Provide access to an architecturally unlimited number of concurrent users and applications without eroding performance.

• DATA SHARING: Snowflake extends the data warehouse with direct, governed and secure data sharing, so enterprises can easily forge one-to-one, one-to-many and many-to-many data sharing relationships.

• COMPLETE SQL DATABASE: Snowflake supports the SQL tools millions of business users already know how to use today.
STRATEGY—THINKING ABOUT
YOUR MIGRATION
WHAT SHOULD YOU CONSIDER?
There are several things to contemplate when choosing your migration path. It’s usually desirable to pilot the migration on a subset of the data and processes. Organizations often prefer to migrate in stages, reducing risk and showing value sooner. However, you must balance this against the need to keep the program momentum and minimize the period of dual-running. In addition, your approach may also be constrained by the inter-relationships within the data, such as data marts that rely on references to data populated via a separate process in another schema.

Questions to ask about your workloads and data:

• What workloads and processes can be migrated with minimal effort?

• Which processes have issues today and would benefit from re-engineering?

• What workloads are outdated and require a complete overhaul?

• What new workloads would you like to add that would deploy more easily in Snowflake?

‘Lift and shift’ vs a staged approach
The decision whether to move data and processes in one bulk operation or deploy a staged approach depends on several factors. They include the nature of your current data analytics platform, the types and number of data sources, and your future ambitions and timescales.

Factors that lead to a ‘lift and shift’ approach may focus on what you already have:

• You have highly integrated data across the existing warehouse.

• Or, you are migrating a single independent, standalone data mart.

• Or, your current system uses well-designed data and processes using standard ANSI SQL.

• Or, you have timescale pressures to move off legacy equipment.

Factors that may lead to a staged approach:

• Your warehouse platform consists of many independent data marts and other data applications, which can be moved independently over time.
WHAT YOU DON’T NEED TO WORRY ABOUT
When migrating to Snowflake from Teradata, there are a number of concerns you can forget about, as they are not relevant to your Snowflake environment.

Data distribution and primary indexes
In Snowflake, there is no need for primary indexes. Since compute is separate from storage in Snowflake’s architecture, the data is not pre-distributed to the MPP compute nodes. Snowflake’s MPP compute nodes do not rely on the data being distributed ahead of time.

Workload management
Workload management is unnecessary in a Snowflake environment due to its multi-cluster architecture, which allows you to create separate virtual warehouses for your disparate workloads so as to avoid resource contention completely.

Statistics collection
Snowflake automatically captures statistics, relieving DBAs from having to set up jobs to collect statistics for performance tuning. It’s automatic in Snowflake, so you no longer have to remember to add new tables to the process when your data warehouse grows.

Capacity planning
With capacity planning for an on-premises Teradata system, you run the risk of over- or under-configuring your system. Even with Teradata Vantage, you have a similar capacity planning risk, as compute and storage are fixed per instance. If you need more capacity, you must buy it in predefined increments. With Snowflake’s elastic storage and compute architecture, you never have this risk, so you can save money and avoid the time previously spent on extensive planning.
Disaster recovery
Teradata has several disaster recovery scenarios, many of which require the purchase of another system, as well as software such as Unity, to implement. With Snowflake, none of this is necessary. Snowflake leverages many of the built-in features of the cloud, such as the automatic replication of data built into AWS. Snowflake is implemented in multiple regions on AWS, Azure and Google Cloud, and supports cross-cloud replication for disaster recovery to your cloud provider and region of choice. There is no work on your part to establish this.
MIGRATING YOUR EXISTING
TERADATA WAREHOUSE
In order to successfully migrate your enterprise data warehouse to Snowflake, you need to develop and follow a logical plan that includes the items in this section.

MOVING YOUR DATA MODEL
As a starting point for your migration, you’ll need to move your database objects from Teradata to Snowflake. This includes the databases, tables, views and sequences in your existing data warehouse that you want to move over to your new Snowflake data warehouse. In addition, you may want to include all of your user account names, roles and object grants. At a minimum, the user who owns the Teradata database must be created on the target Snowflake system before migrating data. Which objects you decide to move will be highly dependent on the scope of your initial migration.

There are several options for making this happen. The following sections outline three possible approaches for moving your data model from Teradata to Snowflake.

Using a data modeling tool
If you have a data modeling tool with a current model of your warehouse, you can use it to generate DDL targeted at Snowflake rather than Teradata. Keep in mind, Snowflake is self-tuning and has a unique architecture. You won’t need to generate code for any indexes, partitions or storage clauses of any kind that you may have needed in Teradata. You only need basic DDL, such as CREATE TABLE, CREATE VIEW and CREATE SEQUENCE. Once you have these scripts, you can log into your Snowflake account to execute them.

If you have a data modeling tool, but the model is not current, we recommend you reverse engineer the current design into your tool, then follow the approach outlined above.

Using existing DDL scripts
You can begin with your existing DDL scripts if you don’t have a data modeling tool. But you’ll need the most recent version of the DDL scripts (in a version control system). You’ll also want to edit these scripts to remove code for extraneous features and options not needed in Snowflake, such as primary indexes and other storage or distribution related clauses. Depending on the data types you used in Teradata, you may also need to do a search-and-replace in the scripts to change some of the data types to Snowflake optimized types. For a list of these data types, please see Appendix B.
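As an illustration, the stripped-down DDL that Snowflake needs might look like the following. The table, view and sequence names here are hypothetical, not taken from this guide:

```sql
-- Hypothetical objects for illustration only.
-- Note: no PRIMARY INDEX, PARTITION BY or storage clauses,
-- all of which would be removed from the Teradata DDL.
CREATE TABLE sales (
    sale_id   INTEGER,
    sale_date DATE,
    amount    NUMBER(12,2)
);

CREATE VIEW recent_sales AS
    SELECT sale_id, sale_date, amount
    FROM sales
    WHERE sale_date >= DATEADD(day, -30, CURRENT_DATE);

CREATE SEQUENCE sale_id_seq START = 1 INCREMENT = 1;
```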
Creating new DDL scripts
If you don’t have current DDL scripts for your data warehouse, or a data modeling tool, you will need to extract the metadata needed from the Teradata data dictionary in order to generate these scripts. But for Snowflake, this task is simpler since you won’t need to extract metadata for indexes and storage clauses.

As mentioned above, depending on the data types in your Teradata design, you may also need to change some of the data types to Snowflake optimized types. You will likely need to write a SQL extract script of some sort to build the DDL scripts. Rather than do a search and replace after the script is generated, you can code these data type conversions directly into the metadata extract script. The benefit is that you have automated the extract process so you can do the move iteratively. Plus, you will save time editing the script after the fact. Additionally, coding the conversions into the script is less error-prone than any manual cleanup process, especially if you are migrating hundreds or even thousands of tables.

MOVING YOUR EXISTING DATA SET
Once you have built your objects in Snowflake, you’ll want to move the historical data already loaded in your Teradata system over to Snowflake. To do this, you can use a third-party migration tool (see Appendix A), an ETL (extract, transform, load) tool or a manual process. Choosing among these options, you should consider how much data you have to move. For example, to move 10s or 100s of terabytes, or even a few petabytes, a practical approach may be to extract the data to files and move it via a service such as AWS Snowball, Azure Data Box or Google Transfer Appliance. If you have to move 100s of petabytes or even exabytes of data, AWS Snowmobile is likely the more appropriate option.

If you choose to move your data manually, you will need to extract the data for each table to one or more delimited flat files in text format using Teradata Parallel Transporter (TPT). Then upload these files, using the PUT command, to a Snowflake stage in your cloud provider’s blob storage. We recommend these files be between 100MB and 1GB to take advantage of Snowflake’s parallel bulk loading.

After you have extracted the data and moved it to your cloud provider’s blob storage, you can begin loading the data into your tables in Snowflake using the COPY command. You can check out more details about the COPY command in our online documentation.
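A sketch of the manual path, with hypothetical file, table and format names (run the PUT from a SnowSQL client; the TPT export itself is not shown):

```sql
-- Define how the TPT-exported, pipe-delimited files are parsed.
CREATE FILE FORMAT td_export_format TYPE = 'CSV' FIELD_DELIMITER = '|';

-- Upload the extracted files to the table's internal stage.
PUT file:///tmp/export/sales_*.csv @%sales;

-- Bulk load the staged files into the target table.
COPY INTO sales
  FROM @%sales
  FILE_FORMAT = (FORMAT_NAME = 'td_export_format');
```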
Migrating BI tools
Many of your queries and reports are likely to use an existing business intelligence (BI) tool. Therefore, you’ll need to account for migrating those connections from Teradata to Snowflake. You’ll also have to test those queries and reports to be sure you’re getting the expected results.

This should not be too difficult since Snowflake supports standard ODBC and JDBC connectivity, which most modern BI tools use. Many of the mainstream tools have native connectors to Snowflake. Don’t worry if your tool of choice is not among them; you should be able to establish a connection using either ODBC or JDBC. If you have questions about a specific tool, your Snowflake contact will be happy to help.

Handling workload management
As stated earlier, the workload management required in Teradata is unnecessary with Snowflake. The multi-cluster architecture of Snowflake allows you to create separate virtual warehouses for your disparate workloads to avoid resource contention completely. Your workload management settings in Teradata (TASM or TIWM) will give you a good idea of how you’ll want to set up Snowflake virtual warehouses. However, you’ll need to consider the optimal way to distribute these in Snowflake. As a starting point, create a separate virtual warehouse for each workload, sized according to the resources required to meet the SLA for that workload. To do so, consider the following:

• Is there a specific time period in which this workload needs to complete? Between certain hours? You can easily schedule any Snowflake virtual warehouse to turn on and off, or just auto-suspend and automatically resume when needed.

• How much compute will you need to meet that window? Use that to determine the appropriately sized virtual warehouse.

• How many concurrent connections will this workload need? If you normally experience bottlenecks, you may want to use a Snowflake multi-cluster warehouse for those use cases to allow automatic scale-out during peak workloads.

• Think about dedicating at least one large virtual warehouse for tactical, high-SLA workloads.

• If you discover a new workload, you can easily add it on demand with Snowflake’s ability to instantly provision a new virtual warehouse.

MOVING THE DATA PIPELINE AND ETL PROCESSES
Snowflake is optimized for an ELT (extract, load, transform) approach. However, we support a number of traditional ETL (extract, transform, load) and data integration solutions. Unless you are planning to significantly enhance or modify your existing data pipelines, we recommend a basic migration of all of them in order to minimize the impact to your project. Given that testing and data validation are key elements of any changes to the data pipeline, maintaining these processes as is will reduce the need for extensive validation.
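The warehouse-per-workload setup described above can be sketched as follows. The warehouse names and sizes are illustrative assumptions, not recommendations from this guide:

```sql
-- A dedicated warehouse for the nightly ETL window.
CREATE WAREHOUSE etl_wh
  WAREHOUSE_SIZE = 'LARGE'
  AUTO_SUSPEND = 300        -- suspend after 5 idle minutes
  AUTO_RESUME = TRUE;

-- A multi-cluster warehouse for concurrent BI users.
CREATE WAREHOUSE bi_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3     -- scale out automatically at peak concurrency
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;
```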
Snowflake has worked diligently to ensure that the migration of processes running on traditional ETL platforms is as painless as possible. Native connectors for tools such as Talend and Informatica make the process quick and easy.

We recommend running the data pipeline in both Snowflake and Teradata during the initial migration. This way, you can simplify the validation process by enabling a quick comparison of the results from the two systems. Once you’re sure queries running against Snowflake are producing results identical to queries from Teradata, you can be confident that the migration did not affect data quality. But you should see a dramatic improvement in performance.

For data pipelines that require re-engineering, you can leverage Snowflake’s scalable compute and bulk-loading capabilities to modernize your processes and increase efficiency. You may consider taking advantage of our Snowpipe tool for loading data continuously as it arrives at your cloud storage provider of choice, without any resource contention or impact to performance, thanks to Snowflake’s cloud-built architecture. Snowflake makes it easy to bring in large datasets and perform transformations at any scale.

CUT OVER
Once you migrate your data model, your data, your loads and your reporting over to Snowflake, you must plan your switch from Teradata to Snowflake. Here are the fundamental steps:

1. Execute a historic, one-time load to move all the existing data.

2. Set up ongoing, incremental loads to collect new data.

3. Communicate the cut-over to all Teradata users, so they know what’s changing and what they should expect.

4. Ensure all development code is checked in and backed up, which is a good development practice.

5. Point production BI reports to pull data from Snowflake.

6. Run Snowflake and Teradata in parallel for a few days and perform verifications.

7. Turn off the data pipeline and access to Teradata for the affected users and BI tools.
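Steps 1 and 2 above — a one-time historic load followed by ongoing incremental loads — can be sketched as a bulk COPY plus a Snowpipe for the incremental feed. The stage, pipe and table names are hypothetical, and the external stage’s credentials are omitted:

```sql
-- One-time historic load from an external stage.
CREATE STAGE sales_stage URL = 's3://my-bucket/sales/';  -- credentials omitted

COPY INTO sales
  FROM @sales_stage/historic/
  FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = '|');

-- Ongoing incremental loads: Snowpipe ingests new files as they arrive.
CREATE PIPE sales_pipe AUTO_INGEST = TRUE AS
  COPY INTO sales
    FROM @sales_stage/incremental/
    FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = '|');
```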
NEED HELP
MIGRATING?
Snowflake’s solution partners and Snowflake’s Professional Services team offer a number of services to accelerate your migration and ensure a successful implementation. The Snowflake Alliances team is working with top-tier system integrators that have experience performing platform migrations.

Snowflake’s solution partners and Professional Services team understand the benefits of Snowflake’s cloud-built architecture and apply their experience and knowledge to the specific challenges your organization may face during the migration process. They offer services ranging from high-level architecture recommendations to manual code conversion. Additionally, a number of Snowflake partners have built tools to automate and accelerate the migration process.

Whether your organization is fully staffed for a platform migration or you need additional manpower, Snowflake’s solution partners and Professional Services team have the skills and tools to make this process easier, so you can reap the full benefits of Snowflake.

To find out more, please contact Snowflake’s solutions partner team or the Snowflake sales team. To understand the business benefits of migrating from Teradata to Snowflake, click here.
APPENDIX A:
MIGRATION TOOLS
Several of our Snowflake ecosystem partners have offerings that may help with your migration from Teradata to Snowflake. For more information, or to engage these partners, contact your Snowflake representative. These are just a few:

WIPRO
Wipro has developed a tool to assist in these migration efforts: the CDRS Self Service Cloud Migration Tool (patent pending). It provides an end-to-end, self-service data migration solution for migrating your on-premises data warehouse to Snowflake. This includes Snowflake specific optimizations.

WHERESCAPE
Existing WhereScape customers that have built their Teradata warehouse using WhereScape RED can leverage all the metadata in RED to replatform to Snowflake. Since WhereScape has native, optimized functionality for targeting a Snowflake database, it’s relatively simple to redeploy the data warehouse objects and ELT code, with these steps:

• Change the target platform from Teradata to Snowflake.

• Generate all the required objects (tables, views, sequences and file formats).

• Regenerate all the ELT code for the new target.

• Deploy and execute to continue loading your data in Snowflake.
APPENDIX B:
DATA TYPE CONVERSION TABLE
This appendix contains a sample of the data type mappings you need to know when moving
from Teradata to Snowflake. As you will see, many are the same, but a few will need to be changed.

Teradata                    Snowflake
--------                    ---------
BYTEINT                     BYTEINT
SMALLINT                    SMALLINT
INTEGER                     INTEGER
BIGINT                      BIGINT
DECIMAL                     DECIMAL
FLOAT                       FLOAT
NUMERIC                     NUMERIC
CHAR (up to 64K)            CHAR (up to 16MB)
VARCHAR (up to 64K)         VARCHAR (up to 16MB)
LONG VARCHAR (up to 64K)    VARCHAR (up to 16MB)
CHAR VARYING(n)             CHAR VARYING(n)
REAL                        REAL
DATE                        DATE
TIME                        TIME
TIMESTAMP                   TIMESTAMP
BLOB                        BINARY (up to 8MB)
CLOB                        VARCHAR (up to 16MB)
BYTE                        BINARY
VARBYTE                     VARBINARY
GRAPHIC                     VARBINARY
JSON                        VARIANT
ARRAY                       ARRAY
APPENDIX C:
SQL CONSIDERATIONS
Below are examples of some changes you may need to make to your Teradata SQL queries so
they will run correctly in Snowflake. Note that this is not an all-inclusive list.
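A few common, illustrative conversions are sketched below. The table and column names are hypothetical, and this is not the guide’s full list:

```sql
-- Teradata's SEL shorthand must be spelled out:
--   Teradata:  SEL * FROM orders;
SELECT * FROM orders;

-- PRIMARY INDEX clauses are dropped from DDL:
--   Teradata:  CREATE TABLE orders (order_id INTEGER) PRIMARY INDEX (order_id);
CREATE TABLE orders (order_id INTEGER);

-- Column-level FORMAT phrases become explicit conversion functions:
--   Teradata:  SELECT order_date (FORMAT 'YYYY-MM-DD') FROM orders;
SELECT TO_CHAR(order_date, 'YYYY-MM-DD') FROM orders;
```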
ABOUT SNOWFLAKE
Snowflake’s cloud data platform shatters the barriers that have prevented organizations of all sizes from unleashing the
true value of their data. Thousands of customers deploy Snowflake to advance their organizations beyond what was
possible by deriving all the insights from all their data by all their business users. Snowflake equips organizations with a
single, integrated platform that offers the only data warehouse built for the cloud; instant, secure, and governed access to
their entire network of data; and a core architecture to enable many types of data workloads, including a single platform for
developing modern data applications. Snowflake: Data without limits. Find out more at snowflake.com.