Data Loading with Snowflake's COPY INTO <table> Command

Snowflake's COPY INTO command is a powerful tool for data professionals, streamlining the process of loading data from staged files into existing tables. Here's a quick overview of how it works and where you can stage your files.

Staging Locations (a sketch of each follows the list):

  1. Named Internal Stage: Files are first uploaded with the PUT command; a table stage or user stage can also be used.

  2. Named External Stage: Reference an external location such as Amazon S3, Google Cloud Storage, or Microsoft Azure.

  3. External Location: Directly from Amazon S3, Google Cloud Storage, or Microsoft Azure.
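
A hedged sketch of the three options (the table name, stage names, bucket URL, and credentials below are hypothetical placeholders):

-- 1. Named internal stage (files uploaded earlier with PUT)
COPY INTO my_table FROM @my_internal_stage;

-- 2. Named external stage created over S3, GCS, or Azure
COPY INTO my_table FROM @my_external_stage;

-- 3. External location referenced directly
COPY INTO my_table
FROM 's3://my-bucket/data/'
CREDENTIALS = (AWS_KEY_ID = '***' AWS_SECRET_KEY = '***')
FILE_FORMAT = (TYPE = 'CSV');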

Important Note: You cannot access data held in archival cloud storage classes that require restoration before retrieval. This includes Amazon S3 Glacier Flexible Retrieval, Glacier Deep Archive, and Microsoft Azure Archive Storage.

Snowflake's COPY INTO command ensures efficient and organized data loading, making it an essential feature for managing your data ecosystem.

Here’s a detailed explanation of each parameter in the COPY INTO command (a usage sketch follows the list):

  1. ON_ERROR = {CONTINUE | SKIP_FILE | ABORT_STATEMENT}: Controls error handling. CONTINUE skips bad records and keeps loading, SKIP_FILE skips any file containing errors, and ABORT_STATEMENT (the default) aborts the load on the first error.

  2. FORCE = TRUE: Loads all files, even files that were already loaded and have not changed, which can duplicate data in the target table.

  3. VALIDATION_MODE = {RETURN_ALL_ERRORS | RETURN_ERRORS}: Validates the staged files without loading them and returns the errors that would occur. RETURN_ERRORS reports errors across all files in the statement; RETURN_ALL_ERRORS also includes errors from files partially loaded in an earlier run.

  4. RETURN_FAILED_ONLY = TRUE: Limits the statement output to only the files that failed to load.

  5. PURGE = TRUE: Deletes the staged files automatically after a successful load.

  6. MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE: Maps source columns to target columns by name (ignoring case) instead of by position.

  7. INCLUDE_METADATA = (ColumnName = METADATA$field): Populates a target column with file metadata such as METADATA$FILENAME or METADATA$START_SCAN_TIME.
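
A minimal sketch combining several of these parameters (the table, stage, and file format names are hypothetical):

-- Dry run: report errors without loading anything
COPY INTO my_table FROM @my_stage
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
VALIDATION_MODE = RETURN_ERRORS;

-- Actual load: match Parquet columns by name, skip files with errors,
-- and delete staged files after a successful load
COPY INTO my_table FROM @my_stage
FILE_FORMAT = (TYPE = 'PARQUET')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
ON_ERROR = SKIP_FILE
PURGE = TRUE;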

  • Log in to the Snowflake account via SnowSQL and select the WAREHOUSE, DATABASE, and SCHEMA.

  • Create the file format and stage.

  • Create the table.

  • PUT the file into the stage.

  • Check which bad records exist on the source side.

  • Skip the bad records and load the rest into the table; the output shows 1 bad record skipped.

  • FORCE = TRUE forces the load to run even if the files have already been loaded previously. This is useful when you want to reload data from files that have not changed, or when you need to reload files that previously encountered errors.

  • With FORCE = TRUE, the same data is reloaded, so expect duplicates unless you deduplicate afterward.

  • Create the table cricket_stats.

  • Load the file to the stage DATA_LOAD_STAGE_1.

  • Load the data from the stage into the table (see the end-to-end sketch below).
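
A minimal end-to-end sketch of these steps (the warehouse, database, schema, column list, and local file path are hypothetical placeholders):

USE WAREHOUSE my_wh;
USE SCHEMA my_db.my_schema;

CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = 'CSV' SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"';

CREATE OR REPLACE STAGE data_load_stage_1
  FILE_FORMAT = my_csv_format;

CREATE OR REPLACE TABLE cricket_stats (
  player STRING,
  matches INT,
  runs INT
);

-- Run from SnowSQL: upload the local file (compressed to .gz automatically)
PUT file:///tmp/t20.csv @data_load_stage_1;

-- Skip bad records instead of aborting the whole load
COPY INTO cricket_stats FROM @data_load_stage_1
ON_ERROR = CONTINUE;

-- Reload the same file even though it was already loaded
COPY INTO cricket_stats FROM @data_load_stage_1
FORCE = TRUE;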

Benefits of Using PURGE=TRUE:

  1. Storage Management: Automatically deleting the files helps in managing and optimizing your storage space, preventing unnecessary clutter and potential costs associated with storing large amounts of unused data.

  2. Operational Efficiency: It simplifies the data loading process by eliminating the need for manual cleanup, which can be particularly useful when dealing with large datasets and frequent data loads.

  3. Data Security: Removing files from the stage after loading reduces the risk of unauthorized access to staged files that are no longer needed, thus enhancing data security.

  • After the load, the staged file t20.csv.gz is deleted, as shown in the sketch below.
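
A short sketch of the PURGE behavior, reusing the stage and table from the walkthrough above:

-- Staged files are deleted only after a successful load
COPY INTO cricket_stats FROM @data_load_stage_1
PURGE = TRUE;

-- The purged file (e.g., t20.csv.gz) no longer appears here
LIST @data_load_stage_1;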

  • To add the source file name and load time to each row, map metadata columns: INCLUDE_METADATA = (FILENAME = METADATA$FILENAME, ELT_LOAD_TIME = METADATA$START_SCAN_TIME).

  • Create the order_info table.

  • Create the file format and stage.

  • Upload the data to the stage from the local machine via SnowSQL.

  • Create a temp table to stage the data.

  • Load the data into the final table, order_info.

  • Verify with SELECT * FROM order_info; (a sketch of the metadata mapping follows this list).
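
A sketch of the metadata mapping. The demo above used CSV via a temp table; this sketch assumes Parquet source files for simplicity, and the stage name and non-metadata columns are hypothetical. Note that INCLUDE_METADATA requires MATCH_BY_COLUMN_NAME to be set:

CREATE OR REPLACE TABLE order_info (
  order_id INT,
  amount NUMBER(10,2),
  filename STRING,
  elt_load_time TIMESTAMP_LTZ
);

COPY INTO order_info FROM @order_stage
FILE_FORMAT = (TYPE = 'PARQUET')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
INCLUDE_METADATA = (
  FILENAME = METADATA$FILENAME,
  ELT_LOAD_TIME = METADATA$START_SCAN_TIME
);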

  • Copy from table to stage (unloading):

  • REMOVE @OD01_STAGE; -- remove all files under this stage

  • LIST @OD01_STAGE; -- check whether any files remain

  • COPY INTO @OD01_STAGE FROM customer_info;

🚀 Best Practices for COPY INTO @STAGE

✅ 1. Use SINGLE=FALSE for Large Exports

  • Lets Snowflake split the export into multiple files and write them in parallel.

✅ 2. Optimize File Size for Performance

  • Use MAX_FILE_SIZE to balance read performance and storage costs.

  • For AWS S3, use ~128MB file size for best retrieval performance.

✅ 3. Use INCLUDE_QUERY_ID=TRUE to Avoid File Overwrites

  • Ensures exported filenames are unique (prevents accidental overwriting).

✅ 4. Choose the Right File Format (PARQUET for Analytics)

  • PARQUET is best for big data analytics (compact, columnar storage).

  • CSV is best for external systems that don’t support Parquet.

✅ 5. Encrypt Data for Security

  • Use the ENCRYPTION copy option for built-in encryption (for example, ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE') on internal stages).

✅ 6. Validate Data Before Copying Using VALIDATION_MODE

  • Avoid wasted runs by checking for errors before exporting data.

✅ 7. Overwrite Files When Needed

  • Ensure fresh data by overwriting old files with OVERWRITE = TRUE.

COPY INTO @OD01_STAGE/CUSTOMERSDATA
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
FILE_FORMAT = (TYPE = 'PARQUET')
OVERWRITE = TRUE
SINGLE = TRUE
HEADER = TRUE;

LIST @OD01_STAGE;

https://www.youtube.com/watch?v=DuowRboOWAI&list=PL__gObEGy1Y7klsW7vc2TM2Cmt6BwRkzh&index=13

https://docs.snowflake.com/en/sql-reference/sql/copy-into-table
