Data Loading with Snowflake's COPY INTO <table> Command

Snowflake's COPY INTO command is a powerful tool for data professionals, streamlining the process of loading data from staged files into existing tables. Here's a quick overview of how it works and where you can stage your files.

Staging Locations (a sketch of each follows the list):

  1. Named Internal Stage: Files are first uploaded with the PUT command; a table stage or user stage can also be used.

  2. Named External Stage: Reference an external location such as Amazon S3, Google Cloud Storage, or Microsoft Azure.

  3. External Location: Directly from Amazon S3, Google Cloud Storage, or Microsoft Azure.
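
A hedged sketch of the three options (the table name, stage names, bucket URL, and credentials below are hypothetical placeholders):

-- 1. Named internal stage (files uploaded earlier with PUT)
COPY INTO my_table FROM @my_internal_stage;

-- 2. Named external stage created over S3, GCS, or Azure
COPY INTO my_table FROM @my_external_stage;

-- 3. External location referenced directly
COPY INTO my_table
FROM 's3://my-bucket/data/'
CREDENTIALS = (AWS_KEY_ID = '***' AWS_SECRET_KEY = '***')
FILE_FORMAT = (TYPE = 'CSV');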

Important Note: You cannot access data held in archival cloud storage classes that require restoration before retrieval. This includes Amazon S3 Glacier Flexible Retrieval, Glacier Deep Archive, and Microsoft Azure Archive Storage.

Snowflake's COPY INTO command ensures efficient and organized data loading, making it an essential feature for managing your data ecosystem.

Here’s a detailed explanation of each parameter in the COPY INTO command (a usage sketch follows the list):

  1. ON_ERROR = {CONTINUE | SKIP_FILE | ABORT_STATEMENT}: Controls error handling. CONTINUE skips bad records and keeps loading, SKIP_FILE skips any file containing errors, and ABORT_STATEMENT (the default) aborts the load on the first error.

  2. FORCE = TRUE: Loads all files, even files that were already loaded and have not changed, which can duplicate data in the target table.

  3. VALIDATION_MODE = {RETURN_ALL_ERRORS | RETURN_ERRORS}: Validates the staged files without loading them and returns the errors that would occur. RETURN_ERRORS reports errors across all files in the statement; RETURN_ALL_ERRORS also includes errors from files partially loaded in an earlier run.

  4. RETURN_FAILED_ONLY = TRUE: Limits the statement output to only the files that failed to load.

  5. PURGE = TRUE: Deletes the staged files automatically after a successful load.

  6. MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE: Maps source columns to target columns by name (ignoring case) instead of by position.

  7. INCLUDE_METADATA = (ColumnName = METADATA$field): Populates a target column with file metadata such as METADATA$FILENAME or METADATA$START_SCAN_TIME.
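
A minimal sketch combining several of these parameters (the table, stage, and file format names are hypothetical):

-- Dry run: report errors without loading anything
COPY INTO my_table FROM @my_stage
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
VALIDATION_MODE = RETURN_ERRORS;

-- Actual load: match Parquet columns by name, skip files with errors,
-- and delete staged files after a successful load
COPY INTO my_table FROM @my_stage
FILE_FORMAT = (TYPE = 'PARQUET')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
ON_ERROR = SKIP_FILE
PURGE = TRUE;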

  • Log in to the Snowflake account via SnowSQL and select the WAREHOUSE, DATABASE, and SCHEMA.

  • Create the file format and stage.

  • Create the table.

  • PUT the file into the stage.

  • Check which bad records exist on the source side.

  • Skip the bad records and load the rest into the table; the output shows 1 bad record skipped.

  • FORCE = TRUE forces the load to run even if the files have already been loaded previously. This is useful when you want to reload data from files that have not changed, or when you need to reload files that previously encountered errors.

  • With FORCE = TRUE, the same data is reloaded, so expect duplicates unless you deduplicate afterward.

  • Create the table cricket_stats.

  • Load the file to the stage DATA_LOAD_STAGE_1.

  • Load the data from the stage into the table (see the end-to-end sketch below).
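
A minimal end-to-end sketch of these steps (the warehouse, database, schema, column list, and local file path are hypothetical placeholders):

USE WAREHOUSE my_wh;
USE SCHEMA my_db.my_schema;

CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = 'CSV' SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"';

CREATE OR REPLACE STAGE data_load_stage_1
  FILE_FORMAT = my_csv_format;

CREATE OR REPLACE TABLE cricket_stats (
  player STRING,
  matches INT,
  runs INT
);

-- Run from SnowSQL: upload the local file (compressed to .gz automatically)
PUT file:///tmp/t20.csv @data_load_stage_1;

-- Skip bad records instead of aborting the whole load
COPY INTO cricket_stats FROM @data_load_stage_1
ON_ERROR = CONTINUE;

-- Reload the same file even though it was already loaded
COPY INTO cricket_stats FROM @data_load_stage_1
FORCE = TRUE;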

Benefits of Using PURGE=TRUE:

  1. Storage Management: Automatically deleting the files helps in managing and optimizing your storage space, preventing unnecessary clutter and potential costs associated with storing large amounts of unused data.

  2. Operational Efficiency: It simplifies the data loading process by eliminating the need for manual cleanup, which can be particularly useful when dealing with large datasets and frequent data loads.

  3. Data Security: Removing files from the stage after loading reduces the risk of unauthorized access to staged files that are no longer needed, thus enhancing data security.

  • After the load, the staged file t20.csv.gz is deleted, as shown in the sketch below.
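
A short sketch of the PURGE behavior, reusing the stage and table from the walkthrough above:

-- Staged files are deleted only after a successful load
COPY INTO cricket_stats FROM @data_load_stage_1
PURGE = TRUE;

-- The purged file (e.g., t20.csv.gz) no longer appears here
LIST @data_load_stage_1;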

  • To add the source file name and load time to each row, map metadata columns: INCLUDE_METADATA = (FILENAME = METADATA$FILENAME, ELT_LOAD_TIME = METADATA$START_SCAN_TIME).

  • Create the order_info table.

  • Create the file format and stage.

  • Upload the data to the stage from the local machine via SnowSQL.

  • Create a temp table to stage the data.

  • Load the data into the final table, order_info.

  • Verify with SELECT * FROM order_info; (a sketch of the metadata mapping follows this list).
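
A sketch of the metadata mapping. The demo above used CSV via a temp table; this sketch assumes Parquet source files for simplicity, and the stage name and non-metadata columns are hypothetical. Note that INCLUDE_METADATA requires MATCH_BY_COLUMN_NAME to be set:

CREATE OR REPLACE TABLE order_info (
  order_id INT,
  amount NUMBER(10,2),
  filename STRING,
  elt_load_time TIMESTAMP_LTZ
);

COPY INTO order_info FROM @order_stage
FILE_FORMAT = (TYPE = 'PARQUET')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
INCLUDE_METADATA = (
  FILENAME = METADATA$FILENAME,
  ELT_LOAD_TIME = METADATA$START_SCAN_TIME
);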

  • Copy from table to stage (unloading):

  • REMOVE @OD01_STAGE; -- remove all files under this stage

  • LIST @OD01_STAGE; -- check whether any files remain

  • COPY INTO @OD01_STAGE FROM customer_info;

🚀 Best Practices for COPY INTO @STAGE

✅ 1. Use SINGLE=FALSE for Large Exports

  • Lets Snowflake split the export into multiple files and write them in parallel.

✅ 2. Optimize File Size for Performance

  • Use MAX_FILE_SIZE to balance read performance and storage costs.

  • For AWS S3, use ~128MB file size for best retrieval performance.

✅ 3. Use INCLUDE_QUERY_ID=TRUE to Avoid File Overwrites

  • Ensures exported filenames are unique (prevents accidental overwriting).

✅ 4. Choose the Right File Format (PARQUET for Analytics)

  • PARQUET is best for big data analytics (compact, columnar storage).

  • CSV is best for external systems that don’t support Parquet.

✅ 5. Encrypt Data for Security

  • Use the ENCRYPTION copy option for built-in encryption (for example, ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE') on internal stages).

✅ 6. Validate Data Before Copying Using VALIDATION_MODE

  • Avoid wasted runs by checking for errors before exporting data.

✅ 7. Overwrite Files When Needed

  • Ensure fresh data by overwriting old files with OVERWRITE = TRUE.

COPY INTO @OD01_STAGE/CUSTOMERSDATA
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
FILE_FORMAT = (TYPE = 'PARQUET')
OVERWRITE = TRUE
SINGLE = TRUE
HEADER = TRUE;

LIST @OD01_STAGE;

https://www.youtube.com/watch?v=DuowRboOWAI&list=PL__gObEGy1Y7klsW7vc2TM2Cmt6BwRkzh&index=13

https://docs.snowflake.com/en/sql-reference/sql/copy-into-table
