This article was published as a part of the Data Science Blogathon.
Introduction
If you are just starting your journey with Snowflake, I strongly recommend checking my article on how to get started with Snowflake. The term "stage" has a specific meaning in Snowflake terminology, and it is different from what we call a staging area in traditional ETL terminology.
Stages in Snowflake are locations used to store data. If the data that needs to be loaded into Snowflake is stored in external cloud storage such as AWS S3, Azure, or GCP, these are called External stages, whereas if the data is stored inside Snowflake, these are called Internal stages.
1. Internal Stages
First, I will walk through the internal stages in detail.
Internal stages are further divided as below:
- User Stages
- Table Stages
- Internal Named stages
Before ingesting data into a table in Snowflake, the data has to be uploaded into a stage using the PUT command and then loaded into the table using the COPY INTO command. Similarly, if we need to unload data from a table, it first has to be written to a stage using the COPY INTO command and then downloaded to the local system using the GET command.
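As a minimal sketch of both directions (the file name data.csv, the Windows path, and the table my_table are hypothetical):

```sql
-- Loading: local file -> internal stage -> table
put file://C:\data\data.csv @~;          -- upload to the user stage (compressed to .gz)
copy into my_table from @~/data.csv.gz;  -- load the staged file into the table

-- Unloading: table -> internal stage -> local file
copy into @%my_table from my_table;      -- unload the table into its table stage
get @%my_table file://C:\data\;          -- download the staged files to the local disk
```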
Note that there is a slight change in the syntax depending on whether the local system is Unix or Windows. I have used a Windows-based system to store the local files in this article.
As of now, all the tasks are done via the SnowSQL command-line interface. SnowSQL is a client tool that you can install on your local machine and then use to connect to Snowflake. When you load data from the local system to a stage, Snowflake automatically compresses the file, so if you check a file in the stage after loading, you will always see a .gz extension.
1.a. User Stages
User stages are tied to a specific user. Every user has a default stage created, and we cannot modify or remove these stages.
We can copy files to these stages to load them further into tables. Once the load is completed, we need to remove the files explicitly; otherwise, we pay for the storage. Files in one user's stage cannot be accessed by another user. So if you need to load multiple tables as a specific user, this is the best option. We refer to the user stage using '@~'.
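For illustration, assuming a hypothetical local file F_SAMPLE.txt, a typical user-stage workflow looks like this:

```sql
put file://C:\data\F_SAMPLE.txt @~;  -- upload the file to your own user stage
list @~;                             -- inspect the staged files (note the .gz suffix)
remove @~ pattern='.*F_SAMPLE.*';    -- clean up explicitly to avoid storage charges
```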
1.b. Table Stages
Table stages are tied to a specific table. Whenever a table is created, a table stage is automatically created for it. Similar to the user stage, we cannot modify or remove the table stage; however, we need to clean up the space after the files are loaded. The table stage of a particular table cannot be accessed through another table. So if you need to load only one table, you can choose the table stage. We refer to a table stage using '@%' followed by the table name.
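A short sketch with a hypothetical table T_SAMPLE; note the '@%' prefix:

```sql
put file://C:\data\F_SAMPLE.txt @%T_SAMPLE;  -- stage a file in the table's own stage
list @%T_SAMPLE;                             -- these files can only be loaded into T_SAMPLE
copy into T_SAMPLE from @%T_SAMPLE;          -- no path needed; the stage belongs to the table
```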
1.c. Internal Named Stages
These stages offer more flexibility compared to user or table stages. Named stages are Snowflake objects, so all the operations that can be performed on objects can be performed on internal named stages as well. We need to create these stages manually, and we can also specify file format options while creating the stage itself, which is not possible with the table or user stage. We refer to a named stage using '@' followed by the stage name.
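As a sketch (the stage name STG_DEMO and the file format options are illustrative), a named stage is created and referenced like this:

```sql
-- Create the stage manually, with a file format attached to it
create stage STG_DEMO
  file_format = (type = 'CSV' field_delimiter = ',');
put file://C:\data\F_SAMPLE.txt @STG_DEMO;  -- note: '@' plus the stage name, no ~ or %
list @STG_DEMO;
drop stage STG_DEMO;  -- like any object, a named stage can be dropped
```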
2. Use Case for the Internal Stage
2.a. Problem Definition
Load source files from the local system into multiple tables in Snowflake and then process the data. The processed data will be available in the target table. Finally, unload the data from the target table into a file on the local system.
Note: Since the processing of data is out of scope for this article, I will skip it and populate the data in the target table manually. Let's assume the processing is an aggregation of each employee's salary.
2.b. Solution
Since we need to load multiple tables here, we can go for either the user stage or a named stage. Processing the data is out of scope for this article, so I have not included that part; assume that the processed data is available in the target table. Since there is a single target table, we can use either the table stage or a named stage to unload the data from the table.
2.c. Steps to be Implemented
- Connect and log in to SnowSQL.
- Load all the files from the local disk to the user stage using the PUT command.
- Load the files from the user stage to the source tables in Snowflake using the COPY INTO command.
- Unload the data from the target table into the table stage using the COPY INTO command.
- Download the files from the table stage to the local system using the GET command.
- Clean all the files in the stages to avoid billing for storage.
2.d. The Architecture of the Implementation
Image source: Author
2.e. Naming Conventions
To make it easy to follow, I will use the naming conventions below:
- Source files in local disk: F_EMP_SAL1.txt, F_EMP_SAL2.txt
- Source tables: T_EMP_SAL1, T_EMP_SAL2
- User stage: STG_EMP_USER
- Target table: T_EMP_TARGET
- Table stage: STG_EMP_TABLE
- Target file in local: F_EMP_TARGET
2.f. Prerequisites for Implementation
- SnowSQL should be installed on the client machine.
- A Snowflake free trial account. If you do not have one, check this article on how to create a free trial account on Snowflake.
- Local .txt files with the content below:
- F_EMP_SAL1.txt
1,aaron,3000
2,vidhya,4000
- F_EMP_SAL2.txt
1,Ben,7000
2,vidhya,2000
- Source and target tables created in Snowflake with the below script:
use role accountadmin;
use warehouse compute_wh;
use database DEMO_DB;
use schema PUBLIC;
create table T_EMP_SAL1(emp_id integer,emp_name varchar,empsal float);
create table T_EMP_SAL2(emp_id integer,emp_name varchar,empsal float);
create table T_EMP_TARGET(emp_id integer,emp_name varchar,empsal float);
insert into T_EMP_TARGET
values
(1,'Aaron',8000),
(1,'Vidhya',4000),
(1,'Ben',7000);
2.g. Implementation
We will see the code used in this section.
# Logging into SnowSQL
snowsql -a <>.ap-south-1.aws -u <>
-- 1. Load all the files from the local disk to the user stage using the PUT command
use role accountadmin;
use warehouse compute_wh;
use database DEMO_DB;
use schema PUBLIC;
put file://D:\Snowflake\Content\F_EMP_SAL1.txt @~;
put file://D:\Snowflake\Content\F_EMP_SAL2.txt @~;
select * from T_EMP_SAL1; -- this returns an empty table
select * from T_EMP_SAL2; -- this returns an empty table
-- 2. Load the files from the user stage to the source tables in Snowflake using the COPY INTO command
copy into T_EMP_SAL1 from @~/F_EMP_SAL1.txt;
copy into T_EMP_SAL2 from @~/F_EMP_SAL2.txt;
select * from T_EMP_SAL1;
select * from T_EMP_SAL2;
-- 3. Unload the data from the target table into the table stage
copy into @%T_EMP_TARGET from T_EMP_TARGET;
select * from T_EMP_TARGET;
-- 4. Download the unloaded files into the local system using GET
get @%T_EMP_TARGET file://D:\Snowflake\Content\;
-- 5. Clean all the files in the stages to avoid billing for storage
list @~;
rm @~ pattern='.*txt.*';
rm @%T_EMP_TARGET;
Let's now see the output of all the above commands.
Image Source – Author
Image Source – Author
Image Source – Author
2.h. Output
Let's see if the file was exported.
Image source: Author
3. External Stage
If the files are located in an external cloud location, for example AWS S3, and you need to load them into Snowflake, then an external stage can be used.
Unlike internal stages, loading and unloading the data can be done directly using COPY INTO; the GET and PUT commands are not supported with external stages. An external stage can be created via the web user interface or via SnowSQL. We will see how to create one using the web user interface.
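For reference, the same external stage can also be created from SnowSQL; a sketch with a hypothetical bucket name and placeholder credentials:

```sql
create stage STG_S3_DEMO
  url = 's3://my-demo-bucket/data/'
  credentials = (aws_key_id = '<access_key>' aws_secret_key = '<secret_key>');
copy into T_EMP_SAL1 from @STG_S3_DEMO;  -- load directly from S3; no PUT required
```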
4. Use Case for the External Stage
4.a. Problem Definition
Load files from AWS S3 into a Snowflake table using an external stage.
4.b. Prerequisites
- To perform this demo, you need to have an AWS account.
- An access key and secret key to connect to your AWS account.
- A table created in Snowflake into which the data will be loaded.
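The table-creation script itself is not shown here; a minimal sketch, reusing the column layout from the internal-stage use case (the table name T_EMP_S3 is hypothetical):

```sql
create table T_EMP_S3(emp_id integer, emp_name varchar, empsal float);
```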
4.c. Steps for Implementation
1. Create an AWS S3 bucket as shown below
Image source: Author
2. Upload the files to S3
Image source: Author
3. Create an external stage in Snowflake using the AWS keys
Image source: Author
4.d.Output
Image source: Author
5. Conclusion
In this article, we have seen what a stage is and how to use stages in Snowflake with a use case. Have you observed that in the first use case we did not use any command like CREATE STAGE? This is because we used the table and user stages; you could also use an internal named stage instead, as it offers more flexibility compared to the table and user stages.
I hope you got a clear idea of the stages in Snowflake. If you have any questions or feedback, please comment below.
Happy Learning!!
Here is my LinkedIn profile in case you want to connect with me.
Have something to say! Feel free to contact me here.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.