Marketo exposes a REST API which allows for remote execution of many of the system’s capabilities. From creating programs to bulk lead import, there are a large number of options which allow fine-grained control of a Marketo instance.
Step 5. Select OAuth2 for “Authentication”. In the “User” field, enter the “Application ID” from step 5 of “Register your app with Azure AD”, and in the “Password” field, enter the key value from step 12.
Step 6. In the “Authentication URL” field, enter the “App ID URI” from step 5 of “Register your app with Azure AD”.
Step 7. Select POST for “HTTP Method for Token and OAuth2 Authentication”.
Step 8. Enter the following string in the “Authentication Request Payload” field:
Updated (5/7/2019). Etlworks now includes a native connector for Salesforce but this article is still relevant if you need to access various Salesforce APIs (for example a streaming REST API) not supported by the native connector.
In Etlworks, it is possible to connect to practically any HTTP-based API, and the Salesforce API is no exception. This blog post provides step-by-step instructions for creating a connection to the Salesforce REST API in Etlworks.
Creating a connected app in Salesforce
Assuming that you already have a Salesforce account, the first step is to create a connected app in Salesforce.
Step 1. Log in to Salesforce and click the Setup icon (it looks like a gear) in the top navigation banner.
Step 2. Search for “apps” and select the “Apps/App Manager” link in the left side-bar.
Step 3. Click the “New Connected App” button in the top right corner.
Step 4. Enter all required parameters and check Enable OAuth settings.
Step 5. In the “Selected OAuth Scopes” settings, add the Full Access (full) scope.
Step 6. Enter the following URL in the “Callback URL” field:
Step 7. Click the “Save” button and continue to the next screen (there will be a message saying that you need to wait for a few minutes before you can start using the app).
Step 8. On the next screen, under “API (Enable OAuth Settings)”, there will be a “Consumer Key” and a “Consumer Secret”. Copy and save them somewhere; you will need them later when you create a connection to the Salesforce API.
Creating a connection to Salesforce API
In this section, we will show you how to connect to the Query API, which takes a SQL-like query as a URL parameter and returns JSON for the requested object.
As you can see, we are using a query API to get all the users under your Salesforce account, together with roles.
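For illustration, a query URL of this kind could be assembled as in the sketch below. The instance host, the API version, and the query text are assumptions made up for this example, not the values used in the original post; only the general shape of the Salesforce query endpoint (a q parameter carrying the URL-encoded query) is relied on.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_query_url(instance_url: str, api_version: str, soql: str) -> str:
    """Return a Salesforce REST query URL with the query text URL-encoded."""
    base = f"{instance_url.rstrip('/')}/services/data/{api_version}/query"
    return f"{base}?{urlencode({'q': soql})}"


# Hypothetical values: replace with your own instance, version, and query.
soql = "SELECT Name, UserRole.Name FROM User"
url = build_query_url("https://example.my.salesforce.com", "v45.0", soql)
print(url)

# Decoding the parameter gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["q"][0]
```

Encoding the query with urlencode keeps spaces and commas from breaking the URL, which matters as soon as the query selects more than one field.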
Step 3. Select GET as a “Method” and application/x-www-form-urlencoded as a “Content Type Header”.
Step 4. Select oauth2 as the “Authentication”. Enter the URL-encoded username in the “User or Access Key” field and the password in the “Password” field. For example, if the username is firstname.lastname@example.org, the URL-encoded value is firstname.lastname%40example.org.
Step 5. Enter the following string in the “Authentication URL” field:
In this blog post, I will be talking about building a reliable data injection pipeline for Snowflake.
Snowflake is a data warehouse built for the cloud. It works across multiple clouds and combines the power of data warehousing, the flexibility of big data platforms, and the elasticity of the cloud.
Based on the Snowflake documentation, loading data is a two-step process:
Upload (i.e. stage) one or more data files into either an internal stage (i.e. within Snowflake) or an external location.
Use the COPY INTO command to load the contents of the staged file(s) into a Snowflake database table.
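As a rough sketch of these two steps, the helper below only builds the SQL text for a PUT into an internal stage followed by a COPY INTO. The file path, stage name, and table name are hypothetical, and the file format options are an assumption about a CSV file with a header row; executing the statements against a real Snowflake connection is left out.

```python
from typing import List


def stage_and_copy_sql(local_path: str, stage: str, table: str) -> List[str]:
    """Build the two statements of the documented load process."""
    # Step 1: upload (stage) the local file into an internal stage.
    put_stmt = f"PUT file://{local_path} @{stage} AUTO_COMPRESS = TRUE"
    # Step 2: load the staged file(s) into the target table.
    copy_stmt = (
        f"COPY INTO {table} FROM @{stage} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    )
    return [put_stmt, copy_stmt]


# Hypothetical names, for illustration only.
statements = stage_and_copy_sql("/tmp/export/users.csv", "my_stage", "users")
for statement in statements:
    print(statement)
```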
It is obvious that one step is missing: preparing data files to be loaded in Snowflake.
If steps 1-3 do not look complicated to you, let’s add more details.
Typically, developers who are tasked with loading data into any data warehouse have to deal with the following issues:
How to build a reliable injection pipeline, which loads hundreds of millions of records every day.
How to load only recent changes (incremental replication).
How to transform data before loading into the data warehouse.
How to transform data after loading into the data warehouse.
How to deal with changed metadata (table structure) in both the source and in the destination.
How to load data from nested datasets, typically returned by the web services (in addition to loading data from the relational databases).
This is just a short list of hand-picked problems. The good news is that Snowflake is built from the ground up to help with bulk-loading data, thanks to the very robust COPY INTO command, and with continuous loading, using Snowpipe.
Any Snowflake injection pipeline should at least be utilizing the COPY INTO command and, possibly, Snowpipe.
The simplest ETL process that loads data into Snowflake looks like this:
Extract data from the source and create CSV (also JSON, XML, and other formats) data files.
Compress the files using the gzip algorithm.
Copy the data files into the Snowflake stage in an Amazon S3 bucket (Azure Blob Storage and the local file system are also supported).
Execute the COPY INTO command using a wildcard file mask to load data into the Snowflake table.
Repeat 1-4 for multiple data sources. Injection ends here.
If needed, execute SQL statements in Snowflake database to transform data. For example, populate dimensions from the staging tables.
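The first two steps of this list can be sketched with the Python standard library alone. This is a minimal example under stated assumptions: the rows, the column names, and the file name users_001.csv.gz are invented, and copying the file to the stage and running COPY INTO are not shown.

```python
import csv
import gzip
import os
import tempfile


def write_gzipped_csv(path, header, rows):
    """Write rows as CSV and gzip-compress them in a single pass."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path


# Hypothetical extract: two rows pulled from some source system.
out_dir = tempfile.mkdtemp()
path = write_gzipped_csv(
    os.path.join(out_dir, "users_001.csv.gz"),
    ["id", "name"],
    [(1, "Ann"), (2, "Bob")],
)

# Read the file back to confirm that it is a valid gzip-compressed CSV.
with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
    loaded = list(csv.reader(handle))
print(loaded)
```

Numbering the files (users_001, users_002, and so on) is one way to make them easy to match with a single wildcard file mask later.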
The part where you need to build a “reliable data injection pipeline” typically includes:
Performance considerations and data streaming.
Error-handling and retries.
Notifications on success and failure.
Reliability when moving files to the staging area in S3 or Azure.
The COPY INTO command can load data from files compressed with the gzip algorithm, so it makes sense to compress all the data files before copying or moving them to the staging area.
Cleaning up: what to do with all these data files after they have been loaded (or not loaded) into Snowflake.
Dealing with changing table structure in the source and in the destination.
Snowflake supports transforming data while loading it into a table using the COPY INTO <table> command but it will not allow you to load data with inconsistent structure.
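To make the error-handling and retry item above more concrete, here is a minimal sketch of a retry helper with exponential backoff. It is a generic pattern, not the way any particular ETL tool implements retries; flaky_upload is a stand-in for a real upload to the staging area, and the attempt count and delays are arbitrary.

```python
import time


def with_retries(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run action(); after a failure wait base_delay * 2**n seconds and retry."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the error so a failure notification can be sent
            sleep(base_delay * 2 ** (attempt - 1))


# Stand-in for an upload that fails twice before succeeding.
calls = {"count": 0}


def flaky_upload():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network failure")
    return "uploaded"


delays = []
# Record the delays instead of actually sleeping, to keep the example fast.
result = with_retries(flaky_upload, sleep=delays.append)
print(result, delays)
```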
Add the need to handle incremental updates in the source (change replication) and you have a relatively complicated project on your hands.
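One common way to pick up only recent changes is a high-watermark filter: remember the latest modification time already loaded and select only newer rows on the next run. The sketch below assumes that every source row carries an updated_at timestamp, which is an assumption of this example and not something every source provides.

```python
from datetime import datetime


def select_changes(rows, last_watermark):
    """Return rows changed after last_watermark and the new watermark."""
    changed = [row for row in rows if row["updated_at"] > last_watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max(
        (row["updated_at"] for row in changed), default=last_watermark
    )
    return changed, new_watermark


# Hypothetical source rows.
rows = [
    {"id": 1, "updated_at": datetime(2019, 5, 1)},
    {"id": 2, "updated_at": datetime(2019, 5, 6)},
    {"id": 3, "updated_at": datetime(2019, 5, 7)},
]
changed, watermark = select_changes(rows, datetime(2019, 5, 5))
print([row["id"] for row in changed], watermark)
```

The new watermark has to be stored only after the load succeeds; otherwise a failed run would silently skip the rows it did not load.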
As always, there are two options:
Develop home-grown ETL using a combination of scripts and in-house tools.
Develop a solution using a third-party ETL tool or service.
Assuming that you are ready to choose option 2 (if not, go to paragraph one), let’s discuss
The requirements for the right ETL tool for the job
When selecting the ETL tool or service the questions you should be asking yourself are:
How much are you willing to invest in learning?
Do you prefer the code-first or the drag-and-drop approach?
Do you need to extract data from semi-structured and unstructured data sources (typically web services), or is all your data in relational databases?
Are you looking for point-to-point integration between well-known data sources (for example, Salesforce->Snowflake) with minimal customization, or do you need to build a custom integration?
Do you need your tool to support change replication?
How about real-time or almost real-time ETL?
Are you looking for a hosted and managed service running in the cloud, or for an on-premise solution?
Why is Etlworks the best tool for loading data in Snowflake?
First, just like Snowflake, Etlworks is a cloud-first data integration service. It works perfectly well when installed on-premise, but it really shines in the cloud. When subscribing to the service, you can choose the region that is closest to your Snowflake instance which will make all the difference as far as the fast data load is concerned. Also, you won’t have to worry about managing the service.
Second, in Etlworks you can build even the most complicated data integration flows and transformations using a simple drag-and-drop interface. No need to learn a new language and no complicated build-test-deploy process.
Third, if you are dealing with heterogeneous data sources, web services, semi-structured or unstructured data, or transformations which go beyond the simple point-to-point, pre-baked integrations – you are probably limited to just a few tools. Etlworks is one of them.
Last but not least, if you need your tool to support a native change (incremental) replication from relational databases or web services, Etlworks can handle this as well. No programming required. And it is fast.
How it works
In Etlworks, you can choose from several highly configurable data integration flows, optimized for Snowflake:
Extract data from databases and load in Snowflake.
Extract data from data objects (including web services) and load in Snowflake.
Extract data from well-known APIs (such as Google Analytics) and load in Snowflake.
Load existing files in Snowflake.
Execute any SQL statement or multiple SQL statements.
Behind the scenes, the flows perform complicated transformations and create data files for Snowflake, compress the files using the gzip algorithm before copying them to the Snowflake stage in the cloud or in the server storage, automatically create and execute the COPY INTO <table> command, and much more. For example, a flow can automatically create a table in Snowflake if it does not exist, or it can purge the data files in case of error (Snowflake can automatically purge the files in case of success).
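The kind of SQL such a flow might generate can be sketched as follows. This is not the actual Etlworks implementation; the table, stage, and column definitions are hypothetical, and the sketch only shows a CREATE TABLE IF NOT EXISTS statement plus a COPY INTO that asks Snowflake to purge the staged files after a successful load.

```python
def create_table_sql(table, columns):
    """Build DDL that creates the table only when it does not exist yet."""
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE IF NOT EXISTS {table} ({column_list})"


def copy_with_purge_sql(table, stage):
    """Build a COPY INTO that removes staged files after a successful load."""
    return (
        f"COPY INTO {table} FROM @{stage} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1) PURGE = TRUE"
    )


# Hypothetical metadata discovered from the source.
ddl = create_table_sql("users", [("id", "NUMBER"), ("name", "VARCHAR")])
copy_stmt = copy_with_purge_sql("users", "my_stage")
print(ddl)
print(copy_stmt)
```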
You can find the actual, step-by-step instructions on how to build Snowflake data integration flows in Etlworks in our documentation.
The extra bonus is that in Etlworks you can connect to the Snowflake database, discover the schemas, tables, and columns, run SQL queries, and share queries with the team. All without ever using Snowflake SQL workbench. Even better – you can connect to all your data sources and destinations, regardless of the format and location to discover the data and the metadata. Learn more about Etlworks Explorer.