Connecting to Marketo API

etlworks-marketo

Marketo exposes a REST API which allows for remote execution of many of the system’s capabilities.  From creating programs to bulk lead import, there are a large number of options which allow fine-grained control of a Marketo instance.

Read about Marketo REST API.

Prerequisites

Assuming that you already have Marketo account, in order to connect to the Marketo REST API you must have the following:

  • The base URL for the Marketo instance, for example, 377-XYZ-015.mktorest.com
  • Client ID
  • Client Secret

This article explains how to obtain the Client ID and the Client Secret.

Creating a connection to the Marketo REST API

In this section, we will be showing you how to create a connection to the Marketo REST API endpoint which returns available fields for the leads.

Step 1. Add a new HTTP connection in Etlworks.

Step 2. Enter the following URL in the “URL” field.

https://base-url/rest/v1/leads/describe.json?access_token={access_token}

where the base-url is the URL from the “Prerequisites” section, for example, 377-XYZ-015.mktorest.com.

Notice the leads/describe.json which is the API endpoint that returns the fields for the leads.

Step 3. Select GET as a “Method”.

Step 4. Select OAuth2 for “Authentication”. In the “User” field enter the “Client ID” from the “Prerequisites” section and in the “Password” field enter the “Client Secret” from the same section.

Step 5. In the “Authentication URL” field enter the following URL:

https://base-url/identity/oauth/token?grant_type=client_credentials&client_id={user}&client_secret={password}

where the base-url is the URL from the “Prerequisites” section.

Step 6. Select POST for “HTTP Method for Token and OAuth2 Authentication”.

Step 7. Select application/json as an “Authentication Request Content Type”.

Step 8. Give the connection a name and save it.

Testing Marketo REST API connection

The following shows you how to test the connection using Explorer.

Step 1. If you don’t have it already create the JSON format. Use all the default settings.

Step 2. Go to Explorer, select the connection created in the previous section and click “expand” (down-arrow) button. Link the format created in step 1 to the connection.

Step 3. Click the “View Data” button to view the data in a grid format or “Raw Data” to view the data in the JSON format.

Connecting to Microsoft Graph API

etlworks-azure

Microsoft Graph provides a unified programmability model that you can use to take advantage of the tremendous amount of data in Microsoft 365, Azure Active Directory, and other Microsoft services.

Read about Microsoft Graph and Microsoft Graph API.

This blog post provides step-by-step instruction for creating a connection to the Microsoft Graph API in Etlworks.

Registering your app with Azure AD endpoint

Assuming that you already have Office 365 or Azure account, the first step is to create an app and register it with Azure AD endpoint.

Step 1. Login to Azure Portal and select “Azure Active Directory” in the left side-bar.

Step 2. Select “App Registration” and click “New application registration”.

Step 3. Enter the name of the application, for example, Office 365 connector, select Web app / API in the “Application type” and enter https://api.etlworks.com in the “Sign-on URL”.

Step 4. Click “Create” and wait while application being created.

Step 5. Click “Settings” and then expand “Properties”. Copy and save somewhere “Application ID” and “App ID URI”. We will be using both later when creating a connection to the Microsoft Graph API.

Step 6. Expand “Required permissions” and click “Add”.

Step 7. Click “Select an API” and add all required APIs. For this example we will need Microsoft Graph.

Step 8. Click “Select permissions” and select Read all users’ full profiles.

Step 9. Click “Select” and “Done”.

Step 10. Expand “Keys”.

Step 11. Enter key parameters, for example, api key, Never expires, graph api key.

Step 12. Click “Save”. Copy and save the key value. It is not going to be available after you leave this screen.

Creating a connection to the Microsoft Graph API

In this section, we will be showing you how to create a connection to the Graph API endpoint which returns users in the Azure Active Directory for the specific Office 365 account.

Step 1. Add a new HTTP connection in Etlworks.

Step 2. Enter the following URL in the “URL” field.

https://graph.microsoft.com/v1.0/users

Step 3. Select GET as a “Method” and application/json as a “Content Type Header”.

Step 4. Add the following “Header” (name&value)

name=content-type
value=application/json;odata.metadata=minimal;odata.streaming=true;IEEE754Compatible=false;charset=utf-8

Step 5. Select OAuth2 for “Authentication”. In the “User” field enter the “Application ID” from step 5 of the “Register your app with Azure AD” and in the “Password” field enter the key value from step 12.

Step 6. In the “Authentication URL” field enter the “”App ID URI” from step 5 of the “Register your app with Azure AD”.

Step 7. Select POST for “HTTP Method for Token and OAuth2 Authentication”.

Step 8. Enter the following string in the “Authentication Request Payload” field:

&client_id={user}
&scope=https%3A%2F%2Fgraph.microsoft.com%2F.default
&client_secret={password}
&grant_type=client_credentials

Step 9. Select application/x-www-form-urlencoded as an “Authentication Request Content Type”.

Step 10. Give the connection a name and save it.

Testing Microsoft Graph API connection

The following shows you how to test the connection using Explorer.

Step 1. If you don’t have it already create the JSON format. Use all the default settings.

Step 2. Go to Explorer, select the connection created in the previous section and click “expand” (down-arrow) button. Link the format created in step 1 to the connection.

Step 3. Click the “View Data” button to view the data in a grid format or “Raw Data” to view the data in the JSON format.

Connecting to Salesforce API

etlworks-salesforce

Updated (5/7/2019). Etlworks now includes a native connector for Salesforce but this article is still relevant if you need to access various Salesforce APIs (for example a streaming REST API) not supported by the native connector.

In Etlworks, it is possible to connect to practically any HTTP-based API -the  Salesforce API is no exception. This blog post provides step-by-step instruction for creating a connection to the Salesforce REST API in Etlworks.

Creating Connected App in Salesforce.

Assuming that you already have a Salesforce account, the first step is to create a connected app in Salesforce.

Step 1. Login into Salesforce and click the Setup icon (looks like gears) in the top navigation banner.

Step 2. Search for “apps” and select the “Apps/App Manager” link in the left side-bar.

Step 3. Click the “New Connected App” button in the top right corner.

Step 4.  Enter all required parameters and check Enable OAuth settings.

Step 5. In the “Selected OAuth Scopes” settings, add the Full Access (full) scope.

Step 6. Enter the following URL in the “Callback URL” field:

https://app.etlworks.com/salesforce/callback

Step 7. Click the “Save” button and continue to the next screen (there will be a message saying that you need to wait for a few minutes before you can start using the app).

Step 8. In the next screen, under the “API (Enable OAuth Settings)” there will be a “Consumer Key” and a “Consumer Secret”. Copy and save them somewhere, we will need them later when you create a connection to the Salesforce API.

Creating a connection to Salesforce API

In this section, we will be showing you how to connect to the Query API, which takes a SQL-like query as an URL parameter and returns a JSON for the requested object.

Step 1. Add a new HTTP connection in Etlworks.

Step 2. Enter the following URL in the “URL” field.

https://your_salesforce_instance.lightning.force.com/services/data/v20.0/query?q=SELECT+name+,+email+,+Username+,+LastName+,+FirstName+,+IsActive+,UserRole.Name+,+Id+from+User

As you can see, we are using a query API to get all the users under your Salesforce account, together with roles.

Step 3. Select GET as a “Method” and application/x-www-form-urlencoded as a “Content Type Header”.

Step 4. Select oauth2 as an “Authentication”. Enter URL encoded username in the “User or Access Key” field and the password in the “Password” field. For example, if the username is first.last@company.com, the encoded URL is going to be first.last%40company.com.

Step 5. Enter the following string in the “Authentication URL” field:

https://login.salesforce.com/services/oauth2/token

Select POST in the “HTTP Method for Token and OAuth2 Authentication”.

Step 6. Enter the following string in the “Authentication Request Payload” field:

client_id=client_id&client_secret=client_secret&grant_type=password&username={user}&password={password}

where the client_id is the “Consumer Key”  and the client_secret is the “Consumer Secret” from step 8 in the “Creating Connected App in Salesforce” section.

Step 7. Select application/x-www-form-urlencoded as an “Authentication Request Content Type”.

Step 8. Give the connection a name and save it.

Testing Salesforce connection

The following shows you how to test the connection to the Salesforce API using Explorer.

Step 1. If you don’t have it already create the JSON format. Use all the default settings.

Step 2. Go to Explorer, select the connection created in the previous section and click “expand” (down-arrow) button. Link the format created in step 1 to the Salesforce connection.

Step 3. Click the “View Data” button to view the data in a grid format or “Raw Data” to view the data in the JSON format.

Working with paginated API

In this blog post, I will discuss how to work with paginated APIs in Etlworks. 

Most (but not all) of the APIs put the limit on the amount of data that can be returned or uploaded in one request. They provide a pagination mechanism by responding with the information on how to access the next “page” (hence pagination).

In Etlworks retriving data from the paginated API can be impemnted using nested flow with the loop, together with the technique called dynamic URLs.
Common pagination patterns

Let’s look at the most common pagination patterns:

  • The API where the amount of data is controlled by the parameters in the request and the loop stops when there is no more data to return. I will be using Jira API to illustrate the example.
  • The API where the server controls the amount of data by providing the “next marker” in the response, so the loop stops when there is no “next marker”. I will be using API provided by an organization called UKRI Gateway to illustrate the example.
API with the page number (JIRA)

The URL for the typical Jira API endpoint looks like below:

https://company.atlassian.net/rest/api/3/searchjql=query
&startAt=pointer&maxResults=10

When called it returns the following JSON:

{"expand": "schema,names",
    "startAt": 0,
    "maxResults": 10,
    "total": 2823,
    "results": [...]}

As you can see, the request contains the starting pointer and the maximum number of results. The response includes the total number of results.

So, the algorithm to iterate through the pages will look like below:

  1. Make the first call with startAt=0 and maxResults=1
  2. Extract total from the response and calculate the total number of calls which need to be made to the endpoint, with maxResults=100 (100 is a maximum allowed number of results).
  3. Create a loop to iterate through the pages. The loop ends when the number of calls made to the endpoint reaches the total number of calls, calculated in step 2. 

Now, let’s put it all together.

In step 1, we will need to create connections for the example.

  1. HTTP connection for the JIRA API. 
  2. Database connection to the PostgreSQL database.

The URL for the HTTP connection which returns JIRA ticket for the project TEST will look like below (we are using the version of the API for the JIRA cloud, so substitute company in the URL on your actual company name). Note that JIRA API uses preemptive basic authentication.   

https://company.atlassian.net/rest/api/3/searchjql=project%20%3D%20TEST
&startAt={startAt}&maxResults=100

Notice that the value for the startAt parameter in the URL includes token {startAt}. The idea that in the loop we will be replacing token {startAt} on the actual values: 0, 100, 200, 300, etc.

For this example, I will be using a database loop so we will need one more connection to the temporary database, which will be used to drive the loop.

In step 2, we will be creating the flows.

1.  Create SQL flow which is used to create a temporary loop table using SQL below:

create table pages (startAt integer)

2.  Create JavaScript flow to populate the loop table:

importPackage(com.toolsverse.etl.core.engine);
importPackage(com.toolsverse.util);
importPackage(com.toolsverse.config);
importPackage(com.toolsverse.etl.sql.util);
importPackage(java.util);

var pageSize = 100;

var params = Extractor.lookup(etlConfig, scenario, 
"First", "params", "select total"); var totalNumberOfRecords = params.getFieldValue(params.getRecord(0),
"total"); var numberOfPages = Math.floor(totalNumberOfRecords / pageSize); var remainder = totalNumberOfRecords % pageSize; numberOfPages = totalNumberOfRecords == 0 ? 0 :
(numberOfPages == 0 ? 1 :
(remainder > 0 ? numberOfPages + 1 : numberOfPages)); var connection = etlConfig.getConnectionFactory().
getConnection("Temporary database"); var sql = "insert into pages (startAt) values (?)"; var startAt = 0; for (var page = 0; page <numberOfPages; page++) { SqlUtils.executeSql(connection, sql, startAt); startAt = startAt + pageSize; }

3. Create a flow “Extract data from the Web Service and load into the database” where the source is a web service (Jira endpoint above) and the destination is a PostgreSQL database.

4.  Create a nested flow which combines flows 1, 2, and 3.

5.  Modify step 3 by configuring a database loop where the connection is a Temporary database and the driving SQL is:

select startAt as "startAt" from pages

Download JIRA example from the etlworks.com. You can import the example into your Etlworks account and try it out.  Don’t forget to edit the PostgreSQL and JIRA connections before executing the flow.

Download a fully functional example for another paginated api with page number query parameter from the etlworks.com. You can import the example into your Etlworks account and try it out without changing anything. This flow pulls the data from the paginated API and loads in the shared Google Sheet. 

API with the next marker (UKRI Gateway)

For this example, I will be using UKRI Gateway API endpoint with the following URL:

https://www.ebi.ac.uk/europepmc/webservices/rest/search
?query=italy&cursorMark={NEXT_MARKER}

If the value of the cursorMark query parameter is * (star character) the API returns the first page, which contains data and possible next value for the cursorMark.

<response>
<nextCursorMark>535543435GUI999==</nextCursorMark> 
<data>
...
</data>
</reponse>

So, the algorithm to iterate through the pages will look like below:

  1. Make the first call with cursorMark=*
  2. Get the value for the cursorMark from the response by extracting the value of the nextCursorMark field.
  3. Create a loop to iterate through the pages. The loop ends when the response doesn’t have the nextCursorMark

Now, let’s put it all together.

In step 1, we will need to create connections for the example.

  1. HTTP connection for the UKRI Gateway API endpoint above.   
  2. Connection to the shared Google Sheet.

Notice that the value of cursorMark query parameter is a token {NEXT_MARKER}. The idea is to replace the token with the next marker, make the API call and repeat until there is no next marker anymore.

We will be loading data into the shared Google Sheet by calling the API endpoint multiple times so we will need a connection to the temporary staging database to dump the data from the API before loading the whole thing into the Google Sheet.

In step 2, we will be creating the flows.

1. Create the JavaScript flow to set the original value of the {NEXT_MARKER} token as (start character).

importPackage(com.toolsverse.config);
SystemConfig.instance().getProperties().put("NEXT_MARKER", "*");

2. Create a flow “Extract data from the Web Service and load into the database” where the source is a web service and the destination is a temporary database.

3. Click the “Mapping->Additional Transformations” and add the following JavaScript as a value for the After Extract transformation:

importPackage(com.toolsverse.util);
importPackage(com.toolsverse.config);

var nextCursorMark = dataSet.getFieldValue(dataSet.getRecord(0), 
"nextCursorMark"); if (Utils.isNothing(nextCursorMark)) { nextCursorMark = ""; } SystemConfig.instance().getProperties().put("NEXT_MARKER",
nextCursorMark);

4. Create flow “Extract data from database, push into the well known API” where the source is a temporary database and the destination is a shared Google Sheet.

5.  Create nested flow which combines flows 1, 2, 3, and 4.

Modify step 2 by configuring a JavaScript loop, where the JavaScript for the loop is:

importPackage(com.toolsverse.config);
importPackage(com.toolsverse.util);

var nextCursorMark = SystemConfig.instance().getProperties().
get("NEXT_MARKER");

value = !Utils.isNothing(nextCursorMark) ? nextCursorMark : null;

Download a fully functional example for paginated api with next marker from the etlworks.com. You can import the example into your Etlworks account and try it out without changing anything. This flow pulls the data from the paginated API and loads in the shared Google Sheet. 

Loading data in Snowflake

etlworks-snowflake

In this blog post, I will be talking about building a reliable data injection pipeline for Snowflake.

Snowflake is a data warehouse built for the cloud. It works across multiple clouds and combines the power of data warehousing, the flexibility of big data platforms, and the elasticity of the cloud.

Based on the Snowflake documentation, loading data is a two-step process:
  1. Upload (i.e. stage) one or more data files into either an internal stage (i.e. within Snowflake) or an external location.
  2. Use the COPY INTO command to load the contents of the staged file(s) into a Snowflake database table.

It is obvious that one step is missing: preparing data files to be loaded in Snowflake.

If steps 1-3 do not look complicated to you, let’s add more details.

Typically, developers that are tasked with loading data into any data warehouse dealing with the following issues:

  • How to build a reliable injection pipeline, which loads hundreds of millions of records every day.
  • How to load only recent changes (incremental replication).
  • How to transform data before loading into the data warehouse.
  • How to transform data after loading into the data warehouse.
  • How to deal with changed metadata (table structure) in both the source and in the destination.
  • How to load data from nested datasets, typically returned by the web services (in addition to loading data from the relational databases).

This is just a short list of hand-picked problems. The good news is that Snowflake is built from the ground up to help with bulk-loading data, thanks to the very robust COPY INTO command, and continues-loading using Snowpipe.

Any Snowflake injection pipeline should at least be utilizing the COPY INTO command and, possibly Snowpipe.

The simplest ETL process that loads data into the Snowflake will look like this:
  1. Extract data from the source and create CSV (also JSON, XML, and other formats) data files.
  2. Archive files using gz compression algorithm.
  3. Copy data files into the Snowflake stage in Amazon S3 bucket (also Azure blob and local file system).
  4. Execute COPY INTO command using a wildcard file mask to load data into the Snowflake table.
  5. Repeat 1-4 for multiple data sources. Injection ends here.
  6. If needed, execute SQL statements in Snowflake database to transform data. For example, populate dimensions from the staging tables.
The part where you need to build a “reliable data injection pipeline” typically includes:
  • Performance considerations and data streaming.
  • Error-handling and retries.
  • Notifications on success and failure.
  • Reliability when moving files to the staging area in S3 or Azure.

COPY INTO command can load data from the files archived using gz compression algorithm. So, it would make sense to archive all the data files before copying or moving them to the staging area.

  • Cleaning up: what to do with all these data files after they have been loaded (or not loaded) into the Snowflake.
  • Dealing with changing table structure in the source and in the destination.

Snowflake supports transforming data while loading it into a table using the COPY INTO <table> command but it will not allow you to load data with inconsistent structure.

Add the need to handle incremental updates in the source (change replication) and you got yourself a [relatively] complicated project at hands.

As always, there are two options:
  1. Develop home-grown ETL using a combination of scripts and in-house tools.
  2. Develop solution using third-party ETL tool or service.

Assuming that you are ready to choose option 2 (if not, go to paragraph one), let’s discuss

The requirements for the right ETL tool for the job

When selecting the ETL tool or service the questions you should be asking yourself are:

  • How much are you willing to invest in learning?
  • Do you prefer the code-first or the drag&drop approach?
  • Do you need to extract data from the semi-structured and unstructured data sources (typically web services) or all your data is in the relational database?
  • Are you looking for point-to-point integration between well-known data sources (for example, Salesforce->Snowflake ) with the minimum customization, or you need to build a custom integration?
  • Do you need your tool to support change replication?
  • How about real-time or almost real-time ETL?
  • Are you looking for a hosted and managed service, running in the cloud or on-premise solution?
Why Etlworks is the best tool for loading data in Snowflake?

First, just like Snowflake, Etlworks is a cloud-first data integration service. It works perfectly well when installed on-premise, but it really shines in the cloud. When subscribing to the service, you can choose the region that is closest to your Snowflake instance which will make all the difference as far as the fast data load is concerned. Also, you won’t have to worry about managing the service.

Second, in Etlwoks you can build even the most complicated data integration flows and transformations using simple drag&drop interface. No need to learn a new language and no complicated build-test-deploy process.

Third, if you are dealing with heterogeneous data sources, web services, semi-structured or unstructured data, or transformations which go beyond the simple point-to-point, pre-baked integrations  – you are probably limited to just a few tools. Etlworks is one of them.

Last but not least, if you need your tool to support a native change (incremental) replication from relational databases or web services, Etlworks can handle this as well. No programming required.  And it is fast.

How it works

In Etlworks, you can choose from several highly configurable data integration flows, optimized for Snowflake:

  • Extract data from databases and load in Snowflake.
  • Extract data from data objects (including web services) and load in Snowflake.
  • Extract data from well-known APIs (such as Google Analytics) and load in Snowflake.
  • Load existing files in Snowflake.
  • Execute any SQL statement or multiple SQL statements.

Behind the scene, the flows perform complicated transformations and create data files for Snowflake, archive files using gz algorithm before copying to the Snowflake stage in the cloud or in the server storage, automatically create and execute COPY INTO <table> command, and much more. For example, the flow can automatically create a table in Snowflake if it does not exist, or it can purge the data files in case of error (Snowflake can automatically purge the file in case of success).

You can find the actual, step-by-step instructions on how to build Snowflake data integration flows in Etlworks in our documentation.

The extra bonus is that in Etlworks you can connect to the Snowflake database, discover the schemas, tables, and columns, run SQL queries, and share queries with the team. All without ever using Snowflake SQL workbench.  Even better – you can connect to all your data sources and destinations, regardless of the format and location to discover the data and the metadata. Learn more about Etlworks Explorer.

Etlworks installation options and pricing model

DjdkTL2UYAAm2fa

In this blog post, I will explain the various Etlworks installation options and how our pricing model works.

Our pricing model is a three-tiered, feature-based SaaS subscription.

There are three clearly defined tiers: Startup, Business and Enterprise.

All tiers allow for an unlimited number of users and provide you access to all included connectors. Tier differences can be found in the following:

  • The total number of records that can be processed each day.
  • The total number of schedules.
  • How often the scheduled flow can run.
  • The type of instance: shared or dedicated.
  • The ability to set up a custom domain with its own SSL certificate
  • White labeling
  • The ability to create and use tenants.

Take a look at our pricing grid for more information.

Dedicated instance, custom domains, white labeling and the ability to use tenants are only available with the Enterprise and on-premise plans.

FAQ


Q. What are tenants?

A. Essentially, tenants are sub-accounts under the main account.

Each tenant has a separate list of users, flows, connections, formats, and listeners, and are completely isolated from each other. They can be used to separate customers and environments – for example, DEV, QA, PRODUCTION, etc. Flows and connections can be copied from one tenant to another.

Q. What happens if we exceed the maximum allowed number of processed records?

A. You will receive a notification that the limit has been exceeded and we will recommend that you switch to a higher tier.

It’s OK if you exceed the limit set for your plan once or twice a week, but systematically exceeding it violates the terms of service.

Q. How does the free trial work?

A. You get full functionality for 14 days, at which point you can turn your trial into a monthly or annual subscription.

Q. What happens when the 14-days free evaluation period ends? Do I need to ask to extend it?

A. Your Etlworks account will be disabled but you will have an option to subscribe to the paying service or request an extension. Your evaluation period can be extended up to 14 days.

Q. How does your on-premise offering work?

A. Unlike cloud subscriptions, the customer owns and operates the on-premise instance running Etlworks. In addition to the subscription fee, we charge $2000 USD as a one-time installation and configuration fee.

Typically we don’t have access to the on-premise instances at all, but we do provide a fully automated, one-click installation and upgrade script for all major operating systems.

The on-premise instance must be able to connect to the Etlworks License server at least once a day.

Q. Can I have more than one dedicated instance?

A. Yes, you can. Each instance requires a separate license.

Q. Can Etlworks handle our load?

A. Etlworks is extremely fast and optimized for performance. It is also horizontally scalable. You can have multiple ETL servers running in parallel behind the load balancer.

Q. What is ETL server?

A. The ETL server is a headless web application with the REST API which handles the data integration requests.  It is a core of the Etlworks Integrator. The ETL server can run on the same physical instance as Etlworks Integrator (this is a default setup), or it can be installed on the multiple physical/virtual servers behind the load balancer, which will create a highly scalable data integration solution that is able to process a large number of heavy-duty ETL requests.

Q. How many ETL servers will I need?

A. In most cases, you will need just one ETL server, which is included in the base price. You might need more than one ETL server if you expect a large number of the parallel ETL requests (tens of thousands per day) or need guaranteed high availability. Always think of upgrading the amount of available RAM and CPU cores before adding an extra ETL server.

Q. Is multi-server option available for cloud Enterprise plans?

A. Yes, it is available.

Q. What does “the price starts from” mean for cloud Enterprise plans?

A. We factor the cost of running the dedicated instance of Etlworks when calculating the price. For example:

  • 8 Gb RAM, 2 CPU cores, 100 SSD – $800 / month.
  • 16 Gb RAM, 4 CPU cores, 100 SSD – $1000 / month.
  • 32 Gb RAM, 8 CPU cores, 200 SSD – $1500 / month.

Q. How can I estimate the size of the instance?

A.  If you are planning to work with APIs that return large (> 100Mb) JSON or XML documents we recommend the instance with 16 or 32 Gb of RAM.  In all other cases, 8 Gb should be enough.

Q. Can I request to upgrade or downgrade the instance?

A.  Yes, you can. It usually takes just a few minutes to change the size of the instance for the instances managed by Etlworks.

Q. What does “the price starts from” mean for on-premise Enterprise plans?

A. The base price includes one instance of Etlworks Integrator. For example:

  • One instance with one ETL server – $800 / month.
  • Two instances (for example DEV and PRODUCTION), each with its own ETL server – $1600 / month.
  • One instance with two ETL servers – $1600 / month.
  • One instance with three ETL servers – $2400 / month.

Q. Can I install Etlworks from the AWS or Azure marketplace?

A. Not at the moment. We will be in both marketplaces in Q1 of 2019.

Q. How am I billed?

A. You subscribe and will be billed through our customer portal powered by Paywhirl. The other payment option is direct money wire to our bank account.

Q. What will my total cost be?

A. Your total cost will be the subscription cost listed on our website. No surprises.

Please note: Enterprise plans will be billed based on the size of the instance.

Q. What regions are available for shared and dedicated instances?

A. We have shared instances in the US-EAST (us-east-2) and EU-WEST (eu-west-2) regions. Dedicated instances can be installed in any region of your choice.

Q. Can we have a dedicated instance on Microsoft Azure or Google Cloud?

A. Yes, you can. The dedicated instance of Etlworks can be installed on any cloud of your choice.

Q. On your website there is an option to buy a perpetual license. What does this mean?

A. When you buy a perpetual license you own it forever. Perpetual license for one server costs 30K.

The renewal costs $6000 / year after the first year.

Q. How is support provided?

A.  The basic support by email is included in all plans. You can purchase extended support, which includes 10 hours of the professional services per month for $1000 / month (or 6 hours for $600 / month). You can also buy ad-hoc extended support for $150 / hour.

Cloud Data Integration

cloud-data-integration

In this blog post, I will discuss the definition of cloud data integration and what makes it truly useful.  

Before we start, let’s get on the same page and define what cloud data integration is.

According to Wikipedia, cloud data integration software must have the following features:

  • Deployed on a multi-tenant, elastic cloud infrastructure.
  • Subscription model pricing: operating expense, not capital expenditure.
  • No software development: required connectors should already be available.
  • Users do not perform deployment or manage the platform itself.
  • Presence of integration management & monitoring features.

While I agree with the definition, there’s something is missing:

where is the data we are suppose to be integrating?

If you are ahead of the curve, all your data is already stored in the cloud. While I think we all will be here eventually, as of today, a typical enterprise – from two guys working out of a garage to multinational corporations – owns and operates multiple data silos. I would add diverse and isolated data silos:

  • Cloud databases.
  • On-premise databases, available from the Internet.
  • On-premise databases, not available from the Internet.
  • Public APIs.
  • Private APIs, not available from the Internet.
  • Cloud-based third-party applications.
  • Locally hosted third-party applications.
  • Legacy applications.
  • Files stored locally.
  • Files stored in cloud data storage.
  • Office 365 and Google Docs documents.

Can your favorite data integration platform handle the vast array of data sources? If the answer is “Yes it can! We just need to deploy it in our corporate network and it will be able to connect to all our data,” then it is not a cloud data integration anymore. Don’t get me wrong, there is nothing wrong with the ETL tool deployed locally. It gets the job done, but you are not getting the benefits of the true cloud-based platform, specifically this one:

users do not perform deployment or manage the platform itself.

If this is not a showstopper, my advice is to find and stick to the tool which has all the required connectors and is easy to program and operate. Sure, you will need a competent DevOps group on payroll (in addition to the ETL developers), who will be managing and monitoring the tool, installing upgrades, performing maintenance, etc., but hey…it works.

Keep reading if you want to focus on breaking the data silos in your organization instead of managing the data integration platform. The solution to the problem at hand is so-called hybrid data integration.

Hybrid data integration is a solution when some of the connectors can run on-premise, behind the corporate firewall, while others, and the platform itself runs on the cloud.

We, at Etlworks, believe that no data silo should be left behind, so in addition to our best in class cloud data integration service we offer fully autonomous, zero-maintenance data integration agents which can be installed on-premise, behind the corporate firewall.  Data integration agents are essentially connectors installed locally and seamlessly integrated with a cloud-based Etlworks service.

Let’s consider these typical data integration scenarios:

Source and destination are in the cloud

Example: the source is an SQL Server database in Amazon RDS and the destination is a Snowflake data warehouse.

In this case, no extra software is required. Etlworks can connect to the majority of the cloud-based databases and APIs directly. Check out available connectors.

The source is on-premise, behind the firewall and the destination is in the cloud

Example: the source is locally hosted PostgreSQL database, not available from the Internet, and the destination is Amazon Redshift.

In this scenario, you will need to install a data integration agent as a Windows or Linux service in any available computer in your corporate network (you can install multiple agents in multiple networks if needed). The agent includes a built-in scheduler so it can run periodical extracts from your on-premise database and push changes either directly to the cloud data warehouse or to the cloud-based data storage ( Amazon S3, for example).  You can then configure a flow in Etlworks, which will be pulling data from the cloud data storage and loading into the cloud-based data warehouse.  The flow can use the extremely fast direct data upload into the Redshift available as a task in Etlworks.

The source is in the cloud and the destination is on-premise, behind the firewall

Example: the source is a proprietary API, available from the Internet and the destination is a database in the Azure cloud.

Data Integration Agent can work both ways: extracting data from the sources behind the firewall and loading data into the local databases. In this scenario, the flow in Etlworks will be extracting data from the cloud-based API, then transforming and storing it in the cloud-based storage, available to the agent. The data integration agent will be loading data from the cloud-based storage directly into the local database.

The source is in the cloud and a destination is a third-party software hosted locally

Example: the source is a proprietary API, available from the Internet and the destination is locally hosted data analytics software.

If the third-party software can load data from services such as Google Sheets, you can configure a flow in Etlworks, which will be extracting and transforming data from the API and loading into the Google Sheets. The third-party software will then be loading data directly from the specific worksheet. You can always find a format which is understood by Etlworks and a third-party software.

Source and destination are on-premise, behind the firewall

Example: the source is a proprietary API, not available from the Internet and the destination is another proprietary API, again not available from the Internet.

In this case, you probably don’t need cloud-based software at all. Look at Etlworks on-premise subscription options as well as an option to buy a perpetual license.