Data Integration Project: Steps to Success

Data integration involves bringing together information from disparate sources in order to generate meaningful insight. Reconciling data generated from the software, equipment, and personnel across all of the functional areas of your business can provide you with the valuable information you need to make the right decisions.

However, combining these different sources of data can be a complex and, honestly, headache-inducing challenge.

It can be quite difficult to know the exact scope of a data integration project, particularly when data is siloed across many different functional areas of your business.

Choosing the right approach to integration can save you money and help ensure integration project success. Choosing unwisely can increase the cost of doing integration or prevent you from meeting business-related objectives. In order to maximize success and minimize re-work, a business evaluation should start and guide each integration effort.

Before initiating any data integration project, it is necessary to take the right steps in preparation. While every company has its own unique needs, there are some general tips you can follow in order to achieve smooth and successful data integration.

Find the Best Data Integration Provider

Many vendors offer data integration solutions, so choosing the one most suitable for your business should be your number one priority.

Finding a vendor who can overcome your data integration challenges, implement the right data management strategy, and deliver on time is the most important piece of the puzzle.

Establish a Data Governance Process

To unlock your data’s full value, you should establish and implement a set data governance process in your organization. This process needs to prioritize and include managing risks, data quality, business processes, and data management as a whole.

Having a set data governance policy in place will help you improve your operational processes. Also, it will help you ensure that your data is present in the right format, with the right quality and maximum availability for your stakeholders.

Implement Data Security

Businesses in touch with the latest data integration trends also need to find a way to safely and securely connect on-premise data with different cloud applications and systems.

Taking action on this should be a priority, considering that the volume of data keeps growing.

Data integration projects can be complex, slow, and frustrating. However, that does not have to be the reality.

With Etlworks' robust and intuitive solution, data integration is simple and straightforward, and a data integration project can be easy and fast.

Etlworks installation options and pricing model

In this blog post, I will explain the various Etlworks installation options and how our pricing model works.

Our pricing model is a three-tiered, feature-based SaaS subscription.

There are three clearly defined tiers: Startup, Business and Enterprise.

All tiers allow for an unlimited number of users and provide you access to all included connectors. Tier differences can be found in the following:

  • The total number of records that can be processed each day.
  • The total number of schedules.
  • How often the scheduled flow can run.
  • The type of instance: shared or dedicated.
  • The ability to set up a custom domain with its own SSL certificate.
  • White labeling.
  • The ability to create and use tenants.

Take a look at our pricing grid for more information.

Select the shared plan (Startup or Business) if:

  1. You are OK with sharing the instance with other customers.
  2. The number of records you are planning to process monthly will not exceed 30 million (<=1 M records a day).
  3. You will not be running any flow more frequently than once an hour.
  4. You are OK with having a limited number of schedules.
  5. You are not planning to use the API to process events in real time.

Select the dedicated plan (Enterprise or on-premise) if:

  1. You don’t want to share the instance with other customers.
  2. The number of records you are planning to process monthly will exceed 30 million (>1 M records a day).
  3. You are going to be running flows more frequently than once an hour.
  4. You expect to have a lot of schedules.
  5. You are planning to use the API to process events in real time.
  6. You need any of the following:
    1. Custom domain
    2. White labeling
    3. Ability to create and use tenants
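
The two checklists above can be read as a simple decision rule. Below is a minimal Python sketch of that rule. The `Requirements` class, its field names, and the function names are hypothetical and exist only for illustration; they are not part of any Etlworks API. The thresholds (30 million records a month, once an hour) are taken from the lists above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Requirements:
    """Hypothetical description of a workload; field names are illustrative."""
    ok_to_share_instance: bool
    records_per_month: int
    min_minutes_between_runs: int
    needs_many_schedules: bool
    needs_realtime_api: bool
    needs_custom_domain: bool = False
    needs_white_labeling: bool = False
    needs_tenants: bool = False


# Thresholds taken from the checklists above.
SHARED_MAX_RECORDS_PER_MONTH = 30_000_000  # roughly 1 M records a day
SHARED_MIN_MINUTES_BETWEEN_RUNS = 60       # no more often than once an hour


def reasons_for_dedicated(req: Requirements) -> List[str]:
    """Return the checklist items that point to a dedicated plan."""
    reasons = []
    if not req.ok_to_share_instance:
        reasons.append("instance must not be shared")
    if req.records_per_month > SHARED_MAX_RECORDS_PER_MONTH:
        reasons.append("more than 30 million records a month")
    if req.min_minutes_between_runs < SHARED_MIN_MINUTES_BETWEEN_RUNS:
        reasons.append("flows run more often than once an hour")
    if req.needs_many_schedules:
        reasons.append("large number of schedules")
    if req.needs_realtime_api:
        reasons.append("real-time event processing through the API")
    if req.needs_custom_domain:
        reasons.append("custom domain")
    if req.needs_white_labeling:
        reasons.append("white labeling")
    if req.needs_tenants:
        reasons.append("tenants")
    return reasons


def suggest_plan(req: Requirements) -> str:
    """Suggest 'dedicated' if any checklist item applies, otherwise 'shared'."""
    return "dedicated" if reasons_for_dedicated(req) else "shared"
```

For example, `suggest_plan(Requirements(True, 5_000_000, 60, False, False))` suggests the shared plan, because none of the dedicated-plan criteria apply.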

FAQ


Q. What are tenants?

A. Essentially, tenants are sub-accounts under the main account.

Each tenant has a separate list of users, flows, connections, formats, and listeners. Tenants are completely isolated from each other. They can be used to separate customers and environments – for example, DEV, QA, PRODUCTION, etc. Flows and connections can be copied from one tenant to another.

Q. What happens if we exceed the maximum allowed number of processed records?

A. You will receive a notification that the limit has been exceeded and we will recommend that you switch to a higher tier.

It’s OK if you exceed the limit set for your plan once or twice a week, but systematically exceeding it violates the terms of service.

Q. How does the free trial work?

A. You get full functionality for 14 days, at which point you can turn your trial into a monthly or annual subscription.

Q. What happens when the 14-day free evaluation period ends? Do I need to ask to extend it?

A. Your Etlworks account will be disabled, but you will have the option to subscribe to the paid service or request an extension. Your evaluation period can be extended by up to 14 days.

Q. How does your on-premise offering work?

A. Unlike cloud subscriptions, the customer owns and operates the on-premise instance running Etlworks. In addition to the subscription fee, we charge $2000 USD as a one-time installation and configuration fee.

Typically we don’t have access to the on-premise instances at all, but we do provide a fully automated, one-click installation and upgrade script.

The on-premise instance must be able to connect to the Etlworks License server at least once a day.

Q. Can I have more than one dedicated instance?

A. Yes, you can. Each instance requires a separate license.

Q. Can Etlworks handle our load?

A. Etlworks Integrator is extremely fast and optimized for performance. It is also horizontally scalable: you can have multiple instances running in parallel behind a load balancer.

Q. How many instances will I need?

A. In most cases, you will need just one instance, which is included in the base price. You might need more than one if you expect a large number of parallel ETL requests (hundreds of thousands per day) or need guaranteed high availability. Always consider increasing the amount of available RAM and the number of CPU cores before adding an instance.

Q. Is a multi-server option available for cloud Enterprise plans?

A. Yes, it is available.

Q. What does “the price starts from” mean for cloud Enterprise plans?

A. We factor the cost of running the dedicated instance of Etlworks when calculating the price. For example:

  • 8 GB RAM, 2 CPU cores, 100 GB SSD – $900 / month.
  • 16 GB RAM, 4 CPU cores, 100 GB SSD – $1100 / month.
  • 32 GB RAM, 8 CPU cores, 200 GB SSD – $1500 / month.
  • 64 GB RAM, 16 CPU cores, 200 GB SSD – $2000 / month.
  • 160 GB RAM, 40 CPU cores, 500 GB SSD – $3000 / month.

Q. How can I estimate the size of the instance?

A. Use the maximum number of records per month that you are planning to process to roughly estimate the size of the instance that you might need. Read how we count the number of records.

If you are planning to work with nested XML and JSON documents, read how we calculate the number of records in nested documents.

Q. Can I request to upgrade or downgrade the instance?

A. Yes, you can. For instances managed by Etlworks, it usually takes just a few minutes to change the size of the instance.

Q. What does “the price starts from” mean for on-premise Enterprise plans?

A. The base price includes one instance of Etlworks Integrator. It is possible to have multiple instances. The price per instance depends on the size of the instance. We provide a 20% discount for each additional license.

  • One 8 GB, 2 CPU cores instance – $900 / month.
  • One 16 GB, 4 CPU cores instance – $1200 / month.
  • One 32 GB, 8 CPU cores instance – $1500 / month.
  • One 64 GB, 16 CPU cores instance – $2000 / month.
  • One 160 GB, 40 CPU cores instance – $3000 / month.
  • One 256 GB, 64 CPU cores instance – $4000 / month.
  • Two 8 GB instances – $1620 / month.
  • Two 16 GB instances – $2160 / month.
  • Two 32 GB instances – $2700 / month.
  • Two 64 GB instances – $3600 / month.
  • Two 160 GB instances – $5400 / month.
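
The two-instance prices follow from the 20% discount on each additional license. Here is a minimal Python sketch of that arithmetic. It assumes that all instances are the same size and that the discount applies to every license after the first; the function and constant names are illustrative.

```python
# 20% off each additional license, as stated in the answer above.
ADDITIONAL_LICENSE_DISCOUNT_PERCENT = 20


def monthly_price(per_instance_price: int, instances: int) -> int:
    """Monthly price in whole USD for `instances` equally sized instances.

    Assumption: the first license is charged in full and every additional
    license gets the discount. Integer division is exact for the listed
    prices because they are all multiples of 100.
    """
    if instances < 1:
        raise ValueError("at least one instance is required")
    discounted = per_instance_price * (100 - ADDITIONAL_LICENSE_DISCOUNT_PERCENT) // 100
    return per_instance_price + (instances - 1) * discounted


print(monthly_price(900, 2))   # two 8 GB instances: 900 + 720
print(monthly_price(1200, 2))  # two 16 GB instances: 1200 + 960
```

Under these assumptions, the sketch reproduces the listed prices for two 8 GB, 16 GB, 32 GB, and 64 GB instances.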

Q. Can I install Etlworks from the AWS or Azure marketplace?

A. Not at the moment. We will be in both marketplaces in Q3 of 2021.

Q. How am I billed?

A. You subscribe and are billed through our customer portal, powered by Paywhirl. You can pay by major credit card, direct wire transfer to our bank account, or mailed check. Monthly and annual payment plans are available.

Q. What will my total cost be?

A. Your total cost will be the subscription cost listed on our website. No surprises.

Q. What regions are available for shared and dedicated instances?

A. We have shared instances in the US-EAST (us-east-2) and EU-WEST (eu-west-2) regions. Dedicated instances can be installed in any region of your choice.

Q. Can we have a dedicated instance on Microsoft Azure or Google Cloud?

A. Yes, you can. The dedicated instance of Etlworks can be installed on any cloud of your choice.

Q. On your website there is an option to buy a perpetual license. What does this mean?

A. When you buy a perpetual license you own it forever. A perpetual license for one instance costs $75K.

The renewal costs $6000 / year after the first year. The perpetual license never expires but after the first year you will need to pay for renewal in order to install the latest updates. The renewal is optional – the software will remain operational even if you don’t install updates.

We provide a 20% discount for each additional instance, for example, QA, DEV, Production, etc.
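
As a rough illustration of the numbers above, the following Python sketch estimates the cumulative cost of a perpetual license for a single instance. It assumes the renewal is paid every year after the first, or not at all; the function and constant names are illustrative, and discounts for additional instances are not modeled.

```python
PERPETUAL_LICENSE_USD = 75_000  # one instance, per the answer above
RENEWAL_PER_YEAR_USD = 6_000    # optional, due after the first year


def perpetual_cost(years: int, renew: bool = True) -> int:
    """Cumulative cost in USD of one perpetual license over `years` years.

    Assumption: when `renew` is True the renewal is paid for every year
    after the first; when False it is never paid and the software keeps
    running without updates.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    renewals = years - 1 if renew else 0
    return PERPETUAL_LICENSE_USD + renewals * RENEWAL_PER_YEAR_USD


print(perpetual_cost(5))               # license plus four renewals
print(perpetual_cost(5, renew=False))  # license only
```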

Q. How is support provided?

A. Basic support by email is included in all plans. You can purchase extended support, which includes 10 hours of professional services per month for $1000 / month (or 6 hours for $600 / month). You can also buy ad-hoc extended support for $150 / hour.

Cloud Data Integration

In this blog post, I will discuss the definition of cloud data integration and what makes it truly useful.  

Before we start, let’s get on the same page and define what cloud data integration is.

According to Wikipedia, cloud data integration software must have the following features:

  • Deployed on a multi-tenant, elastic cloud infrastructure.
  • Subscription model pricing: operating expense, not capital expenditure.
  • No software development: required connectors should already be available.
  • Users do not perform deployment or manage the platform itself.
  • Presence of integration management & monitoring features.

While I agree with the definition, something is missing:

where is the data we are supposed to be integrating?

If you are ahead of the curve, all your data is already stored in the cloud. While I think we will all be there eventually, as of today, a typical enterprise – from two guys working out of a garage to a multinational corporation – owns and operates multiple data silos. I would add: diverse and isolated data silos:

  • Cloud databases.
  • On-premise databases, available from the Internet.
  • On-premise databases, not available from the Internet.
  • Public APIs.
  • Private APIs, not available from the Internet.
  • Cloud-based third-party applications.
  • Locally hosted third-party applications.
  • Legacy applications.
  • Files stored locally.
  • Files stored in cloud data storage.
  • Office 365 and Google Docs documents.

Can your favorite data integration platform handle this vast array of data sources? If the answer is “Yes it can! We just need to deploy it in our corporate network and it will be able to connect to all our data,” then it is not cloud data integration anymore. Don’t get me wrong, there is nothing wrong with an ETL tool deployed locally. It gets the job done, but you are not getting the benefits of a true cloud-based platform, specifically this one:

users do not perform deployment or manage the platform itself.

If this is not a showstopper, my advice is to find and stick to the tool which has all the required connectors and is easy to program and operate. Sure, you will need a competent DevOps group on payroll (in addition to the ETL developers), who will be managing and monitoring the tool, installing upgrades, performing maintenance, etc., but hey…it works.

Keep reading if you want to focus on breaking the data silos in your organization instead of managing the data integration platform. The solution to the problem at hand is so-called hybrid data integration.

Hybrid data integration is a solution in which some of the connectors run on-premise, behind the corporate firewall, while the others, and the platform itself, run in the cloud.

We at Etlworks believe that no data silo should be left behind, so in addition to our best-in-class cloud data integration service, we offer fully autonomous, zero-maintenance data integration agents that can be installed on-premise, behind the corporate firewall. Data integration agents are essentially connectors installed locally and seamlessly integrated with the cloud-based Etlworks service.

Let’s consider these typical data integration scenarios:

Source and destination are in the cloud

Example: the source is an SQL Server database in Amazon RDS and the destination is a Snowflake data warehouse.

In this case, no extra software is required. Etlworks can connect to the majority of the cloud-based databases and APIs directly. Check out available connectors.

The source is on-premise, behind the firewall and the destination is in the cloud

Example: the source is a locally hosted PostgreSQL database, not available from the Internet, and the destination is Amazon Redshift.

In this scenario, you will need to install a data integration agent as a Windows or Linux service on any available computer in your corporate network (you can install multiple agents in multiple networks if needed). The agent includes a built-in scheduler, so it can run periodic extracts from your on-premise database and push changes either directly to the cloud data warehouse or to cloud-based data storage (Amazon S3, for example). You can then configure a flow in Etlworks that pulls data from the cloud data storage and loads it into the cloud-based data warehouse. The flow can use the extremely fast direct data upload into Redshift, available as a task in Etlworks.

The source is in the cloud and the destination is on-premise, behind the firewall

Example: the source is a proprietary API, available from the Internet, and the destination is a locally hosted database, not available from the Internet.

The data integration agent can work both ways: extracting data from sources behind the firewall and loading data into local databases. In this scenario, the flow in Etlworks extracts data from the cloud-based API, then transforms it and stores it in cloud-based storage that is available to the agent. The data integration agent then loads the data from the cloud-based storage directly into the local database.

The source is in the cloud and the destination is third-party software hosted locally

Example: the source is a proprietary API, available from the Internet and the destination is locally hosted data analytics software.

If the third-party software can load data from services such as Google Sheets, you can configure a flow in Etlworks that extracts and transforms data from the API and loads it into Google Sheets. The third-party software will then load data directly from the specific worksheet. You can always find a format that is understood by both Etlworks and the third-party software.

Source and destination are on-premise, behind the firewall

Example: the source is a proprietary API, not available from the Internet and the destination is another proprietary API, again not available from the Internet.

In this case, you probably don’t need cloud-based software at all. Look at Etlworks on-premise subscription options as well as an option to buy a perpetual license.
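
The scenarios above can be summarized as a small lookup. This Python sketch is only an illustration of the reasoning in this post, not an Etlworks API; the `Location` enum and the `recommended_setup` function are hypothetical names. It also simplifies one case: when the local destination is third-party software that can read from a service such as Google Sheets, an agent may not be needed.

```python
from enum import Enum


class Location(Enum):
    CLOUD = "cloud"
    ON_PREMISE = "on-premise, behind the firewall"


def recommended_setup(source: Location, destination: Location) -> str:
    """Map where the source and destination live to the setup described above."""
    if source is Location.CLOUD and destination is Location.CLOUD:
        # Both ends are reachable from the cloud service directly.
        return "cloud service only"
    if source is Location.ON_PREMISE and destination is Location.ON_PREMISE:
        # Nothing needs to leave the corporate network.
        return "on-premise installation"
    # One end is behind the firewall: an agent bridges it to the cloud service.
    return "cloud service plus on-premise agent"


print(recommended_setup(Location.ON_PREMISE, Location.CLOUD))
```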
