Data integration platform vs. in-house solution

In-house data solutions – otherwise known as the small thing that someone built ages ago and now it’s somehow grown into a critical part of your data infrastructure and you’d rather not touch it because who knows how it all works? Also, some companies are afraid of risks of migration, so they are pursuing the “if it ain’t broke, don’t fix it” strategy which is not always the most efficient or cost-effective. 

It’s a very common scenario. But those in-house solutions can quickly become harder and harder to manage and extend to support increasingly complicated requirements.

At some point you face the decision: do you keep improving an in-house solution or do you reassess your needs and look around for a possible off-the-shelf solution?

If you are reading this article, you are in a position when you are:

  • Tired of dealing with the complicated inputs and outputs and don’t want to write a custom code anymore.
  • Need a way to connect to all your data, regardless of its format and location.
  • Need a simple way to transform your data from one format to another.
  • Need to track changes in your transactional database and push them to your data warehouse.
  • Need to connect to the external and internal APIs with different authentication schemas, requests, and responses.
  • Want to create new APIs with just a few mouse clicks, without writing any code.
  • Prefer not to write any code at all.
  • Want not to worry about backups, performance, and job monitoring.
  • Want someone to manage the data integration tool for you.

By selecting Etlworks as your data integration platform, you will be able to implement complex data integration flows with fewer steps and faster.

The key advantages of the Etlworks:

  • It can read data from all your sources and load it into all your destinations, including most databases, file storage systems and more than 150 SaaS applications.
  • It automatically parses even the most complicated JSON and XML documents (as well as other formats) and can connect to all your APIs and databases.
  • It is built for Cloud but works equally well when installed on-premise.
  • You can visualize and explore all your data, regardless of the format and location, before creating any data integration flows.
  • You probably won’t need to write any code at all, but even if you will, it will be just a few lines in the familiar programming language: SQL, JavaScript, and Python.
  • You can use SQL to extract and flatten data from nested documents.
  • Etlworks is a full-fledged enterprise service bus (ESB) so you can create data integration APIs with just a few mouse clicks.
  • Etlworks can integrate data behind the corporate firewall when working together with data integration agent.
  • We provide world-class support for all customers.
  • Our service is so affordable that you won’t have to get a board approval in order to use it.
  • No sales call necessary! Sign up and start using it right away.

In the new post-COVID reality, the pressure to do more with less is higher than ever before. By leveraging modern managed integration solutions, your company get a chance not only to save money but also to gain a competitive advantage. 

Etlworks is  solving data integration challenges since 2016. We are working with companies large and small around the world, in industries such as finance, healthcare, entertainment and consulting, to help them build and manage better data pipelines and deliver better data outcomes.

What is Data Integration?

In 2020, it is definitive to argue that one of a company’s most significant assets is its Data.

Data integration involves collecting data from different sources, combining this data into a unified set that can easily be used for a predetermined business function, and enabling you to query and manipulate all of your data from a single interface and derive analytics and statistics. While the sources and types of data continue to grow, it becomes increasingly important to be able to perform quality analysis on that data.

Benefits of Data integration

Data integration is becoming more and more common, as numerous apps and companies race to meet consumer demand to have all of their data collected in one place and in a useful format.

From small businesses that want to spend less employee time creating reports, to large corporations taking a deep dive into an array of user statistics, data integration is indispensable for businesses that want to compete in today’s economy.

Here are the primary benefits of data integration offered to businesses.

Integrate data from multiple sources

Businesses are using tons of applications, systems, and data warehouses within your organization. Nevertheless, these data sources are disparate and siloed. A data silo, is a repository of data that is isolated. Generally, in businesses, this means that the information is under the control of a business unit or department and is not available across the organization. When systems containing valuable data are integrated across an organization, then the data gathered by one app or department can benefit the company as a whole, not just a team or individual.

Inter-system cooperation ensures that your company is free from information silos that benefit the few, and allow data gathered across the organization to be viewed and used by anyone who needs access to it.

For example, integrating data from multiple online stores can give you a more complete understanding of customer behavior and payment processing preferences.

Data integrity and data quality

Data integrity is an important element of data integrations. Data integrity is the assurance of the consistency and quality of the data through its entire lifecycle.
Nevertheless, most of the data is still of poor quality. Eliminating errors in data sets used for business intelligence and decision making is one of the primary advantages of data integration.

When data is gathered manually, then every tool, database and user account must be properly accounted for and set up prior to collecting this data, meaning that any data source overlooked or added last minute results in an incomplete data set as the end result. This also means that reporting protocols must be re-worked whenever a new data source is added.

When data systems are integrated properly, however, errors from overlooked sources do not occur and reports can be run and accessed in real time.

Save time and boost efficiency

When a company takes measures to integrate its data properly, it cuts down significantly on the time it takes to prepare and analyze that data. The automation of unified views cuts out the need for manually gathering data, and employees no longer need to build connections from scratch whenever they need to run a report or build an application.

Additionally, using the right tools, rather than hand-coding the integration, returns even more time (and resources overall) to the dev team.
When systems are properly integrated, collecting data and converting it into its final, usable format becomes a quick, easy task instead of a treasure hunt across your company’s data assets.

All the time saved on these tasks can be put to other, better uses, with more hours earmarked for analysis and execution to make an organization more productive and competitive.

Increase competitiveness

This is the most straightforward from all the other benefits of data integration.

Having a data integration strategy in place can help you to plan what actions you need to take to improve the accessibility of data both externally and internally, you will be able to influence many vital parts of your business. Yes, the overall goal is to generate more profit. An essential part of that is delighting your customers, so offering better services than your competition cannot be overlooked.

Tools for Data Integration

There are numerous tools available in the market that would help us query out the Data effectively since our Data is not going to integrate itself. To name a few, we have some Open Source Data Integration Tools, Cloud-based Data Integration Tools and the On-premises data integration tools.

Again, the question is that how to and which one to choose among those various tools available in the market.  The features you should look for in a data integration tool are:

A lot of connectors. There are many systems and applications in the world; the more pre-built connectors your Data Integration tool has, the more time your team will save.

Portability. It is important, as companies increasingly move to hybrid cloud models, to be able to build your data integrations once and run them anywhere

Ease of use. Data integration tools should be easy to learn and easy to use with a GUI interface to make visualizing your data pipelines simpler.

A transparent price model. Your data integration tool provider should not ding you for increasing the number of connectors or data volumes.

Cloud compatibility. Your data integration tool should work natively in a single cloud, multi-cloud, or hybrid cloud environment.

Business intelligence, analytics, and competitive edges are all at stake when it comes to data integration. That is why it is critical for your company to have full access to every data set from every source.

Simple data integration with Etlworks

Etlworks is an all-in-one, any-to-any data integration platform for all your ETL projects, regardless of the complexity, data location, format and volume. There is no code to write — with just a few clicks, Etlworks will extract your data from wherever it lives and get it ready to be analyzed, understood, and acted upon.

Etlworks offers a free 14-day trial. Give it a try today!

Streamline your Salesforce data integration with Etlworks

etlworks-salesforce-data-integration

What is Salesforce?

Salesforce is the world’s #1 cloud-based customer relationship management (CRM) platform.

Salesforce offers a wide range of applications for managing business processes including sales, customer service, marketing, and e-commerce. For many organizations, Salesforce is a rich source of customer data, such as Accounts, Opportunities, Services, Community, Activities, and Leads.

On its own, Salesforce can dramatically improve how companies run their sales operations, support their customers, and provide products and services to a market. With the integration, businesses make Salesforce more valuable through data.

Through integration, you bring data from disparate sources, databases or applications, such as marketing, support, e-commerce, and sales to the data warehouse.

Effective and efficient integration of Salesforce with adjacent enterprise systems — such as databases, ERP and CRM systems, and custom applications — is critical to enabling sales teams, increasing revenue, and better serving customers. By integrating Salesforce with other applications, APIs and resources, you make Salesforce even more valuable to your employees and your organization.

Ready to get started?

Etlworks is a cloud-native data integration platform that helps businesses automate manual data management tasks, ensure data that are far more accurate, accelerate business workflows, and provide greater operational visibility to an organization.

Etlworks Salesforce connector allows fast real-time access to Salesforce data. The connector supports all objects and metadata (fields) available through the Salesforce API and works just like any other database connector. This not only makes it easier to read, insert, update and delete data, it also accelerates the time it takes to turn it into valuable, 360-degree customer insights. 

You can load Salesforce Contacts, Leads, Opportunities, Attachments, Accounts, custom objects, etc. directly to/from major cloud and on-premise data sources or synchronize data in both directions. Powerful mapping settings allow you to load and synchronize Salesforce data with sources having different data structures. You can schedule your integration operation to execute it automatically.

Let’s do it!

Extracting data from Salesforce

Note: extracting data from Salesforce is similar to extracting data from the relational database.

Step 1. Create Salesforce connection which will be used as a source (FROM).

Step 2. Create a destination connection, for example, a connection to the relational database, and if needed a format (format is not needed if the destination is a database or well-known API).

Step 3. Create a flow where the source is a database and the destination is a connection created in step 2, for example, relational database.

mceclip0.png

Step 4. Add new source-to-destination transformation.

Step 5. Select Salesforce connection created in step 1 as a source connection and select the Salesforce object you are extracting data from:

salesforce-from.png

Step 6. Select TO connection, format (if needed) and object (for example database table) to load data into.

salesforce-to

Step 7. Click MAPPING and optionally enter Source Query (you don’t need a query if you are extracting data from the Salesforce object unconditionally):

salesforce-query

Step 8. Optionally define the per-field mapping.

salesforce-mapping

Step 9. Add more transformations if needed.

Loading data in Salesforce

Note: loading data in Salesforce is similar to loading data into a relational database.

Step 1. Create a source connection and a format (if needed).

Step 2. Create destination Salesforce connection.

Step 3. Create a flow where the destination is a database.

Step 4. Add new source-to-destination transformation.

Step 5. Select FROM and TO connections and objects (also a FROM format if needed).

to-snowflake

Step 6. Optionally define the per-field mapping.

Step 7. Add more transformations if needed.

Browsing data in Salesforce

You must have a Salesforce connection to browse objects and run SQL queries.

Use Explorer to browse data and metadata in Salesforce as well as execute DML and SELECT queries against Salesforce connection.

browse-salesforce

Change Replication and Data Synchronization

Loading data from Salesforce to your data warehouse is just a part of the problem. Real-time analytics require data in the data warehouse to be constantly up-to-date with Salesforce. In Etlworks, you can always have the most current data from Salesforce in your data warehouse by using High Watermark (HWM) change replication techniques.

After the first replication of all the Salesforce data, subsequent replications update the data warehouse data incrementally with refreshes from Salesforce, in near real-time. Data warehouse data will always be up-to-date in a matter of minutes automatically without any user intervention.

Sign up free or get a demo today to learn more: https://etlworks.com

ETL/ELT all your data into Amazon Redshift DW

amazon_integration

Amazon Redshift is fast, scalable, and easy-to-use, making it a popular data warehouse solution. Redshift is straightforward to query with SQL, efficient for analytical queries and can be a simple add-on for any organization operating its tech stack on AWS.

Amazon Web Services have many benefits. Whether you choose it for the pay as you go pricing, high performance, and speed or its versatile and flexible services provided, we are here to present you the best data loading approaches that work for us.

Etlworks allows users to load your data from cloud storages and APIs, SQL and NoSQL databases, web services to Amazon Redshift data warehouse in a few simple steps. You can configure and schedule the flow using intuitive drag and drop interface and let Etlworks do the rest.

Etlworks supports not just one-time data loading operation. It can help you to integrate your data sources with Amazon Redshift and automate updating your Amazon Redshift with fresh data with no additional effort or involvement!

Today we are going to examine how to load data into Amazon Redshift.

A typical Redshift flow performs the following operations:

  • Extract data from the source.
  • Create CSV files.
  • Compress files using the gzip algorithm.
  • Copy files into Amazon S3 bucket.
  • Check to see if the destination Amazon Redshift table exists, and if it does not – creates the table using metadata from the source.
  • Execute the Amazon Redshift COPY command.
  • Clean up the remaining files.

There are some prerequisites have to be met, before you can design a flow that loads data into Amazon Redshift:

Now, you are ready to create a Redshift flow. Start by opening the Flows window, clicking the + button, and typing redshift into the search field:

redshift-flows

Continue by selecting the flow type, adding source-to-destination transformations and entering the transformation parameters:

redshift-transformation

You can select one of the following sources (FROM) for the Redshift flow:

  • API – use any appropriate string as the source (FROM) name
  • Web Service – use any appropriate string as the source (FROM) name
  • File – use the source file name or a wildcard filename as the source (FROM) name
  • Database – use the table name as the source (FROM) name
  • CDC – use the fully qualified table name as the source (FROM) name
  • Queue – use the queue topic name as the source (FROM) name

For most of the Redshift flows, the destination (TO) is going to be Amazon S3 connection. To configure the final destination, click the Connections tab and select the available Amazon Redshift connection.

redshift-connection

Amazon Redshift can load data from CSVJSON, and Avro formats but Etlwoks supports loading only from CSV so you will need to create a new CSV format and set it as a destination format. If you are loading large datasets into Amazon Redshift, consider configuring a format to split the document into smaller files. Amazon Redshift can load files in parallel, also transferring smaller files over the network can be faster.

If necessary, you can create a mapping  between the source and destination (Redshift) fields.

Mapping is not required, but please remember that if a source field name is not supported by Redshift, it will return an error and the data will not be loaded into the database. For example, if you are loading data from Google Analytics, the output (source) is going to include fields with the prefix ga: ( ga:user, ga:browser, etc. ). Unfortunately, Amazon Redshift does not support fields with a : , so the data will be rejected. If that happens, you can use mapping to rename the destination fields.

ELT for Amazon Redshift

Amazon Redshift provides affordable and nearly unlimited computing power which allows loading data to Amazon Redshift as-is, without pre-aggregation, and processing and transforming all the data quickly when executing analytics queries. Thus, the ETL (Extract-Transform-Load) approach transforms to ELT (Extract-Load-Transform). This may simplify data loading to Amazon Redshift greatly, as you don’t need to think about the necessary transformations.

Etlworks supports executing complex ELT scripts directly into Amazon Redshift which greatly improves performance and reliability of the data injection.

I hope this has been helpful. Go forth and load large amounts of data.

Loading data in Snowflake

etlworks-snowflake

In this blog post, I will be talking about building a reliable data injection pipeline for Snowflake.

Snowflake is a data warehouse built for the cloud. It works across multiple clouds and combines the power of data warehousing, the flexibility of big data platforms, and the elasticity of the cloud.

Based on the Snowflake documentation, loading data is a two-step process:
  1. Upload (i.e. stage) one or more data files into either an internal stage (i.e. within Snowflake) or an external location.
  2. Use the COPY INTO command to load the contents of the staged file(s) into a Snowflake database table.

It is obvious that one step is missing: preparing data files to be loaded in Snowflake.

If steps 1-3 do not look complicated to you, let’s add more details.

Typically, developers that are tasked with loading data into any data warehouse dealing with the following issues:

  • How to build a reliable injection pipeline, which loads hundreds of millions of records every day.
  • How to load only recent changes (incremental replication).
  • How to transform data before loading into the data warehouse.
  • How to transform data after loading into the data warehouse.
  • How to deal with changed metadata (table structure) in both the source and in the destination.
  • How to load data from nested datasets, typically returned by the web services (in addition to loading data from the relational databases).

This is just a short list of hand-picked problems. The good news is that Snowflake is built from the ground up to help with bulk-loading data, thanks to the very robust COPY INTO command, and continues-loading using Snowpipe.

Any Snowflake injection pipeline should at least be utilizing the COPY INTO command and, possibly Snowpipe.

The simplest ETL process that loads data into the Snowflake will look like this:
  1. Extract data from the source and create CSV (also JSON, XML, and other formats) data files.
  2. Archive files using gz compression algorithm.
  3. Copy data files into the Snowflake stage in Amazon S3 bucket (also Azure blob and local file system).
  4. Execute COPY INTO command using a wildcard file mask to load data into the Snowflake table.
  5. Repeat 1-4 for multiple data sources. Injection ends here.
  6. If needed, execute SQL statements in Snowflake database to transform data. For example, populate dimensions from the staging tables.
The part where you need to build a “reliable data injection pipeline” typically includes:
  • Performance considerations and data streaming.
  • Error-handling and retries.
  • Notifications on success and failure.
  • Reliability when moving files to the staging area in S3 or Azure.

COPY INTO command can load data from the files archived using gz compression algorithm. So, it would make sense to archive all the data files before copying or moving them to the staging area.

  • Cleaning up: what to do with all these data files after they have been loaded (or not loaded) into the Snowflake.
  • Dealing with changing table structure in the source and in the destination.

Snowflake supports transforming data while loading it into a table using the COPY INTO <table> command but it will not allow you to load data with inconsistent structure.

Add the need to handle incremental updates in the source (change replication) and you got yourself a [relatively] complicated project at hands.

As always, there are two options:
  1. Develop home-grown ETL using a combination of scripts and in-house tools.
  2. Develop solution using third-party ETL tool or service.

Assuming that you are ready to choose option 2 (if not, go to paragraph one), let’s discuss

The requirements for the right ETL tool for the job

When selecting the ETL tool or service the questions you should be asking yourself are:

  • How much are you willing to invest in learning?
  • Do you prefer the code-first or the drag&drop approach?
  • Do you need to extract data from the semi-structured and unstructured data sources (typically web services) or all your data is in the relational database?
  • Are you looking for point-to-point integration between well-known data sources (for example, Salesforce->Snowflake ) with the minimum customization, or you need to build a custom integration?
  • Do you need your tool to support change replication?
  • How about real-time or almost real-time ETL?
  • Are you looking for a hosted and managed service, running in the cloud or on-premise solution?
Why Etlworks is the best tool for loading data in Snowflake?

First, just like Snowflake, Etlworks is a cloud-first data integration service. It works perfectly well when installed on-premise, but it really shines in the cloud. When subscribing to the service, you can choose the region that is closest to your Snowflake instance which will make all the difference as far as the fast data load is concerned. Also, you won’t have to worry about managing the service.

Second, in Etlwoks you can build even the most complicated data integration flows and transformations using simple drag&drop interface. No need to learn a new language and no complicated build-test-deploy process.

Third, if you are dealing with heterogeneous data sources, web services, semi-structured or unstructured data, or transformations which go beyond the simple point-to-point, pre-baked integrations  – you are probably limited to just a few tools. Etlworks is one of them.

Last but not least, if you need your tool to support a native change (incremental) replication from relational databases or web services, Etlworks can handle this as well. No programming required.  And it is fast.

How it works

In Etlworks, you can choose from several highly configurable data integration flows, optimized for Snowflake:

  • Extract data from databases and load in Snowflake.
  • Extract data from data objects (including web services) and load in Snowflake.
  • Extract data from well-known APIs (such as Google Analytics) and load in Snowflake.
  • Load existing files in Snowflake.
  • Execute any SQL statement or multiple SQL statements.

Behind the scene, the flows perform complicated transformations and create data files for Snowflake, archive files using gz algorithm before copying to the Snowflake stage in the cloud or in the server storage, automatically create and execute COPY INTO <table> command, and much more. For example, the flow can automatically create a table in Snowflake if it does not exist, or it can purge the data files in case of error (Snowflake can automatically purge the file in case of success).

You can find the actual, step-by-step instructions on how to build Snowflake data integration flows in Etlworks in our documentation.

The extra bonus is that in Etlworks you can connect to the Snowflake database, discover the schemas, tables, and columns, run SQL queries, and share queries with the team. All without ever using Snowflake SQL workbench.  Even better – you can connect to all your data sources and destinations, regardless of the format and location to discover the data and the metadata. Learn more about Etlworks Explorer.