What is reverse ETL? And why is it valuable?

Reverse ETL is a new key component of the modern data stack that enables “operational analytics.”

ETL/ELT

Before defining “reverse ETL”, let’s briefly talk about plain old ETL. Extract, transform, and load (ETL) is a data integration methodology that extracts raw data from sources, transforms the data on a secondary processing server, and then loads the data into a target database.

ETL is nothing new. The concept was actually popularized in the 1970s.

More recently, with the rise of cloud data warehouses, extract, load, transform (ELT) is beginning to replace ETL. Unlike the ETL method, ELT does not require data transformation before the loading process. ELT loads raw data directly into a cloud data warehouse. Data transformations are executed inside the data warehouse via SQL pushdowns, Python scripts, and other code.
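For example, a minimal ELT-style transformation pushed down to the warehouse might look like the SQL below. The table and column names are hypothetical and only illustrate the pattern of transforming already-loaded raw data in place.

-- Build a cleaned, analytics-ready table from raw data already loaded into the warehouse
CREATE TABLE analytics.orders_clean AS
SELECT
  order_id,
  customer_id,
  CAST(order_total AS DECIMAL(12,2)) AS order_total,
  CAST(created_at AS DATE) AS order_date
FROM raw.orders
WHERE order_status <> 'cancelled';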

ETL and ELT both transfer data from third-party systems, such as business applications (HubSpot, Zendesk, Salesforce) and/or databases (Oracle, MySQL), into target data warehouses. But with reverse ETL, the data warehouse is the source rather than the target; the target is a third-party system.

What is reverse ETL?

Reverse ETL is the inverse of the ETL process covered above. Simply put, it is the process of copying data from the data warehouse to the SaaS products used by an organization.

Why would you move data out of the warehouse? Companies today increasingly engage in operational analytics, an approach that makes data accessible to “operational” teams (sales, marketing, and so on) for operational use cases. We distinguish it from the more classical approach of using data stored in the warehouse only for reporting and business intelligence. Instead of using data only to influence long-term strategy, operational analytics informs the day-to-day operations of the business. To put it simply, it puts the company’s data to work so everyone in your organization can make smarter decisions.

Reverse ETL is the tooling that answers this new practice of operational analytics. It can be seen as a bridge between your data warehouse and your cloud applications: reverse ETL tools move data out of the warehouse into the SaaS products your teams already use. For example, the Sales team wants the list of webinar attendees imported as leads into Salesforce, the Support team wants to see data about accounts with premium support in Zendesk, and the Finance team wants a CSV of rolled-up transaction data to use in Excel or Google Sheets.

The data you need is already in the data warehouse; with reverse ETL, all you have to do is extract it from the warehouse and sync it to the external tools, which makes it the simplest solution.
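To make the webinar example concrete, the extract side of such a sync is typically just a query against the warehouse. The sketch below uses hypothetical table and column names; the reverse ETL tool would then map the resulting columns to Salesforce Lead fields and push the rows through the Salesforce API.

-- Warehouse query selecting last week's webinar attendees to sync into Salesforce as leads
SELECT
  a.email,
  a.first_name,
  a.last_name,
  w.webinar_name,
  a.attended_at
FROM analytics.webinar_attendees a
JOIN analytics.webinars w ON w.webinar_id = a.webinar_id
WHERE a.attended_at >= CURRENT_DATE - 7;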

Etlworks and reverse ETL

Creating a reverse ETL pipeline from scratch is a complex undertaking: a business has to commit significant resources to build it and then make sure it keeps up with growing data volumes and schema changes.

Etlworks is a cloud-native data integration platform that allows you to perform ELT or reverse ETL with the help of your cloud data platform.

Etlworks helps you transfer data directly from a source of your choice, such as Snowflake or Amazon Redshift, to SaaS applications, CRMs such as Salesforce and HubSpot, and support tools such as Zendesk and Jira, in a fully automated and secure manner, without writing the same code over and over. It makes your life easier and your data migration hassle-free.

Want to take Etlworks for a ride? Sign up for a 14-day free trial and see the difference! Check out the pricing model to get a better understanding of which plan suits you best.

What is a Data Pipeline?

Etlworks is the leading provider of cloud-based managed data pipelines.

To gain business insights and a competitive advantage, every business these days is seeking ways to integrate data from multiple sources. Data and data analytics are critical to business operations, so it’s important to engineer and deploy strong, maintainable data pipelines.

A data pipeline is a set of actions that ingests raw data from disparate sources and moves it to a destination for storage and analysis. Most of the time, a data pipeline also performs some processing or transformation on the data to enhance it.

Data pipelines often deliver mission-critical data that feeds important business decisions, so ensuring their accuracy and performance is required whether you implement them with data integration and ETL platforms, data-prep technologies, or real-time data streaming architectures.

How is a data pipeline different from ETL?

You may hear the terms ETL and data pipeline used interchangeably. ETL stands for Extract, Transform, and Load. The key difference is that ETL focuses on one specific job: extracting, transforming, and loading data into a particular data warehouse. In other words, ETL is just one of the components that can make up a data pipeline.

ETL pipelines move data in batches to a specified system at regular intervals. Data pipelines, by comparison, have broader applicability and can also transform and process data in streaming or real time.

Data pipelines do not necessarily have to load data into a database or data warehouse. The data might be loaded to any number of targets, such as an Amazon S3 bucket or a data lake, or the pipeline might even trigger a webhook on another system to kick off a specific business process.

Data pipeline solutions

The nature of a data pipeline differs depending on its purpose, from cloud tools that migrate data to pipelines built for real-time solutions. The following list shows the most popular types of pipelines. Note that these categories are not mutually exclusive: for example, a data pipeline can be optimized for both cloud and real-time use.

Cloud-Based

Cloud-based data integration tools offer a favorable cost-to-benefit ratio. These tools are hosted in the cloud, allowing you to save money on infrastructure and expert resources because you can rely on the infrastructure and expertise of the vendor hosting your pipeline.

Batch

Batch processing lets you easily move large amounts of data at regular intervals when real-time visibility is not required. This approach suits analysts who combine a multitude of, for example, marketing data to identify patterns and draw conclusions.

Real-Time or Streaming

Real-time or streaming processing is useful when an organization processes data from a streaming source, such as data from financial markets or Internet of Things (IoT) devices and sensors. Real-time processing captures data as it comes off the source systems and performs rudimentary transformations (filtering, sampling, aggregating, calculating averages, determining min/max values) before handing the data off to the downstream process.
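As a rough illustration, the kind of rudimentary aggregation a streaming stage might compute can be written as a simple query. The sketch below assumes a hypothetical sensor_readings source with device_id, reading, and event_time columns; in a true streaming engine this would run continuously over a time window rather than as a one-off statement.

-- Per-minute count, average, and min/max of incoming sensor readings
SELECT
  device_id,
  DATE_TRUNC('minute', event_time) AS minute_bucket,
  COUNT(*) AS readings,
  AVG(reading) AS avg_reading,
  MIN(reading) AS min_reading,
  MAX(reading) AS max_reading
FROM sensor_readings
GROUP BY device_id, DATE_TRUNC('minute', event_time);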

Open Source

Open source tools are ideal for small businesses that want to keep costs down and avoid over-reliance on commercial vendors. However, making good use of such tools requires in-house expertise, because the underlying technology is publicly available and meant to be modified or extended by its users.

Data pipeline use cases

Data Migration

Data pipelines are used to perform data migration tasks. These might involve moving data from databases, e.g. MongoDB, Oracle, Microsoft SQL Server, PostgreSQL, and MySQL, into the cloud. Cloud databases are scalable and flexible and enable easier creation of other data pipelines that use real-time streaming.

Data Warehousing and Analysis

Probably the most common destination for a data pipeline is a dashboard or suite of analytical tools. Raw data that is structured via ETL can be loaded into databases for analysis and visualization. Data scientists can then create graphs, tables and other visualizations from the data. This data can then be used to inform strategies and guide the purpose of future data projects.

AI and Machine Learning Algorithms

ETL and ELT pipelines can move data into machine learning and AI models. Machine learning algorithms can learn from the data, performing advanced parsing and wrangling. These models can then be deployed into various applications. Machine learning algorithms fed by data pipelines are used in marketing, finance, science, telecoms, and more.

IoT Integration

Data pipelines are frequently used in IoT systems that use networks of sensors for data collection. Data ingested from various sources across a network can be transformed into data that is ready for analysis. For example, an ETL pipeline may perform numerous calculations on huge quantities of delivery tracking information, vehicle locations, delay expectations, and so on, to form a rough ETA estimate.

Getting started with a data pipeline

Setting up a reliable data pipeline doesn’t have to be complex and time-consuming. Etlworks can help you solve your biggest data collection, extraction, transformation, and transportation challenges. Sign up for Etlworks for free and get the most from your data pipeline, faster than ever before.

Etlworks Marketo Integration


What is Marketo?

Marketo is a cloud-based lead management and marketing solution. Its product range is provided on a subscription basis and covers Lead Management, Sales Insights, Revenue Cycle Analytics, and Social Marketing applications. It helps organizations automate and measure marketing engagement, tasks, and workflows, including those for email, mobile, social, and digital ads.

What is Etlworks?

Etlworks is a cloud-native integration platform that helps businesses automate manual data management tasks, ensure far more accurate data, accelerate business workflows, and gain greater operational visibility.

After a few minutes of setup, Etlworks replicates all your applications, databases, events, and files into a high-performance data warehouse such as Snowflake or Amazon Redshift, so that you can use your favorite BI or analytics tools. Create reports, monitor custom dashboards, and more, instantly from the cloud.

Connect Marketo to Anything

Etlworks offers connectivity to Marketo’s APIs, enabling you to work with key Marketo entities including Lead, Activity, List, Opportunity, and OpportunityRole, as well as Custom Objects. Etlworks exposes both the SOAP and REST APIs for Marketo, ensuring you can handle any integration task.

Use the Etlworks Marketo connector for data integration between Marketo and your CRM system, such as Salesforce, MS Dynamics, SugarCRM, HubSpot, and NetSuite; collaboration or survey tools; webinar platforms; data services; marketing databases; and more.

The Etlworks Marketo connector frees you to focus on insights, so your company can be faster and more efficient at optimizing marketing performance and improving your campaigns’ ROI.

Etlworks partnered with CData to provide access to the Marketo API using the industry-standard JDBC protocol.

Let’s do it!

Connecting to Marketo

Step 1. Obtain the OAuthClientId and OAuthClientSecret values. To obtain the OAuthClientId and OAuthClientSecret, navigate to the LaunchPoint option in the Admin area. Click the View Details link for the desired service. A window containing the authentication credentials is displayed.

Step 2. Obtain the REST endpoint URL. The RESTEndpoint can be found in your Marketo Admin area under the Integration -> Web Services option, in the REST API section. Note that the Identity Endpoint will not be needed.

Step 3. Enable the Marketo connector for your Etlworks account. Contact support@etlworks.com to enable the connector.

Step 4. Create a Marketo connection to work with data in Marketo.

Stored Procedures

Stored procedures are available to complement the data available from the REST data model. Sometimes it is necessary to update data available from a view using a stored procedure, because the data does not allow direct, table-like, two-way updates. In these situations, data retrieval is done using the appropriate view or table, while the update is done by calling a stored procedure. Stored procedures take a list of parameters and return a dataset containing the collection of tuples that constitute the response.

To call a stored procedure from the SQL flow or from Before/After SQL, use the EXEC sp_name params=value syntax. Example:

EXEC SelectEntries ObjectName = 'Account'

Extracting data from Marketo

Note: extracting data from Marketo is similar to extracting data from a relational database.

Step 1. Create a Marketo connection which will be used as a source (FROM).

Step 2. Create a destination connection, for example, a connection to the relational database, and if needed a format (format is not needed if the destination is a database or well-known API).

Step 3. Create a flow where the source is a database and the destination is a connection created in step 2, for example, relational database.


Step 4. Add new source-to-destination transformation.

Step 5. Select the Marketo connection created in step 1 as the source connection and select the Marketo object you are extracting data from.

Step 6. Select TO connection, format (if needed) and object (for example database table) to load data into.


Step 7. Click MAPPING and optionally enter a Source Query (you don’t need a query if you are extracting data from the Marketo object unconditionally); see the example query after these steps.

Step 8. Optionally define the per-field mapping.


Step 9. Add more transformations if needed.
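As an example of a Source Query for step 7: the Marketo connector exposes Marketo objects such as Lead as tables, so the query is plain SQL against them. The field names and date below are illustrative and may differ in your Marketo instance.

-- Example Source Query: extract only recently updated leads instead of the whole Lead object
SELECT Id, Email, FirstName, LastName, UpdatedAt
FROM Lead
WHERE UpdatedAt >= '2023-01-01';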

Loading data in Marketo

Note: loading data in Marketo is similar to loading data into a relational database.

Step 1. Create a source connection and a format (if needed).

Step 2. Create destination Marketo connection.

Step 3. Create a flow where the destination is a database.

Step 4. Add new source-to-destination transformation.

Step 5. Select FROM and TO connections and objects (also a FROM format if needed).


Step 6. Optionally define the per-field mapping.

Step 7. Add more transformations if needed.

Browsing data in Marketo

You must have a Marketo connection to browse objects and run SQL queries.

Use Explorer to browse data and metadata in Marketo, as well as to execute DML and SELECT queries against the Marketo connection.


Ready to get started?

Contact Etlworks today to connect your Marketo instance with Etlworks and unlock the ability to read and replicate many of the objects to your data destination.

Streamline your Salesforce data integration with Etlworks


What is Salesforce?

Salesforce is the world’s #1 cloud-based customer relationship management (CRM) platform.

Salesforce offers a wide range of applications for managing business processes including sales, customer service, marketing, and e-commerce. For many organizations, Salesforce is a rich source of customer data, such as Accounts, Opportunities, Services, Community, Activities, and Leads.

On its own, Salesforce can dramatically improve how companies run their sales operations, support their customers, and provide products and services to a market. With integration, businesses make Salesforce even more valuable through data.

Through integration, you bring data from disparate sources (databases or applications for marketing, support, e-commerce, and sales) into the data warehouse.

Effective and efficient integration of Salesforce with adjacent enterprise systems — such as databases, ERP and CRM systems, and custom applications — is critical to enabling sales teams, increasing revenue, and better serving customers. By integrating Salesforce with other applications, APIs and resources, you make Salesforce even more valuable to your employees and your organization.

Ready to get started?

Etlworks is a cloud-native data integration platform that helps businesses automate manual data management tasks, ensure far more accurate data, accelerate business workflows, and gain greater operational visibility.

The Etlworks Salesforce connector allows fast, real-time access to Salesforce data. The connector supports all objects and metadata (fields) available through the Salesforce API and works just like any other database connector. This not only makes it easier to read, insert, update, and delete data, it also shortens the time it takes to turn that data into valuable, 360-degree customer insights.

You can load Salesforce Contacts, Leads, Opportunities, Attachments, Accounts, custom objects, etc., directly to/from major cloud and on-premise data sources, or synchronize data in both directions. Powerful mapping settings allow you to load and synchronize Salesforce data with sources that have different data structures. You can schedule your integration operations to execute automatically.

Let’s do it!

Extracting data from Salesforce

Note: extracting data from Salesforce is similar to extracting data from a relational database.

Step 1. Create a Salesforce connection which will be used as a source (FROM).

Step 2. Create a destination connection, for example, a connection to the relational database, and if needed a format (format is not needed if the destination is a database or well-known API).

Step 3. Create a flow where the source is a database and the destination is a connection created in step 2, for example, relational database.


Step 4. Add new source-to-destination transformation.

Step 5. Select the Salesforce connection created in step 1 as the source connection and select the Salesforce object you are extracting data from.


Step 6. Select TO connection, format (if needed) and object (for example database table) to load data into.


Step 7. Click MAPPING and optionally enter a Source Query (you don’t need a query if you are extracting data from the Salesforce object unconditionally); see the example query after these steps.


Step 8. Optionally define the per-field mapping.


Step 9. Add more transformations if needed.
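For step 7, the Salesforce connector exposes standard and custom objects as tables, so a Source Query is ordinary SQL against them. Here is a minimal sketch that limits the extract to open opportunities (the filter values are only an example):

-- Example Source Query: extract open opportunities instead of the whole Opportunity object
SELECT Id, Name, StageName, Amount, CloseDate
FROM Opportunity
WHERE IsClosed = false AND CloseDate >= '2023-01-01';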

Loading data in Salesforce

Note: loading data in Salesforce is similar to loading data into a relational database.

Step 1. Create a source connection and a format (if needed).

Step 2. Create destination Salesforce connection.

Step 3. Create a flow where the destination is a database.

Step 4. Add new source-to-destination transformation.

Step 5. Select FROM and TO connections and objects (also a FROM format if needed).


Step 6. Optionally define the per-field mapping.

Step 7. Add more transformations if needed.

Browsing data in Salesforce

You must have a Salesforce connection to browse objects and run SQL queries.

Use Explorer to browse data and metadata in Salesforce, as well as to execute DML and SELECT queries against the Salesforce connection.
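For instance, with a Salesforce connection selected, queries like the following can be run from Explorer. The object and field names are standard Salesforce ones, and the record Id is a placeholder.

-- Read a few contact fields
SELECT Id, FirstName, LastName, Email FROM Contact;

-- DML: update the rating on a specific account
UPDATE Account SET Rating = 'Hot' WHERE Id = '001000000000001AAA';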


Change Replication and Data Synchronization

Loading data from Salesforce to your data warehouse is just a part of the problem. Real-time analytics require data in the data warehouse to be constantly up-to-date with Salesforce. In Etlworks, you can always have the most current data from Salesforce in your data warehouse by using High Watermark (HWM) change replication techniques.

After the first replication of all the Salesforce data, subsequent replications update the data warehouse incrementally with refreshes from Salesforce, in near real time. The data warehouse will always be up to date within minutes, automatically and without any user intervention.
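Conceptually, high watermark replication remembers the largest modification timestamp it has seen and only pulls newer rows on the next run. The sketch below shows the underlying idea against the Salesforce Opportunity object; Etlworks tracks and substitutes the watermark value for you.

-- Initial run: full extract; the maximum SystemModstamp becomes the stored watermark
SELECT Id, Name, Amount, SystemModstamp FROM Opportunity;

-- Subsequent runs: only rows modified after the stored watermark
SELECT Id, Name, Amount, SystemModstamp
FROM Opportunity
WHERE SystemModstamp > '2023-06-01T00:00:00Z';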

Sign up free or get a demo today to learn more: https://etlworks.com

ETL/ELT all your data into Amazon Redshift DW


Amazon Redshift is fast, scalable, and easy-to-use, making it a popular data warehouse solution. Redshift is straightforward to query with SQL, efficient for analytical queries and can be a simple add-on for any organization operating its tech stack on AWS.

Amazon Web Services has many benefits. Whether you choose it for the pay-as-you-go pricing, the high performance and speed, or the versatile and flexible services it provides, we are here to present the data loading approaches that work best for us.

Etlworks allows you to load data from cloud storage, APIs, SQL and NoSQL databases, and web services into an Amazon Redshift data warehouse in a few simple steps. You can configure and schedule the flow using an intuitive drag-and-drop interface and let Etlworks do the rest.

Etlworks supports more than one-time data loads. It can help you integrate your data sources with Amazon Redshift and keep Amazon Redshift automatically updated with fresh data, with no additional effort or involvement!

Today we are going to examine how to load data into Amazon Redshift.

A typical Redshift flow performs the following operations:

  • Extract data from the source.
  • Create CSV files.
  • Compress files using the gzip algorithm.
  • Copy files into Amazon S3 bucket.
  • Check whether the destination Amazon Redshift table exists and, if it does not, create the table using metadata from the source.
  • Execute the Amazon Redshift COPY command (see the sketch after this list).
  • Clean up the remaining files.
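The table-creation and COPY steps above correspond roughly to the Redshift SQL below. The table definition, S3 path, and IAM role are placeholders; Etlworks generates and executes the equivalent statements for you.

-- Create the target table if it does not exist (columns derived from the source metadata)
CREATE TABLE IF NOT EXISTS public.orders (
  order_id BIGINT,
  customer_id BIGINT,
  order_total DECIMAL(12,2),
  order_date DATE
);

-- Load the gzip-compressed CSV files staged in the S3 bucket
COPY public.orders
FROM 's3://my-bucket/staging/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
FORMAT AS CSV
GZIP
IGNOREHEADER 1;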

There are some prerequisites that have to be met before you can design a flow that loads data into Amazon Redshift.

Now, you are ready to create a Redshift flow. Start by opening the Flows window, clicking the + button, and typing redshift into the search field.


Continue by selecting the flow type, adding source-to-destination transformations, and entering the transformation parameters.


You can select one of the following sources (FROM) for the Redshift flow:

  • API – use any appropriate string as the source (FROM) name
  • Web Service – use any appropriate string as the source (FROM) name
  • File – use the source file name or a wildcard filename as the source (FROM) name
  • Database – use the table name as the source (FROM) name
  • CDC – use the fully qualified table name as the source (FROM) name
  • Queue – use the queue topic name as the source (FROM) name

For most Redshift flows, the destination (TO) is going to be an Amazon S3 connection. To configure the final destination, click the Connections tab and select the available Amazon Redshift connection.


Amazon Redshift can load data from CSV, JSON, and Avro formats, but Etlworks supports loading only from CSV, so you will need to create a new CSV format and set it as the destination format. If you are loading large datasets into Amazon Redshift, consider configuring the format to split the document into smaller files. Amazon Redshift can load files in parallel, and transferring smaller files over the network can be faster.

If necessary, you can create a mapping between the source and destination (Redshift) fields.

Mapping is not required, but remember that if a source field name is not supported by Redshift, Redshift will return an error and the data will not be loaded into the database. For example, if you are loading data from Google Analytics, the output (source) will include fields with the ga: prefix (ga:user, ga:browser, etc.). Unfortunately, Amazon Redshift does not support field names containing a colon, so the data will be rejected. If that happens, you can use mapping to rename the destination fields.

ELT for Amazon Redshift

Amazon Redshift provides affordable and nearly unlimited computing power, which allows you to load data into Amazon Redshift as-is, without pre-aggregation, and to process and transform all the data quickly when executing analytics queries. Thus, the ETL (Extract-Transform-Load) approach turns into ELT (Extract-Load-Transform). This can greatly simplify loading data into Amazon Redshift, as you don’t need to think about the necessary transformations up front.

Etlworks supports executing complex ELT scripts directly in Amazon Redshift, which greatly improves the performance and reliability of data ingestion.
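As a hedged example, an ELT script pushed down to Redshift after the raw load might be as simple as the following (the table names are hypothetical):

-- Transform raw data into a reporting table entirely inside Redshift
INSERT INTO reporting.daily_revenue (order_date, total_revenue, order_count)
SELECT order_date, SUM(order_total), COUNT(*)
FROM public.orders
GROUP BY order_date;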

I hope this has been helpful. Go forth and load large amounts of data.
