EDI Data Integration & Why It’s Important


Today more than 85% of all electronic business transactions take place utilizing Electronic Data Interchange (EDI). This creates major competitive advantages for businesses and their partners. The EDI process has proven to be the most efficient and secure process to exchange business documents globally. EDI is a necessary component to ensure that your organization is achieving operational excellence.

What is EDI?

Electronic Data Interchange is the computer-to-computer exchange of business documents in a standard electronic format between business partners. By moving from a paper-based exchange of business document to one that is electronic, businesses enjoy major benefits such as reduced cost, increased processing speed, reduced errors and improved relationships with business partners.

EDI provides a safe, reliable, secure, and established method of exchanging documents for all industries. The retail industry uses EDI documents such as the Purchase Order and Invoice. The healthcare industry relies almost entirely on the HIPAA healthcare claim and corresponding payment EDI documents.

Why is EDI Necessary?

The adoption of electronic data interchange is critical for companies of all sizes to maintain their competitiveness in the marketplace. Over three decades of global usage has proven that EDI improves operational efficiency across your entire organization.

As an automation technology, EDI delivers core business benefits:

  • Saves time and money: automates a process previously manually executed with paper documents.
  • Improves efficiency and accuracy: data-entry errors are eliminated.
  • Improves traceability and reporting: electronic documents can be integrated with a range of IT systems to support data collection, visibility, and analysis.
  • Improves relationships with your customers: enables efficient transaction execution and prompt, reliable product and service delivery.

EDI is important to both large and small businesses. For large organizations, EDI enables standards to be instituted across trading partners to consistently achieve benefits. For smaller organizations, adherence to EDI offers greater integration with larger firms that have big budgets and strong influence.

Metalanguages like XML, JSON, and API integration complement, rather than replace EDI. Companies must be ready to handle an ever-increasing number of document formats and transmission options.

Why choose Etlworks for EDI Integration?

Etlworks allows your team to easily automate the handling, process, and integration of your electronic data exchange information. You’ll increase your organization’s communication, customer service, and cash flow by automating your processes.

With Etlworks, you can automate data flows that generate EDI messages from internal data or process incoming EDI messages and integrate them with your internal applications and databases, improving process control.

The EDI connector in Etlworks Integrator contains components that convert messages between EDI and XML and vice versa.

Etlworks supports numerous EDI formats — HL7, EDI X12, EDIFACT, FHIR, JSON, Flat File, CSV, Delimited, XML, AVRO and more.

EDI is only the tip of what we have the capability to handle. If you’d like to learn more about how we can streamline your EDI software integration, request a personalized demo from an Etlworks expert.

ETL/ELT all your data into Amazon Redshift DW


Amazon Redshift is fast, scalable, and easy-to-use, making it a popular data warehouse solution. Redshift is straightforward to query with SQL, efficient for analytical queries and can be a simple add-on for any organization operating its tech stack on AWS.

Amazon Web Services have many benefits. Whether you choose it for the pay as you go pricing, high performance, and speed or its versatile and flexible services provided, we are here to present you the best data loading approaches that work for us.

Etlworks allows users to load your data from cloud storages and APIs, SQL and NoSQL databases, web services to Amazon Redshift data warehouse in a few simple steps. You can configure and schedule the flow using intuitive drag and drop interface and let Etlworks do the rest.

Etlworks supports not just one-time data loading operation. It can help you to integrate your data sources with Amazon Redshift and automate updating your Amazon Redshift with fresh data with no additional effort or involvement!

Today we are going to examine how to load data into Amazon Redshift.

A typical Redshift flow performs the following operations:

  • Extract data from the source.
  • Create CSV files.
  • Compress files using the gzip algorithm.
  • Copy files into Amazon S3 bucket.
  • Check to see if the destination Amazon Redshift table exists, and if it does not – creates the table using metadata from the source.
  • Execute the Amazon Redshift COPY command.
  • Clean up the remaining files.

There are some prerequisites have to be met, before you can design a flow that loads data into Amazon Redshift:

Now, you are ready to create a Redshift flow. Start by opening the Flows window, clicking the + button, and typing redshift into the search field:


Continue by selecting the flow type, adding source-to-destination transformations and entering the transformation parameters:


You can select one of the following sources (FROM) for the Redshift flow:

  • API – use any appropriate string as the source (FROM) name
  • Web Service – use any appropriate string as the source (FROM) name
  • File – use the source file name or a wildcard filename as the source (FROM) name
  • Database – use the table name as the source (FROM) name
  • CDC – use the fully qualified table name as the source (FROM) name
  • Queue – use the queue topic name as the source (FROM) name

For most of the Redshift flows, the destination (TO) is going to be Amazon S3 connection. To configure the final destination, click the Connections tab and select the available Amazon Redshift connection.


Amazon Redshift can load data from CSVJSON, and Avro formats but Etlwoks supports loading only from CSV so you will need to create a new CSV format and set it as a destination format. If you are loading large datasets into Amazon Redshift, consider configuring a format to split the document into smaller files. Amazon Redshift can load files in parallel, also transferring smaller files over the network can be faster.

If necessary, you can create a mapping  between the source and destination (Redshift) fields.

Mapping is not required, but please remember that if a source field name is not supported by Redshift, it will return an error and the data will not be loaded into the database. For example, if you are loading data from Google Analytics, the output (source) is going to include fields with the prefix ga: ( ga:user, ga:browser, etc. ). Unfortunately, Amazon Redshift does not support fields with a : , so the data will be rejected. If that happens, you can use mapping to rename the destination fields.

ELT for Amazon Redshift

Amazon Redshift provides affordable and nearly unlimited computing power which allows loading data to Amazon Redshift as-is, without pre-aggregation, and processing and transforming all the data quickly when executing analytics queries. Thus, the ETL (Extract-Transform-Load) approach transforms to ELT (Extract-Load-Transform). This may simplify data loading to Amazon Redshift greatly, as you don’t need to think about the necessary transformations.

Etlworks supports executing complex ELT scripts directly into Amazon Redshift which greatly improves performance and reliability of the data injection.

I hope this has been helpful. Go forth and load large amounts of data.

Etlworks CEO: Efficient data structures and real-time data streaming


Search Business Analytics recently interviewed Etlworks CEO, Maksym Sherbinin about the importance of creating efficient data structures and real-time data streaming.

Here are his insights:

Q: To what extent does defaulting to JSON as a data storage mechanism contribute to decreased analytics performance and larger file sizes?

A: There are two factors that contribute to JSON (as well as CSV and XML) not being an optimal format for storing and processing large datasets:

  • The lack of schema enforcement, which results in the same data being injected and stored multiple times in the data warehouse, definitely has a direct impact on performance, quality of service, and cost.
  • JSON-encoded datasets cannot be stored in a parallel manner, decreasing its usefulness for systems built on top of the Hadoop and other frameworks used for distributed processing.

Q: What are the existing alternatives for developers to create more efficient data structures for different kinds of analytics workloads in terms of data/file formats?

A: There are two very distinct alternatives:

  • Column-based databases optimized for data warehousing and massively parallel processing, specifically Snowflake and Amazon Redshift.
  • Formats, optimized for use in Hadoop clusters and data streaming: Optimized Row Columnar (ORC), Avro, and Parquet.

Q: What is the current state of tools that can automate some of this process (i.e. are there tools that can look at how an app works with data and recommend a better file format or data structure, and perhaps compile or transform it into these more efficient formats like traditional programming language compilers)?

A: When it comes to encoding data in any of the Hadoop-optimized formats, such as Avro and Parquet, there are great open-source libraries available for practically any programming language known to humankind. Converting data stored somewhere else (for example in the database) to the Avro or Parquet datasets and storing them in the distributed file system can be performed using home-grown tools or (preferred) a good ETL tool, which can connect to any source, map, transform, and store data in the Hadoop cluster. Below are some of the any-to-any ETL tools which support various sources, Avro and Parquet as a destination format, and Hadoop Distributed File System (HDFS) as a destination:

  • Etlworks
  • Talend
  • Informatica

The other trend is real-time data streaming that uses distributed platforms such as Apache Kafka. The Alooma enterprise pipeline is a good example of the service, which can be used to stream large datasets to analytical platforms using datasets encoded as Avro or Parquet.

Additionally, there is a growing list of tools which can be used to inject data into the column-based data warehouses, built for the cloud, such as Snowflake and Amazon Redshift. Depending on your requirements you might consider the following tools:

  • Matillion ETL
  • Etlworks Integrator
  • AWS Glue

Finally, selecting the best format for the job is not something any tool should be automatically doing for you, but having multiple options is a must. The ability to experiment and prototype with different formats and storage options, as well as injection techniques (batch load, streaming, SQL, etc.) is why a good ETL tool goes a long way. It saves you time, money and provides a peace of mind.

Q: What are your thoughts on the best practices for finding the balance between identifying the most efficient data structures for a given analytics use case, and quickly experimenting with different analytics use cases?

A: This article describes how Uber went from the general use database, to JSON-encoded datasets, to the Hadoop-based big data platform. Granted, not all business require something as extreme, so common sense is your best friend. A typical relational database such as MySQL or PostgreSQL is perfectly capable of storing and processing relatively large analytical datasets (millions of records). Models with hundreds of millions and billions of records require specialized solutions, using either column-based data warehouses or Hadoop-like clusters with a parallel file system.

%d bloggers like this: