Welcome to the 2023 Estuary Real-time Data landscape. Want to get started with real-time insights and data products? Here are the tools and how they fit together.
Want to get started in minutes? Try Estuary for end-to-end real-time data operations.
There has been major innovation across the entire real-time data landscape over the last few years. Some of the most interesting and mature companies have emerged on the analytics side, but simpler, more powerful pipelines that move data from sources to destinations are also appearing, enabling more companies to work with low-latency data.
The diagram above has four sections, and "hybrid" denotes an open-source product that's provided as a managed service.
Capture
Extracting data from source systems. In the real-time landscape, most sources are databases (captured via their write-ahead log) and streams, since most SaaS APIs are batch in nature.
Some SaaS APIs do support streaming. For example, Salesforce has a streaming endpoint.
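To make the capture step more concrete, here is a minimal Python sketch of what a consumer of change data capture (CDC) events does: it applies inserts, updates, and deletes in log order to keep a copy of a table in sync, and it remembers the last applied log position so replayed events are skipped. The `ChangeEvent` shape, its `"insert"`/`"update"`/`"delete"` op names, and `apply_changes` are hypothetical simplifications, not the event format of Debezium, Estuary, or any other tool listed here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass
class ChangeEvent:
    """Hypothetical, simplified change event read from a database's write-ahead log."""
    lsn: int                        # log sequence number: position in the write-ahead log
    op: str                         # "insert", "update", or "delete"
    key: Any                        # primary key of the affected row
    row: Optional[Dict[str, Any]]   # new row image (None for deletes)


def apply_changes(replica: Dict[Any, Dict[str, Any]],
                  events: Iterable[ChangeEvent],
                  last_lsn: int = 0) -> int:
    """Apply change events in log order and return the last applied LSN.

    Events at or below last_lsn were already applied (for example, re-delivered
    after a restart) and are skipped, so replaying the log is idempotent.
    """
    for ev in events:
        if ev.lsn <= last_lsn:
            continue  # already applied
        if ev.op in ("insert", "update"):
            replica[ev.key] = dict(ev.row)
        elif ev.op == "delete":
            replica.pop(ev.key, None)
        else:
            raise ValueError(f"unknown op: {ev.op}")
        last_lsn = ev.lsn
    return last_lsn


replica: Dict[Any, Dict[str, Any]] = {}
events = [
    ChangeEvent(1, "insert", 42, {"id": 42, "email": "a@example.com"}),
    ChangeEvent(2, "update", 42, {"id": 42, "email": "b@example.com"}),
    ChangeEvent(3, "insert", 7, {"id": 7, "email": "c@example.com"}),
    ChangeEvent(4, "delete", 7, None),
]
pos = apply_changes(replica, events)
print(pos, replica)  # 4 {42: {'id': 42, 'email': 'b@example.com'}}
```

Tracking the log position is what lets a capture resume after a failure without double-applying changes.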
Transport
Moving data from point A to B. The de facto standard here is Kafka, but several alternatives are emerging. Almost all of them require dedicated engineers, ongoing maintenance, and infrastructure.
Streaming transport is complex and usually retains data only for a limited window rather than keeping full history. For this reason, most streaming systems can be viewed as a "buffer" of current events. Notable exceptions here are Pulsar, Gazette, and Estuary.
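The difference between a "buffer" and a system that keeps history comes down to retention. The toy Python class below is a sketch, not how any product named here is implemented, and it assumes count-based retention for simplicity (real brokers usually expire data by time or size). With a retention limit, a consumer that falls behind or wants to replay from the start finds older offsets gone; without one, any offset can be re-read.

```python
from collections import deque
from typing import List, Optional


class Topic:
    """Toy append-only log with optional count-based retention (hypothetical).

    retention=None keeps every event (unlimited lookback); an integer keeps only
    the most recent `retention` events, so the topic behaves like a buffer.
    """

    def __init__(self, retention: Optional[int] = None):
        self.events = deque(maxlen=retention)  # maxlen=None means unbounded
        self.next_offset = 0                   # offset assigned to the next append

    def append(self, event: str) -> int:
        self.events.append(event)              # oldest event is dropped when full
        self.next_offset += 1
        return self.next_offset - 1

    def earliest_offset(self) -> int:
        return self.next_offset - len(self.events)

    def read_from(self, offset: int) -> List[str]:
        """Return events from `offset` onward; fail if they were already dropped."""
        start = self.earliest_offset()
        if offset < start:
            raise LookupError(f"offset {offset} expired; earliest retained is {start}")
        return list(self.events)[offset - start:]


buffer = Topic(retention=3)
history = Topic()  # no retention limit
for i in range(5):
    buffer.append(f"event-{i}")
    history.append(f"event-{i}")

print(history.read_from(0))  # ['event-0', 'event-1', 'event-2', 'event-3', 'event-4']
print(buffer.read_from(2))   # ['event-2', 'event-3', 'event-4']
try:
    buffer.read_from(0)
except LookupError as err:
    print(err)               # offset 0 expired; earliest retained is 2
```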
Operational Transforms
Transformations applied inside the pipeline to reshape data before it reaches either your production systems (as a data product) or your analytics environment.
Operational transforms in real-time systems usually come with some gotchas. Calculating a metric like "lifetime customer value" can be very difficult, because it requires state that grows without bound in a streaming system. They are extremely important, though, since they get data into the right "shape" for analytics queries.
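The state problem can be illustrated with a minimal Python sketch, assuming a hypothetical stream of `(customer_id, event_time_seconds, amount)` order events that arrive in time order. A running lifetime total per customer must keep an entry for every customer ever seen, since any of them may order again. A tumbling-window total, by contrast, can emit and forget each window once it closes, so its state stays bounded.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

Order = Tuple[str, int, float]  # hypothetical event: (customer_id, event_time_seconds, amount)


def lifetime_value(orders: Iterable[Order]) -> Dict[str, float]:
    """Running lifetime value per customer.

    The state holds one entry per distinct customer and nothing can ever be
    evicted, so it grows without bound as new customers appear.
    """
    state: Dict[str, float] = defaultdict(float)
    for customer_id, _ts, amount in orders:
        state[customer_id] += amount
    return dict(state)


def hourly_revenue(orders: Iterable[Order], window: int = 3600) -> Iterator[Tuple[int, float]]:
    """Total revenue per tumbling window, emitted as each window closes.

    Assumes events arrive in time order, so only the currently open window is
    held in memory no matter how long the stream runs.
    """
    open_start: Optional[int] = None
    open_total = 0.0
    for _cid, ts, amount in orders:
        start = ts - ts % window
        if open_start is not None and start != open_start:
            yield open_start, open_total  # window closed: emit it and drop its state
            open_total = 0.0
        open_start = start
        open_total += amount
    if open_start is not None:
        yield open_start, open_total      # flush the last window at end of input


orders = [("alice", 10, 20.0), ("bob", 50, 5.0), ("alice", 4000, 7.5), ("carol", 7300, 12.0)]
print(lifetime_value(orders))        # {'alice': 27.5, 'bob': 5.0, 'carol': 12.0}
print(list(hourly_revenue(orders)))  # [(0, 25.0), (3600, 7.5), (7200, 12.0)]
```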
Analytic Transforms
The real-time equivalent of a data warehouse. These are systems that can be loaded in real time and provide up-to-the-second answers to queries as you ask them.
Note:
The diagram is oversimplified, and many companies straddle two or more areas. For example, we at Estuary do offer Operational Transforms because we believe a pipeline needs to be end-to-end, but our logo sits in the area most people associate us with.
Products Offered as SaaS Solutions
Company & Product | Solution | Background |
---|---|---|
Estuary | Capture, Transport & Operational Transforms | Easily capture data from systems using CDC (change data capture), transport, transform it in motion, and sync it where you want it, such as analytics or operational systems. |
Ably | Transport | Simple transportation layer for events. |
Amazon Kinesis | Transport | Amazon’s Pub/Sub system. Manages events that one system produces and another subscribes to. |
Azure Web PubSub | Transport | Microsoft Azure’s Pub/Sub system. |
Arcion | Capture | Low-latency capture from databases using CDC. |
Bytewax | Operational Transforms | Turnkey transformation of streaming data using Python. |
ClickHouse | Analytic Transforms | Real-time SQL transforms on ClickHouse by the team that created it. |
Confluent | Transport & Operational Transforms | Founded by Kafka’s original creators, with a core business model of managing Kafka. |
DataCater | Transport & Operational Transforms | Managed Python transformations on Kafka streams. |
Decodable | Capture & Operational Transforms | Capture using managed Debezium and transform using managed Apache Flink. |
DeltaStream | Analytic & Operational Transforms | Managed service for analytic and operational transforms. |
Firebolt | Analytic Transforms | Real-time analytic transforms using an improved version of managed ClickHouse. |
Imply | Analytic Transforms | Real-time analytic transforms using managed Druid by the team that created it. |
IOblend | Operational Transforms | Managed Apache Spark. |
Materialize | Analytic Transforms | Real-time analytic transforms using open source SQL built on top of Timely Dataflow. |
Memphis.dev | Transport | Simple but powerful transport layer. |
Meroxa | Capture & Operational Transforms | Capture and transform real-time data. |
Google Cloud Pub/Sub | Transport | Google’s Pub/Sub system. |
Google Cloud Dataflow | Operational Transforms | Managed Apache Beam, letting you run batch and streaming transforms with a single programming model. |
Oracle GoldenGate | Capture | Capture data from Oracle systems using Oracle’s managed, proprietary product. |
Rockset | Analytic Transforms | Real-time SQL transformations by the creators of RocksDB. |
Redpanda | Transport | Transport data using the Kafka protocol and a full rewrite of Kafka for greater efficiency. |
SingleStore | Analytic Transforms | Real-time SQL transformations. |
StarTree | Analytic Transforms | Real-time SQL transformations built on managed Apache Pinot. |
StreamNative | Transport | Managed Apache Pulsar. |
StreamSets | Capture & Operational Transforms | Capture and transform data through a GUI. |
Striim | Capture & Operational Transforms | Capture data from databases using managed CDC and transform it in motion. |
Upsolver | Operational Transforms | Transform micro-batches using SQL. |
Quix | Operational Transforms | Real-time Python transformations. |
Timeplus | Analytic Transforms | SQL-based analytic transforms on time series data. |
Tinybird | Capture, Operational & Analytic Transforms | Managed ClickHouse for easily creating real-time data APIs and analytics. Some sources can be captured from out of the box. |
Open-Source Frameworks
Project | Solution | Background |
---|---|---|
Apache Beam | Operational Transforms | A framework that allows you to transform data from both batch and streaming systems. |
Apache Druid | Analytic Transforms | A real-time analytics engine that quickly indexes streaming data, allowing for efficient, high-scale queries. |
Apache Flink | Operational Transforms | A stream processing framework that is natively event-based. |
Apache Kafka | Transport | A highly popular streaming system originally built at LinkedIn. |
Apache Pinot | Analytic Transforms | A real-time analytics engine that offers real-time SQL queries on high-scale streaming data. |
Apache Pulsar | Transport | A streaming system that has native cloud storage options. |
Apache Spark | Operational Transforms | A data processing framework that is natively batch-based and was extended to near-real-time streaming via micro-batches. |
ClickHouse | Analytic Transforms | A real-time analytics engine that offers real-time SQL queries on high-scale streaming data. |
Debezium | Capture | A framework for capturing data from databases in real time using their write-ahead log. |
Flow | Capture, Transport & Operational Transforms | An end-to-end system that captures data from databases in real time using their write-ahead log, transports it, transforms it, and materializes it into destination systems. |
Gazette | Transport | A streaming system that natively stores data in cloud storage enabling unlimited lookback and direct reads by batch systems. |
About the author
David Yaffe is a co-founder and the CEO of Estuary. He previously served as the COO of LiveRamp and as the co-founder and CEO of Arbor, which was sold to LiveRamp in 2016. He has an extensive background in product management, serving as head of product for DoubleClick Bid Manager and Invite Media.