
What is RioDB?

RioDB is an ultra-fast data stream processing engine. You submit query statements to analyze unbounded (endless) data streams. Your queries run continuously and indefinitely against the stream. When a query finds what you’re looking for, RioDB triggers a user-defined action, such as notifying an application or an observability dashboard. (Think cyber surveillance, fraud detection, infrastructure monitoring, social media mentions, market thresholds, etc.)

Other applications do similar things, so let’s highlight the big differences:

Most applications perform batch processing, which means querying the data periodically (maybe once per minute, or maybe once per second). RioDB queries continuously: each time the stream delivers a message, RioDB executes the query again. For instance, if the stream delivers 50k messages per second, RioDB executes every query 50k times per second. Data engineers call these sliding window queries, and they are vital for some use cases. With batch processing, you don’t have exact control over where a batch gets cut off. Suppose you are looking for a particular anomaly or pattern that signals a cyber attack, or an opportunity in the stock market. If the cut-off happens to split the important events across two batches, the pattern you are searching for can slip by completely undetected, because neither batch contains the whole pattern. Querying data periodically, even at short intervals, has this problem. The only reliable way to detect the pattern is a sliding window technique, which queries the moving data constantly!
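To make the batch cut-off problem concrete, here is a minimal Python sketch. It is not RioDB code, and the rule, window size, and threshold are made up for illustration: a hypothetical rule fires when the sum of the last 3 messages reaches 10. The batch version evaluates the rule once per non-overlapping batch of 3 messages; the sliding version re-evaluates it on every message.

```python
from collections import deque


def batch_alerts(stream, size, threshold):
    """Evaluate the rule once per fixed, non-overlapping batch (tumbling window)."""
    alerts = []
    for start in range(0, len(stream) - size + 1, size):
        batch = stream[start:start + size]
        if sum(batch) >= threshold:
            alerts.append(start + size - 1)  # index of the batch's last message
    return alerts


def sliding_alerts(stream, size, threshold):
    """Re-evaluate the rule on every message, over the last `size` messages."""
    window = deque(maxlen=size)  # the oldest message falls out automatically
    running = 0                  # running sum of the messages in the window
    alerts = []
    for i, value in enumerate(stream):
        if len(window) == size:
            running -= window[0]  # subtract the message about to be evicted
        window.append(value)
        running += value
        if len(window) == size and running >= threshold:
            alerts.append(i)
    return alerts


stream = [0, 0, 5, 5, 0, 0]
print(batch_alerts(stream, 3, 10))    # → []
print(sliding_alerts(stream, 3, 10))  # → [3, 4]
```

The batch version reports nothing because the two 5s land in different batches, while the sliding version flags the messages at positions 3 and 4. Keeping a running sum (add the newest value, subtract the evicted one) is one common way to keep per-message re-evaluation cheap; this sketch does not describe RioDB’s internals.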

RioDB can perform sliding window queries exceptionally fast, many orders of magnitude faster than the alternatives, which can mean lower cloud spend, or even enable something that wasn’t possible before! However, this comes with a trade-off: RioDB does not persist data. When data is received from the stream, queries are executed, and the data is thrown away once it is no longer needed. There’s no historical analysis for queries that didn’t already exist; you can’t query data after the fact. Data is non-durable: if you reboot the system, processing starts again from zero, as an empty shell. RioDB does not replace a data warehouse or lakehouse. It’s a decoupled engine on the side for real-time processing, which it does extraordinarily fast, with very little CPU consumption.

So what is RioDB good for?

Does your data stream need a “referee”? RioDB is an excellent choice for decoupled data sensors. Think automated data surveillance, like real-time anomaly detection or pattern detection, with automatic triggering capabilities.
It is not meant for historical data analysis.
It is not meant for transactional workflows (should this bank transaction proceed to the next step? Should this shopping cart purchase proceed to the next step?). It’s best as an outside “referee” that watches a data stream and calls out detections in real time.

As a data sensor for detecting patterns or anomalies in real time, RioDB can outperform other solutions by many orders of magnitude, saving a lot of money on (cloud) infrastructure costs, or even enabling use cases that are not possible with other tools today.

Typical architecture, or use-case

A common case: you have a data warehouse where you store historical data (maybe using Elasticsearch, or Oracle), and a moderate amount of CPU for data processing. Then a requirement comes along for real-time pattern detection. Meeting it by running queries in a loop every second means adding a lot more CPU, and in the cloud, storage is cheap but CPU is not. Not to mention that such a requirement is often simply not feasible on a data warehouse platform.
If you want to save money, you keep the data warehouse as it is and fork a copy of your data stream to a stream processing engine, which is designed to handle this task much more efficiently.
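Forking a copy of the stream can be sketched in Python as follows. This is a hypothetical illustration, not RioDB’s or any message broker’s API: `handle_transaction` stands in for your existing processing path, and a bounded `queue.Queue` stands in for the link to the stream processor. The primary path always completes; the copy is offered to the sensor without blocking and is simply dropped if the sensor can’t keep up.

```python
import queue


def handle_transaction(msg):
    # Hypothetical primary path: it always runs, regardless of the sensor.
    return f"processed {msg['id']}"


def fork(msg, sensor_queue, dropped):
    """Process msg on the primary path and offer a copy to the sensor, best effort."""
    result = handle_transaction(msg)
    try:
        sensor_queue.put_nowait(dict(msg))  # send a copy; never block the primary path
    except queue.Full:
        dropped.append(msg["id"])  # sensor is behind or down: skip it, don't fail
    return result


# Stand-in for the link to the stream processor; nothing drains it in this demo.
sensor_queue = queue.Queue(maxsize=2)
dropped = []
results = [fork({"id": i, "amount": 10 * i}, sensor_queue, dropped) for i in range(4)]
print(results)
print(sensor_queue.qsize(), dropped)
```

Because nothing drains the queue here, it fills after two messages and the copies of messages 2 and 3 are dropped, while all four transactions are still processed. Dropping copies is only one design choice; buffering or sampling would be others.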

You don’t place RioDB as a transaction firewall between the client and the transaction service; Apache Flink has the bells and whistles for that. You place RioDB as a sensor behind transactions.
For example:

  1. A gamer interacts with the game server uninterrupted. Behind the game server, a sensor (RioDB) scans gamer data to detect cheating. If cheating is detected, RioDB informs the game server. If the RioDB service is unavailable, the interaction between gamer and server still continues. In this case, RioDB is not part of the game engine, but an added sensor behind the scenes.
  2. You subscribe to second-by-second stock market updates. RioDB scans the data, querying for opportunities. When one is detected, RioDB sends you an alert or calls an external API to execute a transaction workflow automatically. If the RioDB service is unavailable (say, you turn off the server), the stock market goes on with or without you.
  3. Users interact with an online store uninterrupted. Behind the web server, a sensor (RioDB) scans incoming HTTP request data to detect cyber attacks, or perhaps to help rank the products displayed on the front page in real time. If the RioDB service is unavailable (say, you turn off the server), users can still access the website. RioDB is not part of the web application, but an added sensor behind the scenes.
  4. A telecommunications provider routes thousands of calls per second. They want to detect and stop robot-callers. They put RioDB behind their process to analyze recent calls placed. If a robot-caller is detected, it informs their call-routing system. If the RioDB service is turned off, calls will continue to be routed. RioDB in this case is not part of the call-routing workflow, but an added sensor behind the scenes.
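The robot-caller example could be expressed as a time-based sliding window per caller. The Python sketch below is illustrative only: the rule (more than 3 calls within 10 seconds), the `RobocallSensor` class, and the sample events are assumptions, not RioDB syntax. It keeps only the calls still inside each caller’s window, in the spirit of a sensor that discards data once it is no longer needed.

```python
from collections import defaultdict, deque


class RobocallSensor:
    """Hypothetical rule: flag a caller who places more than `max_calls`
    calls within the last `span` seconds.

    Calls that slide out of the window are evicted lazily, on that
    caller's next call, so memory holds only recent calls.
    """

    def __init__(self, max_calls, span):
        self.max_calls = max_calls
        self.span = span
        self.calls = defaultdict(deque)  # caller -> timestamps of recent calls

    def on_call(self, caller, ts):
        window = self.calls[caller]
        window.append(ts)
        # Evict calls that are no longer inside (ts - span, ts].
        while window and window[0] <= ts - self.span:
            window.popleft()
        return len(window) > self.max_calls  # True => inform the call router


sensor = RobocallSensor(max_calls=3, span=10.0)
events = [("alice", 0.0), ("bot", 1.0), ("bot", 2.0), ("bot", 3.0),
          ("bot", 4.0), ("alice", 20.0), ("bot", 20.0)]
flags = [(caller, ts) for caller, ts in events if sensor.on_call(caller, ts)]
print(flags)  # → [('bot', 4.0)]
```

Only the fourth "bot" call within 10 seconds is flagged; by t = 20 the earlier calls have slid out of the window and been discarded.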

In a nutshell, there are many use cases for real-time analytics. These examples are just meant to illustrate a couple of things:
You can achieve modern real-time analytics at very little cost by using the right tool for the task. And RioDB is an extremely efficient option for reactive sensors, or a “referee” for anomaly detection or pattern detection in real time.
