The red-hot market for real-time analytics in the cloud just got another entrant with StarRocks Inc.’s announcement today of a cloud-native version of its SQL online analytical processing database engine.
StarRocks Cloud is a fully managed software-as-a-service version of the platform the company developed two years ago and released under an open-source license. It’s based on Apache Doris, an interactive SQL data warehouse built on a massively parallel processing architecture.
The architecture is purpose-built for real-time data analysis by a large number of concurrent users, with support for fast multitable joins. The engine works with a variety of schema models, including flat tables and star and snowflake schemas. It provides a basis for combining real-time transactional data with historical records.
The company has mostly flown under the radar since its founding in early 2020 but has raised more than $60 million in venture capital and signed on 110 paying customers, including large accounts such as Airbnb Inc. and Lenovo Group Ltd.
The global streaming analytics market is expected to grow nearly 29% annually through 2025, driven by the rapid deployment of internet of things devices and the growing appetite among business leaders for up-to-the-minute data, according to Grand View Research Inc.
StarRocks supports high concurrency and availability with an engine that can handle more than 10,000 queries per second and ingest data at speeds of up to 100 megabytes per second per node, the company said.
Real-time processing has caught on quickly, but real-time analytics has been slower to gain traction, said Li Kang, the company’s vice president of strategy. One problem is that analytical queries often depend on denormalized tables, which duplicate data from several source tables so that queries can avoid complex and time-consuming joins.
That approach is “OK for reports, but if users want to leverage it for real-time decisions it’s too slow,” Kang said. Denormalization yields good query performance but increases complexity, he said. For example, denormalizing a table that many other rows reference through foreign keys copies its data into each of those rows. That breaks an essential tenet of normalization: each data element should be stored only once.
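To make the duplication problem concrete, here is a minimal sketch using Python’s built-in sqlite3 module rather than StarRocks itself. The table and function names (customers, orders, orders_flat, build_tables, move_customer) are hypothetical. It builds a small normalized star schema plus a denormalized “flat” copy with the join precomputed, then counts how many rows a single change to one customer touches in each.

```python
import sqlite3


def build_tables(conn: sqlite3.Connection) -> None:
    """Create a tiny normalized schema (hypothetical names) and a denormalized copy."""
    cur = conn.cursor()
    # Normalized: each customer attribute is stored exactly once.
    cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    cur.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers(id), amount REAL)"
    )
    cur.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                    [(1, "Acme", "Berlin"), (2, "Globex", "Austin")])
    cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                    [(10, 1, 99.0), (11, 1, 15.5), (12, 1, 42.0), (13, 2, 7.25)])
    # Denormalized: the join is precomputed, so customer columns are copied into every order row.
    cur.execute("""
        CREATE TABLE orders_flat AS
        SELECT o.id AS order_id, o.amount, c.id AS customer_id,
               c.name AS customer_name, c.city AS customer_city
        FROM orders o JOIN customers c ON o.customer_id = c.id
    """)
    conn.commit()


def move_customer(conn: sqlite3.Connection, customer_id: int, new_city: str) -> tuple:
    """Change one customer's city; return (rows touched normalized, rows touched denormalized)."""
    cur = conn.cursor()
    cur.execute("UPDATE customers SET city = ? WHERE id = ?", (new_city, customer_id))
    normalized_rows = cur.rowcount
    # Every duplicated copy must also be rewritten, or the flat table goes stale.
    cur.execute("UPDATE orders_flat SET customer_city = ? WHERE customer_id = ?",
                (new_city, customer_id))
    denormalized_rows = cur.rowcount
    conn.commit()
    return normalized_rows, denormalized_rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_tables(conn)
    print(move_customer(conn, 1, "Munich"))  # (1, 3)
```

One update touches a single row in the normalized table but three rows in the flat table, one per order that references the customer. At real-world scale, that rewrite work is part of the ingestion delay and extra cost Kang describes.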
“You pay the price of delay in ingestion, extra hardware and development costs,” Kang said. “You also have limited concurrency. There are lots of issues from both performance and business requirement standpoints.”
StarRocks uses vectorized execution, which takes advantage of modern CPUs’ ability to process batches of column values at once and keeps data in columnar form across CPU, memory and storage. Columnar storage is more efficient for analytics queries, while row storage is better for transaction processing.
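The difference between row and columnar layouts, and between row-at-a-time and batch-at-a-time processing, can be sketched in a few lines of Python. This is a toy illustration, not StarRocks code. The names and the tiny batch size are hypothetical, and the Python interpreter does not emit SIMD instructions, so the sketch shows only the data layout and the shape of the loop. A real vectorized engine runs compiled kernels over batches of thousands of values.

```python
from array import array

BATCH_SIZE = 4  # hypothetical; real engines use batches of thousands of values

# Row-oriented: each record keeps all of its fields together (suits transactional lookups).
rows = [
    (1, "Berlin", 99.0),
    (2, "Austin", 15.5),
    (3, "Berlin", 42.0),
    (4, "Austin", 7.25),
    (5, "Berlin", 10.0),
]


def to_columns(records):
    """Pivot row tuples into one contiguous container per column."""
    ids, cities, amounts = zip(*records)
    return {"id": array("q", ids), "city": list(cities), "amount": array("d", amounts)}


def sum_row_at_a_time(records):
    """Row-at-a-time: each iteration unpacks a whole record, even unused fields."""
    total = 0.0
    for _id, _city, amount in records:
        total += amount
    return total


def sum_vectorized(columns, column="amount", batch_size=BATCH_SIZE):
    """Batch-at-a-time: read only the needed column, one fixed-size slice per step."""
    values = columns[column]
    total = 0.0
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]  # contiguous slice of a single column
        total += sum(batch)  # stand-in for a compiled SIMD kernel over the batch
    return total


if __name__ == "__main__":
    cols = to_columns(rows)
    print(sum_row_at_a_time(rows), sum_vectorized(cols))  # 173.75 173.75
```

Both functions return the same answer. The columnar version never reads the id or city fields, and it hands the work over in contiguous batches. That access pattern is what lets a compiled engine keep CPU caches and vector units busy on analytical scans.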
Kang said StarRocks’ principal competitors are products built on real-time data stores such as Apache Druid, Apache Pinot and ClickHouse. All require data to be in denormalized form, he said. “This is why it’s been notoriously difficult to build a real-time infrastructure with those technologies,” he said.
The company also competes with distributed query engines based on the open-source Presto and Trino projects. StarRocks said its engine can process queries three to five times faster than competing products.
“We take the concept into the query engine so we can work on the columnar data without converting it for each CPU, memory and storage layer,” Kang said. “The result is we get much better query performance for a single-table query or a multitable query in star schema format, and we use better parallel processing to support thousands of users at one time.”
StarRocks can ingest data from cloud data lakes such as Amazon Web Services Inc.’s S3 and Microsoft Corp.’s Azure Blob Storage. It also supports streaming data managed by Apache Kafka and change-data capture streams, which identify and track changes to data in a relational database.
StarRocks Cloud will be available initially on the AWS and Azure clouds, with support for Google LLC’s Google Cloud planned in the near future. It supports standard SQL and the MySQL protocol, so it works with any business intelligence tool that uses SQL.
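As a rough illustration of what consuming a change-data capture stream involves, the Python sketch below replays insert, update and delete events, in order, against an in-memory replica keyed by primary key. The event shape (operation, primary key, row) and the function name apply_cdc are hypothetical assumptions for this example. They are not StarRocks’ ingestion API or any particular CDC tool’s format.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical event shape: (operation, primary_key, row image or None for deletes).
Event = Tuple[str, int, Optional[Dict[str, Any]]]


def apply_cdc(table: Dict[int, Dict[str, Any]], events: List[Event]) -> Dict[int, Dict[str, Any]]:
    """Replay change events in order so the replica matches the source database."""
    for op, key, row in events:
        if op in ("insert", "update"):
            # Upsert: the latest image of the row replaces any earlier one.
            table[key] = dict(row)
        elif op == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return table


if __name__ == "__main__":
    events = [
        ("insert", 1, {"name": "Acme", "city": "Berlin"}),
        ("insert", 2, {"name": "Globex", "city": "Austin"}),
        ("update", 1, {"name": "Acme", "city": "Munich"}),
        ("delete", 2, None),
    ]
    print(apply_cdc({}, events))  # {1: {'name': 'Acme', 'city': 'Munich'}}
```

Ordering matters here. Applying the same events out of sequence could resurrect a deleted row or keep a stale value, which is why CDC pipelines preserve the source database’s commit order.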