The day 1 excitement of cloud adoption has segued into day 2 operations, and it’s causing a Kubernetes complexity headache for information technology teams. The reason is simple: Cloud means containers, and containers create a dynamic, obscure environment that is too complex for efficient human management.
The inability to effectively optimize Kubernetes deployments means developers play it safe by overprovisioning rather than risk the system going down. But excessive cloud usage results in equally excessive bills and harms the environment through unnecessary server use. It can also create interdepartmental tension: Chief financial officers, coping with out-of-control operational expenditure instead of the promised cloud savings, place the blame on IT.
All these factors, plus the demand for five-nines (or higher) uptime and an increase in ransomware attacks, have created demand for full-stack, automated observability solutions that aim to solve all these problems at once.
StormForge, the flagship platform of Gramlabs Inc., offers a slightly different take on how to solve the complexity problem. Not APM, not exactly observability, nor AIOps (although the company won an award in that category), its mission is to be the “Intel Inside” of other companies’ products. The goal is interoperability with APM and observability market solutions, such as New Relic, Datadog, Dynatrace and AppDynamics (Cisco).
In this article, theCUBE examines StormForge’s Kubernetes optimization platform, which includes the StormForge Optimize Pro, Optimize Live and Performance Testing tools.
StormForge has K8s and ML at its core
StormForge founder and Chief Executive Officer Matt Provo has a background in machine learning and takes pains to differentiate StormForge from companies without the same level of in-house machine learning expertise. Rather than building a technology and then searching for a problem to fix, Provo first gathered a team of math Ph.D.s and machine learning experts. After several years in lab mode, he went looking for the best way to apply the results of their combined brainpower.
“We were trying to connect a fantastic team with differentiated technology to the right market timing. And when we saw all of these pain points around how fast the adoption of containers and Kubernetes have taken place, this was the perfect use case,” Provo told theCUBE industry analyst Dave Vellante during StormForge’s “Solving the Kubernetes Complexity Gap by Optimizing With Machine Learning” event, which aired recently as an exclusive broadcast on theCUBE.
The ability to automate information technology operations, known as AIOps, has led to a rise in full-stack, automated observability solutions that aim to solve overspend, security and performance issues all at once. However, the most efficient way to get a grip on cloud complexity is to go to the root of the problem and gain visibility into the Kubernetes environment.
As the guardian of the Kubernetes project, the Cloud Native Computing Foundation is encouraging the development of intelligent K8s optimization solutions. CNCF Chief Technology Officer Chris Aniszczyk was quoted as saying that the most reliable way to mitigate the rising costs of cloud is to accurately and effectively monitor Kubernetes environments, something that can only be done through intelligent automation.
“Intelligent and automated solutions, like those we see coming from StormForge and others, can help optimize cloud native infrastructure and reduce unnecessary spending. We’re encouraged by these technological advances,” Aniszczyk said.
Intelligent K8s optimization eliminates cloud overspend
As well as giving DevOps teams visibility into what is happening inside their Kubernetes environments so they can optimize their cloud usage, this next generation of tools applies machine learning to performance analytics, enabling continuous insights and proactive incident detection and response across an organization’s entire cloud infrastructure.
“Observability solutions will shorten the lengthy feedback cycle involved before committing apps to code, enhancing the quality of apps moving through the pipeline,” Charlotte Dunlap, principal analyst for application platforms, enterprise technology and services at GlobalData PLC, told theCUBE during an interview.
While Kubernetes has built-in resource management tools, very few developers use them because they’re challenging to configure and lack compatibility with other autoscaling tooling. In a conversation with theCUBE, Provo cited a statistic that only 1% of Kubernetes users use the Vertical Pod Autoscaler, which is designed to scale container resources. That leaves developers making guesstimates for CPU and memory requests and limits, usually based on local performance analysis.
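To make the contrast with guesswork concrete, here is a minimal, purely illustrative sketch of data-driven sizing: deriving a container’s requests and limits from observed usage percentiles. The function names, percentile choices and 1.2x headroom factor are invented for this example; StormForge’s actual engine uses machine learning over much richer signals.

```python
# Illustrative only: a simplified, data-driven alternative to guesswork sizing.
# Requests cover typical load (p90); limits add headroom over peak (p99).

def percentile(samples, p):
    """Nearest-rank percentile of a list of numeric samples."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, round(p / 100 * len(ordered)))
    return ordered[idx]

def recommend_resources(cpu_millicores, memory_mib, headroom=1.2):
    """Derive container requests/limits from observed usage samples.

    The 1.2x headroom factor is an arbitrary assumption for illustration.
    """
    return {
        "requests": {
            "cpu": f"{int(percentile(cpu_millicores, 90))}m",
            "memory": f"{int(percentile(memory_mib, 90))}Mi",
        },
        "limits": {
            "cpu": f"{int(percentile(cpu_millicores, 99) * headroom)}m",
            "memory": f"{int(percentile(memory_mib, 99) * headroom)}Mi",
        },
    }

cpu = [120, 150, 180, 210, 250, 300]   # millicores sampled over time
mem = [256, 260, 270, 280, 300, 310]   # MiB sampled over time
print(recommend_resources(cpu, mem))
```

Even this toy version captures why tooling beats intuition: the numbers come from measured utilization rather than a developer’s local benchmark.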
Another K8s shortfall that impacts performance is that while application-specific settings can be established through environment variables or a ConfigMap, they are not exposed through a consistent interface and therefore can’t be adjusted uniformly.
“We want to empower people to be able to sit in the middle of all of that chaos and for those trade-offs and those difficult interactions to no longer be a thing,” Provo told theCUBE during KubeCon + CloudNativeCon Europe 2022. “We’ve done hundreds of deployments, and never once have we met a developer who said, ‘I’m really excited to get out of my bed, come to work every day, and manually tune my application.’”
StormForge’s first solution, Optimize Pro, solves these issues for workflows in a non-production environment, while the newer Optimize Live can be used in both production and non-production environments.
Optimize Pro performs rapid experimentation and can evaluate any conceivable K8s scenario to provide application insights that the user can choose to apply based on recommended configurations or customize for unique use-case trade-offs. The solution uses machine learning algorithms for experimentation-based optimization in a non-production environment, and it is recommended for complex and mission-critical K8s applications.
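As a rough intuition for what experimentation-based optimization means, the sketch below uses plain random search as a stand-in for StormForge’s ML: try candidate configurations, measure a (here entirely synthetic) cost/latency objective, and keep the cheapest configuration that meets the latency SLO. Every function and number is invented for illustration; a real experiment would deploy and load-test the application.

```python
# Illustrative sketch of experimentation-based tuning with random search.
import random

def measure(config):
    """Stand-in for load-testing a trial deployment.

    Models a cost/latency trade-off: more CPU and replicas cost more
    but lower latency. Entirely synthetic.
    """
    cost = config["cpu_millicores"] * 0.001 * config["replicas"]
    latency_ms = 200_000 / (config["cpu_millicores"] * config["replicas"])
    return cost, latency_ms

def optimize(trials=200, latency_slo_ms=50, seed=7):
    """Keep the cheapest configuration that satisfies the latency SLO."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        config = {
            "cpu_millicores": rng.choice(range(100, 2100, 100)),
            "replicas": rng.randint(1, 10),
        }
        cost, latency = measure(config)
        if latency > latency_slo_ms:     # violates the SLO: discard
            continue
        if best is None or cost < best[0]:
            best = (cost, latency, config)
    return best

print(optimize())
```

Where this sketch tries candidates blindly, an ML-driven optimizer uses each measurement to steer the next experiment, which is what makes exploring “any conceivable K8s scenario” tractable.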
“It is a challenge to efficiently provision the correct resources for containerized apps, and organizations are left either overprovisioning or doing a lot of manual work to calibrate their infrastructure. … StormForge is trying to make this process easy and intuitive for cloud-native development teams with machine learning and automation,” according to James Governor, principal analyst and co-founder of Redmonk.
StormForge Optimize Live was launched in February 2022. The solution is the first platform that provides proactive, continuous insights into Kubernetes environments across both pre-production and production environments. As with Optimize Pro, StormForge’s “secret sauce” is its patent-pending machine learning algorithms.
The architecture of both Optimize Pro and Optimize Live is split into three parts:
- The StormForge controller:
A K8s operator that runs in a dedicated namespace on the user’s cluster.
- The StormForge CLI:
The StormForge command-line tool interacts with the Optimize Pro controller and can be used in combination with kubectl to install the StormForge controller.
- A StormForge API:
Users can configure their cluster to connect to a StormForge API that automatically generates recommended configurations.
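The controller/API interplay can be pictured as a reconcile loop: the in-cluster operator periodically fetches recommended settings from a recommendation API and patches the workload until the live state matches. The sketch below is hypothetical, with a stub API and in-memory “cluster state” standing in for real Kubernetes calls; none of it is StormForge’s actual code.

```python
# Hypothetical reconcile-loop sketch. StubAPI and the dict-based cluster
# state are invented stand-ins for the recommendation API and the K8s API.

class StubAPI:
    """Stand-in for the recommendation API."""
    def get(self, workload):
        return {"cpu": "500m", "memory": "256Mi"}

def reconcile(api, cluster_state, workload):
    """One reconcile pass: converge live resources toward the recommendation."""
    desired = api.get(workload)
    current = cluster_state[workload]["resources"]
    if current != desired:
        cluster_state[workload]["resources"] = desired  # would be a patch call
        return "patched"
    return "in-sync"

state = {"checkout": {"resources": {"cpu": "1000m", "memory": "512Mi"}}}
api = StubAPI()
print(reconcile(api, state, "checkout"))  # first pass applies the patch
print(reconcile(api, state, "checkout"))  # second pass finds nothing to do
```

The loop shape is the standard Kubernetes operator pattern: observe, compare with desired state, patch the difference, repeat.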
“We’ve always been squarely focused on Kubernetes using our core machine learning engine to optimize metrics at the application layer, which is what people care about and need to go after,” Provo told theCUBE.
Applications are just a set of K8s resources in need of optimization
As Optimize Live views it, applications are just a set of K8s resources that need to be optimized. The StormForge controller allows developers to indicate optimization parameters, and the machine learning engine gets to work analyzing performance and utilization data. There are no inbuilt specifications, as StormForge believes in empowering developers through machine learning rather than automating them out of the decision-making process.
This is the same on the other end of the process when Optimize Live returns its recommendations. Developers can configure Optimize Live to implement optimization parameters automatically, but the control remains in their hands. If they prefer to review and implement recommendations manually, they can do that. This has the side effect of giving developers confidence in the ML’s process, as they can monitor its decisions until they feel confident that they are 100% in alignment with their organization’s service-level objectives and agreements.
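A human-in-the-loop policy like the one described above might look like the following sketch: small, low-risk changes are applied automatically, while larger deviations are queued for a developer to review. The 20% threshold and function names are invented example policy, not StormForge’s implementation.

```python
# Illustrative review gate for ML recommendations: auto-apply small deltas,
# queue large ones for human review. The 20% threshold is an invented policy.

def parse_millicores(value):
    """Convert a CPU string like '500m' to an integer millicore count."""
    return int(value.rstrip("m"))

def review_gate(current_cpu, recommended_cpu, auto_apply_threshold=0.20):
    """Return 'auto-apply' for small relative changes, 'needs-review' otherwise."""
    cur = parse_millicores(current_cpu)
    rec = parse_millicores(recommended_cpu)
    change = abs(rec - cur) / cur
    return "auto-apply" if change <= auto_apply_threshold else "needs-review"

print(review_gate("500m", "550m"))   # a 10% change
print(review_gate("500m", "250m"))   # a 50% change
```

Over time, a team that trusts the recommendations can simply widen the threshold, which is exactly the gradual confidence-building the article describes.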
StormForge customer Aquia Inc. is an example of this process at work. The company wanted to maintain a human-in-the-loop strategy. However, rather than employing an in-house team of ML experts and investing in custom infrastructure, the company has a smaller team of specialists who review insights from StormForge and can approve and apply them to optimize multiple applications simultaneously.
“Optimize Live allows us to do that [observability] in real time, to make policy decisions across our fleet on what’s the right trade-off between performance cost [and] other parameters,” Aquia CEO Charley Dublin told theCUBE.
A data-agnostic approach
When it comes to data input, StormForge’s algorithm was designed to avoid any data structure issues. So, as long as the data can be captured and made available to Optimize Live, the machine learning engine can ingest and analyze it.
“To us it’s zeros and ones; we really are data agnostic,” Provo told theCUBE.
Optimize Live returns a set of precision configurations for optimal cloud consumption — not too much and not too little — saving costs and increasing performance. But it’s not a “set it and forget it” solution: StormForge recommends that optimization be built into the continuous integration/continuous deployment (CI/CD) pipeline.
“Every time you deploy, we want to make sure that we’re recommending the perfect configuration for your application in the namespace that you’re deploying into,” StormForge CTO Patrick Bergstrom told theCUBE.
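Conceptually, the per-deploy step Bergstrom describes amounts to: before each rollout, look up the latest recommendation for the workload in its target namespace and fold it into the manifest. The sketch below illustrates that shape only; the dict-based manifest and recommendation table are invented stand-ins for a manifest file and an API call.

```python
# Illustrative CI/CD step: merge the latest resource recommendation into a
# deployment manifest before rollout. All names and data are hypothetical.

def apply_recommendation(manifest, recommendations):
    """Return a copy of the manifest with recommended resources, if any exist."""
    key = (manifest["namespace"], manifest["name"])
    patched = dict(manifest)  # shallow copy; the original stays untouched
    if key in recommendations:
        patched["resources"] = recommendations[key]
    return patched

recs = {("prod", "checkout"): {"cpu": "750m", "memory": "384Mi"}}
manifest = {"namespace": "prod", "name": "checkout",
            "resources": {"cpu": "2000m", "memory": "1Gi"}}
print(apply_recommendation(manifest, recs)["resources"])
```

Running this as a pipeline stage means every deploy ships with a configuration informed by the most recent utilization data rather than a stale, hand-tuned one.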
StormForge’s Performance Testing as-a-service tool supports continual verification by automating load testing in the CI/CD workflow. This means that CTOs can be confident that their cloud resource use is based on real-time traffic patterns – something CFOs will be happy to hear.