Gremlin brings Chaos Engineering as a Service to Kubernetes

The practice of Chaos Engineering developed at Amazon and Netflix a decade ago to help those web-scale companies test their complex systems
for worst-case scenarios before they happened.
Gremlin was started by a former employee of both of these companies to make it easier to perform this type of testing without a team of Site
Reliability Engineers (SREs).
Today, the company announced that it now supports Chaos Engineering-style testing on Kubernetes clusters. The company made the announcement
at the beginning of KubeCon, the Kubernetes conference taking place in San Diego this week. Gremlin co-founder and CEO Kolton Andrus says
that the idea is to be able to test and configure Kubernetes clusters so that they will not fail, or at least so that failure is less likely.
He says that to do this, it is critical to run chaos testing (tests of mission-critical systems under extreme duress) in live environments,
whether you're testing Kubernetes clusters or anything else, but it is also a bit dangerous to do so.
He says that to mitigate the risk, best practices suggest that you limit the experiment to the smallest test possible that gives you the most
information. "We can come in and say I'm going to deal with just these clusters.
I want to cause failure here to understand what happens in Kubernetes when these pieces fail.
For instance, being able to see what happens when you pause the scheduler.
The goal is being able to help people understand this concept of the blast radius, and safely guide them to running an experiment," Andrus
explained. In addition, Gremlin is helping customers harden their Kubernetes clusters to help prevent failures with a set of best practices.
"We clearly have the tooling that people need [to conduct this type of testing], but we've also learned through many, many customer
interactions and experiments to help them really tune and configure their clusters to be fault tolerant and resilient," he said. The Gremlin
interface is designed to facilitate this kind of targeted experimentation.
You can check the areas where you want to apply a test, and you can see graphically which parts of the system are being tested.
If things get out of control, there is a kill switch to stop the tests.

Gremlin Kubernetes testing screen (Screenshot: Gremlin)

Gremlin launched in 2016.
Its headquarters are in San Jose.
It offers both a freemium and a paid product.
The company has raised almost $27 million, according to Crunchbase data.