Chaos Experimentation Framework

A framework for running Chaos experiments, built on top of Envoy Proxy.

Clutch Component Architecture

The Chaos Experimentation Framework consists of three parts: a frontend, a backend server, and an xDS management server.

The Frontend uses Clutch's core frontend. It can be customized by using the frontend config.

The backend server handles the CRUD operations of the experimentation package, such as CreateExperiment, GetExperiments, and CancelExperimentRun. It stores Chaos experiments in its own tables in the Postgres database.

The xDS management server uses the go-control-plane library and serves two Envoy APIs: Runtime Discovery Service (RTDS) and Extension Configuration Discovery Service (ECDS). Either API can be used to perform fault injection tests. With RTDS, you change only the runtime values specific to faults, whereas with ECDS you replace the entire fault filter configuration, which makes custom Chaos experiments possible.
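
The difference between the two APIs is easiest to see in the shape of what each one delivers. The sketch below is illustrative only: `rtds_update` and `ecds_update` are hypothetical helper names, not Clutch functions, and the runtime keys assume the default `fault.http` keys documented for Envoy's HTTP fault filter.

```python
# Illustrative only: these helpers are hypothetical and not part of Clutch.

FAULT_FILTER_TYPE = "type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault"


def rtds_update(abort_percent, http_status, prefix="fault.http"):
    """RTDS delivers a flat map of runtime keys; the fault filter itself is unchanged."""
    return {
        f"{prefix}.abort.abort_percent": abort_percent,
        f"{prefix}.abort.http_status": http_status,
    }


def ecds_update(abort_percent, http_status):
    """ECDS delivers a complete replacement configuration for the fault filter."""
    return {
        "name": "envoy.extension_config",
        "typed_config": {
            "@type": FAULT_FILTER_TYPE,
            "abort": {
                "percentage": {"numerator": abort_percent, "denominator": "HUNDRED"},
                "http_status": http_status,
            },
        },
    }


runtime = rtds_update(abort_percent=10, http_status=503)
filter_config = ecds_update(abort_percent=10, http_status=503)
```

The RTDS result only overrides individual values, while the ECDS result carries the whole filter, so any field of the fault filter can be changed.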

Components

The components below carry a Chaos experiment from start to finish: they store the data for each incoming request in the Postgres database and pass the experiment values on to the Envoys, which inject the faults.

Component Name                              Description
clutch.module.chaos.experimentation.api     Module that supports the CRUD API for managing Chaos experiments (Create, Get, List, Cancel, etc.)
clutch.module.chaos.serverexperimentation   Module responsible for orchestrating server fault Chaos experiments
clutch.module.chaos.experimentation.xds     Module that runs the Envoy xDS management server, which propagates Chaos experiment configurations to Envoys
clutch.service.chaos.experimentation.store  Service that defines the data layer for all database operations on Chaos experiments
clutch.service.db.postgres                  Service used to connect to the Postgres database

To use the Chaos Experimentation Framework, all of the components above must be registered.

It is recommended to run the Envoy xDS management server (clutch.module.chaos.experimentation.xds) on a separate host.
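
The path from a stored experiment to an injected fault can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the `Experiment` fields and both function names are hypothetical, and Clutch's actual schema and xDS cache logic are not shown in this document.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record shape; the real tables live in the Clutch store service.
@dataclass
class Experiment:
    target_cluster: str
    fault: str            # e.g. "abort" or "delay"
    percent: int
    start: float          # seconds since epoch
    end: float
    cancelled: bool = False


def active_experiments(experiments, now):
    """Keep experiments that have started, have not ended, and were not cancelled."""
    return [e for e in experiments if not e.cancelled and e.start <= now < e.end]


def snapshot_by_cluster(experiments, now):
    """Group active experiments by target, as an xDS server might before pushing to Envoys."""
    snapshot = defaultdict(list)
    for e in active_experiments(experiments, now):
        snapshot[e.target_cluster].append((e.fault, e.percent))
    return dict(snapshot)


stored = [
    Experiment("checkout", "abort", 10, start=100, end=200),
    Experiment("checkout", "delay", 5, start=100, end=200, cancelled=True),
    Experiment("search", "delay", 20, start=300, end=400),
]
snapshot = snapshot_by_cluster(stored, now=150)
```

At `now=150` only the first experiment is running, so only the `checkout` cluster would receive fault values.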

Configuration

Frontend

The frontend of the framework is fully configurable. Below is an example frontend config that shows the list of Chaos experiments as well as the workflow to start or stop an experiment.

module.exports = {
  // ...
  "@clutch-sh/experimentation": {
    listExperiments: {
      description: "Manage fault injection Chaos experiments.",
      trending: true,
      componentProps: {
        // Columns shown in the experiment list.
        columns: [
          { id: "target", header: "Target" },
          { id: "fault_types", header: "Faults" },
          { id: "start_time", header: "Start Time", sortable: true },
          { id: "end_time", header: "End Time", sortable: true },
          { id: "run_creation_time", header: "Creation Time", sortable: true },
          { id: "status", header: "Status" },
        ],
        // Links to the workflows that start an experiment.
        links: [
          {
            displayName: "Start Server Experiment",
            path: "/server-experimentation/start",
          },
        ],
      },
    },
    viewExperimentRun: {},
  },
  "@clutch-sh/server-experimentation": {
    startExperiment: {
      componentProps: {
        upstreamClusterTypeSelectionEnabled: true,
      },
      hideNav: true,
    },
  },
};

Backend Server

The configuration below spins up all the modules and services required to store the data coming from the frontend in the Postgres database.

backend/clutch-config.yaml
modules:
  # ...
  - name: clutch.module.chaos.experimentation.api
  - name: clutch.module.chaos.serverexperimentation
services:
  # ...
  - name: clutch.service.db.postgres
    typed_config:
      "@type": types.google.com/clutch.config.service.db.postgres.v1.Config
      connection:
        host: <RDS_HOST>
        port: <RDS_PORT>
        user: <RDS_USER>
        ssl_mode: REQUIRE
        dbname: <RDS_NAME>
        password: <RDS_PASSWORD>
  - name: clutch.service.chaos.experimentation.store

xDS Management Server

Below is the configuration for spinning up the xDS server. For details about the fields, see the xDS config proto.

backend/clutch-xds-config.yaml
# ...
modules:
  # ...
  - name: clutch.module.chaos.experimentation.xds
    typed_config:
      "@type": types.google.com/clutch.config.module.chaos.experimentation.xds.v1.Config
      rtds_layer_name: <RTDS_LAYER_NAME> # e.g. "rtds_layer"
      cache_refresh_interval: <CACHE_REFRESH_INTERVAL> # e.g. "5s"
      ingress_fault_runtime_prefix: <INGRESS_FAULT_PREFIX> # e.g. "fault.http"
      egress_fault_runtime_prefix: <EGRESS_FAULT_PREFIX> # e.g. "fault.http.egress"
      resource_ttl: <RESOURCE_TTL> # e.g. "20s"
      heartbeat_interval: <HEARTBEAT_INTERVAL> # e.g. "5s"
      ecds_allow_list: <LIST_OF_ECDS_ENABLED_CLUSTERS> # e.g. ["foo", "bar"]
services:
  - name: clutch.service.db.postgres
    typed_config:
      "@type": types.google.com/clutch.config.service.db.postgres.v1.Config
      connection:
        host: <RDS_HOST>
        port: <RDS_PORT>
        user: <RDS_USER>
        ssl_mode: REQUIRE
        dbname: <RDS_NAME>
        password: <RDS_PASSWORD>
  - name: clutch.service.chaos.experimentation.store
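
The `resource_ttl` and `heartbeat_interval` values work together. The sketch below assumes the usual xDS TTL behavior, in which Envoy discards a resource whose TTL elapses without a refresh and each heartbeat from the server restarts that timer. The xDS config proto remains the authoritative source for the field semantics, and the function names here are hypothetical.

```python
def heartbeat_times(start_s, stop_s, interval_s):
    """Times at which the server refreshes the resource while it is reachable."""
    times = []
    t = start_s
    while t <= stop_s:
        times.append(t)
        t += interval_s
    return times


def expiry_time(last_refresh_s, ttl_s):
    """A resource stays valid until ttl_s seconds after its most recent refresh."""
    return last_refresh_s + ttl_s


# Example values taken from the config comments: 20s TTL, 5s heartbeat.
TTL_S = 20
HEARTBEAT_S = 5

# Suppose the xDS server becomes unreachable 12 seconds after sending the resource.
refreshes = heartbeat_times(start_s=0, stop_s=12, interval_s=HEARTBEAT_S)
fault_removed_at = expiry_time(refreshes[-1], TTL_S)
```

Under these assumptions the last heartbeat lands at 10 seconds, so the fault would be dropped at 30 seconds even though the server never sent an explicit removal. A TTL several heartbeats long tolerates a few missed refreshes without ending the experiment early.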

Keep in mind that the backend config and the xDS config must both connect to the same Postgres database, because the xDS server propagates the experiments that the backend stores.
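
One way to guard against a mismatch is to compare the two connection blocks before deploying. This is a minimal sketch: the dictionaries stand in for the parsed `connection` sections of the two config files, and the function name and sample values are hypothetical.

```python
def same_database(backend_conn, xds_conn):
    """Compare the fields that identify a Postgres database."""
    keys = ("host", "port", "dbname")
    return all(backend_conn.get(k) == xds_conn.get(k) for k in keys)


# Stand-ins for the parsed `connection` blocks; the values are made up.
backend = {"host": "db.internal", "port": 5432, "dbname": "clutch", "user": "api"}
xds = {"host": "db.internal", "port": 5432, "dbname": "clutch", "user": "xds"}
other = {"host": "db.internal", "port": 5432, "dbname": "staging", "user": "xds"}

matches = same_database(backend, xds)
mismatch = same_database(backend, other)
```

The user is deliberately left out of the comparison, since the two servers may connect with different credentials to the same database.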

Example Envoy config

When an Envoy in the mesh boots up, it opens a bidirectional gRPC stream to the management server. Below are sample Envoy configs for RTDS and ECDS that initiate the connection to the xDS server. Check out the Envoy Proxy docs for details on Envoy's support for fault injection.

RTDS

envoy.yaml
# ...
layered_runtime:
  layers:
    - name: rtds
      rtds_layer:
        name: <RTDS_LAYER_NAME>
        rtds_config:
          api_config_source:
            api_type: GRPC
            grpc_services:
              - envoy_grpc:
                  cluster_name: <xDS_CLUSTER>
# ...
http_filters:
  - name: envoy.fault
    typed_config:
      "@type": "type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault"
      abort:
        percentage:
          numerator: 0
          denominator: HUNDRED
        http_status: 503
      delay:
        percentage:
          numerator: 0
          denominator: HUNDRED
        fixed_delay: 0.001s
# ...
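
Both `percentage` blocks are Envoy FractionalPercent values, in which `denominator` is an enum (`HUNDRED`, `TEN_THOUSAND`, or `MILLION`) rather than a number. As a rough sketch of how such a pair maps to a probability (the helper names are hypothetical, and this is not Envoy's implementation):

```python
import random

# Denominator names defined by Envoy's FractionalPercent type.
DENOMINATORS = {"HUNDRED": 100, "TEN_THOUSAND": 10_000, "MILLION": 1_000_000}


def probability(numerator, denominator):
    """Convert a FractionalPercent pair into a probability between 0 and 1."""
    return min(numerator / DENOMINATORS[denominator], 1.0)


def should_inject(numerator, denominator, rng=random):
    """Decide for a single request whether the fault applies."""
    return rng.random() < probability(numerator, denominator)


default_abort = probability(0, "HUNDRED")        # the static config: never abort
ten_percent = probability(10, "HUNDRED")         # 10% of requests
fine_grained = probability(25, "TEN_THOUSAND")   # 0.25% of requests
```

With a numerator of 0, as in the static config, no request is affected until an experiment supplies a higher value.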

ECDS

envoy.yaml
filters:
  # ...
  http_filters:
    # ...
    - name: envoy.extension_config
      config_discovery:
        config_source:
          api_config_source:
            api_type: GRPC
            grpc_services:
              - envoy_grpc:
                  cluster_name: <xDS_CLUSTER>
            transport_api_version: V3
          initial_fetch_timeout: 10s
          resource_api_version: V3
        default_config:
          "@type": "type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault"
          abort:
            percentage:
              numerator: 0
              denominator: HUNDRED
            http_status: 503
          delay:
            percentage:
              numerator: 0
              denominator: HUNDRED
            fixed_delay: 0.001s
        apply_default_config_without_warming: false
        type_urls:
          - type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault

Redis Chaos experiments

Redis Chaos experiments are handled by a dedicated module that processes the Redis experiment data. You need the component below in addition to the Experimentation components listed above. Keep in mind that Redis experiments can only be run with RTDS, not ECDS.

Component Name                            Description
clutch.module.chaos.redisexperimentation  Module responsible for orchestrating the Redis Chaos experiments

Frontend Config

module.exports = {
  "@clutch-sh/experimentation": {
    listExperiments: {
      // ...
      componentProps: {
        // ...
        links: [
          {
            displayName: "Start Redis Experiment",
            path: "/redis-experimentation/start",
          },
        ],
      },
    },
  },
  // ...
  "@clutch-sh/redis-experimentation": {
    startExperiment: {
      hideNav: true,
    },
  },
};

Backend Config

clutch-config.yaml
modules:
  # ...
  - name: clutch.module.chaos.redisexperimentation