About Canary Analysis
Before we start the canary deployment setup, let's learn more about canary analysis and how it works in Flagger.
Canary Resource
Flagger can be configured to automate the release process for Kubernetes workloads with a custom resource named Canary, which we installed in the previous chapter.
The canary custom resource defines the release process of an application running on Kubernetes.
We have defined the canary release with progressive traffic shifting for the deployment of the detail backend service here. When we deploy a new version of the detail backend service, Flagger gradually shifts traffic to the canary and, at the same time, measures the request success rate as well as the average response duration.
Canary Target
Below is the canary target section from the Canary resource for the detail service. For more details, see the Flagger documentation here.
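As a minimal sketch, that target section looks roughly like the following; the namespace and the explicit progressDeadlineSeconds value are assumptions based on the description that follows:

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: detail
  namespace: flagger   # assumed from the service addresses shown later
spec:
  # the deployment that Flagger watches and controls
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: detail
  # maximum time in seconds for the canary deployment to make
  # progress before it is rolled back (default: 600s, i.e. ten minutes)
  progressDeadlineSeconds: 600
```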
Based on the above configuration, Flagger generates the deployment/detail-primary Kubernetes object during canary analysis. This primary deployment is considered the stable release of our detail service; by default, all traffic is routed to this version and the target deployment is scaled to zero. Flagger will detect changes to the target deployment (including secrets and configmaps) and will perform a canary analysis before promoting the new version as primary.
The progress deadline represents the maximum time in seconds for the canary deployment to make progress before it is rolled back; it defaults to ten minutes.
Canary Service
The canary service section below from the Canary resource dictates how the target workload is exposed inside the cluster. The canary target should expose a TCP port that will be used by Flagger to create the ClusterIP services. For more details, see the Flagger documentation here.
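A sketch of that service section is shown below; the port value 3000 comes from the addresses listed next, while targetPort is an assumption:

```yaml
spec:
  service:
    # port exposed by the ClusterIP services Flagger generates
    port: 3000
    # container port of the detail workload (assumed to match the service port)
    targetPort: 3000
```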
Based on our canary service spec, Flagger creates the following Kubernetes ClusterIP services:
- detail.flagger.svc.cluster.local, with selector app=detail-primary
- detail-primary.flagger.svc.cluster.local, with selector app=detail-primary
- detail-canary.flagger.svc.cluster.local, with selector app=detail
This ensures that traffic to detail.flagger:3000 will be routed to the latest stable release of our detail service. The detail-canary.flagger:3000 address is available only during the canary analysis and can be used for conformance testing or load testing.
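For example, while an analysis is running you could exercise the canary address directly from inside the cluster; a hypothetical smoke test (the namespace and root path are assumptions):

```
kubectl -n flagger run curl --rm -it --image=curlimages/curl -- \
  curl -s http://detail-canary.flagger:3000/
```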
Canary Analysis
The canary analysis defines:
- the type of deployment strategy:
  - Canary Release
  - A/B Testing
  - Blue/Green Deployment
- the metrics used to validate the canary version; Flagger comes with two builtin metric checks:
  - HTTP request success rate
  - HTTP request duration
- the webhooks used for acceptance tests, load testing, etc.
- the alerting settings (Flagger can be configured to send Slack notifications)
The canary analysis runs periodically until it reaches the maximum traffic weight or the configured number of iterations. On each run, Flagger calls the webhooks, checks the metrics, and, if the failed-check threshold is reached, stops the analysis and rolls back the canary. For more details, see the Flagger documentation here.
For the canary analysis of the detail service, we have used the setup below.
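The following is a sketch of that analysis section. The threshold of 1, the 99% success-rate minimum, and the 500ms latency bound come from the description that follows; the interval, weight stepping, and the load-tester URL and commands are assumptions for illustration:

```yaml
spec:
  analysis:
    # time between analysis iterations (assumed value)
    interval: 1m
    # number of failed checks before the canary is rolled back
    threshold: 1
    # maximum traffic percentage routed to the canary (assumed value)
    maxWeight: 50
    # canary traffic increment per iteration (assumed value)
    stepWeight: 10
    metrics:
    - name: request-success-rate
      # minimum request success rate (non-5xx responses), in percent
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      # maximum request duration, in milliseconds
      thresholdRange:
        max: 500
      interval: 1m
    webhooks:
    - name: acceptance-test
      type: pre-rollout
      # hypothetical load-tester endpoint and test command
      url: http://flagger-loadtester.flagger/
      timeout: 30s
      metadata:
        type: bash
        cmd: "curl -s http://detail-canary.flagger:3000/"
    - name: load-test
      type: rollout
      url: http://flagger-loadtester.flagger/
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://detail-canary.flagger:3000/"
```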
- We check the metrics on every iteration of the canary analysis. If the request-success-rate metric is below 99% or the latency is greater than 500ms, the number of failures reaches the threshold (which is 1 in our setup), the canary analysis fails, and the canary is rolled back.
- We have a pre-rollout webhook for running an acceptance test, executed before traffic is routed to the canary. If the pre-rollout hook fails, the canary advancement is paused and the canary is rolled back.
- We also have a rollout hook for load testing, executed during the analysis on each iteration before the metric checks. If a rollout hook call fails, the canary advancement is paused and eventually rolled back.
Canary Status
Get the current status of the canary deployments in our cluster:

```
kubectl get canaries --all-namespaces
```
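The output looks something like the following; the status, weight, and timestamp shown here are illustrative:

```
NAMESPACE   NAME     STATUS        WEIGHT   LASTTRANSITIONTIME
flagger     detail   Initialized   0        2021-01-01T00:00:00Z
```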
The status condition reflects the last known state of the canary analysis:
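You can inspect it with, for example:

```
kubectl -n flagger get canary/detail -o yaml
```

A successfully promoted canary reports a Promoted condition along these lines (the timestamp is illustrative):

```yaml
status:
  canaryWeight: 0
  failedChecks: 0
  conditions:
  - lastTransitionTime: "2021-01-01T00:00:00Z"
    message: Canary analysis completed successfully, promotion finished.
    reason: Succeeded
    status: "True"
    type: Promoted
```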
The Promoted status condition can have one of the following reasons:
- Initialized
- Waiting
- Progressing
- Promoting
- Finalising
- Succeeded
- Failed
A failed canary will have the Promoted status set to false, the reason set to Failed, and the last applied spec will be different from the last promoted one.
Now let's deploy the app and the canary analysis setup and see this in action!