Analysis¶
Analysis in Argo Rollouts are tests that can be launched in a Kubernetes cluster, typically from a Rollout or from a Kargo promotion, but they can also be run standalone without being referenced by those applications.
An analysis runs queries against systems like Prometheus, Datadog, and others; it also supports plain web queries and executing a Kubernetes Job.
We can template an analysis using the AnalysisTemplate or ClusterAnalysisTemplate Kubernetes resources, and they are instantiated via an AnalysisRun resource.
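A minimal sketch of how an analysis is typically referenced, assuming an AnalysisTemplate called success-rate already exists (the template name and the canary step are placeholders, and the Rollout is trimmed to its strategy):

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-rollout
spec:
  # ...selector, template, etc. omitted...
  strategy:
    canary:
      steps:
      - setWeight: 20
      - analysis:
          templates:
          - templateName: success-rate # assumed AnalysisTemplate name, for illustration only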
Analysis Spec¶
When defining an AnalysisTemplate, ClusterAnalysisTemplate or AnalysisRun we have these fields:
spec.metrics¶
This is where we define the queries|tests|measurements via the following fields
- name
The name we give to the test, query or measurement
- provider
It includes the query|test itself and the provider configuration. Supported providers include Prometheus, Datadog, CloudWatch, InfluxDB, Web, Job, Graphite, and others (see the Job sketch after this list).
- initialDelay
It adds a delay to the test execution. Example: 30s, 5m,...
- count and interval
Count is the number of times we want to repeat the test, query or measurement, and interval is the time to wait between repetitions. Example: 15s
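The Job provider deserves a specific mention: instead of querying a metrics system, it runs a Kubernetes Job and uses the Job's completion or failure as the measurement result. A minimal sketch, where the image, command and service URL are made-up placeholders:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: smoke-test
spec:
  metrics:
  - name: smoke-test
    provider:
      job:
        spec:
          backoffLimit: 0
          template:
            spec:
              restartPolicy: Never
              containers:
              - name: smoke-test
                # hypothetical endpoint; the metric succeeds if the Job completes, fails otherwise
                image: curlimages/curl
                command: ["curl", "-fsS", "http://my-service.default.svc/healthz"]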
Test, query or measurement result¶
The query|test|measurement itself runs against a provider, and we can consider it successful or failed with the successCondition and failureCondition settings.
Sometimes the query|test|measurement cannot be evaluated as successful or failed, and it is considered an inconclusive result. This can happen due to missing data, timeouts, or other issues that prevent the metric from being evaluated. One example of how analysis runs can become Inconclusive is when a metric defines no success or failure conditions. They can also become Inconclusive when both conditions are defined but the measured value satisfies neither of them.
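For example, a metric like the following sketch (the error-rate query and the thresholds are illustrative assumptions) is measured as Inconclusive whenever the value lands between the two thresholds, since neither condition evaluates to true:

metrics:
- name: error-rate
  provider:
    prometheus:
      address: "http://prometheus-operated.monitoring:9090"
      # hypothetical error-rate expression, shown only to illustrate the conditions below
      query: |
        sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
  successCondition: len(result) == 1 && result[0] <= 0.01 # at most 1% errors -> Successful
  failureCondition: len(result) == 1 && result[0] >= 0.05 # 5% errors or more -> Failed
  # values between 0.01 and 0.05 match neither condition -> Inconclusive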
Handling Success results¶
consecutiveSuccessLimit defines the number of consecutive successes required for the analysis to be considered successful.
Its default value is 0 (disabled) and it is available since the v1.8 release.
Handling Error results¶
- With failureLimit we can define the maximum number of failed measurements we want to tolerate.
The default value of failureLimit is 0, so no failures are tolerated. To disable it we can set it to "-1". failureLimit has precedence over consecutiveSuccessLimit. Also, if neither failureLimit nor consecutiveSuccessLimit is reached, the test (measurement) is considered inconclusive.
- consecutiveErrorLimit defines the maximum number of consecutive errors that are allowed for a metric before the analysis is considered to have failed.
Handling inconclusive results¶
inconclusiveLimit sets a threshold for how many inconclusive measurements are acceptable during an analysis. If the number of inconclusive measurements exceeds this limit, the metric, and the analysis run with it, is considered Inconclusive and the rollout is paused for a manual decision. The default value is 0, so by default any inconclusive measurement over the limit makes the metric Inconclusive.
Example¶
apiVersion: argoproj.io/v1alpha1
kind: AnalysisRun
metadata:
  generateName: test-
  namespace: argocd
spec:
  metrics:
  - name: argocd-app-health-sync # name of the measurement
    initialDelay: 30s # wait 30 seconds before the first query
    count: 15 # 15 times
    interval: 10s # every 10 seconds. Match this with how often the metric is updated
    provider:
      prometheus:
        address: "http://prometheus-operated.monitoring:9090"
        query: |
          argocd_app_info{name="my-argocd-app",health_status="Healthy", sync_status="Synced"}
    successCondition: len(result) == 1 && result[0] == 1 # only 1 result with value 1
    failureCondition: len(result) == 0 || result[0] != 1 # empty array or result not 1
    failureLimit: 3 # tolerate 3 failed measurements max
    consecutiveErrorLimit: 3 # tolerate 3 consecutive errors max
    consecutiveSuccessLimit: 8 # succeed after 8 consecutive successes
    inconclusiveLimit: 2 # tolerate 2 inconclusive results
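Note that because this example uses generateName instead of name, it has to be submitted with kubectl create -f rather than kubectl apply -f, and the resulting run can then be inspected with kubectl get analysisrun -n argocd (the file name and namespace depend on your setup).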
spec.args¶
Inside a template we can define arguments:
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: mytemplate
spec:
  args:
  - name: service-name
  - name: prometheus-port
And they can be used later as variables in the query:
{{args.service-name}}
{{args.prometheus-port}}
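A sketch of how that could look inside a metric; the query and the way the arguments are wired into the provider are assumptions for illustration, and the argument values themselves are supplied by whatever instantiates the template (a Rollout or a Kargo promotion, for example):

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: mytemplate
spec:
  args:
  - name: service-name
  - name: prometheus-port
  metrics:
  - name: service-up
    provider:
      prometheus:
        # the args are substituted into the address and the query when the AnalysisRun is created
        address: "http://prometheus-operated.monitoring:{{args.prometheus-port}}"
        query: |
          up{job="{{args.service-name}}"}
    successCondition: len(result) == 1 && result[0] == 1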
spec.ttlStrategy¶
ttlStrategy controls the lifetime of analysis runs and deletes them after a period of time. If this field is unset, the analysis controller will not delete them and they must be deleted manually or via other garbage collection policies (e.g. successfulRunHistoryLimit and unsuccessfulRunHistoryLimit).
apiVersion: argoproj.io/v1alpha1
kind: AnalysisRun
spec:
  ...
  ttlStrategy:
    secondsAfterCompletion: 3600
    secondsAfterSuccess: 1800
    secondsAfterFailure: 1800
spec.terminate¶
pending
spec.measurementRetention¶
pending
spec.dryRun¶
pending
Links¶
- Analysis & Progressive Delivery
https://argoproj.github.io/argo-rollouts/features/analysis/
- Argo Rollouts FAQ
https://argoproj.github.io/argo-rollouts/FAQ/
- Kargo Analysis Templates Reference
https://docs.kargo.io/user-guide/reference-docs/analysis-templates/