Monitoring Resources
When monitoring an environment with tb_pulumi, we want to make sure alarms get set up against critical metrics for all resources being managed in a project. The monitoring tools in this module are designed to track your infrastructure as you build it and set up monitors for everything automatically. The alarms can then be tweaked or disabled entirely as needed.
When you add ThunderbirdComponentResources to a ThunderbirdPulumiProject, the project tracks the resources in
an internal mapping correlating the name of the component resource to the collection of resources it contains. These
resources can have complex structures with nested lists, dicts, and ThunderbirdComponentResources. The project’s
tb_pulumi.ThunderbirdPulumiProject.flatten() function returns these as a flat list of unlabeled Pulumi Resources
and Outputs.
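The flattening idea can be sketched in plain Python. This is only an illustration of the technique, not the library's implementation: FakeComponent and this standalone flatten function are hypothetical stand-ins, and plain strings stand in for Pulumi Resources and Outputs.

```python
from typing import Any


class FakeComponent:
    """Hypothetical stand-in for a ThunderbirdComponentResource."""

    def __init__(self, name: str, resources: dict[str, Any]):
        self.name = name
        self.resources = resources


def flatten(item: Any) -> list[Any]:
    """Recursively collapse nested lists, dicts, and components into a flat list."""
    if isinstance(item, FakeComponent):
        children = list(item.resources.values())
    elif isinstance(item, dict):
        children = list(item.values())
    elif isinstance(item, (list, tuple)):
        children = list(item)
    else:
        # Anything else is treated as a leaf (a resource or an output).
        return [item]

    flat: list[Any] = []
    for child in children:
        flat.extend(flatten(child))
    return flat


# Strings stand in for real resources in this sketch.
inner = FakeComponent("alb", {"listener": "listener-resource", "targets": ["tg-a", "tg-b"]})
outer = FakeComponent("fargate", {"cluster": "cluster-resource", "alb": inner})
flat = flatten({"fargate": outer})
```

Note that the labels (the dict keys and component names) are discarded along the way, which is why the resulting list is described as unlabeled.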
However, it is the nature of Pulumi Outputs that we do not know what type they will become when they are resolved. This
presents a hurdle for the auto-detection of resources to monitor, which is resolved through implementations of the
tb_pulumi.monitoring.MonitoringGroup class. This class works by finding all the Outputs in the flattened resources,
then applying them. Once applied, the resolved outputs and previously known resources are iterated to find supported
resources of known types. The outputs are then passed into a function called monitor.
When you implement the MonitoringGroup class, the alarms you build must be defined in an implementation of monitor,
not in __init__ as in typical Pulumi patterns.
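The shape of this pattern can be sketched without Pulumi at all. Everything below is hypothetical and written only for illustration: Deferred stands in for a Pulumi Output, and SketchMonitoringGroup imitates the resolve-then-monitor flow rather than reproducing the real class or its signature. In real Pulumi code the resolution step would be asynchronous, typically via Output.all(...).apply(...).

```python
from abc import ABC, abstractmethod
from typing import Any


class Deferred:
    """Hypothetical stand-in for a Pulumi Output: a value only known later."""

    def __init__(self, value: Any):
        self._value = value

    def resolve(self) -> Any:
        return self._value


class SketchMonitoringGroup(ABC):
    """Hypothetical base class: resolves deferred values, then calls monitor()."""

    def __init__(self, name: str, flattened: list[Any]):
        self.name = name
        known = [r for r in flattened if not isinstance(r, Deferred)]
        pending = [r for r in flattened if isinstance(r, Deferred)]
        # Stands in for applying the outputs; done synchronously in this sketch.
        resolved = [p.resolve() for p in pending]
        self.monitor(known + resolved)

    @abstractmethod
    def monitor(self, resources: list[Any]) -> None:
        """Subclasses build their alarms here, once every type is known."""


class NamingMonitoringGroup(SketchMonitoringGroup):
    def monitor(self, resources: list[Any]) -> None:
        # Alarms are built here rather than in __init__.
        self.alarms = [f"alarm-for-{resource}" for resource in resources]


group = NamingMonitoringGroup("my-monitoring", ["load-balancer", Deferred("ecs-service")])
```

The subclass never overrides __init__; it only supplies monitor, which runs after the deferred values have been resolved.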
In addition to providing this post-apply access to all monitorable resources, the MonitoringGroup also sets up a
configuration of overrides (allowing you to tweak or disable any alarm) and provides a method of notification for
tripped alarms.
A MonitoringGroup’s alarms are organized and made configurable through a second class, the
tb_pulumi.monitoring.AlarmGroup. This represents an overridable set of alarms for a single resource (which may produce
any number of metrics which we want to monitor). MonitoringGroups must map resource types to AlarmGroup types that
handle those resources in their monitor functions.
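One way to picture the type-to-alarm-group mapping is a lookup table keyed by resource type. The sketch below is an assumption about the general shape, not the library's code: every class name in it is a hypothetical placeholder, and real implementations would use Pulumi resource types and AlarmGroup subclasses.

```python
from typing import Any


# Hypothetical placeholder resource types.
class LoadBalancer: ...
class EcsService: ...
class KmsKey: ...  # deliberately has no alarm group in this sketch


class SketchAlarmGroup:
    """Hypothetical stand-in for an AlarmGroup bound to one resource."""

    def __init__(self, resource: Any):
        self.resource = resource


class LoadBalancerAlarms(SketchAlarmGroup): ...
class EcsServiceAlarms(SketchAlarmGroup): ...


# Maps each supported resource type to the alarm group type that handles it.
SUPPORTED = {
    LoadBalancer: LoadBalancerAlarms,
    EcsService: EcsServiceAlarms,
}


def build_alarm_groups(resources: list[Any]) -> list[SketchAlarmGroup]:
    """Create one alarm group per supported resource; skip everything else."""
    groups = []
    for resource in resources:
        alarm_group_type = SUPPORTED.get(type(resource))
        if alarm_group_type is None:
            continue  # unsupported types are not monitored
        groups.append(alarm_group_type(resource))
    return groups


groups = build_alarm_groups([LoadBalancer(), KmsKey(), EcsService()])
```

A loop like this would live inside the monitor function, since that is the first point at which every resource's type is known.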
As an example, take a look at tb_pulumi.cloudwatch.CloudWatchMonitoringGroup, a MonitoringGroup implementation that
uses AWS CloudWatch to alarm on metrics produced by AWS resources. It creates a
tb_pulumi.cloudwatch.LoadBalancerAlarmGroup when it encounters a resource of type
aws.lb.load_balancer.LoadBalancer. That alarm group monitors status codes and response times, among other things.
CloudWatch Monitoring
To create monitors for AWS resources, you may want to use AWS’s metrics and alerting platform, CloudWatch. You can get
automatic monitoring with sensible defaults for all supported resources in your stack by setting up a
tb_pulumi.cloudwatch.CloudWatchMonitoringGroup. Assuming your project is set up as described in the Quickstart
section, you can add monitoring like this:
monitoring_opts = resources['tb:cloudwatch:CloudWatchMonitoringGroup']
monitoring = tb_pulumi.cloudwatch.CloudWatchMonitoringGroup(
    name='my-monitoring',
    project=project,
    notify_emails=['your_alerting_email_here@host.tld'],
    config=monitoring_opts,
)
The CloudWatchMonitoringGroup will look at every resource in your project. If it is capable of setting up
alerting for a resource, it will, using default values. If you want to tweak the alarm’s configuration, pass the desired
values in through the config object. This should look something like this:
tb:cloudwatch:CloudWatchMonitoringGroup:
  alarms:
    resource-name:
      alarm-name:
        options: values
The options: values settings can contain any valid inputs to the aws.cloudwatch.MetricAlarm constructor, as defined
in the Pulumi AWS provider documentation. It also supports a special enabled option, which can be set to False to
prevent the creation of the alarm.
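The override behavior described above can be sketched as a dictionary merge. This is a minimal illustration under stated assumptions, not the library's logic: resolve_alarm_options is a hypothetical helper, and the default values are invented for the example.

```python
from typing import Any, Optional


def resolve_alarm_options(
    defaults: dict[str, Any], overrides: dict[str, Any]
) -> Optional[dict[str, Any]]:
    """Merge user overrides onto defaults; return None if the alarm is disabled."""
    options = {**defaults, **overrides}
    # "enabled" is a special switch, not a MetricAlarm input, so it is removed
    # before the options would be handed to the alarm constructor.
    if not options.pop("enabled", True):
        return None
    return options


# Hypothetical defaults, chosen only for illustration.
defaults = {"threshold": 10, "evaluation_periods": 2, "period": 60}

tweaked = resolve_alarm_options(defaults, {"threshold": 123, "evaluation_periods": 3})
disabled = resolve_alarm_options(defaults, {"enabled": False})
```

Here tweaked keeps the default period while taking the overridden threshold and evaluation_periods, and disabled is None, signalling that no alarm should be created.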
The resource-name is the name of the resource to which the alarm applies, as it is known to Pulumi. To see a list of
these values within your stack, you can set up your Pulumi environment and run pulumi stack. You’ll see output like
this (which is heavily truncated):
Current stack is mystack:
Managed by mymachine
Last updated: 9 seconds ago (2024-12-10 09:31:13.157002687 -0700 MST)
Pulumi version used: v3.142.0
Current stack resources (137):
TYPE NAME
pulumi:pulumi:Stack myproject-mystack
...
├─ tb:fargate:FargateClusterWithLogging myproject-mystack-fargate
│ ├─ aws:kms/key:Key myproject-mystack-fargate-logging
│ ├─ aws:iam/policy:Policy myproject-mystack-fargate-policy-exec
│ ├─ tb:fargate:FargateServiceAlb myproject-mystack-fargate-fargateservicealb
│ │ ├─ aws:alb/targetGroup:TargetGroup myproject-mystack-fargate-fargateservicealb-targetgroup-myapp
│ │ ├─ aws:lb/loadBalancer:LoadBalancer myproject-mystack-fargate-fargateservicealb-alb-myapp
│ │ └─ aws:lb/listener:Listener myproject-mystack-fargate-fargateservicealb-listener-myapp
│ ├─ aws:cloudwatch/logGroup:LogGroup myproject-mystack-fargate-fargate-logs
│ ├─ aws:iam/policy:Policy myproject-mystack-fargate-policy-logs
│ ├─ aws:ecs/cluster:Cluster myproject-mystack-fargate-cluster
│ ├─ aws:iam/role:Role myproject-mystack-fargate-taskrole
│ ├─ aws:ecs/taskDefinition:TaskDefinition myproject-mystack-fargate-taskdef
│ └─ aws:ecs/service:Service myproject-mystack-fargate-service
...
If you wanted to change the threshold for alerting on 5xx errors in the target group, you would use
myproject-mystack-fargate-fargateservicealb-targetgroup-myapp as the resource-name in the config.
The alarm-name key should be the name of an alarm that is supported by the relevant alarm group. For example,
tb_pulumi.cloudwatch.AlbAlarmGroup describes the target_5xx and alb_5xx alarms. To change a config for one alarm and
disable another, you could write the following config:
tb:cloudwatch:CloudWatchMonitoringGroup:
  alarms:
    myproject-mystack-fargate-fargateservicealb-targetgroup-myapp:
      target_5xx:
        threshold: 123
        evaluation_periods: 3
      alb_5xx:
        enabled: False
Both the resource name and the alarm name are available as tags on the alarms themselves. If you discover an alarm that needs to be tweaked, note its tb_pulumi_resource_name and tb_pulumi_alarm_name tags.
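As a small illustration of how those two tags map back onto the config structure, the sketch below builds an override entry from an alarm's tags. The override_from_tags helper is hypothetical and not part of tb_pulumi; only the two tag names come from the text above.

```python
from typing import Any


def override_from_tags(tags: dict[str, str], **options: Any) -> dict[str, Any]:
    """Build the nested "alarms" config entry for the alarm carrying these tags."""
    resource_name = tags["tb_pulumi_resource_name"]
    alarm_name = tags["tb_pulumi_alarm_name"]
    return {"alarms": {resource_name: {alarm_name: dict(options)}}}


# Tags as they might be read off an existing alarm.
tags = {
    "tb_pulumi_resource_name": "myproject-mystack-fargate-fargateservicealb-targetgroup-myapp",
    "tb_pulumi_alarm_name": "target_5xx",
}
override = override_from_tags(tags, threshold=123, evaluation_periods=3)
```

The resulting dictionary has the same nesting as the YAML shown earlier: alarms, then the resource name, then the alarm name, then the options.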