Monitoring Resources

tb_pulumi offers a tool for setting up monitoring for your entire stack at once. It builds on internals that are described in more detail in Acting on Fully Applied Pulumi Stacks. The short version of this is that:

And now, we add that…

The overall idea is that a monitoring group inspects the post-applied Pulumi state, looking for any monitorable resources, and creates the appropriate alarm group for each resource it understands. If you change your infrastructure to, for example, create a new ALB, the monitoring group will automatically create new alarms for it. Likewise, if you delete a monitored resource, its monitors are deleted as well. This requires no extra effort from the developer beyond defining the monitoring group.
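To make the idea concrete, here is a minimal, self-contained sketch of that behavior. It does not use tb_pulumi at all: the type map, the load balancer alarm names, and the plan_alarms helper are hypothetical stand-ins, and the real module's internals may look quite different.

```python
# Hypothetical map from Pulumi resource types to the alarms built for them.
# The real type map lives inside tb_pulumi and may differ.
ALARMS_BY_TYPE = {
    "aws:ec2/instance:Instance": ["cpu_utilization"],
    "aws:lb/loadBalancer:LoadBalancer": ["response_time", "target_5xx"],
}


def plan_alarms(resources):
    """Return the set of (resource_name, alarm_name) pairs for a stack state."""
    planned = set()
    for res_type, res_name in resources:
        # Resource types with no entry in the map are simply skipped.
        for alarm in ALARMS_BY_TYPE.get(res_type, []):
            planned.add((res_name, alarm))
    return planned


# Stack state before and after adding an ALB (made-up resource names).
before = [
    ("aws:ec2/instance:Instance", "my-instance"),
    ("aws:ec2/vpc:Vpc", "project-stack-vpc"),
]
after = before + [("aws:lb/loadBalancer:LoadBalancer", "my-alb")]

# Alarms that appear only because the ALB was added.
created = plan_alarms(after) - plan_alarms(before)
```

Because the plan is recomputed from the applied state each time, adding or removing a resource changes the set of alarms without any change to the monitoring definition itself.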

Set up a CloudWatchMonitoringGroup

Start with a basic YAML config:

tb:cloudwatch:CloudWatchMonitoringGroup:
  monitoring:
    config:
      alarms: {}
    notify_emails:
      - your.email.here@domain.smth

And then define one in code:

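# The options live under the "monitoring" key in the YAML above, so look that up
# before unpacking them as keyword arguments (config, notify_emails).
monitoring_opts = resources.get('tb:cloudwatch:CloudWatchMonitoringGroup', {}).get('monitoring', {})
monitoring = tb_pulumi.cloudwatch.CloudWatchMonitoringGroup(
    name=f'{project.name_prefix}-monitoring',
    project=project,
    **monitoring_opts,
)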

This will build monitors for everything the module is capable of monitoring.

Override Default Alarm Settings

The tb_pulumi.cloudwatch.CloudWatchMonitoringGroup class inherits from the base tb_pulumi.monitoring.MonitoringGroup class. That base class provides a way to override the default options for monitors: its config parameter.

This is a specially formatted dict of options. Here’s a reasonable skeleton to begin with:

alarms:
  resource-name:
    alarm-name:
      enabled: True
      # ... other options here ...

The resource-name is the name of your resource as it is known to Pulumi. You can run the pulumi stack command to get a list of these names. Here is some heavily modified sample output to refer to. The value you want is in the NAME column.

# pulumi stack
Current stack is foobar:
    Owner: your-org
    Last updated: date/time
    Pulumi version used: v3.187.0
Current stack resources (###):
    TYPE                                                  NAME
    pulumi:pulumi:Stack                                   project-stack
    ├─ tb:network:MultiCidrVpc                            project-stack-vpc
    │  ├─ aws:ec2/vpc:Vpc                                 project-stack-vpc
    │  ├─ tb:network:SecurityGroupWithRules               project-stack-vpc-endpoint-sg
    │  │  ├─ aws:ec2/securityGroup:SecurityGroup          project-stack-vpc-endpoint-sg-sg
    │  │  ├─ aws:ec2/securityGroupRule:SecurityGroupRule  project-stack-vpc-endpoint-sg-ingress-0
    │  │  └─ aws:ec2/securityGroupRule:SecurityGroupRule  project-stack-vpc-endpoint-sg-egress-0
    │  ├─ aws:ec2/subnet:Subnet                           project-stack-vpc-subnet-1
    │  ├─ aws:ec2/internetGateway:InternetGateway         project-stack-vpc-ig
    │  ├─ aws:ec2/subnet:Subnet                           project-stack-vpc-subnet-0
    ...
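If the tree output is awkward to scan, the same names can be pulled from the JSON that pulumi stack export prints. The sketch below assumes the usual export layout (a deployment.resources list whose entries carry a urn and a type) and the standard URN form, where the resource name is the final "::"-separated field. The sample document and the resource_names helper are made up for illustration.

```python
import json


def resource_names(export_json):
    """Yield (type, name) for each resource in `pulumi stack export` output."""
    state = json.loads(export_json)
    for res in state["deployment"]["resources"]:
        # URN layout: urn:pulumi:<stack>::<project>::<type chain>::<name>
        name = res["urn"].split("::", 3)[3]
        yield res["type"], name


# Made-up sample shaped like a stack export, trimmed to the fields used here.
sample = json.dumps({
    "version": 3,
    "deployment": {
        "resources": [
            {
                "urn": "urn:pulumi:stack::project::pulumi:pulumi:Stack::project-stack",
                "type": "pulumi:pulumi:Stack",
            },
            {
                "urn": "urn:pulumi:stack::project::tb:network:MultiCidrVpc$aws:ec2/vpc:Vpc::project-stack-vpc",
                "type": "aws:ec2/vpc:Vpc",
            },
        ]
    },
})

names = list(resource_names(sample))
```

The second element of each pair is the value to use as resource-name in the config.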

The alarm-name is defined by the AlarmGroup responding to the resource. For example, the tb_pulumi.cloudwatch.Ec2InstanceAlarmGroup docs list several, such as cpu_utilization.

All alarms support the enabled option. It defaults to True, but can be set to False if you do not wish to build that alarm. You can also supply any other option accepted by aws.cloudwatch.MetricAlarm here, and those values will override the defaults. For example, you could change the threshold and number of evaluation periods for an alarm:

alarms:
  my-instance:
    cpu_utilization:
      threshold: 80
      evaluation_periods: 3
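One way to picture how such overrides might be applied is a plain dictionary merge. The sketch below is an illustration under assumptions, not the library's implementation: the default values and the resolve_alarm helper are hypothetical, and only the enabled semantics and the "overrides win" rule come from the description above.

```python
# Hypothetical defaults for one alarm; the real defaults are set by each AlarmGroup.
DEFAULTS = {"threshold": 90, "evaluation_periods": 2, "period": 60}


def resolve_alarm(defaults, config, resource_name, alarm_name):
    """Merge user overrides into defaults, or return None if the alarm is disabled."""
    overrides = dict(
        config.get("alarms", {}).get(resource_name, {}).get(alarm_name, {})
    )
    # "enabled" is a switch, not a MetricAlarm option, so strip it before merging.
    if not overrides.pop("enabled", True):
        return None
    # Later keys win, so user-supplied values replace the defaults.
    return {**defaults, **overrides}


config = {
    "alarms": {
        "my-instance": {
            "cpu_utilization": {"threshold": 80, "evaluation_periods": 3},
        },
        "other-instance": {
            "cpu_utilization": {"enabled": False},
        },
    }
}

resolved = resolve_alarm(DEFAULTS, config, "my-instance", "cpu_utilization")
disabled = resolve_alarm(DEFAULTS, config, "other-instance", "cpu_utilization")
untouched = resolve_alarm(DEFAULTS, config, "third-instance", "cpu_utilization")
```

Here resolved keeps the default period but takes the overridden threshold and evaluation_periods, disabled is None, and a resource with no entry in the config simply gets the defaults.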