{"id":7884,"date":"2018-12-28T10:42:16","date_gmt":"2018-12-28T14:42:16","guid":{"rendered":"http:\/\/dougmcclure.net\/blog\/?p=7884"},"modified":"2018-12-28T10:42:20","modified_gmt":"2018-12-28T14:42:20","slug":"in-the-beginning-the-monitoring-service","status":"publish","type":"post","link":"https:\/\/dougmcclure.net\/blog\/2018\/12\/in-the-beginning-the-monitoring-service\/","title":{"rendered":"In the Beginning &#8211; The \u201cMonitoring Service\u201d"},"content":{"rendered":"\n<p>Follow one of PagerDuty\u2019s integration guides for common monitoring tools and you\u2019ll quite easily end up with your very own \u201cMonitoring Service\u201d and open the floodgates for incoming signals, alerts or events from that tool now triggering PagerDuty alerts and incidents. In the end, hopefully, you\u2019ll be paging the right on-call person at the right time! PagerDuty makes it super easy to get started with this design pattern, no matter the tool in your environment, thanks to the 300+ integrations in our portfolio today.<\/p>\n\n\n\n<p>Your \u201cMonitoring Services\u201d may resemble something like \u201cNagios XI &#8211; Datacenter 1\u201d or \u201cSplunk &#8211; Atlanta\u201d or just plain old \u201cNew Relic\u201d. I see them in all shapes and sizes, at customers large and small, in every industry and every part of the world. 
Internally here at PagerDuty, in addition to calling these \u201cMonitoring Services\u201d, we also often refer to them as \u201cCatch All Services\u201d, \u201cEvent Sink Services\u201d, or \u201cDatacenter Services\u201d because they do one thing well &#8211; catch all incoming signals, alerts or events in a single PagerDuty service and notify someone based upon the single escalation policy associated with that service. That works, but maybe not so well in the long run.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"376\" src=\"http:\/\/dougmcclure.net\/blog\/wp-content\/uploads\/2018\/12\/MonitoringServiceExample-1024x376.png\" alt=\"\" class=\"wp-image-7885\" srcset=\"https:\/\/dougmcclure.net\/blog\/wp-content\/uploads\/2018\/12\/MonitoringServiceExample-1024x376.png 1024w, https:\/\/dougmcclure.net\/blog\/wp-content\/uploads\/2018\/12\/MonitoringServiceExample-300x110.png 300w, https:\/\/dougmcclure.net\/blog\/wp-content\/uploads\/2018\/12\/MonitoringServiceExample-768x282.png 768w, https:\/\/dougmcclure.net\/blog\/wp-content\/uploads\/2018\/12\/MonitoringServiceExample.png 1608w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The speed and ease with which you can integrate tools into PagerDuty are awesome. In a very short time, you\u2019re up and running and getting value from PagerDuty. The catch: any responder or any schedule on the escalation policy associated with these kinds of services will get paged. Application Developer team paged for network events, you bet! Security team paged for server <a href=\"http:\/\/foo.bar.com\">foo.bar.com<\/a> disk space events, you got it! On-call responders paged at 3 am for a problem with \u201cNew Relic\u201d, a piece of cake. 
Engaging the right team for the right alert\/incident is very challenging when you have only one escalation policy for anything and everything your integrated tool might monitor.<\/p>\n\n\n\n<p>If you\u2019d like to apply PagerDuty\u2019s best practices for reducing the sheer number of incidents and notifications in this configuration, you can simply turn on Time-Based Alert Grouping with, let&#8217;s say, a two-minute grouping window. Group away! Some time later, the Application Support team reaches out to you, confused because some weird Cisco Chassis Card Inserted alerts are grouped together with their important application incidents. The Storage Ops team pings you in Slack, confused by the custom \u201cFront Door Visitor\u201d alert grouped into their DX8000 SAN incident. Time is time, and Time-Based Alert Grouping is just doing its job perfectly across the mega \u201cMonitoring Service\u201d.<\/p>\n\n\n\n<p>Ease and speed aside, as the not-so-subtle examples above show, there are a number of drawbacks to following the \u201cMonitoring Service\u201d design pattern, and this configuration certainly isn\u2019t a \u2018best practice\u2019. Over the next few blog posts, I\u2019d like to take you along on a better-practice journey by exploring PagerDuty\u2019s service design best practices and our Event Intelligence product through practical applications with Nagios XI.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Follow one of PagerDuty\u2019s integration guides for common monitoring tools and you\u2019ll quite easily end up with your very own \u201cMonitoring Service\u201d and open the floodgates for incoming signals, alerts or events from that tool now triggering PagerDuty alerts and incidents. 
In the end, hopefully, you\u2019ll be paging the right on-call person at the right [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1079,1082,1081,1077,1078,1080],"tags":[931,1074,1073,1072],"class_list":{"0":"post-7884","1":"post","2":"type-post","3":"status-publish","4":"format-standard","6":"category-event-intelligence","7":"category-event-routing","8":"category-event-rules","9":"category-pagerduty","10":"category-pagerduty-best-practices","11":"category-service-design","12":"tag-best-practices","13":"tag-event-intelligence","14":"tag-nagios-xi","15":"tag-pagerduty"},"_links":{"self":[{"href":"https:\/\/dougmcclure.net\/blog\/wp-json\/wp\/v2\/posts\/7884","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dougmcclure.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dougmcclure.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dougmcclure.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dougmcclure.net\/blog\/wp-json\/wp\/v2\/comments?post=7884"}],"version-history":[{"count":2,"href":"https:\/\/dougmcclure.net\/blog\/wp-json\/wp\/v2\/posts\/7884\/revisions"}],"predecessor-version":[{"id":7887,"href":"https:\/\/dougmcclure.net\/blog\/wp-json\/wp\/v2\/posts\/7884\/revisions\/7887"}],"wp:attachment":[{"href":"https:\/\/dougmcclure.net\/blog\/wp-json\/wp\/v2\/media?parent=7884"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dougmcclure.net\/blog\/wp-json\/wp\/v2\/categories?post=7884"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dougmcclure.net\/blog\/wp-json\/wp\/v2\/tags?post=7884"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}