Building an SLA notification system - Part 1 - Overview

Introduction

In this series, we're going to build a system that notifies agents when tickets breach the SLA schedule.  Zendesk doesn't offer a way to create a trigger that notifies an agent of a breached ticket, nor a way to create an escalation chain should that agent not respond.  This app will close that gap.  For now, we're calling it yeller.  The idea is that it yells at you when tickets aren't being responded to.

Our development stack is a standard LEEF stack (Linux, Nginx, Emperor, Flask).  We will use Python as the language and SQLAlchemy as the ORM.  Our database will be SQLite3 for development; in production, I recommend PostgreSQL.  This series assumes you already have a working knowledge of SQLAlchemy and of the declarative pattern for defining data models.  If you don't, see the references section for links.

In this part, we will focus on the design requirements.  We'll identify the components of the system and the database schema definition.

yeller pseudocode

  1. Ticket is created in Zendesk
  2. Trigger is sent to yeller to start tracking a ticket
  3. yeller identifies an escalation schedule for the ticket
  4. When an escalation level's time expires, yeller notifies a target via email or SMS
  5. yeller increments the escalation level, waits for that level to expire, then returns to step 4
  6. If no higher escalation level exists, yeller stops tracking the ticket

This app has three main components:

  • a state machine to track the lifecycle of a ticket
  • a scheduler to fire off tasks when a ticket breaches
  • an underlying data model to support the scheduler and state machine

State Machine

As state machines go, yeller will be pretty simple.  There's no branching in a schedule.  The schedule is a simple list of levels.  All tickets start at level 1.  Each level has a timeout value, breach_interval_sec.  When a timeout is reached, the ticket is evaluated.  If the ticket has no assignee, the level is considered breached; otherwise, it is closed and no longer tracked.  If the level is in breach, the escalation targets are evaluated, notifications are sent, and the escalation level is incremented.  This process repeats until the entire list of levels for a schedule has been executed.  At that point, the ticket is no longer tracked.
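To make the transitions concrete, here is a minimal sketch of a single timeout evaluation.  Only breach_interval_sec comes from the design above; the names Outcome, Level, TrackedTicket, on_timeout and the notify callback are assumptions made for illustration, not yeller's actual code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Outcome(Enum):
    ESCALATED = "escalated"  # breached, targets notified, moved to the next level
    CLOSED = "closed"        # ticket had an assignee, tracking stops
    EXHAUSTED = "exhausted"  # breached on the last level, tracking stops


@dataclass
class Level:
    breach_interval_sec: int
    targets: List[str] = field(default_factory=list)  # hypothetical: addresses to notify


@dataclass
class TrackedTicket:
    ticket_id: int
    assignee: Optional[str] = None
    level_index: int = 0  # index 0 corresponds to "level 1"
    tracked: bool = True


def on_timeout(
    ticket: TrackedTicket,
    schedule: List[Level],
    notify: Callable[[str, TrackedTicket], None],
) -> Outcome:
    """Evaluate a ticket whose current level has just timed out."""
    level = schedule[ticket.level_index]

    # An assigned ticket is not in breach: stop tracking it.
    if ticket.assignee is not None:
        ticket.tracked = False
        return Outcome.CLOSED

    # No assignee: the level is breached, so notify every target on it.
    for target in level.targets:
        notify(target, ticket)

    # Move to the next level, or stop once the list of levels is used up.
    ticket.level_index += 1
    if ticket.level_index >= len(schedule):
        ticket.tracked = False
        return Outcome.EXHAUSTED
    return Outcome.ESCALATED
```

In this sketch, the caller would schedule the next timeout only when the result is Outcome.ESCALATED, using the breach_interval_sec of the level the ticket has just moved to.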

Scheduler

In yeller, we'll use Python's excellent APScheduler to execute escalation breaches.  When the state machine enters a new level, it will schedule a task to fire breach_interval_sec seconds from now.  The scheduled task will evaluate the ticket and report back to the state machine whether or not the ticket is in breach.
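As a rough illustration, a one-shot job like this could be registered with APScheduler's "date" trigger.  This sketch assumes the APScheduler 3.x API (BackgroundScheduler and add_job); the function names, the job id format and the body of check_ticket are hypothetical placeholders.

```python
from datetime import datetime, timedelta


def check_ticket(ticket_id, level_number):
    """Hypothetical callback: yeller would evaluate the ticket here."""
    print(f"evaluating ticket {ticket_id} at level {level_number}")


def schedule_breach_check(scheduler, ticket_id, level_number,
                          breach_interval_sec, now=None):
    """Register a one-shot job that fires breach_interval_sec seconds from now."""
    now = now or datetime.now()
    run_date = now + timedelta(seconds=breach_interval_sec)
    return scheduler.add_job(
        check_ticket,
        trigger="date",              # run once, at run_date
        run_date=run_date,
        args=[ticket_id, level_number],
        id=f"ticket-{ticket_id}",    # assumed convention: one pending job per ticket
        replace_existing=True,       # a new level replaces the previous level's job
    )


def build_scheduler():
    """Create and start a background scheduler (APScheduler 3.x assumed)."""
    # Imported here so the helper above can be exercised without APScheduler installed.
    from apscheduler.schedulers.background import BackgroundScheduler

    scheduler = BackgroundScheduler()
    scheduler.start()
    return scheduler
```

Calling schedule_breach_check(build_scheduler(), 42, 1, 300) would evaluate ticket 42 five minutes from now.  Giving each ticket a stable job id is a design choice in this sketch: it makes it easy to replace or remove the pending job when a ticket changes level or stops being tracked.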

Data Model

The data model for our app has a few objects: Groups, Schedules, Levels, and Targets.

Groups - These are groups of agents.  Tickets are assigned to a group.  Each group is assigned an Escalation Schedule (ES).  Groups have a 1:1 relationship to groups in Zendesk.  When a ticket comes into yeller, its group assignment determines which ES is used to track progress.

Schedules - An Escalation Schedule is a collection of one or more escalation levels that a ticket will progress through.  It's a container object, nothing more.

Levels - Escalation Schedule Levels define how long a ticket can stay on a level before breaching.  When a ticket breaches, Targets are notified.

Targets - Targets belong to Escalation Levels.  Each one defines an agent to be notified, the method of notification, and a message to send.

Here's a schema diagram of our database (made with genMyModel.com):

Here's a Python models.py file representing this in the SQLAlchemy ORM.
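As one possible shape for such a file, here is a minimal sketch in SQLAlchemy's declarative style (SQLAlchemy 1.4 or later assumed).  Apart from breach_interval_sec, every table name, column name and relationship below is an assumption made for illustration, based only on the object descriptions above.

```python
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class Schedule(Base):
    """Escalation Schedule: a container for an ordered list of levels."""
    __tablename__ = "schedules"

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

    levels = relationship("Level", back_populates="schedule",
                          order_by="Level.position")
    groups = relationship("Group", back_populates="schedule")


class Group(Base):
    """A group of agents, mapped 1:1 to a Zendesk group."""
    __tablename__ = "groups"

    id = Column(Integer, primary_key=True)
    zendesk_group_id = Column(Integer, unique=True, nullable=False)  # assumed column
    name = Column(String, nullable=False)
    schedule_id = Column(Integer, ForeignKey("schedules.id"))

    schedule = relationship("Schedule", back_populates="groups")


class Level(Base):
    """One escalation level: how long a ticket may sit here before breaching."""
    __tablename__ = "levels"

    id = Column(Integer, primary_key=True)
    schedule_id = Column(Integer, ForeignKey("schedules.id"), nullable=False)
    position = Column(Integer, nullable=False)  # assumed: 1 for level 1, 2 for level 2, ...
    breach_interval_sec = Column(Integer, nullable=False)

    schedule = relationship("Schedule", back_populates="levels")
    targets = relationship("Target", back_populates="level")


class Target(Base):
    """Who to notify when a level is breached, how, and with what message."""
    __tablename__ = "targets"

    id = Column(Integer, primary_key=True)
    level_id = Column(Integer, ForeignKey("levels.id"), nullable=False)
    agent_name = Column(String, nullable=False)
    method = Column(String, nullable=False)  # assumed values: "email" or "sms"
    message = Column(String, nullable=False)

    level = relationship("Level", back_populates="targets")
```

Calling Base.metadata.create_all(engine) against a SQLite engine would create these four tables.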

Conclusion

In this part, we covered the components of a notification system and the data model that will support it.  The next part will cover accessing the Zendesk API to sync our Zendesk groups to yeller's group models.

Building an SLA notification system - Part 2 - Syncing zendesk data to yeller

References

 
