In this talk, we'll use a typical serverless application built on API Gateway, Lambda, DynamoDB, SQS, SNS, Kinesis, Step Functions, Aurora Serverless, and other AWS managed services. We'll explore how Amazon DevOps Guru recognizes operational issues and anomalies such as increased latency and rising error rates (for example, timeouts and throttling). We'll also explore DevOps Guru "Proactive Insights", which flag configuration anti-patterns such as a missing on-failure destination for Kinesis Data Streams consumers or a missing dead-letter queue (DLQ) on SQS queues, as well as over-provisioned resources such as DynamoDB tables. Next, we'll integrate DevOps Guru with PagerDuty to improve incident management. Finally, we'll look at the current shortcomings of the DevOps Guru service.
Amazon DevOps Guru analyzes data such as application metrics, logs, events, and traces to establish a baseline of normal operational behavior, and then uses machine learning (ML) to detect deviations from that baseline. Its pre-trained ML models can tell expected spikes in application requests apart from genuine anomalies, so the service knows when to raise an alert and when not to.