Javascript Crawler slowdown

Incident Report for Siteimprove

Postmortem

A bug was introduced in our Crawler infrastructure on Friday 14 Sept., which caused a queue of job items to build up over the weekend. The situation was further compounded by a misconfiguration in an underlying messaging component. The bug was fixed Monday morning, and the queue had been completely processed by Tuesday morning (both CET).

Only a small share of our customers were affected by this bug, and those affected experienced crawl delays of no more than three days.

We have improved our alerting to make sure we notice bugs faster in the future, and will also be reviewing the configuration of our messaging components.

Posted Sep 19, 2018 - 11:50 UTC

Resolved

This incident has been resolved.

Posted Sep 18, 2018 - 04:48 UTC

Update

We are continuing to monitor for any further issues.

Posted Sep 18, 2018 - 04:47 UTC

Update

The queue has now been processed and we are back to normal operations.

Posted Sep 18, 2018 - 04:46 UTC

Update

Unfortunately, our initial estimate of 5 hours to return to normal operations was a little optimistic. Our processing queue has been reduced by more than 75%, but it has not yet been cleared, so it will take some more time before we are back to normal operations. Again, we apologize and appreciate your patience.

Posted Sep 17, 2018 - 19:49 UTC

Monitoring

The issue has now been identified and fixed. However, a long processing queue has built up over the day, and we estimate it will take about 5 hours before we're through the queue and operations are back to normal.

Again, this issue only affects customers on the Javascript crawler, not customers on the classic crawler.

Posted Sep 17, 2018 - 13:31 UTC

Investigating

Our Javascript crawler is currently experiencing a slowdown due to infrastructure issues, which has caused a queue of crawl jobs to build up. This means customers will experience delays with their crawls.

The classic crawler, which crawls the sites of most of our customers, is not affected.

We're investigating the problems and apologize for the inconvenience.

Posted Sep 17, 2018 - 08:03 UTC

This incident affected: Platform.