Discussion:
[specref] Downtime post-mortem
Tobie Langel
2016-03-04 11:59:32 UTC
Hi all,

Sorry, Specref was down for several hours overnight: I deployed a buggy
app refactor and, despite checking the logs, failed to notice that the
app was crashing.

Lessons learned:
- The app needs tests, not only the data.
- Papertrail logs don't distinguish clearly enough between failed and
served requests, so check the app itself too.
- Don't push code late at night.
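
One way to "check the app too" after a deploy is a small smoke check that requests a few known URLs and reports any that were not served successfully. The following is only a minimal sketch in Python under stated assumptions: the function names `fetch_status` and `smoke_check` and the example URL are hypothetical and are not part of Specref or Papertrail.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or 0 if no response was received."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error status (4xx or 5xx).
        return err.code
    except (URLError, OSError):
        # No usable answer at all: DNS failure, refused connection, timeout.
        return 0


def smoke_check(urls, fetch=fetch_status):
    """Return the (url, status) pairs that were not served with a 2xx status."""
    failures = []
    for url in urls:
        status = fetch(url)
        if not 200 <= status < 300:
            failures.append((url, status))
    return failures


if __name__ == "__main__":
    # Hypothetical URL; replace with endpoints the deployed app should serve.
    for url, status in smoke_check(["https://example.org/"]):
        print(f"FAILED {url} (status {status})")
```

Passing `fetch` as a parameter keeps the check testable without network access, and an empty result means every URL was served.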

Apologies,

--tobie
Tobie Langel
2016-03-04 13:53:38 UTC
As another mitigation strategy for such issues, I've given access to the
app to Dom, Shane and Marcos.

We should pretty much have all timezones covered with these three.

--tobie
Shane McCarron
2016-03-04 14:06:18 UTC
In particular since most of us never sleep.
--
Shane McCarron
Projects Manager, Spec-Ops
Tobie Langel
2016-03-04 15:34:24 UTC
Post by Tobie Langel
As another mitigation strategy for such issues, I've given access to the
app to Dom, Shane and Marcos.
I've now also set up email alerts for when errors show up in the logs.

--tobie
