Some final details about this incident:
We found an insidious "lock" that was triggered when an alert was analyzing newly arrived data while a user was scrolling through a time graph of the same target. In rare cases, this caused a deadlock, which then backed up the data analysis server (we run many of these in parallel). The deadlock could take several minutes to release (though it didn't always), and in the meantime any further actions on that server would queue up behind it. Depending on how busy the server was, it could be unusable for many minutes, which affected access to any sessions being processed by that server. Whoa, complicated!
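For the curious, here is a minimal sketch of the general shape of this kind of deadlock. It is not our actual server code: it assumes two in-process locks, and the names graph_lock, alert_lock, contended_attempt and ordered_attempt are made up for illustration. Two workers each hold one lock and want the other; waiting with a timeout lets them give up instead of hanging, and taking the locks in one agreed order avoids the standoff entirely.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def contended_attempt(first, second, both_holding, timeout):
    """Hold `first`, then try `second` with a timeout instead of waiting forever."""
    with first:
        both_holding.wait()   # both workers now hold their first lock
        got_second = second.acquire(timeout=timeout)
        if got_second:
            second.release()
        both_holding.wait()   # keep `first` held until both attempts are done
        return got_second


def ordered_attempt(graph_lock, alert_lock):
    """Always take the locks in the same order, so no cycle can form."""
    with graph_lock:
        with alert_lock:
            return True


def demo_opposite_order(timeout=0.2):
    # Hypothetical locks: one for the graph being scrolled, one for the alert.
    graph_lock, alert_lock = threading.Lock(), threading.Lock()
    both_holding = threading.Barrier(2)
    with ThreadPoolExecutor(max_workers=2) as pool:
        scroll = pool.submit(contended_attempt, graph_lock, alert_lock,
                             both_holding, timeout)
        alert = pool.submit(contended_attempt, alert_lock, graph_lock,
                            both_holding, timeout)
        return scroll.result(), alert.result()


def demo_same_order():
    graph_lock, alert_lock = threading.Lock(), threading.Lock()
    with ThreadPoolExecutor(max_workers=2) as pool:
        scroll = pool.submit(ordered_attempt, graph_lock, alert_lock)
        alert = pool.submit(ordered_attempt, graph_lock, alert_lock)
        return scroll.result(), alert.result()


if __name__ == "__main__":
    print(demo_opposite_order())  # neither worker gets its second lock
    print(demo_same_order())      # both workers finish
```

Without the timeout, the opposite-order case would simply hang, which is roughly what a backed-up server looks like from the outside.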
This bug occurred twice on Friday morning (May 3rd) and has not happened since. Once we found a way to recognize it, we kept a close eye on it and remediated before it could create an issue.
We rolled out a fix for the bug yesterday, and we're confident that this particular bug is squashed.
Thanks for your understanding and patience!