PingPlotter Cloud - Notice history

Identity / Login - Operational

100% uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 100.0%

Web Interface API - Operational

99% uptime
May 2024: 98.44% · Jun 2024: 100.0% · Jul 2024: 100.0%

Backend Data Collection - Operational

100% uptime
May 2024: 100.0% · Jun 2024: 99.96% · Jul 2024: 100.0%

Agent Service - Operational

100% uptime
May 2024: 100.0% · Jun 2024: 100.0% · Jul 2024: 99.92%

Notice history

Jul 2024

Unexpected downtime during maintenance
  • Resolved

    We think we have resolved all agent connectivity issues. Please reach out if you are having any issues.

  • Monitoring

    During maintenance on our agent service (connect.pingplotter.com and agents100.pingplotter.com), we lost the public IP address assigned to the agents100.pingplotter.com endpoint. The old IP address was 20.62.235.32 and the new IP address is 172.214.6.221. We're working with our cloud provider to see if we can reclaim the old address. We have updated the DNS record for agents100.pingplotter.com to point to the new IP address, but propagation may take up to 48 hours. Some of your agents may have been connected only through agents100.pingplotter.com; those agents should gradually reconnect at the new IP address as DNS propagates.
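
    If you'd like to confirm which address your DNS currently returns for agents100.pingplotter.com, the short Python sketch below is one way to check. It's an illustration only, not an official PingPlotter tool; the variable names are ours, and the two IP addresses are the ones listed in this notice.

        import socket

        NEW_IP = "172.214.6.221"   # new endpoint address from this notice
        OLD_IP = "20.62.235.32"    # address lost during maintenance

        # Resolve the hostname and collect every address the resolver returns.
        addrs = {info[4][0] for info in socket.getaddrinfo("agents100.pingplotter.com", 443)}
        print("agents100.pingplotter.com resolves to:", ", ".join(sorted(addrs)))

        if NEW_IP in addrs:
            print("Your resolver has picked up the new address.")
        elif OLD_IP in addrs:
            print("Still seeing the old address; DNS can take up to 48 hours to propagate.")
        else:
            print("Unexpected address; check your local DNS cache or reach out to support.")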

Jun 2024

May 2024

Performance Issue
  • Update

    Some final details about this incident:

    We found an insidious lock that was triggered when newly arrived data was being analyzed by an alert at the same time a user was scrolling through a time graph of that same target. In a rare, timing-dependent case, this caused a deadlock, which then backed up the data analysis server handling it (we run many of these in parallel). The deadlock could, but didn't always, take several minutes to release, and while it held, further work on that server would queue up behind it. Depending on how busy the server was, it could be unusable for many minutes, which affected access to any sessions being processed by that server. Whoa, complicated! (A simplified illustration of how this kind of deadlock can arise follows at the end of this update.)

    This bug happened twice on Friday (May 3rd) morning and has not happened since. Once we found a way to recognize it, we kept a close eye on it and remediated before it could create an issue.

    We rolled out a fix for that bug yesterday and are confident that particular bug is squashed.

    Thanks for your understanding and patience!
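
    For readers curious how two routine activities can wedge a server like this, one common cause is inconsistent lock ordering between two code paths. The Python sketch below is purely illustrative: the names (data_lock, graph_lock, analyze_alert, scroll_graph) are made up for this example, and it is not PingPlotter's actual code. It simply shows two threads each holding one lock while waiting on the other; the usual fix is to acquire locks in a single consistent order (or share one lock).

        import threading
        import time

        data_lock = threading.Lock()    # guards newly arrived sample data
        graph_lock = threading.Lock()   # guards the time-graph view state

        def analyze_alert():
            # Alert analysis path: takes the data lock first, then needs the graph state.
            with data_lock:
                time.sleep(0.2)  # widen the window so both threads hold their first lock
                if graph_lock.acquire(timeout=1):
                    graph_lock.release()
                else:
                    print("alert thread: stuck waiting for graph_lock (deadlock)")

        def scroll_graph():
            # Graph-scrolling path: takes the graph lock first, then needs the sample data.
            with graph_lock:
                time.sleep(0.2)
                if data_lock.acquire(timeout=1):
                    data_lock.release()
                else:
                    print("graph thread: stuck waiting for data_lock (deadlock)")

        t1 = threading.Thread(target=analyze_alert)
        t2 = threading.Thread(target=scroll_graph)
        t1.start(); t2.start()
        t1.join(); t2.join()
        # Fix: both paths acquire the locks in the same order (e.g. data_lock, then
        # graph_lock), so neither can hold one lock while waiting on the other.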

  • Resolved

    Our team has successfully identified the root cause of the issue. While a comprehensive fix is still in progress, we have implemented effective remediation steps that have stabilized the system. At this time, the issue should no longer affect your user experience. We are diligently working on a permanent solution to ensure this issue does not recur.

  • Identified

    It looks like there's another issue we haven't yet pinned down: the problem came back up on a different server. We're remediating and investigating.

  • Resolved

    One of the back-end "computation" servers was overloaded through an unexpected code path. This caused other servers to misreport their statuses as well, so the problem looked more widespread than it really was. Because of this, we restarted more than we needed to.

    Everything is fully operational - all reports, views, summaries and quality monitor views should be accurate and up to date. No data was lost. We are investigating ways to reduce the likelihood of a repeat event.

    Thanks for your patience and understanding!

  • Identified

    We found an issue with one of the back-end data collection servers and are restarting it. Most targets are working, and more are coming back online.

  • Investigating

    There's an issue with viewing summaries and agent target lists. We're investigating.
