Previous incidents

September 2024
Sep 30, 2024
1 incident

Image builds are down

Downtime

Resolved Sep 30 at 05:17pm EDT

Image builds recovered.

Sep 24, 2024
1 incident

CPU container performance degradation from over-scheduling on GCE n1-standard...

Resolved Sep 24 at 01:51pm EDT

Between Monday 23rd 15:54 UTC and 15:36 UTC, a change was made to have GCP workers poll more quickly for containers, which caused them to pick up more CPU-only containers.

Between 13:48 UTC and 16:18 UTC on September 24th, CPU containers were allowed to be scheduled on GCP NVIDIA T4 workers, which are n1-class GCE VMs.

Both of these changes combined to cause over-scheduling of CPU work onto GCP T4 workers and significantly degrade the performance of the hosted containers (whether CP...
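The incident windows above are given in UTC, while the resolution timestamps on this page are in EDT. A minimal sketch for lining the two up, assuming a fixed UTC-4 offset (valid for US Eastern time only while daylight saving is in effect); the helper names `utc_to_edt` and `edt_to_utc` are illustrative and not part of any Modal tooling:

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-4 offset, which holds for US Eastern time only
# while daylight saving is in effect (as it was in September 2024).
EDT = timezone(timedelta(hours=-4), "EDT")


def utc_to_edt(year, month, day, hour, minute):
    """Interpret the arguments as a UTC wall-clock time and convert it to EDT."""
    moment = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    return moment.astimezone(EDT)


def edt_to_utc(year, month, day, hour, minute):
    """Interpret the arguments as an EDT wall-clock time and convert it to UTC."""
    moment = datetime(year, month, day, hour, minute, tzinfo=EDT)
    return moment.astimezone(timezone.utc)


# Scheduling window from the Sep 24 incident, stated in UTC.
window_start = utc_to_edt(2024, 9, 24, 13, 48)
window_end = utc_to_edt(2024, 9, 24, 16, 18)

# Resolution time from the status page, stated in EDT.
resolved_utc = edt_to_utc(2024, 9, 24, 13, 51)

print(window_start.strftime("%H:%M %Z"), "to", window_end.strftime("%H:%M %Z"))
print("resolved at", resolved_utc.strftime("%H:%M UTC"))
```

Under that assumption, the 13:48 to 16:18 UTC window corresponds to 09:48 to 12:18 EDT, and the 01:51pm EDT resolution corresponds to 17:51 UTC.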

Sep 13, 2024
1 incident

Container filesystem 'file not found' bug

Resolved Sep 13 at 01:59pm EDT

At 12:57 UTC, a container filesystem change rolled out across workers, causing spurious 'file not found' (ENOENT) errors against the container's rootfs.

We identified the bad commit and rolled back to prevent new workers from picking up the bug.

We then worked to roll over all running production workers that were affected by the bad change and to scale up new workers.

All affected workers have been quarantined, so new containers will start on new, healthy workers.

We a...

Sep 06, 2024
1 incident

CPU functions on GCP are down

Downtime

Resolved Sep 06 at 11:05am EDT

CPU functions on GCP recovered.

Sep 05, 2024
1 incident

CPU functions on GCP are down

Downtime

Resolved Sep 05 at 10:09am EDT

CPU functions on GCP recovered.

Sep 04, 2024
1 incident

CPU functions on GCP are down

Downtime

Resolved Sep 04 at 12:18pm EDT

CPU functions on GCP recovered.

August 2024
No incidents reported
July 2024
Jul 25, 2024
1 incident

Ephemeral apps and image builds are degraded.

Resolved Jul 25 at 02:50am EDT

Trigger processing problems have caused issues in ephemeral Apps and modal.Image builds. We're investigating.

Jul 16, 2024
1 incident

Modal function executions are degraded

Degraded

Resolved Jul 16 at 02:43pm EDT

The system has recovered, and function executions should be proceeding as normal. The team is on standby for any further issues.

Jul 12, 2024
1 incident

CPU functions on GCP, Web endpoints, and 1 other service are down

Downtime

Resolved Jul 12 at 07:35pm EDT

Web endpoints recovered.
