Fl-MICRON-2 - File Manageme...

Resolved

We are happy to report that the full Micron-2 pipeline has now been stabilized. Nonetheless, in line with our strictly enforced internal stability and performance protocols, we are adding a brand-new Micron-6 cluster to replace Micron-2. This version of Micron will be retired after about a year in service and reinstalled to meet the high standards for stability and reliability in the Square Cloud. We apologize for the recent instability; it is not up to our usual standards.

Despite a disk failure during this period, which we discovered only recently, we can report that not a single byte of data was lost. This speaks to the reliability of our systems and the quality of the Micron equipment. Thank you to Micron for providing superb hardware.

Going forward, we will move applications from FL-MICRON-2 to FL-MICRON-4-6 in order to add security checkpoints. This will help ensure continued operational excellence and reflects our resolve to stay online for our most important partners.

We thank you for your understanding and continued trust in our services as we work to maintain the best reliability and performance standards.

Correction: most applications were unavailable for 8 minutes and a few seconds. However, applications in isolated sectors experienced about five times as much downtime. We sincerely apologize for this downtime.

Note: the disk failure has not been fully confirmed. The cause could instead be solely a RAID controller failure and/or a software failure; this will be completely remodeled later today.

Posted 15 Jun at 06:53am GMT-3.

Re-appeared

We are currently experiencing ongoing instability and have been in contact with datacenter staff over the last few hours. We apologize for any inconvenience this may have caused. We suspect that one of the disks in our RAID array failed recently, which could be one of the causes. We are still waiting for our datacenter's technical team to verify this. Rest assured that the system is up and running and requests are still coming in. Some applications, however, may have been interrupted during boot.

Posted 15 Jun at 04:37am GMT-3.

Resolved

We are pleased to inform you that the instability has been resolved. There was a total application downtime of 8 minutes and 37 seconds. We sincerely apologize for any inconvenience this may have caused. Applications that were completely offline will automatically restart within a few seconds.

Posted 15 Jun at 02:19am GMT-3.

Updated

We sincerely apologize for the disruption. We have detected that the cluster's activity is currently at only 10%, so we have decided to proceed with a quick restart for maintenance purposes. We understand the inconvenience this may cause and appreciate your patience and understanding during this necessary maintenance window.

Posted 15 Jun at 02:11am GMT-3.

Updated

We apologize for the instability in the management of services. To ensure security, boot and restart actions will be gradually released via API to all users within 15 minutes. We appreciate your understanding and patience as we prioritize system stability.

Posted 15 Jun at 12:32am GMT-3.

Updated

The file manager is active again. We apologize for the inconvenience and will continue to monitor the situation. Application launches and restarts on fl-micron-2 remain blocked during the instability. We ask for your understanding.

Posted 15 Jun at 12:30am GMT-3.

Updated

The cause of the instability has not yet been fully identified. In line with our performance protocols, we will restart some environments of this cluster, such as the web component and internal proxies, with a focus on reducing accumulated load (connections active for more than ~220 days will be reset).

Posted 15 Jun at 12:25am GMT-3.

Created

We have detected instability affecting file listing and initializations in the FL-MICRON-2 cluster. The engineering team is investigating.

Posted 15 Jun at 12:09am GMT-3.