We noticed an issue with the OpenShift router infrastructure and are working on a fix.
Update 09:08: We've identified the issue with the OpenShift routers and are working on getting them up and running again.
Update 09:55: We still do not know the root cause. We have contacted the vendor's support for additional expertise.
Update 10:27: We have not yet heard back from the vendor. Meanwhile, we are investigating further, but so far we have not been able to find the root cause or a means of mitigating the issue.
Update 10:45: We are in contact with an engineer from the vendor and are analyzing the issue further with our own engineers.
Update 11:12: One of our engineers is in a call with the vendor. In addition, our engineers are analyzing HAProxy, which keeps failing. As soon as one of the HAProxy instances receives traffic, it crashes and the second one takes over, which then crashes as well. We are currently researching several limit-related HAProxy settings. We are also in contact with our infrastructure provider to see whether there is any abnormal incoming traffic.
Update 11:27: We have identified an unusual amount of unwanted traffic from one source. After blocking this source, we are seeing an improvement and the system is slowly recovering. The investigation is ongoing to find the origin of the traffic and the root cause of the failing HAProxy.
Update 11:45: The sites are reachable again. We are investigating why the unusual traffic caused HAProxy to fail and what we can do to prevent something like this from happening again.
Update 11:59: We were able to mitigate the symptoms and the platform is stable again. We will follow up with a post-mortem of the incident.