Our video hosting provider experienced intermittent video playback issues, which affected some users of the Felix platform. Users who tried to play a video may have seen a “Sorry, this video does not exist” error message.
The video host's early attempts to resolve the problem were unsuccessful, but normal operations were restored shortly afterwards. The root cause was an underlying player service that suffered from performance issues and inefficient load balancing. Traffic surges pushed this service beyond its healthy capacity, leading to elevated error rates.
The outage triggered alerts in the provider's alerting system. Work has already begun on replacing an aging component of the player infrastructure, and the deployment timeline for that update has been accelerated.
In the meantime, changes to the underlying RPC layer have improved performance, allowing the infrastructure to handle existing loads more efficiently.
To prevent similar incidents in the future, our video provider is updating the architecture of its internal player services to improve latency and connection handling. This improvement is already in internal testing and should be fully deployed by the end of April.