Frostmourne Season 3 was our most successful season yet, in both popularity and stability. The Season 1 launch crashed because of a faulty gamemaster command, and the Season 2 launch crashed when we unexpectedly reached the limit of our item GUID pool allocator. With Season 3 we have finally broken the curse: the realm launched with perfect stability and was only restarted three days later for updates. Unfortunately, as in previous launches, the first few minutes were plagued by network lag because we once again exceeded our available bandwidth capacity. In earlier launches we indirectly benefited from the instability, since the crashes delayed players logging back in and reduced the overall peak bandwidth usage. Because this launch was stable, the spike was higher and lasted longer than ever before until the network was able to catch up.
Our goal for a realm launch is to ensure a smooth and stable experience, but also to give as many players as possible a fair and equal chance at obtaining realm firsts. Because of this, we do not believe in limiting the initial wave of players. While we have made great improvements in this area by using clever batching and compression to achieve better-than-original bandwidth usage, we are ultimately limited by the client’s own outdated and inefficient network protocol.
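To illustrate why batching and compression help with bandwidth, here is a minimal sketch of the general idea. It is not our actual packet code: the message format, the 2-byte length prefix, and the names `pack_batch` and `unpack_batch` are assumptions made up for this example. Many small, similar messages are framed into one buffer and compressed once, so the compressor can exploit the repetition between them.

```python
import struct
import zlib


def pack_batch(messages):
    """Length-prefix each message, concatenate them, and compress the batch once."""
    # "<H" is a little-endian unsigned 16-bit length, so each message
    # must be shorter than 65536 bytes in this hypothetical framing.
    framed = b"".join(struct.pack("<H", len(m)) + m for m in messages)
    return zlib.compress(framed)


def unpack_batch(blob):
    """Reverse pack_batch: decompress, then split on the length prefixes."""
    framed = zlib.decompress(blob)
    messages = []
    pos = 0
    while pos < len(framed):
        (size,) = struct.unpack_from("<H", framed, pos)
        pos += 2
        messages.append(framed[pos:pos + size])
        pos += size
    return messages
```

Compressing each small message on its own pays the fixed compression overhead every time and finds little to compress, which is why the batch tends to come out far smaller for repetitive update traffic.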
In 2021 we completed the major milestones needed to support server-side instancing of entire world zones and applied this then-new technology to Dalaran. Updating Dalaran independently of the rest of the Northrend map increased parallelization and significantly reduced world processing time. Instancing Dalaran was possible at the time because of its unique position and limited script/NPC usage, and we knew further work would be required to instance other world zones. However, because the performance gain was already significant, we felt continued work was no longer a priority and shifted development focus to other tasks. It was not until just before the launch of Frostmourne Season 3 that we restarted development, because we expected the sheer number of players questing and leveling in the open world to cause substantially higher server load than on our other realms. To support this, we had to rework parts of several subsystems that previously stored data globally, which is not ideal when the goal is to increase parallelization.
While we could have simply added locking around those datasets for synchronization, the latency and sometimes unpredictable nature of locks are not in line with our overall performance goals. We decided to do it right and move the required data objects to the map’s local context, which ensures thread safety and rules out lock contention. After completing these changes, we began by testing the effects of instancing Wintergrasp on Icecrown, with very good results, as seen in the images below.
As seen, instancing Wintergrasp completely eliminated the spikes in average server latency that occurred during heavy battles, when several hundred players were PvPing in a small area. After the successful Wintergrasp test, we also applied instancing to both Sholazar Basin and Zul'Drak, further reducing overall server load.
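The move from globally stored data to the map's local context, described above, can be sketched roughly as follows. This is an illustration only, not our server code: `MapContext`, `update_world`, and the fields shown are hypothetical names. The point is the ownership structure: each map carries its own mutable state, and each map is updated by exactly one worker per tick, so no lock is needed.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class MapContext:
    """All mutable state for one instanced zone; nothing here is shared."""
    name: str
    tick_count: int = 0
    pending_events: list = field(default_factory=list)

    def update(self, diff_ms):
        # Only this map's own data is touched, so no lock is required
        # as long as a single worker updates this map during a tick.
        self.tick_count += 1
        self.pending_events.append((self.tick_count, diff_ms))
        return self.name, self.tick_count


def update_world(maps, diff_ms, workers=4):
    """Update every map in parallel; each map is handled by one worker per tick."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map hands each MapContext to exactly one task.
        return list(pool.map(lambda m: m.update(diff_ms), maps))
```

Python threads will not show the CPU speedup a native server gets, so treat this as a picture of the data layout rather than a benchmark. Had the event list been a single global shared by all maps, every `update` call would have needed to synchronize on it.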
Another performance issue we faced with Frostmourne Season 3 was delayed auction house search responses. Our threading model at the time could not scale with the significantly higher item counts produced by the cross-faction auction house feature, which is new to Season 3. Previously, we had a fixed total of two search threads, one for Alliance and one for Horde, and implementation limitations prevented us from creating more. While this worked great for years on all our current realms, where the auction house is separated, it could not keep up with a combined auction house. The cost of running up to 8 individual category sorts on a list of 150,000+ items grew to the point where the search queue would back up during peak hours, leaving players either completely unable to search for items or waiting very long periods between searches, which is unacceptable. Our answer was to completely rewrite the threading model with scalability in mind, allowing us to dynamically choose the number of worker threads needed, with no limitations.
Additionally, we moved the very CPU-intensive search process to our realm server, freeing up precious CPU time on the worldservers and reducing context switches.
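As a rough picture of the rewritten search threading model described above, here is a minimal sketch of a search pool with a configurable number of workers pulling from a shared request queue. It is not our implementation: `SearchPool`, the item fields, and the filter and sort logic are assumptions for illustration, and the worker count here is fixed at construction rather than adjusted at runtime.

```python
import queue
import threading


class SearchPool:
    """Hypothetical search pool: N workers serve requests from one queue."""

    def __init__(self, items, workers):
        self.items = items  # treated as a read-only snapshot by all workers
        self.requests = queue.Queue()
        self.threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(workers)
        ]
        for thread in self.threads:
            thread.start()

    def _run(self):
        while True:
            job = self.requests.get()
            if job is None:  # sentinel: shut this worker down
                break
            name_filter, sort_keys, reply = job
            matches = [it for it in self.items if name_filter in it["name"]]
            # Multi-key sort using stable sorts: least significant key first.
            for key in reversed(sort_keys):
                matches.sort(key=lambda it: it[key])
            reply.put(matches)

    def search(self, name_filter, sort_keys):
        """Queue one search and block until a worker has answered it."""
        reply = queue.Queue(maxsize=1)
        self.requests.put((name_filter, sort_keys, reply))
        return reply.get()

    def close(self):
        """Send one sentinel per worker, then wait for all of them to exit."""
        for _ in self.threads:
            self.requests.put(None)
        for thread in self.threads:
            thread.join()
```

For example, `SearchPool(items, workers=8).search("Frost", ["price", "level"])` would return the matching items ordered by price and then by level. Because the queue is no longer tied to one thread per faction, a backlog can be addressed by raising the worker count.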