Benchmarks and beta.38
Hello everyone! Here are some updates on the beta.38 release!
I'm still working on the benchmarks. I started by thoroughly testing all the features and running the load tests, and I've already found all kinds of issues to tackle (some related to users, NPCs, pathfinding, etc.).
Everything has been added to the board, so you can see the bug list, what has been fixed, and the issues in progress:
https://github.com/users/damian-pastorini/projects/2/views/1
As for the load tests, here's a preview of the numbers I've been looking at:
- We set up a server on AWS EC2: a t4g.small with an RDS instance (probably the most basic setup).
Side note: as mentioned before, we will include a guide on how to do this yourself.
- In all the tests, memory consumption was ridiculously low, below 150 MB, no matter how hard we pushed it (this was good).
- As the first test case, I created a big room (145x145 tiles, with some space to move around plus random trees and collision elements) with 400 NPCs, and connected 12 CCUs > All good, the server stayed below 10% CPU and gameplay was excellent.
- Then we pushed the CCUs and added 10 more (400 NPCs + 22 CCUs) > Strangely, the game started to get a bit laggy (far from unplayable, but the player response was not as good as it was initially). I was trying to analyze the problem, but before I could dig any deeper my friend ran a quick test and changed the instance type on AWS from t4g.small to c7g.large. With that, at 20 CCUs everything was fine again, but it got a bit laggy with over 40 CCUs, which is not what I'm looking for.
- As another quick test, I decided to drop the number of NPCs from 400 to 20 and retry > The results were EXACTLY the same: as the number of users increased, the server became more unstable, no matter how many packets were being sent/received or how much content was moving around the room. The issue is strictly related to players (since NPCs and players all use the same physics system and bodies, we ruled out any issues related to that).
- At that point, I decided to stop the tests on AWS and run some more tests locally to rule out any network issues once and for all > The result was the same: with over 60 CCUs the game became laggy locally.
With that being said, the first thing to do is to finish fixing the bugs found. Fixing some of them has already improved performance, but none of those fixes had an impact on the CCUs issue.
Once I have those bugs fixed, I will continue with the following test cases:
- Test by adding multiple rooms and splitting users between them, to see how many rooms and users we can hold this way, and how multiple rooms impact the load.
- Test by disabling feature by feature (given how Reldens was built, I should be able to do this with a simple field switch). This way I should be able to identify whether a specific part of the code is causing the issue.
- Depending on how the above cases go, I would test by splitting the users over multiple rooms with each room running in a different process (to see if multiple processes help with the issue).
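To make the "feature by feature" idea a bit more concrete, here is a minimal sketch of how such a test plan could be generated: one baseline run with everything enabled, then one run per feature with only that feature switched off. This is only an illustration. The feature names, the `TestRun` shape and the `buildAblationRuns` helper are all hypothetical and are not the actual Reldens configuration or API.

```typescript
// Hypothetical map of feature name -> enabled flag.
// Real Reldens feature keys and the way they are toggled may differ.
type FeatureFlags = Record<string, boolean>;

interface TestRun {
    label: string;
    flags: FeatureFlags;
}

// Builds one baseline run (all features on) plus one run per feature
// where only that single feature is disabled.
function buildAblationRuns(features: string[]): TestRun[] {
    const allEnabled: FeatureFlags = {};
    for (const name of features) {
        allEnabled[name] = true;
    }
    const runs: TestRun[] = [{ label: 'baseline', flags: { ...allEnabled } }];
    for (const name of features) {
        runs.push({
            label: 'without-' + name,
            flags: { ...allEnabled, [name]: false }
        });
    }
    return runs;
}

// Example with made-up feature names.
const runs = buildAblationRuns(['chat', 'inventory', 'skills']);
for (const run of runs) {
    console.log(run.label, JSON.stringify(run.flags));
}
```

Running the same load test (same room, same number of CCUs) once per entry and comparing each result against the baseline should show whether turning off one specific feature makes the lag go away.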
Stay tuned! I will keep updating the board with the progress and will try to finish with the issues ASAP!