Recently I’ve been spending a lot of time trying to optimize my game to ensure framerate never drops below 60fps, even when I have way more enemies active at once than I plan to in the final release, so I have a good amount of headroom moving forward.
After making lots of tweaks, such as splitting performance-heavy AI tasks like pathfinding over multiple frames, ensuring only one enemy is generating a path at once, minimizing the number of collisions that are occurring, and of course the basic Lua optimization stuff (making references ahead of time, reusing vectors and tables whenever possible), I’m hitting a bottleneck with the garbage collector. Despite my optimization efforts, the number of enemies I’m stress testing with still generates a decent bit of garbage every frame, and whenever the garbage collector’s threshold is reached (every few seconds) there is a noticeable frame drop, usually down to 30-40fps for that frame. Of course, with the number of enemies that will be in typical encounters, the garbage collector won’t be triggered nearly as often, but when it is triggered I suspect a similar frame drop will occur, because it doesn’t seem to account for the amount of time elapsed in the frame (which makes sense because it’s just the normal Lua garbage collector). I’ve verified the garbage collector is the issue by stopping it and profiling the game. The frame drops are completely gone without it.
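To put a number on "a decent bit of garbage every frame", something like the sketch below could help. It only uses the standard collectgarbage("count") call; the helper name make_alloc_meter and the loop standing in for a frame's work are made up for illustration, and the reading is only meaningful while the collector is stopped.

```lua
-- Hypothetical helper: returns a function that reports how many kilobytes of
-- Lua memory were allocated since the previous call. Assumes the collector
-- did not run in between; otherwise the delta can be negative.
local function make_alloc_meter()
    local last_kb = collectgarbage("count")
    return function()
        local now_kb = collectgarbage("count")
        local delta = now_kb - last_kb
        last_kb = now_kb
        return delta
    end
end

-- Usage: call the meter once per frame, e.g. at the end of update.
collectgarbage("stop")
local meter = make_alloc_meter()
local junk = {}
for i = 1, 1000 do
    junk[i] = { x = i, y = i } -- stand-in for the allocations of one frame
end
local allocated_kb = meter()
print(string.format("allocated this frame: %.1f KB", allocated_kb))
```

Logging this per frame should show which systems produce the most garbage and how quickly the threshold gets reached.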
After figuring this out, I’ve been experimenting with manually triggering garbage collection steps on less intensive frames, which seems promising but still isn’t working quite how I’d expect. I’d really appreciate any input:
First off, I stop the automated garbage collector, and trigger a complete collectgarbage() cycle whenever frame stability isn’t important (loading/unloading a room for example).
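As a rough sketch, that first step might look like this. The function name on_room_changed is hypothetical; the only real calls are the standard collectgarbage options. The extra "stop" after the full collection is there on the assumption that a full cycle re-arms the automatic collector the same way a step seems to.

```lua
-- Turn off automatic collection once at startup.
collectgarbage("stop")

-- Hypothetical hook: call this wherever a room finishes loading or unloading.
local function on_room_changed()
    local before_kb = collectgarbage("count")
    collectgarbage("collect") -- full cycle; a hitch here is not visible to the player
    collectgarbage("stop")    -- assumption: the full cycle re-enables automatic collection
    local after_kb = collectgarbage("count")
    return before_kb - after_kb -- kilobytes reclaimed, handy for logging
end

-- Usage: log how much a room transition freed.
local reclaimed_kb = on_room_changed()
print(string.format("reclaimed %.1f KB", reclaimed_kb))
```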
In the update loop of my level controller script (which I’ve verified is the first update to run in a frame by printing in it, in all the entities in the level, and in the render script - which comes later in the pipeline), I store os.clock() in a shared module to mark the start time of a frame. If anyone has any ideas of an earlier “start point” I have access to every frame that would probably help.
At the end of my render script’s update I have this:
local start_time = shared.start_time
if start_time then
    local diff = os_clock() - start_time
    local time_left = fps_target - diff
    if time_left > collection_limit then
        local done
        -- until garbage is all collected or further collection would exceed the threshold
        while not done and fps_target - diff > collection_limit do
            -- execute a step of garbage collection
            done = collectgarbage("step", 1)
            -- time elapsed
            diff = os_clock() - start_time
        end
    end
end
-- executing a garbage collector step seems to restart it, so stop it again
collectgarbage("stop")
The fps_target is 1/60, since I never want the framerate to drop below 60fps, and the collection_limit is a variable number of seconds to reserve for other things that won’t be caught in diff (I’ve experimented with everywhere from 0.002 to 0.01).
What it does is calculate the amount of time left in that frame, assuming fps_target is the limit and that start_time was indeed taken close enough to the start of the engine frame. Then it loops garbage collector steps until all garbage has been cleaned up or the elapsed time of that frame (including the collector loop) would eat into the time reserved for other tasks.
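The same logic can be written as a standalone function, which makes it easier to test outside the engine. This is only a sketch: the name collect_within_budget and the injectable clock and step parameters are my own additions, and the fake clock in the usage part just advances 1ms per call.

```lua
-- Runs collector steps until a cycle finishes or the frame budget is used up.
--   frame_start : clock value recorded at the start of the frame
--   budget      : seconds allowed per frame (1/60 in the post)
--   reserve     : seconds to keep free for work the clock will not see
--   clock, step : injectable for testing; default to os.clock and a real GC step
-- Returns whether a cycle finished and how many steps were executed.
local function collect_within_budget(frame_start, budget, reserve, clock, step)
    clock = clock or os.clock
    step = step or function() return collectgarbage("step", 1) end
    local finished, steps = false, 0
    -- The budget is only checked between steps, so a single slow step
    -- can still overshoot it.
    while not finished and budget - (clock() - frame_start) > reserve do
        finished = step()
        steps = steps + 1
    end
    collectgarbage("stop") -- a step seems to re-enable automatic collection
    return finished, steps
end

-- Usage with a fake clock that advances 1ms per call and a step that never finishes.
local now = 0
local function fake_clock()
    now = now + 0.001
    return now
end
local finished, steps = collect_within_budget(0, 1 / 60, 0.002, fake_clock,
    function() return false end)
print(finished, steps) -- false   14
```

With a 1/60s budget and a 2ms reserve, the loop stops once less than 2ms would be left, which is after 14 of the fake 1ms steps.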
I’ve also experimented with passing different values to the collectgarbage("step") call, without it making much difference to the issue I’ll outline. This is a large improvement over the automated collection for sure: the frame rate is much more stable, and when it dips it’s not nearly as severe, more like a 17-18ms frame vs 30+ms, but I don’t understand why it’s happening at all. Even if I set the collection_limit to something extreme like 10ms, when that frame dip occurs the profiler says my render script took around 11ms to execute, but it should have broken out of the garbage collector loop once roughly 6.7ms (16.7ms minus the 10ms reserve) had elapsed from the start of the level controller update, let alone the render script update.
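One possible explanation worth checking is that the loop only looks at the clock between steps, so a single step that happens to be slow can blow through the limit on its own. A related caveat is that os.clock() in standard Lua reports CPU time used by the program, not wall-clock time, so it may not line up exactly with what the profiler shows. The sketch below times each step separately to see how long the worst one takes; timed_steps and its parameters are hypothetical names, and the clock and step functions are injectable so it can be tested without the engine.

```lua
-- Hypothetical diagnostic: times each collector step individually and keeps
-- the worst case, to check whether one step can outlast the whole budget.
local function timed_steps(max_steps, clock, step)
    clock = clock or os.clock
    step = step or function() return collectgarbage("step", 1) end
    local worst, total, finished = 0, 0, false
    for _ = 1, max_steps do
        local t0 = clock()
        finished = step()
        local dt = clock() - t0
        total = total + dt
        if dt > worst then worst = dt end
        if finished then break end
    end
    collectgarbage("stop") -- keep automatic collection off afterwards
    return worst, total, finished
end

-- Usage: run up to 100 real steps and report the slowest one.
local worst, total = timed_steps(100)
print(string.format("worst step: %.3f ms, total: %.3f ms", worst * 1000, total * 1000))
```

If the worst step turns out to be several milliseconds, the overshoot comes from the step itself rather than from the loop condition.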
Anyway, sorry for the long post. If anyone has some insight into how to handle memory management or notices I’ve done something wrong, I’d really appreciate it!