[SOLVED] Optimizing frame stability / Avoiding garbage collector-related hitches

cornedbeer · May 16, 2022, 8:03pm

Recently I’ve been spending a lot of time trying to optimize my game to ensure framerate never drops below 60fps, even when I have way more enemies active at once than I plan to in the final release, so I have a good amount of headroom moving forward.

After making lots of tweaks such as splitting performance-heavy AI tasks such as pathfinding over multiple frames and ensuring only one enemy is generating a path at once, minimizing the number of collisions that are occuring, and of course the basic lua optimization stuff (making references ahead of time, reusing vectors and tables whenever possible), I’m hitting a bottleneck with the garbage collector. Despite my optimization efforts, the amount of enemies I’m stress testing still generate a decent bit of garbage every frame, and whenever the garbage collector’s threshold is reached (every few seconds) there is a noticable frame-drop, usually down to 30-40fps for the frame. Of course, with the amount of enemies that will be in typical encounters, the garbage collector won’t be triggered nearly as often, but when it is triggered I suspect a similar frame drop will occur, because it doesn’t seem to account for the amount of time elapsed in the frame (which makes sense because it’s just the normal lua garbage collector). I’ve verified the garbage collector is the issue by stopping it, and profiling the game. The frame drops are completely gone without it.

After figuring this out, I’ve been experimenting trying to manually trigger garbage collection steps on less intensive frames, which seems promising but still isn’t working quite how I’d think it would. I’d really appreciate any input:

First off, I stop the automated garbage collector, and trigger a complete collectgarbage() cycle whenever frame stability isn’t important (loading/unloading a room for example).

In the update loop of my level controller script (which I’ve verified is the first update to run in a frame by printing in it, in all the entities in the level, and in the render script - which comes later in the pipeline), I store os.clock() in a shared module to mark the start time of a frame. If anyone has any ideas of an earlier “start point” I have access to every frame that would probably help.

At the end of my render script’s update I have this:

local start_time = shared.start_time
	if start_time then
		local diff = os_clock() - start_time
		local time_left = fps_target - diff
		if time_left > collection_limit then
			local done 
			-- until garbage is all collected or further collection would exceed the threshold
			while not done and fps_target - diff > collection_limit do
				-- execute a step of garbage collection
				done = collectgarbage("step", 1)
				-- time elapsed 
				diff = os_clock() - start_time
			end	
		end
		
	end
	-- executing a garbage collector step seems to restart it, so stop it again
	collectgarbage("stop")

The fps_target is 1/60, since I never want the framerate to drop below 60fps, and the collection_limit is a variable number of seconds to reserve for other things that won’t be caught in diff (I’ve experimented with everywhere from 0.002 to 0.01).

what it does is calculate the amount of time left in that frame, assuming fps_target is the limit and that start_time was indeed taken from close enough to the start of the engine frame. Then, loop garbage collector steps until all garbage has been cleaned or the elapsed time of that frame (including the collector loops) would exceed the amount of ms to reserve for other tasks.

I’ve also experimented with passing different values to the collectgarbage(“step”) call, without making much difference in the issue I’ll outline. This is a large improvement over the automated collection for sure, the frame rate is much more stable, and when it dips it’s not nearly as hard, more like a 17-18ms frame vs like 30+ms, but I don’t understand why it’s happening. Even if I set the collection_limit to something extreme like 10ms, when that frame dip occurs the profile says my render script took around 11ms to execute, but it should have broken out of the garbage collector loop once 6ms had elapsed from the start of the level controller update, let alone the render script update.

Anyway, sorry for the long post. If anyone has some insight into how to handle memory management or notices I’ve done something done I’d really appreciate it!

britzl · May 18, 2022, 4:16am

I’m also interested in learning more about the best practices for GC optimization in Lua. There’s surprisingly few articles and discussions on the matter. I’ve seen best results when running the GC at regular intervals and in small steps. This avoids big hitches.

cornedbeer · May 18, 2022, 3:50pm

Yeah, my initial go at tackling this problem was just to run a step every frame, but I wasn’t really able to find a good balance of step size to stay on top of memory accumulation, while ensuring heavy frames were never too taxed by it. Additionally I really wanted to find a solution that isn’t so framerate-agnostic, because it’s frustrating seeing frame-dips caused by garbage collection surrounded by frames with several ms to spare, knowing there’s a lot of performance that could be squeezed out of them.

After some more experimentation the past day, I found a solution that seems to be working pretty well! I was on the right track before, but I think I may have been running into some approximation errors via os.cock() that meant I couldn’t find a balance of a tight enough garbage collection limit to keep the framerate from dipping below 60fps on otherwise busier frames, while keeping it broad enough to stay on top of the memory accumulation of these heavy frames. Additionally, running this sort of loop in a render script seems very unstable after I did more testing with it; I would seemingly randomly get crashes, I think when the loop made the render script’s update take too long, perhaps triggering some sort of failsafe.

To counteract these issues, I moved the garbage collector looping to my main loader script, which always runs after other updates, with the exception of the render script (The render script is fairly stable, and obviously needs to do the same thing every frame, so I feel comfortable just reserving a constant number of ms for it) and, most importantly, structured my code so that it would cycle between heavy processing frames (for now basically just the enemy pathfinding), and garbage collection frames.

The basic flow of it is:
enemy needs path → sends a request to the main controller where it is put on a queue

on controller update, if not shared.pathfinding_sent → send a message to the front of the queue, removing that element and setting the shared.pathfinding_sent flag

enemy receives the response to find a path and is given the sole token to process pathfinding → sets shared.block_collection flag; starts the pathfinding process, which is split into a coroutine-esque structure to run over multiple frames if need be, and to never consume more than ~3ms of a frame

if an enemy has the token on its update loop → continue the pathfinding, unless the process is complete, in which case give up the token and unset the shared.block_collection flag

Then, in the loader script (the outer level bootstrap collection that loads the actual scenes of the game):

-- toggle between intensive processing in enemy AI, and frames reserved for garbage collection
	if not shared.block_collection then 
		
		local start_time = shared.start_time
		if start_time then
			local diff = os_clock() - start_time
			local done
			
			-- step through garbage collection cycle until it is done or the frame is approaching 60fps
			while not done and fps_target - diff > collection_limit do
				done = collectgarbage("step", 1)
				
				local check = os_clock()
				diff = check - start_time
			end
			if done then
				-- could be a more generic state, but for now pathfinding is the only thing to alternate with collection
				shared.sent_pathfinding = nil 
			end
			
		end
		-- have to stop garbage collector again after stepping it, to avoid automatic collection cycles
		collectgarbage("stop")

The start_time is set from os.clock() at the start of the level controller’s update, which is the first update of the frame. When the garbage collector is freed (an enemy has finished the pathfinding cycle),
while garbage collector hasn’t returned a finished cycle, as long as 60fps - the time elapsed so far this frame doesn’t exceed the threshold set to reserve some time for other processing every frame, take a garbage collector step. Once collector is done, unset the shared.sent_pathfinding flag, so the next enemy on the queue (if it needs one) can be sent a pathfinding request and the cycle starts again.

Doing it this way has allowed me to stay on top of memory accumulation while never dropping below 60fps in my stress test, so I feel pretty confident in the performance moving forward. Hopefully my explanation makes sense and can be of some help to people.

TL;DR
If you want to get the most out of each frame, in a game that has to be smooth and responsive while having some pretty processing-intensive features, I would recommend doing some manual memory management. Also think about ways you can structure your code where any processing that isn’t critical to run EVERY frame is split up to run over multiple frames, and staggered by frames that are reserved for garbage collection (and of course any basic stuff that has to happen every frame).

Lastly, the first advice anyone will give, but it is still relevant, is to try to minimize the amount of junk you accumulate every frame to begin with, to lighten the load this manual collection has to do. It’s a good habit to get into to avoid making a bunch of local data in loops, such as tables, vectors, functions, and to reuse these whenever possible, clearing or resetting the values rather than creating a brand new table, and never declaring a function within a loop when you can avoid it.

cornedbeer · May 11, 2023, 5:31pm

I’ve recently discovered an issue with Lua’s GC that may be good to know for anyone who was interested in this thread originally.

If you’re doing something like the method outlined in this thread to manually step the garbage collector, you may run into instances where a single step can take way too long. It took me a while to debug this, and for a while I was worried there was just something inherently non-deterministic about how the GC defines an undivisible step.

I can only speculate on the underlying reason why, but this is caused by having a table with a ton of direct elements. This probably won’t be all that impactful below like 1 million or so direct elements of a table, I only started running into it because I pre-process a ton of tilemap data, pathfinding data, etc. and stored them completely unfolded so I could access information on a given tile with a single lookup, rather than having to traverse a few nested tables.

Luckily, the only thing that causes this long step is having too many elements in a given table. Luckily, the number of elements isn’t recursive, so a table containing 100 tables of 100,000 elements each won’t noticeably impact the collector, but a table with 10,000,000 elements will cause horrible hitches on every step that “examines” that reference.

Assuming you’re already doing manual garbagecollector stepping, you can verify this is the case by putting this code in the init function of some object that won’t be deleted and printing whenever a step takes an unreasonable amount of time:

    self.foo = {}
	local outer_max = 1000
	local inner_max = 10000000 / outer_max
	for i = 1, outer_max do
		local t = self.foo[i] or {}
		self.foo[i] = t
		local row_offset = ((i - 1) * inner_max)
		for j = 1, inner_max do
			t[j + row_offset] = true
		end
	end

As you set outer_max higher (until you set it so high that (inner_max / outer_max) < outer_max), the long collector step will get shorter until it’s not an issue, despite the total number of data entries, and even unique indices, being stored staying the same.