How to measure performance for optimization?

rocamocha · August 23, 2022, 2:11am

What are some ways I can analyze the performance of a script more in-depth?
I’m trying to improve scalability of some things and I’ve used the profiler to determine what is taking the most time, and I’ve done any obvious optimizations I can think of, but for some more complex things I’m kinda lost on how to figure out where to start. Is there a way to time specific sections of my code? Is that even something that would be worth doing?

For context here are a few screenshots of the profiler.
This first one is the peak frame a few seconds after starting my game.

The second is after waiting a bit, probably a spike. But the spikes are semi-regular with this many objects, and are going up to 60-80 for the frametime but it’s kind of inconsistent (sometimes the spikes don’t happen at all for a really long time), so maybe it’s just some jankiness going on with my computer / background processes?

Pawel · August 23, 2022, 7:29am

Does each of the slimes have their own script and do something heavy in their update()?

Is so, you can think of making one game object with a slime_controller.script, that will be updating position and state of each of the slimes (which will be then only game objects with sprites + eventually collision shapes+script for taking collision messages, but only this) instead, this should impact the performance the most

rocamocha · August 23, 2022, 8:10am

I was afraid this was going to be the answer… I’ve been wanting to refactor to that for a while but my big question has been how do you handle initialization?

Do I initialize the state in the slime script and “subscribe” to the updates handled by the manager through a module? In other words, how do I let the single manager script know what to update?

Would it make sense to store the return from collectionfactory.create() into a module and iterate over the table containing those to pass them as a parameter for my logic functions? Should it be through message passing the id instead to avoid having to cleanup the module between scenes, since it will just be a new manager in the new scene?

There’s so many questions that I don’t know where to even start with, hence why I’ve been putting it off for so long…

Pawel · August 23, 2022, 9:07am

No problem, I’ve been putting it for so long too!

I am utilizing some of the ECS ideas. I have a script that has a table with tables containing data about objects to spawn, their properties and so on and where and when to spawn them. This script is a controller, so it responds to triggers and then traverses through this table to find objects meeting given criteria - then spawns them using factory. The object itself don’t have any logic except of message handling for collisions - that I actually pass to the controller. Then the controller given updates about their triggers (like found player, detected wall or so) updates their state and their positions basing on that state and its internal logic for such objects.

It may sound complicated, but I advise you to start with a single controller and a single object only.
Try to pull all the logic you can think of from the object (your slime) to the controller and try to create slime and update its transforms (position only, I guess) from the controller. If you create a script that controls the object and is outside this object - you will be then a top of the mountain and maintaining more objects will be a ride down it

How it translates to ECS?
The huge table with tables of objects data is a table of Entities (sometimes this huge table is called a World). It contains no logic, only objects’ data - so called Components - properties, names, hp, attack_range, attack_dmg and so on.
The controller script is one of the Systems. It could be divided into smaller chunks later on, like a system for only controlling position, only controlling HP and so on
The messages from objects trigger components passed to the controller aka System are Events that drives the System to make a change

So:

Yes, that’s a very good idea, except in Defold you can actually control most of the slime functionality through an external script, set go’s position, animation, thanks to specifiying in function a proper game object id or component id - a data that you can also store within this controller reach and not in slime script

For message passing I’m utilizing a slightly suited to my needs Dispatcher from Critique Gaming repository myself.

ross.grams · August 23, 2022, 11:36am

Am I reading this correctly? You only have 50 instances of slime.script, and they’re taking 18ms or more? That’s pretty extreme. I kind of doubt that 50 instances is enough that converting to a central manager system will save you that much (but it doesn’t hurt to try of course). It’s only the update function, message handling doesn’t appear to be taking any significant time.

Can you share the project or share the slime.script?

I use socket.gettime() to measure times for custom profiling, though I’m not sure how useful it will be for measuring parts of a single function call (I usually compare running different functions with 1 million or 100k iterations…). You could potentially sum up times between multiple scripts over multiple updates though.

britzl · August 23, 2022, 2:44pm

We’ve also added profiler support for measuring pieces of code. You can use profiler.scope_begin(name) and profiler.scope_end() to measure execution time of code:

cornedbeer · August 23, 2022, 3:13pm

Considering these spikes happen infrequently/inconsistently I think the most likely culprit is the garbage collector. Especially if you’re creating and discarding a lot of vectors/tables each frame, which is all but guaranteed if you’re using defold’s default vector math behaviors, you’ll end up triggering it every so often.

Another alternative would be if you have any particularly intensive parts of the slime update loop that don’t happen every frame, but could theoretically happen on the same frame for several of the slimes. However, this operation would have to be pretty intensive, maybe something like a complex pathfinding algorithm, so I would say the garbage collector is the first thing to investigate.

A very quick way to check if the garbage collector is the issue would be to just add the following line somewhere in your initialization:

collectgarbage("stop")

This will keep the garbage collector from being run unless you restart it with another call to collectgarbage(), with either “restart”, “step”, or “collect” as arguments (https://defold.com/ref/beta/base/#collectgarbage).

So, if turning off the collector fixes the intermittent hitches, you know you need to target your memory management. In which case you should look into refactoring as much as possible when you’re creating new tables/vectors/functions in loops, in addition to turning off the automatic garbage collection, but manually running some steps regularly.

This is what I’m doing for my garbage collection and it’s been working well, though you might need to experiment and tweak thresholds to stay on top of your memory accumulation:
(In a script that will only ever have one instance)

local function garbage_step(size)
	local done = collectgarbage("step", size)
	collectgarbage("stop")
	return done
end
local block_collection = false -- set based on state of loading
function update(self, dt)
	-- toggle between intensive processing in enemy AI, and frames reserved for garbage collection
	if not shared.block_collection then 
		local start_time = os.clock()
		local diff = 0
		local max_collection_time = 0.001 -- one ms / frame for collection
		local done
		while not done and diff < max_collection_time do
			done = garbage_step(0)
			diff = os.clock() - start_time
		end
		
		if done then
			shared.sent_pathfinding = nil
			shared.block_collection = block_collection -- prevent collection while a level is loading
		end
		
	end
end

There’s a little more going on here based on how I have my project structured and the need to block collection at certain points, but the important part is the while loop. Experiment passing different step sizes to the garbage_step helper function, and setting max_collection_time based on how much time you want to reserve for collection each frame, and keep an eye on your memory usage in the profiler in addition to just framerate.

If memory usage is ballooning rapidly then you need to allocate more time for collection/limit the amount of junk being created. However, if you notice a very small amount of memory accumulation, this is the engine running as normal and will eventually stabilize. To benchmark this “ambient” memory accumulation try running a full collection cycle every frame:

collectgarbage("collect")

and if you observe similar memory usage here, then it means your stepping is tuned to keep up with the junk you’re creating.

rocamocha · August 24, 2022, 12:53am

This doesn’t sound like it would be too difficult to refactor for my project. I’ve pretty much got all of my logic separate from the scripts already and I’ve already structured any modification of the data to be possible externally through it being a value in an object table, although the logic is still held with the data which brings me to @ross.grams thought.

From this response it’s obvious something is probably super bad with how I’m handling something with my data… so maybe going over how it’s structured can reveal some answers?

I’m using modules to ‘install’ relevant logic and states to the script component in init() into a container (currently using the self-reference as the container), with the logic being functions stored in the object the state for the system is held. So an example of a module, the logic and states being referred to as a system:

local m = {}

function m.install(self)
    local system = {}
    system.some_state = true
 
    function system.update(self, dt)
        -- logic
        self.variable = outcome_of_logic
    end
    return system
end
return m

And the usage in the script is:

local system = require 'system'

function init(self)
    self.system = system.install(self)
end

function update(self)
    self.system.update(self, dt)
end

Some of the systems access eachother’s state externally from within their own logic from the context of their own calling script.

-- could be in the module from before wrapped in install()
function system.collision(data)
    local c = {}
        -- logic that uses passed data (state from another system)
        return c
    end
end

(I’m aware this couples the systems and makes it more of an OOP approach)

Is the logic being stored adjacent to the state for each instance maybe causing an issue? Or perhaps my states being stored in the self-reference is problematic? Those are the two things I could think of aside from the logic itself.

rocamocha · August 24, 2022, 2:33am

Haven’t tested it for the hitches yet. If it does end up explaining them, that’s great. What I did check for is if it could have something to do with the scalability, since it seemed like the hitches themselves also scaled with number of objects. It didn’t seem to make a difference. So the problem is more likely with my data or logic?

cornedbeer · August 24, 2022, 3:31am

So whether you have 1 slime or 50 slimes in the scene has no impact on the frequency/severity of the hitches? Because I would expect having more/less of them to impact how often the garbage collector is called, unless they’re not generating the bulk of your junk. But yes, when you get the chance to test it, it won’t take long to rule out the garbage collector if it isn’t the problem.

One other point that I forgot about in my previous post is that if the long frame is caused by the garbage collector, it’ll be reflected in the profiler as happening in whatever bit of code was running when the memory limit was exceeded and a garbage collection cycle was called. So it’s also possible that the slimes’ updates aren’t even generating most of your garbage, but happen to be where the collector was invoked on the frames you’re investigating.

rocamocha · August 24, 2022, 6:21am

The severity scales with the number of objects, but the frequency is inconsistent and seemingly random. And turning the garbage collector off had no improvement to the framerate when number of objects is scaled up. If scalability in that sense was being affected negatively by garbage collection I would expect turning it off to improve the framerate for more objects, right?

Sorry - just to be more clear, the main issue isn’t the random hitching. It’s that my frametime increases per object at a high rate and I don’t know how to go about improving that by reducing the rate somehow.

Uncapped FPS:

For no slimes - hovering around 700-850 → just the character controller in the test scene
Turning on the on-screen profiler reduces this to about half of that, but stops making a significant difference after 2 or 3 slimes.
With 1 slime it drops to → 680 - 720
2 slimes → 590 - 630
5 slimes → 450
10 slimes → 240
20 slimes → 160
50 slimes → 75
100 slimes → 50

EDIT: Did some more testing, making sure to keep an eye on memory and framerate, I’m pretty sure that it’s not related to my scalability problem. I think I’ve been pretty careful to avoid what I think you’re describing in the below quote - I’ve tried not to create new tables or vectors inside of loops where I can, only creating new ones either somewhere outside the scope of the loop or by making use of already existing data.

britzl · August 24, 2022, 10:03am

It’s really hard to say what’s wrong. I think you need to share some code we can look at.

rocamocha · August 24, 2022, 1:12pm

Ok, so as per @Pawel’s suggestions, I’ve done a quick refactor just to change the context of the logic functions being called.

The slime’s own update() is completely removed from the script now, and I just have a simple table in which I’m storing references to the self-reference of each slime script (since I’m using it as my data container).
In the controller I’m iterating through this table in a loop and calling the state modifications (logic) from here.

This prediction was correct, it had pretty much no impact on the frame times (pretty much the same numbers as in my above reply to @cornedbeer). So it’s likely just the logic itself then is what I concluded from that.

My game is tile-based, using a collision system based off of the methods described here. When I don’t calculate the tile collision info, I get about 250 (prev 80) with 50 slimes. With 100 slimes the improvement is from 50fps to 130.

I’ve attached the tilecollision calculation module (it’s pretty meaty → slopes are an absolute pain in the a**, especially if you want the ability to jump and drop through them, retro tilebased). One thing I tried was commenting out the last calculation function, which is the least intensive of them and I was able to get 10fps (50 slimes). Pretty much the only thing that’s happening in that function is a few tilemap.get_tile()wrapped in an evaluation to avoid an error when the object checking is out of tilemap bounds:

function get_tile(tm, layer, x, y) 
	local _x, _y, _w, _h = tilemap.get_bounds(tm) -- first check if the tile exists to avoid error
	if (x >= _x and x <= _w+_x-1) and ((y >= _y and y <= _h+_y-1 and _h ~= 1) or (_h == 1 and y == _y)) then
		return tilemap.get_tile(tm, layer, x, y) -- returns the tile's index from tilesource
	end
end

Can this be optimized somehow? I’m getting a lot of tiles this way due to how the heightmasked tile calculations work. Would it be faster than tilemap.get_tile() to get and store every tile of the tilemap into a table and retrieve them from there?

tilecollision.lua (36.2 KB)

The other part of logic that seems to make an impact when left out is handling collision interaction with other objects (interaction.lua), and updating the data structure I’m using to group potential collisions (partition.lua).

interaction.lua (2.6 KB)
partition.lua (3.8 KB)

The calls from interaction.lua that use data.type as the index are calling this function:
(where aabb in this context is the state of self.aabb, the same state accessed through data from interaction.lua when an object checks another for collision)

    function aabb.collision(data)
        local collision = {}
        collision.left_difference = (aabb.world_position.x + xradius) - (data.aabb.world_position.x - data.aabb.xradius)
        collision.right_difference = (aabb.world_position.x - xradius) - (data.aabb.world_position.x + data.aabb.xradius)

        local combined_radius_y = aabb.yradius + data.aabb.yradius
        local object_diameter_y = combined_radius_y * 2
        collision.top_difference = aabb.world_position.y - data.aabb.world_position.y - 4 + combined_radius_y
        collision.distance_x = (aabb.world_position.x < data.aabb.world_position.x and collision.left_difference or collision.right_difference)
        collision.distance_y = (aabb.world_position.y < data.aabb.world_position.y and collision.top_difference + 4 or collision.top_difference - object_diameter_y)
        collision.active = not (collision.left_difference < 0 or collision.right_difference > 0 or collision.top_difference < 0 or collision.top_difference > object_diameter_y)

        collision.enter = (collision.active and not table.value_exists(aabb.collision_enter, data))
        if collision.enter then
            table.insert(aabb.collision_enter, data)
        end

        collision.exit = (not collision.active and table.value_exists(aabb.collision_enter, data))
        if collision.exit then
            for i, v in ipairs(aabb.collision_enter) do
                if v == data then
                    table.remove(aabb.collision_enter, i)
                end 
            end
        end

        collision.eclipse = {
            -- check if object is fully within the given bounds of the object in question
            x = collision.left_difference > aabb.xradius * 2 and collision.right_difference < aabb.xradius * -2,
            y = collision.top_difference >= aabb.yradius * 2 and collision.top_difference <= data.aabb.yradius * 2,
            full = collision.eclipse_x and collision.eclipse_y
        }
        
        return collision
    end

This is the slime script (not refactored to central controller) if needed to make sense of all that description on how those parts work together.
slime.script (2.5 KB)

britzl · August 24, 2022, 1:30pm

YES! If the tilemap doesn’t change or changes infrequently it will be much faster to cache the tilemap data in a Lua table.

rocamocha · August 24, 2022, 1:44pm

This is most definitely the worst offender then…
Probably some other things that can be sped up in those other places, but looks like I’ll be working on this first.

I forgot to mention also,

I couldn’t quite figure out how to use these. Where does the info show? It probably doesn’t help that the on-screen profiler seems to have changed its look after updating… Nothing’s changed in my code setting up the on-screen profiler, how do I change it back?

britzl · August 24, 2022, 1:52pm

It should show up as a subitem to the Script or Update-scriptc section on the left.

You can’t change it back. That is the new look

Remember that there’s also the webprofiler!