Orders of magnitude slower when running in the editor vs. bundled (SOLVED)

I’m doing a kind of roguelike lighting in a tilemap and I’m running this bit of code:

local function los(x0,y0,x1,y1, callback)

local sx,sy,dx,dy

if x0 < x1 then
	sx = 1
	dx = x1 - x0
else
	sx = -1
	dx = x0 - x1
end

if y0 < y1 then
	sy = 1
	dy = y1 - y0
else
	sy = -1
	dy = y0 - y1
end

local err, e2 = dx-dy, nil

if not callback(x0, y0) then return false end

while not(x0 == x1 and y0 == y1) do
	e2 = err + err
	if e2 > -dy then
		err = err - dy
		x0  = x0 + sx
	end
	if e2 < dx then
		err = err + dx
		y0  = y0 + sy
	end
	if not callback(x0, y0) then return false end
end

return true
end

...

function light(director, source_coord, radius, tile, line_of_sight)
	for y = -radius, radius do
	for x = -radius, radius do
		local tile_x = tile_source_coord.x + x
		local tile_y = tile_source_coord.y + y
		if tile_x >= tile_min_x and tile_y >= tile_min_y and tile_x < tile_w and tile_y < tile_h
		and x * x + y * y <= radius * radius then
			if tilemap.get_tile(tilemap_url, "walls", tile_x, tile_y) ~= 0 or tilemap.get_tile(tilemap_url, "floor", tile_x, tile_y) ~= 0 then
				if line_of_sight then
					
					local has_line_of_sight = bresenham.los(tile_x, tile_y, tile_source_coord.x, tile_source_coord.y, function(x, y)
						if tile_x == x and tile_y == y then
							return true
						end

						local wall_tile = tilemap.get_tile(tilemap_url, "walls", x, y)
						return wall_tile == 0
					end)
					
					if has_line_of_sight then
						tilemap.set_tile(cover_url, "cover", tile_x, tile_y, tile)
					else
						local prev_tile = tilemap.get_tile(cover_url, "cover", tile_x, tile_y)
						if prev_tile == 0 then
							tilemap.set_tile(cover_url, "cover", tile_x, tile_y, COVER_TRANSPARENT_TILE)
						end
					end
				else
					tilemap.set_tile(cover_url, "cover", tile_x, tile_y, tile)
				end
			end
		end
	end
	end
end

For some reason, if I run the light() function in the debugger, it’s horribly slow, specifically when it calls the los() function.
With a radius of 10, that gives me 236 executions of that function, and 1339 executions of the callback function as the Bresenham los function iterates towards the center to see if there are blocking walls.

One run of this takes a staggering 120 ms.
If I do a build and run, it’s usually below 1 ms.

What’s more, if I download and include this Lua profiler (https://github.com/charlesmallah/lua-profiler) and do a profiler.start() and profiler.stop() around this function, it dramatically reduces the execution time to 20 ms.

I’m just flabbergasted about what I’m doing that’s hitting the performance so badly. The anonymous inline callback function isn’t it either.

What I’ve done to get this under control, after realizing that the profiler I’m using relies on debug hooks, is to wrap this problematic function in

local debug_f, debug_mask, debug_count = debug.gethook()
debug.sethook()
...
debug.sethook(debug_f, debug_mask, debug_count)

And that takes the execution time down from 120 ms to less than 1 ms.
The only downside is I can’t place breakpoints inside this function.
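
For anyone who wants to reuse this, the save/clear/restore pattern above could be wrapped in a small helper. This is only a sketch: without_debug_hook is a hypothetical name, not something from Defold or the profiler, and it assumes the installed hook is an ordinary Lua function and that the wrapped function returns at most one value. Using pcall means the hook is put back even if the wrapped function errors.

```lua
-- Hypothetical helper (not part of Defold): run fn(...) with the debug hook
-- temporarily removed, then restore whatever hook was installed before.
local function without_debug_hook(fn, ...)
	-- Remember the current hook; hook is nil when none is installed.
	local hook, mask, count = debug.gethook()
	debug.sethook()

	-- pcall so the hook is restored even if fn raises an error.
	local ok, result = pcall(fn, ...)

	if hook then
		debug.sethook(hook, mask, count)
	end
	if not ok then
		-- Re-raise the original error without adding position info.
		error(result, 0)
	end
	-- Assumption: fn returns at most one value.
	return result
end
```

Breakpoints inside the wrapped function are still skipped, same as with the manual version.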

The tilemap.get_tile() calls will probably have a pretty big impact on performance. You will call this function a lot and there’s a significant overhead in calling into the engine to get tiles from a layer in a tilemap. I would suggest that you do not use the tilemap as the true representation of your level, or at least not for calculating line of sight. I’m pretty sure you will get a big boost from having a Lua table representation of your map and using that when calculating LOS.

If you don’t want to do a big refactor of your data representation you could do it half-way and before calling the light function build a Lua table representation of the tiles from -radius to +radius. It could give you a performance boost, at least in an open map where you’d have to iterate over most of the tiles.
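
A rough sketch of that half-way approach might look like the following. The names snapshot_layer and is_open are made up for illustration, the bounds are assumed to be inclusive, and the tile getter is passed in as a parameter so the sketch runs outside the engine; in a Defold script you would pass tilemap.get_tile.

```lua
-- Hypothetical helper: copy one layer's tiles around (cx, cy) into a Lua table.
-- get_tile is injected (assumption); in Defold it would be tilemap.get_tile.
-- min_x/min_y/max_x/max_y are inclusive tile bounds (assumption).
local function snapshot_layer(get_tile, url, layer, cx, cy, radius, min_x, min_y, max_x, max_y)
	local grid = {}
	for y = math.max(cy - radius, min_y), math.min(cy + radius, max_y) do
		local row = {}
		for x = math.max(cx - radius, min_x), math.min(cx + radius, max_x) do
			-- One engine call per tile, done once up front.
			row[x] = get_tile(url, layer, x, y)
		end
		grid[y] = row
	end
	return grid
end

-- True when (x, y) is inside the snapshot and holds no tile (tile id 0).
local function is_open(grid, x, y)
	local row = grid[y]
	return row ~= nil and row[x] == 0
end
```

The LOS callback could then test is_open(walls, x, y) on the snapshot instead of calling into the engine for every step of every line.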

There are also some basic things like caching tilemap.get_tile() and tilemap.set_tile() to local functions to get rid of the table lookup each time you want to call get_tile() and set_tile(). It’s probably too few calls per frame for it to be noticeable though.
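
As a minimal illustration of the local-caching idea: the stub table below is only there so the snippet runs outside Defold, and count_walls is a hypothetical function, not part of the original code.

```lua
-- Stand-in for the engine module so the sketch runs outside Defold (assumption).
-- Inside a Defold script the global tilemap module is used instead.
local tilemap = tilemap or {
	get_tile = function(url, layer, x, y) return 0 end,
}

-- Resolve the table field once at load time instead of on every call.
local get_tile = tilemap.get_tile

-- Hypothetical example: count non-empty "walls" tiles on one row.
local function count_walls(url, x0, x1, y)
	local n = 0
	for x = x0, x1 do
		if get_tile(url, "walls", x, y) ~= 0 then
			n = n + 1
		end
	end
	return n
end
```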
