Some Tests. Lua Performance

Hello everyone!

I saw this site and decided to check how the same tests behave in Defold.

Here is the source code: LuaPerformance.zip (75.6 KB)

Contents:

  1. Localizing standard Lua functions
  2. Unpack a table
  3. math.max vs >
  4. if == nil vs or
  5. ^ vs *
  6. math.fmod vs %
  7. Function as param for other function
  8. For pairs, For ipairs, for i
  9. obj["name"] vs obj.name
  10. Localized table object
  11. Adding elements to the end of the array
  12. Adding elements to an already created table
  13. Insert in the middle of an array

The elapsed time (in seconds, as reported by os.clock) is calculated like this:

function test(test_func)
    local start_time = os.clock()
    test_func()
    return os.clock() - start_time
end
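
A single os.clock() difference can be noisy, and very fast variants round down to 0.000. The percentages in the result tables appear to be relative to the fastest variant on each platform. Here is a minimal sketch of a helper that repeats each variant, keeps the best run and reports such percentages. The names `measure`, `bench`, `repeats` and the result fields are hypothetical and are not part of the attached project:

```lua
-- Hypothetical helper: time each variant several times, keep the best run,
-- and express every result as a percentage of the fastest variant.
local function measure(test_func)
    local start_time = os.clock()
    test_func()
    return os.clock() - start_time
end

local function bench(variants, repeats)
    local results, fastest = {}, math.huge
    for index, test_func in ipairs(variants) do
        local best = math.huge
        for _ = 1, repeats do
            local elapsed = measure(test_func)
            if elapsed < best then
                best = elapsed
            end
        end
        results[index] = { time = best }
        if best < fastest then
            fastest = best
        end
    end
    for _, result in ipairs(results) do
        -- Guard against a zero baseline when the timer resolution is too coarse.
        result.percent = fastest > 0 and (result.time / fastest * 100) or 0
    end
    return results
end

local results = bench({
    function() for i = 1, 100000 do local a = math.sin(i) end end,
    function() local sin = math.sin for i = 1, 100000 do local a = sin(i) end end,
}, 5)
for index, result in ipairs(results) do
    print(string.format("%d %.3f (%.2f%%)", index, result.time, result.percent))
end
```

Taking the minimum over several runs is only one possible choice; an average or median would also work.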

The MAX constant is smaller on HTML5 than on the other platforms:

local MAX = html5 and 1000000 or 10000000

Tested on the platforms available to me: Windows (AMD Ryzen 7 3700X, Radeon RX 580 4GB), Android (OnePlus 7T Pro), Web (Chrome on Windows).

So let's get started.

Test 1. Localizing standard Lua functions

Code 1:

for i = 1, MAX do
    local a = math.sin(i)
end

Code 2:

local sin = math.sin
for i = 1, MAX do
    local a = sin(i)
end

NUM  WIN              HTML             ANDROID
1    0.003 (150.00%)  0.074 (139.62%)  1.779 (107.62%)
2    0.002 (100.00%)  0.053 (100.00%)  1.653 (100.00%)

As you can see, the localized function really is faster.
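
One likely reason, at least in the plain Lua interpreter: `math.sin(i)` looks up the global `math` and then indexes `sin` on every call, while a local is read directly. The trade-off is that the local keeps whatever function was in the table when it was captured. Here is a small sketch, with a made-up `mymath` table standing in for a library:

```lua
-- `mymath` is a made-up module table used only for illustration.
local mymath = { double = function(x) return x * 2 end }

-- Capture the function once; later calls skip the table lookup.
local double = mymath.double

-- Replacing the field afterwards does not affect the captured local.
mymath.double = function(x) return x * 20 end

local via_local = double(1)          -- still calls the original function
local via_table = mymath.double(1)   -- calls the replacement
print(via_local, via_table)
```

This only matters if something replaces library functions at runtime, but it is worth knowing before localizing everything.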

Test 2. Unpack a table

Set the table:

local a = {1, 2, 3, 4}

Code 1:

for i = 1, MAX / 10 do
    local x = math.min(a[1], a[2], a[3], a[4])
end

Code 2:

for i = 1, MAX / 10 do
    local x = math.min(unpack(a))
end

Code 3:

local function unpack4(a)
    return a[1], a[2], a[3], a[4]
end
for i = 1, MAX do
    local x = math.min(unpack4(a))
end

NUM  WIN               HTML              ANDROID
1    0.000 (0.00%)     0.014 (100.00%)   0.040 (100.00%)
2    0.031 (3100.00%)  0.018 (128.57%)   0.055 (134.96%)
3    0.017 (1700.00%)  0.166 (1185.71%)  0.452 (1118.00%)

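Unpack is expected to take longer. Interestingly, returning multiple values from a function is even slower on HTML and Android. Note, however, that Code 3 as listed loops MAX times rather than MAX / 10, which alone could account for a roughly tenfold difference.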
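
For reference, here is a short sketch showing that the three forms are interchangeable in terms of the result, so the choice is purely about speed. The fallback alias is an assumption added for portability: `unpack` is a global in Lua 5.1 and LuaJIT, while Lua 5.2 and later provide `table.unpack`.

```lua
-- Portability alias (not in the original test): 5.1/LuaJIT vs 5.2+.
local unpack = unpack or table.unpack

local a = {1, 2, 3, 4}

local function unpack4(t)
    return t[1], t[2], t[3], t[4]
end

local x1 = math.min(a[1], a[2], a[3], a[4])  -- direct indexing
local x2 = math.min(unpack(a))               -- generic unpack
local x3 = math.min(unpack4(a))              -- fixed-size helper
local x4 = math.min(unpack(a, 1, 4))         -- unpack with explicit bounds
print(x1, x2, x3, x4)
```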

Test 3. math.max vs >

Code 1:

local x = 1
for i = 1, MAX do
    x = math.max(i, x)
end

Code 2:

local x = 1
for i = 1, MAX do
    if i > x then
        x = i
    end
end

NUM  WIN              HTML             ANDROID
1    0.234 (100.00%)  0.066 (550.00%)  0.233 (206.89%)
2    0.475 (202.99%)  0.012 (100.00%)  0.113 (100.00%)

I increased the number of iterations on Windows to get a measurable time. math.max is slower than the comparison everywhere except Windows. Magic :grin:

Test 4. if == nil vs or

Code 1:

for i = 1, MAX do
    local y, x
    if i > 500 then
        y = 1
    end
    if y == nil then
        x = 1
    else
        x = y
    end
end

Code 2:

for i = 1, MAX do
    local y, x
    if i > 500 then
        y = 1
    end
    x = y or 1
end

NUM  WIN              HTML             ANDROID
1    0.056 (101.82%)  0.018 (105.88%)  0.198 (104.54%)
2    0.055 (100.00%)  0.017 (100.00%)  0.189 (100.00%)

As you can see, there is almost no difference, although or is faster by a few milliseconds at most. Worthwhile optimization :laughing:
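
The two forms are not fully equivalent, which matters more than the timing: `y or 1` also replaces `false`, while the explicit nil check does not. Here is a minimal sketch with hypothetical function names:

```lua
-- Default via `or`: falls back for both nil and false.
local function default_with_or(y)
    return y or 1
end

-- Default via explicit check: falls back only for nil.
local function default_with_nil_check(y)
    if y == nil then
        return 1
    end
    return y
end

print(default_with_or(nil), default_with_nil_check(nil))     -- same result
print(default_with_or(false), default_with_nil_check(false)) -- results differ
```

For values that can never be `false`, such as the numbers in this test, either form is safe.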

Test 5. ^ vs *

Code 1:

local x = 5
for i = 1, MAX do
    local y = x ^ 2
end

Code 2:

local x = 5
for i = 1, MAX do
    local y = x * x
end

NUM  WIN              HTML             ANDROID
1    0.002 (100.00%)  0.011 (183.33%)  0.372 (539.65%)
2    0.002 (100.00%)  0.006 (100.00%)  0.069 (100.00%)

The ^ operator is slower than direct multiplication on HTML and Android; on Windows there is no measurable difference.

Test 6. math.fmod vs %

Code 1:

for i = 1, MAX do
    if (math.fmod(i, 30) < 1) then
        local x = 1
    end
end

Code 2:

for i = 1, MAX do
    if ((i % 30) < 1) then
        local x = 1
    end
end

NUM  WIN              HTML             ANDROID
1    0.214 (375.44%)  0.091 (650.00%)  0.875 (366.47%)
2    0.057 (100.00%)  0.014 (100.00%)  0.239 (100.00%)

The % operator is faster than the math.fmod function.
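
Before swapping one for the other, note that they only agree when both operands have the same sign, as in this test. With a negative dividend, math.fmod keeps the sign of the dividend, while % follows the sign of the divisor. A short sketch:

```lua
local a = math.fmod(-7, 3)  -- remainder takes the sign of the dividend: -1
local b = -7 % 3            -- result takes the sign of the divisor: 2
local c = math.fmod(7, 3)   -- both forms agree for positive operands: 1
local d = 7 % 3             -- 1
print(a, b, c, d)
```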

Test 7. Function as param for other function

local func1 = function(a, b, func)
    return func(a + b)
end

Code 1:

for i = 1, MAX do
    local x = func1(1, 2, function(a)
        return a * 2
    end)
end

Code 2:

local func2 = function(a)
    return a * 2
end
for i = 1, MAX do
    local x = func1(1, 2, func2)
end

NUM  WIN                HTML             ANDROID
1    0.426 (21300.00%)  0.112 (196.49%)  0.711 (220.07%)
2    0.002 (100.00%)    0.057 (100.00%)  0.323 (100.00%)

Declare functions in advance and you will be happy. Especially on Windows :upside_down_face:
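
In Lua 5.1 and LuaJIT, each evaluation of a function expression creates a new closure object, so Code 1 allocates on every iteration. Hoisting is straightforward when the callback does not use loop state. When it does, one option is to pass that state as an argument instead of capturing it. Here is a sketch with hypothetical names (`apply`, `scale_by`, `extra`):

```lua
-- Hypothetical helper mirroring func1 from the test, extended with one extra
-- argument that is forwarded to the callback.
local function apply(a, b, func, extra)
    return func(a + b, extra)
end

-- Declared once, outside the loop; the factor arrives as a parameter
-- instead of being captured from the enclosing scope.
local function scale_by(value, factor)
    return value * factor
end

local total = 0
for factor = 1, 3 do
    -- No new function object is created per iteration here.
    total = total + apply(1, 2, scale_by, factor)
end
print(total)  -- 3*1 + 3*2 + 3*3 = 18
```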

Test 8. For pairs, For ipairs, for i

Filling array:

local arr = {}
for i = 1, 100 do
    arr[#arr + 1] = i
end

Code 1:

for i = 1, MAX do
    for j, v in pairs(arr) do
        local x = v
    end
end

Code 2:

for i = 1, MAX do
    for j, v in ipairs(arr) do
        local x = v
    end
end

Code 3:

for i = 1, MAX do
    for j = 1, 100 do
        local x = arr[j]
    end
end

Code 4:

for i = 1, MAX do
    for j = 1, #arr do
        local x = arr[j]
    end
end

Code 5:

local length = #arr
for i = 1, MAX do
    for j = 1, length do
        local x = arr[j]
    end
end

NUM  VARIANT    WIN              HTML             ANDROID
1    pairs      3.645 (782.19%)  4.722 (299.81%)  6.640 (100.00%)
2    ipairs     0.653 (140.13%)  5.106 (324.19%)  12.256 (184.59%)
3    i const    0.466 (100.00%)  1.639 (104.06%)  9.133 (137.56%)
4    i length   0.542 (116.31%)  1.600 (101.59%)  9.436 (142.12%)
5    i local #  0.548 (117.60%)  1.575 (100.00%)  9.115 (137.28%)

Can anyone explain what is going on here? :laughing:
All right, then. Keep your secrets.
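
Whatever the timings, the loop forms in this test are not interchangeable on every table. ipairs stops at the first nil, pairs visits every non-nil entry in no guaranteed order, and a numeric for visits exactly the indices you give it. A small sketch on an array with a hole:

```lua
local arr = { 10, 20, nil, 40 }

local ipairs_count = 0
for _, v in ipairs(arr) do
    ipairs_count = ipairs_count + 1   -- stops before the nil at index 3
end

local pairs_count = 0
for _, v in pairs(arr) do
    pairs_count = pairs_count + 1     -- visits 10, 20 and 40
end

local numeric_count = 0
for i = 1, 4 do                       -- explicit bound, holes show up as nil
    if arr[i] ~= nil then
        numeric_count = numeric_count + 1
    end
end
print(ipairs_count, pairs_count, numeric_count)
```

For a dense array like the one in the test, all three visit the same elements.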

Test 9. obj["name"] vs obj.name

Code 1:

local as = {foo = 5}
for i = 1, MAX do
    local x = as["foo"]
end

Code 2:

local as = {foo = 5}
for i = 1, MAX do
    local x = as.foo
end

NUM  WIN              HTML             ANDROID
1    0.002 (100.00%)  0.016 (106.67%)  0.111 (100.00%)
2    0.002 (100.00%)  0.015 (100.00%)  0.111 (100.11%)

No difference.

Test 10. Localized table object

local ad = {}
for i = 1, 100 do
    ad[#ad + 1] = {x = i}
end

Code 1:

for i = 1, MAX do
    for n = 1, 100 do
        ad[n].x = ad[n].x + 1
    end
end

Code 2:

for i = 1, MAX do
    for n = 1, 100 do
        local y = ad[n]
        y.x = y.x + 1
    end
end

NUM  WIN              HTML             ANDROID
1    1.292 (100.47%)  5.411 (129.48%)  28.404 (115.04%)
2    1.286 (100.00%)  4.179 (100.00%)  24.691 (100.00%)

There is a difference, but it is small: negligible on Windows and roughly 15-30% on Android and HTML.

Test 11. Adding elements to the end of the array

Code 1:

local a = {}
for i = 1, MAX do
    table.insert(a, i)
end

Code 2:

local a = {}
for i = 1, MAX do
    a[i] = i
end

Code 3:

local a = {}
for i = 1, MAX do
    a[#a + 1] = i
end

Code 4:

local a = {}
local count = 1
for i = 1, MAX do
    a[count] = i
    count = count + 1
end

NUM  WIN              HTML             ANDROID
1    0.070 (100.00%)  0.167 (759.09%)  1.527 (587.68%)
2    0.071 (101.43%)  0.022 (100.00%)  0.260 (100.00%)
3    0.099 (141.43%)  0.121 (550.00%)  1.323 (509.16%)
4    0.084 (120.00%)  0.024 (109.09%)  0.329 (126.55%)

As you can see, table.insert is slower than a[#a + 1] everywhere except Windows. Plain a[i] = i is the fastest variant on HTML and Android.

Test 12. Adding elements to an already created table

Code 1:

for i = 1, MAX do
    local a = {}
    a[1] = 1
    a[2] = 2
    a[3] = 3
end

Code 2:

for i = 1, MAX do
    local a = {true, true, true}
    a[1] = 1
    a[2] = 2
    a[3] = 3
end

NUM  WIN                HTML             ANDROID
1    1.279 (42633.33%)  0.391 (197.47%)  2.291 (303.59%)
2    0.003 (100.00%)    0.198 (100.00%)  0.755 (100.00%)

It is better to create the table with its elements already in place, especially on Windows.

Test 13. Insert in the middle of an array

Code 1:

local a = {}
for i = 1, 50 do
    a[#a + 1] = i
end
for i = 1, MAX / 1000 do
    table.insert(a, math.floor(#a / 2), i)
end

Code 2:

local function custom_insert(arr, pos, value)
    local curr = arr[pos]
    arr[pos] = value
    local i = pos + 1
    local nv
    while curr ~= nil do
        nv = arr[i]
        arr[i] = curr
        i = i + 1
        curr = nv
    end
end

local a = {}
for i = 1, 50 do
    a[#a + 1] = i
end
for i = 1, MAX / 1000 do
    custom_insert(a, math.floor(#a / 2), i)
end

NUM  WIN              HTML             ANDROID
1    0.025 (100.00%)  0.004 (100.00%)  0.070 (100.00%)
2    0.028 (112.00%)  0.010 (250.00%)  0.953 (1362.79%)

It is better to insert into the middle of an array with table.insert.
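
As a sanity check that the two variants in this test do the same work, here is a small sketch comparing their results on a five-element array. The comparison code (`expected`, `actual`, `same`) is hypothetical and not part of the attached project:

```lua
-- Same shifting insert as Code 2 of this test.
local function custom_insert(arr, pos, value)
    local curr = arr[pos]
    arr[pos] = value
    local i = pos + 1
    while curr ~= nil do
        local nv = arr[i]
        arr[i] = curr
        i = i + 1
        curr = nv
    end
end

local expected, actual = {}, {}
for i = 1, 5 do
    expected[i] = i
    actual[i] = i
end

table.insert(expected, 3, 99)
custom_insert(actual, 3, 99)

-- Compare lengths and then element by element.
local same = #expected == #actual
for i = 1, #expected do
    if expected[i] ~= actual[i] then
        same = false
    end
end
print(same, table.concat(actual, ","))
```

Both produce 1,2,99,3,4,5, so the timing gap comes from how the shifting is done, not from different results.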

Conclusions

Lua behaves differently on different platforms. Which variant to use is up to you.

It will be interesting to see how iOS behaves. If anyone runs the tests there, please share the file. :slightly_smiling_face:


Great tests and analysis, thanks for sharing! :heart:

It is probably a good idea to always measure your actual solution when micro-optimizing, but that is only needed when you are actually running into bottlenecks :wink:

Here are my results from Ubuntu 20.04:

Test  Num  Ubuntu
1     1    0.014 (105.01%)
1     2    0.013 (100.00%)
2     1    0.002 (100.00%)
2     2    0.049 (2782.84%)
2     3    0.020 (1123.60%)
3     1    0.011 (175.61%)
3     2    0.006 (100.00%)
4     1    0.055 (100.79%)
4     2    0.055 (100.00%)
5     1    0.003 (100.46%)
5     2    0.003 (100.00%)
6     1    0.356 (827.54%)
6     2    0.043 (100.00%)
7     1    0.522 (18063.53%)
7     2    0.003 (100.00%)
8     1    3.847 (866.83%)
8     2    0.843 (190.03%)
8     3    0.520 (117.21%)
8     4    0.626 (141.08%)
8     5    0.444 (100.00%)
9     1    0.002 (101.09%)
9     2    0.002 (100.00%)
10    1    1.205 (100.00%)
10    2    1.207 (100.20%)
11    1    0.060 (125.83%)
11    2    0.048 (100.00%)
11    3    0.078 (162.77%)
11    4    0.064 (133.63%)
12    1    1.245 (49074.09%)
12    2    0.003 (100.00%)
13    1    0.023 (100.00%)
13    2    0.034 (148.63%)

So in general there is almost no difference between the variants on Linux, except in these tests:

  • 2 (Unpack a table) - avoid unpack (as on all other platforms)
  • 3 (math.max vs >) - prefer > (unlike on Windows, where the result is strange)
  • 6 (math.fmod vs %) - prefer % (as on all other platforms)
  • 7 (Function as param for other function) - declare in advance (as on all other platforms)
  • 8 (For pairs, For ipairs, for i) - the result here is probably mostly due to localizing the size of the array (as on HTML5); pairs in particular is astonishingly slow
  • 12 (Adding elements to an already created table) - add the elements up front (as on all other platforms); look at the vast difference here

Test results:
sharelog.txt (2.4 KB)

The most important outcome of the tests is that you should generally localize everything, set values up front and prefer operators over functions. I guess this is mainly due to the specifics of Lua and the time it needs to manage memory when code is written carelessly :wink:


Thanks for this :slight_smile:


I'd be interested to know how it performs on Windows on Arm when Defold supports it.

It’s also important to remember that performance characteristics are different with LuaJIT, LuaJIT interpreter (used on iOS and Switch), and vanilla Lua 5.1 (used on HTML5). Manual: Lua versions

You may find this set of similar benchmarks interesting. It compares LuaJIT, the LuaJIT interpreter and Lua 5.1, discusses the results and gives recommendations: https://gitspartv.github.io/LuaJIT-Benchmarks/