[Colyseus] ERROR:WEBSOCKET: Failed to setup callback when player joins room

Hi all,

I have code that functionally looks like this:

splash.gui_script

-- When the user presses Create Game
function create_game()
    client:join_or_create("lobby", {}, function(err, _room)
        if err then
            print("Failed to create the lobby:", err)
            return
        end
        room_module:set_room(_room)
    end)
end

create_game()
r = room_module:get_room()
if r and r.state.players and r.state.players[r.sessionId] then
    msg.post("main:/loader", "load_level")
end

room.lua

local M = {}

function M.set_room(the_state, r)
    the_state.room = r
end

function M.get_room(the_state)
    return the_state.room
end

function M.add_callbacks(the_state)
    add_callbacks(the_state.room)
    print("Added room callbacks")
end

function M.new(r)
    local state = {
        room = r
    }
    return state
end

function add_callbacks(room)
    room.state.players.on_add = function(player, sessionId)
        print("new player", sessionId, player)
    end
end

return M

loader.script

local function load_level(self)
    msg.post("#level", "load")
end

function on_message(self, message_id, message, sender)
    if message_id == hash("load_level") then
        unload_splash(self)
        load_level(self)
    end
end

level.script

local room_module = require "room"

function init(self)
    room_module:add_callbacks()
end

Functionally this is how it works:

  1. Load into main menu (splash).
  2. Press create game, which causes the code in splash.gui_script to run.
  3. The room object is created and stored globally in room.lua.
  4. splash.gui_script sends a message to loader.script to tell it to unload the splash and load the level.
  5. The level calls add_callbacks to add the callbacks to room. This ostensibly works correctly, because I see Added room callbacks after this point.

What I expect to happen is, after this point, if someone joins the game, I should see a message like this:

new player <session_id> <player>

However instead I see this:

ERROR:WEBSOCKET: Failed to setup callback

Importantly, I only see this error when the second player joins the room, not when I add the callbacks.

My schema pretty much looks like this:

export class MyLobbyRoomState extends Schema {
  @type({ map: Player })
  players = new MapSchema<Player>();
}

Player itself just has 3 fields in it: x: number, y: number, and name: string.

As far as I can tell, everything looks fine in the backend. I have logging that shows the two players connect:

onCreate: Created room: KRafbAeo9
onJoin: Joined: KRafbAeo9
onJoin: Options: []
Connected clients: [ 'kQITKomRu' ]
onJoin: Joined: KRafbAeo9
onJoin: Options: []
Connected clients: [ 'kQITKomRu', 'CHaf_yUQm' ]

If I add the callbacks immediately in set_room in room.lua, the callback works at first (it gets called for my own player joining). Later however, when the second player joins, I see the same websocket error, instead of the callback being called for the second player, which is very strange.

Furthermore, even if I could get that approach to work, it is too early to register the callbacks at that point: the level has not loaded yet, since I join the room first and then load the level. I also feel like this is still a race, since the state change may happen before I assign the callback. The current approach, where I add the callbacks later, has the same problem. It seems to imply that I have to process the initial state first before relying on the callbacks for state changes. What I would love most of all is to have the callbacks called retroactively based on the initial state. For example, if there are 2 players in the room when someone joins, the state.players.on_add callback gets called twice, but only after I have assigned it.
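To make the "retroactive" idea concrete, here is a minimal sketch of a helper that assigns the callback and then replays it for whatever is already in the state. Everything in it is an assumption for illustration: bind_on_add_with_replay is a hypothetical helper, and players is a plain Lua stand-in with an items table rather than the real Colyseus map, which may need a different way of iterating its entries. It also assumes state patches are only applied between script callbacks, so nothing can arrive between the assignment and the replay loop.

```lua
-- Hypothetical helper: bind on_add, then replay it for existing entries.
local function bind_on_add_with_replay(players, callback)
    -- Assign first, so anything added from now on is delivered normally.
    players.on_add = callback
    -- Then replay whatever is already present in the initial state.
    for session_id, player in pairs(players.items) do
        callback(player, session_id)
    end
end

-- Stand-in for room.state.players with two players already present.
-- (Assumption: the real map type is not a plain table like this.)
local players = {
    items = {
        kQITKomRu = { name = "first", x = 0, y = 0 },
        CHaf_yUQm = { name = "second", x = 1, y = 1 },
    },
}

local seen = {}
bind_on_add_with_replay(players, function(player, session_id)
    seen[session_id] = player.name
end)

-- A later join goes through the same callback.
players.items.late = { name = "third", x = 2, y = 2 }
players.on_add(players.items.late, "late")
```

With this shape the level only has to supply one callback, and it does not matter whether a player was in the room before or after the level loaded.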

So two questions:

  1. How do I get more details about this error, ERROR:WEBSOCKET: Failed to setup callback? Currently it is very opaque: it doesn’t tell me what specifically failed in setting up the callback.
  2. What is the best way to arrange things so that I only have to write callbacks for state changes, rather than callbacks plus an additional function that processes the initial state? When I look at examples like this or this by @endel, they seem to just set the callbacks without doing any initial state processing, so I feel this must be possible.

Any help would be very much appreciated!

Hi @banool, thanks for the thorough explanation.

This error message comes from the Defold WebSocket extension: https://github.com/defold/extension-websocket/blob/0f841f16afb9ffdb24eefb2e9f661d0139f0bf49/websocket/src/websocket.cpp#L401

Everything looks correct from reading your post, so perhaps you have found a new, undiscovered issue? Could you please share your source code with me, and perhaps @JCash and/or @britzl could have a look?

Cheers!

The first thought I have is that perhaps the callback has been unloaded?
Is the “splash” gui still loaded at this point?

Hi, I’ll post the source in a couple of hours, currently afk.

In this iteration of the code, the splash collection remains loaded the entire time. Ideally this is what happens though:

  1. Create room from splash
  2. Assign it to global lua module.
  3. Unload splash
  4. Load level
  5. Add callbacks to room from level
  6. All the state changes then get stepped through.

But for now just getting the callback to even run properly when applied right after step 2 from the splash collection would be good.

I can also try to verify that the callbacks have not been unloaded, by printing state.players.on_add from the level’s update function.

Well, there will always be a problem of unloading the gameobject that created/added the websocket callback.
If the gameobject doesn’t exist, then the script component doesn’t either, thus the callback cannot be invoked.

code.zip (784.3 KB)
Oh that’s quite interesting. I figured that once I created the room and added it to the global lua module, it would stay there forever. I guess the websocket is not held “within” the room, but instead bound to the lifetime of the game object from which it is made? I can try to rearchitect my approach so the room is made in a game object that is spawned in the bootstrapping collection and stays there the whole time.

@endel, I have attached the code. Repro like this:

Run backend:

cd backend
npm run start

Run frontend:

  1. Open in Defold, build and run.
  2. Click “Create Game”.

Run second frontend:

  1. Build for HTML 5. Or if you know another way to run a second instance of the game, do that.
  2. Type in the code for the room. You should see it in the logs from the backend. Press enter.

In the first instance, you should see the websocket error in the logs and the callback will not have been called. What should instead happen is you should see the print message from the callback, and the effects of the callback, such as spawning in a new character.

This is of course a lot of steps to repro. Is there something specific you might suspect for which I can try and gather more information for you?

I tried storing the room in a game object higher up as I described and that helped a bit. Specifically, when the second client joins, you can see the change reflected on the first client.

I’m finding the order of events a bit obtuse, but I have managed to get it to work like this somewhat:

  1. User presses create.
  2. Load level, unload splash
  3. Send message to higher level game object to create the room (still from splash script) and immediately bind the callbacks.

I feel like this is an antipattern, because the script from splash is still running even after I send the message to unload the collection that it is in.

What I did at first, and I think is more natural, is this:

  1. User presses create.
  2. Send message to higher level game object to create the room.
  3. Unload the splash, load the level.
  4. Bind the callbacks from init in level.

This doesn’t work though because if I bind the callbacks in level, they don’t get called for the state changes that have already happened.

I don’t really understand why it works in the first case but not the second; it feels like a race condition either way. With the first approach, am I just getting lucky because the state has not come through yet and I manage to bind the callbacks first?

Another problem with the first approach is that I load the level even if joining / creating a room fails. This means I need more complex logic in the level that checks after x seconds if there is a room, and if not, goes back to the menu.

Can you all think of a way to make the second approach work?

It feels like maybe you should centralise the code that deals with the network communication and not split it up on multiple screens. I assume you already have some kind of controller collection to orchestrate screen/state transitions (splash -> level and so on). Couldn’t you do the same for the network communication? And your screens would communicate via messages to the centralised network manager (“create room”, “join game” etc).
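A rough sketch of what such a centralised network manager could look like, written as plain Lua so it runs outside Defold. The names network, handle_message and reply are hypothetical, the string message ids stand in for Defold's hashed ids, and stub_client only mimics the client:join_or_create(name, options, callback) call shown earlier in the thread. In a real project handle_message would be called from on_message in a script on a game object that is never unloaded, and reply would be a msg.post back to the sender.

```lua
-- Hypothetical network manager that owns the room for the whole session.
local network = { room = nil }

-- `client` is anything with a join_or_create(name, options, callback) method.
-- `reply` is a hypothetical function that reports the result to the screen.
function network.handle_message(client, message_id, message, reply)
    if message_id == "create_room" then
        client:join_or_create(message.name, {}, function(err, room)
            if err then
                reply("room_join_result", { err = err })
                return
            end
            -- The room is stored here, not in the screen that asked for it.
            network.room = room
            reply("room_join_result", { room_id = room.id })
        end)
    end
end

-- Stub client so the sketch runs without a server (assumption: the `id`
-- field is only there so the stub has something to report).
local stub_client = {}
function stub_client:join_or_create(name, options, callback)
    callback(nil, { id = "KRafbAeo9", name = name })
end

-- The splash screen only sends a request and waits for the result.
local replies = {}
network.handle_message(stub_client, "create_room", { name = "lobby" },
    function(id, payload)
        replies[#replies + 1] = { id = id, payload = payload }
    end)
```

The point of the design is that screens never hold the room themselves, so unloading a screen cannot take the connection's callbacks with it.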

The callback you registered “belongs” to the gameobject/scriptinstance where you called websocket from. This metadata (function/gameobject/scriptinstance) is stored internally as a “callback”.
If the gameobject doesn’t exist, there’s really no one to call, so we warn about that.
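The ownership rule can be illustrated with a small model in plain Lua. This is not the extension's actual code, only a sketch of the behaviour described above: live_instances, register_callback and invoke are hypothetical names, and script instances are represented by simple string keys.

```lua
-- Illustrative model only; this is not the extension's implementation.
local live_instances = {}   -- script instances that currently exist

local function register_callback(owner, fn)
    -- Store the function together with the instance it belongs to.
    return { owner = owner, fn = fn }
end

local function invoke(callback, ...)
    if not live_instances[callback.owner] then
        -- The owner was unloaded: nobody is left to receive the call.
        return false, "Failed to setup callback"
    end
    callback.fn(...)
    return true
end

-- "splash" registers a callback while it is loaded.
live_instances.splash = true
local received = {}
local cb = register_callback("splash", function(event)
    received[#received + 1] = event
end)

local ok_before = invoke(cb, "first player joined")

-- The splash collection is unloaded, then another event arrives.
live_instances.splash = nil
local ok_after, err = invoke(cb, "second player joined")
```

In this model the first event is delivered and the second one fails, which matches the symptom in the original post: the error only shows up when a later event arrives, not when the callback is registered.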

@Mathias_Westerdahl, makes sense! That’s what I’m doing now and it works a charm.

@britzl, I think I managed to make it work. In my mind there is still a race, but perhaps I just don’t understand when the room callbacks fire.

  1. User presses create game on splash.
  2. Sends message to top level object room. The script there accepts the message and runs create_room.
  3. Inside the create_room callback from client:join_or_create I try to join the room and on success, send a message to myself called room_join_result.
  4. room.script, upon receiving this message, checks whether it was successful (I include the field err in my message). On success, I send a message to the loader to unload the splash and load the level.
  5. When level loads, level.script calls a function to add the callbacks in its init.

To me it seems like there is a race between 3 and 5, where something could happen to room before I register the callbacks, but in my tests, I never see that race happen. I don’t know if I’m getting lucky or I just misunderstand how the state updates / callbacks work.
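One way to close the gap between steps 3 and 5 regardless of timing is to bind the room callbacks as soon as the join succeeds and buffer whatever they report until the level subscribes. The sketch below is plain Lua and entirely hypothetical: buffer, push and subscribe are made-up names, and the event tables only imitate what an on_add callback might forward.

```lua
-- Hypothetical event buffer owned by a long-lived script.
local buffer = { pending = {}, listener = nil }

-- Called from the room callbacks, which are bound right after joining.
function buffer.push(event)
    if buffer.listener then
        buffer.listener(event)
    else
        -- Nobody is listening yet, so keep the event for later.
        buffer.pending[#buffer.pending + 1] = event
    end
end

-- Called by the level once it has loaded; drains the backlog in order.
function buffer.subscribe(listener)
    buffer.listener = listener
    for _, event in ipairs(buffer.pending) do
        listener(event)
    end
    buffer.pending = {}
end

-- Two joins arrive before the level exists, one after.
buffer.push({ type = "add", session_id = "kQITKomRu" })
buffer.push({ type = "add", session_id = "CHaf_yUQm" })

local order = {}
buffer.subscribe(function(event)
    order[#order + 1] = event.session_id
end)
buffer.push({ type = "add", session_id = "late" })
```

With this arrangement it no longer matters whether a state change lands before or after the level's init runs, because the level sees every event exactly once and in arrival order.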

We probably need to ask @endel. I don’t know the inner workings of the Colyseus client.

I’m having the same issue with Nakama.
Anyone made any progress on this question?

I’m getting “Failed to setup callback” when connecting for the 3rd time.

I understand I have to do the socket connection only once, in a global object.
If anyone has a working architecture, I’d be happy to hear about it. thanks!
