Collection Parser?

Has anyone written a parser for Defold’s collection file format? Just checking before I start working on my own.

I tried to look at DefTree, which I thought might be doing it (albeit in Python), but it seems like the source code has been moved or made private.

Assuming the answer is “No”, any tips or advice on doing it would be appreciated.

Forgot about that - DefTree is now public again.

Assuming the answer is “No”, any tips or advice on doing it would be appreciated.

The files are in the Protobuf text format, so if we could get access to the definition files it would be quite easy to read them that way. Without them it is all about doing it the hard way.

@Jerakin Awesome, thanks!

@Mathias_Westerdahl, @sven You’re lurking so you’re getting tagged :slight_smile: Any chance we can get our hands on those definition files? I know sicher commented a while ago:

…but if it’s just a matter of sharing some .proto files when they change (which I assume is fairly rare), that doesn’t seem too much to ask.

Haha, I’m always lurking! :smiley: But yes, I can see the benefit but also the reasoning why we haven’t shared them yet.
In my own opinion it makes sense to share these, maybe along with Defold SDK headers in some way. We should bring it up in the team once more and discuss it.

Cool, thanks! :defold:


Meanwhile I am going at it the hard way. Throwing together something hacky to parse it wasn’t actually too bad, but then I remembered that to do anything useful I would have to put it back into the original format, and I died a little inside (and am now rewriting). Haha. I guess it wasn’t really designed to be easily legible. Some parts are really clean and easy, but then you get stuff like this:

[Image 33-1: excerpt of a collection file with an escaped "data" block]

…which isn’t exactly ideal. It would be so much nicer if it was more like this:

[Image 33-2: the same excerpt with the "data" block cleaned up]

But anyway. I’m getting there!

Oh yeah - tell me about it. That stuff was what I had to spend most of my time on. My ugly code has a lot of “do this, well except if the tag is data then do something completely different”. I think I ended up “unescaping” it first, but of course the data can also have data in it… so yeah - have fun :upside_down_face:

What are you writing it in?

Yup, that blasted data tag! Everything else is totally simple! Yeah, first I un-indent the line. Then if it’s inside a ‘data’ block I un-escape, un-quote, un-indent again, and remove the newline “\n” from the end. I probably don’t need to explicitly un-indent, but I was trying to set it up so I could just do everything in reverse to put it back… The above examples are before and after running it through my “cleaner” script. Now I’m trying to finish up the reverse process, trying to figure out the last few annoying exceptions to the rules.
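
As a rough illustration of those steps, here is a minimal Lua sketch of the per-line cleanup for text inside a "data" block. The name clean_data_line and the sample line are hypothetical, made up for this example, and are not taken from the actual parser. It also assumes the only escapes that matter are \", \\ and \n, and it only undoes one level of escaping, so nested "data" needs another pass.

```lua
-- Hypothetical helper (not from the real parser): clean one line of a "data" block.
local function clean_data_line(line)
    -- Un-indent and drop trailing whitespace.
    local s = line:match("^%s*(.-)%s*$")
    -- Un-quote: strip the surrounding double quotes, if present.
    s = s:match('^"(.*)"$') or s
    -- Un-escape in a single pass, so an escaped backslash is never
    -- mistaken for the start of another escape sequence.
    s = s:gsub("\\(.)", function(c)
        if c == "n" then return "\n" end
        return c -- covers \" and \\ (assumed to be the only other escapes)
    end)
    -- Un-indent again (inner indentation) and remove the trailing newline.
    s = s:gsub("^%s+", "")
    s = s:gsub("\n$", "")
    return s
end

-- Example with a made-up line; the long-bracket string keeps the backslashes literal.
print(clean_data_line([[  "  id: \"sprite\"\n"]]))
```

Doing the un-escape in one gsub pass is a deliberate choice here: replacing \\ and \" in separate passes tends to corrupt nested escapes. Putting a line back into the original format would be the same steps in reverse order.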

I’m doing it with Lua. Following my tradition of making tools for Defold with Defold. :slight_smile:

Well folks…it works. (barely (and with caveats))

Have you worked on this some more?

I bet it would be awesome with the new editor scripts (see “Editor Scripts 🔥: Alpha Release”). We don’t have access to the protobufs yet, so this could be neat. :grin:

Bet the community wouldn’t mind helping out on it :slight_smile:

I agree. The new editor script system seems almost tailor-made for messing with collections like this. All the more reason they should release the protobuf specs! Or at least change how that darn “data” tag works so it’s not such a mess, hah.

Here you go: https://github.com/rgrams/defold_collection_parser

I just cleaned it up a tiny bit, but it’s kind of “as is”. It seems to be fairly solid, but I’m sure you could break it fairly easily if you named your objects weird or something, and it doesn’t have . . . any error checking? :slight_smile:

So it pretty much just converts the collection (or game object, or component) file directly into a lua table as you would expect from the original format. It puts a few extra things in there since I was using this to make my own editor, but I think you can just ignore those. Where Defold’s files use the same tag repeatedly (like “embedded_instances”), it converts them into a sequence table. Just try opening files and pprint-ing the output to see what it’s like.
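
To give an idea of the repeated-tag handling, here is a small sketch of one way it could work. This is an illustration only, not the code from the repository: set_field and the REPEATED_TAGS table are hypothetical names, and the list of repeated tags is an assumption rather than something taken from Defold's definitions.

```lua
-- Assumption: tags that may appear more than once in a file.
local REPEATED_TAGS = { embedded_instances = true, instances = true }

-- Hypothetical helper: store a parsed value on a node table.
-- Repeated tags always become a sequence, even with a single entry,
-- so code reading the table can rely on ipairs() and #.
local function set_field(node, tag, value)
    if REPEATED_TAGS[tag] then
        node[tag] = node[tag] or {}
        table.insert(node[tag], value)
    else
        node[tag] = value
    end
end

local collection = {}
set_field(collection, "name", "main")
set_field(collection, "embedded_instances", { id = "player" })
set_field(collection, "embedded_instances", { id = "enemy" })

for i, inst in ipairs(collection.embedded_instances) do
    print(i, inst.id)
end
```

Always using a sequence for those tags, rather than switching to one on the second occurrence, keeps the shape of the output predictable.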

Yes, this will happen soon. My current best guess is end of October.

Awesome!


I just pushed some little tweaks to my parser so it works with editor_scripts. They don’t have access to vmath, so the parser can’t convert transform data to vectors and quats, so they just stay as normal tables with x, y, z, w fields. They also don’t have socket, which I was using to check the time it took to parse, so I just took that out. And I removed or commented out all the debug print statements.
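
If you use the parsed data somewhere vmath is available (inside the engine, say), converting those plain tables yourself could look roughly like this. It is a sketch under assumptions: to_vmath is a hypothetical helper that is not part of the parser, and it guesses that a table with a w field is a rotation and anything else is a position or scale.

```lua
-- Hypothetical helper: convert a plain {x, y, z[, w]} table to a vmath type
-- when vmath exists, otherwise hand the table back untouched.
local function to_vmath(t)
    if not vmath then
        return t -- e.g. inside an editor script, where vmath is missing
    end
    if t.w ~= nil then
        return vmath.quat(t.x, t.y, t.z, t.w) -- assumed to be a rotation
    end
    return vmath.vector3(t.x, t.y, t.z) -- assumed to be a position or scale
end

local position = to_vmath({ x = 10, y = 20, z = 0 })
local rotation = to_vmath({ x = 0, y = 0, z = 0, w = 1 })
```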

Here’s a stupid-simple little editor_script to count the number of embedded game objects in a collection file:
(Put it in a folder with the parser module and double-check the require path.)

Code
local M = {}

local parser = require "editor_scripts/collection_parser"

local commands = {
    {
        label = "Count Embedded Instances",
        locations = {"Assets", "Outline"},
        query = {
            selection = { type = "resource", cardinality = "many" }
        },
        run = function(opts)
            for i,node_id in ipairs(opts.selection) do
                local path = editor.get(node_id, "path")
                -- Cut off the "/" prefix so io.open gets a project-relative path.
                path = path and string.sub(path, 2)
                local extension = path and string.match(path, "^.*%.(.*)$")
                if extension == "collection" then
                    local file, err = io.open(path, "r")
                    if not file then
                        print(err)
                    else
                        local data = parser.decodeFile(file, path)
                        file:close()
                        if data then
                            local inst = data.embedded_instances
                            local count = inst and #inst or 0
                            print("", count .. " embedded_instances in " .. path .. ".")
                        end
                    end
                end
            end
        end
    }
}

function M.get_commands()
    print("Collection Parser Test Extension Loaded.")
    return commands
end

return M

Sorry for the thread archaeology, but this question bugs me from time to time: why are the files sometimes written with normal formatting and sometimes with escaping? Syntax highlighting plugins (VS Code/Atom) don’t work in these cases.

Are you talking about the collection file format?

If so, then to be honest I have no idea. I don’t actually know how the protobuf format works or how Defold deals with these files; I just took what I saw and translated it. I hope there’s some worthwhile technical reason why they’re formatted this way, because it’s pretty terrible from a human-readability standpoint. They’re definitely not designed to be manually edited.

Curious to know the reason for using protobuf as well. There is a lot of automation that could happen externally with an easier format to handle.

It’s easy to edit and update.
Supports both text and binary formats (editor vs. engine).
Easy to generate language bindings for (Python, Java, C++).
Relatively lightweight.
Robust (written and used by Google).

There is a lot of automation that could happen externally with an easier format to handle.

The problem, I think, is that we currently don’t provide the .proto files in our dmSDK. I’m actually working towards that right now. Once that’s done, you can use regular protobuf tools (I assume they exist), or read our tools to see how we manipulate the files.
