Extra Characters octet encoding in .font files

I’m making a game with a lot of non-English text.
I’ve created an Excel macro to output my game text into Lua scripts, but I also want to modify the “extra_characters” field in the .font file so that my font gets updated and can draw all the characters I’m using in my text.

I can create the .font file in my macro, but I’m having trouble figuring out how the extra characters are encoded.

When I open the .font file in a text editor, I see something like this:
extra_characters: "\346\227\245\346\234\254\350\252\236"
which appears to be some sort of octal encoding of my Unicode characters (日本語).
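One way to check that reading is to turn each escape back into a byte and decode the bytes as UTF-8. A minimal Python sketch (the function name decode_extra_characters is hypothetical), assuming every escape is exactly three octal digits as in the line above and treating any other character literally:

```python
import re

# Either a \ooo escape (exactly three octal digits) or any other single character.
TOKEN = re.compile(r"\\([0-7]{3})|(.)", re.DOTALL)

def decode_extra_characters(field: str) -> str:
    """Rebuild the raw bytes of an extra_characters value and decode them as UTF-8."""
    raw = bytearray()
    for m in TOKEN.finditer(field):
        if m.group(1) is not None:
            raw.append(int(m.group(1), 8))          # e.g. \346 -> 0xE6
        else:
            raw.extend(m.group(2).encode("utf-8"))  # literal character, kept as-is
    return raw.decode("utf-8")

print(decode_extra_characters(r"\346\227\245\346\234\254\350\252\236"))  # 日本語
```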

I’ve figured out the basic rules for this encoding (3xx starts a new character, 2xx is the next part of a multi-part sequence, plus some other rules for how many parts a sequence has).
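Those rules match UTF-8's lead-byte rules: the high bits of a character's first byte say how many bytes the character uses, and every following byte is a 10xxxxxx continuation byte (\200 to \277 in octal). A minimal Python sketch with byte values written in octal to match the .font file (sequence_length is a hypothetical helper name):

```python
def sequence_length(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with byte `lead`."""
    if lead < 0o200:   # 0xxxxxxx (\000-\177): plain ASCII, one byte
        return 1
    if lead < 0o300:   # 10xxxxxx (\200-\277): continuation byte, never starts a character
        raise ValueError(f"\\{lead:03o} is a continuation byte")
    if lead < 0o340:   # 110xxxxx (\300-\337): two-byte sequence
        return 2
    if lead < 0o360:   # 1110xxxx (\340-\357): three-byte sequence
        return 3
    if lead < 0o370:   # 11110xxx (\360-\367): four-byte sequence
        return 4
    raise ValueError(f"\\{lead:03o} is not a valid UTF-8 lead byte")

for b in (0o101, 0o303, 0o346, 0o360):
    print(f"\\{b:03o} starts a {sequence_length(b)}-byte sequence")
```

This sketch is deliberately permissive: a strict UTF-8 validator also rejects the lead bytes \300, \301 and \365 to \367, which never occur in well-formed text.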

But I’m wondering if this is something that’s been documented and if an algorithm exists that I can use to ensure I’m encoding my characters the right way.

Thanks!

The extra_characters field is a UTF-8 encoded list of characters, and in the .font file each byte of a non-ASCII character is written as a three-digit octal escape.

The character 日 (U+65E5) is encoded in UTF-8 as the three bytes \xE6\x97\xA5 (note the hexadecimal \x notation; see https://mothereff.in/utf-8#日).

Hex 0xE6 is octal 346, 0x97 is octal 227, and 0xA5 is octal 245, so the character is entered in extra_characters as \346\227\245.
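Put together, the whole conversion is: UTF-8 encode the text, then write each byte as a three-digit octal number. A minimal Python sketch (to_extra_characters is a hypothetical name), assuming you want each character listed only once:

```python
def to_extra_characters(text: str) -> str:
    """Build an extra_characters value from arbitrary game text."""
    # Keep each character once, in order of first appearance.
    unique = "".join(dict.fromkeys(text))
    # UTF-8 encode, then write every byte as a three-digit octal escape.
    return "".join(f"\\{b:03o}" for b in unique.encode("utf-8"))

print(to_extra_characters("日本語日本"))  # \346\227\245\346\234\254\350\252\236
```

ASCII characters come out as escapes too (A becomes \101), which is still a valid way to write them.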

PS: The extra_characters field also accepts the hex representation, e.g. \xE6\x97\xA5.


Thanks, Britzl,

I see where I was going wrong now. I was looking at this site (http://unicode.scarfboy.com/?s=日) and thought I needed to use the Unicode code point (\u65e5) instead of the UTF-8 byte string (\xe6\x97\xa5).
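The difference shows up in a few lines of Python: the code point is a single number, while the file stores that character's UTF-8 bytes. Reading the code point's own two bytes as if they were UTF-8 fails:

```python
ch = "日"
code_point = ord(ch)       # 0x65E5, the number that \u65e5 names
utf8 = ch.encode("utf-8")  # b'\xe6\x97\xa5', the bytes the .font file stores

print(f"U+{code_point:04X} -> {utf8.hex(' ')}")  # U+65E5 -> e6 97 a5

# The code point's own bytes (0x65 0xE5) are not a valid UTF-8 sequence:
# 0xE5 announces a three-byte character, but no continuation bytes follow.
try:
    bytes([0x65, 0xE5]).decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```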

I think I have it sorted now.
I wasn’t too keen to write a proper UTF-8 encoding algorithm in VBA (ugh), so I came up with this hack. It seems to work fine for my current text, but I can tell it’s going to come back and bite me later:

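Public Function octToMultiOct(ByVal octString As String) As String
    ' Accepts a Unicode code point (U+0000 to U+FFFF) written as an octal string,
    ' e.g. Oct(AscW("日")) gives "62745", and returns its UTF-8 bytes as octal
    ' escapes for the extra_characters field of a Defold .font file.
    ' Characters outside the BMP (surrogate pairs, e.g. emoji) are NOT handled.
    Dim output As String
    Dim length As Integer
    length = Len(octString)

    If length < 3 Or (length = 3 And Left(octString, 1) = "1") Then
        ' U+0000..U+007F (ASCII): one byte, written as a three-digit escape
        output = "\" & Right("00" & octString, 3)
    ElseIf length = 3 Then
        ' U+0080..U+01FF: two bytes, 110xxxxx 10xxxxxx
        output = "\30" & Mid(octString, 1, 1) & "\2" & Mid(octString, 2, 2)
    ElseIf length = 4 And Left(octString, 1) < "4" Then
        ' U+0200..U+07FF: two bytes, 110xxxxx 10xxxxxx
        output = "\3" & Mid(octString, 1, 2) & "\2" & Mid(octString, 3, 2)
    ElseIf length = 4 Then
        ' U+0800..U+0FFF: three bytes, and the first one is always \340
        output = "\340\2" & Mid(octString, 1, 2) & "\2" & Mid(octString, 3, 2)
    ElseIf length = 5 Then
        ' U+1000..U+7FFF: three bytes, 1110xxxx 10xxxxxx 10xxxxxx
        output = "\34" & Mid(octString, 1, 1) & "\2" & Mid(octString, 2, 2) & "\2" & Mid(octString, 4, 2)
    ElseIf length = 6 Then
        ' U+8000..U+FFFF: three bytes; a six-digit octal value no larger than
        ' 177777 (U+FFFF) always starts with 1, so that digit is skipped
        output = "\35" & Mid(octString, 2, 1) & "\2" & Mid(octString, 3, 2) & "\2" & Mid(octString, 5, 2)
    End If

    octToMultiOct = output
End Function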
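For a version that doesn't depend on the length of the octal string, here is a minimal Python sketch of the standard UTF-8 bit layout working from the numeric code point with shifts and masks (utf8_octal is a hypothetical name). The same steps should port to VBA using integer division for the shifts (x >> 6 becomes x \ 64) and And/Or as bitwise operators. Two VBA caveats, stated as assumptions to check: AscW can return a negative Integer for characters above U+7FFF (adding 65536 gives the code point), and for characters outside the BMP it returns only one surrogate half, which would need combining with its partner first.

```python
def utf8_octal(code_point: int) -> str:
    """Encode one Unicode code point as UTF-8, written as \\ooo octal escapes."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point < 0x80:        # 1 byte:  0xxxxxxx
        parts = [code_point]
    elif code_point < 0x800:     # 2 bytes: 110xxxxx 10xxxxxx
        parts = [0xC0 | (code_point >> 6),
                 0x80 | (code_point & 0x3F)]
    elif code_point < 0x10000:   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        parts = [0xE0 | (code_point >> 12),
                 0x80 | ((code_point >> 6) & 0x3F),
                 0x80 | (code_point & 0x3F)]
    else:                        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        parts = [0xF0 | (code_point >> 18),
                 0x80 | ((code_point >> 12) & 0x3F),
                 0x80 | ((code_point >> 6) & 0x3F),
                 0x80 | (code_point & 0x3F)]
    # Each byte becomes a three-digit octal escape, e.g. 0xE6 -> \346.
    return "".join(f"\\{b:03o}" for b in parts)

# Cross-check against Python's built-in encoder for one-, two-, three- and
# four-byte characters.
for cp in (0x41, 0xE9, 0x900, 0x65E5, 0x8A9E, 0x1F600):
    expected = "".join(f"\\{b:03o}" for b in chr(cp).encode("utf-8"))
    assert utf8_octal(cp) == expected, hex(cp)

print(utf8_octal(0x65E5))  # \346\227\245
```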