Japanese / Chinese text line breaks

What should be done about text that is technically all on one line?

Here’s a machine translated sample

Do we need to manually add line breaks in cases like this, either in the text itself or in code, so that text without natural break points fits better within text nodes? Or is there an option I’m not thinking of?

There is an old issue to correctly handle line breaks in Asian (and other) languages: DEF-1509

However, there is support for the zero-width space character, which you could include in your strings. (Granted, not the best solution though…)

In Lua, I think you could do something like this (I haven’t tried it myself):

local zws = "\xe2\x80\x8b" -- UTF-8 bytes of U+200B (zero-width space)
local a = "モグは厚いかみそりの" .. zws .. "鋭いシダを突き破った。"
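
Concatenating by hand only works for a single string, so here is a minimal sketch of doing the same thing for any string by putting a zero-width space between every pair of UTF-8 characters. The helper name insert_zws is made up for illustration, and I haven't run this inside Defold. The bytes are written as decimal escapes because plain Lua 5.1 does not understand \x escapes.

```lua
-- UTF-8 bytes of U+200B as decimal escapes, accepted by every Lua version.
local ZWS = "\226\128\139"

-- Matches one UTF-8 encoded character: a lead byte plus any continuation bytes.
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: put a zero-width space between every pair of characters.
local function insert_zws(text)
    local chars = {}
    for ch in text:gmatch(UTF8_CHAR) do
        chars[#chars + 1] = ch
    end
    return table.concat(chars, ZWS)
end

local wrapped = insert_zws("モグは厚いかみそりの鋭いシダを突き破った。")
```

The result can then be set on a text node or label as usual. Note that this also splits English words if any are mixed in.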

The “zero-width space” character “\xe2\x80\x8b” does actually produce visible spaces. The engine should probably be fixed so that this does not happen.

2019-05-13 11_39_48-UTF8JapaneseText

If I add some negative text tracking it does look better

2019-05-13 11_41_55-UTF8JapaneseText

Here is an example project

UTF8JapaneseText.zip (2.1 MB)

This could probably be made smarter by detecting non-Japanese characters and not adding a zero-width space after them, so that words in other scripts don’t get split in half when mixed in. But then that would mess up the tracking.

There are also some special situations to consider https://www.w3.org/TR/2009/NOTE-jlreq-20090604/#en-subheading2_1_7

We do nothing special with the zero-width space character except use it when calculating linebreaks. If the zero-width space character has some width it is probably because it’s like that in the font.

Opened font in FontForge

View-Goto

U+200B https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128&utf8=string-literal

Width was at 512

Set to 0

Set leading back to 0 on the text node

Before

2019-05-13 13_32_00-UTF8JapaneseText

After

2019-05-13 13_32_20-UTF8JapaneseText

So it should be possible to detect English words and not add zero-width spaces within them. There are still more things to consider before this is really good, though.
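
A rough sketch of that idea, assuming "English" simply means single-byte ASCII characters: only add a zero-width space between two neighbouring characters when at least one of them is multi-byte. The function name is hypothetical and this is not taken from the example project.

```lua
local ZWS = "\226\128\139" -- UTF-8 bytes of U+200B
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: add a zero-width space between neighbouring characters
-- unless both are single-byte (ASCII), so a word like "Defold" stays whole.
local function insert_zws_keep_ascii(text)
    local out = {}
    local prev
    for ch in text:gmatch(UTF8_CHAR) do
        if prev and (#prev > 1 or #ch > 1) then
            out[#out + 1] = ZWS
        end
        out[#out + 1] = ch
        prev = ch
    end
    return table.concat(out)
end
```

Accented Latin letters are multi-byte in UTF-8, so this simple check would still split words containing them. A proper version would need to look at code point ranges instead of byte length.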

To add to the topic of zero-width characters, there are other characters which should or should not have zero-width spaces before/after them. https://en.wikipedia.org/wiki/Line_breaking_rules_in_East_Asian_languages

Wow, that is an extensive set of rules! There are basically two groups: 1) characters not permitted at the start of a line and 2) characters not permitted at the end of a line.

Right now we only break on space and zero-width space, and we never break in the middle of a word. I’m not really sure if we need to do anything else?

No, nothing is needed from the engine. But to get attractive text with line breaks where expected, we do have to filter the text before setting it on nodes / labels, inserting zero-width spaces at the correct places based on these rules, depending on the language.
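
To make the filtering idea concrete, here is a minimal sketch with the two groups as lookup sets. The character lists are small illustrative subsets I picked from the Wikipedia page, not the full rules, and all names are made up for this example.

```lua
local ZWS = "\226\128\139" -- UTF-8 bytes of U+200B
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

-- Build a lookup set from a string of UTF-8 characters.
local function to_set(chars)
    local set = {}
    for ch in chars:gmatch(UTF8_CHAR) do
        set[ch] = true
    end
    return set
end

-- Illustrative subsets only; the real lists are longer and differ per language.
local NOT_AT_LINE_START = to_set("、。，．！？）」』】")
local NOT_AT_LINE_END = to_set("（「『【")

-- Hypothetical filter: allow a break between two characters only if the first
-- may end a line and the second may start one.
local function insert_breaks(text)
    local out = {}
    local prev
    for ch in text:gmatch(UTF8_CHAR) do
        if prev and not NOT_AT_LINE_END[prev] and not NOT_AT_LINE_START[ch] then
            out[#out + 1] = ZWS
        end
        out[#out + 1] = ch
        prev = ch
    end
    return table.concat(out)
end
```

With this, "あ。い" only gets a break opportunity after the full stop, and "「あ」" gets none at all. Latin words would still need separate handling so they are not split.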

Here is a WIP module meant to address the issues raised in this topic. It makes a start on following the rules in the Wikipedia article “Line breaking rules in East Asian languages”, though it would be good if Defold itself was able to handle these rules on labels/text nodes. Maybe RichText would be a good option for implementing the rules, since this is already sort of in its domain.

The wip module can for sure be improved to be more efficient / faster / implement more of the rules by someone who knows how to do those well. :slight_smile:

This image illustrates the kind of problems the WrapText module is meant to address. Because all characters are connected there is no natural white space to allow line breaks. But there are still situations where you don’t want certain characters to be left at the start/end of a line.

https://github.com/googlefonts/noto-cjk/tree/main/Sans/OTF/SimplifiedChinese

When testing with this specific font, inserting the zero-width space between Chinese characters does not seem to work.

2021-12-07 17_36_01-WrapText

Test string

页哨临蛤扩杯桃波楚淡啜遣帝虐能嚷在惨挑茉整精

​ Zero Width Space
U+200B

FontForge claims there’s a character there, but it seems weird to me. :thinking:

The other test font I was using seems to look more correct to me.

There must be something I don’t understand in relation to Simplified Chinese fonts with zero width space.

I tried using https://github.com/akiirui/RobotoCJKSC to test whether it was somehow an OTF issue. It at least has the ZWS listed as I’d expect, but the engine still shows ~ when I add the ZWS character between Chinese characters.

I tested just trying to have multiple ZWS characters together and it displays nothing.

With more testing, I think it’s not an issue with the fonts but just somehow the UTF8 stuff.

When I try to include just “\226\128\139” in the extra characters this happens to the raw file:

font: "/assets/fonts/babamoji1004/BABAmoji.ttf"
material: "/builtins/fonts/label-df.material"
size: 15
antialias: 1
alpha: 1.0
outline_alpha: 0.0
outline_width: 0.0
shadow_alpha: 0.0
shadow_blur: 0
shadow_x: 0.0
shadow_y: 0.0
extra_characters: "\357\277\275\n"
  "8\v9"
output_format: TYPE_DISTANCE_FIELD
all_chars: false
cache_width: 0
cache_height: 0
render_mode: MODE_SINGLE_LAYER

It’s something the editor is doing when it changes the extra_characters field. I tried setting it to \xe2\x80\x8b and the editor changed it to extra_characters: "\342\200\213"

And now it seems to be working right (but only with the modified TTF, not the OTF), with \342\200\213 listed in the extra characters raw text. I don’t understand this; I guess they are octal UTF-8 bytes.
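
A likely explanation for both results, assuming the .font file is stored in protobuf text format: there a backslash followed by digits is read as an octal byte value, whereas Lua reads it as decimal. Read as octal, \226 is the single byte 0x96 (not valid UTF-8 on its own, which would explain the \357\277\275 replacement character), \12 is a newline followed by a literal 8, and \13 is a vertical tab followed by a literal 9. That matches the garbled "\357\277\275\n" "8\v9" value above. The small Lua sketch below (to_octal_escapes is a made-up helper name) prints the octal form of the same three bytes:

```lua
local zws = "\226\128\139" -- Lua reads \ddd escapes as decimal byte values

-- Hypothetical helper: render each byte of a string as a backslash-octal escape.
local function to_octal_escapes(s)
    return (s:gsub(".", function(c)
        return string.format("\\%03o", c:byte())
    end))
end

print(to_octal_escapes(zws)) -- prints \342\200\213
```

So \xe2\x80\x8b (hex), \226\128\139 (decimal, Lua) and \342\200\213 (octal) all describe the same bytes. Only the notation differs between Lua and the font file.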

2021-12-07 18_22_26-WrapText

Something to be aware of is you cannot use the same fonts for Simplified Chinese and Traditional Chinese texts. They have their own set of glyphs.

I’m Chinese. I know of one rule about line breaking:
Punctuation is usually not placed at the start of a line.
That’s all.


Looks pretty good.

This is great! I bumped into this a while back with Japanese, but solved it by manually (and possibly incorrectly) inserting spaces. Next time I’ll give the module a spin instead.
