Ah, I see. I now realise what’s wrong. When we render, we try to batch calls to the GPU to reduce the number of draw calls and thus increase performance. This is standard practice in game engines. It means that components (sprites, spine, text etc.) that are of the same type and come one after another in render order (most often based on z-value) result in a single draw call instead of one draw call per component instance.
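To make the batching idea concrete, here is a tiny Python model of it. This is only an illustration, not how the engine is implemented: it assumes that a run of consecutive components of the same type collapses into one draw call, and it ignores everything else that can affect batching in practice. The function name `count_draw_calls` and the sample lists are made up for this example.

```python
from itertools import groupby


def count_draw_calls(render_order):
    """Count draw calls in a simplified model.

    Assumption: every run of consecutive components with the same type
    is batched into a single draw call.
    """
    return sum(1 for _ in groupby(render_order))


# Four components of the same type, rendered one after another.
same_type = ["sprite", "sprite", "sprite", "sprite"]

# The same number of components, but the types alternate.
interleaved = ["sprite", "text", "sprite", "text"]

print(count_draw_calls(same_type))    # 1
print(count_draw_calls(interleaved))  # 4
```

As soon as the types alternate, each component breaks the previous batch and ends up in a draw call of its own.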
When you have many box nodes with child text nodes, the default is to render them in their hierarchical order, meaning “draw box”, “draw text”, “draw next box”, “draw next text” and so on. Since the node type changes at every step, nothing can be batched, and you get many unnecessary draw calls that you really want to avoid. Now, arranging your gui nodes in a hierarchy is convenient in so many ways, and luckily there is a way to keep the hierarchy and still render efficiently. The solution is called layers and is described in detail in the manual. In your case you need to create two layers: one that you assign to the box node that you are cloning, and one for the text node that is a child of the box node. This will cause all of the box nodes to be rendered in a single call and all of the text nodes to be rendered in another call.
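The effect of the two layers can be sketched with a small Python simulation. It is a simplified model under stated assumptions, not engine code: nodes are plain dicts, nodes without a layer keep their hierarchical order, nodes with a layer are drawn layer by layer, and consecutive nodes of the same kind are batched into one draw call. All names here (`build_cloned_boxes`, `assign_layers`, `render_order`, and the layer names `"boxes"` and `"texts"`) are hypothetical and chosen only for the example.

```python
from itertools import groupby


def build_cloned_boxes(count):
    """Return nodes in hierarchical order: each box followed by its child text."""
    nodes = []
    for i in range(count):
        nodes.append({"id": f"box{i}", "kind": "box", "layer": None})
        nodes.append({"id": f"text{i}", "kind": "text", "layer": None})
    return nodes


def assign_layers(nodes, layer_by_kind):
    """Return copies of the nodes with a layer chosen by node kind."""
    return [dict(node, layer=layer_by_kind[node["kind"]]) for node in nodes]


def render_order(nodes, layer_order):
    """Order nodes by layer; the stable sort keeps hierarchy order within a layer.

    Assumption of this model: nodes without a layer are drawn first.
    """
    def layer_index(node):
        if node["layer"] is None:
            return -1
        return layer_order.index(node["layer"])

    return sorted(nodes, key=layer_index)


def count_draw_calls(nodes):
    """Consecutive nodes of the same kind count as one batched draw call."""
    return sum(1 for _ in groupby(node["kind"] for node in nodes))


layer_order = ["boxes", "texts"]
nodes = build_cloned_boxes(10)

without_layers = count_draw_calls(render_order(nodes, layer_order))
layered = assign_layers(nodes, {"box": "boxes", "text": "texts"})
with_layers = count_draw_calls(render_order(layered, layer_order))

print(without_layers)  # 20
print(with_layers)     # 2
```

With ten cloned boxes the model goes from twenty draw calls to two, and the number stays at two no matter how many more boxes you clone.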
To learn more about draw calls in Defold I can recommend this excellent forum article.