An empty game object instance in Defold is equivalent to a scene graph node in another engine, e.g. a specific skeleton bone, or something similarly small. It is not equivalent to an instance in e.g. Unity. In Defold, it’s an entry in a few arrays rather than a dynamically allocated heavy-duty object. In terms of cost, it is much closer to a particle in a particle system than to, say, an NPC.
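To illustrate what "an entry in a few arrays" means in terms of cost, here is a minimal C++ sketch. It is not Defold's actual internal layout; `InstancePool`, `Vec3` and all field names are made up for the example. The only point is that creating an instance appends to arrays that were preallocated up front, so there is no per-instance heap allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; NOT Defold's real data structures.
struct Vec3 { float x, y, z; };

class InstancePool {
public:
    explicit InstancePool(std::size_t capacity) : capacity_(capacity) {
        // One up-front allocation per array; nothing is allocated per instance.
        positions_.reserve(capacity);
        parents_.reserve(capacity);
    }

    // "Creating" an empty instance appends one entry to each array
    // and returns its index. A parent of -1 means "no parent".
    std::uint32_t create(Vec3 position, std::int32_t parent = -1) {
        assert(positions_.size() < capacity_ && "pool is full");
        positions_.push_back(position);
        parents_.push_back(parent);
        return static_cast<std::uint32_t>(positions_.size() - 1);
    }

    std::size_t count() const { return positions_.size(); }
    const Vec3& position(std::uint32_t index) const { return positions_[index]; }
    std::int32_t parent(std::uint32_t index) const { return parents_[index]; }

private:
    std::size_t capacity_;
    std::vector<Vec3> positions_;        // one entry per instance
    std::vector<std::int32_t> parents_;  // index of the parent instance, or -1
};
```

An engine built around per-object allocation would instead do something like `new GameObject(...)` for every instance, with a much larger object behind each pointer. That is the difference in cost being described above.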
The typical use case for a collection proxy is dynamic level loading, or possibly running two game sessions simultaneously (to do cross fades or similar). It’s not meant to solve asset streaming. Since there is currently no support for asset streaming*, we understand that many people use collection proxies as a workaround. So rather than changing how collection proxies work in this respect, we would need to implement streaming support to solve the problems you are facing in a good way.
*) I believe it’s possible for users to achieve asset streaming to some degree using native extensions and the buffer API, but it would probably be a pretty challenging exercise.