New Camera Extension - Progress Updates (and Suggestions?)

I recently started my second (hopefully commercial) game with Defold. I decided I want to build the render script and camera system from scratch, so that I know exactly how they work and have no lingering doubts about how to implement new features on top of built-in code.

I will release this system as an asset, similar to rendercam and orthographic. Before I do that, I would like to get some feedback on what features would be particularly useful that current camera assets do not support.

Here is what it currently supports:

  • Orthographic (2D) projection
  • Resize modes
  • Multiple cameras and viewports

Here is what I am currently implementing:

  • Perspective (3D) projection
  • Support for the default render predicates

Here is what is on the backlog:

  • Camera shake
  • Smooth zooming
  • Screen-to-world conversion
  • World-to-screen conversion

The following screenshot shows what I mean by resize modes:

In this example project, we have four cameras active simultaneously, each with their own viewport. A small Defold logo is in the center, and a large white particle image with an extensive black border is behind it. Magenta is the clear color, where no sprite exists.

  • Stretch mode is similar to the built-in stretch projection.
  • Expand mode shows more or less of the world as the window size changes.
  • Center mode magnifies or shrinks the world according to the desired resolution / aspect ratio. It also centers the world in the camera’s viewport, which creates that well-known “black bar” effect (which in this case is actually a “magenta bar” effect).

Each camera can have its own resize mode, aspect ratio, etc.
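To illustrate the idea behind center mode, here is a small sketch of the viewport math it implies (this is illustrative, not necessarily how the extension actually implements it): scale the desired resolution uniformly to fit the window, then center it, and whatever space is left over becomes the "bar" area filled with the clear color.

```lua
-- Sketch of center-mode viewport math (illustrative, not the extension's source).
-- Scales the target resolution uniformly to fit the window and centers it;
-- the leftover space on one axis becomes the letterbox/pillarbox bars.
local function center_viewport(window_w, window_h, target_w, target_h)
    -- Largest uniform scale that still fits the target inside the window.
    local scale = math.min(window_w / target_w, window_h / target_h)
    local vp_w = target_w * scale
    local vp_h = target_h * scale
    -- Center the viewport; the remaining space shows the clear color.
    local vp_x = (window_w - vp_w) / 2
    local vp_y = (window_h - vp_h) / 2
    return vp_x, vp_y, vp_w, vp_h
end
```

For example, a 960×540 target in a 1920×540 window scales by 1 and is pushed 480 pixels to the right, producing the "magenta bar" effect on both sides.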

Please let me know if any feature comes to mind that you’ve had to implement manually or think could be useful for Defold users.


I use Orthographic. What I’ve had to add myself is center mode (my black bars are GUI though) and camera shake (I see Orthographic has shake, but for some reason I felt the need to replace or augment it). So seems you have those covered!


Nice! I think this will be very useful to the community! How will the GUI be rendered? On top of all cameras, selectable per camera or in all cameras?

If I were to redo Orthographic I’d probably remove the Shake and Recoil effects. They’re not really part of the camera and can both be implemented simply by animating/moving the camera. Sure, it is nice to have a camera shake included, but I also think it must be easy to replace the default. What are your thoughts on this?


I think one issue is a conflict between boundaries and shake. I wasn’t able to fix it myself. A camera shake that respects camera boundaries would be neat, but I don’t know whether it needs to be inside the camera extension.


I think GUI should not be per-camera, since a camera is a view into the game world, but the GUI is separate from the game world and its objects.

The more I think about it though, the more I don’t see the benefit in separating GUI logic from gameobject logic and forcing their scripts to be separate. I’m not sure how that design decision was originally made… maybe because GUI components aren’t meant to interact with objects in the game world, and therefore the original Defold creators wanted to kind of “reinforce the logic separation” by only allowing GUI-related code in GUI scripts? The gui.set_render_order() is a bit weird too as a concept, since this is basically the same thing as creating a new predicate.

Anyway, just a few tangentially-related thoughts. :slight_smile:

We can now animate most properties of a camera. Here are some of the potentially useful properties to animate (shown with their default values), in addition to properties like position or rotation which are exposed via the standard go API:

  • viewport_x (0)
  • viewport_y (0)
  • viewport_width (1)
  • viewport_height (1)
  • z_min (-1)
  • z_max (1)
  • resolution_width (960)
  • resolution_height (540)
  • zoom (1)
  • field_of_view (45)
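For instance, smooth zooming can be driven entirely by the standard go animation API. This is a hypothetical snippet: the URL "/camera#rendy" is an assumption about how the camera script is addressed in your collection.

```lua
-- Hypothetical usage: ping-pong the camera's "zoom" script property
-- between its current value and 2 over 1.5 seconds, looping forever.
go.animate("/camera#rendy", "zoom", go.PLAYBACK_LOOP_PINGPONG, 2, go.EASING_INOUTCUBIC, 1.5)

-- Stop the zoom animation later if needed:
go.cancel_animations("/camera#rendy", "zoom")
```

Since these are ordinary go properties, the same pattern works for any of the properties listed above, e.g. animating field_of_view for a dolly-zoom-style effect.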

The following gif shows animating the zoom property, allowing for smooth zooming. The top-left has center resize mode, the top-right has expand resize mode, and the bottom-left has stretch resize mode. The bottom-right is reserved for upcoming screen-to-world and world-to-screen debug info.


Finished screen-to-world conversion. You can specify which camera viewport to use as a base.

In this example, my cursor is over the top-right viewport (with the expand resize mode). Therefore, the world origin is in the center of the top-right viewport. You can see how the screen-to-world GUI text updates along with the animated zoom (which goes from x1 to x0.5 with EASING_INOUTCUBIC).

If you ask for a screen-to-world conversion from a screen position that lies outside the camera’s viewport, then the function will return nil.
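A sketch of the guard such a function might use (illustrative, not the extension's actual source): check the screen position against the camera's viewport rectangle and bail out with nil before doing any unprojection.

```lua
-- Illustrative containment check for a screen-to-world query.
-- Returns true only if (sx, sy) lies inside the viewport rectangle
-- starting at (vp_x, vp_y) with size vp_w x vp_h.
local function inside_viewport(sx, sy, vp_x, vp_y, vp_w, vp_h)
    return sx >= vp_x and sx < vp_x + vp_w
       and sy >= vp_y and sy < vp_y + vp_h
end
```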

The world-to-screen conversion function will behave similarly; however, it currently only displays (0, 0) or nil because it isn't finished.

I almost have all of the default render predicates supported in the pipeline. Adding GUI was a bit tricky, because it doesn’t make sense for the GUI to be tied to cameras that show some part of the game world. Instead, this library creates a private camera for the GUI and manages it separately.

Fun Fact! The text render predicate is only attached to the system font that comes packaged with Defold. In the default render script, it’s drawn last after all other GUI text, regardless of what you specify for gui.set_render_order(). Therefore, it’s a good font to use for debug text that you want appearing on top of everything else.


I’m not sure if my suggestion is on topic, maybe I’m confusing things, but I remember it would be also great to have any conversion from and to input coordinates (action.x/y space).


Can you clarify what you mean by that?

Edit: I think maybe what you’re referring to is converting the position of a game object to where it’s located on the screen (basically where your cursor might generate the action.screen_xy data). If so, then yes, that’s what world_to_screen() will do. If the game object is not within the camera frustum, then I can either return nil or the off-screen coordinates; I’m not sure which yet.

Sorry I didn’t clarify the details.

I mean the gui context and the conversion between the screen space of a node and the action space of input, because there is scaling distortion after the window is resized.

For example rendercam has functions like screen_to_gui and screen_to_gui_pick that help with that.
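The core of such a conversion can be sketched as a pure scaling step (this is a hypothetical helper, not rendercam's or Rendy's actual implementation): input screen coordinates are in actual window pixels, while gui nodes live in the project's design resolution, so after a resize the two differ by a per-axis scale factor.

```lua
-- Illustrative screen-to-gui conversion: rescale window-pixel coordinates
-- (e.g. action.screen_x/screen_y) into the gui's design resolution.
-- In Defold, the window size could come from window.get_size() and the
-- gui size from gui.get_width()/gui.get_height().
local function screen_to_gui(screen_x, screen_y, window_w, window_h, gui_w, gui_h)
    return screen_x * gui_w / window_w, screen_y * gui_h / window_h
end
```

Note this simple per-axis scaling matches a stretch-style gui adjust mode; other adjust modes would need the same kind of offset/scale bookkeeping the engine applies to nodes.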


For the type of game I make, it would be great to be able to define a layout that covers both GO and GUI — something that makes it easy to support different aspect ratios, while still being quick to iterate on. The camera would be part of this, although layout functionality would perhaps be organized in a separate, non-core module.

Game with all the layout elements, displayed in ultrawide 32:9

  • Here I define the aspect ratio of the central gameboard, and that it should scale up to be as big as possible without overlapping with the surrounding four panels.
  • The left and right panel, if there’s space, should put some air in-between themselves and the gameboard, but only up to a maximum.
  • The background scales to fit the screen, but only up to a maximum. If there’s still horizontal space left, a repeating pattern fills the rest.
  • On narrow aspect ratios like 4:3, elements may need to shrink to squeeze in everything horizontally. And of course, it should be possible to define another layout for portrait mode.

Some food for thought. Maybe you find it interesting, as I do. :grinning:


Okay, some updates:

  • I named the extension “Defold Rendy” or just “Rendy”. :slight_smile:
  • The render script now supports all default predicates.
  • The screen_to_world() function supports more of the little oddities you don’t really think about, like accounting for camera rotation.
  • The world_to_screen() function is finished for orthographic projections.
  • The script file attached to each camera game object now has “pixel-perfect properties” rather than [0, 1] ratios for viewport, so that you don’t have to do any involved math to set up their viewports.
  • You can now specify a render order for each camera in the scene, which is useful if you have multiple camera viewports overlapping.

A few notes to understand the example project:

  • Top-left camera has center mode. It is smoothly zooming from [1.0, 2.0].
  • Top-right camera has expand mode. It is smoothly zooming from [1.0, 4.0].
  • Bottom-left camera has stretch mode. It is rotating clockwise.
  • The center of each viewport has a world position of (0, 0). In other words, the cameras are centered on the origin of the world.
  • “Screen Position” currently actually means “Viewport Position”. For example, the middle of each of the three viewports is a different screen position, but they all report the same viewport position in the Logo Screen Position label.
  • Each viewport has a default size of 480 x 270. Therefore, the center of each viewport has a screen position of (240, 135) when it is not stretched or expanded.
  • The Defold logo follows my cursor. The zooming and rotating is automatic over time.

You can see how if I roughly center my cursor in the middle of one of the viewports, the Logo Screen Position and Logo World Position labels make sense. (The world position is offset by (-8, -15) but the screen position is only offset by (-3, -6). The reason for this is that each viewport is only half the screen size, so 1 pixel on the viewport translates roughly to 2 pixels in the world).

Here’s a running example, where the application is sized much more vertically than its original resolution. In the following gifs, the logo always follows my non-moving cursor.

Top-Left Camera

Top-Right Camera

Bottom-Left Camera

Next objectives:

  • Support perspective projections and 3D worlds.
  • Camera shake.
  • Differentiate between “screen position” and “viewport position”. Right now, screen_to_world() and world_to_screen() are actually viewport-specific, not screen-specific.

The camera shake feature is complete. You can cancel a shake animation early if desired.

function rendy.shake_camera(camera_id, radius, intensity, duration)
function rendy.cancel_camera_shake(camera_id)

  • radius is how far the shake should move the camera from its original position.
  • intensity is how many ping-pong animations occur over the duration of the shake.

Therefore, the duration of a single ping-pong movement can be calculated as duration / intensity. For example, a shake that lasts for 1 second with 5 intensity would jerk its position 10 times.

For some reason this was a pretty entertaining feature to write. I didn’t use random number generation, which I think was a good decision: the user’s PRNG state won’t unexpectedly advance because the library made a call to random().

Instead, I used Lua’s socket library to get the milliseconds that have passed since the epoch, together with sin and cos, like so:

local milliseconds = socket.gettime() * 1000
local to = camera_position + vmath.vector3(math.sin(milliseconds), math.cos(milliseconds), 0) * radius

It wouldn’t be right to use sin or cos twice because then the shake effect would always be on the diagonal.
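An expanded sketch of the calculation above, with the time source split out as a parameter so the direction math stands on its own (the structure and comments here are mine; only the core sin/cos idea comes from the post). Since sin(t)² + cos(t)² == 1, the offset always has length radius: constant shake magnitude, with a direction that wanders as time advances.

```lua
-- Compute a shake offset of constant magnitude `radius` whose direction
-- rotates with the supplied time value (milliseconds since the epoch).
local function shake_offset(time_ms, radius)
    return math.sin(time_ms) * radius, math.cos(time_ms) * radius
end

-- In Defold, using the socket-based clock from the post:
-- local dx, dy = shake_offset(socket.gettime() * 1000, radius)
-- local to = camera_position + vmath.vector3(dx, dy, 0)
```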

Here’s a demo using the following input:

rendy.shake_camera(top_right_camera, 100, 5, 0.5)


I think this is a really good suggestion, so I will try to add it.


This is great! Well done!

I like it! Rendy! Catchy!


Perspective projections and 3D worlds are now supported.

I haven’t worked with 3D in Defold before, but armed with previous experience from other tools and the bulk of Rendy already implemented, this task only took a few hours.

Here’s the alternative example project for perspective cameras. The logo model is rotating slowly, but not moving. I implemented FPS camera controls to look around.

You can see how the Logo Screen Position label accurately maps the logo’s world position to its location on the viewport using the world_to_screen() function. In this scenario, the viewport is 960 x 540. Therefore, when the logo is centered on the screen, it makes sense that its screen position is (480, 270).

The Viewport World Position is a bit weird and not implemented yet. This is supposed to be the result of calling screen_to_world(), but how exactly do you do that in a 3D environment? For example, if you pass in (480, 270) which is the middle of the viewport, then that would just be the camera’s world position. And what about the resulting z value? Converting from screen to world necessarily adds an extra dimension, so the z component wouldn’t make sense. I guess the same thing could be said about 2D worlds since they technically are still 3D, but with a very small z range (usually [-1.0, 1.0]) and a different projection process.

I also added an optional “dampener” or “accelerator” to the shake_camera() function, which is applied over the duration of the shake. It feels a lot smoother!

Next steps:

  • Figure out what to do with screen_to_world() z issue.
  • Perhaps add pre-packaged 1st / 3rd person camera controls? Working with quaternions and basis vectors can be confusing, so it might be helpful to apply basic controls to a camera for testing or rapid development purposes.
  • Add documentation.
  • Clean up example projects.

Great work! This is becoming really complete.


screen_to_world(), but how exactly do you do that in a 3D environment?

I suggest allowing the user to pass a Z value to the function, representing depth from the camera to the world space coordinate you want. Passing 0 would give you the camera’s position; passing the camera’s near Z would give you a point on the near plane, and same for far Z and far plane. Importantly, the value should be Z depth and not the distance from the camera (length of the vector), since distance would give you points on a sphere rather than a plane.

This is quite useful if you want to create a line from the near plane to the far plane, to e.g. do intersection tests to check which object in the world you clicked on.

If your screen_to_world() takes a vmath.vector3, you can simply use its Z value, both for 2D/orthographic camera and 3D/perspective camera. I see that this is what Unity does.
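To make the suggestion concrete, here is a sketch of a depth-based unprojection for a perspective camera, in camera space only (all names are illustrative, not Rendy's API). Applying the camera's rotation and translation to the result would yield the world-space point; at the viewport center, any depth gives a point straight ahead of the camera, matching the description above.

```lua
-- Illustrative depth-based screen-to-world step, in camera space.
-- sx, sy: screen position inside a vp_w x vp_h viewport (origin bottom-left).
-- z: depth along the camera's forward axis (not distance from the camera).
local function screen_to_camera_space(sx, sy, z, vp_w, vp_h, fov_y_deg)
    -- Normalized device coordinates in [-1, 1], (0, 0) at the viewport center.
    local ndc_x = (sx / vp_w) * 2 - 1
    local ndc_y = (sy / vp_h) * 2 - 1
    -- Half-extents of the frustum slice at depth z.
    local half_h = z * math.tan(math.rad(fov_y_deg) / 2)
    local half_w = half_h * (vp_w / vp_h)
    -- The camera looks down -z in its own coordinate system.
    return ndc_x * half_w, ndc_y * half_h, -z
end
```

Calling this with the near and far Z gives the two endpoints of the pick ray through a pixel, which is exactly the line-from-near-to-far use case mentioned above.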


I’m a little confused about your explanation here. Can you elaborate some more? Here’s a screenshot of a Minecraft (3D) world as an example. The center of the screen is (480, 270). The yellow circle is, let’s say, (700, 450). If I passed screen_to_world(700, 450, z), what different values of z would help describe your way of thinking about it?

Imagine this is the camera frustum viewed from above, with the near and far plane indicated by the horizontal black lines:


If you draw a line from the camera with length equal to the far Z, it will only touch the far plane if it’s perpendicular to the far plane (see the two orange lines, as compared to the purple line).

If we go into the camera’s coordinate system, where the Z axis always points straight ahead from the camera’s perspective, the Z value passed to screen_to_world is how far we should go along the Z axis (regardless of the angle of the line). If we want to calculate the point at the end of the purple line, we can imagine the Z value as being the vertical segment of the green line.

In your example, if you were to do the distance/line length approach, we can imagine the red and yellow circles as being the two orange lines in my drawing. If on the other hand we do depth, we can imagine the circles as being the central orange line and the purple line.

Generally, I believe the depth approach is preferable because you more often want to do operations in relation to the frustum volume rather than in relation to the camera position. If you have the purple line as a vector, you can easily change its length if you would like it to end at the blue circle. Going the other way is slightly more cumbersome.


That makes sense, thanks! So in this case, the x and y values are screen-related, whereas the z value is more frustum-related, which is particularly useful for intersections with game objects.

This is also true and a good way of putting it.
