Problem with shader

Would you mind giving this a try?

  • Create a new script file; glversion.script
  • Replace its content with:
if not ffi then
    ffi = package.preload.ffi()
end
ffi.cdef[[
const char* glGetString(unsigned int);
]]
print("OpenGL Version: " .. ffi.string(ffi.C.glGetString(7938)))
  • Attach the script to one of your gameobjects and run the game.
  • Hopefully the “Console” panel in the editor should print something along the lines of: DEBUG:SCRIPT: OpenGL Version: 2.1 INTEL-10.22.25

This should determine what version of the OpenGL context the engine has created. I’m not sure if it will be relevant information, but couldn’t hurt to try.

(I haven’t tried this on Windows, but I think it should work.)

ERROR:SCRIPT: glversion.script:7: cannot resolve symbol ‘glGetString’

Does it behave better/worse/same in fullscreen mode? (Idea from Win 10 vsync issue)

Same. But in fullscreen mode Alt+Tab doesn't work (the screen goes black and nothing happens).

Are there any vsync options available in the os or per app? (See this thread for some vsync issues)

Next idea I have is to try the app at another Windows machine, and also on another Windows version… :confused:

Nothing changed. =(
I ran the project on an Android device.
Defold 1 light - frame 90ms :unamused:
Libgdx 5 lights - frame 16-17ms (57-61 fps)
So the problem is either in my code or in the Defold engine. I will try to rewrite the project; maybe I made a mistake somewhere.

If the web profiler and in-game profiler report different stats, the only realistic reason I can think of would be that you have two games running simultaneously (two different processes). Can you verify this? The first instance would have the web-server still running (http://localhost:8002), the second instance would be the window you are looking at.

The reason the shader perf changes when you change y+=10, is because it controls the number of iterations per pixel:

for (float y=0.0; y<resolution.y; y+=1.0) {

y += 10.0 is 1/10 as many iterations => 10 times cheaper.
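
As a rough check of that ratio, here is a minimal Python sketch that counts the loop iterations for both step sizes. The value 256.0 for resolution.y is only an assumption for illustration (it matches the ray count used later in the thread), and the function name is hypothetical.

```python
def loop_iterations(resolution_y: float, step: float) -> int:
    """Count how often the body of `for (y = 0; y < resolution_y; y += step)` runs."""
    count = 0
    y = 0.0
    while y < resolution_y:
        count += 1
        y += step
    return count


# Assumed resolution.y of 256.0; the real uniform value may differ.
full = loop_iterations(256.0, 1.0)
coarse = loop_iterations(256.0, 10.0)
print(full, coarse, full / coarse)
```

The ratio is not exactly 10 because the last partial step still costs one iteration, but it is close enough to explain the change in frame time.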

When you are comparing to libgdx, could you verify that

uniform vec2 resolution;

has exactly the same value in the Defold and libgdx versions? That value has the biggest impact on your perf, as it defines how much work you do in the shaders per pixel. The next thing that can differ is the screen resolution in the two cases; this would have an equally large impact on perf. The shader you posted is incredibly expensive (iterating + tex lookups per pixel), and I would be surprised if it worked well on phones in full resolution.

I would be very surprised (and very interested!!) if Defold is slower than libgdx in this case, we are most likely faster given that the configuration is the same.

EDIT: By screen resolution above, I mean the size of the render target you use that pixel shader for.

Thx, it was my mistake =).
I set the render target height to 256 when it should be 1.
Defold 5 lights - frame 40-50ms
Libgdx 5 lights - frame 16-17ms(57-61 fps)
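
To put a number on that mistake, here is a small Python sketch of how the render target size multiplies the shader work. The width of 256 and the 256 lookups per pixel are assumptions taken from the ray count mentioned later in the thread, and the function name is hypothetical.

```python
def texture_lookups(width: int, height: int, lookups_per_pixel: int) -> int:
    """Worst-case texture lookups for one full-target pass of the shadow shader."""
    return width * height * lookups_per_pixel


# Assumed: 256 pixels wide, up to 256 lookups per pixel (no early break).
wrong = texture_lookups(256, 256, 256)  # render target height set to 256
fixed = texture_lookups(256, 1, 256)    # render target height set to 1
print(wrong, fixed, wrong // fixed)
```

Under these assumptions the 256-pixel-high target does 256 times the work of the 1-pixel-high one, per light.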

Yes, I know about this, but I am trying to do some optimisation. :slight_smile:

I am running only one game.

When I have a lot of light sources (30+), I get the error ERROR:SCRIPT: helpers/render/2d_light/2d_light_helper.lua:65: Command buffer is full (1024). Can I fix it?

Did you check the Task Manager? (worried there is a lingering process in there) I have absolutely no idea how they could show different numbers for the same game… :confused:

You can’t tweak the max value unfortunately, but it also means you are doing 1024 calls from the render script, most of which will affect not only OpenGL, but also GPU state (equivalent to the popular “draw call” measures). You should check your render script code and see if you can lower the number of calls, 20-30 calls per light seems like too many.
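
A back-of-the-envelope sketch in Python of what that limit means for the light count. It assumes, as a simplification, that every render script call takes exactly one slot in the 1024-entry command buffer; the function name and the per-light call counts are hypothetical.

```python
COMMAND_BUFFER_SIZE = 1024  # limit quoted in the error message


def max_lights(calls_per_light: int, fixed_calls: int = 0) -> int:
    """Lights that fit in one frame if each light costs `calls_per_light` calls.

    `fixed_calls` stands for calls that do not depend on the light count
    (clearing, drawing the scene, GUI); its value here is an assumption.
    """
    usable = max(COMMAND_BUFFER_SIZE - fixed_calls, 0)
    return usable // calls_per_light


print(max_lights(30))  # around 30 calls per light
print(max_lights(20))
print(max_lights(5))   # after trimming the per-light calls
```

With around 30 calls per light the buffer fills up at a little over 30 lights, which is consistent with the error appearing at 30+ light sources.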

Yes, there is one dmengine.exe in Task Manager. If I stop the game, the web profiler stops working too.

The Defold log:
INFO:ENGINE: Defold Engine 1.2.96 (0060183)
INFO:ENGINE: Loading data from: build/default
INFO:ENGINE: Initialised sound device ‘default’

INFO:DLIB: SSDP: Started on address 192.168.43.109
INFO:DLIB: SSDP: Started on address 192.168.56.1
INFO:DLIB: SSDP: Done on address 192.168.43.109
INFO:DLIB: SSDP: Done on address 192.168.56.1

It looks like two instances of the game.

After some optimisation: Defold, 10 lights (256 rays per light), frame 26-30ms on a Samsung Galaxy S3. On PC, 50+ light sources run at 16ms. I think I have made all the optimisations I can. If someone sees more places to optimise, please tell me.

Shadow fragment shader (the bottleneck):

#ifdef GL_ES
#define LOWP lowp
precision mediump float;
#else
#define LOWP 
#endif
#define PI 3.14
#define THRESHOLD 0.75

varying mediump vec4 position;
varying mediump vec2 var_texcoord0;

uniform lowp sampler2D TEX0;
//number of rays for every light
//can't use uniform because of webGl
const float resolution=256.0;
uniform vec4 up_scale;
uniform vec4 pos;
float mult_pi=PI*1.5;

void main(void) {
  float distance = 1.0;
  //the angle does not change along one ray, only r (length) changes
  float theta = (var_texcoord0.s*2.0 -1.0) * PI + mult_pi; 
  const float add = 1.0/resolution;
  vec2 pre_coord = vec2(sin(theta)*resolution/2.0/pos.z,cos(theta)*resolution/2.0/pos.w);
  for (float r=0.0; r<1.0; r+=add) {
    //coord which we will sample from occlude map
    vec2 coord = pre_coord * -r + vec2(pos.x/pos.z,pos.y/pos.w);
    vec4 data = texture2D(TEX0, coord);
    //if we've hit an opaque fragment (occluder), then get new distance
    if (data.a > THRESHOLD) {
      distance = r;
      break;
    }
  }
  gl_FragColor = vec4(vec3(distance/up_scale.x), 1.0);
}

Light map shader. It is also a bottleneck if soft shadows are used (because of the blur):

#ifdef GL_ES
#define LOWP lowp
precision mediump float;
#else
#define LOWP 
#endif
#define PI 3.14

//inputs from vertex shader
varying mediump vec4 position;
varying mediump vec2 var_texcoord0;

//uniform values
uniform lowp sampler2D TEX0;
uniform lowp sampler2D TEX1;
const float resolution=256.0;
uniform vec4 vColor;
const float soft_shadows=1.0;

//sample from the distance map
float sample(vec2 coord, float r) {
	coord.x=1.0-coord.x;
  return step(r, texture2D(TEX0, coord).r);
}

void main(void) {
    //rectangular to polar
	vec2 norm = var_texcoord0.st * 2.0 - 1.0;
	float theta = atan(norm.y, norm.x);
	float r = length(norm);	
	float coord = (theta + PI) / (2.0*PI);
	
	//the tex coord to sample our 1D lookup texture	
	//always 0.0 on y axis
	vec2 tc = vec2(coord, 0.0);
	
	//the center tex coord, which gives us hard shadows
	float center = sample(vec2(tc.x, tc.y), r);        
	
	//we multiply the blur amount by our distance from center
	//this leads to more blurriness as the shadow "fades away"
	//float blur = (1./resolution) * smoothstep(0., 1., r); 
	
	//now we use a simple gaussian blur
	//float sum = 0.0;
	
	//uncomment if you need soft shadows
	//sum += sample(vec2(tc.x - 4.0*blur, tc.y), r) * 0.05;
	//sum += sample(vec2(tc.x - 3.0*blur, tc.y), r) * 0.09;
	//sum += sample(vec2(tc.x - 2.0*blur, tc.y), r) * 0.12;
	//sum += sample(vec2(tc.x - 1.0*blur, tc.y), r) * 0.15;
	
	//sum += center * 0.16;
	
	//sum += sample(vec2(tc.x + 1.0*blur, tc.y), r) * 0.15;
	//sum += sample(vec2(tc.x + 2.0*blur, tc.y), r) * 0.12;
	//sum += sample(vec2(tc.x + 3.0*blur, tc.y), r) * 0.09;
	//sum += sample(vec2(tc.x + 4.0*blur, tc.y), r) * 0.05;
	
	//1.0 -> in light, 0.0 -> in shadow
 	//float lit = mix(center, sum, soft_shadows);
 	float lit=center;
 	
 	//multiply the summed amount by our distance, which gives us a radial falloff
 	//then multiply by vertex (light) color  
 	gl_FragColor = vColor * vec4(vec3(1.0), lit * smoothstep(1.0, 0.0, r));
}

Completely untested ofc, but I think you’ll get the idea, and hopefully, it actually helps :smiley:

Shadow (* marks the changes):

void main(void) {
  float distance = 1.0;
  //the angle does not change along one ray, only r (length) changes
// Calc this in vertex shader!
* float theta = (var_texcoord0.s*2.0 -1.0) * PI + mult_pi; 
  const float add = 1.0/resolution;
  const int nsteps = int(resolution);
// uniform candidate
* vec2 pre_something = vec2(resolution/2.0/pos.z, resolution/2.0/pos.w);
  vec2 pre_coord = vec2(sin(theta)*pre_something.x,cos(theta)*pre_something.y);
// Pre calc the step (each iteration moves add * pre_coord along the ray)
* vec2 ray_step = add * pre_coord;
  
  //coord which we will sample from occlude map, starting at the light position
* vec2 coord = vec2(pos.x/pos.z,pos.y/pos.w);
  for (int i=0; i < nsteps; i++) {
    vec4 data = texture2D(TEX0, coord);
    //if we've hit an opaque fragment (occluder), then get new distance
    if (data.a > THRESHOLD) {
*       distance = float(i) * add;
        break;
    }
// step
*   coord -= ray_step;
  }
// reciprocal: up_scale.x = 1 / up_scale.x
* gl_FragColor = vec4(vec3(distance*up_scale.x), 1.0);
}

As for the blur, it’s advisable to move calculations to the vertex shader,
and let the sampler interpolate the (linear) values (texture coordinates) for you.
Personally, I use a separable blur (currently box), so I need to render twice, one for horizontal blur,
and one for vertical.
Plus when summing the samples, in my own example, I use lowp vec4 color = vec4(0.0);, since it’s a blur anyway :slight_smile:

Takeaways:

  • Precompute as uniforms if possible
  • Precompute linear values in vertex shader,
    and you’ll get the interpolated value in the fragment shader
  • Move constant values out of loops
  • Use multiplication instead of divs (gain varies on different GPU drivers)

Some further reading for those interested (quick googling):

Humus, an ex-colleague of ours, has written a good GDC presentation about optimizing shaders:
http://www.humus.name/index.php?page=Articles&ID=6

General GLSL advice:
https://www.khronos.org/opengl/wiki/GLSL_Optimizations

Some Apple notes:
https://developer.apple.com/library/content/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/BestPracticesforShaders/BestPracticesforShaders.html

Here’s my box blur

Let us know how it goes! :smiley:

Thx for these articles. Precomputing linear values in the vertex shader is really awesome. =) :slight_smile:

Shader optimization gave a few milliseconds. The big performance boost came when I started drawing 32 light sources into one shadow map and then drawing those 32 lights to the screen, instead of drawing each light to the shadow map and then to the screen one at a time. That is because I no longer enable/disable render targets/materials/textures for every light.
android 30 lights 26-34ms
pc 150 lights 16-17ms :smiley:
web 25 lights 35-45ms
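
A simplified Python sketch of why batching helps. It counts only how often the render target/material/texture setup has to switch. It assumes two passes per batch (shadow map, then screen) and ignores everything else in the frame; the function and parameter names are hypothetical.

```python
import math


def pass_switches(lights: int, lights_per_batch: int, passes_per_batch: int = 2) -> int:
    """Number of render state switches when lights are drawn in batches."""
    batches = math.ceil(lights / lights_per_batch)
    return batches * passes_per_batch


print(pass_switches(32, 1))    # one light at a time
print(pass_switches(32, 32))   # 32 lights share one shadow map
print(pass_switches(150, 32))  # the 150-light PC case, in batches of 32
```

In this model, 32 lights drawn one at a time need 64 switches, while a single batch of 32 needs only 2.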

Now my problem is that on Android I sometimes get bad shadows. Why could that happen? Float precision?

I don’t know if you found the source of this issue, but maybe you need to cast some rays at the edges of the light obstacles.
This article will explain it better:

Edit: I just read the article you linked in the other topic and saw that you are using a different technique. Please disregard this.

No, I don’t need more rays. The problem is float precision. I don’t understand why it happens only on Android, but if I precalculate some values for optimisation, I get bad shadows.
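
One possible mechanism, not confirmed anywhere in this thread: on some mobile GPUs mediump floats are 16-bit, and stepping a coordinate with a repeated `coord -= ...` rounds after every add, so the error can pile up along the ray. The Python sketch below imitates that with IEEE half precision from the standard `struct` module. The step value 9/4096 is a hypothetical number picked so the effect is easy to see, and real shader hardware may round differently.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


STEP = 9.0 / 4096.0  # hypothetical per-iteration step, exact in half precision
N_STEPS = 256        # same sample count as the 256 rays/steps in the thread


def accumulated_positions(step: float, n: int) -> list:
    """Positions from repeated addition, rounded to half after every add."""
    acc = 0.0
    positions = []
    for _ in range(n):
        acc = to_half(acc + step)
        positions.append(acc)
    return positions


def direct_positions(step: float, n: int) -> list:
    """Positions computed as i * step, rounded to half only once each."""
    return [to_half(i * step) for i in range(1, n + 1)]


def max_error(positions: list, step: float) -> float:
    """Largest absolute deviation from the exact position i * step."""
    return max(abs(p - i * step) for i, p in enumerate(positions, start=1))


acc_err = max_error(accumulated_positions(STEP, N_STEPS), STEP)
dir_err = max_error(direct_positions(STEP, N_STEPS), STEP)
print(f"accumulated: {acc_err:.6f}, direct: {dir_err:.6f}")
```

With these particular numbers the accumulated error is more than ten times the direct one. If something similar happens on the device, computing the coordinate from the loop index each iteration, or marking the affected variables highp where the GPU supports it in fragment shaders, might be worth testing.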