Does it behave better/worse/same in fullscreen mode? (Idea from Win 10 vsync issue)
Same. But in fullscreen mode Alt+Tab does not work (the screen goes black and nothing happens).
Are there any vsync options available in the os or per app? (See this thread for some vsync issues)
Next idea I have is to try the app on another Windows machine, and also on another Windows version…
Nothing changed. =(
I ran the project on an Android device.
Defold 1 light - frame 90ms
Libgdx 5 lights - frame 16-17ms(57-61 fps)
So the problem is either in my code or in the Defold engine. I will try to rewrite the project; maybe I made a mistake somewhere.
If the web profiler and in-game profiler reports different stats, the only realistic reason I can think of would be that you have two games running simultaneously (two different processes). Can you verify this? The first instance would have the web-server still running (http://localhost:8002), the second instance would be the window you are looking at.
The reason the shader perf changes when you change y+=10, is because it controls the number of iterations per pixel:
for (float y=0.0; y<resolution.y; y+=1.0) {
y += 10.0 is 1/10 as many iterations => 10 times cheaper.
When you are comparing to libgdx, could you verify that
uniform vec2 resolution;
has exactly the same value in the Defold and libgdx versions? That value has the biggest impact on your perf, as it defines how much work you do in the shaders per pixel. The next thing that can differ is the screen resolution in both cases, this would have an equally large impact on perf. The shader you posted is incredibly expensive (iterating + tex lookups per pixel), and I would be surprised if it worked well on phones in full resolution.
I would be very surprised (and very interested!!) if Defold is slower than libgdx in this case, we are most likely faster given that the configuration is the same.
EDIT: By screen resolution above, I mean the size of the render target you use that pixel shader for.
Thx, it was my mistake =).
I set the render target height to 256 when it should be 1.
Defold 5 lights - frame 40-50ms
Libgdx 5 lights - frame 16-17ms(57-61 fps)
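A rough worked estimate of why the render target height matters so much. It assumes a shadow map 256 pixels wide (one column per ray, which is an assumption) and up to 256 loop iterations per fragment. The loop can break early, so these are upper bounds on texture lookups per light:

$$
\begin{aligned}
\text{height } 256:&\quad 256 \times 256 \times 256 = 16{,}777{,}216 \\
\text{height } 1:&\quad 256 \times 1 \times 256 = 65{,}536
\end{aligned}
$$

Under those assumptions the mistaken height costs up to 256 times more lookups per light.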
Yes, I know about this, but I am trying to optimise.
I am running only one game.
When I have a lot of light sources (30+), I get the error ERROR:SCRIPT: helpers/render/2d_light/2d_light_helper.lua:65: Command buffer is full (1024). Can I fix it?
Did you check the Task Manager? (worried there is a lingering process in there) I have absolutely no idea how they could show different numbers for the same game…
You can't tweak the max value unfortunately, but it also means you are doing 1024 calls from the render script, most of which will affect not only OpenGL, but also GPU state (equivalent to the popular "draw call" measures). You should check your render script code and see if you can lower the number of calls, 20-30 calls per light seems like too many.
Yes, there is one dmengine.exe in Task Manager. If I stop the game, the web profiler stops working too.
The Defold log:
INFO:ENGINE: Defold Engine 1.2.96 (0060183)
INFO:ENGINE: Loading data from: build/default
INFO:ENGINE: Initialised sound device 'default'
INFO:DLIB: SSDP: Started on address 192.168.43.109
INFO:DLIB: SSDP: Started on address 192.168.56.1
INFO:DLIB: SSDP: Done on address 192.168.43.109
INFO:DLIB: SSDP: Done on address 192.168.56.1
It looks like two instances of the game.
After some optimisation: Defold, 10 lights (256 rays per light), frame 26-30ms on a Samsung Galaxy S3. On PC, 50+ light sources at 16 ms. I think I have made every optimisation I can. If someone sees more places to optimise, please tell me.
Shadow fragment shader (the bottleneck):
#ifdef GL_ES
#define LOWP lowp
precision mediump float;
#else
#define LOWP
#endif
#define PI 3.14
#define THRESHOLD 0.75
varying mediump vec4 position;
varying mediump vec2 var_texcoord0;
uniform lowp sampler2D TEX0;
//number of rays for every light
//can't use uniform because of webGl
const float resolution=256.0;
uniform vec4 up_scale;
uniform vec4 pos;
float mult_pi=PI*1.5;
void main(void) {
float distance = 1.0;
//the angle does not change along one ray; only r (the length) changes
float theta = (var_texcoord0.s*2.0 -1.0) * PI + mult_pi;
const float add = 1.0/resolution;
vec2 pre_coord = vec2(sin(theta)*resolution/2.0/pos.z,cos(theta)*resolution/2.0/pos.w);
for (float r=0.0; r<1.0; r+=add) {
//coord which we will sample from occlude map
vec2 coord = pre_coord * -r +vec2(pos.x/pos.z,pos.y/pos.w);
vec4 data = texture2D(TEX0, coord);
//if we've hit an opaque fragment (occluder), then get new distance
if (data.a > THRESHOLD) {
distance = r;
break;
}
}
gl_FragColor = vec4(vec3(distance/up_scale.x), 1.0);
}
Light map shader. Also a bottleneck when soft shadows are used (because of the blur):
#ifdef GL_ES
#define LOWP lowp
precision mediump float;
#else
#define LOWP
#endif
#define PI 3.14
//inputs from vertex shader
varying mediump vec4 position;
varying mediump vec2 var_texcoord0;
//uniform values
uniform lowp sampler2D TEX0;
uniform lowp sampler2D TEX1;
const float resolution=256.0;
uniform vec4 vColor;
const float soft_shadows=1.0;
//sample from the distance map
float sample(vec2 coord, float r) {
coord.x=1.0-coord.x;
return step(r, texture2D(TEX0, coord).r);
}
void main(void) {
//rectangular to polar
vec2 norm = var_texcoord0.st * 2.0 - 1.0;
float theta = atan(norm.y, norm.x);
float r = length(norm);
float coord = (theta + PI) / (2.0*PI);
//the tex coord to sample our 1D lookup texture
//always 0.0 on y axis
vec2 tc = vec2(coord, 0.0);
//the center tex coord, which gives us hard shadows
float center = sample(vec2(tc.x, tc.y), r);
//we multiply the blur amount by our distance from center
//this leads to more blurriness as the shadow "fades away"
//float blur = (1.0/resolution) * smoothstep(0.0, 1.0, r);
//now we use a simple gaussian blur
//float sum = 0.0;
//uncomment if need soft shadows
//sum += sample(vec2(tc.x - 4.0*blur, tc.y), r) * 0.05;
//sum += sample(vec2(tc.x - 3.0*blur, tc.y), r) * 0.09;
//sum += sample(vec2(tc.x - 2.0*blur, tc.y), r) * 0.12;
//sum += sample(vec2(tc.x - 1.0*blur, tc.y), r) * 0.15;
//sum += center * 0.16;
//sum += sample(vec2(tc.x + 1.0*blur, tc.y), r) * 0.15;
//sum += sample(vec2(tc.x + 2.0*blur, tc.y), r) * 0.12;
//sum += sample(vec2(tc.x + 3.0*blur, tc.y), r) * 0.09;
//sum += sample(vec2(tc.x + 4.0*blur, tc.y), r) * 0.05;
//1.0 -> in light, 0.0 -> in shadow
//float lit = mix(center, sum, soft_shadows);
float lit=center;
//multiply the summed amount by our distance, which gives us a radial falloff
//then multiply by vertex (light) color
gl_FragColor = vColor * vec4(vec3(1.0), lit * smoothstep(1.0, 0.0, r));
}
Completely untested ofc, but I think you'll get the idea, and hopefully, it actually helps
Shadow (* marks the changes):
void main(void) {
    float distance = 1.0;
    // The angle does not change along one ray; only r (the length) changes
    // Calc this in vertex shader!
    float theta = (var_texcoord0.s*2.0 - 1.0) * PI + mult_pi; // *
    const float add = 1.0/resolution;
    // GLSL ES has no implicit int/float conversion, so keep the loop bound as an int
    const int nsteps = int(resolution);
    // uniform candidate
    vec2 pre_something = vec2(resolution/2.0/pos.z, resolution/2.0/pos.w); // *
    vec2 pre_coord = vec2(sin(theta)*pre_something.x, cos(theta)*pre_something.y);
    // Pre calc the step along the ray (one iteration's worth of pre_coord)
    vec2 ray_step = add * pre_coord; // *
    // coord which we will sample from the occlusion map; start at the light position
    vec2 coord = vec2(pos.x/pos.z, pos.y/pos.w);
    for (int i = 0; i < nsteps; i++) {
        vec4 data = texture2D(TEX0, coord);
        // if we've hit an opaque fragment (occluder), then get new distance
        if (data.a > THRESHOLD) {
            distance = float(i) * add; // *
            break;
        }
        // step
        coord -= ray_step; // *
    }
    // reciprocal: pass 1 / up_scale.x in up_scale.x
    gl_FragColor = vec4(vec3(distance*up_scale.x), 1.0); // *
}
As for the blur, it's advisable to move calculations to the vertex shader and let the GPU interpolate the (linear) values (texture coordinates) for you.
Personally, I use a separable blur (currently box), so I need to render twice: once for the horizontal blur and once for the vertical.
Plus, when summing the samples in my own example, I use lowp vec4 color = vec4(0.0); since it's a blur anyway.
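A minimal sketch of that idea for one pass of a 3-tap separable box blur. This is not the poster's actual blur, and it is untested. The constant blur_step and the var_tap_* varyings are hypothetical names. view_proj, position and texcoord0 are assumed to follow Defold's built-in sprite material. The vertex shader computes the tap coordinates once per vertex:

```glsl
// Vertex shader: the tap coordinates are linear in texcoord0,
// so they can be computed here and interpolated for each fragment.
uniform highp mat4 view_proj;
// Hypothetical constant: xy = offset of one tap,
// e.g. (1/width, 0) for the horizontal pass and (0, 1/height) for the vertical pass.
uniform mediump vec4 blur_step;

attribute highp vec4 position;
attribute mediump vec2 texcoord0;

varying mediump vec2 var_tap_left;
varying mediump vec2 var_texcoord0;
varying mediump vec2 var_tap_right;

void main()
{
    gl_Position = view_proj * vec4(position.xyz, 1.0);
    var_texcoord0 = texcoord0;
    var_tap_left = texcoord0 - blur_step.xy;
    var_tap_right = texcoord0 + blur_step.xy;
}
```

The fragment shader then only samples and sums:

```glsl
// Fragment shader: no coordinate math left, only three lookups.
varying mediump vec2 var_tap_left;
varying mediump vec2 var_texcoord0;
varying mediump vec2 var_tap_right;

uniform lowp sampler2D TEX0;

void main()
{
    // Equal weights make this a box blur. Each tap is weighted before summing
    // so the running sum stays within the guaranteed lowp range.
    const lowp float weight = 1.0 / 3.0;
    lowp vec4 color = vec4(0.0);
    color += texture2D(TEX0, var_tap_left) * weight;
    color += texture2D(TEX0, var_texcoord0) * weight;
    color += texture2D(TEX0, var_tap_right) * weight;
    gl_FragColor = color;
}
```

Running the same material twice with a different blur_step gives the horizontal and the vertical pass.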
Takeaways:
- Precompute as uniforms if possible
- Precompute linear values in the vertex shader, and you'll get the interpolated value in the fragment shader
- Move constant variables out of loops
- Use multiplication instead of divs (gain varies on different GPU drivers)
Some further reading for those interested (quick googling):
Humus, an ex-colleague of ours, has written a good GDC presentation about optimizing shaders:
http://www.humus.name/index.php?page=Articles&ID=6
General GLSL advice:
https://www.khronos.org/opengl/wiki/GLSL_Optimizations
Some Apple notes:
https://developer.apple.com/library/content/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/BestPracticesforShaders/BestPracticesforShaders.html
Here's my box blur
Let us know how it goes!
Thx for these articles. Precomputing linear values in the vertex shader is really awesome. =)
Shader optimisation gave a few milliseconds. The big performance boost came when I started drawing 32 light sources into one shadow map and then drawing the 32 lights to the screen, instead of drawing one light to the shadow map and then drawing that light to the screen. That is because I no longer enable/disable render targets/materials/textures for every light.
Android 30 lights 26-34ms
PC 150 lights 16-17ms
Web 25 lights 35-45ms
Now my problem is that on Android I sometimes get bad shadows. Why can this happen? Float precision?
I don't know if you found the source of this issue, but maybe you need to cast some rays at the edges of the light obstacles.
This article will explain it better:
Edit: I just read the article you linked in the other topic and saw that you are using a different technique. Please disregard.
No, I don't need more rays. The problem is float precision. I don't understand why it happens only on Android. But if I precalculate some values for optimisation, I get bad shadows.
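One possible explanation, not confirmed in this thread: with precision mediump float, an incrementally accumulated coordinate (coord -= step) can pick up a rounding error on every iteration, while desktop GPUs typically run the same code at full precision. The sketch below is untested and makes two hedged changes to the shadow shader. It requests highp where the fragment stage supports it, and it recomputes the coordinate from the loop index instead of accumulating it. Some mobile GPUs do not support highp in fragment shaders, in which case only the second change applies. The uniforms have the same meaning as in the shadow shader above.

```glsl
#ifdef GL_ES
// GL_FRAGMENT_PRECISION_HIGH is defined by GLSL ES when the fragment stage supports highp.
#ifdef GL_FRAGMENT_PRECISION_HIGH
precision highp float;
#else
precision mediump float;
#endif
#endif

#define PI 3.14          // same value as in the thread's shaders
#define THRESHOLD 0.75

varying mediump vec2 var_texcoord0;
uniform lowp sampler2D TEX0;
uniform vec4 pos;        // as in the shadow shader above
uniform vec4 up_scale;   // as in the shadow shader above
const float resolution = 256.0;

void main(void) {
    float dist = 1.0;
    float theta = (var_texcoord0.s * 2.0 - 1.0) * PI + PI * 1.5;
    // Light position and ray direction in occlusion-map texture space.
    vec2 center = pos.xy / pos.zw;
    vec2 ray = vec2(sin(theta), cos(theta)) * (0.5 * resolution) / pos.zw;
    for (int i = 0; i < 256; i++) {
        // Recompute from the index rather than accumulating,
        // so rounding errors do not add up along the ray.
        float r = float(i) / resolution;
        vec2 coord = center - ray * r;
        if (texture2D(TEX0, coord).a > THRESHOLD) {
            dist = r;
            break;
        }
    }
    gl_FragColor = vec4(vec3(dist / up_scale.x), 1.0);
}
```

This keeps the precomputed ray direction outside the loop, so most of the earlier optimisation is retained.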
I'm in need of the same effect. Does this happen with any Android device?
I will give your shader a try.
Not sure; it happened on my Galaxy S3. You can use the non-optimised shader and fewer rays per light. This technique is heavy for phones, but with 5 or fewer light sources it should run fast on all devices, including low-performance ones.
I intend to use just one point light.