How exactly is audio implemented?

#1

Hi all,

I’m new to Defold and I’d like to contribute to the engine, especially to its audio implementation on Android and iOS. But I have a hard time figuring out how audio is implemented in Defold.

First of all: it seems that Defold uses multiple versions of OpenSL ES (in device_opensl.cpp and in opensl.c), and I don’t find a CoreAudio device for iOS, except for the OpenAL ALC backend (coreaudio.c).

In sound.cpp I further see that the SoundSystem runs on its own thread and calls UpdateInternal to update the mixer. The same file shows that the SoundSystem can also be updated from the main thread via the Update function.

Coming from an audio programming background I am somewhat surprised, since mobile audio systems run on a high-priority thread and request data via a callback. I would therefore expect the SoundSystem to be updated from this callback. Something like:

static void sound_mix(float* buffer, int samples)
{
	for (int i = 0; i < SOUNDS; i++)
	{
		if (sounds[i].state == STATE_PLAYING)
		{
			mix(buffer, &(sounds[i].source.buffer), samples);
		}
	}
}

static void audio_callback(float* input, float* output, int samples)
{
	float buffer[MAX_FRAME_SIZE]; // scratch buffer; MAX_FRAME_SIZE >= samples

	// from microphone
	input_filter(input, buffer, samples);

	// mix with active sounds
	sound_mix(buffer, samples);

	// post processing
	output_filter(buffer, samples);

	// to speakers
	buffer_to_output(buffer, output, samples);
}

Could someone elaborate on this a bit? Thanks in advance!

4 Likes

#2

Welcome! And yay, we really appreciate code contributions!

@JCash should be able to give you an explanation of the system tomorrow.

2 Likes

#3

Hi @hakoptak!

We run the sound update on its own thread on those platforms that support it; HTML5, for instance, is not threaded.
Both implementations use dmSound::UpdateInternal.
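
Roughly, the threaded variant looks something like this (a simplified sketch with illustrative names and timing, not the actual engine code):

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <unistd.h>

	extern void update_internal(void* ctx); // stand-in for dmSound::UpdateInternal

	static atomic_bool g_running = true;

	// runs on a dedicated sound thread
	static void* sound_thread(void* ctx)
	{
		while (atomic_load(&g_running))
		{
			// mix audio and queue buffers to the native API
			update_internal(ctx);
			// wait a bit before polling again (interval is illustrative)
			usleep(8000);
		}
		return NULL;
	}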

In Defold, we mix audio ourselves, and only use the native API (e.g. OpenAL/OpenSL) for queueing the sound buffers.

On each sound update we ask the native sound system if there are any free buffers.
If there are, we mix some new audio and put it in the free buffers.
E.g. for OpenAL there are 6 buffers, with 768 frames each.

This of course means that we play audio a bit more “in advance” than is perhaps always desired, especially if we want to add support for runtime effects.
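
Roughly, that poll-and-refill pattern against OpenAL looks something like this (a simplified sketch: mix_frames stands in for our software mixer, and the 44.1 kHz rate is just for illustration):

	#include <AL/al.h>

	#define FRAMES_PER_BUFFER 768

	extern void mix_frames(short* pcm, int frames); // stand-in for the software mixer

	static void update_source(ALuint source)
	{
		// ask the native API how many queued buffers it has finished playing
		ALint processed = 0;
		alGetSourcei(source, AL_BUFFERS_PROCESSED, &processed);

		while (processed-- > 0)
		{
			// reclaim a finished buffer...
			ALuint buffer;
			alSourceUnqueueBuffers(source, 1, &buffer);

			// ...mix the next chunk of audio ourselves...
			short pcm[FRAMES_PER_BUFFER * 2]; // 16-bit stereo
			mix_frames(pcm, FRAMES_PER_BUFFER);

			// ...and hand it back to the device for playback
			alBufferData(buffer, AL_FORMAT_STEREO16, pcm, sizeof(pcm), 44100);
			alSourceQueueBuffers(source, 1, &buffer);
		}
	}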

Regarding contributing to Defold’s audio system:
I’ve been playing with the idea of adding a callback as you suggest to our Defold SDK (similar to your example).
Having such a callback, it would be possible to generate sounds and effects automatically from a Native Extension.
However, since I haven’t programmed a lot of sound for games, I’m not entirely sure of the requirements needed from such an API.

3 Likes

#4

Hi @Mathias_Westerdahl,

Thanks for your explanation!

So on iOS and Android Defold uses the OpenAL backend, which is coupled through device_openal.cpp. And what’s the purpose of the device in device_opensl.cpp?

Concerning Defold's audio system:

I am currently building a mobile game that uses the microphone to detect beeps and communicates with other devices through sound cues (you play in a group). I was building my own mobile game framework but decided not to continue with it; I did, however, implement a flexible audio system. From experience I know that an audio system should support four modes:

	typedef enum audio_mode
	{
		AUDIO_MODE_RECORD,
		AUDIO_MODE_PLAYBACK,
		AUDIO_MODE_RECORD_PLAYBACK,
		AUDIO_MODE_STREAM
	} audio_mode;

The difference between AUDIO_MODE_RECORD_PLAYBACK and AUDIO_MODE_STREAM is that the latter copies the input to the output, while the former does not. With AUDIO_MODE_STREAM you could, for example, distort voices in real time. For this mode it is important to select the “fast audio path” (no input filtering); my implementation accounts for that with INPUT_MODE_FAST versus INPUT_MODE_DEFAULT. With those four modes we should be pretty complete.
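
To make the modes concrete, a hypothetical callback could dispatch like this (record_input, process_stream and sound_mix are just illustrative stand-ins, not part of any existing API):

	#include <string.h>

	extern void record_input(const float* in, int samples);               // stand-in
	extern void process_stream(const float* in, float* out, int samples); // stand-in
	extern void sound_mix(float* out, int samples);                       // stand-in

	static void audio_callback(audio_mode mode, const float* input,
	                           float* output, int samples)
	{
		memset(output, 0, samples * sizeof(float)); // start from silence

		switch (mode)
		{
		case AUDIO_MODE_RECORD:
			// capture only: hand the input to the app, output stays silent
			record_input(input, samples);
			break;
		case AUDIO_MODE_PLAYBACK:
			// playback only: mix the active sounds into the output
			sound_mix(output, samples);
			break;
		case AUDIO_MODE_RECORD_PLAYBACK:
			// capture and play, but keep the two streams separate
			record_input(input, samples);
			sound_mix(output, samples);
			break;
		case AUDIO_MODE_STREAM:
			// copy (and possibly process) the input into the output, e.g.
			// realtime voice distortion; wants the fast path (INPUT_MODE_FAST)
			process_stream(input, output, samples);
			sound_mix(output, samples);
			break;
		}
	}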

What I like about the Defold approach is that you keep it simple. Along these lines, we could just supply one callback for the input and one for sound creation/playback, and leave it up to the Native Extension to fill the buffers. That way we don’t have to think about the order of processing, because that’s up to the implementer of the extension. But to be honest: I have no experience with designing a good API.

Oh, and we could also rework the sample rate handling to follow the device’s preferred rate (especially important on Android).

2 Likes

#5

Android uses OpenSL. See the wscript.

What I like about the Defold approach is that you keep it simple

Thanks, that’s what we strive for. It makes maintenance easier, as well as adding new features. Also, if there’s an issue, it should (mostly) be the same code on all platforms, which is a great benefit.

Yes, a callback for the extension to fill in audio is the main idea.
My concern is when it should be called.
As I mentioned, we fill up the 6 buffers as soon as they’re available.
But in order to do realtime effects, we want to reduce that latency to a minimum. For example: can we keep using our polling technique without risking sound starvation? Will it suffice to have just one or two buffers?
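
To put rough numbers on that (assuming the 44.1 kHz rate used as an example later in this thread): 6 buffers × 768 frames = 4608 frames ≈ 104 ms of audio mixed ahead of playback, while two buffers would be 1536 frames ≈ 35 ms, and a single buffer only ≈ 17 ms.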

1 Like

#6

No worries, we’ll figure it out together! :slight_smile:
The main thing is to look at the actual use cases, and try not to make everything too generic.
I guess figuring out the sound use cases is where I lack experience. What types of sound effects does one wish to use in a typical game?

2 Likes

#7

Android uses OpenSL. See the wscript

Ah, I have to get used to the build process. I will look into that!

Concerning the audio callback:

Isn’t the “when” determined by the native audio thread? When the native audio system is ready, it calls the native callback to ask for the next frame buffer (although Android and iOS do this in different ways). It is then up to us to fill that frame buffer as soon as possible. This means that our processing may take up to FRAME_SIZE / SAMPLE_RATE seconds before the next call. For example: on iOS we may set FRAME_SIZE = 512 (samples) and SAMPLE_RATE = 44100, so we have 11.6 ms to fill/process the buffer. That amount of time is comparable to the time we have for updating and rendering a game frame.
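
On iOS this pull model looks roughly like a RemoteIO render callback (a sketch assuming a single non-interleaved float buffer; fill_frames stands in for the mixer):

	#include <AudioUnit/AudioUnit.h>

	extern void fill_frames(float* out, int frames); // stand-in for the mixer

	// the OS calls this on its high-priority audio thread whenever it needs
	// the next buffer; we have inNumberFrames / SAMPLE_RATE seconds
	// (e.g. 512 / 44100 ≈ 11.6 ms) to produce the data
	static OSStatus render_callback(void* inRefCon,
	                                AudioUnitRenderActionFlags* ioActionFlags,
	                                const AudioTimeStamp* inTimeStamp,
	                                UInt32 inBusNumber,
	                                UInt32 inNumberFrames,
	                                AudioBufferList* ioData)
	{
		float* out = (float*)ioData->mBuffers[0].mData;
		fill_frames(out, (int)inNumberFrames);
		return noErr;
	}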

To conclude: as far as I know there is no need for polling, because the native audio thread takes care of that via its callback.

1 Like

#9

That depends on the implementation. Currently we update on our own sound thread, polling the state of the sound API.
Another approach would be to react to callbacks from the sound API (as you have done previously).

Your calculation suggests we’d need to keep at least two buffers in flight at any time to cover one game frame of 16.67 ms (at 60 fps), since a single 512-frame buffer at 44.1 kHz only covers 11.6 ms.

Perhaps changing to callbacks from the sound API should be among your first tests? :slight_smile:

Also, remember that we support more platforms than iOS/Android: macOS, Windows, Linux, HTML5, Switch. So we need to keep the APIs consistent across them all.

0 Likes

#10

I guess that this is true for the polling strategy, but with a callback approach we don’t need to worry about it: the native audio system pulls in a frame whenever it needs one, and this process runs in parallel with the main thread.

Yep, I have to put my words into action :wink:

You are absolutely right! Let me sleep on it…

By the way: what is the best way to prepare a setup?

2 Likes

#11

Hehe, looking forward to seeing some great improvements! :slight_smile:

By the way: what is the best way to prepare a setup?

Step one is to start reading this guide:

Also, I suggest joining our official Slack channel #sourcecode
if you run into any problems building the engine.

1 Like