How exactly is audio implemented?

#1

Hi all,

I’m new to Defold and I’d like to contribute to the engine, especially to its audio implementation on Android and iOS. But I have a hard time figuring out how audio is implemented in Defold.

First of all: it seems that Defold uses multiple versions of OpenSL ES (in device_opensl.cpp and in opensl.c), and I don’t find a CoreAudio device for iOS, except for the OpenAL ALC backend (coreaudio.c).

In sound.cpp I further see that the SoundSystem runs on its own thread and calls UpdateInternal to update the mixer. The same file shows that the SoundSystem can also be updated from the main thread via the Update function.

Coming from an audio programming background I am somewhat surprised, since mobile audio systems run on a high-priority thread and request data via a callback. I would therefore expect the SoundSystem to be updated from this callback. Something like:

static void sound_mix(float* buffer, int samples)
{
	for (int i = 0; i < SOUNDS; i++)
	{
		if (sounds[i].state == STATE_PLAYING)
		{
			mix(buffer, &(sounds[i].source.buffer), samples);
		}
	}
}

static void audio_callback(float* input, float* output, int samples)
{
	float buffer[MAX_FRAME_SIZE]; // scratch buffer; MAX_FRAME_SIZE >= samples

	// from microphone
	input_filter(input, buffer, samples);

	// mix with active sounds
	sound_mix(buffer, samples);

	// post processing
	output_filter(buffer, samples);

	// to speakers
	buffer_to_output(buffer, output, samples);
}

Could someone elaborate on this a bit? Thanks in advance!

4 Likes

#2

Welcome! And yay, we really appreciate code contributions!

@JCash should be able to give you an explanation of the system tomorrow.

2 Likes

#3

Hi @hakoptak!

We run the sound update on its own thread on those platforms that support it; HTML5, for instance, is not threaded.
Both implementations use dmSound::UpdateInternal.
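
Roughly, the threaded variant looks something like this (a simplified sketch with illustrative names and timing, not the actual engine code):

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <unistd.h>

	extern void update_internal(void* ctx); // stand-in for dmSound::UpdateInternal

	static atomic_bool g_running = true;

	// runs on a dedicated sound thread
	static void* sound_thread(void* ctx)
	{
		while (atomic_load(&g_running))
		{
			// mix audio and queue buffers to the native API
			update_internal(ctx);
			// wait a bit before polling again (interval is illustrative)
			usleep(8000);
		}
		return NULL;
	}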

In Defold, we mix audio ourselves, and only use the native API (e.g. OpenAL/OpenSL) for queueing the sound buffers.

On each sound update we ask the native sound system if there are any free buffers.
If there are, we mix some new audio and put it in the free buffers.
E.g. for OpenAL there are 6 buffers, with 768 frames each.

This of course means that we play audio a bit more “in advance” than is perhaps always desired, especially if we want to add support for runtime effects.
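
Roughly, that poll-and-refill pattern against OpenAL looks something like this (a simplified sketch: mix_frames stands in for our software mixer, and the 44.1 kHz rate is just for illustration):

	#include <AL/al.h>

	#define FRAMES_PER_BUFFER 768

	extern void mix_frames(short* pcm, int frames); // stand-in for the software mixer

	static void update_source(ALuint source)
	{
		// ask the native API how many queued buffers it has finished playing
		ALint processed = 0;
		alGetSourcei(source, AL_BUFFERS_PROCESSED, &processed);

		while (processed-- > 0)
		{
			// reclaim a finished buffer...
			ALuint buffer;
			alSourceUnqueueBuffers(source, 1, &buffer);

			// ...mix the next chunk of audio ourselves...
			short pcm[FRAMES_PER_BUFFER * 2]; // 16-bit stereo
			mix_frames(pcm, FRAMES_PER_BUFFER);

			// ...and hand it back to the device for playback
			alBufferData(buffer, AL_FORMAT_STEREO16, pcm, sizeof(pcm), 44100);
			alSourceQueueBuffers(source, 1, &buffer);
		}
	}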

Regarding contributing to Defold’s audio system:
I’ve been playing with the idea of adding a callback as you suggest to our Defold SDK (similar to your example).
Having such a callback, it would be possible to generate sounds and effects automatically from a Native Extension.
However, since I haven’t programmed a lot of sound for games, I’m not entirely sure of the requirements needed from such an API.

3 Likes

#4

Hi @Mathias_Westerdahl,

Thanks for your explanation!

So on iOS and Android Defold uses the OpenAL backend, which is coupled through device_openal.cpp. And what’s the purpose of the device in device_opensl.cpp?

Concerning Defold's audio system:

I am currently building a mobile game that uses the microphone to detect beeps and communicates with other devices through sound cues (you play in a group). I was building my own mobile game framework but decided not to continue with it; I did, however, implement a flexible audio system. From experience I know that an audio system should support four modes:

	typedef enum audio_mode
	{
		AUDIO_MODE_RECORD,
		AUDIO_MODE_PLAYBACK,
		AUDIO_MODE_RECORD_PLAYBACK,
		AUDIO_MODE_STREAM
	} audio_mode;

The difference between AUDIO_MODE_RECORD_PLAYBACK and AUDIO_MODE_STREAM is that the latter copies the input to the output, while the former does not. With AUDIO_MODE_STREAM you could, for example, distort voices in real time. For this mode it is important to select the “fast audio path” (no input filtering); my implementation accounts for that with INPUT_MODE_FAST versus INPUT_MODE_DEFAULT. With those four modes we should be pretty complete.
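
To make the modes concrete, a hypothetical callback could dispatch like this (record_input, process_stream and sound_mix are just illustrative stand-ins, not part of any existing API):

	#include <string.h>

	extern void record_input(const float* in, int samples);               // stand-in
	extern void process_stream(const float* in, float* out, int samples); // stand-in
	extern void sound_mix(float* out, int samples);                       // stand-in

	static void audio_callback(audio_mode mode, const float* input,
	                           float* output, int samples)
	{
		memset(output, 0, samples * sizeof(float)); // start from silence

		switch (mode)
		{
		case AUDIO_MODE_RECORD:
			// capture only: hand the input to the app, output stays silent
			record_input(input, samples);
			break;
		case AUDIO_MODE_PLAYBACK:
			// playback only: mix the active sounds into the output
			sound_mix(output, samples);
			break;
		case AUDIO_MODE_RECORD_PLAYBACK:
			// capture and play, but keep the two streams separate
			record_input(input, samples);
			sound_mix(output, samples);
			break;
		case AUDIO_MODE_STREAM:
			// copy (and possibly process) the input into the output, e.g.
			// realtime voice distortion; wants the fast path (INPUT_MODE_FAST)
			process_stream(input, output, samples);
			sound_mix(output, samples);
			break;
		}
	}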

What I like about the Defold approach is that you keep it simple. Along these lines, we could just supply one callback for the input and one for sound creation/playback, and leave it up to the Native Extension to fill the buffers. That way we don’t have to think about the order of processing, because that’s up to the implementer of the extension. But to be honest: I have no experience with designing a good API.

Oh, and we could also rework the sample rate handling to follow the device’s preferred rate (especially important on Android).

2 Likes

#5

Android uses OpenSL. See the wscript.

What I like about the Defold approach is that you keep it simple

Thanks, that’s what we strive for. It makes maintenance easier, as well as adding new features. Also, if there’s an issue, it should (mostly) be the same code on all platforms, which is a great benefit.

Yes, a callback for the extension to fill in audio is the main idea.
My concern is when it should be called.
As I mentioned, we fill up the 6 buffers as soon as they’re available.
But in order to do realtime effects, we want to reduce that latency to a minimum. For example: can we keep using our polling technique without risking sound starvation? Will it suffice to have just one or two buffers?
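
To put rough numbers on that (assuming the 44.1 kHz rate used as an example later in this thread): 6 buffers × 768 frames = 4608 frames ≈ 104 ms of audio mixed ahead of playback, while two buffers would be 1536 frames ≈ 35 ms, and a single buffer only ≈ 17 ms.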

1 Like

#6

No worries, we’ll figure it out together! :slight_smile:
The main thing is to look at the actual use cases, and try not to make everything too generic.
I guess figuring out the sound use cases is where I lack experience. What types of sound effects does one wish to use in a typical game?

2 Likes

#7

Android uses OpenSL. See the wscript

Ah, I have to get used to the build process. I will look into that!

Concerning the audio callback:

Isn’t the “when” determined by the native audio thread? When the native audio system is ready, it calls the native callback to ask for the next frame buffer (although Android and iOS do this in different ways). It is then up to us to fill that frame buffer as soon as possible. This means that our processing may take up to FRAME_SIZE / SAMPLE_RATE seconds before the next call. For example: on iOS we may set FRAME_SIZE = 512 (samples) and SAMPLE_RATE = 44100, so we have 11.6 ms to fill/process the buffer. That amount of time is comparable to the time we have for updating and rendering a game frame.
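
On iOS this pull model looks roughly like a RemoteIO render callback (a sketch assuming a single non-interleaved float buffer; fill_frames stands in for the mixer):

	#include <AudioUnit/AudioUnit.h>

	extern void fill_frames(float* out, int frames); // stand-in for the mixer

	// the OS calls this on its high-priority audio thread whenever it needs
	// the next buffer; we have inNumberFrames / SAMPLE_RATE seconds
	// (e.g. 512 / 44100 ≈ 11.6 ms) to produce the data
	static OSStatus render_callback(void* inRefCon,
	                                AudioUnitRenderActionFlags* ioActionFlags,
	                                const AudioTimeStamp* inTimeStamp,
	                                UInt32 inBusNumber,
	                                UInt32 inNumberFrames,
	                                AudioBufferList* ioData)
	{
		float* out = (float*)ioData->mBuffers[0].mData;
		fill_frames(out, (int)inNumberFrames);
		return noErr;
	}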

To conclude: as far as I know there is no need for polling, because the native audio thread takes care of that via its callback.

1 Like

#9

That depends on the implementation. Currently we update on our own sound thread, polling the state of the sound API.
Another approach would be to react to callbacks from the sound API (as you have done previously).

Your calculation suggests we’d need to keep at least two buffers in flight at any time to cover one game frame of 16.67 ms (at 60 fps), since a single 512-frame buffer at 44.1 kHz only covers 11.6 ms.

Perhaps changing to callbacks from the sound API should be among your first tests? :slight_smile:

Also, remember that we support more platforms than iOS/Android: macOS, Windows, Linux, HTML5, Switch. So we need to keep the APIs consistent across them all.

0 Likes

#10

I guess that this is true for the polling strategy, but with a callback approach we don’t need to worry about it: the native audio system pulls in a frame whenever it needs one, and this process runs in parallel with the main thread.

Yep, I have to put my words into action :wink:

You are absolutely right! Let me sleep on it…

By the way: what is the best way to prepare a setup?

2 Likes

#11

Hehe, looking forward to seeing some great improvements! :slight_smile:

By the way: what is the best way to prepare a setup?

Step one is to start reading this guide:

Also, I suggest joining our official Slack channel #sourcecode
if you run into any problems building the engine.

1 Like