Developers might require direct access to the audio buffers on the capture client or render client to complete any of the following actions:
- Add an audio effect that modifies the voice (for example, a robot voice, radio distortion, or static)
- Use a third-party audio engine (such as Wwise) for voice render instead of an automatic render device provided by Vivox
- Perform phoneme analysis
Vivox Core SDK users can access these hooks directly, as described in the following sections.
Vivox Unreal SDK users must modify the included code to access the functions displayed in the following sections.
Vivox Unity SDK users can access audio data in a similar way through the integration with Unity Audio Sources. For more information, see Unity Editor Audio Tap components.
The Vivox SDK automatically handles the capture and render of voice data without requiring audio buffer access. Unless your implementation involves one of the scenarios in the preceding list, you most likely do not need audio buffer access. If you are unsure, contact us through the forums (all customers) or a support ticket (paid support plan).
The Vivox Core SDK provides four hooks for optional callback functions that allow access to the audio buffers:
- Capture side
  - After capture, but before Vivox audio processing
  - After Vivox audio processing, but before being sent to the Vivox server
- Render side
  - After being received from the Vivox server, but before audio processing and mixdown to a single audio stream
  - After Vivox audio processing and mixdown, but before render
Caution: Accessing or modifying the audio buffers incorrectly could have a substantial negative impact on the user experience.
Callback APIs
Vivox Core
The vx_sdk_config_t structure has six callback function pointers that are related to audio buffers and the audio subsystem. Only the callbacks that your application needs must be set. This structure is passed in when calling vx_initialize3().
Once set, the callbacks are called from the Vivox audio processing thread, and they cannot be changed to be called from another thread. The Vivox audio processing thread must not be blocked, so any blocking operation (for example, writing data to storage) must hand the data off to another thread that performs the operation.
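Because the callbacks run on the Vivox audio processing thread, a common pattern is to copy the data into a preallocated lock-free queue and let an application-owned worker thread do the blocking work. The following is a minimal sketch, assuming C11 atomics with a single producer (the audio thread) and a single consumer (your worker thread). sample_ring, ring_push, ring_pop, g_capture_ring, and my_enqueue_capture_audio are hypothetical names, not part of the Vivox SDK; the callback only mirrors the pf_on_audio_unit_before_capture_audio_sent parameter list.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical single-producer/single-consumer ring of 16-bit samples.
 * Capacity must be a power of two so index wraparound stays consistent. */
#define RING_CAPACITY 4096 /* samples; assumption, size for your latency needs */

typedef struct {
    short data[RING_CAPACITY];
    atomic_size_t head; /* next write index, advanced only by the producer */
    atomic_size_t tail; /* next read index, advanced only by the consumer */
} sample_ring;

void ring_init(sample_ring *r) {
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Audio thread: never blocks; drops samples that do not fit. */
size_t ring_push(sample_ring *r, const short *src, size_t count) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    size_t free_space = RING_CAPACITY - (head - tail);
    if (count > free_space) count = free_space;
    for (size_t i = 0; i < count; i++)
        r->data[(head + i) % RING_CAPACITY] = src[i];
    atomic_store_explicit(&r->head, head + count, memory_order_release);
    return count;
}

/* Worker thread: may then block on file or network I/O. */
size_t ring_pop(sample_ring *r, short *dst, size_t max) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t avail = head - tail;
    if (max > avail) max = avail;
    for (size_t i = 0; i < max; i++)
        dst[i] = r->data[(tail + i) % RING_CAPACITY];
    atomic_store_explicit(&r->tail, tail + max, memory_order_release);
    return max;
}

static sample_ring g_capture_ring; /* hypothetical global; call ring_init first */

/* Hypothetical callback with the pf_on_audio_unit_before_capture_audio_sent
 * parameter list: copies the interleaved samples and returns immediately. */
void my_enqueue_capture_audio(void *callback_handle,
                              const char *session_group_handle,
                              const char *initial_target_uri,
                              short *pcm_frames, int pcm_frame_count,
                              int audio_frame_rate, int channels_per_frame) {
    (void)callback_handle; (void)session_group_handle;
    (void)initial_target_uri; (void)audio_frame_rate;
    ring_push(&g_capture_ring, pcm_frames,
              (size_t)pcm_frame_count * (size_t)channels_per_frame);
}
```

The worker thread would call ring_pop in a loop and write the samples to storage. Dropping samples when the queue is full is a deliberate choice in this sketch: losing some data is preferable to blocking the audio thread.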
```c
// Audio callback members of vx_sdk_config_t (excerpt)

// Called after audio is read from the capture device.
// This is as close to the capture device as possible, but can still
// have audio adjustments due to hardware echo cancellation and other
// factors.
// No blocking operations can occur on this callback.
void (*pf_on_audio_unit_after_capture_audio_read)(
    void *callback_handle,
    const char *session_group_handle,
    const char *initial_target_uri,
    short *pcm_frames,      // frame buffer
    int pcm_frame_count,    // number of frames in buffer
    int audio_frame_rate,   // sample rate
    int channels_per_frame  // channels per frame
);

// Called when an audio processing unit is about to
// send captured audio to the network from the audio processing
// thread.
// No blocking operations can occur on this callback.
void (*pf_on_audio_unit_before_capture_audio_sent)(
    void *callback_handle,
    const char *session_group_handle,
    const char *initial_target_uri,
    short *pcm_frames,      // frame buffer
    int pcm_frame_count,    // number of frames in buffer
    int audio_frame_rate,   // sample rate
    int channels_per_frame  // channels per frame
);

// Called before an audio processing unit mixes the per-participant
// audio data to a single stream from the audio processing thread.
// No blocking operations can occur on this callback.
void (*pf_on_audio_unit_before_recv_audio_mixed)(
    void *callback_handle,
    const char *session_group_handle,
    const char *initial_target_uri,
    vx_before_recv_audio_mixed_participant_data_t *participants_data,
    size_t num_participants
);

// Called when an audio processing unit is about to write received
// audio to the render device from the audio processing thread.
// No blocking operations can occur on this callback.
void (*pf_on_audio_unit_before_recv_audio_rendered)(
    void *callback_handle,
    const char *session_group_handle,
    const char *initial_target_uri,
    short *pcm_frames,      // frame buffer
    int pcm_frame_count,    // number of frames in buffer
    int audio_frame_rate,   // sample rate
    int channels_per_frame, // channels per frame
    int is_silence          // equals 0 if there is renderable audio data
);

// Called when an audio processing unit is started
// from the audio processing thread.
// No blocking operations can occur on this callback.
void (*pf_on_audio_unit_started)(
    void *callback_handle,
    const char *session_group_handle,
    const char *initial_target_uri
);

// Called when an audio processing unit is stopped
// from the audio processing thread.
// No blocking operations can occur on this callback.
void (*pf_on_audio_unit_stopped)(
    void *callback_handle,
    const char *session_group_handle,
    const char *initial_target_uri
);
```
Vivox Unity
Vivox Unity SDK users access audio data through the integration with Unity Audio Sources rather than through the vx_sdk_config_t callbacks. For more information, see Unity Editor Audio Tap components.
Vivox Unreal
The Vivox Unreal wrapper does not currently expose these callbacks, although exposing them is on the Vivox roadmap. In the meantime, you can use them by adding your callbacks to the step in the Vivox C++ source code that creates the vx_sdk_config_t structure.
Callback usage
pf_on_audio_unit_after_capture_audio_read
This callback is the most appropriate to inject audio to replace the captured audio.
This callback provides the data as close to the audio capture device as possible, based on the native device. For example, if the device performs hardware echo cancellation, this data is obtained after that step. To inject audio that replaces the captured audio, overwrite the PCM frames in this callback with the data to inject. The data is then run through Vivox audio processing, such as Voice Activity Detection (VAD), Acoustic Echo Cancellation (AEC), and Automatic Gain Control (AGC).
Data written to the buffer pointed to by pcm_frames must be in the following format:
- 16-bit signed integer samples
- pcm_frame_count frames
- channels_per_frame channels per frame
- audio_frame_rate sample rate
Note that all resampling or other audio conditioning of injected data is the developer's responsibility. The buffer must always be filled. If there is no audio data, represent this by 0s in that portion of the buffer.
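As an illustration of these rules, the following sketch overwrites the entire capture buffer with audio from an application-provided source. It duplicates each mono sample across all channels of an interleaved frame and zero-fills once the source runs out. It assumes the application has already resampled the source to the rate Vivox reports; injection_source, g_injection, and my_on_after_capture_audio_read are hypothetical names, and the function only mirrors the pf_on_audio_unit_after_capture_audio_read parameter list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical injection source: mono 16-bit PCM already resampled by the
 * application (resampling is the developer's responsibility). */
typedef struct {
    const short *samples; /* mono samples */
    size_t length;        /* total mono samples */
    size_t position;      /* next sample to inject */
    int sample_rate;      /* rate the samples were prepared for */
} injection_source;

static injection_source *g_injection = NULL; /* hypothetical global */

void my_on_after_capture_audio_read(void *callback_handle,
                                    const char *session_group_handle,
                                    const char *initial_target_uri,
                                    short *pcm_frames, int pcm_frame_count,
                                    int audio_frame_rate, int channels_per_frame) {
    (void)callback_handle; (void)session_group_handle; (void)initial_target_uri;
    injection_source *src = g_injection;
    if (src == NULL) return; /* nothing to inject: keep the captured audio */

    size_t total = (size_t)pcm_frame_count * (size_t)channels_per_frame;
    if (src->sample_rate != audio_frame_rate) {
        /* Rates can switch mid-stream; this sketch does not resample, so it
         * fills the buffer with silence until the source matches again. */
        memset(pcm_frames, 0, total * sizeof(short));
        return;
    }
    for (int f = 0; f < pcm_frame_count; f++) {
        short s = 0; /* silence once the source is exhausted */
        if (src->position < src->length) s = src->samples[src->position++];
        for (int c = 0; c < channels_per_frame; c++)
            pcm_frames[(size_t)f * (size_t)channels_per_frame + (size_t)c] = s;
    }
}
```

Every sample of the buffer is written on every call, which satisfies the requirement that the buffer always be filled.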
pf_on_audio_unit_before_capture_audio_sent
This callback is the most appropriate for recording applications that are designed to capture a player’s speech.
This callback is called after Vivox audio processing (such as Voice Activity Detection [VAD], Acoustic Echo Cancellation [AEC], and Automatic Gain Control [AGC]) occurs, and before the audio is transmitted to the Vivox server. Modifying the media payload at this point is not recommended, because the metadata (for example, is_speaking) would no longer match the originally analyzed data.
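A recording tap on this callback only needs to read pcm_frames. The sketch below copies each buffer into memory reserved ahead of time, so the callback never allocates or blocks, and it stops recording if the sample rate or channel count changes. speech_recorder, g_recorder, RECORDER_CAPACITY, and my_on_before_capture_audio_sent are hypothetical names; a production recorder would more likely hand the data to a worker thread for encoding or storage.

```c
#include <stddef.h>

#define RECORDER_CAPACITY 48000 /* samples; assumption */

/* Hypothetical preallocated recorder for the player's processed speech. */
typedef struct {
    short samples[RECORDER_CAPACITY];
    size_t count;       /* samples stored so far */
    int sample_rate;    /* rate of stored samples, 0 until the first buffer */
    int channels;       /* channels of stored samples */
    int format_changed; /* set if rate or channels changed mid-recording */
} speech_recorder;

static speech_recorder g_recorder; /* hypothetical global */

/* Mirrors the pf_on_audio_unit_before_capture_audio_sent parameter list.
 * Reads pcm_frames but never writes to it, so the audio still matches the
 * metadata (such as is_speaking) derived from it. */
void my_on_before_capture_audio_sent(void *callback_handle,
                                     const char *session_group_handle,
                                     const char *initial_target_uri,
                                     short *pcm_frames, int pcm_frame_count,
                                     int audio_frame_rate, int channels_per_frame) {
    (void)callback_handle; (void)session_group_handle; (void)initial_target_uri;
    const short *in = pcm_frames; /* treat the buffer as read-only */
    speech_recorder *rec = &g_recorder;
    if (rec->sample_rate == 0) {
        rec->sample_rate = audio_frame_rate;
        rec->channels = channels_per_frame;
    } else if (rec->sample_rate != audio_frame_rate ||
               rec->channels != channels_per_frame) {
        rec->format_changed = 1; /* stop rather than mix two formats */
        return;
    }
    size_t n = (size_t)pcm_frame_count * (size_t)channels_per_frame;
    size_t room = RECORDER_CAPACITY - rec->count;
    if (n > room) n = room; /* drop what does not fit instead of blocking */
    for (size_t i = 0; i < n; i++) rec->samples[rec->count + i] = in[i];
    rec->count += n;
}
```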
pf_on_audio_unit_before_recv_audio_mixed
This callback is the most appropriate for adding Digital Signal Processing (DSP) effects to individual participants on the render side.
This callback is called after the audio is received from the Vivox server and before it is mixed down to a single stream. Use this callback to access per-participant audio data. The callback can be called with a num_participants value of 0, which indicates that there is currently silence and no audio data to mix.
If the audio frames are zeroed out in this callback, no events for non-local participants in the session are generated because a zeroed-out frame plays silence for that participant.
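The fields of vx_before_recv_audio_mixed_participant_data_t are not described on this page, so the following sketch shows only per-buffer helpers that could be applied to each participant's interleaved 16-bit PCM inside this callback. radio_distortion (a simple boost-then-hard-clip effect, used here as a stand-in for a "radio" voice) and mute_participant are hypothetical names; consult the SDK header for how to reach each participant's buffer, frame count, and channel count. When num_participants is 0, there is nothing to process.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "radio" effect: boost the signal, then hard-clip it.
 * Works in place on one participant's interleaved 16-bit PCM. */
void radio_distortion(short *pcm, int frame_count, int channels,
                      int gain, short clip_level) {
    size_t n = (size_t)frame_count * (size_t)channels;
    for (size_t i = 0; i < n; i++) {
        int v = pcm[i] * gain; /* compute in int to avoid 16-bit overflow */
        if (v > clip_level) v = clip_level;
        if (v < -clip_level) v = -clip_level;
        pcm[i] = (short)v;
    }
}

/* Zero a participant's frames so that participant plays silence. */
void mute_participant(short *pcm, int frame_count, int channels) {
    memset(pcm, 0, (size_t)frame_count * (size_t)channels * sizeof(short));
}
```

Both helpers run in bounded time with no allocation or locking, which keeps them safe to call on the audio processing thread. Keep gain small so the intermediate int product cannot overflow.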
pf_on_audio_unit_before_recv_audio_rendered
This callback is the most appropriate for recording applications that are designed to capture what a player hears.
This callback is called before rendering the audio to the render device. This action occurs after the per-participant mixdown and the application of any 3D audio effects.
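For recording what a player hears, the sketch below appends each rendered buffer to a preallocated buffer and writes zeros when is_silence is nonzero, so that silent periods keep their duration in the recording. Writing explicit zeros for silent periods is a design assumption of this sketch; listen_recorder, g_listen, LISTEN_CAPACITY, and my_on_before_recv_audio_rendered are hypothetical names that mirror the pf_on_audio_unit_before_recv_audio_rendered parameter list.

```c
#include <stddef.h>
#include <string.h>

#define LISTEN_CAPACITY 96000 /* samples; assumption */

/* Hypothetical preallocated recorder for the mixed, rendered audio. */
typedef struct {
    short samples[LISTEN_CAPACITY];
    size_t count;
} listen_recorder;

static listen_recorder g_listen; /* hypothetical global */

void my_on_before_recv_audio_rendered(void *callback_handle,
                                      const char *session_group_handle,
                                      const char *initial_target_uri,
                                      short *pcm_frames, int pcm_frame_count,
                                      int audio_frame_rate, int channels_per_frame,
                                      int is_silence) {
    (void)callback_handle; (void)session_group_handle;
    (void)initial_target_uri; (void)audio_frame_rate;
    size_t n = (size_t)pcm_frame_count * (size_t)channels_per_frame;
    size_t room = LISTEN_CAPACITY - g_listen.count;
    if (n > room) n = room; /* drop what does not fit instead of blocking */
    if (is_silence != 0) {
        /* No renderable audio: record zeros so the timeline stays continuous. */
        memset(g_listen.samples + g_listen.count, 0, n * sizeof(short));
    } else {
        memcpy(g_listen.samples + g_listen.count, pcm_frames, n * sizeof(short));
    }
    g_listen.count += n;
}
```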
Taking audio render or capture responsibility / Third-party audio render and capture
If the application's audio engine will be used to render Vivox voice audio retrieved from these callbacks, or if the application's audio engine will be used to provide capture audio to Vivox through these callbacks, then the Vivox render and/or capture device(s) should be set to "No Device". This will prevent Vivox from opening any audio devices for reading/writing.
However, on mobile platforms, Vivox relies on hardware acoustic echo cancellation to prevent echo in speakerphone-like audio configurations. Android and iOS operate hardware echo cancellation only when actual audio endpoints are opened with the OS for voice use. Therefore, on mobile, it is better not to set Vivox's audio devices to "No Device". Instead, leave the Vivox audio device settings as they are, and zero or overwrite the audio data in these Vivox audio callbacks: zero the render data to prevent Vivox from rendering the voice audio, or overwrite the capture data to provide substitute audio (discarding what was read from the microphone).
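The following sketch illustrates the mobile render approach: the callback copies the voice audio out for the application's own engine and then zeroes pcm_frames, so the still-open Vivox render device plays silence. g_engine_block, g_engine_block_len, ENGINE_BLOCK_MAX, and my_redirect_render are hypothetical names; a real integration would push into a non-blocking queue consumed by the engine's mixer rather than keep a single block.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hand-off point to the application's audio engine. */
#define ENGINE_BLOCK_MAX 4096 /* samples; assumption */
static short g_engine_block[ENGINE_BLOCK_MAX];
static size_t g_engine_block_len;

/* Mirrors the pf_on_audio_unit_before_recv_audio_rendered parameter list. */
void my_redirect_render(void *callback_handle, const char *session_group_handle,
                        const char *initial_target_uri, short *pcm_frames,
                        int pcm_frame_count, int audio_frame_rate,
                        int channels_per_frame, int is_silence) {
    (void)callback_handle; (void)session_group_handle;
    (void)initial_target_uri; (void)audio_frame_rate;
    size_t n = (size_t)pcm_frame_count * (size_t)channels_per_frame;
    size_t copy = n < ENGINE_BLOCK_MAX ? n : ENGINE_BLOCK_MAX;
    if (is_silence != 0)
        memset(g_engine_block, 0, copy * sizeof(short));
    else
        memcpy(g_engine_block, pcm_frames, copy * sizeof(short));
    g_engine_block_len = copy;
    /* Silence Vivox's own output while its device (and hardware echo
     * cancellation) stays open. */
    memset(pcm_frames, 0, n * sizeof(short));
}
```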
Additional notes
- Callbacks only occur when in a channel session.
- Sample rates can switch mid-stream.
- Samples are 16-bit signed integers.
- pcm_frame_count is the total number of frames for the period, where a frame consists of one sample for each channel. For 32 kHz, the number of frames in a 20 ms period is 640, regardless of whether the audio is stereo or mono.
- Silence is represented by 0s.
- In cases where Vivox cannot open the capture device in single channel mode, the results from the microphone are mixed down to a single channel, which is then presented on a capture callback.
- On the render side, the audio is stereo interleaved.
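The arithmetic in these notes can be expressed directly. The sketch below computes frame and sample counts and downmixes stereo interleaved render audio (left, right, left, right, ...) to mono by averaging each pair. frames_per_period, samples_in_buffer, and downmix_stereo_to_mono are hypothetical helper names.

```c
#include <stddef.h>

/* A frame holds one sample per channel, so the frame count depends only on
 * the sample rate and the period length. */
int frames_per_period(int sample_rate, int period_ms) {
    return sample_rate * period_ms / 1000;
}

/* Total 16-bit samples in a buffer = frames * channels. */
size_t samples_in_buffer(int pcm_frame_count, int channels_per_frame) {
    return (size_t)pcm_frame_count * (size_t)channels_per_frame;
}

/* Average each interleaved L/R pair into one mono sample.
 * mono_out must have room for pcm_frame_count samples. */
void downmix_stereo_to_mono(const short *stereo, int pcm_frame_count,
                            short *mono_out) {
    for (int f = 0; f < pcm_frame_count; f++) {
        int sum = stereo[2 * f] + stereo[2 * f + 1]; /* int avoids overflow */
        mono_out[f] = (short)(sum / 2);
    }
}
```

For example, at 32 kHz a 20 ms period is 640 frames; in stereo that is 1,280 samples, or 2,560 bytes at 2 bytes per sample.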