Build a Karaoke App with Interactive Audio Effects using the Switchboard SDK and Amazon IVS

Add interactive audio effects to your Amazon IVS live streams with this step-by-step guide

Tony Vu
Amazon Employee
Published Mar 4, 2024
Amazon Interactive Video Service (IVS) is a managed service for live streaming video and audio at scale. But what if you want to do more with that audio on device, before broadcasting it or after receiving it? Or what if the audio from your Amazon IVS live stream is only one part of a more complex audio pipeline? Enter the Switchboard SDK.
The Switchboard SDK is a cross-platform audio SDK that makes it easier to develop complex audio features and applications without needing to be a specialist in audio programming or C++. Building an application with advanced audio features can take months or longer, but with the Amazon IVS extension in the Switchboard SDK, that effort can be reduced to days or hours. The extension makes it easy to build complex audio pipelines in which Amazon IVS works alongside features such as external media players, voice changers, stem separation, advanced noise filtering and other DSP, mixing, ducking, handling of various OS-related audio issues, Bluetooth, and more.
Karaoke apps are one of many use cases in which such audio pipelines are useful. In the rest of this article, we will walk you through a step-by-step process of building an Android karaoke app that combines Switchboard and Amazon IVS. Switchboard is used to apply various voice-changing effects (such as pitch correction and reverb), while Amazon IVS is used to broadcast the resulting voice and music streams.
Find the tutorial's code on Synervoz's website. Also, check out this recorded video demo of the app to get a sense of the audio effects you can create.

What you will learn

  • How to create a real-time streaming experience with Amazon IVS
  • How to integrate the Switchboard SDK Extensions into your application
  • How to test and apply voice changing effects from Switchboard SDK
  • How to live stream your new voice to an audience using Amazon IVS

Solution Overview

This tutorial consists of 3 parts:
  • Part 1 - Creating a real-time streaming app with Amazon IVS and the Switchboard SDK
  • Part 2 - Importing and applying voice changing effects
  • Part 3 - Testing your newfound voice
Let's take a quick look at the high-level solution overview in Figure 1. The Switchboard SDK is a versatile toolkit that streamlines audio app development across different platforms. It features a collection of AudioNodes—like players, recorders, and mixers—that interconnect within an AudioGraph. This graph operates via an AudioEngine, leveraging advanced platform-specific capabilities. The SDK also provides various extensions, which are wrappers around popular libraries such as the Amazon IVS Broadcast SDK. The Amazon IVS Broadcast SDK is a cross-platform SDK used to capture audio and video from a device’s microphone and camera, encode it, and send it to the Amazon IVS servers for video ingestion. Using the Amazon IVS Extension, we can easily leverage the capabilities of the Switchboard SDK to apply voice filters and audio effects to live streams. Figure 2 depicts where the Amazon IVS Extension sits in the audio capture and processing flow when live streaming. We will revisit each component in more detail in later steps.
Switchboard SDK and Amazon IVS Solution Overview
Figure 1. Solution Overview
Diagram of how captured audio flows through Switchboard and the Amazon IVS Broadcast SDK
Figure 2. How captured audio from a device’s microphone is processed by the Amazon IVS Extension

Part 1 - Creating a real-time streaming app with Amazon IVS and the Switchboard SDK

Overview of the Android Audio APIs

To create a karaoke app, we first need to put ourselves in the shoes of an end user. It’s probably safe to say that many users are shy about singing because they are embarrassed about how their voice might sound to others. Our first job, then, is to make their voice sound better. To do that, the Android platform provides two different APIs for building audio apps.
There is a high-level Java API, using AudioTrack and AudioRecord. AudioTrack is used for playing back audio data. It's part of the Android multimedia framework and allows developers to play audio data from various sources, including memory or a file. AudioRecord is the counterpart of AudioTrack for recording audio; it's typically used in applications that need to capture audio input from the device's microphone.
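To make the high-level API concrete, here is a minimal, illustrative Kotlin sketch that captures a short buffer of PCM audio with AudioRecord. It assumes the RECORD_AUDIO permission has already been granted, and the sample rate and buffer handling are simplified.

```kotlin
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder

// Minimal capture sketch (assumes the RECORD_AUDIO permission is already granted).
val sampleRate = 48_000
val minBufferSize = AudioRecord.getMinBufferSize(
    sampleRate,
    AudioFormat.CHANNEL_IN_MONO,
    AudioFormat.ENCODING_PCM_16BIT
)
val recorder = AudioRecord(
    MediaRecorder.AudioSource.MIC,
    sampleRate,
    AudioFormat.CHANNEL_IN_MONO,
    AudioFormat.ENCODING_PCM_16BIT,
    minBufferSize
)

recorder.startRecording()
val pcmBuffer = ShortArray(minBufferSize / 2)
val samplesRead = recorder.read(pcmBuffer, 0, pcmBuffer.size) // blocking read of 16-bit PCM
recorder.stop()
recorder.release()
```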
The low-level API is exposed through Oboe, a C++ library designed to help apps use the Android audio APIs effectively and efficiently. It is particularly useful for applications requiring low latency. It supports both the older OpenSL ES API and the newer AAudio API, offering high performance and reduced latency.
While these APIs exist, they are complex, difficult to understand, and require a great deal of effort to implement.

Why use the Switchboard SDK?

If the application you are developing, such as a karaoke app, requires functionality beyond simply playing or recording sound, then the high-level Java/Kotlin API is not a good option. Audio processing achieves peak efficiency when it is integrated directly with the audio driver, which is what allows software applications to communicate with and access audio hardware. It also operates at its best when compiled straight to machine code. Due to these requirements, languages like Java and Kotlin are typically not the preferred choice for this purpose.
The way to write real-time Android audio code is to use the C++ Oboe library. Writing C++ code is hard, especially in a real-time environment. When developing music software, you are operating under tight time constraints: the time between subsequent audio processing callbacks is typically around 10 ms. If your process does not compute its audio output and write it into the provided buffer before this deadline, you will get an audible glitch. However, the complexities of writing C++ code for real-time audio processing on Android can be significantly mitigated by leveraging the tools and resources provided by the Switchboard SDK.
The Switchboard SDK is a cross-platform software development kit designed to simplify the process of developing audio applications and features. It facilitates quick integration and testing of various audio libraries. The Switchboard SDK makes it easy to connect audio components like players, recorders, mixers, and effects, and it also provides wrappers (extensions) around various audio-related libraries. One of these is the Amazon IVS Extension, a wrapper developed by Synervoz that makes it easy for developers to incorporate audio effects from the Switchboard SDK into their live streaming applications.

Switchboard SDK building blocks

To let the Switchboard SDK know what kind of audio processing we want to happen, we need to create what is called an audio graph. An audio graph is a data structure used in audio programming to represent a collection of audio processing nodes and the connections between them.
Every audio component is provided as an AudioNode in the Switchboard SDK, and you can wire these together inside an AudioGraph. This graph can then be run by an AudioEngine that makes sure all advanced platform-specific features are utilized. Let’s break down these components.

Audio Graphs

The Switchboard SDK implements the audio graph paradigm. In an audio graph, each processing node represents an audio processing unit, such as an audio player, a filter, an effect, etc. The nodes are connected by audio buses, which represent the flow of audio data between the processing nodes. The audio graph can be thought of as a visual representation of the signal flow in an audio processing application.
The audio graph allows for flexible routing of audio data between processing nodes, which is important for complex signal processing applications. For example, an audio graph can represent a digital audio workstation (DAW), where the user can create and arrange audio processing nodes to process and mix multiple audio tracks. The audio graph allows the user to easily connect and disconnect audio processing nodes, change their parameters, and monitor the audio data at various points in the signal flow.
For example, in the example audio graph below, we see a simple audio processing chain consisting of three nodes. The SineGeneratorNode creates a sine wave, which is a fundamental type of sound wave with a smooth periodic oscillation. This sine wave is then routed into a GainNode, which can adjust the volume of the audio signal. Finally, the processed signal is sent to the OutputNode, which outputs the sound to the user's speakers.
Example audio graph
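In code, this chain might look roughly like the Kotlin sketch below. The node and graph class names follow the ones used in this article, but the exact constructors and the addNode/connect method names are assumptions about the Switchboard SDK API, so check the official documentation for the precise calls.

```kotlin
// Illustrative sketch only: class names follow the article, but exact constructors
// and method names (addNode, connect) are assumptions -- check the Switchboard SDK docs.
val audioGraph = AudioGraph()
val sineGeneratorNode = SineGeneratorNode()
val gainNode = GainNode()

audioGraph.addNode(sineGeneratorNode)
audioGraph.addNode(gainNode)

// Sine wave -> gain -> the graph's output (the device speakers).
audioGraph.connect(sineGeneratorNode, gainNode)
audioGraph.connect(gainNode, audioGraph.outputNode)
```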

Audio Nodes

In the Switchboard SDK there are three kinds of audio nodes: source nodes, processor nodes and sink nodes. For a comprehensive list and detailed documentation of all available nodes, please refer to the API reference page.
Source nodes are audio generators in the audio graph. They don't have inputs, only outputs. Some examples:
  • AudioGraphInputNode: The input node of the audio graph.
  • AudioPlayerNode: A node that reads in and plays an audio file.
  • SineGeneratorNode: A node that generates sine waves.
Processor nodes receive audio on their input and also output audio. Some examples:
  • MixerNode / SplitterNode: A node that mixes / splits audio streams.
  • GainNode: A node that changes the gain of an audio signal.
  • NoiseFilterNode: A node that filters noise from the audio signal.
Sink nodes only receive audio in an audio graph. They don't have outputs, only inputs. Some examples:
  • IVSBroadcastSinkNode: The audio that is streamed into this node will be sent directly into the Stage (stream).
  • RecorderNode: A node that records audio and saves it to a file.
  • VUMeterNode: A node that analyzes the audio signal and reports its audio level.
Before connecting nodes in the audio graph, you need to add the nodes you wish to connect to the graph. Some nodes support multiple connections (buses) while others support only one connection. For example, a mixer node can receive the output of multiple nodes.

Audio Engine

In order to run an audio graph you need to create an AudioEngine instance. The audio engine handles the device-level audio I/O and connects the device's microphone to the graph's input node and the device's speaker to the graph's output node. You can start and stop the audio engine by simply calling the start and stop methods. Each audio graph must be run by only one audio engine.

Adding Switchboard SDK and Amazon IVS to your Android Project

Add the following to your module's build.gradle.
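As an illustrative sketch only (shown here in Gradle Kotlin DSL form), the dependency block typically looks like the following; the real artifact names and versions come from Synervoz's integration guide and the Amazon IVS documentation.

```kotlin
dependencies {
    // Placeholder coordinates -- use the artifact names and versions from the
    // Switchboard SDK integration guide and the Amazon IVS Extension page.
    implementation("com.synervoz:switchboard-sdk:<version>")
    implementation("com.synervoz:switchboard-amazon-ivs:<version>")
    // Amazon IVS Broadcast SDK for Android (version per the Amazon IVS docs).
    implementation("com.amazonaws:ivs-broadcast:<version>")
}
```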
Before you make any calls to the Switchboard SDK you need to initialize it. Please get in touch with Synervoz to get your clientID and clientSecret values or you can use the evaluation license provided below for testing purposes.
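As a rough sketch of what the initialization looks like (the exact class and method names may differ from the current Switchboard SDK, and the credentials below are placeholders):

```kotlin
import android.app.Application

// Illustrative initialization sketch -- verify the exact call against the
// Switchboard SDK integration guide. The credentials are placeholders.
class KaraokeApplication : Application() {
    override fun onCreate() {
        super.onCreate()
        SwitchboardSDK.initialize(this, "your-client-id", "your-client-secret")
    }
}
```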
To find out more about the integration process please visit the integration guide and Amazon IVS Extension page.

Creating a Karaoke App with Amazon IVS and the Switchboard SDK

Creating the Domain Logic

The core idea of the Karaoke App is the following: mix the backing track with the recorded voice and stream it to a virtual stage in real-time. In order to achieve this we need the following nodes:
  • AudioPlayerNode: plays the loaded audio file.
  • AudioGraphInputNode: provides the microphone signal.
  • MixerNode: mixes the backing track and the microphone signal.
  • IVSBroadcastSinkNode: audio sent into this node is published to the created Stage.
Let’s first create a visual representation of the audio graph, which will help us understand the interaction between the various nodes. As we can see, the audio graph is pretty simple: it contains only four nodes.
Visual representation of the audio graph
We'll now dive into the code and wire up the nodes. 
Let’s go through the different steps:
  • STEP 1: declare the audio graph. The audio graph will route the audio data between the different nodes.
  • STEP 2: declare the audio player node, which will handle loading the audio file, and exposes different playback controls.
  • STEP 3: define the mixer node, which is responsible for combining the audio from the audio player node with the audio from the input node (the microphone signal).
  • STEP 4: declare the IVSBroadcastSinkNode. The audio data transmitted to this node will be published to the Stage. We will take care of the AudioDevice initialization in a later step, which is an interface for a custom audio source.
  • STEP 5: add the audio nodes to the audio graph.
  • STEP 6: wire up the audio nodes. We connect the input node (microphone signal) and the audio player node to the mixer node, and the mixer node to the IVSBroadcastSinkNode. The mixer node mixes the two signals together and sends it to the stream. For more detailed information please visit the official documentation.
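A condensed Kotlin sketch of STEPs 1 through 6 is shown below, written as members of a KaraokeAppAudioEngine class. The node class names follow the article, but the constructors and the addNode/connect calls are assumptions about the Switchboard API rather than the documented signatures; the `audioDevice` referenced in STEP 4 is the custom IVS audio input source created later, in STEP 20.

```kotlin
// STEP 1: the audio graph routes audio data between the nodes.
private val audioGraph = AudioGraph()

// STEP 2: loads and plays the backing track, exposing playback controls.
private val audioPlayerNode = AudioPlayerNode()

// STEP 3: mixes the backing track with the microphone (input node) signal.
private val mixerNode = MixerNode()

// STEP 4: audio sent to this node is published to the Stage. The AudioDevice it wraps
// is the custom IVS audio input source created later (STEP 20).
private val broadcastSinkNode = IVSBroadcastSinkNode(audioDevice)

init {
    // STEP 5: add the nodes to the graph.
    audioGraph.addNode(audioPlayerNode)
    audioGraph.addNode(mixerNode)
    audioGraph.addNode(broadcastSinkNode)

    // STEP 6: microphone + backing track -> mixer -> IVS broadcast sink.
    audioGraph.connect(audioGraph.inputNode, mixerNode)
    audioGraph.connect(audioPlayerNode, mixerNode)
    audioGraph.connect(mixerNode, broadcastSinkNode)
}
```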
In order to run an audio graph we need to create an AudioEngine instance. The audio engine handles the device-level audio I/O and connects the device's microphone to the graph's input node and the device's speaker to the graph's output node. Let’s extend our code with an audio engine:
  • STEP 7: define and initialize the audio engine
The AudioEngine class takes various parameters, let’s go through them:
  • Set microphoneEnabled = true to enable the microphone.
  • Set performanceMode = PerformanceMode.LOW_LATENCY to achieve the lowest latency possible.
  • Set micInputPreset = MicInputPreset.VoicePerformance, to make sure that the capture path will minimize latency and coupling with the playback path. The capture path refers to the microphone recording process, while the playback path involves audio output through speakers or headphones. This preset optimizes both paths for synchronized and clear audio performance.
For information about the parameters please check out the Switchboard SDK Documentation.
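Putting STEP 7 into code, the engine construction might look like the sketch below. The parameter names and values come from the list above, while the exact constructor shape and the `context` reference are assumptions to be checked against the Switchboard SDK documentation.

```kotlin
// STEP 7: define and initialize the audio engine. Parameter names follow the article;
// the exact constructor signature is an assumption -- check the Switchboard SDK docs.
private val audioEngine = AudioEngine(
    context = context,
    microphoneEnabled = true,
    performanceMode = PerformanceMode.LOW_LATENCY,
    micInputPreset = MicInputPreset.VoicePerformance
)
```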

The IVSBroadcastSinkNode

Next, we need to send the processed audio to Amazon IVS by using the IVSBroadcastSinkNode. The IVSBroadcastSinkNode enables our application to route the processed audio to Amazon IVS more easily. In order to set up the IVSBroadcastSinkNode we need to initialize the AudioDevice and create a Stage with a Stage.Strategy. Let’s have a closer look at how the IVSBroadcastSinkNode communicates with the AudioDevice. The Amazon IVS Broadcast SDK provides local devices such as the built-in microphone via DeviceDiscovery for simple use cases, when we only need to publish the unprocessed microphone signal to the stream.
Our use case for the karaoke app is more complicated, since we want to mix the microphone signal with the backing track. We may also want to apply some audio effects to the vocals before publishing them to an Amazon IVS stage. For this use case, the Amazon IVS Broadcast SDK for Android enables the creation of custom audio input sources by calling createAudioInputSource on a DeviceDiscovery instance.
This method creates virtual devices that can be bound to the mixer like any other device.
The AudioDevice returned by createAudioInputSource can receive Linear PCM data generated by any audio source via the appendBuffer method. Linear PCM (Pulse-Code Modulation) data is a format for storing digital audio.
Without going into much implementation detail, let’s look at how the IVSBroadcastSinkNode interacts with the created virtual audio device.
  • STEP 8: the audio data is channeled to the IVSBroadcastSinkNode by the AudioGraph, utilizing the writeCaptureData function.
  • STEP 9: the audio data is forwarded to the AudioDevice, from where it will be broadcasted to the Stage.
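Conceptually, the handoff in STEPs 8 and 9 looks roughly like the sketch below. appendBuffer is part of the Amazon IVS Broadcast SDK's AudioDevice, while the writeCaptureData signature shown here is a simplified assumption based on the description above.

```kotlin
import java.nio.ByteBuffer

// Conceptual sketch of what happens inside the IVSBroadcastSinkNode: the audio graph
// pushes processed audio into writeCaptureData (STEP 8), which forwards it to the
// custom IVS AudioDevice via appendBuffer (STEP 9). Signatures are simplified.
fun writeCaptureData(data: ByteBuffer, numberOfBytes: Long) {
    // Third argument is the presentation timestamp in microseconds (0 for simplicity).
    audioDevice.appendBuffer(data, numberOfBytes, 0)
}
```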
Let’s continue by building up the remaining components of the Amazon IVS Broadcast SDK integration.
Let’s go through the different steps:
  • STEP 10: declare an instance of the AudioDevice. Audio input sources must conform to this interface. We will initialize it in STEP 20.
  • STEP 11: declare an instance of DeviceDiscovery. We use this class to create a custom audio input source in STEP 20.
  • STEP 12: declare a list of LocalStageStream. The class represents the local audio stream, and it is used in Stage.Strategy to indicate to the SDK what stream to publish.
  • STEP 13: declare an instance of Stage. This is the main interface to interact with the created session.
  • STEP 14: we define a Stage.Strategy. The Stage.Strategy interface provides a way for the host application to communicate the desired state of the stage to the SDK. Three functions need to be implemented: shouldSubscribeToParticipant, shouldPublishFromParticipant, and stageStreamsToPublishForParticipant.
  • STEP 15: Choosing streams to publish. When publishing, this is used to determine what audio and video streams should be published.
  • STEP 16: Publishing. Once connected to the stage, the SDK queries the host application to see if a particular participant should publish. This is invoked only on local participants that have permission to publish based on the provided token.
  • STEP 17: Subscribing to Participants. When a remote participant joins the stage, the SDK queries the host application about the desired subscription state for that participant. In our application we care about the audio.
  • STEP 18: create an instance of a Stage, by passing the required parameters. The Stage class is the main point of interaction between the host application and the SDK. It represents the stage itself and is used to join and leave the stage. Creating and joining a stage requires a valid, unexpired token string from the control plane (represented as token).  Please note that we also have to create a Stage using the Amazon IVS console, and generate the needed participant tokens. We can also create a participant token programmatically using the AWS SDK for JavaScript.
  • STEP 19: we define the sample rate of the broadcast, based on what sample rate the audio engine is running.
  • STEP 20: create a custom audio input source, since we intend to generate and feed PCM audio data to the SDK manually. We pass the number of audio channels, sampling rate and sample format as parameters.
  • STEP 21: create an AudioLocalStageStream instance, which represents the local audio stream.
  • STEP 22: add the created AudioLocalStageStream instance to the publishStreams list.
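Below is a condensed sketch of STEPs 10 through 22, continuing the same KaraokeAppAudioEngine class. The IVS types (DeviceDiscovery, AudioDevice, Stage, Stage.Strategy, ParticipantInfo, AudioLocalStageStream) belong to the Amazon IVS Broadcast SDK, but the signatures shown here are simplified and should be verified against the current SDK reference; `token` is the participant token you generate in the Amazon IVS console or via the AWS SDK.

```kotlin
// STEP 10: the custom audio input source (initialized in STEP 20).
private lateinit var audioDevice: AudioDevice

// STEP 11: used to create the custom audio input source.
private val deviceDiscovery = DeviceDiscovery(context)

// STEP 12: the local streams we want to publish.
private val publishStreams = mutableListOf<LocalStageStream>()

// STEP 13: the main interface for interacting with the session.
private var stage: Stage? = null

// STEP 14: the strategy tells the SDK what to publish and what to subscribe to.
private val stageStrategy = object : Stage.Strategy {
    // STEP 15: the streams to publish for the local participant.
    override fun stageStreamsToPublishForParticipant(
        stage: Stage,
        participantInfo: ParticipantInfo
    ): List<LocalStageStream> = publishStreams

    // STEP 16: publish whenever the token grants permission.
    override fun shouldPublishFromParticipant(
        stage: Stage,
        participantInfo: ParticipantInfo
    ): Boolean = true

    // STEP 17: we only care about audio from remote participants.
    override fun shouldSubscribeToParticipant(
        stage: Stage,
        participantInfo: ParticipantInfo
    ): Stage.SubscribeType = Stage.SubscribeType.AUDIO_ONLY
}

fun setupStage(token: String) {
    // STEP 18: create the Stage using a valid participant token.
    stage = Stage(context, token, stageStrategy)

    // STEPs 19 + 20: create the custom audio input source for our mixed PCM data,
    // matching the sample rate the Switchboard audio engine is running at.
    audioDevice = deviceDiscovery.createAudioInputSource(
        2,                                                  // number of channels
        BroadcastConfiguration.AudioSampleRate.RATE_48000,  // sampling rate
        AudioDevice.Format.INT16                            // sample format
    )

    // STEPs 21 + 22: wrap the device in a local audio stream and queue it for publishing.
    publishStreams.add(AudioLocalStageStream(audioDevice))
}
```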
We need to create a few more functions, and we are done with the domain logic. 
These steps are generally simple, but let's examine them one by one.
  • STEP 23: start the audio graph via the audio engine. The audio engine manages device-level audio input/output and channels the audio stream through the audio graph. This is usually called when the application is initialized.
  • STEP 24: stop the audio engine.
  • STEP 25: start playing the backing track.
  • STEP 26: pause the backing track.
  • STEP 27: check whether the audio player is playing.
  • STEP 28: load an audio file located in the assets folder.
  • STEP 29: start streaming the audio by joining the created stage.
  • STEP 30: stop the stream by leaving the stage.
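A sketch of these helpers, continuing the same class, is shown below. The Switchboard method names (start, stop, play, pause, isPlaying, load) are assumptions based on the step descriptions above, while join and leave are Stage methods from the IVS SDK.

```kotlin
// STEPs 23-30 (sketch): thin wrappers around the engine, player, and stage.
fun start() = audioEngine.start(audioGraph)          // STEP 23: run the audio graph
fun stop() = audioEngine.stop()                      // STEP 24: stop the audio engine
fun play() = audioPlayerNode.play()                  // STEP 25: play the backing track
fun pause() = audioPlayerNode.pause()                // STEP 26: pause the backing track
fun isPlaying(): Boolean = audioPlayerNode.isPlaying // STEP 27: playback state
fun loadBackingTrack(assetPath: String) =            // STEP 28: load a file from assets
    audioPlayerNode.load(assetPath)

fun startStreaming(token: String) {                  // STEP 29: join the created stage
    setupStage(token)
    stage?.join()
}

fun stopStreaming() = stage?.leave()                 // STEP 30: leave the stage
```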

Creating a Basic UI

Let’s create a basic UI and bind it to the domain logic that we previously created. For now we only need two buttons: one for starting / pausing the audio player and one for the stream.
First we create the skeleton in XML.
It results in the following simple screen.
Simple screen to play music and start streaming
The next step is to bind together the UI with the domain logic.
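A minimal binding sketch is shown below. The layout, button IDs, and the KaraokeAppAudioEngine API are the hypothetical names used in the earlier sketches, and the participant token is a placeholder you would replace with one generated in the Amazon IVS console.

```kotlin
import android.os.Bundle
import android.widget.Button
import androidx.appcompat.app.AppCompatActivity

class MainActivity : AppCompatActivity() {

    private lateinit var karaokeEngine: KaraokeAppAudioEngine
    private var isStreaming = false

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        karaokeEngine = KaraokeAppAudioEngine(this)
        karaokeEngine.loadBackingTrack("backing_track.mp3")
        karaokeEngine.start()

        // Start / pause the backing track.
        findViewById<Button>(R.id.playPauseButton).setOnClickListener {
            if (karaokeEngine.isPlaying()) karaokeEngine.pause() else karaokeEngine.play()
        }

        // Start / stop streaming to the Stage.
        findViewById<Button>(R.id.streamButton).setOnClickListener {
            if (isStreaming) karaokeEngine.stopStreaming()
            else karaokeEngine.startStreaming(PARTICIPANT_TOKEN)
            isStreaming = !isStreaming
        }
    }

    companion object {
        // Placeholder: generate a real participant token in the Amazon IVS console.
        private const val PARTICIPANT_TOKEN = "<participant-token>"
    }
}
```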
When you are ready, just press start and your karaoke experience will be streamed live!

Part 2 - Importing Audio Effects through Switchboard Extensions

Already impressive, but what if we want to enhance our voice to sound like a professional singer? The music industry employs a variety of effects to enrich a singer's sound. Among these, Reverb and Autotune are particularly notable. Reverb, short for reverberation, creates an echo effect that mimics singing in a spacious environment, like a concert hall, adding depth and richness to the voice. Autotune, on the other hand, adjusts and corrects vocal pitch, ensuring harmony with the music. It's ideal for smoothing out off-key notes, thus boosting overall vocal performance. Behind the scenes, these effects are underpinned by sophisticated digital signal processing algorithms. Thankfully, we don’t need to delve into their intricate implementation, as existing audio libraries have already tackled this.
One such powerhouse is the Superpowered Audio SDK, a leading C++ audio library renowned for its high-performance real-time audio effects and more. The good news is, there's no necessity to grapple with the nuances of low-level C++ audio coding. The Switchboard Superpowered Extension simplifies this by offering a range of effects through audio nodes. These can be seamlessly integrated into the audio graph we established in Part 1, making it accessible and user-friendly.
To import the Superpowered Extension add the following to your module level build.gradle file:
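As an illustrative sketch only (Kotlin DSL shown), with placeholder coordinates; the real artifact name and version come from the Switchboard Superpowered Extension documentation:

```kotlin
dependencies {
    // Placeholder coordinates -- use the artifact name and version from the
    // Switchboard Superpowered Extension documentation.
    implementation("com.synervoz:switchboard-superpowered:<version>")
}
```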
Initialize the Superpowered Extension by calling:
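As a rough illustration only (the class and method names here are assumptions, not the documented API), the call might look like this:

```kotlin
// Illustrative sketch only: the class and method names are assumptions; replace the
// placeholder with the example license or your own Superpowered license key.
SuperpoweredExtension.initialize("<superpowered-license-key>")
```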
You can use the example license for testing purposes, or visit the official website to get one.

Loading and applying voice changing effects

Now let’s apply some cool effects to our voice! As previously mentioned, the number of available effects is very large, so we will pick three popular effects and implement them: Automatic Vocal Pitch Correction, Reverb, and Flanger. Automatic Vocal Pitch Correction (automatic tune) fine-tunes and corrects vocal pitch to ensure it harmonizes with the music, perfect for ironing out any off-key notes and augmenting vocal quality. Reverb generates an echo effect, simulating singing in a capacious venue like a concert hall, thereby adding depth and richness to the voice. The Flanger effect combines a delayed signal with the original, creating a unique, swirling sound that resembles a jet plane in motion.
Let’s add these components to our KaraokeAppAudioEngine.
  • STEP 31: Declaring the audio effect nodes.
  • STEP 32: Adding the nodes to the audio graph.
  • STEP 33: Connecting the nodes in the audio graph. The audio effect nodes are processor nodes, described earlier in the Audio Nodes section. These processor nodes accept audio input, apply the designated effect during processing, and then output the enhanced audio.
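A sketch of these three steps, continuing the KaraokeAppAudioEngine, is shown below. The effect node class names are assumptions inferred from the effects named above (check the Superpowered Extension documentation for the real names), and the rewiring replaces the direct input-to-mixer connection from STEP 6 so that only the vocals are processed.

```kotlin
// STEP 31: declare the effect nodes. Class names are assumptions based on the effects
// described above -- check the Superpowered Extension docs for the real names.
val pitchCorrectionNode = AutomaticVocalPitchCorrectionNode()
val reverbNode = ReverbNode()
val flangerNode = FlangerNode()

// STEP 32: add the nodes to the audio graph.
audioGraph.addNode(pitchCorrectionNode)
audioGraph.addNode(reverbNode)
audioGraph.addNode(flangerNode)

// STEP 33: route the microphone through the effects chain before the mixer, so only
// the vocals (not the backing track) are processed.
audioGraph.connect(audioGraph.inputNode, pitchCorrectionNode)
audioGraph.connect(pitchCorrectionNode, reverbNode)
audioGraph.connect(reverbNode, flangerNode)
audioGraph.connect(flangerNode, mixerNode)
```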
These effects are highly configurable, but that is out of the scope of this article; please visit the official documentation for more information.

Creating a basic UI

Let’s create a basic UI and bind it to the domain logic. We only need three additional buttons for audio effects.
First we create the skeleton in XML.
This results in the following user interface (UI) screen.
User interface screen with voice effects
Simple UI screen with buttons to enable voice effects
We can now bind together the UI with the domain logic.
  • STEP 34: Associate the audio effects toggle buttons with their respective effects. Pressing a button activates the effect, and pressing it again deactivates it.
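A sketch of this binding is shown below. The toggle button IDs are hypothetical, and the `isEnabled` property on the effect nodes is an assumption about how Switchboard processor nodes are bypassed; check the Switchboard SDK documentation for the actual mechanism.

```kotlin
import android.widget.ToggleButton

// STEP 34 (sketch): wire the toggle buttons to the effect nodes.
findViewById<ToggleButton>(R.id.pitchCorrectionToggle).setOnCheckedChangeListener { _, isChecked ->
    karaokeEngine.pitchCorrectionNode.isEnabled = isChecked
}
findViewById<ToggleButton>(R.id.reverbToggle).setOnCheckedChangeListener { _, isChecked ->
    karaokeEngine.reverbNode.isEnabled = isChecked
}
findViewById<ToggleButton>(R.id.flangerToggle).setOnCheckedChangeListener { _, isChecked ->
    karaokeEngine.flangerNode.isEnabled = isChecked
}
```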
And we are ready to rock! Easy, right?

Part 3 - Testing your newfound voice

To hear the audio with the effects applied, create another participant token and join a Stage by using the Amazon IVS Real-Time Streaming Web Sample. You can also join the Stage by using the Synervoz sample app linked from Synervoz's website.

Conclusion

In this tutorial, you created a karaoke app by leveraging the Switchboard SDK and its Amazon IVS Extension to transform your voice and live stream it using Amazon IVS. By following the steps outlined in this tutorial, you have gained the knowledge and tools necessary to lift your own unique voice to new heights.
About the Authors
Tony Vu is a Senior Partner Engineer at Twitch. He specializes in assessing partner technology for integration with Amazon Interactive Video Service (IVS), aiming to develop and deliver comprehensive joint solutions to our IVS customers. Tony enjoys writing and sharing content on LinkedIn.
Balazs Banto, a Senior Software Engineer at Synervoz Communications, specializes in audio programming for mobile devices and contributes to the development of the Switchboard SDK.
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
