UPDATE (November 22, 2022): the demo page has moved from Heroku to Deno and is up and running again.
As an experiment I wanted to see whether it is possible to stream music files from one browser to another using WebRTC. This post describes how to achieve this. I’ve published the result of my experiment as a demo web application; you’ll find the link to it below.
What is WebRTC
WebRTC is a JavaScript API that enables web developers to create real-time communication (RTC) applications. WebRTC uses peer-to-peer connections to send data between browsers, without the need for servers in the data path. WebRTC is mostly known for audio and video calls from one browser to another, making Skype-like communication possible using only browser technology.
But there is more to WebRTC than real-time communication alone. Personally I believe that the RTCDataChannel, which makes it possible to send arbitrary data between browsers, is the most disruptive feature WebRTC has to offer.
WebRTC explicitly does not handle the signaling messages that are needed to set up a peer-to-peer connection between two browsers.
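To give an idea of what that looks like, here is a minimal, hypothetical sketch of sending a bit of data over an RTCDataChannel; the channel label and message are made up, pc_config stands for an ICE configuration object, and the signaling needed to actually connect the two browsers is left out:
// create a peer connection and an (unreliable) data channel on it
var pc = new RTCPeerConnection(pc_config);
var channel = pc.createDataChannel('metadata', { reliable: false }); // early Chrome versions only supported unreliable channels

channel.onopen = function() {
  // once the channel is open we can send strings (or ArrayBuffers)
  channel.send(JSON.stringify({ artist: 'unknown', title: 'demo track' }));
};

channel.onmessage = function(event) {
  console.log('received: ' + event.data);
};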
Today I want to write about using WebRTC in combination with the WebAudio API. This article will not cover WebRTC basics like sending around the signaling; great tutorials on this already exist.
What is WebAudio
HTML5 Rocks has a nice introduction to the basics of WebAudio; please go there for more details on this API.
Tying the two together
Right at the center of the WebAudio API is the AudioContext object. The AudioContext is mostly used as a singleton and takes care of routing audio signals and representing audio objects. Below is a simple example of how to play an mp3 file using the WebAudio API:
var context = new AudioContext();

function handleFileSelect(event) {
  var file = event.target.files[0];
  if (file) {
    if (file.type.match('audio*')) {
      var reader = new FileReader();
      reader.onload = function(readEvent) {
        context.decodeAudioData(readEvent.target.result, function(buffer) {
          // create an audio source and connect it to the file buffer
          var source = context.createBufferSource();
          source.buffer = buffer;
          source.start(0);
          // connect the audio stream to the audio hardware
          source.connect(context.destination);
        });
      };
      reader.readAsArrayBuffer(file);
    }
  }
}
This code will make an mp3 file, selected using a file input element, play over the audio hardware of the host computer. To send the audio stream over a WebRTC connection we’ll have to add a few extra lines of code:
var context = new AudioContext();
// create a peer connection
var pc = new RTCPeerConnection(pc_config);

function handleFileSelect(event) {
  var file = event.target.files[0];
  if (file) {
    if (file.type.match('audio*')) {
      var reader = new FileReader();
      reader.onload = function(readEvent) {
        context.decodeAudioData(readEvent.target.result, function(buffer) {
          // create an audio source and connect it to the file buffer
          var source = context.createBufferSource();
          source.buffer = buffer;
          source.start(0);
          // connect the audio stream to the audio hardware
          source.connect(context.destination);
          // create a destination for the remote browser
          var remote = context.createMediaStreamDestination();
          // connect the remote destination to the source
          source.connect(remote);
          // add the stream to the peer connection
          pc.addStream(remote.stream);
          // create a SDP offer for the new stream
          pc.createOffer(setLocalAndSendMessage);
        });
      };
      reader.readAsArrayBuffer(file);
    }
  }
}
I’ve tried playing the audio on the receiving side using the WebAudio API as well, but this did not seem to work at the time of writing, so for now we’ll have to add the incoming stream to an <audio/> element:
function gotRemoteStream(event) {
  // create a player; we could also get a reference to an existing player in the DOM
  var player = new Audio();
  // attach the media stream (attachMediaStream comes from the adapter.js shim)
  attachMediaStream(player, event.stream);
  // start playing
  player.play();
}
And we’re done.
Demo
Building upon this example I’ve created a demo web application (requires Chrome) to show this functionality. The demo basically uses the example above, but connects the audio stream for local playback to a GainNode object so the local audio volume can be controlled. The demo application allows you to play an mp3 file from a webpage and lets others listen in. On the listener page the music is played using an <audio/> element. The listener page also receives some ID3 meta-data using an RTCDataChannel connection. Finally, the listener page displays the bit rate of the music stream that is played.
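The volume control boils down to routing the source through a GainNode instead of connecting it directly to context.destination. A minimal sketch of that idea (the volumeSlider element id is hypothetical):
// route local playback through a gain node so the volume can be adjusted
var gainNode = context.createGain();
source.connect(gainNode);
gainNode.connect(context.destination);

// hypothetical slider element controlling the local volume (0.0 - 1.0)
document.getElementById('volumeSlider').onchange = function(event) {
  gainNode.gain.value = event.target.value;
};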
The demo application has a few loose ends:
- Currently it only works on Chrome (mainly because it plays mp3s)
- On the receiving side it doesn’t use the WebAudio API for playing the audio
- There is static on the line when the audio stream is paused. I haven’t found a way to detect, on the receiving side and without signaling, whether a stream has been removed.
- The audio quality is quite poor: the audio stream has a bit rate of around 32 kbit/s, and increasing the bandwidth constraints does not seem to influence the audio stream yet (one way to experiment with this is sketched below).
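One way to experiment with the bandwidth is to munge the SDP before it is sent and add a b=AS line to the audio section. This is only a rough sketch of the idea, not something the demo does; the exact placement in the SDP and whether the browser honours the value varied between implementations at the time:
// sketch: ask for more audio bandwidth by inserting a b=AS line into the SDP
function setAudioBandwidth(sdp, kbps) {
  return sdp.replace(/(m=audio .*\r\n)/, '$1b=AS:' + kbps + '\r\n');
}

// for example inside setLocalAndSendMessage, before forwarding the offer:
// sessionDescription.sdp = setAudioBandwidth(sessionDescription.sdp, 128);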
It is good to see that WebRTC and WebAudio are playing together quite nicely. In the future it will be possible to expand on this concept and create all kinds of cool applications, for example sending a voice message when a user isn’t available to accept a WebRTC call, or a DJ-ing web application that others can listen in on!
Thanks for this article. Really great example, simple and clean. I am also experimenting with streaming audio through WebRTC. I have a couple of questions on my side:
1. Instead of using createBufferSource from AudioContext, would it be possible to use createMediaElementSource (from an existing audio element in the DOM for instance) and then connect it to the MediaStreamDestination?
2. In your example I noticed you re-create an offer every time you call addStream on each peer connection. Can’t we create the offer once and for all, and then removeStream/addStream?
Thanks.
I’ll get back on your questions after my vacation :-)
Have a great vacation ;-)
Thanks again for the compliment.
As for your questions:
1. This should work fine, as both functions create an AudioSourceNode object; see the sketch below.
2. I would expect that re-creating the offer is not needed after a removeStream, but when I tested the code I found it to be necessary anyway. However, I did not investigate this thoroughly.
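For the first question, a minimal sketch of what that could look like, assuming an existing <audio id="player"> element in the DOM (the element id is hypothetical):
// use an existing <audio> element as the source instead of a decoded buffer
var element = document.getElementById('player');               // hypothetical element id
var elementSource = context.createMediaElementSource(element);
var remote = context.createMediaStreamDestination();

elementSource.connect(context.destination); // local playback
elementSource.connect(remote);              // route to the peer connection
pc.addStream(remote.stream);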
Thanks for your answers. Just came back from my vacation ;-)
I’ll experiment a bit more on the second point and tell you how it went on my side.
see ya.
Do you think it is possible to send via RTCPeerConnection instead of RTCDataChannel? If so, the other side could possibly tune in directly without an audio tag.
BTW, how about streaming video in this scenario? Is it doable with WebRTC?
Thanks
The demo does send the audio stream via a PeerConnection; only the mp3’s meta-data is sent via a DataChannel. There is still a need to link the incoming media stream to an audio output. In my case I used the HTML <audio/> element, but you can also link it to the Web Audio API (see the sketch below).
I don’t think streaming video would work as there is no Web Video API.
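For reference, linking the incoming stream to the Web Audio API would look something like the sketch below, although as mentioned in the post this route did not work reliably at the time of writing:
function gotRemoteStream(event) {
  // feed the incoming MediaStream into the Web Audio graph instead of an <audio> element
  var remoteSource = context.createMediaStreamSource(event.stream);
  remoteSource.connect(context.destination);
}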
Thanks for your quick answers. Regarding the second question, let me put it another way: is it possible to stream a local video file via RTCPeerConnection to the other side? For example, using the File APIs to load a video file and create a media stream from it.
I forgot about the captureMediaStreamUntilEnded function, so my answer above is not correct. With this function it is possible to capture a media stream that plays the same media (audio and video) as the element, but I don’t know how mature the implementations are. The function is described in http://www.w3.org/2011/audio/drafts/1WD/MediaStream/
Possibly one of the coolest demos I have seen in a long time. Thank you for sharing.
Thanks!
Very interesting experiment, thanks for a good write-up. Landed here because an idea popped into my head: having jam sessions with musicians in reasonably close proximity, using audio channeled via WebRTC. I worry the delay makes this impossible, seeing as anything over 10 ms is probably too much and would have that annoying ‘hearing your own echo over the phone’ effect. Any thoughts?
A very cool idea indeed!
Although I’m not an expert on the topic, I would say it’s going to be pretty hard to get the delay as low as you need it for online, real-time jam sessions. In this demo I’m not doing anything fancy, and even on the same PC I already have a noticeable delay.
I guess the three most time-consuming parts are the encoding, the data transfer itself and the decoding. With more than two participants you would need to mix the incoming audio streams as well, and this could take some time. If there are a lot of participants you might not want to have streams between all participants (this does not scale very well) and instead have a server mix the streams. This adds additional latency because it adds extra decoding, mixing, encoding and data transfer delays.
If it’s just the ‘hearing your own echo over the phone’ effect you want to rule out, you could create a separate stream for each connected participant and leave out the audio you receive from that participant in the stream you send back to them (so each participant only gets a mix of the audio from the other participants). Maybe this makes it acceptable for jamming?
Would you be available for a small coding project on this subject, or could you point me in the right direction? Much appreciated.
Hi Tony. Thanks for your post. All code is available under the MIT open source license. I’m currently not available to do anything substantial myself, but if you give me a bit more detail on what you are looking for I might be able to give you some pointers. You can contact me by sending a mail to eelco @ thisdomain.