Some time ago we at Igalia embarked on the journey to ship a GStreamer-powered WebRTC backend. It is a long journey and it is not over, but we have made some progress. This post is the first of a series providing some insights into the challenges we are facing and our plans for the next release cycle(s).
Most web engines nowadays bundle a version of LibWebRTC, which is indeed a pragmatic approach. WebRTC is a huge spec, spanning many protocols, RFCs and codecs, and LibWebRTC is in fact a multimedia framework of its own, with a very big code-base. I still remember the surprised face of Emilio at the 2022 WebEngines conference when I told him we had unusual plans regarding WebRTC support in the GStreamer WebKit ports. There are several reasons for this plan, explained in the WPE FAQ. We worked on a LibWebRTC backend for the WebKit GStreamer ports, which my colleague Thibault Saunier blogged about, but unfortunately that backend has remained disabled by default and is not shipped in the tarballs, for those same reasons.
The GStreamer project nowadays provides a library and a plugin allowing applications to interact with third-party WebRTC actors. This is, in my opinion, a paradigm shift, because it enables new ways for the so-called Web and traditional native applications to interoperate. Since the GstWebRTC announcement back in late 2017, I’ve been experimenting with the idea of shipping an alternative to LibWebRTC in WebKitGTK and WPE. The initial GstWebRTC WebKit backend was merged upstream on March 18, 2022.
As you might already know, before any audio/video call your browser might ask for permission to access your webcam and microphone, and during the call you can now even share your screen. From the WebKitGTK/WPE perspective the procedure is of course the same. Let’s dive in.
WebCam/Microphone capture
Back in 2018, for the LibWebRTC backend, Thibault added support for GStreamer-powered media capture to WebKit, meaning that capture devices such as microphones and webcams became accessible from WebKit applications through the getUserMedia spec. Under the hood, a GStreamer source element is created using the GstDevice API. This implementation is now re-used for the GstWebRTC backend; it works fine and still has room for improvement, but that’s a topic for a follow-up post.
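For instance, a page can list the capture devices the backend exposes through the standard enumerateDevices() API. Here is a minimal sketch (note that device labels are only populated once capture permission has been granted):

navigator.mediaDevices.enumerateDevices().then((devices) => {
  for (const device of devices) {
    // On the GStreamer ports each entry maps to a GstDevice under the hood.
    if (device.kind === 'audioinput' || device.kind === 'videoinput')
      console.log(`${device.kind}: ${device.label}`);
  }
});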
A MediaStream can be rendered in an <audio> or <video> element through a custom GStreamer source element that we also provide in WebKit. This is all internally wired up so that the following JS code will trigger the WebView into natively capturing and rendering a WebCam device using GStreamer:
<html>
  <head>
    <script>
      navigator.mediaDevices.getUserMedia({ video: true, audio: false }).then((mediaStream) => {
        const video = document.querySelector('video');
        video.srcObject = mediaStream;
        video.onloadedmetadata = () => {
          video.play();
        };
      });
    </script>
  </head>
  <body>
    <video></video>
  </body>
</html>
When this web page is rendered, and after the user has granted access to the capture devices, the GStreamer backends will create not one but two pipelines.
Capture pipeline: pipewiresrc → videoscale → videoconvert → videorate → valve → appsink

Playback pipeline: mediastreamsrc ( appsrc → src ghost pad ) → playbin3 ( decodebin3 → webkitglsink )
The first pipeline routes video frames from the capture device using pipewiresrc to an appsink. From the appsink, our capturer, leveraging the Observer design pattern, notifies its observers. In this case there is only one observer, a GStreamer source element internal to WebKit called mediastreamsrc. The playback pipeline shown above is heavily simplified; in reality more elements are involved, but what matters most is that thanks to the flexibility of GStreamer we can leverage the existing MediaPlayer backend, which we at Igalia have been maintaining for more than 10 years, to render MediaStreams. All we needed was a custom source element; the rest of our MediaPlayer didn’t need many changes to support this use-case.
One notable change we made since the initial implementation, though, is that for us a MediaStream can be raw, encoded, or even encapsulated in an RTP payload. Depending on which component is going to render the MediaStream, we have enough flexibility to allow zero-copy in most scenarios. In the example above, the stream will typically be raw from source to renderer. However, some webcams can provide encoded streams; WPE and WebKitGTK will be able to leverage these internally and, in some cases, allow direct streaming from the hardware device to the outgoing PeerConnection without an extra encoding step.
Desktop capture
There is another JS API, called getDisplayMedia, that allows capturing from your screen or a window, and yes, we also support it! Thanks to the ground-breaking progress of the Linux desktop in recent years, such as PipeWire and xdg-desktop-portal, we can now stream your favorite desktop environment over WebRTC. Under the hood, when the WebView is granted access to the desktop capture through the portal, our backend creates a pipewiresrc GStreamer element, configured to source from the file descriptor provided by the portal, and we have a healthy raw video stream.
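From the page’s point of view, it mirrors getUserMedia. A minimal sketch, assuming a <video> element is already present on the page:

navigator.mediaDevices.getDisplayMedia({ video: true }).then((stream) => {
  // The portal dialog lets the user pick a screen or window to share.
  const video = document.querySelector('video');
  video.srcObject = stream;
  video.onloadedmetadata = () => {
    video.play();
  };
});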
Here’s a demo:
WebAudio capture
What’s more, you can also create a MediaStream from a WebAudio node. On the backend side, the GStreamerMediaStreamAudioSource fills GstBuffers from the audio bus channels and notifies third parties internally observing the MediaStream, such as outgoing media sources, or simply an <audio> element that was configured to source from the given MediaStream. I have no demo for this, you’ll have to take my word for it.
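For the curious, here is a minimal JS sketch of that use-case: an oscillator routed into a MediaStreamAudioDestinationNode, whose stream then feeds an <audio> element assumed to be on the page:

const audioContext = new AudioContext();
const oscillator = audioContext.createOscillator();
// createMediaStreamDestination() yields a node exposing a MediaStream.
const destination = audioContext.createMediaStreamDestination();
oscillator.connect(destination);
oscillator.start();

const audio = document.querySelector('audio');
audio.srcObject = destination.stream;
audio.play();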
Canvas capture
But wait, there is more. Did I hear canvas? Yes, we can feed your favorite <canvas> into a MediaStream. The JS API is called captureStream; its code is actually cross-platform but defers to the HTMLCanvasElement::toVideoFrame() method, which has a GStreamer implementation. The code is not the most optimal yet, though, due to shortcomings of our current graphics pipeline implementation.
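Usage-wise it is straightforward. A minimal sketch, assuming a <canvas> and a <video> element on the page:

const canvas = document.querySelector('canvas');
const context = canvas.getContext('2d');

// Keep drawing something so there are frames to capture.
function draw(timestamp) {
  context.fillStyle = `hsl(${(timestamp / 20) % 360}, 80%, 50%)`;
  context.fillRect(0, 0, canvas.width, canvas.height);
  requestAnimationFrame(draw);
}
requestAnimationFrame(draw);

// Capture the canvas at 25 frames per second into a MediaStream.
const video = document.querySelector('video');
video.srcObject = canvas.captureStream(25);
video.onloadedmetadata = () => {
  video.play();
};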
Here is a demo of Canvas to WebRTC running in the WebKitGTK MiniBrowser:
Wrap-up
So we’ve got MediaStream support covered. This is only one part of the puzzle, though; we are now facing challenges on the PeerConnection implementation. MediaStreams are cool, but it’s even better when you can share them with your friends on the fancy A/V conferencing websites, and we’re not entirely ready for this yet in WebKitGTK and WPE. For this reason, WebRTC is not yet enabled by default in the upcoming WebKitGTK and WPE 2.40 releases. We’re just not there yet. In the next part of this series I’ll tackle the PeerConnection backend, which we’re working hard on these days, both in WebKit and in GStreamer.
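As a teaser, this is the kind of page-side code the PeerConnection backend has to support. A minimal sketch, with the signaling channel deliberately left out:

const peerConnection = new RTCPeerConnection();
navigator.mediaDevices.getUserMedia({ video: true, audio: true }).then(async (stream) => {
  // Hand each captured track over to the PeerConnection.
  for (const track of stream.getTracks())
    peerConnection.addTrack(track, stream);
  const offer = await peerConnection.createOffer();
  await peerConnection.setLocalDescription(offer);
  // The offer (and ICE candidates) would then be exchanged with the
  // remote peer over a signaling channel of your choice.
});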
Happy hacking, and as always, all my gratitude goes to my fellow Igalia comrades for allowing me to keep working on these domains, and to Metrological for funding some of this work. Is your organization or company interested in leveraging modern WebRTC APIs from WebKitGTK and/or WPE? If so, please get in touch with us to help us speed up the implementation work.