What is WebRTC?
WebRTC — Web Real-Time Communications — is a standard and an open framework for real-time video, audio, and messaging. You have probably used WebRTC. The video and audio calling features of Microsoft Teams, Google Meet, Facebook Messenger, Discord, WhatsApp, and Snap are all built on top of WebRTC.
A standard, an open framework, and built-in browser support
WebRTC has been under development for more than a decade (see below for a brief history). In January 2021 the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF) jointly announced that WebRTC had been published as a finalized, official standard.
In addition to being a standard, WebRTC is also built into the major web browsers: Chrome, Safari, and Firefox. This means that web developers can use WebRTC to build video, audio, and messaging applications that work on almost every computer and mobile device in the world with no downloads or software installs required.
The core WebRTC code that Chrome, Safari, and many other applications use is also available as an Open Source project that developers can use in their own C++ applications. This code is actively maintained and regularly improved by many contributors, with especially heavy participation and oversight from Google engineers.
A brief history of WebRTC
Early work on the technology that would become WebRTC started at a company called Global IP Solutions, founded in 1999. In 2010, Google acquired Global IP Solutions and began to propose Internet standards based on the company’s technology and code.
Other major technology companies including Ericsson and Cisco participated in the standards process. As WebRTC gained momentum and began to be implemented in the Chrome web browser, early adopters of the technology included Google Hangouts/Meet, the social app Houseparty, Twilio Voice, and many telehealth platforms.
WebRTC 1.0 was finalized in 2021, but WebRTC will continue to evolve. New networking standards like QUIC and WebTransport are being developed with WebRTC in mind. Extensions of WebRTC like WHIP and WHEP are under active development for specific use cases (media ingest and playback over HTTP). The standards committee is also working on the next major iteration of WebRTC, starting from the WebRTC Next Version Use Cases draft.
WebRTC technology overview
WebRTC is a complicated standard with many moving parts. The most important components are signaling; establishing a network connection; encryption; video and audio processing, compression and transmission; and scaling sessions using media servers.
Signaling is the process of exchanging enough information to begin setting up a WebRTC session for a client. Perhaps surprisingly, signaling is not fully specified by the WebRTC standard. Some components of signaling are defined, but many parts of signaling are left open. This flexibility allows WebRTC to support a very wide range of use cases, but also puts a large initial burden on application developers.
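Because the transport for signaling is left open, every application ends up defining its own message format and relay. The sketch below shows one hypothetical shape for both, with an in-memory relay standing in for a real signaling server. The names `SignalMessage`, `SignalingRelay`, `join`, and `send` are assumptions made for illustration and are not part of any WebRTC API; only the payloads (session descriptions and ICE candidates) correspond to things the standard defines.

```typescript
// Hypothetical message shapes. The payloads mirror what the standard defines
// (session descriptions and ICE candidates); the envelope is our own invention.
type SignalMessage =
  | { kind: "offer" | "answer"; from: string; to: string; sdp: string }
  | { kind: "candidate"; from: string; to: string; candidate: string };

type Handler = (msg: SignalMessage) => void;

// In-memory stand-in for a signaling server: it only routes messages by
// recipient id and never inspects the payload.
class SignalingRelay {
  private handlers = new Map<string, Handler>();

  // Register a client and the callback that receives its messages.
  join(clientId: string, onMessage: Handler): void {
    this.handlers.set(clientId, onMessage);
  }

  // Deliver a message; returns false when the recipient is unknown.
  send(msg: SignalMessage): boolean {
    const handler = this.handlers.get(msg.to);
    if (!handler) return false;
    handler(msg);
    return true;
  }
}

// Usage: alice offers, bob answers. The SDP strings are placeholders.
const relay = new SignalingRelay();
const aliceInbox: SignalMessage[] = [];
const bobInbox: SignalMessage[] = [];
relay.join("alice", (m) => {
  aliceInbox.push(m);
});
relay.join("bob", (m) => {
  bobInbox.push(m);
  if (m.kind === "offer") {
    relay.send({ kind: "answer", from: "bob", to: m.from, sdp: "<answer sdp>" });
  }
});
relay.send({ kind: "offer", from: "alice", to: "bob", sdp: "<offer sdp>" });
```

In a browser, the offer and answer strings would come from `RTCPeerConnection.createOffer()` and `createAnswer()`, and the relay would typically sit behind a WebSocket or HTTP endpoint rather than in memory.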
Establishing a network connection between any two clients on any device anywhere in the world is a complex problem. Here the WebRTC specification does a great deal of heavy lifting, simplifying and standardizing how to negotiate a real-time connection over the open Internet.
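Much of that heavy lifting is done by ICE (Interactive Connectivity Establishment, RFC 8445), which gathers candidate addresses for each client and tries them in priority order. As a rough illustration, the sketch below computes candidate priorities with the formula and recommended type preferences from RFC 8445. The function name `candidatePriority` and the sample candidate list are illustrative; browsers do this internally, and applications normally never compute it themselves.

```typescript
// Candidate types from RFC 8445, with the type preferences the RFC recommends.
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 } as const;
type CandidateType = keyof typeof TYPE_PREFERENCE;

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component id)
function candidatePriority(
  type: CandidateType,
  localPreference = 65535, // 0..65535; 65535 when there is a single IP address
  componentId = 1,         // 1 = RTP
): number {
  return 2 ** 24 * TYPE_PREFERENCE[type] + 2 ** 8 * localPreference + (256 - componentId);
}

// Rank three candidate types from most to least preferred.
const ranked = (["relay", "srflx", "host"] as CandidateType[])
  .map((type) => ({ type, priority: candidatePriority(type) }))
  .sort((a, b) => b.priority - a.priority);
```

The ordering reflects the general strategy: try direct (host) addresses first, then addresses discovered through a STUN server (server-reflexive), and fall back to relaying through a TURN server only when nothing else works.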
Encryption is critical to security and privacy, and here again the WebRTC standard shines. All WebRTC connections are encrypted point-to-point and the specification makes use of tried-and-true best practices for encryption and key exchange. (Note, however, that most WebRTC sessions in the real world are not encrypted end-to-end, because it’s usually necessary to route audio and video through media servers in the cloud. See below for more about media servers.) Encryption is notoriously difficult to do well, so having very good encryption built into the lowest levels of the WebRTC stack is a big benefit for developers.
Video and audio processing, compression and transmission are complicated, CPU-intensive activities that are very sensitive to variations in network and device performance. The WebRTC standard specifies baseline support for core codecs (VP8 and H.264 for video, Opus for audio) along with a framework for adding future codecs such as VP9 and H.265. The major Open Source WebRTC implementation also includes good adaptive bandwidth management and real-time feedback implementations, along with excellent echo cancellation and noise reduction for audio. (And these features continue to improve with each release.)
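Codecs are negotiated per session, and an application can express which one it would rather use. The sketch below is a small helper that moves a preferred codec to the front of a list. The `CodecInfo` interface is a minimal stand-in for the browser's `RTCRtpCodecCapability` type, declared locally so the example compiles outside a browser; `preferCodec` and the sample list are hypothetical.

```typescript
// Minimal stand-in for the browser's RTCRtpCodecCapability type, declared here
// so the sketch compiles outside a browser.
interface CodecInfo {
  mimeType: string; // e.g. "video/VP8"
  clockRate: number;
}

// Move every codec matching `preferredMimeType` to the front, keeping the
// relative order of everything else (a stable partition). The input array
// is not modified.
function preferCodec<T extends CodecInfo>(codecs: T[], preferredMimeType: string): T[] {
  const wanted = preferredMimeType.toLowerCase();
  const preferred = codecs.filter((c) => c.mimeType.toLowerCase() === wanted);
  const rest = codecs.filter((c) => c.mimeType.toLowerCase() !== wanted);
  return [...preferred, ...rest];
}

// Example list; a real one would come from the browser.
const available: CodecInfo[] = [
  { mimeType: "video/VP8", clockRate: 90000 },
  { mimeType: "video/VP9", clockRate: 90000 },
  { mimeType: "video/H264", clockRate: 90000 },
];
const ordered = preferCodec(available, "video/H264");
```

In a browser, the list would typically come from `RTCRtpReceiver.getCapabilities("video")`, and the reordered result would be passed to `RTCRtpTransceiver.setCodecPreferences()` before the offer is created.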
To support real-time sessions that include more than a few participants, it’s necessary to make use of media servers in the cloud for routing video and audio efficiently. As is the case with signaling, the WebRTC standard provides some building blocks but leaves unspecified most of the details of how media servers should work. This allows for design flexibility, experimentation, and technology evolution. For example, it is now commonly recognized that a multi-server, “mesh” approach to media routing is the best way to achieve both scalability and good call quality for geographically dispersed clients. This was not obvious in the early days of WebRTC development.
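A little arithmetic shows why direct client-to-client connections stop scaling. The sketch below assumes each participant sends one video stream to every other participant, and compares that with routing through a single media server that forwards copies on the sender's behalf. The function names are illustrative only.

```typescript
// Streams each client must upload when everyone connects directly to everyone.
function uploadsPerClientDirect(participants: number): number {
  return Math.max(participants - 1, 0);
}

// Streams each client uploads when a media server forwards copies for it.
function uploadsPerClientViaServer(participants: number): number {
  return participants > 1 ? 1 : 0;
}

// Total direct connections in the call: n * (n - 1) / 2.
function directConnections(participants: number): number {
  return (participants * (participants - 1)) / 2;
}

const callSizes = [2, 4, 10];
const table = callSizes.map((n) => ({
  participants: n,
  directUploads: uploadsPerClientDirect(n),
  serverUploads: uploadsPerClientViaServer(n),
  connections: directConnections(n),
}));
```

With 10 participants, each client would have to encode and upload 9 copies of its video over 45 total connections, versus a single upload when a media server does the fan-out. Upload bandwidth and CPU on the client are usually the first things to run out.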
Developing a WebRTC application
Four parts of WebRTC are complicated enough that almost every product team implementing real-time video and audio features will rely on a WebRTC platform in production. A WebRTC platform provides developer tooling, APIs, and infrastructure on top of the core WebRTC standard.
- The WebRTC standard leaves the specification of call setup and management — signaling — open. Implementing robust, hardened signaling is a fairly large amount of work. And because signaling is not part of the standard, there is little interoperability between different platforms and even between Open Source helper libraries.
- Similarly, the need for media servers to scale to calls with more than three or four participants means that a major part of developing and deploying a WebRTC application can’t be handled just by writing client-side code. Expertise in configuring and tuning a media server is a skill set that is outside the experience of most application developers.
- For WebRTC calls to work well, media servers need to be as close to end users as possible. Real-time video and audio sessions are very sensitive to network latency. Just as it is beneficial to distribute traditional application assets using a globally distributed Content Delivery Network (CDN), it’s extremely helpful to route video and audio through a globally distributed network of media servers. And just as setting up and maintaining a CDN is outside the scope of work that most product teams would want to take on, setting up and maintaining a global mesh network of media servers is non-trivial.
- Finally, many important application features have complicated underpinnings and rely on tight integration at multiple layers of the technology stack. For example, recording and telephone dial-in are two common features of video calling applications that require large amounts of supporting code and infrastructure.
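The third point above, keeping media servers close to end users, usually comes down to measuring latency from the client and choosing the best region. Here is a minimal sketch, assuming round-trip time samples have already been collected for each region. The region names and numbers are made up, `pickRegion` is a hypothetical helper, and using the median is one reasonable choice among several.

```typescript
// Median of a non-empty list of round-trip times, in milliseconds.
function median(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Return the region with the lowest median round-trip time, or undefined
// when no region has any samples.
function pickRegion(rttByRegion: Record<string, number[]>): string | undefined {
  let best: string | undefined;
  let bestRtt = Infinity;
  for (const region of Object.keys(rttByRegion)) {
    const samples = rttByRegion[region];
    if (samples.length === 0) continue; // no measurements, cannot judge
    const rtt = median(samples);
    if (rtt < bestRtt) {
      best = region;
      bestRtt = rtt;
    }
  }
  return best;
}

const chosen = pickRegion({
  "us-west": [48, 52, 250], // one outlier; the median is not skewed by it
  "eu-central": [140, 150, 145],
  "ap-south": [], // skipped: no samples
});
```

A production platform would also account for server load, packet loss, and where the other participants in the call are located, which is part of why this is hard to build in-house.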
For all of these reasons, implementing real-time video tends to benefit from platform-level support in the same way that, say, payments or SMS features do. Several good WebRTC infrastructure platforms exist to meet this need.