A technical guide to the Zoom Web SDK

Zoom has impressive market share in the video conferencing space. Zoom’s infrastructure and tech stack are very good. The Zoom desktop clients for macOS and Windows are easy to download and work well.

More recently, Zoom launched developer SDKs. These developer SDKs are less mature than the Zoom end-user products. In particular, the Zoom Web SDK has important feature gaps and major performance issues that developers should be aware of before attempting to port web applications to Zoom.

We've broken this post up into feature gaps relative to more established developer SDKs, performance issues relative to native WebRTC, and SDK maturity.

Feature gaps

Standard HTML <video> and <audio> elements aren’t supported
Styling and positioning video tiles requires writing a lot of complex code
Maximum video resolution is 720p
No high-fidelity audio support
Virtual backgrounds and background blur are not available in Safari
No custom video or audio tracks
No low-level video simulcast control
Limited debugging information
No end-to-end encryption
No HLS live streaming or recording
No access to raw media tracks
No React helper libraries

Performance issues

Substandard video quality
High CPU usage
Call size limit of 1,000 participants

How video works on the Web

Web developers today can build video, audio, and messaging applications that work on almost every computer and mobile device in the world, with no application downloads or installs required. This is made possible by an Internet standard called WebRTC, which all the major web browsers support.

Zoom’s Web SDK ports parts of Zoom’s proprietary video stack into JavaScript and WebAssembly code. Zoom does not use WebRTC’s standard media pipeline. This mismatch between Zoom’s technology and video on the web means that Zoom’s Web SDK will perform poorly in browsers compared to best-in-class WebRTC video implementations.

Here is a brief overview of WebRTC. And here is a technical deep dive into three important video standards: WebRTC, RTMP, and HLS.

Web applications like Google Meet and Microsoft Teams use WebRTC. WebRTC is also used outside the web browser by native mobile applications like WhatsApp and Snap.

In fact, most real-time video and audio calls today run on WebRTC. Zoom is one of the few exceptions.

Zoom’s proprietary video vs WebRTC

Zoom’s proprietary video stack uses Zoom’s own specific implementation of the H.264 video codec, designed to run efficiently in Zoom’s macOS and Windows applications. Expertise at the video codec level gave Zoom an important advantage in the early days of consumer video conferencing. Zoom developed a reputation for “it just works” when other tools struggled to deliver a reliable video experience.

Advantages of Zoom’s proprietary approach to using H.264 include:

The ability to make fine-grained decisions about trade-offs between video resolution, frame rate, network bandwidth, and CPU usage.
Flexibility to optimize infrastructure and client implementations together, which can lead to significant operational efficiencies at scale.

Disadvantages include:

Relying so heavily on a specific H.264 implementation limits options on some platforms, particularly the web browser. The WebRTC specification mandates the use of a different variant of H.264 than Zoom uses.
Zoom is locked out of the ecosystem benefits that come from using a standard like WebRTC. Open standards generally outpace proprietary stacks in performance, features, flexibility, and security over the long term.

Today, there is no longer a gap between Zoom’s proprietary H.264 implementation and WebRTC. In fact, WebRTC usage now far outpaces usage of proprietary stacks, including Zoom’s. WebRTC platforms accommodate a much wider variety of use cases than Zoom does. WebRTC is used for everything from 1:1 video sessions to 100,000-participant live streams. It can deliver good video quality on resource-constrained devices like entry-level Android phones, and 4K video at 120 frames per second on more powerful devices.

Zoom Web SDK feature gaps

Perhaps because Zoom has always prioritized development of its own Windows and macOS applications, the Zoom Web SDK is relatively feature-poor. In addition, the technology mismatch between Zoom’s video stack and how video is implemented in web browsers limits the Zoom Web SDK’s functionality.

Developers porting from WebRTC platforms will find that many things they consider “table stakes” are missing.

Standard HTML video and audio elements aren’t supported

The Zoom Web SDK only supports video rendering via drawing to a single canvas element. The SDK automatically plays all audio streams internally through a WebAudio pipeline.

This means that you can’t use <video> and <audio> elements to play video and audio, as you would in a normal web app. Video can’t be styled using CSS. Each video tile must be drawn as a 16:9 rectangle on a single, shared canvas. (The Zoom consumer web app uses multiple canvas elements, but the Web SDK only supports drawing to a single canvas.)

This use of a canvas for video rendering also creates performance and responsiveness issues. Here is a video showing how the official Zoom Web SDK demo application looks when its window is resized.

Zoom canvas resize issues

Styling and positioning video tiles requires writing a lot of complex code

Because all inbound video streams must be drawn on a single canvas and can only be drawn as 16:9 rectangles, creating anything other than a very simple UX requires a lot of code.

For example, implementing square or round video tiles — or even rounded corners — requires using techniques like drawing to an offscreen canvas.

Zoom does not provide any library support for implementing these kinds of custom, multi-pass canvas rendering operations. For example, you will need to write code by hand for double buffering, aligning pixels on the <canvas> element with other DOM elements, responding to resize events, and more.
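To give a sense of the hand-written layout code involved, here is a minimal sketch of the geometry for a grid of 16:9 tiles on one shared canvas. Nothing here is Zoom API: the `TileRect` type and the `backingStoreSize` and `layoutTiles` functions are hypothetical helpers, and the top-left coordinate origin is an assumption you would need to check against whatever render call you pass these rectangles to.

```typescript
// Rectangle in canvas backing-store pixels. Top-left origin is an assumption.
interface TileRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Size the canvas backing store so that one canvas pixel maps to one device
// pixel. Call this again whenever the element is resized.
function backingStoreSize(
  cssWidth: number,
  cssHeight: number,
  devicePixelRatio: number
): { width: number; height: number } {
  return {
    width: Math.round(cssWidth * devicePixelRatio),
    height: Math.round(cssHeight * devicePixelRatio),
  };
}

// Lay out `count` 16:9 tiles in a centered grid, trying every column count
// and keeping the one that yields the widest tiles.
function layoutTiles(
  canvasWidth: number,
  canvasHeight: number,
  count: number,
  gap: number = 8
): TileRect[] {
  if (count <= 0) return [];

  let best = { cols: 1, tileWidth: 0 };
  for (let cols = 1; cols <= count; cols++) {
    const rows = Math.ceil(count / cols);
    const maxWidth = (canvasWidth - gap * (cols - 1)) / cols;
    const maxHeight = (canvasHeight - gap * (rows - 1)) / rows;
    // A tile is limited by either the available width or the available height.
    const tileWidth = Math.max(
      0,
      Math.floor(Math.min(maxWidth, (maxHeight * 16) / 9))
    );
    if (tileWidth > best.tileWidth) best = { cols, tileWidth };
  }

  const cols = best.cols;
  const rows = Math.ceil(count / cols);
  const tileWidth = best.tileWidth;
  const tileHeight = Math.floor((tileWidth * 9) / 16);

  // Center the whole grid inside the canvas.
  const gridWidth = cols * tileWidth + (cols - 1) * gap;
  const gridHeight = rows * tileHeight + (rows - 1) * gap;
  const offsetX = Math.floor((canvasWidth - gridWidth) / 2);
  const offsetY = Math.floor((canvasHeight - gridHeight) / 2);

  const rects: TileRect[] = [];
  for (let i = 0; i < count; i++) {
    const col = i % cols;
    const row = Math.floor(i / cols);
    rects.push({
      x: offsetX + col * (tileWidth + gap),
      y: offsetY + row * (tileHeight + gap),
      width: tileWidth,
      height: tileHeight,
    });
  }
  return rects;
}
```

This covers only the rectangle math. Masking tiles into circles or rounded corners, double buffering, and keeping the canvas aligned with surrounding DOM elements would all be additional code on top of it.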

Maximum video resolution is 720p

Zoom sets a hard limit of 720p on video resolution. This rules out use cases that require high-quality live streaming or cloud recording.

No high-fidelity audio support

Zoom's consumer applications have support for sending higher fidelity audio, intended for music use cases. This is called "music mode" or "original sound" in the UX.

The Zoom Web SDK does not allow the audio stream to be configured for higher fidelity. The audio stream is locked to a configuration appropriate for low-bandwidth speech streams. Here is a Zoom developer forum post raising this issue.

To support music use cases, WebRTC platforms generally implement audio presets, expose multiple low-level audio parameters, or both.
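As a rough illustration of what such a preset can look like in a standard WebRTC app, here is a minimal sketch that maps a preset name to microphone capture settings. The `AudioCaptureConstraints` type, the `AudioPreset` names, and the `audioConstraintsFor` function are hypothetical. The field names mirror a subset of the browser's standard `MediaTrackConstraints` dictionary, and the specific values are assumptions rather than any platform's actual defaults.

```typescript
// Mirrors a subset of the standard MediaTrackConstraints dictionary.
interface AudioCaptureConstraints {
  echoCancellation: boolean;
  noiseSuppression: boolean;
  autoGainControl: boolean;
  channelCount: number;
  sampleRate: number;
}

type AudioPreset = "speech" | "music";

// Illustrative values only. Music capture usually turns off the
// speech-oriented processing stages and asks for stereo input.
function audioConstraintsFor(preset: AudioPreset): AudioCaptureConstraints {
  if (preset === "music") {
    return {
      echoCancellation: false,
      noiseSuppression: false,
      autoGainControl: false,
      channelCount: 2,
      sampleRate: 48000,
    };
  }
  return {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
    channelCount: 1,
    sampleRate: 48000,
  };
}
```

In a browser, the returned object can be passed as the `audio` member of the constraints given to `navigator.mediaDevices.getUserMedia()`. A complete music preset would typically also raise the audio encoder bitrate, which this sketch leaves out.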

Virtual backgrounds and background blur are not available in Safari

This is presumably a limitation that Zoom will fix at some point. But as of December 17, 2023, virtual backgrounds and background blur are not supported in Safari.

No custom video or audio tracks

The Zoom Web SDK only allows video and audio input from system devices or a URL. Custom tracks are not supported. So it is impossible to do any local video or audio processing on a camera or mic stream before sending a track into a session.

You can’t bring your own or third-party background replacement or noise suppression solutions into your web app. You are limited to the Zoom Web SDK’s features, which are significantly less capable than offerings from, for example, Krisp.ai and Banuba.

See also No access to raw media tracks, below.

No low-level video simulcast control

Zoom provides default send- and receive-side bandwidth and video quality management. In the Zoom native clients, the algorithms for this work quite well. They do not work as well in the Zoom Web SDK and are not flexible enough to deliver the best possible user experience across the full range of real-world use cases.

For example, live streaming scenarios often require sending very high quality video layers that are appropriate for cloud recording and for users on fast network connections, alongside lower-quality fallback layers for users on slower connections. The Zoom Web SDK does not support this.
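For comparison, here is a minimal sketch of how a native WebRTC application might describe such a layer ladder. The `SimulcastLayer` type and `buildSimulcastLayers` function are hypothetical, and the bitrate ratios and frame rates are illustrative assumptions. The field names mirror the standard `RTCRtpEncodingParameters` dictionary.

```typescript
// Mirrors a subset of the standard RTCRtpEncodingParameters dictionary.
interface SimulcastLayer {
  rid: string;
  maxBitrate: number; // bits per second
  scaleResolutionDownBy: number; // 1 = full capture resolution
  maxFramerate: number;
}

// Build a three-layer ladder: a full-resolution top layer for recording and
// fast connections, plus two scaled-down fallback layers.
function buildSimulcastLayers(topBitrateBps: number): SimulcastLayer[] {
  return [
    {
      rid: "low",
      maxBitrate: Math.round(topBitrateBps / 16),
      scaleResolutionDownBy: 4,
      maxFramerate: 15,
    },
    {
      rid: "mid",
      maxBitrate: Math.round(topBitrateBps / 4),
      scaleResolutionDownBy: 2,
      maxFramerate: 30,
    },
    {
      rid: "high",
      maxBitrate: topBitrateBps,
      scaleResolutionDownBy: 1,
      maxFramerate: 30,
    },
  ];
}
```

In a browser, an array like this can be supplied as `sendEncodings` when calling `RTCPeerConnection.addTransceiver()`. The media server then chooses which layer to forward to each receiver.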

Limited debugging information

Experienced WebRTC developers rely heavily on a combination of:

standards-based debugging and testing tools like Chrome’s webrtc-internals interface and testRTC
platform-specific metrics and logs (see Daily’s metrics docs here, for example)

Zoom’s proprietary approach means that standard video debugging and performance optimization tools mostly aren’t useful. And the Zoom platform does not offer any detailed post-session logs or metrics data.
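As an example of the standards-based data WebRTC developers work with, the browser's `getStats()` API reports cumulative `packetsReceived` and `packetsLost` counters for each inbound stream. Here is a minimal sketch that turns two snapshots of those counters into a packet loss fraction. The `InboundRtpSnapshot` type and `intervalPacketLoss` function are hypothetical names used for illustration.

```typescript
// Two fields from the standard "inbound-rtp" stats entry. Both counters are
// cumulative over the life of the stream.
interface InboundRtpSnapshot {
  packetsReceived: number;
  packetsLost: number;
}

// Fraction of packets lost (0 to 1) between two snapshots of one stream.
function intervalPacketLoss(
  prev: InboundRtpSnapshot,
  curr: InboundRtpSnapshot
): number {
  const received = curr.packetsReceived - prev.packetsReceived;
  const lost = curr.packetsLost - prev.packetsLost;
  const expected = received + lost;
  if (expected <= 0) return 0;
  // The lost counter can decrease when late or duplicate packets arrive,
  // so clamp the result to a sensible range.
  return Math.min(1, Math.max(0, lost / expected));
}
```

Sampling this every second or two gives a loss curve that can be logged next to frame rate and bitrate for each participant.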

No end-to-end encryption

In 2020, the Federal Trade Commission accused Zoom of making substantive misrepresentations about security and encryption, including in HIPAA documentation. Zoom entered into an agreement with the FTC that mandated security improvements and required Zoom to stop falsely claiming to support end-to-end encryption. In 2021, Zoom settled a related class action lawsuit for $85 million.

Today, Zoom offers optional end-to-end encryption in its native macOS, Windows, and Zoom Room applications. This encryption is proprietary, and it’s not possible to verify independently that Zoom is encrypting all data end-to-end.

End-to-end encryption is not supported at all in the Zoom Video SDKs for developers.

WebRTC platforms can build on top of WebRTC’s excellent, standards-based support for auditable end-to-end encryption. When a WebRTC connection is configured so that data is routed peer-to-peer, it is possible for any third party (including tech-savvy end users) to independently verify that data is encrypted end-to-end.

No HLS live streaming or recording

Zoom offers RTMP live streaming and MP4 cloud recording. Zoom does not offer HLS live streaming or recording. HLS has a number of advantages over both RTMP and MP4 for many of today’s live streaming and recording use cases.

With HLS, you can live stream directly to an audience of any size (millions of viewers). No transcoding or rebroadcasting services are needed.

Using HLS also gives you multi-bitrate recordings that are immediately playable on any device and any network connection. Again, no transcoding is needed for production-ready, on-demand streaming. Just set up the CDN of your choice in front of your HLS recordings bucket to create a cost-effective video streaming solution that’s compatible with any hosting stack.
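To make "multi-bitrate" concrete, here is a minimal sketch of how an HLS master playlist lists several renditions of the same stream so that each player can pick the one that fits its connection. The `HlsVariant` type, the `buildMasterPlaylist` function, and any file names and bitrates you feed it are illustrative assumptions. The `#EXTM3U` and `#EXT-X-STREAM-INF` tags, with their `BANDWIDTH` and `RESOLUTION` attributes, come from the HLS specification.

```typescript
// One rendition of the stream. All values are supplied by the caller.
interface HlsVariant {
  uri: string; // path to this rendition's media playlist
  bandwidthBps: number; // peak bitrate in bits per second
  width: number;
  height: number;
}

// Build the text of a master playlist that points at each rendition.
function buildMasterPlaylist(variants: HlsVariant[]): string {
  const lines: string[] = ["#EXTM3U"];
  variants.forEach((v) => {
    lines.push(
      `#EXT-X-STREAM-INF:BANDWIDTH=${v.bandwidthBps},RESOLUTION=${v.width}x${v.height}`
    );
    lines.push(v.uri);
  });
  return lines.join("\n") + "\n";
}
```

Because the playlist and the media segments it references are plain files, they can be served from a storage bucket behind a CDN without a streaming server in the path.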

For more information on how WebRTC, RTMP, and HLS compare, see our technical deep dive into these three widely used video protocols.

No access to raw media tracks

The Zoom Web SDK does not give developers access to raw audio or video data. This makes it impossible to build applications that do any processing of inbound audio or video. For example, you can’t do any filtering or analysis of audio, can’t implement client-side transcription, and can’t build AI-powered video features like face filters.
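As a small example of the kind of processing that raw access makes possible, here is a minimal sketch of an audio level meter over a buffer of PCM samples. In a WebRTC app, a buffer like this could come from the Web Audio API, for example `AnalyserNode.getFloatTimeDomainData()`. The `rmsLevel` and `isSpeaking` functions and the default threshold are hypothetical, chosen for illustration.

```typescript
// Root-mean-square level of a buffer of PCM samples in the range -1 to 1.
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  samples.forEach((s) => {
    sumSquares += s * s;
  });
  return Math.sqrt(sumSquares / samples.length);
}

// Very rough voice activity check. The threshold is an assumed value that
// would need tuning against real microphones and rooms.
function isSpeaking(samples: Float32Array, threshold: number = 0.02): boolean {
  return rmsLevel(samples) > threshold;
}
```

Client-side transcription and audio filtering start from the same place: a stream of sample buffers that the application can read.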

No React helper libraries

The React front-end framework is widely used for dynamic, single-page web apps. React offers sophisticated state management features and a powerful virtual DOM abstraction.

Some of React’s abstractions are tricky to use efficiently and safely in combination with real-time video and audio elements. For this reason, many WebRTC platforms offer React-specific helper libraries. For example: Daily’s daily-react and Vonage’s opentok-react.

Zoom Web SDK performance issues

The Zoom Web SDK uses some components of Zoom’s proprietary video stack, combined with some parts of the web browser’s native WebRTC support. This is a creative approach. But it results in high CPU usage, video quality problems, and call scaling issues.

Video quality

Zoom encodes and decodes video and audio using custom WebAssembly modules rather than the web’s standard codecs. This means that the Zoom Web SDK uses more CPU than the native browser WebRTC stack does. Zoom’s web video resolution is limited to 720p and is often lower in real-world situations, especially on older devices and most mobile phones (even current-generation iPhones).

Here’s a video showing pixelated, low resolution video quality in the Zoom Web SDK sample app running in Safari on an iPhone 15. This test is easy to replicate. Simply run the sample app and join a call from both an iPhone and a laptop.

Zoom iPhone video quality

For video and audio transport, Zoom uses WebRTC data channels rather than WebRTC media tracks.

Zoom's combination of non-standard encoding and non-standard media transport makes it impossible to “shape” the bitrate used for video as effectively as a native WebRTC solution can.

These limitations show up as jerky video — freezes and inconsistent framerates — any time there are variable network conditions or local packet loss. For a simple, real-world test, start a video call and then walk away from your WiFi router until the signal starts to degrade. A good video calling implementation should handle moderate packet loss with very little visual impact. The Zoom Web SDK exhibits freezes and jerky video even for users on fairly good WiFi networks.

High CPU usage

Efficient CPU usage is critical for video applications. The Zoom Web SDK can’t make use of the highly optimized H.264 and VP8 codecs that are built into today’s web browsers.

As a result, on older computers and phones, the Zoom Web SDK has issues with high CPU usage and low video quality. Even on newer laptops and phones, Zoom in a web browser can’t display multiple videos in grid mode with acceptable visual performance.

Here are CPU usage tests on a fairly typical older laptop, a 2.6 GHz Dual-Core Intel Core i5 macOS machine manufactured in 2020.

In a 2-person test call, the Zoom Web SDK sample application delivers video at 360p resolution. The Safari process uses 90-100% CPU as measured by Activity Monitor. The video frame rate is inconsistent and the machine overall feels heavily loaded and laggy.

Here is the same 2-person test using Daily’s native WebRTC SDK. Configured to deliver the same video resolution as Zoom (360p), the Safari process uses 25-40% CPU. The video frame rate is consistent at 30fps. The machine feels responsive. Daily can also deliver 720p video on this machine, but CPU usage goes up to 80% and if other applications are running at the same time, the machine may start to lag. So we generally don’t recommend trying to send and receive 720p video on older devices.

Here is a four-person call on the same machine. With the Zoom Web SDK, Safari CPU usage is 120%. The machine is very laggy. Audio and video are out of sync by several seconds. The Zoom sample application has gotten confused about the pixel resolution of the local video stream.

Here is the same four-person call using Daily’s native WebRTC SDK. Configured to deliver the same resolution as the Zoom Web SDK (360p), CPU usage is 70%, the frame rate is steady, and the machine is responsive.

Scaling calls

The Zoom Web SDK is limited to a maximum call size of 1,000 participants. This puts interactive streaming use cases like live auctions, events with audience participation, and social games out of reach.

SDK maturity and developer tooling

Zoom has historically focused on consumer desktop applications. Compared to the company’s core products, the Zoom Web SDK is less mature, has fewer features, and performs poorly. It has not been widely used in browser-oriented embedded video applications.

As of the first week of December, 2023, the Zoom Web SDK shows fewer than 4,000 downloads per week on npmjs.com. Daily's npmjs download stats average about 10 times Zoom's downloads, week over week.

Daily's npmjs download counts average 10x greater than Zoom's, week over week

Zoom’s guides and official sample application for the Web SDK are incomplete and sometimes misleading. Code workarounds are required for the SDK to work properly on Safari. The Zoom guide for migrating from Twilio Video recommends implementing a precall test in a way that won’t be helpful for a real-time video application. Zoom’s code samples don’t cover basic topics like how to listen for important browser events.
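One example of such a browser event is `pagehide`, which fires when the user closes the tab or navigates away. Here is a minimal sketch of leaving a session when that happens. The `PageEventTarget` interface and the `leaveOnPageHide` function are hypothetical, and `leave` stands in for whatever call your video SDK uses to exit a session.

```typescript
// The small slice of the browser's event API that this sketch needs.
interface PageEventTarget {
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

// Call `leave` the first time the page is hidden for unload or navigation.
// Returns a function that removes the listener again.
function leaveOnPageHide(
  target: PageEventTarget,
  leave: () => void
): () => void {
  let left = false;
  const handler = (): void => {
    if (left) return; // make sure we only leave once
    left = true;
    leave();
  };
  target.addEventListener("pagehide", handler);
  return () => target.removeEventListener("pagehide", handler);
}
```

In a browser, `target` would be `window`. Taking the target as a parameter keeps the helper easy to exercise in tests with a stand-in object.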

Here’s a video showing the official Zoom web sample app leaving stale video participants in a session for more than 2 minutes.

Zoom demo app stale participants

Zoom provides little support for event logging, load testing, session analytics, integration with BI data systems, and many other things that are helpful for production coding.

Development teams building video applications that need to run in a web browser should carefully consider all of these issues before committing to building and maintaining applications using the Zoom Web SDK.
