This post is part six of a series on how to build an app with spatialization features using Daily's real-time video and audio APIs.

Introduction

In the previous part of our spatialization series, we talked about creating interactive focus zones in our 2D world. We thought that would wrap up our app's feature set, but we decided to add one additional (yet key) feature: screen sharing!

Let's look at how we can extend our spatial audio/video app with a screen share feature, which requires a bit of refactoring of our current app.

You can check out the full diff of our new screen sharing feature in this pull request, to see how exactly we modified our original demo to add this feature.

As a recap, our spatialization demo is written in TypeScript and uses the PixiJS rendering framework.

Getting started with our new feature

Since this is an additional feature to our existing demo, we've published it in its own release (meaning it gets its own git tag). We recommend checking out the rest of the series to get familiarized with the demo, since we’ll be building on code we’ve covered before.

You can set up the demo locally as follows:

First, you will need a Daily account and a Daily room.

To clone and run Daily’s spatialization demo, run the following commands in your terminal:

git clone git@github.com:daily-demos/spatialization.git
cd spatialization
git checkout tags/v2.0.0
npm i
npm run build
npm run start

Designing the screen sharing feature

As we saw in the last post of our series, our demo contains focus zones where participants can gather amongst themselves or broadcast to all other users in the world. These zones provide a good opportunity for screen sharing, so our new feature will work as follows:

  • A user in a broadcast zone can share their screen to all other users in the world
  • One user per desk zone can share their screen to their zonemates
Gif of 2D world with a user entering a desk spot and screen sharing a terminal window
Screen sharing example

This also works nicely with our recommended screen track subscription constraints. We suggest that users subscribe to a maximum of two screen share tracks at a time. Having too many screen share track subscriptions coupled with camera and microphone tracks can have a significant performance impact.

But wait! We also need to make sure the user's browser supports screen sharing, and that the Daily room configuration we're using permits it. We will use Daily's API to confirm both of these things and conditionally enable screen share controls based on that.

In addition to the above, we'll utilize a slightly different way of obtaining the screen video tracks than what we've done so far for audio and camera. For this feature, we will obtain screen video by calling MediaDevices.getDisplayMedia() manually and then passing the screen MediaStream to daily-js.

This is not strictly required with Daily, but has been a common use case for some of our customers, so we decided to showcase it in the demo implementation. The benefit of this approach is that by getting the display media themselves, the caller has direct control over the track constraints. If Daily is tasked with obtaining the track, the caller can only specify the maximum width and height of the track and pass an optional audio stream if they choose. For many use cases this works just fine, but some of our customers prefer more direct control.
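To make "direct control over the track constraints" a bit more concrete, here is a minimal sketch of building a constraints object for getDisplayMedia(). The buildScreenCaptureOptions() helper and its local types are hypothetical and not part of the demo; they mirror a small subset of the standard media track constraint shape. Note that getDisplayMedia() rejects min and exact constraints, so the sketch only uses max:

```typescript
// Minimal local types so the sketch compiles without the DOM lib.
// They mirror a small subset of the standard MediaTrackConstraints shape.
type MaxConstraint = { max: number };
type ScreenVideoConstraints = {
  width: MaxConstraint;
  height: MaxConstraint;
  frameRate: MaxConstraint;
};
type ScreenCaptureOptions = { video: ScreenVideoConstraints; audio: boolean };

// Hypothetical helper (not in the demo): builds the options object
// that would be handed to navigator.mediaDevices.getDisplayMedia().
function buildScreenCaptureOptions(
  maxWidth: number,
  maxHeight: number,
  maxFrameRate: number
): ScreenCaptureOptions {
  return {
    video: {
      width: { max: maxWidth },
      height: { max: maxHeight },
      frameRate: { max: maxFrameRate },
    },
    audio: false,
  };
}

const captureOptions = buildScreenCaptureOptions(1920, 1080, 15);
// In a browser:
// const stream = await navigator.mediaDevices.getDisplayMedia(captureOptions);
```

The demo itself keeps things simpler and only asks for video, as we'll see below.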

Now that we know what we're building, let's get started!

Screen share controls

Let's start with adding another button to our control panel at the bottom of the world view. Right now, our controls contain three options:

  • a toggle for the camera
  • a toggle for the microphone
  • a button to leave the call
A call control panel with camera toggle, mic toggle, and Leave Call button
Call controls without screen sharing support

If screen sharing is supported by the user’s browser and enabled in the Daily room they join, we're going to add a fourth button to these controls to toggle screen sharing:

Call control panel with camera toggle, mic toggle, screen share button, and Leave Call button
Call controls with screen sharing support

We'll do this by adding another button element to our "controls" div in index.html:

 <button
   id="toggleScreenShare"
   class="screen-disabled hidden"
   aria-label="Toggle screen share"
   disabled
 ></button>

We start off with this button disabled and hidden by default, because we don't know if the room we're joining allows screen sharing yet.

Next, in nav.ts, we add some exported methods to register listeners for button click events and to update the button. This largely follows the same pattern as the logic we already have for the camera and mic toggle controls.

const toggleScreenBtn = <HTMLButtonElement>(
  document.getElementById("toggleScreenShare")
);

export function registerScreenShareBtnListener(f: () => void) {
  if (toggleScreenBtn.classList.contains("hidden")) {
    toggleScreenBtn.classList.remove("hidden");
  }
  toggleScreenBtn.onclick = f;
}

export function updateScreenBtn(screenOn: boolean) {
  if (screenOn && !toggleScreenBtn.classList.contains("screen-on")) {
    toggleScreenBtn.classList.remove("screen-off");
    toggleScreenBtn.classList.add("screen-on");
    return;
  }
  if (!screenOn && !toggleScreenBtn.classList.contains("screen-off")) {
    toggleScreenBtn.classList.remove("screen-on");
    toggleScreenBtn.classList.add("screen-off");
  }
}

Above, we first get our "toggleScreenShare" button. We then define the registerScreenShareBtnListener() function. When registering a listener, we also make sure the button isn't hidden to the user. Until we call this function to register at least one listener, the button will remain hidden from view and completely disabled.

However, screen sharing requires one additional navigation function: temporarily disabling and re-enabling the button. Unlike the camera and mic controls, the screen share controls should only be enabled when:

  • Screen sharing is enabled in the user's Daily room config and at least one of the following is true:
    • A user is in a broadcast zone
    • A user is in a desk zone and nobody else is currently sharing their screen
    • A user is already sharing their screen, because if screen sharing is on the user should always be able to disable it

To help us accomplish this, we have created an enableScreenBtn() function:

export function enableScreenBtn(doEnable: boolean) {
  if (toggleScreenBtn.classList.contains("hidden")) return;
  if (doEnable) {
    toggleScreenBtn.classList.remove("screen-disabled");
    toggleScreenBtn.disabled = false;
    return;
  }

  toggleScreenBtn.classList.add("screen-disabled");
  toggleScreenBtn.disabled = true;
}

Above, we first check if the button is permanently hidden and early out if so.

Otherwise, if the caller is enabling screen sharing, we update the button accordingly.
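The conditions listed earlier for when the button should be enabled can be condensed into a single predicate. The sketch below is only an illustration: the ZoneKind type, the ScreenControlInput shape, and shouldEnableScreenBtn() are hypothetical names, and the demo spreads this logic across several call sites instead.

```typescript
// Hypothetical zone kinds for illustration; the demo models zones differently.
type ZoneKind = "global" | "broadcast" | "desk";

// Hypothetical snapshot of everything the decision depends on.
interface ScreenControlInput {
  roomAllowsScreenShare: boolean;
  zone: ZoneKind;
  localIsSharing: boolean;
  zonemateIsSharing: boolean;
}

function shouldEnableScreenBtn(input: ScreenControlInput): boolean {
  // The room config is a hard requirement for every other case.
  if (!input.roomAllowsScreenShare) return false;
  // A user who is already sharing must always be able to stop.
  if (input.localIsSharing) return true;
  // Broadcast zones always allow sharing.
  if (input.zone === "broadcast") return true;
  // Desk zones allow one screen share at a time.
  return input.zone === "desk" && !input.zonemateIsSharing;
}
```

The result of such a predicate could then be passed straight to enableScreenBtn().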

We also need to disable the button once more when the user leaves the call — in case the next room they join does not have screen sharing enabled. We'll do this by adding a couple of additional calls to our existing showJoinForm() function, which is called when the user leaves a call:

export function showJoinForm() {
  removeAllZonemates();
  stopBroadcast();

  const entryDiv = document.getElementById("entry");
  const callDiv = document.getElementById("call");
  callDiv.style.display = "none";
  entryDiv.style.display = "block";
  joinForm.style.display = "block";
  toggleScreenBtn.onclick = null;
  enableScreenBtn(false);
  toggleScreenBtn.classList.add("hidden");
}

In the three new lines above, we reset the onclick handler for the screen share button, disable our button once more, and re-add the "hidden" class to hide the button completely (this will be a no-op if the element already has this class assigned).

Now that control manipulation is all set up in nav.ts, let's move on to our Room class — where the screen sharing magic happens.

Gob Bluth doing magic in a boardroom

Enabling screen sharing

In our Room class, the first thing we do is add a new boolean to our State type:

type State = {
  audio?: boolean;
  video?: boolean;
  // New screen state bool:
  screen?: boolean;
};

Next, in our existing handler for the "joined-meeting" event, we check if screen sharing is enabled in the Daily room the user just joined by adding an extra call to a new function called maybeEnableScreenSharing():

  private maybeEnableScreenSharing() {
    // Retrieve our browser properties and check if screen sharing is possible
    const browserInfo = DailyIframe.supportedBrowser();
    if (!browserInfo.supportsScreenShare) return;

    // Retrieve our room properties and check if screen sharing is enabled
    this.callObject
      .room({ includeRoomConfigDefaults: true })
      .then((roomInfo) => {
        const info = roomInfo as DailyRoomInfo;
        // If screen sharing is disabled, early out
        if (!info || !info.config?.enable_screenshare) return;

        // If screen sharing is enabled, enable our screen share controls
        registerScreenShareBtnListener(() => {
          const isSharing = this.callObject.participants().local.screen;
          if (!isSharing) {
            this.startScreenShare();
            return;
          }
          this.stopScreenShare();
        });
      });
  }

Let's go through what's happening here:

  • First, we make a call to Daily's static supportedBrowser() method and check if the user's browser supports screen sharing. If not, we early out.
  • Next, we make a call to the room() call object instance method. This returns a Promise with our room configuration.
  • Once the Promise resolves, we cast the returned data to the relevant Daily type (DailyRoomInfo in this case).
  • If the info object is not formed as expected or the configuration enable_screenshare property is falsy, we go no further: screen sharing is not supported in this room.
  • Otherwise, we go on to call the registerScreenShareBtnListener() function we went over earlier in this post to enable screen sharing. When the user clicks the screen share button, we'll start the screen share if it isn't already on. If the user is already screen sharing, we'll stop the screen share.
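The two checks in the steps above boil down to a small decision that can be expressed (and unit tested) without a live call. In this sketch, BrowserInfoLike, RoomInfoLike, and canOfferScreenShare() are hypothetical names that cover only the two properties used in this post; they are not Daily's own types.

```typescript
// Hypothetical minimal shapes covering only the fields used in this post.
interface BrowserInfoLike {
  supportsScreenShare?: boolean;
}
interface RoomInfoLike {
  config?: { enable_screenshare?: boolean };
}

// Returns true only if both the browser and the room allow screen sharing.
function canOfferScreenShare(
  browser: BrowserInfoLike,
  room: RoomInfoLike | null | undefined
): boolean {
  if (!browser.supportsScreenShare) return false;
  // A missing room object, missing config, or falsy flag all mean "no".
  return Boolean(room?.config?.enable_screenshare);
}
```

Keeping the decision separate from the Promise handling makes it easy to cover edge cases such as a missing config object.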

So what actually happens when we call startScreenShare()?

[Screen] sharing is caring

As we mentioned above, we'll be grabbing our screen MediaStream ourselves for the purpose of showcasing how one can implement this use case with Daily. We do this in the startScreenShare() Room method:

  private async startScreenShare() {
    let captureStream = null;
    const options = {
      video: true,
    };

    try {
      captureStream = await navigator.mediaDevices.getDisplayMedia(options);
    } catch (err) {
      console.error("Failed to get display media: " + err);
    }
    if (captureStream) {
      this.callObject.startScreenShare({ mediaStream: captureStream });
    }
  }

Above, we configure our options to retrieve video only. The constraints we can specify here are quite extensive, but for our purposes we just specify the most basic "Just give us video, please" option. You can read more about all the potential parameters here.

Next, we await a call to getDisplayMedia(), which gives us a MediaStream with the user's screen video track.

If we obtained this stream successfully, we call the Daily call object startScreenShare() instance method and pass our media stream to Daily.
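One thing the demo doesn't distinguish is why getDisplayMedia() failed. Browsers reject the Promise with an error named "NotAllowedError" when the user denies permission or dismisses the screen picker, which usually isn't worth logging as an error. Here is a small sketch of how that case could be told apart; classifyCaptureError() and the CaptureFailure type are hypothetical and not part of the demo.

```typescript
type CaptureFailure = "cancelled-or-denied" | "failed";

// Hypothetical helper: inspects whatever was thrown by getDisplayMedia().
function classifyCaptureError(err: unknown): CaptureFailure {
  // Thrown values are not guaranteed to be Error objects, so read
  // the name defensively.
  const errName =
    typeof err === "object" && err !== null && "name" in err
      ? String((err as { name: unknown }).name)
      : "";
  return errName === "NotAllowedError" ? "cancelled-or-denied" : "failed";
}
```

In the catch block of startScreenShare(), a "cancelled-or-denied" result could be ignored quietly while anything else is still reported.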

Voilà! We're officially screen sharing.

Julianna Margulies saying 'And Voila'

Handling screen share tracks

The startScreenShare() call we went over above, if successful, will result in a Daily "participant-updated" event which contains the local user's screen track. So we modify our updateLocal() method to handle screen state changes:

  private updateLocal(p: DailyParticipant) {
    // Existing logic:
    if (this.localState.audio != p.audio) {
      this.localState.audio = p.audio;
      updateMicBtn(this.localState.audio);
    }
    if (this.localState.video != p.video) {
      this.localState.video = p.video;
      updateCamBtn(this.localState.video);
    }
    // New screen share logic:
    if (this.localState.screen != p.screen) {
      this.localState.screen = p.screen;
      updateScreenBtn(this.localState.screen);
    }
  }

Above, just as with the microphone and camera, we toggle the local state depending on whether the local participant's screen is on or not and call the updateScreenBtn() function we went through above accordingly.

Remote users also get a "participant-updated" event with the screen track. To utilize it, we update our getParticipantTracks() method to return the screen share track if one exists. The code is very much the same as what we already do for the camera and microphone and you can check it out here. This track will then be passed to our World instance to actually display the track as needed. We'll go through what's happening in the World instance shortly.

Finally, we make a small adjustment in our existing subToUserTracks() method, to ensure that when we're subscribing to another participant's tracks, the screen track is included. We do this by setting screenVideo to true:

 setSubscribedTracks: { audio: true, video: true, screenVideo: true },
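If more people could share at once than in our demo, we might want to enforce the recommended cap of two screen track subscriptions when building these settings. The following is a rough sketch under that assumption: buildSubscriptionUpdates() and its types are hypothetical, and only the setSubscribedTracks shape comes from the line above.

```typescript
type SubscribedTracks = { audio: boolean; video: boolean; screenVideo: boolean };
type ParticipantUpdates = Record<string, { setSubscribedTracks: SubscribedTracks }>;

const MAX_SCREEN_SUBSCRIPTIONS = 2;

// Hypothetical helper: subscribes to everyone's camera and mic, but to
// at most `maxScreens` screen tracks, in the order IDs are given.
function buildSubscriptionUpdates(
  sessionIDs: string[],
  sharingIDs: Set<string>,
  maxScreens: number = MAX_SCREEN_SUBSCRIPTIONS
): ParticipantUpdates {
  const updates: ParticipantUpdates = {};
  let screensGranted = 0;
  for (const id of sessionIDs) {
    const wantsScreen = sharingIDs.has(id) && screensGranted < maxScreens;
    if (wantsScreen) screensGranted += 1;
    updates[id] = {
      setSubscribedTracks: { audio: true, video: true, screenVideo: wantsScreen },
    };
  }
  return updates;
}
```

Because our demo only allows one screen share per focus zone, it doesn't need this extra bookkeeping.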

We'll go through how these tracks are handled in the world shortly, but first we need to check out how we handle the user stopping the screen share.

Stopping the screen share

Most browsers allow you to stop screen sharing outside of the application itself. This method doesn't go through any of our code at all — once the user stops screen sharing, Daily sends a new "participant-updated" event with the appropriate participant properties updated to indicate that the user is not screen sharing and that there is no valid track.

Firefox browser "Stop Sharing" UI
Firefox browser-specific "Stop Sharing" UI

However, we also want to allow the user to stop screen sharing in our application itself. As we saw above, when they click on the screen share button while screen sharing, our stopScreenShare() method will be called:

  private stopScreenShare() {
    this.callObject.stopScreenShare();
    // The above call performs relevant cleanup and ensures 
    // associated events get fired, BUT daily-js only calls 
    // stop() on Daily-managed tracks. Since we got our screen 
    // track ourselves, we must call stop on it manually.
    this.callObject.participants().local.screenVideoTrack?.stop();
  }

Above, we first call stopScreenShare() on the call object, which performs some relevant cleanup and would normally handle stopping the screen tracks for us. However, as noted in the comment, we then also stop our screen track manually. This is because we retrieved our display media by hand via getDisplayMedia() and then passed the track to Daily. The call object's stopScreenShare() instance method only stops Daily-managed tracks, which this one is not.
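An alternative to reading the track back from the participant object is to hold on to the stream we captured ourselves and stop every track on it. The sketch below assumes we stored that stream somewhere; StoppableTrack, StreamLike, and stopAllTracks() are hypothetical names that mirror only the getTracks() and stop() methods of the standard MediaStream and MediaStreamTrack interfaces.

```typescript
// Minimal stand-ins for MediaStreamTrack and MediaStream, so the sketch
// compiles and can be tested outside of a browser.
interface StoppableTrack {
  stop(): void;
}
interface StreamLike {
  getTracks(): StoppableTrack[];
}

// Hypothetical helper: stops every track on a manually captured stream
// and returns how many tracks were stopped.
function stopAllTracks(stream: StreamLike | null | undefined): number {
  if (!stream) return 0;
  const tracks = stream.getTracks();
  tracks.forEach((track) => track.stop());
  return tracks.length;
}
```

This variant would also cover a screen audio track if we ever asked getDisplayMedia() for one.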

Gif of a user clicking the screen share button to stop sharing
Stopping screen sharing

Disabling screen sharing

As we mentioned, we don't want the screen share button to be clickable when the user is in a global traversal zone. So, in the onJoinZone() callback defined inside of our Room class, we add a bit of logic to handle screen sharing:

    // The function World will call when the local user changes zone.
    // This will update their bandwidth and broadcast their new zone
    // to other participants.
    const onJoinZone = (zoneData: ZoneData, recipient: string = "*") => {
      if (zoneData.zoneID === globalZoneID) {
        this.setBandwidth(BandwidthLevel.Tile);
        if (this.localState.screen) {
          this.stopScreenShare();
        }
        enableScreenBtn(false);
      } else {
        this.setBandwidth(BandwidthLevel.Focus);
        enableScreenBtn(true);
      }
      const data = {
        action: "zoneChange",
        zoneData: zoneData,
      };
      this.broadcast(data, recipient);
    };

Above, if the local user is joining the global traversal zone, we stop any screen share they may currently have active and call enableScreenBtn() with a false parameter to disable the button.

If the user is joining a focus zone, we call enableScreenBtn(true) to enable the button.
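The screen-share-related branching in onJoinZone() can also be viewed as a pure function from the new zone to a set of actions. This is just a sketch for reasoning about the logic: planZoneChange(), ZoneChangePlan, and the numeric zone IDs are assumptions made for illustration, not code from the demo.

```typescript
type Bandwidth = "tile" | "focus";

// Hypothetical description of what should happen on a zone change.
interface ZoneChangePlan {
  bandwidth: Bandwidth;
  stopScreenShare: boolean;
  enableScreenBtn: boolean;
}

// Assumes zone IDs are numbers; adjust to whatever type the app uses.
function planZoneChange(
  zoneID: number,
  globalZoneID: number,
  isSharing: boolean
): ZoneChangePlan {
  const enteringGlobal = zoneID === globalZoneID;
  return {
    bandwidth: enteringGlobal ? "tile" : "focus",
    // Only stop a share that is actually running.
    stopScreenShare: enteringGlobal && isSharing,
    // The button is usable in focus zones only.
    enableScreenBtn: !enteringGlobal,
  };
}
```

Separating the decision from its side effects makes each zone transition straightforward to test.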

Now, let's see how our World class handles our new track.

Handling screen share tracks in the World

As a quick recap, we have a single method called from our Room instance to our World instance when a user's tracks are updated: world.updateUser(). Before, this method took the participant's session ID, name, video, and audio tracks as parameters. Now that we're adding screen sharing support, we'll add one more parameter for the screen track:

  private handleParticipantUpdated(event: DailyEventObjectParticipant) {
    const p = event.participant;
    const tracks = this.getParticipantTracks(p);
    world.updateUser(
      p.session_id,
      p.user_name,
      tracks.video,
      tracks.audio,
      tracks.screen
    );
    if (p.session_id === this.callObject.participants()?.local?.session_id) {
      this.updateLocal(p);
    }
  }

The world's updateUser() method then passes the screen track to the user's updateTracks() method:

  updateUser(
    id: string,
    name: string,
    video: MediaStreamTrack = null,
    audio: MediaStreamTrack = null,
    screen: MediaStreamTrack = null
  ) {
    const user = this.getUser(id);
    if (user) {
      user.updateTracks(video, audio, screen);
      if (!user.isLocal) {
        user.setUserName(name);
      }
    }
  }

Finally, the user's updateTracks() method calls updateScreenSource() on the user's UserMedia instance:

    this.media.updateScreenSource(screenTrack);

At this point, UserMedia decides what to do with the track. We've left comments inline to walk you through the function:

  updateScreenSource(newTrack: MediaStreamTrack) {
    const hadTrack = Boolean(this.screenTrack);
    this.screenTrack = newTrack;

    // If this user is traversing, don't show their screen track
    if (this.currentAction === Action.Traversing) return;

    // Otherwise, if the user has a new track and is not traversing,
    // try to show the screen track
    if (this.screenTrack) {
      this.tryShowScreenShare();
      return;
    }
    // If we had a track previously and no longer do,
    // try to remove any existing screen share DOM elements
    if (hadTrack) {
      this.tryRemoveScreenShare();
    }
  }
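The branching above is small, but it has a few distinct outcomes. As an illustration only, here is the same decision written as a pure function; decideScreenAction() and ScreenAction are hypothetical names and do not appear in the demo.

```typescript
type ScreenAction = "show" | "remove" | "none";

// Mirrors the order of checks in updateScreenSource():
// traversing wins, then a present track, then a removed track.
function decideScreenAction(
  hadTrack: boolean,
  hasTrack: boolean,
  isTraversing: boolean
): ScreenAction {
  if (isTraversing) return "none";
  if (hasTrack) return "show";
  return hadTrack ? "remove" : "none";
}
```

Note that a traversing user's track is still stored, so it can be shown later once they settle into a zone.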

Let's see what tryShowScreenShare() and tryRemoveScreenShare() do:

tryShowScreenShare() and tryRemoveScreenShare()

The tryShowScreenShare() method calls out to our tile.ts utility file, which handles creating the relevant DOM elements to display the screen video in the application:

  tryShowScreenShare() {
    if (!this.screenTrack) return;

    showScreenShare(this.id, this.userName, this.screenTrack);
    // Someone other than the local user started screen sharing in a
    // relevant zone, so disable the screen share button for the local user.
    if (this.toggleScreenControls && this.currentAction === Action.InZone) {
      enableScreenBtn(false);
    }
  }

The tryRemoveScreenShare() method calls out to the same utility file to remove the relevant screen video DOM elements for a user, if they exist:

  tryRemoveScreenShare() {
    removeScreenShare(this.id);
    // Someone other than the local user stopped screen sharing in a
    // relevant zone, so enable the screen share button for the local user.
    if (this.toggleScreenControls && this.currentAction === Action.InZone) {
      enableScreenBtn(true);
    }
  }

Both of the above methods also enable or disable the local user's screen share button, depending on whether the sharing user is remote and currently in the local user's zone. The toggleScreenControls boolean is true for remote users. We set this when constructing the UserMedia instance.

This toggle ensures that when a remote user in the local user's focus zone turns their screen share on, the local user won't be able to start another screen share (since we allow one per focus zone). Likewise, when a remote user in the local user's focus zone turns their screen sharing off, the local user's screen share controls will be re-enabled.

💡 Can you think of some flaws with this approach and how you might make this more robust? Hint: When a user joins a meeting, it can take a couple of seconds to get the data of other participants and reflect their status in the world. How could this affect our screen share logic?
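One possible direction, sketched here purely as food for thought, is to derive the button state from everything we currently know about the zone instead of flipping it on individual start and stop events. The ZoneParticipant shape and localCanShare() are hypothetical, numeric zone IDs are an assumption, and the check for the global traversal zone is left out for brevity.

```typescript
// Hypothetical snapshot of a participant as known to the local client.
interface ZoneParticipant {
  sessionID: string;
  isLocal: boolean;
  zoneID: number;
  isSharing: boolean;
}

// Recomputes from scratch whether the local user may start (or stop)
// a screen share, based on the latest known participant data.
function localCanShare(participants: ZoneParticipant[]): boolean {
  const local = participants.find((p) => p.isLocal);
  if (!local) return false;
  // A user who is already sharing must be able to stop.
  if (local.isSharing) return true;
  // Otherwise, block if any remote user in the same zone is sharing.
  return !participants.some(
    (p) => !p.isLocal && p.zoneID === local.zoneID && p.isSharing
  );
}
```

Re-running a function like this whenever participant data arrives or changes means late-arriving information about other users corrects the button state, rather than being missed.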

To recap updateScreenSource(): if newTrack is truthy and the user isn't traversing, we call out to our tile.ts utility to show the screen share tile. Otherwise, we make a call to remove the screen share tile, if one exists.

showScreenShare() and removeScreenShare() reuse most of the logic from our previous functions to show a zonemate's camera track, the only difference being that screen share tiles are larger than camera tiles. We've refactored this code a bit to allow for reuse of the camera tile logic and minimize duplication as you can see in the diff.

To enable this kind of code reuse, we've introduced a new enum called zonemateTileKind:

enum zonemateTileKind {
  Screen = "screen",
  Camera = "camera",
}

The public showCamera() and showScreenShare() functions use this enum to call a common showZonemate() private function:

export function showCamera(
  sessionID: string,
  name: string,
  videoTrack?: MediaStreamTrack,
  audioTrack?: MediaStreamTrack
) {
  const tracks: Array<MediaStreamTrack> = [];
  if (videoTrack) tracks.push(videoTrack);
  if (audioTrack) tracks.push(audioTrack);
  showZonemate(zonemateTileKind.Camera, sessionID, name, tracks);
}

export function showScreenShare(
  sessionID: string,
  name: string,
  videoTrack?: MediaStreamTrack
) {
  const tracks: Array<MediaStreamTrack> = [];
  if (videoTrack) tracks.push(videoTrack);
  showZonemate(zonemateTileKind.Screen, sessionID, name, tracks);
}

showZonemate() then creates either kind of video tile for us in much the same way it did before, except using the zonemateTileKind enum for any screen or camera-specific logic:

function showZonemate(
  kind: zonemateTileKind,
  sessionID: string,
  name: string,
  tracks: MediaStreamTrack[]
) {
  let tileID: string;
  let videoTagID: string;
  if (kind === zonemateTileKind.Camera) {
    tileID = getCameraTileID(sessionID);
    videoTagID = getCameraVidID(sessionID);
  } else if (kind === zonemateTileKind.Screen) {
    tileID = getScreenShareTileID(sessionID);
    videoTagID = getScreenShareVidID(sessionID);
  } else {
    console.error(getErrUnrecognizedTileKind(kind));
    return;
  }
  let zonemate = <HTMLDivElement>document.getElementById(tileID);
  if (!zonemate) {
    zonemate = createZonemateTile(kind, sessionID, name);
  }

  if (tracks.length === 0) {
    return;
  }

  const vid = <HTMLVideoElement>document.getElementById(videoTagID);
  vid.srcObject = new MediaStream(tracks);
}

And that's it! Our screen share zonemate tile creation now follows almost the same code path as our camera zonemate tile creation.
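The ID helpers used in showZonemate() (getCameraTileID(), getScreenShareVidID(), and so on) aren't shown in this post. As a rough idea of how such helpers could share one implementation, here is a sketch keyed on the tile kind. The getTileIDs() function and the ID format are hypothetical and may not match the demo's actual helpers; a string union stands in for the enum to keep the snippet small.

```typescript
// Stand-in for the zonemateTileKind enum shown earlier.
type TileKind = "screen" | "camera";

interface TileIDs {
  tileID: string;
  videoTagID: string;
}

// Hypothetical ID scheme: the kind is baked into each ID so that camera
// and screen tiles for the same participant never collide.
function getTileIDs(kind: TileKind, sessionID: string): TileIDs {
  return {
    tileID: `${kind}-tile-${sessionID}`,
    videoTagID: `${kind}-vid-${sessionID}`,
  };
}
```

Whatever the format, the important property is that the two tile kinds produce distinct element IDs for the same session ID.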

With that, our new screen sharing feature is complete.

Screen sharing a Cat TV video of birds
Screen sharing example

Conclusion

In this post, we went over how to add a new screen sharing feature to our original spatialization demo. We looked at:

  • How to tell if a user's browser supports screen sharing with Daily's supportedBrowser() method.
  • How to tell if the Daily room the user has joined has screen sharing enabled with the call object room() method.
  • How to manually retrieve the user's screen video and pass it to Daily with startScreenShare().
  • How to stop Daily-managed screen share tracks with stopScreenShare() and manually managed tracks with MediaStreamTrack.stop().

Please don't hesitate to reach out if you have any questions about screen sharing with Daily.

In the next (and final) post of our spatialization series, we'll go through some approaches we took to testing our Daily demo app.

Jeff Goldblum saying "I'll see you there"