This post is part two of a series on how to build an app with spatialization features using Daily's real-time video and audio APIs.

Introduction

In part one of our Daily spatialization tutorial series, we covered the main structure of our demo and went over the most relevant parts of the APIs we'll be using.

To recap, we are building an application in which users traverse a 2D world. We will manipulate users' audio and video tracks based on their proximity to each other, such as fading in audio and video as users approach our local participant. This series is a technical walkthrough of one way to implement such spatial audio and video features with Daily.

As we mentioned in part one, we’ll be using TypeScript for this demo. But the Daily API usage we’ll cover here is entirely applicable to vanilla JavaScript as well. We recommend checking out part one to see the full tech stack and instructions on running the demo.

Now, let's go through the most important part of the demo: the video call itself.

The application entry point

Our entry point to the application is index.ts. Its only job in relation to our Daily call is to register the entry form listener and join the room once the form is submitted:

registerJoinFormListener(initCall);

export function initCall(name: string, url: string) {
  const globalRoom = new Room(url, name, true);
  globalRoom.join();
}

registerJoinFormListener() is an exported function in our navigation utilities (util/nav.ts). It takes a function which it'll call with the name and URL the user submits as part of the entry form:

const joinForm = document.getElementById("enterCall");

export function registerJoinFormListener(f: (name: string, url: string) => void) {
  joinForm.addEventListener("submit", (event) => {
    event.preventDefault();
    joinForm.style.display = "none";
    const nameEle = <HTMLInputElement>document.getElementById("userName");
    const urlEle = <HTMLInputElement>document.getElementById("roomURL");
    f(nameEle.value, urlEle.value);
  });
}

Now, let's move on to the fun stuff: the room itself.

Constructing our Room

The room contains a Daily call object and is responsible for all communication with Daily.

The Room class contains a few member variables. We've left inline comments below explaining the purpose of each one:

export class Room {
  // The URL of the Daily room we'll be joining.
  url: string;
  // The username filled into the entry form.
  userName: string;
  // Whether this is a global room (i.e., our main room that controls the world)
  isGlobal: boolean;
  // The Daily call object.
  callObject: DailyCall;
  // Acknowledgements of data receipt which we're waiting for.
  pendingAcks: { [key: string]: ReturnType<typeof setInterval> } = {};
  // Bandwidth level of the local participant.
  localBandwidthLevel = BandwidthLevel.Unknown;
  // Local audio and video states.
  localState: State = { audio: null, video: null };
  // Current call topology: SFU or Peer to Peer.
  topology: Topology;

  // ...the rest of the class...
}

Now that we're familiar with the fields the room holds, let's see how it is constructed.

constructor(url: string, userName: string, isGlobal = false) {
  this.url = url;
  this.userName = userName;
  this.isGlobal = isGlobal;

  // The rest of the constructor
}

Above, we start by setting our URL, username, and whether the room is intended to be the single global room (the one to rule them all). Currently, the global room is the only room we instantiate in the demo. While not in scope of the current implementation of the demo, multiple Room instances could be used at a later point in a breakout-room approach. In this case, we'd still want just one global room to handle the state of our single World for us.

Creating the Daily call object

Next, we create our call object:

  this.callObject = DailyIframe.createCallObject({
    subscribeToTracksAutomatically: false,
    dailyConfig: {
      experimentalChromeVideoMuteLightOff: true,
      camSimulcastEncodings: [{ maxBitrate: 600000, maxFramerate: 30 }],
    },
  })
    .on("camera-error", (e) => this.handleCameraError(e))
    .on("joined-meeting", (e) => this.handleJoinedMeeting(e))
    .on("left-meeting", (e) => this.handleLeftMeeting(e))
    .on("error", (e) => this.handleError(e))
    .on("participant-updated", (e) => this.handleParticipantUpdated(e))
    .on("participant-joined", (e) => this.handleParticipantJoined(e))
    .on("participant-left", (e) => this.handleParticipantLeft(e))
    .on("app-message", (e) => this.handleAppMessage(e))
    .on("network-connection", (e) => this.handleNetworkConnectionChanged(e));

  this.setBandwidth(BandwidthLevel.Tile);

In the call object configuration, we set subscribeToTracksAutomatically to false. This turns off Daily’s default track management and turns on manual track subscriptions. This ensures that participants are not subscribed to all other participants' tracks as soon as they join the call. For performance reasons, we'll want to subscribe to other participants' tracks only when we get close enough to them, or join the same focus zone as them.
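
To make the idea concrete, here is a minimal sketch of a proximity check that could drive manual subscriptions. The Point type, the subscriptionRadius value, and the shouldSubscribe() helper are assumptions for illustration only; they are not part of the demo or of Daily's API, and the demo's own proximity logic lives in its world code.

```typescript
// Hypothetical 2D position; the demo's own position type may differ.
interface Point {
  x: number;
  y: number;
}

// Assumed distance (in world pixels) within which we want a remote
// participant's tracks. This value is illustrative only.
const subscriptionRadius = 300;

// Returns true when the remote participant shares a focus zone with the
// local one, or is within `radius` of them in the world.
function shouldSubscribe(
  local: Point,
  remote: Point,
  sameFocusZone: boolean,
  radius: number = subscriptionRadius
): boolean {
  if (sameFocusZone) return true;
  const dx = local.x - remote.x;
  const dy = local.y - remote.y;
  // Compare squared distances to avoid an unnecessary square root.
  return dx * dx + dy * dy <= radius * radius;
}
```

A caller could run a check like this whenever a position changes, and request a subscription or unsubscription only when the result flips.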

We also define a single simulcast layer:

camSimulcastEncodings: [{ maxBitrate: 600000, maxFramerate: 30 }]

Bear with us; it’ll take a minute to explain why.

Why a single simulcast layer?

In our demo, the client only renders other participants who are in the same zone or area as them. Depending on the location, we either render the videos at 200x200px or 94x94px. Since we know this constraint, we limit our video locally to only ever send these resolutions. To do this, we use setBandwidth() to change the constraints of the outgoing video.

Note that setBandwidth() sets the constraints for the sending track, which in turn affects the highest simulcast layer. This means that these constraints will be the best possible resolution/framerate that other participants receive.

By default, there are three simulcast layers. Each subsequent layer cuts the resolution in half. This is very useful because it lets browsers accommodate potential network or CPU issues on the receiver’s end. In our spatialization demo, the highest layer is already low resolution, so lower simulcast layers would degrade the video too much. We overcome this by forcing the sender to send only a single layer using the camSimulcastEncodings config. With a single layer, we ensure all users receive that layer and nothing lower-res.
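
To see why lower layers would be unusable here, consider what repeated halving does to a small track. The helper below is a hypothetical illustration of the arithmetic only, based on the halving rule described above. It does not call any Daily API, and the layer sizes a browser actually produces may differ.

```typescript
interface LayerSize {
  width: number;
  height: number;
}

// Computes the resolution of each simulcast layer, highest first,
// assuming each subsequent layer halves the previous one's dimensions.
function simulcastLayerSizes(
  width: number,
  height: number,
  layerCount: number
): LayerSize[] {
  const layers: LayerSize[] = [];
  for (let i = 0; i < layerCount; i++) {
    const divisor = Math.pow(2, i);
    layers.push({
      width: Math.floor(width / divisor),
      height: Math.floor(height / divisor),
    });
  }
  return layers;
}

// Three layers from 200x200 focus constraints:
// 200x200, 100x100, 50x50
console.log(simulcastLayerSizes(200, 200, 3));
```

For a 94x94 tile, the lowest of three layers would be about 23x23 under this rule, which is too small to be useful.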

Be careful with restricting simulcast layers via camSimulcastEncodings! This approach is only recommended in use cases where the receiving end only renders the video at small sizes. Otherwise, you will encounter issues with high bandwidth requirements and disconnects.

The above is followed by registering our handlers for relevant Daily call object events and, finally, calling setBandwidth() for the first time. This will set the local participant's camera constraints to what is the most relevant for their default state in our world. We'll go through the setBandwidth() method shortly!

Hooking up call controls

As the final step in our constructor, we register our call control listeners:

  registerCamBtnListener(() => {
    const current = this.callObject.participants().local.video;
    this.callObject.setLocalVideo(!current);
  });
  
  registerMicBtnListener(() => {
    const current = this.callObject.participants().local.audio;
    this.callObject.setLocalAudio(!current);
  });
  
  registerLeaveBtnListener(() => {
    this.resetPendingAcks();
    this.callObject.leave();
    this.callObject.destroy();
    world.destroy();
    world = new World();
    showJoinForm();
  });

Above, we set up the camera and microphone controls to toggle each input respectively. When the user clicks the "Leave" button, we clear any pending acknowledgements that are still active, destroy the call and the world, and take them back to the join form.

[Figure: Joining and leaving our 2D world. Clicking "Start demo" joins the world; clicking "Leave" returns to the entry form.]

Now, let's take a look at two very important methods on the Room class: join() and setBandwidth().

join()

async join() {
  try {
    await this.callObject.join({ url: this.url, userName: this.userName });
  } catch (e) {
    console.error(e);
    showJoinForm();
  }
}

When the user submits the entry form, we call the Daily join() method on our pre-created call object, passing in the URL and name that the user provided. We catch any errors and take the user back to the entry form if the process fails.

setBandwidth()

Our setBandwidth() method is used to set camera constraints for the local user. We don't need to send a full resolution video track to other participants for this particular demo because of the tile size. Most of the time, users will be displayed as small tiles in the 2D space. When they join a seated zone, they will be displayed in slightly larger, draggable focus tiles overlaying the world.

So to avoid sending larger tracks than necessary, we will use Daily's own setBandwidth() method to constrain the track to just what we need: not too big, not too small.

Note: Please mind the usage warnings in Daily's setBandwidth() API documentation. It can be a useful tool when the design of your application calls for it, but for most applications it’s wise to leave the work of bandwidth management to Daily.

  private setBandwidth(level: BandwidthLevel) {
    switch (level) {
      case BandwidthLevel.Tile:
        console.log("setting bandwidth to tile");
        this.localBandwidthLevel = level;
        this.callObject.setBandwidth({
          trackConstraints: {
            width: standardTileSize,
            height: standardTileSize,
            frameRate: 15,
          },
        });
        break;
      case BandwidthLevel.Focus:
        console.log("setting bandwidth to focus");
        this.localBandwidthLevel = level;
        this.callObject.setBandwidth({
          trackConstraints: {
            width: 200,
            height: 200,
            frameRate: 30,
          },
        });
        break;
      default:
        // Leave the current constraints untouched for unknown levels.
        console.warn(
          `setBandwidth called with an unrecognized level (${level}). Not modifying any constraints.`
        );
    }
  }

As you can see above, we have two valid BandwidthLevel options: Tile and Focus. BandwidthLevel.Tile is used when the participant is traversing the main world:

[Figure: Traversing our 2D world. A video call participant tile moves through a 2D space.]

BandwidthLevel.Focus is used when the participant is seated at a desk or broadcast spot:

[Figure: Moving to a focus zone. A video call participant tile moves onto a focus spot, where a larger video tile appears.]

Note that our world will know nothing about these constraints. We've decoupled the Daily API/room calls and the 2D world logic as much as possible. Despite setting a specific resolution in our constraints above, the world should be able to handle any size of video track regardless of the constraints. This is why if we get an unrecognized requested level in the switch statement above, all we do is log a warning. The constraints will remain at their defaults. Keeping default constraints might not be ideal for performance, but the actual functionality of the application should still work.
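
One way to make this decoupling explicit is to keep the level-to-constraints mapping in a small pure function, separate from the Daily call. The following is a sketch under assumptions, not the demo's implementation: the Level enum stands in for the demo's BandwidthLevel, tileSize is assumed to be 94 to match the tile dimensions mentioned earlier, and constraintsFor() is a hypothetical helper.

```typescript
// Stand-in for the demo's BandwidthLevel enum.
enum Level {
  Unknown,
  Tile,
  Focus,
}

interface VideoConstraints {
  width: number;
  height: number;
  frameRate: number;
}

// Assumed to match the 94x94px world tile size.
const tileSize = 94;

// Returns the constraints for a level, or null when the level is not
// recognized, in which case the caller should leave constraints untouched.
function constraintsFor(level: Level): VideoConstraints | null {
  switch (level) {
    case Level.Tile:
      return { width: tileSize, height: tileSize, frameRate: 15 };
    case Level.Focus:
      return { width: 200, height: 200, frameRate: 30 };
    default:
      return null;
  }
}
```

A caller could pass a non-null result as trackConstraints to setBandwidth() and log a warning otherwise, which keeps the mapping easy to test without a live call.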

Joining the call

When the local participant joins a call, Daily will emit a "joined-meeting" event. At this point, our handleJoinedMeeting() method is called. This method is where we'll finish the configuration of our world and start the world update loop. Let's go through it now. It is quite long, so we've commented on what each part does inline, and we’ll describe the methods used in more detail below.

private handleJoinedMeeting(event: DailyEventObjectParticipants) {
    // The following world setup is only relevant for the global
    // room, since we can only have one world and one global room.
    if (!this.isGlobal) return;

    // Get the local participant
    const p = event.participants["local"];

    // Retrieve the video and audio tracks of this participant
    const tracks = this.getParticipantTracks(p);

    // The function World will use to instruct the room to
    // subscribe to another user's track.
    const subToTracks = (sessionID: string) => {
      this.subToUserTracks(sessionID);
    };

    // The function World will use to instruct the room to
    // unsubscribe from another user's track.
    const unsubFromTracks = (sessionID: string) => {
      this.unsubFromUserTracks(sessionID);
    };

    // The function World will call when the local user moves.
    // This will broadcast their new position to other participants.
    // The “*” default will send the data to everyone.
    const onMove = (pos: Pos, recipient: string = "*") => {
      const data = {
        action: "posChange",
        pos: pos,
      };
      this.broadcast(data, recipient);
    };

    // The function World will call when the local user changes zone.
    // This will update their bandwidth and broadcast their new zone
    // to other participants.
    const onJoinZone = (zoneData: ZoneData, recipient: string = "*") => {
      if (zoneData.zoneID === globalZoneID) {
        this.setBandwidth(BandwidthLevel.Tile);
      } else {
        this.setBandwidth(BandwidthLevel.Focus);
      }
      const data = {
        action: "zoneChange",
        zoneData: zoneData,
      };
      this.broadcast(data, recipient);
    };

    // The function World will call to send a full data dump (position and zone)
    // to another participant. Happens when a new user first joins.
    const onDataDump = (zoneData: ZoneData, posData: Pos, recipient: string = "*") => {
      const data = {
        action: "dump",
        pos: posData,
        zoneData: zoneData,
      };
      this.broadcast(data, recipient);
    };

    // Display the world DOM elements.
    showWorld();

    // Configure the world with the callbacks we defined above
    world.subToTracks = subToTracks;
    world.unsubFromTracks = unsubFromTracks;
    world.onMove = onMove;
    world.onJoinZone = onJoinZone;
    world.onDataDump = onDataDump;

    // Start the world (begins update loop)
    world.start();

    // Create and initialize the local user.
    world.initLocalUser(p.session_id, tracks.video);
  }

Now that we know what happens when users join a Daily call, let's dig deeper into some of the methods referenced above.

Track subscription/unsubscription

As we saw above, the room provides the world with a callback it can use to instruct us to subscribe to/unsubscribe from other participants' tracks. This is done by passing the session ID of the relevant remote participant and then using Daily's updateParticipant() call object instance method:

  private subToUserTracks(sessionID: string) {
    this.callObject.updateParticipant(sessionID, {
      setSubscribedTracks: { audio: true, video: true, screenVideo: false },
    });
  }

  private unsubFromUserTracks(sessionID: string) {
    // Unsubscriptions are not supported in peer-to-peer mode. Attempting
    // to unsubscribe in P2P mode will silently fail, so let's not even try.
    // P2P vs SFU: https://docs.daily.co/guides/how-daily-works/intro-to-video-arch#the-architecture-of-a-room-p2p-vs-sfu-calls
    if (this.topology !== Topology.SFU) return;

    this.callObject.updateParticipant(sessionID, {
      setSubscribedTracks: { audio: false, video: false, screenVideo: false },
    });
  }

We use the setSubscribedTracks property to subscribe to a participant's audio and video, and later to unsubscribe from both. We don't support screen sharing in this demo, so screenVideo always remains false.
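
The decision of which payload to send, if any, can also be expressed as a pure function, which makes the peer-to-peer guard easy to test without a call object. This is a hedged sketch: CallTopology is a stand-in for the demo's Topology type with illustrative values, and subscribedTracksFor() is a hypothetical helper.

```typescript
// Stand-in for the demo's Topology type; these values are illustrative.
type CallTopology = "sfu" | "p2p";

interface SubscribedTracks {
  audio: boolean;
  video: boolean;
  screenVideo: boolean;
}

// Builds the setSubscribedTracks payload for a subscription change.
// Returns null when the request should be skipped, which is the case
// for unsubscribing outside of SFU mode.
function subscribedTracksFor(
  subscribe: boolean,
  topology: CallTopology
): SubscribedTracks | null {
  if (!subscribe && topology !== "sfu") return null;
  // Screen sharing is not used in this demo, so screenVideo stays false.
  return { audio: subscribe, video: subscribe, screenVideo: false };
}
```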

Once the updateParticipant() call is made, we expect a "participant-updated" event to be propagated by Daily. At that time, we're ready to retrieve our new tracks in our handler for that event:

  private handleParticipantUpdated(event: DailyEventObjectParticipant) {
    const p = event.participant;
    const tracks = this.getParticipantTracks(p);
    world.updateUser(p.session_id, p.user_name, tracks.video, tracks.audio);
    if (p.session_id === this.callObject.participants()?.local?.session_id) {
      this.updateLocal(p);
    }
  }

Above, we first retrieve the participant which was updated. Then, we get their tracks via getParticipantTracks():

  private getParticipantTracks(participant: DailyParticipant) {
    const tracks = participant?.tracks;
    if (!tracks) return { video: null, audio: null };

    const vt = <{ [key: string]: any }>tracks.video;
    const at = <{ [key: string]: any }>tracks.audio;

    const videoTrack =
      vt?.state === playableState ? vt["persistentTrack"] : null;
    const audioTrack =
      at?.state === playableState ? at["persistentTrack"] : null;
    return {
      video: videoTrack,
      audio: audioTrack,
    };
  }

When retrieving the tracks, we return the user's video and audio tracks if they are playable. If they don't exist or are not playable, we return null for each.
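
The casts to an untyped object in the method above bypass type checking. As an alternative, here is a sketch that reads the same fields through a small hand-written interface. TrackInfo, ParticipantLike, playableOrNull(), and playableTracks() are hypothetical names. The track type is generic so that the example runs outside a browser; in the demo it would be a MediaStreamTrack. We also assume that playableState is the string "playable", which is one of the track states Daily reports.

```typescript
// Assumed value of the demo's playableState constant.
const playable = "playable";

// Minimal, hand-written shape of the fields we read from a track.
interface TrackInfo<T> {
  state: string;
  persistentTrack?: T;
}

interface ParticipantLike<T> {
  tracks?: { video?: TrackInfo<T>; audio?: TrackInfo<T> };
}

// Returns the track only if it is present and playable, otherwise null.
function playableOrNull<T>(info?: TrackInfo<T>): T | null {
  if (!info || info.state !== playable) return null;
  return info.persistentTrack ?? null;
}

// Collects the playable video and audio tracks of a participant.
function playableTracks<T>(
  p?: ParticipantLike<T>
): { video: T | null; audio: T | null } {
  return {
    video: playableOrNull(p?.tracks?.video),
    audio: playableOrNull(p?.tracks?.audio),
  };
}
```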

Please note that returning a null track does not necessarily mean we'll be unsetting the entire track in the world! We'll cover how track swapping is handled on the world end in a future part of this tutorial series.

Next, we call the world's updateUser() method with the participant's session ID, name, and tracks.

Note that the participant's username is not included in the payload we get from the "participant-joined" event! The "participant-updated" event is sent with the username, which is why we give it to the world at this point as well.

Finally, if the updated participant is our local participant, we call updateLocal(). This method updates the local call controls to reflect the state of the user's camera and microphone:

  private updateLocal(p: DailyParticipant) {
    if (this.localState.audio != p.audio) {
      this.localState.audio = p.audio;
      updateMicBtn(this.localState.audio);
    }
    if (this.localState.video != p.video) {
      this.localState.video = p.video;
      updateCamBtn(this.localState.video);
    }
  }

Broadcasting our presence

You might notice that the onMove(), onJoinZone(), and onDataDump() callbacks we define all make use of the room's broadcast() method. This is the key to all of our spatialization features, and it only consists of one line:

  broadcast(data: BroadcastData, recipientSessionID = "*") {
    this.callObject.sendAppMessage(data, recipientSessionID);
  }

sendAppMessage() is where all the magic happens. It delivers an "app-message" event to the specified recipient. The wildcard recipient "*", which is our default, sends the message to all other participants in the call. As we'll see later, in some cases we'll want to send to a specific user via their Daily session ID.
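
Since every feature relies on these messages, it helps to pin down their shape. The demo's BroadcastData type is not shown in this post, so the following is a hypothetical sketch of how such messages could be modeled as a discriminated union keyed on action, using the four action names that appear in this post. WorldMessage, Position, ZoneInfo, isWorldMessage(), and summarize() are illustrative names, and the numeric ID types are an assumption.

```typescript
interface Position {
  x: number;
  y: number;
}

// Assumed shape; the demo's zone data may use different ID types.
interface ZoneInfo {
  zoneID: number;
  spotID: number;
}

type WorldMessage =
  | { action: "posChange"; pos: Position }
  | { action: "zoneChange"; zoneData: ZoneInfo }
  | { action: "dump"; pos: Position; zoneData: ZoneInfo }
  | { action: "ack" };

// Checks that an incoming payload has a known action before we trust it.
// A fuller validator would also check the nested fields.
function isWorldMessage(data: unknown): data is WorldMessage {
  if (typeof data !== "object" || data === null) return false;
  const action = (data as { action?: unknown }).action;
  return (
    action === "posChange" ||
    action === "zoneChange" ||
    action === "dump" ||
    action === "ack"
  );
}

// With a discriminated union, TypeScript narrows the payload per case.
function summarize(msg: WorldMessage): string {
  switch (msg.action) {
    case "posChange":
      return `move to (${msg.pos.x}, ${msg.pos.y})`;
    case "zoneChange":
      return `zone ${msg.zoneData.zoneID}`;
    case "dump":
      return `dump for zone ${msg.zoneData.zoneID}`;
    case "ack":
      return "ack";
  }
}
```

The benefit of this design is that accessing pos on an "ack" message becomes a compile-time error instead of a runtime surprise.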

We'll go through how we actually handle the message data shortly. But first, we need to take a quick look at what happens when the call is joined by a remote participant. This will be relevant when we go through the handling of our "dump" message type.

Handling new participants

When a remote participant joins the call, Daily emits a "participant-joined" event, which is handled as follows:

  private handleParticipantJoined(event: DailyEventObjectParticipant) {
    const sID = event.participant.session_id;
    if (this.isRobot(event.participant.user_name)) {
      world.createRobot(sID);
      return;
    }
    world.initRemoteParticpant(sID, event.participant.user_name);

    this.pendingAcks[sID] = setInterval(() => {
      if (!this.callObject.participants()[sID]) {
        this.clearPendingAck(sID);
        return;
      }
      world.sendDataDumpToParticipant(sID);
    }, 1000);
  }

We first retrieve the session ID of the new participant. Next, we figure out whether the participant is a robot. We won't dig into robots in this post; suffice it to say that we used them to test our spatialization demo locally. You can read more about using headless robots with Daily here.

One thing to remember is that the order of "participant-joined" and "joined-meeting" events is not guaranteed. For this reason, there might be a situation where we're sending data to a participant who's just joined the meeting before they are fully ready to handle the data we send.

This is why in the method above, we set up an interval to not just send the data dump to the new participant once, but send it repeatedly until they acknowledge receipt. We'll check out the acknowledgement process shortly.
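
The resend-until-acknowledged pattern can be isolated into a small helper. This is a sketch only: AckTracker and its methods are hypothetical names and are not part of the demo, which keeps its intervals in the pendingAcks map shown earlier.

```typescript
// Repeats a send action on an interval until the recipient acknowledges.
class AckTracker {
  private pending = new Map<string, ReturnType<typeof setInterval>>();

  // Starts resending to `id` every `intervalMs` milliseconds.
  // Any existing interval for the same ID is replaced.
  start(id: string, send: () => void, intervalMs = 1000): void {
    this.clear(id);
    this.pending.set(id, setInterval(send, intervalMs));
  }

  // Stops resending to `id`. Returns true if an interval was pending.
  clear(id: string): boolean {
    const handle = this.pending.get(id);
    if (handle === undefined) return false;
    clearInterval(handle);
    this.pending.delete(id);
    return true;
  }

  // Reports whether we are still waiting on an acknowledgement from `id`.
  isPending(id: string): boolean {
    return this.pending.has(id);
  }

  // Stops all pending resends, for example when leaving the call.
  clearAll(): void {
    this.pending.forEach((handle) => clearInterval(handle));
    this.pending.clear();
  }
}
```

As with the demo's interval, the first send happens only after the interval elapses, so a caller that wants an immediate first attempt would invoke send once before calling start().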

Handling incoming data from other participants

Now that we've covered pending acknowledgements and handling new participants, let's go through what happens after an "app-message" event is received. This is where our handleAppMessage() method comes in:

  private handleAppMessage(event: DailyEventObjectAppMessage) {
    const data = <BroadcastData>event.data;
    const msgType = data.action;
    switch (msgType) {
      case "posChange":
        world.updateParticipantPos(event.fromId, data.pos.x, data.pos.y);
        break;
      case "zoneChange":
        world.updateParticipantZone(
          event.fromId,
          data.zoneData.zoneID,
          data.zoneData.spotID
        );
        break;
      case "dump":
        this.broadcast(
          { action: "ack" },
          event.fromId
        );

        world.updateParticipantZone(
          event.fromId,
          data.zoneData.zoneID,
          data.zoneData.spotID
        );
        if (data.zoneData.zoneID === globalZoneID) {
          world.updateParticipantPos(event.fromId, data.pos.x, data.pos.y);
        }
        break;
      case "ack":
        console.log(`Received acknowledgement from ${event.fromId}`);
        const pendingAck = this.pendingAcks[event.fromId];
        if (pendingAck) {
          this.clearPendingAck(event.fromId);
          world.sendDataDumpToParticipant(event.fromId);
        }
        break;
    }
  }

How we handle an "app-message" event depends entirely on the message type the sender specified.

If the message is a position change, we call the world's public updateParticipantPos() method and provide the sender's session ID along with their new position. If the message is a zone change, we do the same with the world's updateParticipantZone() method. We'll start digging into the details of the world methods in our next post!

The data dump message is a little bit different. This is where the acknowledgement comes in. If the message is a data dump, we send an "ack" message with no other data back to the sender, and then update the sender's zone in our local world. We also update their position, but only if they are in the global zone.

If the message type is "ack" (i.e., acknowledge) and we still have a pending acknowledgement for that user, we clear it and send them one final data dump.

Removing remote participants

When a remote participant leaves the call, Daily emits a "participant-left" event, which we handle as follows:

  private handleParticipantLeft(event: DailyEventObjectParticipant) {
    const up = event.participant;
    this.clearPendingAck(up.session_id);
    world.removeUser(up.session_id);
  }

First, we clear any pending acknowledgement that we might still have for the leaving user. Then, we remove the user from the world.

Conclusion

In this post, we went through the Daily API functionality we'll use to enable spatial audio and video features in our 2D world. Please don't hesitate to reach out if you have any questions.

In the next part of our spatialization tutorial, we'll go through running a 2D world for our video call participants to traverse!