This post is part two of a three-part series on how to build a custom Daily video call overlay app in Electron using daily-js.

Introduction

In part one of our Electron call overlay tutorial, we went through our Electron application structure and set up our main process and preload scripts.

Next, we'll go through the most important part: using Daily's call object to join an actual video call.

Getting started

If you haven't yet read part one of this tutorial, we encourage you to do so.

To clone and run Daily’s call overlay Electron demo, run the following commands in your terminal:

git clone git@github.com:daily-demos/electron-overlay.git
cd electron-overlay
npm i && npm start

Setting up our tray window

Now that our Electron main process and preload scripts are ready, it's time to set up our renderer processes for the tray and call windows.

tray.html: the tray window renderer process entry point

The first thing a user will see when they start our Daily call overlay Electron application will be a call join form:

      <div id="entry">
        <h2>Join a video call</h2>
        <form id="enterCall">
          <span class="formContainer">
            <label>Your Name:</label>
            <input type="text" id="userName" required />
          </span>
          <span class="formContainer">
            <label
              >Room URL (<a
                href="https://dashboard.daily.co/rooms/create"
                target="_blank"
                >Create room</a
              >):
            </label>
            <input
              type="url"
              id="roomURL"
              placeholder="https://<domain>.daily.co/<room>"
              required
            />
          </span>
          <button type="submit">Join Call</button>
        </form>
      </div>

Once they've joined a call, this form will be hidden in favor of a button to copy the room URL to the clipboard:

 <div id="inCall">
    <h2>Invite others to call</h2>
    <p>Copy and share URL with others to invite them to this call</p>
    <button id="clipboard">Copy to clipboard</button>
 </div>

All of this is handled by tray/tray.js:

    <script src="./renderer/tray/tray.js" type="module"></script>

tray/tray.js: where the tray magic happens

Our tray.js logic is actually quite short. The first thing we do is get the join form and set up an event listener for the submit event:

const joinForm = document.getElementById("enterCall");

joinForm.addEventListener("submit", (event) => {
  event.preventDefault();
  const urlEle = document.getElementById("roomURL");
  const nameEle = document.getElementById("userName");
  api.joinCall(urlEle.value, nameEle.value);
  setupInCallView(urlEle.value);
});

Above, we retrieve the room URL and user name specified in the join form. We then call the joinCall() API method we defined in the tray's preload script in part one. Finally, we call setupInCallView() to hide the join form while the user joins the call.
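
The url input type only checks that the value is a well-formed URL, so we might also want to confirm that it looks like a Daily room URL before trying to join. Here is a minimal sketch of such a check. looksLikeDailyRoomURL() is a hypothetical helper that is not part of the demo, and it assumes the room lives on a daily.co subdomain, as the form's placeholder suggests:

```javascript
// Hypothetical helper (not in the demo): returns true if `value` looks like
// an HTTPS Daily room URL of the form https://<domain>.daily.co/<room>.
function looksLikeDailyRoomURL(value) {
  let parsed;
  try {
    parsed = new URL(value);
  } catch {
    return false; // Not a parseable URL at all
  }
  // A room URL needs a path beyond the bare "/"
  const hasRoomPath = parsed.pathname.length > 1;
  return (
    parsed.protocol === "https:" &&
    parsed.hostname.endsWith(".daily.co") &&
    hasRoomPath
  );
}

console.log(looksLikeDailyRoomURL("https://example.daily.co/my-room"));
console.log(looksLikeDailyRoomURL("https://example.daily.co/"));
```

With a check like this, the submit handler could return early and show a message instead of calling api.joinCall() when the URL does not pass.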

function setupInCallView(callURL) {
  const entry = document.getElementById("entry");
  const inCall = document.getElementById("inCall");
  entry.style.display = "none";
  inCall.style.display = "block";
  const wrapper = document.getElementById("wrapper");
  wrapper.classList.remove("entry");
  wrapper.classList.add("inCall");

  const copyButton = document.getElementById("clipboard");
  copyButton.onclick = () => {
    navigator.clipboard.writeText(callURL).catch((err) => {
      const msg = "failed to copy room URL to clipboard";
      console.error(msg, err);
      alert(msg);
    });
  };
}

In setupInCallView(), we first hide the entry form and show the inCall div. We also set the onclick handler of our clipboard button. When the button is clicked, the URL of the call the user is joining is copied to their clipboard. If the copy fails, an error popup is shown.

Next, tray.js sets up a couple of window event listeners for call states:

window.addEventListener("join-failure", () => {
  resetTray();
});

window.addEventListener("left-call", () => {
  resetTray();
});

The above events originate in the main process and are sent to the tray preload, which then passes them on to our tray window renderer process to be handled by the listeners above. If joining a call fails or the user leaves a call, the tray window is reset to show the join form once more:
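
To make the last hop of that event flow more concrete, here is a rough sketch of how a message could be re-dispatched as a plain DOM event and picked up by the tray listeners. It runs outside Electron, so an EventTarget stands in for window. forwardToRenderer() and the counting resetTray() are illustrative names only, and the real preload from part one may be wired differently:

```javascript
// Stand-in for the renderer's `window` object so the sketch runs in Node.
const fakeWindow = new EventTarget();

// Counting stand-in for the real resetTray(), which updates the DOM.
let trayResets = 0;
const resetTray = () => {
  trayResets += 1;
};

// Renderer side: register the same two call state listeners as tray.js.
for (const name of ["join-failure", "left-call"]) {
  fakeWindow.addEventListener(name, resetTray);
}

// Preload side (assumed): re-dispatch a main process message as a DOM event.
// In Electron, this would typically run inside an ipcRenderer.on() callback.
function forwardToRenderer(target, eventName) {
  target.dispatchEvent(new Event(eventName));
}

forwardToRenderer(fakeWindow, "join-failure");
forwardToRenderer(fakeWindow, "left-call");
console.log(trayResets); // 2
```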

function resetTray() {
  const entry = document.getElementById("entry");
  const inCall = document.getElementById("inCall");
  entry.style.display = "block";
  inCall.style.display = "none";
  const wrapper = document.getElementById("wrapper");
  wrapper.classList.remove("inCall");
  wrapper.classList.add("entry");
}

We're done with setting up the tray window! Let's move on to the most important part: the video call.

index.html: our call window renderer process entry point

As we saw in part one, our video call overlay app runs in full screen mode and always remains in the foreground of the user’s desktop when they are in a call.

Most of the magic happens inside of a full-screen wrapper, nested within the body in index.html:

<body>
  <div id="wrapper">
    <!-- contents here -->
  </div>
</body>

In style.css, which we import from the index.html header, we can see that this wrapper takes up 100% of the window:

#wrapper {
  height: 100%;
  width: 100%;
}

The first thing we put into our wrapper is our call controls element. The callControls div is defined as follows:

      <div id="callControls" class="clickable" draggable="true">
        <div id="controller">
          <div id="dots"></div>
          <div id="line"></div>
        </div>
        <button id="toggleCam"></button>
        <button id="toggleMic"></button>
        <button id="clipboard" class="invite"></button>
        <span id="clipboardTooltip">Room URL copied</span>
        <button id="leave">Leave Call</button>
      </div>

Note the special clickable class we assign to the callControls div: that’s the class name we used in our call window preload to iterate over clickable elements when setting up our mouse event logic.

We also set draggable to true so that the callControls div and all of its contents can be dragged anywhere on the screen (we’ll go through how we set up the relevant drag/drop handlers in the next part of this tutorial).

Participant tiles

Next on our tour of index.html are our participant tiles: the elements in which we’ll display the participant’s camera track or just their name if their camera is off.

      <div id="tiles"></div>

Participant tiles will consist of a user's name (or "You" in case of a local participant), a video tag, and an audio tag. All of these will be generated as participants join the call.

Importing daily-js

Finally, we import daily-js via a script tag:

<script src="https://unpkg.com/@daily-co/daily-js"></script>

To make sure daily-js can properly communicate with the Daily servers during a call, we need to make some allowances in the CSP:

<meta
  http-equiv="Content-Security-Policy"
  content="default-src 'self'; frame-src 'self' https://*.daily.co; script-src 'self' https://unpkg.com/@daily-co/daily-js 'unsafe-eval'; connect-src https://*.daily.co https://*.pluot.blue wss:; worker-src 'self' blob:;"
/>

Most of this is pretty straightforward: we allow the app to pull in and use resources from the Daily domain, as well as the pluot.blue domain (which is used to do some internal shenanigans related to setting up STUN/TURN services).

And then there’s the unsafe-eval. Currently, daily-js fetches a call object bundle from Daily servers and runs the fetched code on demand as needed. Unfortunately, this involves an eval. We know this is not ideal, and our goal is to remove this requirement in the future. For now, for the purposes of this demo, we’ll allow unsafe-eval in the CSP.
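
To see exactly what the policy above allows, it can help to split it into its directives. The sketch below uses a small parseCSP() helper of our own (it is not a browser or daily-js API) to turn the policy string into a map of directive names to allowed sources:

```javascript
// Split a CSP string into a Map of directive name -> list of allowed sources.
function parseCSP(policy) {
  const directives = new Map();
  for (const part of policy.split(";")) {
    const [name, ...sources] = part.trim().split(/\s+/);
    if (name) {
      directives.set(name, sources); // skip empty segments (trailing ";")
    }
  }
  return directives;
}

// The same policy as in the meta tag above.
const policy = [
  "default-src 'self'",
  "frame-src 'self' https://*.daily.co",
  "script-src 'self' https://unpkg.com/@daily-co/daily-js 'unsafe-eval'",
  "connect-src https://*.daily.co https://*.pluot.blue wss:",
  "worker-src 'self' blob:",
].join("; ") + ";";

const csp = parseCSP(policy);
// Lists the script sources, including 'unsafe-eval'
console.log(csp.get("script-src"));
```

Seen this way, 'unsafe-eval' applies only to script-src, and connect-src is the directive that lets daily-js reach the Daily and pluot.blue hosts.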

Joining a Daily video call from our Electron app

When we import daily.js from index.html, one of the first things it does is set up listeners for our navigation and call-control-related events.

To keep our navigation elements separate from our Daily call logic, we’ve created a nav.js module for daily.js to import, which exports some functions to register relevant listeners for nav elements.

daily.js calls registerJoinListener(initAndJoin) from nav.js. registerJoinListener() takes a function and adds an event listener for the join-call event, which we'll get from the preload when the user submits the join form from the tray window:

export function registerJoinListener(f) {
  window.addEventListener("join-call", (e) => {
    const url = e.detail.url;
    const name = e.detail.name;
    f(url, name)
      .then((joined) => {
        api.callJoinUpdate(joined);
        if (joined) {
          updateClipboardBtnClick(url);
        }
      })
      .catch(() => api.callJoinUpdate(false));
  });
}

Above, the handler first and foremost calls the function passed to it from the caller (initAndJoin() in this case). It then handles communicating the call status back to the main process via the callJoinUpdate() method we defined in the call window preload.

initAndJoin() creates a Daily call object, and joins the call:

async function initAndJoin(roomURL, name) {
  callObject = DailyIframe.createCallObject({
    dailyConfig: {
      // Completely stop the camera track when a user disables their camera (instead of keeping the camera connection alive)
      experimentalChromeVideoMuteLightOff: true,
    },
  })
    .on("camera-error", handleCameraError)
    .on("joined-meeting", handleJoinedMeeting)
    .on("left-meeting", handleLeftMeeting)
    .on("error", handleError)
    .on("participant-updated", handleParticipantUpdated)
    .on("participant-joined", handleParticipantJoined)
    .on("participant-left", handleParticipantLeft)
    .on("active-speaker-change", handleActiveSpeakerChange);

  return callObject
    .join({ url: roomURL, userName: name })
    .then(() => {
      return true;
    })
    .catch((err) => {
      alert(err);
      return false;
    });
}

The function returns a promise that resolves to a boolean: true if the join succeeded and false otherwise. Any error we run into while joining is shown in a popup.

That’s it! We are now in a call. So…now what?

Displaying local call controls

When the call is joined, handleJoinedMeeting(), which we registered when we created the Daily call object earlier, updates our call controls. It then updates some UI elements based on the state of the local participant. updateCallControls() is an exported function in nav.js.

function handleJoinedMeeting(event) {
  updateCallControls(true);
  const p = event.participants.local;
  updateLocal(p);
}

updateCallControls() shows the call controls if the user is in a call, and hides them otherwise.

export function updateCallControls(inCall) {
  const controls = document.getElementById("callControls");
  // If the user has joined a call, display the call controls.
  // Otherwise, hide them.
  if (inCall) {
    controls.classList.add("controls-on");
    return;
  }
  controls.classList.remove("controls-on");
}
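
As an aside, the same add-or-remove logic can be written more compactly with the two-argument form of classList.toggle(), which adds the class when its second argument is true and removes it when false. Below is a sketch of that variant. updateCallControlsCompact() is our own name, and makeFakeClassList() is a minimal stand-in that mimics only this form of toggle() so the example can run outside a browser:

```javascript
// Minimal stand-in for an element's classList (illustration only).
// It supports just contains() and the two-argument form of toggle().
function makeFakeClassList() {
  const classes = new Set();
  return {
    contains: (name) => classes.has(name),
    toggle(name, force) {
      if (force) {
        classes.add(name);
      } else {
        classes.delete(name);
      }
      return Boolean(force);
    },
  };
}

// Compact variant: takes the controls element instead of looking it up.
function updateCallControlsCompact(controls, inCall) {
  controls.classList.toggle("controls-on", inCall);
}

const controls = { classList: makeFakeClassList() };
updateCallControlsCompact(controls, true);
console.log(controls.classList.contains("controls-on")); // true
updateCallControlsCompact(controls, false);
console.log(controls.classList.contains("controls-on")); // false
```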

updateLocal() makes sure that the user’s controls and tile match their media track state.

function updateLocal(p) {
  if (localState.audio != p.audio) {
    localState.audio = p.audio;
    updateMicBtn(localState.audio);
  }
  if (localState.video != p.video) {
    localState.video = p.video;
    updateCamBtn(localState.video);
  }
  const tracks = getParticipantTracks(p);
  addOrUpdateTile(p.session_id, "You", tracks.video, tracks.audio, true);
}

updateLocal() is also called when the local participant is updated, such as when they toggle their camera or mic.
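
The comparison at the heart of updateLocal() can be looked at in isolation. The sketch below pulls it into a pure helper, diffLocalState(), which is a hypothetical name that does not exist in the demo. It only reports which buttons would need a refresh, and assumes the participant's audio and video properties are booleans, as they are used in updateLocal():

```javascript
// Hypothetical pure helper: reports which local controls are out of date
// compared to the participant state we just received.
function diffLocalState(localState, participant) {
  return {
    micChanged: localState.audio !== participant.audio,
    camChanged: localState.video !== participant.video,
  };
}

// Example: the user has just enabled their microphone.
const previousState = { audio: false, video: false };
const updatedParticipant = { audio: true, video: false };

const changes = diffLocalState(previousState, updatedParticipant);
console.log(changes.micChanged); // true
console.log(changes.camChanged); // false
```

Only touching the buttons whose state actually changed avoids redundant DOM updates when participant-updated fires for unrelated reasons.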

updateCamBtn() and updateMicBtn() change the icons shown on those buttons to correspond with whether the camera and microphone are enabled or not:

export function updateCamBtn(camOn) {
  if (camOn && !toggleCamBtn.classList.contains("cam-on")) {
    toggleCamBtn.classList.remove("cam-off");
    toggleCamBtn.classList.add("cam-on");
  }
  if (!camOn && !toggleCamBtn.classList.contains("cam-off")) {
    toggleCamBtn.classList.remove("cam-on");
    toggleCamBtn.classList.add("cam-off");
  }
}

export function updateMicBtn(micOn) {
  if (micOn && !toggleMicBtn.classList.contains("mic-on")) {
    toggleMicBtn.classList.remove("mic-off");
    toggleMicBtn.classList.add("mic-on");
  }
  if (!micOn && !toggleMicBtn.classList.contains("mic-off")) {
    toggleMicBtn.classList.remove("mic-on");
    toggleMicBtn.classList.add("mic-off");
  }
}

The localState object we are referencing in updateLocal() is defined at the top of the daily.js module along with the callObject:

let callObject = null;
let localState = {
  audio: false,
  video: false,
};

We also call addOrUpdateTile(), an exported function in tile.js, to make sure the participant tile is up to date. We use it to display both the local and remote participants' tiles. We'll go through this function in more detail below!

Displaying participants

When a remote participant joins our video call, the participant-joined event is emitted and handled by our handleParticipantJoined() function in daily.js.

function handleParticipantJoined(event) {
  const up = event.participant;
  const tracks = getParticipantTracks(up);
  addOrUpdateTile(up.session_id, up.user_name, tracks.video, tracks.audio);
}

getParticipantTracks() returns an object containing the participant's persistent audio and video tracks. A track that is not in a playable state is returned as null.

function getParticipantTracks(participant) {
  // Guard every step so a missing participant or track entry yields null
  const vt = participant?.tracks?.video;
  const at = participant?.tracks?.audio;

  const videoTrack = vt?.state === playableState ? vt.persistentTrack : null;
  const audioTrack = at?.state === playableState ? at.persistentTrack : null;
  return {
    video: videoTrack,
    audio: audioTrack,
  };
}
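
To illustrate the input and output shapes, here is a self-contained sketch that runs a defensive variant of this function against a simplified participant object. The participant below is a hand-written stand-in with only the fields the function reads, not a real Daily participant, and we assume the demo's playableState constant holds the string "playable":

```javascript
// Assumed value of the demo's playableState constant.
const playableState = "playable";

function getParticipantTracks(participant) {
  const vt = participant?.tracks?.video;
  const at = participant?.tracks?.audio;
  return {
    video: vt?.state === playableState ? vt.persistentTrack : null,
    audio: at?.state === playableState ? at.persistentTrack : null,
  };
}

// Simplified stand-in: camera on, microphone off.
const participant = {
  tracks: {
    video: { state: "playable", persistentTrack: { id: "cam-1", kind: "video" } },
    audio: { state: "off", persistentTrack: undefined },
  },
};

const tracks = getParticipantTracks(participant);
console.log(tracks.video.id); // "cam-1"
console.log(tracks.audio); // null
```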

As we mentioned above, addOrUpdateTile() is an exported function in our tile.js module. It adds a tile for a new participant, or updates a tile if it already exists. This same function is called when we get a participant-updated event for a remote participant, to handle them toggling their camera or microphone, or changing their name.

export function addOrUpdateTile(
  id,
  userName,
  videoTrack,
  audioTrack,
  isLocal = false
) {
  const videoTagID = getVideoID(id);
  let videoTag = null;

  const audioTagID = getAudioID(id);
  let audioTag = null;

  const participantID = getParticipantID(id);
  let participant = document.getElementById(participantID);

  // If the participant already exists, make sure the displayed name
  // is up to date and get their video tag.
  if (participant) {
    const nameTag = document.getElementById(getNameID(id));
    if (nameTag.innerText != userName) {
      nameTag.innerText = userName;
    }
    videoTag = document.getElementById(videoTagID);
    audioTag = document.getElementById(audioTagID);
  } else {
    // If the participant does not already exist, create their tile.
    const tags = addTile(id, userName);
    participant = tags.participant;
    videoTag = tags.video;
    audioTag = tags.audio;
    if (isLocal) {
      audioTag.volume = 0;
    }
  }

  // Stream the given tracks to the participant's video
  // and audio tags
  streamVideo(videoTag, videoTrack);
  streamAudio(audioTag, audioTrack);

  // Update the media-off icon class depending on whether
  // we have a stream
  const camOffDiv = participant.querySelector("#cam-off");
  setIconVisibility(camOffDiv, videoTag);
  camOffDiv.classList.add("clickable");
  camOffDiv.classList.add("draggable");
  setupDraggableElement(camOffDiv);

  const micOffDiv = participant.querySelector("#mic-off");
  setIconVisibility(micOffDiv, audioTag);
}

The inline comments above outline what the function is doing. It first tries to find an existing participant tile by the given ID. If that tile exists, we update the participant’s name if needed, in case they’ve changed it during the call. We then retrieve their existing video and audio elements. If the tile doesn’t exist, we create one with addTile(), and mute the audio element if the tile belongs to the local participant.
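
The tile lookups rely on every element ID being derived from the participant's session ID. The demo's ID helpers are not shown in this post, so the sketch below is only a guess at what they could look like. The prefixes are our own assumption:

```javascript
// Assumed naming scheme for tile element IDs; the demo's real helpers
// may use different prefixes. Each ID is derived from the session ID.
const getParticipantID = (id) => `participant-${id}`;
const getVideoID = (id) => `video-${id}`;
const getAudioID = (id) => `audio-${id}`;
const getNameID = (id) => `name-${id}`;

const sessionID = "abc123";
console.log(getParticipantID(sessionID)); // "participant-abc123"
console.log(getVideoID(sessionID)); // "video-abc123"
```

Because the mapping is deterministic, any handler that knows a session_id can find the matching tile, video, audio, and name elements with document.getElementById().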

We won’t go through the entire addTile() function here as it is pretty long and we’ve commented it throughout to explain what’s happening in it. It sets up the relevant DOM elements inside our tiles div to add a new participant element.

Please check it out in the demo repository.

Finally, we stream the given video and audio tracks to the participant’s tile.

function streamVideo(tag, track) {
  if (track === null) {
    tag.srcObject = null;
    return;
  }
  if (track.id === getVideoTrackID(tag)) {
    return;
  }
  let stream = new MediaStream([track]);
  tag.srcObject = stream;
}

function streamAudio(tag, track) {
  if (track === null) {
    tag.srcObject = null;
    return;
  }
  if (track.id === getAudioTrackID(tag)) {
    return;
  }
  let stream = new MediaStream([track]);
  tag.srcObject = stream;
}

Above, if the given video or audio track is null, we set the tag's source object to null as well. Otherwise, we compare the given track's ID to that of the track in the tag's existing source object (if any). If the ID is different, we replace the tag's source with a new stream created from the new track.
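
The track ID helpers used in that comparison are not shown in this post. Below is a rough sketch of how the video one might work, using the standard getVideoTracks() method of a MediaStream. This version of getVideoTrackID() and the needsNewStream() helper are our own guesses rather than the demo's code, and plain objects stand in for the video tag and stream so the example runs outside a browser:

```javascript
// Hypothetical version of getVideoTrackID(): returns the ID of the first
// video track in the tag's current stream, or null if there is none.
function getVideoTrackID(tag) {
  const tracks = tag?.srcObject?.getVideoTracks();
  return tracks?.[0]?.id ?? null;
}

// Mirrors the checks in streamVideo(): a new stream is only needed when
// we have a track and it differs from the one already playing.
function needsNewStream(tag, track) {
  return track !== null && track.id !== getVideoTrackID(tag);
}

// Stand-in for a <video> tag that is already playing track "cam-1".
const videoTag = {
  srcObject: { getVideoTracks: () => [{ id: "cam-1" }] },
};

console.log(needsNewStream(videoTag, { id: "cam-1" })); // false
console.log(needsNewStream(videoTag, { id: "cam-2" })); // true
```

Skipping the replacement when the IDs match avoids restarting playback every time participant-updated fires.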

💡 Consider how you’d handle screen sharing tracks in this setup. Would you stream a shared screen track to a participant tile, or does it make more sense to stream it to another, larger, element for better screen view?
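
As one possible starting point, a screen share track could be picked out in the same way as the camera track. The sketch below assumes that the participant object exposes a screen share entry under tracks.screenVideo with the same state and persistentTrack fields as the camera entry, and that playableState is "playable". getScreenTrack() is a hypothetical helper that is not part of the demo:

```javascript
// Assumed value of the demo's playableState constant.
const playableState = "playable";

// Hypothetical helper: returns the participant's screen share track
// if it is playable, or null otherwise.
function getScreenTrack(participant) {
  const st = participant?.tracks?.screenVideo;
  return st?.state === playableState ? st.persistentTrack : null;
}

// Simplified stand-ins for participants with and without a screen share.
const sharing = {
  tracks: { screenVideo: { state: "playable", persistentTrack: { id: "screen-1" } } },
};
const notSharing = { tracks: { screenVideo: { state: "off" } } };

console.log(getScreenTrack(sharing).id); // "screen-1"
console.log(getScreenTrack(notSharing)); // null
```

The returned track could then be passed to a function like streamVideo() with whichever element we choose for the shared screen.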

Toggling the microphone and camera

When the local user clicks the Microphone or Camera buttons, toggleMicrophone() and toggleCamera() are called, respectively.

The only thing these functions do is call setLocalAudio() and setLocalVideo(), instance methods on the call object.

function toggleMicrophone() {
  callObject.setLocalAudio(!localState.audio);
}

function toggleCamera() {
  callObject.setLocalVideo(!localState.video);
}

Once the call object has processed our request, Daily sends us the "participant-updated" event. We handle this event in handleParticipantUpdated() to ensure that we update the relevant controls and video tracks as needed.

function handleParticipantUpdated(event) {
  const up = event.participant;
  if (up.session_id === callObject.participants().local.session_id) {
    updateLocal(up);
    return;
  }
  const tracks = getParticipantTracks(up);
  addOrUpdateTile(up.session_id, up.user_name, tracks.video, tracks.audio);
}

[Image: GIF of the camera and microphone being toggled in the call control panel]

Leaving the call

When a remote participant leaves the call

When a remote participant leaves the Daily video call, handleParticipantLeft() in daily.js gets that user’s session_id and uses it to remove their tile from the app.

function handleParticipantLeft(event) {
  const up = event.participant;
  removeTile(up.session_id);
}

removeTile() simply calls remove() on that participant's tile element, if it exists:

export function removeTile(id) {
  document.getElementById(getParticipantID(id))?.remove();
}

When the local participant leaves the call

If a local participant clicks the “Leave Call” button either in their on-screen call controls or in the system tray context menu, the Leave button listener we registered on script load will handle the event:

export function registerLeaveBtnListener(f) {
  const leaveBtn = document.getElementById("leave");
  const leave = () => {
    f();
    api.leftCall();
    updateClipboardBtnClick(null);
  };
  leaveBtn.addEventListener("click", leave);
  window.addEventListener("leave-call", leave);
}

The function we registered above is leave() in daily.js, so that will be called first:

async function leave() {
  // Wait for the call to be left before releasing the call object's resources
  await callObject.leave();
  await callObject.destroy();
  callObject = null;
}

Above, we call the call object leave() instance method, and then call destroy() to free up any resources.

Next, the handler calls the api.leftCall() method we defined in the preload script. This ensures that the leave event is handled properly in the main process and also propagated to our tray window process. The tray window will then display the entry form once more, so that the user can join another call.

Next steps

In this part of this Electron and Daily call object walkthrough, we set up our renderer processes, used daily-js to join a video call, and set up our call controls and participant tiles.

Next week, in part three, we'll implement the last piece of the puzzle: making our call controls and participant tiles draggable on the screen.

More resources