Here at Daily, we’re excited about all the ways that AI can enhance video calls on the web, but one thing has become painfully clear: building production-ready AI apps that handle video and audio is no small feat. It can sometimes take months to get these applications and their architecture fully operational. That’s why we’ve been stoked to work with Sieve, the ultimate video and audio AI cloud service.
What is Sieve, and why should I use it?
There are hundreds of ways to utilize AI to manipulate video and audio data, but what if you had access to a whole library of AI functions and models running in the cloud? This is exactly what Sieve offers.
Users can choose from dozens of apps and pre-deployed models, such as audio enhancement, video dubbing (with lip syncing!), and transcript summarization.
With Sieve, the possibilities are endless.
Using Sieve with Daily video recordings
We built a demo featuring three examples of using Sieve functions to process Daily video recordings with incredible results, demonstrating just some of the possibilities Sieve opens up for Daily users.
The functions we selected were as follows:
- audio_enhancement – Remove background noise and improve the audio quality of a recording.
- text_to_video_lipsync – Accept a video and a piece of text, then alter the recording so the speaker appears to say the words provided.
- Video Dubbing – Translate and dub over the original video to make it look like the subject is speaking a different language. This function actually consists of four distinct Sieve functions:
To see these demos in action, be sure to watch the companion video to this blog post:
How to use Sieve functions
Applying any Sieve function to your video or audio data follows the same basic workflow:
- Upload your video or audio to Sieve
- Fetch the Sieve function of your choice
- Run the Sieve function
For example, in the case of the audio_enhancement function:
import sieve
# Step 1: Upload your video/audio to Sieve
audio = sieve.Audio(url="https://storage.googleapis.com/sieve-prod-us-central1-public-file-upload-bucket/79543930-5a71-45d9-b690-77f4f0b2bfaa/1a704dda-d8be-4ae1-9894-b4ee63c69567-input-audio.mp3")
# Step 2: Fetch the Sieve function of your choice:
audio_enhancement = sieve.function.get("sieve/audio_enhancement")
# Step 3: Run the Sieve function (and capture the output)
filter_type = "all"
enhance_speed_boost = False
enhancement_steps = 50
output = audio_enhancement.run(audio, filter_type, enhance_speed_boost, enhancement_steps)
And that’s it! The input requirements for each function are listed in its README on Sieve’s website.
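If you plan to run the same function on many recordings, the three steps above can be wrapped in a small helper. The following is only a sketch: the function name `enhance_recording_audio` and its `sieve_client` parameter are hypothetical names introduced here for illustration, and the URL in the usage comment is a placeholder. It relies solely on the calls already shown in the example above (`sieve.Audio`, `sieve.function.get`, and `run`) and passes the `sieve` module in as an argument so the helper can be exercised with a stand-in object during testing.

```python
from typing import Any


def enhance_recording_audio(
    sieve_client: Any,
    recording_url: str,
    filter_type: str = "all",
    enhance_speed_boost: bool = False,
    enhancement_steps: int = 50,
) -> Any:
    """Run the sieve/audio_enhancement function on the media at recording_url.

    sieve_client is expected to be the imported `sieve` module (hypothetical
    parameter name; passing it in keeps the helper easy to test).
    """
    # Step 1: Point Sieve at the recording
    audio = sieve_client.Audio(url=recording_url)
    # Step 2: Fetch the Sieve function
    audio_enhancement = sieve_client.function.get("sieve/audio_enhancement")
    # Step 3: Run the function with the same positional arguments as above
    return audio_enhancement.run(
        audio, filter_type, enhance_speed_boost, enhancement_steps
    )


# Usage (assumes the sieve package is installed and configured;
# the URL below is a placeholder, not a real recording):
#   import sieve
#   output = enhance_recording_audio(sieve, "https://example.com/recording.mp3")
```

The defaults mirror the values used in the example above, so calling the helper with only a URL should behave the same way as the original snippet.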
Conclusion
The sky is the limit with Sieve’s AI infrastructure at your fingertips, and here at Daily, we are super excited about the possibilities this opens up for building AI-powered workflows for recorded voice and video data.
To learn more about Sieve, check out their documentation. You can find the codebase for this demo on GitHub.