How to paint over a video with HTML


I recently got to work on a project where I needed to capture a camera feed, send back some drawn feedback, exchange commands and chat messages (and voice comments), and record everything. I’ve always been interested in using non-mainstream features of the Web Platform, and after taking a look at the current state of the art, I found a way to implement this particular use case using ONLY open and readily available web standards.

Sometimes, when you leave social networks behind and embrace the Internet’s 90s spirit again, you find yourself rediscovering that the Web is still an amazing place :-)

Canvas recording

The Canvas element has a transparent background by default, allowing it to be used as an overlay over other elements, for example a Video element. In addition, Canvas provides the captureStream() method, which generates a MediaStream with all the Canvas changes and drawings. By combining both concepts, we can “paint” over a video (or any other HTML element) and record the changes without modifying the original.
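A minimal sketch of this idea, assuming the canvas is already laid out over the video with CSS; the frame rate and the VP9-in-WebM recorder options are my assumptions, not something from a particular implementation:

```javascript
// Sketch: record the drawings made on a transparent canvas that sits
// on top of a video. The chosen codec string is an assumption and its
// availability should be checked with MediaRecorder.isTypeSupported().
function recordCanvasOverlay(canvas, fps = 30) {
  // captureStream() emits a MediaStream that follows every redraw of the canvas
  const stream = canvas.captureStream(fps);

  // VP9 in WebM can preserve the alpha channel of the transparent canvas
  const recorder = new MediaRecorder(stream, {
    mimeType: 'video/webm;codecs=vp9',
  });

  const chunks = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);
  recorder.start();

  // Returns a stop function that resolves with the recorded WebM blob
  return () =>
    new Promise((resolve) => {
      recorder.onstop = () => resolve(new Blob(chunks, { type: 'video/webm' }));
      recorder.stop();
    });
}
```

The layering itself is plain CSS: position the canvas absolutely over the video with a higher z-index, and its transparent pixels let the video show through.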

Sending the stream

This MediaStream can be sent over a WebRTC connection to a media server, or to another WebRTC client like a web browser. The main drawback is that video with an alpha channel is only available for the VP8/VP9 codecs and the upcoming AV1, which are only available in WebM (a subset of Matroska) containers, leaving MP4 and H.264 out. Both the VP8/VP9 codecs and the WebM container are supported by all major browsers, since their support is mandated by the WebRTC specification, so there would be no problem encoding and streaming the videos. But since there’s no hardware acceleration available for them, playback can hit situations like Apple not supporting WebM outside a WebRTC context (as usual, Safari is the new IE6), and the video would need to be available in a different format.
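Attaching the canvas stream to a WebRTC connection could look like the following sketch; the signaling needed to negotiate the connection is out of scope here:

```javascript
// Sketch: send the canvas MediaStream over a WebRTC connection.
// `peerConnection` is an already-created RTCPeerConnection; signaling
// (offer/answer exchange) is assumed to happen elsewhere.
function sendCanvasStream(canvas, peerConnection, fps = 30) {
  const stream = canvas.captureStream(fps);

  // Each track is attached to the connection; VP8/VP9 support is
  // mandatory for WebRTC-compliant browsers, so the alpha-capable
  // codecs are always negotiable on the sending side.
  for (const track of stream.getTracks()) {
    peerConnection.addTrack(track, stream);
  }

  return stream;
}
```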

Inline metadata

WebVTT is a web standard to add cues (subtitles) to videos on the web. Cues can be styled with CSS, for example differently for each one of the participants in a conversation, and the format includes support for metadata cues that can be used to store information like GeoJSON about the location of the streamer, or the operations that are being done. The drawback is that there’s currently no JavaScript API to add WebVTT tracks to the MediaStream itself in the browser, although it has been considered as a future use case of WebRTC, so for now they can’t be sent inline from the source and need to be sent out-of-band and added afterwards. The same happens with track names: there’s no API to identify them inline from the source, so they would need to be mapped based on their unique ids once the video is already generated. That same mechanism can be used to include additional metadata about the video itself, not only about its tracks.
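As a sketch of what such out-of-band metadata could look like, here is a tiny helper that formats a WebVTT metadata cue carrying a JSON payload; the GeoJSON payload shape is just an example:

```javascript
// Convert seconds to a WebVTT timestamp (HH:MM:SS.mmm)
function toTimestamp(seconds) {
  const ms = Math.round(seconds * 1000);
  const pad = (n, width) => String(n).padStart(width, '0');

  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);

  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms % 1000, 3)}`;
}

// Build a metadata cue whose payload is a JSON document,
// e.g. a GeoJSON point with the streamer's location
function metadataCue(start, end, payload) {
  return `${toTimestamp(start)} --> ${toTimestamp(end)}\n${JSON.stringify(payload)}`;
}
```

A file of such cues (prefixed with the `WEBVTT` header line) can then be attached to the page with a Track element of kind `metadata` and read back from JavaScript through the cue’s text.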

Video storage

For video storage, due to the use of videos with an alpha channel, the only currently available option is the WebM container, as already discussed. WebM supports multiple video tracks, so it’s possible to store both the original video and the Canvas overlay in the same container, and it also allows adding multiple WebVTT tracks to store the audio transcriptions, command operations, or metadata.
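On the capture side, both tracks can at least be bundled into a single MediaStream before handing them to a recorder or a server-side muxer; whether a given MediaRecorder implementation actually muxes more than one video track into the resulting WebM file is an assumption that needs to be verified per browser:

```javascript
// Sketch: bundle the original camera track and the canvas overlay track
// into a single MediaStream. NOTE: browser MediaRecorder support for
// recording several video tracks at once varies and must be checked;
// a server-side muxer is the safer target for this combined stream.
function combineTracks(originalStream, canvasStream) {
  return new MediaStream([
    ...originalStream.getVideoTracks(),
    ...canvasStream.getVideoTracks(),
    ...originalStream.getAudioTracks(),
  ]);
}
```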

Playback of combined videos

Finally, for playback, the multiple video tracks are extracted from the WebM container and/or the WebRTC stream and applied to multiple Video elements. It’s possible to define which video track to use on each one with the videoTracks attribute of the Video element, but although its support is widespread, it’s still an experimental feature that needs to be enabled explicitly, and most current implementations, like Chrome’s, only show the first available video track. Alternatively, the video tracks could be extracted from the MediaStream and included in new ad-hoc MediaStream objects, mostly replicating the Video element’s videoTracks functionality by hand (in fact, it could be possible to write a polyfill, creating and returning a fixed VideoTrackList object). That would probably be the same process needed for Android or iOS clients, in case they don’t support selecting the video track.

Once the videos are extracted, it would be just a matter of laying out the Video elements with the Canvas alpha-channel videos on top of the original one, since browsers already support transparent videos. Alpha-video support in the native Android and iOS APIs would need to be investigated; if it’s missing, each frame would need to be painted by hand. Additionally, it would be good to use the mediagroup attribute, although its current support status is unclear.

Regarding inline WebVTT tracks, it’s not clear if they would be automatically extracted and included by the Video element itself, or if they would need to be sent out-of-band and included using a Track element, but there’s a spec with the intention of providing a common JavaScript mapping API between inline cues in multiple container formats (including WebM).
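The ad-hoc MediaStream approach mentioned above can be sketched like this; each video track of a combined stream gets its own Video element, sidestepping the experimental videoTracks selection entirely:

```javascript
// Sketch: instead of relying on the experimental videoTracks API,
// split a combined MediaStream into one Video element per video track.
function splitIntoVideoElements(combinedStream) {
  return combinedStream.getVideoTracks().map((track) => {
    const video = document.createElement('video');

    // A single-track MediaStream wraps each extracted video track
    video.srcObject = new MediaStream([track]);
    video.autoplay = true;

    return video;
  });
}
```

The element carrying the alpha-channel overlay is then absolutely positioned over the original one with CSS, exactly as the live canvas was during capture.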

Written on November 1, 2020

Comment on Twitter

You can leave a comment by replying to this tweet.