How hard can playing a movie be?

If you’re wondering about the cover image: it’s mpv attempting to play an anime with Dolby Vision without understanding the dynamic range, so all the bright parts are clipped and the intro text is unreadable. HDR is really new on Linux and it shows.

NOTE

This is not marked as a draft so people can see this on dev.

WARNING

I’m not an expert on this stuff, so don’t cite me on any of it.

Home media releases are cool but it turns out there’s a lot you have to do to get the most out of your media.

Video#

Color Space#

Most people working with colors are familiar with specifying levels of red, green, and blue. This works fine for most cases in what is called SDR (Standard Dynamic Range). But sometimes you want a large contrast between two colors, with one really bright (like the sun) and the other really dark (like a shadow). That’s really hard to represent using plain RGB, so people developed more formats for “high dynamic range”, aka HDR. That’s the basic idea, and past this I don’t think I’m qualified to explain the terminology, but these new representations basically let you display a wider range of brightness and color and enhance the viewing experience.
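To make that slightly more concrete: HDR video commonly stores brightness using the PQ curve from SMPTE ST 2084 instead of a plain gamma, mapping absolute luminance up to 10,000 nits into the signal range. A minimal Python sketch of the encoding side (the constants are the published ST 2084 values):

```python
import math

# SMPTE ST 2084 ("PQ") constants, as published in the standard
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_encode(nits: float) -> float:
    """Map absolute luminance (0..10000 nits) to a 0..1 PQ signal value."""
    y = max(nits, 0.0) / 10000.0
    return ((C1 + C2 * y**M1) / (1 + C3 * y**M1)) ** M2

# Classic SDR white (~100 nits) lands around the middle of the code range,
# leaving the upper half for highlights all the way up to 10000 nits.
print(round(pq_encode(100), 3))    # ≈ 0.508
print(round(pq_encode(10000), 3))  # 1.0
```

That headroom above SDR white is essentially what makes the sun-vs-shadow contrast representable.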

Most phones take pictures in HDR and tonemap the content to SDR if needed, but this allows you to take incredible sunset photos (TODO: link blog about this).

Standards for HDR#

So there are 3 ways HDR is typically represented.

HDR 10#

The most basic method. Most content is not delivered with only HDR10; usually it’s one of the enhanced formats below with HDR10 as a fallback. The metadata is static: one set of values (mastering display info plus overall brightness stats) applies to the whole stream.
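As a sketch, here’s a hypothetical Python representation of the kind of static metadata HDR10 carries once for the entire stream (all the numbers are made up for illustration):

```python
# Hypothetical HDR10 static metadata: one set of values for the whole video.
hdr10_static = {
    "mastering_primaries": "BT.2020",   # color gamut the content was mastered in
    "mastering_max_nits": 1000.0,       # peak luminance of the mastering display
    "mastering_min_nits": 0.005,        # black level of the mastering display
    "MaxCLL": 980,   # brightest single pixel anywhere in the stream, in nits
    "MaxFALL": 400,  # highest frame-average light level, in nits
}

# A display can use these to plan tone mapping up front, but it can't adapt
# per scene -- that's what the dynamic formats below add.
print(hdr10_static["MaxCLL"], hdr10_static["MaxFALL"])
```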

Dolby Vision#

Basically there’s extra metadata along with the video, per scene, that tells the viewing hardware what to do. Dolby has strict rules about the hardware that can do this, so the player needs to be certified (likely paying a licensing fee) and the TV needs to support it. Notably, Samsung TVs do not support Dolby Vision, but they do support HDR10+.

Because of the certification requirement, you can’t just tell a Raspberry Pi running Kodi to output Dolby Vision, because it’s not certified (I wish). There are lots of cheap streaming sticks that are certified, but be careful what you buy since not all of them behave the same. Each device has its own quirks documented on this spreadsheet, though tbh you might not notice, and it turns out you can also play Dolby Vision from a Windows device.

I think mpv can tonemap Dolby Vision nicely for you even if you’re on an SDR display. If you’ve ever tried playing a video on a player that doesn’t interpret Dolby Vision and has no fallback, you may see the content play, but everything is tinted purple and green.

Sidenote

Most iPhones record in Dolby Vision. At the time of writing I had an iPhone SE 3rd gen, which was unsurprisingly an exception. Apple devices are nice to use as a reference for how HDR content should look. 2026 update: ever since Liquid Glass came out, it seems like overusing HDR in UI has been normalized (see the Photos app, for example). Of course, when you take a screenshot it gets tonemapped to SDR, which makes taking photos of people’s phones a different story.

HDR10+#

Dolby Vision but not as proprietary.

There is a tool to convert Dolby Vision streams to HDR10+. Samsung (and I think a few other brands) love supporting this format (unsurprisingly).

Hybrid Files#

Some files may contain both HDR10+ and Dolby Vision metadata in case a TV can’t play Dolby Vision. There’s apparently a well-known bug where some TV stick SoCs will “die” and show a black screen when a certain common type of these hybrid files is played. It’s well documented on GitHub (TODO: link) and has still not been fixed.

Tone mapping#

Basically, processing the content to be playable on the display. It’s commonly needed because displays have different maximum brightness levels: a projector struggles to reproduce the same brightness range as a TV, so the brightness levels need to be recalculated.
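As a toy illustration of that recalculation, here’s a simple Reinhard-style curve in Python that squeezes a bright source range into a dimmer display range. Real players use fancier curves (BT.2390, for example), and the peak values here are made-up assumptions:

```python
def reinhard_tonemap(nits: float, peak_in: float = 1000.0,
                     peak_out: float = 100.0) -> float:
    """Map luminance mastered for a peak_in-nit display onto a peak_out-nit one.

    Dark values pass through almost unchanged; highlights roll off smoothly
    so the top of the source range lands exactly at the display's peak.
    """
    x = nits / peak_out
    w = peak_in / peak_out
    y = x * (1 + x / (w * w)) / (1 + x)
    return y * peak_out

# A 1000-nit highlight lands exactly at the 100-nit display peak,
# while mid-tones are only gently compressed.
print(round(reinhard_tonemap(1000.0), 1))
print(round(reinhard_tonemap(50.0), 1))
```

The key property is that the curve is monotonic: brighter input always stays brighter than dimmer input, so the image keeps its relative contrast even though the absolute range shrinks.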

Sidenote: Windows HDR is broken#

Honestly, no one in their right mind would want to keep their displays in HDR mode on Windows. It recolors your games so they look a lot different than in SDR, and the algorithm doesn’t seem very good imho. The other problem is that a lot of screenshot tools break, and every screenshot you take looks too bright, as if someone pointed a lamp at your screen. When I had HDR on for one display, I would drag windows over to the SDR display just to screenshot them.

Audio#

Most casual content, like content on YouTube, is in stereo, which means it has 2 channels. However, to enhance your audio experience, a lot of stuff comes with more than 2 channels for “surround sound”, in which there are more than 2 speakers in a room and each of them gets its own channel of audio. Commonly you may have heard figures like “5.1” or “7.1” or maybe even “5.1.2”, which all refer to the distribution of channels. In higher-end multimedia there are quite a few different audio formats; most DVD, Blu-ray, or streaming content does not come with Opus or FLAC audio because Dolby has its own proprietary codecs, of which there are confusingly many, like “Dolby Digital”, “Dolby Digital Plus”, etc.

Typically there are 3 speakers at the front: the front left, front right, and the center channel. The center channel is cool because all the dialogue goes through it. If you don’t have one, some math magic creates a “phantom center” by playing the center audio through the front left and front right channels together. Then you have another left and right pair in the rear, called the surround speakers. People typically spend a lot less on the speakers in the back. The subwoofer doesn’t have a standard position because it varies per room. This gives us 5.1; to extend it to 7.1 according to the Dolby diagrams, you just put 2 more speakers between the front speakers and the surround speakers.

In short, the first number is the number of “normal” speakers in the room, the second is the number of subwoofers (apparently having more than one sub makes bass more even, if you can afford it, because it smooths out destructive interference from the room), and the third is the number of height channels, i.e. audio coming from above. Height channels are not always implemented with speakers on the ceiling, however. Some speakers are angled toward the ceiling so the sound bounces off it, but those obviously aren’t as effective.

NOTE

A soundbar is basically a bunch of speaker drivers crammed into a bar. So the left, right, and center are in the bar, and some bars also have upfiring speakers in an attempt to implement height channels, but it’s not always effective.

Anyway, older movies just ship the channels of audio for speaker setups, and it’s up to the device processing the content to figure out how to play it. Obviously you can play 5.1 content on a stereo system, because the player or your system audio stack (e.g. PipeWire/PulseAudio on Linux) will do this thing called “downmixing”, where it makes its own left and right channels as a function of the original channels. The math for this is pretty simple, but people tend not to agree on the coefficients/multipliers for the channels, although there is a standard. Obviously this destroys spatial properties of the audio mix.
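A minimal sketch of that downmix in Python, using the common ITU-style −3 dB coefficients. Real players may pick different coefficients, and LFE handling varies (here it’s simply dropped):

```python
import math

def downmix_51_to_stereo(fl, fr, c, lfe, sl, sr):
    """Fold one 5.1 sample frame down to a stereo (left, right) pair.

    Center and surrounds are mixed in at -3 dB (1/sqrt(2)) so the phantom
    center doesn't get louder than a real one; the LFE is discarded here,
    which is one common (if debated) choice.
    """
    a = 1 / math.sqrt(2)  # ≈ 0.707, i.e. -3 dB
    left = fl + a * c + a * sl
    right = fr + a * c + a * sr
    return left, right

# Dialogue on the center channel ends up equally in both stereo channels:
print(downmix_51_to_stereo(0.0, 0.0, 1.0, 0.0, 0.0, 0.0))
```

This is also exactly the “phantom center” trick from above: a center-only signal comes out identical in both output channels, so it appears to sit between the speakers.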

There is also something called virtual surround to get the effect on headphones using some fancy math. Sometimes it involves modeling the physics of sound bouncing off ear shapes (e.g. see the 360 Reality Audio feature or AirPods integration with Apple Music Atmos mixes). On Linux you should look up SOFA files or some PipeWire plugins, but tbh I haven’t really tried the options.

Dolby Atmos#

A movie theater has more than like 6 or 8 speakers, of course, so how do we scale this “spatial” audio to bigger setups? 9.x.x and 11.x.x configurations exist now, but how does that make sense? Enter object-based audio formats, which basically embed metadata saying “this sound is at this position” (and the position can change over time) instead of shipping a fixed number of channels in fixed positions. The canned example is a helicopter moving through the air. The device plugged into the speakers knows the room’s speaker positioning and calculates the signal to send to each speaker.
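As a toy illustration of the idea, here’s a constant-power pan in Python that turns a one-dimensional object position into gains for two speakers. A real object renderer does this across a full 3D speaker layout, but the principle — position in, per-speaker gains out — is the same:

```python
import math

def pan_gains(position: float) -> tuple[float, float]:
    """Constant-power pan: position 0.0 = hard left, 1.0 = hard right.

    Using cos/sin keeps the total acoustic power (gl^2 + gr^2) constant,
    so the object doesn't get quieter as it passes between the speakers.
    """
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)

# A "helicopter" sweeping from left to right over three metadata updates:
for p in (0.0, 0.5, 1.0):
    gl, gr = pan_gains(p)
    print(f"pos={p}: left={gl:.3f} right={gr:.3f}")
```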

At least Atmos isn’t locked down to specific players, so apparently you can have a Pi send it. Also, most content with Atmos falls back to the older fixed-channel format in case the hardware doesn’t support it.

NOTE

Apple Music Atmos is basically the same format as the audio tracks I mentioned, except in an m4a container.

dts exists#

I haven’t got much to say here, but some more advanced DTS-encoded content gets downgraded depending on the player. Seems like I have yet to hit this case.

They made dynamic range for audio#

DTS-HD MA is lossless, so the original dynamic volume range is preserved rather than compressed away. As a result, if you ever come across a movie with an audio track in this format, you might notice that something small like footsteps is too quiet and turn up the volume. Then later in the movie an explosion plays a lot louder than you’re typically used to, because of the aforementioned dynamic range preservation.
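To put rough numbers on that, audio levels are usually compared in dB relative to full scale (dBFS). A quick Python sketch with made-up levels for the footsteps and the explosion:

```python
import math

def dbfs(amplitude: float) -> float:
    """Peak level in dB relative to full scale (amplitude 1.0 = 0 dBFS)."""
    return 20 * math.log10(amplitude)

# Hypothetical mix: footsteps at 1% of full scale, explosion near full scale.
print(dbfs(0.01))  # -40.0 dBFS -- why you reached for the volume knob
print(dbfs(0.9))   # close to 0 dBFS -- why you regretted it
```

A compressed track would pull those two levels much closer together; a lossless track leaves the full ~40 dB gap between them intact.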

Subtitles#

Most stuff just has text or rendered bitmap subtitles (i.e. images on top of the video). It seems like in anime there are a lot more interesting uses of subtitles.

ASS#

ASS is text-based, but you can bundle fonts for the subtitles to display in, along with a bunch of other style options. Depending on the player and device, the font sizes can be scaled appropriately. Recent Android sort of tries to support reading these subtitles, but you’re better off burning them in because the support is so bad at times, with text sometimes appearing for a single frame and disappearing. Fonts just seem to be ignored on Android and controlled from the settings app.
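For a feel of the format, here’s a hypothetical ASS event section (the timings and coordinates are made up). The `{\pos(...)\fad(...)}` override tags are what make ASS interesting: they position the line anywhere on screen and fade it in and out, which is how anime sign translations follow on-screen text:

```
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:01.00,0:00:04.00,Default,,0,0,0,,{\pos(640,120)\fad(200,200)}Sign: Bakery
Dialogue: 0,0:00:01.50,0:00:04.50,Default,,0,0,0,,Regular dialogue goes here.
```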

PGS#

PGS ships bitmap images to overlay on top of the video. At least there are no rendering inconsistencies between players.

SRT#

Literally just shipping text files: numbered cues with time ranges and text, in sorted order.
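The format is simple enough that a naive parser fits in a few lines. A Python sketch with a made-up sample (this ignores the edge cases real SRT files have, like stray blank lines or formatting tags):

```python
# SRT: blocks separated by blank lines, each with a cue number,
# a "start --> end" timing line, then one or more lines of text.
SAMPLE = """\
1
00:00:01,000 --> 00:00:03,500
Hello there.

2
00:00:04,000 --> 00:00:06,000
General Kenobi.
"""

def parse_srt(srt_text: str) -> list[tuple[str, str, str]]:
    """Return (start, end, text) tuples for each cue in an SRT string."""
    cues = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.splitlines()
        start, end = lines[1].split(" --> ")  # note: SRT uses comma for ms
        cues.append((start, end, "\n".join(lines[2:])))
    return cues

for start, end, text in parse_srt(SAMPLE):
    print(f"{start} -> {end}: {text}")
```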

WebVTT#

You see this in web videos. Basically SRT but more modern, I think?

Streaming Platforms#

Blu-ray has higher bitrates, and people sometimes like to shit on the encoding/compression quality of streaming services. For most people it’s acceptable, as long as you don’t pause during a high-motion scene.

Codecs#

Video#

Markdown table time.

TODO

tldr: H264 is highly supported; HEVC/H265 has better compression but lacks support (e.g. the ChromeOS video player somehow refuses to play it) and has legal issues; AV1 is even newer, so hardware support is spottier, and it has even better compression ratios, but some people say it destroys the background. Most content is a mix of HEVC and H264. They all have their advantages and disadvantages; I’m not picking one out here because this is a subject of intense heated debate.

Higher bitrate is not always better. Really old content will have a lot of film grain; if its file has a very high bitrate, congrats, you’ve modeled film grain very accurately.

Audio#

To be honest, this one is too confusing to explain. Do your own research, but as long as it’s not low-bitrate AAC, it should sound fine to most people.