Id3vx – a library for parsing and encoding ID3 tags

Working with a binary file format in Elixir

The podcast moguls of Changelog Media have been keen to introduce chapters into their podcast episodes and they reached out to me for a hand. And would you know, now they have chapters and the Elixir community has a shiny new ID3 tag library.

Chapters are neat, they let you see a Table of Contents, jump to specific parts and similar. The changelog.com site is an open source project built in Elixir. I love working with Elixir. It is a steadily growing but still somewhat niche language and it doesn’t have every library under the sun, only most of them. It hasn’t had a fully fledged ID3v2 library so far. There are a few libraries that let you parse a tag in one specific way, targeting a subset. I haven’t found any that allow writing a tag. To create podcast chapters you need to update the ID3 tag of the MP3 file, or create one from scratch. This means decoding and encoding.

That was the motivation to create something they needed and contribute it in the open to the community. I think that’s a very healthy approach: building something you actually need and sharing it.

I’ve worked with Jerod on Changelog things in the past and am somewhat active in the Changelog Slack. I’ve also appeared on Ship It!, first early on and more recently in episode 70, and I took Jerod with me out to the wider Elixir and Erlang community for an episode of BEAM Radio, where I am a host. Typically I write on Underjord.io and in my newsletter. I mostly publish about Elixir, with plenty of wider tech opinions.

I want to give a lot of credit to the Changelog folks for their approach to this. Rather than building the bare minimum inside their code base they tasked me with building a useful library that should solve their problem and many adjacent problems people might have. Then turn that loose as open source. They paid to add another library to the Elixir ecosystem and are committed to maintaining it. Classy!

On to the ID3!

The ID3 format

Specifically we are targeting ID3v2.3, because that’s the most commonly used and supported version in our experience. There is the newer ID3v2.4, which we might expand to support at some point. If you want to read or work with the spec I recommend going via the Mutagen project, which re-hosts the specs in a much more readable manner.

I’ve covered my reflections on the fun of the ID3 spec and the thoughts it awoke in me on my blog. This time around I want to get more technical about it and by chance shine a light on some very nice parts in my favourite language.

If you have an MP3 file and it wants to offer metadata it probably has an ID3 tag prepended. The format is a nice and tight chunk of binary. It starts with a tag header, 10 bytes, like this:

ID3 3 0 X XXXX
|   | | | |______________ 4 bytes of "synch-safe" size
|   | | |________________ 8 bits/1 byte of flags
|   | |__________________ 1 byte, revision (currently 0)
|   |____________________ 1 byte, version (the 3 in v2.3)
|________________________ Initial 3-byte marker

Parsing this out in Elixir with binary pattern matching looks like this:

<<"ID3", 3::8, _::8, flag1::1, flag2::1, _more_flags::6, size::32,
  remainder::binary>> =
    File.read!("myfile.mp3")

Some of those flags can add additional data to the header, but this is the basic thing. You read the flags and the size, convert the size (it uses an unusual format) and you are prepared to get into the next part: frames. A frame is one piece of metadata. Text frames like TIT1 and TIT2 represent titles, and TALB is the album. Then there is APIC, which is Attached Picture, and CHAP, which is the slightly newer Chapter frame. And on it goes. Each type of frame has a custom format, aside from the T-series: T-frames, the text frames, all share the same format.
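The unusual size format is what the spec calls a synch-safe integer: each of the four bytes keeps its top bit at zero, so only 7 bits per byte carry data and the size is really a 28-bit number. Here is a minimal sketch of converting it in both directions with nothing but binary pattern matching. The module and function names are mine, for illustration, and not part of Id3vx.

```elixir
defmodule SynchsafeSketch do
  # Hypothetical helper module for illustration, not the Id3vx API.

  # Decode 4 synch-safe bytes into a plain integer. The 0::1 in front of each
  # 7-bit chunk is the always-zero top bit of that byte.
  def decode(<<0::1, a::7, 0::1, b::7, 0::1, c::7, 0::1, d::7>>) do
    <<size::28>> = <<a::7, b::7, c::7, d::7>>
    size
  end

  # Encode an integer of at most 28 bits back into 4 synch-safe bytes.
  def encode(size) when size in 0..0x0FFFFFFF do
    <<a::7, b::7, c::7, d::7>> = <<size::28>>
    <<0::1, a::7, 0::1, b::7, 0::1, c::7, 0::1, d::7>>
  end
end
```

With this, SynchsafeSketch.decode(<<0, 0, 2, 1>>) gives 257, where reading the same bytes as a regular 32-bit integer would give 513.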

A frame starts with a frame header, 10 bytes, and it has the following format, exemplified with a TALB frame:

TALB XXXX XX
|    |    |____ 2 bytes of flags
|    |_________ 4 bytes of size
|______________ 4 bytes of frame ID

And an Elixir match would look something like this, first pull the header, then use the size to grab the data:

<<id::binary-size(4), fsize::32, fflags::16, remainder::binary>> = remainder

<<frame_data::binary-size(fsize), remainder::binary>> = remainder
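Repeating those two matches until the data runs out gives you every frame in the tag. The following is a rough sketch of that loop and not the library's actual code. It assumes ID3v2.3, where the frame size is a regular 32-bit integer, and it treats an all-zero frame ID as the start of padding. The module name is made up.

```elixir
defmodule FrameWalkSketch do
  # Hypothetical helper for illustration, not the Id3vx API.

  # Pull one frame header, use its size to grab the frame data, then recurse
  # on whatever is left.
  def split_frames(
        <<id::binary-size(4), size::32, flags::16, data::binary-size(size), rest::binary>>
      )
      when id != <<0, 0, 0, 0>> do
    [{id, flags, data} | split_frames(rest)]
  end

  # Zero padding, or too few bytes left for another complete frame: stop.
  def split_frames(_padding_or_leftover), do: []
end
```

Each element is a tuple of frame ID, flags and raw frame data, which is enough to hand off to a per-frame parser or to skip a frame we do not understand.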

The size gives us the ability to grab the rest of the frame data and if we don’t know how to deal with it we can simply skip it to get to the next frame. For cases where we do know how to deal with the frame data that means we’ve implemented that particular type of frame. We can look at the humble text frame as that gives us a lot of useful frames with a single implementation. Most other frames are special butterflies and we need code for all of them. Some examples:

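@spec parse(id :: binary(), size :: integer(), flags :: integer(), data :: binary()) :: map()

# All text frames (IDs starting with "T") share one layout:
# an encoding byte followed by the text itself.
def parse("T" <> _, _size, _flags, data) do
  <<encoding::8, text::binary>> = data
  null = get_null_for_encoding(encoding)
  # :binary.split/2 returns a list; ignore anything after the terminator
  [real_text | _skip] = :binary.split(text, null)
  %{encoding: encoding, text: decode_string(real_text, encoding)}
end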

def parse("APIC", _size, _flags, data) do
  <<encoding::8, rest::binary>> = data
  # The MIME type is always ISO 8859-1, so it ends with a single null byte
  [mime_type, rest] = :binary.split(rest, <<0>>)
  <<picture_type::8, rest::binary>> = rest
  null = get_null_for_encoding(encoding)
  # Whatever follows the null-terminated description is the image itself
  [real_description, picture_data] = :binary.split(rest, null)

  %{
    encoding: encoding,
    mime_type: mime_type,
    picture_type: picture_type,
    description: decode_string(real_description, encoding),
    picture_data: picture_data
  }
end

This parse function matches on the frame identifier to take us to the right parsing code. Frame data often comes with a mix of fixed-size fields and variable-length things, such as strings. These variable-length strings are null-terminated. That’s where we call :binary.split above. I’m hand-waving the encoding-handling above. Essentially ID3v2.3 supports ISO 8859-1 and UTF-16. If you are using UTF-16 your null bytes are <<0,0>> and if not <<0>>.
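To make the hand-waving a bit more concrete, here is one way the two helpers could look. This is a sketch under assumptions and not the Id3vx implementation: it only covers the two ID3v2.3 encoding bytes (0 for ISO 8859-1, 1 for UTF-16 with a byte order mark) and leans on Erlang's :unicode module for the conversion to UTF-8.

```elixir
defmodule TextEncodingSketch do
  # Assumed implementations of the helpers used in the parsing code,
  # written for illustration only.

  # Encoding byte 0 is ISO 8859-1, 1 is UTF-16 with a byte order mark.
  def get_null_for_encoding(0), do: <<0>>
  def get_null_for_encoding(1), do: <<0, 0>>

  # Latin1 text becomes a regular UTF-8 Elixir string.
  def decode_string(text, 0), do: :unicode.characters_to_binary(text, :latin1)

  # For UTF-16 the byte order mark tells us the endianness.
  def decode_string(<<0xFF, 0xFE, chars::binary>>, 1),
    do: :unicode.characters_to_binary(chars, {:utf16, :little})

  def decode_string(<<0xFE, 0xFF, chars::binary>>, 1),
    do: :unicode.characters_to_binary(chars, {:utf16, :big})
end
```

One thing this still glosses over: in UTF-16 a <<0, 0>> pair can straddle two characters, so a careful splitter should only look for the terminator on 2-byte boundaries.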

These are the fundamentals of the ID3 format. You have your header, then you have your frames, each frame has a header and some data. The frame data is specific to the frame type. Certain frames can actually have subframes. I’m looking at Chapters (CHAP) here.

Elixir binaries & strings

A curiosity about Elixir is that an Elixir string is just any binary that is valid UTF-8. All strings are binaries but not all binaries are strings. This is why we can write a binary match like this <<"TALB", size::32, flags::16, rest::binary>>. If the first bytes spell out “TALB” as a string it will match. A UTF-8 string is byte-for-byte the same as an ISO-8859-1 / latin1 string, or even an ASCII string, as long as we’re just using your basic letters. This means we have a bunch of cases where we can conveniently use a string in a binary match.
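A few lines you can paste into iex to see this for yourself. The frame bytes are made up for the example.

```elixir
# The literal "TALB" and its four ASCII/UTF-8 bytes are the same term.
same = "TALB" == <<84, 65, 76, 66>>

# Every string is a binary, but bytes that are not valid UTF-8 are not a string.
valid = String.valid?("TALB")
invalid = String.valid?(<<0xFF, 0xFE>>)

# A made-up TALB frame: 10 bytes of header followed by 5 bytes of data.
frame = <<"TALB", 5::32, 0::16, 0, "Kind">>

# The string literal works directly inside the binary pattern.
<<"TALB", size::32, flags::16, rest::binary>> = frame
```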

Overall the binary pattern matching syntax is just incredibly useful and powerful, and it makes most work I’ve done with binaries in Elixir quite convenient.

Efficiently building binaries

As mentioned we wanted to write tags as well, not just read them. Elixir is a Functional Programming style language and as such maintains immutability. This means when we build our binaries we can’t just add a byte to the end of a piece of memory without consequence. This:

new_binary = first_binary <> <<0xff>>

Would mean we create a copy. We could get away with this in the ID3 library because we aren’t talking about particularly large binaries in the ID3 tag. Mostly, at least: there can be pictures. The copies are small, the cost is small. I have seen some absurd base64 encoded data in an XML file absolutely murder memory in a poorly considered XML library in Elixir though. Hundreds and hundreds of copies of a large binary. OOM Killer had a field day.

We don’t want that. So we use IO lists. Lists in Elixir are cheap as long as you prepend. IO lists are structures of nested lists and binaries. Our previous operation becomes:

new_iolist = [first_binary, <<0xff>>]

That’s cheap: a new list with two items. We don’t try to change them, so they can be simple references to memory and all that. Once we need the final binary we can flatten the list with IO.iodata_to_binary/1. Many parts of Elixir also allow writing the IO list directly. I don’t believe I made use of this though.

This means we have a pretty convenient and efficient way of building binaries in an immutable world.
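As a small illustration, here is how a text frame could be assembled as an IO list and only flattened at the very end. This is a sketch and not the encoder Id3vx ships: the module is made up, it assumes ID3v2.3 with a plain 32-bit frame size, and it assumes the text is plain ASCII so encoding byte 0 is safe.

```elixir
defmodule IoListSketch do
  # Hypothetical encoder for illustration, not the Id3vx implementation.
  # Assumes `text` only contains ISO 8859-1 compatible bytes (plain ASCII here).
  def text_frame(id, text) when byte_size(id) == 4 do
    # Encoding byte followed by the text, kept as a nested list.
    body = [<<0>>, text]
    # We can measure the body without flattening it.
    size = IO.iodata_length(body)
    # Frame ID, 4 bytes of size, 2 bytes of flags, then the body.
    [id, <<size::32, 0::16>>, body]
  end
end

frames = [
  IoListSketch.text_frame("TIT2", "Title"),
  IoListSketch.text_frame("TALB", "Album")
]

# Only here do we pay for building one contiguous binary.
binary = IO.iodata_to_binary(frames)
```

Functions like File.write!/2 and IO.binwrite/2 accept the nested list as it is, so the final flattening step can often be skipped entirely.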

A well-behaved library

When I explored the existing libraries to learn from them I ran into what I would consider a small illness of the early days of the Elixir community. Back from when everyone was less experienced with all the possibilities and almost everything was built too clever by half. At least one of the libraries I tried set up an app, and when I tried to parse a file it would run a GenServer (essentially an actor) to provide a stream interface to the data. The parsing of data got intermingled with the implementation detail of how to get the binary data. I believe it also ended up being a Task, or involved GenServer calls and such. Things that can fail and time out in fun ways.

Now, this library had one thing quite right. You shouldn’t always do a File.read!("myfile.mp3") and call it a day. That means loading the whole file into memory which is overkill. This library does offer a function for parsing the ID3 tags out of a binary you already have, whether you got that from a stream, you read a full file or you generated a file that you want to write. You can use Id3vx.parse_binary for that. But the common case would be reading from a file and parsing that. This should ideally only pull the necessary bytes as needed and leave the rest alone. The tag is at the beginning for a reason. This incremental streaming read is what happens under the hood if you use Id3vx.parse_file as it opens an IO device (essentially a file descriptor).

Regardless of what way you started the parsing procedure the library will pull the bytes of binary data as needed, either from your big hunk of binary file data or from the IO device.
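To give an idea of what pulling only the necessary bytes can look like, here is a rough sketch that reads just the tag from a file: 10 bytes of header first, then exactly as many bytes as the header announces. It is not how Id3vx is written internally. The module and function names are made up, and it ignores the optional footer that ID3v2.4 can add.

```elixir
defmodule TagReadSketch do
  # Hypothetical helper for illustration, not the Id3vx internals.

  # Returns {:ok, tag_body}, :no_tag or {:error, reason}.
  def read_tag_body(path) do
    case File.open(path, [:read, :binary], &read_from_device/1) do
      {:ok, result} -> result
      {:error, reason} -> {:error, reason}
    end
  end

  defp read_from_device(device) do
    # Read only the 10-byte tag header to begin with.
    case IO.binread(device, 10) do
      <<"ID3", _version::8, _revision::8, _flags::8, 0::1, a::7, 0::1, b::7, 0::1, c::7, 0::1,
        d::7>> ->
        <<size::28>> = <<a::7, b::7, c::7, d::7>>
        read_exactly(device, size)

      _anything_else ->
        :no_tag
    end
  end

  # Read the announced number of bytes and nothing more. The audio data that
  # follows the tag is never loaded.
  defp read_exactly(device, size) do
    case IO.binread(device, size) do
      body when is_binary(body) and byte_size(body) == size -> {:ok, body}
      _short_or_eof -> {:error, :truncated}
    end
  end
end
```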

There is very little reason to use any of the fancy parts of Elixir, Erlang or OTP when parsing less than 256MB of binary data. And if you are doing a bit of parsing where you need to do a ton of files in parallel or you need a timeout, then this library offers simple enough primitives that you can stitch that together yourself. There are no global GenServers, no config at all. It reads and it writes and provides a simple API while mostly doing the right thing under the hood.
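If you do need the parallelism or the timeout, the stitching can be as small as this. It is a sketch using Task.async_stream from the standard library. The module name is made up and the parse function is passed in, so nothing here assumes what the library's parse functions return.

```elixir
defmodule ParallelParseSketch do
  # Hypothetical wrapper for illustration. `parse_fun` is any one-argument
  # function that takes a path.
  def parse_all(paths, parse_fun, timeout_ms \\ 5_000) do
    paths
    # One task per path. A task that runs too long is killed and reported
    # as {:exit, :timeout} instead of taking the caller down with it.
    |> Task.async_stream(parse_fun, timeout: timeout_ms, on_timeout: :kill_task)
    # Results come back in input order, so they can be paired with the paths.
    |> Enum.zip(paths)
    |> Map.new(fn
      {{:ok, result}, path} -> {path, {:ok, result}}
      {{:exit, reason}, path} -> {path, {:error, reason}}
    end)
  end
end
```

You would call it with something like ParallelParseSketch.parse_all(paths, &Id3vx.parse_file/1). Note that the tasks are linked to the caller, so a parse function that raises will still crash the calling process unless you trap exits or rescue inside the function.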

Representing the data in Elixir

Elixir offers maps that are your average hashed dictionary dealio. Efficient random access. Keys. Values. We also have structs that are named maps with a limited set of fields.

I started working with ad hoc maps as I implemented the library but after the fifth or so instance of getting a field-name wrong and not hearing about it until I wrote a test I decided to go with structs. The tag itself is represented as a %Tag{} struct that captures the header information of the tag such as size and flags. It also contains a list of frames. Each frame is a %Frame{} struct with information about size, flags and such. Each frame has a data key that gets into the specific frame implementations. %Text{}, %AttachedPicture{}, %Chapter{} and %Popularimeter{} to list some examples.
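A stripped-down sketch of why structs helped. These are not the library's real definitions: the module names and fields are invented for the example. The point is that a misspelled field becomes an error right away instead of a silently wrong map.

```elixir
defmodule StructSketch.Text do
  # Invented fields for illustration, not the real Id3vx struct.
  defstruct encoding: :utf16, text: ""
end

defmodule StructSketch.Frame do
  # Invented fields for illustration, not the real Id3vx struct.
  defstruct id: nil, flags: 0, data: nil
end

defmodule StructSketch.Demo do
  alias StructSketch.{Frame, Text}

  # A frame struct wrapping a frame-specific data struct.
  def title_frame(title) do
    %Frame{id: "TIT2", data: %Text{text: title}}
  end

  # Writing %Text{txt: title} here would not even compile.
  # The runtime equivalent, struct!/2, raises a KeyError for the unknown key.
  def misspelled(title) do
    struct!(Text, txt: title)
  end
end
```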

I also have a special case which is the %Unknown{} frame. It is essentially a no-op decode. For each frame we keep the raw frame data around and if we need to re-encode a tag where we can’t parse all the frames we can still reproduce it fully.

When you throw a file path or binary at this library for parsing you will get a %Tag{} struct and if you can produce a %Tag{} struct the library should be able to encode you an appropriate ID3 tag.

Purpose-built API

The library is built primarily to add chapters to a podcast but also in general to make building out ID3 tags decently convenient. With ID3 tags there are certain tasks that are very common and a lot of less common ones. To support the more common set I built out a pipeline-style API. If you need the less common frames or perfect control you have every possibility to construct the Frame structs yourself and add appropriate data. They typically have solid defaults as well. But for the 90% use-case, this is the way you’d do things:

alias Id3vx.Tag

Tag.create(3) # For ID3v2.3, the 3 in 2.3
|> Tag.add_text_frame("TIT1", "Regular Programming")
|> Tag.add_typical_chapter_and_toc(
  0,             # start-time
  20,            # end-time
  0,             # start byte offset
  1024,          # end byte offset
  "Chapter 1",   # title
  fn chapter ->  # callback for adding to chapter
    chapter
    |> Tag.add_text_frame("TPUB", "Chapter publisher")
    |> Tag.add_attached_picture("", "image/jpeg", File.read!("mypicture.jpg"))
  end
)

That code would create a tag struct with a title and one chapter. Rinse and repeat for more chapters, more images, more text fields.

You can see it in action in Changelog’s code doing specifically this:

def add_chapters_to_tag(tag, chapters) do
  Enum.reduce(chapters, tag, fn(c, acc) ->
    Tag.add_typical_chapter_and_toc(acc,
      in_milliseconds(c.starts_at),
      in_milliseconds(c.ends_at),
      0, 0, # setting start/end offsets to zero forces use of start/end times
      c.title, fn(chapter) ->
        chapter
        |> add_link_to_chapter(c.link_url)
        |> add_image_to_chapter(c.image_url)
      end)
  end)
end

For writing the tag you could use Id3vx.encode_tag and then concatenate it to the start of an MP3 like encoded_tag <> File.read!("myfile.mp3") but what if that file might already have an ID3 tag you want to replace? This was something Jerod needed that also makes a ton of sense.

This gave rise to Id3vx.replace_tag(tag, infile, outfile) which will take a media file, find the ID3 tag and measure it, strip it off of the start of the file and then encode and add our own tag. Example from Changelog’s codebase here:

new_tag =
  episode
  |> tag_for_episode()
  |> add_image_to_tag(cover_file)
  |> add_chapters_to_tag(chapters)

Id3vx.replace_tag(new_tag, file_path, tagged_file_path)
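The idea behind replacing a tag is simple enough to sketch on an in-memory binary: measure the old tag from its header, drop it, put the new one in front. The real Id3vx.replace_tag works on files and this is not its implementation. The module below is hypothetical, and it ignores the optional footer ID3v2.4 allows.

```elixir
defmodule ReplaceTagSketch do
  # Hypothetical helper for illustration, not the Id3vx implementation.

  # If the media starts with an ID3v2 tag, measure it via the synch-safe size
  # in the header and return everything after it.
  def strip_tag(
        <<"ID3", _version::8, _revision::8, _flags::8, 0::1, a::7, 0::1, b::7, 0::1, c::7,
          0::1, d::7, rest::binary>> = media
      ) do
    <<size::28>> = <<a::7, b::7, c::7, d::7>>

    case rest do
      <<_old_tag::binary-size(size), audio::binary>> -> audio
      # The declared size runs past the end of the data: leave it untouched.
      _truncated -> media
    end
  end

  # No tag at the start, nothing to strip.
  def strip_tag(untagged), do: untagged

  # Returns an IO list, so the audio data is not copied just to prepend a tag.
  def replace_tag(encoded_tag, media), do: [encoded_tag, strip_tag(media)]
end
```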

We don’t implement full ID3v2.4 support (contributions welcome) but I believe we can replace ID3v2.4 tags just fine.

For any usage that is more complex than this there is plenty of in-depth primitive data to access in the frames, there are functions that let you encode things piecemeal and much more.

I hope this library will be useful in general around the Elixir community and I’m very glad that Changelog podcasts will now have chapters.