Every year, the Wrapped-ification of the entire known world continues its slow march towards irrelevance through over-saturation. But what if you're just a nerd about your own data and you want to crunch it yourself? Last year, for whatever reason, the YouTube Rewind that was supposedly generated for me never worked. This year it did - but how do I make my own version?
yt-dlp is a tool for downloading audio and video files from all over the internet. Since it was forked from youtube-dl, it has gained several features, including the ability to parse and archive non-video pages on YouTube, like your history "feed" (despite the name, this data is unfortunately not provided via RSS). However, many of these features remain undocumented. Some effort has been made to explain a few of them - in particular, how to format queries to the YouTube search API - but how the tool parses the API results for your YouTube watch history, and what you can do with them, goes unexplained.
As my previous This Week in YouTube series indicates, I both watch a lot of YouTube and care about my watch history. I tried to dive into the code to figure out how yt-dlp was parsing the History API responses, but I ended up putting it together through a little trial-and-error and a lot of CTRL+F on the yt-dlp README.
Now, some limitations. As far as I can tell - I still can't figure out how to get yt-dlp to dump the raw YouTube API response - it is not possible to filter the history data by watch date, only by video publication date, so we'll be making guesses, based on my knowledge of which channels I watch on release, to scope our query to a Rewind-like year-in-review window. I'm also not sure whether data is missing from the response - whether you try to pull your entire YouTube watch history or just the year in review - because I simply don't have a way to verify that.
First, you'll want to install yt-dlp, ffmpeg, and a JavaScript engine. You may optionally want the secretstorage Python package (I'll get into why you probably shouldn't later). With the exception of ffmpeg, I don't recommend either sandboxed or OS-package-manager installs: the former will be unable to access other data on your system, and the latter will be very old. You don't technically need ffmpeg if you're not also archiving the videos you watched this year, but having it installed makes the output cleaner, since it suppresses a per-video warning.
Next, you're going to have to do some trial-and-error, depending on how long your YouTube watch history is, both historically and for the time period you'd like data on. This means your first run of the commands below will be intentionally terminated early so you can gather some metrics. The command - don't worry, we'll get to the syntax and what each part means later - will begin by outputting something like:
[youtube:history] Extracting URL: :ythis
[youtube:tab] Extracting URL: https://www.youtube.com/feed/history
[youtube:tab] history: Downloading webpage
[download] Downloading playlist: history
[youtube:tab] history page 1: Downloading API JSON

Ignore that for now, although the number of history pages may be useful for knowing when you can get up and stretch your legs during execution, depending on how much history you want to download. Next, it will start outputting actual video metadata parsing results, like this:
[download] Downloading item x of xxx
[youtube] Extracting URL: https://www.youtube.com/watch?v=xxx_xxxxxxx
[youtube] xxx_xxxxxxx: Downloading webpage
[youtube] xxx_xxxxxxx: Downloading tv downgraded player API JSON
[youtube] xxx_xxxxxxx: Downloading web creator player API JSON
[youtube] [jsc:deno] Solving JS challenges using deno
[info] xxx_xxxxxxx: Downloading 1 format(s): 401+251
[info] Writing '%(title)s, %(channel)s' to: ythistory.csv

That first line is the most important bit - it's essentially the size of the playlist of every video in your watch history. For example, it might say it's downloading item 1 of 1000, or 1 of 10000. You can go ahead and CTRL+C at this point (again, we'll get to the exact sequence of commands later). You're going to need to do some guesswork and math now: how long have you had a YouTube account? Of that time, how much of it have you spent watching videos this year? That's the range of the playlist you'll want to archive.
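To make that guesswork concrete, here's a rough sketch of the math, assuming (naively) that your watch rate has been roughly constant over the life of the account. Every number here is a hypothetical placeholder - substitute your own playlist size and account age.

```python
# Rough estimate of the -I range covering the last year of watch history.
# All numbers are hypothetical placeholders - substitute your own.
total_items = 10_000       # playlist size reported by yt-dlp ("item x of xxx")
account_age_years = 10     # how long you've had the account
share_watched_this_year = 1 / account_age_years  # naive constant-rate guess

last_year_items = round(total_items * share_watched_this_year)
print(f"-I 1:{last_year_items}")  # history is ordered most-recent-first
# -> -I 1:1000
```

If this year was unusually heavy or light on YouTube for you, skew the estimate accordingly; the guess-and-check rounds later will refine it anyway.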
Okay! Time for the fun part! Let's run some commands. For safety, start a screen or tmux session. This will take a while - not nearly as long as actually downloading all these videos, but the JSON parsing and JS challenges are nontrivial.
For the initial playlist math, let's just tell yt-dlp that we want to list every video in our watch history. We're going to CTRL+C this command as soon as we get the information from the second codeblock above:
yt-dlp --js-runtimes deno:/usr/local/bin/deno --cookies-from-browser chrome+gnomekeyring --no-download --print "%(title)s, %(channel)s" --no-quiet :ythis
Let's break that down. I'm passing the path to my deno JS runtime because, while the yt-dlp docs say it will look for the deno binary in the "standard path", they don't clarify what that means, and I didn't want to futz with it. Next, I'm using --cookies-from-browser. That's probably a bad idea! You should probably use one of the other cookie options yt-dlp supports - ideally passing just your YouTube cookie - because you don't know where this code came from or what it's doing with all your cookies. But I'm both lazy and have 2FA. Anyway, let's move on. --no-download prevents yt-dlp from actually archiving the videos you watched. You don't want to remove this yet - we're still in a metadata step - but you may wish to remove it later. The --print option, for now, just makes sure we have enough info to make some vague judgments. And - somewhat bizarrely - because we have specified --print, we also need to specify --no-quiet so we get the rest of the metadata we need. You can hit CTRL+C once you have the playlist size described in the first codeblock above.
Let's move on to the real command, in my use-case leveraging a browser cookie store to grab a CSV of the 100 most-recent videos I've watched:
yt-dlp --js-runtimes deno:/usr/local/bin/deno --cookies-from-browser chrome+gnomekeyring --no-download -I 1:100 --print-to-file "%(title)s,%(channel)s,%(upload_date)s,%(original_url)s" ythistory.csv --no-quiet :ythis
We start as we did before, but there are two new arguments. -I (short for --playlist-items) selects a range of playlist indices, and the playlist is ordered most-recent-first. This is where you do the math from above - it's entirely down to your YouTube habits, so I'm afraid I can't offer any hints here. Next, we've replaced --print with --print-to-file to give us a CSV. Let's pause.
I initially got a very, very basic version of this command from a Reddit thread. But where'd they get the variables to pass to --print? I wanted a little more. These output-template fields are used throughout yt-dlp for a variety of things and are documented in the README - specifically, scroll down to "The available fields are." This is also why I believe that, even if the raw YouTube history API response included a watched_at timestamp or something, we wouldn't be able to include it in our report - it's not a field we can pass through --print. You may also want to use a character other than , to delimit the columns, depending on what characters show up in the titles of the YouTube videos you watch.
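To see why a bare comma is a fragile delimiter, consider a title that itself contains commas (the title, channel, and URL below are all hypothetical). A quick sketch with Python's csv module - purely a post-processing illustration, not part of yt-dlp - shows the naive comma split producing extra columns, while a tab delimiter survives:

```python
import csv
import io

# A comma-delimited --print-to-file line whose (hypothetical) title contains commas:
line = "10,000 Days, Part 1,SomeChannel,20250101,https://www.youtube.com/watch?v=xxx"
print(len(line.split(",")))  # 6 columns instead of the 4 fields we wrote

# The same record tab-delimited (in bash you could pass a literal tab to the
# template with $'\t' - an assumption about your shell, not a yt-dlp feature):
tsv_line = "10,000 Days, Part 1\tSomeChannel\t20250101\thttps://www.youtube.com/watch?v=xxx"
row = next(csv.reader(io.StringIO(tsv_line), delimiter="\t"))
print(len(row))  # 4
```

Tabs (or any character you never expect in a title) keep the column count stable without needing a quoting layer on top of --print-to-file.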
At this point, you'll probably go through a couple of rounds of guess-and-check, repeating the above command until you find the sweet spot for the -I parameter. But once your final command run is complete, you'll have an initial CSV with some data to stare at! Since the command I used above includes the YouTube video URL, you can also use other yt-dlp features to build more Rewind-like functionality, such as fetching thumbnails, descriptions, the video itself - whatever you like to stare at most.
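As one example of that Rewind-like functionality, a few lines of Python can rank your most-watched channels from the CSV. This is a sketch that assumes the column order from the template above (title, channel, upload date, URL) and a comma delimiter - so, per the earlier caveat, titles containing commas will shift the channel column and skew the counts:

```python
import csv
from collections import Counter

def top_channels(path, n=5):
    """Return the n most frequent channels in a ythistory.csv-style file.

    Assumes columns: title, channel, upload_date, original_url.
    Titles containing commas will break this naive comma-delimited parse.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = [r for r in csv.reader(f) if len(r) >= 2]
    return Counter(r[1] for r in rows).most_common(n)

# e.g. top_channels("ythistory.csv") might return [("SomeChannel", 42), ...]
```

From there, Counter gives you the raw material for most of the classic Rewind stats - top channels, videos per channel, and so on.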
Have fun!