After 8 years of maintaining my lil' custom Django blog, it's time for a
change! I'd been thinking about migrating for a while. After the first couple
of years of excitement I started falling further and further behind on framework
upgrades, and my cute anti-spam system kicked the bucket a couple of years
back, even though there never was much conversation on the blog. Drop me an
email or a tweet if you want to chat about something here :)
I'd been postponing the migration because I thought it would be real painful to
both migrate the content and keep the URL format the same, especially for a
custom platform. It turned out to be really easy. Pelican rocks!
Migrating the content
Pelican comes with an import tool that supports bland
little feeds like mine. By default my feed only displays 10 entries but since
it's my code I just modified it locally to show them all. That probably ended
up being one of the least straightforward parts of the process actually. I was
super excited about Django when I first created the blog but not too familiar
with how to manage Python dependencies. Thus, although I did write down the
dependency names in a text file, I wasn't forward-looking enough to include
version numbers. pip freeze is my friend now. Thankfully I only had a
couple of plugins to play guess-what-version with.
I did end up making a couple of changes to Pelican locally so it would work
better with my content (yay open-source).
First, to keep the <pre> code snippets from getting mangled and losing their
linebreaks, I ended up commenting out a few lines in fields2pelican()
that look like they're meant to ensure the validity of the original HTML. I was
using a wizard on the old blog, so there's no reason the HTML shouldn't already
be valid. I wasn't too worried about it and didn't notice any side-effects
during the migration.
Secondly, the files weren't created with the correct slugs and filenames, which
caused some issues when rewriting the URLs. It looks like the feed parser
doesn't look at the real slug, so I figured out where the entry's URL lives in
feed2fields()
(entry.id in my case) and changed the slug = slugify(entry.title) line
to break that value down and extract the real slug.
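Roughly, the change looked like the sketch below (names and the exact URL shape
are illustrative); it assumes the entry IDs are the full article URLs with the
slug as the last path segment:

# Pull the slug out of the entry's URL instead of slugifying the title.
from urllib.parse import urlparse

def slug_from_id(entry_id):
    # The slug is the last non-empty path segment of the entry's URL.
    segments = [s for s in urlparse(entry_id).path.split('/') if s]
    return segments[-1]

# In feed2fields(), the original line
#     slug = slugify(entry.title)
# then becomes something like
#     slug = slug_from_id(entry.id)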
Adjusting the content
Now, I use tags quite liberally, and in the feed they showed up as a "Tagged
with: blah, bleh, bloh" line at the end of each article. I wrote a short script to
scrape that line from the rst files created in the previous step, add the
discovered tags to :tags: in the metadata and remove the 'Tagged with'
line. That was fun! The script is ugly and bugs were found along the way, but
it did the job and now it even works when there are so many tags on an entry
that they're spread over several lines ;)
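The gist of it looks something like the sketch below, cleaned up rather than the
actual ugly script; it assumes the imported rst files already have a :tags:
field to fill in and that the "Tagged with" paragraph sits at the end of the
article, possibly wrapped over a few lines:

import re
from pathlib import Path

TAGGED_RE = re.compile(r'^Tagged with:\s*(.*)$')

def move_tags(path):
    lines = path.read_text(encoding='utf-8').splitlines()
    kept, tag_text, collecting = [], '', False
    for line in lines:
        match = TAGGED_RE.match(line)
        if match:
            # Start of the "Tagged with: ..." paragraph.
            collecting, tag_text = True, match.group(1)
        elif collecting and line.strip():
            # The tag list wrapped onto another line; keep collecting.
            tag_text += ' ' + line.strip()
        else:
            collecting = False
            kept.append(line)
    tags = [t.strip() for t in tag_text.split(',') if t.strip()]
    if tags:
        # Drop the discovered tags into the existing :tags: metadata field.
        kept = [':tags: ' + ', '.join(tags) if l.startswith(':tags:') else l
                for l in kept]
    path.write_text('\n'.join(kept) + '\n', encoding='utf-8')

for rst in Path('content').rglob('*.rst'):
    move_tags(rst)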
Rewriting the URLs
I don't know if I should even give this a heading. Figuring out rewrite rules
was giving me cold sweats but it turns out Pelican gives you handy settings out of the
box to have your URLs look like whatever you want. It's really easy. I mean, I
don't think I broke anything?!
Except the feeds, but after some thinking that's something I decided to do on
purpose. The blog has ended up aggregated in a lot of places I don't even
remember, and I was really concerned about 8 years of entries somehow getting
newer timestamps and flooding the planets I'm on. So, brand new feeds. I'll
update the two or three planets I remember being a part of, and the others as I
find them or they find me again :)
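For the record, the relevant pelicanconf.py settings look something like this;
the patterns below are placeholders rather than my exact scheme, but these are
the knobs that do the job:

# ARTICLE_URL controls what the article links look like and ARTICLE_SAVE_AS
# where the generated file ends up; the FEED_* settings give the new feeds
# their own paths.
ARTICLE_URL = 'blog/{date:%Y}/{slug}/'
ARTICLE_SAVE_AS = 'blog/{date:%Y}/{slug}/index.html'
FEED_ALL_ATOM = 'feeds/all.atom.xml'
FEED_ALL_RSS = 'feeds/all.rss.xml'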
Going mad with sed
After putting what I had so far online in a temporary spot, a couple of
additional issues popped up:
- When the feed was imported, some of the internal URLs were copied as full
URLs rather than relative ones. That means there were a bunch of references
to http://localhost:8000, since I'd used a local copy of the feed.
- The theme, images and most of the links didn't work because they expected the
site to be served from / but I was working out of a temporary sub-directory for
the test version.
I've never used sed so much in my life. I'm going to be an expert at it for
the next three days at least, until I forget it all again. So here are some of
the commands, written down for future-me, for when how to use capture groups
becomes a distant memory:
# Fix the images!
$ for f in `grep -rl "image:: http:\/\/localhost:8000" *`; do sed -i 's/image:: http:\/\/localhost:8000/image:: {filename}/g' "$f"; done
# Fix the internal links!
$ sed -i 's/<\/blog\/[0-9]*\/[0-9]*/<{static}\/Tech/g' content/Tech/*
$ sed -i 's/\({filename.*\)\/>`__/\1.rst>`__/g' content/Tech/*  # same files as the line above
# Fix the tags!
$ for f in `grep -rl /tag/ *`; do sed -i 's/\/tag\/\(.*\)\//{tag}\1/g' $f; done
I think I had to do a bunch of other ad-hoc modifications. I also expect to
find more niggles which I'll fix as I see them, but for now I'm happy with the
current shape of things. I can't overstate how much easier this was than I
expected. The stuff that took the most time (remembering how to run the custom
blog code locally, importing tags, sedding all the things) was nearly all
self-inflicted, and the whole process was over in a couple of evenings.
Blogging from emacs
Sure feels nice.