After 8 years of maintaining my lil' custom Django blog, it's time for a
change! I'd been thinking about migrating for a while. After the first couple
of years of excitement I started falling further and further behind on framework
upgrades, and my cute anti-spam system kicked the bucket a couple of years
back, even though there never was much conversation on the blog. Drop me an
email or a tweet if you want to chat about something here :)
I'd been postponing the migration because I thought it would be real painful to
both migrate the content and keep the URL format the same, especially for a
custom platform. It turned out to be really easy. Pelican rocks!
Migrating the content
Pelican comes with an import tool that supports bland
little feeds like mine. By default my feed only displays 10 entries but since
it's my code I just modified it locally to show them all. That probably ended
up being one of the least straightforward parts of the process actually. I was
super excited about Django when I first created the blog but not too familiar
with how to manage Python dependencies. Thus, although I did write down the
dependency names in a text file, I wasn't forward-looking enough to include
version numbers. pip freeze is my friend now. Thankfully I only had a
couple of plugins to play guess-what-version with.
I did end up making a couple of changes to Pelican locally so it would work
better with my content (yay open-source).
First, to keep the <pre> code snippets from getting mangled and losing their
linebreaks, I ended up commenting out a few lines in fields2pelican()
that look like they're meant to ensure the validity of the original HTML. I was
using a wizard on the old blog, so there's no reason the HTML shouldn't already
be valid. I wasn't too worried about it and didn't notice any side-effects
during the migration.
Secondly, the files weren't created with the correct slugs and filenames, which
caused some issues when rewriting the URLs. It looks like the feed parser
doesn't look at the real slug, so I figured out where the entry's URL lives in
feed2fields()
(entry.id in my case) and changed the slug = slugify(entry.title) line
to break that value down and extract the real slug.
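Roughly, the change looked like the sketch below (names and the exact URL shape
are illustrative); it assumes the entry IDs are the full article URLs with the
slug as the last path segment:

# Pull the slug out of the entry's URL instead of slugifying the title.
from urllib.parse import urlparse

def slug_from_id(entry_id):
    # The slug is the last non-empty path segment of the entry's URL.
    segments = [s for s in urlparse(entry_id).path.split('/') if s]
    return segments[-1]

# In feed2fields(), the original line
#     slug = slugify(entry.title)
# then becomes something like
#     slug = slug_from_id(entry.id)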
Adjusting the content
Now, I use tags quite liberally, and in the feed they showed up as a "Tagged
with: blah, bleh, bloh" line at the end of each article. I wrote a short script to
scrape that line from the rst files created in the previous step, add the
discovered tags to :tags: in the metadata and remove the 'Tagged with'
line. That was fun! The script is ugly and bugs were found along the way, but
it did the job and now it even works when there are so many tags on an entry
that they're spread over several lines ;)
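The gist of it looks something like the sketch below, cleaned up rather than the
actual ugly script; it assumes the imported rst files already have a :tags:
field to fill in and that the "Tagged with" paragraph sits at the end of the
article, possibly wrapped over a few lines:

import re
from pathlib import Path

TAGGED_RE = re.compile(r'^Tagged with:\s*(.*)$')

def move_tags(path):
    lines = path.read_text(encoding='utf-8').splitlines()
    kept, tag_text, collecting = [], '', False
    for line in lines:
        match = TAGGED_RE.match(line)
        if match:
            # Start of the "Tagged with: ..." paragraph.
            collecting, tag_text = True, match.group(1)
        elif collecting and line.strip():
            # The tag list wrapped onto another line; keep collecting.
            tag_text += ' ' + line.strip()
        else:
            collecting = False
            kept.append(line)
    tags = [t.strip() for t in tag_text.split(',') if t.strip()]
    if tags:
        # Drop the discovered tags into the existing :tags: metadata field.
        kept = [':tags: ' + ', '.join(tags) if l.startswith(':tags:') else l
                for l in kept]
    path.write_text('\n'.join(kept) + '\n', encoding='utf-8')

for rst in Path('content').rglob('*.rst'):
    move_tags(rst)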
Rewriting the URLs
I don't know if I should even give this a heading. Figuring out rewrite rules
was giving me cold sweats but it turns out Pelican gives you handy settings out of the
box to have your URLs look like whatever you want. It's really easy. I mean, I
don't think I broke anything?!
Except the feeds, but after some thinking that's something I decided to do on
purpose. The blog has ended up aggregated in a lot of places I don't even
remember, and I was really concerned about 8 years of entries somehow getting
newer timestamps and flooding the planets I'm on. So, brand new feeds. I'll
update the two or three planets I remember being a part of, and the others as I
find them or they find me again :)
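For the record, the relevant pelicanconf.py settings look something like this;
the patterns below are placeholders rather than my exact scheme, but these are
the knobs that do the job:

# ARTICLE_URL controls what the article links look like and ARTICLE_SAVE_AS
# where the generated file ends up; the FEED_* settings give the new feeds
# their own paths.
ARTICLE_URL = 'blog/{date:%Y}/{slug}/'
ARTICLE_SAVE_AS = 'blog/{date:%Y}/{slug}/index.html'
FEED_ALL_ATOM = 'feeds/all.atom.xml'
FEED_ALL_RSS = 'feeds/all.rss.xml'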
Going mad with sed
After putting what I had so far online in a temporary spot, a couple of
additional issues popped up:
- When the feed was imported, some of the internal URLs were copied as full
URLs rather than relative ones. That means there were a bunch of references
to http://localhost:8000, since I'd used a local copy of the feed.
- The theme, images and most of the links didn't work because they expected the
site to be served from / but I was working out of a temporary sub-directory for
the test version.
I've never used sed so much in my life. I'm going to be an expert at it for
the next three days at least, until I forget it all again. So here are some of
the commands, written down for future-me, for when how to use capture groups
becomes a distant memory:
# Fix the images!
$ for f in `grep -rl "image:: http:\/\/localhost:8000" *`; do sed -i 's/image:: http:\/\/localhost:8000/image:: {filename}/g' "$f"; done
# Fix the internal links!
$ sed -i 's/<\/blog\/[0-9]*\/[0-9]*/<{static}\/Tech/g' content/Tech/*
$ sed -i 's/\({filename.*\)\/>`__/\1.rst>`__/g' content/Tech/*  # same files as the line above
# Fix the tags!
$ for f in `grep -rl /tag/ *`; do sed -i 's/\/tag\/\(.*\)\//{tag}\1/g' $f; done
I think I had to do a bunch of other ad-hoc modifications. I also expect to
find more niggles which I'll fix as I see them, but for now I'm happy with the
current shape of things. I can't overstate how much easier this was than I
expected. The stuff that took the most time (remembering how to run the custom
blog code locally, importing tags, sedding all the things) was nearly all
self-inflicted, and the whole process was over in a couple of evenings.
Blogging from emacs
Sure feels nice.