Published: Sun 07 May 2023
In Tech.
tags: python
In early 2020, it became really difficult to read books. I'm slowly starting to
get past that at last, but while Goodreads was super fun for a few years, right
now tracking and counting and rating just makes reading feel like a chore and
Not Fun as a hobby.
I still would like to keep a record of what I read, though. And I used to write
book reviews on this blog before getting onto
Goodreads. It seems like a nice way to keep a record of what I read without it
becoming a number thing. I'll probably still write Goodreads reviews for small
authors since I know it can help, but I feel less pressed about having
everything there.
I also wanted to save the reviews I wrote only on Goodreads here, and wrote a
script to migrate my reviews into Pelican-friendly Markdown pages since that's
what powers this blog now. I decided to keep to a single entry per year
rather than one entry per book, since I had a few good reading years in there
(and others with only a single review!)
Step 1: CSV export of the books
First, if you go to 'My books' and find the 'Tools' menu at the bottom of the
left-side menu on Goodreads, you'll find a page to export your library. You may
have to try a couple of times: my first export only had a handful of books, and
the second one looked more comprehensive, although the number of books was
still off by two, but what can you do.
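If you want a quick check of how complete an export is before going further, counting rows per shelf and comparing with the numbers on the site is one option. This is only a sketch: the `count_by_shelf` helper is made up for illustration, the 'Exclusive Shelf' column name is an assumption about the export header, and the sample data is invented.

```python
import csv
import io
from collections import Counter


def count_by_shelf(csvfile):
    """Count exported rows per shelf, to compare with the totals on the site."""
    reader = csv.DictReader(csvfile)
    # 'Exclusive Shelf' is an assumed column name: check your export's header.
    # Rows without that column (or with an empty value) are counted as 'unknown'.
    return Counter(row.get('Exclusive Shelf') or 'unknown' for row in reader)


# Tiny made-up sample standing in for goodreads_library_export.csv
sample = io.StringIO(
    "Title,Author,Exclusive Shelf\n"
    "Book A,Author One,read\n"
    "Book B,Author Two,to-read\n"
    "Book C,Author Three,read\n"
)
for shelf, count in count_by_shelf(sample).items():
    print(f"{shelf}: {count}")
```

With the real file, you would pass `open('goodreads_library_export.csv', newline='')` instead of the sample.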
Step 2: Python script to create the Markdown pages
This is the script I wrote to extract only the books I've actually read. I
don't really care about DNF (did not finish) or to-read books right now. I also
hardcoded the years relevant to me. May someone find something helpful in here!
import csv
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Review:
    title: str
    author: str
    date_read: str  # kept as the raw string from the CSV
    review: str
    rating: int


def get_reviews(year):
    reviews = []
    # newline='' lets the csv module handle line breaks inside quoted reviews
    with open('goodreads_library_export.csv', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            try:
                r = Review(row['Title'],
                           row['Author'],
                           row['Date Read'],
                           row['My Review'],
                           int(row['My Rating']))
            except ValueError:
                # When the int() cast fails, usually it means the CSV is
                # corrupted for that line. In my case, it was for a few to-read
                # records so I ignore them rather than attempt to fix the
                # original CSV. You can print the row here if you want to check
                # what's failing.
                # Skip the row: otherwise r would be undefined, or left over
                # from the previous row.
                continue
            # year is a string, so this is a substring match on the date
            if year is None or year in r.date_read:
                reviews.append(r)
    return reviews


def rating_or_review(review):
    # If I wrote a review, return that
    if review.review:
        return review.review
    # Otherwise, make the rating into words.
    if review.rating >= 4:
        return "I really enjoyed it."
    elif review.rating == 3:
        return "It was fine."
    else:
        return "Wasn't for me."


def format_reviews(reviews, year):
    with open(f'book-reviews-{year}.md', 'w') as f:
        # Pelican metadata header
        f.write(f"Title: Book reviews: Year {year}\n")
        f.write(f"Date: {datetime.now().isoformat()}\n")
        f.write("tags: book review\n\n")
        for r in reviews:
            # A couple of abandoned books sneaked in with a '0' rating, and I'm
            # not interested in preserving those
            if r.rating != 0:
                f.write(f"## {r.title} by {r.author}\n\n")
                f.write(f"{rating_or_review(r)}\n")
                f.write("\n")


# One Markdown page per year, 2013 to 2022 inclusive
for year in range(2013, 2023):
    reviews = get_reviews(str(year))
    # Chronological order
    reviews = sorted(reviews, key=lambda r: r.date_read)
    format_reviews(reviews, year)
The hardest part was probably to decide what text to convert a rating into,
since I didn't want to keep numbers!
Step 3: Checking the output looks right and recalling fond memories
I used the 'Year in Books' pages on Goodreads to compare the results. There was
some funkiness sometimes, like a book read in 2011 showing in 'Year in Books'
but without any shelves and with a date read showing as 2020, even though I
don't remember messing with it. The review also shows as Jan 2020 on the
Goodreads UI despite appearing in the correct 'Year in Books'. 2020 turned out
to be the date_added value (which is definitely false) while the date_read
field is empty. Maybe some data migration funkiness on the Goodreads side at
some point during the last 12 years. Otherwise, there was a duplicate once, and
a couple of intra-Goodreads links that didn't work.
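A book with an empty date_read never matches any year in the script, so it silently drops out of the generated pages. A small check like the one below could list those rows so they can be fixed by hand. It is a sketch under assumptions: `find_undated_reads` is a hypothetical helper, 'Date Added' is my guess at the column header behind date_added, and the sample rows are invented.

```python
import csv
import io


def find_undated_reads(csvfile):
    """Return (title, date added) for rated or reviewed rows with no 'Date Read'."""
    undated = []
    for row in csv.DictReader(csvfile):
        # A missing or empty rating is treated the same as '0' (not rated)
        rated = (row.get('My Rating') or '0').strip() not in ('', '0')
        reviewed = bool((row.get('My Review') or '').strip())
        has_date = bool((row.get('Date Read') or '').strip())
        if (rated or reviewed) and not has_date:
            # 'Date Added' is an assumed column name, shown only as a hint
            undated.append((row['Title'], row.get('Date Added') or ''))
    return undated


# Made-up rows standing in for goodreads_library_export.csv
sample = io.StringIO(
    "Title,Author,My Rating,My Review,Date Read,Date Added\n"
    "Old Favourite,Author One,5,Loved it,,2020/01/05\n"
    "Recent Read,Author Two,4,,2021/03/02,2021/02/20\n"
    "Someday,Author Three,0,,,2022/06/01\n"
)
for title, added in find_undated_reads(sample):
    print(f"No read date: {title} (added {added})")
```

Unrated, unreviewed rows are left out on purpose, since those are most likely to-read entries rather than books with a lost date.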
I still have to clean up the file for 2022. I was getting annoyed with tracking
myself so I didn't write reviews, but if it's for the blog I wouldn't mind
adding a few notes. And I need to decide if I want to post my 2023 reviews as I
go, or batch them in some way!