With ShipItCon 2024 fast approaching (in 2 days!!), it's time for me to admit I will never finish writing up my notes from last year's and just post what I do have. Perhaps this year my note-taking skills will improve?
ShipItCon is a small non-profit conference in Dublin that focuses on software delivery. The conference is a single track on a single day, and the theme in 2023 was "The Unknown" (observability came up A LOT!). Here are a few of the highlights, from my perspective (and from my very, very partial notes).
The 2023 recordings are available now.
Keynote: Dr Norah Patten
An inspiring journey to become Ireland's first astronaut! She has also done a very impressive, massive amount of outreach work, including writing a children's book, Shooting for the Stars!
I liked how she broke down several stages of her life into categories: Influences (e.g. teachers, family) / Role models / Peers / Experience (e.g. a visit to NASA as a child), and how each contributed and came together to get her to where she is now.
Some cool outreach projects I didn't know about:
- Using Nanoracks, students plan experiments for astronauts then examine the results.
- PoSSUM 13 for researchers, and the microgravity challenge for payload experiments in space/suborbital/microgravity. High school students apply with detailed plans of the science experiment they want to run, and if successful get to build the payload themselves with mentoring, etc.
What's Slowing Your Team Down? (Laura Tacho)
After observing hundreds of teams, the speaker found that every team has the same problems, divided into three categories: projects, processes, people.
The speaker uses thousands of metrics to identify issues. Measuring doesn't solve the problem, but it helps track progress and "quantify sentiment", which I found really interesting. In general, I found this talk incredibly informative and valuable, and I recommend the recording.
1. Projects
- Too much work in progress
This is the biggest problem for most teams: it makes it hard to get things done or finish anything.
Measures that can help: number of projects per person (engineer cognitive load), ...
- Lack of prioritisation
No means "not now"; yes means "forever" (with an ever-increasing maintenance cost).
The goal should be specific enough that teams can easily say no to things.
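As a toy sketch of the "number of projects per person" measure mentioned above (the data shape and all names here are made up for illustration, not from the talk):

```python
from collections import Counter


def wip_per_person(assignments):
    """Count in-progress projects per engineer.

    assignments: list of (engineer, project) pairs for work
    that is currently in progress (hypothetical data shape).
    Returns a dict mapping each engineer to their WIP count,
    a rough proxy for cognitive load.
    """
    counts = Counter(engineer for engineer, _ in assignments)
    return dict(counts)


# Hypothetical example: Alice is spread across three projects.
assignments = [
    ("alice", "billing"),
    ("alice", "search"),
    ("alice", "migration"),
    ("bob", "search"),
]
print(wip_per_person(assignments))  # {'alice': 3, 'bob': 1}
```

In practice a team would pull these pairs from their issue tracker rather than a hard-coded list; the point of the measure is to spot people spread across too many projects at once.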
2. Process
- Slow feedback (from people or machines)
- Not enough focus time
Measures that can help: length and frequency of focus-time chunks, number of meetings where something is actually at stake, ...
3. People
- Unclear expectations
Giving expectations vs the fear of micromanagement; receiving expectations vs the fear of seeming incompetent, on psychologically unsafe teams.
(This can set up a negative cycle where one leads to the other and back.)
Reading recommendations: "Accelerate" (about DORA metrics), the SPACE of Developer Productivity, DevEx: What Actually Drives Productivity (interesting, although at least one of the main authors is selling something related).
I imagine the recording is available now, and the slides are also interesting for a more detailed summary with many more examples of measures for each category, and a link to the kind of developer survey they used to "quantify sentiment."
Building scalable GenAI Retrieval Augmented Generation Platforms (Mihai Criveti)
LLMs and their limitations: they generate text, don't learn from interactions, are AI functions that predict the next word, and don't access the Internet.
Models with 70 billion parameters cost around $20k/month for a dedicated GPU instance on Amazon.
The slides have more information on how this works and on the limitations the speaker found.
How to Make Your Automation a Better Team Player (Laura Nolan)
This felt particularly relevant to my work. The talk was really engaging because the speaker always links every issue back to famous incidents or incidents she personally experienced. The slides also include these examples.
"Joint Cognitive Systems" where computers and humans collaborate on cognitive work.
Paper recommendation: Ironies of Automation by Lisanne Bainbridge. A 40-year-old paper, and these problems still haven't been solved!
The problem she described goes like this:
- The easy stuff is automated first
- New people only ever learn the hard stuff, and develop a partial mental model of how things work
- Life becomes harder for operators because they need to learn both how things work + how the automation works.
We are doing a bad job if the operator can shoot themselves in the foot with our tools.
Two anti-patterns to avoid!
1. Automation surprises
Automation via lots of scattered scripts, tools, and cronjobs is hard to manage. Centralising into a self-running/autonomous automation helps, where possible.
The behaviour should be as simple and as predictable as possible.
Unattended upgrades are a bad idea, silent and deadly.
Incident examples (although there are many! Linked in the slides, all informative):
- A routine GitOps operation (link now defunct :() that caused a namespace to be deleted when the ArgoCD system lost visibility of the YAML, bringing down the entire site.
- Unattended upgrade disaster at DataDog
2. Recommendation systems (e.g. to assist in troubleshooting)
They make it harder to generate alternative hypotheses. Using softer language may help, like "in the past, problems like this were caused by X."
Tools as amplifiers: help with understanding systems, as opposed to providing suggestions. They shouldn't hide complexity. Help operators to build a mental model of the system.
The slides include links to more advice and lots of incident examples.
And that's it, because the talks were too interesting and I'm apparently unable to be concise when taking notes. The conference is really excellent at inviting deeply knowledgeable, deeply interesting speakers, and I'm looking forward to the next one. Thankfully I won't have to wait long ;)