Matt at Keyboard Writes Code

Moving Blog to BashBlog

May 04, 2021 — Matt Forrester

I have Blog files stored in Markdown which I pre-process using remark and some custom plugins.

They all originally came from Jekyll and then Pelican, but I've become dissatisfied with both, so I wanted to move to BashBlog because I want something simple where the files are just pure Markdown.

There are three jobs in this task:

Job 1: Change Front Matter into a H1 and `Tags:`

The one off...

Jekyll has content stored as Markdown with Front Matter above which looks like the following:

---
title:        Code In PostgreSQL: Combining data from multiple tables with INNER JOIN
date:         2019-03-04 21:00:20
tags:         code-in-postgresql, javascript, postgresql
category:     postgresql
---

Welcome to my post, the H1 will automatically be written by the title in my front matter.

## This is a H2

I need to convert this into

# This is my real title

And I have content

## And other headers

Tags: but, i-have, other-tags-too

This conversion is relatively simple as a one-off via an AWK script:

BEGIN {
    TAGS=""
    CATEGORY=""
    PRINT=0
    TITLE=""
}
match($0, /^tags: +(.*)/, match_arr) {
    TAGS=match_arr[1]
    $0=""
}
match($0, /^category: +(.*)/, match_arr) {
    CATEGORY=match_arr[1]
    $0=""
}
match($0, /^title: +(.*)/, match_arr) {
    TITLE=match_arr[1]
}
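# Once the title has been printed, pass every remaining line through.
PRINT==1 { print $0 }
# The first --- after the title closes the front matter; the !PRINT guard
# stops a later --- in the body from printing the title again.
!PRINT && TITLE && match($0, /^---/) {
    print "#", TITLE
    print ""
    PRINT=1
}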
END {
    # End the Tags: line with a newline so the file finishes cleanly.
    if (TAGS && CATEGORY) {
        printf "Tags: %s, %s\n", TAGS, CATEGORY
        exit 0
    }
    if (TAGS) {
        printf "Tags: %s\n", TAGS
        exit 0
    }
    if (CATEGORY) {
        printf "Tags: %s\n", CATEGORY
        exit 0
    }
}

The basic idea is:

It pulls out the tags:, category: and title: values.
If the title is defined and you hit a line that begins with ---, print out the H1 (the title).
If the title has been printed out, print out the current line.
Finally, print out the category: and tags: as tags.

You can run this with cat your_jekyll.md | awk -f convert_post.awk and it will print the BashBlog output to STDOUT. The three-argument form of match() is a GNU awk extension, so awk needs to be gawk here.
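Because the script only starts printing after it has seen a title: line and a closing ---, a file without front matter (for example one that has already been converted) comes out with its body dropped. A small guard can help before converting. This is a minimal sketch; has_front_matter is a hypothetical helper name, not something from BashBlog or Jekyll.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the first line of FILE is exactly '---',
# i.e. the file still starts with Jekyll-style front matter.
has_front_matter() {
    local first_line=""
    # read fails on an empty file or a missing final newline; ignore that
    # and just inspect whatever was read.
    IFS= read -r first_line < "$1" || true
    [ "$first_line" = "---" ]
}

# Only convert files that still need it:
# has_front_matter your_jekyll.md && awk -f convert_post.awk your_jekyll.md
```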

Every file

Doing this for every file is relatively simple with GNU Parallel:

find 2*.md | parallel mv {} {}.x
find 2*.md.x | parallel cat {} '|' awk -f convert_post.awk '>' {.}
rm *.x
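If GNU Parallel is not available, or you would rather not juggle the .x copies, the same in-place conversion can be sketched with a temporary file. convert_in_place is a hypothetical helper; it assumes the converter reads STDIN and writes STDOUT, as convert_post.awk does, and it refuses to overwrite a post with empty output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a filter over FILE and replace FILE only when the
# filter succeeds and produces non-empty output.
# Usage: convert_in_place FILE COMMAND [ARGS...]
convert_in_place() {
    local file="$1" tmp
    shift
    tmp="$(mktemp)" || return 1
    if "$@" < "$file" > "$tmp" && [ -s "$tmp" ]; then
        # cat keeps the original file's permissions, unlike mv from mktemp.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        echo "left unchanged: $file" >&2
        return 1
    fi
}

# Example loop over the posts in the current directory:
# for f in 2*.md; do convert_in_place "$f" awk -f convert_post.awk; done
```

Since the files are processed one at a time, this is slower than the parallel version, which hardly matters for a one-off.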

Job 2: File names with multiple periods (.)

Files such as name.with.dots.md get every period except the one in .md replaced with an underscore:

function multi_dot() {
    FILENAME="$1"
    FILENAME_NO_EXT="$(echo "$FILENAME" | sed 's/\.md$//')"
    NEW_FILENAME="$(echo "$FILENAME_NO_EXT" | sed 's/\./_/g').md"
    mv "$FILENAME" "$NEW_FILENAME"
}
export -f multi_dot

find *.*.md | parallel multi_dot
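Before renaming anything, it can be reassuring to preview the new names. The sketch below does the same substitution with Bash parameter expansion instead of sed and only prints the result; dots_to_underscores is a hypothetical name, and the file name in the usage line is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the new name for FILE without renaming anything.
# Every '.' in the base name becomes '_'; the final '.md' is kept.
dots_to_underscores() {
    local name="${1%.md}"            # strip the trailing .md
    printf '%s.md\n' "${name//./_}"  # replace every literal dot
}

# dots_to_underscores 2019-07-24-ndjson.env.md
# prints: 2019-07-24-ndjson_env.md
```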

Job 3: Correct Dates

Given I have files like 2019-07-24-ndjson-env.md, which encode the date in the file name, this shouldn't be too hard, but BashBlog wants to store the date within a comment in the HTML file that looks like <!-- bashblog_timestamp: #201907240830.01# -->.

If I want to import my data I can copy the Markdown file into the working directory and run bb.sh edit MARKDOWN_FILE.md, which will open up my $EDITOR and, upon save, generate an HTML file with the timestamp of $(date).

To do a mass import I will need to do some thinking:

Generating the HTML for blog post

If I run bb.sh edit 2019-07-24-ndjson-env.md and there is no 2019-07-24-ndjson-env.html, BashBlog will exit with the status code 255 and the following output:

$ ./bb.sh edit 2019-07-24-ndjson-env.md
Can't edit post 2019-07-24-ndjson-env.html, did you mean to use "bb.sh.sh post <draft_file>"?

This is easily solved by running touch 2019-07-24-ndjson-env.html and then re-running:

$ touch 2019-07-24-ndjson-env.html
$ bb.sh edit 2019-07-24-ndjson-env.md
MARKDOWN: /home/fozz/Projects/plain-text-blog/markdown
Posted 2019-07-24-ndjson-env.html
Rebuilding tag pages .
Rebuilding the index ........
Creating an index page with all the posts ........
Creating an index page with all the tags ...
Making RSS ........

Which is a good process, but I don't want to re-open my $EDITOR for every post...

The fake `$EDITOR`

If BashBlog always uses $EDITOR to let you edit a post before publishing, why not set $EDITOR to something which takes a file as a parameter and returns an exit code of 0, but is not interactive... something like echo?

So our bb.sh edit ... command now looks like env EDITOR=echo bb.sh edit 2019-07-24-ndjson-env.md
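The trick is easy to try in isolation. In this sketch, open_in_editor is a hypothetical stand-in for any tool that hands a file to $EDITOR; it illustrates the general pattern and is an assumption, not BashBlog's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a tool that opens FILE in $EDITOR
# (falls back to vi when EDITOR is unset).
open_in_editor() {
    "${EDITOR:-vi}" "$1"
}

# With EDITOR=echo the "editor" prints the file name and returns at once
# with exit code 0, so nothing waits for a human:
# EDITOR=echo open_in_editor draft.md
# prints: draft.md
```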

The Date

Having done all this, BashBlog will still think the date of the post is now, because it has no logic to read the date from the filename, only from the bashblog_timestamp comment within the HTML file (falling back to $(date)).

The following command will use sed to replace the date (YOURDATE is a placeholder):

sed 's/\(<!-- bashblog_timestamp: #\).*\(# -->\)/\1YOURDATE\2/' -i 2019-07-24-ndjson-env.html
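The -i flag placed after the script works with GNU sed. A slightly more defensive variant might wrap the same substitution in a function that validates the timestamp first and avoids -i altogether. This is a sketch only: set_bashblog_timestamp is a hypothetical name, and the YYYYMMDDhhmm.ss shape is assumed from the dates used in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the bashblog_timestamp comment in HTML_FILE.
# Usage: set_bashblog_timestamp HTML_FILE TIMESTAMP
set_bashblog_timestamp() {
    local html="$1" stamp="$2" tmp
    # Accept only 12 digits, a dot and 2 digits (assumed format), so the
    # value is safe to splice into the sed replacement.
    [[ "$stamp" =~ ^[0-9]{12}\.[0-9]{2}$ ]] || return 1
    tmp="$(mktemp)" || return 1
    sed 's/\(<!-- bashblog_timestamp: #\).*\(# -->\)/\1'"$stamp"'\2/' "$html" > "$tmp" &&
        cat "$tmp" > "$html"
    local status=$?
    rm -f "$tmp"
    return "$status"
}

# set_bashblog_timestamp 2019-07-24-ndjson-env.html 201907240830.01
```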

One Blog Post

To restore just one Blog post we would need to run the following commands:

DATE="$(echo "2019-07-24-ndjson-env.md" | cut -d '-' -f 1,2,3 | sed 's/\-//g')0830.01"
touch "2019-07-24-ndjson-env.html"
env EDITOR=echo bb.sh edit "2019-07-24-ndjson-env.md"
sed 's/\(<!-- bashblog_timestamp: #\).*\(# -->\)/\1'"$DATE"'\2/' -i "2019-07-24-ndjson-env.html"
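The cut and sed pipeline happily produces a "date" from any file name, dated or not. A stricter sketch using a Bash regular expression is shown below; timestamp_from_filename is a hypothetical name, and it keeps the same fixed 0830.01 time of day used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a leading YYYY-MM-DD- in a file name into a
# YYYYMMDD0830.01 timestamp; fail when the name has no date prefix.
timestamp_from_filename() {
    local name
    name="$(basename "$1")"
    if [[ "$name" =~ ^([0-9]{4})-([0-9]{2})-([0-9]{2})- ]]; then
        printf '%s%s%s0830.01\n' \
            "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        echo "no date prefix: $name" >&2
        return 1
    fi
}

# timestamp_from_filename 2019-07-24-ndjson-env.md
# prints: 201907240830.01
```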

But after this the post will still display the current date in its HTML, and if you run bb.sh edit 2019-07-24-ndjson-env.md again the date will be reset, because for bb.sh edit the date of the file still takes precedence.

The clue to fixing this is in the changelog, which says:

2.7 Store post date on a comment in the html file (#96). On rebuild, the post date will be synchronised between comment date and file date, with precedence for comment date.

So all we need to do is run bb.sh rebuild, which changes the date of the file to the bashblog_timestamp and updates the text of the HTML to the correct date.
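To check that a post carries the timestamp you expect, you can read the comment back out of the HTML. This is a minimal sketch; read_bashblog_timestamp is a hypothetical helper, not a BashBlog command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value between the '#' marks of the first
# bashblog_timestamp comment in HTML_FILE (prints nothing if there is none).
read_bashblog_timestamp() {
    sed -n 's/.*<!-- bashblog_timestamp: #\(.*\)# -->.*/\1/p' "$1" | head -n 1
}

# List every dated post next to its stored timestamp:
# for f in 2*.html; do printf '%s %s\n' "$f" "$(read_bashblog_timestamp "$f")"; done
```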

Many posts

This is all great, but I need to update a lot of blog posts, so I'll turn to my old friend GNU Parallel.

First, put the above into a script called publish-dated-md:

#!/bin/bash

set -euo pipefail
IFS=$'\n\t'

echo "= BEGIN ================="
FILENAME="$1"
echo "PROCESSING FILE: $FILENAME"
FILENAME_HTML="$(echo "$FILENAME" | sed 's/md$/html/')"
DATE="$(echo "$FILENAME" | cut -d '-' -f 1,2,3 | sed 's/\-//g')0830.01"
touch "$FILENAME_HTML"
echo "== BB START =="
env EDITOR=echo bb.sh edit "$FILENAME"
echo "== BB END ===="
sed 's/\(<!-- bashblog_timestamp: #\).*\(# -->\)/\1'"$DATE"'\2/' -i "$FILENAME_HTML"
echo "= END ==================="

And then running it on all the files:

find *.md | grep '^[0-9]\{4\}' | parallel -j1 --halt now,fail=1 publish-dated-md {}
bb.sh rebuild

This does pretty much the same as the "One Blog Post" section, but uses GNU Parallel to process all the files and then runs bb.sh rebuild at the end to clean it all up.
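Since --halt now,fail=1 stops at the first failure, it can be worth previewing which files will be picked up before the real run. The sketch below is a stricter take on the find and grep filter, matching a full YYYY-MM-DD- prefix; list_dated_posts is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Markdown files in DIR (default: current
# directory) whose names start with YYYY-MM-DD-, one per line.
list_dated_posts() {
    local dir="${1:-.}" f
    for f in "$dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-*.md; do
        # When nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] && basename "$f"
    done
    return 0
}

# Dry run: see what would be published.
# list_dated_posts .
```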

Tags: blogging, bashblog, bash, gnu-parallel
