First, we make zero assumptions about your stack. You've used some static site generator (Gatsby, Hugo, 11ty, Sapper, whatever's hot today) and now you need a sitemap.
We need to start with a file containing all the URLs we care about. To get there, we find all the .html files in our output directory (mine is named public). We can use find to do this, but fd is easier to use and explain: -e specifies the extension (html) and --base-directory runs the search from inside the given directory, which changes the paths that get printed.
fd -e html --base-directory public
With --base-directory the output doesn't include the path we took to get to the directory we care about, which means the output is the same wherever we run the command from, as long as the path we pass points at the output directory.
why-use-discord-for-open-communities.html
wtf-is-kubernetes.html
yarn-workspace-nohoist-an-entire-package-s-dependencies.html
yarn-workspace-nohoist.html
your-first-crdt.html
To get the same output from find, which is more commonly installed on systems, we need to use a glob pattern with -name and then pipe to sed to remove the directory prefix. Note that we're using , as the separator in our sed command here so that we don't have to escape slashes.
find ./public -name '*.html' | sed -e 's,^\./public/,,'
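To see the find and sed combination working without touching a real build, here is a minimal sketch that runs it against a scratch directory. The helper name list_html_pages and the file names are made up for illustration. It also uses cd in a subshell rather than hard-coding ./public in the sed pattern, which is an assumption on my part and not the command above.

```shell
# Hypothetical helper: list .html files under a directory, relative to it.
list_html_pages() {
  # The subshell keeps the cd from leaking into the caller's shell.
  (
    cd "$1" || exit 1
    # find prints ./about.html, ./posts/first.html, ...; sed drops the "./".
    find . -type f -name '*.html' | sed -e 's,^\./,,' | sort
  )
}

# Try it against a scratch directory instead of a real build.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/posts"
touch "$demo_dir/about.html" "$demo_dir/posts/first.html" "$demo_dir/style.css"
list_html_pages "$demo_dir"
rm -rf "$demo_dir"
```

This should print about.html and posts/first.html and skip style.css. Against a real build, list_html_pages public would play the same role as the fd command.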
Now that the output is a list of file paths, we need to strip .html off each path and prefix our site's domain to each line. I like AWK for this, although you could also use more sed commands.
fd -e html --base-directory public | awk -F '.' '{print "https://christopherbiscardi.com/"$1}'
What this AWK script is doing is going through the input line by line. -F is being used to specify . as the field separator, because AWK chops each line up into fields much like a CSV parser would (the real default separator is whitespace, but it works the same way). This means that from a file named post/something.html you'll get back two values: post/something and html. A file name with more than one dot would be cut at the first one.
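A quick way to check the field splitting is to feed AWK a single path by hand. The sketch below does that, and also shows a variant for file names that contain extra dots. The helper name strip_html_ext and the sample paths are made up for illustration.

```shell
# Print both fields AWK produces when "." is the separator.
printf '%s\n' 'post/something.html' | awk -F '.' '{ print "1=" $1 " 2=" $2 }'

# Hypothetical helper: remove only a trailing ".html",
# so dots earlier in the name survive.
strip_html_ext() {
  awk '{ sub(/\.html$/, ""); print }'
}

# Splitting on "." cuts at the first dot...
printf '%s\n' 'notes/v1.2-release.html' | awk -F '.' '{ print $1 }'
# ...while sub() only touches the suffix.
printf '%s\n' 'notes/v1.2-release.html' | strip_html_ext
```

The three commands print 1=post/something 2=html, then notes/v1, then notes/v1.2-release. If none of your file names contain extra dots, the -F '.' version is all you need.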
With the separator taken care of, we can format the output. This stays a "one-liner" because AWK's defaults handle reading each line and splitting it for us. We print our domain ahead of $1, which is the first field: the file's path and name without the extension.
We end up with a list like this.
https://christopherbiscardi.com/what-is-dynamo-db
https://christopherbiscardi.com/what-s-next-for-react-based-products
https://christopherbiscardi.com/why-use-discord-for-open-communities
https://christopherbiscardi.com/wtf-is-kubernetes
https://christopherbiscardi.com/yarn-workspace-nohoist-an-entire-package-s-dependencies
https://christopherbiscardi.com/yarn-workspace-nohoist
https://christopherbiscardi.com/your-first-crdt
We can then redirect this output to a file or into our clipboard (pbcopy, xclip), etc. In this case we use > urls.txt to write the list of URLs into a file named urls.txt.
fd -e html --base-directory public | awk -F '.' '{print "https://christopherbiscardi.com/"$1}' > urls.txt
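If you'd rather stay in sed for the URL step, here is a rough sketch of a filter that reads relative paths on stdin and prints URLs. The helper name paths_to_urls and the https://example.com origin are placeholders of mine. It assumes the origin contains no commas or ampersands, since it gets pasted straight into the sed replacement.

```shell
# Hypothetical helper: turn relative .html paths on stdin into URLs.
# $1 = site origin without a trailing slash (assumed to contain no "," or "&").
paths_to_urls() {
  # First expression drops the .html suffix, second prepends the origin.
  sed -e 's,\.html$,,' -e "s,^,$1/,"
}

# Demo with hand-written paths instead of a real build.
printf '%s\n' 'wtf-is-kubernetes.html' 'posts/your-first-crdt.html' |
  paths_to_urls 'https://example.com'
```

The demo prints https://example.com/wtf-is-kubernetes and https://example.com/posts/your-first-crdt. In the real pipeline the filter would sit where the AWK command is and still be followed by > urls.txt.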
Given the list of URLs we just generated, we can drop in a basic sitemap using npx and an npm package called sitemap. We feed urls.txt into npx sitemap and redirect the output to sitemap.xml.
npx sitemap < urls.txt > sitemap.xml
You'll end up with a sitemap.xml file that contains something similar to the following.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url><loc>https://christopherbiscardi.com/30x500-notes</loc></url>
  <url><loc>https://christopherbiscardi.com/30x500-safari-1</loc></url>
  <url><loc>https://christopherbiscardi.com/30x500-safari-2-old-todo</loc></url>
  <url><loc>https://christopherbiscardi.com/30x500-safari-2</loc></url>
  <url><loc>https://christopherbiscardi.com/7guis-recoil-js-counter</loc></url>
  <url><loc>https://christopherbiscardi.com/7guis-recoil-js-temperature-converter</loc></url>
  <url><loc>https://christopherbiscardi.com/a-css-in-js-of-my-own</loc></url>
  <url><loc>https://christopherbiscardi.com/a-modern-copy-button</loc></url>
  <url><loc>https://christopherbiscardi.com/adjacency-lists-in-dynamodb</loc></url>
  <url><loc>https://christopherbiscardi.com/amplify-and-appsync</loc></url>
  <url><loc>https://christopherbiscardi.com/authoring-stylis-plugins</loc></url>
  <url><loc>https://christopherbiscardi.com/aws-app-sync-without-amplify</loc></url>
  <url><loc>https://christopherbiscardi.com/build-time-code-blocks-with-rehype-prism-and-mdx</loc></url>
  <url><loc>https://christopherbiscardi.com/building-an-mdx-preview-app-with-electron</loc></url>
</urlset>
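One cheap sanity check on the result is to confirm the sitemap has one loc entry per line of the URL list. This is a sketch of that check, not something the sitemap package provides. The helper name check_sitemap_count and the sample files are made up, and it relies on grep -o, which GNU and BSD grep both support.

```shell
# Hypothetical helper: succeed only when the sitemap has one <loc> per URL line.
# $1 = URL list, $2 = sitemap file.
check_sitemap_count() {
  # Count non-empty lines in the URL list.
  expected=$(grep -c . "$1")
  # Count <loc> tags; tr strips the padding some wc versions add.
  actual=$(grep -o '<loc>' "$2" | wc -l | tr -d ' ')
  [ "$expected" -eq "$actual" ]
}

# Demo with two hand-written files instead of real output.
tmp=$(mktemp -d)
printf '%s\n' 'https://example.com/a' 'https://example.com/b' > "$tmp/urls.txt"
printf '%s\n' '<urlset><url><loc>https://example.com/a</loc></url><url><loc>https://example.com/b</loc></url></urlset>' > "$tmp/sitemap.xml"
check_sitemap_count "$tmp/urls.txt" "$tmp/sitemap.xml" && echo "counts match"
rm -rf "$tmp"
```

Against the real files this would be check_sitemap_count urls.txt sitemap.xml, which exits non-zero if an entry went missing along the way.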