The Summer of Japanese Puppets, Part 4

#d3js   #lunr   #jekyll

This post is part 4 of 4 in a series. Feel free to skip around to:

part 1: the task,
part 2: data transformation, or
part 3: the site.


epilogue




The demo!

The mostly finished demo has directories of plays, productions, performances, authors, performers, characters, kashira, scenes, image tags, slide images, image albums, and realia images, with individual layouts displaying and linking object data together.

It is navigable through the above directory listings, through several dynamic search boxes running Lunr.js client-side, and via clickable D3.js data visualizations. It handles relatively massive image sets by lazy-loading images in a jQuery carousel.

To learn more about implementing D3 visualizations in Jekyll, you can check out this post. And a post on building multi-language Lunr indexes in Jekyll should be coming soon!




tl;dr.

   Started with a CakePHP site powered by a relational MySQL database.

   Dumped the MySQL tables to CSVs.

   Imported CSVs into IPython as Pandas dataframes.

   Merged relational data (from CSV join tables) onto dataframes by type.

   Exported dataframes as JSON records (and CSVs, for archival purposes only).

   Dropped null key:value pairs from JSON using jq.

   Converted the (null-free) JSON to YAML using PyYAML.

   Generated Jekyll collections (and pages) from YAML using the pagemaster Jekyll plugin.

   Ended with a ~40k page static Jekyll site powered by YAML data, with a JSON index for client-side search.
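
For a feel of what the middle of that pipeline does to the data, here is a minimal sketch of the merge and null-dropping steps. It is not the project's actual code: it uses plain standard-library Python as a stand-in for Pandas and jq, and the table names, column names, and sample rows (`PLAYS_CSV`, `AUTHORS_CSV`, `PLAYS_AUTHORS_CSV`, "Play A", and so on) are all hypothetical placeholders.

```python
import csv
import io
import json

# Hypothetical stand-ins for the CSV dumps: one table per object type,
# plus a join table linking plays to authors.
PLAYS_CSV = "id,title,year\n1,Play A,1748\n2,Play B,\n"
AUTHORS_CSV = "id,name\n10,Author X\n11,Author Y\n"
PLAYS_AUTHORS_CSV = "play_id,author_id\n1,10\n1,11\n"


def read_rows(text):
    """Parse CSV text into a list of dicts, turning empty cells into None."""
    return [
        {key: (value if value != "" else None) for key, value in row.items()}
        for row in csv.DictReader(io.StringIO(text))
    ]


def merge_authors(plays, authors, links):
    """Attach a list of author names to each play via the join table."""
    names_by_id = {author["id"]: author["name"] for author in authors}
    merged = []
    for play in plays:
        names = [
            names_by_id[link["author_id"]]
            for link in links
            if link["play_id"] == play["id"]
        ]
        # A play with no linked authors gets None, so the key is dropped later.
        merged.append({**play, "authors": names or None})
    return merged


def drop_nulls(record):
    """Remove keys whose value is None (the job jq did in the real pipeline)."""
    return {key: value for key, value in record.items() if value is not None}


plays = merge_authors(
    read_rows(PLAYS_CSV), read_rows(AUTHORS_CSV), read_rows(PLAYS_AUTHORS_CSV)
)
records = [drop_nulls(play) for play in plays]
print(json.dumps(records, indent=2))
```

Each resulting record carries its related data inline and has no empty keys, which is the shape that converts cleanly to YAML (for example with PyYAML's `yaml.safe_dump`) for a page generator to consume.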


last thoughts/notes

   Jekyll is shockingly powerful, but WOW is it slow when building a site at this scale.

   Having your data set at-the-ready in clean JSON is great for the long term, and leaves plenty of room for others to play with/visualize it.

   Processing data in IPython with Pandas is super easy, and the notebooks can double as documentation for what you’ve done.

   Apparently you can just throw ~40k pages at GitHub Pages without a hitch…?