MySQL database and OpenRefine workflow

MySQL database and OpenRefine workflow

Description: In June 2024, I took Harvey Quamen and Jon Bath’s “Databases for Humanists” course at the Digital Humanities Summer Institute . During the week, we covered topics such as database design, MySQL, basic to advanced SQL queries, and (briefly) how to connect to databases with Python or using other methods. While I had some familiarity with these topics already, the course provided me with the opportunity to design and build my own MySQL database from the ground up, which I did using my open access dataset from the Canadian Modernist Magazines Project . (You can read about the CMMP in one of my other posts ).

On the final day, I even had the opportunity to demo one of my discoveries: how OpenRefine, a tool I’ve been using for some of my other work at the University of Victoria, can be used to connect to MySQL database tables, perform advanced data transformations with relative ease, or (my favourite part) take advantage of OpenRefine’s powerful but seemingly little-known “templating” feature to export my tabular dataset in the form of customized SQL INSERT statements. For example:

    /* Prefix */
INSERT INTO text (id,title,contribution_id,issue_id,pagestart,pageend) VALUES 

/* Row template*/
{{"(" + if(cells["text_id"].value==null,"",cells["text_id"].value) + if(cells["SQLsafetitle"].value==null,"",",'" + cells["SQLsafetitle"].value + "'") + if(cells["text_id"].value==null,"","," + cells["text_id"].value) + if(cells["issue_id"].value==null,"","," + cells["issue_id"].value) + if(cells["Page Range: Start"].value==null,"",",'" + cells["Page Range: Start"].value + "'") + if(cells["Page Range: End"].value==null,"",",'" + cells["Page Range: End"].value + "'") + ")"}}

/* Separator */
,

Working from a single CSV file, I was able to use similar statements for each table to create an SQL file. I then pasted the resulting text from these files into the MySQL Command Line Client, one at a time, to create and populate each of my 6 MySQL tables. The largest of these tables—corresponding to unique texts digitized and published as part of the Canadian Modernist Magazines Project, and pictured below—contained 720 rows. So I was very grateful to use this method as an alternative to manual data entry.

Deliverables: Using the model for my database’s design (above) as a guide, plus the templates I used in OpenRefine, I produced a complete MySQL database of the CMMP’s dataset. I can now query this dataset in complex ways. In the next phase of the Canadian Modernist Magazines Project, I’d like to add new data to the project’s database, including (for example) gender and age info from VIAF records, to do further and more in-depth analyses of authorship.

screenshot #2

Related Posts

Canadian Humanities and Social Sciences Commons

Canadian Humanities and Social Sciences Commons

The Canadian HSS Commons is in-development, national-scale, bilingual network for Canadian Humanities and Social Sciences researchers to share, access, re-purpose, and develop scholarly projects, publications, educational resources, data, and tools.

Read More
Canadian Modernist Magazines Project

Canadian Modernist Magazines Project

The Canadian Modernist Magazines Project (CMMP) is a digital repository of modernist periodical literature published in Canada (ca.

Read More
Media Manifesto

Media Manifesto

Media Manifesto is a web design and development company. Originally founded in Alberta in 2001 by Graham Jensen, it was sold in 2011 to an investor in Saskatoon, Saskatchewan.

Read More