MySQL database and OpenRefine workflow

MySQL database and OpenRefine workflow

Description: In June 2024, I took Harvey Quamen and Jon Bath’s “Databases for Humanists” course at the Digital Humanities Summer Institute. During the week, we covered topics such as database design, MySQL, basic to advanced SQL queries, and (briefly) how to connect to databases with Python or using other methods. While I had some familiarity with these topics already, the course provided me with the opportunity to design and build my own MySQL database from the ground up, which I did using my open access dataset from the Canadian Modernist Magazines Project. (You can read about the CMMP in one of my other posts).

On the final day, I even had the opportunity to demo one of my discoveries: how OpenRefine, a tool I’ve been using for some of my other work at the University of Victoria, can be used to connect to MySQL database tables, perform advanced data transformations with relative ease, or (my favourite part) take advantage of OpenRefine’s powerful but seemingly little-known “templating” feature to export my tabular dataset in the form of customized SQL INSERT statements. For example:

    /* Prefix */
INSERT INTO text (id,title,contribution_id,issue_id,pagestart,pageend) VALUES 

/* Row template*/
{{"(" + if(cells["text_id"].value==null,"",cells["text_id"].value) + if(cells["SQLsafetitle"].value==null,"",",'" + cells["SQLsafetitle"].value + "'") + if(cells["text_id"].value==null,"","," + cells["text_id"].value) + if(cells["issue_id"].value==null,"","," + cells["issue_id"].value) + if(cells["Page Range: Start"].value==null,"",",'" + cells["Page Range: Start"].value + "'") + if(cells["Page Range: End"].value==null,"",",'" + cells["Page Range: End"].value + "'") + ")"}}

/* Separator */
,

Working from a single CSV file, I was able to use similar statements for each table to create an SQL file. I then pasted the resulting text from these files into the MySQL Command Line Client, one at a time, to create and populate each of my 6 MySQL tables. The largest of these tables—corresponding to unique texts digitized and published as part of the Canadian Modernist Magazines Project, and pictured below—contained 720 rows. So I was very grateful to use this method as an alternative to manual data entry.

Deliverables: Using the model for my database’s design (above) as a guide, plus the templates I used in OpenRefine, I produced a complete MySQL database of the CMMP’s dataset. I can now query this dataset in complex ways. In the next phase of the Canadian Modernist Magazines Project, I’d like to add new data to the project’s database, including (for example) gender and age info from VIAF records, to do further and more in-depth analyses of authorship.

screenshot #2

Related Posts

Mapping Alcohol Consumption in F. Scott Fitzgerald's "May Day": A Digital Critical Edition

Mapping Alcohol Consumption in F. Scott Fitzgerald's "May Day": A Digital Critical Edition

Mapping Alcohol Consumption in F. Scott Fitzgerald’s “May Day”: A Digital Critical Edition will be an annotated edition of the 1920 public-domain text of “May Day,” featuring an integrated map, critical apparatus, and sample lesson plans. TEI mark-up of the text facilitates interactive DH assignments for my “Alcohol in/and Modernist Fiction” course.

Read More
Canadian Modernist Magazines Project

Canadian Modernist Magazines Project

The Canadian Modernist Magazines Project (CMMP) is a digital repository of modernist periodical literature published in Canada (ca. 1900-1960). Led by Principal Investigator Dr. Graham Jensen, the project’s primary aim is to digitize and transcribe a selection of Canadian literary “little magazines” so that they can be read, searched, downloaded, analyzed, and taught. As a repository of Canadian modernist periodicals, the CMMP makes important but difficult-to-access literary texts available to scholars as well as to the broader public. But the CMMP will not simply provide easy access to texts; in its next phase, it will also feature critical introductions of select Canadian modernist magazines, tools for computational analysis of the transcribed texts, sample lesson plans, and relevant syllabi.

Read More
Canadian Humanities and Social Sciences Commons

Canadian Humanities and Social Sciences Commons

The Canadian HSS Commons is in-development, national-scale, bilingual network for Canadian Humanities and Social Sciences researchers to share, access, re-purpose, and develop scholarly projects, publications, educational resources, data, and tools. It is being developed by the Implementing New Knowledge Environments Partnership with the Electronic Textual Cultures Lab, University of Victoria Systems, Compute Canada, CANARIE, the Canadian Research Knowledge Network, the Canadian Association of Research Libraries, the Federation for the Humanities and Social Sciences, the Humanities Commons, the University of Newcastle, and others.

Read More