Rhetmap.org: Composing Data for Future Re-Use and Visualization

Contributors: Chris Lindgren and Jim Ridolfo (alphabetic authorship)
Affiliation: Virginia Tech and University of Kentucky
Email: lindgren at vt.edu, jimridolfo@uky.edu
Released: 25 June 2020
Published: Fall 2020 (Issue 25.1)

Introduction

In this webtext, we reflect on our experiences and collaboration through rhetmap.org’s open job-market data as a tutorial-driven example of how scholars may collaborate around open data projects. (Skip to the fifth section, A Tutorial: Basics of Reading Data and JavaScript for Processing Data, below to begin viewing the instructional tutorials!) In the second and third sections, Ridolfo reflects on the process of making six years of rhetmap.org field data open and available for the discipline. He highlights how the data and its openness have invited numerous applications (Omizo, 2014; Beveridge, 2015), including the exploration of gender and class (Goodwin, 2014). He also emphasizes how this real-time comparative access to field data opens for job seekers, search committees, and graduate programs a new window into how the market functions across time. In the fourth section, we provide readers a heuristic tool that asks its users to reflect on their questions, goals, and audiences in relationship to their desire to visualize data. In the fifth section, we provide four tutorial videos, which review the following topics: 1) An introduction to the tutorial series, 2) How to read common data formats and structures, 3) How to read JavaScript code, and 4) A case of processing the rhetmap data for visualization with JavaScript. Lindgren grounds each topic in his experience with Ridolfo's rhetmap.org data, so users can acclimate themselves to the project in tandem with their learning of data and coding fundamentals.

Overall, we aim to help others see the dynamic nature of archiving, coding, and visualizing data. We posit that such activities with aggregate information provide a teaching tool for discussing the job market with upper-level English or rhetoric and writing studies majors, and rhetoric and composition graduate programs (cf. Omizo & Hart-Davidson, 2016). Additionally, by discussing our distributed collaboration, we show how the creation and distribution of open field data may be leveraged to facilitate the potential for future collaborations (cf. DeVoss & Webb, 2008; Ridolfo, 2015; Rife, 2006). In doing so, we seek to demonstrate how the collection and publication of open field data informs people in real-time about market realities, and it has the potential to extend our field’s research and teaching skills with a rich archive of data.

An Overview of Rhetmap

Rhetmap started on May 30, 2012 as a project to update the Consortium of Doctoral Programs in Rhetoric and Composition’s (CDPRC) list of members, as well as to crowdsource, geocode, and make an interactive, clickable map of rhetoric and composition doctoral programs and links to their program websites. This work built on Louise Phelps and John Ackerman’s (2010) College Composition and Communication piece “Making the Case for Disciplinarity in Rhetoric, Composition, and Writing Studies: The Visibility Project,” specifically Ackerman’s map of doctoral programs (p. 194). In contrast to the 2010 map, however, which was based on survey data from 51 out of 70 CDPRC members, the 2012 map used crowdsourced data from Twitter to identify and add programs, some of which did not have CDPRC membership at the time. As of 7/5/2018, rhetmap.org/doctoral lists 92 programs with some form of doctoral concentration in rhetoric and composition studies.

In September 2012, Collin Brooke published the blog post “Migrating the MLA JIL from List to Service,” in which he advocated that the Modern Language Association’s Job Information List (MLA JIL) “update its horrific interface, which hasn’t changed substantially since the days of its first appearance.” At the time, not only was the MLA JIL’s interface terrible, but one had to pay to access the job listings. Institutions were paying exorbitant fees to list job advertisements with MLA based on a print model of publication that no longer existed. After collecting money from institutions, MLA charged job seekers to access job ads via a cumbersome and proprietary interface that according to David Parry was “holding information hostage” (as cited in Mangan, 2012).

In response to Brooke’s post, that same month Jim Ridolfo began to map the Rhetoric and Composition category of the 2012-2013 MLA JIL for the full season. The season typically runs from the week of September 13 to the first week of July. While there’s a plethora of job listing websites (Inside Higher Ed, Chronicle, R/C Job Wiki, MLA JIL), they each offer a relatively similar data view of job listings. Rhetmap started with the idea of presenting job market data in a different manner. In 2013-2014, rhetmap provided an open data mirror to the JIL, adding the JIL’s Technical and Business Writing (TBW) category to the listing. Beginning in the 2014-2015 year, rhetmap.org began to accept direct listings from universities whose jobs are not listed on the JIL. For the last seven seasons and counting, the cumulative data on rhetmap.org has provided a window into weekly and yearly market patterns. Around this same time, scholars such as Caroline Dadas (2013) and Carrie Leverenz (2015) published and presented new research on the rhetoric and composition job market. Joyce Locke Carter (2016) recognized rhetmap’s impact in her chair’s address at the Conference on College Composition and Communication as one example of “making, disrupting, innovating” (p. 401). In his book Network Sense: Methods for Visualizing a Discipline, Derek Mueller (2017) wrote that rhetmap provides not just a “real-time report on the hiring climate in any particular year, and they accumulate to form an archive of employment activity useful for gauging not only the geographic distribution of positions, but also the temporal circulation within any year (i.e., the rate of postings to comparable dates in past years) and across the set” (p. 140). Rhetmap’s weekly account of job market listings provides a necessary correction to the September to October media hype surrounding the job market during the first six weeks of the market.
For example, the September 16, 2014, ChronicleVitae story “Are More MLA Faculty Jobs on the Way?” focused specifically on first week (September 13) market listings. The article drew on applicants’ assumptions and fears about the total market picture for the year based on week one listings, particularly for literature. As Sano-Franchini (2016) argued in her College Composition and Communication article “'It’s Like Writing Yourself into a Codependent Relationship with Someone Who Doesn’t Even Want You!' Emotional Labor, Intimacy, and the Academic Job Market in Rhetoric and Composition,” this kind of academic job market rhetoric leads applicants to “a rhetoric of emotional crisis” (p. 99). One of the most pitched moments for panic news stories about the job market is the first six weeks of the job market. Prior to 2016 there wasn’t a resource for viewing job listing data on a week-by-week basis.

By collecting and examining rhetoric and composition job listings for weeks six and seventeen over the last six seasons, we argue that some reassuring trends emerge. First, week six of the season (October 18) is a more important measure of listings than week one. In the market comparison below (Fig. 1), the blue line indicates that by week six of the market, a more balanced comparative picture of early season listings is visible across the last six seasons.

Figure 1: Screenshot of the "Market Comparison" data, which Jim Ridolfo collects each year. The screenshot depicts a spreadsheet of seventeen weeks of job data at rhetmap.org. The row "Week 6," for example, is highlighted in blue and shows that 134 jobs were posted by October 18, 2012, whereas 2018 saw 99.

Second, by week 17 (January 1) of the market, we argue that it’s possible to expect that between 70.71% and 73.74% of rhetmap's listings for the total job market year have been posted. These two points suggest two important pieces of advice for candidates. First, we advise caution when faced with the initial numbers of week one listings: think of week one as extending all the way until mid-October as search committees and HR receive authorization to list their fall ads. Second, even at January 1, roughly 26% to 29% of the season's listings have yet to be posted, and they will continue to appear until July 8 or the last week of the market. Proportionally, those post-week 17 listings will have fewer and fewer candidates applying. This information may help candidates strategize their searches, particularly when thinking about a seasonal approach to applying for listings. Based on these trends, we advise candidates looking for employment to continue applying well into the spring.
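The second range follows from the first by simple subtraction. As a minimal sketch of that arithmetic in JavaScript, using only the two percentages reported above (the function name remainingShare is our own illustrative label, not part of any rhetmap code):

```javascript
// Share of a season's listings still to come, in percent,
// given the share already posted (also in percent).
function remainingShare(postedPercent) {
  // Round to two decimal places to avoid floating-point noise.
  return Math.round((100 - postedPercent) * 100) / 100;
}

// Low and high estimates for week 17, as reported in the text.
const postedByWeek17 = [70.71, 73.74];
const stillToCome = postedByWeek17.map(remainingShare);

console.log(stillToCome); // [ 29.29, 26.26 ]
```

In other words, the "roughly 26% to 29%" figure is just the complement of the share already posted by January 1.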

Limitations of Rhetmap Data

Rhetmap’s job listings first began as a real-time geographic visualization and mirror of the MLA JIL’s Composition & Rhetoric (C/R) listings (2012-2013), then both the C/R and Technical and Business Writing (TBW) listings (2013-2014). It did not begin as a long-term data collection project. In 2014-2015, rhetmap began accepting listings, for a donation to a field fund, that are not on the MLA JIL. By 2018-2019, direct listings on rhetmap.org made up approximately 11% of advertisements. There are thus a few limitations to this dataset over time. First, different silos are measured (with some overlap) on the MLA JIL between 2012 and 2014. While some 2012-2013 TBW jobs were cross-listed in the C/R category, not all were. Second, once rhetmap.org began to accept job advertisement submissions in 2014-2015, the project was no longer a direct mirror of the MLA JIL. That said, neither the MLA JIL nor rhetmap, even combined, offers a complete snapshot of the total market. While rhetmap.org maintains a low barrier to job submissions (the small field fund donation on the honor system), some jobs will only be listed on sites such as Inside Higher Ed, Chronicle, local HR websites, or field listservs. The comparative usefulness of the data, long term, should be viewed with those concerns in mind.

Growing Rhetmap Over Time: Data Management and Open Data Access

Rhetmap’s data management strategy is designed with three principles in mind. First, hosting redundancy. The data is available in two different formats: open Google spreadsheet files and geocoded maps. The open Google spreadsheets are hosted separately from the maps, which are geocoded using the BatchGeo service. This provides redundancy with regard to how job seekers access data. These two resources are embedded on the main rhetmap.org website, hosted on a server slice in Atlanta, GA. This strategy produces some basic redundancies. If the hosting provider has a local problem, the map and the spreadsheet are still available. Likewise, if the map is down, the spreadsheet is still available and the data may be quickly re-geocoded via Google Fusion or a similar service. If Google Sheets is down, the map data also provides a data view in list format. If need be, offline data backups also exist. All of these backups are designed to avoid a situation similar to what happened to MLA in October/November of 2012, when Hurricane Sandy rendered MLA’s servers inaccessible for a week during peak job season.

Second, user experience and developer interoperability. While the primary objective of rhetmap is to visualize doctoral and job market data via geocoded maps, the open spreadsheets allow direct access to the current raw data. At any point in the market year, job seekers may copy and paste the entire spreadsheet or specific data columns, and applications such as Chris Lindgren’s job market data comparison are similarly able to access and visualize job market updates in real-time. This philosophy also extends to other rhetmap.org projects such as the complete MLA JIL from 1965-2012, processed with optical character recognition (OCR), which is available for full download from rhetmap.org. Scholars such as Ryan Omizo and Jonathan Goodwin have used the open-access resource to analyze the data.

Third, simple data format. The data structure needs to be as simple as possible so that maintaining rhetmap fits within specific work and time constraints. For rhetmap.org, Ridolfo spends about twenty minutes a week curating the data. For the job market data, the basic format is six columns: college/university, position, geocode information, external link to ad, date added, and notes. Rhetmap.org does not host any job ads, meaning that the only curatorial requirement is to manage these six columns each week. When rhetmap first started, college/university and geocode were combined into one column. This, however, proved impractical for automated geocoding, especially across multiple platforms such as BatchGeo and Google Fusion, and these data fields were separated into their own columns. While it’s always possible to collect more data, part of the strategy that has allowed the project to persist for seven years is to not collect too much data. Instead, the basic data structure of rhetmap focuses primarily on those six columns.
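To make the six-column format concrete, here is a minimal sketch of how one row might look as a JavaScript object. The property names and all values are hypothetical placeholders of our own; they are not rhetmap's actual column headers or a real listing.

```javascript
// One hypothetical row with the six columns described above.
// Property names and values are illustrative assumptions only.
const exampleListing = {
  institution: "Example University",
  position: "Assistant Professor of Rhetoric and Composition",
  geocode: "City, State",
  adLink: "https://example.edu/jobs/123",
  dateAdded: "2018-09-13",
  notes: "",
};

const REQUIRED_COLUMNS = [
  "institution",
  "position",
  "geocode",
  "adLink",
  "dateAdded",
  "notes",
];

// Return the names of any required columns a row is missing.
function missingColumns(row) {
  return REQUIRED_COLUMNS.filter((name) => !(name in row));
}

console.log(missingColumns(exampleListing)); // []
console.log(missingColumns({ institution: "Example University" }).length); // 5
```

Keeping the institution name and the geocode information in separate properties mirrors the design choice described above: a geocoding service can be handed the location string alone, without first having to strip the institution name from it.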

A Heuristic for Finding Stories with Data Sets

As the above sections demonstrate, we can tell new stories about the job market based on our provisional networked senses of it. We offer such stories up as narratives that can help others organize their time and energy before and during their job market year(s), but also thereafter if and when they serve as mentors to other job-seekers or advise faculty on the management, creation, or shrinking of doctoral programs in our field.

More broadly, "finding stories in data" has become a social practice across data-driven domains. Numerous professional communities describe common experiences of data work as a narrative process of storyfinding and telling. Data journalism has quickly emerged as a field of practice, wherein coding and dataset work involves activities to “interrogate” the data sets (Wiggins as cited in Abelson et al., 2015) or create sets of data of their own (Boyer as cited in Royal & Blasingame, 2014). Data science, as a broader and emerging practice, shares similar conceptions of data analysis and visualization. For example, Perez (2014), creator of the IPython Interactive Notebook, emphasized how blending narrative and computational data work was central to the development of this important programming environment for people who work with data sets. In the public domain, there is even a podcast devoted to such a subject: Data Stories (Bertini & Stefaner, 2017). Overall, each of these example domains surrounding data-work describes the importance of finding stories within the data. Yet, how does one begin to learn and perform such a process?

In what follows, we first describe some fundamental visualization terminology. From there, we provide a flexible heuristic as a generalizable guide to apply to any particular set of data that you desire to understand, visualize, and use to find new stories to tell. The heuristic makes explicit some sense-making strategies to help its users develop descriptive statistical projects; that is, the practice of reducing a large set of data to understand it better. We recommend users of the heuristic familiarize themselves with some basic types of visualization before using it. At the moment, we recommend Andy Kirk’s (2016) Data Visualization: A Handbook for Data Driven Design. Kirk thoroughly described five main types of charts: Categorical, Hierarchical, Relational, Temporal, and Spatial, or CHRTS for short. We do not have the space to define each type and its nuances here, but here are Kirk’s basic definitions for each type of chart (p. 158), which we suggest you learn more about in your own time:

Table 1: Chart Types

Chart-Type	Definition
Categorical	Compares quantities of categories and their distributions.
Hierarchical	Describes part-to-whole relationships and quantities.
Relational	Graphs connections and correlations between variables.
Temporal	Graphs trends and categories over time.
Spatial	Maps spatial patterns with categorical overlays.

Each general type of chart offers you avenues for different types of goals for your analysis and storytelling with data. For example, the most basic type of chart is Categorical, because it merely isolates one type of information and computes its quantity. Categorical charts require effort to define and consider why an audience might care to see the distributions of quantity across properties of a category. The (in)famous Word Cloud is a Categorical chart, since it displays the frequency of particular words used within a corpus. Other Categorical charts, such as Bar charts or Bubble charts, help audiences do comparison work. Relational charts differ from Categorical in that they help people examine and test for correlations and connections between two or more categories. Some Relational charts include the Matrix chart and the Chord diagram. Rhetmap’s Market Comparison chart is considered a Temporal chart, since it displays a particular category of data (number of job postings) across a definition of time (per week). In effect, the type of chart you choose to develop will help you ask and answer certain types of research questions, so how and where do you begin to use open data?

Below, we offer a simple heuristic that will help you invent potential data visualization goals in relationship to whatever type of data sets you manage to find or create for yourself. First, the heuristic asks you to make explicit the curiosities and questions that you think the data can help you answer (Questions 1.1 and 1.2). Next, it asks you to consider your audiences, how the visualization may illuminate noteworthy aspects of the data, and whether the available data are sufficient for your questions and goals (Questions 2.1 and 2.2). From there, it helps you to consider the ethics of your goals in relationship to the data set (Question 3). In Question 4, the heuristic helps you consider the method by which you are accessing the data, which ties into the production of the data visualization. Finally, Question 5 asks you to consider what data-processing you may need to perform, based on the type of visualization you are planning to code. This final question assumes that you have done some research into potential example charts. (The next section elaborates on this data-processing step.)

Table 2: Heuristic for Data Visualizations

QUESTION	RESPONSE
1. Research questions and aims
1.1 What’s my question or interest in this topic?	How does the number of job postings per week compare across years?
1.2 What type of visualization does this question describe: categorical, hierarchical, relational, temporal, or spatial?	Temporal, since it compares market weeks to the weekly posting count with each year as a variable.
2. What about the data?
2.1 What information might this visualization emphasize for audiences?	Job seekers can compare the current year against others. This can help them contextualize the per week posting count. It can also perhaps support decisions about the number of submissions to make, based on the trending outlook for the year.
2.2 Can I combine or aggregate any available datapoints to either 1) answer my question, or 2) refine my angle?	I could aggregate the data temporally per Month, or perhaps per Major Job Cycle. I could also consider collecting new data to combine with this data. For example, I could track the type of job posting (TC or R/C) or field (Rhetoric of Science, WPA, Generalist, etc.) and examine when each major category is posted across time, or track university type (R1-3) and examine when these universities post their jobs.
3. Can I ethically claim that the available data supports the answer to my question? If no, what is missing or needed (see 2.2)? If yes, then see next step.	Yes. These are simple data points tracked over time. However, Ridolfo's data are not perfect inscriptions of all job postings for each year. Accordingly, the visualization should note this limitation on the page.
4. How will I access this data for my visualization: dynamically or statically?	Dynamically with Google’s URL in the JSON format with permission from Ridolfo.
5. Will I need to process the data? If so, consider how.	The JSON format of the data excludes the year variables, and it bundles the weekly posting counts in a per Week fashion. After reviewing the example MultiLine chart, I will need to bundle these weekly counts in per Year lists instead.

By using this heuristic, you can make explicit your visualization goals and what data will help you fulfill those goals.

In the next section, we provide you with an often overlooked step in data-visualization activity: data processing. In so doing, you can take your first steps toward learning how to read data and the basics of JavaScript, so you can take one step closer toward visualizing data.

A Tutorial: Basics of Reading Data and JavaScript for Processing Data

Do you know how to read a variety of data formats, such as CSV (Comma-Separated Values), JSON (JavaScript Object Notation), or matrices? Do you know how to read JavaScript (JS)? And, do you know what the D3.js code library is? Most data visualization tutorials in JS will assume that you do, and these fundamentals are indeed important to understand before visualizing data dynamically with the JS programming language. However, we aim to lower the barrier to visualizing information with JavaScript. In the following four videos, after a brief introduction, we review 1) the basics about how to read the aforementioned common data formats and structures, 2) the basics of reading JS code, and 3) a specific part of Lindgren's JS code that performs an often overlooked part of visualization: data processing.
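As a small preview of the formats named above, the following sketch shows the same made-up table as CSV text, as JSON, and as a matrix. The column names and numbers are invented for illustration, and the parser is a deliberately naive one that assumes no quoted fields or embedded commas; a real project would more likely rely on a library's CSV reader.

```javascript
// A tiny, made-up table in CSV form: the first line names the columns.
const csvText = [
  "week,postings",
  "1,10",
  "2,15",
].join("\n");

// Convert CSV text into an array of objects, one object per row.
// Naive on purpose: assumes no quoted fields or embedded commas.
function parseSimpleCsv(text) {
  const [headerLine, ...rowLines] = text.trim().split("\n");
  const headers = headerLine.split(",");
  return rowLines.map((line) => {
    const cells = line.split(",");
    return Object.fromEntries(headers.map((h, i) => [h, cells[i]]));
  });
}

// JSON: every value is labeled with its column name.
const rows = parseSimpleCsv(csvText);
console.log(JSON.stringify(rows));
// [{"week":"1","postings":"10"},{"week":"2","postings":"15"}]

// Matrix: only the values remain, as a list of lists.
const matrix = rows.map((row) => [Number(row.week), Number(row.postings)]);
console.log(matrix); // [ [ 1, 10 ], [ 2, 15 ] ]
```

Note that the CSV cells arrive as text ("10"), so they are converted with Number() before being treated as quantities.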

Data processing is the practice of revising the structure and format of the data in service of the analysis and visualization goals. Despite it being an essential part of the visualization process, not many tutorials exist to help novices understand what it is, what is typically involved, and how to start learning how to do it. In the four videos below, we provide this “missing” tutorial, so you can learn fundamental knowledge about how to read data and see how its textual structures are linked to the visualization goals. We ground this tutorial in Lindgren’s uptake of Ridolfo’s market comparison data, so it also provides a very basic introduction to reading the JavaScript programming language. Again, we contend that this broader set of knowledge about data and its processing will make the latest forms of interactive charts that use JavaScript and the D3.js code library less intimidating to learn.
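To give a sense of what such restructuring can look like, here is a minimal sketch that regroups week-by-week counts into one list per year, the kind of reshaping a multi-line chart typically expects. It is not Lindgren's actual code, which the videos walk through; the input shape, the function name groupByYear, and every number below are assumptions made for illustration.

```javascript
// Hypothetical input: one inner list per market week, one count per season.
// The year labels are not part of the data, so they are supplied separately.
const years = ["2012", "2013", "2014"];
const countsByWeek = [
  [10, 12, 9],  // week 1
  [25, 30, 22], // week 2
  [40, 41, 35], // week 3
];

// Regroup the week-wise rows into one series per year.
function groupByYear(yearLabels, weekRows) {
  return yearLabels.map((year, yearIndex) => ({
    year,
    values: weekRows.map((row, weekIndex) => ({
      week: weekIndex + 1,
      count: row[yearIndex],
    })),
  }));
}

const series = groupByYear(years, countsByWeek);
console.log(series[0].year); // 2012
console.log(series[0].values.map((v) => v.count)); // [ 10, 25, 40 ]
```

Each resulting object holds one year's label and its ordered weekly counts, so a charting library can draw one line per object.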

Video 1: Introduction

Transcript

Video 2: Understanding data sets as texts

Transcript

Video 3: The basics of reading JavaScript

Transcript

Video 4: Reading rhetmap's data-processing code

Transcript

For resources used in the tutorial, please refer to Lindgren's site for the "Market Comparison" data visualization.

Conclusion

This PraxisWiki webtext demonstrates how a limited amount of field data collected each week, over several years’ time, and made available in a format that’s accessible to developers, may lead to the development of third-party resources. We provide a rich technical description of Lindgren’s uptake of Ridolfo’s data to inspire others to consider what other network senses could be created for the discipline. In 2019, Tim Lockridge released http://rhetorlist.net, a resource that tracks “new book releases (scholarly monographs and edited collections) in writing research.” Based on the premise of tracking one specific resource through a similar JSON backend, rhetorlist makes it possible for someone else to consider, through a large curated and sortable list of book data, what books have and have not been written in rhetorical studies. Such work has the potential as an open data resource to support the composing of monograph prospectuses for presses, or help students formulate a literature review around an emerging field conversation.

Moving forward, we think that the first premise of new field resources begins with open data management. A major reason for rhetmap coming into existence seven years ago was the inability of the MLA to release open job market data in a manner that was accessible and useful to job seekers. Still, though, as a general practice, large field organizations such as MLA, National Council of Teachers of English, and College Composition and Communication do not yet release open CSV datasets for the field to use as platforms for applications. For more data and visualization collaborations like ours to happen, we argue that this needs to be more of a priority for these organizations, so we can yield more data-driven mentoring and advice about the job market and other aspects of the profession.

References

Abelson, Brian, Keefe, John, Wei, Sisi, & Wiggins, Chris. (2015). How data is changing media companies. Panel at The Daily News Innovation Lab, New York, NY. Retrieved May 23, 2016, from https://www.youtube.com/watch?v=1eOVN21je4k

Bertini, Enrico, & Stefaner, Moritz. (2017). Data Stories. Retrieved March 5, 2017, from http://datastori.es/

Beveridge, Aaron. (2015). MLA JIL writing studies analysis. http://aaronbeveridge.com (portfolio website). Retrieved April 15, 2019, from http://aaronbeveridge.com/mla-data/

Brooke, Collin. (2012, September 19). Migrating the MLA JIL from list to service. Collin Gifford Brooke. Retrieved from http://www.cgbrooke.net/2012/09/19/migrating-the-mla-jil-from-list-to-service/

Carter, Joyce Locke. (2016). 2016 CCCC Chair's Address: Making, disrupting, innovating. College Composition and Communication, 68(2), 378–408.

Dadas, Caroline. (2013). Reaching the profession: The locations of the rhetoric and composition job market. College Composition and Communication, 65(1), 67–89.

DeVoss, Dànielle Nicole, & Webb, Suzanne. (2008). Media convergence: Grand Theft Audio: Negotiating copyright as composers. Computers & Composition, 25(1), 79–103.

Goodwin, Jonathan. (2014, January 28). Some notes on the MLA job information list. Retrieved October 23, 2018, from https://jgoodwin.net/blog/mla-job-information-list/

Hart-Davidson, William. (2012, November 02). MT @ridolfoj @GracieG: Am I correct that Sandy took down the servers holding MLA's Job Information list? Retrieved October 23, 2018, from https://twitter.com/billhd/status/264453932363952128

Kirk, Andy. (2016). Data visualization: A handbook for data driven design. SAGE Publications.

Leverenz, Carrie. (2015, March 19). Telling it like it is—But how is it? The job market in Rhetoric and Composition. Conference on College Composition and Communication. Tampa, FL.

Mangan, Katherine. (2012, September 24). Faculty group leaks MLA jobs list in dispute over free access. The Chronicle of Higher Education. Retrieved July 28, 2020, from https://www.chronicle.com/blogs/wiredcampus/faculty-group-leaks-mla-jobs-list-in-dispute-over-free-access

Mueller, Derek. (2017). Network sense: Methods for visualizing a discipline. The WAC Clearinghouse and University Press of Colorado. Retrieved from https://wac.colostate.edu/books/writing/network/

Omizo, Ryan. (2014, March 9). Mac automator tutorial: Extract text from PDFs. Retrieved October 23, 2018, from http://ryan-omizo.com/2014/03/09/tutorial-mac-automator/

Omizo, Ryan, & Hart-Davidson, William. (2016). Hedge-o-Matic. Enculturation, 7. Retrieved October 25, 2018, from http://hedgeomatic.cal.msu.edu/hedgeomatic/

Perez, Fernando. (2014). Ipython: From interactive computing to computational narratives. Presented at the CSE Symposium on Weathering the Data Storm in January 2014. Retrieved September 23, 2016, from https://www.youtube.com/watch?v=V-EDqvscVxk

Phelps, Louise W., & Ackerman, John M. (2010). Making the case for disciplinarity in rhetoric, composition, and writing studies: The visibility project. College Composition and Communication, 62(1), 180–215.

Ridolfo, Jim. (2012, May 30). Rhet-comp people: Here's a geocoded map of all doctoral programs (+2, UC & UK) on the Rhetoric Consortium's site: http://bit.ly/JUM2V5. Retrieved from https://twitter.com/ridolfoj/status/207816716275814400

Ridolfo, Jim. (2015). Digital Samaritans: Rhetorical delivery and engagement in the digital humanities. University of Michigan Press. http://dx.doi.org/10.3998/drc.13406713.0001.001

Rife, Martine. (2006). Why Kairos matters to writing: A reflection on its intellectual property conversation and developing law during the last ten years. Kairos: A Journal of Rhetoric, Technology, and Pedagogy, 11(1). Retrieved March 11, 2019, from http://kairos.technorhetoric.net/11.1/binder.html?topoi/rife/index.html

Royal, Cindy, & Blasingame, Dale. (2014). Data journalism explication - Product. Presented at the International Symposium on Online Journalism in April 2015. Retrieved September 23, 2016, from https://youtu.be/Qi-iXJV9iow?t=58

Sano-Franchini, Jennifer. (2016). “It’s like writing yourself into a codependent relationship with someone who doesn’t even want you!” Emotional labor, intimacy, and the academic job market in rhetoric and composition. College Composition and Communication, 68(1), 98–124.

The Consortium of Doctoral Programs in Rhetoric and Composition. (n.d.). Retrieved October 23, 2018, from https://ccccdoctoralconsortium.org/

Wood, Mauren, & Read, Brock. (2014, September 16). Are more MLA faculty jobs on the way? The Chronicle of Higher Education. Retrieved October 23, 2018, from https://chroniclevitae.com/news/708-are-more-mla-faculty-jobs-on-the-way