Blackboard CMS for Large-scale Data Collection

Contributor: Thomas Peele
Affiliation: City College of New York
Email: tpeele@ccny.cuny.edu
Date Published: 26 April 2016


In the fall semester of 2015, I asked twenty-nine instructors teaching forty sections of the first semester of a two-semester first-year writing sequence to use the Assignment tool in the Blackboard (Bb) Course Management System (CMS) to collect first and second drafts of essays written by students in response to four separate assignments. The project allowed me to collect a corpus of between four and five thousand first and second drafts of student essays, which other faculty on our campus and I will use over the next few years to study, on a large scale, various aspects of students' writing. These aspects include revision (see Holcomb and Buell), n-gram and personal pronoun use and distribution (see Aull, especially chapters three and four), and other questions of syntax and collocation. While I discuss these aspects of the project briefly in this text, most of that work lies in the future. Here, I focus on the impact on faculty development of using the Bb CMS for data collection.

When I launched the corpus project, I did not realize that the collection process would have such a positive impact on the digital culture of the writing program. By paying instructors to use the Bb platform to collect the data (a requirement of the collection process, since Bb automatically adds a file identifier that would enable us to compare all of an individual student's work), we included them as participants in the research process. As researchers, they experienced for the first time in recent memory a planned, systematic collection of student writing that would be used for purposes other than the assessment of an individual student’s efforts. As researchers, they had not only to learn the collection methods but also to teach them. One immediate result of this practice was that everyone, instructors and students alike, left their first-semester composition class with a thorough understanding of one of Bb's more robust features. While some instructors treated this process as a more-or-less bureaucratic requirement that would have no impact on how they collected essays for their own use (i.e., some instructors required students to submit essays through Bb to fulfill the requirements of the research project and also to submit paper copies, which the instructors would grade), most used it as their sole collection method.

While I do not know whether the collection process has had a long-term impact on instructors' use of Bb or other digital platforms, my hope is that this experience demonstrated the links between their work as classroom instructors and the work that we had undertaken as researchers whose focus was not individual students in discrete sections of composition but all of the students in all of the sections. This view of themselves might be enhanced as the essays are analyzed and the results become known; the computer analysis of their students’ essays will provide them with information that they cannot use to help the students who wrote the essays but can use to help future students. Including a large cohort of individual instructors in the research process by employing a commonly available tool such as the Bb CMS thus has the potential to have an impact not only on instructors’ pedagogic practices of collecting, grading, and returning assignments, but also on how they see themselves as researchers within the classroom and the broader context of the writing program.

Corpus-based Analysis in FYC

When corpus studies are conducted in Composition, instructors do not play a significant role in collecting the data. Corpus analysis is often focused on English Language Learners and based on a relatively small corpus (see, for example, Siyanova-Chanturia, 2015; Liu et al., 2013). In Laura Aull’s (2015) major study of first-year writing, the corpus of 19,433 documents consisted of placement essays that students completed before classes began (p. 58). Software designed for large-scale data collection and assessment, such as the Educational Testing Service’s Criterion Online Writing Evaluation program and other placement software, bypasses instructor participation. Much of the study of revision in Composition, following in the tradition of Nancy Sommers (1980) or of Lester Faigley and Stephen Witte (1981), is small-scale and qualitative. A corpus-based study such as this one expands the knowledge that results from qualitative studies, but it is also a much greater logistical challenge. While this study resulted in greater faculty involvement with research and digital tools, instructors also had to invest time and interest in this aspect of research and pedagogy. I wouldn’t have been able to support their participation without significant funding from the Provost’s office. And, as the result of another grant, I was able to collaborate with a computer scientist at CCNY who will develop the software necessary to analyze the essays. Without significant financial and human resources, this kind of large-scale, faculty-inclusive study wouldn’t be possible.

The Corpus

As noted above, the study included forty sections of the first semester of a two-semester first-year writing sequence. Our writing program requires that students write four major essays: a literacy narrative, an expository essay, a critical analysis, and a research essay. Instructors write their own assignments, but they are required to use the Norton Field Guide to Writing. A maximum total of 910 students could enroll in these sections of composition. With four assignments and two drafts of each, we could potentially collect a total of 7,280 essays (3,640 first and 3,640 second drafts). However, since not every student completed every assignment, not every student completed the course, and not every instructor submitted the data, the total number of essays that we collect is likely to be significantly lower. In the following sections, I describe the project, the training process, the impact that this collection process could have on faculty grading practices, and how the data collection process unfolded.
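The upper bound above is simple multiplication, and a short Python sketch makes the arithmetic explicit. The enrollment, assignment, and draft counts come from the text; the function name `max_essays` is only illustrative, and the collected range of four to five thousand is the rough estimate given in the introduction rather than a final count.

```python
# Upper bound on corpus size, using the figures given in the text.
MAX_STUDENTS = 910          # maximum enrollment across the forty sections
ASSIGNMENTS = 4             # literacy narrative, expository, critical analysis, research
DRAFTS_PER_ASSIGNMENT = 2   # first and second drafts


def max_essays(students, assignments, drafts):
    """Return the largest number of documents the collection could yield."""
    return students * assignments * drafts


per_draft = max_essays(MAX_STUDENTS, ASSIGNMENTS, 1)
total = max_essays(MAX_STUDENTS, ASSIGNMENTS, DRAFTS_PER_ASSIGNMENT)
print(per_draft, total)  # 3640 7280

# Share of the upper bound represented by the rough estimate of
# four to five thousand collected drafts (an estimate, not a final count).
low_share = 4000 / total
high_share = 5000 / total
print(f"{low_share:.0%} to {high_share:.0%}")  # 55% to 69%
```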

The Study

The study that I undertook is modeled after the work that Christopher Holcomb and Duncan Buell conducted at the University of South Carolina in the spring of 2014 and on which they reported at the Conference on College Composition and Communication in the spring of 2015. Holcomb and Buell built a corpus of "439 sets of first and final drafts from 5 sections of ENGL 101 and 6 sections of ENGL 102" (2015). After having developed a reasonably reliable method for assessing the changes between the first and second drafts of these essays, Holcomb and Buell learned that in 55.7% of the essays, there was either no evidence of revision at all or evidence of minimal revision. Of the essays that did show revision, 9% of the changes were deletions and 21.8% of them involved students adding complete sentences. Holcomb and Buell noted with some surprise that students didn't revise the sentences that preceded these insertions. In their interpretation of the data, they write that students didn't appear to rethink their essays and revise accordingly, but rather treated "their original drafts as fixed structures into which they plug or unplug, not words, but sentences" (2015). In addition, revisions were more likely to occur in the body of the essay than in the introduction or conclusion. In my view, this study suggests important pathways for faculty development with regard to revision. Among other approaches, it might be useful if instructors and students were to look at their own essays with an eye to the quantity and type of revision, in addition to the kinds of qualitative assessment that we currently use. Holcomb and Buell are careful to note that this kind of quantitative assessment is intended to supplement rather than replace qualitative assessment. While this study will not allow us to evaluate the quality of the revisions or their impact on an essay’s meaning, it will provide us with an aggregate image of how much students revise.
Depending on the comprehensiveness of the data, we might also be able to see if students’ revision practices change over the course of the semester or if there are any differences based on demographic data. Given City College’s rich linguistic diversity, the corpus has the potential to provide significant data to second language acquisition researchers.
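To make the idea of sentences being plugged into or unplugged from a draft concrete, here is a minimal Python sketch of a sentence-level comparison. It is not Holcomb and Buell's method, and it is not the software planned for this project; it is an illustration built on the standard library's `difflib`. The function names `split_sentences` and `revision_counts` are hypothetical, and the sentence splitter is deliberately naive (it would mishandle abbreviations such as "Dr.").

```python
import re
from difflib import SequenceMatcher


def split_sentences(text):
    """Naive splitter: break after ., !, or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def revision_counts(first_draft, second_draft):
    """Count sentences inserted, deleted, changed, or left unchanged."""
    a = split_sentences(first_draft)
    b = split_sentences(second_draft)
    counts = {"inserted": 0, "deleted": 0, "changed": 0, "unchanged": 0}
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            counts["unchanged"] += i2 - i1
        elif tag == "insert":
            counts["inserted"] += j2 - j1
        elif tag == "delete":
            counts["deleted"] += i2 - i1
        else:  # "replace": a run of sentences was rewritten in place
            counts["changed"] += max(i2 - i1, j2 - j1)
    return counts


first = "I like tea. It is warm. I drink it daily."
second = "I like tea. My mother taught me to brew it. It is warm."
result = revision_counts(first, second)
print(result)
# {'inserted': 1, 'deleted': 1, 'changed': 0, 'unchanged': 2}
```

In this toy pair, one whole sentence is added and one removed while the surrounding sentences stay untouched, which is the pattern the quoted passage describes. Counts like these say how much changed, not whether the change improved the essay.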

The Training Process

My initial plans for collection were modest. I applied for an in-house research grant that would pay faculty to participate in the collection process, but my application was not successful. I knew, though, that newly appointed graduate teaching assistants would be able to collect the data since they would be using the Assignment tool in the teaching practicum. In May of 2015, however, I received a call: a $25,000 grant had been returned to the Provost’s office, and they needed to spend the money before the end of the year. 

I sent a call for applicants, each of whom would receive a stipend of $500 for attending a three-hour workshop and collecting the data from their classes. The three-hour training workshop was a significant undertaking for instructors. In it, they learned how to access Bb, how to use its Assignment function, and how to use its Grade Book. For the part-time faculty at City College, this workshop represented a significant amount of training. Most of the instructors had never accessed their Bb sites, and most of them did not know that they were automatically provided with a Bb site. Both the Assignment and Grade Book features of Bb are fairly complex, but since each Assignment automatically generates a grade column in the Grade Book, it made sense to approach these two features during the same workshop. We encountered a wide range of problems, such as instructors not having access to their college email accounts. By the end of the workshop, though, everyone who had a Bb site was able to access it, create an Assignment, and navigate to the Grade Book to see what it would look like for students.

As a training tool, I created this screen capture video to show instructors each step of the process from student submission to essay delivery. We watched the video in the training session, and then individual instructors went through the process themselves on their own Bb course sites.

Training Video

Assignments in Bb: Impact on Faculty Grading Practices

Figure 1: Downloading assignments in Blackboard

The Assignment tool is well-suited to this kind of data collection. It provides an easy way to collect essays and, unlike some of Bb’s other features, it is neither unnecessarily complex nor frustratingly glitchy. For my purposes, the Assignment tool has the enormous added benefit of assigning a unique code and a timestamp to each essay. I won’t have to manually change file names in order for the essays to be read by the Natural Language Toolkit (the software that we plan to use for data analysis). The unique identifiers and the timestamps will allow the Natural Language Toolkit program to compare the first and second drafts of the essays by first using the identifiers to isolate each student’s essays and then comparing the essay with the earliest timestamp to the next one in the sequence.

List of Unique Identifiers
Figure 2: Unique identifiers in Blackboard
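The pairing step described above can be sketched in a few lines of Python. This is an illustration only, not the project's analysis software. The file-name layout used here (`<assignment>_<student code>_<YYYY-MM-DD-HH-MM-SS>.txt`) is an assumption made for the example and may not match what Bb actually produces; the function names `parse_name` and `pair_drafts` are likewise hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# ASSUMED file-name layout, for illustration only:
#   <assignment>_<student code>_<YYYY-MM-DD-HH-MM-SS>.txt
STAMP_FORMAT = "%Y-%m-%d-%H-%M-%S"


def parse_name(filename):
    """Split a file name into assignment, student code, and submission time."""
    stem = filename.rsplit(".", 1)[0]
    assignment, student_code, stamp = stem.split("_")
    return assignment, student_code, datetime.strptime(stamp, STAMP_FORMAT)


def pair_drafts(filenames):
    """Pair each student's earliest submission with the next one in sequence."""
    groups = defaultdict(list)
    for name in filenames:
        assignment, code, submitted = parse_name(name)
        groups[(assignment, code)].append((submitted, name))

    pairs = {}
    for key, items in groups.items():
        items.sort()  # earliest timestamp first
        if len(items) >= 2:  # skip students with only one submission
            pairs[key] = (items[0][1], items[1][1])
    return pairs


files = [
    "essay1_ab12_2015-09-20-09-00-00.txt",
    "essay1_ab12_2015-09-10-23-15-00.txt",
    "essay1_cd34_2015-09-11-08-00-00.txt",
]
pairs = pair_drafts(files)
print(pairs[("essay1", "ab12")])
# ('essay1_ab12_2015-09-10-23-15-00.txt', 'essay1_ab12_2015-09-20-09-00-00.txt')
```

Grouping by assignment as well as by student code keeps a first draft of one essay from being paired with a draft of a different essay. Students who submitted only one draft are left out of the pairs rather than guessed at.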

While the data collection method was primarily conceived as a tool for the revision study, it became apparent early on that implementing its use could have a far-reaching impact on faculty teaching and the digital culture on campus. Familiarity with this tool could, for example, have an impact on instructors’ willingness to collect digital copies of essays. Instructors who were already collecting digital copies by email could reduce the time they would spend on data processing. Instead of collecting the essays by email and individually downloading the attachments for commenting, instructors would be able to download all of the essays at once.

The assignment file download is quick; it takes less than thirty seconds. After commenting on the essays, instructors could upload them to the Bb platform (one at a time, unfortunately), which, while inconvenient, is no more inconvenient than attaching responses to email and has the added benefit of providing a permanent archive. Students’ original essays and instructors’ responses to them are all housed in one shared location. The collection method could also have an impact on instructors’ access to students’ essays. It has long been considered good practice to identify and comment on particular aspects of students’ essays, and to maintain consistency not only from one draft to the next (e.g., if you comment on organization on the first draft, don’t switch to an entirely new subject when you respond to the second draft), but also from one essay to the next. When instructors evaluate essays, we often want to look back at earlier essays to identify patterns. This kind of research, though, is very difficult if we rely on printed essays. Instead, I have suggested to instructors that they rename students’ essays following a format that will let them find students’ work with a simple search.

By adding the student’s first name, the essay and draft number, and another identifier that distinguishes the student’s original draft from the copy that includes our response, we can quickly search for a student’s first name and open all of her essays. This easy access to an individual student’s essays makes it possible for instructors to track the comments and suggestions they make from one essay to the next. The Assignment tool also allows instructors to comment on the essay on the Bb platform itself. This option lets instructors skip the download/upload process completely, and it keeps the advantage of having access to all students’ essays in one place. Commenting on the platform makes it much more difficult, however, to compare one draft to another.
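As a sketch of the renaming scheme described above, the following Python builds file names from a first name, an essay number, a draft number, and a marker for the original or the response copy, and then finds one student's files by first name. The exact layout (`maria_essay2_draft1_response.docx`) and the function names are assumptions made for illustration; any consistent format that begins with the student's name would serve the same purpose.

```python
def essay_filename(first_name, essay, draft, kind, ext="docx"):
    """Build a searchable file name; `kind` marks original vs. response copy."""
    if kind not in ("original", "response"):
        raise ValueError("kind must be 'original' or 'response'")
    return f"{first_name.lower()}_essay{essay}_draft{draft}_{kind}.{ext}"


def find_student_files(filenames, first_name):
    """Return, in sorted order, every file whose name starts with the student's first name."""
    prefix = first_name.lower() + "_"
    return sorted(name for name in filenames if name.startswith(prefix))


folder = [
    essay_filename("Maria", 2, 1, "response"),
    essay_filename("Maria", 1, 2, "original"),
    essay_filename("Mario", 1, 1, "original"),
]
marias_files = find_student_files(folder, "Maria")
print(marias_files)
# ['maria_essay1_draft2_original.docx', 'maria_essay2_draft1_response.docx']
```

Because the essay and draft numbers follow the name, an alphabetical sort also lists a student's work in assignment order. In a section with two students who share a first name, the names would need an extra distinguishing element.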

Data Collection

The data collection process has been surprisingly complex. The training session, training video, and multiple follow-up reminder emails resulted in much more confusion than I anticipated. About a month into the semester, having received several requests for clarification about the data collection process, I checked the statistics on the video sharing platform. The view counter revealed that the video had been viewed just nine times, and this number included my final review of it before the faculty development workshops and the two times I had shown it in the workshops. These statistics suggest that the screen-capture video was a poor platform for the training material. If I were to repeat this process (which seems likely), I would either create my own instruction sheets or use the materials on Blackboard’s web site.


With some infrastructural support, using a CMS such as Blackboard to facilitate a corpus-based study of student essays has the potential to increase the use of a digital platform in a first-year writing program. Once collected, the data will provide us with an unprecedented view into students’ revision practices, which will, in turn, help us to shape our curriculum and faculty development. Through this project, instructors were compelled to use features of Bb that they had not before been aware of; for many of them, this was their first use of Bb at all. By using the Bb CMS to collect data, we were also able to include instructors in the research process. This experience might have an impact on how they see themselves in the classroom and how they see their courses in the larger context of the first-year writing program. Given Blackboard’s ubiquity and the role it plays in the institution’s idea of faculty support, its use as a data collection platform seems promising.

The data collection and the various analyses that other researchers and I will conduct in the following months also place our writing program in a good position to ask for funds to support additional training and development: the thousands of essays in our corpus represent the material results of our work, and we can easily demonstrate that we have completed the task for which we were funded. In addition, as our accrediting body's decennial evaluation approaches, we will be able to describe a large-scale, data-driven assessment that also contributed to faculty and pedagogical development.


Aull, Laura. (2015). First-year university writing: A corpus-based study with implications for pedagogy. New York, NY: Palgrave.

Buell, Duncan, & Holcomb, Christopher. (2015, March). First-year composition as big-data: Natural Language Processing and Portfolio Assessment. Paper presented at the annual meeting of the Conference on College Composition and Communication, Tampa, FL. 

Faigley, Lester, & Witte, Stephen. (1981). Analyzing revision. College Composition and Communication, 32.4, 400-14.

Liu, Song, Lui, Peng, & Urano, Yoshiyori. (2013). A study of composition/correction system with corpus retrieval function. International Journal of Distance Education Technologies, 11.3, 58+.

Sommers, Nancy. (1980). Revision strategies of student writers and experienced adult writers. College Composition and Communication, 31.4, 378-88.

Siyanova-Chanturia, Anna. (2015). Collocation in beginner learner writing: A longitudinal study. System, 53, 148+.
