Annotating Texts

In the context of BACKBONE, a corpus is made up of a set of interviews. For our purposes, these interviews have been transcribed with the aid of BACKBONE Transcriptor. However, we can also simply add a plain .txt file to our corpus.

In BACKBONE, annotation relies on four key aspects.

(1) Annotation is unit-bound, not text-bound. Our spoken texts are divided into sections, each defined as a perceived unit for language learning. In this way, while speakers’ contributions to the different corpora are topic-driven, sections are established on the basis of pedagogical convenience.

(2) Annotation is based on TEI-compliant XML (TEI stands for Text Encoding Initiative). This gives our files the processing and structuring power of XML together with the uniformity and standardisation of TEI.

(3) BACKBONE annotation is multipurpose: it serves users of different kinds and at different levels of the system. For instance, the annotation informs materials and activity designers about the main pedagogical characteristics of a given section. This information is crucial for determining the potential uses of the section, as it gives materials designers relevant metadata for matching the section to a specific range of students (Tornero et al., 2007).

(4) BACKBONE annotation supports collaborative corpora stored on a server. For more information, see the section on collaborative annotation.
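To make principles (1) and (2) more concrete, here is one way a section-level annotation could be expressed with TEI's generic analysis mechanism, in which an element's `ana` attribute points to interpretation categories defined elsewhere in the file. This is a hypothetical Python illustration using the standard library's `xml.etree.ElementTree`, not BACKBONE's actual schema: the helpers `annotate_section` and `build_document`, the `type="section"` value and the category identifiers are assumptions, and the `<teiHeader>` (plus the `<interp>` elements the pointers would refer to) are omitted, so the output is not a complete, valid TEI document.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace, registered as the default so the output has no prefix.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace (ElementTree's {uri}tag form)."""
    return f"{{{TEI_NS}}}{tag}"


def annotate_section(body: ET.Element, number: int, text: str,
                     categories: list) -> ET.Element:
    """Hypothetical helper: add one section as a <div> annotated via @ana."""
    div = ET.SubElement(body, tei("div"), {
        "type": "section",  # assumed value; BACKBONE's own may differ
        "n": str(number),
        # Each category becomes a pointer to an <interp> defined elsewhere.
        "ana": " ".join(f"#{c}" for c in categories),
    })
    ET.SubElement(div, tei("p")).text = text
    return div


def build_document(sections: list) -> ET.Element:
    """Hypothetical helper: build <TEI><text><body> from (text, categories) pairs.

    The <teiHeader> is omitted for brevity, so this is not valid TEI as-is.
    """
    root = ET.Element(tei("TEI"))
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for number, (text, categories) in enumerate(sections, start=1):
        annotate_section(body, number, text, categories)
    return root


if __name__ == "__main__":
    doc = build_document([
        ("We talked about the school inspectorate.", ["education", "nouns"]),
        ("Then we moved on to holidays.", ["travel"]),
    ])
    print(ET.tostring(doc, encoding="unicode"))
```

Because each set of categories is attached to a `<div>` rather than to the whole `<text>`, the annotation stays unit-bound: two sections of the same interview can carry entirely different categories.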

Adding documents to our corpus

Once we have created a corpus file and associated a taxonomy tree with it, we can start annotating.

Before launching the annotation process, we need to add text files to our corpus. To do this, click Document and then Add Document (CTRL + A).

You can add as many documents as you need, or just one. For BACKBONE corpora, the “BACKBONE Interview (UTF-16)” format is the most suitable. You can add documents to, or delete documents from, a corpus at any time, using the Document menu, the corresponding buttons, or CTRL + D to delete. BACKBONE Annotator displays a message whenever you delete a document from your corpus or add a new one to it.
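If your transcripts are saved in another encoding (for example UTF-8, the default in many text editors), you may need to convert them before adding them in the “BACKBONE Interview (UTF-16)” format. The following is a minimal Python sketch, assuming that this format simply expects UTF-16-encoded plain text; `to_utf16` is a hypothetical helper, not part of BACKBONE, and it writes a new file, leaving the original untouched.

```python
import sys
from pathlib import Path


def to_utf16(src: Path, dst: Path, source_encoding: str = "utf-8") -> None:
    """Hypothetical helper: copy a plain-text transcript, re-encoded as UTF-16.

    Assumes the "BACKBONE Interview (UTF-16)" format expects UTF-16 text;
    the source file is read with `source_encoding` and is not modified.
    """
    text = src.read_text(encoding=source_encoding)
    # Python's "utf-16" codec prepends a byte-order mark (BOM).
    dst.write_text(text, encoding="utf-16")


if __name__ == "__main__":
    # Usage: python to_utf16.py interview.txt interview_utf16.txt
    to_utf16(Path(sys.argv[1]), Path(sys.argv[2]))
```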

To start annotating, click the corresponding button and select the document you want to open and annotate. In this example, the documents we have added are plain text (.txt) files with no associated metadata. This metadata can be added with either BACKBONE Transcriptor or BACKBONE Annotator. For the time being, we will assume that the texts added to our corpus were prepared with BACKBONE Transcriptor and already contain the metadata required by BACKBONE.

Opening and using a document for annotation: keywords

We will select one of the texts from our corpus and start annotating. To annotate the text, drag a category from the taxonomy tree on the left and drop it onto the corresponding section.

For section 1, we will apply the categories education (topic) and nouns. Once they have been dropped onto it, the section is annotated. In addition, BACKBONE Annotator allows users to link each category added to the section or text to the word or string of words that motivated the decision. These words or strings of words are called keywords.

To establish this link, first apply a category to a section as described above; then select a word or string of words, right-click it and choose the category or tag you want to link to it. This is where colours prove useful: if we defined our colour palette carefully when creating the categories, we can now get an idea of what has been annotated simply by looking at the text. Keyword highlighting can be turned off for each category: double-click the eye icon to the left of the category, and the keywords for that category will no longer be visible.

More than one category/tag may be linked to a particular word or string of words, and you may also need to remove a category/tag from a keyword. In our example, the keyword "Inspectorate" has been linked to two different categories. You can remove both, or just one, by selecting the word, right-clicking it and choosing what to remove. Categories that have been applied can be removed just as easily: select the category you want to remove and double-click its name.
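The keyword rules above can be summarised as a small data model. This Python sketch is hypothetical and is not BACKBONE Annotator's internal representation; the `Section` class and its method names are assumptions made for illustration. It encodes three points from the text: a keyword is linked to a category already applied to the section, one keyword may carry several categories, and those categories can be removed one at a time or all at once. Removing a category from the section also unlinks it from its keywords; that is a design choice of this sketch, since the text does not say what the Annotator does in that case.

```python
from collections import defaultdict
from typing import Optional


class Section:
    """Hypothetical model of one annotated section and its keywords."""

    def __init__(self, text: str):
        self.text = text
        self.categories = set()           # categories applied to the section
        self.keywords = defaultdict(set)  # keyword -> categories linked to it

    def apply_category(self, category: str) -> None:
        self.categories.add(category)

    def link_keyword(self, keyword: str, category: str) -> None:
        # Assumption: only categories already applied to the section can be
        # linked to a keyword, and the keyword must occur in the section text.
        if category not in self.categories:
            raise ValueError(f"apply {category!r} to the section first")
        if keyword not in self.text:
            raise ValueError(f"{keyword!r} does not occur in this section")
        self.keywords[keyword].add(category)

    def unlink_keyword(self, keyword: str, category: Optional[str] = None) -> None:
        """Remove one category from a keyword, or all of them if none is given."""
        if category is None:
            self.keywords.pop(keyword, None)
            return
        linked = self.keywords.get(keyword, set())
        linked.discard(category)
        if not linked:
            self.keywords.pop(keyword, None)

    def remove_category(self, category: str) -> None:
        # Design choice of this sketch: removing a category from the section
        # also unlinks it from every keyword.
        self.categories.discard(category)
        for keyword in list(self.keywords):
            self.unlink_keyword(keyword, category)


if __name__ == "__main__":
    s = Section("The Inspectorate visits every school.")
    s.apply_category("education")
    s.apply_category("nouns")
    s.link_keyword("Inspectorate", "education")
    s.link_keyword("Inspectorate", "nouns")
    s.unlink_keyword("Inspectorate", "nouns")  # remove just one category
    print(dict(s.keywords))  # {'Inspectorate': {'education'}}
```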

Coding conventions from BACKBONE Transcriptor

If you have used BACKBONE Transcriptor, you will find that some linguistic information is marked with special conventions:

a [b]: the standard form (b) of a word or expression (a), given in square brackets

^: a word or passage is unclear

... : the speaker changes topic or leaves a sentence unfinished

.word. : foreign word

( ): comments

Double quotes: direct speech

Single quotes: titles of films or books
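If you want to check a transcript for some of these conventions outside the tool, for example before adding it to a corpus, they can be matched with regular expressions. The Python sketch below is hypothetical and covers only some of the markers. Its patterns rest on assumptions not stated in this guide: a bracketed standard form applies to the single word just before it, a foreign word is a single word wrapped in dots, and `^` and `...` are counted as plain markers. Quotation marks are left out because apostrophes (as in "don't") make single-quoted titles hard to match reliably.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for the Transcriptor conventions listed above.
# Assumptions: a bracketed standard form applies to the single word just
# before it, and a foreign word is one word wrapped in dots, e.g. .merci.
STANDARD_FORM = re.compile(r"(\S+) \[([^\]]+)\]")
# The lookarounds stop "..." and abbreviations such as "e.g." from matching.
FOREIGN_WORD = re.compile(r"(?<![.\w])\.(\w+)\.(?![.\w])")
COMMENT = re.compile(r"\(([^)]*)\)")


@dataclass
class TranscriptMarkup:
    standard_forms: list  # (word as spoken, standard form) pairs
    foreign_words: list
    comments: list
    unclear: int          # number of "^" markers
    unfinished: int       # number of "..." markers


def scan(text: str) -> TranscriptMarkup:
    """Collect the Transcriptor-style markup found in one section of text."""
    return TranscriptMarkup(
        standard_forms=STANDARD_FORM.findall(text),
        foreign_words=FOREIGN_WORD.findall(text),
        comments=COMMENT.findall(text),
        unclear=text.count("^"),
        unfinished=text.count("..."),
    )


if __name__ == "__main__":
    print(scan("We was [were] in the .bureau. ^ and then... (laughs)"))
```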