Transcription in the linguistic sense is the systematic representation of language in written form. The source can either be utterances (speech or sign language) or preexisting text in another writing system.
Transcription should not be confused with translation, which means representing the meaning of a source language text in a target language (e.g. translating the meaning of an English text into Spanish), or with transliteration, which means representing a text from one script in another (e.g. transliterating a Cyrillic text into the Latin script).
In the academic discipline of linguistics, transcription is an essential part of the methodologies of (among others) phonetics, conversation analysis, dialectology and sociolinguistics. It also plays an important role in several subfields of speech technology. Common examples of transcription outside academia are the proceedings of a court hearing such as a criminal trial (by a court reporter) or a physician's recorded voice notes (medical transcription). This article focuses on transcription in linguistics.
Phonetic vs. orthographic transcription
Broadly speaking, there are two possible approaches to linguistic transcription. Phonetic transcription focuses on phonetic and phonological properties of spoken language. Systems for phonetic transcription thus furnish rules for mapping individual sounds or phones to written symbols. Systems for orthographic transcription, by contrast, consist of rules for mapping spoken words onto written forms as prescribed by the orthography of a given language. Phonetic transcription operates with specially defined character sets, usually the International Phonetic Alphabet.
Which type of transcription is chosen depends mostly on the research interests pursued. Since phonetic transcription strictly foregrounds the phonetic nature of language, it is most useful for phonetic or phonological analyses. Orthographic transcription, on the other hand, has a morphological and a lexical component alongside the phonetic component (which aspect is represented to which degree depends on the language and orthography in question). It is thus more convenient wherever meaning-related aspects of spoken language are investigated. Phonetic transcription is doubtlessly more systematic in a scientific sense, but it is also harder to learn, more time-consuming to carry out and less widely applicable than orthographic transcription.
Mapping spoken language onto written symbols is not as straightforward a process as it may seem at first glance. Written language is an idealisation, made up of a limited set of clearly distinct and discrete symbols. Spoken language, on the other hand, is a continuous (as opposed to discrete) phenomenon, made up of a potentially unlimited number of components. There is no predetermined system for distinguishing and classifying these components and, consequently, no preset way of mapping these components onto written symbols.
The literature is relatively consistent in pointing out the non-neutrality of transcription practices: there is not and cannot be a neutral transcription system. Knowledge of social culture enters directly into the making of a transcript and is captured in the texture of the transcript (Baker, 2005).
Transcription systems are sets of rules which define how spoken language is to be represented in written symbols. Most phonetic transcription systems are based on the International Phonetic Alphabet or, especially in speech technology, on its derivative SAMPA. Examples of orthographic transcription systems (all from the field of conversation analysis or related fields) are:
CA (conversation analysis)
Arguably the first system of its kind, originally sketched in Sacks et al. (1978) and later adapted for use in computer-readable corpora as CA-CHAT by MacWhinney (2000). The field of Conversation Analysis itself includes a number of distinct approaches to transcription and sets of transcription conventions, among them Jefferson Notation. To analyze conversation, recorded data is typically transcribed into a written form that is agreeable to analysts. There are two common approaches. The first, called narrow transcription, captures the details of conversational interaction, such as which particular words are stressed, which words are spoken with increased loudness, the points at which turns-at-talk overlap, and how particular words are articulated. If such detail is less important, perhaps because the analyst is more concerned with the overall structure of the conversation or the relative distribution of turns-at-talk among the participants, then a second type of transcription, known as broad transcription, may be sufficient (Williamson, 2009).
The Jefferson Notation System is a set of symbols, developed by Gail Jefferson, which is used for transcribing talk. Having had some previous experience in transcribing when she was hired in 1963 as a clerk typist at the UCLA Department of Public Health to transcribe sensitivity-training sessions for prison guards, Jefferson began transcribing some of the recordings that served as the materials out of which Harvey Sacks’ earliest lectures were developed. Over four decades, for the majority of which she held no university position and was unsalaried, Jefferson’s research into talk-in-interaction has set the standard for what became known as Conversation Analysis (CA). Her work has greatly influenced the sociological study of interaction, but also disciplines beyond, especially linguistics, communication, and anthropology. This system is employed universally by those working from the CA perspective and is regarded as having become a near-globalized set of instructions for transcription.
DT (discourse transcription)
A system described in DuBois et al. (1992), used for transcription of the Santa Barbara Corpus of Spoken American English (SBCSAE), and later developed further into DT2.
GAT (Gesprächsanalytisches Transkriptionssystem – Conversation Analytic transcription system)
A system described in Selting et al. (1998), later developed further into GAT2 (Selting et al. 2009), and widely used in German-speaking countries for prosodically oriented conversation analysis and interactional linguistics.
HIAT (Halbinterpretative Arbeitstranskriptionen – Semiinterpretative Working Transcriptions)
Arguably the first system of its kind, originally described in Ehlich and Rehbein (1976) (see Ehlich 1992 for an English reference), adapted for use in computer-readable corpora in Rehbein et al. (2004), and widely used in functional pragmatics.
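The relationship between the International Phonetic Alphabet and its ASCII-based derivative SAMPA, both mentioned above, can be illustrated with a small lookup table. The following is a minimal sketch: the table is a deliberately incomplete subset of English SAMPA symbols, and the function name `sampa_to_ipa` is a hypothetical helper, not part of any standard tool.

```python
# Illustrative, incomplete subset of SAMPA-to-IPA correspondences for English.
# Symbols that are not listed (e.g. plain "p", "t", "k") are passed through.
SAMPA_TO_IPA = {
    "S": "ʃ",  # as in "ship"
    "Z": "ʒ",  # as in "measure"
    "T": "θ",  # as in "thin"
    "D": "ð",  # as in "this"
    "N": "ŋ",  # as in "thing"
    "@": "ə",  # schwa
    "{": "æ",  # as in "cat"
    "I": "ɪ",  # as in "pit"
    "U": "ʊ",  # as in "put"
    "V": "ʌ",  # as in "cut"
}


def sampa_to_ipa(sampa: str) -> str:
    """Convert a SAMPA string to IPA one character at a time (a simplification)."""
    return "".join(SAMPA_TO_IPA.get(symbol, symbol) for symbol in sampa)


print(sampa_to_ipa("TIN"))  # "thing"
print(sampa_to_ipa("SIp"))  # "ship"
```

A character-by-character mapping is enough for this subset, but a fuller converter would probably also need to handle diacritics and length marks, which this sketch ignores.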
Transcription was originally a process carried out manually, i.e. with pencil and paper, using an analogue sound recording stored on, e.g., a Compact Cassette. Nowadays, most transcription is done on computers. Recordings are usually digital audio or video files, and transcriptions are electronic documents. Specialized computer software exists to assist the transcriber in efficiently creating a digital transcription from a digital recording. Among the most widely used transcription tools in linguistic research are:
- ANVIL (Annotation of Video and Language Data)
- A tool specialising in transcription of multimodal interaction, see ANVIL-Website
- A tool to create captions, indexes and transcripts for searchable metadata.
- CLAN (Computerized Language Analysis)
- A tool mainly used for the transcription of child language acquisition data as in the CHILDES database, see CLAN page of the CHILDES website.
- ELAN (EUDICO Linguistic Annotator)
- A tool widely used for the transcription of signed and spoken language data and the documentation of endangered languages, see ELAN page on the Language Archiving Technology portal
- EXMARaLDA (Extensible Markup Language for Discourse Annotation)
- A tool widely used in discourse analysis, dialectology and sociolinguistics, see EXMARaLDA website
- A tool used in social science which includes a free guide on transcription methodology on its site, see f4transkript website
- FOLKER (FOLK Editor)
- A tool developed for the Research and Teaching Corpus of Spoken German (FOLK) and widely used in conversation analysis, see FOLKER page at the website of the Institute for German Language
- A tool widely used in phonetics
- A tool originally developed for the transcription of speech, see Transcriber website at SourceForge
- A tool for media (audio/video) transcription and captioning with embedded high speech recognition for English.
Other transcription software is developed for commercial sale.
- Hepburn, A., & Bolden, G. B. (2013). The conversation analytic approach to transcription. In J. Sidnell & T. Stivers (Eds.), The handbook of Conversation Analysis (pp. 57-76). Oxford: Blackwell.
- DuBois, John / Schuetze-Coburn, Stephan / Cumming, Susanne / Paolino, Danae (1992): Outline of Discourse Transcription. In: Edwards/Lampert (1992), 45-89.
- Ehlich, K. (1992). HIAT - a Transcription System for Discourse Data. In: Edwards, Jane / Lampert, Martin (eds.): Talking Data – Transcription and Coding in Discourse Research. Hillsdale: Erlbaum, 123-148.
- Ehlich, K. & Rehbein, J. (1976) Halbinterpretative Arbeitstranskriptionen (HIAT). In: Linguistische Berichte (45), 21-41.
- Haberland, H. & Mortensen, J. (2016) Transcription as second order entextualisation: The challenge of heteroglossia. In: Capone, A. & Mey, J. L. (eds.): Interdisciplinary Studies in Pragmatics, Culture and Society, 581-600. Cham: Springer.
- Jenks, C.J. (2011) Transcribing Talk and Interaction: Issues in the Representation of Communication Data. Amsterdam: John Benjamins.
- MacWhinney, Brian (2000): The CHILDES project: tools for analyzing talk. Mahwah, NJ: Lawrence Erlbaum.
- Rehbein, J.; Schmidt, T.; Meyer, B.; Watzke, F. & Herkenrath, A. (2004) Handbuch für das computergestützte Transkribieren nach HIAT. In: Arbeiten zur Mehrsprachigkeit, Folge B (56).
- Ochs, E. (1979) Transcription as theory. In: Ochs, E. & Schieffelin, B. B. (ed.): Developmental pragmatics, 43-72. New York: Academic Press.
- Sacks, H.; Schegloff, E. & Jefferson, G. (1978) A simplest systematics for the organization of turn taking for conversation. In: Schenkein, J. (ed.): Studies in the Organization of Conversational Interaction, 7-56. New York: Academic Press.
- Selting, Margret / Auer, Peter / Barden, Birgit / Bergmann, Jörg / Couper-Kuhlen, Elizabeth / Günthner, Susanne / Meier, Christoph / Quasthoff, Uta / Schlobinski, Peter / Uhmann, Susanne (1998): Gesprächsanalytisches Transkriptionssystem (GAT). In: Linguistische Berichte 173, 91-122.
- Selting, M., Auer, P., Barth-Weingarten, D., Bergmann, J., Bergmann, P., Birkner, K., Couper-Kuhlen, E., Deppermann, A., Gilles, P., Günthner, S., Hartung, M., Kern, F., Mertzlufft, C., Meyer, C., Morek, M., Oberzaucher, F., Peters, J., Quasthoff, U., Schütte, W., Stukenbrock, A., Uhmann, S. (2009): Gesprächsanalytisches Transkriptionssystem 2 (GAT 2). In: Gesprächsforschung (10), 353-402.
What is Transcription?
Transcription generally refers to the written form of something. In biology, transcription is the process whereby DNA is used as a template to form a complementary RNA strand; RNA is, in effect, the “written” form of DNA. This is the first stage of protein production and of the flow of information within a cell. DNA stores genetic information, which is transferred to RNA in transcription and then directs the synthesis of proteins in translation. Three types of RNA can be formed: messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA).
Transcription occurs in four stages: pre-initiation, initiation, elongation, and termination. These differ in prokaryotes and eukaryotes because DNA is stored in the nucleus in eukaryotes, whereas it is found in the cytoplasm in prokaryotes. In eukaryotes, DNA is stored as tightly packed chromatin, which must be uncoiled before transcription can occur. The production of mature mRNA in eukaryotes is also considerably more complicated than it is in prokaryotes, involving several additional processing steps.
Pre-initiation, or template binding, begins when the σ subunit of RNA polymerase binds to a promoter region located toward the 5’ end of the gene. Following this, the DNA is denatured, uncoupling the two complementary strands and allowing the template strand to be accessed by the enzyme. The opposing strand is known as the partner strand. Promoter sequences on the DNA strand are vital for the successful initiation of transcription. They are specific sequences of the nucleotide bases making up the DNA strand (adenine, thymine, guanine, and cytosine), and the identity of several of these motifs has been discovered, including TATAAT and TTGACA in prokaryotes and TATAAAA and GGCCAATCT in eukaryotes. These sequences are known as cis-acting elements. In eukaryotes, additional transcription factors are necessary to facilitate the binding of RNA polymerase to the promoter region.
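To make the idea of a promoter motif concrete, the sketch below scans a DNA string for exact copies of the two prokaryotic motifs named above. It is a simplification under stated assumptions: real promoters often deviate from these consensus sequences, and the function name `find_motifs` and the example sequence are invented for illustration.

```python
# Prokaryotic promoter motifs named in the text.
PROKARYOTIC_MOTIFS = ("TTGACA", "TATAAT")


def find_motifs(dna: str, motifs=PROKARYOTIC_MOTIFS) -> dict:
    """Return the 0-based start positions of every exact match of each motif."""
    dna = dna.upper()
    hits = {}
    for motif in motifs:
        positions = []
        start = dna.find(motif)
        while start != -1:
            positions.append(start)
            # Continue one base further on so overlapping matches are found too.
            start = dna.find(motif, start + 1)
        hits[motif] = positions
    return hits


# Hypothetical sequence built so that each motif occurs exactly once.
example = "GG" + "TTGACA" + "TTTACGCTAGCTAGCTAGG" + "TATAAT" + "GCA"
print(find_motifs(example))  # {'TTGACA': [2], 'TATAAT': [27]}
```

Exact string matching keeps the example short; a more realistic search would probably score approximate matches and check the spacing between the two motifs.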
RNA polymerase catalyzes initiation, causing the introduction of the first complementary 5’-ribonucleoside triphosphate. Remember that each DNA nucleotide base has a complement: adenine pairs with thymine, and guanine with cytosine. RNA, however, contains uracil rather than thymine, so adenine’s complement in RNA is uracil. After the introduction of the first complementary 5’-ribonucleotide, subsequent complementary ribonucleotides are added in the 5’ to 3’ direction. These ribonucleotides are joined by phosphodiester bonds, and at this stage, the DNA and RNA molecules are still connected (see Figure 1).
Figure 1: Initiation of transcription. RNAP refers to RNA polymerase.
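The base-pairing rules described above can be written as a simple lookup. This is a minimal sketch, assuming the template strand is supplied in the 3’ to 5’ direction so that the resulting RNA reads 5’ to 3’; the function name `transcribe` is chosen for illustration only.

```python
# Complement of each DNA template base in the RNA transcript.
# Note that adenine pairs with uracil, because RNA contains no thymine.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build the RNA complementary to a DNA template strand.

    The template is read 3' to 5', so the returned RNA is written 5' to 3'.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


print(transcribe("TACGGCATT"))  # AUGCCGUAA
```

Any character other than A, T, G or C raises a `KeyError`, which seems preferable in a sketch to silently skipping unexpected input.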
Chain elongation occurs when the σ subunit dissociates from the DNA strand, allowing the growing RNA strand to separate from the DNA template strand. This is facilitated by the core enzyme (see Figure 2).
Figure 2: Elongation in transcription
Termination occurs when the core enzyme encounters a termination sequence, a specific sequence of nucleotides that acts as a signal to stop transcription. At this point, the RNA transcript forms a hairpin secondary structure by folding back on itself with the aid of hydrogen bonds. Termination in prokaryotes can be assisted by an additional termination factor known as rho (ρ). Termination is complete when the RNA molecule is released from the template DNA strand. In eukaryotes, termination requires an additional step known as polyadenylation, whereby a tail of multiple adenosine monophosphates is added to the RNA strand.
Figure 3: The main events in each stage of transcription
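The hairpin mentioned in the termination stage forms when one stretch of the transcript can base-pair with a later stretch read in the opposite direction. The sketch below looks for such a pair of segments. It is a rough illustration only: it considers Watson-Crick pairs alone, and the parameters `stem_len` and `min_loop` are hypothetical choices rather than biological constants.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the sequence that would pair with `rna` when folded back on it."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def find_hairpin(rna: str, stem_len: int = 4, min_loop: int = 3):
    """Return (stem_start, partner_start) for the first possible stem, else None.

    A stem is a segment of `stem_len` bases whose reverse complement occurs
    at least `min_loop` bases further downstream.
    """
    rna = rna.upper()
    for i in range(len(rna) - 2 * stem_len - min_loop + 1):
        stem = rna[i:i + stem_len]
        partner = reverse_complement(stem)
        j = rna.find(partner, i + stem_len + min_loop)
        if j != -1:
            return i, j
    return None


print(find_hairpin("GCGCAAAAGCGC"))  # (0, 8)
print(find_hairpin("AAAAAAAAAAAA"))  # None
```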
What is Reverse Transcription?
Reverse transcription is the process of synthesizing a DNA molecule from an RNA template. This method of replication is used by retroviruses, such as HIV, and produces complementary DNA that can be integrated into the genome of a host cell, allowing rapid reproduction. The process is catalyzed by the enzyme reverse transcriptase (see Figure 4).
Figure 4: The process of reverse transcription.
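Reverse transcription applies the same pairing rules in the other direction, from RNA bases to DNA bases. The following is a minimal sketch, assuming the RNA is given 5’ to 3’ and that the first DNA strand should also be written 5’ to 3’, which is why the complement is reversed; `reverse_transcribe` is a hypothetical name.

```python
# Complement of each RNA base in the newly synthesized DNA strand.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_5_to_3: str) -> str:
    """Return the DNA strand complementary to an RNA template, written 5' to 3'."""
    complement = [RNA_TO_DNA[base] for base in rna_5_to_3.upper()]
    # The two strands are antiparallel, so reverse to keep the 5'-to-3' convention.
    return "".join(reversed(complement))


print(reverse_transcribe("AUGCCGUAA"))  # TTACGGCAT
```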
What is Translation?
Translation refers to the conversion of something from one language or form to another. In biology, translation is the process whereby messenger ribonucleic acid (mRNA) directs the synthesis of proteins. A chain of amino acids (a polypeptide chain) is assembled in the order determined by the information stored in a specific strand of mRNA. These polypeptides fold to form proteins. Each strand of mRNA is transcribed from a particular gene and codes for a particular protein, which makes translation central to gene expression.
Figure 5: The triplet code is translated into amino acids; some codons signal the start and end of translation
Translation has three main stages: initiation, elongation, and termination. These differ slightly in prokaryotic and eukaryotic organisms: in prokaryotes, translation occurs in the cytoplasm, while in eukaryotes, translation takes place in the cytoplasm or on ribosomes attached to the endoplasmic reticulum. Essential to the process of translation is the ribosome; ribosomal structure also differs in prokaryotes and eukaryotes, mostly concerning the rate of the migration of their subunits when centrifuged, and the number of proteins their subunits contain.
Initiation begins with the small ribosomal subunit binding to the 5’ end of the mRNA, the messenger RNA created in transcription from DNA. This occurs in two stages: the small ribosomal subunit first binds to several protein initiation factors, and the combined structure then binds to the mRNA. The binding site lies several ribonucleotides before the start codon of the mRNA. Following this, a charged molecule of tRNA binds to the small ribosomal subunit. The large ribosomal subunit then binds to the complex formed by the small ribosomal subunit, the mRNA, and the tRNA. This step is powered by the hydrolysis of GTP (guanosine-5′-triphosphate). After the large ribosomal subunit joins the complex, the initiation factors are released.
The charging of a tRNA molecule refers to the linking of the tRNA with an amino acid. This is carried out by aminoacyl-tRNA synthetases, enzymes which react with the amino acid and ATP (adenosine triphosphate) to form a reactive form of the amino acid, known as an aminoacyladenylic acid. This intermediate remains bound to the enzyme, forming a complex which can react with a tRNA molecule to create a covalent bond between the tRNA and the amino acid. The tRNA can now deliver the amino acid to the ribosome for addition to the polypeptide chain.
Elongation begins when both the small and large ribosomal subunits have been bound to the mRNA. The assembled ribosome provides a peptidyl site and an aminoacyl site for further binding with tRNA. The first tRNA binds to the P site (peptidyl site), and elongation begins with the binding of the second tRNA molecule to the A site (aminoacyl site). Both these tRNA molecules are transporting amino acids. An enzyme known as peptidyl transferase then forms a peptide bond between the amino acids transported by the two tRNA molecules. The covalent bond between the tRNA molecule at the P site and its amino acid is broken, releasing this tRNA to the E site (exit site) before it leaves the ribosome entirely. The tRNA located at the A site then moves to the P site, using energy from the hydrolysis of GTP. This leaves the A site free for further binding, while the P site contains a tRNA molecule attached to an amino acid that is in turn attached to another amino acid. This forms the basis of the polypeptide chain. Another tRNA molecule then binds to the A site, and peptidyl transferase catalyzes the creation of a peptide bond between this new amino acid and the amino acid attached to the tRNA located at the P site. The covalent bond between the amino acid and tRNA at the P site is broken and the tRNA is released. This process repeats over and over again, adding amino acids to the polypeptide chain.
Termination occurs when the ribosome complex encounters a stop codon (see Figure 5). At this stage, the polypeptide chain is attached to a tRNA at the P site, while the A site is unattached. GTP-dependent release factors break the bond between the final tRNA and the terminal amino acid. The tRNA is released from the ribosome complex, which then splits again into the small and large ribosomal subunits, which are released from the mRNA strand. The polypeptide chain then folds in on itself to form a protein. This process is depicted in Figure 6 and Figure 7.
Figure 6: The overview of the process of translation
Figure 7: The main events in each stage of translation.
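The reading of the triplet code during initiation, elongation and termination can be sketched as a loop over codons. The example below uses the standard genetic code with one-letter amino acid abbreviations; it assumes translation starts at the first AUG it finds and stops at the first in-frame stop codon, and it ignores ribosome binding sites, tRNA charging and everything else described above. The function name `translate` is illustrative.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G ordering.
# "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA (5' to 3') into a one-letter amino acid string."""
    mrna = mrna.upper()
    start = mrna.find("AUG")  # initiation: locate the start codon
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):  # elongation: one codon per step
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # termination: stop codon reached
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("GGAUGCCGUAAUU"))  # MP (methionine, proline)
```

Building the table from the compact 64-letter string avoids typing each codon by hand, at the cost of depending on the U, C, A, G ordering being kept exactly as written.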
How is Translation Different from Transcription?
Both transcription and translation are equally important in the process of genetic information flow within a cell, from genes in DNA to proteins. Neither process can occur without the other. However, there are several important differences in these processes.
To begin with, initial transcription components include DNA, RNA polymerase core enzyme, and the σ subunit. Translation components include mRNA, small and large ribosomal subunits, initiation factors, elongation factors and tRNA. In transcription, a DNA double helix is denatured to allow the enzyme to access the template strand. In translation, no such denaturing is necessary, as the template is a single mRNA strand. The product of transcription is RNA, which can be encountered in the form of mRNA, tRNA or rRNA, while the product of translation is a polypeptide chain of amino acids, which forms a protein.
Transcription occurs in the nucleus in eukaryotic organisms, while translation occurs in the cytoplasm and endoplasmic reticulum. Both processes occur in the cytoplasm in prokaryotes. The factor controlling these processes is RNA polymerase in transcription and ribosomes in translation. In transcription, this polymerase moves over the template strand of DNA, while in translation, the ribosome-tRNA complex moves over the mRNA strand.
These differences are summarized in Table 1 below.
Table 1: The differences between transcription and translation
|Transcription|Translation|
|DNA, RNA polymerase core enzyme, σ subunit|mRNA, small and large ribosomal subunits, initiation factors, elongation factors, tRNA|
|RNA polymerase reacts with DNA template strand|Ribosome complex interacts with mRNA strand|
Wrapping Up Translation vs. Transcription
Powerful as it is, DNA is only as good as its products. It is for this reason that the processes of transcription and translation are so important. For cell processes to operate smoothly, both the DNA sequences and their products must work according to plan. This is where transcription and translation come into play, fulfilling a vital purpose in the function of DNA.