The field of [[language documentation]] in the modern context involves a complex and ever-evolving set of tools and methods, and the study and development of their use
==
Researchers in language documentation often
=== Principles for recording ===
Since documentation of languages is often difficult, with many languages that linguists work with being endangered (they may not be spoken in the near future), it is recommended to record at the highest quality possible given the limitations of a recorder. For video, this means recording at HD resolution (1080p or 720p) or higher when possible, while for audio this means recording minimally in uncompressed
=== Workflows ===
For many linguists the end-result of making recordings is language analysis, often investigation of a language's phonological or syntactic properties using various software tools. This requires transcription of the audio, generally in collaboration with native speakers of the language in question. For general transcription, media files can be played back on a computer (or other device capable of playback) and paused for transcription in a text editor. Other (cross-platform) tools to assist this process include [https://www.audacityteam.org/ Audacity] and [[sourceforge:projects/trans/files/transcriber/1.5.1/|Transcriber]], while a program like [https://tla.mpi.nl/tools/tla-tools/elan/ ELAN] (described further below) can also perform this function.
Programs like [https://software.sil.org/toolbox/ Toolbox] or [https://software.sil.org/fieldworks/ FLEx] are often preferred by linguists who want to be able to [[Interlinear gloss|interlinearize]] their texts, as these programs build a dictionary of forms and parsing rules to help speed up analysis. Unfortunately, media files are generally not linked by these programs (as opposed to ELAN, in which linked files are preferred), making it difficult to view or listen back to recordings to check transcriptions. There is [https://github.com/lingdoc/trs2txt currently a workaround] for Toolbox that allows timecodes to reference an audio file and enable playback (of a complete text or a referenced sentence) from within Toolbox - in this workflow, time-alignment of text is performed in Transcriber, and then the relevant timecodes and text are converted into a format that Toolbox can read.
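The conversion step in this workflow can be sketched roughly as follows. This is an illustrative sketch only, not the code of the linked workaround: it assumes a simplified Transcriber document in which each <code>Turn</code> contains <code>Sync</code> elements followed directly by text, and the time markers <code>\ts</code> and <code>\te</code> are hypothetical names chosen for the example (only <code>\ref</code> and <code>\tx</code> are conventional Toolbox markers). Real Transcriber files may also contain speaker and event elements that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a Transcriber (.trs) document; real files
# carry more attributes and may interleave other elements with the text.
SAMPLE = """<Trans><Episode><Section type="report" startTime="0" endTime="5.0">
<Turn startTime="0" endTime="5.0"><Sync time="0"/>
first sentence
<Sync time="2.5"/>
second sentence
</Turn></Section></Episode></Trans>"""


def trs_segments(trs_xml):
    """Yield (start, end, text) for each Sync-delimited segment."""
    root = ET.fromstring(trs_xml)
    for turn in root.iter("Turn"):
        turn_end = float(turn.get("endTime"))
        syncs = turn.findall("Sync")
        for i, sync in enumerate(syncs):
            start = float(sync.get("time"))
            # A segment ends at the next Sync, or at the end of the turn.
            if i + 1 < len(syncs):
                end = float(syncs[i + 1].get("time"))
            else:
                end = turn_end
            text = (sync.tail or "").strip()
            if text:
                yield start, end, text


def to_toolbox(segments, text_id="text"):
    """Format segments as backslash-marker records.

    \\ts and \\te are hypothetical marker names for start and end times.
    """
    records = []
    for n, (start, end, text) in enumerate(segments, 1):
        records.append(
            f"\\ref {text_id}.{n:03d}\n"
            f"\\ts {start:.3f}\n"
            f"\\te {end:.3f}\n"
            f"\\tx {text}\n"
        )
    return "\n".join(records)


if __name__ == "__main__":
    print(to_toolbox(trs_segments(SAMPLE), text_id="demo"))
```

Each record carries the start and end time of one sentence, which is the information a playback tool needs in order to locate the matching stretch of audio.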
== Hardware ==
=== Video+audio recorders ===
Recorders that record video typically also record audio as well. However, the audio does not always meet the criteria of minimal needs and recommended best practices for language documentation (uncompressed WAV format, 44.
The [https://www.zoom.co.jp/products/field-video-recording/video-recording Zoom] series, particularly the [https://www.zoom-na.com/products/field-video-recording/video-recording/zoom-q8/specs Q8], [https://www.zoom.co.jp/products/field-video-recording/video-recording/q4n-handy-video-recorder#specs Q4n], and [https://www.zoom.co.jp/products/field-video-recording/video-recording/q2n-handy-video-recorder#specs Q2n], which record to multiple video and audio resolutions/formats,
When using a video recorder that does not record audio in WAV format (such as most DSLR cameras), it is recommended to record audio separately on another recorder, following some of the guidelines below. As with the audio recorders described below, many video recorders also accept microphone input of various kinds
=== Audio recorders and microphones ===
Audio-only recorders can be used in scenarios where video is impractical or otherwise undesirable. In most cases it is advantageous to combine the use of an audio-only recorder with one or more external microphones; however, many modern audio recorders include built-in microphones, which are usable if cost or setup speed is an important concern. Digital (solid state) recorders are preferred for most language documentation scenarios. Modern digital recorders achieve a very high level of quality at a relatively low price. Some of the most popular field recorders are found in the [https://www.zoom.co.jp/ Zoom] range, including the [https://www.zoom.co.jp/products/handy-recorder/h1-handy-recorder H1], [https://www.zoom.co.jp/products/handy-recorder/h2n-handy-recorder H2], [https://www.zoom.co.jp/products/field-recording/h4n-pro-handy-recorder H4], [https://www.zoom.co.jp/products/handy-recorder/h5-handy-recorder H5] and [https://www.zoom.co.jp/products/field-video-recording/field-recording/h6-handy-recorder H6]. The [https://www.zoom.co.jp/products/handy-recorder/h1-handy-recorder H1] is particularly suitable for situations in which cost and user-friendliness are major considerations. Other popular recorders for situations where size is a factor are the [http://www.getolympus.com/us/en/audio/pcm-recorders.html Olympus LS-series] and the [https://www.sony.com/electronics/voice-recorder-products/t/voice-recorders?bestfor=meeting-recording Sony Digital Voice recorders] (though in the latter case, ensure that the device can record to WAV/Linear PCM format).
Several types of [[microphone]] can be effectively used in language documentation scenarios, depending on the situation (in particular the number, position and mobility of speakers) and on budget. In general, [[condenser microphones]] should be selected rather than [[dynamic microphones]]. It is an advantage in most fieldwork situations if a condenser microphone is self-powered (via a battery); however, when power is not a major factor, phantom-powered models can also be used. A stereo microphone setup is needed whenever more than one speaker is involved in a recording; this can be achieved via an array of two mono microphones, or by a dedicated stereo microphone.
Directional microphones should be used in most cases, in order to isolate a speaker's voice from other potential noise sources. However, omnidirectional microphones may be preferred in situations involving larger numbers of speakers arrayed in a relatively large space. Among directional microphones, [[Cardioid microphone|cardioid]] microphones are suitable for most applications; however, in some cases a [[hypercardioid]] ("shotgun") microphone may be preferred.
Good quality headset microphones are comparatively expensive, but can produce recordings of extremely high quality in controlled situations.<ref>{{Cite journal|
Some good quality microphones used for film-making and interviews include the [http://www.rode.com/microphones/video Røde VideoMic shotgun and the Røde lavalier series], [http://www.shure.com/americas/products/microphones/beta/beta-53-headworn-microphone Shure headworn mics] and [http://www.shure.com/americas/search?utf8=%E2%9C%93&keyword=lavalier#keyword=lavalier&category_1=Microphones Shure lavaliers]. Depending on the recorder and microphone, additional [[Audio and video interfaces and connectors|cables]] (XLR, stereo/mono converter or a [https://www.amazon.com/Rode-SC3-3-5mm-TRRS-Adaptor/dp/B00L6C8PNU TRRS to TRS adapter]) will be necessary.
=== SayMore ===
[
The primary functions of SayMore are: (a) audio recording; (b) file import from recording device (video and/or audio); (c) file organization; (d) metadata entry at session and file levels; (e) association of AV files with evidence of informed consent and other supplementary objects (such as photographs); (f) AV file segmentation; (g) transcription/translation; and (h) [https://sites.google.com/site/boldpng/bold BOLD]-style Careful Speech annotation and Oral Translation.
SayMore files can be further exported for annotation in [
=== ELAN ===
[[
=== FLEx ===
=== Toolbox ===
[
=== Tools for automating components of the workflow ===
Language documentation may be partially automated thanks to a number of software tools, including:
* [[Lingua Libre]], a [[FLOSS|libre]] online tool for recording a large number of words and phrases in a short period (up to 1,000 words per hour with a clean word list and an experienced user). It automates the classic procedure for recording audio and video pronunciation files (for [[Spoken language|spoken]] and [[Sign language|signed]] languages). Once the recording is done, the platform automatically uploads clean, well-cut, well-named and app-friendly files directly to [[c:Category:Lingua_Libre_pronunciation|Wikimedia Commons]] (it is possible to download datasets for a specific language).
* Maus
* [[ESpeakNG|eSpeak]]
* Prosodylab Aligner
* [[HTK_(software)|HTK]]
* Sox
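As a rough illustration of how such tools can be scripted, the sketch below builds a Sox command line that rewrites a recording at 44.1 kHz, 16-bit. It assumes that the <code>sox</code> command-line program is installed; the function names and file names are hypothetical and chosen only for the example. Converting a file in this way does not restore detail lost in an earlier compressed recording, so it is a format normalization step rather than a quality improvement.

```python
import shutil
import subprocess


def sox_command(src, dst, rate_hz=44100, bits=16):
    """Build the argument list for converting src to dst with Sox.

    -r sets the output sample rate and -b the output bit depth;
    both must appear before the output file name.
    """
    return ["sox", src, "-r", str(rate_hz), "-b", str(bits), dst]


def convert(src, dst):
    """Run the conversion, failing early if Sox is not available."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox was not found on PATH")
    subprocess.run(sox_command(src, dst), check=True)


if __name__ == "__main__":
    # Hypothetical file names; print the command without running it.
    print(" ".join(sox_command("session01_raw.wav", "session01.wav")))
```

Keeping the command construction separate from its execution makes it easy to log or review the exact command before a batch of recordings is processed.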
=== Data Formats ===
Format standards (e.g. [[OLAC]]) are critical for interoperability between software tools. Many individual archives or data repositories have their own standards and requirements for data deposited on their servers; knowledge of these requirements ought to inform the data collection strategy and tools used, and should be part of a data management plan developed before the start of research. Some example guidelines from well-used repositories are given below:
* [https://www.soas.ac.uk/elar/helpsheets/ Endangered Languages Archive (ELAR)] guidelines
* [http://www.mpi.nl/corpus/html/lamus/apa.html Max Planck Institute Archive] accepted formats
* [https://web.library.yale.edu/digital-initiatives/digitization-standards-and-guidelines/audiovisual Yale University Library] audiovisual guidelines
Most current archive standards for [[video]] use MPEG-4 encoding as a storage format, which includes an AAC audio stream of up to 320 kbps. For [[Sound quality|audio]], archive quality is at least WAV at 44.1 kHz, 16-bit.
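A recording can be checked against the minimum audio specification above before deposit. The following is a minimal sketch using Python's standard <code>wave</code> module; the function name and threshold constants are assumptions made for this example, and individual archives may impose further requirements that it does not check.

```python
import wave

MIN_RATE_HZ = 44100          # 44.1 kHz
MIN_SAMPLE_WIDTH_BYTES = 2   # 16-bit


def check_wav(path):
    """Return a list of problems; an empty list means the minimum is met."""
    problems = []
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            width = wav.getsampwidth()
    except wave.Error as exc:
        # The wave module only reads uncompressed PCM WAV files.
        return [f"not a readable PCM WAV file: {exc}"]
    if rate < MIN_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is below {MIN_RATE_HZ} Hz")
    if width < MIN_SAMPLE_WIDTH_BYTES:
        problems.append(f"bit depth {width * 8} is below 16-bit")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_wav.py recording.wav [more.wav ...]
    for name in sys.argv[1:]:
        issues = check_wav(name)
        print(name, "OK" if not issues else "; ".join(issues))
```

Running such a check on each file as it comes off the recorder catches a misconfigured device while the session can still be repeated.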
== Ethics ==
Ethical practices in language documentation have been the focus of much recent discussion and debate.<ref>Austin, Peter K. 2010. 'Communities, ethics and rights in language documentation.' In Peter K. Austin, Ed., ''Language Documentation and Description Vol 7''. London, SOAS: 34-54.</ref> The [[Linguistic Society of America]] has prepared an [http://www.linguisticsociety.org/sites/default/files/Ethics_Statement.pdf Ethics Statement], and maintains an [https://lsaethics.wordpress.com/about/ Ethics Discussion Blog] which is primarily focused on ethics in the language documentation context. The [[First Peoples' Cultural Council]] and [[Endangered Languages Project]] have released a [http://fpcc.ca/linguistcode Linguist's Code of Conduct] for engaging in documentation work. The morality of ethics protocols has itself been brought into question by [[George van Driem]].<ref>{{Cite journal|last=van Driem|first=George|date=2016|title=Endangered Language Research and the Moral Depravity of Ethics Protocols|url= http://hdl.handle.net/10125/24693|journal=Language Documentation and Conservation 10: 243-252|doi=|pmid=|access-date=}}</ref> Most postgraduate programs in Language Documentation and Description require research proposals to be submitted to an internal Institutional Review Board, which ensures that research is conducted ethically.
== Literature ==
The peer-reviewed journal [http://nflrc.hawaii.edu/ldc/ Language Documentation and Conservation] has published a large number of articles focusing on tools and methods in language documentation.
== Film ==
The 2021 Indian documentary film [[Dreaming of Words]] traces the life and work of [[Njattyela Sreedharan]], a fourth-standard dropout who compiled a multilingual dictionary connecting four major [[Dravidian languages]]: [[Malayalam]], [[Kannada]], [[Tamil language|Tamil]] and [[Telugu language|Telugu]].<ref>{{Cite web|url=https://bookofachievers.com/articles/82-yo-compiles-dictionary-of-4-dravidian-languages-useful-ofcourse|title = 82-year-old Kerala man's Dictionary is in the four Dravidian languages. 25 long years to compile}}</ref><ref>{{Cite web|url=https://www.thebetterindia.com/246205/83-yo-kerala-school-dropout-creates-unique-dictionary-in-4-south-indian-languages-vid01/|title=83-YO Kerala School Dropout Creates Unique Dictionary in 4 South Indian Languages|date=31 December 2020}}</ref><ref>{{Cite news|url=https://www.thehindu.com/news/national/kerala/for-keralites-door-opens-to-three-other-dravidian-languages/article32986464.ece|title = For Keralites, door opens to three other Dravidian languages|newspaper = The Hindu|date = 30 October 2020|last1 = Sajit|first1 = C. p.}}</ref> Travelling across four states and doing extensive research, he spent twenty-five years<ref>{{Cite web|url=https://silvertalkies.com/the-man-who-wrote-a-dictionary-in-four-languages/|title=The Man Who Wrote A Dictionary In Four Languages – Silver Talkies|website=silvertalkies.com}}</ref> making this multilingual dictionary.
== See also ==
* [https://web.archive.org/web/20181026095442/http://www.resourcebook.eu/ LRE Map], a language resources map searchable by resource type, language(s), language type, modality, resource use, availability, production status, conference(s) and resource name