Corpus Linguistics: Types of Corpora

Types of corpora

General vs. specialized corpora
Written vs. spoken corpora
Synchronic vs. diachronic corpora
Monolingual vs. multilingual corpora
Comparable vs. parallel corpora
Native vs. learner corpora
Sample vs. monitor corpora
Raw vs. annotated corpora

General vs. specialized corpora
General corpora (通用語料庫) or reference corpora (參考語料庫): cover a wide range of text categories or registers and represent language for general purposes. They are usually very large, running to millions of words. E.g. British National Corpus (BNC), Bank of English (BOE).
Specialized corpora (專用語料庫): texts from a particular variety of a language, e.g. from a particular dialect or from a particular subject area.

Written vs. spoken corpora
Written corpora (筆語語料庫): contain only written materials. (more common)
Spoken corpora (口語語料庫): contain transcribed texts of spoken language. (less common)

Synchronic vs. diachronic corpora
Synchronic corpora (共時語料庫): materials from a specific period of time.
Diachronic corpora (歷時語料庫): materials from over a longer period of time.

Monolingual vs. multilingual corpora
Monolingual corpora (單語語料庫): texts in one language.
Multilingual corpora (多語語料庫): texts in several different languages.

Comparable vs. parallel corpora
Comparable corpora (可比語料庫): texts from two or more languages that are similar in genre, topic, register, etc., but do not contain the same content.
Parallel corpora (平行語料庫), also called translation corpora (翻譯語料庫): a corpus of original texts in one language and their translations into another language (or several other languages). They are used to explore how the same content is expressed in two languages.

Native vs. learner corpora
Native speaker corpora (本族語語料庫): texts from native speakers.
Learner corpora (學習者語料庫): texts from language learners.

Sample vs. monitor corpora
Sample corpora (樣本語料庫): as opposed to a monitor corpus, a sample corpus is of finite size and consists of text segments selected to provide a static picture of the language.
Monitor corpora (監控語料庫): track language change. A monitor corpus is regularly updated and open-ended.

Raw vs. annotated corpora
Raw corpora (生語料庫): plain text in its raw state, without annotations.
Annotated corpora (標注語料庫): some external information is added to the corpus, e.g.
information identifying the origin and nature of the text;
tagging to show the word class of each word;
parsing to show the sentence structure and the function of different elements in a sentence.
One specific example is "gives", a third person singular present tense verb. In an annotated corpus, the form gives may appear as gives_VVZ, where VVZ marks it as a third person singular present tense (Z) form of a lexi
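The word_TAG convention in the gives_VVZ example can be read back into (word, tag) pairs with a few lines of Python. This is only a minimal sketch: the function name split_tagged, the underscore separator as a parameter, and the sample sentence are assumptions made for illustration and are not taken from any particular corpus or tool.

```python
def split_tagged(text, sep="_"):
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs.

    Hypothetical helper for illustration; real annotated corpora may use
    other separators or XML markup instead of this inline convention.
    """
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST separator, so a word that itself
        # contains an underscore still keeps its tag intact.
        word, found, tag = token.rpartition(sep)
        if not found:
            # No separator present: treat the token as untagged (raw) text.
            word, tag = token, None
        pairs.append((word, tag))
    return pairs


# Made-up sample line in the word_TAG style; only gives_VVZ comes from the text.
sample = "She_PNP gives_VVZ lectures_NN2"
for word, tag in split_tagged(sample):
    print(word, tag)
```

Running the same function over raw text returns every token with a tag of None, which mirrors the raw vs. annotated distinction: the words are the same, only the added information differs.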
