HOME   ::  Journal List   ::   Article

Solan, Z., Horn, D., Ruppin, E., and Edelman, S. (2005) Unsupervised learning of natural languages. PNAS, 102(33):11629--11634.

Full-text
   Go to the publisher's web page for the final full text/PDF (see the Authoritative link below).
   URL: http://www.cs.tau.ac.il/~ruppin/pnas_adios.pdf

Related links
   Authoritative: http://dx.doi.org/10.1073/pnas.0409746102   (The publisher's PDF is likely to be available here.)
   Source: http://www.cs.tau.ac.il/~ruppin/

Abstract

We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The ADIOS (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.

Keywords: computational linguistics, grammar induction, language acquisition, machine learning, protein classification

BibTeX
@article{solan05languageLearningPNAS,
  author={Zach Solan and David Horn and Eytan Ruppin and Shimon Edelman},
  title={Unsupervised learning of natural languages},
  journal={PNAS},
  year={2005},
  month={August},
  volume={102},
  number={33},
  pages={11629--11634},
  doi={10.1073/pnas.0409746102},
  url={http://www.isrl.uiuc.edu/~amag/langev/paper/solan05languageLearningPNAS.html},
  keywords={computational linguistics, grammar induction, language acquisition, machine learning, protein classification}
}


Comments to: junwang4 you-know-at gmail.com. Last update: 2/2/08