User:Unormal: Difference between revisions
Latest revision as of 21:31, 12 July 2012
NLP - Natural Language Programming/Processing in KDE
- Jovie/KTTS - KDE Text-To-Speech
- Sonnet (Spell checking, etc.)
- Simon Listens - Speech Recognition
- KDEedu (Parley, KWordQuiz, KHangman, etc.)
- KMail ("attachment" recognition)
Theory
NLP involves the following tasks:
- Look-up - look a word up in a dictionary (to find an antonym, synonym or definition)
- Machine translation - translate a text or word from one language to another
- Parsing - extract specific information from a word or text
- Part-of-speech tagging - find the corresponding part-of-speech (POS) tag for a word (there are different POS tag sets)
- Segmentation/Tokenization - split a text into sentences, or a sentence into words
- Spell checking - check that a word or text is spelled correctly
- Stemming - extract the stem of a word
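Some of these tasks can be sketched with the Python standard library alone. The block below is only an illustration under stated assumptions, not how any of the tools listed on this page work: the function names, the suffix list and the sample word lists are made up for this example, and the stemmer is a naive suffix stripper rather than a Snowball algorithm.

```python
import difflib
import re

# Hypothetical suffix list for the toy stemmer (checked in this order).
SUFFIXES = ("ing", "ed", "es", "s")


def split_sentences(text):
    """Segmentation: split after '.', '!' or '?' when whitespace follows."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Tokenization: return the runs of word characters, lower-cased."""
    return re.findall(r"\w+", sentence.lower())


def naive_stem(word):
    """Stemming: strip the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def suggest(word, dictionary, limit=3):
    """Spell checking: return the word if known, else similar dictionary entries."""
    if word in dictionary:
        return [word]
    return difflib.get_close_matches(word, dictionary, n=limit)
```

For instance, `tokenize("It checks spelling!")` returns `["it", "checks", "spelling"]`, and `naive_stem` maps "checks" to "check" and "spelling" to "spell". A real stemmer or spell checker handles far more cases (irregular forms, affix rules, language-specific alphabets) than this sketch does.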
Free linguistic software tools and frameworks
Tool | Supported Languages | Type | Version | Programming language | License | Notes |
---|---|---|---|---|---|---|
Apertium | many | machine translation platform | 3.1 | | GPL | |
Aspell | many ;-) | spell checker | 0.61 | | LGPL | Successor of Ispell |
Enchant | many | spell checker | 1.6.0 | | | Spell checker for AbiWord |
FreeLing | Spanish, Catalan, Galician, Italian, English, Welsh, Portuguese, and Asturian | suite of language analyzers | 2.2 | | GPL | |
BabelNet | English, Catalan, French, German, Italian and Spanish | very large multilingual semantic network | 1.0 | Java | Creative Commons Attribution-Noncommercial-Share Alike 3.0 | |
DGT-TM | Bulgarian, Czech, Danish, Dutch, English, Estonian, German, Greek, Finnish, French, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish and Swedish | freely available large-scale translation memory in 22 languages | | Java | EUPL | |
frog | Dutch | tagger and parser | 0.1 | C++, Python | GPLv3 | |
hspell | Hebrew | spell checker (and morphological analyzer) | | | GPL | |
hunmorph | | morphological analyzer | | | | More NLP tools are listed on the project page |
hunpos | | tagger | 1.0 | OCaml | BSD | |
HunSpell | many | spell checker and morphological analyzer | 1.2.12 | C, C++ | LGPL & MPL | Spell checker of OOo |
Ispell | large number of European languages | spell checker | 3.3.02 | | unknown | Probably deprecated |
LanguageTool | many | style and grammar proofreading software | 1.8 | Java | LGPL | |
liblingua-tagger | English | tagger | 0.16 | Perl | unknown | Liblingua on Perl's CPAN provides more taggers and stemmers |
LinkGrammar | English | syntactic parser | 4.1b | | GPL | |
Link Grammar Parser | English and more | syntactic parser | 4.7.6 | Probably C | GPL | |
Malaga | German, Italian, Spanish, Finnish (not all free!) | grammar development environment | 7.12 | | GPL | |
mbt | | memory-based tagger-generator and tagger | 3.2.2 | C++ | GPLv3 | |
Morphisto | German | morphological analyzer | | | LGPL & CC | |
MySpell | many | spell checker | | | | Former spell checker of OOo, now deprecated |
nltk | | Natural Language ToolKit | | Python | | |
OpenNLP | unknown | NLP software collection at Apache | | Java | Apache License | |
Stanford Log-linear Part-Of-Speech Tagger | | tagger | | Java | GPL | |
TiMBL | | Tilburg Memory Based Learner | 1.0.0 | C++ | GPLv3 | |
Snowball | many | stemmer library | | C, Java, Python | BSD | |
SVMTool | English, Catalan, Spanish | open-source generator of sequential taggers | 1.3.2 | Perl | LGPL | |
TreeTagger | many | PoS tagger & lemmatizer | | | Tagger License | |
Free text-to-speech tools
Tool | Supported Languages | Type | Version | Programming language | License | Notes |
---|---|---|---|---|---|---|
Festival | | | | | | |
MBrola | | | | | | |
Additional tools
- Foma - a finite-state machine toolkit and library
- SFST - Stuttgart Finite State Transducer - a toolbox for the implementation of morphological analysers and other tools which are based on finite state transducer technology
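To give a rough idea of what "finite state transducer technology" means for morphological analysis, here is a minimal sketch of a deterministic transducer that maps surface forms such as "cats" to an analysis such as "cat+N+PL". It does not use Foma or SFST and does not follow their formalisms: the `Transducer` class, the `build_noun_analyzer` helper and the `+N+SG`/`+N+PL` tags are all hypothetical names chosen for this example, and the sketch assumes that no stem equals another stem plus "s".

```python
class Transducer:
    """A tiny deterministic finite-state transducer (illustrative only)."""

    def __init__(self, start):
        self.start = start
        self.arcs = {}    # (state, input symbol) -> (next state, output string)
        self.finals = {}  # accepting state -> output appended at the end

    def add_arc(self, src, symbol, dst, output):
        self.arcs[(src, symbol)] = (dst, output)

    def transduce(self, word):
        """Return the analysis of word, or None if the word is not accepted."""
        state, pieces = self.start, []
        for ch in word:
            if (state, ch) not in self.arcs:
                return None
            state, out = self.arcs[(state, ch)]
            pieces.append(out)
        if state not in self.finals:
            return None
        return "".join(pieces) + self.finals[state]


def build_noun_analyzer(stems):
    """Build a transducer accepting each stem and its regular plural in -s."""
    fst = Transducer(start="")
    for stem in stems:
        state = ""
        for ch in stem:
            # States are named after the prefix read so far (a trie).
            fst.add_arc(state, ch, state + ch, ch)
            state += ch
        fst.finals[state] = "+N+SG"
        plural_state = stem + "+PL"
        # The plural "s" is consumed but produces no surface output.
        fst.add_arc(state, "s", plural_state, "")
        fst.finals[plural_state] = "+N+PL"
    return fst
```

With `fst = build_noun_analyzer(["cat", "dog"])`, `fst.transduce("cats")` gives `"cat+N+PL"` and `fst.transduce("ca")` gives `None`. Real toolkits compile such machines from rule and lexicon descriptions and support non-determinism, composition and inversion, none of which this sketch attempts.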
Semantics and Co
- LexInfo builds on the lemon model to represent lexical information attached to ontologies on the semantic web
- GOLD is an ontology for descriptive linguistics
Further stuff
- LIMA