User:Unormal: Difference between revisions
= NLP - Natural Language Programming/Processing in KDE =
* [http://accessibility.kde.org/ Jovie/KTTS - KDE Text-To-Speech]
* [http://techbase.kde.org/Development/Architecture/KDE4/Sonnet Sonnet (Spell checking, etc.)]
* [http://www.simon-listens.org Simon Listens - Speech Recognition]
* [http://edu.kde.org KDEedu (Parley, KWordQuiz, KHangman, etc.)]
* [http://www.kontact.org KMail ("attachment" recognition)]
== Theory ==
NLP involves the following tasks:
* Look up - look a word up in a dictionary (to find an antonym, synonym or definition)
* Machine translation - translate a text or word from one language to another
* Parsing - analyze the structure of a word or text to extract specific information
* Part-of-speech tagging - find the corresponding part-of-speech (POS) tag for each word (several POS tag sets exist)
* Segmentation/Tokenization - split a text into sentences or a sentence into words
* Spell checking - check that a word or text is spelled correctly
* Stemming - extract the stem of a word
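A few of the tasks listed above can be sketched in plain Python. This is only a toy illustration and uses none of the tools listed on this page: the function names, the suffix list and the word set are assumptions made up for the example, and real stemmers and spell checkers use far richer, language-specific rules.

```python
import re

# Toy suffix list, assumed for illustration only; real stemmers use
# carefully ordered, language-specific rules.
SUFFIXES = ("ing", "ed", "es", "s")


def split_sentences(text):
    """Segmentation: split a text at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Tokenization: split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def stem(word):
    """Stemming: strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def misspelled(tokens, lexicon):
    """Spell checking: return the tokens that are missing from a set of known words."""
    return [t for t in tokens if t not in lexicon]
```

For example, <code>tokenize("The cats walked.")</code> gives <code>['the', 'cats', 'walked']</code>, and stemming those tokens gives <code>the</code>, <code>cat</code> and <code>walk</code>.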
== Free linguistic software tools and frameworks ==
{| class="nlptoolstable" border="1" cellpadding="5" cellspacing="0" style="border: gray solid 1px; border-collapse: collapse; text-align: left; width: 100%;"
|- style="background: #ececec; white-space:nowrap;"
!Tool
!Supported Languages
!Type
!Version
!Programming language
!License
!Notes
|-
|[http://www.apertium.org/ Apertium]
|many
|machine translation platform
|3.1
|
|GPL
|
|-
|[http://aspell.net/ Aspell]
|many ;-)
|spell checker
|0.61
|
|LGPL
|successor of Ispell
|-
|[http://abisource.com/projects/enchant/ Enchant]
|many
|spell checker
|1.6.0
|
|
|Spell checker for AbiWord
|-
|[http://nlp.lsi.upc.edu/freeling/ FreeLing]
|Spanish, Catalan, Galician, Italian, English, Welsh, Portuguese, and Asturian
|suite of language analyzers
|2.2
|
|GPL
|
|-
|[http://lcl.uniroma1.it/babelnet/ BabelNet]
|English, Catalan, French, German, Italian and Spanish
|A very large multilingual semantic network
|1.0
|Java
|Creative Commons Attribution-Noncommercial-Share Alike 3.0
|
|-
|[http://langtech.jrc.ec.europa.eu/DGT-TM.html DGT-TM]
|Bulgarian, Czech, Danish, Dutch, English, Estonian, German, Greek, Finnish, French, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish and Swedish
|A freely available large-scale translation memory in 22 languages
|
|Java
|EUPL
|
|-
|[http://ilk.uvt.nl/tadpole frog]
|Dutch
|tagger and parser
|0.1
|C++, Python
|GPLv3
|
|-
|[http://hspell.ivrix.org.il/ hspell]
|Hebrew
|spell checker (and morphological analyzer)
|
|
|GPL
|
|-
|[http://mokk.bme.hu/resources/hunmorph hunmorph]
|
|morphological analyzer
|
|
|
|More NLP tools are listed on this page
|-
|[http://code.google.com/p/hunpos/ hunpos]
|
|tagger
|1.0
|OCaml
|BSD
|
|-
|[http://hunspell.sourceforge.net HunSpell]
|many
|spell checker and morphological analyzer
|1.2.12
|C, C++
|LGPL & MPL
|Spell checker of OpenOffice.org (OOo)
|-
|[http://lasr.cs.ucla.edu/geoff/ispell.html Ispell]
|large number of European languages
|spell checker
|3.3.02
|
|unknown
|Probably deprecated
|-
|[http://www.languagetool.org/ LanguageTool]
|many
|style and grammar proofreading software
|1.8
|Java
|LGPL
|
|-
|[http://search.cpan.org/dist/Lingua-EN-Tagger/ liblingua-tagger]
|English
|tagger
|0.16
|Perl
|unknown
|The Lingua modules on Perl's CPAN provide more taggers and stemmers
|-
|[http://www.link.cs.cmu.edu/link/ LinkGrammar]
|English
|syntactic parser
|4.1b
|
|GPL
|
|-
|[http://www.abisource.com/projects/link-grammar/ Link Grammar Parser]
|English and more
|syntactic parser
|4.7.6
|Probably C
|GPL
|
|-
|[http://home.arcor.de/bjoern-beutel/malaga/ Malaga]
|German, Italian, Spanish, Finnish (not all free!)
|grammar development environment
|7.12
|
|GPL
|
|-
|[http://ilk.uvt.nl/mbt mbt]
|
|Memory-based tagger-generator and tagger
|3.2.2
|C++
|GPLv3
|
|-
|[http://code.google.com/p/morphisto/ Morphisto]
|German
|morphological analyzer
|
|
|LGPL & CC
|
|-
|MySpell
|many
|spell checker
|
|
|
|Former spell checker of OpenOffice.org (OOo), now deprecated
|-
|[http://www.nltk.org/ nltk]
|
|Natural Language ToolKit
|
|Python
|
|
|-
|[http://incubator.apache.org/opennlp/ OpenNLP]
|unknown
|NLP software collection at Apache
|
|Java
|Apache License
|
|-
|[http://nlp.stanford.edu/software/tagger.shtml Stanford Log-linear Part-Of-Speech Tagger]
|
|tagger
|
|Java
|GPL
|
|-
|[http://ilk.uvt.nl/timbl/ TiMBL]
|
|Tilburg Memory Based Learner
|1.0.0
|C++
|GPLv3
|
|-
|[http://snowball.tartarus.org/index.php Snowball]
|many
|stemmer library
|
|C, Java, Python
|BSD
|
|-
|[http://www.lsi.upc.edu/~nlp/SVMTool/ SVMTool]
|English, Catalan, Spanish
|An open-source generator of sequential taggers
|1.3.2
|Perl
|LGPL
|
|-
|[http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html TreeTagger]
|many
|POS tagger & lemmatizer
|
|
|Tagger License
|
|}
[http://www-nlp.stanford.edu/links/statnlp.html Stanford list of NLP tools]
== Free text-to-speech tools ==
{| class="nlptoolstable" border="1" cellpadding="5" cellspacing="0" style="border: gray solid 1px; border-collapse: collapse; text-align: left; width: 100%;"
|- style="background: #ececec; white-space:nowrap;"
!Tool
!Supported Languages
!Type
!Version
!Programming language
!License
!Notes
|-
|[http://www.cstr.ed.ac.uk/projects/festival/ Festival]
|
|
|
|
|
|
|-
|[http://tcts.fpms.ac.be/synthesis/ MBrola]
|
|
|
|
|
|
|}
=== Additional tools ===
* [http://foma.sourceforge.net/ Foma] - a finite-state machine toolkit and library
* [http://www.ims.uni-stuttgart.de/projekte/gramotron/SOFTWARE/SFST.html SFST - Stuttgart Finite State Transducer] - a toolbox for the implementation of morphological analysers and other tools which are based on finite state transducer technology
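To give an idea of what a finite-state transducer does in a morphological analyser, here is a minimal hand-built sketch in Python. It does not use Foma or SFST and does not reflect their APIs or file formats; the function names and the <code>+N+Sg</code>/<code>+N+Pl</code> tag notation are assumptions chosen for the example. The machine reads a surface form letter by letter and emits the stem plus a tag.

```python
def build_transducer(stems):
    """Build a deterministic transducer for noun stems with an optional plural -s.

    transitions[state][symbol] = (output, next_state); state 0 is the start.
    """
    transitions = {0: {}}
    stem_final = set()
    for stem in stems:
        state = 0
        for ch in stem:
            arcs = transitions[state]
            if ch not in arcs:
                new_state = len(transitions)
                transitions[new_state] = {}
                arcs[ch] = (ch, new_state)  # copy the letter to the output
            state = arcs[ch][1]
        stem_final.add(state)

    # One shared state that is reached by reading the plural suffix.
    plural_state = len(transitions)
    transitions[plural_state] = {}
    for state in stem_final:
        # A deterministic machine cannot express ambiguity, so an existing
        # "s" arc (e.g. for the stems "bu" and "bus") is left untouched.
        transitions[state].setdefault("s", ("+N+Pl", plural_state))

    # Output emitted when the input ends in an accepting state.
    final_output = {state: "+N+Sg" for state in stem_final}
    final_output[plural_state] = ""
    return transitions, final_output


def analyze(word, transitions, final_output):
    """Run the transducer on a surface form; return the analysis or None."""
    state, output = 0, []
    for ch in word:
        arc = transitions[state].get(ch)
        if arc is None:
            return None  # no arc for this letter: the word is rejected
        out, state = arc
        output.append(out)
    if state not in final_output:
        return None  # input ended in a non-accepting state
    return "".join(output) + final_output[state]
```

With the stems <code>cat</code> and <code>dog</code>, <code>analyze("cats", ...)</code> returns <code>cat+N+Pl</code>, <code>analyze("dog", ...)</code> returns <code>dog+N+Sg</code>, and unknown words return <code>None</code>.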
== Semantics and Co ==
* [http://www.lexinfo.net/ LexInfo builds on the lemon model to represent lexical information attached to ontologies on the semantic web]
* [http://linguistics-ontology.org/ GOLD is an ontology for descriptive linguistics]
=== Further stuff ===
* LIMA
Latest revision as of 21:31, 12 July 2012