User:Unormal
NLP - Natural Language Programming/Processing in KDE
- Jovie/KTTS - KDE Text-To-Speech
- Sonnet (Spell checking, etc.)
- Simon Listens - Speech Recognition
- KDEedu (Parley, KWordQuiz, KHangman, etc.)
- KMail ("attachment" recognition)
Theory
NLP involves the following tasks:
- Look up - look a word up in a dictionary (to find an antonym, a synonym or a definition)
- Machine translation - translate a text or word from one language to another
- Parsing - analyze the structure of a word or text to extract specific information
- Part-of-speech tagging - find the corresponding part-of-speech (POS) tag for a word (several POS tag sets exist)
- Segmentation/Tokenization - split a text into sentences or a sentence into words
- Spell checking - check that a word or text is spelled correctly
- Stemming - extract the stem of a word
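Three of the tasks above (segmentation/tokenization, stemming and spell checking) can be sketched with the Python standard library alone. This is a minimal illustration, not how any of the tools listed on this page work: the word list `WORDS`, the suffix list `SUFFIXES` and all function names are made up for the example, and the rules are far too naive for real text.

```python
import difflib
import re

# Hypothetical toy word list; a real spell checker would load a full dictionary.
WORDS = ["check", "language", "parse", "speech", "spell", "tag", "text", "word"]

# Hypothetical suffix list, longest first; real stemmers use far richer rules.
SUFFIXES = ("ing", "ed", "es", "s")


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Return the lowercase alphabetic words of a sentence."""
    return re.findall(r"[a-z]+", sentence.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def suggest(word, n=3):
    """Return the word itself if known, else up to n similar known words."""
    if word in WORDS:
        return [word]
    return difflib.get_close_matches(word, WORDS, n=n)
```

A typical pipeline would call `split_sentences` first, then `tokenize` on each sentence, and finally `stem` or `suggest` on each token. The regular expressions only handle ASCII letters, so most of the languages covered by the tools below would need different rules.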
Free linguistic software tools and frameworks
Tool | Supported Languages | Type | Version | Programming language | License | Notes |
---|---|---|---|---|---|---|
Apertium | many | machine translation platform | 3.1 | | GPL | |
Aspell | many ;-) | spell checker | 0.61 | | LGPL | successor of Ispell |
Enchant | many | spell checker | 1.6.0 | | | Spell checker for Abiword |
FreeLing | Spanish, Catalan, Galician, Italian, English, Welsh, Portuguese, and Asturian | suite of language analyzers | 2.2 | | GPL | |
BabelNet | English, Catalan, French, German, Italian and Spanish | A very large multilingual semantic network | 1.0 | Java | Creative Commons Attribution-Noncommercial-Share Alike 3.0 | |
DGT-TM | Bulgarian, Czech, Danish, Dutch, English, Estonian, German, Greek, Finnish, French, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish and Swedish | A freely available large-scale translation memory in 22 languages | | Java | EUPL | |
frog | Dutch | tagger and parser | 0.1 | C++, Python | GPLv3 | |
hspell | Hebrew | spell checker (and morphological analyzer) | | | GPL | |
hunmorph | | morphological analyzer | | | | More NLP tools at this page |
hunpos | | tagger | 1.0 | OCaml | BSD | |
HunSpell | many | spell checker and morphological analyzer | 1.2.12 | C, C++ | LGPL & MPL | Spell checker of OOo |
Ispell | large number of European languages | spell checker | 3.3.02 | | unknown | Probably deprecated |
LanguageTool | many | style and grammar proofreading software | 1.8 | Java | LGPL | |
liblingua-tagger | English | tagger | 0.16 | Perl | unknown | Liblingua in Perl's CPAN provides more taggers and stemmers |
LinkGrammar | English | syntactic parser | 4.1b | | GPL | |
Link Grammar Parser | English and more | syntactic parser | 4.7.6 | Probably C | GPL | |
Malaga | German, Italian, Spanish, Finnish (not all free!) | grammar development environment | 7.12 | | GPL | |
mbt | | Memory-based tagger-generator and tagger | 3.2.2 | C++ | GPLv3 | |
Morphisto | German | morphological analyzer | | | LGPL & CC | |
MySpell | many | spell checker | | | | Former spell checker of OOo, now deprecated |
nltk | | Natural Language ToolKit | | Python | | |
OpenNLP | unknown | NLP software collection at Apache | | Java | Apache License | |
Stanford Log-linear Part-Of-Speech Tagger | | tagger | | Java | GPL | |
TiMBL | | Tilburg Memory Based Learner | 1.0.0 | C++ | GPLv3 | |
Snowball | many | stemmer library | | C, Java, Python | BSD | |
SVM | English, Catalan, Spanish | An Open Source generator of sequential taggers | 1.3.2 | Perl | LGPL | |
TreeTagger | many | POS tagger & lemmatizer | | | Tagger License | |
Free text to speech tools
Tool | Supported Languages | Type | Version | Programming language | License | Notes |
---|---|---|---|---|---|---|
Festival | | | | | | |
MBrola | | | | | | |
Additional tools
- Foma - a finite-state machine toolkit and library
- SFST - Stuttgart Finite State Transducer - a toolbox for the implementation of morphological analysers and other tools which are based on finite state transducer technology
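To give an idea of what a finite-state transducer does in a morphological analyser, here is a minimal sketch in plain Python. It does not use the Foma or SFST formats or APIs; the function names, the tags `+N`, `+Sg` and `+Pl`, and the tiny noun lexicon are all assumptions made for the illustration. The transducer reads a surface form character by character and emits the stem followed by an analysis tag.

```python
def build_transducer(stems):
    """Build a deterministic transducer for noun stems with an optional plural "s".

    Returns (transitions, finals), where transitions maps
    (state, input_char) -> (output_string, next_state) and finals maps
    each accepting state to the output emitted at the end of the input.
    Assumption: no stem equals another stem followed by "s...".
    """
    transitions = {}
    finals = {}
    next_id = 1  # state 0 is the start state
    for stem in stems:
        state = 0
        for ch in stem:
            key = (state, ch)
            if key not in transitions:
                # Copy the stem character to the output unchanged.
                transitions[key] = (ch, next_id)
                next_id += 1
            state = transitions[key][1]
        # Input ends at the stem: singular noun.
        finals[state] = "+N+Sg"
        # Input continues with "s": plural noun.
        transitions[(state, "s")] = ("+N+Pl", next_id)
        finals[next_id] = ""
        next_id += 1
    return transitions, finals


def analyze(word, transitions, finals):
    """Run the transducer on word; return the analysis or None if rejected."""
    state, output = 0, []
    for ch in word:
        step = transitions.get((state, ch))
        if step is None:
            return None
        out, state = step
        output.append(out)
    if state not in finals:
        return None
    return "".join(output) + finals[state]
```

With a hypothetical lexicon built by `build_transducer(["cat", "car", "dog"])`, calling `analyze` on a known singular or plural form yields the stem plus its tags, and any other input is rejected with `None`. Real toolkits compile such lexicons and rewrite rules into much larger, minimized transducers that can also run in the generation direction.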
Semantics and Co
- LexInfo builds on the lemon model to represent lexical information attached to ontologies on the semantic web
- GOLD is an ontology for descriptive linguistics
Further stuff
- LIMA