
User:Unormal: Difference between revisions

From KDE Community Wiki
Unormal (talk | contribs)
Some more tools
Unormal (talk | contribs)
 
(17 intermediate revisions by the same user not shown)
Line 6: Line 6:
* [http://edu.kde.org KDEedu (Parley, KWordQuiz, KHangman, etc.)]
* [http://edu.kde.org KDEedu (Parley, KWordQuiz, KHangman, etc.)]
* [http://www.kontact.org KMail ("attachment" recognition)]
* [http://www.kontact.org KMail ("attachment" recognition)]
== Theory ==
NLP involves the following tasks:
* Look up - look up a word in a dictionary (to find an antonym, synonym or definition)
* Machine translation - translate a text or word from one language to another
* Parsing - extract specific information from a word or text
* Part-of-speech tagging - find the corresponding part-of-speech (POS) tag for a word (there are different POS tag sets)
* Segmentation/Tokenization - split a text into sentences or a sentence into words
* Spell checking - check that a word or text is written correctly
* Stemming - extract the stem of a word


== Free Linguistic software tools and framework ==
== Free Linguistic software tools and framework ==
Line 18: Line 31:
!License
!License
!Notes
!Notes
|-
|[http://www.apertium.org/ Apertium]
|many
|machine translation platform
|3.1
|
|GPL
|
|-
|[http://aspell.net/ Aspell]
|many ;-)
|spell checker
|0.61
|
|LGPL
|successor of Ispell
|-
|[http://abisource.com/projects/enchant/ Enchant]
|many
|Spell checker
|1.6.0
|
|
|Spell checker for Abiword
|-
|-
|[http://nlp.lsi.upc.edu/freeling/ FreeLing]  
|[http://nlp.lsi.upc.edu/freeling/ FreeLing]  
Line 26: Line 63:
|GPL
|GPL
|
|
|-
|[http://lcl.uniroma1.it/babelnet/ BabelNet]
|English, Catalan, French, German, Italian and Spanish
|A very large multilingual semantic network
|1.0
|Java
|Creative Commons Attribution-Noncommercial-Share Alike 3.0
|
|-
|[http://langtech.jrc.ec.europa.eu/DGT-TM.html DGT-TM]
|Bulgarian, Czech, Danish, Dutch, English, Estonian, German, Greek, Finnish, French, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish and Swedish.
|A freely available large-scale translation memory in 22 languages
|
|Java
|EUPL
|
|-
|[http://ilk.uvt.nl/tadpole frog]
|Dutch
|tagger and parser
|0.1
|C++, Python
|GPLv3
|
|-
|[http://hspell.ivrix.org.il/ hspell]
|Hebrew
|spell checker (and morphological analyzer)
|
|
|GPL
|
|-
|[http://mokk.bme.hu/resources/hunmorph hunmorph]
|
|morphological analyzer
|
|
|
|More nlp tools at this page
|-
|[http://code.google.com/p/hunpos/ hunpos]
|
|tagger
|1.0
|OCaml
|BSD
|
|-
|[http://hunspell.sourceforge.net HunSpell]
|many
|spell checker and morphological analyzer
|1.2.12
|C, C++
|LGPL & MPL
|Spell checker of OOo
|-
|[http://lasr.cs.ucla.edu/geoff/ispell.html Ispell]
|large number of European languages
|spell checker
|3.3.02
|
|unknown
|Probably deprecated
|-
|[http://www.languagetool.org/ LanguageTool]
|many
|style and grammar proofreading software
|1.8
|Java
|LGPL
|
|-
|[http://search.cpan.org/dist/Lingua-EN-Tagger/ liblingua-tagger]
|English
|tagger
|0.16
|Perl
|Unknown
|Liblingua in Perl's CPAN module provides more tagger and stemmers
|-
|-
|[http://www.link.cs.cmu.edu/link/ LinkGrammar]
|[http://www.link.cs.cmu.edu/link/ LinkGrammar]
Line 35: Line 152:
|
|
|-
|-
|[http://code.google.com/p/morphisto/ Morphisto]
|[http://www.abisource.com/projects/link-grammar/ Link Grammar Parser]
|German
|English and more
|morphological analyzer
|syntactic parser
|
|4.7.6
|
|Probably C
|LGPL & CC
|GPL
|
|
|-
|-
Line 51: Line 168:
|
|
|-
|-
|[http://lasr.cs.ucla.edu/geoff/ispell.html Ispell]
|[http://ilk.uvt.nl/mbt mbt]
|large number of European languages
|
|spell checker
|Memory-based tagger-generator and tagger
|3.3.02
|3.2.2
|C++
|GPLv3
|
|
|unknown
|Probably deprecated
|-
|-
|[http://aspell.net/ Aspell]
|[http://code.google.com/p/morphisto/ Morphisto]
|many ;-)
|German
|spell checker
|morphological analyzer
|0.61
|
|
|LGPL & CC
|
|
|LGPL
|successor of Ispell
|-
|-
|MySpell
|MySpell
Line 75: Line 192:
|Former spell checker of OOo, now deprecated
|Former spell checker of OOo, now deprecated
|-
|-
|[http://hunspell.sourceforge.net HunSpell]
|[http://www.nltk.org/ nltk]
|
|Natural Language ToolKit
|
|Python
|
|
|-
|[http://incubator.apache.org/opennlp/ OpenNLP]
|unknown
|NLP software collection at Apache
|
|Java
|Apache License
|
|-
|[http://nlp.stanford.edu/software/tagger.shtml Stanford Log-linear Part-Of-Speech Tagger]
|
|tagger
|
|Java
|GPL
|
|-
|[http://ilk.uvt.nl/timbl/ TiMBL]
|
|Tilburg Memory Based Learner
|1.0.0
|C++
|GPLv3
|
|-
|[http://snowball.tartarus.org/index.php Snowball]
|many
|many
|spell checker and morphological analyzer
|stemmer library
|1.2.12
|
|C, C++
|C, Java, Python
|LGPL & MPL
|BSD
|Spell checker of OOo
|
|-
|[http://www.lsi.upc.edu/~nlp/SVMTool/ SVM]
|English, Catalan, Spanish
|An Open Source generator of sequential taggers
|1.3.2
|Perl
|LGPL
|
|-
|-
|[http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html TreeTagger]
|[http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html TreeTagger]
Line 91: Line 248:
|
|
|-
|-
|[http://abisource.com/projects/enchant/ Enchant]
|}
|many
 
|Spell checker
[http://www-nlp.stanford.edu/links/statnlp.html Stanford list of NLP tools]
|1.6.0
 
== Free text to speech tools ==
{| class="nlptoolstable" border="1" cellpadding="5" cellspacing="0" style="border: gray solid 1px; border-collapse: collapse; text-align: left; width: 100%;"
|- style="background: #ececec; white-space:nowrap;"
!Tool
!Supported Languages
!Type
!Version
!Programming language
!License
!Notes
|-
|[http://www.cstr.ed.ac.uk/projects/festival/ Festival]
|
|
|
|
|
|
|
|
|Spell checker for Abiword
|-
|-
|[http://ilk.uvt.nl/timbl/ TiMBL]
|[http://tcts.fpms.ac.be/synthesis/ MBrola]
|
|
|
|
|
|
|Tilburg Memory Based Learner
|1.0.0
|C++
|GPLv3
|
|
|-
|-
|[http://ilk.uvt.nl/mbt mbt]
|
|
|Memory-based tagger-generator and tagger
|3.2.2
|C++
|GPLv3
|
|
|-
|
|[http://ilk.uvt.nl/tadpole frog]
|
|Dutch
|
|tagger and parser
|
|0.1
|C++, Python
|GPLv3
|
|
|-
|-
Line 127: Line 291:
=== Additional tools ===
=== Additional tools ===


* Foma
* [http://foma.sourceforge.net/ Foma] - a finite-state machine toolkit and library
* SFST - Stuttgart Finite State Transducer
* [http://www.ims.uni-stuttgart.de/projekte/gramotron/SOFTWARE/SFST.html SFST - Stuttgart Finite State Transducer] - a toolbox for the implementation of morphological analysers and other tools which are based on finite state transducer technology
 
== Semantics and Co ==
 
* [http://www.lexinfo.net/ LexInfo builds on the lemon model to represent lexical information attached to ontologies on the semantic web]
* [http://linguistics-ontology.org/ GOLD is an ontology for descriptive linguistics]


=== Further stuff ===
=== Further stuff ===
* LIMA
* LIMA

Latest revision as of 21:31, 12 July 2012

NLP - Natural Language Programming/Processing in KDE

Theory

NLP involves the following tasks:

  • Look up - look up a word in a dictionary (to find an antonym, synonym or definition)
  • Machine translation - translate a text or word from one language to another
  • Parsing - extract specific information from a word or text
  • Part-of-speech tagging - find the corresponding part-of-speech (POS) tag for a word (there are different POS tag sets)
  • Segmentation/Tokenization - split a text into sentences or a sentence into words
  • Spell checking - check that a word or text is written correctly
  • Stemming - extract the stem of a word
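
Two of the tasks above, segmentation/tokenization and stemming, can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the function names and the suffix list are hypothetical, and the stemmer is a naive suffix stripper, not the algorithm used by Snowball or any other tool listed on this page.

```python
import re

# Hypothetical suffix list for illustration only; real stemmers apply
# ordered, language-specific rules rather than a flat list.
SUFFIXES = ("ing", "ed", "es", "s")


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


if __name__ == "__main__":
    for sentence in split_sentences("KDE is free. Parley helps learning!"):
        tokens = tokenize(sentence)
        print(tokens, [naive_stem(t) for t in tokens])
```

Abbreviations, compound words and irregular forms all break this sketch, which is why dedicated tools exist for each language.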


Free Linguistic software tools and framework

{|
!Tool!!Supported Languages!!Type!!Version!!Programming language!!License!!Notes
|-
|Apertium||many||machine translation platform||3.1|| ||GPL||
|-
|Aspell||many ;-)||spell checker||0.61|| ||LGPL||successor of Ispell
|-
|Enchant||many||Spell checker||1.6.0|| || ||Spell checker for Abiword
|-
|FreeLing||Spanish, Catalan, Galician, Italian, English, Welsh, Portuguese, and Asturian||suite of language analyzers||2.2|| ||GPL||
|-
|BabelNet||English, Catalan, French, German, Italian and Spanish||A very large multilingual semantic network||1.0||Java||Creative Commons Attribution-Noncommercial-Share Alike 3.0||
|-
|DGT-TM||Bulgarian, Czech, Danish, Dutch, English, Estonian, German, Greek, Finnish, French, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish and Swedish||A freely available large-scale translation memory in 22 languages|| ||Java||EUPL||
|-
|frog||Dutch||tagger and parser||0.1||C++, Python||GPLv3||
|-
|hspell||Hebrew||spell checker (and morphological analyzer)|| || ||GPL||
|-
|hunmorph|| ||morphological analyzer|| || || ||More nlp tools at this page
|-
|hunpos|| ||tagger||1.0||OCaml||BSD||
|-
|HunSpell||many||spell checker and morphological analyzer||1.2.12||C, C++||LGPL & MPL||Spell checker of OOo
|-
|Ispell||large number of European languages||spell checker||3.3.02|| ||unknown||Probably deprecated
|-
|LanguageTool||many||style and grammar proofreading software||1.8||Java||LGPL||
|-
|liblingua-tagger||English||tagger||0.16||Perl||Unknown||Liblingua in Perl's CPAN module provides more tagger and stemmers
|-
|LinkGrammar||English||syntactic parser||4.1b|| ||GPL||
|-
|Link Grammar Parser||English and more||syntactic parser||4.7.6||Probably C||GPL||
|-
|Malaga||German, Italian, Spanish, Suomi (not all free!)||grammar development environment||7.12|| ||GPL||
|-
|mbt|| ||Memory-based tagger-generator and tagger||3.2.2||C++||GPLv3||
|-
|Morphisto||German||morphological analyzer|| || ||LGPL & CC||
|-
|MySpell||many||spell checker|| || || ||Former spell checker of OOo, now deprecated
|-
|nltk|| ||Natural Language ToolKit|| ||Python|| ||
|-
|OpenNLP||unknown||NLP software collection at Apache|| ||Java||Apache License||
|-
|Stanford Log-linear Part-Of-Speech Tagger|| ||tagger|| ||Java||GPL||
|-
|TiMBL|| ||Tilburg Memory Based Learner||1.0.0||C++||GPLv3||
|-
|Snowball||many||stemmer library|| ||C, Java, Python||BSD||
|-
|SVM||English, Catalan, Spanish||An Open Source generator of sequential taggers||1.3.2||Perl||LGPL||
|-
|TreeTagger||many||PoSTagger & lemmatizer|| || ||Tagger License||
|}

Stanford list of NLP tools
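
Several of the tools listed above are spell checkers. As a rough illustration of the idea behind suggesting corrections, the sketch below ranks dictionary words by Levenshtein edit distance to a misspelled word. It is a simplified assumption of how such a lookup could work, not the algorithm of Aspell, HunSpell or any other listed tool; the function names and the tiny word list are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def suggest(word, dictionary, max_distance=2):
    """Return dictionary words within max_distance of word, closest first."""
    scored = [(edit_distance(word, entry), entry) for entry in dictionary]
    return [entry for dist, entry in sorted(scored) if dist <= max_distance]


if __name__ == "__main__":
    # Hypothetical mini dictionary; a real checker loads a full word list.
    print(suggest("directionary", ["dictionary", "directory", "diary"]))
```

Comparing against every dictionary entry is slow for large word lists, so this only shows the principle.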

Free text to speech tools

{|
!Tool!!Supported Languages!!Type!!Version!!Programming language!!License!!Notes
|-
|Festival|| || || || || ||
|-
|MBrola|| || || || || ||
|}

Additional tools

  • Foma - a finite-state machine toolkit and library
  • SFST - Stuttgart Finite State Transducer - a toolbox for the implementation of morphological analysers and other tools which are based on finite state transducer technology
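
To give a feel for what a finite-state transducer does in morphological analysis, here is a minimal sketch of a deterministic transducer that maps a surface form to an analysis string. It does not use the Foma or SFST APIs; the transition table, the function name and the tags "+N+SG" and "+N+PL" are hypothetical and chosen only for illustration.

```python
def run_transducer(transitions, final_outputs, word, start=0):
    """Run a deterministic transducer over word.

    transitions maps (state, input symbol) to (next state, output string).
    final_outputs maps each accepting state to the output appended at the end.
    Returns the analysis string, or None if the word is not accepted.
    """
    state = start
    output = []
    for symbol in word:
        step = transitions.get((state, symbol))
        if step is None:
            return None  # no transition for this symbol: reject
        state, emitted = step
        output.append(emitted)
    if state not in final_outputs:
        return None  # input ended in a non-accepting state
    return "".join(output) + final_outputs[state]


# Hypothetical toy lexicon: accepts "cat" and "cats" only.
TRANSITIONS = {
    (0, "c"): (1, "c"),
    (1, "a"): (2, "a"),
    (2, "t"): (3, "t"),
    (3, "s"): (4, "+N+PL"),
}
FINAL_OUTPUTS = {3: "+N+SG", 4: ""}

if __name__ == "__main__":
    for form in ("cat", "cats", "ca"):
        print(form, "->", run_transducer(TRANSITIONS, FINAL_OUTPUTS, form))
```

Toolkits of this kind compile such tables from lexicon and rule descriptions instead of having them written by hand.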

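Semantics and Co

  • LexInfo - builds on the lemon model to represent lexical information attached to ontologies on the semantic web
  • GOLD - an ontology for descriptive linguistics
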
Further stuff

  • LIMA