Corpora browser of the Wortschatz project now based on Topic Maps
Abstract:
With the Wortschatz project, the NLP research group at the University of Leipzig, with which the Topic Maps Lab is affiliated, provides one of the most important language resources in Germany. The new Wortschatz browser for statistical analyses is now based on Topic Maps.
The Wortschatz project is one of the most important language resources in Germany. It is provided by the NLP research group at the University of Leipzig, which is also the host of the Topic Maps Lab.
A few days ago, the Wortschatz group launched a new browser for corpora and language statistics. In this portal, users can compare a range of statistical parameters across currently 27 languages and several corpora, such as “most frequent word beginning n-grams”, “distribution of sentence length in words or characters”, or “Language fingerprints”.
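The post does not define how the portal computes these parameters. Purely as an illustration of what a “distribution of sentence length in words or characters” measures, here is a minimal Python sketch that counts how many sentences have each length. It assumes the sentences are already split, that words are whitespace-separated tokens, and that characters are counted after trimming surrounding whitespace; these are assumptions, not the Wortschatz method, and the function name `sentence_length_distribution` is hypothetical.

```python
from collections import Counter


def sentence_length_distribution(sentences, unit="words"):
    """Map each sentence length to the number of sentences with that length.

    Hypothetical helper. Assumes pre-split sentences; "words" means
    whitespace-separated tokens, "characters" means len() of the trimmed string.
    """
    if unit == "words":
        lengths = (len(s.split()) for s in sentences)
    elif unit == "characters":
        lengths = (len(s.strip()) for s in sentences)
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    return Counter(lengths)


corpus = [
    "Der Hund bellt.",
    "Die Katze schläft.",
    "Leipzig liegt in Sachsen.",
]
print(sorted(sentence_length_distribution(corpus).items()))  # → [(3, 2), (4, 1)]
```

A real corpus tool would use a proper sentence splitter and tokenizer; the point here is only the shape of the statistic (length → frequency).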
And the best news is: the browser, as depicted in the figure, is based 100% on Topic Maps. The data is collected from several thousand files and merged using JRTM with tinyTIM as the backend. This project shows that integrating Topic Maps into third-party projects in a lightweight way may be an appropriate path to a more Topic Mappish world. And with RTM, the Topic Maps Lab provides a good “glue” facility for such purposes.
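JRTM and tinyTIM perform the merging internally, and their APIs are not shown here. To illustrate the idea behind merging data from many files into one topic map, the following minimal Python sketch applies a simplified form of the Topic Maps merge rule: topics that share a subject identifier, directly or through a chain of other topics, are combined into one. It models only subject identifiers and names (the full Topic Maps Data Model rules also consider item identifiers and subject locators). The `Topic` class, the `merge_topics` function, and the example.org identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    # Hypothetical, simplified topic: only subject identifiers and names.
    subject_identifiers: set[str]
    names: set[str] = field(default_factory=set)


def merge_topics(topics):
    """Merge topics sharing at least one subject identifier (transitively).

    Invariant: the topics in `merged` have pairwise disjoint identifier sets,
    so each incoming topic absorbs every existing topic it overlaps with.
    """
    merged: list[Topic] = []
    for topic in topics:
        sids = set(topic.subject_identifiers)
        names = set(topic.names)
        remaining = []
        for existing in merged:
            if existing.subject_identifiers & sids:
                # Overlap: fold the existing topic into the new one.
                sids |= existing.subject_identifiers
                names |= existing.names
            else:
                remaining.append(existing)
        remaining.append(Topic(sids, names))
        merged = remaining
    return merged


# Topics as they might come from three separate (hypothetical) source files.
from_file_1 = [Topic({"http://example.org/lang/de"}, {"German"})]
from_file_2 = [Topic({"http://example.org/lang/de", "http://example.org/lang/deu"}, {"Deutsch"})]
from_file_3 = [Topic({"http://example.org/lang/en"}, {"English"})]

result = merge_topics(from_file_1 + from_file_2 + from_file_3)
print(len(result))  # → 2
```

Because each new topic absorbs every overlapping topic seen so far, a later topic that bridges two earlier, separate topics merges all three, which is the transitive behaviour the merge rule requires.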
Authors of this document are
Benjamin Bock
http://twitter.com/bnjmnbck
Benjamin is project leader of Ruby Topic Maps and rtm-tmql.
Lutz Maicher
http://www.wifa.uni-leipzig.de/isrm ...
Lutz is project leader of Musica Migrans, Topic Maps Lab Community.. , and Repertoire of the St. Thomas.. .
Subject Matter
Ruby Topic Maps
RTM is a Topic Maps Engine written in Ruby.