
Scaling Biomedical Topic Maps to Billions of Associations: How to Cope With Terabytes of Data?

Poster, published by Benedikt Wachinger and Volker Stümpflen on 2010-09-30

This poster deals with the challenges of large-scale systems and the use of Topic Maps.

To understand biological systems in general and multifactorial diseases in particular, it is necessary to create large-scale systems biology models as quickly as possible from the huge amounts of knowledge stored in multiple relational databases and published research articles. To achieve this, we had to solve two problems. First, how do we solve the data integration problem if we want to store all that knowledge in one easily accessible place where it can be combined efficiently? Second, how do we efficiently store and manage the ever-increasing amount of data, currently in the range of hundreds of terabytes? The first problem can be tackled with Topic Maps (TMs), for which a simple conversion schema from a relational database has to be developed. The growth in data, however, means that the underlying storage solutions have to scale accordingly. Since traditional approaches such as relational databases do not scale well, or do so only with a large amount of administrative work, newer technologies had to be found that can distribute the data across clusters with arbitrary numbers of nodes and limit I/O bottlenecks. Such technologies exist in cloud-like cluster architectures in which storage and computation are done on the same machine. For such architectures, Google originally developed a column-oriented database concept called BigTable, which is essentially a very large key-value store. Hadoop HBase is an open source implementation of this concept. We have devised a method for the efficient storage and retrieval of TMs in HBase, based on a column-oriented schema that represents TMs efficiently in such key-value stores. At our institute, we use this schema to integrate multiple biological databases from different sources in one central repository. Additionally, we have a semantic text mining system that extracts biologically relevant relations from the mass of available biomedical texts. Currently, our largest TM consists of over 4 billion associations.
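The abstract does not spell out the actual column-oriented schema, so the following is only a minimal sketch of how Topic Map associations might be laid out in a BigTable-style key-value store. Everything in it is an assumption made for illustration: the in-memory dictionary standing in for an HBase table, the `assoc` column family, the `type|role|topic` qualifier layout, the function names, and the placeholder gene and disease identifiers. It does not use the HBase client API and is not the authors' schema.

```python
from collections import defaultdict


def put_association(table, assoc_type, role_a, topic_a, role_b, topic_b):
    """Store one binary association under both players' rows.

    Hypothetical layout: row key = topic id, column = "assoc:type|role|other",
    cell value = role played by the other topic. Writing the association twice
    lets all associations of a topic be read from a single row.
    Assumes identifiers never contain the "|" separator.
    """
    table[topic_a][f"assoc:{assoc_type}|{role_a}|{topic_b}"] = role_b
    table[topic_b][f"assoc:{assoc_type}|{role_b}|{topic_a}"] = role_a


def associations_of(table, topic, assoc_type=None):
    """Return (type, own role, other topic, other role) tuples for one topic.

    Filtering on a column prefix mimics a qualifier-prefix scan within a row.
    """
    prefix = "assoc:" + (assoc_type + "|" if assoc_type else "")
    result = []
    for column, other_role in sorted(table.get(topic, {}).items()):
        if column.startswith(prefix):
            a_type, own_role, other = column[len("assoc:"):].split("|")
            result.append((a_type, own_role, other, other_role))
    return result


def rows_to_associations(table, rows):
    """Toy conversion schema: each relational (gene, disease) row becomes
    one 'associated-with' association between two topics."""
    for gene, disease in rows:
        put_association(
            table, "associated-with",
            "gene", f"gene/{gene}",
            "disease", f"disease/{disease}",
        )


# Placeholder rows, not real biological data.
table = defaultdict(dict)
rows_to_associations(table, [("G1", "D1"), ("G1", "D2"), ("G2", "D1")])
for assoc in associations_of(table, "gene/G1"):
    print(assoc)
```

Storing each association under both row keys trades extra storage for single-row lookups, which is a common pattern in key-value stores; whether the poster's schema makes the same trade-off is not stated in the abstract.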

Authors

Benedikt Wachinger

No contact information available.

Benedikt is the author of Datenschätze heben:.. and Scaling Biomedical Topic..

Volker Stümpflen

http://www.informatik.uni-trier.de/~ley/db ...

Volker is the author of From biological data to.., Datenschätze heben:.., and Scaling Biomedical Topic..

Presented at

TMRA 2010

Conference in Leipzig from 2010-09-29 to 2010-10-01

With Linked Topic Maps the motto of the TMRA 2009 conference was about spinning a global web of interchangeable and linkable topic maps. Linked …

