Architecture for the proper decoupling of OWL/RDF knowledge and (mostly numerical) facts


I am working on an application dealing with public health indicators. The related concepts and knowledge are kept in an OWL ontology. There is a (potentially important) number of numerical facts (e.g. indicator X has value Y), which will grow over time as more data gets crunched and added to the application. Given that querying the system will imply manipulating both the concepts (from the ontology) and the (numerical) facts, I'm wondering (in broad terms) what the ideal data model/storage architecture for it would be.

I've been contemplating, for instance, a hybrid architecture where the facts would be stored in a separate SQL database (i.e. using a pure relational model, not an RDF-over-relational one), and where querying would be decomposed into two phases: the second (SQL) being derived from (or guided by) the concepts retrieved by the first (ontology).
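To make the two-phase idea concrete, here is a minimal sketch of it, assuming a toy setup. The `SUBCONCEPTS` dict is a hypothetical stand-in for the ontology lookup (in practice it would be a Jena or SPARQL query), and the `fact` table, the `ex:` identifiers and the sample values are made up for illustration.

```python
import sqlite3

# Hypothetical stand-in for phase 1: in a real system this mapping would come
# from an ontology query (e.g. through Jena or SPARQL), not a hard-coded dict.
SUBCONCEPTS = {
    "ex:MortalityIndicator": ["ex:InfantMortality", "ex:MaternalMortality"],
}


def concepts_for(concept_iri):
    """Phase 1: resolve a concept to the indicator identifiers it covers."""
    return SUBCONCEPTS.get(concept_iri, [concept_iri])


def average_values(conn, indicator_iris):
    """Phase 2: aggregate the numerical facts in SQL for those indicators."""
    if not indicator_iris:
        return {}
    placeholders = ", ".join("?" for _ in indicator_iris)
    cursor = conn.execute(
        "SELECT indicator, AVG(value) FROM fact "
        f"WHERE indicator IN ({placeholders}) GROUP BY indicator",
        indicator_iris,
    )
    return dict(cursor.fetchall())


# Toy fact table with illustrative values only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact (indicator TEXT, region TEXT, year INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO fact VALUES (?, ?, ?, ?)",
    [
        ("ex:InfantMortality", "R1", 2010, 4.0),
        ("ex:InfantMortality", "R2", 2010, 6.0),
        ("ex:MaternalMortality", "R1", 2010, 10.0),
        ("ex:LifeExpectancy", "R1", 2010, 80.0),
    ],
)

averages = average_values(conn, concepts_for("ex:MortalityIndicator"))
```

The point of the sketch is the coupling: the SQL in phase 2 is parameterised entirely by what phase 1 returns, so the identifiers used in the ontology and in the `indicator` column have to be kept in sync by hand.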

But as I read about robust triple stores being able to handle massive amounts of data (billion+ triples), this suggests that I could also try to keep the facts in the RDF store (perhaps itself implemented with a relational DB). This would have the benefit, I suppose, of offering a more unified query interface (as I could query the schema and the fact stores simultaneously using the same API or query engine, instead of mixing in SQL as in the hybrid approach). On the other hand, I guess I'd lose the data crunching capabilities of a relational DB (assuming that a triple store is not optimized for operations such as aggregation, reduction, etc.), which might be useful in my context. As a final piece of information, I have already invested some energy in beginning to learn the Jena framework, so I'd appreciate it if suggestions took that into account.

(I asked this question on answers.semanticweb.com, to no avail.)

It seems that a pure-RDF solution for your application would work. Of note, RDF databases are maturing quickly, and there are a lot of high-quality open-source and commercial options available. Some scale to billions or tens of billions of triples and support the core semweb standards.

Additionally, many of these options are optimized for a specific set of use cases and scale, so you might try more than one option if you're not happy with the performance of the first. Also, don't roll your own here; you're not going to slap together something that performs better than even the worst RDF database. You'll get better performance out of a database that uses native RDF storage rather than one backed by a relational DB; at least in my experience, that has been true.

As for Jena, it's a reasonable framework to use. I prefer Sesame, but both are good to work with. However, rather than standardizing on Jena (or Sesame), you might be best off standardizing the RDF part of your application, whether that's part or all of it, on SPARQL. That has the benefit of being database and programming language agnostic. The SPARQL protocol is based on HTTP, so you can use pretty much any language out there and be able to talk to the database, and because you're using SPARQL rather than a custom protocol, you can more easily change the database as your requirements evolve. It also makes it easy for others to utilize your data should you wish to make it public, either within your organization or on the web.
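As an illustration of how little client code the HTTP-based protocol needs, here is a rough sketch using only the Python standard library. It builds a SPARQL Protocol query request (a GET with a `query` parameter) and flattens a response in the SPARQL JSON results format. The endpoint URL is a hypothetical placeholder, and the request is only constructed here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL; substitute whatever your RDF database exposes.
ENDPOINT = "http://localhost:3030/health/sparql"


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol query request (GET with a `query` parameter)."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(payload):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


request = build_sparql_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")
# Against a live endpoint, urllib.request.urlopen(request).read() would return
# the payload to hand to rows_from_results().
```

Because nothing in this code is specific to one product, pointing it at a different database should only mean changing `ENDPOINT`.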

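SPARQL will give you a powerful query language that is SQL-like, and it includes aggregates (in SPARQL 1.1). It may not have everything you'll need for your application, so you might have to build some custom processing code, but it should give you a good leg to stand on. RDF databases are optimized for handling SPARQL queries, so there is generally no need to worry about performance, but SPARQL is PSPACE-complete in terms of complexity, so it is possible to write a query that cannot be answered in any practical amount of time.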
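For example, a per-indicator average can be expressed directly with SPARQL 1.1's `AVG` and `GROUP BY`, while a median is not among the built-in aggregates and is the kind of thing that would fall to custom processing code. The sketch below shows both, assuming a hypothetical `ex:` vocabulary (`ex:indicator`, `ex:value`) and made-up sample bindings in the SPARQL JSON results shape.

```python
from collections import defaultdict
from statistics import median

# SPARQL 1.1 aggregate query; the ex: vocabulary is hypothetical.
AVG_QUERY = """
PREFIX ex: <http://example.org/health#>
SELECT ?indicator (AVG(?value) AS ?mean)
WHERE {
  ?obs ex:indicator ?indicator ;
       ex:value ?value .
}
GROUP BY ?indicator
"""


def median_by_indicator(bindings):
    """Custom post-processing: median per indicator from result bindings."""
    grouped = defaultdict(list)
    for binding in bindings:
        indicator = binding["indicator"]["value"]
        grouped[indicator].append(float(binding["value"]["value"]))
    return {name: median(values) for name, values in grouped.items()}


# Illustrative bindings, as a non-aggregated SELECT ?indicator ?value would
# return them in the JSON results format.
sample_bindings = [
    {"indicator": {"type": "uri", "value": "ex:InfantMortality"},
     "value": {"type": "literal", "value": "1.0"}},
    {"indicator": {"type": "uri", "value": "ex:InfantMortality"},
     "value": {"type": "literal", "value": "3.0"}},
    {"indicator": {"type": "uri", "value": "ex:InfantMortality"},
     "value": {"type": "literal", "value": "10.0"}},
]

medians = median_by_indicator(sample_bindings)
```

The trade-off is that the median version pulls every matching value over the wire, whereas the `AVG` query lets the database do the reduction.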

Finally, while a hybrid architecture would work, my concern is that in the longer term it would create an undue maintenance burden. If you're curious about semtech, and think it's a fit for at least part of your application, you might try a pure-semtech solution first and see how far you get.

Good luck.

