java - Is modeling infinite-scale relationships in NoSQL / BigTable (GAE) possible?
My team writing an application on GAE (Java) has led me to question the scalability of entity relationship modeling (specifically many-to-many) in object-oriented databases like BigTable.
The preferred solution for modeling unowned one-to-many and many-to-many relationships in the App Engine Datastore (see Entity Relationships in JDO) seems to be list-of-keys. However, Google warns:
"There are a few limitations to implementing many-to-many relationships this way. First, you must explicitly retrieve the values on the side of the collection where the list is stored since all you have available are Key objects. Another more important one is that you want to avoid storing overly large lists of keys..."
Speaking of overly large lists of keys: if you attempt to model this way and assume that you are storing one Long for each key, then with a per-entity limit of 1MB the theoretical maximum number of relationships per entity is ~130k. For a platform whose primary advantage is scalability, that's really not that many relationships. So now we are looking at possibly sharding entities that require more than 130k relationships.
A different approach (the Relationship Model) is outlined in the article Modeling Entity Relationships, part of the Mastering the Datastore series in the App Engine developer resources. However, even here Google warns about the performance of relational models:
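As a rough check of the ~130k figure, here is a minimal sketch of the arithmetic. It assumes the 1MB per-entity limit stated above and one raw 8-byte long per key, and it ignores any per-property or indexing overhead, so the real ceiling would be lower. The class and method names are hypothetical.

```java
// Back-of-the-envelope capacity estimate; storage overhead is deliberately ignored.
final class KeyListCapacity {
    // 1 MB per-entity limit, as assumed in the question.
    static final long ENTITY_LIMIT_BYTES = 1L << 20;
    // Assumption: each key is stored as one raw long (8 bytes).
    static final int BYTES_PER_KEY = Long.BYTES;

    // Upper bound on how many keys fit in one entity.
    static long maxKeys(long entityLimitBytes, int bytesPerKey) {
        return entityLimitBytes / bytesPerKey;
    }
}
```

Calling `KeyListCapacity.maxKeys(KeyListCapacity.ENTITY_LIMIT_BYTES, KeyListCapacity.BYTES_PER_KEY)` gives 131072, which matches the ~130k estimate.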
"However, you need to be careful because traversing the connections of a collection will require more calls to the datastore. Use this kind of many-to-many relationship only when you need to, and do so with care to the performance of your application."
So by now you are asking: 'Why do you need more than 130k relationships per entity?' Well, I'm glad you asked. Let's take, for example, a CMS application with 1 million users (hey, I can dream, right?!)
Users can upload content and share it with:
1. the public
2. individuals
3. groups
4. any combination
Now someone logs in and navigates to a dashboard that shows new uploads from people they are connected to in any group. This dashboard should include public content, and content shared specifically with this user or with a group this user is a member of. Not too bad, right? Let's dig into it.
    public class Content {
        private long id;
        private long authorId;
        private List<Long> sharedWith; // can be individual IDs or group IDs
    }
Now my query to get everything an ID is allowed to see might look like this:
    List<Long> idsThatGiveMeAccess = new ArrayList<Long>();
    idsThatGiveMeAccess.add(myId);
    idsThatGiveMeAccess.add(publicId); // let's say that sharing with 0L makes it public
    for (Group g : groupsImIn)
        idsThatGiveMeAccess.add(g.getId());

    List<Long> authorIdsThatIWantToSee = new ArrayList<Long>();
    // add a bunch of authorIds

    Query q = new Query("Content")
        .addFilter("authorId", Query.FilterOperator.IN, authorIdsThatIWantToSee)
        .addFilter("sharedWith", Query.FilterOperator.IN, idsThatGiveMeAccess);
Obviously I've already broken several rules. Namely, using two IN filters will blow up. Even a single IN filter at any size approaching the limits we are talking about would blow up. Aside from all that, let's say I want to limit and page through the results... no no! You can't do that if you use an IN filter. I can't think of any way to do this operation in a single query, which means you can't paginate it without extensive read-time processing and managing multiple cursors.
So here are the tools I can think of for doing this: denormalization, sharding, or relationship entities. However, even with these concepts I don't see how it is possible to model this data in a way that will scale. Obviously it's possible. Google and others do it all the time. I just can't see how. Can anyone shed any light on how to model this, or point me toward any good resources for CMS-style access control based on a NoSQL DB?
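To put a number on why two IN filters blow up: the datastore runs an IN filter as one equality sub-query per listed value, so two IN filters multiply. The sketch below just counts the sub-queries. The cap of 30 is an assumption based on the App Engine query documentation of the time, and the class and method names are hypothetical.

```java
// Counts how many equality sub-queries a set of IN filters would expand into.
final class InFilterExpansion {
    // Assumed cap on sub-queries per query (from the App Engine docs of the time).
    static final int ASSUMED_MAX_SUBQUERIES = 30;

    // Each IN filter contributes a factor equal to the size of its value list.
    static long subQueryCount(int... inListSizes) {
        long count = 1;
        for (int size : inListSizes) {
            count *= size;
        }
        return count;
    }

    // True if the combined expansion stays within the assumed cap.
    static boolean fitsInOneQuery(int... inListSizes) {
        return subQueryCount(inListSizes) <= ASSUMED_MAX_SUBQUERIES;
    }
}
```

With 10 author IDs and 5 access IDs, `subQueryCount(10, 5)` is 50, which is already over the assumed cap long before either list gets anywhere near 130k entries.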
Storing a list of IDs as a property won't scale. Why not store a new object for each new relationship (like in SQL)? For the CMS, this object would store two properties: the ID of the shared item and the user ID. If an item is shared with 1000 users, you'll have 1000 of these. Querying for a given user is trivial. Listing the permissions for a given item, or the list of what a user has had shared with them, is easy too.
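A minimal sketch of that idea follows. It uses an in-memory list as a stand-in for a datastore kind, so it shows only the shape of the data and the two lookups, not real datastore calls. `Share` and `ShareStore` are hypothetical names, and letting the grantee ID be either a user ID or a group ID is an assumption carried over from the question.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical relationship record: one instance per (content item, grantee) pair.
final class Share {
    final long contentId;
    final long granteeId; // assumption: a user ID or a group ID

    Share(long contentId, long granteeId) {
        this.contentId = contentId;
        this.granteeId = granteeId;
    }
}

// In-memory stand-in for a datastore kind; each lookup mimics one equality filter.
final class ShareStore {
    private final List<Share> shares = new ArrayList<>();

    // Sharing an item with N grantees creates N small records, not one large list.
    void share(long contentId, long granteeId) {
        shares.add(new Share(contentId, granteeId));
    }

    // "What has been shared with this user or group?"
    Set<Long> contentSharedWith(long granteeId) {
        Set<Long> result = new HashSet<>();
        for (Share s : shares) {
            if (s.granteeId == granteeId) {
                result.add(s.contentId);
            }
        }
        return result;
    }

    // "Who is allowed to see this item?"
    Set<Long> granteesOf(long contentId) {
        Set<Long> result = new HashSet<>();
        for (Share s : shares) {
            if (s.contentId == contentId) {
                result.add(s.granteeId);
            }
        }
        return result;
    }
}
```

In a real datastore each lookup would be a single equality filter on one property of the relationship kind, which can be limited and paged with a cursor. No entity grows with the number of relationships, so the per-entity size limit no longer bounds them.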