120 MongoDB collections vs single collection - which one is more efficient?

I'm new to MongoDB, and I'm facing a dilemma regarding my DB schema design:

Should I create one single collection, or put my data into several collections (we could call these categories, I suppose)?

Now, I know many such questions have been asked, but I believe my case is different for 2 reasons:

If I go for many collections, I'll have to create about 120 and that's it. This won't grow in the future.
I know I'll never need to query or insert into multiple collections. I will always have to query only one, since a document in collection X is not related to any document stored in the other collections. Documents may hold references to other parts of the DB though (like userId etc).

So my question is: could the 120 collections improve query performance? Is this a useful optimization in my case?

Or should I go for a single collection + sharding?

Each collection is expected to hold millions of documents. If I use only one, it will store billions of docs.

Thanks in advance!

------- Edit:

Thanks for the great answers.

In fact the 120 collections is a self-made limit, and it's not optimal:

The data in the collections is related to web publishers. There could be millions of these (any web site can join).

I guess the ideal situation would be if I could create a collection for each publisher (to hold their data only). But obviously, this is not possible due to Mongo limitations.

So I came up with the idea of a fixed number of collections to at least distribute the data somehow. Like: collection "A_XX" would hold XX platform related data for publishers whose names start with "A".. etc. We'll only support a few of these platforms, so 120 collections should be more than enough.
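
To make the naming idea concrete, here is a rough Python sketch of how the application could pick a collection. The function name and the "OTHER" bucket for names that don't start with a letter A-Z are just assumptions for illustration, not something I have settled on:

```python
import string


def collection_name(publisher_name: str, platform: str) -> str:
    """Map a publisher to its bucket collection, e.g. ("acme-news", "XX") -> "A_XX"."""
    name = publisher_name.strip()
    if not name:
        raise ValueError("publisher name must not be empty")
    first = name[0].upper()
    # Assumption: names not starting with A-Z share one catch-all bucket.
    if first not in set(string.ascii_uppercase):
        first = "OTHER"
    return f"{first}_{platform}"


if __name__ == "__main__":
    print(collection_name("acme-news", "XX"))
    print(collection_name("9lives", "XX"))
```

Every query and insert would have to go through a helper like this, so the bucket rule ends up baked into the application code.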

On some website they suggested using many databases instead of many collections. But this means overhead, and then I have to use / manage many different connections.

What do you think about this? Is there a better solution?

Sorry for not being specific enough in my original question.

Thanks in advance

Single Sharded Collection

The edited version of the question makes the actual requirement clearer: you have a collection that can potentially grow very large, and you want an approach to partition the data. The artificial collection limit is your own planned partitioning scheme.

In that case, I think you would be best off using a single collection and taking advantage of MongoDB's auto-sharding feature to distribute the data and workload across multiple servers as required. Multiple collections is still a valid approach, but it unnecessarily complicates your application code & deployment versus leveraging core MongoDB features. Assuming you choose a good shard key, your data will be automatically balanced across your shards.
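
As a minimal sketch of what sharding the single collection involves, the Python snippet below only builds the two admin command documents (`enableSharding` and `shardCollection`) without connecting to anything. The database name `publishing`, the collection name `publisher_data`, and the `publisherId` field are hypothetical, and whether a compound key like this suits you depends on your actual query and write patterns:

```python
def sharding_commands(database: str, collection: str, shard_key: dict) -> list:
    """Return the admin command documents that shard one collection."""
    namespace = f"{database}.{collection}"
    return [
        # Sharding must be enabled for the database first...
        {"enableSharding": database},
        # ...then the collection is sharded on the chosen key.
        {"shardCollection": namespace, "key": shard_key},
    ]


# Hypothetical names; the shard key here is an assumption, not a recommendation.
commands = sharding_commands(
    "publishing", "publisher_data", {"publisherId": 1, "_id": 1}
)

if __name__ == "__main__":
    for command in commands:
        print(command)
```

These documents would be sent to the admin database through a mongos using whatever admin-command helper your driver provides. The application then keeps querying one collection, and the partitioning stays out of your code.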

You do not have to shard immediately; you can defer the decision until you see your workload actually requiring more write scale (while knowing the option is there when you need it). You have other options before deciding to shard as well, such as upgrading your servers (disks and memory in particular) to better support your workload. Conversely, you don't want to wait until your system is crushed by workload before sharding, so you definitely need to monitor the growth. I would suggest using the free MongoDB Monitoring Service (MMS) provided by 10gen.

"On some website they suggested using many databases instead of many collections. But this means overhead, and then I have to use / manage many different connections."

Multiple databases will add more administrative overhead, and would likely be overkill and possibly detrimental for your use case. Storage is allocated at the database level, so 120 databases would consume more space than a single database with 120 collections.

Fixed number of collections (original answer)

If you can plan for a fixed number of collections (120 as per your original question description), I think it makes more sense to take this approach rather than using a monolithic collection.

NOTE: the design considerations below still apply, but since the question was updated to clarify that multiple collections are an attempted partitioning scheme, sharding a single collection would be a much more straightforward approach.

The motivations for using separate collections would be:

Your documents for a single large collection will likely have to include some indication of the collection subtype, which may need to be added to multiple indexes and could increase index sizes. With separate collections, the subtype is implicit in the collection namespace.
Sharding is enabled at the collection level. A single large collection only gives you an "all or nothing" approach, whereas individual collections allow you to control which subset(s) of data need to be sharded and to choose more appropriate shard keys.
You can use the compact command to defragment individual collections. Note: compact is a blocking operation, so the normal recommendation for an HA production environment would be to deploy a replica set and use rolling maintenance (i.e. compact the secondaries first, then step down and compact the primary).
MongoDB 2.4 (and 2.2) currently have database-level write lock granularity. In practice this has not proven to be a problem for the vast majority of use cases, but multiple collections would allow you to more easily move high-activity collections into separate databases if needed.
Further to the previous point .. if you have your data in separate collections, these will be able to take advantage of future improvements in collection-level locking (see SERVER-1240 in the MongoDB Jira issue tracker).
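
The rolling compact maintenance mentioned in the list above could be planned roughly as in the Python sketch below. It only computes the order of steps and the command documents (`compact` and `replSetStepDown`); it does not connect to a server. The host names, the 60-second step-down window, and the shape of the `members` mapping are assumptions for illustration:

```python
def rolling_compact_plan(members: dict, collection: str,
                         step_down_secs: int = 60) -> list:
    """members maps host -> "PRIMARY" or "SECONDARY".

    Returns (host, command document) steps in the order to run them.
    """
    primaries = [h for h, state in members.items() if state == "PRIMARY"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary")
    secondaries = sorted(h for h, s in members.items() if s == "SECONDARY")

    # Compact each secondary first, one at a time.
    steps = [(host, {"compact": collection}) for host in secondaries]
    # Then ask the primary to step down so another member takes over...
    primary = primaries[0]
    steps.append((primary, {"replSetStepDown": step_down_secs}))
    # ...and compact the former primary last.
    steps.append((primary, {"compact": collection}))
    return steps


if __name__ == "__main__":
    # Hypothetical three-member replica set.
    plan = rolling_compact_plan(
        {"db1:27017": "PRIMARY", "db2:27017": "SECONDARY",
         "db3:27017": "SECONDARY"},
        "A_XX",
    )
    for host, command in plan:
        print(host, command)
```

In a real deployment you would also wait for each member to finish and catch up on replication before moving to the next step; the sketch leaves that out.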
