Fastest way to get histogram of array sizes using MongoDB aggregation framework
I'm trying to get a list of the number of records that have arrays of varying size. I want the distribution of array sizes across all records so I can build a histogram like this:
          |    *
          |    *
documents |    *  *
          |    *  *   *
          |_*__*__*___*__*___
            2  5  6   23 47
               array size
So the raw documents look like this:
{hubs : [{stuff:0, id:6}, {stuff:1}, .... ]}
{hubs : [{stuff:0, id:6}]}
So far, using the aggregation framework and some of the hints here, I've come up with:
db.sitedata.aggregate([{ $unwind:'$hubs'}, { $group : {_id:'$_id', count:{$sum:1}}}, { $group : {_id:'$count', count:{$sum:1}}}, { $sort : {_id: 1}}])
This seems to give me the results I want, but it's not very fast. I'm wondering if there is something I can do that may not need two $group calls. The syntax is wrong here, but this is what I'm trying to do, putting the count value in the first _id field:
db.sitedata.aggregate([{ $unwind:'$hubs'}, { $group : {_id:{$count:$hubs}, count:1}}, { $sort : { _id: 1 }}])
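To see why the $unwind version needs two $group stages, it can help to trace it outside the database. The following is only a plain-JavaScript sketch of what each stage does; the function name unwindHistogram and the shape of the sample input are hypothetical, and it is not how MongoDB executes the pipeline internally.

```javascript
// Plain-JavaScript trace of the $unwind / $group / $group / $sort pipeline.
// `docs` stands in for the sitedata collection; each doc has an _id and hubs.
function unwindHistogram(docs) {
  // $unwind: one output row per array element (empty arrays yield no rows).
  var rows = [];
  docs.forEach(function (doc) {
    (doc.hubs || []).forEach(function (hub) {
      rows.push({ _id: doc._id, hubs: hub });
    });
  });

  // First $group: count rows per original document _id (the array size).
  var sizeById = {};
  rows.forEach(function (row) {
    sizeById[row._id] = (sizeById[row._id] || 0) + 1;
  });

  // Second $group: count how many documents share each array size.
  var docsBySize = {};
  Object.keys(sizeById).forEach(function (id) {
    var size = sizeById[id];
    docsBySize[size] = (docsBySize[size] || 0) + 1;
  });

  // $sort: ascending by array size.
  return Object.keys(docsBySize)
    .map(Number)
    .sort(function (a, b) { return a - b; })
    .map(function (size) { return { _id: size, count: docsBySize[size] }; });
}
```

The first grouping only recovers each document's array size from the unwound rows, and the second builds the histogram. That also means every array element is materialized as a separate row before anything is counted, which is a plausible reason for the slowness on large arrays.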
Now that 2.6 is out, the aggregation framework supports a new array operator, $size, which will allow you to $project the array size without having to unwind and re-group.
db.sitedata.aggregate([{ $project:{ 'count': { '$size':'$hubs'} } }, { $group : {_id:'$count', count:{$sum:1} } }, { $sort : { _id: 1 } } ] )
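One caveat: $size raises an error when its argument is not an array, so a single document with no hubs field would abort the whole aggregation. A minimal sketch of a guarded variant follows, assuming that documents without hubs should be counted as size 0; the variable name sizeHistogramPipeline is made up for illustration.

```javascript
// Same three stages as above, with one assumed addition: $ifNull substitutes
// an empty array when `hubs` is missing or null, so $size always sees an array.
var sizeHistogramPipeline = [
  { $project: { count: { $size: { $ifNull: ['$hubs', []] } } } },
  { $group: { _id: '$count', count: { $sum: 1 } } },
  { $sort: { _id: 1 } }
];

// In the mongo shell this would be run as:
//   db.sitedata.aggregate(sizeHistogramPipeline)
```

Unlike the $unwind version, this also reports documents whose array is empty, as a bucket with _id 0, so the two pipelines can differ in that one bucket.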