MongoDB - Aggregation

MongoDB - Aggregation

Aggregation operations process records of data and return calculated results. Aggregation operations group values ​​from multiple documents together and can perform different operations on the grouped data to return the same result. In SQL, count(*) and with group by is the equivalent of mongodb aggregation.

Aggregate() method

For aggregation in MongoDB, you must use the aggregate() method .

Syntax

The basic syntax of the aggregate() method is as follows:

>db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)

example

In the collection, you have the following data −

{ 
   _id : ObjectId ( 7df78ad8902c ) 
   title : 'MongoDB Overview' ,  
   description : 'MongoDB is no sql database' , 
   by_user : 'tutorials point' , 
   url : 'http://www.tutorialspoint.com' , 
   tags : [ 'mongodb ' , 'database' , 'NoSQL' ], 
   likes : 100 }, { 
   _id : ObjectId ( 7df78ad8902d ) 
   title :         

  'NoSQL Overview' ,  
   description : 'No sql database is very fast' , 
   by_user : 'tutorials point' , 
   url : 'http://www.tutorialspoint.com' , 
   tags : [ 'mongodb' , 'database' , 'NoSQL ' ], 
   likes : 10 }, { 
   _id : ObjectId ( 7df78ad8902e ) 
   title : 'Neo4j Overview' ,  
   description : 'Neo4j is no sql database' ,       

   
   by_user : 'Neo4j' , 
   url : 'http://www.neo4j.com' , 
   tags : [ 'neo4j' , 'database' , 'NoSQL' ], 
   likes : 750 },      

Now from the above collection, if you want to display a list showing the number of tutorials written by each user, you will use the following aggregate() method:

> db . mycol . aggregate ([{ $group : { _id : "$by_user" , num_tutorial : { $sum : 1 }}}]) { "result" : [ { "_id" : "tutorials point" , "num_tutorial" : 2 }, { "_id" : "Neo4j" , "num_tutorial" : 1 } ], "ok" : 1 } >    

The sql equivalent query for the above use case will select by_user , count (*) from the group mycol by by_user .

In the example above, we have grouped the documents by the by_user field, and each time by_user occurs, the previous value of the sum is incremented. Below is a list of available aggregation expressions.

expression Description example
$ amount Summarizes a specific value from all documents in the collection. db.mycol.aggregate([{ $group: {_id: "$by_user", num_tutorial: {$sum: "$likes"}}}])
$ cf Calculates the average of all given values ​​from all documents in the collection. db.mycol.aggregate([{ $group: {_id: "$by_user", num_tutorial: {$avg: "$likes"}}}])
$ min Gets the minimum of matching values ​​from all documents in the collection. db.mycol.aggregate([{ $group: {_id: "$by_user", num_tutorial: {$min: "$likes"}}}])
$max Gets the maximum of matching values ​​from all documents in the collection. db.mycol.aggregate([{ $group: {_id: "$by_user", num_tutorial: {$max: "$likes"}}}])
$ push Inserts an array value into the resulting document. db.mycol.aggregate([{ $group: {_id: "$by_user", url: {$push: "$url"}}}])
$addToSet Inserts an array value into the resulting document, but does not create duplicates. db.mycol.aggregate([{ $group: {_id: "$by_user", url: {$addToSet: "$url"}}}])
$ first Gets the first document from the source documents according to the grouping. Typically, this only makes sense in conjunction with some previously applied "$sort" step. db.mycol.aggregate([{ $group: {_id: "$by_user", first_url: {$first: "$url"}}}])
$ last Gets the latest document from the source documents according to the grouping. Typically, this only makes sense in conjunction with some previously applied "$sort" step. db.mycol.aggregate([{$group:{_id:"$by_user",last_url:{$last:"$url"}}}])

Pipeline concept

In a UNIX command, a shell pipeline means being able to perform an operation on some input and use the output as input for the next command, and so on. MongoDB also supports the same concept in the aggregation framework. There are many possible stages, and each of them is taken as a set of documents as input and creates the resulting set of documents (or the final resulting JSON document at the end of the pipeline). This can then be used for the next step and so on.

Following are the possible steps in the aggregation framework −

$project − It is used to select some specific fields from the collection.

$match is a filtering operation that can reduce the number of documents that are passed as input to the next stage.

$group - Performs the actual aggregation as described above.

$sort - Sorts documents.

$skip - This allows you to skip ahead in the list of documents for a given number of documents.

$limit - limits the number of documents to view to a given number, starting from the current positions.

$unwind - Used to unwind a document that uses arrays. When using an array, the data is kind of pre-combined, and this operation will be reversed to get the individual documents again. Thus, at this stage, we will increase the number of documents for the next stage.