{"_id": 1,
"first_name": "Tom",
"email": "tom@example.com",
"cell": "765-555-5555",
"likes": [
"fashion",
"spas",
"shopping"
],"businesses": [
{"name": "Entertainment 1080",
"partner": "Jean",
"status": "Bankrupt",
"date_founded": {
"$date": "2012-05-19T04:00:00Z"
}
},
{"name": "Swag for Tweens",
"date_founded": {
"$date": "2012-11-01T04:00:00Z"
}
}
]
}
Doing Data Analytics Using NOSQL
Introduction to MongoDB
MongoDB Database
MongoDB in Summary
- It is a General purpose
Document
database - structures data in documents similar to json Objects
- A bit different from how relational databases ,structures data in
tables
ofrows
andcolumns
- Provides drivers in all major programming languages
- Allows flexibility of using one format for any application and thus making development easier and speeds up productivity
What is a document model
A document database (also known as a document-oriented database or a document store) is a database that stores information in documents. Document databases offer a variety of advantages, including:
- An intuitive data model that is fast and easy for developers to work with
- A flexible schema that allows for the data model to evolve as application needs change
- The ability to horizontally scale out Because of these advantages, document databases are general-purpose databases that can be used in a variety of use cases and industries.
Document databases are considered to be non-relational (or NoSQL) databases. Instead of storing data in fixed rows and columns, document databases use flexible documents. Document databases are the most popular alternative to tabular, relational databases. Learn more about NoSQL databases.
What are documents?
A document is a record in a document database. A document typically stores information about one object and any of its related metadata. Documents store data in field-value pairs. The values can be a variety of types and structures, including strings, numbers, dates, arrays, or objects. Documents can be stored in formats like JSON, BSON, and XML.
Below is a JSON document that stores information about a user named Tom.
Collections
A collection is a group of documents. Collections typically store documents that have similar contents. Not all documents in a collection are required to have the same fields, because document databases have flexible schemas. Note that some document databases provide schema validation, so the schema can optionally be locked down when needed.
Continuing with the example above, the document with information about Tom could be stored in a collection named users. More documents could be added to the users collection in order to store information about other users. For example, the document below that stores information about Donna could be added to the users collection.
{"_id": 2,
"first_name": "Donna",
"email": "donna@example.com",
"spouse": "Joe",
"likes": [
"spas",
"shopping",
"live tweeting"
],"businesses": [
{"name": "Castle Realty",
"status": "Thriving",
"date_founded": {
"$date": "2013-11-21T04:00:00Z"
}
}
] }
CRUD operations
Document databases typically have an API or query language that allows developers to execute the CRUD (create, read, update, and delete) operations.
-
Create:
Documents can be created in the database. Each document has a unique identifier. -
Read :
Documents can be read from the database. The API or query language allows developers to query for documents using their unique identifiers or field values. Indexes can be added to the database in order to increase read performance. -
Update:
Existing documents can be updated — either in whole or in part. -
Delete:
Documents can be deleted from the database.
What are the key features of document databases?
Document databases have the following key features:
-
Document model:
Data is stored in documents (unlike other databases that store data in structures like tables or graphs). Documents map to objects in most popular programming languages, which allows developers to rapidly develop their applications. +Flexible schema:
Document databases have flexible schemas, meaning that not all documents in a collection need to have the same fields. Note that some document databases support schema validation, so the schema can be optionally locked down. -
Distributed and resilient:
Document databases are distributed, which allows for horizontal scaling (typically cheaper than vertical scaling) and data distribution. Document databases provide resiliency through replication. -
Querying through an API or query language:
Document databases have an API or query language that allows developers to execute the CRUD operations on the database. Developers have the ability to query for documents based on unique identifiers or field values.
What makes document databases different from relational databases?
Three key factors differentiate document databases from relational databases:
- The intuitiveness of the data model: Documents map to the objects in code, so they are much more natural to work with. There is no need to decompose data across tables, run expensive joins, or integrate a separate Object Relational Mapping (ORM) layer. Data that is accessed together is stored together, so developers have less code to write and end users get higher performance.
- The ubiquity of JSON documents: JSON has become an established standard for data interchange and storage. JSON documents are lightweight, language-independent, and human-readable. Documents are a superset of all other data models so developers can structure data in the way their applications need — rich objects, key-value pairs, tables, geospatial and time-series data, or the nodes and edges of a graph.
- The flexibility of the schema: A document’s schema is dynamic and self-describing, so developers don’t need to first pre-define it in the database. Fields can vary from document to document. Developers can modify the structure at any time, avoiding disruptive schema migrations. Some document databases offer schema validation so you can optionally enforce rules governing document structures.
{"_id": 1,
"first_name": "Tom",
"email": "tom@example.com",
"cell": "765-555-5555",
"likes": [
"fashion",
"spas",
"shopping"
],"businesses": [
{"name": "Entertainment 1080",
"partner": "Jean",
"status": "Bankrupt",
"date_founded": {
"$date": "2012-05-19T04:00:00Z"
}
},
{"name": "Swag for Tweens",
"date_founded": {
"$date": "2012-11-01T04:00:00Z"
}
}
] }
All of the information about Tom is stored in a single document.
Now let’s consider how we can store that same information in a relational database. We’ll begin by creating a table that stores the basic information about the user.
ID | First name | cell | |
---|---|---|---|
1 | Tom | Tom@example.com | 555-555-555 |
A user can like many things (meaning there is a one-to-many relationship between a user and likes), so we will create a new table named “Likes” to store a user’s likes. The Likes table will have a foreign key that references the ID column in the Users table.
ID | User_Id | like |
---|---|---|
10 | 1 | Fashion |
11 | 1 | Spars |
12 | 1 | Shopping |
ID | User_ID | Name | Partner | status | date_founded |
---|---|---|---|---|---|
20 | 1 | entertainment | Jean | Bankrupt | 2011-05-19 |
21 | 1 | swag | NULL | NULL | 2011-06-12 |
In this simple example, we see that data about a user could be stored in a single document in a document database or three tables in a relational database. When a developer wants to retrieve or update information about a user in the document database, they can write one query with zero joins. Interacting with the database is straightforward, and modeling the data in the database is intuitive.
Mapping Terms and Concepts from SQL to MongoDB
Column => Field
Table => Collection
Database=>Database
knitr::include_graphics("compare6.jpeg")
Left Outer Join ⇒ $lookup (Aggregation Pipeline)
When you need to pull all of the information from one table and join it with any matching information in a second table, you can use a left outer join in SQL. MongoDB has a stage similar to a left outer join that you can use with the aggregation framework. For those not familiar with the aggregation framework, it allows you to analyze your data in real-time. Using the framework, you can create an aggregation pipeline that consists of one or more stages. Each stage transforms the documents and passes the output to the next stage. $lookup
is an aggregation framework stage that allows you to perform a left outer join to an unsharded collection in the same database.
What are the relationships between document databases and other databases?
The document model is a superset of other data models, including key-value pairs, relational, objects, graph, and geospatial.
- Key-value pairs can be modeled with fields and values in a document. Any field in a document can be indexed, providing developers with additional flexibility in querying the data.
- Relational data can be modeled differently (and some would argue more intuitively) by keeping related data together in a single document using embedded documents and arrays. Related data can also be stored in separate documents, and database references can be used to connect the related data.
- Documents map to objects in most popular programming languages.
- Graph nodes and/or edges can be modeled as documents. Edges can also be modeled through database references. Graph queries can be run using operations like $graphLookup.
- Geospatial data can be modeled as arrays in documents.
Working with MongoDB Syntax in R and Compass
- install and load
mongolite
package
- Connect to a connection string
connection_string = "mongodb+srv://myAtlasDBUser:mosesDowart@myatlasclusteredu.ywdjesc.mongodb.net/?retryWrites=true&w=majority&appName=myAtlasClusterEDU"
trips_collection = mongo(collection="trips", db="sample_training", url=connection_string)
insecting documents to a collection
insertOne()
Insert a Single Document Use insertOne() to insert a document into a collection. Within the parentheses of insertOne(), include an object that contains the document data. The syntax is as follows
<collection>.insertOne() db.
- if we wanted to insert a document to a collection called
grades
we would go ahead and write
db.grades.insertOne()
mongoDB automatically creates a database for you if it does not exist
Here’s an example:
db.grades.insertOne({
: 654321,
student_id: [
products
{: "exam",
type: 90,
score
},
{: "homework",
type: 59,
score
},
{: "quiz",
type: 75,
score
},
{: "homework",
type: 88,
score
},
],: 550,
class_id })
insertMany()
Insert Multiple Documents Use insertMany() to insert multiple documents at once. Within insertMany(), include the documents within an array. Each document should be separated by a comma. the syntax is as follows , an array of documents .
<collection>.insertMany([
db.<collection 1>,
<collection 2>,
...... ])
Here’s an example:
db.grades.insertMany([
{: 546789,
student_id: [
products
{: "quiz",
type: 50,
score
},
{: "homework",
type: 70,
score
},
{: "quiz",
type: 66,
score
},
{: "exam",
type: 70,
score
},
],: 551,
class_id
},
{: 777777,
student_id: [
products
{: "exam",
type: 83,
score
},
{: "quiz",
type: 59,
score
},
{: "quiz",
type: 72,
score
},
{: "quiz",
type: 67,
score
},
],: 550,
class_id
},
{: 223344,
student_id: [
products
{: "exam",
type: 45,
score
},
{: "homework",
type: 39,
score
},
{: "quiz",
type: 40,
score
},
{: "homework",
type: 88,
score
},
],: 551,
class_id
}, ])
Finding documents in MongoDB
- the syntax for finding documents is given by :
<collection>.find() db.
similar SQL query would be :
* FROM collection; SELECT
Retrieving a specific document from our collection
$eq Stands for Equal
: {$eq : <value>}}
{field
: <value>} {field
Find a Document with Equality When given equality with an _id
field, the find()
command will return the specified document that matches the _id
. Here’s an example:
db.zips.find({ _id: ObjectId("5c8eccc1caa187d17ca6ed16") })
Find Documents with an Array That Contains a Specified Value In the following example, “InvestmentFund” is not enclosed in square brackets, so MongoDB returns all documents within the products array that contain the specified value.
db.accounts.find({ products: "InvestmentFund"})
a similar query in
SQL
would be :
* FROM accounts WHERE products = "InvestmentFund"; SELECT
In R
Get all the rows in Mongolite
trips_collection$find()
Get a few rows with limit
trips_collection$find(limit = 5)
#> tripduration start station id start station name end station id
#> 1 589 489 10 Ave & W 28 St 284
#> 2 1770 3226 W 82 St & Central Park West 509
#> 3 694 268 Howard St & Centre St 497
#> 4 1376 527 E 33 St & 2 Ave 259
#> 5 280 512 W 29 St & 9 Ave 466
#> end station name bikeid usertype birth year
#> 1 Greenwich Ave & 8 Ave 21997 Subscriber 1982
#> 2 9 Ave & W 22 St 22541 Subscriber 1970
#> 3 E 17 St & Broadway 15747 Subscriber 1996
#> 4 South St & Whitehall St 23933 Subscriber 1963
#> 5 W 25 St & 6 Ave 23144 Subscriber 1988
#> start station location.type start station location.coordinates
#> 1 Point -74.00177, 40.75066
#> 2 Point -73.97137, 40.78275
#> 3 Point -73.99973, 40.71911
#> 4 Point -73.97606, 40.74402
#> 5 Point -73.99839, 40.75007
#> end station location.type end station location.coordinates
#> 1 Point -74.00264, 40.73902
#> 2 Point -74.00197, 40.74550
#> 3 Point -73.99009, 40.73705
#> 4 Point -74.01234, 40.70122
#> 5 Point -73.99145, 40.74395
#> start time stop time
#> 1 2016-01-01 02:00:48 2016-01-01 02:10:37
#> 2 2016-01-01 02:04:43 2016-01-01 02:34:14
#> 3 2016-01-01 02:02:18 2016-01-01 02:13:53
#> 4 2016-01-01 02:03:12 2016-01-01 02:26:09
#> 5 2016-01-01 02:10:29 2016-01-01 02:15:09
get all rows where tripduration is 1770
trips_collection$find('{"tripduration": 1770}')
#> tripduration start station id start station name end station id
#> 1 1770 3226 W 82 St & Central Park West 509
#> 2 1770 2006 Central Park S & 6 Ave 3165
#> end station name bikeid usertype birth year
#> 1 9 Ave & W 22 St 22541 Subscriber 1970
#> 2 Central Park West & W 72 St 19570 Customer
#> start station location.type start station location.coordinates
#> 1 Point -73.97137, 40.78275
#> 2 Point -73.97634, 40.76591
#> end station location.type end station location.coordinates
#> 1 Point -74.00197, 40.74550
#> 2 Point -73.97621, 40.77579
#> start time stop time
#> 1 2016-01-01 02:04:43 2016-01-01 02:34:14
#> 2 2016-01-01 12:59:01 2016-01-01 13:28:31
get all rows where usertype is
Customer
trips_collection$find('{"usertype": "Customer"}')
#> _id usertype
#> 1 572bb8222b288919b68abfaa Customer
#> 2 572bb8222b288919b68abfdd Customer
#> 3 572bb8222b288919b68ac031 Customer
#> 4 572bb8222b288919b68ac073 Customer
#> 5 572bb8222b288919b68ac0e9 Customer
$in
Operator
Find a Document by Using the $in
Operator Use the $in
operator to select documents where the value of a field equals any value in the specified array. The syntax is given by
<collection>.find({
db.<field>:{$in: [<value>,<value2>,....]}
})
Here’s an example:
db.zips.find({ tripduration: { $in: [1770, 1376] } })
$nin
$nin
stands for Not In.
Find restaurants not in these three shop ranks 1, 3, 5 .
db.restaurants.find(
{: {
cuisine$nin: ["American", "Irish", "Bakery"]
}
} )
In R
trips_collection$find('{"tripduration": {"$in" : [1770,1376]}}')
#> tripduration start station id start station name end station id
#> 1 1770 3226 W 82 St & Central Park West 509
#> 2 1376 527 E 33 St & 2 Ave 259
#> 3 1770 2006 Central Park S & 6 Ave 3165
#> end station name bikeid usertype birth year
#> 1 9 Ave & W 22 St 22541 Subscriber 1970
#> 2 South St & Whitehall St 23933 Subscriber 1963
#> 3 Central Park West & W 72 St 19570 Customer
#> start station location.type start station location.coordinates
#> 1 Point -73.97137, 40.78275
#> 2 Point -73.97606, 40.74402
#> 3 Point -73.97634, 40.76591
#> end station location.type end station location.coordinates
#> 1 Point -74.00197, 40.74550
#> 2 Point -74.01234, 40.70122
#> 3 Point -73.97621, 40.77579
#> start time stop time
#> 1 2016-01-01 02:04:43 2016-01-01 02:34:14
#> 2 2016-01-01 02:03:12 2016-01-01 02:26:09
#> 3 2016-01-01 12:59:01 2016-01-01 13:28:31
Comparison Operators
Finding Documents by Using Comparison Operators Review the following comparison operators:
$gt
,$lt
,$lte
, and$gte
$gt
: Use the$gt
operator to match documents with a field greater than the given value. For example:
db.sales.find({ "tripduration": { $gt: 3244}})
In R
trips_collection$find('{"tripduration": {"$gt" : 3244}}',limit = 5)
#> tripduration start station id start station name end station id
#> 1 7881 479 9 Ave & W 45 St 267
#> 2 49253 463 9 Ave & W 16 St 476
#> 3 11334 479 9 Ave & W 45 St 267
#> 4 11960 529 W 42 St & 8 Ave 486
#> 5 35248 250 Lafayette St & Jersey St 252
#> end station name bikeid usertype birth year
#> 1 Broadway & W 36 St 16515 Customer
#> 2 E 31 St & 3 Ave 23753 Customer
#> 3 Broadway & W 36 St 16177 Customer
#> 4 Broadway & W 29 St 23826 Customer
#> 5 MacDougal St & Washington Sq 21228 Customer
#> start station location.type start station location.coordinates
#> 1 Point -73.99126, 40.76019
#> 2 Point -74.00443, 40.74207
#> 3 Point -73.99126, 40.76019
#> 4 Point -73.99099, 40.75757
#> 5 Point -73.99565, 40.72456
#> end station location.type end station location.coordinates
#> 1 Point -73.98765, 40.75098
#> 2 Point -73.97966, 40.74394
#> 3 Point -73.98765, 40.75098
#> 4 Point -73.98856, 40.74620
#> 5 Point -73.99852, 40.73226
#> start time stop time
#> 1 2016-01-01 03:10:50 2016-01-01 05:22:11
#> 2 2016-01-01 03:20:04 2016-01-01 17:00:57
#> 3 2016-01-01 03:09:38 2016-01-01 06:18:33
#> 4 2016-01-01 03:11:43 2016-01-01 06:31:04
#> 5 2016-01-01 03:11:46 2016-01-01 12:59:14
-
$lt
Use the$lt
operator to match documents with a field less than the given value. For example:
db.sales.find({ "tripduration": { $lt: 100}})
In R
trips_collection$find('{"tripduration": {"$lt" : 100}}',limit = 5)
#> tripduration start station id start station name end station id
#> 1 89 483 E 12 St & 3 Ave 237
#> 2 83 3119 Vernon Blvd & 50 Ave 3122
#> 3 79 395 Bond St & Schermerhorn St 395
#> 4 72 79 Franklin St & W Broadway 249
#> 5 85 307 Canal St & Rutgers St 340
#> end station name bikeid usertype birth year
#> 1 E 11 St & 2 Ave 19374 Subscriber 1971
#> 2 48 Ave & 5 St 24257 Subscriber 1990
#> 3 Bond St & Schermerhorn St 24277 Subscriber 1986
#> 4 Harrison St & Hudson St 15318 Subscriber 1981
#> 5 Madison St & Clinton St 18713 Subscriber 1992
#> start station location.type start station location.coordinates
#> 1 Point -73.98890, 40.73223
#> 2 Point -73.95412, 40.74233
#> 3 Point -73.98411, 40.68807
#> 4 Point -74.00667, 40.71912
#> 5 Point -73.98990, 40.71427
#> end station location.type end station location.coordinates
#> 1 Point -73.98672, 40.73047
#> 2 Point -73.95587, 40.74436
#> 3 Point -73.98411, 40.68807
#> 4 Point -74.00900, 40.71871
#> 5 Point -73.98776, 40.71269
#> start time stop time
#> 1 2016-01-01 02:28:07 2016-01-01 02:29:36
#> 2 2016-01-01 03:11:29 2016-01-01 03:12:53
#> 3 2016-01-01 03:10:26 2016-01-01 03:11:46
#> 4 2016-01-01 04:25:25 2016-01-01 04:26:37
#> 5 2016-01-01 03:58:21 2016-01-01 03:59:47
-
$lte
: Use the$lte
operator to match documents with a field less than or equal to the given value. For example:
db.sales.find({ "customer.age": { $lte: 65}})
-
$gte
: Use the$gte
operator to match documents with a field greater than or equal to the given value. For example:
db.sales.find({ "customer.age": { $gte: 65}})
$elemMatch
Find a Document by Using the $elemMatch
Operator Use the $elemMatch
operator to find all documents that contain the specified subdocument. For example:
db.sales.find({
: {
items$elemMatch: { name: "laptop", price: { $gt: 800 }, quantity: { $gte: 1 } },
}, })
$ne
$ne
stands for Not Equal.
find restaurants with grades not equal to “A”.
db.restaurants.find(
{: {
grades$elemMatch: {
: {
grade$ne: "A"
}
}
}
} )
And , or
Finding Documents by Using Logical Operators Review the following logical operators: implicit $and
, $or
.
db.routes.find({ "usertype": "Subscriber", tripduration: { $lt: 100 } })
similar SQL query would be
* FROM routes WHERE usertype = "Subscriber" AND tripduration < 23; SELECT
db.routes.find({
$or: [{ dst_airport: "SEA" }, { src_airport: "SEA" }],
})
trips_collection$find('{"tripduration": {"$lt" : 100},"usertype":"Subscriber"}',limit = 5)
#> tripduration start station id start station name end station id
#> 1 89 483 E 12 St & 3 Ave 237
#> 2 83 3119 Vernon Blvd & 50 Ave 3122
#> 3 79 395 Bond St & Schermerhorn St 395
#> 4 72 79 Franklin St & W Broadway 249
#> 5 85 307 Canal St & Rutgers St 340
#> end station name bikeid usertype birth year
#> 1 E 11 St & 2 Ave 19374 Subscriber 1971
#> 2 48 Ave & 5 St 24257 Subscriber 1990
#> 3 Bond St & Schermerhorn St 24277 Subscriber 1986
#> 4 Harrison St & Hudson St 15318 Subscriber 1981
#> 5 Madison St & Clinton St 18713 Subscriber 1992
#> start station location.type start station location.coordinates
#> 1 Point -73.98890, 40.73223
#> 2 Point -73.95412, 40.74233
#> 3 Point -73.98411, 40.68807
#> 4 Point -74.00667, 40.71912
#> 5 Point -73.98990, 40.71427
#> end station location.type end station location.coordinates
#> 1 Point -73.98672, 40.73047
#> 2 Point -73.95587, 40.74436
#> 3 Point -73.98411, 40.68807
#> 4 Point -74.00900, 40.71871
#> 5 Point -73.98776, 40.71269
#> start time stop time
#> 1 2016-01-01 02:28:07 2016-01-01 02:29:36
#> 2 2016-01-01 03:11:29 2016-01-01 03:12:53
#> 3 2016-01-01 03:10:26 2016-01-01 03:11:46
#> 4 2016-01-01 04:25:25 2016-01-01 04:26:37
#> 5 2016-01-01 03:58:21 2016-01-01 03:59:47
Find a Document by Using the $or
Operator Use the $or operator to select documents that match at least one of the included expressions. For example:
trips_collection$find('{"$or": [{"tripduration": {"$lt": 100}}, {"usertype": "Subscriber"}]}',limit = 5)
#> tripduration start station id start station name end station id
#> 1 589 489 10 Ave & W 28 St 284
#> 2 1770 3226 W 82 St & Central Park West 509
#> 3 694 268 Howard St & Centre St 497
#> 4 1376 527 E 33 St & 2 Ave 259
#> 5 280 512 W 29 St & 9 Ave 466
#> end station name bikeid usertype birth year
#> 1 Greenwich Ave & 8 Ave 21997 Subscriber 1982
#> 2 9 Ave & W 22 St 22541 Subscriber 1970
#> 3 E 17 St & Broadway 15747 Subscriber 1996
#> 4 South St & Whitehall St 23933 Subscriber 1963
#> 5 W 25 St & 6 Ave 23144 Subscriber 1988
#> start station location.type start station location.coordinates
#> 1 Point -74.00177, 40.75066
#> 2 Point -73.97137, 40.78275
#> 3 Point -73.99973, 40.71911
#> 4 Point -73.97606, 40.74402
#> 5 Point -73.99839, 40.75007
#> end station location.type end station location.coordinates
#> 1 Point -74.00264, 40.73902
#> 2 Point -74.00197, 40.74550
#> 3 Point -73.99009, 40.73705
#> 4 Point -74.01234, 40.70122
#> 5 Point -73.99145, 40.74395
#> start time stop time
#> 1 2016-01-01 02:00:48 2016-01-01 02:10:37
#> 2 2016-01-01 02:04:43 2016-01-01 02:34:14
#> 3 2016-01-01 02:02:18 2016-01-01 02:13:53
#> 4 2016-01-01 02:03:12 2016-01-01 02:26:09
#> 5 2016-01-01 02:10:29 2016-01-01 02:15:09
Find a Document by Using the
$and
Operator Use the$and
operator to use multiple$or
expressions in your query.
db.routes.find({
$and: [
$or: [{ dst_airport: "SEA" }, { src_airport: "SEA" }] },
{ $or: [{ "airline.name": "American Airlines" }, { airplane: 320 }] },
{
] })
Updating MongoDB Documents by Using findAndModify()
The findAndModify() method is used to find and replace a single document in MongoDB. It accepts a filter document, a replacement document, and an optional options object. The following code shows an example:
db.podcasts.findAndModify({
: { _id: ObjectId("6261a92dfee1ff300dc80bf1") },
query: { $inc: { subscribers: 1 } },
update: true,
new })
updateMany()
Updating MongoDB Documents by Using updateMany()
To update multiple documents, use the updateMany() method. This method accepts a filter document, an update document, and an optional options object. The following code shows an example:
db.books.updateMany( { publishedDate: { $lt: new Date("2019-01-01") } }, { $set: { status: "LEGACY" } } )
Deleting Documents in MongoDB
To delete documents, use the deleteOne()
or deleteMany()
methods. Both methods accept a filter document and an options object.
Delete One Document The following code shows an example of the deleteOne() method:
db.podcasts.deleteOne({_id: Objectid("6282c9862acb966e76bbf20a") })
Delete Many Documents The following code shows an example of the deleteMany() method:
db.podcasts.deleteMany({category: "crime"})
Sorting and Limiting Query Results in MongoDB
Review the following code, which demonstrates how to sort and limit query results.
Sorting Results
Use cursor.sort() to return query results in a specified order. Within the parentheses of sort(), include an object that specifies the field(s) to sort by and the order of the sort. Use 1 for ascending order, and -1 for descending order.
Syntax:
db.collection.find(<query>).sort(<sort>)
Example:
// Return data on all music companies, sorted alphabetically from A to Z.
db.companies.find({ category_code: "music" }).sort({ name: 1 })
To ensure documents are returned in a consistent order, include a field that contains unique values in the sort. An easy way to do this is to include the _id field in the sort. Here’s an example:
// Return data on all music companies, sorted alphabetically from A to Z. Ensure consistent sort order
db.companies.find({ category_code: "music" }).sort({ name: 1, _id: 1 })
Limiting Results
Use cursor.limit() to specify the maximum number of documents the cursor will return. Within the parentheses of limit(), specify the maximum number of documents to return.
Syntax:
db.companies.find(<query>).limit(<number>)
Example:
// Return the three music companies with the highest number of employees. Ensure consistent sort order.
db.companies.find({ category_code: "music" })
.sort({ number_of_employees: -1, _id: 1 })
.limit(3)
Descending Order in R
trips_collection$find(sort = '{"tripduration" : -1}' , limit = 5, fields = '{"_id" : true, "tripduration" : true}')
#> _id tripduration
#> 1 572bb8222b288919b68ac07c 326222
#> 2 572bb8232b288919b68b0f0d 279620
#> 3 572bb8232b288919b68b0593 173357
#> 4 572bb8232b288919b68ae9ee 152023
#> 5 572bb8222b288919b68ac1f0 146099
Ascending Order in R
trips_collection$find(sort = '{"tripduration" : 1}' , limit = 5, fields = '{"_id" : true, "tripduration" : true}')
#> _id tripduration
#> 1 572bb8232b288919b68aef63 61
#> 2 572bb8222b288919b68ac2d4 61
#> 3 572bb8222b288919b68acefe 65
#> 4 572bb8232b288919b68afd2a 66
#> 5 572bb8232b288919b68af7cd 66
Replacing a Document in MongoDB
To replace documents in MongoDB, we use the replaceOne() method. The replaceOne() method takes the following parameters:
-
filter:
A query that matches the document to replace. -
replacement:
The new document to replace the old one with. -
options:
An object that specifies options for the update. In the previous video, we use the _id field to filter the document. In our replacement document, we provide the entire document that should be inserted in its place. Here’s the example code
db.books.replaceOne(
{: ObjectId("6282afeb441a74a98dbbec4e"),
_id
},
{: "Data Science Fundamentals for Python and MongoDB",
title: "1484235967",
isbn: new Date("2018-5-10"),
publishedDate:
thumbnailUrl"https://m.media-amazon.com/images/I/71opmUBc2wL._AC_UY218_.jpg",
: ["David Paper"],
authors: ["Data Science"],
categories
} )
Updating MongoDB Documents by Using updateOne()
The updateOne() method accepts a filter document, an update document, and an optional options object. MongoDB provides update operators and options to help you update documents. In this section, we’ll cover three of them: $set, upsert, and $push.
$set
The $set operator replaces the value of a field with the specified value, as shown in the following code:
db.podcasts.updateOne(
{: ObjectId("5e8f8f8f8f8f8f8f8f8f8f8"),
_id
},
{$set: {
: 98562,
subscribers
},
} )
upsert
The upsert option creates a new document if no documents match the filtered criteria. Here’s an example:
db.podcasts.updateOne(
: "The Developer Hub" },
{ title$set: { topics: ["databases", "MongoDB"] } },
{ : true }
{ upsert )
$push
The $push operator adds a new value to the hosts array field. Here’s an example:
db.podcasts.updateOne(
: ObjectId("5e8f8f8f8f8f8f8f8f8f8f8") },
{ _id$push: { hosts: "Nic Raboy" } }
{ )
Updating MongoDB Documents by Using findAndModify()
The findAndModify() method is used to find and replace a single document in MongoDB. It accepts a filter document, a replacement document, and an optional options object. The following code shows an example:
db.podcasts.findAndModify({
: { _id: ObjectId("6261a92dfee1ff300dc80bf1") },
query: { $inc: { subscribers: 1 } },
update: true,
new })
Updating MongoDB Documents by Using updateMany()
To update multiple documents, use the updateMany() method. This method accepts a filter document, an update document, and an optional options object. The following code shows an example:
db.books.updateMany(
: { $lt: new Date("2019-01-01") } },
{ publishedDate$set: { status: "LEGACY" } }
{ )
Deleting Documents in MongoDB
To delete documents, use the deleteOne() or deleteMany() methods. Both methods accept a filter document and an options object.
Delete One Document The following code shows an example of the deleteOne() method:
db.podcasts.deleteOne({ _id: Objectid("6282c9862acb966e76bbf20a") })
Delete Many Documents The following code shows an example of the deleteMany() method:
db.podcasts.deleteMany({category: “crime”})
Returning Specific Data from a Query in MongoDB
Review the following code, which demonstrates how to return selected fields from a query.
Add a Projection Document
To specify fields to include or exclude in the result set, add a projection document as the second parameter in the call to db.collection.find().
Syntax:
db.collection.find( <query>, <projection> )
Include a Field To include a field, set its value to 1 in the projection document.
Syntax:
db.collection.find( <query>, { <field> : 1 })
Example:
// Return all restaurant inspections - business name, result, and _id fields only
db.inspections.find(
: "Restaurant - 818" },
{ sector: 1, result: 1 }
{ business_name )
Exclude a Field To exclude a field, set its value to 0 in the projection document.
Syntax:
db.collection.find(query, { <field> : 0, <field>: 0 })
Example:
// Return all inspections with result of “Pass” or “Warning” - exclude date and zip code
db.inspections.find(
: { $in: ["Pass", "Warning"] } },
{ result: 0, "address.zip": 0 }
{ date )
While the _id field is included by default, it can be suppressed by setting its value to 0 in any projection.
// Return all restaurant inspections - business name and result fields only
db.inspections.find(
: "Restaurant - 818" },
{ sector: 1, result: 1, _id: 0 }
{ business_name )
Counting Documents in a MongoDB Collection
Count Documents Use db.collection.countDocuments() to count the number of documents that match a query. countDocuments() takes two parameters: a query document and an options document.
Syntax:
db.collection.countDocuments( <query>, <options> )
The query selects the documents to be counted.
Examples:
// Count number of docs in trip collection
db.trips.countDocuments({})
// Count number of trips over 120 minutes by subscribers
db.trips.countDocuments({ tripduration: { $gt: 120 }, usertype: "Subscriber" })
MongoDB Aggregation
Definitions Aggregation: Collection and summary of data
Stage: One of the built-in methods that can be completed on the data, but does not permanently alter it
Aggregation pipeline: A series of stages completed on the data in order
Structure of an Aggregation Pipeline
db.collection.aggregate([
{$stage1: {
{ expression1 },
{ expression2 }...
},$stage2: {
{ expression1 }...
}
} ])
Using $match
and $group
Stages in a MongoDB Aggregation Pipeline Review the following sections, which show the code for the $match
and $group
aggregation stages.
$match
The $match stage filters for documents that match specified conditions. Here’s the code for $match:
{$match: {
"field_name": "value"
} }
$group
The $group stage groups documents by a group key.
{$group:
{: <expression>, // Group key
_id<field>: { <accumulator> : <expression> }
} }
$match
and $group
in an Aggregation Pipeline
The following aggregation pipeline finds the documents with a field named “state” that matches a value “CA” and then groups those documents by the group key “$city” and shows the total number of zip codes in the state of California.
db.zips.aggregate([
{ $match: {
: "CA"
state
}
},
{$group: {
: "$city",
_id: { $count : { } }
totalZips
}
} ])
Using $sort
and $limit
Stages in a MongoDB Aggregation Pipeline
Review the following sections, which show the code for the $sort
and $limit
aggregation stages.
$sort
The $sort stage sorts all input documents and returns them to the pipeline in sorted order. We use 1 to represent ascending order, and -1 to represent descending order.
{$sort: {
"field_name": 1
} }
$limit
The $limit stage returns only a specified number of records.
{$limit: 5
}
$sort
and $limit
in an Aggregation Pipeline
The following aggregation pipeline sorts the documents in descending order, so the documents with the greatest pop value appear first, and limits the output to only the first five documents after sorting.
db.zips.aggregate([
{$sort: {
: -1
pop
}
},
{$limit: 5
} ])
Using $project, $count, and $set Stages in a MongoDB Aggregation Pipeline
$project
The $project stage specifies the fields of the output documents. 1 means that the field should be included, and 0 means that the field should be supressed. The field can also be assigned a new value.
{$project: {
:1,
state:1,
zip:"$pop",
population:0
_id
} }
$set
The $set stage creates new fields or changes the value of existing fields, and then outputs the documents with the new fields.
{$set: {
: {
place$concat:["$city",",","$state"]
},:10000
pop
} }
$count
The $count stage creates a new document, with the number of documents at that stage in the aggregation pipeline assigned to the specified field name.
{$count: "total_zips"
}