Overview

MongoDB is a scalable, high-performance, open source, structured storage system. MongoDB provides JSON-style document-oriented storage with full index support, sharding, sophisticated replication, and compatibility with the MapReduce paradigm. MongoDB focuses on flexibility, power, speed, and ease of use.

Key Features

  • Designed to work seamlessly with object-oriented languages
  • Highly available when configured as a replica set
  • Fast read/write performance on GoGrid's SSD Cloud Servers
  • Flexible schema design
  • GoGrid's 1-Button Deploy™ for easy Development and Production environments
  • Preconfigured with OS-level performance tweaks
  • Deployed on a redundant 10-Gbps high-performance fabric

Data Center Availability

MongoDB 1-Button Deploy™ is currently available in US-West-1.

Development Architecture

Although MongoDB can be deployed in standalone mode for development instances, GoGrid 1-Button Deploy™ configures a replica set so that customers can test how true failover would work in production. The latest open source version of MongoDB is deployed automatically and configured to replicate data from the primary to the secondary replicas. 1-Button Deploy™ generates a replica set with 3 Cloud Servers. This architecture is well suited to testing MongoDB and learning its capabilities with small working sets.


1-Button Deploy™ MongoDB Development environment

  • 3 Medium SSD Cloud Servers
  • Latest version of open-source MongoDB
  • Configured as a replica set: one server set as primary, the rest as secondaries



MongoDB-dev.png

Production Architecture

The Production deployment is a replica set of 5 larger Cloud Servers that are designed to handle larger working sets and production-grade workloads. The Firewall Service is also deployed to provide additional protection. Although this architecture is designed to handle large working sets, it's not configured to handle sharding. If you require sharding, you'll need to manually configure these servers and deploy additional servers as required for a sharding setup (such as servers for mongos and the configuration servers).


1-Button Deploy™ MongoDB Production environment

  • 5 X-Large SSD Cloud Servers
  • Latest version of open-source MongoDB
  • Configured as a replica set: one server set as primary, the rest as secondaries
  • MongoDB performance tuning adjustments
  • Firewall Service enabled
    • All services are blocked except SSH (22) and Ping for public traffic




MongoDB-prod.png

Quick Start

After the 1-Button Deploy™ is complete, you'll have either 3 or 5 Cloud Servers deployed with MongoDB. You'll need to log in to the primary server of the replica set, either with a third-party client tool (like PuTTY) or through the Console service. You can find your login information on the Management Console password page. The primary server should be marked as '_1'.

Once you've logged in, you can interface with the database server (mongod) by typing the mongo command.

mongo

By default, mongo looks for a database server listening on port 27017 on the localhost interface. To connect to a server on a different port or interface, use the --port and --host options. If the server is the primary, the mongo prompt says so (for example, rs0:PRIMARY>). You can also verify this by running the command:

db.isMaster()

Make sure to capitalize the "M". This returns a JSON document that indicates which server is the primary. In general, the server with "01" in its name is the primary. You can spot it in the grid view and list view.

rs0:PRIMARY> db.isMaster()
 {
       "setName" : "rs0",
       "ismaster" : true,
       "secondary" : false,
       "hosts" : [
               "mongod-01:27017",
               "mongod-02:27017",
               "mongod-03:27017"
       ],
       "primary" : "mongod-01:27017",
       "me" : "mongod-01:27017",
       "maxBsonObjectSize" : 16777216,
       "maxMessageSizeBytes" : 48000000,
       "localTime" : ISODate("2014-01-30T17:38:46.316Z"),
       "ok" : 1
 }
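
The fields in this document can also be checked programmatically. The following is a minimal sketch in plain JavaScript, runnable under Node.js. The helper name describePrimary is hypothetical and not part of MongoDB, and the sample object simply copies a few fields from the output above; it only assumes an object with the same shape as the db.isMaster() result.

```javascript
// Hypothetical helper (not a MongoDB API): report where writes should go,
// based on a document shaped like the db.isMaster() result.
function describePrimary(info) {
  if (info.ismaster) {
    return "connected to primary " + info.me;
  }
  if (info.primary) {
    return "not primary; reconnect to " + info.primary;
  }
  return "no primary currently known";
}

// Sample values copied from the output shown above.
const sample = {
  setName: "rs0",
  ismaster: true,
  secondary: false,
  primary: "mongod-01:27017",
  me: "mongod-01:27017"
};

console.log(describePrimary(sample));
```

The third branch covers the case where the document lists no primary at all, which can happen while an election is in progress.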

If this server isn't the primary, log into the server listed as "primary" in the document. Once you're inside the mongo shell, you can enter commands to either view data or insert data. Our replica sets are configured with auth enabled and mongod running. Follow these instructions to add the administrator, assuming that authentication is already enabled.

The details of what commands you can use are in the MongoDB manual. Try out this tutorial to get to know MongoDB a little better.

Basic Concepts

MongoDB is described as a document-oriented database. This doesn't mean that it's composed of Word files. Unlike in an RDBMS, you're not creating a table with columns and data types. MongoDB can accept just about any data type and dynamically creates the schema when you load data.

  • A document is the basic unit of data in MongoDB and is a record in a MongoDB collection. Documents are like JSON (JavaScript Object Notation) objects but exist in the database in a format known as BSON (a portmanteau of Binary and JSON).
  • A collection is a grouping of documents. This is equivalent to an RDBMS table; however, it doesn't have a set schema. For example, documents within a collection can have different fields.
  • A database in MongoDB is the same as an RDBMS database. It contains one or many collections.
  • A replica set is a cluster of MongoDB servers configured for high availability. It has a primary server and several secondaries.
  • The naming convention for all the servers is MongoDB_index_number, where MongoDB is the Big Data technology on the replica set, index is a generated code, and number is a counter.
  • The primary server is always assigned the number 1.


MongoDB       RDBMS      MongoDB Notes
Document      Row        No pre-defined types
Collection    Table      Flexible schema
Database      Database
Replica Set   Cluster
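
To make the "flexible schema" idea concrete, here is a small sketch in plain JavaScript (runnable under Node.js). An array stands in for a collection, and the two documents, including the throws field, are made up for illustration. The hypothetical helper fieldNames shows that documents in one collection do not have to share the same fields.

```javascript
// Two illustrative documents in the same "collection". The values, and the
// "throws" field on the second document, are made up for this sketch.
const players = [
  { role: "catcher", names: ["Posey", "Sanchez", "Quiroz"], moddate: "10-31-2013" },
  { role: "pitcher", throws: "left" }
];

// Collect every field name that appears in at least one document.
function fieldNames(docs) {
  const seen = new Set();
  for (const doc of docs) {
    for (const key of Object.keys(doc)) {
      seen.add(key);
    }
  }
  return Array.from(seen).sort();
}

console.log(fieldNames(players));
```

In an RDBMS table, every row would need a column for each of these fields; here each document carries only the fields it uses.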

Sample Document Structure

A document or record in MongoDB is stored as BSON and can be pulled by a query. None of this data is pre-loaded into MongoDB; it is shown for illustrative purposes only.


db.giants.find( {role: "catcher"} )

This queries the collection "giants" for documents whose "role" field matches the value "catcher". You will get the following result:


{
   "_id": ObjectId("5274276e009c1cb57a425910"),
   "role": "catcher",
   "names": [ "Posey", "Sanchez", "Quiroz"],
   "moddate": "10-31-2013"
}

This is an example of what a MongoDB document can look like. Documents are organized as key (field) and value pairs. It's recommended to keep the keys within a document unique.

  • _id is included with every document and is automatically generated by MongoDB if you don't supply one. This is the primary key.
  • role is the key and "catcher" is the value
  • names is an example of a JSON array of strings
  • moddate is the key and "10-31-2013" is the value
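
The matching behavior of the query above can be approximated in plain JavaScript (runnable under Node.js). This is only a sketch of top-level equality matching, not how MongoDB evaluates queries internally. The helper name findMatching is hypothetical, and the second document is made up so that the filter has something to exclude.

```javascript
// Stand-in for the "giants" collection. The first document is the one shown
// above (without _id); the second is made up for this sketch.
const giants = [
  { role: "catcher", names: ["Posey", "Sanchez", "Quiroz"], moddate: "10-31-2013" },
  { role: "shortstop", names: ["Crawford"], moddate: "10-31-2013" }
];

// Simplified equality matching: keep documents in which every key in the
// query has exactly the same top-level scalar value.
function findMatching(docs, query) {
  return docs.filter((doc) =>
    Object.keys(query).every((key) => doc[key] === query[key])
  );
}

const result = findMatching(giants, { role: "catcher" });
console.log(result);
```

Real MongoDB queries support much more than this (operators, nested fields, array matching); the sketch only mirrors the single equality condition used in the example.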

Limitations

  • The maximum BSON document size is 16 MB. If you need larger documents, then refer to the MongoDB documentation regarding GridFS.
  • _id is the primary key and must be unique in a collection
  • The order of keys in a document may change

Importing Data

You can create your own data in MongoDB manually or through your own application. You can also import data using the mongoimport command. Because MongoDB documents are JSON-like, data in JSON format works best; however, mongoimport can also load other formats such as CSV (see its --type option). The following is an example of the mongoimport command:

mongoimport --db test --collection collection --file file.json
  • test is the name of the database that you want to load your data into
  • collection is the name of the collection within the database that you want to use. This can be created on the fly through this command
  • file.json is the name of the file on the server that you want to load
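
If your data starts out in an application rather than in a file, you first need to produce file.json. The sketch below, in plain JavaScript for Node.js, writes one JSON document per line, which is a layout mongoimport accepts when the --jsonArray option is not used. The helper name toImportLines is hypothetical, and the sample document reuses the illustrative values from earlier on this page.

```javascript
// Serialize each document on its own line (newline-delimited JSON).
function toImportLines(docs) {
  return docs.map((doc) => JSON.stringify(doc)).join("\n") + "\n";
}

// Illustrative data only; replace with your own documents.
const docs = [
  { role: "catcher", names: ["Posey", "Sanchez", "Quiroz"], moddate: "10-31-2013" }
];

// Print the file contents. Redirect this output to file.json (or write it
// with Node's fs module) before running mongoimport.
process.stdout.write(toImportLines(docs));
```

No _id is included here, so MongoDB would generate one for each imported document.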

Replica Sets

A MongoDB cluster is called a replica set. It's similar to an RDBMS cluster in that there is a primary server and secondary servers. Replica sets are used for fault tolerance and high availability. If one of the servers in the replica set goes down, there are still other replicas that will have the data available. The primary is an important member because it is the only one that can receive write operations. The secondaries apply the operations from the primary so that they have the same data set. Because replication is asynchronous, the secondaries may in some cases return older or inconsistent data.


Verifying the Replica Set

Make sure that you're connected to the primary server and start the mongo shell. Enter the following method:

rs.status()

This will return a document with all the members of the replica set. If you have deployed a development replica set then you should see three entries. The secondaries will also indicate that they are syncing with the primary.

{
       "set" : "rs0",
       "date" : ISODate("2014-02-04T01:00:42Z"),
       "myState" : 1,
       "members" : [
               {
                       "_id" : 0,
                       "name" : "mongod-01:27017",
                       "health" : 1,
                       "state" : 1,
                       "stateStr" : "PRIMARY",
                       "uptime" : 373581,
                       "optime" : Timestamp(1391102214, 1),
                       "optimeDate" : ISODate("2014-01-30T17:16:54Z"),
                       "self" : true
               },
               {
                       "_id" : 1,
                       "name" : "mongod-02:27017",
                       "health" : 1,
                       "state" : 2,
                       "stateStr" : "SECONDARY",
                       "uptime" : 151718,
                       "optime" : Timestamp(1391102214, 1),
                       "optimeDate" : ISODate("2014-01-30T17:16:54Z"),
                       "lastHeartbeat" : ISODate("2014-02-04T01:00:41Z"),
                       "lastHeartbeatRecv" : ISODate("2014-02-04T01:00:41Z"),
                       "pingMs" : 0,
                       "syncingTo" : "mongod-01:27017"
               },
               {
                       "_id" : 2,
                       "name" : "mongod-03:27017",
                       "health" : 1,
                       "state" : 2,
                       "stateStr" : "SECONDARY",
                       "uptime" : 373428,
                       "optime" : Timestamp(1391102214, 1),
                       "optimeDate" : ISODate("2014-01-30T17:16:54Z"),
                       "lastHeartbeat" : ISODate("2014-02-04T01:00:41Z"),
                       "lastHeartbeatRecv" : ISODate("2014-02-04T01:00:41Z"),
                       "pingMs" : 1,
                       "syncingTo" : "mongod-01:27017"
               }
       ],
       "ok" : 1
}
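
For a quicker read of this output, the member list can be reduced to one line per server. The following sketch is plain JavaScript for Node.js. The helper name summarizeMembers is hypothetical, and the status object copies only the relevant fields from the output above.

```javascript
// Hypothetical helper: list each member's state and flag members that are
// not healthy (health !== 1) in a document shaped like the rs.status() result.
function summarizeMembers(status) {
  return status.members.map((m) => {
    const flag = m.health === 1 ? "ok" : "UNHEALTHY";
    return m.name + " " + m.stateStr + " " + flag;
  });
}

// Fields copied from the sample output above.
const status = {
  set: "rs0",
  members: [
    { name: "mongod-01:27017", health: 1, state: 1, stateStr: "PRIMARY" },
    { name: "mongod-02:27017", health: 1, state: 2, stateStr: "SECONDARY" },
    { name: "mongod-03:27017", health: 1, state: 2, stateStr: "SECONDARY" }
  ]
};

summarizeMembers(status).forEach((line) => console.log(line));
```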

You can also check replication lag. This is worth monitoring, especially in deployments across data centers, because high lag can lead to inconsistent reads.

In the mongo shell of the primary, enter:

 db.printSlaveReplicationInfo()

This will print the time at which each secondary member last read from the oplog.

source:   mongod-02:27017
        syncedTo: Thu Jan 30 2014 09:16:54 GMT-0800 (PST)
                = 373457 secs ago (103.74hrs)
source:   mongod-03:27017
        syncedTo: Thu Jan 30 2014 09:16:54 GMT-0800 (PST)
                = 373457 secs ago (103.74hrs)
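
Lag can also be estimated from the optimeDate fields in a status document shaped like the rs.status() result. The sketch below is plain JavaScript for Node.js. The helper name replicationLagSeconds is hypothetical, and the optimeDate for mongod-03 has been moved back by 10 seconds purely to show a non-zero result. Unlike the output above, it compares each secondary with the primary rather than with the current time.

```javascript
// Hypothetical helper: for each secondary (state 2), report how many seconds
// its last applied operation trails the primary's (state 1).
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.state === 1);
  if (!primary) {
    return null; // no primary, so there is nothing to compare against
  }
  const lag = {};
  for (const m of status.members) {
    if (m.state === 2) {
      // Subtracting two Date objects yields milliseconds.
      lag[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
    }
  }
  return lag;
}

// Illustrative status document; mongod-03 is deliberately 10 seconds behind.
const status = {
  members: [
    { name: "mongod-01:27017", state: 1, optimeDate: new Date("2014-01-30T17:16:54Z") },
    { name: "mongod-02:27017", state: 2, optimeDate: new Date("2014-01-30T17:16:54Z") },
    { name: "mongod-03:27017", state: 2, optimeDate: new Date("2014-01-30T17:16:44Z") }
  ]
};

console.log(replicationLagSeconds(status));
```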


Automatic Failover

A heartbeat (ping) runs between the replicas in a replica set every 2 seconds. When the primary doesn't respond to the other replicas in the set for more than 10 seconds, the replica set selects another member to become the new primary, and the old primary is marked as inaccessible. The selection is based on voting: a secondary that receives a majority of the votes is elected as the new primary. At that point, it can receive writes and acts like any other primary member of a replica set.


  • It's a best practice to use an odd number of replica set members. This ensures that an election will have a quorum. The minimum size for a replica set is 3 members.
  • Elections are run automatically. When an election occurs, MongoDB cannot accept writes because there is no primary.
  • A previously unavailable primary can return to the replica set, typically as a secondary. If it contains data that was not replicated to the previous secondaries, a rollback will occur to maintain consistency with the replica set. Rollback data is stored in BSON files in the rollback/ folder under the database's dbpath directory.
  • The MongoDB wiki has additional information if you need to troubleshoot issues with your replica set.
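
The arithmetic behind the odd-number recommendation can be sketched in a few lines of plain JavaScript (runnable under Node.js). The function names are hypothetical, and the sketch assumes that every member holds exactly one vote.

```javascript
// Votes needed to elect a primary: a strict majority of the voting members.
function votesNeeded(members) {
  return Math.floor(members / 2) + 1;
}

// Members that can be unavailable while a majority can still be formed.
function faultTolerance(members) {
  return members - votesNeeded(members);
}

for (const n of [3, 4, 5]) {
  console.log(
    n + " members: majority " + votesNeeded(n) +
    ", tolerates " + faultTolerance(n) + " down"
  );
}
```

Under these assumptions, a 4-member set tolerates no more failures than a 3-member set, which is why adding a single server to an odd-sized set does not improve availability.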

Performance Tuning

MongoDB recommends specific OS-level adjustments to tune the performance of a production MongoDB replica set. The following have been implemented by GoGrid in the MongoDB 1-Button Deploy™.


  • x64 OS (MongoDB is limited to 2.5GB on x86)
  • Ulimit settings configured as follows:
    • -f (file size): unlimited
    • -t (cpu time): unlimited
    • -v (virtual memory): unlimited
    • -n (open files): 64000
    • -m (memory size): unlimited
    • -u (processes/threads): 32000
  • TCP KeepAlive set to 300 seconds
  • Read Ahead settings set to 32 blocks (16K)
  • noatime option for mounted drives

You can review the MongoDB recommended settings here.

Getting Help

See our Support page for information on contacting GoGrid for any questions or issues that arise.

If you need help with MongoDB, the MongoDB online manual is the best place for information.

Frequently Asked Questions

  • Is there a 1-Button Deploy™ that sets up a sharded environment?
At this time, there is no 1-Button Deploy™ for MongoDB sharding.
  • I need more servers in my replica set. How does that work?
If you have a full account, you have the ability to add additional servers. It's recommended to deploy the same type of servers that you already have in your current replica set. There is no automation for this, so you'll need to install mongod on the new server and update your configuration to start using that server as a secondary.
  • What the heck is this sharding thing everyone is talking about?
MongoDB's strategy for horizontal scaling is very similar to an RDBMS. If you find that your working set is too large to fit into the largest servers that we have available, then you are a candidate for sharding. Typically this means that your working set is larger than the RAM available on your servers. Sharding means to segregate your working set into different replica sets so that you can scale above and beyond the RAM available on your servers. Sharding is a more advanced configuration and should be implemented early in the process if you already know that you'll need it. It'll require a significant increase in the number of servers required and additional configuration.
  • Which member in the replica set is an arbiter?
GoGrid's configurations do not use an arbiter. We recommend using an odd number of replica set members. You can manually add an arbiter if you deem this necessary.
  • How do I write a join?
MongoDB doesn't support joins. You'll need to write separate queries to pull data out of different collections and combine the results in your application.
  • Is traffic to and from MongoDB encrypted?
No, MongoDB connections aren't encrypted. We recommend keeping MongoDB replica sets with sensitive information behind a firewall and communicating via private network when possible.
  • Are there transactions in MongoDB?
MongoDB is not ACID compliant so there is no built-in support for transactions. You only have atomicity for single documents. It'll be a challenge to model an architecture that has shared state across multiple collections.
  • What does noatime do and where is noexec?
We mount volumes with the noatime option to improve performance. It prevents access-time writes from happening during reads. Noexec is recommended for the data directory for MongoDB. However, in 1-Button Deploy™ the data directory is on the root volume, so it's not a feasible option. If you attach a block storage volume and intend to use that for the data directory, you can mount that volume with the noatime and noexec options.
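
Regarding the join question above: because joins are not available, related data from two collections is typically combined in application code. The following is a minimal sketch in plain JavaScript for Node.js. The arrays stand in for the results of two separate queries, and the collections, field names, values, and the joinByKey helper are all made up for illustration.

```javascript
// Stand-ins for the results of two separate queries; both collections and
// all field names here are hypothetical.
const players = [
  { _id: 1, name: "Posey", teamId: 10 },
  { _id: 2, name: "Sanchez", teamId: 10 },
  { _id: 3, name: "Quiroz", teamId: 99 }
];
const teams = [{ _id: 10, city: "San Francisco" }];

// Index one result set by _id, then attach the matching document to each
// document in the other set (null when there is no match).
function joinByKey(left, right, foreignKey) {
  const byId = new Map(right.map((doc) => [doc._id, doc]));
  return left.map((doc) => ({ ...doc, team: byId.get(doc[foreignKey]) || null }));
}

const joined = joinByKey(players, teams, "teamId");
console.log(joined);
```

Building the Map first keeps the combination step to a single pass over each result set instead of a nested loop.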