Riak is a distributed data store aimed at providing high reliability, fault tolerance and scalability. Riak’s data model is based around a key value pair . Objects stored in Riak can be any binary object such as an image, text, markup or serialized data such as JSON. In addition Riak offers a great complementary set of development tools that allows for rapid development on the system. Its API is RESTful and there are several popular bindings including Python, Erlang, Java, PHP, and .NET.
Riak has several ways to search and retrieve data. Riak has MapReduce built into it. MapReduce is a programming model used in the processing of large data sets in a parallel or distributed algorithm on a cluster. Riak supports full-text search as well as the use of secondary indexes which allow the tagging of objects and provide additional means of querying an object.
- Riak is a distributed data store
- Riak is fault tolerant
- Riak is scalable
- Riak utilizes a RESTful API with popular language bindings
- Riak has built in support for MapReduce
- Riak supports text search
- Riak supports using secondary indexes
- GoGrid's 1-Button Deploy™ for easy Development and Production environments
Data Center Availability
Riak 1-Button Deploy™ is currently available in US-West-1.
GoGrid 1-Button Deploy™ configures a 3-node cluster for Development environments. When developing, testing out the built-in replication and fault-tolerance of Riak is important. This is the best way to see how the masterless design works. It's rare to deploy Riak in a single node because it's intended to be a distributed database.
1-Button Deploy™ Riak Development environment
- 3 Large SSD Cloud Servers
- Latest version of open-source Riak
- Configured as a cluster
The Production deployment is a cluster of 5 larger cloud servers that are designed to handle larger data sets and production-grade workloads. This is the recommend size of production clusters (n+2) where n is the replication level for objects (default is 3). The Firewall Service is also deployed to provide additional protection. Riak is designed so that you can avoid sharding (and the associated expenses). Data in Riak is distributed across nodes using consistent hashing. Consistent hashing ensures data is evenly distributed around the cluster and new nodes can be added with automatic, minimal reshuffling of data. This significantly decreases risky “hot spots” in the database and lowers the operational burden of scaling. More information on this concept can be found at Basho.com
1-Button Deploy™ Riak Production environment
- 5 X-Large SSD Cloud Servers
- Latest version of open-source Riak
- Configured as a cluster
- Firewall Service enabled
- All services are blocked except SSH (22) and Ping for public traffic
After the 1-Button Deploy™ is complete, you'll have either 3 or 5 cloud servers deployed with Riak. Riak is a masterless system, meaning that all the nodes in the cluster are equal. The configuration should already have configured the Riak cluster to communicate with each each. However, you'll want to login to verify. Login to any member of the cluster either with a third-party client tool (like Putty) or through the Console service. You can find your login information on the Management Console password page.
Once you've logged in, you can verify that all members are part of the cluster by typing the following command:
This should give you a list of all the nodes in the cluster. If any are not communicating correctly or if any are missing, it should be readily apparent.
Most interactions with Riak are through its REST API or Protocol Buffers API, but there are also client libraries for Erlang, Java, Python, C/C++ and others. You can learn more from the Riak online manual.
Riak is described as a key/value store. It's similar to other key/value stores like Redis and DynamoDB. Riak can accept just about any data type and load it into the cluster.
- A object is the basic unit of data storage in Riak. It's identified by bucket, key, vector clock, and list of metadata-value pairs.
- A bucket is a grouping of keys. It's a higher level namespace that is equivalent to an RDBMS table.
- A node is a physical server within a Riak cluster. Riak is masterless meaning that all the nodes are the same.
- A cluster is grouping of nodes that communicate with each other and share data.
- The naming convention of all the servers are Riak_index_number.
- Riak is the Big Data technology on the cluster, the index is a generated code, and the number is a counter.
Sample Document Structure
A object in Riak can be pulled by a GET request. None of this data is pre-loaded into Riak, this is just for illustrative purposes.
curl -v -XGET 'http://localhost:8098/riak/giants/catcher'
You can replace "localhost" with the IP address of the node you want to connect to or its qualified domain name. You will get the following result:
HTTP/1.1 200 OK X-Riak-Vclock: a85hYGBgzGDKBVIcypz/fvqvuh6RwZTImMfK8DDoxWm+LAA= Vary: Accept-Encoding Server: MochiWeb/1.1 WebMachine/1.9.0 (someone had painted it blue) Link: </riak/giants>; rel="up" Last-Modified: Thur, 31 Oct 2013 21:56:23 GMT ETag: "11i3vg76V6YC26pjtYIYvD" Date: Thur, 31 Oct 2013 22:07:14 GMT Content-Type: text/plain Content-Length: 5 Posey
"Posey" is the value returned when calling for the giants/catcher object. This is in the message body. Other information contained in the object is returned in the header, along with the 200 status code.
- You can make the request for data from any node in the cluster
- By default, Riak stores 3 copies of an object in the cluster
- By default, Riak divides its hash ring into 64 partitions
Riak clusters are often referred to as a "ring". This is not a physical or network architecture but rather a way to visualize how objects are handled within a Riak cluster. Riak maps objects with a key hash. The maximum hash value is 2^160 and is divided into partitions. The default partition value is 64. You can image a ring, divided into 64 sections. So the object "giants/catcher" will fall into one of those partitions. Riak also maintains 3 copies of objects by default, so you can also find this object in 2 other partitions. This replication of objects is what gives Riak its fault tolerant characteristics. There is not a one-to-one mapping between partitions and nodes. It's possible to have more than one partition per node, a concept Riak calls vnodes (virtual node). This means that you can have more than one vnode per node.
Assuming a ring with 64 partitions, a deployment with 3 nodes will have approximately 21 vnodes per node. A deployment with 5 nodes will have approximately 13 vnodes per node. Each node is then associated with a partition/vnode in the ring. In this way, an object and its copies can be on different physical nodes. Because Riak is masterless, each node is aware of every other node in the cluster, their vnodes and other data.
There is no built-in tool for doing this. Basho has a tool that they use to migrate data from one Riak cluster to another called the "riak-data-migrator". You can pull it from Github.
If you need to import raw data, you can try the different client libraries.
See our Support page for information on contacting GoGrid for any questions or issues that arise.
If you need help for Riak, the Riak online manual is the best place for information.