Given enough coffee I could rule the world



Sorting by date in MongoDB

Sorting by date is a simple task in MongoDB. But that doesn't man you have to spend valuable resources on indexing timestamps in MongoDB. Here I present a method to sort documents via the already indexed id field.

Indexes

When designing a MongoDB database for your application, you need to consider how it will scale. One limiting factor is how you index you collections. Indexes allow for fast and efficient sorting and searching of documents in your collection by the indexed fields. This is possible because the fields are kept sorted in memory.

It is easy to see the limitation of this approach - once you have too many indexes, and they don't fit into your server's physical memory, then the least used indexes will be shunted into the page file / swap partition. When these indexes are then accessed, a painfully slow fetch time is experienced as the indexes are retrieved from the disk to be used.

You can't keep buying more RAM, and will eventually reach the upper limit of your server configuration. Therefore it is imperative that you analyse your users' workflow and index those fields which they are going to sort and search data via most frequently.

Dates

One of the most common fields to query by is the timestamp, often named created_at in MongoDB. This is an 8 byte integer, meaning that while it will only take up 8 Mb per million documents in a collection, it will likely be used in almost every collection in your Database.

You can easily save the 8 bytes per document by utilizing the BSON ObjectId.

BSON Object Id

BSON (Binary JSON) is the format in which data is stored in MongoDB. Every document in MongoDB is given a BSON Object Id as a unique identifier. They are 24 byte hexadecimal strings (you can even see one in the url of this post) This id has to always be indexed to allow for fast retrieval of a specific document. The ids follow a strict format:

The anatomy of a BSON Object Id

So the first four bytes of the id are the unix timestamp at the moment in which the document was first persisted to the database. This means the id can be used to sort documents by their creation date! Say for example I want to find the first post I wrote:

   
    db.posts.find().sort(_id: 1).limit(1)
  

Here is a quick refresher on the mongo query syntax. If you wanted the most recent (ie. descending order) then it would be sort(_id: -1).

Similiarly in ruby using the Mongoid ORM (Object Relational Mapper) gem, it would be:

  
    Posts.all.order_by(_id: 1).first
  

If you wanted to extract the timestamp from a document, you could add this function to your model definition:

  
  def created_at
    # convert the BSON::ObjectId to a hex string 
    hex_id = self._id.to_s
    # take only the first 8 bytes which correspond to the time  
    hex_timestamp = hex_string_id[0..7]
    # convert the hex string (base 16) to an integer 
    timestamp_integer = hex_timestamp.to_i(16)
    # convert the unix timestamp in seconds to a time object 
    #   (which is implicitly returned)
    timestamp = Time.at(timestamp_integer)
  end
  

Which can be condensed to:

  
    def created_at
      Time.at self.id.to_s[0..7].to_i(16)
    end
  

Sources