Running MongoDB on Windows? Get a fast disk

MongoDB’s design choice of delegating memory management to the underlying OS has a great impact on its performance and operation.

If you have a large, write-heavy application, you are in for some surprises. By default, every 60 seconds Windows blocks (for 5 to 20 seconds on our system) while the memory-mapped files are synchronously flushed to disk. Ouch!

On Linux, the same writes are asynchronous.
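To get a rough feel for how long a flush of dirty memory-mapped pages takes on a particular disk, here is a minimal Python sketch that uses only the standard library. It is an approximation, not mongod's code. It dirties every page of a temporary memory-mapped file and times how long `mmap.flush()` takes to return. The function name, the 64 MB default and the `directory` parameter are assumptions made for illustration. The exact flush guarantees also differ between Windows and Linux. For reference, the 60-second interval corresponds to mongod's `syncdelay` setting in the memory-mapped storage engine of this era.

```python
import mmap
import os
import tempfile
import time
from typing import Optional


def time_mmap_flush(size_mb: int = 64, directory: Optional[str] = None) -> float:
    """Dirty every page of a temporary memory-mapped file and time one flush.

    Hypothetical helper: returns the flush duration in seconds. Pass `directory`
    to place the temporary file on the disk you want to test.
    """
    size = size_mb * 1024 * 1024
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, size)  # give the file its full length before mapping it
        with mmap.mmap(fd, size) as mm:
            for offset in range(0, size, mmap.PAGESIZE):
                mm[offset] = 1  # touch one byte per page so every page is dirty
            start = time.perf_counter()
            mm.flush()  # ask the OS to write the dirty pages back to the file
            return time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
```

Running it a few times against each candidate disk gives a crude comparison. The stall mongod actually experiences depends on how much data is dirty when the periodic flush starts.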

Reactive Extensions are curing my throttling blues

It’s Copenhagen Jazz Festival in 5 weeks, but it’s Microsoft’s Reactive Extensions that are currently curing my throttling blues.

A few days ago, a data center administrator on one of the projects I am working on asked me to ensure that my services would not exceed a threshold of 10 evictions per second from a distributed cache.

I was able to solve this with 4 lines of code.

Here are the 4 lines: 

I have been using Reactive Extensions for 4 years, but I am still pleasantly surprised by how many different challenges I can solve in just a few lines.
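The original four lines used Rx in .NET and are not reproduced here. The sketch below is not Rx. It is a minimal plain-Python illustration of the same requirement: pacing eviction calls so that at most a given number are issued per second. The names `paced` and `action` are hypothetical. The clock and sleep functions can be injected so the pacing can be tested without real waiting.

```python
import time
from typing import Callable, Iterable


def paced(
    items: Iterable[str],
    max_per_second: float,
    action: Callable[[str], None],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call action(item) for each item, starting calls at least 1/max_per_second s apart.

    Hypothetical helper: spacing calls evenly avoids bursts that would exceed the limit.
    """
    interval = 1.0 / max_per_second
    next_slot = clock()
    for item in items:
        now = clock()
        if now < next_slot:
            sleep(next_slot - now)  # wait until the next free slot
            now = clock()  # re-read the clock in case sleep overshot
        action(item)
        next_slot = now + interval  # earliest start time for the next call
```

In a real service, `action` would be the cache's eviction call, for example `paced(keys, 10, cache_evict)` with a hypothetical `cache_evict` function. The defaults use `time.monotonic` and `time.sleep`.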

What do Business Process Design and Software Design have in common?


One generally accepted strategy for producing high-quality software is to adopt the SOLID principles when writing code.

The goal of adopting the SOLID principles is to produce high-quality, fine-grained components that can easily be rearranged to meet new business requirements.

This allows software teams to deliver faster and enables the business to achieve a faster time to market.

The “S” in SOLID refers to the Single Responsibility Principle. This principle states the following:

“The single responsibility principle states that every class should have a single responsibility, and that responsibility should be entirely encapsulated by the class. All its services should be narrowly aligned with that responsibility.”

If we substitute the word “class” above with a business process component, such as a “sub-process”, a “department” or a “role”, we discover a similarity in goals.

However, applying the Single Responsibility Principle in both business and software has its challenges.

In the software world, as Mark Seemann points out, applying the Single Responsibility Principle (SRP) leads to many small classes. On large projects, some developers who are new to the SOLID concepts feel overwhelmed by the amount of navigation needed to find out what a high-level component does. They can also be overwhelmed by the sheer number of ways in which the components can be combined or composed to create a solution.
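To make “many small classes” concrete, here is a minimal, hypothetical Python sketch. All class names are invented for illustration. A report that could have been one class doing everything is split into a source, a formatter and a coordinator. Each piece has a single responsibility, and the pieces are composed through the constructor.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Order:
    order_id: str
    amount: float


class OrderSource(Protocol):
    def load(self) -> List[Order]: ...


class InMemoryOrderSource:
    """Single responsibility: supply orders (here from a fixed list)."""

    def __init__(self, orders: List[Order]) -> None:
        self._orders = orders

    def load(self) -> List[Order]:
        return list(self._orders)


class CsvFormatter:
    """Single responsibility: turn orders into CSV text."""

    def format(self, orders: List[Order]) -> str:
        lines = ["order_id,amount"]
        lines += [f"{o.order_id},{o.amount:.2f}" for o in orders]
        return "\n".join(lines)


class OrderReport:
    """Single responsibility: coordinate loading and formatting."""

    def __init__(self, source: OrderSource, formatter: CsvFormatter) -> None:
        self._source = source
        self._formatter = formatter

    def render(self) -> str:
        return self._formatter.format(self._source.load())
```

Swapping `InMemoryOrderSource` for a database-backed source changes one small class and leaves the others untouched. The cost is that a reader must now visit three classes to see what the report does, which is exactly the navigation burden described above.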

In the business process world, as my colleague Erik Arnbak, founder at ChangeDriver, points out: “In today’s fast-paced and highly competitive marketplace, it is crucial for enterprises to be capable of change, in order to remain profitable and up-to-speed. However, change management has always been an issue as enterprise has difficulty identifying the right things to change.” The solution: a live business process directory.

My experience from both business process design and software design is that the benefits of having many reusable components or sub-processes outweigh the disadvantage of needing a “component directory” to discover and navigate large designs. A typical use of such a directory is finding out which component or department is responsible for a given action in a large design.
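At its simplest, a component directory is a reverse index from responsibilities to the components that own them. The sketch below is hypothetical. `ComponentDirectory`, `register`, `owner_of` and `responsibilities_of` are illustrative names, not ChangeDriver’s API. In keeping with the SRP, it refuses to let two components own the same responsibility.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ComponentDirectory:
    """Hypothetical directory mapping each responsibility to its single owning component."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}
        self._responsibilities: Dict[str, List[str]] = defaultdict(list)

    def register(self, component: str, responsibilities: List[str]) -> None:
        # Validate first so a conflict leaves the directory unchanged.
        conflicts = [r for r in responsibilities if self._owner.get(r) not in (None, component)]
        if conflicts:
            raise ValueError(f"already owned by another component: {conflicts}")
        for r in responsibilities:
            if self._owner.get(r) != component:  # skip duplicates
                self._owner[r] = component
                self._responsibilities[component].append(r)

    def owner_of(self, responsibility: str) -> Optional[str]:
        """Answer "which component is responsible for this action?"."""
        return self._owner.get(responsibility)

    def responsibilities_of(self, component: str) -> List[str]:
        return list(self._responsibilities.get(component, []))
```

For example, after `register("OrderService", ["validate order", "price order"])`, calling `owner_of("price order")` returns `"OrderService"`.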

I have found that I can use a business process tool like ChangeDriver to model, and provide a directory for, software I have built using the SOLID principles.

In ChangeDriver, I use the classic swim-lane diagram to achieve this.

[Image: ChangeDriver swim-lane diagram of the software components]

The diagram above was produced using ChangeDriver and is the blueprint of real-life software components running in production. In it, the swim-lanes represent the software components. The green boxes are the responsibilities assigned to a given software component. The yellow boxes are the inputs a software component requires to perform its responsibility; they may also represent the outputs or side-effects of the component.

From the diagram, it’s easy to get an overview of how many responsibilities each component has been allocated. This works in much the same way as constructor over-injection helps detect violations of the Single Responsibility Principle.
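Constructor over-injection can also be spotted mechanically. The sketch below is a hypothetical helper, not any particular tool. It uses Python’s `inspect` module to count explicit constructor parameters and flags classes above a threshold. The threshold of 4 and the example classes are assumptions. A high count is a hint to review a class, not proof that it violates the SRP.

```python
import inspect
from typing import Dict, Iterable, List


def constructor_dependencies(cls: type) -> List[str]:
    """Names of a class's explicit constructor parameters, excluding self, *args and **kwargs."""
    params = inspect.signature(cls.__init__).parameters.values()
    return [
        p.name
        for p in params
        if p.name != "self"
        and p.kind not in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    ]


def flag_over_injection(classes: Iterable[type], max_dependencies: int = 4) -> Dict[str, int]:
    """Map class name -> dependency count for classes above the (assumed) threshold."""
    flagged = {}
    for cls in classes:
        count = len(constructor_dependencies(cls))
        if count > max_dependencies:
            flagged[cls.__name__] = count
    return flagged


# Hypothetical example classes.
class PriceFeed:
    def __init__(self, source, parser) -> None: ...


class TradeManager:
    # Six collaborators: a likely sign of too many responsibilities.
    def __init__(self, feed, risk, ledger, notifier, auditor, cache) -> None: ...


print(flag_over_injection([PriceFeed, TradeManager]))  # {'TradeManager': 6}
```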

My ultimate goal is to use SOLID, ChangeDriver, domain-specific languages and code generation tools to produce software robots.

Custom query optimization for MongoDB yields great results

On a recent project for one of my customers (whom I cannot name for confidentiality reasons), I was pleasantly surprised by the load a small MongoDB cluster running on 4 cheap virtualized servers can cope with.

Admittedly, I did have to write some custom query optimization to achieve the throughput shown below.

[Image: throughput achieved by the MongoDB cluster]

The business use case I was trying to solve required the database to support rather complex compound queries that combine equality, range and sort conditions. The goal was to improve the performance of IO-bound queries with multiple ranges that needed to scan a large number of documents.

The typical recommendation for this type of complex query is to create a compound index. However, this strategy often leads to the anti-pattern of creating multiple compound indexes to cover different permutations of the query constraints.

In brief, my strategy to solve this problem was twofold:

  1. reduce the number of fields in the compound index by creating a custom index with only two fields.
  2. change the order of the query parameters sent to MongoDB based on the cardinality of the data in the custom index from step 1.

As of MongoDB 2.4.9, the order in which compound query constraints appear has a large impact on query performance.
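Step 2 can be sketched roughly as follows. The post does not say which direction the reordering went. This minimal Python sketch therefore assumes that fields with more distinct values in a sample are more selective and should come first. It only builds the query document and does not call MongoDB. Python dicts keep insertion order, so a driver that serializes dict keys in iteration order would send the fields in this order. The function and field names are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def estimate_cardinality(sample: Iterable[Dict[str, Any]], fields: List[str]) -> Dict[str, int]:
    """Count distinct values per field in a sample of documents (hypothetical helper)."""
    distinct: Dict[str, set] = {f: set() for f in fields}
    for doc in sample:
        for f in fields:
            if f in doc:
                distinct[f].add(repr(doc[f]))  # repr lets unhashable values (lists, dicts) be counted
    return {f: len(values) for f, values in distinct.items()}


def order_by_cardinality(query: Dict[str, Any], cardinality: Dict[str, int]) -> Dict[str, Any]:
    """Rebuild the query with the highest-cardinality (assumed most selective) fields first.

    sorted() is stable, so fields with equal cardinality keep their original order.
    """
    ordered = sorted(query, key=lambda f: cardinality.get(f, 0), reverse=True)
    return {f: query[f] for f in ordered}
```

For example, if `user_id` has far more distinct values than `status`, the query `{"status": "open", "user_id": 3}` is rewritten as `{"user_id": 3, "status": "open"}`. Whether this helps on a given MongoDB version should be checked with that version's query explain output.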

If you would like more details about how this was done, feel free to contact me.

The software challenges of building a massively scalable social trading/investment platform

From a software engineering viewpoint, my work on tradingfloor.com throws up unique and interesting challenges. We are combining the feature demands of the following types of platforms:

  1. scalable financial trading/investment platforms, e.g. SaxoTrader
  2. social media platforms à la Facebook/Twitter
  3. online news publication platforms such as the Financial Times
  4. online TV broadcasting platforms like Bloomberg TV
  5. and, last but not least, professional network platforms like LinkedIn

Here are a few of the broad challenges:

  1. reliability at scale (massive numbers of users and volumes of data)
  2. consistency at scale
  3. personalized real time content delivery at scale
  4. multiple channel real time notification at scale
  5. low latency

I will be blogging about some of the interesting software engineering solutions that apply to scenarios like this.
