I had initially placed a section on considerations at the top, but then I figured everyone would scroll past it anyway and go straight to the commands to run. So I’m taking this opportunity to simply let you know that it’s there and to encourage you to read it, particularly if you are reasonably new to performance testing middlewares or market data platforms.
This page assumes that you already know how to:
The only tools used here will be the performance tools provided by OpenMAMA; taskset is provided by the Red Hat / CentOS package managers. Specifically we will be using mamaproducerc_v2 and mamaconsumerc_v2. They should be in the bin directory of your release. If not, you can grab the latest release of OpenMAMA here. Note that if you are creating your own build of OpenMAMA from source, you will need to add the with_testtools=y parameter to your build command.
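For example, assuming a scons-based source build (an assumption on my part - adjust this to your own build system and usual build options), enabling the test tools might look something like:

./scons with_testtools=y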
Here are a few commands you can run. Note that the producer, in an attempt to throttle publish rates with high resolution, will actually pin a CPU core, so you’re advised to use taskset to isolate all commands. Also note the following command line parameters and what they do:
Parameter | Works for | Description |
---|---|---|
-? | all | Usage text |
-tport [transport] | all | This specifies the MAMA Transport to use |
-rt | producer | This will make the producer use CLOCK_REALTIME |
-m [middleware] | all | This is the middleware bridge to use |
-s [symbol] | all | This is the symbol name to use |
-csv [csv file] | consumer | This can be provided to include csv formatted output |
-percentiles [99,99.9] | consumer | This will calculate percentile latencies on the client |
You’ll obviously need to replace the values which are provided in our examples with your own values to taste.
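For example, a consumer run which writes csv output and calculates 99th and 99.9th percentile latencies might look like this (the csv file name is purely illustrative):

./mamaconsumerc_v2 -tport sub -s A -m qpid -csv results.csv -percentiles 99,99.9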
Note that by default the consumer measures the time between the MamaSendTime (which is injected by the producer into each message) and the client receive time. Other timestamps are also available - see the help texts for details.
Most middlewares perform best under this scenario. The idea is to literally make the publisher go as fast as it can go using a single topic.
So on the publish side, you run:
./mamaproducerc_v2 -tport pub -s A -m qpid -rt
And then on the subscribe side, you run:
./mamaconsumerc_v2 -tport sub -s A -m qpid
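Since the producer pins a CPU core as mentioned above, it’s worth running each process under taskset on its own core, for example (the core numbers here are purely illustrative - pick cores that make sense on your host):

taskset -c 2 ./mamaproducerc_v2 -tport pub -s A -m qpid -rt

taskset -c 3 ./mamaconsumerc_v2 -tport sub -s A -m qpid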
This is where cracks may start to appear. The example below will run the same test as above, but with 10,000 symbols rather than a single topic (you can also use a -f parameter to both commands if you have specific symbols in mind - see the example after the commands below):
./mamaproducerc_v2 -tport pub -numTopics 10000 -m qpid -rt
And then on the subscribe side, you run:
./mamaconsumerc_v2 -tport sub -genSyms 10000 -m qpid
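As a sketch of the -f option mentioned above - I’m assuming here that the file simply lists one symbol per line, and symbols.txt is just an illustrative file name, so check the help text for the exact format expected - you could run:

./mamaproducerc_v2 -tport pub -f symbols.txt -m qpid -rt

./mamaconsumerc_v2 -tport sub -f symbols.txt -m qpid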
A ramp test is a great way to get a decent handle on what your middleware can do and specifically how it behaves at different data rates in terms of latency vs throughput. It involves starting with a low rate, then gradually increasing it while observing the receive message rate and the latency. For this test, also take care to spot whether the producer or the consumer slowed down first.
This will run a ramp test where the data rate increases by 5% (the -steps parameter) every 10 seconds (-stepInterval), up to 100% of the target rate (-target).
So on the publish side, you run:
./mamaproducerc_v2 -tport pub -numTopics 10000 -m qpid -rt -target 10000 -steps 5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100 -stepInterval 10
And then on the subscribe side, you run:
./mamaconsumerc_v2 -tport sub -genSyms 10000 -m qpid
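The -steps list is just a comma separated set of percentages of the target rate, so you can make the ramp as coarse or as fine as you like. For example, to ramp in 10% increments and hold each step for 30 seconds (values purely illustrative):

./mamaproducerc_v2 -tport pub -numTopics 10000 -m qpid -rt -target 10000 -steps 10,20,30,40,50,60,70,80,90,100 -stepInterval 30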
A spike test is useful for deliberately introducing erratic data rates onto the backbone. Note that results should always be taken with a pinch of salt here though, since it’s not always guaranteed that the publisher is capable of publishing at the required burst rate, so low latencies may well be caused by the publisher not publishing fast enough rather than by messages genuinely being super fast (e.g. on a TCP based transport).
This test will publish at a -target rate of 10,000 msg/s, but will also run bursts every 10 seconds. By default these will be between a factor of 5 and 10 times the target publish rate, but that can be configured using the -burstLow and -burstHigh parameters.
./mamaproducerc_v2 -tport pub -numTopics 10000 -m qpid -rt -target 10000 -burst -burstInterval 10
And then on the subscribe side, you run:
./mamaconsumerc_v2 -tport sub -genSyms 10000 -m qpid
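For example, to make the bursts gentler, something like this might restrict them to between 2 and 4 times the target rate (the factors here are purely illustrative):

./mamaproducerc_v2 -tport pub -numTopics 10000 -m qpid -rt -target 10000 -burst -burstInterval 10 -burstLow 2 -burstHigh 4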
Before diving into any performance testing, I recommend going through some general things to consider when you’re getting to grips with whether or not a particular platform can satisfy your performance requirements.
Are you going for throughput? Latency? Best compromise of both? Providing a single number which encompasses how well a particular OpenMAMA stack is performing is generally a really bad idea unless you have a single, narrow use case, and you’re certain that the performance testing in question has satisfied it.
In reality there are many factors in play when you want to carry out performance testing:

* How many clients do you expect to be connected?
* What data rates are realistic for your use case?
* What latency and throughput do you actually need?
* What message sizes will be involved?
* How many symbols will be in play?
* How quickly can your consumers actually process messages?
* What hardware do you have available?
It’s easy to fall into the trap of thinking “it should be really fast with a latency of zero” without considering each of the above. Performance testing is always a tradeoff between lab conditions and real world scenarios. Without being cognisant of the above, you could get really good numbers which in no way translate to a real production scenario.
Now, let’s go through these points again and this time consider how this could radically impact performance:
Let’s say you didn’t consider how many clients were expected during your test. You could end up producing numbers which assume a single client. Now, if the middleware is multicast based, this might not be a big deal since each new client doesn’t place any real load on the publisher. However, if the publisher uses TCP, you could find that the number of clients makes an enormous difference to performance.
There’s little point in testing for fantasy scenarios (fantasy low or fantasy high). There’s especially little point in measuring latency while doing this. There are some middlewares which have hard-coded batching parameters which mean that (for example) at data rates of 1 msg/s you will consistently get latencies of 10ms, but the latency then goes down when the rates go up and batches start getting filled and flushed quicker. Similarly, there are other middlewares which can start off with very low latencies, but which will then deliberately give ground on latency in order to obtain better throughput at higher rates. Most middlewares will provide several parameters which can be used to tune this behaviour too.
What latency would be OK? What is the minimum throughput that you require? Aimlessly picking numbers with various vendors and comparing them is an interesting scientific experiment but it lacks practical application unless you have specific measurable targets in mind.
Message sizes are a big deal, especially for middlewares and payloads which have fat enveloping in play, where a large number of small messages might cause much more bandwidth on the wire than a smaller number of larger messages. If you have a data source in mind, use a typical message size from that as a starting point.
The behaviour here varies wildly across middlewares, but this is probably one of the most frequently overlooked yet critical components. Pumping lots of symbols through a system can cause linked list iteration and cache misses galore depending on the middleware. Again though, too many symbols gets into the fantasy realm and you can end up benchmarking a ridiculous scenario.
My application is leaking memory! Actually… no it’s not… that’s the middleware spooling messages for you because you can’t keep up. If you can’t process messages any faster than a certain rate, there’s little advantage in performance testing at higher rates.
Sometimes you have to live within your means. It’s possible to have a middleware which will give you outrageously good numbers, but will demand custom hardware and many cores.