Sunday, September 24, 2006

Wanted: A Data Pumping Tool

In the companies that I have been involved with in my Wall Street consulting career, it is remarkable how many systems do not have Unit Testing set up.

Concepts like Unit Testing, TDD, code metrics, etc. are just starting to make their way into development groups at investment banks (IBs). However, one area that has been ignored is stress and soak testing.

One of the tools that we need is what I refer to as a generic Data Pumper. This is a service that can be run to generate data of a certain shape, and pump the data into waiting applications. Some types of data that we may need to pump include quotes, executions, risk, etc.

Here are the features that I would like to see from a Data Pumper:

Playback Modes

We need to be able to replay the data with different timing patterns. We can also apply a distribution curve to the replay interval.

- Burst Mode: Play back data all at once, as fast as we can.
- Interval Mode: Play the data back at certain intervals. For example, play back 500 messages per second. We can also put some sort of distribution on the interval, so that the intervals are shortest (and the message rate highest) at the beginning and at the end of the playback period, simulating market open and close.
- Timed Mode: This would cause playback at the exact timings that actual data was generated. In this mode, we would have to first capture real data and record the exact time that the real data was received. Then we would play back the simulated data using the timings of the real data.
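To make the three modes concrete, here is a minimal Python sketch of how the delay before each message might be computed. Everything in it is an assumption made for illustration: the (timestamp, payload) message shape, the function names, and in particular the quadratic curve used to squeeze the intervals at the open and the close. A real pumper would probably want a curve fitted to captured data.

```python
import time
from typing import Callable, Iterator, List, Tuple

# Hypothetical message shape: (capture timestamp in seconds, payload).
Message = Tuple[float, str]


def burst_delays(messages: List[Message]) -> Iterator[float]:
    """Burst Mode: no waiting between messages."""
    for _ in messages:
        yield 0.0


def interval_delays(messages: List[Message], rate_per_sec: float) -> Iterator[float]:
    """Interval Mode: a fixed gap of 1/rate seconds before each message."""
    gap = 1.0 / rate_per_sec
    for _ in messages:
        yield gap


def u_shaped_gap(position: float, base_gap: float, peak_factor: float = 4.0) -> float:
    """Shrink the gap near the start (position 0.0) and end (position 1.0).

    The quadratic curve is an arbitrary illustrative choice: the message rate
    is peak_factor times higher at both ends than in the middle of the run.
    """
    multiplier = 1.0 + (peak_factor - 1.0) * (2.0 * position - 1.0) ** 2
    return base_gap / multiplier


def timed_delays(messages: List[Message]) -> Iterator[float]:
    """Timed Mode: reproduce the gaps between the captured timestamps."""
    previous = None
    for timestamp, _ in messages:
        yield 0.0 if previous is None else max(0.0, timestamp - previous)
        previous = timestamp


def play(messages: List[Message],
         delays: Iterator[float],
         send: Callable[[str], None],
         sleep: Callable[[float], None] = time.sleep) -> None:
    """Wait for each delay, then hand the payload to the send callback."""
    for (_, payload), delay in zip(messages, delays):
        if delay > 0:
            sleep(delay)
        send(payload)
```

For example, `play(captured, timed_delays(captured), print)` would replay a captured list at its original pace, and swapping in `interval_delays(captured, 500)` would give the 500 messages per second case. Passing `sleep` as a parameter makes it possible to test the scheduling without actually waiting.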

Transports

We need to be able to configure the transport mechanism by which the data is delivered to the waiting application.

- Tibco RV or EMS (Right now, most IBs use Tibco for the distribution of high-frequency data)
- LBM (a Tibco competitor)
- Sockets (or SmartSockets)
- MQ or MSMQ
- CPS (Morgan Stanley)
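One way to keep the transport configurable is to hide it behind a small interface, so the pumping loop never knows what is underneath. The Python sketch below is only an illustration of that idea: the `Transport` interface, the class names, and the `pump` function are all hypothetical, and the only real transport shown is a plain UDP socket from the standard library. Tibco, LBM, MQ, and the rest would each need an adapter written against the vendor's own client library, which is not shown here.

```python
import socket
from typing import Iterable, List, Protocol


class Transport(Protocol):
    """Hypothetical interface that every transport adapter would implement."""

    def send(self, payload: bytes) -> None: ...

    def close(self) -> None: ...


class UdpTransport:
    """Plain UDP socket transport, using only the standard library."""

    def __init__(self, host: str, port: int) -> None:
        self._address = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(self, payload: bytes) -> None:
        self._sock.sendto(payload, self._address)

    def close(self) -> None:
        self._sock.close()


class InMemoryTransport:
    """Collects payloads in a list; handy for testing the pumper itself."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []
        self.closed = False

    def send(self, payload: bytes) -> None:
        self.sent.append(payload)

    def close(self) -> None:
        self.closed = True


def pump(payloads: Iterable[bytes], transport: Transport) -> int:
    """Send every payload through the transport and return how many went out."""
    count = 0
    try:
        for payload in payloads:
            transport.send(payload)
            count += 1
    finally:
        transport.close()
    return count
```

With this shape, choosing a transport becomes a configuration decision: the same `pump` call works whether it is handed a `UdpTransport("localhost", 9000)` (host and port made up for the example) or an `InMemoryTransport()`.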

Data Generation

- Capture actual data for several days in order to provide some reference data
- We can tag certain fields for random data generation. For example, we can vary the prices of the various instruments.
- We can generate completely random data.
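As a rough sketch of the "tag certain fields" idea, the Python below takes captured records and regenerates only the fields that have been tagged, leaving everything else as captured. The record layout (plain dictionaries), the field names, and the `jitter` and `synthesize` helpers are assumptions made up for this example.

```python
import random
from typing import Callable, Dict, Iterable, Iterator, Optional

Record = Dict[str, object]
# A field generator takes the captured value and a random source,
# and returns the value to pump instead.
FieldGenerator = Callable[[object, random.Random], object]


def jitter(pct: float) -> FieldGenerator:
    """Vary a numeric field by up to +/- pct of its captured value."""
    def generate(value: object, rng: random.Random) -> object:
        return round(float(value) * (1.0 + rng.uniform(-pct, pct)), 4)
    return generate


def synthesize(captured: Iterable[Record],
               tagged: Dict[str, FieldGenerator],
               seed: Optional[int] = None) -> Iterator[Record]:
    """Yield copies of the captured records with the tagged fields regenerated."""
    rng = random.Random(seed)
    for record in captured:
        out = dict(record)  # copy, so the reference data is never modified
        for field, generate in tagged.items():
            if field in out:
                out[field] = generate(out[field], rng)
        yield out
```

For instance, `synthesize(captured, {"price": jitter(0.01)}, seed=42)` would keep symbols and sizes as recorded while moving each price by up to 1%. Fixing the seed makes a run repeatable, which matters when you want to compare two builds of the target application under the same load.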

Formats

XML seems to be used in many places, but you pay for it with the latency of deserialization. Binary objects are fast, but they necessitate a homogeneous environment, since both ends must agree on the object layout.

- XML
- Tibco binary message map
- delimited strings
- binary object
- Fixed-length ASCII
- Reuters (Craig will tell me about the legality of simulating data in Reuters format)
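To illustrate what supporting several formats might look like, here is a small Python sketch that renders one quote as XML, as a delimited string, and as fixed-length ASCII. The quote fields, the values, the element names, and the 8-character column width are all invented for the example; the vendor formats (Tibco message maps, Reuters) are deliberately left out because they depend on proprietary libraries.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical quote layout; real feeds carry many more fields.
FIELDS = ("symbol", "bid", "ask")
QUOTE = {"symbol": "IBM", "bid": "81.50", "ask": "81.52"}


def to_xml(quote: Dict[str, str]) -> str:
    """One child element per field under a <quote> root."""
    root = ET.Element("quote")
    for name in FIELDS:
        ET.SubElement(root, name).text = quote[name]
    return ET.tostring(root, encoding="unicode")


def to_delimited(quote: Dict[str, str], sep: str = "|") -> str:
    """Field values in a fixed order, joined by a separator."""
    return sep.join(quote[name] for name in FIELDS)


def to_fixed_width(quote: Dict[str, str], width: int = 8) -> str:
    """Each field padded with spaces (or truncated) to exactly width characters."""
    return "".join(quote[name].ljust(width)[:width] for name in FIELDS)


print(to_xml(QUOTE))        # <quote><symbol>IBM</symbol><bid>81.50</bid><ask>81.52</ask></quote>
print(to_delimited(QUOTE))  # IBM|81.50|81.52
print(repr(to_fixed_width(QUOTE)))  # 'IBM     81.50   81.52   '
```

The XML version is more than four times the size of the delimited one for the same three fields, which gives a feel for why the format matters when you are pumping thousands of messages per second.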

Other Considerations

- Instead of sending data directly to the end application, we can send it to an object cache, and let the object cache handle distribution.

- We need a GUI for monitoring the transmission of data, and controls to let the user dynamically modify the timing intervals.

- We need to have probes in the target application so we can monitor its performance in real time under various loads.

7 comments:

Anonymous said...

http://www.codestreet.com/

These guys have a Replay Service.
Maybe it is similar to what you are looking for.

Anonymous said...

http://www.codestreet.com/

They also have the RMDS replay service. They have a product, so it must
be legal.

marc said...

Hmmm... CodeStreet, huh?

My old colleague, Maldon, has a wife who used to work at CodeStreet. Hey Robert, care to comment?

Also, Mark Pollack, head of Spring.Net, is also CEO of CodeStreet. I will make sure to ask him when I see him.

(Marc->Donnan->Maldon->Pollack->Marc ... 1 degree of separation on Wall Street in common)

Anonymous said...

The latest addition to Visual Studio Team System, Visual Studio for Database Professionals (http://msdn.microsoft.com/vstudio/teamsystem/products/dbpro/ ) will let you do some of what you want. But it is really built for making database development more agile (and embraces the idea of Unit Testing for relational databases). That being said, the eventual price tag (~$5000 like all VS Team System editions, but this has not been officially announced, so it may change) is expensive. The upside: if you already own Visual Studio Team Suite, you should be getting this for free. It is currently in beta, so you can download and test it for yourself.

As for testing and replaying web services, I use MindReef's SoapScope http://www.mindreef.com/products/soapscope/index.php . But this isn't perfect either, since it does not understand stuff like WS-Security (yet, but I hear support will be released soon).

Robert Maldon said...

yeah, small world, particularly if you are talking about such a small niche industry like IT in Finance.

I'm sure Mark Pollack appreciates the promotion to CEO that you gave him :)

I think you've covered the base functionality for a data pumping tool. You seem to be after both a "replay" and a load test tool, which are usually separate beasts (but no reason you can't combine them).

I've written extensions to JMeter to do this sort of thing in the past, but you'd probably want a .NET tool.

Most implementations of such a tool I've seen are developed in-house (not commercial, because they are very specific to an individual bank's environment) and driven from XML files or a database (e.g. the tool will read from an XML file, buffer in memory, then pump out to a transport).

Other transports you might want to consider: FIX and AMQ (http://www.infoq.com/news/amq).

You will need some way to verify an application has received all messages and processed those messages correctly. Spring.NET has some handy AOP for that kind of thing :)

marc said...

Silly me, forgetting about FIX! Also, add to that FpML, FIXML, and SWIFT.

Anonymous said...

I am assuming here that there really is no solution out there at the moment? I am looking for the same tool/product to aid with real world testing, etc.