Registration Server Redundancy

Apr 9, 2012 at 8:38 PM

I apologize in advance; I am new to distributed computing and want to give sufficient detail about the design of our system.

Background & Design:

We are an organization with a large number of systems built around a core OLTP system (which I will refer to as our Case Management System) that works with entities we will refer to as Cases.

The case management system supports case life cycle management across a number of products for multiple clients, and is relatively extensible. Its biggest shortcoming is that it has a number of legacy clients but no central service tier (clients include an intranet portal for internal employees, an internet portal for external users, and tablet devices for both internal and external users).

Our business needs within each product managed in the case management system vary greatly, and we have a number of ancillary sub-systems that extend the core case management capabilities for additional needs: document generation (letters & fax), report generation, real-time export to client systems, and telephony integration.

As we approach a systemic refactor of the clients participating in this case management platform, I would like to start building the central service tier we currently lack. The general approach we would like to take is to build a cluster of service hosts (collectively known as the Case Action Controller) that would receive a message from any of the case management clients at predefined events in the case life cycle, and forward that message to services that have subscribed with the Case Action Controller to receive specific messages (each of these known as a Case Action Listener).

Case Management clients will always address the Case Action Controller through a single load-balanced address. The node in the cluster that receives the message will forward the message to any registered Case Action Listeners that have expressed an interest in cases of that type (the criteria for this determination aren't pertinent right now).
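To make the forwarding step concrete, here is a minimal single-node sketch in Python. It is not MPAPI code, and every name in it (CaseEvent, ControllerNode, register, dispatch) is hypothetical; it also assumes that matching is done on case type alone, which simplifies the real criteria.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class CaseEvent:
    """Hypothetical message a client sends at a case life cycle event."""
    case_id: str
    case_type: str
    event: str


# A listener is modelled as a plain callable; in a real system it would be
# a remote service endpoint.
Listener = Callable[[CaseEvent], None]


class ControllerNode:
    """One node of the (hypothetical) Case Action Controller cluster."""

    def __init__(self) -> None:
        # case_type -> listeners subscribed to that case type
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def register(self, case_type: str, listener: Listener) -> None:
        """Subscribe a Case Action Listener to one case type."""
        self._listeners[case_type].append(listener)

    def dispatch(self, event: CaseEvent) -> int:
        """Forward the event to every matching listener; return the count."""
        targets = self._listeners.get(event.case_type, [])
        for listener in targets:
            listener(event)
        return len(targets)
```

For example, registering a list's append method as a listener for one case type and then dispatching an event of that type delivers the event to that list only; events of other types reach no one.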

As our business works, I can afford a single point of failure at the Case Action Listener level (I can requeue most of this manually at a later time), but not within the Case Management System or the Case Action Controller.

I would like to use MPAPI to build this Case Action Controller tier (partly to handle load, partly to avoid the controller becoming a single point of failure). I would like to start up N nodes on N hosts, all listening on the same port.


Is MPAPI a good fit for me? Part of what I am concerned with is that I actually need a cluster infrastructure that helps nodes maintain a consistent state for some of my use cases (i.e. the nodes need to share the collection of registered case action listeners so that they can forward case events to all appropriate listeners); otherwise it would be a simple web farm.
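The shared-state requirement can be illustrated with a small in-process simulation in Python. This is only a sketch of the behaviour I need, not of how MPAPI works: the Node class, the join helper, and the naive "copy every registration to all peers" scheme are all assumptions made for illustration, and the sketch ignores node failures, late joiners, and ordering.

```python
from typing import Dict, List, Set


class Node:
    """Hypothetical controller node holding a copy of the listener registry."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.peers: List["Node"] = []
        # case_type -> addresses of listeners subscribed to that case type
        self.registry: Dict[str, Set[str]] = {}

    def register(self, case_type: str, listener_addr: str,
                 replicate: bool = True) -> None:
        """Record a registration locally and, if asked, copy it to peers."""
        self.registry.setdefault(case_type, set()).add(listener_addr)
        if replicate:
            for peer in self.peers:
                # replicate=False stops peers from re-broadcasting forever
                peer.register(case_type, listener_addr, replicate=False)

    def listeners_for(self, case_type: str) -> Set[str]:
        """Return a copy of the listener addresses known to this node."""
        return set(self.registry.get(case_type, set()))


def join(nodes: List[Node]) -> None:
    """Make every node aware of every other node (full mesh)."""
    for node in nodes:
        node.peers = [p for p in nodes if p is not node]
```

With two joined nodes, a registration that the load balancer happens to route to the first node is also visible on the second, so either node can forward a later case event. If the registration is not replicated, the second node knows nothing about the listener, which is the simple web farm case.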

I am concerned that, from what I've read, there appears to be only one Registration Server in any cluster. Is it true that a cluster has only one registration server? Is it possible to make this registration server redundant, or is there another mechanism to provide fault tolerance?

I'm sure I'll have other questions as I dig in, but this is enough while I'm getting started.