Have some problems and questions

Mar 19, 2008 at 1:56 PM
Hello, I've read the documentation inside and it's great to find something nearly identical to what I was looking for.

I have a question regarding the demo itself more than the API.

1. When you run the distributedprime... I assume you must first open the registration server located in the deps folder. Is that right?
2. I'm using only one machine, and according to the app config file the registration server is listening on port 8000.
The app asks me for a port number, but most of the time it comes back with an error saying that the server rejects the connection.
3. Can I have multiple slaves running on the same port? (Sounds dumb, but it is not that dumb.)
4. Any recommendations on the sequence in which to run the demo?

Best wishes and thanks in advance.
Coordinator
Mar 23, 2008 at 9:18 AM
ad 1:
You are correct - the registration server must be opened first in order for the nodes to be able to register. I provided the executable as a secondary application, but all it does is open an instance of the RegistrationServerBootStrap class defined in the MPAPI framework (namespace : MPAPI.RegistrationServer). It is possible for you to start that server in your own application if you want to - just make sure it is running before you instantiate a node.
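
One way to follow the "make sure it is running" advice is to probe the registration server's TCP port before starting a node. The following is only a minimal sketch: the class and method names are hypothetical, it is not part of MPAPI, and it assumes the registration server accepts plain TCP connections on its configured port (8000 in the demo configuration).

```csharp
using System;
using System.Net.Sockets;
using System.Threading;

// Hypothetical helper, not part of MPAPI or RemotingLite.
static class RegistrationServerProbe
{
    // Returns true if something accepts TCP connections on host:port.
    public static bool IsListening(string host, int port)
    {
        try
        {
            using (var client = new TcpClient())
            {
                client.Connect(host, port);
                return true;
            }
        }
        catch (SocketException)
        {
            // Connection refused, host unreachable, and so on.
            return false;
        }
    }

    // Polls until the port answers or the attempts run out.
    public static bool WaitUntilListening(string host, int port, int attempts, int delayMs)
    {
        for (int i = 0; i < attempts; i++)
        {
            if (IsListening(host, port))
            {
                return true;
            }
            Thread.Sleep(delayMs);
        }
        return false;
    }
}
```

A node's startup code could call something like RegistrationServerProbe.WaitUntilListening("127.0.0.1", 8000, 10, 500) and only proceed when it returns true. A successful probe only shows that the port is open, not that the process behind it is the registration server.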

ad 2:
I am not sure why it does that. Are you using the same port number as the registration server? Remember to change the port number in both the registration server and the application to a port that is not blocked by your firewall. Furthermore, you must change the IP addresses in both configuration files to your machine's IP address; the address in the shipped files is that of one of my personal machines on my local area network.

ad 3:
I am afraid that you cannot use the same port number for several slaves on the same machine - that is prohibited by the TCP/IP stack, and is not a restriction in the MPAPI framework. However, you can use the same port number on multiple machines.
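
The port restriction can be seen without MPAPI at all. The sketch below is only an illustration with hypothetical names, and it assumes default socket options (no address reuse): it starts one listener on a free loopback port and then tries to start a second listener on the same address and port.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical demo class, not part of MPAPI.
static class PortConflictDemo
{
    // Returns true if a second listener on the same address and port is rejected.
    public static bool SecondListenerIsRejected()
    {
        // Port 0 asks the OS for any free port.
        var first = new TcpListener(IPAddress.Loopback, 0);
        first.Start();
        int port = ((IPEndPoint)first.LocalEndpoint).Port;

        var second = new TcpListener(IPAddress.Loopback, port);
        try
        {
            second.Start();
            second.Stop();
            return false;
        }
        catch (SocketException)
        {
            // The address is already in use by the first listener.
            return true;
        }
        finally
        {
            first.Stop();
        }
    }
}
```

This is why each slave on one machine needs its own port (for example 8001, 8002, ...), while slaves on different machines can all use the same number.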

ad 4:
I may have been vague in the documentation regarding the sequence that the applications must be started up in. That will be changed in the next release (although at the moment I have nothing planned). The sequence is this:
1) RegistrationServer.exe
2) DistributedPrimeCalculatorApp.exe. Choose "slave" mode.
3) DistributedPrimeCalculatorApp.exe. Choose "master" mode.
That should do the trick.

If you have any other problems, don't hesitate to write me.
Mar 28, 2008 at 2:43 AM
Following your documentation, the demo is working for me on VS2008.
Is it possible to have:
1. The registration server support failover and/or persist its state to a database across restarts?
2. The registration server refresh its node list when a node does not terminate properly or when the network connection to a distributed node fails?

Thank you for sharing such a useful framework.
Mar 28, 2008 at 4:37 AM
Runs fine on XP but fails to connect on Vista. IPv6 issues?
Coordinator
Mar 31, 2008 at 10:34 AM
Edited Mar 31, 2008 at 10:35 AM
Ad 1:
I am not entirely sure what you mean by this. But I guess you mean that the registration server itself should be made failsafe so the cluster will not be affected if the registration server (or the machine on which it is running) crashes.
At the moment this is not possible, but it might be a good idea to implement some redundancy with multiple cooperating registration servers. I will look into this for the next release.

Ad 2:
I am planning to have the registration server ping each node in the cluster to see if they are alive (see "Registration server pings each node in the cluster" in the issue tracker). The way it works now is this: if a worker sends a message to another worker on a node that has crashed, or is just not running, the framework will automatically unregister that node from the cluster. All workers get notified through the method Worker.OnRemoteNodeUnregistered(ushort).


Coordinator
Mar 31, 2008 at 10:40 AM
I must admit that I have not tested this on Windows Vista yet, so I do not know why it is not running. Perhaps it is an IPv6 issue, perhaps something else is restricting the TCP/IP connection (a firewall or other security measures in the OS). Does anyone else have this problem?

Try recompiling everything, including the RemotingLite remoting framework (http://www.codeplex.com/RemotingLite), which is the framework I use in MPAPI to establish TCP/IP connections between the nodes and the registration server.

If this problem persists I will try to get a Vista OS, but only if it can't be helped - it is so damned expensive.


Mar 31, 2008 at 9:37 PM
Forcing Channel() to use IPv4 seems to work when the node runs on Vista:
    _client = new TcpClient(AddressFamily.InterNetwork);
May 29, 2008 at 11:33 AM
To: sagan61
    ad 2: The connection is being refused because the server does start listening on the specified port, but is probably using an IPv6 endpoint. This happens because the ServiceHost constructor (RemotingLite project, ServiceHost.cs, line 119) uses the first address of the IPHostEntry, which in this case happens to be an IPv6 address. One thing you can do is change that code so that it looks for an address whose family is AddressFamily.InterNetwork.

To: all
    The same problem is happening on the nodes. That is, the OpenAndConnectToRegistrationServer method uses the first of the available IP addresses in the host entry, which in this case is also IPv6. This shouldn't be a problem if both endpoints are using the same AddressFamily. But I think that what's preventing this from working is the creation of the proxy to the registration server, which (as far as I can understand the code) is not expecting an IPAddress of that family.
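
The address selection suggested above could look roughly like the sketch below. It is not the RemotingLite code; the class and method names are hypothetical, and it only shows how to pick the first IPv4 entry from a host's address list instead of blindly taking element 0.

```csharp
using System.Net;
using System.Net.Sockets;

// Hypothetical helper, not part of RemotingLite or MPAPI.
static class AddressSelection
{
    // Returns the first IPv4 address in the list, or null if there is none.
    public static IPAddress FirstIPv4(IPAddress[] candidates)
    {
        foreach (IPAddress address in candidates)
        {
            if (address.AddressFamily == AddressFamily.InterNetwork)
            {
                return address;
            }
        }
        return null;
    }

    // IPv4 address of the local host, or null if the host entry has none.
    public static IPAddress LocalIPv4()
    {
        IPHostEntry entry = Dns.GetHostEntry(Dns.GetHostName());
        return FirstIPv4(entry.AddressList);
    }
}
```

Code that currently uses the first element of the host entry's address list could use the result of LocalIPv4() instead, provided it handles the null case on machines without an IPv4 address.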
Jul 11, 2008 at 8:51 PM



Hi,

I am getting this "node 0 offline" error in the demo app. What could be the problem? It's repeatable on both of the Windows XP computers that I have access to. All of the software is being run on a single computer. Port 8000 is being used by my registration server, port 8001 is being used by my slave node, and port 8002 is intended for my master node. I think it's a good policy to get the demos working before troubleshooting my own apps.

C:\Program Files\MPAPI\MPAPI_1.1.1_example\bin>DistributedPrimeCalculatorApp.exe
[m]aster or [s]lave > m
This nodes port number > 8002
15:19:58:565 | Info      | Main worker online
15:19:58:575 | Info      | Spawning 1 workers at node 0, address 192.168.1.100
15:19:58:615 | Error     | Node._Monitor(0@1, 0@0) : Remote node 0 appears to be offline

The master node above is the second node that was started. There is already a slave node running (not shown).

In another command shell, the registration server displayed some info which seems OK. 

C:\Program Files\MPAPI\MPAPI_1.1.1_src\bin>RegistrationServer.exe
15:18:29:998 | Info      | Registration server is running.
15:18:41:444 | Info      | Registered node. Node Id : 0 , Address : 192.168.1.100 , Port : 8002
15:18:54:042 | Info      | Registered node. Node Id : 1 , Address : 192.168.1.100 , Port : 8001

The order in which I launched these programs was per the instructions:  1-RegistrationServer, 2-Slave, 3-Master.

Any suggestions to fix it?  (Again, this is 100% local execution, strictly one computer is running all demo software at this point.)


Thanks for reading.

Geoffrey

BTW, the RemotingLite demos are working well now on both of my Windows XP SP2 computers. Uninstalling IPv6 was what solved the problem on one of these computers. To uninstall IPv6, unchecking the checkbox was not enough; I also had to click an uninstall button (which I had not noticed before).
Jul 11, 2008 at 9:04 PM

Hi

Just to be clear, and I am not joking: for my original run and the first several attempts, I launched the programs precisely according to the instructions: 1-RegistrationServer, 2-Slave, 3-Master. I can understand why the data in the screenshots I supplied may confuse some readers. Do not read too much into them. One or more of the screenshots may be from a different, later run, in which I tried the other possible orderings of master and slave for completeness after the recommended ordering (registration server, slave, master) had failed several times.