Monday, April 13, 2009

Auto Config ... baby steps

While bug hunting in my code I came back to one thing I have always wanted: new nodes that initialize themselves, like Skynet, becoming aware of new resources and starting to use them. Erlang provides a so-called boot server, but no documentation exists that clarifies how it works. Only one developer has posted about half of the instructions for setting it up, and I could not make it work. I get to the point where the boot server starts and sends the boot file, and then the remote node exits with an initialization error (and yes, I did follow the instructions, and the boot file only contains the modules Erlang requires). Even analyzing the source code did not make a difference, although I did find some undocumented calls, such as erl_boot_server:add_subnet(mask,IP) and :would_boot. add_subnet adds a whole subnet to the boot server, whereas the normal function requires you to add every single IP address, which makes no sense to me; would_boot tells you whether a certain IP could connect or not. Fun functions, if only somebody would comment on how to set the whole thing up.

Anyway, I said to myself, there must be another way, and of course I found one. The Load Balancer (LB) on the master node now checks for node changes every 10 seconds and reports on lost nodes (without taking any action so far; in the future it might restart lost processes on other nodes), nodes that stayed up (they are left untouched) and new nodes. If a new node is found, the LB verifies, rather crudely, whether SMASH is loaded on that node. If not, it copies the code over, then starts a slave node and tries to fire up the socket server as well. That last step will fail for secondary nodes on the same machine, but that's OK: the supervisor table keeps track of which one stays up, so no problem here.
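For the curious, here is a minimal sketch of how such a check could look. It is not the actual SMASH code: the module name node_watch, the functions diff/2, ensure_loaded/2 and watch/2, and whatever module atom you pass in are all made up for illustration. It only uses standard calls (nodes/0, rpc:call/4, code:is_loaded/1, code:get_object_code/1, code:load_binary/3) and leaves out starting the slave node and the socket server.

```erlang
-module(node_watch).
-export([diff/2, ensure_loaded/2, watch/2]).

%% Compare the previous node snapshot with the current one and
%% classify every node as lost, stayed or new.
diff(Old, Current) ->
    Lost   = Old -- Current,
    New    = Current -- Old,
    Stayed = Current -- New,
    {Lost, Stayed, New}.

%% Crude check (hypothetical helper): is Mod loaded on Node?
%% If not, ship the local object code over and load it there.
ensure_loaded(Node, Mod) ->
    case rpc:call(Node, code, is_loaded, [Mod]) of
        {file, _Loaded} ->
            already_loaded;
        false ->
            case code:get_object_code(Mod) of
                {Mod, Bin, File} ->
                    rpc:call(Node, code, load_binary, [Mod, File, Bin]);
                error ->
                    {error, no_object_code}
            end;
        {badrpc, Reason} ->
            {error, Reason}
    end.

%% Poll nodes() every 10 seconds. Lost nodes are only reported,
%% nodes that stayed up are left alone, new nodes get the code.
watch(Known, Mod) ->
    timer:sleep(10000),
    Current = nodes(),
    {Lost, _Stayed, New} = diff(Known, Current),
    lists:foreach(fun(N) -> io:format("lost node ~p~n", [N]) end, Lost),
    lists:foreach(fun(N) ->
                      io:format("new node ~p: ~p~n", [N, ensure_loaded(N, Mod)])
                  end, New),
    watch(Current, Mod).
```

On the master you would kick it off with something like spawn(node_watch, watch, [[], smash]), where smash stands in for whatever module has to be present on every node. Polling matches what I described above; net_kernel:monitor_nodes/1 would be an event-based alternative, assuming you prefer nodeup/nodedown messages over a timer.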

What this means for the framework is that you only have to start a node and connect it to the master, and not worry about anything else. I love self-configuring, so-called zero-configuration programs, so this is a step towards that. For example, at some point when you want to fire up simulation parts and use the LB to pick a node, you know that the basic framework is already up and running, which is nice.

Notes to self:
1) Keep a record of the internal and, where possible, external IP address of the SOX instances; maybe the No-IP service can be helpful here.
2) Some work now needs to be done on the observer program to auto-restart lost processes.
3) We now create a player instance for each channel, so we probably only need to track those for the simulation. Maybe every xx amount of time we save their data to Mnesia and/or to the other group (CG) instances purely as a safeguard. Both would work; Mnesia is probably the slowest yet cleanest approach.


