Friday, April 17, 2009

There is always another way

"There's an Italian painter, named Carlotti, and he uh, ahem, defined beauty. He said it was the summation of the parts working together in such a way that nothing needed to be added, taken away or altered [...]" Cris Johnson (Nicolas Cage) in Next

Good thing he was not talking about our code, because each time I look at it I find another way of doing things. Erlang is really much faster when it matches patterns than when it runs traditional procedural code, so perfection, or beauty, is still a long way down the road. But anyway.

RTFM, or how I should really read the manual more often: I had completely overlooked the function net_kernel:monitor_nodes(true). Once a process has called it, which in our case the load balancer (LB) now does, that process receives a message {nodeup,Node} the instant a new node comes up, so the code now gets copied and started on new nodes right away. It couldn't be faster than this; if you blinked, you just missed the copy+start process. A message of {nodedown,Node} tells you when a node disconnects.
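A minimal sketch of such a subscriber, not the actual LB code. The module name node_watch and the two callback funs are made up for illustration; only net_kernel:monitor_nodes/1 and the {nodeup,Node} / {nodedown,Node} messages come from OTP.

```erlang
-module(node_watch).
-export([start/2]).

%% OnUp and OnDown are hypothetical callbacks, fun(Node) -> any().
%% In the LB, OnUp would be the place to copy and start the code.
start(OnUp, OnDown) ->
    spawn(fun() ->
        %% Subscribe the calling process to node status messages.
        ok = net_kernel:monitor_nodes(true),
        loop(OnUp, OnDown)
    end).

loop(OnUp, OnDown) ->
    receive
        {nodeup, Node} ->
            OnUp(Node),
            loop(OnUp, OnDown);
        {nodedown, Node} ->
            OnDown(Node),
            loop(OnUp, OnDown);
        stop ->
            ok
    end.
```

The subscription belongs to the process that made the call, so the receive loop has to run in that same process.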

I also stumbled upon another very useful function called rpc:sbcast(Name,Msg), which sends a message to a registered, named process on all connected nodes. Until now the group process has cached a list of all groups on the different nodes, and new groups can take a while (up to 20 sec. so far) to be seen by other nodes. Given that we now register each group on each node, this function could speed the process up without relying on cached Mnesia data. My only question is: what happens when 2 nodes do not yet see each other?
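A rough sketch of how a new group could be announced this way. The module name, the registered name group_registry and the {new_group,Name,Pid} message are assumptions for illustration, not the framework's real names; rpc:sbcast/2 itself is the standard OTP call and returns {GoodNodes, BadNodes}.

```erlang
-module(group_announce).
-export([start/0, announce/2, groups/0]).

%% Start a listener registered as group_registry (hypothetical name).
start() ->
    Caller = self(),
    Pid = spawn(fun() ->
        register(group_registry, self()),
        Caller ! {ready, self()},
        loop([])
    end),
    receive {ready, Pid} -> Pid end.

loop(Groups) ->
    receive
        {new_group, Name, GroupPid} ->
            loop([{Name, GroupPid} | Groups]);
        {groups, From} ->
            From ! {groups, Groups},
            loop(Groups)
    end.

%% Tell the group_registry process on this node and on all connected
%% nodes about a new group.
announce(Name, GroupPid) ->
    rpc:sbcast(group_registry, {new_group, Name, GroupPid}).

%% Ask the local listener for the groups it has heard about.
groups() ->
    group_registry ! {groups, self()},
    receive {groups, Gs} -> Gs after 1000 -> timeout end.
```

rpc:sbcast/2 only goes to the local node plus whatever nodes() returns at that moment, so a node that is not connected yet gets nothing and would still need a catch-up step once its nodeup arrives.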

Speed-ups: every time I look at pattern matching I learn an easier way to do something. When I first started I was tempted to use traditional loops like the function lists:foreach, then I switched to:

lookup(G, [{G,Pid}|_]) -> [Pid];
lookup(G, [_|T]) -> lookup(G, T);
lookup(_,[]) -> [].

I find that structures like the one here are even faster than the other two:

[X || {K,X} <- L, K =:= Y] where X is our return value, L is the list to look a value up in and Y is the key that picks the correct tuple from the list. (The key has to be tested in a filter like this, or written as a literal in the pattern as in [X || {cmd1,X} <- L]; a plain [X || {Y,X} <- L] gives Y a fresh binding inside the comprehension and returns every X.) This expression is so fast that it is causing me some synchronization problems among my processes, for reasons yet unknown.

The same construct will also serve as a command interpreter. When the client sends a string of commands, like "{cmd1:arguments}{cmd2:arguments}", I convert that string with a function I designed into a list like this [{cmd1,"arguments"},{cmd2,"arguments"}] and I can then use [F(Y,X) || {Y,X} <- L], with F being a fun, to run some function on each element in the list.

I might even displace Mnesia altogether: just load data in lists into the KVS, look up / filter values with pattern matching and sync the nodes with rpc:sbcast. I could use the module dict to create a dictionary instead of a table or list, too. Each dictionary could run in its own process, hmm, interesting.
This sounds so crazy, it might actually work.
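Here is a small sketch of the lookup and the command interpreter idea. The module cmd_demo and its parse/1 are a stand-in written for this example, not the conversion function mentioned above, and the sketch assumes arguments never contain '{' or '}'.

```erlang
-module(cmd_demo).
-export([parse/1, lookup/2, run/2]).

%% Turn "{cmd1:args}{cmd2:args}" into [{cmd1,"args"},{cmd2,"args"}].
parse(String) ->
    [split_cmd(Chunk) || Chunk <- string:tokens(String, "{}")].

%% Split "name:args" at the first colon.
split_cmd(Chunk) ->
    {Name, Rest} = lists:splitwith(fun(C) -> C =/= $: end, Chunk),
    Args = case Rest of
               [$: | A] -> A;
               []       -> []
           end,
    %% list_to_atom/1 keeps the sketch short; for untrusted client input
    %% list_to_existing_atom/1 is the safer choice.
    {list_to_atom(Name), Args}.

%% All values stored under Key. The key is compared in a filter because
%% a variable in the generator pattern would be a fresh binding.
lookup(Key, L) ->
    [V || {K, V} <- L, K =:= Key].

%% Apply F to every {Command, Arguments} pair.
run(F, L) ->
    [F(Cmd, Args) || {Cmd, Args} <- L].
```

For example, cmd_demo:lookup(cmd2, cmd_demo:parse("{cmd1:a}{cmd2:b}")) gives ["b"]. Unlike the recursive lookup, the comprehension always walks the whole list and returns every match.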

Remember that Mnesia is nice when speed is not critical, but when you need to do thousands of lookups per second across "n" nodes the generated overhead is just too much. Mnesia can't keep up no matter how I structure it, be it several disk nodes or 1 disk node with several RAM nodes. The worst thing is the lack of a local cache on a RAM node, hence the KVS working as a local cache. From a Mnesia-centric application framework I am moving more and more towards a KVS-centered data structure. The above scheme might eliminate the caching errors, and I do not really require the supertight transaction property that Mnesia offers; I'll gladly trade that for speed.

Designing the authorization module (AUT): this will have a large impact, because I will need to define a whole lot of logic around it, including what commands each user can perform, guild channels, raid & instance channels, guild structures and more. It will take some time to complete; the design isn't finished yet and I am still prototyping the logic. Some nice features will most likely come out of it. For example, you will be able to define as many guild channels as you like and define their authorizations to differentiate the general channel from officer, class lead, raid or other group channels below the guild. I will also consider instance and raid channels with ownerships, to prevent problems like an invited outsider stealing raid IDs, a documented problem that happened to other large MMOs. Under the devised scheme an invitee will not have ownership or admin rights, and it should be easy to require that certain instances be started by the owner/admins and so avoid getting a started raid instance stolen. The user commands will also be important to prevent a guild master from leaving the guild without reassigning ownership to somebody else: a GM will lack the GQUIT command but have GOWN, other members will have GQUIT, and nobody affiliated will have the GJOIN command that somebody unaffiliated has. The authorization module will need to combine speed with functionality, so a huge fun part is coming up here.
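Since the design is still being prototyped, the following is only a guess at how the command sets could be written with pattern matching. The role atoms guild_master, member and unaffiliated are invented for this sketch; the commands are the GOWN, GQUIT and GJOIN described above.

```erlang
-module(aut_demo).
-export([commands/1, allowed/2]).

%% Guild commands available to each (hypothetical) role.
commands(guild_master) -> [gown];   % must hand over ownership, cannot quit
commands(member)       -> [gquit];
commands(unaffiliated) -> [gjoin];
commands(_Other)       -> [].

%% True if Role may run Cmd.
allowed(Role, Cmd) ->
    lists:member(Cmd, commands(Role)).
```

With this table, aut_demo:allowed(guild_master, gquit) is false, so a guild master has to use GOWN before anything else.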

With speedups like the ones above, I am now facing sync problems between processes (that should not exist in the first place): the first message or first few messages from a client get lost in cyberspace. I know that this is due to me caching values from Mnesia, so I am thinking that the easiest way to avoid lost messages will be to implement a ping-pong protocol. The client (once authenticated) pings the server until it gets a pong response, and only then starts asking the server for more stuff or starts chatting. The caching delay can be anything from unnoticeable to several seconds.
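The handshake could look roughly like this. It is a sketch between two Erlang processes, with made-up message shapes {ping,From,Ref} and {pong,Ref}; the real client and wire format may be quite different. The toy server ignores the first N pings to imitate the caching delay.

```erlang
-module(pingpong).
-export([wait_ready/3, server/1]).

%% Ping Server until it answers, at most Retries times, waiting
%% Timeout milliseconds for each pong.
wait_ready(_Server, 0, _Timeout) ->
    {error, no_pong};
wait_ready(Server, Retries, Timeout) ->
    Ref = make_ref(),
    Server ! {ping, self(), Ref},
    receive
        {pong, Ref} -> ok
    after Timeout ->
        wait_ready(Server, Retries - 1, Timeout)
    end.

%% Toy server: drops the first N pings, then answers every ping.
server(0) ->
    receive
        {ping, From, Ref} ->
            From ! {pong, Ref},
            server(0)
    end;
server(N) ->
    receive
        {ping, _From, _Ref} -> server(N - 1)
    end.
```

The fresh reference in each ping means a late pong from an earlier attempt is simply not matched, so the client only proceeds on the answer to its latest ping.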

Another curious fact is that the more functionality I design, the smaller the modules get.
The module CS, which in the beginning was the master of the whole framework, is becoming less and less important; with groups now registering themselves, it is not much more than a central resource locator/creator. The CG module is now the local node relay station, down from channel admin, and its logic has gone to the channel NPC. Both are still very much required, but their roles have changed drastically during development. Curious.


