Wednesday, April 29, 2009

Unintended and useful feature

As mentioned in the P.S. below, now that both modules KVS and KVS2 are working, it is very easy and actually extremely useful to integrate them. I believe the best way to do that is to make KVS post into KVS2 and, once the cache expires, delete the data again, creating a distributed "short term memory". This looks like complete awesomeness for NPC memory data: imagine an NPC being killed over and over by the same player; it could remember him for a while, or, to avoid NPC griefing, maybe it raises its level each time. Quest givers could remember for some time who visited them, or the memory could serve as a reminder when somebody casts a spell, or as the spell cooldown timer. Sounds like fun, and it was fairly easy to implement, too: a couple of lines to check for KVS2 and then shoot the data over to it. It had not occurred to me before, but our brain also has about six different memory types, so why not simulate that. And of course, the data is replicated to the other nodes that run KVS/KVS2. Now I am almost ready to replace mnesia.
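The integration idea above can be sketched in a few lines. This is a hedged illustration, not the actual SMASH code: kvs2:store/2 and kvs2:delete/1 are assumed names for the real KVS2 interface.

```erlang
%% Hedged sketch: KVS mirrors a value into KVS2 for a while and then
%% deletes it again, giving a distributed "short term memory".
%% kvs2:store/2 and kvs2:delete/1 are assumed names, not the real API.
remember(Key, Value, TTLSeconds) ->
    kvs2:store(Key, Value),                  % replicated to all KVS2 nodes
    timer:apply_after(TTLSeconds * 1000,
                      kvs2, delete, [Key]).  % forget once the cache expires
```

Something like remember({npc_memory, NpcId, PlayerId}, killed_me, 300) would then let an NPC hold a grudge for five minutes.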

And on another note: since I migrated to Ubuntu, programming is so much more fun. This is what my desktop looks like: you can see the cube spinning with an aquarium in the middle, a text window for code editing, and on each face a separate Erlang VM. Very nice, programming in 3D; not on the level of Swordfish, but almost. Oh, and the background is made from a Stargate Atlantis "The Sunken City" wallpaper that I modified into a very nice looking seamless skydome.



Friday, April 24, 2009

Replicate this !!!

Before we start: I installed the SMASH framework on a Linux machine, and boy, are those sockets fast. The message sync error mentioned in the post below does not occur there, curiously, and latency improved about 3:1 over Windows sockets. I am impressed.

But that is not the main topic of today. I have gone over the edge and am currently programming a second Key Value Server (KVS2). The original KVS is a key value (or name) server that caches function call results for x amount of time, recaching every n seconds; KVS2 is different. KVS2 is designed to be a database replacement, i.e. a mnesia replacement.

Mnesia has a very cool design and I like it. I especially love the fact that, f.e., RAM nodes can access the DB just by declaring where the master DB is, you can fragment a DB over several nodes, it has transaction capabilities, and it can reside on several nodes. For a low-traffic lookup application that is awesome, but I found in real life that the different nodes simply cannot keep up with the transaction volume of an MMO framework. If I create one master node, then a transaction issued from a slave node can take up to several seconds to return a result, and if the master goes down, so does the whole system. If I define several masters, synchronization takes far too long, and if one node goes down, the others start complaining and blow up.

So here, I hope, comes the salvation: the birth of KVS2. The code is distributed through the LB to the nodes and auto started. The first node in the system reads its data from text files, and the other (slave) nodes copy the tables over; each table runs as a named process. When a node changes or deletes data, it broadcasts the same transaction to the other nodes, so they stay in sync and each node enjoys a local cache: speed, speed, speed. If the master goes down, big deal: the other KVS2 servers are linked to the master and detect its death, so they choose a new one. So far they pick the new master simply by name; I could also ask the LB and cache the answer with KVS, so that they all get the same result, but ehm, I don't. That node is then declared the new master, end of story. The master saves the data to disk every 5 minutes, and you can even force a node to become master, f.e. when the original comes back up.

A word of warning here: if several nodes write to the same record, you can get confusing results, as there is no check that all nodes hold the same data. If messages for the same record arrive in different orders, each node can end up with a different record. Be careful; this general scheme assumes that only one process manages a given record, like in SMASH, where only supervisors write records. Yet when accuracy is critical, it is very easy to route the write through the master: the rpc interface can send the message to the master, which then sends it around, an easy way to ensure transaction-like behavior or uniqueness without conflicting records. Simple, short and powerful. Obviously this still needs to be tweaked and debugged, but it is already working.
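The write path and the pick-by-name election could look roughly like this. A minimal sketch only: the table process, message shapes, and function names are assumptions, not the actual KVS2 code.

```erlang
%% Minimal sketch of the replication scheme described above.
%% Each table runs as a named process; a local change is broadcast so
%% every other node replays the same transaction.
store(Table, Key, Value) ->
    Table ! {store, Key, Value},                      % apply locally
    rpc:sbcast(nodes(), Table, {store, Key, Value}).  % replay on peers

%% Master election "simply by name": every surviving node sorts the
%% node list and takes the head, so all of them pick the same new
%% master without any coordination.
elect_master() ->
    hd(lists:sort([node() | nodes()])).
```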
Later on I will integrate a time stamp so servers can sync whenever someone holds more recent data, but for the time being I believe this new scheme solves my database requirements nicely: I have a local cache, self-replicating data, and a system that keeps working until the last node is shut down, removing all the mnesia drawbacks.

I will also integrate, somehow, preferred servers as master nodes, which should make it easier to know where the files get saved. The keen observer may have noticed that we lose any kind of indexes and filters, for the time being. Filters are actually easy to implement given the [X || {Y,X} <- List] functionality: normally we know the key "Y" and get "X" as the value, but we can also filter like this:

Value=[Pid || {Pid,Node} <- Answers, Node=:=Filter]

This grabs the Pid from each tuple when we only know the node name, passed to the function as "Filter"; here the value serves as the search criterion.

And there will be no indexes; it is cheaper to define a small routine that builds a separate table instead. So, a couple more days until KVS2 is debugged, and then off to replace mnesia completely with calls to KVS2. I have to replace something like 100 calls, quite doable actually.
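The "small routine instead of an index" idea can be sketched like this, assuming an ets-backed table and a {Id, Node, Pid} record shape chosen purely for illustration:

```erlang
%% Sketch: instead of a secondary index, build a second ets table
%% keyed on the field you would otherwise index. {Id, Node, Pid} is
%% an assumed record shape for illustration.
build_node_index(Records) ->
    Index = ets:new(node_index, [bag]),   % bag: many ids per node
    [ets:insert(Index, {Node, Id}) || {Id, Node, _Pid} <- Records],
    Index.

%% All ids registered on a given node, looked up via the index table.
ids_on(Index, Node) ->
    [Id || {_, Id} <- ets:lookup(Index, Node)].
```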

And then I will implement the authorization scheme, urgh !!!



P.S. Debugging is almost done and all possible cases seem to be accounted for. Given their intrinsic structure, KVS and KVS2 work like a short and a long term memory; hmm, I might call them both the Skynet Database in the future. Sounds better than Key Value Server, doesn't it??

Friday, April 17, 2009

There is always another way

"There's an Italian painter named Carlotti, and he uh, ahem, defined beauty. He said it was the summation of the parts working together in such a way that nothing needed to be added, taken away or altered [...]" Cris Johnson (Nicolas Cage) in Next

Good thing he did not talk about our code, because each time I look at it I find another way of doing things. Erlang is really much faster when it matches patterns rather than using traditional procedural code, so perfection or beauty is still a long way down the road, but anyway.

RTFM, or ... how I should really read the manual more often: I had completely overlooked the function net_kernel:monitor_nodes(true). When you subscribe to it, as the load balancer (LB) now does, you receive a message {nodeup,Node} the instant a new node comes up, so the code now gets copied and started instantly on new nodes. It couldn't be faster than this; if you blinked, you just missed the copy+start process. A {nodedown,Node} message tells you when a node disconnects.
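A minimal subscription loop looks like this; net_kernel:monitor_nodes/1 and the {nodeup,Node}/{nodedown,Node} messages are the real OTP interface, while deploy/1 is a stand-in for the copy+start routine.

```erlang
%% Sketch of the LB's node watcher: subscribe once, then react to
%% the nodeup/nodedown messages OTP delivers. deploy/1 is assumed.
watch_nodes() ->
    net_kernel:monitor_nodes(true),
    node_loop().

node_loop() ->
    receive
        {nodeup, Node}   -> deploy(Node), node_loop();
        {nodedown, Node} -> io:format("lost ~p~n", [Node]), node_loop()
    end.
```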

I also stumbled upon another very useful function, rpc:sbcast(Name,Msg), which sends a message to a registered named process on all connected nodes. Until now the group process caches a list of all groups on the different nodes, and new groups can take a while (up to 20 sec. so far) to be seen by other nodes. But given that we now register each group on each node, this function could speed the process up without relying on cached mnesia data. My only question is: what happens when two nodes do not yet see each other??
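For the group case, the call would be a one-liner; group_server is an assumed registered name for the group process described above.

```erlang
%% Sketch: push a new group to the registered group process on every
%% connected node instead of waiting for the cached list to refresh.
%% group_server and the message shape are assumptions.
announce_group(Group) ->
    rpc:sbcast(group_server, {new_group, Group, node()}).
```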

Speed-ups: every time I look at pattern matching I learn an easier way to do something. When I first started I was tempted to use traditional loops like lists:foreach, then I switched to:

lookup(G, [{G,Pid}|_]) -> [Pid];
lookup(G, [_|T]) -> lookup(G, T);
lookup(_,[]) -> [].

I find that structures like the following are even faster than the other two:

[X || {Y,X} <- L]

where X is our return value, L the list to look a value up from, and Y the variable that picks the correct tuple from the list. This construct is so fast that it is causing me some synchronization problems among my processes, for reasons yet unknown. It will also serve as a command interpreter: when the client sends a string of commands like "{cmd1:arguments}{cmd2:arguments}", I convert it with a function I designed into a list like [{cmd1,"arguments"},{cmd2,"arguments"}], and I can then use a comprehension like [run(Y,X) || {Y,X} <- L] to run a function on each element of the list. I might even displace mnesia altogether: just load data as lists into the KVS, look up / filter values with pattern matching, and sync the nodes with rpc:sbcast. I could also use the dict module to create a dictionary instead of a table or list; each dictionary could run in its own thread, hmm, interesting.
This sounds so crazy, it might actually work.
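The command interpreter described above might look roughly like this. A hypothetical sketch, not the actual conversion function: it assumes ":" never appears inside the arguments, and dispatch/2 is a placeholder for the per-command handler.

```erlang
%% Hypothetical sketch: "{cmd1:args}{cmd2:args}" becomes
%% [{cmd1,"args"},{cmd2,"args"}], then one comprehension dispatches
%% every command. dispatch/2 is a placeholder, not real SMASH code.
parse(Str) ->
    [begin
         [Cmd, Args] = string:tokens(Chunk, ":"),
         {list_to_atom(Cmd), Args}
     end || Chunk <- string:tokens(Str, "{}")].

run_all(Str) ->
    [dispatch(Cmd, Args) || {Cmd, Args} <- parse(Str)].
```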

Remember that mnesia is nice when speed is not critical, but when you need to do thousands of lookups per second across "n" nodes, the generated overhead is just too much; mnesia can't keep up, no matter how I structure it, be it several disk nodes or one disk node with several RAM nodes. The worst part is the lack of a local cache on a RAM node, hence the KVS working as a local cache. From a mnesia-centric application framework I am moving more and more towards a KVS-centered data structure. The above scheme might eliminate the caching errors, and I do not really require the super-tight transaction property that mnesia offers; I'll gladly trade that for speed.

Designing the authorization module (AUT): it will have a large impact, because I will need to define a whole lot of logic around it, including which commands each user can perform, guild channels, raid & instance channels, guild structures and more, so this will take some time to complete. The design isn't finished yet; I am still prototyping the logic. Some nice features will most likely come out of it: f.e. you will be able to define as many guild channels as you like and set their authorizations to differentiate the general channel from officer, class lead, raid or other group channels below the guild. I will also consider instance and raid channels with ownership, to prevent problems like an invited outsider stealing raid IDs, a documented problem that happened to other large MMOs. Under the devised scheme an invitee will not have ownership or admin rights, and it should be easy to require that certain instances be started by the owner/admins, avoiding a started raid instance getting stolen. The user commands will also be important, to prevent a guild master from leaving the guild without reassigning ownership to somebody else: a GM will lack the GQUIT command but have GOWN, other members will have GQUIT, and only somebody unaffiliated will have GJOIN. The authorization module will need to combine speed with functionality, so a huge fun part is coming up here.
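The role-to-command idea sketches out naturally as pattern matching; role names and the extra commands here are assumptions for illustration, not the final AUT design.

```erlang
%% Sketch of the role-based command scheme: a GM lacks gquit but has
%% gown, members have gquit, and only somebody unaffiliated gets
%% gjoin. Role and command names are illustrative assumptions.
allowed(guild_master) -> [gown, ginvite, gkick];
allowed(member)       -> [gquit];
allowed(unaffiliated) -> [gjoin].

authorized(Role, Cmd) ->
    lists:member(Cmd, allowed(Role)).
```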

With speed-ups like the ones above, I am now facing sync problems between threads that should not exist in the first place: the first message or first few messages from a client get lost in cyberspace. I know this is due to me caching values from mnesia, so I am thinking the easiest way to avoid lost messages is a ping-pong protocol: the client pings the server (once authenticated) until it gets a pong response, and only then starts asking the server for more stuff or starts chatting. The caching delay can be anything from unnoticeable to several seconds.
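The proposed handshake on the client side could be as small as this; the message shapes and timeout are assumptions.

```erlang
%% Sketch of the ping-pong handshake: after authenticating, the
%% client pings until the server answers pong, then proceeds.
wait_ready(Server) ->
    Server ! {ping, self()},
    receive
        pong -> ready
    after 500 ->
        wait_ready(Server)   % server caches not warm yet, try again
    end.
```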

Another curious fact is that the more functionality I design, the smaller the modules get.
The module CS, which in the beginning was the master of the whole framework, is becoming less and less important; with groups registering now, it is not much more than a central resource locator/creator. The CG module is now the local node relay station, down from channel admin; the logic has gone to the channel NPC. Both are still very much required, but their roles have changed drastically during development, curious.



Monday, April 13, 2009

Auto Config ... baby steps

While bug hunting my code I came back to one thing I always wanted: have new nodes initialize themselves, like Skynet, becoming aware of new resources and starting to use them. Erlang provides a so-called boot server, but there is no documentation clarifying how it works; only one developer posted about half the instructions for setting it up, and I could not make it work. I get to the point where the boot server starts and sends the boot file, and then the remote node exits with an initialization error (and yes, I did follow the instructions, and the boot file only contains the Erlang-required modules). Even analyzing the source code did not make a difference, although I found some undocumented calls, like erl_boot_server:add_subnet(Mask,IP) and :would_boot. add_subnet adds a whole subnet to the boot server, compared to the normal function that requires you to add each single IP address, which makes no sense to me, and would_boot tells you whether a certain IP could connect or not. Fun functions, if only somebody would document how to set the whole thing up.

Anyway, I said to myself there must be another way, and of course I found one. The Load Balancer (LB) on the master node now checks for node changes every 10 sec. and reports nodes lost (without any action so far; in the future it might restart lost processes on other nodes), nodes that stayed up (they stay untouched), and new nodes. If a new node is found, it verifies rather crudely whether SMASH is loaded on that node; if not, it copies the code over, then starts a slave node and tries to fire up the socket server as well. This will fail for secondary nodes on the same machine, but that's OK; the supervisor table keeps track of which one stays up, so no problem here.
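The 10-second check boils down to diffing node lists; a sketch under the assumption that deploy/1 stands in for the copy-code-and-start routine.

```erlang
%% Sketch of the periodic node check: diff the current node list
%% against the last snapshot to classify new, lost and surviving
%% nodes. deploy/1 is an assumed name.
check_nodes(Known) ->
    Current = nodes(),
    [deploy(N) || N <- Current -- Known],                          % new nodes
    [io:format("node lost: ~p~n", [N]) || N <- Known -- Current],  % lost nodes
    timer:sleep(10000),
    check_nodes(Current).
```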

What this means for the framework is that you only have to start up and connect a node to the master and not worry about anything else. I love self-configuring, so-called zero-configuration programs, so this is a step towards that; f.e., at the point when you want to fire up simulation parts and use the LB to pick a node, you know the basic framework is already up and running, which is nice.

Notes to self:
1) Keep a record of the internal and, where possible, external IP address of the SOX instances; maybe the No-IP service can be helpful here.
2) Some work needs to be done on the observer program to auto-restart lost processes.
3) We now create a player instance for each channel, so we probably only need to track those for the simulation; maybe every xx time we save their data to Mnesia and/or to the other group (CG) instances for pure safeguarding. Both would work; Mnesia is probably the slowest yet cleanest approach.



Tuesday, April 7, 2009

There is a bug in the electrical system !! (Agent J)

With the new functionality in place, some rewrites were necessary, and some debugging of course, *sigh*. It's impressive how many bugs one can find in one's own code. I did quite some troubleshooting these days and finally tracked down some strange disconnect errors where a client would randomly DC for no apparent reason. But my latest stress tests, with a couple of test clients bombarding the SMASH framework, are holding up nicely, so after the first 30 Mio I will make the audacious assumption that the bugs are now gone; the counter will hit 100 Mio sent messages across the system by tomorrow, I hope for the best =) !! The good news is also that the system is much faster than before, and the KVS is shining in our quest for fast lookups. Hmmm, now I need to read up on Lua again and prep that Apocalyx client for testing.



P.S. 100+ Mio messages have run through SMASH and it keeps running and running; the bugs have been eliminated, and the framework is now more than ever a stable multi-server soft real-time messaging system. So far so good. =)