SMASH - MMO Engine in Erlang: May 2009

Tuesday, May 26, 2009

Regarding Status Post

I believe the Status Post (below) was long, long overdue. It describes in more detail the structure used in SMASH. Comments are forbidden on it, so please do so on this post, the status report will stay linked on the right and change according to progress obtained.

They say a picture says more than a 1000 words, so I hope that this makes all the other postings crystal clear and show what piece fits where or maybe it eliminates the last bit of clarity you had on SMASH.

I believe another post with screenshots of Apocalyx and development pictures might be nice, too, maybe the next post.

Cheers,

Sunweaver

SMASH Status

Last edited on 25.06.2009

Introduction
This document is a more technical description of where SMASH stands, describing the internal flow and architecture. Please make sure you have read the Introduction first. If you are looking for an interesting read, I also recommend Thor Alexander's two books "Massively Multiplayer Game Development", I use them very heavily for SMASH's development, knowing the books will make it easier to get a grasp on certain concepts. This document will change over time as progress is made, so today's read will be different from tomorrow's and then again, maybe it won't.

The Goal of SMASH
Once SMASH is finished you will be able to create a Massive Multiplayer Online Game (MMO), a messenger-, a dynamic webpage-, an IRC- or SMS-network or anything that requires sending messages across groups of users on different servers, while being completely independent of the platform, as long as Erlang compiles on the OS. SMASH can be upgraded in run-time without missing a beat. The application behavior is dictated by the channel controlling NPCs or Sim Proxies (see below). This is why you will hear me call SMASH a "framework" here and there, because it's mostly a generic communication network on which you can build whatever functionality you require.

The SMASH Concept
At its very heart, SMASH is a chat network, which was inspired by Joe Armstrong's example from "Programming Erlang: Software for a Concurrent World". Joe be at ease, I replaced all your good code with all new, lousy code.

But anyway, every event in SMASH is essentially a message sent from one participant to another. Messages are always tell commands, either to another player or a virtual player, also called NPC. The means of interaction with several other players and with the game world is via chat groups, which are controlled by NPCs, those NPCs are fundamentally the same object than a player proxy, but with a different plugin to control its behavior. To chat in channel "X" the player sends a tell to the NPC "X" which validates, if needed, the intended message and then relays it to all other subscribers of that channel "X". So, for simple chat channels those NPCs might be doing nothing else than relaying text or they might filter offensive messages and raising flags or mute a player for a while. NPCs will handle zone behaviors, by receiving requests from players, validating them and relaying resulting actions; other NPCs in that zone will be implemented as simple scripts or as other NPCs, that receive messages like any other player and act upon them. At one point in the far future, all NPCs might be virtual players, have a decent AI and really "live" in this simulated world.

Zone NPCs might want to communicate with a Physique server to validate movement action or with other zone NPCs for overlapping zones, or maybe you have a server where a simulation runs and have that simulation feed its data into SMASH; Leo, author of Apocalyx has done a great many demos, so maybe I will modify some of his demos to transmit their actions via TCP/IP and players would be able to watch one of his soccer simulation, or car races or robot fights. Time will tell.

How does a single SMASH node work

In practice a single commands in the Erlang shell starts the master node which triggers certain start-up functions and initializes the framework. Each new node starts up as a slave node, the services get started from the master node automatically.

A client connects to the SOX server (SOX) which returns a client socket for each new connection and notifies the chat server (CS) module of it. The CS starts a new chat client (CC) or client proxy, which is the representation of a player in the framework, so if someone disconnects his virtual representation persists for a programmer defined interval. So if someone disconnects during a fight, his toon will persist and most likely die, disconnecting won't save him, some online games omit this. The CC can have a plugin (action handler) linked to it that will validate incoming messages, so one moment the toon is under player control, the next it might be doing some scripted action, or a NPC lives in this game world driven by this kind of plugin support.

The-soon-to-be authorization module will automatically join the player to the channels he needs to be in and grant him the authorized commands with which he can operate in this virtual world. For each group a specialized NPC is started that controls the behavior of this channel (Sim Proxy). It might load a Logic Module for it, connect to a Physique server, load scripts to simulate objects in the zone, a dictionary to filter offensive words or even connect via TCP/IP to an application written in your favorite programming language. At some point some NPCs will need to start by themselves to simulate an on going economy, weather or connect to another simulation.

The command structure will be {CHANNEL:REQ:PARAM1;PARAM2; ... PARAMn} from the client side, where the client can only send requests, the CC will validate if a user is authorized to that request command and if so forward it to the CNPC, which will be the unique instance to convert that request into some action and send a chat command back to each client in its channel. The CC will pick this up, modify its value on the serve side and send back the command to the client, which handle the commands with the following structure: {CHANNEL:WHO:{CMD1:PARAM1;PARAM2; ... PARAMn}{CMD2:PARAMS}{CMD3:PARAMS}} . This protocol is currently plain text for easy trouble shooting.

Underlying to the whole framework are some services like the Observer (OB) and Load Balancer (LB) that will restart broken services, auto start code on new nodes and decide where processes should run according to available resources. The lowest layer so far is the database, which will be migrated from Mnesia to KVS*, the Key Value Services. The envisioned structure is KVS for temporary NPC data or to cache behind the result of a function call to speed up certain things. Its data is also replicated among all nodes by KVS2. KVS2 on the other hand is a RAM resident, self replicating database. It saves data periodically and reorganizes by itself should the master node fail. In the future the OB and LB will follow that behavior to create a fully self reorganizing network. All relevant data will be thrown from local loop data to KVS/KVS2 to enable a swift restart of fallen services without any data loss. KVS* requires that SMASH follows the WORM principle (write once, read many) where only one writes a record, but any other can read it, several processes writing to the same record will mostly certainly lead to disaster and chaos. On the other hand KVS* enables SMASH to have a fast replicating database without write locks. KVS3 does not exist as of now and maybe never will. The 3rd database layer is always handled in a separate module, programs must never write directly to the master database. Currently this 3rd layer is mnesia, in the future it may be KVS3 or a separate database server cluster which will be accessed through a TCP/IP, bundling access to that database into one module makes it easy to change without breaking everything. This layer is meant for persistent data that requires 100% accuracy and has no special speed requirement, like billing data, realm data, account information etc. This layer will be fully implemented when everything else works.

As you can see, there are not all that many pieces to the puzzle, the intention is to have as little and as few but versatile parts as necessary. And from those right-sized buildings blocks we shall make something large, yet manageable (hopefully). Still missing here are AI, state machines, scripts, ... you name it, a lot of stuff.

How do several SMASH nodes work together

As stated above SMASH is meant to handle many servers as one unit, so on each node the same services run and communicate through the unique Sim Proxy to each local chat group object, which in return notifies each player or virtual player (maybe smart and not zone bound NPCs).

Erlang-wise these nodes need to be able to ping each other. This inter-node behavior already works. What's more, you can compile and update all of the code in run-time without missing a beat and given that Erlang code is platform independent, SMASH couldn't care less if a node runs on Windows, Mac, Linux, BSD or "Your-favorite-OS-here", a small tip here, linux sockets have proven to be roughly 3x faster than Windows sockets and beginning NT4 Workstation Windows sockets will only serve up to 10 connection concurrently (on desktop versions), while others will stay idle, I don't think this has changed, so you will need to use a server version to handle many clients in parallel or use some other platform for massive amounts of connections.

The vision of the finished SMASH framework

This picture shows clients connecting to a global TCP/IP server which reads global data and decides where the user was at last. So each SMASH cluster might be a zone, an instance, a realm, a site or a different game altogether. The clients would always see the same IP it connects to, hiding the full infrastructure behind it, although this is still talking about the far future, this might well be the culmination of the whole project, a sort of "SMASH Game

s Googolplex", let's call it SMAGG in short for future reference.

At this point I have no idea how easy it will be to write a game in SMASH, once the CC plugin support is due, I will see to this, SMASH will most certainly not be for script kiddies, but rather for seasoned developers with advanced skills, but I sure hope that the framework will lift the tons of weight for you to make a large scale project happen. Once finished I will write a Database editor that will hopefully have the power to create a full game, we'll see.

The current size of SMASH is under 1MB (source +compiled programs), so as you can see Erlang rocks when it comes to code size and I believe even once finished the compiled programs will be around 1 MB. The largest part will be most definitely the database.

Currently undecided is whether it's going to be open-sourced, if it will be worth while writing a book about or if I'll just scratch everything and plant a tree or two.

Contributions or how you could help

Well, while I have no real time table, given that this is just another hobby, you might be able to help, if you have a/some public domain ... :

1) city sized BSP level(s) I might use as a zoned demo level(s).
2) action figures for the demos, be it cowboys, spacetroopers, aliens, elves, etc.
3) any kind of 3D objects, animals, furniture, etc.

I am not really a 3D artist, so any free material might help, if not I'll just use Apocalyx demo material and when releasing a first playable demo ask the user to use all of Leo's .DAT files.

Signed,

Sunweaver

Monday, May 18, 2009

There is always another way .. again

Ulf was right, io:format(user,Format, Args) does print on the console it's running on, which works out so much better for me than having to install any code previously on a VM. There is always another way, nice trick, thanks !! I will leave the code below anyway as a proof of concept that I did at least once in my life OTP compliant Erlang code :) !!

if anybody out there knows how to make a node ping the master node from the commands line, please let me know, so far I can only make this work with -mnesia extra_db_nodes [] -mnesia start. I tried -net_adm ping , but it does not work.

My favorite fun function of the month:
[net_adm:ping(X) || X <- nodes(known)]
While I could not have any exact documentation on what "known" stands for, it seems to be a list of all nodes from previous start ups, hence this function pings any previously known node and the result looks like a Wild West Showdown =).

But anyway, with the above modification in place, once a new server connects to the master node (where the load balancer runs) the code gets copied and started automatically, the new KVS2 database gets sync'ed and the output is now relayed locally, it works like a charm. KVS2 updates now always use the most recent record.

Currently in the works is the authorization scheme to channels and what users can use what commands. I am "erlagnizing" the concept to make pattern matching easy. I believe it's going to be divided into 4-5 table, a user table with groups/channels, guild info, allowed commands and revoked commands, an account level, factions and other stuff. Another table with the channel permissions, a table for account levels to differentiate guests from normal players or supervisors, a factions table, a guild table and last not least a commands resolver table that assigns commands permitted according to account level and channel permissions. Commands, will be the heart piece of the framework to enable players to move, fight, eat, sell, chat and other stuff, or supervisors to create new contents in run-time. The revoked commands are meant as a feature f.e. to silence a player, or make drinking during fights impossible, immobilize a player and other mean administrator things.

Cheers,

Sunweaver

Thursday, May 14, 2009

New remote process starter module

Although process starting has never been an issue, I find one thing not so cool in Erlang, all io:format output gets sent to the node where you started it, which is very inconvenient IMHO, so I created something simple and new: the smash proxy (SP), which is a remote process starter. Imagine you already have a set of servers running and want to join new ones with everything and starting the services on them remotely, so you monitor new nodes coming online, copy the code over to them and start processes on them, but this time around, all messages they generate will be displayed on their console window, not on the console you triggered that from.

When starting up a new node (slave1@pcname) start the SP with the command line options "-s sp" or "-s sp start ", the first option will start the service, but it won't join you to the other nodes, the second option will generate a net_adm:ping(

) saving you from having to do it manually. From this point on you can start processes from another node on this one by sending rpc:sbcast([slave1@pcname], sp, {start,{Mod, Fun, Arg}}) [note hat Arg must be a list, as it is passed on to spawn] to it and SP will start the process for you locally. So it's a remote local process caller.

Here is how:
1) Create a folder where all the other Erlang OTP applications reside, I called it "sp" (app root folder), underneath that create a folder called ebin and one called src.
2) create a file called "info" under that root folder and put these lines in it:

group: tools
short: A generic SMASH process starter

3) go under source and create a file called "sp.erl", then throw these lines of code in it:

-module(sp).
-compile(export_all).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
rpc(Mod, Fun, Arg) -> rpc({start, {Mod, Fun, Arg} }).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
start() ->
register(sp, spawn(?MODULE, loop, []) ).
start(Master) ->
register(sp, spawn(?MODULE, loop, []) ),
io:format("SP: Spawning ping to: ~p~n",[Master]),
?MODULE:rpc(net_adm, ping, Master).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Normal rpc call to this same node
rpc(Query) ->
sp ! {self(), Query},
receive
{kvs, Reply} ->
Reply
after 3000 ->
{error, noresponse}
end.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
loop() ->
receive
{Rpcpid, {start, {Mod, Fun, Arg}}} ->
P1=spawn(Mod, Fun, Arg),
Rpcpid ! {kvs, P1},
loop();
{start, {Mod, Fun, Arg}} ->
io:format("SP: Spawning ~p ~p ~p~n",[Mod, Fun, Arg]),
catch(spawn(Mod, Fun, Arg)),
loop();
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Standard interface
quit ->
io:format("SP: Smash Proxy shutting down=~p~n",[quit]),
{ok, true};
upgrade ->
io:format("SP: Smash Proxy upgrading ...~n"),
?MODULE:loop();
Any ->
io:format("SP: Smash Proxy received Msg=~p~n", [Any]),
loop()
end.

4) Now create sp.app under the ebin folder:

{application, sp,
[{description, "Serves to start applications remotely"},
{vsn, "1.0"},
{modules,
[
sp
]},
{registered, [sp]},
{applications, [stdlib, kernel, net_adm, io, mnesia]},
{env, []}
}
]}.

5) Create a file sp.appup under ebin:

{"1.0.0.0",[],[]}.

6) Compile the erl file and make sure that the generated beam ends up under the ebin folder.

You are done now, call the proxy at start up as described above and you can now do remote starts with local display of data. The src folder is not required on each node, only the ebin, so make sure to throw it into each new node installation and you are done. Or take the erl/beam file only and start the Erlang VM from the same folder, it does the same job. If you ever need to upgrade the code in runtime, then replace the beam file and send a command like this to it: sp ! upgrade to renew the code or do a rpc:sbcast to all nodes, as you please. I hope it helps for your projects, too.

Cheers,

Sunweaver

Monday, May 11, 2009

KVS2 continued

I have extensively tested the stability of KVS2 and I am almost satisfied. The infrastructure is rather difficult to kill now, which is a nice thing.

All data written now stores when the last update happened to it and when the data is due to expire, to enable short term memory data functionality and in the future sync'ing data among master nodes. Now let's invent a new term, each database has a certain foundation and principles it's based on, mnesia f.e. defines that as the ACID properties, eg. atomicity, consistency, isolation and durability. Given the design choice of KVS2 we define:

The WORM principle: or "write once, read many"
Only one single process writes a record, every other process can read from it. If several processes write to the same record you will have at some point inconsistent data, given the lack of write locks and random order writes.

It would be more appropriate to call it " one writes, many read", but who can remember the acronym OWMR ??

So, f.e. loot data must be implemented on a zone level and hence be managed through the "zone player" process and auction house handling must be implemented as if it was a zone, WORM for the win.

If we apply the WORM principle to our programming style, in theory, we preserve the ACID property that mnesia enforces on application level and we add an "S" for speed to ACID, becoming ACIDS, because mnesia fails to live up to the part where it should be real-time.

Writing a distributed database is a lot of work and fun. I wonder, how long will I be able to do without write locks ?? Hmmm !!

Improvements needed:
1) Make sure that a newly elected master node is really running --> with a separate process on each node that verifies the masters list, its status and trigger a reelection process when needed
2) During start up, ping all previously know nodes, try to find running instances there, get more node names, ping them and only then start up or else you get several masters elected

Cheers,

Sunweaver