Friday, December 7, 2012

Production Mistakes

Once upon a time I had a very bad scare. One of my responsibilities was to maintain a vendor-created application that handled the flow of commodity trades on an exchange into our risk management system. That particular aspect of my job was not my favorite part but someone had to do it, right?

This vendor application was primarily a configuration-driven monstrosity. Messages were in XML, and the conversion from the format the exchange provided to the schema our back end systems required was primarily done using a tag mapping approach. For example, if the exchange provided us with an XML message in which the counterparty in the deal was represented as an ID, say 12345, the application mapped that number to a mnemonic used by our system, say BGKG. Simple, right?
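To make the tag mapping idea concrete, here is a minimal sketch in Python. It is not the vendor's implementation: the `Counterparty` element name, the `COUNTERPARTY_MAP` table and the `map_counterparty` function are all hypothetical, and only the 12345 to BGKG pair comes from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping table: exchange counterparty ID -> internal mnemonic.
COUNTERPARTY_MAP = {"12345": "BGKG"}


def map_counterparty(xml_text: str) -> str:
    """Replace the exchange ID in each <Counterparty> element with our mnemonic."""
    root = ET.fromstring(xml_text)
    for elem in root.iter("Counterparty"):  # element name is an assumption
        exchange_id = (elem.text or "").strip()
        if exchange_id not in COUNTERPARTY_MAP:
            # Fail loudly rather than pass an unmapped ID downstream.
            raise KeyError(f"no mapping for counterparty {exchange_id!r}")
        elem.text = COUNTERPARTY_MAP[exchange_id]
    return ET.tostring(root, encoding="unicode")
```

One mapping like this is trivial. The trouble starts when there are hundreds of them and the same element can be touched in several other places.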

Yes, it is very simple. It also doesn't scale for a damn. There were around 500 of these mappings in production, and that's just the tag mapping portion. That doesn't take into account any business logic driven mappings, XSL files that mangle the XML, etc. This quickly becomes a pain point for maintenance, especially when there are multiple places for logic to hide on any given mapping. The logic could be in the XSL outside the application, in the message logic built into the application, in the templates for each message route, and in several more places. On top of this, there was nothing preventing logic affecting the same element(s) from being spread across ALL of these locations!

Now that you have more background than you care about, I will actually get to the scary part. When I first started working on this, the traders who actually executed the trades on the exchange each had an ID assigned by the exchange. The means of mapping this to our system ID was to keep a list of all the traders' exchange IDs in an XSL file and test for a match, replacing the exchange ID with the local system ID. This is ugly for several reasons. First, it's long, tedious and error prone to maintain this list. I don't know when it was last updated before I saw it, but I know that they never removed anyone from that list, ever. Even if someone left the company, they stayed there. Second, this meant that the Risk Management group (the business unit owner of the application) was dependent on IT development (i.e., me) to make a code change and promote it from the development environment through the QA environment to the production environment every time they wanted to add a trader to the system. This can become painful for a company that grows through acquisition on a regular basis, by the way. I could go on but you get the point, and I'm sure you can come up with reasons I haven't thought of as to why this is a bad way to handle it.

I wanted to empower the business users of this application to be able to manage their own traders in the system. They do it for all the other aspects and this is supposed to be configurable by the business users anyway. Configuration changes don't have to go through the same channels as development and I don't want a call while on vacation to run something like this through development to production because they can't do it themselves.

I started by creating a simple DB schema that holds lookup tables mapping exchange IDs to our system IDs (and a few other things). Nothing complicated, but if I put that kind of mapping into a database, I can then stand a web front end up on it and make that available to the users. Tada!! Now they can configure to their heart's content without bothering me.
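A lookup table of this kind can be sketched roughly as follows. This is an illustration using an in-memory SQLite database, not the actual schema: the `trader_map` table, its columns and the helper functions are hypothetical names.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) exchange ID -> system ID lookup table."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS trader_map (
            exchange_id TEXT PRIMARY KEY,
            system_id   TEXT NOT NULL
        )
        """
    )


def add_trader(conn: sqlite3.Connection, exchange_id: str, system_id: str) -> None:
    """Add or update a mapping; this is the part a web front end would call."""
    conn.execute(
        "INSERT OR REPLACE INTO trader_map (exchange_id, system_id) VALUES (?, ?)",
        (exchange_id, system_id),
    )
    conn.commit()


def lookup_system_id(conn: sqlite3.Connection, exchange_id: str):
    """Return the system ID for an exchange ID, or None if it is not mapped."""
    row = conn.execute(
        "SELECT system_id FROM trader_map WHERE exchange_id = ?",
        (exchange_id,),
    ).fetchone()
    return row[0] if row else None
```

With the mapping held as data rather than in an XSL file, adding or removing a trader is a row change instead of a code promotion.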

Now, life being what it is, while I converted the mappings to use the database and had promoted it to production several months earlier, I had not set up a web front end yet.

We were testing changes in QA that would allow us to trade financial instruments that we had never traded before. Obviously we needed to test and make sure that the new trade types worked as expected but also to do some regression testing and make sure that these changes hadn't broken existing trades. We were getting some odd results from the new trades in QA, so I went investigating. After discounting all the obvious answers, I went back to basics and looked again at the configuration. It turned out that the data source being used for the table lookups was not providing the right answers because it was pointed at the wrong database. It was still looking at the development environment.

I had a sudden cold chill and after the shivers stopped I dared to look at the configuration in production and saw that yes, indeed, production was pointed to the development environment. I thought I might have a heart attack. Now, kids, this is emphatically NOT anything you ever want to have happen to you. Bear in mind that this industry was heavily regulated by SOX. Any idea what would happen if a SOX auditor got wind of something like that? Phrases like, 'career changing learning experience', spring immediately to mind.

Now, the good news is that all is well that ends well. Everything was fine, we got it all changed without a hiccup and life went on. The part that scares me is what might have happened. I mean, this is development, man. If I had taken it into my head at any point during those intervening months to blow away the entire database, much less wipe a table (both of which I can and will do, and both of which are perfectly valid things to do in development), what kind of damage would have been done? Fortunately, not a lot, because recovery would have been simple (thank GOD), but how long would it have taken for us to figure out what happened? It took me a couple of days of poking around to finally decide I should go back to basics and check the data source configuration at the application server level.

My point here is, that mistake made me want to crap my pants. It was amateur hour kind of stuff that I should never have let happen. Even if I wasn't the one actually handling the deployment, I sit over the shoulder of the SA and DBAs for almost every production deployment I make and I should have seen it, thought about it or something! But despite all that, it was good for me in the end. Like most everyone who has ever done something as boneheaded as that, you can be sure I will be a damned sight more careful with my deployment instructions and with double-checking both my work and the deployment engineer's. It also reminded me that while vendor applications are sometimes a fact of life and you don't have an actual build process for them, there is nothing preventing you from creating a deployment build for them. I might not have to compile code but I can damned sure set up the build server to check out a tokenized version of any configuration files that have changed and have those tokens replaced with the proper settings for whatever environment is being promoted to. Also, you can set up such a build so that it breaks on deployment if you forgot to put in the environment's DB password, for example. You really don't store passwords in your source code repository, right? Right??
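As a rough illustration of that kind of deployment build step, here is a minimal Python sketch. It assumes a made-up `@TOKEN@` placeholder convention and a hypothetical `render_config` function; a real build server would have its own token syntax and its own way of supplying per-environment settings.

```python
import re

# Assumed token convention for this sketch: @UPPER_CASE_NAME@
TOKEN_PATTERN = re.compile(r"@([A-Z0-9_]+)@")


def render_config(template: str, settings: dict) -> str:
    """Replace every @TOKEN@ with the value for the target environment.

    Raises ValueError if any token has no (or an empty) value, so the
    deployment breaks instead of shipping a half-configured file.
    """
    missing = sorted(
        {name for name in TOKEN_PATTERN.findall(template) if not settings.get(name)}
    )
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return TOKEN_PATTERN.sub(lambda match: settings[match.group(1)], template)
```

The settings for each environment, including the DB password, would be supplied at deploy time (for example from the build server's secured variables) rather than checked into the repository alongside the template.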

The only real defenses against mistakes like these are to be disciplined and diligent in your pursuit of the perfect build process to automate these tedious things and/or to document them. I suck at documentation but it would have saved much time and headache if I didn't. Obviously, I was not diligent or disciplined enough in my build process, mostly because until that event happened, I really didn't think of it as a build process. My only defense is that working with vendor products is not something that I have had a lot of experience with. I mean, I have created and deployed JBoss servers with whatever configuration files they needed, but that was always in the context of writing actual code. That puts my brain in a completely different mindset than the one I was in when I was thinking, how can I configure this vendor application in a more useful fashion?

Ah well, live and learn.

Monday, December 3, 2012

Resurrection of a Classic

I had originally intended to limit my posts to just development related topics but I have found that I don't always feel I have something remotely interesting to say about that even once a week so I am taking a moment to post about something else, namely, my current favorite game.

I loved the original X-COM: UFO Defense published by MicroProse in the mid-nineties. I was in the military at the time and living in the barracks. Only a couple of guys had a PC, and the games always turned into a group effort within our circle of friends. Four or five of us would gather around the PC and drink beer, and at each turn of the game we would swap drivers and someone else would deploy the squad while the others either cheered them on, badgered them about what we thought was a poor move or dropped our jaws when the completely unexpected happened.

This game had a LOT going for it. It was detailed, complex, and especially at the later stages of the game, quite time consuming to control each of the 12 team members you could send on a given mission. We played through it any number of times as well as the follow up games that were published. We all gained immense enjoyment out of it and I remembered it fondly for many years.

You can imagine my reaction when I heard that X-COM was getting a reboot over fifteen years later and not only was it available for the PC but for consoles as well. I was overjoyed at the idea that my beloved game was going to be resurrected using technologies that were hard to imagine at the time but I was also extremely worried. I had no idea if the game that I remembered so well would be recognizable to me any longer.

I am happy to say that the fear was misplaced and the game that I remembered, while not exactly what it was, was still, at heart, the same game I fell in love with. They managed to maintain the same feel that I had in the first game while at the same time streamlining the controls and (though I hate to admit it) cutting the tedium of having to take care of so many details for so many characters on a team.

The transition was very cleverly accomplished. You still have resources, research, engineering, facilities and soldiers that need to be developed and managed, just like the original but the menu structures and controls for handling soldiers in mission have been streamlined and well thought out.

Maybe it's just my imagination but the number of items that can be developed and created seems to be more limited than in the original. For example, I was surprised to discover that there was only one aircraft, a fighter, that could be developed in the game using alien technologies. The original had a number of different aircraft, both fighters and troop transports, that could be developed and built. I was disappointed to learn that I couldn't develop a superior troop transport and that I would forever be limited to a maximum of six troops on any given mission.

The limitation on the number of troops concerned me because at the later stages, in the original at least, I don't think you could have reasonably completed the game with so few. I was afraid that the game was going to abruptly become either too soft or too difficult in the later stages.

Thankfully, I needn't have worried. In the reboot, the game designers have overcome this lack of firepower in the field by adding some features that both simplify the development of soldiers and balance the gameplay between the large numbers of enemy troops and the limited squad size of your team.

In the original, if you wanted to outfit any given soldier with the proper gear, you had to view that soldier's complete stat and skill list, from the demolition (for rockets) or throwing skills (for grenades) to the stamina stat (how many things the soldier could do on a single turn) to the strength of a given soldier (how much gear they could carry before they just couldn't move).

The reboot introduces a class system for each soldier. After the first promotion they are given a class of heavy, assault, support or sniper. Each class has its own skill tree and each skill tree has two separate but complementary branches that you can choose between as they advance. This simplifies both soldier development (I don't have to worry about developing a soldier's strength to carry a larger weapon) and load out (only the heavy can carry a rocket launcher). At the same time, the skill trees give each class distinct advantages that allow each soldier to easily overcome one or two aliens if played to the strengths of its class.

The engineering foundry also contributes to this, allowing engineering research that will provide across the board improvements to equipment or the development of an entirely new branch of technologies.

The fifteen plus years of hardware improvements in consoles (I play on my PS3) are pretty self-explanatory. Obviously the graphics are better, but the way the cinematics are worked into a soldier's (or alien's) actions, as well as the cut scenes, is also extremely well done and lends flavor to the game without getting in the way.

All in all, I would say that XCOM: Enemy Unknown is a huge win for Firaxis and 2K Games and, thankfully, me.