Saturday, January 19, 2008

Dumbing down of the SA: a dangerous cycle

Risk aversion is growing out of control, with the SA caught at the fulcrum between the cost of building resilient architectures on one end and the cost of maintaining skills and knowledge on the other.

Having worked in the IT industry since 1987, I have noticed a very disturbing trend.  American corporate IT management is unknowingly cutting off its proverbial nose to spite its face.

Put another way: because management fears systems administrators making stupid mistakes, it demands that SAs have less and less access to the systems, in hopes of preventing outages and downtime.  Keeping SAs at arm's length also dumbs down their skill sets and exacerbates the problem, which in turn reinforces management's belief that the SAs are in fact not competent.

Customer management, seeking to solve this problem, is turning more and more often to its equipment providers for the answers.  And why not?  Surely the vendor would know more about its gear than anyone else.  Unfortunately, this is an understandable misconception.

Some examples: after several shutdowns were executed incorrectly and caused corruption, an IT manager at a major telecom demanded that the vendor write shutdown scripts that would automate every aspect of stopping the customer's applications and jobs and bringing the system down gracefully.

In a more striking case, the SAs' lack of training and experience with cluster software ultimately led to the complete removal of the software in favor of a non-automated, manual failover with only basic services.  Management explained that the software was simply too complicated to use.  This cost the vendor several million dollars, but it also costs the customer in several ways: the time to discover a failure and manually shift resources, the confidence that their customers (typically the business) lose in IT's ability to support their applications, and so on.

When working with teams that regularly test such things as cluster failover, backups and restores, and that are allowed to maintain their gear through reboots and patching, I have found strong, knowledgeable teams.  They know their gear, they know what to do in a critical situation, and they are more competent in general.

Lack of familiarity causes fear; that's just good old human nature.  SAs begin to fear their gear when they are not allowed to become familiar with it or to work with it.  They are not getting the training, and even if trained they are not allowed to exercise their craft to build the skills they need.  So when the shit hits the fan, they are not able to take appropriate action, and they wind up making the situation worse and confirming management's expectations of them.  This often results in an attack on the vendor's service personnel or on the vendor in general.

And as this vicious cycle continues, corporate America is looking overseas for the expertise it wants and needs but has failed to cultivate locally.

Call to Arms: IT managers, this is a two-way street.  You have to take some risk and push back on upper management to get architectures that allow for regularly scheduled maintenance, and you have to keep your folks trained.  In turn, you will sleep at night knowing you have the best of the best on the job, and that they do not need your hand-holding at every step to restore complex environments to production.


