Random Thoughts – Randocity!

How not to run a business (Part 3) — SaaS edition

Posted in business, cloud computing, computers by commorancy on May 8, 2012

So, we’ve talked about how not to run a general business, let’s get to some specifics. Since software as a service (SaaS) is now becoming more and more common, let’s explore software companies and how not to run these.

Don’t add new features because you can

If a customer is asking for something new, then add that new feature at some appointed future time. Do not, however, think that that feature needs to be implemented tomorrow. On the other hand, if you have conceived something that you think might be useful, do not spend time implementing it until someone is actually asking for it. This is an important lesson to learn. It’s a waste of time to write code that no one will actually use. So, if you think your feature has some merit, invite your existing customers to a discussion by asking them if they would find the proposed feature useful. Your customers have the the final say. If the majority of your customers don’t think they would use it, scrap the idea. Time spent writing a useless feature is time wasted. Once written, the code has to be maintained by someone and is an additional waste of time.

Don’t tie yourself to your existing code

Another lesson to learn is that your code (and app) needs to be both flexible and trashable. Yes, I said trashable. You need to be willing to throw away code and rewrite it if necessary. That means, code flows, changes and morphs. It does not stay static. Ideas change, features change, hardware changes, data changes and customer expectations change. As your product matures and requires more and better infrastructure support, you will find that your older code becomes outdated. Don’t be surprised if you find yourself trashing much of your existing code for completely new implementations taking advantage of newer technologies and frameworks. Code that you may have written from scratch to solve an early business problem may now have a software framework that, while not identical to your code, will do what your code does 100x more efficiently. You have to be willing to dump old code for new implementations and be willing to implement those ideas in place of old code. As an example, usually early code does not take high availability into account. Therefore, gutting old code that isn’t highly available for new frameworks that are is always a benefit to your customers. If there’s anything to understand here, code is not a pet to get attached to. It provides your business with a point in time service set. However, that code set must grow with your customer’s expectations. Yes, this includes total ground-up rewrites.

Don’t write code that focuses solely on user experience

In software-as-a-service companies, many early designs can focus solely on what the code brings to the table for customer experience. The problem is that the design team can become so focused on writing the customer experience that they forget all about the manageability of the code from an operational perspective. Don’t write your code this way. Your company’s ability to support that user experience will suffer greatly from this mistake. Operationally, the code must be manageable, supportable, functional and must also start up, pause and stop consistently. This means, don’t write code so that when it fails it leaves garbage in tables, half-completed transactions with no way to restart the failed transactions or huge temporary files in /tmp. This is sloppy code design at best. At worst, it’s garbage code that needs to be rewritten.

All software designs should plan for both the user experience and the operational functionality. You can’t expect your operations team to become the engineering code janitors. Operations teams are not janitors for cleaning up after sloppy code that leaves garbage everywhere. Which leads to …

Don’t write code that doesn’t clean up after itself

If your code writes temporary tables or otherwise uses temporary mechanisms to complete its processing, clean this up not only on a clean exit, but also during failure conditions. I know of no languages or code that, when written correctly, cannot cleanup after itself even under the most severe software failure conditions. Learn to use these mechanisms to clean up. Better, don’t write code that leaves lots of garbage behind at any point in time. Consume what you need in small blocks and limit the damage under failure conditions.

Additionally, if your code needs to run through processing a series of steps, checkpoint those steps. That means, save the checkpoint somewhere. So, if you fail to process step 3 of 5, another process can come along and continue at step 3 and move forward. Leaving half completed transactions leaves your customers open to user experience problems. Always make sure your code can restart after a failure at the last checkpoint. Remember, user experience isn’t limited to a web interface…

Don’t think that the front end is all there is to user experience

One of the mistakes that a lot of design teams fall into is thinking that the user experience is tied to the way the front end interacts. Unfortunately, this design approach has failure written all over it. Operationally, the back end processing is as much a user experience as the front end interface. Sure, the interface is what the user sees and how the user interacts with your company’s service. At the same time, what the user does on the front end directly drives what happens on the back end. Seeing as your service is likely to be multiuser capable, what each user does needs to have its own separate allocation of resources on the back end to complete their requests. Designing the back end process to serially manage the user requests will lead to backups when you have 100, 1,000 or 10,000 users online.

It’s important to design both the front end experience and the back end processing to support a fully scalable multiuser experience. Most operating systems today are fully capable of multitasking utilizing both multiprocess and multithreaded support. So, take advantage of these features and run your user’s processing requests concurrently, not serially. Even better, make sure they can scale properly.

Don’t write code that sets no limits

One of the most damaging things you can do for user experience is tell your customers there are no limits in your application. As soon as those words are uttered from your lips, someone will be on your system testing that statement. First by seeing how much data it takes before the system breaks, then by stating that you are lying. Bad from all aspects. The takeaway here is that all systems have limits such as disk capacity, disk throughput, network throughput, network latency, the Internet itself is problematic, database limits, process limits, etc. There are limits everywhere in every operating system, every network and every application. You can’t state that your application gives unlimited capabilities without that being a lie. Eventually, your customers will hit a limit and you’ll be standing there scratching your head.

No, it’s far simpler not to make this statement. Set quotas, set limits, set expectations that data sets perform best when they remain between a range. Customers are actually much happier when you give them realistic limits and set their expectations appropriately. Far fetched statements leave your company open to problems. Don’t do this.

Don’t rely on cron to run your business

Ok, so I know some people will say, why not? Cron, while a decent scheduling system, isn’t without its own share of problems. One of its biggest problems, however, is that its smallest level of granularity is once per minute. If you need something to run more frequently than every minute, you are out of luck with cron. Cron also requires hard coded scripts that must be submitted in specific directories for cron to function. Cron doesn’t have an API. Cron supports no external statistics other than by digging through log files. Note, I’m not hating on cron. Cron is a great system administration tool. It has a lot of great things going for it with systems administration use when utilizing relatively infrequent tasks. It’s just not designed to be used under heavy mission critical load. If you’re doing distributed processing, you will need to find a way to launch in a more decentralized way anyway. So, cron likely won’t work in a distributed environment. Cron also has a propensity to stop working internally, but leave itself running in the process list. So, monitoring systems will think it’s working when it’s not actually launching any tasks.

If you’re a Windows shop, don’t rely on Windows scheduler to run your business. Why? Windows scheduler is actually a component of Internet Explorer (IE). When IE changes, the entire system could stop or fail. Considering the frequency with which Microsoft releases updates to not only the operating system, but to IE, you’d be wise to find another scheduler that is not likely to be impacted by Microsoft’s incessant need to modify the operating system.

Find or design a more reliable scheduler that works in a scalable fault tolerant way.

Don’t rely on monitoring systems (or your operations team) to find every problem or find the problem timely

Monitoring systems are designed by humans to find problems and alert. Monitoring systems are by their very nature, reactive. This means that monitoring systems only alert you AFTER they have found a problem. Never before. Worse, most monitoring systems only alert of problems after multiple checks have failed. This means that not only is the service down, it’s been down for probably 15-20 minutes by the time the system alerts. In this time, your customers may or may not have already seen that something is going on.

Additionally, for any monitoring for a given application feature, the monitoring system needs a window into that specific feature. For example, monitoring Windows WMI components or Windows message queues from a Linux monitoring system is near impossible. Linux has no components at all to access, for example, the Windows WMI system or Windows message queues. That said, a third party monitoring system with an agent process on the Windows system may be able to access WMI, but it may not.

Always design your code to provide a window into critical application components and functionality for monitoring purposes. Without such a monitoring window, these applications can be next to impossible to monitor. Better, design using standardized components that work across all platforms instead of relying on platform specific components. Either that or choose a single platform for your business environment and stick with that choice. Note that it is not the responsibility of the operations team to find windows to monitor. It’s the application engineering team’s responsibility to provide the necessary windows into the application to monitor the application.

Don’t expect your operations team to debug your application’s code

Systems administrators are generally not programmers. Yes, they can write shell scripts, but they don’t write code. If your application is written in PHP or C or C++ or Java, don’t expect your operations team to review your application’s code, debug the code or even understand it. Yes, they may be able to review some Java or PHP, but their job is not to write or review your application’s code. Systems administrators are tasked to manage the operating systems and components. That is, to make sure the hardware and operating system is healthy for the application to function and thrive. Systems administrators are therefore not tasked to write or debug your application’s code. Debugging the application is the task for your software engineers. Yes, a systems administrator can find bugs and report them, just as anyone can. Determining why that bug exists is your software engineers’ responsibility. If you expect your systems administrators to understand your application’s code in that level of detail, they are no longer systems administrators and they are considered software engineers. Keeping job roles separate is important in keeping your staff from becoming overloaded with unnecessary tasks.

Don’t write code that is not also documented

This is a plain and simple programming 101 issue. Yes, it’s very simple. Your software engineers’ responsibilities are to write robust code, but also document everything they write. That’s their job responsibility and should be part of their job description. If they do not, cannot or are unwilling to document the code they write, they should be put on a performance review plan and without improvement, walked to the door. Without documentation, reverse engineering their code can take weeks for new personnel. Documentation is critical to your businesses continued success, especially when personnel changes. Think of this like you would disaster recovery. If you suddenly no longer had your current engineers available and you had to hire all new engineers, how quickly could the new engineers understand your application’s code enough to release a new version? This ends up a make or break situation. Documentation is the key here.

Thus, documentation must be part of any engineer’s responsibility when they write code for your company. Code review is equally important by management to ensure that the code not only seems reasonable (i..e, no gotos), but is fully documented and attributed to that person. Yes, the author’s name should be included in comments surrounding each section of code they write and the date the code was written. All languages provide ways to comment within the code, require your staff to use it.

Don’t expect your code to test itself or that your engineers will properly test it

Your software engineers are far too close to the code to determine if the code works correctly under all scenarios. Plain and simple, software doesn’t test itself. Use an independent quality testing group to ensure that the code performs as expected based on the design specifications. Yes, always test based on the design specifications. Clearly, your company should have a road map of features and exactly how those features are expected to perform. These features should be driven by customer requests for new features. Your quality assurance team should have a list of new all features being placed into each new release to write thorough test cases well in advance. So, when the code is ready, they can put the release candidate into the testing environment and run through their test cases. As I said, don’t rely on your software engineers to provide this level of test cases. Use a full quality assurance team to review and sign off on the test cases to ensure that the features work as defined.

Don’t expect code to write (or fix) itself

Here’s another one that would be seemingly self-explanatory. Basically, when a feature comes along that needs to be implemented, don’t expect the code to spring up out of nowhere. You need competent technical people who fully understand the design to write the code for any new feature. But, just because an engineer has actually written code doesn’t mean the code actually implements the feature. Always have test cases ready to ensure that the implemented feature actually performs the way that it was intended.

If the code doesn’t perform what it’s supposed to after having been implemented, obviously it needs to be rewritten so that it does. If the code written doesn’t match the requested feature, the engineer may not understand the requested feature enough to implement it correctly. Alternatively, the feature set wasn’t documented well enough before having been sent to the engineering team to be coded. Always document the features completely, with pseudo-code if necessary, prior to being sent to engineering to write actual code. If using an agile engineering approach, review the progress frequently and test the feature along the way.

Additionally, if the code doesn’t work as expected and is rolled to production broken, don’t expect that code to magically start working or that the production team has some kind of magic wand to fix the problem. If it’s a coding problem, this is a software engineering task to resolve. Regardless of whether or not the production team (or even a customer) manages to find a workaround is irrelevant to actually fixing the bug. If a bug is found and documented, fix it.

Don’t let your software engineers design features

Your software engineers are there to write the code based features derived from customer feedback. Don’t let your software engineers write code for features not on the current road map. This is a waste of time and, at the same time, doesn’t help get your newest release out the door. Make sure that your software engineers remain focused on the current set of features destined for the next release. Focusing on anything other than the next release could delay that release. If you’re wanting to stick to a specific release date, always keep your engineers focused on the features destined for the latest release. Of course, fixing bugs from previous releases is also a priority, so make sure they have enough time to work on these while still working on coding for the newest release. If you have the manpower, focus some people on bug fixing and others on new features. If the code is documented well enough, a separate bug fixing team should have no difficulties creating patches to fix bugs from the current release.

Don’t expect to create 100% perfect code

So, this one almost goes without saying, but it does need to be said. Nothing is ever bug free. This section is here is to illustrate why you need to design your application using a modular patching approach. It goes back to operations manageability (as stated above). Design your application so that code modules can drop-in replace easily while the code is running. This means that the operations team (or whomever is tasked to do your patching) simply drops a new code file in place, tells the system to reload and within minutes the new code is operating. Modular drop in replacements while running is the only way to prevent major downtime (assuming the code is fully tested). As an SaaS company, should always design your application with high availability in mind. Doing full code releases, on the other hand, should have a separate installation process than drop in replacement. Although, if you would like to utilize the dynamic patching process for more agile releases, this is definitely an encouraged design feature. The more easily you design manageability and rapid deployment into your code for the operations team, the less operations people you need to manage and deploy it.

Without the distractions of long involved release processes, the operations team can focus on hardware design, implementation and general growth of the operations processes. The more distractions your operations team has with regards to bugs, fixing bugs, patching bugs and general code related issues, the less time they have to spend on the infrastructure side to make your application perform its best. As well, the operations team also has to keep up with operating system patches, software releases, software updates and security issues that may affect your application or the security of your user’s data.

Don’t overlook security in your design

Many people who write code, write code to implement a feature without thought to security. I’m not necessarily talking about blatantly obvious things like using logins and passwords to get into your system. Although, if you don’t have this, you need to add it. It’s clear, logins are required if you want to have multiple users using your system at once. No, I’m discussing the more subtle but damaging security problems such as cross-site scripting or SQL injection attacks. Always have your site’s code thoroughly tested against a suite of security tools prior to release. Fix any security problems revealed before rolling that code out to production. Don’t wait until the code rolls to production to fix security vulnerabilities. If your quality assurance team isn’t testing for security vulnerabilities as part of the QA sign off process, then you need to rethink and restructure your QA testing methodologies. Otherwise, you may find yourself becoming the next Sony Playstation Store news headline at Yahoo News or CNN. You don’t really want this type of press for your company. You also don’t want your company to be known for losing customer data.

Additionally, you should always store user passwords and other sensitive user data in one-way encrypted form. You can store the last 4 digits or similar of social security numbers or the last 4 of account numbers in clear text, but do not store the whole number in either plain text, with two-way encryption or in a form that is easily derived (md5 hash). Always use actual encryption algorithms with reasonably strong one-way encryption to store sensitive data. If you need access to that data, this will require the user to enter the whole string to unlock whatever it is they are trying to access.

Don’t expect your code to work on terabytes of data

If you’re writing code that manages SQL queries or, more specifically, are constructing SQL queries based on some kind of structured input, don’t expect your query to return timely when run against gigabytes or terabytes of data, thousands of columns or billions of rows or more. Test your code against large data sets. If you don’t have a large data set to test against, you need to find or build some. It’s plain and simple, if you can’t replicate your biggest customers’ environments in your test environment, then you cannot test all edge cases against the code that was written. SQL queries have lots of penalties against large data sets due to explain plans and statistical tables that must be built, if you don’t test your code, you will find that these statistical tables are not at all built the way you expect and the query may take 4,000 seconds instead of 4 seconds to return.

Alternatively, if you’re using very large data sets, it might be worth exploring such technologies as Hadoop and Cassandra instead of traditional relational databases to handle these large data sets in more efficient ways than by using databases like MySQL. Unfortunately, however, Hadoop and Cassandra are noSQL implementations, so you forfeit the use of structured queries to retrieve the data, but very large data sets can be randomly accessed and written to, in many cases, much faster than using SQL ACID database implementations.

Don’t write islands of code

You would think in this day and age that people would understand how frameworks work. Unfortunately, many people don’t and continue to write code that isn’t library or framework based. Let’s get you up to speed on this topic. Instead of writing little disparate islands of code, roll the code up under shared frameworks or shared libraries. This allows other engineers to use and reuse that code in new ways. If it’s a new feature, it’s possible that another bit of unrelated code may need to pull some data from another earlier implemented feature. Frameworks are a great way to ensure that reusing code is possible without reinventing the wheel or copying and pasting code all over the place. Reusable libraries and frameworks are the future. Use them.

Of course, these libraries and frameworks need to be fully documented with specifications of the calls before they can be reused by other engineers in other parts of the code. So, documentation is critical to code reuse. Better, the use of object oriented programming allows not only reuse, but inheritance. So, you can inherit an object in its template form and add your own custom additions to this object to expand its usefulness.

Don’t talk and chew bubble gum at the same time

That is, don’t try to be too grandiose in your plans. Your team has limited time between the start of a development cycle and the roll out of a new release. Make sure that your feature set is compatible with this deadline. Sure, you can throw everything in including the kitchen sink, but don’t expect your engineering team to deliver on time or, if they do actually manage to deliver, that the code will work half as well as you expect. Instead, pair your feature sets down to manageable chunks. Then, group the chunks together into releases throughout the year. Set expectations that you want a certain feature set in a given release. Make sure, however, that that feature set is attainable in the time allotted with the number of engineers that you have on staff. If you have a team of two engineers and a development cycle of one month, don’t expect these engineers to implement hundreds of complex features in that time. Be realistic, but at the same time, know what your engineers are capable of.

Don’t implement features based on one customer’s demand

If someone made a sales promise to deliver a feature to one, and only one customer, you’ve made a serious business mistake. Never promise an individual feature to an individual customer. While you may be able to retain that customer based on implementing that feature, you will run yourself and the rest of your company ragged trying to fulfill this promise. Worse, that customer has no loyalty to you. So, even if you expend the 2-3 weeks day and night coding frenzy to meet the customer’s requirement, the customer will not be any more loyal to you after you have released the code. Sure, it may make the customer briefly happy, but at what expense? You likely won’t keep this customer as a customer any longer. By the time you’ve gotten to this level of desperation with a customer, they are likely already on the way out the door. So, these crunch requests are usually last-ditch efforts at customer retention and customer relations. Worse, the company runs itself ragged trying desperately to roll this new feature almost completely ignoring all other customers needing attention and projects, yet these harried features so completely end up as customized one-offs that no other customer can even use the feature without a major rewrite. So, the code is effectively useless to anyone other than the requesting customer who’s likely within inches of terminating their contract. Don’t do it. If your company gets into this desperation mode, you need to stop and rethink your business strategy and why you are in business.

Don’t forget your customer

You need to hire a high quality sales team who is attentive to customer needs. But, more than this, they need to periodically talk to your existing clients on customer relations terms. Basically, ask the right questions and determine if the customer is happy with the services. I’ve seen so many cases where a customer appears completely happy with the services. In reality, they have either been shopping around or have been approached by competition and wooed away with a better deal. You can’t assume that any customer is so entrenched in your service that they won’t leave. Instead, your sales team needs to take a proactive approach and reach out to the customers periodically to get feedback, determine needs and ask if they have any questions regarding their services. If a contract is within 3 months of renewal, the sales team needs to be on the phone and discussing renewal plans. Don’t wait until a week before the renewal to contact your customers. By a week out, it’s likely that the customers have already been approached by competition and it’s far too late to participate in any vendor review process. You need to know when the vendor review process happens and always submit yourself to that process for continued business consideration from that customer. Just because a customer has a current contract with you does not make you a preferred vendor. More than this, you want to always participate in the vendor review process, so this is why it’s important to contact your customer and ask when the vendor review process begins. Don’t blame the customer that you weren’t included in any vendor review and purchasing process. It’s your sales team’s job to find out when vendor reviews commence.

Part 2 | Chapter Index | Part 4

Tagged with: ,

Comments are encouraged under these rules: 1. No personal attacks allowed. 2. Comments with personal attacks will not be posted. 3. Please keep your words civil. Thank you for contributing!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: