operations | Random Thoughts

Software Engineering and Architecture

Posted in botch, business, Employment by commorancy on October 21, 2018

Here’s a subject with which I’m all too familiar and which is in need of commentary. Since my profession is technical in nature, I’ve definitely run into various issues regarding software engineering, systems architecture and operations. Let’s explore.

Software Engineering as a Profession

One thing that software engineers like is to be able to develop their code on their local laptops and computers. That’s great for rapid development, but it causes many problems later, particularly when it comes to security, deployment, systems architecture and operations.

For a systems engineer / devops engineer, the problem arises when that code needs to be productionalized. This is fundamentally a problem with pretty much any newly designed software system.

Having come from a background of systems administration, systems engineering and devops, I know there is a lot to consider when deploying freshly designed code.

Designing in a Bubble

I’ve worked in many companies where development occurs offline on a notebook or desktop computer. The software engineer has built out a workable environment on their local system. The problem is, this local environment doesn’t take into account certain constraints which may be in place in a production environment, such as internal firewalls, ACLs, web caching systems, software version differences, lack of compilers and other such security or software constraints.
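To make those constraints concrete, here’s a minimal sketch of a preflight check that could run on the target host before a first deployment. It’s only an illustration under my own assumptions: the function name `preflight` and its parameters are hypothetical, and a real environment would have its own list of things to verify.

```python
import shutil
import socket
import sys


def preflight(required_tools, min_python, endpoints, timeout=2.0):
    """Return a list of human-readable problems; an empty list means all checks passed.

    All arguments are hypothetical examples, not taken from any real deployment:
      required_tools: executable names the code shells out to (e.g. a compiler)
      min_python:     minimum interpreter version as a (major, minor) tuple
      endpoints:      (host, port) pairs that must be reachable through firewalls/ACLs
    """
    problems = []

    # Software version differences: compare the running interpreter to the minimum.
    if sys.version_info[:2] < tuple(min_python):
        problems.append(
            "python %d.%d is older than required %d.%d"
            % (sys.version_info[0], sys.version_info[1], min_python[0], min_python[1])
        )

    # Missing compilers or other tools: look each one up on PATH.
    for tool in required_tools:
        if shutil.which(tool) is None:
            problems.append("required tool not found on PATH: %s" % tool)

    # Firewalls and ACLs: attempt a real TCP connection to each dependency.
    for host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError as exc:
            problems.append("cannot reach %s:%d (%s)" % (host, port, exc))

    return problems
```

Running something like this on a production-like host during development, rather than on the notebook, is the point: the list of problems shows up while the code is still being written.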

What this means is that far too many times, deploying the code for the first time is fraught with problems. Specifically, problems that were not encountered on the engineer’s notebook… and problems that sometimes fail extremely badly. In fact, many of these failures are sometimes silent (the worst kind), where everything looks like it’s functioning normally, but the code is sending its data into a black hole and nothing is actually working.
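One way to keep a failure from staying silent is to insist on an explicit acknowledgement from whatever receives the data. The sketch below assumes a hypothetical `send` callable that returns the number of records the downstream system confirmed; both `send_with_ack` and `DeliveryError` are names I made up for illustration.

```python
class DeliveryError(RuntimeError):
    """Raised when the downstream system does not confirm receipt."""


def send_with_ack(send, payload):
    """Send payload and fail loudly unless every record is acknowledged.

    `send` is a hypothetical callable that returns how many records the
    downstream system confirmed. A black hole typically returns nothing,
    which this function treats as an error instead of as success.
    """
    expected = len(payload)
    confirmed = send(payload)
    if confirmed != expected:
        raise DeliveryError(
            "sent %d records but downstream confirmed %r" % (expected, confirmed)
        )
    return confirmed
```

With a check like this, a firewall that quietly drops traffic produces an exception on day one instead of a missing-data mystery weeks later.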

This is the fundamental problem with designing in a bubble without any constraints.

I understand that building something new is fun and challenging, but not taking into account the constraints the software will be under when finally deployed is naive at best and reckless at the very worst. It also makes life as a systems engineer / devops engineer a living hell for several months until all of these little failures are sewn shut.

It’s like receiving a garment that looks complete, but on inspection, you find a bunch of holes all over that all need to be fixed before it can be worn.

Engineering as a Team

To me, this situation means that the software engineer is not a team player. They might be playing on the engineering team, but they’re not playing on the company team. Part of software design is designing for the full use case of the software, including not only code authoring, but systems deployment.

If systems deployment isn’t your specialty as a software engineer, then bring in a systems engineer and/or devops engineer to help guide your code during the development phase. Designing without taking the full scope of that software release into consideration means you didn’t earn your salary and you’re not a very good software engineer.

Yet, Silicon Valley is willing to pay “Principal Engineers” top dollar for these folks failing to do their jobs.

Building and Rebuilding

It’s entirely a waste of time to get to the end of a software development cycle and claim “code complete” when that code is nowhere near complete. I’ve had so many situations where software engineers toss their code to us as complete and expect the systems engineer to magically make it all work.

It doesn’t work that way. Code works when it’s written in combination with understanding of the architecture where it will be deployed. Only then can the code be 100% complete because only then will it deploy and function without problems. Until that point is reached, it cannot be considered “code complete”.

Docker and Containers

More and more, systems engineers want to get out of the long, drawn-out business of fitting square code into a round production hole. Eventually, after much time has passed, molding the code into that round hole is possible. This usually takes months. Months that could have been avoided if the software engineer had designed the code in an environment where the production constraints exist.

That’s part of the reason for containers like Docker. When a container like Docker is used, the whole container can then be deployed without thought to square pegs in round holes. Instead, whatever flaws are in the Docker container are there for all to see because the developer put them there.

In other words, the middle folks who take code from engineering and mold it onto production gear don’t relish the thought of ironing out hundreds of glitchy problems until it seamlessly all works. Sure, it’s a job, but at some level it’s also a bit janitorial, wasteful and unnecessary.

Planning

Part of the reason for these problems is the delineation between the engineering teams and the production operations teams. Because many organizations separate these two functional teams, it forces the above problem. Instead, these two teams should be merged into one and work together from project and code inception.

When a new project needs code to be built that will eventually be deployed, the production team should be there to move the software architecture onto the right path and be able to choose the correct path for that code all throughout its design and building phases. In fact, every company should mandate that its software engineers be a client of the operations team. Meaning, they’re writing code for operations, not the customer (even though the features eventually benefit the customer).

The point here is that the code’s functionality is designed for the customer, but deploying and running that code is entirely for the operations team. Yet, so many software engineers don’t give a single thought to how much the operations team will be required to support that code going forward.

Operational Support

For every component needed to support a specific piece of software, there needs to be a likewise knowledgeable person on the operations team to support that component. Not only do they need to understand that it exists in the environment, they need to understand its failure states, its recovery strategies, its backup strategies, its monitoring strategies and everything else in between.

This is also yet another problem that software engineers typically fail to address in their code design. Ultimately, your code isn’t just to run on your notebook for you. It must run on a set of equipment and systems that will serve perhaps millions of users. It must be written in ways that are fail safe, recoverable, redundant, scalable, monitorable, deployable and stable. These are the things that the operations team folks are concerned with and that’s what they are paid to do.
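As one small example of what “monitorable” can mean in practice, here’s a sketch of a health report that a component could expose to the operations team. The function name `run_health_checks` and the component names in the usage are hypothetical; the idea is only that each dependency gets a named check and that a broken check is reported as a failure state rather than crashing the report.

```python
def run_health_checks(checks):
    """Run each named check and summarize the results.

    `checks` maps a component name to a zero-argument callable that returns
    True when the component is healthy. A check that raises is recorded as
    unhealthy, with the exception text kept for whoever is on call.
    """
    results = {}
    for name, check in checks.items():
        try:
            healthy = bool(check())
            detail = "ok" if healthy else "check returned false"
        except Exception as exc:  # a broken check is itself a failure state
            healthy = False
            detail = "check raised %s: %s" % (type(exc).__name__, exc)
        results[name] = {"healthy": healthy, "detail": detail}

    # The component as a whole is healthy only if every dependency is.
    overall = all(r["healthy"] for r in results.values())
    return {"healthy": overall, "components": results}
```

A monitoring system can poll a report like this and page on the specific component that failed, which is far more useful at 3 AM than a generic “service down” alert.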

Each new code deployment makes the environment just that much more complex.

The Stacked Approach

This is an issue that happens over time. No software engineer wants to work on someone else’s code. Instead, it’s much easier to write something new and from scratch. It’s easy for the software engineer, but it’s difficult for the operations team. As these new pieces of code get written and deployed, it drastically increases the technical debt and burden on the operations staff. Meaning, it pushes the problems off onto the operations team to continue supporting more and more and more components if none ever get rewritten or retired.

In one organization where I worked, we had such an approach to new code deployment. It made for a spider’s web mess of an environment. We had so many environments and so few operations staff to support them that the on-call staff were overwhelmed by the number of incessant pages from so many of these components.

That’s partly because the environment was unstable, but also partly because it was a house of cards. You shift one card and the whole thing tumbles.

Software stacking might seem like a good strategy from an engineering perspective, but then the software engineers don’t have to first line support it. Sometimes they don’t have to support it at all. Yes, stacking makes code writing and deployment much simpler.

How many times can an engineering team do this before the house of cards tumbles? Software stacking is not an ideal any software engineering team should endorse. In fact, it simply comes down to laziness. You’re a software engineer because writing code is hard, not because it is easy. You should always do the right thing even if it takes more time.

Burden Shifting

While this is related to software stacking, it is separate and must be discussed separately. We called this problem, “Throwing shit over the fence”. It happens a whole lot more often than one might like to realize. When designing in a bubble, it’s really easy to call “code complete” and “throw it all over the fence” as someone else’s problem.

While I understand this behavior, it has no place in any professionally run organization. Yet, I’ve seen so many engineering team managers endorse this practice. They simply want their team off of that project because “their job is done”, so they can move them onto the next project.

You can’t just throw shit over the fence and expect it all to just magically work on the production side. Worse, I’ve had software engineers actually ask for my input on the use of specific software components in their software design. Then, when their project failed because that component didn’t work properly, they threw me under the bus for that choice. Nope, that’s not my issue. If your code doesn’t work, that’s a coding and architecture problem, not a component problem. If that open source component didn’t work in real life for other organizations, it wouldn’t be distributed around the world. If a software engineer can’t make that component work properly, that’s a coding and software design problem, not an integration or operational problem. Choosing software components should be the software engineer’s choice to use whatever is necessary to make their software system work correctly.

Operations Team

The operations team is the lifeblood of any organization. If the operations team isn’t given the tools to get their job done properly, that’s a problem with the organization as a whole. The operations team is the third hand recipient of someone else’s work. We step in and fix problems many times without any knowledge of the component or the software. We do this sometimes by deductive logic, trial and error, sometimes by documentation (if it exists) and sometimes with the help of a software engineer on the phone.

We use all available avenues at our disposal to get that software functioning. In the middle of the night the flow of information can be limited. This means longer troubleshooting times, depending on the skill level of the person triaging the situation.

Many organizations treat their operations team as a bane, as a burden, as something that shouldn’t exist, but does out of necessity. Instead of treating the operations team as second class citizens, treat this team with all of the importance that it deserves. This degrading view typically comes top down from the management team. The operations team is not a burden nor is it simply there out of necessity. It exists to keep your organization operational and functioning. It keeps customer data accessible, reliable, redundant and available. It is responsible for long term backups, storage and retrieval. It’s responsible for the security of that data and making sure spying eyes can’t get to it. It is ultimately responsible for making sure the customer experience remains at a high standard of excellence.

If you recognize this problem in your organization, it’s on you to try and make change here. Operations exists because the company needs that job role. Computers don’t run themselves. They run because of dedicated personnel who make it their job and passion to make sure those computers stay online, accessible and remain 100% available.

Your company’s uptime metrics are directly impacted by the quality of your operations team staff members. These are the folks using the digital equivalent of chewing gum and shoelaces to keep the system operating. They spend many a sleepless night keeping these systems online. And, they do so without much, if any thanks. It’s all simply part of the job.

Software Engineer and Care

It’s on each and every software engineer to care about their fellow co-workers. Tossing code over the fence assuming there’s someone on the other side to catch it is insane. It’s an insanity that has run for far too long in many organizations. It’s an insanity that needs to be stopped and the trend needs to reverse.

In fact, by merging the software engineering and operations teams into one, it will stop. It will stop by merit of having the same bosses operating both teams. I’m not talking about at a VP level only. I’m saying that software engineering managers need to take on the operational burden of the components they design and build. They need to understand and handle day-to-day operations of these components. They need to wear pagers and understand just how much operational work their component is.

Only then can engineering organizations change for the positive.

As always, if you can identify with what you’ve read, I encourage you to like and leave a comment below. Please share with your friends as well.


Tagged with: administration, architecture, computers, datacenter, design, development, engineering, operations, software, systems, technology

Online ordering: Some companies just don’t get it

Posted in shopping, technologies by commorancy on December 12, 2010

In the past week, I’ve run into two different companies that obviously haven’t the first clue about running their online presence. I’ll bet that this is just the tip of the iceberg, but there it is.

Online ordering with store pickup

Fry’s Electronics doesn’t get it. The point of online ordering with store pickup is to save time. Unfortunately, using Fry’s store pickup by ordering online saves you no time. In fact, it takes more time than just buying directly in the store and leaves more questions than answers.

I found an item on the Frys.com web site that I wanted to buy and noticed they now offered store pickup. I thought, “Great”. So, I proceeded to place the order online. Unfortunately, I didn’t have a profile with Frys.com, so I had to create one along with entering shipping and billing info, credit card number and various other information they required. This usually takes about 5 minutes to complete. Granted, it doesn’t take that long to enter this information, but you’ll soon see that this time was completely wasted.

So, I enter the information they require, choose my store for pickup and click ‘Place Order’ like you normally do on any e-commerce site. So, the order is all placed, I have my receipt in hand and on the receipt it says to remember to bring the card you used to the store. I think, “No problem”. I ordered after hours. So, I knew that I would have to pick up the order the next day.

The next day I take my printed receipt with the order number to the store, like they request. I walk into the store and ask where to pick up online orders.

First mistake

The door greeter tells me to get in line and pick up the online ordered item at any cashier in the front. I thought, “Uh oh, this is not starting off well”. No dedicated desk means the cashiers will be completely inexperienced in this process and, to my lack of surprise, they were inexperienced. Anyway, I step up to the cashier and hand her the online receipt. She proceeds to type something into the register, looks confused about something and then tells me to hold on while she goes and locates the order.

Second mistake

Twenty minutes later, after wandering around and disappearing, she finally comes back with the item in hand. I could have wandered the store, found the item, visited a cashier and exited Fry’s in the time it took her to locate the item.

Third mistake

With item in hand, she proceeds to tell me that I need to finish paying for the order at her station. I’m thinking, “What?” I had thought I already paid on the Frys.com web site as I was given a fully completed receipt for the order with a valid order number. So, I attempt to validate this information and ask, “I have to pay again? I thought I already paid on the web site”. She proceeds to explain that it’s not actually an order but a ‘reservation’ for an item. I asked, then why do I have to give fully detailed information (billing, shipping, credit card, CVV, etc) for a reservation? Of course, she’s a non-native English speaker and plays dumb like she didn’t understand what I said. So, I try to verify this again and she says that I won’t be double-charged (which is, of course, my first thought considering I had to provide my full and complete credit card info).

So, not only did they waste my time online asking for information they didn’t need to create a ‘reservation’, the cashier wasted 20 minutes trying to locate the item in the store which wasn’t picked and stored properly from my order. Worse, after walking out of the store, I still have no idea if my card is to be charged twice.

I head home and call Frys.com to clarify what the hell went on. I explained that what they are doing is less than clear and the whole process is time wasteful. Every other online order with store pickup system I’ve used at other stores charges for the order online and then only requires identification to pickup at the store. They might or might not even print a receipt. But, you definitely don’t pay for the item in the store like Fry’s requires.

Fry’s made major mistakes in this process: wasting my time by making me enter all of that information, not properly picking the item, which required the cashier to wander the store in search of it, and then requiring the consumer to pay at the register for an item that already appears to have been paid for. The additional mistake that Fry’s made was not having a dedicated pickup desk to handle online pickups. There is no reason to require the consumer to stand in line for a cashier. Online ordering with store pickup is supposed to save time. In fact, I probably doubled the amount of time that was needed to get the item. I would have been better off just heading to the store, finding the item and heading up to the cashiers to pay. What a waste.

Out of stock ordering

Virgin Mobile doesn’t get it. This issue isn’t limited to Virgin Mobile; it just happens to be the most recent example of this problem. So, I decide I want to buy one of Virgin Mobile’s MiFi 2200 devices. I visit the site and try to place the item in my cart. Instead, I see a red error message that says ‘Sorry, that item is currently unavailable’. It doesn’t say anything about being out of stock. Just that it is unavailable (whatever that means). Ok, here’s the issue. If the item is ‘Out of Stock’, that’s fine. Just tell us this. No cryptic messages.

First Mistake

Even if the item is out of stock, but you know you’ll have more back in stock tomorrow, then take the order against the future stock. The mistake here is that Virgin has lost a sale. I may not come back tomorrow and purchase. I want to purchase today. I made the decision to purchase today. Tomorrow I may change my mind and go with something else. In fact, I may go with something else simply from the stupid fact that Virgin Mobile wouldn’t sell it even when it’s ‘Out of Stock’.

Second Mistake

I called the sales line and the ‘sales rep’ proceeded to transfer me to the ‘Broadband help desk’. Where they transferred my call is not an order line. It’s a help desk / customer service portal. Nowhere on the line does it say ‘Press 1 for sales’. In fact, it doesn’t mention sales anywhere on the line. So, I press on and get through to an operator. The first time I call, the representative on the ‘help desk’ tells me that there is web site trouble and I should order tomorrow (see Virgin Mobile first mistake above). I call back and the second person says the item is ‘Out of Stock’ and they should have them in ‘tomorrow’.

So, I’m at a loss. If you’re in a company selling online, an item is out of stock but you know it will be back in stock tomorrow, why would you want to prevent taking orders against that future stock? I mean, seriously, this is stupid. Just tell the consumer when they should be back in stock. The consumer can make the decision to wait or not. If you prevent ordering altogether, you’re losing sales.
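Taking orders against future stock isn’t complicated logic, either. Here’s a minimal sketch of what I mean, with everything in it being my own assumption: the function `place_order`, its parameters and the status strings are hypothetical and have nothing to do with how any real retailer’s system works.

```python
from datetime import date


def place_order(stock_on_hand, restock_date, quantity, today):
    """Decide how to handle an order instead of refusing it outright.

    All parameters are hypothetical: stock_on_hand is the number of units
    available now, and restock_date is the date more stock is expected
    (or None if no date is known).
    """
    if quantity <= stock_on_hand:
        return {"status": "confirmed", "ships_on": today,
                "message": "In stock."}

    # Out of stock, but a restock date is known: accept the order and say when.
    if restock_date is not None and restock_date >= today:
        return {"status": "backordered", "ships_on": restock_date,
                "message": "Out of stock. Expected back on %s; order accepted."
                           % restock_date.isoformat()}

    # No restock date: still tell the buyer plainly what is going on.
    return {"status": "out_of_stock", "ships_on": None,
            "message": "Out of stock. No restock date is known yet."}
```

Even the last branch beats a cryptic ‘unavailable’ message, because the buyer at least knows why the order can’t be taken.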

You would think companies the size of Fry’s and Virgin Mobile would have their act together, but they don’t. Companies wonder why their sales suck, yet they don’t look at these convoluted processes that don’t work and that throw roadblocks in front of the buyer. So, instead of the buyer buying, we walk away and don’t buy.

Retailers, wake up. Just because you think a process is working for you doesn’t mean it’s working for the consumer. You need to reevaluate just how it impacts the consumer.

Tagged with: operations, procedures, retail, stupid

2 comments
