This statement comes up almost daily when working with potential customers. Many phone calls gravitate toward it, especially when we talk about our monthly IaaS or Cloud Recovery pricing, or when we discuss AWS or Azure pricing in comparison.
My goal here is to relay what goes into designing your own private cloud at the SMB scale. It’s a rabbit hole topic for me so I will attempt to condense it down to the essentials while giving you a broad overview.
We will focus on a fictitious design requirement loosely based on my experience working with a recent potential customer. We will take those requirements and work through a rough design process and discuss what is involved in building your own private cloud at SMB scale.
The customer approached us with the need to copy all their Veeam backup data to a third-party cloud provider. In addition, they needed the ability to start those VMs in the cloud to verify that the backups are known good. The client made clear, however, that the VMs did not need to run in the cloud with any elevated level of performance. The VMs simply need to start and operate for verification.
Now the paragraph above can spiderweb into a larger conversation about Veeam backup/data verification, sandbox testing, etc.
So at this point we have some basic requirements.
- Require a secondary site to move/store backup data (location TBD).
- Secondary site must be compatible with or utilize Veeam software technologies.
- Secondary site must be built to start VMs from Veeam format.
We did not discuss any financial constraints at this early stage in the relationship. I can make a simple assumption that “cost effective” is the key term here. We will put that term into perspective as we continue this design process.
We did not discuss any technical constraints at this early stage in the relationship either. From my initial conversations it appeared that the IT staff supporting this theoretical private cloud may consist of a few individuals with a general to mature understanding of virtualization and recovery. It is often hard to ascertain this from a brief initial conversation. Some folks play their cards pretty close.
Rounding out the basic pre-design framework of Requirements, Constraints and Assumptions, I made one assumption based on past experience. If the requirement is to be able to start backup copies of VMs at a secondary site, the client will eventually want to run other testing procedures on those VMs in that site. That could be code, patch updates, test/dev, etc.
I would strongly advise that the secondary site also be built to handle VM recovery and quasi failover in case the primary site is offline for an extended period of time. The additional deployable assets this requires at the second site would add only nominally to the cost. Being able to operate VMs requires compute. In its simplest form compute means memory, CPU and disk (size and IOPS). Adding compute to the design, with its cost spread over the expected life and supportability of the platform, would probably make financial sense and align with the overall goals of this particular business. The question to answer is whether the value outweighs the cost to deploy and operate.
This is a theoretical exercise based on building your own private cloud to suit a theoretical need. We will not be diving super deep. (That’s reserved for billable design work!)
At this point we can move into Plan, Deliver, Operate. We will wrap Optimize and Change around it all.
I loosely follow MOF in my design process.
During the Planning stage we will work together to understand the business strategy and goals and how this initiative supports them. Financial aspects of the desired service are also reviewed during this phase. A budget is created that is in line with the desired outcome and realistic in terms of the expectations of the service. It is very common for the intended project to be killed at this point. Very often the financial constraints are not conducive to correctly building a reliable platform to deliver the services desired. I stress “correctly”, “reliable” and “desired” here. Anyone can build a cheap something. Building the service correctly, so that it tolerates failure, scales and performs, requires more effort and attention to detail.
Concerning attention to detail: our theoretical design will use vSphere and Veeam technologies. Logical diagrams will move to physical diagrams, and every aspect of the physical diagrams must be in line with interoperability requirements and multiple HCLs.
- The version of vSphere needs to be compatible with the desired versions and functionality of Veeam.
- Server storage adapters need to be validated for compatibility with vSphere, switching and the desired array platform.
- Cost effective vSphere licensing needs to be determined based on required feature sets.
Then there is the need to simply understand the service requirements in detail. In this case, what requirements do vSphere and Veeam have in order to operate this service? What kind of constraints are we working within?
Does the client have large full backups that need to be moved weekly? What is the current IOPS usage on the current storage platform? What kind of IOPS will be needed at the secondary site for any Veeam injections/re-injections and failover?
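The weekly full backup question can be roughed out with simple arithmetic before any hardware is discussed. The sketch below is an illustration only: the function names, the 0.8 link efficiency factor and the sample figures are assumptions, not measurements from this client.

```python
def transfer_hours(backup_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to copy backup_tb (decimal terabytes) over a link_mbps link.

    efficiency is an assumed fraction of the link that the copy job can
    actually use (protocol overhead, competing traffic). 0.8 is a placeholder.
    """
    if backup_tb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("backup_tb >= 0, link_mbps > 0 and 0 < efficiency <= 1 required")
    total_bits = backup_tb * 1e12 * 8              # terabytes -> bits
    usable_bits_per_sec = link_mbps * 1e6 * efficiency
    return total_bits / usable_bits_per_sec / 3600  # seconds -> hours


def fits_window(backup_tb: float, link_mbps: float, window_hours: float,
                efficiency: float = 0.8) -> bool:
    """True if the copy finishes inside the available backup window."""
    return transfer_hours(backup_tb, link_mbps, efficiency) <= window_hours
```

With these assumptions, a 10 TB full over a 1 Gbps link needs roughly 28 hours. That fits a weekend window but not an overnight one, which is exactly the kind of constraint that needs to surface during Planning.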
Other design requirement planning examples.
- Failover tolerance, average load and max load must be decided upon and delivered correctly at the compute layer.
- Security of the service must be accounted for. How is this service going to be secured and by whom? (IPS/AV/Patching, etc.)
- Support and management costs of the service. This includes internal support, vendor support, etc.
- Performance monitoring and availability tracking have to be designed and agreed upon.
- All operating policies and procedures are agreed upon and created.
- Service life expectancy is decided and planned around.
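The failover tolerance and load items in the list above reduce to a small capacity calculation at the compute layer. This is a minimal sketch, assuming memory is the limiting resource and all hosts are identical. The 80% utilization ceiling and the sample numbers are hypothetical.

```python
import math


def hosts_required(peak_load_gb: float, host_capacity_gb: float,
                   failures_tolerated: int = 1,
                   max_utilization: float = 0.8) -> int:
    """Number of identical hosts needed to carry peak load and survive failures.

    peak_load_gb       -- memory demanded by all VMs at max load
    host_capacity_gb   -- physical memory per host
    failures_tolerated -- spare hosts kept in reserve (1 gives N+1)
    max_utilization    -- assumed ceiling on how full a host should run
    """
    if peak_load_gb < 0 or host_capacity_gb <= 0:
        raise ValueError("load must be >= 0 and host capacity must be > 0")
    if failures_tolerated < 0 or not 0 < max_utilization <= 1:
        raise ValueError("failures_tolerated >= 0 and 0 < max_utilization <= 1 required")
    usable_per_host = host_capacity_gb * max_utilization
    hosts_for_load = math.ceil(peak_load_gb / usable_per_host)
    return hosts_for_load + failures_tolerated
```

For example, 600 GB of peak VM memory on 256 GB hosts works out to three hosts for the load plus one spare, so four hosts. Running the same calculation for CPU and storage IOPS and taking the largest answer is one reasonable way to size the cluster.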
Data Center / Colocation selection.
There will be many decisions to make and lots of planning if a third party facility is required.
- Location of facility.
- Other services offered by that facility.
- Internet connectivity (HSRP/BGP).
- Costs for space, power and cooling.
- Access to the facility.
- Initial power and bandwidth commit if using their connectivity services.
- Competency of the staff and management.
- Reputation of uptime and durability.
- SLA offered.
Money quickly starts to become an issue during the Planning stage, even when dealing with the most financially capable of companies. The Planning phase can quickly spiderweb into a full blast faucet of ideas that all sound amazing and all cost money. Cost containment and scope creep become the popular terms here.
Attention to detail and designing within product requirements and constraints are important to a successful delivery.
“How hard can it be?”
Well here is one small example that outlines vSphere port requirements and should be used when developing your security posture. This is one very small aspect of the overall design.
Network Port Diagram for vSphere 6.0
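A port list like the one in that diagram can be turned into a quick reachability check once the firewalls are in place. This is a rough sketch using only the Python standard library. The two ports shown (TCP 443 and TCP 902, both commonly listed for ESXi hosts) are a small sample chosen for illustration, and the vendor diagram remains the authority for the full list.

```python
import socket
from typing import Dict, Iterable

# Sample only, not a complete list. Labels are informal descriptions.
EXAMPLE_ESXI_PORTS = {
    443: "HTTPS management",
    902: "host management and data transfer",
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(host: str, ports: Iterable[int], timeout: float = 2.0) -> Dict[int, bool]:
    """Check each TCP port on host and map port number to reachability."""
    return {port: port_open(host, port, timeout) for port in ports}
```

Calling `check_ports("esxi01.example.internal", EXAMPLE_ESXI_PORTS)` (a placeholder host name) from the Veeam server's network segment would show whether the security design actually permits the traffic the products require. A TCP connect test says nothing about UDP ports, which need a different approach.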
Planning will continue through the Deliver phase. Especially project planning. Project planning and management will keep the project on track and ensure milestones and deliverables are being hit. Planning never really stops through the whole lifecycle of a product/service.
Rack stack and cabling will happen here as well. Something as simple as a Visio rack diagram plan must be followed correctly to ensure correct future rack and power growth. Cable management, power distribution, PDU/ATS configurations are just a few of the rack related items required.
All items will be procured and deployed. This includes security devices, networking, storage, compute, and software. All items must be configured for interoperability and best practice, and at the same time be in line with your application requirements.
The underlying infrastructure components and software need to be configured for basic use and optimized over time.
The service or application being deployed is delivered and configured. The application or service is brought online and integrated with other services or sites and prepped for testing.
All items deployed in the stack, both hardware and software, are tested for failure, failover, backup and recovery.
Don’t forget to properly configure the little things, like NTP and a validated SMTP relay via smart host out to your e-mail provider.
It is also during this phase that your planned support structure starts taking shape. Those responsible for various aspects of the service’s support are being engaged.
Help desk, application analysts, network engineers, security engineers, etc. are all doing their part not only to deploy the service but to optimize and support it during delivery.
Once all processes, procedures and documents are agreed upon, and the service is stabilized and optimized, we can move into the Operate phase.
Congratulations. The service is now in the Operate phase. Now you have to keep it there. What does that mean exactly? Here are a few examples of the duties performed at this stage.
- Performance monitoring.
- Change management.
- Future design and new product/software evaluation to support your service.
- Incident Management.
The Operate phase will continue through the predetermined life expectancy of the deployed service. This is often decided upon during the Planning phase, but it is very common for SMBs to simply leave it open ended. Initial budgeting and planning should include not only the costs to deploy and support but also long term support costs for future upgrades and replacements. Purchasing an array warranty extension at the end of year 3 may be a very costly expenditure for a small business. It is good to buy support for as long as possible with the initial purchase, but not for longer than the intended life expectancy of the product and your planned use of the purchased hardware.
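The point about warranty extensions is easier to see when every cost is spread over the planned service life. This is a minimal sketch with made-up figures. The function name and every dollar amount are assumptions for illustration only.

```python
from typing import Dict, Optional


def monthly_cost_of_ownership(capex: float, annual_opex: float, life_years: int,
                              one_off_costs: Optional[Dict[int, float]] = None) -> float:
    """Average monthly cost of a platform over its planned service life.

    capex         -- initial purchase, including any support bought up front
    annual_opex   -- recurring yearly costs (colocation, licensing, staff time)
    life_years    -- planned life expectancy of the service
    one_off_costs -- maps a year number to a one-off cost in that year,
                     for example a warranty extension; years beyond
                     life_years are ignored
    """
    if life_years <= 0:
        raise ValueError("life_years must be positive")
    extras = sum(cost for year, cost in (one_off_costs or {}).items()
                 if 1 <= year <= life_years)
    total = capex + annual_opex * life_years + extras
    return total / (life_years * 12)
```

With a hypothetical 120,000 purchase, 24,000 per year to operate and 15,000 warranty extensions in years 4 and 5, a five year life averages 4,500 per month. That monthly figure is the one to hold up against a provider's monthly IaaS quote.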
The process outlined can take months depending on the cooperation of all parties involved. During that time market conditions, pricing, business goals, financial resources and the original business drivers for the service can all change rapidly. Being familiar with all the processes involved is required to get to market quickly and efficiently. Every service has a customer, and that customer can be internal or external. The goal of most companies and their service delivery is to reach that customer as quickly as possible and in a positive manner.
Cloud computing and the ability to deliver compute resources on demand have cut service delivery time drastically. It has also shifted the financial burden into op-ex. While operating costs can be fairly predictable, bursting of services can increase the monthly rate of expenditure. Cost containment is still necessary in cloud computing. Some may argue even more so than with physical deployment. Services that charge for Gets/Puts/TX/RX can add up in a sneaky fashion.
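To see how metered charges creep up, it helps to break a monthly bill into its line items. The sketch below is illustrative only: the `Rates` fields and every rate used with it are invented placeholders, not any provider's actual price list.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices; substitute the provider's published rates."""
    per_gb_month: float    # storage, per GB stored per month
    per_1k_puts: float     # write requests, per 1,000
    per_1k_gets: float     # read requests, per 1,000
    per_gb_egress: float   # data transferred out, per GB


def monthly_bill(rates: Rates, stored_gb: float, puts: int, gets: int,
                 egress_gb: float) -> Dict[str, float]:
    """Break a month's usage into line items plus a total."""
    lines = {
        "storage": stored_gb * rates.per_gb_month,
        "puts": puts / 1000 * rates.per_1k_puts,
        "gets": gets / 1000 * rates.per_1k_gets,
        "egress": egress_gb * rates.per_gb_egress,
    }
    lines["total"] = sum(lines.values())
    return lines
```

Running this with expected usage and again with a burst month (a full restore test, for example, drives both Gets and egress) shows how far the bill can move from the predictable baseline.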
Operating your service in the cloud can shift time and financial investment from underlying redundancy and resiliency to application design redundancy and resiliency. You can now focus on keeping your service available at the application layer while allowing a cloud provider to handle the foundation and connectivity of the physical world below the software.