An Elegant Serverless Solution for Account and Network Management on Hybrid Cloud
New generation organizations are building their businesses natively on the public cloud. With ever-increasing offerings from cloud providers, intelligently managing the organization’s accounts and controlling the visibility of individual accounts is a daunting task. Building a sustainable, cost-effective and reliable networking backbone for data flow is critical to these companies. Increasingly, these companies also provision on-premises infrastructure to meet security, compliance, edge latency or other SLA requirements. This combination has pushed them to adopt a hybrid cloud strategy.
Businesses that were not born in the cloud are not primed to take advantage of these latest digital technologies. Remaining anchored to traditional computing, storage and networking backbones means losing the competitive edge to more agile, responsive and innovative organizations. For these businesses, the transformation to a hybrid cloud has become a question of “when”, not “whether”.
Rahi Systems Inc., a global IT solutions provider, specializes in helping companies that want to stay ahead adopt hybrid cloud rapidly. Among many other advantages, hybrid cloud transformation increases revenue by decreasing the lead time for deployment, cuts costs by managing resources better and shortens the resolution cycle. It also improves the customer experience by continuously improving measurable indices such as uptime, response time, error rate and availability.
Lumina Networks, a new generation organization and a Rahi Systems Inc. customer, builds its business on the Amazon Web Services (AWS) platform. The customer required a robust solution to manage individual AWS accounts and the VPCs created in them: a single source of truth for configuring and administering security and network policies. Part of the request was a security model spanning all accounts that secures both east-west and north-south traffic, with VPCs organized into environment groups so that resource access and isolation can be managed for traffic flowing between environments. The solution also provides secure, granular remote access to VPC resources through a software-defined VPN.
In addition to these requirements, the orchestration had to be monitored for system health and performance through a customized solution built on AWS tools such as Lambda, CloudTrail, CloudWatch and SNS. The monitoring setup exists to optimize key resources, manage firewall policies and VPN access across accounts and introduce automation as required, together with the ability to send advance alerts about likely outages with a certain degree of accuracy and to recover automatically after outages.
Rahi Systems specializes in solutions for companies with complex requirements like Lumina Networks, building serverless solutions for network automation and monitoring, data flow orchestration and account management in hybrid cloud environments built on public cloud platforms such as AWS. The network backbone is delivered using IaaS from AWS, and Lambda is used to relentlessly automate management tasks, including tasks with complex dependencies as well as simple ‘if this then that’ (IFTTT) logic.
Fig 1: Development phases of serverless solution
The above picture shows the four distinct stages of development of an elegant serverless solution for Intelligent Account and Data Flow Orchestration on Hybrid cloud with AWS.
2.1 Account Management
It is important to dive deep and spend some time upfront to ensure the AWS account setup is done right, in a consistent and scalable manner, especially with multiple accounts. Done right, it saves rework and replanning down the track as the number of AWS accounts in the organization grows. This may not be the most exciting part of AWS, but it is an important part to get right.
Configuring multiple users into a single account is neither practical nor scalable, and poses a challenge in enforcing accountability. Organizations quickly adopt a multi-account structure which helps with:
- Cost Optimization and billing
- Security and governance
- Controlling workloads
- Resource grouping
- Defining business units
To address these needs, we introduce two new artifacts:
- Controlled Account
- Operational Environment
Rahi Systems spends a considerable amount of time on this process with customers, as it is an important founding step. For example, in Lumina’s use case an account vending approach made account provisioning convenient and scalable.
Fig 2: AWS Organization
AWS account admins at the organization level lose complete control of a vended account if it is created with an individual’s email address. We therefore introduce the concept of a controlled account. Controlled accounts are AWS accounts created for the purpose of handing out to individual users in the company. Instead of the individual’s email address, an alias email address or an address in a distribution group that belongs to the AWS account admins is used to create the new account. The individual user is then added to the newly created AWS account with near-admin rights, which gives them the desired perception of complete control.
The business purpose determines the operational environment of an account. An operational environment is not an AWS artifact; it is a collection of accounts that share similar security, compliance and data flow restrictions and requirements. For example, in Lumina’s use case the controlled accounts were segregated simply into Production (Prod), Development (Dev) and Testing (Test) environments. To recap, accounts whose root user email address belongs to an IT admin rather than to an individual user are referred to as controlled accounts.
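A minimal sketch of vending a controlled account from the organization’s management account with a boto3 Organizations client. It assumes the admins’ distribution group accepts plus-addressed aliases (such as aws-admins+dev-01@example.com), so each account gets a unique root email that still reaches the admins, and it records the operational environment as an account tag. The addresses, the `Environment` tag key and the helper names are illustrative, not part of the original solution.

```python
def controlled_account_email(group_address: str, account_name: str) -> str:
    """Derive a unique root email for a new account from the admins' group address.

    Assumes the mail system delivers "local+suffix@domain" to the group mailbox.
    """
    local, domain = group_address.split("@", 1)
    suffix = "-".join(account_name.lower().split())
    return f"{local}+{suffix}@{domain}"


def create_controlled_account(org_client, group_address: str, account_name: str,
                              environment: str) -> str:
    """Request a controlled account and return the CreateAccount request id.

    org_client is expected to be boto3.client("organizations") in the
    management account. Account creation is asynchronous, so callers poll
    describe_create_account_status(CreateAccountRequestId=...) afterwards.
    """
    response = org_client.create_account(
        Email=controlled_account_email(group_address, account_name),
        AccountName=account_name,
        # Role the admins assume into the new account (the AWS default name).
        RoleName="OrganizationAccountAccessRole",
        # Operational environment (Prod/Dev/Test) recorded as an account tag.
        Tags=[{"Key": "Environment", "Value": environment}],
    )
    return response["CreateAccountStatus"]["Id"]
```

Once the account is active, the individual user would be added inside it (for example as an IAM user) with near-admin permissions, while the root email stays with the admins.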
Rahi Systems recommends configuring the Organization accounts according to a few simple guidelines:
- Create a controlled account with an email address that is a distribution group.
- When required, repurpose a controlled account as an individual user account by creating a user with the individual’s email address and applying identity-based policies.
- Designate certain controlled accounts to only host network management devices.
- Also, designate and publish the controlled accounts that host common applications.
- Assign a distinct CIDR range to each controlled account.
- Set up CloudTrail logging across accounts to enable automation.
- Set up account permissions that allow Lambda functions to execute in the default VPC.
- Assign each account to an environment.
- Choose one controlled account as the master (central) account for executing the Lambda functions that manage, monitor and automate across accounts.
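The CIDR guideline is easy to automate. A minimal sketch, assuming a single organization-wide supernet (10.0.0.0/8 here) from which each controlled account receives the next free /16; the supernet, prefix length and account names are illustrative.

```python
import ipaddress


def allocate_account_cidrs(supernet: str, accounts: list, prefixlen: int = 16) -> dict:
    """Give every controlled account its own non-overlapping CIDR block.

    Blocks are carved from the supernet in order, so the result is
    deterministic for a given account ordering.
    """
    blocks = ipaddress.ip_network(supernet).subnets(new_prefix=prefixlen)
    allocation = {}
    for account in accounts:
        block = next(blocks, None)
        if block is None:
            raise ValueError(f"{supernet} has no free /{prefixlen} left for {account}")
        allocation[account] = str(block)
    return allocation


if __name__ == "__main__":
    print(allocate_account_cidrs("10.0.0.0/8", ["prod-net", "dev-01", "test-01"]))
    # {'prod-net': '10.0.0.0/16', 'dev-01': '10.1.0.0/16', 'test-01': '10.2.0.0/16'}
```

Because the blocks never overlap, any two VPCs can later be peered or attached to the same transit gateway without readdressing.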
AWS Control Tower is a service for automated landing zone setup and governance using guardrails. Since its general availability in Q2 2019, it can address some of the issues above.
2.2 Network Design for VPC Management
A Virtual Private Cloud (VPC) resembles a traditional data center network: it provides isolated sections of the cloud in which to launch resources in a virtual network tailored to your business purpose, with complete control over route tables, gateways and IP ranges. For large enterprises the number of VPCs grows quickly, especially when the account vending methodology described above is used. VPC peering is a quick and easy way for VPCs to communicate with each other. A VPC peering connection is a networking connection between two VPCs that enables you to route traffic between them using private IPv4 or IPv6 addresses. Instances in either VPC can communicate with each other as if they were within the same network. You can create a VPC peering connection between your own VPCs, or with a VPC in another AWS account.
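A minimal sketch of a cross-account peering with boto3-style EC2 clients, assuming both VPCs are in the same region and that the caller already holds an EC2 client for each account (for example, built from assumed-role credentials). The dictionary keys describing each VPC are hypothetical. Note the last step: a peering connection carries no traffic until both route tables point at it.

```python
def peer_vpcs(requester_ec2, accepter_ec2, requester: dict, accepter: dict) -> str:
    """Peer two VPCs that may live in different AWS accounts.

    requester/accepter are dicts with the hypothetical keys vpc_id,
    account_id, cidr and route_table_id.
    """
    # 1. The requester account asks for the peering connection.
    pcx = requester_ec2.create_vpc_peering_connection(
        VpcId=requester["vpc_id"],
        PeerVpcId=accepter["vpc_id"],
        PeerOwnerId=accepter["account_id"],
    )["VpcPeeringConnection"]["VpcPeeringConnectionId"]

    # 2. The owner of the peer VPC accepts it. Production code would first wait
    #    with the EC2 waiter "vpc_peering_connection_exists".
    accepter_ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx)

    # 3. Each side routes the other side's CIDR through the peering connection.
    requester_ec2.create_route(RouteTableId=requester["route_table_id"],
                               DestinationCidrBlock=accepter["cidr"],
                               VpcPeeringConnectionId=pcx)
    accepter_ec2.create_route(RouteTableId=accepter["route_table_id"],
                              DestinationCidrBlock=requester["cidr"],
                              VpcPeeringConnectionId=pcx)
    return pcx
```

Security groups on both sides still have to allow the traffic; the routes only make the path exist.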
Rahi Systems helps simplify the network design and automate network management with serverless utilities. The network is architected to be resilient, fault tolerant and cost efficient, especially for customers who do not have dedicated network professionals to design, size and manage their network. These serverless utilities are part of a control plane architected to help customers manage the entire life cycle of their VPCs.
As a first step, we work with our customers to discover the current design and its complexity, then evaluate and recommend an efficient solution for managing VPCs, routing, IP allocation and traffic engineering.
Consider a fairly complex network provisioned within AWS, with n VPCs using m VPC peering connections to communicate with each other. Traffic across the VPCs is managed and controlled by a network administrator using security groups and network ACLs. With every new VPC, the peering connections, route tables, security groups and ACLs must be configured and updated manually by the administrator. To ease the deployment and management of such complex networks, Rahi Systems has designed and deployed solutions built around AWS Transit Gateway. A transit gateway considerably simplifies management by using a centralized, hub-and-spoke architecture to handle VPC connections and control communication across VPCs. This significantly reduces operational cost and the number of VPC connections. For example, Lumina Networks needed a technical solution that would let them manage their network effortlessly and cut costs. This business case was resolved by deploying the simple setup below:
Fig 3. Network Infrastructure Design
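The saving from a hub-and-spoke design follows from simple counting. VPC peering is not transitive, so a full mesh of n VPCs needs one peering per pair, n(n-1)/2 in total, and adding one more VPC means n new peerings plus route and security group updates on every peer. With a transit gateway, each VPC needs a single attachment. A small illustrative calculation:

```python
def full_mesh_peerings(n_vpcs: int) -> int:
    # Peering is not transitive: every pair of VPCs needs its own connection.
    return n_vpcs * (n_vpcs - 1) // 2


def transit_gateway_attachments(n_vpcs: int) -> int:
    # Hub and spoke: each VPC attaches once to the transit gateway.
    return n_vpcs


if __name__ == "__main__":
    for n in (5, 20, 50):
        print(n, full_mesh_peerings(n), transit_gateway_attachments(n))
    # 5 10 5
    # 20 190 20
    # 50 1225 50
```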
3.0 Continuous Integration (CI)/Continuous Deployment (CD)
Managing the complex infrastructure described above becomes increasingly difficult. As new features are enabled and new accounts are added, the growth becomes impossible to monitor and manage through the AWS console.
AWS CloudFormation enables you to create and provision AWS infrastructure deployments predictably and repeatedly. It provides a common language to model and provision AWS and third party application resources in your cloud environment. AWS CloudFormation allows you to use programming languages or a simple text file to model and provision, in an automated and secure manner, all the resources needed for your applications across all regions and accounts. This gives you a single source of truth for your AWS and third party resources.
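As an illustration of modeling infrastructure as code, here is a sketch that emits a minimal CloudFormation template (one VPC, one subnet and a transit gateway attachment) as JSON from Python. The resource types and properties are standard CloudFormation ones; the logical names, the `Environment` tag and the use of the first /24 for the subnet are assumptions made for the example. The transit gateway id is a stack parameter, so the same template can be deployed repeatedly across accounts and regions.

```python
import ipaddress
import json


def vpc_stack_template(vpc_cidr: str, environment: str) -> dict:
    """Build a minimal CloudFormation template for one environment-tagged VPC."""
    # Use the first /24 of the VPC range for the attachment subnet (an assumption).
    subnet_cidr = str(next(ipaddress.ip_network(vpc_cidr).subnets(new_prefix=24)))
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {"TransitGatewayId": {"Type": "String"}},
        "Resources": {
            "Vpc": {
                "Type": "AWS::EC2::VPC",
                "Properties": {
                    "CidrBlock": vpc_cidr,
                    "Tags": [{"Key": "Environment", "Value": environment}],
                },
            },
            "Subnet": {
                "Type": "AWS::EC2::Subnet",
                "Properties": {"VpcId": {"Ref": "Vpc"}, "CidrBlock": subnet_cidr},
            },
            "TgwAttachment": {
                "Type": "AWS::EC2::TransitGatewayAttachment",
                "Properties": {
                    "TransitGatewayId": {"Ref": "TransitGatewayId"},
                    "VpcId": {"Ref": "Vpc"},
                    "SubnetIds": [{"Ref": "Subnet"}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(vpc_stack_template("10.1.0.0/16", "Dev"), indent=2))
```

The output can be saved as a template file and deployed with `aws cloudformation deploy --template-file vpc.json --stack-name dev-vpc --parameter-overrides TransitGatewayId=<tgw-id>`. If the transit gateway lives in another account, it must first be shared with this one, for example through AWS Resource Access Manager.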
Infrastructure code also needs a disciplined development process. When developers on a team work in isolation for an extended period and merge their changes into the master branch only once their work is complete, merging becomes difficult and time-consuming, and bugs accumulate for a long time without correction. Both factors make it harder to deliver updates to customers quickly. AWS addresses this with AWS CodePipeline: team members merge code changes into a central repository multiple times a day, each merge triggers an automated build and test, and changes in the source code are deployed automatically to the development or production servers, providing a full CI/CD solution.
Some of the benefits of CI/CD pipeline are:
- Early Bug detection
- Smaller code changes, so conflicts are simpler to resolve
- Fault isolation is simpler and quicker
- Automating the whole process is easier
- Cost Effective process
- Increased visibility and transparency across the team
Fig 4. CI/CD Pipeline
Besides the pipeline shown above, DevOps teams use other tools, such as Bitbucket, Jenkins, Ansible and Terraform, to provision and manage the technology stack automatically rather than manually. Jenkins is an open source automation server used to build and test software projects continuously, making it easier for developers to integrate changes into a project. Jenkins covers lifecycle stages including build, documentation, test, packaging, staging, deployment, static analysis and more. The whole multi-cloud infrastructure can be managed using Terraform. Some of the benefits of using Terraform are:
- Immutable infrastructure
- Declarative code
- Automating the changes
- Resource graph
Rahi Systems implemented an advanced cloud solution for Lumina Networks’ project requirements using both Terraform and CloudFormation scripts. The scripts are written as reusable and scalable modules, and the Terraform state file (terraform.tfstate) is stored in an S3 backend. The diagram below shows the stages involved in deploying the AWS stack using Terraform.
Fig 5: Terraform Workflow
The code is accessible on BitBucket Link.
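Storing Terraform state in S3 only needs a backend block. The sketch below emits it in Terraform’s JSON configuration syntax so it stays in the same language as the other examples; the bucket name, region and per-environment key layout are assumptions. Giving each environment its own state key keeps Prod, Dev and Test deployments from overwriting each other’s state.

```python
import json


def s3_backend_config(bucket: str, environment: str, region: str) -> dict:
    """Terraform JSON config that keeps terraform.tfstate in an S3 bucket."""
    return {
        "terraform": {
            "backend": {
                "s3": {
                    "bucket": bucket,
                    # One state object per environment, e.g. "prod/terraform.tfstate".
                    "key": f"{environment.lower()}/terraform.tfstate",
                    "region": region,
                    "encrypt": True,  # server-side encryption for the state object
                }
            }
        }
    }


if __name__ == "__main__":
    # Save the output as backend.tf.json next to the module, then run `terraform init`.
    print(json.dumps(s3_backend_config("example-tfstate-bucket", "Prod", "us-west-2"), indent=2))
```

Teams usually also enable state locking on the backend so that two pipeline runs cannot modify the same state concurrently.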
As a company’s workload increases, automation is the key to sustaining operations cost-effectively. AWS provides excellent solutions for comprehensive logging, event propagation and event triggering for observation and notification. However, to enable automation for a complex setup and to consolidate, manage and analyze the different logs effectively, customers choose to implement centralized logging across multiple accounts and regions. We architect a centralized logging solution that can drive a control plane designed for each customer’s specific needs.
Actionable intelligence is available in real time: when conditions are met, a set of Lambda functions is triggered based on information extracted from CloudTrail, without the events having to reach the ELK stack. One way to achieve this is to install and configure a copy of the Lambda functions in every account, but managing many copies through changes, upgrades and bug fixes is difficult without a single point of visibility, even with centralized logging enabled across accounts. Rahi addresses this by deploying a centralized control plane that primarily uses Lambda for orchestration. This allows a single copy of each Lambda function to be applied across all accounts and regions, as shown in the picture below. Events from the CloudTrail of every region’s master account are directed to CloudWatch, which sends notifications using SNS.
Fig 6. Lambda Cross Account Workflow
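A minimal sketch of the single-copy pattern: one handler in the central account receives CloudTrail-sourced events, assumes a role in the member account that generated the event, acts there, and publishes an SNS notification. It assumes an EventBridge (CloudWatch Events) rule matching the pattern below routes member-account events to this function, and that every member account has a role trusted by the central account; `NetworkAutomationRole`, the chosen event names and the `act` hook are hypothetical. boto3 is imported lazily so the routing logic can be exercised with stand-in clients.

```python
import json

# Hypothetical rule pattern: EC2 VPC and tagging API calls recorded by CloudTrail.
VPC_EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ec2.amazonaws.com"],
        "eventName": ["CreateVpc", "DeleteVpc", "CreateTags", "DeleteTags"],
    },
}

# Hypothetical role present in every member account, trusted by the central account.
MEMBER_ROLE = "NetworkAutomationRole"


def boto3_client_factory(service, credentials, region):
    """Build a boto3 client from temporary STS credentials."""
    import boto3  # lazy import: only needed when running against AWS

    return boto3.client(
        service,
        region_name=region,
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
    )


def client_for_account(sts, account_id, service, region, factory=boto3_client_factory):
    """Return a client for `service` in `account_id` via STS AssumeRole."""
    creds = sts.assume_role(
        RoleArn=f"arn:aws:iam::{account_id}:role/{MEMBER_ROLE}",
        RoleSessionName="central-control-plane",
    )["Credentials"]
    return factory(service, creds, region)


def handle_event(event, sts, sns, topic_arn,
                 act=lambda ec2, detail: "noop", factory=boto3_client_factory):
    """Single copy of the handler, running in the central account."""
    account_id = event["account"]  # member account where the API call happened
    region = event["region"]
    detail = event["detail"]
    ec2 = client_for_account(sts, account_id, "ec2", region, factory)
    result = act(ec2, detail)  # remediation hook, e.g. a transit gateway attachment
    sns.publish(
        TopicArn=topic_arn,
        Subject=f"{detail['eventName']} in {account_id}",
        Message=json.dumps({"account": account_id, "region": region,
                            "event": detail["eventName"], "result": result}),
    )
    return result
```

In Lambda, the entry point would simply call `handle_event(event, boto3.client("sts"), boto3.client("sns"), TOPIC_ARN)` from `lambda_handler(event, context)`, with the environment-specific remediation passed as `act`.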
For Lumina, this orchestration turned a massive information flow into a serialized, easily trackable and manageable setup. The control plane acts on the health of the network configuration, workloads, gateway attachments, VPN configuration, security group configuration, high availability, network and compute provisioning, and VPC management with account vending.
The detailed architecture of this control plane is outside the scope of this article. Customers are encouraged to reach out to the Rahi Systems solutions architect team to discuss a custom solution for their needs. For completeness, an example of attaching a VPC to the Production, Development or Testing environment through automation is summarized below.
- The user tags the VPC in a prescribed format; if the format is wrong, the user is notified to correct it. After a set period, VPCs with a rogue tag format are removed.
- Maintain a reference VPC database along with each VPC’s subnet attachments.
- Identify the VPC subnet tag and its associated environment from the format.
- Using the reference database, determine whether the event is a move or a new VPC.
- Create a transit gateway attachment for every new VPC in the corresponding environment.
- Automatically associate the attachment with the environment’s TGW route table and enable propagation.
- For migration across environments, apply the detachment tag before reattaching with the attachment tag for the requested environment.
- Where available, use a third-party notification integration, such as Slack, to send notifications to a channel.
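The first four steps amount to a small decision function. A sketch, assuming a hypothetical tag key `tgw-env` whose value must be `prod`, `dev` or `test`, a reference database modeled as a dictionary from VPC id to its current environment, and a hypothetical three-day grace period before rogue VPCs are removed; the actual tag format and retention period are not specified in the text.

```python
from datetime import datetime, timedelta

ENV_TAG_KEY = "tgw-env"              # hypothetical tag key
VALID_ENVS = {"prod", "dev", "test"}
GRACE_PERIOD = timedelta(days=3)     # hypothetical clean-up window for rogue tags


def parse_env_tag(tags):
    """Return the environment named in the VPC's tags, or None if malformed.

    tags uses the EC2 API shape: [{"Key": ..., "Value": ...}, ...].
    """
    values = [t["Value"].strip().lower() for t in tags if t["Key"] == ENV_TAG_KEY]
    if len(values) != 1 or values[0] not in VALID_ENVS:
        return None
    return values[0]


def classify(vpc_id, tags, reference_db, first_seen, now):
    """Decide what the control plane should do for one VPC.

    reference_db maps vpc_id -> environment the VPC is currently attached to;
    first_seen is when the VPC was first observed with its current tags.
    """
    env = parse_env_tag(tags)
    if env is None:
        # Rogue format: warn first, remove only after the grace period.
        return "remove" if now - first_seen > GRACE_PERIOD else "notify"
    current = reference_db.get(vpc_id)
    if current is None:
        return "attach"   # new VPC: create a TGW attachment in `env`
    if current != env:
        return "move"     # detach from `current`, then attach to `env`
    return "noop"


if __name__ == "__main__":
    now = datetime(2020, 1, 10)
    print(classify("vpc-1", [{"Key": "tgw-env", "Value": "Dev"}], {}, now, now))  # attach
```

The returned action selects the next step: `notify` and `remove` feed the notification channel, while `attach` and `move` drive the transit gateway calls.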
The sequence of actions required is depicted below.
Fig 7. Lambda Cross Account Access Flow
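The transit gateway steps of the attach-or-move sequence map onto a handful of EC2 API calls. A sketch using a boto3-style EC2 client, assuming one TGW route table per environment (the `env_route_tables` mapping and the `tgw-env` tag are hypothetical), so that associating an attachment with an environment’s table is what places the VPC in that environment. Real code would also wait for each attachment and association to finish, since these calls are asynchronous.

```python
def attach_vpc_to_environment(ec2, vpc_id, subnet_ids, tgw_id, env_route_tables, env):
    """Attach a VPC to the transit gateway and place it in one environment."""
    attachment_id = ec2.create_transit_gateway_vpc_attachment(
        TransitGatewayId=tgw_id,
        VpcId=vpc_id,
        SubnetIds=subnet_ids,
        TagSpecifications=[{
            "ResourceType": "transit-gateway-attachment",
            "Tags": [{"Key": "tgw-env", "Value": env}],  # hypothetical tag key
        }],
    )["TransitGatewayVpcAttachment"]["TransitGatewayAttachmentId"]

    # In practice, wait until the attachment is "available" before associating
    # (e.g. by polling describe_transit_gateway_vpc_attachments).
    route_table_id = env_route_tables[env]
    ec2.associate_transit_gateway_route_table(
        TransitGatewayRouteTableId=route_table_id,
        TransitGatewayAttachmentId=attachment_id,
    )
    # Propagation lets the environment's route table learn the VPC's CIDR.
    ec2.enable_transit_gateway_route_table_propagation(
        TransitGatewayRouteTableId=route_table_id,
        TransitGatewayAttachmentId=attachment_id,
    )
    return attachment_id


def move_vpc(ec2, attachment_id, env_route_tables, old_env, new_env):
    """Re-home an existing attachment: detach from the old environment first."""
    old_rt, new_rt = env_route_tables[old_env], env_route_tables[new_env]
    ec2.disable_transit_gateway_route_table_propagation(
        TransitGatewayRouteTableId=old_rt, TransitGatewayAttachmentId=attachment_id)
    # An attachment can be associated with only one route table at a time,
    # so the old association must be removed (and finish) before the new one.
    ec2.disassociate_transit_gateway_route_table(
        TransitGatewayRouteTableId=old_rt, TransitGatewayAttachmentId=attachment_id)
    ec2.associate_transit_gateway_route_table(
        TransitGatewayRouteTableId=new_rt, TransitGatewayAttachmentId=attachment_id)
    ec2.enable_transit_gateway_route_table_propagation(
        TransitGatewayRouteTableId=new_rt, TransitGatewayAttachmentId=attachment_id)
```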
Companies today need to evolve, innovate and differentiate themselves by concentrating on their core values. More and more companies are turning to managed services, which reduce operational overhead and risk by automating common activities such as change requests, monitoring, patch management, security and backup, and which provide full-lifecycle services to provision, run and support cloud infrastructure.
AWS Managed Services achieves this by offering a step-by-step process for extending the security, identity and compliance perimeter to the cloud, including the critical tasks of Active Directory integration and compliance certification mapping (HIPAA, GDPR, SOC, NIST, ISO, PCI), and by taking responsibility for operating the cloud environment, such as analyzing alerts and responding to incidents.
Rahi Systems has the expertise to optimise not only the AWS platform but the entire hybrid infrastructure. We suggest best practices for maintaining this balance continuously and educate the workforce as required, allowing companies to accelerate their understanding of how best to manage the infrastructure. We also help de-risk the migration of resources and workloads on and off the public cloud.
Lumina Networks has leveraged some of the expertise Rahi Systems can provide and is looking to establish Rahi Systems as its Security Operations Center (SOC). Rahi has integrated Slack for notifications.
Competitive companies today are increasingly looking towards digital transformation, data center consolidation and DevOps at scale because of the key drivers a hybrid cloud brings, such as faster innovation cycles, scalability, reduced risk and optimized cost.
Rahi Systems helps architect global-scale hybrid solutions, define holistic operations models, provide training and de-risk companies by providing managed services. We help remove barriers for companies that are just getting ready to adopt cloud in some form, for large cloud-heavy enterprises that need best practices for cost and budget management, and for companies anywhere between these two states in the hybrid cloud journey.
As a use case, this article concentrates on the serverless solution that Rahi Systems implemented for Lumina Networks to build its network backbone and simplify account management on the AWS platform. Lumina Networks continues to engage Rahi Systems on this journey to become self-reliant and automate operations as its core business expands. Some minor pieces of the solution are still under development. However, we strongly believe the solution has broader scope and applications in the enterprise space.
If you need further information and help with any specific requirements please contact anyone on the team.
References:
- https://aws.amazon.com/blogs/apn/cloud-transformation-is-not-just-a-cheaper-alternative-to-a-data-center-but-a-path-to-it-innovation/
- https://internetofthingsagenda.techtarget.com/definition/IFTTT-If-This-Then-That
- https://www.rahisystems.com/blog/a-look-at-aws-transit-gateways/
- https://semaphoreci.com/blog/2017/07/27/what-is-the-difference-between-continuous-integration-continuous-deployment-and-continuous-delivery.html
- https://searchitoperations.techtarget.com/definition/Infrastructure-as-Code-IAC
- https://d1.awsstatic.com/whitepapers/building-a-scalable-and-secure-multi-vpc-aws-network-infrastructure.pdf?did=wp_card&trk=wp_card