How to Go Serverless Like a Pro

So, no servers?

Yeah, I checked and there are definitely no servers.

Well…the cloud service providers do need servers to host and run the code, but we don’t have to worry about it. Which operating system to use, how and when to run the instances, the scalability, and all the architecture is managed by the provider. 

But that doesn’t mean there’s no management at all. It’s a common misconception that in a serverless paradigm, we don’t have to care about monitoring, testing, securing, and other details that we are used to managing in other paradigms. So let’s explore the main characteristics that we need to take into consideration when building a serverless solution.

First, why serverless?

One of the great advantages of serverless is that you only pay for what you use. This is commonly known as “zero-scale,” which means that when you don’t use it, the function can be scaled down to zero replicas so it stops consuming resources — not only network I/O, but also CPU and RAM — and then brought back to the required number of replicas when it is needed.

An AWS Lambda function can be triggered by an API Gateway event, a change to a DynamoDB table, or even a change to an S3 object, as described in What Are AWS Lambda Triggers? But to really save money on serverless, you need to take into account all of the services that a Lambda needs in order to work. Serverless architecture provides many advantages, but it also introduces new challenges. In this article, we’ll cover best practices for building a serverless solution.

To deep dive into building, deploying, and managing the serverless framework, check out Cloud Academy’s Serverless Training Library. It’s loaded with content and hands-on labs to give you the practical experience you need to integrate serverless architecture into your cloud IT environment. 

Serverless Training Library

Costs

Storage

Even though it is not a direct Lambda cost, it is a common architectural pattern to store some of the assets used by a Lambda in an S3 bucket, so we need to add the S3 cost to the total.

Network

If you’re sending or receiving large amounts of data on each request, you need to carefully review this cost, because during peak hours it can climb quickly.

API calls

This is another hidden cost, since it isn’t charged to the Lambda itself. You may make a lot of API calls to query databases or other services, so they can still be an important part of the total cost.

Cold starts

A cold start happens the first time a Lambda is executed after it has been scaled down to zero replicas (roughly 40 to 60 minutes after the last execution). During a cold start, the Lambda takes longer to get everything ready and respond. So even though a cold start is not an extra charge in itself, you might want to avoid it by increasing your memory limits or by creating a script that “warms up” the Lambda by calling it every few minutes. Either solution adds to the Lambda’s cost.
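
As a rough illustration of the warm-up approach, the handler below short-circuits when it receives a scheduled “keep warm” ping, for example from a CloudWatch Events/EventBridge rule that fires every few minutes. The event shape and the warmup key are assumptions made for this sketch, not part of any AWS API.

# handler.py -- minimal sketch of a "keep warm" Lambda handler.
# Assumes a scheduled rule invokes the function with {"warmup": true};
# the key name is arbitrary and chosen only for this example.
def handler(event, context):
    # Short-circuit scheduled pings so they stay cheap (a few ms of billed time).
    if isinstance(event, dict) and event.get("warmup"):
        return {"warmed": True}

    # Normal request path: do the real work here.
    return {"statusCode": 200, "body": "Hello from a warm Lambda"}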

The actual execution time

Execution time is billed in 100ms increments. So an invocation that runs for less than 100ms, say 25ms, ends up costing the same as a full 100ms period, and that’s why we sometimes spend more money than we actually should. Even if the execution time exceeds a period by only 5 milliseconds (105ms), we still pay for the whole next period.
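
To make the rounding concrete, here is a tiny sketch of the billing math described above, assuming the 100ms increments in place when this post was written:

import math

def billed_duration_ms(actual_ms, increment_ms=100):
    """Round an execution time up to the next billing increment."""
    return math.ceil(actual_ms / increment_ms) * increment_ms

print(billed_duration_ms(25))   # 100 -> a 25ms run is billed as a full 100ms period
print(billed_duration_ms(105))  # 200 -> exceeding a period by 5ms adds a whole extra one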

To know how much we are really spending, we need to monitor the Lambda.

Monitoring the Lambda

A common mistake is to confuse zero administration with zero monitoring. In a serverless environment, we still need to pay attention to metrics, and these will be a bit different from traditional ones like CPU, memory, and disk usage. Lambda’s CloudWatch metrics are very useful for every deployed function (a short boto3 sketch for retrieving one of them follows the list below). According to the AWS documentation, these metrics include:

  • Invocation Count: Measures the number of times a function is invoked in response to an event or invocation API call.
  • Invocation Duration: Measures the elapsed time from when the function code starts executing to when it stops executing.
  • Error Count: Measures the number of invocations that failed due to errors in the function (response code 4XX).
  • Throttled Count: Measures the number of Lambda function invocation attempts that were throttled due to invocation rates exceeding the customer’s concurrent limits (error code 429).
  • Iterator Age: Measures the age of the last record for each batch of records processed. Age is the difference between the time the Lambda received the batch, and the time the last record in the batch was written to the stream. This is present only if you use Amazon DynamoDB stream or Kinesis stream.
  • DLQ Errors: Shows all the messages that Lambda failed to handle. If the event was configured to be handled by the DLQ, it can be sent again to the Lambda function, generate a notification, or just be removed from the queue.
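
Here is the promised boto3 sketch for pulling one of these metrics from CloudWatch. The function name, period, and time window are placeholders; the region and credentials come from your environment.

# Sketch: fetch the hourly invocation counts for one function over the last day.
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch")

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Lambda",
    MetricName="Invocations",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],  # placeholder name
    StartTime=datetime.utcnow() - timedelta(days=1),
    EndTime=datetime.utcnow(),
    Period=3600,          # one data point per hour
    Statistics=["Sum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], int(point["Sum"]))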

Besides the default metrics, there are plenty of monitoring services like Dashbird, Datadog, and Logz.io that can be integrated, giving us additional metrics and better log visualization.

Right now, everything seems very clear and straightforward, right? We have some new metrics and configurations to learn, but it is pretty similar to our traditional structures. 

But what about tests? Can we even make local tests for serverless?

Tests

Local testing

Since we don’t manage the infrastructure anymore, can we run it locally? If so, how can we do that?

We do have some options to simulate the serverless environment locally, like LocalStack and Docker-Lambda. They can simulate serverless functions and a few other services, such as an API Gateway. But most of these tools differ from the real environment in areas like permissions, the authentication layer, and other services.

The best way to check if everything is working as intended is to write actual tests!

Unit testing

Unit tests are always a must, whether or not your app is serverless. They are the cheapest tests (the fastest to write and execute). We can use mocked-up functions to test the business logic in isolation.
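
For example, if the business logic is kept separate from the AWS-specific pieces, it can be unit tested with the standard library alone. The function names below are invented for illustration, not taken from any real project.

# test_discount.py -- unit testing business logic in isolation with unittest.mock.
from unittest import TestCase, mock

def calculate_discount(order_total, tier):
    """Pure business logic: no AWS services involved, so it is trivial to test."""
    return order_total * (0.10 if tier == "gold" else 0.0)

class DiscountTests(TestCase):
    def test_gold_customers_get_ten_percent(self):
        self.assertAlmostEqual(calculate_discount(200.0, "gold"), 20.0)

    def test_external_lookups_can_be_mocked(self):
        # Calls out to DynamoDB, S3, etc. are replaced with mocks in unit tests.
        fetch_customer_tier = mock.Mock(return_value="gold")
        tier = fetch_customer_tier("customer-123")
        self.assertAlmostEqual(calculate_discount(100.0, tier), 10.0)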

Integration testing

Integration testing allows you to catch errors when your function interacts with external services. These tests become very important since serverless apps usually rely on a combination of external services that communicate with each other constantly.

GUI testing

UI tests are usually expensive and slow because they have to run in a manual, human-like environment. But serverless makes them cheaper, because test runs can be parallelized quickly and inexpensively.

To make the app easier to test, a good approach is to divide the function into many smaller functions that work together to accomplish the same task. One of the best ways to do this is by applying a Hexagonal Architecture, as sketched below.
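
Here is a minimal sketch of that idea: keep the domain logic in plain functions and let the Lambda handler act only as a thin adapter. All names are hypothetical.

import json

# Domain logic: pure, easy to unit test, reusable from any trigger.
def summarize(values):
    return {"count": len(values), "total": sum(values)}

# Adapter: translate the event, call the domain function, shape the response.
def handler(event, context):
    values = json.loads(event.get("body") or "[]")
    return {"statusCode": 200, "body": json.dumps(summarize(values))}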

Conclusion

Serverless architecture can be a big paradigm change, providing a whole new bag of useful tools and advantages. But it also introduces new challenges for developers, who need to make decisions about the new options they have. Learning the best practices before you start developing is always the shortest path to adopting any new paradigm. Hopefully, these tips will help you decide on the best approaches for your next project.

What is Kubernetes? An Introductory Overview

In part 1 of my webinar series on Kubernetes, I introduced Kubernetes at a high level with hands-on demos aiming to answer the question, “What is Kubernetes?” After polling our audience, we found that most of the webinar attendees had never used Kubernetes before, or had only been exposed to it through demos. This post is meant to complement the session with more introductory-level information about Kubernetes.

Containers have been helping teams of all sizes to solve issues with consistency, scalability, and security. Using containers, such as Docker, allows you to separate the application from the underlying infrastructure. Gaining that separation requires some new tools in order to get the most value out of containers, and one of the most popular tools used for container management and orchestration is Kubernetes. It is an open-source container orchestration tool designed to automate deploying, scaling, and operating containerized applications.

Kubernetes was born from Google’s 15 years of experience running production workloads. It is designed to grow from tens to thousands or even millions of containers. Kubernetes is also container-runtime agnostic: you can actually use Kubernetes to manage rkt (Rocket) containers today. This article covers the basics of Kubernetes, but you can take a deeper dive into the tool with Cloud Academy’s Intro to Kubernetes Learning Path.

Intro to Kubernetes Course

What can Kubernetes do?

Kubernetes’ features provide everything you need to deploy containerized applications. Here are the highlights:

  • Container Deployments and Rollout Control. Describe your containers and how many you want with a “Deployment.” Kubernetes will keep those containers running and handle deploying changes (such as updating the image or changing environment variables) with a “rollout.” You can pause, resume, and rollback changes as you like.
  • Resource Bin Packing. You can declare minimum and maximum compute resources (CPU and memory) for your containers. Kubernetes will slot your containers in wherever they fit. This increases your compute efficiency and ultimately lowers costs.
  • Built-in Service Discovery and Autoscaling. Kubernetes can automatically expose your containers to the internet or other containers in the cluster. It automatically load balances traffic across matching containers. Kubernetes supports service discovery via environment variables and DNS, out of the box. You can also configure CPU-based autoscaling for containers for increased resource utilization.
  • Heterogeneous Clusters. Kubernetes runs anywhere. You can build your Kubernetes cluster from a mix of virtual machines (VMs) running in the cloud, on-premises, or on bare metal in your datacenter. Simply choose the composition according to your requirements.
  • Persistent Storage. Kubernetes includes support for persistent storage connected to stateless application containers. There is support for Amazon Web Services EBS, Google Cloud Platform persistent disks, and many, many more.
  • High Availability Features. Kubernetes is planet scale. This requires special attention to high-availability features such as multi-master or cluster federation. Cluster federation allows linking clusters together so that if one cluster goes down, containers can automatically move to another cluster.

These key features make Kubernetes well suited for running different application architectures, from monolithic web applications to highly distributed microservice applications and even batch-driven applications.

How does Kubernetes compare to other tools?

Container orchestration is a deservedly popular trend in cloud computing. At the beginning, the industry focused on pushing container adoption, advancing next to deploying containers in production at scale. There are many useful tools in this area. To learn about some of the other tools in this space, we will explore a few of them by comparing their features to Kubernetes.

The key players here are Apache Mesos/DCOS, Amazon’s ECS, and Docker’s Swarm Mode. Each has its own niche and unique strengths.

DCOS (or DataCenter OS) is similar to Kubernetes in many ways. DCOS pools compute resources into a uniform task pool. The big difference is that DCOS targets many different types of workloads, including but not limited to, containerized applications. This makes DCOS attractive for organizations that are not using containers for all of their applications. DCOS also includes a kind of package manager to easily deploy systems like Kafka or Spark. You can even run Kubernetes on DCOS given its flexibility for different types of workloads.

ECS is AWS’s entry in container orchestration. ECS allows you to create pools of EC2 instances and uses API calls to orchestrate containers across them. It’s only available inside AWS and is less feature-complete than open source solutions. It may be useful for those deep into the AWS ecosystem.

Docker’s Swarm Mode is the official orchestration tool from Docker Inc. Swarm Mode builds a cluster from multiple Docker hosts. It offers similar features compared to Kubernetes or DCOS with one notable exception. Swarm Mode is the only tool to work natively with the docker command. This means that associated tools like docker-compose can target Swarm Mode clusters without any changes.

Here are my general recommendations:

  • Use Kubernetes if you’re working only with containerized applications, whether or not they are Docker-based.
  • If you have a mix of container and non-containerized applications, use DCOS.
  • Use ECS if you enjoy AWS products and first-party integrations.
  • If you want a first party solution or direct integration with the Docker toolchain, use Docker Swarm.

Now you have some context and understanding of what Kubernetes can do for you. The demo in the webinar covered the key features. Today, I’ll be able to cover some of the details that we didn’t have time for in the webinar session. We will start by introducing some Kubernetes vocabulary and architecture.

What is Kubernetes?

Kubernetes is a distributed system. It introduces its own vernacular to the orchestration space. Therefore, understanding the vernacular and architecture is crucial.

Terminology

Kubernetes “clusters” are composed of “nodes.” The term “cluster” refers to the nodes in the aggregate: the entire running system. A node is a worker machine within Kubernetes (previously known as a “minion”). A node may be a VM or a physical machine. Each node has software configured to run containers managed by Kubernetes’ control plane. The control plane is the set of APIs and software (such as kubectl) that Kubernetes users interact with. The control plane services run on master nodes. Clusters may have multiple masters for high-availability scenarios.

The control plane schedules containers onto nodes. In this context, the term “scheduling” does not refer to time. Think of it from a kernel perspective: The kernel “schedules” processes onto the CPU according to many factors. Certain processes need more or less compute, or have different quality-of-service rules. The scheduler does its best to ensure that every process gets CPU time. In this case, scheduling means deciding where to run containers according to factors like per-node hardware constraints and the requested CPU/Memory.
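
To make that concrete, here is a small sketch using the Kubernetes Python client (assuming pip install kubernetes) of how a container declares the CPU and memory the scheduler takes into account when placing it. The names and values are placeholders.

# Sketch: a container spec with the resource requests/limits the scheduler
# considers when deciding which node a pod fits on. Values are placeholders.
from kubernetes import client

container = client.V1Container(
    name="server",
    image="example/server:latest",
    resources=client.V1ResourceRequirements(
        requests={"cpu": "100m", "memory": "128Mi"},  # minimum the scheduler reserves
        limits={"cpu": "500m", "memory": "256Mi"},    # hard ceiling for the container
    ),
)
print(container.resources.requests)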

Architecture

Containers are grouped into “pods.” Pods may include one or more containers. All containers in a pod run on the same node. The “pod” is the lowest building block in Kubernetes. More complex (and useful) abstractions come on top of “pods.”

“Services” define networking rules for exposing pods to other pods or exposing pods to the public internet. Kubernetes uses “deployments” to manage deploying configuration changes to running pods and horizontal scaling. A deployment is a template for creating pods. Deployments are scaled horizontally by creating more “replica” pods from the template. Changes to the deployment template trigger a rollout. Kubernetes uses rolling deploys to apply changes to all running pods in a deployment.
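
As a rough illustration of those abstractions, the official Python client can read a deployment and scale it by patching its desired replica count (kubectl scale does the same thing). The deployment name and namespace are placeholders, and this assumes pip install kubernetes plus a working kubeconfig.

# Sketch: inspect and horizontally scale a deployment with the Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()              # uses your local kubeconfig (e.g. from Minikube)
apps = client.AppsV1Api()

deployment = apps.read_namespaced_deployment(name="server", namespace="default")
print("current replicas:", deployment.spec.replicas)

# Scaling is just a change to the desired replica count; Kubernetes reconciles the rest.
apps.patch_namespaced_deployment_scale(
    name="server",
    namespace="default",
    body={"spec": {"replicas": 3}},
)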

Kubernetes provides two ways to interact with the control plane. The kubectl command is the primary way to do anything with Kubernetes. There is also a web UI with basic functionality.

Most of these terms were introduced in some way during the webinar. I suggest reading the Kubernetes glossary for more information.

What is Kubernetes? Demo walkthrough

In the webinar demo, we showed how to deploy a sample application. The sample application is a boiled-down micro-service. It includes enough to demonstrate features that real applications require.

There wasn’t enough time during the session to include everything I planned, so here is an outline for what did make it into the demo:

  • Interacting with Kubernetes with kubectl
  • Creating namespaces
  • Creating deployments
  • Connecting pods with services
  • Service discovery via environment variables
  • Horizontally scaling with replicas
  • Triggering a new rollout
  • Pausing and resuming a rollout
  • Accessing container logs
  • Configuring probes

I suggest keeping this post handy while watching the webinar for greater insight into the demo.

The “server” container is a simple Node.js application. It accepts a POST request to increment a counter and a GET request to retrieve the counter. The counter is stored in redis. The “poller” container continually makes the GET request to the server to print the counter’s value. The “counter” container starts a loop and makes a POST request to the server to increment the counter with random values.

I used Google Container Engine for the demo. You can follow along with Minikube if you like. All you need is a running Kubernetes cluster and access to the kubectl command.

How do I deploy Kubernetes?

First, I created a Kubernetes namespace to hold all the different Kubernetes resources for the demo. While it is not strictly required in this case, I opted for this because it demonstrates how to create a namespace, and because using namespaces is a general best practice.

I created a deployment for redis with one replica. There should only be one redis container running. Running multiple replicas, thus multiple databases, would create multiple sources of truth. This is a stateful data tier. It does not scale horizontally. Then, a data tier service was created. The data tier service matches containers in the data pod and exposes them via an internal IP and port.

The same process repeats for the app tier. A Kubernetes deployment describes the server container. The redis location is specified via an environment variable. Kubernetes sets environment variables for each service on all containers in the same namespace. The server uses REDIS_URL to specify the host, port, and other information. Kubernetes supports environment variable interpolation with $() syntax. The demo shows how to compose application-specific environment variables from the environment variables that Kubernetes provides. An app tier service is created as well.

Next comes the support tier. The support tier includes the counter and poller. Another deployment is created for this tier. Both containers find the server container via the API_URL environment variable. This value is composed of the app tier service host and port.

At this point, we have a running application. We can access logs via the kubectl logs command, and we can scale the application up and down. The demo configures both types of Kubernetes probes (aka “health checks”). The liveness probe tests that the server accepts HTTP requests. The readiness probe tests that the server is up and has a connection to redis and is thus “ready” to serve API requests.

Security: It’s never too early to start

Even if you’re new to all of this, it’s a good idea to lay a foundation of robust security when digging into an important service like Kubernetes.

You can start with some high-level best practices such as:

  • Authenticate securely: use OpenID Connect tokens, which are based on OAuth 2.0, an open and modern form of authentication.
  • Role Based Access Control (RBAC): use RBAC here and everywhere in your cloud deployments to always consider who actually needs access, especially at the admin level.
  • Secure all the traffic: whether it’s with network policies or at the more basic pod level via pod security contexts, it’s important to control access within and throughout the cluster network.
  • Stay up-to-date: an easy habit to enact immediately is to conduct updates at a regular cadence in order to help maintain your security posture.

What is Kubernetes? Part 2 – stay tuned

Part 1 of the series focused on answering the question “What is Kubernetes?” and introducing core concepts in a hands-on demo. Part 2 covers using Kubernetes in production operations. You’ll gain insight into how to use Kubernetes in a live ecosystem with real-world complications. I hope to see you there!

What is Chaos Engineering? Failure Becomes Reliability

In the IT world, failure is inevitable. A server might go down, an app may fail, etc. Does your team know what to do during a major outage? Do you know what instances may cause a larger systems failure? Chaos engineering, or chaos as a service, will help you fail responsibly.

It almost sounds counterintuitive to think that failing is one of the best security and reliability measures, but this is what chaos engineering is all about. The simple idea behind it is to create chaotic scenarios to test the systems you have in place. Break things on purpose. It’s so punk!

What is Chaos Engineering?

The Origins of Chaos Engineering

In our blog, we have talked about Site Reliability Engineering before, but chaos engineering is a relatively new phenomenon. It all started with Netflix’s move to the AWS cloud in 2010. Netflix saw the cloud as vulnerable. They believed that no instance in the cloud could guarantee permanent uptime. So, they created Chaos Monkey. Chaos Monkey was designed to randomly disable production instances to ensure survivability during common types of failures.

Chaos Monkey wasn’t enough, though. Netflix wanted to create an entire virtual army of chaos, the Simian Army, which includes: Latency Monkey, Conformity Monkey, Doctor Monkey, Janitor Monkey, Security Monkey, 10-18 Monkey, and Chaos Gorilla. I won’t go into each monkey’s function, but the idea is simple: Create chaos, guarantee reliability.

The Simian Army may be a fun tool, but it wasn’t always fun for customers. Some of the monkeys were responsible for customer-related problems. The chaos was too uncontrollable. Effectively managing failure like this requires controlled simulation. Thus, Netflix created Failure Injection Testing.

Failure Injection Testing (FIT) was designed to give developers a “blast radius” rather than unmanaged chaos. Mapping out specific places where the tests will occur limits the risk. These tests are supposed to be proactive, giving IT teams real experience in dealing with outages and other common problems. Without FIT, chaos as a service wouldn’t be a viable product for a mass audience. Netflix introduced the FIT practice in 2014 when Kolton Andrus was working at the company. Andrus later became the co-founder of Gremlin, a company that offers chaos as a service.

What is Chaos as a Service?

Chaos as a service isn’t exactly chaotic in its current state. The Simian Army may have caused real chaos, but its use as a service is far more controlled and logical. Essentially, if you could simulate chaos in your day-to-day life to maximize your personal efficiency, wouldn’t you?

“Putting out fires” is a phrase I constantly hear in the world of IT: networking fires, production fires, release fires, and so on. Everything is so reactionary, but it doesn’t have to be. Simulation is the best way to learn how to manage a real-world situation. Think of chaos engineering as an experiment. If you’re performing an experiment, you have a hypothesis. Thus, if you don’t have a clue what will happen during a failure, it might not be the right time to use chaos engineering.

You should have some idea about what will happen after you run a chaos experiment. The original Chaos Monkey may have created mostly random chaos to test its systems, but this approach isn’t optimal. Teams should have some idea of what to expect. Having a detailed knowledge and expectation of your systems will make these experiments more effective. If you’re wrong, you will only better understand your systems and know what to fix.
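
As a very rough sketch of the Chaos Monkey idea, the snippet below stops one randomly chosen EC2 instance that carries an explicit opt-in tag. The tag name is made up for this example, and you would only point something like this at an environment that has deliberately been included in the experiment.

# Sketch: stop one random, explicitly opted-in EC2 instance (a toy Chaos Monkey).
# The "chaos-opt-in" tag is an assumption for this example; never run this against
# instances that have not been deliberately included in the experiment.
import random
import boto3

ec2 = boto3.client("ec2")

reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:chaos-opt-in", "Values": ["true"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]

instances = [i["InstanceId"] for r in reservations for i in r["Instances"]]
if instances:
    victim = random.choice(instances)
    print(f"Stopping {victim} -- now observe how the system responds.")
    ec2.stop_instances(InstanceIds=[victim])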

The Benefits of Chaos Engineering

Now, chaos engineering may sound a lot like testing, but it’s not that simple. The primary difference between testing and chaos engineering is the scale and the results. Testing tools are usually simplistic in practice. You provide a testing tool with a condition, and it gives you a result. There’s only so much that can be learned this way. Chaos engineering creates an experimental scenario to not only test your systems but to test yourself and your team. You might discover far more than you asked for.

By causing deliberate failures, IT teams will gain confidence that their systems can deal with failures before they occur in production. All complex cloud systems will eventually fail. Using chaos engineering will allow you to recognize what’s wrong with the system, what you can do to fix it, and how to better deal with failure in real time.

Building the most effective system requires experimentation. Chaos engineering allows you to run specific scenarios that could happen at any time while a product or service is live. Running these scenarios allows you to measure specific aspects of a failure. Maybe the scenario returned the exact result you expected, maybe it resulted in something completely new. Either way, you’re able to improve your systems and provide the most reliable service to your customers.

Gaining insight into system problems also creates a better production environment. Everyone will know what to look for in the future, and what systems might be vulnerable. You can make changes in your cloud environments based on your results.

The Unpredictability of the Cloud

One of the biggest concerns about the cloud is its relative unpredictability. Netflix introduced chaos engineering to combat their concerns about the cloud. The services that cloud platforms rely on can be inconsistent and chaos engineering is the perfect way to manage this.

Containers, microservices, and distributed systems are becoming a staple of cloud computing. These tools are incredibly useful, but they must be properly maintained. Any cloud provider may be vulnerable to occasional downtime. How you deal with cloud-related problems shouldn’t be figured out through hypotheticals or during a real outage. Chaos engineering can counter the unpredictability that the cloud brings. You can use it to discover what to do with your systems in a non-critical environment. Simulating failure allows IT teams to verify that cloud systems are behaving as expected. This kind of tool is invaluable.

Chaos Can Be Fun!

If there were a real zombie apocalypse, it would be nowhere near as enjoyable as a video game or film. The same goes for failure. Simulating failure can easily be turned into a fun activity. You can specify when the failure is going to happen and can prepare a game day around it. These simulated failures have no real consequence, so it’s a great way to channel your love for computing!

Start your cloud training journey with Cloud Academy. Check out more of my posts at Solutions Review here.

What is HashiCorp Vault? How to Secure Secrets Inside Microservices

Whether you are a developer or a system administrator, you will have to manage the issue of sharing “secrets” or secure information. In this context, a secret is any sensitive information that should be protected. For example, if lost or stolen, your passwords, database credentials, or cloud provider keys could damage your business. Safe storage and sharing of this information are becoming more difficult with modern, complex infrastructures. In today’s post, we’re going to explore how to get started with HashiCorp Vault and how secure information can be managed in a microservice, Docker-based environment.

The drawbacks of common approaches

To deal with the problem of managing secure information, developers and sysadmins can choose from a few common approaches:

  • Stored in the image: While this approach is easy to achieve, it’s one that should be avoided in any production environment. Secrets are accessible by anyone who has access to the image and because they will persist in the previous layers of the image, they cannot be deleted.
  • Environment variables: When starting up our containers, we can easily set the environment variables using the -e Docker run parameter. This approach is much better than the previous one but it still has some drawbacks. For example, a common security gap is that secrets could appear in debug logs.
  • Secrets mounted in volumes: We can create a file that stores our secrets and then mount it at container startup. This is easily done and probably better than the previous approaches. However, it still has some limitations: it becomes difficult to manage in infrastructures with a large number of running containers where each container only needs a small subset of the secrets.

In addition to the cons mentioned above, all of these approaches share some common problems, including:

  • Secrets are not managed by a single source. In complex infrastructures, this is a big problem and ideally, we want to manage and store all of our secrets from a single source.
  • If secrets have an expiration time, we will be required to perform some manual actions to refresh them.
  • We cannot share just a subset of our credentials to specific users or services.
  • We do not have any audit logs to track who requested a particular secret and when, or any logs for failed requests. These are things that we should be aware of since they could represent potential external attacks.
  • Even if we find an external attack, we don’t have an easy way to perform a break-glass procedure to stop secrets from being shared with external services or users.

All of the above problems can be easily mitigated and managed using a dedicated tool such as HashiCorp Vault. This makes particular sense in a microservice environment where we want to manage secrets from a single service and expose them as a service to any allowed service or user.

What is HashiCorp Vault? 

From the official Vault documentation:

Vault secures, stores, and tightly controls access to tokens, passwords, certificates, API keys, and other secrets in modern computing. Vault handles leasing, key revocation, key rolling, and auditing. Through a unified API, users can access an encrypted Key/Value store and network encryption-as-a-service, or generate AWS IAM/STS credentials, SQL/NoSQL databases, X.509 certificates, SSH credentials, and more.

Using Vault, we can delegate the management of our secrets to a single tool. Vault takes care of encrypting each secret at rest and in transit. It has built-in support for several authentication, storage, and audit backends, and it was built with high availability in mind. Vault also makes it easy to set up multi-datacenter replication.

Get started with HashiCorp Vault

Vault makes use of a storage backend to securely store and persist encrypted secrets. In today’s example, we’ll use the PostgreSQL backend. We will begin by starting a container named vault-storage-backend from the official PostgreSQL image with vault as database name, username, and password:

$ docker run -d -e POSTGRES_PASSWORD=vault -e POSTGRES_USER=vault -e POSTGRES_DB=vault --name vault-storage-backend postgres

Since Vault’s PostgreSQL storage backend will not automatically create anything once set up, we need to execute some simple SQL queries to create the required schema and indexes.

Let’s connect to the Docker container and open a PSQL session:

$ docker exec -it vault-storage-backend bash
$ su - postgres
$ psql vault

Required schema and indexes can be easily created by executing the following SQL statements:

CREATE TABLE vault_kv_store (
    parent_path TEXT COLLATE "C" NOT NULL,
    path        TEXT COLLATE "C",
    key         TEXT COLLATE "C",
    value       BYTEA,
    CONSTRAINT pkey PRIMARY KEY (path, key)
);
CREATE INDEX parent_path_idx ON vault_kv_store (parent_path);

We don’t need to do anything else inside the PostgreSQL container, so we can close the session and go back to the host terminal.

Now that PostgreSQL is properly configured, we need to create a configuration file to inform Vault that its storage backend will be the vault database inside the vault-storage-backend container. Let’s do that by defining the following configuration file, named config.hcl:

# config.hcl
{
  "backend": {"postgresql": {"connection_url": "postgres://vault:vault@storage-backend:5432/vault?sslmode=disable"}},
  "listener": {"tcp": {"address": "0.0.0.0:8200", "tls_disable": 1}}
}

With Vault we can make use of access control policies (ACLs) to allow or deny access to specific secrets. Before proceeding, let’s define a simple policy file that grants read-only access to every secret under the secret/web path to any authenticated user or service associated with that policy:

# web-policy.hcl
path "secret/web/*" {
  policy = "read"
}

Both files will be stored inside a Docker data container to be easily accessible from other linked containers. Let’s create the container by executing:

$ docker create -v /config -v /policies --name vault-config busybox

Next, we will copy both of the files inside it:

$ docker cp config.hcl vault-config:/config/
$ docker cp web-policy.hcl vault-config:/policies/

Since we want to make use of Vault’s auditing capabilities and we want to make logs persistent, we will store them in a local folder on the host and then mount it in Vault’s container. Let’s create the local folder:

mkdir logs

Finally, we can start our Vault server by launching a container named vault-server:

docker run \
  -d \
  -p 8200:8200 \
  --cap-add=IPC_LOCK \
  --link vault-storage-backend:storage-backend  \
  --volumes-from vault-config \
  -v $(pwd)/logs:/vault/logs \
  --name vault-server \
  vault server -config=/config/config.hcl

As you can see, we are using the official Vault image available on the Docker hub. Vault is running on port 8200 inside the container and that port is exposed on port 8200 of the localhost. The PostgreSQL container is linked and named storage-backend inside the container, which is the same alias used in the configuration file config.hcl. Volumes are mounted from the data-container named vault-config and the localhost’s logs folder is mounted at /vault/logs/ inside the container. Finally, we have started Vault using the configuration defined inside the config.hcl configuration file.

To interact from the localhost to Vault we can define an alias:

$ alias vault='docker exec -it vault-server vault "$@"'
$ export VAULT_ADDR=http://127.0.0.1:8200

We can then initialize Vault by executing:

$ vault init -address=${VAULT_ADDR}

We will receive an output similar to the following:

Unseal Key 1: QZdnKsOyGXaWoB2viLBBWLlIpU+tQrQy49D+Mq24/V0B
Unseal Key 2: 1pxViFucRZDJ+kpXAeefepdmLwU6QpsFZwseOIPqaPAC
Unseal Key 3: bw+yIvxrXR5k8VoLqS5NGW4bjuZym2usm/PvCAaMh8UD
Unseal Key 4: o40xl6lcQo8+DgTQ0QJxkw0BgS5n6XHNtWOgBbt7LKYE
Unseal Key 5: Gh7WPQ6rWgGTBRSMecuj8PR8IM0vMIFkSZtRNT4dw5MF
Initial Root Token: 5b781ff4-eee8-d6a1-ea42-88428a7e8815
Vault initialized with 5 keys and a key threshold of 3. Please
securely distribute the above keys. When the Vault is re-sealed,
restarted, or stopped, you must provide at least 3 of these keys
to unseal it again.
Vault does not store the master key. Without at least 3 keys,
your Vault will remain permanently sealed.

The Vault was successfully initialized and now it is in a sealed state. In order to start interacting with it, we will first need to unseal it.

In the previous output, we can see five different unseal keys. This is because Vault makes use of Shamir’s Secret Sharing. Basically, this means that we will need to provide at least three of the five generated keys to unseal the vault. That’s why each key should be shared with a single person inside your organization/team. In this way, a single malicious person will never be able to access the vault to steal or modify your secrets. The number of generated and required keys can be modified when you initially set up your Vault.

Let’s unseal our vault using three of the provided keys:

vault unseal -address=${VAULT_ADDR} QZdnKsOyGXaWoB2viLBBWLlIpU+tQrQy49D+Mq24/V0B
vault unseal -address=${VAULT_ADDR} bw+yIvxrXR5k8VoLqS5NGW4bjuZym2usm/PvCAaMh8UD
vault unseal -address=${VAULT_ADDR} Gh7WPQ6rWgGTBRSMecuj8PR8IM0vMIFkSZtRNT4dw5MF

The final output will be:

Sealed: false
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
Unseal Nonce:

This means that the vault has been correctly unsealed and we can finally start interacting with it.

In addition to the unseal keys, we can find an Initial Root Token in the previous vault init command output. Authenticating to Vault using that token grants us root access. Let’s authenticate using it:

$ vault auth -address=${VAULT_ADDR} 5b781ff4-eee8-d6a1-ea42-88428a7e8815

The received output will be:

Successfully authenticated! You are now logged in.

First, we need to enable Vault’s audit logging. To do that, execute the following:

$ vault audit-enable -address=${VAULT_ADDR} file file_path=/vault/logs/audit.log

From this point forward, every interaction with the Vault will be audited and persisted in a log file inside the logs folder on the localhost.

We can now write and read our first secret:

$ vault write -address=${VAULT_ADDR} secret/hello value=world
$ vault read -address=${VAULT_ADDR} secret/hello

The output will be exactly what we expect:

Key             	Value
---             	-----
refresh_interval	768h0m0s
value               world

Next, let’s write the policy defined in the previous web-policy.hcl file to Vault so that we can verify that ACLs are working as expected:

$ vault policy-write -address=${VAULT_ADDR} web-policy /policies/web-policy.hcl

Now we can write a new secret inside secret/web path:

$ vault write -address=${VAULT_ADDR} secret/web/web-apps db_password='password'

Vault has built-in support for many different authentication systems. For example, we can authenticate users using LDAP or GitHub. We want to keep things simple here, so we will make use of the Username & Password authentication backend. We first need to enable it:

$ vault auth-enable -address=${VAULT_ADDR} userpass

Next, let’s create a new user associated to the policy web-policy and with web as username and password:

$ vault write -address=${VAULT_ADDR} auth/userpass/users/web password=web policies=web-policy

Let’s authenticate this new user to Vault:

$ vault auth -address=${VAULT_ADDR} -method=userpass username=web password=web

Vault informs us that we have correctly authenticated, and since the policy associated with the user has read-only access to the secret/web path, we are able to read the secrets inside that path by executing:

$ vault read -address=${VAULT_ADDR} secret/web/web-apps

However, if we try to execute:

$ vault read -address=${VAULT_ADDR} secret/hello

We will receive the following:

Error reading secret/hello: Error making API request.
URL: GET http://127.0.0.1:8200/v1/secret/hello
Code: 403. Errors:
* permission denied

This means that Vault’s ACL checks are working fine. We can also see the denied request in the audit logs by executing:

tail -f logs/audit.log

In fact, in the output we will see:

{
   "time":"2017-03-21T15:32:44Z",
   "type":"request",
   "auth":{
      "client_token":"",
      "accessor":"",
      "display_name":"",
      "policies":null,
      "metadata":null
   },
   "request":{
      "id":"e0c254e6-5701-79ac-2959-34db59d1c9cf",
      "operation":"read",
      "client_token":"hmac-sha256:3c0d732a6899fdae57018b4b341b08e1348e21cb866412e0a394ad48e3d4e8c4",
      "client_token_accessor":"hmac-sha256:48128e5b762f1ec376cebe9a3c41b85a2042d7e937b14b634f8c287a6deddd6c",
      "path":"secret/hello",
      "data":null,
      "remote_address":"127.0.0.1",
      "wrap_ttl":0,
      "headers":{
      }
   },
   "error":"permission denied"
}

In this scenario, we could easily integrate external services such as AWS CloudWatch and AWS Lambda to revoke access to users or completely seal the vault.

For example, if we would like to revoke the access to web user we could execute:

$ vault token-revoke -address=${VAULT_ADDR} -mode=path auth/userpass/users/web

Or if we would like to completely seal the vault, we can execute:

$ vault seal -address=${VAULT_ADDR}

Let’s now imagine that we have an external service running in a different container that needs access to some secrets stored in Vault. Let’s start a container from the official Python image and attach directly to its Bash:

docker run -it --link vault-server:vault-server python bash

To programmatically interact with Vault, we first need to install hvac, a widely used Python client for Vault:

$ pip install hvac

Let’s now try to access to some secrets from this new container via Vault:

import hvac
client = hvac.Client(url='http://vault-server:8200')
# We authenticate to Vault as web user
client.auth_userpass('web', 'web')
# This will work: the web-policy grants read access under secret/web/
client.read('secret/web/web-apps')
# This will not work, since the authenticated user is bound to the web-policy ACL
client.read('secret/hello')

Summary

Today we have seen how secrets can be delegated to a single point of access and management using HashiCorp Vault and how it can be set up in a microservice, container-based environment. We have only scratched the surface of Vault’s features and capabilities.

To get started with the HashiCorp Vault course, sign in to your Cloud Academy account. I also highly recommend spending some time with the official Getting Started guide to go deeper into Vault’s concepts and functionality.

How to Deploy Docker Containers on AWS Elastic Beanstalk Applications

In this post, we are going to look at how to deploy two Docker containers on AWS Elastic Beanstalk Applications. 

Today, Docker containers are being used by many companies in sophisticated microservice infrastructures. From a developer’s point of view, one of the biggest benefits of Docker is its ability to package our code into reusable images. This assures us that our code will work as expected wherever it is run. Development is also made easier and faster using Docker. Project dependencies are installed into isolated containers without the need to install them on our local machine. This allows us to develop applications with different requirements in a secure and isolated way.

Too often, developers are stuck trying to find an easy way to run their containers in production. With so many different technologies available, choosing among them isn’t easy. Topics like high availability, scalability, fault tolerance, monitoring, and logging should always be part of a solid production environment. However, without enough knowledge, it may be difficult to achieve them with containerized applications.

Docker containers on AWS Elastic Beanstalk

AWS Elastic Beanstalk is a service for quickly deploying and scaling applications in the Amazon cloud. This includes services developed with Java, .NET, PHP, Python, Ruby, and Docker. Its support for Docker containers makes it an excellent choice for deploying Dockerized applications into solid production environments that are easy to manage and update.

Today, we’ll use a very practical example to show how easy it is to deploy Dockerized applications. The code we will use is available in this GitHub repo. Feel free to clone it locally to follow along with us.

The scenario

Our scenario is a basic web application with a single “Hello World” API endpoint served by a proxy server. We are going to implement it with two containers. The first container runs a Flask application with uWSGI, and the second container runs Nginx as a reverse proxy.

Local environment

We want to start by declaring our project dependencies in a file called requirements.txt:

Flask==0.12
uWSGI==2.0.14

Our desired application behavior can be easily implemented with Flask by creating a file called main.py with this content:

from flask import Flask, jsonify

app = Flask(__name__)


@app.route('/')
def index():
    return jsonify('Hello World!')


if __name__ == '__main__':
    app.run(port=9000)

To locally spin up our application with uWSGI, we can execute:

uwsgi --socket 0.0.0.0:9000 --protocol http -w main:app

Our application is ready. Now we can containerize it by defining the following Dockerfile:

FROM python:2.7
EXPOSE 9000
COPY requirements.txt /
COPY main.py /
RUN ["pip", "install", "-r", "requirements.txt"]
CMD ["uwsgi", "--socket", "0.0.0.0:9000", "--protocol", "http", "-w", "main:app"]

To build it, simply execute the command "docker build -t server .". Docker will build an image called "server" with our code. Once it is complete, we can start a container from it by executing the command "docker run -d -p 9000:9000 --name server server". If we open a browser to http://127.0.0.1:9000/ we will see our “Hello World” page.

Next, we should add a second container that runs Nginx as a reverse proxy in front of our web server container running uWSGI. Start by creating this configuration file, called default.conf, inside another folder:

server {
  listen 80;
  location / {
    proxy_pass http://server:9000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
  }
}

This can now be packed into an image with this simple Dockerfile placed in the same folder as default.conf:

FROM nginx:latest
COPY default.conf /etc/nginx/conf.d/
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

We can build the image for our proxy container by executing the command "docker build -t proxy ." (note the trailing dot for the build context). Now we are able to start a new container from our image by executing "docker run -d -p 8080:80 --link server:server --name proxy proxy". If we open our browser to http://127.0.0.1:8080/ we will see that our request is proxied by Nginx in the proxy container through to the application running uWSGI.

Now that we have achieved our goal locally, it’s time to replicate the same situation in a production environment.

Production environment

To start, we should store our images in a secure Docker repository. If we want a private and cost-effective repository within our AWS account, we can use AWS Elastic Container Registry (ECR). Otherwise, we can simply push our images to Docker Hub. Using ECR is simple and fast: we just need to log into the AWS console, select ECS, and then create two new repositories for our images. ECR will provide us with instructions for pushing our local images.
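
If you prefer to script the repository setup instead of clicking through the console, a small boto3 sketch like the one below can create the two ECR repositories. The names are placeholders, and the image push itself still happens with the docker CLI, following the commands ECR shows you.

# Sketch: create the two ECR repositories used in this walkthrough.
import boto3

ecr = boto3.client("ecr")

for name in ("eb-docker-server", "eb-docker-proxy"):   # placeholder repository names
    repo = ecr.create_repository(repositoryName=name)["repository"]
    print(name, "->", repo["repositoryUri"])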

Before going into production, the last thing we need to do is to define a configuration file to inform Elastic Beanstalk about how to use our images. This can be done by creating a file called Dockerrun.aws.json:

{
  "AWSEBDockerrunVersion": 2,
  "containerDefinitions": [
    {
      "name": "server",
      "image": "lzac/eb-docker-server",
      "essential": true,
      "memory": 200,
      "cpu": 1
    },
    {
      "name": "proxy",
      "image": "lzac/eb-docker-proxy",
      "essential": true,
      "memory": 200,
      "cpu": 1,
      "portMappings": [
        {
          "hostPort": 80,
          "containerPort": 80,
          "protocol": "tcp"
        }
      ],
      "links": [
        "server:server"
      ]
    }
  ]
}

As you can see, we are defining the same situation that we had locally: we need one container running the application server and another one running the proxy. They are linked together using the standard Docker linking pattern. For the purposes of this post, the images are publicly stored on Docker Hub. Modify their URLs if you are following the steps and you pushed them to ECR or a different Docker registry.

Everything we need to run our containers in production is ready. From the AWS console, go to Elastic Beanstalk and start creating a new application. The only required information that we need to provide is the following:

  • Application name: Create your own name.
  • Application type: Select “Multi-container Docker.”
  • Application code: Select “Upload your code” and upload the Dockerrun.aws.json file.

Next, click on “Configure more options” and modify the “Capacity” section by selecting “Load Balanced” as the environment type, using from two to four instances in any availability zone.

Now, click “Create app.” Elastic Beanstalk will start provisioning every resource needed to run our code within the AWS cloud. It will create:

  • S3 bucket
  • Security group
  • Load balancer
  • Auto Scaling
  • CloudWatch metrics and alarms
  • EC2 instances
  • ECS cluster

Through these services, Elastic Beanstalk is automatically configuring many of the features that should always be part of best practices in any production environment. Provisioned EC2 instances will have Docker pre-installed. Once initiated, they will pull both images from the repository and start the required containers linked as defined in our configuration file. Once deployed, we are able to see the correct output of our application by opening a browser to the public URL provided by AWS Elastic Beanstalk.

We can now make use of the powerful Elastic Beanstalk configuration panel to modify a variety of settings for our application in just a few minutes. Elastic Beanstalk will transparently apply them for us.

AWS Elastic Beanstalk console

Let’s take a quick look at each section of the Elastic Beanstalk console to see what we can do:

  • Scaling: Our environment can be changed to a single instance, or we can easily modify the number of instances needed to run our application and which triggers should be used to scale the number of instances up or down. This section allows us to easily set up the horizontal scaling that we want.
  • Instances: We can modify the type of instances, the key pair used to SSH into them, and the IAM role they have to assume. This is vertical scaling made simple!
  • Notifications: Only a single field: Add an email address that will receive Elastic Beanstalk event notifications via email.
  • Software configuration: We can select the instance profile for our application, enable S3 log file rotation, and configure the AWS CloudWatch alarms for our application. In the last part of the page, we can add environment variables that will be passed to our container. And, we can make use of this capability to securely store secrets and credentials that should not be stored in our source code.
  • Updates and deployments: This section gives us the chance to define how new deployments should be managed by Elastic Beanstalk. We really have a lot of options here. If you would like to go deeper into this topic, check out the official AWS documentation here.
  • Health: AWS Elastic Beanstalk makes use of a health check URL to find out if the application is running well on each instance. This is useful for stopping instances that are not working as expected and starting new ones if needed. We can set this URL in this section, define a service role for the monitoring system, and modify the health reporting for our environment.
  • Managed updates: If we want to perform periodic platform updates, this is the section to use. We can define when updates should be applied and at which level.
  • Load balancing: We can modify the load balancer of our stack in this section. For example, if we want to set up HTTPS for our application, we can easily do so here.
  • VPC: In this section, we can easily define the availability zones where our resources should be placed. We can also define whether or not a public IP address should be used and the visibility of our load balancer.
  • Data tier: If the application makes use of an RDS database, we can use this section to define it.

To sum up, we’ve taken a look at how to deploy Docker containers on AWS Elastic Beanstalk applications. As you can see, AWS Elastic Beanstalk is very easy to use, and it is an excellent solution for deploying Docker containers on the AWS cloud. Developers are not forced to learn any new technologies, and deployments can be made easily without deep operations knowledge or experience.

How to Deploy an App from GitHub with AWS CodeDeploy

Application development is made up of different stages. One such critical step is app deployment and code management. In this article, we’re going to share how you can use a deployment system that will enable you to automate the deployment and updating of your application: AWS CodeDeploy. It’s one of the three AWS tools that will help you integrate, deploy, and manage your app on the cloud: CodeDeploy, CodeCommit, and CodePipeline.

What Exactly is AWS CodeDeploy?

AWS CodeDeploy is a deployment system that enables a developer or a team of developers to automate the software release process. In other words, it is a collection of settings that describes the environment on which the application is to be deployed, how many instances can be deployed to at once, and so on. It efficiently deploys your code to a fleet of EC2 instances while leaving as much of the fleet online as possible. The size of a fleet can vary from a single instance to thousands of instances.
The first step to getting started with AWS CodeDeploy is setting up EC2 instances. Then you’ll need to tag them (tags are how deployment groups identify their target instances), install the CodeDeploy agent on your hosts, and set up a trust role to allow communication between CodeDeploy and the CodeDeploy agents.
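As a rough sketch of what that tagging step can look like outside the console (not part of the original tutorial; the instance ID and tag values are placeholders), boto3 can apply the tag that a deployment group will later match on:

import boto3

ec2 = boto3.client("ec2")

# Tag an instance so a CodeDeploy deployment group can target it by this key/value pair.
ec2.create_tags(
    Resources=["i-0123456789abcdef0"],  # placeholder instance ID
    Tags=[{"Key": "Name", "Value": "CodeDeployDemo"}],  # placeholder tag
)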

The Key Functions of CodeDeploy

CodeDeploy also specifies information regarding the trusted role which automates the communication between the various EC2 instances and CodeDeploy. However, CodeDeploy doesn’t specify the code to be deployed or what to do during the deployment. The code to be implemented is stored as an archive in S3 and is referred to as an Application Revision. The “how to deploy” component of CodeDeploy is specified by the AppSpec file located inside the Application Revision. Here’s what you need to know about them:

  • AppSpec: This file resides in the repo and tells CodeDeploy which application files are to be deployed, where they are to be deployed to, and which lifecycle scripts to run. These scripts run at different stages during a deployment and can be used to stop the service, install dependencies, and run database migrations.
  • Application Revision: This is a zip file that contains all the code to be deployed. You can create it by packaging up the entire repo or a sub-directory of it. The AppSpec file must be stored inside the application revision at <application-root>/appspec.yml.

Since you will have one application in your repo, you can create your Application Revision by packaging up the entire repo (excluding the .git directory). When you do this, appspec.yml needs to be placed in your repo's root directory.
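As a small illustration of that packaging step (a sketch, not part of the original article), the following Python snippet zips a repo into an application revision while skipping the .git directory, which keeps appspec.yml at the archive root:

import os
import zipfile

def package_revision(repo_dir, out_zip="revision.zip"):
    """Zip the repo contents so appspec.yml sits at the root of the archive."""
    out_path = os.path.abspath(out_zip)
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(repo_dir):
            dirs[:] = [d for d in dirs if d != ".git"]  # exclude version-control metadata
            for name in files:
                path = os.path.join(root, name)
                if os.path.abspath(path) == out_path:  # don't zip the archive into itself
                    continue
                zf.write(path, os.path.relpath(path, repo_dir))  # keeps appspec.yml at the root

package_revision(".")
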
Now that we’ve learned what CodeDeploy is and established how it works, let’s move on to the steps which will show you how to deploy an application from GitHub with AWS CodeDeploy.

Deploying an Application from GitHub with AWS CodeDeploy

  1. Installing and setting up your GitHub account: First, download and install Git on your system. If you want to use the AWS CLI to deploy a revision from GitHub to the instance, also install and configure the AWS CLI.
  2. To create a repository, you need a GitHub account, so start by registering. You only need to provide an email address, a username, and a password.
  3. Creating a GitHub repository: The next step after registering is creating a repository that will be used to store a revision. If you have an existing repository, rename it to CodeDeployGitHubDemo, skip this step, and move ahead. If you don't, follow these steps:

On the GitHub home page, do one of the following:

  • Under Your repositories, select New Repository.
  • On the navigation bar, select Create new (+) and pick New repository.

Then, on the Create a new repository page, perform the following tasks:

  • In the Repository name box, type CodeDeployGitHubDemo.
  • Choose Public.
  • Clear the Initialize this repository with a README check box (you will create a README.md file in the following step).
  • Select Create repository.

Now, after creating the repository, follow the instructions below for using the command line. The steps differ depending on the operating system you're using:

For Unix or Linux: From the terminal, run the following commands in sequence, where user-name is your GitHub username.

mkdir /tmp/CodeDeployGitHubDemo
cd /tmp/CodeDeployGitHubDemo
touch README.md
git init
git add README.md
git commit -m "My first commit"
git remote add origin https://github.com/user-name/CodeDeployGitHubDemo.git
git push -u origin master

You should then leave the command prompt open in the /tmp/CodeDeployGitHubDemo location.
For Windows: As an administrator, from the command prompt, run the following commands, in sequence:

mkdir c:\temp\CodeDeployGitHubDemo
cd c:\temp\CodeDeployGitHubDemo
notepad README.md

Now, save the README.md file in Notepad and close it. Then run the following commands in sequence, where again user-name is your GitHub username:

git init
git add README.md
git commit -m "My first commit"
git remote add origin https://github.com/user-name/CodeDeployGitHubDemo.git
git push -u origin master

Then leave the command prompt open in the c:\temp\CodeDeployGitHubDemo location.

  1. Uploading the Application to your GitHub Repository: When you upload the application as a revision, make sure it follows the guidelines in Plan a Revision and Add an AppSpec File, which we discussed earlier.
  2. If the revision follows the guidelines, then you're ready to deploy the application to the instance.
  3. Procuring an Instance: You will need to create an Amazon EC2 instance (running a supported operating system such as Amazon Linux, Windows Server, RHEL, or Ubuntu) configured for use in AWS CodeDeploy deployments. Once you have launched the instance and verified that it is set up for AWS CodeDeploy, proceed to the next step.
  4. Connecting the Application to the Instance: Now use the AWS CodeDeploy console to deploy the revision from the GitHub repository to the instance. Follow these steps to run the deployment:
  5. Sign in to the AWS Management Console, then open the AWS CodeDeploy console using the credentials used earlier.
    Select Create New Application and key in CodeDeployGitHubDemo-App.
    Now, in the Deployment Group Name box, key in CodeDeployGitHubDemo-DepGrp and choose a tag type.
    Then choose a Deployment Config and a Service Role ARN.
    Select Create Application.

On the Application details page, in Deployment Groups, select the button next to CodeDeployGitHubDemo-DepGrp.
In the Actions menu, select Deploy New Revision.
On the Create New Deployment page, in the Revision Type area, select My application is stored in GitHub.
Select Connect with GitHub. The page which appears will ask you to authorize AWS CodeDeploy for interacting with GitHub for the application known as CodeDeployGitHubDemo-App.
Follow the instructions on the Sign in page to sign in with your GitHub account.
Now, on the Authorize application page, select Authorize Application.
On the AWS CodeDeploy Create New Deployment page, in the Repository Name box, key in the GitHub username that you used while signing in, followed by a forward slash (/), followed by the name of the repository where you pushed your application revision (for instance, My-GitHub-User-Name/CodeDeployGitHubDemo).
If you are not sure about the value to type, or if you need to specify a different repository:

  1. In a different web browser tab, open your GitHub dashboard.
  2. In Your repositories, hover your mouse pointer over the target repository name. A tooltip will appear displaying the GitHub user or organization name, followed by a forward slash character (/), followed by the name of the repository. Key in this displayed value into the Repository Name box.

Over the Commit ID box, key in the ID of the commit associated with the push of your application revision to GitHub.
If you are not sure of the value to type:

  1. In a different web browser tab, open your GitHub dashboard.
  2. In your repositories, select CodeDeployGitHubDemo.
  3. In the list of commits, search and copy the commit ID associated with the push of your application revision to GitHub. The ID is typically 40 characters in length and comprises both numbers and letters.
  4. Key in the commit ID into the Commit ID box.

Leave the Deployment Description box blank.
Leave the Deployment Config drop-down list at the default of CodeDeployDefault.OneAtATime,
and select Deploy Now.
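If you prefer to script this step, roughly the same deployment can be triggered with boto3 instead of the console (a sketch with placeholder values; it is not part of the original tutorial):

import boto3

codedeploy = boto3.client("codedeploy")

# Deploy the revision that was pushed to GitHub; repository and commitId are placeholders.
response = codedeploy.create_deployment(
    applicationName="CodeDeployGitHubDemo-App",
    deploymentGroupName="CodeDeployGitHubDemo-DepGrp",
    deploymentConfigName="CodeDeployDefault.OneAtATime",
    revision={
        "revisionType": "GitHub",
        "gitHubLocation": {
            "repository": "My-GitHub-User-Name/CodeDeployGitHubDemo",  # placeholder
            "commitId": "<your-40-character-commit-id>",               # placeholder
        },
    },
)
print(response["deploymentId"])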

Wrap-Up

You now know how to deploy an application with GitHub and AWS CodeDeploy. If you have any questions, be sure to leave a comment below. For more information about CodeDeploy and other development best practices on AWS, take a look at our Developer Fundamentals for AWS course. It’s jam-packed with information about CodeDeploy, CodeCommit, and CodePipeline.


Reference: Tutorial: Deploy an Application from GitHub Using AWS … (n.d.). Retrieved from http://docs.aws.amazon.com/codedeploy/latest/userguide/github-integ-tutorial.htm.

 

The post How to Deploy an App from GitHub with AWS CodeDeploy appeared first on Cloud Academy.

]]>
Serverless Framework: A Deep Overview of the Best AWS Lambda + API Gateway Automation Solution https://cloudacademy.com/blog/serverless-framework-aws-lambda-api-gateway-python/ https://cloudacademy.com/blog/serverless-framework-aws-lambda-api-gateway-python/#comments Tue, 24 May 2016 15:32:03 +0000 https://cloudacademy.com/blog/?p=14946 Architecting AWS-Powered Microservices in Python with Serverless Framework: What You Need to Know We have been talking a lot about the new serverless cloud over the last few months. The news is all good. There have been tremendous improvements announced since AWS Lambda’s launch in 2014.  Competing cloud vendors are working hard to...

The post Serverless Framework: A Deep Overview of the Best AWS Lambda + API Gateway Automation Solution appeared first on Cloud Academy.

]]>
Architecting AWS-Powered Microservices in Python with Serverless Framework: What You Need to Know

We have been talking a lot about the new serverless cloud over the last few months. The news is all good. There have been tremendous improvements announced since AWS Lambda’s launch in 2014.  Competing cloud vendors are working hard to catch up with AWS when it comes to serverless infrastructure-as-a-service.

Developing in a serverless fashion will definitely improve the life of developers and DevOps professionals — although we are still in the early days of this revolution.

Smart developers quickly realized the need for automation and structure, especially when orchestrating a nontrivial system of APIs and microservices.

In this post, we’re going to cover the following topics:

  • The history of Serverless Framework.
  • The basics of architecting applications in Serverless Framework.
  • The Serverless Framework command-line interface (CLI).
  • How to develop Serverless Framework applications in Python.

The Serverless Framework: A Brief History

Serverless Framework Logo
With the goal of making the serverless transition smoother for everyone, the AWS community started working on a complete solution to develop AWS-powered microservices or backends for web, mobile, and IoT applications. The migration required facilitation because of the building-block nature of AWS Lambda and its complex symbiosis with Amazon API Gateway.
That’s how the Serverless Framework was born (formerly JAWS). Thanks to the hard work of @austencollins, JAWS made it to the HN homepage last Summer and was successfully presented at re:Invent 2015 in Las Vegas, Nevada thereafter.

To date, the project counts more than 60 contributors on GitHub and boasts an active ecosystem of plugins and tutorials. Take a look at an awesome curated list of resources related to the Serverless project.

ServerlessConf and the Serverless Community

The future will be Serverless. [@ServerlessConf]

The Serverless community is constantly growing, and not only within the AWS ecosystem. It is robustly supported by active meetups in San Francisco, New York, Melbourne, and Sydney.

If you feel like witnessing history in NYC, keep an eye on the (sold out) ServerlessConf this week. I will personally attend the event, and I’ll be glad to meet you there.

It will be fun to attend, but don’t panic if you didn’t secure a ticket. All the videos will be available online after the event.
ServerlessConf

Serverless Framework: The Basics

The Serverless Framework forces structure into serverless code by providing a minimal and clear organization for your Lambda functions. The advantage of this structure is that you benefit from all the AWS best practices that were painstakingly built into the framework. This structure effectively frees developers from low-level details about versions, aliases, stages, variables, roles, etc.

Rising above these details leaves room for greater productivity (and maybe a longer lunch).

As a software engineer, this freedom from drudgery allows you to focus on the business logic — which is the main goal of AWS Lambda itself.

On the operations side, the framework will also take care of the deployment process, which can easily become the most annoying part of using AWS Lambda – especially if your use case is complex.

Typically, you would need to bind together Lambda functions and API Gateway endpoints, configure the request/response integrations, deal with multiple stages and aliases (i.e. dev, prod, etc.), build, and upload your deployment package.
Once you successfully complete these steps, you’ll likely be required to do it again in more than one AWS region.
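To give a feel for that manual work, here is a rough boto3 sketch of just the Lambda half of it (publish a new version and repoint a stage alias, then redeploy the API Gateway stage); every name and ID below is a placeholder rather than anything the framework itself generates:

import boto3

lambda_client = boto3.client("lambda")

# Upload new code for an existing function (placeholder names and paths).
with open("build/my-function.zip", "rb") as f:
    lambda_client.update_function_code(FunctionName="my-function", ZipFile=f.read())

# Publish an immutable version and repoint the existing "dev" alias at it.
version = lambda_client.publish_version(FunctionName="my-function")["Version"]
lambda_client.update_alias(FunctionName="my-function", Name="dev", FunctionVersion=version)

# API Gateway still needs a fresh deployment of the corresponding stage.
apigw = boto3.client("apigateway")
apigw.create_deployment(restApiId="abc123restid", stageName="dev")  # placeholder REST API ID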

Installation and Code Organization

The Serverless Framework is built in JavaScript and requires Node V4. You can install it via npm:

npm install serverless -g

Once the package is installed globally, you can create or download a new project and start working on your Lambda functions in JavaScript (Node.js 0.10 or 4.3) or Python 2.7.

Most of the time, your Lambda functions will require some external dependencies. Keep in mind that you will need to upload these libraries together with your own code.

Finally, any compiled code must be compatible with the operating system used by AWS Lambda under the hood (Amazon Linux).
The Serverless Framework suggests developers organize their code as follows:

  • Group your functions into subfolders
    (i.e. “/functions/myfunction1/“, “/functions/myfunction2/“, etc.)
  • Define a module to share common code among your functions
    (i.e. “/functions/lib/”); this way you will separate the Lambda boilerplate from your core logic
  • Define your dependencies as a package.json or requirements.txt file, optionally one per function, so that you can install them via npm or pip: the framework will take care of zipping and uploading your node modules or Python Virtualenv.

Please note that the framework is still in Beta release (currently v0.5.5) and future versions might contain breaking changes due to the quickly evolving serverless landscape. That said, I am confident that most of the above mentioned best practices are fairly stable, as are the following ones.

Serverless Framework CLI and Concepts Overview

The framework exposes an interactive CLI that lets you manage and automate every phase of your workflow. You’ll need to become comfortable with the following critical concepts:

  • Projects: you can configure independent serverless projects, either by starting from scratch or from ready-to-use, installable and shareable projects (available as npm packages).
  • Plugins: the framework comes with a rich set of plugins that add useful functionality to the core, such as CORS support, code optimization and linting, CloudWatch + SNS logging, TDD, and others. Feel free to create and publish your own plugins to better structure and share your code. There is also an official plugins registry.
  • Stages: each stage represents an isolated environment, such as dev or prod. You can use them to separate and isolate your AWS resources. As a best practice, the framework lets you start with a default dev stage.
  • Regions: these are an exact mapping of the official AWS regions. You may want to deploy each stage into a different region to achieve total separation, or even distribute your resource across multiple regions to optimize your system performance.
  • AWS Profiles: each profile corresponds to a valid combination of your AWS account access keys. You may want to use a different profile for each stage and eventually use totally different AWS accounts.
  • Functions: functions are the real core of your project, and the Serverless framework lets you organize them as you like, even if some sort of structure is highly recommended (really). Each function will ultimately correspond to a folder in your file system, with its own code (only JavaScript or Python, for now), its configuration, and a test event.
  • Endpoints: each function can potentially be bound to multiple endpoints, which correspond to API Gateway resources and methods (i.e. HTTP endpoints). You’ll need to declare them in each function configuration.
  • Events: if you need to invoke your function upon particular events – or on a recurring basis – you can declare a set of events in its configuration file (e.g. dynamodbstream, kinesisstream, s3, sns or schedule).
  • Resources: for each stage and region, you can define a set of CloudFormation resources, such as IAM roles, policies, etc. Interestingly, you can easily show the diff between your local configuration and the deployed resources.
  • Templates & Variables: these are a useful tool to reduce the size and redundancy of your configuration files. Variables can hold only numbers and strings, while templates allow you to define complex structures, in a hierarchical fashion.
  • Dashboard: the framework provides an interactive deployment dashboard to visualize and select which resources should be deployed at once. It can also show a graphical summary of the current status of your project (see the screenshot below).
Serverless Framework CLI

A Real-World Serverless Example in Python

The Serverless Framework comes with a very useful boilerplate project for JavaScript functions. It will give you a great overview of how to organize your projects and provide a stable base to bootstrap a new one.

I personally took the time to create a new starter project for Python functions.

You can quickly download and install it by executing the following command:

serverless project install serverless-starter-python

This project, in particular, doesn’t come with any dependencies, but normally you’d need to install them via npm:

cd serverless-starter-python
npm install

Then, you will need to install your Python dependencies in a specific folder, as follows:

cd restApi
pip install -t vendored/ -r requirements.txt

Note: the "restApi" folder contains all our functions in this project. You can have more than one root-level folder, and even sub-folders (see "restApi/multi/show" and "restApi/multi/create").

But let’s focus on “restApi/continent“: this Lambda function takes a country name as input (e.g. “Germany“) and will return the corresponding continent (e.g. “Europe“). In order to demonstrate how to use Python dependencies, I implemented the function by using the countrycode module (see requirements.txt).

As a best practice, I organized the Lambda function code as follows:

  • "restApi/continent/handler.py" only contains the basic handler logic, plus some sys.path magic to include the common library (a minimal sketch of such a handler follows this list).
  • "restApi/lib/continent.py" defines a continent_by_country_name function, which implements the actual business logic.
  • "restApi/lib/__init__.py" takes care of exposing the library utilities and including our Python dependencies in the sys.path (the "vendored" folder where we installed our requirements).
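To make the layering concrete, a handler along these lines might look like the following minimal sketch (illustrative only; the real file in the starter project may differ in its details):

# restApi/continent/handler.py -- thin Lambda entry point (illustrative sketch)
import os
import sys

# The "sys.path magic": make the shared restApi/lib code importable from this folder.
sys.path.append(os.path.join(os.path.dirname(os.path.realpath(__file__)), "..", "lib"))

import continent  # restApi/lib/continent.py

def handler(event, context):
    # Keep the Lambda boilerplate here and delegate the business logic to the library.
    return continent.continent_by_country_name(event)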

Here is the main function logic:

from countrycode import countrycode


def _error(msg):
    """ Utility to handle custom errors (see response mapping) """
    raise Exception("[BadRequest] %s" % msg)


def continent_by_country_name(event):
    """ Get continent name (e.g. "Europe"), given a country name (e.g. "Italy") """
    country_name = event.get('country')
    if not country_name:
        return _error("Invalid event (required country)")
    continent = countrycode(codes=[country_name], origin="country_name", target='continent')
    if not continent:
        return _error("Invalid country: %s" % country_name)
    return {
        "continent": next(iter(continent)),
    }

In order to demonstrate how to handle errors, I have wrapped the required logic in a _error function: it will simply raise a Python Exception with a well-defined structure. Indeed, here is how I defined the corresponding endpoint responses:

{
  "name": "continent",
  "runtime": "python2.7",
  "handler": "continent/handler.handler",
  ...
  "endpoints": [
    {
      "path": "continent",
      "method": "GET",
      "requestTemplates": "$${apiRequestTemplate}",
      "responses": {
        "400": {
          "selectionPattern": "^\\[BadRequest\\].*",
          "responseModels": "$${apiResponseModelsError}",
          "responseTemplates": "$${apiResponseTemplateError}",
          "statusCode": "400"
        },
        ...
      }
    }
  ]
}

As you can see in the “selectionPattern” definition, API Gateway will match any “[BadRequest]” prefix whenever you raise a Python Exception and bind it to a 400 – Bad Request HTTP response. Here you can see Templates in action: you can define and re-use them everywhere with the $${NAME} syntax.

Furthermore, I customized how the HTTP response is sent back in case of 400 errors, in our “restApi/continent/s-templates.json”:

{
  "apiRequestTemplate": {
    "application/json": {
      "country": "$input.params('country')"
    }
  },
  "apiResponseModelsError": {
    "application/json": "Error"
  },
  "apiResponseTemplateError": {
    "application/json": {
      "message": "$input.path('$.errorMessage')"
    }
  }
}

Here I defined three templates:

  • apiRequestTemplate is used to map the querystring “country” parameter into an explicit event parameter;
  • apiResponseModelsError associates API Gateway’s default Error model to our 400 response, which contains only a “message” parameter;
  • apiResponseTemplateError defines how to build our 400 HTTP Response body and is particularly useful if you want to hide the default stack trace and map “errorMessage” into the model’s “message” parameter.

Let’s test our function locally.

I have defined a local test event in "restApi/continent/event.json", which contains a simple JSON structure such as {"country": "Germany"}. You can easily run your function with the following command:

serverless function run continent

If everything went right, the framework will print out something like {“continent”:”Europe”}.

Once you are happy with your functions, you can launch the (beautiful) deployment dashboard:

serverless dash deploy

This dashboard is designed to be interactive. Here you can choose which resources you want to deploy, with a very intuitive interface. As I mentioned earlier, the framework will take care of all the details. It will create new Lambda versions, associate them with the right alias, re-deploy API Gateway stages, update every template mapping, etc.

Here is what the interactive dashboard looks like:
Serverless Framework Deployment
Once you confirm the operation, everything will be checked and deployed smoothly. You will also be given the HTTP endpoints for each updated Endpoint.

Of course, you can also decide to remove your functions and endpoints with a single command:

serverless function remove continent

Keep in mind that the framework will only delete the AWS Lambda aliases and the API Gateway resources associated with the corresponding stage. This means that all your function versions will still be available on AWS, should you need them later on. This is possible because the framework never uses your Lambda functions' $LATEST version, which is highly discouraged in general unless you are just prototyping.

More Serverless Resources

I personally believe the Serverless Framework – even if still in Beta – is a complete and robust solution for organizing complex projects. I expect these frameworks will continue improving and continue impressing users with new and more powerful features.

I would like to mention a few more serverless frameworks worth investigating:

  • Zappa: facilitates the deployment of all Python WSGI applications on AWS Lambda + API Gateway.
  • Kappa: a command line tool that (hopefully) makes it easier to deploy, update, and test functions for AWS Lambda.
  • Apex: lets you build, deploy, and manage AWS Lambda functions with ease (with Golang support!).
  • Lambda-complex: a Node.js framework for applications that run entirely within Lambda, SQS, and other high-abstraction AWS services.
  • λ Gordon: an automation tool based on self-contained CloudFormation templates (Python, Javascript, Java, Golang and Scala support).

Do You Always Need a Full-Fledged Framework?

Sometimes, you want to keep complexity and friction as low as possible, especially for small projects or experiments. In these cases, you may prefer simpler solutions that better integrate with your existing code and solve only a single problem. Here is a small collection of such projects:

  • λambdify: a tool that turns any python callable into an AWS Lambda function.
  • Python-lambda: a toolset for developing and deploying serverless Python code in AWS Lambda.
  • Claudia.js: helps you easily deploy Node.js microservices to Amazon Web Services.

Here’s How You Can Easily Learn the Fundamentals of Serverless Design

There are plenty of resources for learning serverless. Here are some of the best:

The Serverless Cloud

It turns out serverless computing is also popular outside of the AWS ecosystem. Every cloud vendor is building its own serverless solution.

Why? Apparently, because developers really like this new level of abstraction. Roughly speaking, not dealing with operations and server maintenance simply sounds better, even without considering the powerful event-oriented approach offered by the serverless architecture.
Even though serverless computing doesn’t completely remove operations from your workflow, I personally believe it makes developers more comfortable, and drastically increases the ownership of their code.
Here are some of the main AWS Lambda alternatives, even if most of them are still alpha versions or not as complete as they should be:

  • Google Cloud Functions: here is my review of Google’s serverless solution (still in limited preview); it currently supports only JavaScript.
  • Microsoft Azure Functions: Azure's solution – still in preview – looks very promising, but I haven't had a chance to try it yet. It supports a wide variety of languages, including JavaScript, C#, Python, PHP, Bash, Batch, and PowerShell.
  • IBM Bluemix OpenWhisk: IBM's solution is also still in alpha, and I might give it a try soon. It currently supports JavaScript and Swift.
  • Iron.io: it is a serverless platform built on top of Docker and Golang. This one is language agnostic and currently supports Golang, Python, Ruby, PHP, and .NET.

I am still looking forward to a future where a great codebase like the Serverless Framework is vendor neutral and allows developers to work on the Cloud platform of their choice.

Let us know what you like or dislike about serverless computing, especially if you are using the Serverless Framework.

The post Serverless Framework: A Deep Overview of the Best AWS Lambda + API Gateway Automation Solution appeared first on Cloud Academy.

]]>
Cloud Learning Management Systems Support Corporate Training and Development https://cloudacademy.com/blog/cloud-learning-management-systems-support-corporate-training-and-development/ https://cloudacademy.com/blog/cloud-learning-management-systems-support-corporate-training-and-development/#comments Tue, 15 Mar 2016 16:25:18 +0000 https://cloudacademy.com/blog/?p=13675 Corporate training is an investment in the present. It yields immediate results for all involved. Image courtesy of pixabay.com, licensed under CC0 Public Domain. Business success depends on a well-trained workforce. It makes sense that a motivated, engaged staff will ultimately result in improved profitability. I contend that most businesses should...

The post Cloud Learning Management Systems Support Corporate Training and Development appeared first on Cloud Academy.

]]>
Corporate training is an investment in the present. It yields immediate results for all involved.
Image courtesy of pixabay.com, licensed under CC0 Public Domain.

Business success depends on a well-trained workforce. It makes sense that a motivated, engaged staff will ultimately result in improved profitability.
I contend that most businesses should implement employee training periodically, and some industries should make an effort to secure continuous training and development for their teams. It makes sense that various tech-related roles, such as engineers, IT professionals, or developers, must stay on top of emerging trends and ongoing technological developments. Their roles depend on maintaining a mastery of emerging tech trends and putting these concepts into practical use for the benefit of their organizations. Tech businesses carefully monitor developments and constantly push their workforces to adapt to new marketplace demands. What is the best use of resources in this pursuit? I suggest smart, flexible corporate training.

There is an ecosystem of commercial education and training vendors, such as Cloud Academy, which offer consumer and enterprise programs on a variety of technology and cloud-related topics at beginner, intermediate, and advanced levels.

A deeper insight into the matter made me realize that development experts and human resources professionals increasingly rely on cloud-based platforms for managing training because workers typically embrace the convenience with enthusiasm. Companies of all sizes finally have access to affordable digital employee training that delivers outstanding results. The flexibility of modern systems allows employers to schedule training in ways that don’t disrupt core workflows. Learners consume content on standard and mobile devices when their schedules permit. I believe this to be a game-changer.

Importance of Training and Development
Traditionally there has been a misconception that training costs detract from profitability. Some managers put employees to work with minimal training and are puzzled when workers appear indifferent about their jobs.

I contend that failure to invest resources in training people sets the stage for dead-end careers, unproductive workers, and customer dissatisfaction. From a financial perspective, employers have a genuine interest in developing human capital. When employees succeed, they enjoy a sense of purpose and accomplishment. Satisfied workers welcome professional engagement, and their success breeds a culture of success.
It’s safe to say that providing meaningful career support empowers employees and offers them a positive vision for the future and a greater purpose in life.

Training and Development Requirements
Businesses in different industries require different levels of training. Most organizations should spend time determining their training/development requirements. In some settings, the process should begin with a formal analysis of workflows and job designs. A thorough review should provide enough data identifying the key training objectives that will improve job performance and satisfaction at all levels. Training plans should include a long-range view that anticipates which skills employees will need in future roles as they advance.
After identifying training requirements and objectives, companies can more accurately gather data to estimate their potential investment costs. It’s only natural that businesses will search for the most efficient ways of developing and delivering training materials to their workers. The solutions may be external, internal or, more typically, a hybrid model. Thanks to technology-based training options, businesses have a near-unlimited selection of great offerings, from simple clerical training to extremely specific technical topics such as a Solutions Architect Professional Level Certification for AWS course.

After considering organizational needs and the options available to fulfill them, companies can decide on the solution that works best for them. Available Cloud-based Learning Management System (LMS) programs conform to the requirements of almost any business.
Below I offer an overview of some LMS programs that I’ve found work well for corporate training.

Cloud-Based LMS Explained
Web-based LMS platforms make administering and using employee training programs possible from almost anywhere in the world. These Cloud-based systems enable you to easily deliver content, record activity, track/report performance. Learning occurs entirely online, where the data is secured by providers.
Cloud-based solutions offer more convenience than traditional LMS applications that required the use of company servers, IT staff, and hands-on maintenance. Cloud-based LMS products require no installation because everyone involved accesses them through web browsers. The time and money saved by using the Cloud for LMS can go toward the training courses.

Cloud LMS products work on a pay-as-you-go pricing model that enables companies to pay only for the resources they use. System maintenance is typically included in the base price. Gone is the fear of system updates or configuration changes. Program plans and subscriptions can scale dynamically based on demand. You can expand or shrink training and development projects based on your business needs.
Trainers accessing Cloud-based LMS products have an efficient process for creating and delivering content based on their particular needs. After importing course content into the Cloud, trainers can assign learners to the various classes and run reports on training activities for individual employees and the organization. LMS administrators can modify or improve courses to continually improve the effectiveness of their training program based on real-time feedback.

Learners have the ability to access training materials wherever and whenever they want as long as they have an internet connection. Thanks to smartphones and tablets, training via a cloud-based LMS is easy to schedule and complete.

Top Cloud-Based LMS Programs
1. Docebo – A modular LMS that includes an advanced test engine, robust reporting, and performance tracking.
2. TalentLMS – This enterprise-friendly solution lets trainers assemble e-learning courses in minutes and customize them with a look and feel consistent with other corporate online assets.
3. Litmos – Multi-language and localization support, as well as custom branding, make Litmos an appealing LMS. The product allows the development of premium course content for customers and internal content for employees.
4. Firmwater LMS – Designed primarily for training companies, this robust platform can work for any business that will commit to online learning.
5. CourseMill – This LMS also has a traditional software-based counterpart, giving companies many choices for deploying e-learning. It also comes with a compliance interface that ends employee confusion over their responsibilities for completing a course.
Most LMS products offer free trial periods, so companies may evaluate them before purchase.

Mastering Cloud Computing
The growing omnipresence and utilization of Cloud Computing technology increases the demand for cloud-savvy IT professionals. By providing cloud integration, LMS helps businesses automate their training processes while synchronizing the data between the LMS and other platforms. Businesses can leverage various online learning platforms, such as Cloud Academy, aimed at developing their staff’s Cloud Computing skills, by integrating such platforms with the LMS of their choice to provide their staff with specific, high-quality learning materials that ensure skill building.

I encourage you to review some learning paths, courses, labs, and quizzes designed and built by working professionals around cloud topics. These programs lead to greater professional growth in critical areas. Users may follow a path to certification or simply for greater subject mastery and job mobility.

Conclusion
Businesses should always consider the training needs of their human capital. My own experience has taught me that well-trained people produce more and have a greater sense of satisfaction with their jobs. When people have the opportunity to learn and develop at both professional and personal levels, they can look forward to new opportunities for advancement and achievement with their current employers. Owners and managers develop a stable, professional organization that satisfies customers and generates profits.

Choosing Cloud-based LMS solutions reduces the implementation costs of training programs while making corporate training easier to administer. People have the advantage of completing their training courses at times and places convenient to them. Businesses have many LMS options from which to choose. With some research you will find a solution that best serves the needs of your organization.

I include some Success stories from Cloud Academy users that offer insight into the process and results of online corporate training.

The post Cloud Learning Management Systems Support Corporate Training and Development appeared first on Cloud Academy.

]]>
Migrating Data to AWS Using the AWS Schema Conversion Tool: A Preview https://cloudacademy.com/blog/migrating-data-to-aws/ https://cloudacademy.com/blog/migrating-data-to-aws/#comments Wed, 06 Jan 2016 09:00:45 +0000 https://cloudacademy.com/blog/?p=11972 We will explore the best ways of migrating data to AWS using the AWS Schema Conversion Tool. The AWS Schema Conversion Tool is a component of AWS Data Migration Service (DMS) which is still in preview mode at the time of writing this post (December 2015). We have discussed the...

The post Migrating Data to AWS Using the AWS Schema Conversion Tool: A Preview appeared first on Cloud Academy.

]]>
We will explore the best ways of migrating data to AWS using the AWS Schema Conversion Tool.

The AWS Schema Conversion Tool is a component of AWS Data Migration Service (DMS) which is still in preview mode at the time of writing this post (December 2015). We have discussed the benefits of migrating to AWS in a previous post.

With DMS, Amazon is wooing corporate customers with a low-cost means of moving their database workloads to the cloud. Amazon already has a managed database service called RDS (Relational Database Service). However RDS has its own limitations, and one of those limitations is related to data migration. Until now, there was no simple way of migrating schema and data from an on-premise or EC2-hosted database to RDS. Because RDS is a managed service it doesn’t support a number of features specific to the database engines. Previously, the only solution would require DBAs and developers writing their own migration scripts and testing them in a painful and an iterative manner until all the wrinkles have been ironed out.

Amazon addresses this issue with Data Migration Service. As Amazon’s public website for DMS states, the service will allow seamless and continuous data replication between a source and a target environment during a migration process. These environments may use either the same database engines or different ones as source and target. It is worth noting that the source can be an on-premise database instance, an EC2-hosted instance, or even another RDS instance. The target can be an EC2 instance or another RDS instance.  AWS now supports a number of database engines, so that these combinations can support a wide variety of use cases. Migrating data to AWS is progressively easier with each passing week. Here’s a good post about the AWS Data Migration Tool that was written at re:Invent 2015.

Anyone who ever worked in a data migration project knows the process is never simple — never. A number of disciplines are involved here: solutions architects, infrastructure architects, engineers, DBAs, developers, and quality assurance testers. All these roles form components of project teams whose work can span from a few weeks to many months. Successful database migration is generally iterative in nature.  A repeatable process may recreate the target database structure every time, and load it with fresh source data repeatedly for quality assurance purposes. This process continues until everything works perfectly and the approvals are given for the next step. In my experience, the sooner this iterative process is put into practice, the better the chance of successful migration.

But a seamless data migration can only happen when the source and destination databases use similar or, at least, compatible structures. It may be relatively easy to migrate an on-premise SQL Server database to an RDS instance running the same engine, but not so easy if the target is, say, MySQL. That's because the two engines use different methods of defining database objects, and their code syntax is also widely different. There is no easy way out here. When the source and target databases are different, DBAs and developers have to bite the bullet and script the schema objects and programming logic in the target environment's own language. More often than not, this becomes the most time-consuming part of migrating data to AWS.

This is where the AWS Schema Conversion Tool can come in really handy. You are probably wondering, “can the AWS Schema Conversion Tool save time for the database engineer?” The answer is yes! The AWS Schema Conversion Tool can connect to a source database engine and reverse engineer a source database structure to a format suitable for the target database. This includes converting both database schema objects (tables, indexes, constraints etc.) and database logic (functions, stored procedures, triggers etc). Where it can’t convert a source item, it will flag it and recommend remediation steps. You can then apply the converted schema and logic to the target environment from this same tool and make any manual changes to the target as suggested.

Migrating data to AWS has been common for a while and database migration toolkits are nothing new either.  Most commercial database vendors offer them for free, but there are few that offer multiple source and target options, and I can’t think of any others available for almost all operating systems. The AWS Schema Conversion Tool is a desktop product that runs on Windows, Mac OS, Fedora and Ubuntu Linux. It’s free and can be downloaded once you sign up for a preview of the Database Migration Service.
Being the first of its kind from AWS, this tool has some limitations in its first release. It’s used for converting databases to RDS instances only — more precisely MySQL or Aurora in RDS. The source database engines can be either MS SQL Server or Oracle. That means there’s no support for PostgreSQL or the more recent addition, MariaDB.

Installing the AWS Schema Conversion Tool

We tested the AWS Schema Conversion Tool by installing it on a Windows 10 machine as well as on a Mac (running Yosemite). The installation process is fairly simple and fast in both cases.
Once the tool is installed, you need to download and install the JDBC drivers for the supported database engines. This is because the AWS Schema Conversion Tool uses JDBC to connect to databases. In our case, we downloaded and installed two: one for SQL Server (our source) and one for MySQL (our target).

Once the installation is complete, you need to tell the AWS Conversion Tool where to find those drivers. This is done by selecting the Settings > Global Settings menu option and choosing Drivers from the dialog box. In the image below, we have specified the paths for both the drivers:
AWS Schema Migration Toolkit JDBC Global Settings
With the initial setup complete, you can start a new migration project (every migration has to be part of a project). In our case, we chose to convert the well-known AdventureWorks database running in an EC2 instance to a MySQL 5.6 RDS instance.
The first step is to define the project properties. When in this screen, you see the source can be either SQL Server or Oracle and the destination can be MySQL or Aurora.
AWS Schema Conversion Tool New Project

Connecting to the Source Database

Once the project is created, it’s time to connect to the source instance. In the image below we are connecting to our SQL Server 2012 instance in EC2.
SQL Server Connection Properties
Although SQL Server installations have instance names (default instances typically have the same name as the Windows box), we did not specify one here. The user account was created beforehand. One thing to note here is that the user account must be a SQL Server standard login and not a Windows domain or local account. If the SQL Server is running in Windows-only authentication mode, this will not work because no standard logins (even sa) will be available.
Once the source is connected, its databases and schemas under each database are loaded in the left side of the screen.
Source Database Tree View
Now if you are a SQL Server DBA or developer, you will notice a couple of things immediately:

  • A SQL Server instance has many components that are visible from SQL Server Management Studio — the de-facto client tool for SQL Server. These objects include logins and server roles, jobs, alerts, linked servers, replication publishers/subscribers, and so on. Even within each database, there are objects like users, partitions, full-text indexes, etc. When you connect to the instance with the AWS Schema Conversion Tool, only databases and their objects, like tables, indexes, constraints, or stored procedures, are accessible.

If you are wondering why the tool works this way, remember, the Schema Conversion Tool is built for converting only database components, nothing else. That’s why it shows database objects from the source instance only.

The same thing can be said about Oracle databases. There are many different links for schemas, tables, users, directory objects, jobs or RMAN backups within the Oracle Enterprise Manager Database Control. Again, only database objects for different schemas are viewable from the conversion tool.

  • In SQL Server Management Studio, similar types of components in a database are grouped together. For example, all tables from all schemas in a database are listed under the tables folder. The same types of folders exist for views, stored procedures, or functions. You may also note that even within each type of object, there can be further grouping for related objects. For example, the node for a particular table will have all its indexes listed under one folder and all the constraints listed under another folder, and so on. With the AWS Schema Conversion Tool, this classification is shown in a different way. Here, each schema under a database will have its own folder. Under each schema, there will be sub-folders for tables, views, stored procedures and so on. In essence, even though the objects could belong to the same database, their placement may be under different schemas.

The reason for this difference is in the naming convention. The term “database schema” is loosely used to mean two different things. One meaning refers to schema as the overall structural definition of the database. This definition includes the actual command to create the database as well as the commands to create objects within the database. The other meaning of schema refers to a namespace that belongs to a user account. In traditional database engineering terms, a schema is related to a database user — all the objects owned by a user belong to that user’s schema. A database then becomes just the holder for all the schemas in it.

SQL Server did not implement the concept of schemas in its older versions. However it’s very much a part of the database engine now and Microsoft has implemented the ANSI version of schema definition in its database product. On a high-level though, everything is still seen from an individual database perspective.

AWS Schema Conversion Tool sees things differently. It treats “schemas” in the traditional sense, a grouping of database objects from the same user. This means when you convert the source database components, the highest level you can convert from is the schema level, not the SQL Server database level. That means each separate schema under an SQL Server database will end up as a separate MySQL database in the target.

For a visual understanding, we are showing a side-by-side comparison:
SQL Server Management Studio and AWS Schema Conversion Toolkit User Interface

Connecting to the Destination Database

Next, we connected to our target environment, the MySQL RDS instance. Here we provided similar information:
MySQL Connection Properties
Once connected, the target instance schema objects are loaded in the right pane of the tool.

Running the Database Migration Assessment Report

It’s now time to check what components from our source databases (AdventureWorks) can be converted to MySQL format. For this we selected a schema, right-clicked on it and selected “Create Report” from the pop-up menu.
Starting the Database Migration Assessment Report
This will kick off the schema conversion engine. The AWS Schema Conversion Tool will take its time to reverse engineer every object within the schema and check whether the generated code can be run against the target instance without any change. If it can, it will make a note of this finding. If it can't, it will make a note about why it's not possible. At the end, the assessment will generate a report called the Database Migration Assessment Report.

You can access the Database Migration Assessment Report from the View menu of the conversion tool. There are two parts of the assessment report: the Summary part and the Action Items part.
The Summary page gives an overall picture of the migration possibility. It shows how many objects in each type of component it can convert and how many it can’t.

The Action Items page will go deeper into the analysis. It will list every object that can’t be converted to the target engine and the reason for their failure. I find it hugely helpful that the AWS Schema Conversion Tool highlights the particular command or syntax in the generated code that caused the problem. This code is viewable from the lower half of the Action Items screen.

In the following images, we are seeing the summary and action items report for two schemas. You can see the bottom half of the Action Items screen shows the generated code for a particular object that can’t be converted. When you select an individual problem item from the top half of the screen, its related object is highlighted in the left pane and its generated code is shown in the bottom.
Database Migration Assessment Report Summary View
Database Migration Assessment Report Action Items
Database Migration Assessment Report Summary View
Database Migration Assessment Report Action Items

Converting the Source Schema

With the assessment report giving you the overall migration possibility, you have two choices:

  • Make changes to the source database structure so it matches the target database syntax. This may mean changing data types for table columns or rewriting stored procedures, triggers, and functions with different syntax — in most migration scenarios, this is not an option.
  • Apply the converted schema to the target instance as is, and then make manual changes to the target database by creating tables with correct data types, or write code with the correct syntax. The second option is the preferred method by most data engineers.

Migrating the schema in the target environment is a two-step process:

  • Convert the schema.  This is where the AWS Schema Conversion Tool will generate a local script to apply to the target environment.
  • Apply the converted schema. The converted schema code is actually applied to the target database instance, and the target objects are created.

In the images below, we have chosen to convert two schemas within the AdventureWorks database (HumanResources and dbo):
Converting Source Database Schema
Converting Source Database Schema

Applying the Converted Schema

With the schemas converted to MySQL format, the target instance will show the databases created. However, they don’t exist yet. To physically create the database objects, we had to apply the schemas from the target database pane:
Applying the Converted Schema to Target
Applying the Converted Schema to Target
And that completes the initial schema migration. If you are interested in migrating to virtual machines, you may find our post on Migrating Virtual Machines to the AWS Cloud useful.

Conclusion

We found the AWS Schema Conversion Tool a simple and intuitive piece of software with an elegant installation.

As noted before, there is no option to migrate to or from a PostgreSQL database. Postgres runs many corporate database workloads, and we don't know when this functionality will be available in the tool. The same can be said about other database engines like MySQL or MariaDB as sources. The target environment can only be one type, and that's RDS. This may help Amazon push its customers to adopt RDS, but it isn't helpful if a customer wants to run their database in EC2.

Even if the schema migration runs smoothly and most of the objects can be transferred easily, database engineers still have to support other “moving parts.”  This includes a multitude of components like: SQL Server Agent Jobs, Oracle Jobs, custom cron jobs, Windows scheduled tasks, network shares, users and privileges, linked servers, linked databases, partitions, full-text indexes, and so on.

The philosophy behind this tool's purpose is focused. The tool is not meant to be a magic bullet for a turn-key migration – no tool can claim to have such functionality. Data mapping is the biggest time-sink for most migrations, and this tool addresses that issue very well.

As we saw from our example, it’s not easy to migrate even a sample database in a lift-and-shift manner. That may disappoint many DBAs. After all, everyone wants a smooth migration where the tool does all the heavy lifting. We believe the AWS Schema Conversion Tool’s strengths lie in its ability to pinpoint exactly where the migration problems are, and suggesting ways to address those issues.

The post Migrating Data to AWS Using the AWS Schema Conversion Tool: A Preview appeared first on Cloud Academy.

]]>