Limoncelli (author of "The Practice of Cloud System Administration"), who worked for a long time at Google, believes that 2010 was the year of the transition from the era of the traditional Internet to the era of cloud computing.
* 1985-1994 – the time of mainframes (large computers) and intra-corporate data exchange, for which the load is easy to plan;
* 1995-2000 – the era of the emergence of Internet companies;
* 2000-2003 – the formation of an ecosystem of distributed computing on cheap mass hardware;
* 2003-2010 – the unification of data-center hardware and virtualization;
* 2010-2019 – the era of cloud computing, opened by Amazon.
The performance gain of an individual machine grows more slowly than its cost: for example, doubling performance raises the cost significantly more than twofold, and each subsequent performance increase is disproportionately more expensive. Consequently, each new user was more expensive to serve.
Later, in the period 2000-2003, an ecosystem formed that made a fundamentally different approach possible:
* the emergence of distributed computing;
* the emergence of low-power mass equipment;
* maturation of open-source solutions that allow installing software on many machines without per-processor licensing;
* maturation of telecommunication infrastructure;
* increasing reliability due to the distribution of points of failure;
* the ability to increase performance if needed in the future by adding new components.
The next stage was unification, which was most pronounced in 2003-2010:
* providing in the data center not just rack space (power-location), but unified hardware purchased in bulk for the whole data center;
* saving on resources;
* virtualization of the network and computers.
Amazon set another milestone in 2010 and opened the era of cloud computing. This stage is characterized by the construction of large-scale data centers with a deliberate surplus of capacity, in order to obtain a lower unit cost of computing power by buying wholesale, saving on it for oneself and profitably selling the surplus at retail. This approach applies not only to computing power and infrastructure, but also to software, which is packaged as services in order to reduce the cost of using it, selling it at retail both to large companies and to beginners.
The need for uniformity of the environment
Usually, novice Linux developers prefer to work under Windows, so as not to learn an unfamiliar OS and make all its beginner mistakes themselves, because earlier things were far from simple and far from well debugged. Often developers are forced to work under Windows because of corporate preferences: 1C, Directum and other systems run only on Windows, and the rest of the infrastructure, most importantly the network infrastructure, is tailored for this operating system. Working from Windows leads to a large loss of working time for both developers and DevOps on fixing both minor and major differences between the operating systems. These differences show up starting with the simplest tasks, for example, making a page in pure HTML. An incorrectly configured editor will insert a BOM and the line endings accepted in Windows ("\r\n" instead of "\n"). The BOM, when the header, body and footer of the page are glued together, will create gaps between them; they are not visible in the editor, since they are formed by bytes of meta-information about the file type, which in Linux have no such meaning and are simply passed through into the output. The different line endings prevent Git from showing the change you actually made, because it sees a difference on every line.
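Both problems are easy to detect and neutralize from the console; a small sketch (page.html is just an example file name):
file page.html                            # reports "with BOM" and "with CRLF line terminators" for a problem file
head -c 3 page.html | xxd                 # EF BB BF at the start of the file is the UTF-8 BOM
git config --global core.autocrlf input   # keep "\n" in the repository and do not convert line endings on Linux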
Now let's take a front-end developer. At first glance, what is difficult here: JS (JavaScript), HTML and CSS are interpreted natively by the browser. Previously, each page was laid out separately – it was checked by the designer and the customer and handed over to the PHP developer for integration with the framework or CMS. In order not to edit the header on each page, and then spend a long time figuring out when the copies started to differ and which one is correct, HAML was used. HAML adds additional syntax to HTML to avoid boilerplate: loops, file inclusion, in our case a single header and footer. But it requires a special program to transform it into plain HTML. In MS Windows this is solved by installing the compiler program and connecting it to the IDE, since all these features are in the WEBStorm IDE. With CSS and its size, duplicates, dependencies and support for different browsers, everything is much more complicated – LESS was used there, and it has now been superseded by the more functional SASS and libraries for supporting different browsers, which require the Ruby compiler, and such a bundle usually does not work the first time. For JS, CoffeeScript was used. All this needs to be run through compression and validation programs (HTML validation is usually not needed).
When the project starts to grow, ceases to be separate pages with "JS inserts" and becomes a SPA (Single Page Application), where everything is created by JS, and build tools (Gulp, Grunt), package managers and NodeJS come into play, the difficulties become greater and greater. All these programs are free and were originally developed for Linux; they are designed to work from the Bash console, do not always behave well under Windows and are difficult to automate in graphical interfaces, despite the efforts of the IDE developers. Therefore, many WEB developers have switched from MS Windows to macOS, which is a UNIX-like system with Bash natively built in.
Docker as lightweight virtual machines
Initially, the problem of isolating environments and projects was solved by virtualization – system software that emulates an environment at a certain level, which can be hardware (a computer as a set of components such as a processor, RAM, network device and others, if necessary) or, less often, an operating system. The system administrator chooses the amount of RAM (no more than is free), processor and network device, installs the operating system and, if necessary, drivers, and installs the necessary programs. If a workplace is needed for a second developer, he does the same, looking into the /bin directory of the first machine to install the missing programs. And here the first quiet problem arises, which has not yet manifested itself: the programs end up installed in different versions. Later this becomes a headache for the developers, if something works for one developer and not for the other, or a headache for the sysadmin, if it works for the developer but not in production.
With the increase in the number of jobs, the following problems are added:
* Less than 30% of the performance of the parent system is available to you, because all the instructions that the processor must execute are executed by the virtualization program. To increase performance, the VT-x processor mode allows the processor to execute instructions from the virtual environment directly, and in cases of incompatibility it throws an exception. True, these throws are hundreds of times more expensive than ordinary instructions, so mature virtualization systems (VirtualBox, VMware and others) try to filter and modify potentially incompatible instructions, which can significantly increase performance.
* Each workstation has to be created anew; for this the system administrator writes a script that automates the process, but it is naturally not ideal, and he has to update it constantly and patch the incompatibilities introduced by the programmers.
* Occupied disk space grows linearly with the number of containers and multiplies again with the number of product versions, even though a single instance takes up a lot of space. That is, each sandbox contains an emulation instance, an operating system image, all the installed programs and the developer's code, which is quite a lot. The only thing shared is the installation of the virtualization software itself, and then only within one physical server. For example, if we have 10 developers, the occupied size will be 10 times larger; with 3 product versions – 30 times.
All these disadvantages, for the WEB, Docker is intended to solve. Hereinafter, we will talk about Docker for Linux, and will not consider the slightly different kernel implementation for FreeBSD, or the implementation for Windows 10 Professional, inside which either a stripped-down Linux kernel or an independent Windows containerization development is used. The main idea is not to create an extra layer (hardware virtualization, its own OS and a hypervisor), but to differentiate rights. You cannot run MS Windows in a container, but you can run both RedHat and Debian, since the kernel is the same and the differences are in the files: a sandbox is created (separate directories, with no way to go beyond their bounds) with these files. Also, we are talking about WEB solutions, since for native solutions problems may arise when the program needs exclusive access from the container (the Docker sandbox) to the OS kernel, for example, for native rendering of windows. You can also limit the amount of memory, processor time and number of processes.
Lightweight Virtualization or Lightweight Isolation – A Look at Docker Implementation
Let's take a look at the history of the prerequisites for the emergence of Docker – namely the prerequisites, since Docker itself does not implement isolation, let alone virtualization, but organizes work with the isolation the kernel already provides. Unlike virtualization, which resembles a hangar with its own world and its own foundation on which you can put whatever your heart desires, isolation can be compared to a fence on a shared lawn. Isolation appeared in the Linux kernel gradually, in parts responsible for different levels, and in parallel programs appeared to provide an interface and a concept for applying this isolation in real projects. Isolation consists of 6 types of resource limitation.
The first in the kernel was file system isolation, which allows creating a sandbox with the chroot command, back in 1979: from outside, the sandbox is completely visible, but when you go inside, the folder over which the command is executed becomes the root, and you cannot escape it. The next was the delimitation of processes: the sandbox, like the host system, exists as long as the process with pid (number) 1 exists; inside the sandbox it is its own init, outside the sandbox it is a normal process. Later came cgroups, which limit user groups, memory and more. All this exists in the kernel of any Linux, regardless of whether you have Docker installed or not. Throughout history, attempts were made, open-source and commercial, to create containers by developing the functionality themselves, and such solutions found their users, but they did not reach the masses. Docker at the beginning of its existence used the fairly stable but hard-to-use LXC containerization solution. Gradually it replaced LXC with native cgroups. Docker also supports the layering of its image (more on that later), but does not implement it itself – it uses UnionFS (UFS).
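The same kernel mechanisms can be touched without Docker at all; a rough sketch (the rootfs directory is assumed to already contain a minimal system, and the cgroups v1 layout is assumed):
sudo chroot rootfs /bin/sh                       # file system isolation: rootfs/ becomes the root for the shell
sudo unshare --pid --fork --mount-proc /bin/sh   # process isolation: inside, the shell sees itself as PID 1
sudo mkdir /sys/fs/cgroup/memory/demo            # resource limits through cgroups:
echo $((256*1024*1024)) | sudo tee /sys/fs/cgroup/memory/demo/memory.limit_in_bytes   # cap the group at 256Mb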
Docker and disk space
Since Docker does not implement the functionality itself, but uses what is built into the Linux kernel, and has no graphical interface under the hood, it takes up very little space itself.
Since the container uses the host OS kernel, the base image (usually an OS) contains only the complementary packages. So the Debian Docker image is 125Mb, while the ISO image is 290Mb. To check that a single kernel is used in the container, display information about it with uname -a or cat /proc/version, and information about the container environment itself with cat /etc/issue.
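A quick check, assuming the debian image, that the kernel is shared while the userland is the image's own:
uname -r                                # kernel version on the host
docker run --rm debian uname -r         # the same kernel version inside the container
docker run --rm debian cat /etc/issue   # but the userland identifies itself as Debian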
Docker builds an image based on the instructions in the Dockerfile, which can be located remotely or locally, and the image can be created from it at any time. Therefore, if you are not using an image at the moment, you can delete it. An exception is an image created from a container with the docker commit command, but creating images this way is not very correct, and you can always recover the Dockerfile steps from an image with the docker history command and then delete the image. The advantage of storing images is that you do not have to wait while one is being created: downloading the OS and libraries.
The artifact that Docker builds is called an Image. When you create several containers based on it, the occupied space practically does not increase, since a container is just a process plus its config settings. When files are changed in a container, the files themselves are not copied; only the changes made are saved, and they are deleted when the container is removed. This guarantees in 99% of cases a completely identical environment, and as a result it is important to place preparatory operations common to all containers, such as installing specific programs, in the image; a side effect of this is the absence of duplication. To be able to save data, folders and files are mounted from the host (parent) system. Therefore, you can run a hundred or more containers on a regular computer and you will not see any change in the free space on the disk. At the same time, if the developers use Git, and how can they do without it, and commit often, there may be no need to mount the folders with the source code at all.
The Docker image is not a monolithic image of your product, but a layer cake of images whose layers are cached. This allows you to save significant time on creating an image. Caching can be disabled with the --no-cache=true switch of the build command if Docker does not recognize that the data is mutable. Docker can detect changes in the ADD instruction, which adds a file from the host system to the container, by the hash of the file. So, if you create two containers, one with NGINX and the other with MySQL, both based on Ubuntu 14.04, there will be three image layers: MySQL, NGINX and Ubuntu. The layers can be viewed with the docker history command. This also works for your projects: when copying 2 versions of the code into your image with the ADD command, you will have 3 layers and two images – the base one, one with the code of the first version and one with the code of the second version – regardless of the number of containers. The number of layers is limited to 127. It is important to note that when cloning a project, you need to specify the version, not just git clone, but git clone --branch v1 and git clone --branch v2, otherwise Docker will cache the layer created by the git clone command and, when creating the second image, we will get the same one.
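A minimal sketch of how layers map to Dockerfile instructions (the image name myimage and the /app path are illustrative):
FROM ubuntu:14.04                               # shared base layer
RUN apt-get update && apt-get install -y nginx  # a cached layer with packages
ADD . /app                                      # invalidated when the files' hashes change
It is built and inspected like this:
docker build -t myimage .
docker history myimage                 # shows the layers and their sizes
docker build --no-cache -t myimage .   # rebuild, ignoring the layer cache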
Docker does not consume resources, but only limits them if this is specified in the settings when creating the container (for memory the key is -m, for the processor – -c). Since Docker supports different containerization filesystems, there is no unified interface for tuning them. But in any case, a resource is consumed only as much as is actually required, not as much as is allocated, as in virtual machines.
Such concern for the occupied disk space and the weightlessness of the containers themselves lead to an irresponsible attitude to downloading images and creating containers.
Garbage collection of containers
Because a container provides much more possibilities than a virtual machine, the situation is complicated by the garbage left behind as Docker runs. The problem is solved simply by running the garbage collector, which appeared in version 1.13, or, for earlier versions, in a more complicated way, by writing the script you need.
Just as it is simple to create a container, docker container run name_image, it is just as simple to delete one: docker rm -f id_container. Often, in order to just experiment, it is convenient to run the container interactively, docker run -ti name_image bash, and we immediately find ourselves in the container. When we exit it with Ctrl+D, it will be stopped. To have the container removed automatically on exit, use the --rm parameter. But because containers are so weightless and so easy to create, they are often abandoned and not removed, which leads to their explosive growth. You can look at the running ones with the docker ps command, and at the stopped ones with docker ps -a. To prevent this, use the docker container prune garbage collector, which was introduced in version 1.13 and which removes all stopped containers. For earlier versions, use docker rm $(docker ps -q -f status=exited). If running it is undesirable for you, most likely you are using Docker incorrectly, since starting a container from an image is almost as quick and easy as restoring it to work. If you need to save state in the container, this is done by mounting folders or volumes.
The situation with images is slightly more complicated. When a container is created, the image is downloaded if it is absent. Since one image can serve several containers, the image is not deleted when the container itself is deleted. You have to delete it manually with docker rmi name_image, and if it is in use, only a warning will be issued. The price for saving disk space is that Docker cannot simply determine whether an image is still needed or not. Since version 1.13 it can, using the docker image prune -a command, analyze which images are not used by containers and delete them. You need to be more careful here if Docker cannot obtain the image again, although assuming such a situation is not very correct. One such situation is a built image whose Dockerfile, describing the process of its creation, has been lost; otherwise you can always rebuild the image from the Dockerfile with the docker build command. The correct thing to do is to take action immediately and restore the Dockerfile from the image by looking at the commands that create its layers with docker history name_image. The second situation is an image created from a running container with the docker commit command, rather than from a Dockerfile, an approach that was once actively popularized and is now just as actively deprecated.
Since an image consists of layers that are shared between different images, those layers can remain after various abnormal situations. Since we cannot use them on their own, it is safe to delete them with the docker image prune command.
To save the results of the container's work, you can mount a folder of the host machine into a folder of the container. We can explicitly specify the folder on the host machine, for example, docker run -v /page_host:/page_container name_image, or let it be generated: docker run -v /page_container name_image. To remove generated folders (volumes) that are no longer used by containers, enter the docker volume prune command. For collecting unused networks there is also a garbage collector.
There is also a single garbage collector, docker system prune, which in fact simply combines the specialized ones into one command with logically compatible parameters. There is a tendency to put it in cron. You can also look at the space occupied by all containers, all images and all volumes with the docker system df command, and also without grouping – docker system df -v.
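If you do decide to run the collector from cron, a crontab entry could look like this (the schedule is an arbitrary assumption):
# crontab -e: every night at 03:00 remove stopped containers, unused networks and dangling images
0 3 * * * /usr/bin/docker system prune -f > /dev/null 2>&1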
Many of the garbage collection issues described here are handled by Docker-compose. In addition, it greatly simplifies life, unless you run a container once just for experiments. So the command docker-compose up starts the containers, and docker-compose down -v removes them, and all dependencies between them are removed as well. All container launch parameters are described in docker-compose.yml, as well as the relationships between them. Thanks to this, when changing the launch parameters of containers, you do not need to worry about deleting the old ones and creating new ones, and you do not need to list all the container parameters – just run up again, and it will either re-create the container or update its configuration.
To prevent cluttering the system, Docker has a built-in configurable limit on the number of containers and images, reminding you to clean the system by running the garbage collector.
Saving time on container creation
We already got acquainted with images, their layers and caching in the previous topic. Let's now look at them in terms of container creation time. Why is this so important – after all, by analogy with virtualization, the system administrator starts the creation of the container and, by the time he hands it over to the programmer, it will certainly be assembled. It is important to note that a lot has changed since then: the principles and requirements for the ecosystem and its use have changed. So, for example, earlier the developer, having developed and tested his code at his workplace, sent it to the QA manager for testing against business requirements, and when this code's turn came, the tester ran it at his workplace and checked it. Now the infrastructure is handled by DevOps, which establishes a continuous process for delivering the features developed by programmers, and containers are created automatically with each submission to the production branch for automated testing. At the same time, so that some tests do not affect the work of others, a separate container is created for each test, and the tests often run in parallel in order to show the result to the developer instantly, while he still remembers what he did and has not switched his attention to another task.
For standard programs: no need to install, no need to maintain
We often use a huge number of ready-made solutions. When choosing one, we face a dilemma: on the one hand, it is more universal and more proven than anything we could afford to write ourselves; on the other hand, it can be complex enough that figuring out how to properly install and configure it, install all the dependencies, resolve conflicts and set it up for initial use takes real effort. Now installation and configuration have become much easier and more standardized, and low-level problems are largely absent. But before we continue, let's digress and look at how the path from obtaining a program to starting to use it has evolved:
* In the days when all programs were written in assembler, programs were distributed by mail; users installed and tested them themselves, because testing in the companies was not provided for. In case of problems, the user informed the developer company about them and, after they were fixed, received the corrected version on a disk by mail. The process was very long, and the user did the testing himself.
* During the era of distribution on disks, companies already wrote their software products in higher-level languages and tested them on different OS versions. Hereinafter we will consider free software. The program already contained a Makefile, which itself compiled and installed the program.
* Since the advent of the Internet, software has been massively installed using package managers; on release, it is downloaded and installed from the remote OS repository. The package manager tries to monitor and maintain the compatibility of programs. Further study and use of the program – how to start it, how to configure it, how to understand that it works – falls on the user or the system administrator.
* With the advent of Docker Hub and the WEB, applications are downloaded and run as containers. A container usually does not need to be configured for initial operation.
For containers and images in general, the server can adjust the amount of free and occupied space. By default, 10G is allocated for containers and images, while the dm.min_free_space=5% threshold must remain free; it is better to put these settings in the config, which may have to be created as /etc/docker/daemon.json:
{
  "storage-opts": [
    "dm.basesize=50G",
    "dm.min_free_space=5%"
  ]
}
You can limit the resources consumed by a container in its settings (a combined example follows the list):
* -m 256m – the maximum amount of RAM consumed (here 256Mb);
* -c 512 – the CPU usage priority weight (1024 by default);
* --cpuset-cpus="0,1" – the numbers of the allowed processor cores.
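Combined into a single launch command, a sketch (the nginx image and the container name are only examples):
docker run -d -m 256m -c 512 --cpuset-cpus="0,1" --name limited_nginx nginx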
Product transfer and distribution
To transfer a project, for example, to a customer, and distribute it between developers and servers, you can use installation scripts, archives, images, and containers. Each of these ways to distribute a project has its own characteristics, disadvantages and advantages. Let's talk about them and compare.
The docker logs name_container command prints the container's log; the --tail switch limits the output to the last lines, but the main thing is that it has a follow mode, enabled by the -f switch, which updates the output as new lines arrive, for example, docker logs -f --tail 10 name_container.
When there are too many applications to manually monitor their work separately, it makes sense to centralize the application logs. For centralization, numerous programs can be used that collect logs from different services and send them to a central repository, for example Fluentd. It is convenient to use ElasticSearch to store the logs, simply by writing them to this search engine. It is highly desirable that the logs be in a structured format – JSON. This will allow you to sort them, select the ones you need, identify trends using built-in aggregate functions, and perform analysis and forecasting, rather than just searching by text. For analysis there is the Kibana web interface included in the Elastic stack.
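For example, a container's logs can be sent straight to a Fluentd collector with the built-in logging driver (a sketch that assumes Fluentd is already listening on localhost:24224; the tag is arbitrary):
docker run -d --log-driver=fluentd --log-opt fluentd-address=localhost:24224 --log-opt tag=myapp nginx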
Logging is important not only for long-running applications. So, for test containers it is convenient to get the output of the tests that ran. This can be done by specifying in the CMD instruction of the Dockerfile the command that runs the tests, for example npm test.
Image storage:
* public and private Docker Hub (http://hub.docker.com);
* for private and secret projects, you can host your own image repository; the image that provides it is called registry (a minimal sketch of running one follows).
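A hedged sketch of a private registry on your own host (the port, names and tags are illustrative; registry is the official image):
docker run -d -p 5000:5000 --name registry registry:2    # start the registry from the official image
docker tag myimage localhost:5000/myimage                # retag the image for that registry
docker push localhost:5000/myimage                       # push it there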
Docker for building apps and one-off jobs
Unlike virtual machines, whose launch involves significant human and computational costs, Docker is often used to perform one-off actions, when software needs to be launched once and it is desirable not to spend effort on installing and removing it. To do this, a container is launched that is mounted to the folder with our application, performs the required actions on it and, after they are completed, is deleted. An example is a JavaScript project that needs to be built and have its tests run. At the same time, the project itself does not use NodeJS, but contains only the build-tool configs, for example WEBPack, and the written tests. To do this, we start the build container in interactive mode, in which you can control the build process if necessary, and after the build is completed the container will stop and delete itself; for example, you can run something like this at the root of the application: docker run -it --rm -v $(pwd):/app node-build. Tests can be run in a similar way. As a result, the application is built and tested on a test server, while the software that is not required for its operation on the production server never ends up there and does not consume resources, and the application can be transferred to the production server, for example, as a container. In order not to write documentation on starting the build and testing, you can add two corresponding configs, docker-compose-build.yml and docker-compose-test.yml, and call them with docker-compose -f ./docker-compose-build.yml up.
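What such a docker-compose-build.yml could look like – a sketch assuming the build is done by an npm script inside the official node image (the image tag and the script name build are assumptions):
# docker-compose-build.yml
version: '3'
services:
  build:
    image: node:12
    working_dir: /app
    volumes:
      - .:/app
    command: sh -c "npm install && npm run build"
It is started with docker-compose -f docker-compose-build.yml up, and the container exits on its own when the build finishes.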
Management and access
We manage containers using the docker command. Now, suppose there is a need to manage them remotely. Using VNC, SSH or something else just to run the docker command remotely will probably be too time-consuming if the task gets complicated. That's right, first you will need to figure out what Docker is, because the docker command and the Docker program are not the same thing; rather, the docker command is a console client for managing the client-server application Docker Engine. The command interacts with the Docker Engine server through the Docker REST API, which is intended for remote interaction with the server. But in this case, you need to take care of authorization and SSL encryption of traffic. This is ensured by creating keys, but in general, if the task is centralized management, differentiation of rights and security, it is better to look towards products that provide this from the start and use Docker only as a container runtime, not as a management system.
By default, for security reasons, the client communicates with the server over a Unix socket (the special file /var/run/docker.sock), not over a network socket. To work through the Unix socket, you can tell the curl program to use it: curl --unix-socket /var/run/docker.sock http://localhost/v1.24/containers/json, but this feature is only supported since curl 7.40, which is not available on CentOS. To solve this problem, and to communicate between remote servers, we use a network socket. To activate it, stop the server with systemctl stop docker and start it with the settings dockerd -H tcp://0.0.0.0:9002 & (listen to everyone on port 9002, which is not permissible for production). After that, the plain docker ps command will not work, but docker -H 5.23.52.111:9002 ps or docker -H tcp://geocode1.essch.ru:9202 ps will. By default, Docker uses port 2375 for http and 2376 for https. In order not to change the code everywhere and not to specify the socket every time, we will write it in an environment variable:
export DOCKER_HOST=5.23.52.111:9002
docker ps
docker info
unset DOCKER_HOST
It is important to use export to make the variable available to all child processes. Also, there should be no spaces around =. Now we can access the dockerd server from anywhere on the same network. To manage remote and local Docker Engines (Docker on hosts), the Docker Machine program has been developed. Interaction with it is carried out using the docker-machine command. First, let's install it:
base=https://github.com/docker/machine/releases/download/v0.14.0 &&
curl -L $base/docker-machine-$(uname -s)-$(uname -m) > /usr/local/bin/docker-machine &&
chmod +x /usr/local/bin/docker-machine
Group of related applications
We already have several different applications, let's say NGINX, MySQL and our own application. We have isolated them in different containers; now they do not conflict with each other, and for NGINX and MySQL we did not waste time and effort on making our own configuration, but simply downloaded them: docker run mysql, docker run nginx, and for the application docker build .; docker run myapp -p 80:80 bash. As you can see, it would seem that everything is very simple, but there are two points: control and interconnection.
To demonstrate control, we will take the container of our application and implement two possibilities – starting and re-creation. For a manual start, when we know that the container has already been created but is simply stopped, it is enough to execute docker start myapp, but for automatic mode this is not enough: we need to write a script that takes into account whether the container already exists and whether there is an image for it:
if docker ps | grep -q myapp
then
    docker start myapp
else
    if ! docker images | grep -q myimage
    then
        docker build -t myimage .
    fi
    docker run -d --name myapp -p 80:80 myimage bash
fi
… And to create it, you need to delete the container, if it exists:
if docker ps -a | grep -q myapp
then
    docker rm -f myapp
fi
if ! docker images | grep -q myimage
then
    docker build -t myimage .
fi
docker run -d --name myapp -p 80:80 myimage bash
… It is clear that the common parameters – the image name, the container name – need to be moved into variables, that you need to check that the Dockerfile is there and valid, and only after that delete the container, and much more. To understand the real scale, without going into the interaction of containers, the cloning (scaling) of these groups and the like, I will just mention that the docker run command can exceed one to two dozen lines: for example, a dozen forwarded ports, mounted folders, memory and processor limits, connections with other containers, and a few more specific parameters. Yes, this is not good, and it is difficult to split everything into many containers in this form, due to the lack of a map of container interaction. But the question arises: isn't this a lot of work just to give the user the opportunity to start the container or rebuild it? Often the system administrator's answer boils down to giving access only to a select few. But even here there is a solution: Docker-compose, a tool for working with a group of containers:
# docker-compose.yml
version: '3'
services:
  myapp:
    container_name: myapp
    image: myimage
    ports:
      - 80:80
    build: .
… To start: docker-compose up -d, and to re-create: docker-compose down; docker-compose up -d. Moreover, when the configuration changes and a complete re-creation is not needed, it will simply be updated.
Now that we have simplified the management of a single container, let's work with a group. But here, only the config itself will change for us:
# docker-compose.yml
version: '3'
services:
  mysql:
    image: mysql
  nginx:
    image: nginx
    ports:
      - 80:80
  myapp:
    container_name: myapp
    build: .
    depends_on:
      - mysql
    image: myimage
    links:
      - mysql:db
      - nginx:nginx
… Here we see the whole picture at once: the containers are connected by one network, where the application can reach mysql and NGINX via the db and nginx hosts respectively, and the myapp container will be created only after the mysql database has been brought up, even if that takes some time.
Service Discovery
With the growth of a cluster, the probability of node failures increases, and manually detecting what has happened becomes more complicated; Service Discovery systems are designed to automate the detection of newly appearing services and their disappearance. But for the cluster to be able to detect its state, given that the system is decentralized, the nodes must be able to exchange messages with each other and choose a leader; examples are Consul, etcd and ZooKeeper. We will consider Consul based on its following features: the whole program is a single file, it is extremely easy to use and configure, it has a high-level interface (ZooKeeper does not have one; it is believed that third-party applications implementing it should appear over time), and it is written in a language that is not demanding on machine resources (Consul – Go, ZooKeeper – Java), even though its support in other systems is less widespread (for example, ClickHouse supports ZooKeeper by default).
Let's check the distribution of information between the nodes using a distributed key-value storage: if we add records on one node, they should spread to the other nodes, and there should be no hard-coded master node. Since Consul consists of a single executable file, download it from the official website at https://www.consul.io/downloads.html on each node:
wget https://releases.hashicorp.com/consul/1.3.0/consul_1.3.0_linux_amd64.zip -O consul.zip
unzip consul.zip
rm -f consul.zip
Now you need to start one node, for now as master, with consul agent -server -ui -bootstrap, and the others as slaves joined to it with consul agent -server -ui -join <address of the first>. After that, we will stop the Consul that is running in master mode and launch it again as an equal; as a result, the Consuls will re-elect a temporary leader, and in case of its failure they will re-elect again. Let's check the work of our cluster with consul members:
consul members;
Now let's check the distribution of information in our storage:
curl -X PUT -d 'value1' .....:8500/v1/kv/group1/key1
curl -s .....:8500/v1/kv/group1/key1
curl -s .....:8500/v1/kv/group1/key1
curl -s .....:8500/v1/kv/group1/key1
Let's set up service monitoring; for more details see the documentation https://www.consul.io/docs/agent/options.html#telemetry, and for that .... https://medium.com/southbridge/monitoring-consul-with-statsd-exporter-and-prometheus-bad8bee3961b
In order not to configure everything ourselves, we will use a container and the development mode, with the IP address already configured at 172.17.0.2:
essh@kubernetes-master:~$ mkdir consul && cd $_
essh@kubernetes-master:~/consul$ docker run -d --name=dev-consul -e CONSUL_BIND_INTERFACE=eth0 consul
Unable to find image 'consul:latest' locally
latest: Pulling from library/consul
e7c96db7181b: Pull complete
3404d2df15cb: Pull complete
1b2797650ac6: Pull complete
42eaf145982e: Pull complete
cef844389e8c: Pull complete
bc7449359c58: Pull complete
Digest: sha256:94cdbd83f24ec406da2b5d300a112c14cf1091bed8d6abd49609e6fe3c23f181
Status: Downloaded newer image for consul:latest
c6079f82500a41f878d2c513cf37d45ecadd3fc40998cd35020c604eb5f934a1
essh@kubernetes-master:~/consul$ docker inspect dev-consul | jq '.[] | .NetworkSettings.Networks.bridge.IPAddress'
"172.17.0.4"
essh@kubernetes-master:~/consul$ docker run -d --name=consul_follower_1 -e CONSUL_BIND_INTERFACE=eth0 consul agent -dev -join=172.17.0.4
8ec88680bc632bef93eb9607612ed7f7f539de9f305c22a7d5a23b9ddf8c4b3e
essh@kubernetes-master:~/consul$ docker run -d --name=consul_follower_2 -e CONSUL_BIND_INTERFACE=eth0 consul agent -dev -join=172.17.0.4
babd31d7c5640845003a221d725ce0a1ff83f9827f839781372b1fcc629009cb
essh@kubernetes-master:~/consul$ docker exec -t dev-consul consul members
Node Address Status Type Build Protocol DC Segment
53cd8748f031 172.17.0.5:8301 left server 1.6.1 2 dc1
8ec88680bc63 172.17.0.5:8301 alive server 1.6.1 2 dc1
babd31d7c564 172.17.0.6:8301 alive server 1.6.1 2 dc1
essh@kubernetes-master:~/consul$ curl -X PUT -d 'value1' 172.17.0.4:8500/v1/kv/group1/key1
true
essh@kubernetes-master:~/consul$ curl $(docker inspect dev-consul | jq -r '.[] | .NetworkSettings.Networks.bridge.IPAddress'):8500/v1/kv/group1/key1
[
{
"LockIndex": 0,
"Key": "group1 / key1",
"Flags": 0,
"Value": "dmFsdWUx",
"CreateIndex": 277,
"ModifyIndex": 277
}
]
essh@kubernetes-master:~/consul$ firefox $(docker inspect dev-consul | jq -r '.[] | .NetworkSettings.Networks.bridge.IPAddress'):8500/ui
Along with determining the location of the containers, authorization needs to be provided; key-value stores are used for this.
dockerd -H fd:// --cluster-store=consul://192.168.1.6:8500 --cluster-advertise=eth0:2376
* --cluster-store – where the daemon gets the key data from (the address of the key-value store);
* --cluster-advertise – the address and port on which the daemon itself can be reached for saving data.
docker network create --driver overlay --subnet 192.168.10.0/24 demo-network
docker network ls
Simple clustering
In this article, we will not consider how to build a cluster manually, but will use two tools: Docker Swarm and Google Kubernetes – the most popular and most common solutions. Docker Swarm is simpler, it is part of Docker and therefore has the largest audience (subjectively), while Kubernetes provides much more capability, more tool integrations (for example, distributed storage for Volumes), support in popular clouds, and scales more easily for large projects (a higher level of abstraction, a component approach).
Let's consider what a cluster is and what good it will bring us. A cluster is a distributed structure that abstracts independent servers into one logical entity and automates work on:
* in the event of a server crash, its containers are rescheduled (created anew) on other servers;
* even distribution of containers across servers for fault tolerance;
* creating a container on a server with suitable free resources;
* re-creating a container in case of its failure;
* unified management interface from one point;
* performing operations taking into account server parameters, for example, the size and type of disk, and the container characteristics specified by the administrator, for example, placing related containers that share a single mount point on the same server;
* unification of different servers, for example, on different OS, cloud and non-cloud.
We will now move on from Docker Swarm to Kubernetes. Both of these systems are orchestration systems; both work with Docker containers (Kubernetes also supports RKT and Containerd), but the interaction between containers is fundamentally different because of the additional Kubernetes abstraction layer – the POD. Both Docker Swarm and Kubernetes manage containers based on IP addresses and distribute them across nodes, inside which everything works through localhost, proxied by a bridge; but unlike Docker Swarm, which presents the user with physical containers, Kubernetes presents the user with logical ones – PODs. A logical Kubernetes container consists of physical containers, networking between which happens through their ports, so these ports must not be duplicated within a POD.
Both orchestration systems use an Overlay Network between the host nodes to emulate the presence of the managed units in a single local network space. This type of network is logical: it uses ordinary TCP/IP networks for transport and is designed to emulate the presence of cluster nodes in a single network, for managing the cluster and exchanging information between its nodes, even when at the TCP/IP level they cannot reach one another directly. The point is that when a developer designs a cluster, he can describe a network for only one node, and when the cluster is deployed, several instances of it are created, their number can change dynamically, and in one network there cannot be several nodes with the same IP address and subnet (for example, 10.0.0.1); it would also be wrong to require the developer to specify IP addresses, since it is not known which addresses are free and how many will be required. This network takes over tracking the real IP addresses of nodes, which can be allocated randomly from the free ones and change as the nodes in the cluster are re-created, and provides the ability to address them via container / POD IDs. With this approach, the user refers to specific entities rather than to dynamically changing IP addresses. Interaction is carried out through a balancer, which for Docker Swarm is not logically separated, while in Kubernetes it is created as a separate entity so that a specific implementation can be chosen, like other services. Such a balancer must be present in every cluster, and within the Kubernetes ecosystem it is called a Service. It can be declared either separately, as a Service, or in the description of the cluster, for example in a Deployment. The service can be reached by its IP address (see its description) or by its name, which is registered as a first-level domain in the built-in DNS server; for example, if the name of the service specified in the metadata is my_service, then the cluster can be accessed through it like this: curl my_service. This is a fairly standard solution when the components of the system, along with their IP addresses, change over time (are re-created, new ones are added, old ones are deleted): traffic is sent through a proxy server whose IP or DNS addresses remain constant for the external network, while the internal ones can change, leaving it to the proxy server to keep the mapping up to date.
Both orchestration systems use the Ingress overlay network to provide access to themselves from the external network through a balancer, which matches the internal network with the external one based on the Linux kernel IP address mapping tables (iptables), separating them and allowing information to be exchanged even if there are identical IP addresses in the internal and external networks. And here, to maintain the connection between these potentially conflicting networks at the IP level, the overlay Ingress network is used. Kubernetes also provides the ability to create a logical entity, an Ingress controller, which allows you to configure a LoadBalancer or NodePort service based on the traffic content at a level above HTTP, for example, routing based on address paths (an application router) or terminating TLS/HTTPS traffic, as GCP and AWS do.
Kubernetes is the result of evolution through internal Google projects: through Borg, then through Omega; based on the experience gained from those experiments, a fairly scalable architecture has developed. Let's highlight the main types of components:
* POD – regular POD;
* ReplicaSet, Deployment – scalable PODs;
* DaemonSet – it is created in each cluster node;
* services (sorted in order of importance): ClusterIP (the default, the base for the rest), NodePort (exposes ports open in the cluster, for each POD, on ports from the range 30000-32767 so that specific PODs can be reached from outside), LoadBalancer (NodePort plus the ability to create a public IP address for Internet access in public clouds such as AWS and GCP), HostPort (opens the container's ports on the host machine, that is, if port 9200 is open in the container, it will also be open on the host machine and traffic will be forwarded) and HostNetwork (the containers in the POD will live in the host's network space). A NodePort example is sketched after this list.
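A hedged NodePort example (the service name, the label app: WEB and the port 30080 are illustrative assumptions):
apiVersion: v1
kind: Service
metadata:
  name: web-nodeport
spec:
  type: NodePort
  selector:
    app: WEB
  ports:
    - port: 80
      targetPort: 80
      nodePort: 30080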
The master contains at least: kube-apiserver, kube-scheduler and kube-controller-manager. The worker nodes contain:
* kubelet – checks the health of the system component (the node), creates and manages containers. It is located on each node, accesses kube-apiserver and registers the node on which it runs.
* cAdvisor – node monitoring.
Let's say we have hosting and we have created three AVS servers. Now you need to install Docker and Docker Machine on each server; how to do this was described above. Docker Machine itself creates a virtual machine for Docker containers; we will only use its built-in driver, VirtualBox, so as not to install additional packages. Now, of the operations that must be performed on each server, it remains to create the Docker machines; the rest of the operations for setting up and creating containers can be performed from the master node, and they will be automatically launched on free nodes and redistributed when their number changes. So, let's start the Docker machine on the first node:
docker-machine create --driver virtualbox --virtualbox-cpu-count "2" --virtualbox-memory "2048" --virtualbox-disk-size "20000" swarm-node-1
docker-machine env swarm-node-1 // tcp://192.168.99.100:2376
eval $(docker-machine env swarm-node-1)
We launch the second node:
docker-machine create --driver virtualbox --virtualbox-cpu-count "2" --virtualbox-memory "2048" --virtualbox-disk-size "20000" swarm-node-2
docker-machine env swarm-node-2
eval $(docker-machine env swarm-node-2)
We launch the third node:
docker-machine create --driver virtualbox --virtualbox-cpu-count "2" --virtualbox-memory "2048" --virtualbox-disk-size "20000" swarm-node-3
eval $(docker-machine env swarm-node-3)
Let's connect to the first node, initialize the distributed storage in it and pass it the address of the manager (leader) node:
docker-machine ssh swarm-node-1
docker swarm init --advertise-addr 192.168.99.100:2377
docker node ls // will display the current nodes
docker swarm join-token worker
If the tokens are forgotten, they can be obtained by executing the commands docker swarm join-token manager and docker swarm join-token worker on a node with the distributed storage.
To create a cluster, all of its future nodes must be registered (joined) with the docker swarm join --token … 192.168.99.100:2377 command; the token is used for authentication, and for discovery the nodes must be on the same subnet. You can view all nodes with the docker node ls command.
The docker swarm init command creates a cluster leader, alone for now, but the response it prints contains the command needed to connect other nodes to this cluster, the important part of which is the token, for example docker swarm join --token … 192.168.99.100:2377. Connect to the remaining nodes via SSH with the docker-machine ssh name_node command and execute it.
For the interaction of containers, the bridge network is used, which is a switch. But for several replicas to work, a subnet is needed, since all replicas will have the same ports, and proxying is done by IP using the distributed storage, regardless of whether they are physically located on the same server or on different ones. It should be noted right away that balancing is carried out by the round-robin rule, that is, in turn to each replica. It is important to create an overlay network in order to create DNS on top of it and be able to refer to replicas by name. Let's create a subnet:
docker network create --driver overlay --subnet 10.10.1.0/24 --opt encrypted services
Now we need to fill the cluster with containers. To do this, we create not a container, but a service, which is a template for creating containers on different nodes. The number of replicas to be created is specified with the --replicas key, and the distribution over the nodes is random but as uniform as possible. In addition to replicas, the service has a load balancer; the ports from which it proxies (the input ports for all replicas) are specified with the -p switch, and Service Discovery (discovery of working replicas, determining their IP addresses, restarting them) is performed by the balancer on its own.
docker service create -p 80:80 --name busybox --replicas 2 --network services busybox sleep 3000
Let's check the state of the service with docker service ls, the status and uniformity of the distribution of the container replicas with docker service ps busybox, and its work with wget -O- 10.10.1.2. Service is a higher-level abstraction that includes the container and organizes its updating (and by no means only that): to update the container parameters you do not need to delete and re-create it, but simply update the service, and the service will first create a new container with the updated configuration and delete the old one only after the new one starts.
Docker Swarm has its own load balancer, Ingress load balancing, which balances the load between replicas on the port declared when creating the service, in our case port 80. The entry point can be any server with this port, but the response will be received from the server to which the request was forwarded by the balancer.
We can also save data to the host machine, as in a regular container; there are two options for this (filled-in sketches follow the commands):
docker service create --mount type=bind,src=…,dst=.... --name=.... ..... # bind-mounts a directory from the host
docker service create --mount type=volume,src=…,dst=.... --name=.... ..... # uses a named volume
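Filled in with illustrative names (the paths, the volume name app-data and the nginx image are assumptions for the sketch):
docker service create --name web-bind --mount type=bind,src=/var/www,dst=/usr/share/nginx/html nginx
docker service create --name web-vol --mount type=volume,src=app-data,dst=/usr/share/nginx/html nginx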
The application is deployed via Docker-compose running on nodes (replicas). When updating the Docker-compose configuration, you need to update the Docker stack, and the cluster will be consistently updated: one replica will be deleted and a new one will be created in its place in accordance with the new config, then the next one. If an error occurs, a rollback will be made to the previous configuration. Well, let's get started:
docker stack deploy -c docker-compose.yml test_stack
docker service update --label-add foo=bar nginx
docker service update --label-rm foo nginx
docker service update --publish-rm 80 nginx
docker node update --availability=drain swarm-node-3
Docker swarm
$ sudo docker pull swarm
$ sudo docker run --rm swarm create
docker secret create test_secret
docker service create --secret test_secret
cat /run/secrets/test_secret
health check: hello-check-cobbalt; example pipeline: TravisCI -> Jenkins -> config -> https://www.youtube.com/watch?v=UgUuF_qZmWc https://www.youtube.com/watch?v=6uVgR9WPjYM
Docker company-wide
Let's take a company-wide view: we have containers and we have servers. It does not matter whether these are two virtual machines and several containers or hundreds of servers and thousands of containers; the problem is to distribute the containers over the servers, and this takes a system administrator and time. If there is little time and many containers, many system administrators are needed, otherwise the containers will be distributed suboptimally, that is, the servers (virtual machines) run but not at full capacity and resources are wasted. In this situation, container orchestration systems are designed to optimize the distribution and save human resources.
Consider evolution:
* The developer creates the necessary containers by hand.
* The developer creates the necessary containers using previously prepared scripts.
* The system administrator, using a configuration and deployment management system such as Chef, Puppet, Ansible or Salt, sets the topology of the system. The topology indicates which container is located in which place.
* Orchestration (schedulers) – semi-automatic distribution, maintenance of the state and reaction to system changes. For example: Google Kubernetes, Apache Mesos, Hashicorp Nomad, Docker Swarm mode, and YARN, which we'll cover. New ones also appear: Flocker (https://github.com/ClusterHQ/Flocker/), Helios (https://github.com/spotify/helios/).
There is a native solution, Docker Swarm. Of the mature systems, Kubernetes and Mesos have shown the most popularity, where the former is a universal and completely ready-to-use system, and the latter is a complex of various projects combined into a single package that lets you replace or change its components. There is also a huge number of less popular solutions that are not promoted by such giants as Google, Twitter and others: Nomad, Scheduling, Scaling, Upgrades, Service Discovery, but we will not consider them. Here we will consider the most ready-made solution – Kubernetes, which has gained great popularity for its low entry threshold, support, and sufficient flexibility in most cases, pushing Mesos into the niche of customizable solutions where customization and development are economically justified.
Kubernetes has several ready-made configurations:
* MiniKube – a cluster on a single local machine, designed to lower the entry threshold and for experiments;
* kubeadm;
* kops;
* Kubernetes-Ansible;
* microKubernetes;
* OKD;
* MicroK8s.
To start a cluster yourself, you can use KubeSai – free Kubernetes.
The smallest structural unit is called a POD, which corresponds to a YML file in Docker-compose. The process of creating a POD, like other entities, is declarative: it is done by writing or changing a configuration YML file and applying it to the cluster. So, let's create a POD:
# test_pod.yml
# kubectl create -f test_pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
    - name: test
      image: debian
To run multiple replicas:
# test_replica_controller.yml
# kubectl create -f test_replica_controller.yml
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    app: nginx        # label by which the controller finds its running containers
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: test
          image: debian
For balancing, a type of service (a logical entity) is used – LoadBalancer; besides it there are also ClusterIP and NodePort:
apiVersion: v1
kind: Service
metadata:
  name: test-service
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
      name: http
  selector:
    app: WEB
Overlay network plugins (created and configured automatically): Contiv, Flannel, GCE networking, Linux bridging, Calico, Kube-DNS, SkyDNS. The skeleton of a ConfigMap looks like this:
# configmap
apiVersion: v1
kind: ConfigMap
metadata:
  name: config_name
data:
Similar to secrets in Docker Swarm, Kubernetes has its own secret, an example of which can be NGINX settings:
# secrets
apiVersion: v1
kind: Secret
metadata:
  name: test-secret
data:
  password: ....    # values are base64-encoded
And to add the secret to a POD, you need to specify it in the POD config:
....
volumes:
  - name: secret-volume        # the volume name is arbitrary
    secret:
      secretName: test-secret
…
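Put together into a complete POD, a sketch (the pod name, the volume name and the mount path are illustrative; test-secret is the secret declared above):
apiVersion: v1
kind: Pod
metadata:
  name: test-with-secret
spec:
  containers:
    - name: test
      image: debian
      volumeMounts:
        - name: secret-volume
          mountPath: /etc/secret
          readOnly: true
  volumes:
    - name: secret-volume
      secret:
        secretName: test-secret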
Kubernetes has more flavors of Volumes:
* emptyDir;
* hostPath;
* gcePersistentDisk – a disk on Google Cloud;
* awsElasticBlockStore – A disk on Amazon AWS.
volumeMounts:
  - name: app
    mountPath: ""
volumes:
  - name: app
    hostPath:
      ....
Feature for UI: Dashboard UI
Additionally available:
* Main metrics – collection of metrics;
* Logs collect – collecting logs;
* Scheduled JOBs;
* Authentication;
* Federation – distribution by data centers;
* Helm is a package manager similar to Docker Hub.
https://www.youtube.com/watch?v=FvlwBWvI-Zg
Docker commands
A counterpart to Docker is RKT containers.
In Linux, when the process with PID = 1 terminates, its namespace is destroyed as well, which leads to the shutdown of the OS; the same happens with a container, since it is a special case of an OS. The delimitation of processes in itself does not create additional overhead, nor does monitoring and limiting resources for processes, because systemd provides the same configuration options in the host OS. Network virtualization is complete: both localhost and a bridge, which allows creating bridges from several containers to one localhost and thereby making it a single one for them, something that is actively used in Kubernetes PODs.
Run a temporary container interactively with -it. To exit, press Ctrl+D, which will send a shutdown signal, after which the container will be removed thanks to --rm, to avoid cluttering the system with stopped containers. If the image is created in such a way that the application is launched inside a shell in the container, which is wrong, the signal will be sent to that application and the container will continue to run with the shell; in that case, to exit, you will need to kill it from a separate terminal by the name given with --name name_container. For instance:
Docker run –rm -it –name name_container ubuntu BASH
In the beginning, the Docker CLI had a simple set of commands to manage the lifecycle of containers. Among them:
* Docker run to run the container;
* Docker ps to view running containers;
* Docker rm to remove a container;
* Docker build to create your own image;
* Docker images to view existing images;
* Docker rmi to remove the image.
But as popularity grew, there were more and more commands, and it was decided to group them; so instead of the simple "Docker run", the "Docker container" group appeared, which in Docker version 19 contains 25 commands: cleanup, stop and restore, logs, and various kinds of connections to the container. The same fate befell the work with images. But the old commands have remained so far for compatibility and convenience, because in most cases only the basic set is required. Let's stop at it:
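For example, the grouped commands duplicate the familiar short ones (a sketch; both forms work in current Docker versions):
docker container run --rm -it ubuntu bash   # same as docker run
docker container ls                         # same as docker ps
docker image ls                             # same as docker images
docker image rm ubuntu                      # same as docker rmi ubuntu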
Starting a container:
docker run -d –name name_container ubuntu bash
Remove a running container:
docker rm -f name_container
Output of all containers:
docker ps -a
Output of running containers:
docker ps
Output of containers with consumed resources:
docker stats
Displaying processes in a container:
docker top {name_container}
Connect to the container through the sh shell (there is no BASH in alpine containers):
docker exec -it name_container sh
Cleaning the system from unused images:
docker image prune
Remove hanging images:
docker rmi $(docker images -f "dangling=true" -q)
Show image:
docker images
Create image in dir folder with Dockerfile:
docker build -t docker_user/name_image dir
Delete image:
docker rmi docker_user/name_image
Connect to Docker hub:
docker login
Push the latest revision of the image (the tag is added and shifted automatically, unless specified otherwise) to the Docker hub:
docker push docker_user/name_image:latest
A broader list is available at https://niqdev.github.io/devops/docker/.
Building a Docker Machine can be described in the following steps:
Creating a VirtualBox virtual machine
docker-machine create name_virtual_system
Creating a generic virtual machine
docker-machine create -d generic name_virtual_system
List of virtual machines:
docker-machine ls
Stop the virtual machine:
docker-machine stop name_virtual_system
Start a stopped virtual machine:
docker-machine start name_virtual_system
Delete virtual machine:
docker-machine rm name_virtual_system
Connect to virtual machine:
eval "$ (docker-machine env name_virtual_system)"
Disconnect Docker from VM:
eval $ (docker-machine env -u)
Login via SSH:
docker-machine ssh name_virtual_system
Quit the virtual machine:
exit
Run the sleep 10 command in the virtual machine:
docker-machine ssh name_virtual_system 'sleep 10'
Running commands in BASH environment:
docker-machine ssh dev 'bash -c "sleep 10 && echo 1"'
Copy the dir folder to the virtual machine:
docker-machine scp -r /dir name_virtual_system:/dir
Make a request to the containers of the virtual machine:
curl $(docker-machine ip name_virtual_system):9000
Forward port 9005 of the host machine to port 9007 of the virtual machine:
docker-machine ssh name_virtual_system -f -N -L 9005:0.0.0.0:9007
Master initialization:
docker swarm init
Running multiple containers with the same EXPOSE:
essh @ kubernetes-master: ~ / mongo-rs $ docker run --name redis -p 6379 -d redis
f3916da35b6ba5cd393c21d5305002b78c32b089a6cc01e3e2425930c9310cba
essh @ kubernetes-master: ~ / mongo-rs $ docker ps | grep redis
f3916da35b6b redis "docker-entrypoint.s…" 8 seconds ago Up 6 seconds 0.0.0.0:32769->6379/tcp redis
essh @ kubernetes-master: ~ / mongo-rs $ docker port reids
Error: No such container: reids
essh @ kubernetes-master: ~ / mongo-rs $ docker port redis
6379 / tcp -> 0.0.0.0:32769
essh @ kubernetes-master: ~ / mongo-rs $ docker port redis 6379
0.0.0.0:32769
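If a specific host port is needed instead of a random one, it is specified explicitly (a sketch; the container name redis2 is illustrative):
docker run --name redis2 -p 16379:6379 -d redis
docker port redis2
# 6379/tcp -> 0.0.0.0:16379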
The first, naive build option is to copy all files and then install dependencies; as a result, when any file changes, all packages are reinstalled:
COPY ./ /src/app
WORKDIR /src/app
RUN npm install
Let's use caching and split the static files and the installation:
COPY ./package.json /src/app/package.json
WORKDIR /src/app
RUN npm install
COPY . /src/app
Using the base image template node: 7-onbuild:
$ cat Dockerfile
FROM node:7-onbuild
EXPOSE 3000
$ docker build .
In this case, files that should not get into the image, such as Dockerfile, .git, node_modules and files with keys, need to be listed in .dockerignore.
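A minimal .dockerignore sketch for such a project (the set of entries is illustrative):
cat << EOF > .dockerignore
.git
node_modules
Dockerfile
*.key
EOF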
A config can also be mounted as a volume or copied into a running container:
-v /config
docker cp config.conf name_container:/config/
Real-time statistics of used resources:
essh @ kubernetes-master: ~ / mongo-rs $ docker ps -q | docker stats
CONTAINER ID NAME CPU% MEM USAGE / LIMIT MEM% NET I / O BLOCK I / O PIDS
c8222b91737e mongo-rs_slave_1 19.83% 44.12MiB / 15.55GiB 0.28% 54.5kB / 78.8kB 12.7MB / 5.42MB 31
aa12810d16f5 mongo-rs_backup_1 0.81% 44.64MiB / 15.55GiB 0.28% 12.7kB / 0B 24.6kB / 4.83MB 26
7537c906a7ef mongo-rs_master_1 20.09% 47.67MiB / 15.55GiB 0.30% 140kB / 70.7kB 19.2MB / 7.5MB 57
f3916da35b6b redis 0.15% 3.043MiB / 15.55GiB 0.02% 13.2kB / 0B 2.97MB / 0B 4
f97e0697db61 node_api 0.00% 65.52MiB / 15.55GiB 0.41% 862kB / 8.23kB 137MB / 24.6kB 20
8c0d1adc9b9c portainer 0.00% 8.859MiB / 15.55GiB 0.06% 102kB / 3.87MB 57.8MB / 122MB 20
6018b7e3d9cd node_payin 0.00% 9.297MiB / 15.55GiB 0.06% 222kB / 3.04kB 82.4MB / 24.6kB 11
^ C
When creating images, you need to consider:
* when a large layer changes, it is recreated, so it is often better to split it, for example, create one layer with 'npm i' and copy the code in a second one;
* if a file in the image is large and is changed in the container, then the file will be completely copied from the read-only image layer to the writable layer; therefore containers are supposed to be lightweight, and content is usually placed in a special storage. Code-as-a-service: the 12 factors (12factor.net):
* Codebase – one service – one repository;
* Dependencies – all dependent services are declared in the config;
* Config – configs are available through the environment;
* BackEnd – data is exchanged with other services over the network via an API;
* Processes – one service – one process, which in the event of a crash makes it possible to track it unambiguously (the container itself terminates) and restart it;
* independence from the environment and no influence on it;
* CI/CD – code control (git) – build (Jenkins, GitLab) – release (Docker, Jenkins) – deploy (Helm, Kubernetes). Keeping the service lightweight is important, but there are programs not designed to run in containers, such as databases. Because of their peculiarities, certain requirements are imposed on their launch, and the profit is limited: because of big data they are not only slow to scale, a rolling update is unlikely, and a restart must be performed on the same nodes as their data for reasons of data-access performance.
* Config – service relationships are defined in the configuration, for example, docker-compose.yml;
* Port binding – services communicate through ports, and the port can be chosen automatically; for example, if EXPOSE PORT is specified in the Dockerfile, then when a container is started with the -P flag it will be mapped to a free port automatically;
* Env – environment settings are passed through environment variables, not through configs, which allows them to be added to the service configuration, for example, docker-compose.yml (see the sketch after this list);
* Logs – logs are streamed over the network, for example to ELK, or printed to the standard output, which is already streamed by Docker.
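A minimal sketch of passing settings through environment variables in docker-compose.yml (the service and variable names are illustrative):
version: '3'
services:
  api:
    image: node_api
    environment:
      - DB_HOST=db
      - DB_PORT=5432
    ports:
      - "3000:3000"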
Dockerd internals:
essh @ kubernetes-master: ~ / mongo-rs $ ps aux | grep dockerd
root 6345 1.1 0.7 3257968 123640? Ssl Jul05 76:11 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
essh 16650 0.0 0.0 21536 1036 pts / 6 S + 23:37 0:00 grep –color = auto dockerd
essh @ kubernetes-master: ~ / mongo-rs $ pgrep dockerd
6345
essh @ kubernetes-master: ~ / mongo-rs $ pstree -c -p -A $(pgrep dockerd)
dockerd (6345) – + – docker-proxy (720) – + – {docker-proxy} (721)
| | – {docker-proxy} (722)
| | – {docker-proxy} (723)
| | – {docker-proxy} (724)
| | – {docker-proxy} (725)
| | – {docker-proxy} (726)
| | – {docker-proxy} (727)
| `– {docker-proxy} (728)
| -docker-proxy (7794) – + – {docker-proxy} (7808)
Docker-File:
* clean the caches of package managers (apt-get, pip and others): this cache is not needed in production, it only takes up space and loads the network; nowadays this is often less relevant, since there are multi-stage builds, but more on that below;
* group commands that deal with the same entities, for example, fetch the APT cache, install programs and remove the cache in one instruction: then the layer contains only the programs, while with separate instructions it contains the programs and the cache, because if the cache is not removed in the same instruction it is saved in the layer regardless of subsequent actions;
* separate instructions by frequency of change: for example, if software installation and code copying are not split, then when anything in the code changes, instead of reusing the ready layer with the programs they will be reinstalled, which entails significant image preparation time, critical for developers:
ADD ./app/package.json /app
RUN npm install
ADD ./app /app
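The multi-stage build mentioned above might look like this (a minimal sketch; the image tags, paths and the npm build script are illustrative):
FROM node:14 AS build
COPY ./app/package.json /app/package.json
WORKDIR /app
RUN npm install
COPY ./app /app
RUN npm run build

FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html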
Docker alternatives
** Rocket or rkt – containers for the CoreOS operating environment (now part of Red Hat), specially designed to use containers.
** Hyper-V – an environment for running Docker on the Windows operating system, which is a wrapper (a lightweight virtual machine) around the container.
Docker has branched off its core components, which it uses as primitives, which have become standard components for implementing containers such as RKT, bundled into the containerd project:
* CRI-O – an OpenSource project aimed from the beginning at fully supporting the CRI (Container Runtime Interface) standards, the Runtime Specification (github.com/opencontainers/runtime-spec) and the Image Specification (github.com/opencontainers/image-spec), as a general interface for the interaction of the orchestration system with containers. Along with Docker, support for CRI-O 1.0 has been added to Kubernetes (more on this) since version 1.7 in 2017, as well as to MiniKube and Kubic. It has a CLI (Command Line Interface) implementation in the Podman project, which almost completely repeats Docker commands, but without orchestration (Docker Swarm), and which is the default tool in Linux Fedora.
* CRI (Container Runtime Interface, Kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-Kubernetes/) – an interface for running containers, universally providing primitives (Executor, Supervisor, Metadata, Content, Snapshot, Events and Metrics) for working with Linux containers (process namespaces, cgroups, etc.).
** CNI (Container Networking Interface) – work with the network.
Portainer
The simplest monitoring option would be Portainer:
essh @ kubernetes-master: ~ / microKubernetes $ cat << EOF > docker-compose.monitoring.yml
version: '2'
services:
  portainer:
    image: portainer/portainer
    command: -H unix:///var/run/docker.sock
    restart: always
    ports:
      - 9000:9000
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./portainer_data:/data
EOF
essh @ kubernetes-master: ~ / microKubernetes $ docker-compose -f docker-compose.monitoring.yml up -d
Monitoring with Prometheus
Monitoring – maintaining the continuity of operation: tracking the current situation (identifying, localizing and notifying about an incident, for example, in SaaS PagerDuty), predicting possible situations, visualization, and building models of normal operation with AIOps (Artificial Intelligence for IT Operations, https://www.gartner.com/en/information-technology/glossary/aiops-artificial-intelligence-operations).
Monitoring contains the following steps:
* identification of the incident;
* notification of the incident;
* localization;
* decision.
Monitoring can be classified by level into the following types:
* infrastructure (operating system, servers, Kubernetes, DBMS);
* application (application logs, traces, application events);
* business processes (points in transactions, traces of transactions).
Monitoring can be classified according to the principle:
* distributed (traces);
* synthetic (availability);
* AIOps (forecasting, anomalies).
By the degree of analysis, monitoring is divided into two parts: logging systems and incident investigation systems. An example of logging is the ELK stack, and of incident investigation – Sentry (SaaS). For micro-services a request tracing system such as Jaeger or Zipkin is also added. The logging system simply writes all the logs that are available. The incident investigation system writes much more information, but only in case of errors in the application: environment parameters, versions of installed packages, the stack trace and so on, which lets you get the maximum information when examining an error rather than collecting it piece by piece from the server and the GIT repository. But the set and format of the information depends on the environment, therefore the incident system needs to be integrated with various language platforms, and even better with specific frameworks. So Sentry sends environment variables, a piece of code and an indication of where the error occurred, parameters of the program and of the platform environment, and method calls.
Ecosystem monitoring can be divided into:
* built into cloud platforms: Azure Monitoring, Amazon CloudWatch, Google Cloud Monitoring;
* provided as a service with support for various SaaS integrations: DataDog, NewRelic;
* CloudNative: Prometheus;
* for dedicated on-premises servers: Zabbix.
Zabbix was developed in 1998 and released to OpenSource under the GPL in 2001. At that time it had a traditional interface: without any design, with a lot of tabs, selectors and the like. Since it was developed for the company's own needs, it contains specific solutions. It is oriented towards monitoring devices and their components such as disks, networks, printers, routers and the like. For interaction the following can be used:
* Agents – installed on servers, collect many metrics and send them to the Zabbix server;
* HTTP – Zabbix makes requests over HTTP, for example to printers;
* SNMP – a network protocol for communicating with network devices;
* IPMI – a protocol for communicating with server hardware.
In 2019, Gartner presented a rating of monitoring systems in its Magic Quadrant:
** Dynatrace;
** Cisco (AppDynamics);
** New Relic;
** Broadcom (CA Technologies);
** Riverbed and Microsoft;
** IBM;
** Oracle;
** SolarWinds;
** Micro Focus;
** ManageEngine and Tingyun.
Not included in the quadrant:
** Correlsense;
** Datadog;
** Elastic;
** Honeycomb;
** Instant;
** Jennifer Soft;
** Light Step;
** Nastel Technologies;
** SignalFx;
** Splunk;
** Sysdig.
When we run an application in a Docker container, all the standard output (what is displayed in the console) of the running program (process) is buffered. We can view this buffer with docker logs name_container. If we follow the Docker ideology – "one process, one container" – we can view the logs of an individual program. It is convenient to use the less and tail commands to view logs. The first allows you to conveniently scroll through the logs with the keyboard arrows and search for the needed entries based on matches or a regular expression pattern, like the text editor vi. The second displays only the last lines we need or, with the -f flag, follows new ones as they appear.
An important criterion for ensuring smooth operation is control of free space. So, if there is no space left, the database will not be able to write data; with other components the situation can be more dire than the loss of new data. Docker has limit settings, and not only for individual containers (for example, at least 10% of free space). During image building or container startup, an error may be thrown that the specified limits have been exceeded. To change the default settings, you need to pass the settings to the Dockerd server, after stopping it with service docker stop (all containers will be stopped) and resuming it with service docker start (the containers will be resumed). Settings can be set as options: /bin/dockerd --storage-opt dm.basesize=50G --storage-opt
In Portainer we have authorization and control over our containers, with the ability to create them for testing and to see graphs of processor and memory usage. Anything more will require a monitoring system. There are quite a few monitoring systems, for example Zabbix, Graphite, Prometheus, Nagios, InfluxData, OkMeter, DataDog, Bosun, Sensu and others, of which Zabbix and Prometheus are the most popular. The first is used traditionally: admins love it for its ease of deployment (all you need is SSH access to the server) and its low level, which allows you to work not only with servers but also with other hardware, such as routers. The second is the opposite of the first: it is focused exclusively on collecting metrics and monitoring, is designed as a ready-made solution rather than a framework, and is loved by programmers: you set it up, choose metrics and get graphs. The key difference between Zabbix and Prometheus is not that some prefer to customize everything in detail and others want to spend much less time, but in the scope. Zabbix is focused on setting up monitoring of specific hardware, which can be anything and is often very exotic in a corporate environment; for each such entity metric collection is written manually and a graph is configured manually. For a dynamically changing environment of cloud solutions, even if it is just Docker containers, and even more so Kubernetes, in which a huge number of entities are constantly created, and the entities themselves, apart from the overall picture, are not of particular interest, Zabbix is not suitable; for this, Prometheus has Service Discovery built in and supports navigation for Kubernetes through the namespace, the balancer (service) and the group of containers (POD), which can be configured in Grafana in the form of tables. According to The New Stack's 2017 Kubernetes User Experience survey, Prometheus is used in 63% of cases; in the rest, rarer cloud monitoring tools are used.
Metrics can be system (for example, CPU, RAM, ROM) and application (service and application metrics). System metrics are divided into core metrics, which are used by Kubernetes for scaling and the like, and non-core metrics, which are not used by Kubernetes. Here are examples of bundles for collecting metrics:
* cAdvisor + Heapster + InfluxDB
* cAdvisor + collectd + Heapster
* cAdvisor + Prometheus
* snapd + Heapster
* snapd + SNAP cluster-level agent
* Sysdig
There are many monitoring systems and services on the market. We will consider specifically OpenSource ones that can be installed in your cluster. They can be divided according to the model of obtaining metrics: into those that collect metrics by polling, and those that expect metrics to be pushed into them. The latter are simpler both in structure and in use on a small scale. An example would be InfluxDB, which is a database that you can write to. The downside of this solution is the difficulty of scaling both in terms of support and load: if all services write at the same time, they can overload the monitoring system, especially since it is difficult to scale because the endpoint is registered in each service. The first group, practicing the pull model of interaction, includes Prometheus. It is also a database, with a daemon that polls services based on their registrations in the configuration file and pulls metrics with labels in a specific format, for example:
cpu_usage: 2
cpu_usage {app: myapp}: 2
Prometheus is a mature product, it was developed in 2012, and in 2016 it was included in the CNCF (Cloud Native Computing Foundation) consortium. Prometheus consists of:
* a TSDB (Time Series Database), which looks more like a storage queue for metrics, with a specified accumulation period, for example a week, allowing hundreds of thousands of metrics to be processed per second. This database is local to Prometheus and does not support horizontal scaling; in the case of Prometheus, scaling is achieved by raising several instances and sharding them. Prometheus supports data aggregation, which is useful for reducing the amount of accumulated data, as well as archiving the database from memory to disk.
* Service Discovery support for Kubernetes out of the box, through the public API, by polling PODs filtered according to the config, on TCP port 9121.
* Grafana (a separate product, added by default) – a universal UI with dashboards and charts that supports Prometheus via PromQL.
To expose metrics, you can use ready-made solutions or develop your own. For the vast majority of system metrics there is an exporter, while for application metrics you often have to provide your own. Exporters are general and specialized. For example, NodeExporter provides most node metrics, including basic process metrics, while specialized exporters provide more detailed ones. If you run Prometheus without exporters, it will give out almost a thousand metrics, but these are the metrics of Prometheus itself, and there will be no node_* prefixes among them. For these metrics to appear, you need to enable NodeExporter and write its URL into the Prometheus configuration so that it collects the metrics it provides. For NodeExporter this can be localhost or the node address and port 9100. Usually exporters specialize in product-specific metrics, for example:
** node_exporter – node metrics (CPU, Memory, Network);
** snmp_exporter – SNMP protocol metrics;
** mysqld_exporter – MySQL database metrics;
** consul_exporter – Consul database metrics;
** graphite_exporter – Graphite database metrics;
** memcached_exporter – Memcached database metrics;
** haproxy_exporter – HAProxy balancer metrics;
** CAdvisor – container metrics;
** process-exporter – detailed process metrics;
** metrics-server – CPU, Memory, File-descriptors, Disks;
** cAdvisor – a Docker daemon metrics – containers monitoring;
** kube-state-metrics – deployments, PODs, nodes.
Prometheus supports remote data writing (https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write), for example, to TSDB distributed storage for Prometheus – Weave Works Cortex, using a setting in the configuration, which allows data analysis from multiple Prometheus:
remote_write:
- url: "http://localhost:9000/receive"
Let's consider his work on a ready-made instance. I'll take www.katacoda.com/courses/istio/deploy-istio-on-kubernetes for this and go through it. Our Prometheus is located on its standard port 9090:
controlplane $ kubectl -n istio-system get svc prometheus
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT (S) AGE
prometheus ClusterIP 10.99.70.170
To open its UI, I'll go to the WEB tab and change the port 80 to 9090: https://2886795314-9090-ollie08.environments.katacoda.com/graph. In the input line, you need to enter the desired metric in PromQL (Prometheus query language), just as you would use InfluxQL for InfluxDB or SQL for TimescaleDB. For example, I will enter "cpu", and it will display a list of metrics containing it. There are two tabs under the line: a tab with a graph and a tab for displaying in tabular form. I will look at the tabular view. I selected machine_cpu_cores and clicked Execute. Common metrics usually have similar names, for example machine_cpu_cores and node_cpu_cores. The metrics themselves consist of the name, the tags (labels) in brackets and the value of the metric; they are requested in the same form and displayed in the table in the same form.
machine_cpu_cores {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", instance = "controlplane", job = "kubernetes-cadvisor", kubernetes_io_arch = "amd64", kubernetes_io_hostname "=" controlplane ", kubernetes_io_hostname" = "controlplane"
machine_cpu_cores {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", instance = "node01", job = "kubernetes-cadvisor", kubernetes_io_arch = "amd64", kubernetes_io_hostname = "node01", kubernetes_io_hostname = "node01", kubernetes_io_hostname = "node01"
If we are interested in memory, we can select machine_memory_bytes – the size of the RAM on the machine (server or virtual machine):
machine_memory_bytes {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", instance = "controlplane", job = "kubernetes-cadvisor", kubernetes_io_arch = "amd64", kubernetes_io_hostname "}
machine_memory_bytes {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", instance = "node01", job = "kubernetes-cadvisor", kubernetes_io_arch = "amd64", kubernetes_io_hostname = "node901", kubernetes_io_hostname = "node901"
But in bytes it is not clear, so we will use PromQL to translate to Gb: machine_memory_bytes / 1000/1000/1000
{beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", instance = "controlplane", job = "kubernetes-cadvisor", kubernetes_io_arch = "amd64", kubernetes_io_hostname = "controlplane", kubernetes_io25}
{beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", instance = "node01", job = "kubernetes-cadvisor", kubernetes_io_arch = "amd64", kubernetes_io_hostname = "node01", kubernetes_io48}
Let's enter memory_bytes into the search to find container_memory_usage_bytes – the memory used. The list contains all containers and their current memory consumption; I will give only three:
container_memory_usage_bytes {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", container = "POD", container_name = "POD", id = "/ kubepods.slice / kubepods-besteffort.slice / kubepod-pods-beseff633. b6549e892baa8687e4e98a106024b5c31a4af077d7c5544af03a3c72ec8997e0.scope ", image =" k8s.gcr.io/pause:3.1 ", instance =" controlplane ", job =" kubernetes-cadvisor ", kubernetes-cadvisorname," kubernetes-cadvisor "," kubernetes-cadvisor "," kubernetes-cadvisor "," kubernetes-cadvisor " , name = "k8s_POD_etcd-controlplane_kube-system_0e619e5dc53ed9efcef63f5fe1d7ee71_0", namespace = "kube-system", pod = "etcd-controlplane", pod_name = "etcd-controlplane"} 45056
container_memory_usage_bytes {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", container = "POD", container_name = "POD", id = "/ kubepods.slice / kubepods-besteffort.slice / kubepods-pods-besteff2. 76711789af076c8f2331d8212dad4c044d263c5cc3fa333347921bd6de7950a4.scope ", image =" k8s.gcr.io/pause:3.1 ", instance =" controlplane ", job =" kubernetes-caduvisor ", kubernetes_dio_host , name = "k8s_POD_kube-proxy-nhzhn_kube-system_5a815a40-f2de-11ea-88d2-0242ac110032_0", namespace = "kube-system", pod = "kube-proxy-nhzhn", pod_name = "kube-proxy-450 nhz
container_memory_usage_bytes {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", container = "POD", container_name = "POD", id = "/ kubepods.slice / kubepods-besteffort.slice / kubepa-poda-besteffort. 24ef0e898e1bb7dec9854b67291171aa9c5715d7683f53bdfc2cef49a19744fe.scope ", image =" k8s.gcr.io/pause:3.1 ", instance =" node01 ", job =" kubernetes-caduvisor amuber, kubernetes_dio_arch ", kubernetes_dio_arch , name = "k8s_POD_kube-proxy-6v49x_kube-system_6473aeea-f2de-11ea-88d2-0242ac110032_0", namespace = "kube-system", pod = "kube-proxy-6v49x", pod_name = "kube-proxy-835549x
Let's set the label that is contained in the metrics to filter out one: container_memory_usage_bytes {container_name = "prometheus"}
container_MEMORY_usage_bytes {beta_Kubernetes_io_arch = "amd64", beta_Kubernetes_io_os = "linux", container = "prometheus", container_name = "prometheus", id = "/ kubePODs.slice / kubePODs-burstableODslice-burdeaf2.slice. b314fb5c4ce8894f872f05bdd524b4b7d6ce5415aeb3fb91d6048441c47584a6.scope ", image =" sha256: b82ef1f3aa072922c657dd2b2c6b59ec0ac88e69c447998291066e1f67e741d8 ", instance =" node01 ", JOB =" Kubernetes-cadvisor ", Kubernetes_io_arch =" amd64 ", Kubernetes_io_hostname =" node01 ", Kubernetes_io_os =" linux ", name =" k8s_prometheus_prometheus- 5b77b7d695-knf44_istio-system_eaf4e833-f2de-11ea-88d2-0242ac110032_0 ", namespace =" istio-system ", POD =" prometheus-5b77b7d695-knf44 ", POD_name =" prometheus-5b77b7d44
283443200
Let's bring in Mb: container_memory_usage_bytes {container_name = "prometheus"} / 1000/1000
{Beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", container = "prometheus", container_name = "prometheus", id = "/ kubepods.slice / kubepods-burstable.slice / kubepods-burstable-podeaf4e833_f2de_11ea_88d2_0242ac110032.slice / docker-b314fb5c4ce8894f872f05bdd524b4b7d6ce5415aeb3fb91d6048441c47584a6 .scope ", image =" sha256: b82ef1f3aa072922c657dd2b2c6b59ec0ac88e69c447998291066e1f67e741d8 ", instance =" node01 ", job =" kubernetes-cadvisor ", kubernetes_io_arch =" amd64 ", kubernetes_io_hostname =" node01 ", kubernetes_io_os =" linux ", name =" k8s_prometheus_prometheus-5b77b7d695 -knf44_istio-system_eaf4e833-f2de-11ea-88d2-0242ac110032_0 ", namespace =" istio-system ", pod =" prometheus-5b77b7d695-knf44 ", pod_name =" prometheus-5b77b7d695-knf44 "}
286.18752
Filter by container_memory_usage_bytes {container_name = "prometheus", instance = "node01"}
beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", container = "prometheus", container_name = "prometheus", id = "/ kubepods.slice / kubepods-burstable.slice / kubepods-burstable-podeaf4e833_f2de_11ea_88d2_0242ac110032.slice / docker-b314fb5c4ce8894f872f05bdd524b4b7d6ce5415aeb3fb91d6048441c47584a6. scope ", image =" sha256: b82ef1f3aa072922c657dd2b2c6b59ec0ac88e69c447998291066e1f67e741d8 ", instance =" node01 ", job =" kubernetes-cadvisor ", kubernetes_io_arch =" amd64 ", kubernetes_io_hostname =" node01 ", kubernetes_io_os =" linux ", name =" k8s_prometheus_prometheus-5b77b7d695- knf44_istio-system_eaf4e833-f2de-11ea-88d2-0242ac110032_0 ", namespace =" istio-system ", pod =" prometheus-5b77b7d695-knf44 ", pod_name =" prometheus-5b77b7d695-knf44 "}
289.890304
And on the second one it is not: container_memory_usage_bytes {container_name = "prometheus", instance = "node02"}
no data
There are also aggregate functions sum (container_memory_usage_bytes) / 1000/1000/1000
{} 22.812798976
max (container_memory_usage_bytes) / 1000/1000/1000
{} 3.6422983679999996
min (container_memory_usage_bytes) / 1000/1000/1000
{} 0
You can also group by labels instance: max (container_memory_usage_bytes) by (instance) / 1000/1000/1000
{instance = "controlplane"} 1.641836544
{instance = "node01"} 3.6622745599999997
You can perform actions with the same type of labels and filter out: container_memory_mapped_file / container_memory_usage_bytes * 100> 80
{Beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", container = "POD", container_name = "POD", id = "/ kubepods.slice / kubepods-burstable.slice / kubepods-burstable-pode45f10af1ae684722cbd74cb11807900.slice / docker-5cb2f2083fbc467b8b394b27b69686d309f951450bcb910d509572aea9922806 .scope ", image =" k8s.gcr.io/pause:3.1 ", instance =" controlplane ", job =" kubernetes-cadvisor ", kubernetes_io_arch =" amd64 ", kubernetes_io_hostname =" controlplane ", kubernetes_io_os =" linux ", name = "k8s_POD_kube-controller-manager-controlplane_kube-system_e45f10af1ae684722cbd74cb11807900_0", namespace = "kube-system", pod = "kube-controller-manager-controlplane", pod_name = "kube-controller-manager-controlplane"}
80.52631578947368
You can look at the file system metrics using container_fs_limit_bytes, which produces a large list – I will give a few of it:
container_fs_limit_bytes {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", container = "POD", container_name = "POD", device = "/ dev / vda1", id = "/ kubepods.slice / kubepods-besteffort.subods / kubepods-besteffort.slice -besteffort-pod0e619e5dc53ed9efcef63f5fe1d7ee71.slice / docker-b6549e892baa8687e4e98a106024b5c31a4af077d7c5544af03a3c72ec8997e0.scope ", image =" k8s.gcr.io/pause:3.1 ", instance =" controlplane ", job =" kubernetes-cadvisor ", kubernetes_io_arch =" amd64 ", kubernetes_io_hostname = "controlplane", kubernetes_io_os = "linux", name = "k8s_POD_etcd-controlplane_kube-system_0e619e5dc53ed9efcef63f5fe1d7ee71_0", namespace = "kube-system", pod = "etcd-controlplane", pod_name "} etcd-controlplane =" etc
253741748224
container_fs_limit_bytes {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", container = "POD", container_name = "POD", device = "/ dev / vda1", id = "/ kubepods.slice / kubepods-besteffort.subods / kubepods-besteffort.slice -besteffort-pod5a815a40_f2de_11ea_88d2_0242ac110032.slice / docker-76711789af076c8f2331d8212dad4c044d263c5cc3fa333347921bd6de7950a4.scope ", image =" k8s.gcr.io/pause:3.1 ", instance =" controlplane ", job =" kubernetes-cadvisor ", kubernetes_io_arch =" amd64 ", kubernetes_io_hostname = "controlplane", kubernetes_io_os = "linux", name = "k8s_POD_kube-proxy-nhzhn_kube-system_5a815a40-f2de-11ea-88d2-0242ac110032_0", namespace = "kube-system", pod = "kube_name =", podhn "kube-proxy-nhzhn"}
253741748224
It contains the metrics of RAM through its device: "container_fs_limit_bytes {device =" tmpfs "} / 1000/1000/1000"
{beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", device = "tmpfs", id = "/", instance = "controlplane", job = "kubernetes-cadvisor", kubernetes_io_arch = "amd64", kubernetes control_ioplane_host , kubernetes_io_os = "linux"} 0.209702912
{beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", device = "tmpfs", id = "/", instance = "node01", job = "kubernetes-cadvisor", kubernetes_io_arch = "amd64", kubernetes_io_host , kubernetes_io_os = "linux"} 0.409296896
If we want to get the minimum disk, then we need to remove the RAM device from the list: "min (container_fs_limit_bytes {device! =" Tmpfs "} / 1000/1000/1000)"
{} 253.74174822400002
In addition to metrics that store the current value, there are counters. Their names usually end in "_total", and if we graph them, we will see an ascending line. To get a usable value, we need the rate of change (using the rate function) over a period of time (indicated in square brackets), something like rate(name_metric_total[time]). Time is usually specified in seconds or minutes: the suffix "s" is used for seconds, for example 40s, 60s; for minutes – "m", for example 2m, 5m. It is important to note that you cannot set a time shorter than the exporter polling interval, otherwise the metric will not be displayed.
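For example, to get the per-second CPU usage from the cAdvisor counter container_cpu_usage_seconds_total, you can enter rate(container_cpu_usage_seconds_total[1m]) in the UI, or query the HTTP API (a sketch, assuming a Prometheus reachable at localhost:9090):
curl -G 'http://localhost:9090/api/v1/query' --data-urlencode 'query=rate(container_cpu_usage_seconds_total[1m])'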
And you can see the names of the metrics that you could record along the path / metrics:
controlplane $ curl https://2886795314-9090-ollie08.environments.katacoda.com/metrics 2>/dev/null | head
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds {quantile = "0"} 3.536e-05
go_gc_duration_seconds {quantile = "0.25"} 7.5348e-05
go_gc_duration_seconds {quantile = "0.5"} 0.000163193
go_gc_duration_seconds {quantile = "0.75"} 0.001391603
go_gc_duration_seconds {quantile = "1"} 0.246707852
go_gc_duration_seconds_sum 0.388611299
go_gc_duration_seconds_count 74
# HELP go_goroutines Number of goroutines that currently exist.
Raising the Prometheus and Graphana ligament
We examined the metrics in the already configured Prometheus, now we will raise Prometheus and configure it ourselves:
essh @ kubernetes-master: ~ $ docker run -d --net=host --name prometheus prom/prometheus
09416fc74bf8b54a35609a1954236e686f8f6dfc598f7e05fa12234f287070ab
essh @ kubernetes-master: ~ $ docker ps -f name = prometheus
CONTAINER ID IMAGE NAMES
09416fc74bf8 prom / prometheus prometheus
UI with graphs for displaying metrics:
essh @ kubernetes-master: ~ $ firefox localhost: 9090
Add the go_gc_duration_seconds {quantile = "0"} metric from the list:
essh @ kubernetes-master: ~ $ curl localhost:9090/metrics 2>/dev/null | head -n 4
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds {quantile = "0"} 1.0097e-05
go_gc_duration_seconds {quantile = "0.25"} 1.7841e-05
Going to the UI at localhost:9090, select Graph in the menu. Let's add a chart to the dashboard: select the metric using the list – insert metrics at cursor. Here we see the same metrics as in the localhost:9090/metrics list, but aggregated by parameters, for example, just go_gc_duration_seconds. We select the go_gc_duration_seconds metric and display it with the Execute button. In the Console tab of the dashboard we see the metrics:
go_gc_duration_seconds {instance = "localhost: 9090", JOB = "prometheus", quantile = "0"} 0.000009186 go_gc_duration_seconds {instance = "localhost: 9090", JOB = "prometheus", quantile = "0.25"} 0.000012056 = go_congc_ instance "localhost: 9090", JOB = "prometheus", quantile = "0.5"} 0.000023256 go_gc_duration_seconds {instance = "localhost: 9090", JOB = "prometheus", quantile = "0.75"} 0.000068848 go_gc_duration_seconds {instance = "localhost: 9090 ", JOB =" prometheus ", quantile =" 1 "} 0.00021869
and by going to the Graph tab – their graphical representation.
Now Prometheus collects metrics from the current node: go_*, net_*, process_*, prometheus_*, promhttp_*, scrape_* and up. To collect Docker metrics, we tell Docker to expose them on port 9323 so that Prometheus can scrape them:
essh @ kubernetes-master: ~ $ curl http://localhost:9323/metrics 2>/dev/null | head -n 20
# HELP builder_builds_failed_total Number of failed image builds
# TYPE builder_builds_failed_total counter
builder_builds_failed_total {reason = "build_canceled"} 0
builder_builds_failed_total {reason = "build_target_not_reachable_error"} 0
builder_builds_failed_total {reason = "command_not_supported_error"} 0
builder_builds_failed_total {reason = "Dockerfile_empty_error"} 0
builder_builds_failed_total {reason = "Dockerfile_syntax_error"} 0
builder_builds_failed_total {reason = "error_processing_commands_error"} 0
builder_builds_failed_total {reason = "missing_onbuild_arguments_error"} 0
builder_builds_failed_total {reason = "unknown_instruction_error"} 0
# HELP builder_builds_triggered_total Number of triggered image builds
# TYPE builder_builds_triggered_total counter
builder_builds_triggered_total 0
# HELP engine_daemon_container_actions_seconds The number of seconds it takes to process each container action
# TYPE engine_daemon_container_actions_seconds histogram
engine_daemon_container_actions_seconds_bucket {action = "changes", le = "0.005"} 1
engine_daemon_container_actions_seconds_bucket {action = "changes", le = "0.01"} 1
engine_daemon_container_actions_seconds_bucket {action = "changes", le = "0.025"} 1
engine_daemon_container_actions_seconds_bucket {action = "changes", le = "0.05"} 1
engine_daemon_container_actions_seconds_bucket {action = "changes", le = "0.1"} 1
In order for the Docker daemon to apply the parameters, it must be restarted, which will stop all containers; when the daemon starts, the containers will be brought up in accordance with their restart policy:
essh @ kubernetes-master: ~ $ sudo chmod a+w /etc/docker/daemon.json
essh @ kubernetes-master: ~ $ echo '{"metrics-addr": "127.0.0.1:9323", "experimental": true}' | jq -M -f /dev/null > /etc/docker/daemon.json
essh @ kubernetes-master: ~ $ cat /etc/docker/daemon.json
{
"metrics-addr": "127.0.0.1:9323",
"experimental": true
}
essh @ kubernetes-master: ~ $ systemctl restart docker
Prometheus will only respond to metrics on the same server from different sources. In order for us to collect metrics from different nodes and see the aggregated result, we need to put an agent collecting metrics on each node:
essh @ kubernetes-master: ~ $ docker run -d \
-v "/proc:/host/proc" \
-v "/sys:/host/sys" \
-v "/:/rootfs" \
--net="host" \
--name=explorer \
quay.io/prometheus/node-exporter:v0.13.0 \
--collector.procfs /host/proc \
--collector.sysfs /host/sys \
--collector.filesystem.ignored-mount-points "^/(sys|proc|dev|host|etc)($|/)"
1faf800c878447e6110f26aa3c61718f5e7276f93023ab4ed5bc1e782bf39d56
and register it to listen on the address of the node, but for now everything is local: localhost:9100. Now let's tell Prometheus to listen to the agent and to Docker:
essh @ kubernetes-master: ~ $ mkdir prometheus && cd $ _
essh @ kubernetes-master: ~ / prometheus $ cat << EOF > ./prometheus.yml
global:
  scrape_interval: 1s
  evaluation_interval: 1s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['127.0.0.1:9090', '127.0.0.1:9100', '127.0.0.1:9323']
        labels:
          group: 'prometheus'
EOF
essh @ kubernetes-master: ~ / prometheus $ docker rm -f prometheus
prometheus
essh @ kubernetes-master: ~ / prometheus $ docker run \
-d \
--net=host \
--restart always \
--name prometheus \
-v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus
7dd991397d43597ded6be388f73583386dab3d527f5278b7e16403e7ea633eef
essh @ kubernetes-master: ~ / prometheus $ docker ps \
–f name = prometheus
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7dd991397d43 prom / prometheus "/ bin / prometheus –c…" 53 seconds ago Up 53 seconds prometheus
1702 host metrics are now available:
essh @ kubernetes-master: ~ / prometheus $ curl http://localhost:9100/metrics | grep -v '#' | wc -l
1702
Out of all this variety it is difficult to find the ones needed for everyday tasks, for example, the amount of memory used, node_memory_Active. There are metric aggregators for this:
http://localhost:9090/consoles/node.html
http://localhost:9090/consoles/node-cpu.html
But it's better to use Grafana. Let's install it too, you can see an example:
essh @ kubernetes-master: ~ / prometheus $ docker run \
-d \
--name=grafana \
--net=host \
grafana/grafana
Unable to find image 'grafana / grafana: latest' locally
latest: Pulling from grafana / grafana
9d48c3bd43c5: Already exists
df58635243b1: Pull complete
09b2e1de003c: Pull complete
f21b6d64aaf0: Pull complete
719d3f6b4656: Pull complete
d18fca935678: Pull complete
7c7f1ccbce63: Pull complete
Digest: sha256: a10521576058f40427306fcb5be48138c77ea7c55ede24327381211e653f478a
Status: Downloaded newer image for grafana / grafana: latest
6f9ca05c7efb2f5cd8437ddcb4c708515707dbed12eaa417c2dca111d7cb17dc
essh @ kubernetes-master: ~ / prometheus $ firefox localhost: 3000
We will enter the login admin and the password admin, after which we will be prompted to change the password. Next, the initial configuration needs to be performed.
In Grafana, the initial login and password are admin. First we are prompted to select a data source: select Prometheus, enter localhost:9090, select the connection not as to the server but as to the browser (that is, over the network), and select that we have basic authentication – that's all – click Save and Test, and Prometheus is connected.
It is clear that it is not worth giving out a password and login from admin rights to everyone. To do this, you will need to create users or integrate them with an external user database such as Microsoft Active Directory.
I will select the Dashboards tab and activate all three preconfigured dashboards. From the New Dashboard list in the top menu, I select the Prometheus 2.0 Stats dashboard. But there is no data.
I click on the "+" menu item and select "Dashboard"; it is proposed to create a dashboard. A dashboard can contain several widgets, for example charts, which can be positioned and customized, so I click on the add chart button and select its type. On the graph itself, I choose Edit: select a size, click edit, and the most important thing here is the choice of the displayed metric. Choosing Prometheus.
Complete assembly available:
essh @ kubernetes-master: ~ / prometheus $ wget \
https://raw.githubusercontent.com/grafana/grafana/master/devenv/docker/ha_test/docker-compose.yaml
–-2019-10-30 07: 29: 52– https://raw.githubusercontent.com/grafana/grafana/master/devenv/docker/ha_test/docker-compose.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com) … 151.101.112.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com) | 151.101.112.133 |: 443 … connected.
HTTP request sent, awaiting response … 200 OK
Length: 2996 (2.9K) [text / plain]
Saving to: 'docker-compose.yaml'
docker-compose.yaml 100% [=========>] 2.93K –.– KB / s in 0s
2019-10-30 07:29:52 (23.4 MB / s) – 'docker-compose.yaml' saved [2996/2996]
Obtaining application metrics
Up to this point, we have looked at the case where Prometheus polled the standard metric accumulator, getting the standard metrics. Now let's try to create an application and submit our metrics. First, let's take a NodeJS server and write an application for it. To do this, let's create a NodeJS project:
vagrant @ ubuntu: ~ $ mkdir nodejs && cd $_
vagrant @ ubuntu: ~ / nodejs $ npm init
This utility will walk you through creating a package.json file.
It only covers the most common items, and tries to guess sensible defaults.
See `npm help json` for definitive documentation on these fields
and exactly what they do.
Use `npm install <pkg>` afterwards to install a package and
save it as a dependency in the package.json file.
name: (nodejs)
version: (1.0.0)
description:
entry point: (index.js)
test command:
git repository:
keywords:
author: ESSch
license: (ISC)
About to write to /home/vagrant/nodejs/package.json:
{
"name": "nodejs",
"version": "1.0.0",
"description": "",
"main": "index.js",
"scripts": {
"test": "echo \" Error: no test specified \ "&& exit 1"
},
"author": "ESSch",
"license": "ISC"
}
Is this ok? (yes) yes
First, let's create a WEB server. I'll use a library to create it:
vagrant @ ubuntu: ~ / nodejs $ npm install Express --save
npm WARN deprecated Express@3.0.1: Package unsupported. Please use the express package (all lowercase) instead.
nodejs@1.0.0 / home / vagrant / nodejs
└── Express@3.0.1
npm WARN nodejs@1.0.0 No description
npm WARN nodejs@1.0.0 No repository field.
vagrant @ ubuntu: ~ / nodejs $ cat << EOF > index.js
const express = require('express');
const app = express();
app.get('/healt', function (req, res) {
  res.send({status: "Healt"});
});
app.listen(9999, () => {
  console.log({status: "start"});
});
EOF
vagrant @ ubuntu: ~ / nodejs $ node index.js &
[1] 18963
vagrant @ ubuntu: ~ / nodejs $ {status: 'start'}
vagrant @ ubuntu: ~ / nodejs $ curl localhost:9999/healt
{"status": "Healt"}
Our server is ready to work with Prometheus. We need to configure Prometheus for it.
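A minimal sketch of a scrape job for this application in prometheus.yml (assuming the server is reachable from Prometheus at localhost:9999; to expose real application metrics, the server would also need a /metrics endpoint, for example via the prom-client npm package):
scrape_configs:
  - job_name: 'nodejs-app'
    metrics_path: /metrics
    static_configs:
      - targets: ['localhost:9999']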
The Prometheus scaling problem arises when the data does not fit on one server, more precisely, when one server does not have time to record the data or when processing the data on one server no longer satisfies performance requirements. Thanos solves this problem without requiring federation setup, by providing the user with an interface and API that it relays to the Prometheus instances. A web interface similar to that of Prometheus is available to the user. Thanos itself interacts with agents that are installed on the instances as sidecars, as Istio does. Thanos and the agents are available as containers and as a Helm chart. For example, an agent can be brought up as a container configured for Prometheus, and Prometheus is configured with a config followed by a reload.
docker run --rm quay.io/thanos/thanos:v0.7.0 --help
docker run -d --net=host --rm \
-v $(pwd)/prometheus0_eu1.yml:/etc/prometheus/prometheus.yml \
--name prometheus-0-sidecar-eu1 \
-u root \
quay.io/thanos/thanos:v0.7.0 \
sidecar \
--http-address 0.0.0.0:19090 \
--grpc-address 0.0.0.0:19190 \
--reloader.config-file /etc/prometheus/prometheus.yml \
--prometheus.url http://127.0.0.1:9090
Notifications are an important part of monitoring. Notifications consist of firing triggers and a provider. A trigger is written in PromQL, as a rule, with a condition in Prometheus. When a trigger is triggered (metric condition), Prometheus signals the provider to send a notification. The standard provider is Alertmanager and is capable of sending messages to various receivers such as email and Slack.
For example, the metric "up", which takes the value 0 or 1, can be used to send a message if the server has been down for more than 1 minute. For this, a rule is written:
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
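For Prometheus to load this rule and know where to send alerts, the rules file and the Alertmanager address are listed in prometheus.yml (a sketch; the file name alert.rules.yml is illustrative, 9093 is the default Alertmanager port):
rule_files:
  - alert.rules.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']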
When the metric is equal to 0 for more than 1 minute, then this trigger is triggered and Prometheus sends a request to the Alertmanager. Alertmanager specifies what to do with this event. We can prescribe that when the InstanceDown event is received, we need to send a message to the mail. To do this, configure Alertmanager to do this:
global:
  smtp_smarthost: 'localhost:25'
  smtp_from: 'youraddress@example.org'
route:
  receiver: example-email
receivers:
  - name: example-email
    email_configs:
      - to: 'youraddress@example.org'
Alertmanager itself will use the mail protocol installed on this computer; for it to be able to do this, a mail server must be installed. Take the Simple Mail Transfer Protocol (SMTP), for example. To test it, let's install a console mail server alongside Alertmanager – sendmail.
Fast and clear analysis of system logs
The OpenSource full-text search engine Lucene is used for quick search in logs. Two lower-level products are built on it: Solr and Elasticsearch, which are quite similar in capabilities but differ in usability and license. Many popular assemblies are built on them, for example, simply a delivery set with ElasticSearch: ELK (Elasticsearch (Apache Lucene), Logstash, Kibana), EFK (Elasticsearch, Fluentd, Kibana), as well as products, for example, GrayLog2. Both GrayLog2 and the assemblies (ELK/EFK) are actively used because they need less configuration on non-test benches; for example, you can put EFK in a Kubernetes cluster with almost one command:
helm install efk-stack stable/elastic-stack --set logstash.enabled=false --set fluentd.enabled=true --set fluentd-elastics
An alternative that has not yet received much consideration are systems built on the previously considered Prometheus, for example, PLG (Promtail (agent) – Loki (Prometheus) – Grafana).
Comparison of ElasticSearch and Solr (the systems are comparable):
Elastic:
** Commercial with open source and the ability to commit (via approval);
** Supports more complex queries, more analytics, out of the box support for distributed queries, more complete REST-full JSON-BASH, chaining, machine learning, SQL (paid);
*** Full-text search;
*** Real-time index;
*** Monitoring (paid);
*** Monitoring via Elastic FQ;
*** Machine learning (paid);
*** Simple indexing;
*** More data types and structures;
** Lucene engine;
** Parent-child (JOIN);
** Scalable native;
** Documentation from 2010;
Solr:
** OpenSource;
** High speed with JOIN;
*** Full-text search;
*** Real-time index;
*** Monitoring in the admin panel;
*** Machine learning through modules;
*** Input data: Word, PDF and others;
*** Requires a schema for indexing;
*** Data: nested objects;
** Lucene engine;
** JSON join;
** Scalable: Solar Cloud (setting) && ZooKeeper (setting);
** Documentation since 2004.
At the present time, micro-service architecture is used more and more often; due to the weak coupling between its components and their simplicity, it simplifies their development, testing and debugging. But as a whole the system becomes more difficult to analyze because of its distribution. To analyze the overall state, logs are used, collected in a centralized place and converted into an understandable form. There is also a need to analyze other data, for example the access_log of NGINX to collect metrics about traffic, the mail server log to detect attempts to guess a password, and so on. Take ELK as an example of such a solution. ELK stands for a bundle of three products: Logstash, Elasticsearch and Kibana, the first and last of which are heavily focused on the central one and provide ease of use. More generally, ELK is called Elastic Stack, since the tool for preparing logs, Logstash, can be replaced by analogues such as Fluentd or Rsyslog, and the Kibana renderer can be replaced by Grafana. For example, although Kibana provides great analysis capabilities, Grafana provides notifications when events occur and can be used in conjunction with other products, for example cAdvisor – analysis of the state of the system and of individual containers.
ELK products can be installed independently, downloaded as self-contained containers, for which you need to configure the communication between them, or as a single container.
For Elasticsearch to work properly, the data must come in JSON format. If the data is submitted in text format (the log is written in one line, separated from the previous one by a line break), then it can provide only full-text search, since the data will be interpreted as one line. To transmit logs in JSON format, there are two options: either configure the product under investigation to output in this format (for example, NGINX has such a possibility), but often this is impossible, since there is already an accumulated database of logs and traditionally they are written in text format. For such cases, post-processing of logs from text format into JSON is needed, which is handled by Logstash. It is important to note that if it is possible to immediately transfer the data in a structured form (JSON, XML and others), then this should be done, because if you do detailed parsing, then any deviation from the format will lead to inoperability, and if superficial parsing – we lose valuable information. Either way, parsing is a bottleneck in this system, although it can be scaled to a limited extent per service or per log file. Fortunately, more and more products are starting to support structured logging; for example, the latest versions of NGINX support logs in JSON format.
For systems that do not support this format, you can convert to it using programs such as Logstash, Filebeat and Fluentd. The first one is included in the standard Elastic Stack delivery from the vendor and can be installed in the same way as ELK, in a Docker container. It supports fetching data from files, from the network and from the standard stream both at the input and at the output, and most importantly, the native Elastic Search protocol. Logstash monitors log files based on modification date, or receives data over the network via telnet from a distributed system, for example containers, and after transformation sends it to the output, usually to Elastic Search. It is simple and comes standard with the Elastic Stack, making it easy and hassle-free to configure, but because of the Java machine inside it is heavy and not very functional, although it supports plugins, for example, synchronization with MySQL for sending new data. Filebeat provides slightly more options. An enterprise tool for all occasions can be Fluentd, due to its high functionality (reading logs, system logs, etc.), scalability and the ability to roll out across Kubernetes clusters using a Helm chart, and to monitor the entire data center in the standard package, but about this in the relevant section.
To manage logs, you can use Curator, which can archive or delete old logs from ElasticSearch, increasing the efficiency of its work.
The process of obtaining logs is logically carried out by special collectors: Logstash, Fluentd, Filebeat or others.
Fluentd is the least demanding and simpler analogue of Logstash. Configuration is done in /etc/td-agent/td-agent.conf, which contains four blocks:
** source – contains settings for where to get the data from;
** match – contains settings for transferring the received data;
** include – contains information about file types;
** system – contains system settings.
Logstash provides a much more functional configuration language. The Logstash agent daemon – logstash – monitors changes in files. If the logs are not located locally, but on a distributed system, then logstash is installed on each server and runs in agent mode: bin/logstash agent -f /env/conf/my.conf. Since running logstash only as an agent for sending logs is wasteful, you can use a product from the same developers, Logstash Forwarder (formerly Lumberjack), which forwards logs via the lumberjack protocol to the logstash server. You can use the Packetbeat agent to track and retrieve data from MySQL (https://www.8host.com/blog/sbor-metrik-infrastruktury-s-pomoshhyu-packetbeat-i-elk-v-ubuntu-14-04/).
Logstash also allows you to convert data of different types:
** grok – set regular expressions to rip fields out of a string, often used to convert logs from text format to JSON;
** date – in the case of archived logs, set the date of the log entry not to the current date but take it from the log itself;
** kv – for logs like key=value;
** mutate – select only the required fields and change the data in the fields, for example, replace the "/" character with "_";
** multiline – for multi-line logs with delimiters.
For example, you can decompose a log in the format "date type number" into components; say, "01.01.2021 INFO 1" is decomposed into the hash "message":
filter {
  grok {
    type => "my_log"
    match => ["message", "%{MYDATE:date} %{WORD:loglevel} %{ID:id:int}"]
  }
}
The %{ID:id:int} template takes the ID pattern class, the resulting value is substituted into the id field, and the string value is converted to the int type.
In the "Output" block, we can specify: output data to the console using the "Stdout" block, to a file – "File", transfer via http via JSON REST API – "Elasticsearch" or send by mail – "Email". You can also order conditions for the fields obtained in the filter block. For instance,:
output {
  if [type] == "Info" {
    elasticsearch {
      host => localhost
      index => "log-%{+YYYY.MM.dd}"
    }
  }
}
Here the ElasticSearch index (a database, if we draw an analogy with SQL) changes every day. There is no need to create a new index specially – this is how NoSQL databases work, since there is no strict requirement to describe the structure – properties and types. But it is still recommended to describe it, otherwise all fields will be treated as string values unless a number is specified. To display ElasticSearch data, Kibana is used – a WEB-UI plugin written in AngularJS. To display a timeline in its charts, you need to describe at least one field with the date type, and for aggregate functions – a numeric one, be it an integer or a floating-point number. Also, if new fields are added, indexing and displaying them requires re-indexing the entire index, so the most complete description of the structure will help to avoid the very time-consuming operation of reindexing.
The index is split by days to speed up the work of ElasticSearch; in Kibana you can select several of them by a pattern, here log-*, and this also removes the limitation of one million documents per index.
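Since the structure is worth describing in advance, here is a hedged sketch of creating an index with an explicit mapping for the fields from the grok example above (the exact syntax depends on the ElasticSearch version – older versions nest the mapping under a document type):
curl -XPUT 'localhost:9200/log-2021.01.01' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "properties": {
      "date": {"type": "date"},
      "loglevel": {"type": "keyword"},
      "id": {"type": "integer"}
    }
  }
}'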
Let's consider the Logstash output plugin in more detail:
output {
  if [type] == "Info" {
    elasticsearch {
      cluster => elasticsearch
      action => "create"
      hosts => ["localhost:9200"]
      index => "log-%{+YYYY.MM.dd}"
      document_type => ....
      document_id => "%{id}"
    }
  }
}
Interaction with ElasticSearch is carried out through the JSON REST API, for which there are drivers for most modern languages. But in order not to write code, we will use the Logstash utility, which also knows how to convert text data to JSON based on regular expressions. There are also predefined templates, like classes in regular expressions, such as %{IP:client} and others, which can be viewed at https://github.com/elastic/logstash/tree/v1.1.9/patterns. For standard services with standard settings there are many ready-made configs on the Internet, for example, for NGINX – https://github.com/zooniverse/static/blob/master/logstash-Nginx.conf. This is described in more detail in the article https://habr.com/post/165059/.
ElasticSearch is a NoSQL database, so you don't need to specify a format (a set of fields and their types). For searching, it still needs one, so it defines it itself, and with each format change re-indexing occurs, during which work is impossible. To maintain a unified structure, the Serilog logger (.NET) has an EventType field in which you can encode a set of fields and their types; for the rest you will have to implement it separately. To analyze the logs of an application with a microservice architecture, it is important to set an ID at the start of request processing, that is, a request ID, which stays unchanged and is passed from microservice to microservice, so that you can trace the entire path of the request.
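A hypothetical sketch of such propagation (the X-Request-ID header name and the service URL are illustrative, not prescribed by the text): the entry point generates the identifier once, and every downstream microservice forwards it unchanged and writes it into its log entries:
# generate the request ID at the entry point and pass it along the call chain
REQUEST_ID=$(uuidgen)
curl -H "X-Request-ID: ${REQUEST_ID}" http://service-a.local/api/orders
# service-a would add the same header to its calls to service-b and include it in every log entry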
Install ElasticSearch (https://habr.com/post/280488/) and check that curl -X GET localhost:9200 works
sudo sysctl -w vm.max_map_count=262144
$ curl 'localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open graylog_0 h2NICPMTQlqQRZhfkvsXRw 4 0 0 0 1kb 1kb
green open .kibana_1 iMJl7vyOTuu1eG8DlWl1OQ 1 0 3 0 11.9kb 11.9kb
yellow open indexname le87KQZwT22lFll8LSRdjw 5 1 1 0 4.5kb 4.5kb
yellow open db i6I2DmplQ7O40AUzyA-a6A 5 1 0 0 1.2kb 1.2kb
Create an entry in the blog database and the post table: curl -X PUT "$ES_URL/blog/post/1?pretty" -d '
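For completeness, a hedged sketch of such a request with a body (the field names are illustrative; depending on the ElasticSearch version a Content-Type header may be required):
curl -X PUT "$ES_URL/blog/post/1?pretty" -H 'Content-Type: application/json' -d '{
  "title": "First post",
  "published": "2021-01-01",
  "views": 1
}'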
ElasticSearch search engine
In the previous section, we looked at the ELK stack, which is made up of ElasticSearch, Logstash and Kibana. In the full set it is often also extended with Filebeat – an agent tailored to work with Logstash and to ship text logs. Although Logstash performs its task quickly, when it is not really needed it is often skipped, and logs in JSON format are sent via the bulk upload API directly to ElasticSearch.
If we have an application, then pure ElasticSearch is used as a search engine, and Kibana is used as a tool for writing and debugging queries – the Dev Tools block. Although relational databases have a long history of development, the principle remains that the more denormalized the data, the slower it is, because it has to be joined on every request. This problem is solved by creating a View, which stores the resulting selection. But although modern databases have acquired impressive functionality, up to full-text search, they still cannot be compared in the efficiency and functionality of search with search engines. I will give an example from work: several tables with metrics, which are combined in a query into one, and a search is performed by the parameters selected in the admin panel, such as a date range, a page in pagination and a term in a text column. This is not a lot; at the output we get a table of half a million rows, and the search by date and by part of a row fits in milliseconds. But pagination slows down: on the initial pages its request takes about two minutes, on the final pages – more than four. At the same time, combining a query for the logical data and getting the pagination count head-on will not work. And the same query, while not yet optimized, is executed in ElasticSearch in 22 milliseconds and returns both the data and the count of all matching documents for pagination.
It is worth warning the reader against rashly abandoning a relational database: although ElasticSearch contains a NoSQL database, it is intended solely for search and does not contain full-fledged tools for normalization and recovery.
ElasticSearch does not have a console client in the standard delivery – all interaction is carried out via HTTP calls GET, PUT and DELETE. Here is an example of using the curl program (command) from the Linux BASH shell:
# Create a record (the index and type are created automatically)
curl -XPUT mydb/mytable/1 -d '{
....
}'
# Get the value by id
curl -XGET mydb/mytable/1
# Simple search
curl -XGET mydb/_search -d '{
"query": {
"match": {
"name": "my"
}
}
}'
# Deleting the index
curl -XDELETE mydb
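Returning to the pagination example above, here is a hedged sketch of a search that returns both a page of results and the total hit count in one request (the index and field names are illustrative, not taken from the original text):
curl -XGET 'localhost:9200/metrics/_search?pretty' -H 'Content-Type: application/json' -d '{
  "from": 0,
  "size": 20,
  "query": {
    "bool": {
      "must": [ { "match": { "message": "error" } } ],
      "filter": [ { "range": { "date": { "gte": "2021-01-01", "lte": "2021-01-31" } } } ]
    }
  }
}'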
Cloud systems as a source of continuous scaling: Google Cloud and Amazon AWS
In addition to hosting and renting a server, in particular a virtual VPS, you can use cloud solutions (SaaS, Software as a Service), that is, run our WEB application(s) only through the control panel using a ready-made infrastructure. This approach has both pros and cons, which depend on the customer's business. If from the technical side the server is simply remote, but we can connect to it, and as a bonus we get the administration panel, then for the developer the differences are more significant. We will divide projects into three groups according to the place of deployment: on hosting, in your own data center (or on a VPS), and in the cloud. Companies using hosting, due to the significant restrictions imposed on development – the inability to install their own software, and the instability and limited size of the provided capacity – mainly specialize in custom (streaming) development of sites and stores; due to the small requirements for the qualifications of developers and the undemanding knowledge of the infrastructure, the market is ready to pay for their labor at a minimum. The second group includes companies that implement completed projects, but whose developers are excluded from working with the infrastructure by the presence of system administrators, build engineers, DevOps and other infrastructure specialists. Companies choosing cloud solutions generally justify overpaying for ready-made infrastructure and capacity by its extensibility (relevant for startups, when the load growth is not predictable). For the implementation of such projects, they generally hire highly qualified specialists of a wide profile to implement non-standard solutions, where the infrastructure is already just a tool, and there are simply no dedicated specialists in it. The developers are entrusted with designing the project as a whole, and not a program in isolation from the infrastructure. These are mainly foreign companies that are ready to pay well for the labor of valuable employees.
For deployment, we will use Kubernetes to counter vendor lock-in, when the project infrastructure is tied to the API of a specific cloud provider and will not allow moving to other clouds or our own without significant changes in the application itself. Kubernetes is supported by Amazon AWS, Google Cloud and Microsoft Azure, and for on-premises installation of a single instance there is Minikube.
We will use Google Cloud; for the current year 2018 it provides free use of limited resources for one year ($300), and the limits can be viewed in the IAM and Administration -> Quotas menu. It is important to note that cloud providers do not sell fixed tariff plans, but bill for the capacity actually used: if the site is visited little – we pay little, if a lot of data has to be processed – we pay a lot. For this reason, when the company's computing power needs are predictable (not a startup), it may be advisable to use own capacity for the constant load, which can be economically feasible without risking running into limited computing power.
And so we go to cloud.google.com, register, bind a debit card with a minimum balance and go to the console.cloud.google.com console, where you can take a tutorial on the interface for general familiarization. In the menu, click the Payment item: I have $ 300 untouched demo money and 356 days left (funds are not debited in real time).
If you look at it as a basis for a mobile Back-End (MBaaS, Mobile Backend as a Service), then it is provided by different providers: Google Firebase, AWS Mobile, Azure Mobile.
Google App Engine
Cluster creation via WEB interface
Let's first check the restrictions (quotas) in Menu -> Products -> IAM and administration -> Quotas; if you are on a test account, Static IP addresses will be equal to 1, so the balancer will not be able to be created and you will have to delete the cluster. Let's create a cluster in Menu – Resources – Kubernetes Engine with three replicas of the micro machine and the latest version of Kubernetes. In the lower left corner, in the Marketplace item, create 2 NGINX instances. After creating the cluster, click on the Services tab and go to the IP address.
Marketplace: Networking, Free, Kubernetes Applications: NGINX. Let's create a custom standard-cluster-NGINX cluster, choosing a minimum of CPU and RAM, 2 nodes instead of 3 and the latest version of Kubernetes (I chose 1.11.3, and my code will be compatible with at least 1.10). In Menu – Resources – Kubernetes Engine, in the Clusters tab, click the Connect button. Cluster management on the command line is carried out with the kubectl command; you can read about it in the documentation https://kubernetes.io/docs/reference/kubectl/overview/ and see a list of commands at https://gist.github.com/ipedrazas/95391ffd88190bea94ca188d3d2c1cbe
Creating a virtual machine:
You can create a separate project for the software, but you can only use it on a paid account:
NAME_PROJECT=bitrix_12345;
NAME_CLUSTER=bitrix;
gcloud projects create $NAME_PROJECT --name $NAME_CLUSTER;
gcloud config set project $NAME_PROJECT;
gcloud projects list;
You can see it in the admin panel by expanding the drop-down list in the header and opening the All projects tab. The project can be deleted with:
gcloud projects delete $NAME_PROJECT;
A few subtleties: the --zone key is required and put at the end, the disk should not be less than 10Gb, and the machine types can be taken from https://cloud.google.com/compute/docs/machine-types. If we have only one replica, then by default a minimal configuration for testing is created:
gcloud container clusters create $NAME_CLUSTER --zone europe-north1-a
and if more – a standard one, the parameters of which we will edit:
$ gcloud container clusters create mycluster \
--machine-type=n1-standard-1 --disk-size=10GB --image-type ubuntu \
--scopes compute-rw,gke-default \
--machine-type=custom-1-1024 \
--cluster-version=1.11 --enable-autoupgrade \
--num-nodes=1 --enable-autoscaling --min-nodes=1 --max-nodes=2 \
--zone europe-north1-a
The --enable-autorepair key starts monitoring of node availability, and if a node crashes, it will be recreated. The key requires a Kubernetes version of at least 1.11, and at the time of this writing the default version is 1.10, therefore you need to set it with a key, for example, --cluster-version=1.11.4-gke.12. But you can also fix only the major version --cluster-version=1.11 and enable auto-update of the version with --enable-autoupgrade. We will also set auto-scaling of the number of nodes if there are not enough resources: --num-nodes=1 --min-nodes=1 --max-nodes=2 --enable-autoscaling.
Now let's talk about virtual cores and RAM. By default, the n1-standard-1 machine is raised, which has one virtual core and 3.5Gb of RAM, in triplicate, which together gives three virtual cores and 10.5Gb of RAM. It is important that the cluster has at least two virtual processor cores in total, otherwise, formally, according to the limits for Kubernetes system containers, there will not be enough for full operation (containers, for example system containers, may not rise). I will take two nodes with one core each, so the total number of cores will be two. The same situation is with RAM: 1Gb (1024Mb) of RAM was enough for me to raise a container with NGINX, but not to raise a container with LAMP (Apache, MySQL, PHP) – the system service kube-dns-548976df6c-mlljx, which is responsible for DNS in the pod, failed to rise. Although it is not vitally important and will not be useful to us, next time something more important may fail to rise instead. It is important to note that my cluster with 1Gb did rise normally and everything was fine, but my total volume of 2Gb turned out to be a borderline value. I set 1280Mb (1.25Gb), taking into account that the minimum step of RAM is 256Mb (0.25Gb), my volume must be a multiple of it and be at least 1Gb per core. As a result, the cluster has 2 cores and 2.5Gb instead of 3 cores and 10.5Gb, which is a significant optimization of resources and price on a paid account.
Now we need to connect to the cluster. The key is already on the server in ${HOME}/.kube/config and now we just need to log in:
$ gcloud container clusters get-credentials b --zone europe-north1-a --project essch
$ kubectl port-forward Nginxlamp-74c8b5b7f-d2rsg 8080:8080
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
$ google-chrome http://localhost:8080 # this won't work in Google Shell
$ kubectl expose Deployment Nginxlamp --type="LoadBalancer" --port=8080
To use kubectl locally, you need to install gcloud and use it to install kubectl with the gcloud components install kubectl command, but let's not complicate the first steps for now.
In the Services section of the admin panel, the POD is available not only through the front-end balancer service, but also through the internal balancer of the Deployment. Although it will be preserved after re-creation, a config is more maintainable and obvious.
It is also possible to adjust the number of nodes automatically depending on the load, for example, on the number of containers with their declared resource requirements, using the keys --enable-autoscaling --min-nodes=1 --max-nodes=2.
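As a sketch (assuming the cluster created above), node autoscaling can also be switched on later for an existing node pool rather than only at creation time:
# enable node autoscaling on the default node pool of an already running cluster
gcloud container clusters update mycluster \
  --zone europe-north1-a \
  --node-pool default-pool \
  --enable-autoscaling --min-nodes=1 --max-nodes=2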
Simple cluster in GCP
There are two ways to create a cluster: through the Google Cloud Platform graphical interface or through its API with the gcloud command. Let's see how this can be done through the UI. Next to the menu, click on the drop-down list and create a separate project. In the Kubernetes Engine section, choose to create a cluster. Let's give it a name, 2 CPU, the europe-north1 zone (the data center in Finland is the closest to St. Petersburg) and the latest version of Kubernetes. After creating the cluster, click on Connect and select Cloud Shell. To create it through the API, click the button in the upper right corner to display the console panel and enter in it:
gcloud container clusters create mycluster --zone europe-north1-a
After a while (it took me two and a half minutes), 3 virtual machines will be raised, the operating system installed on them and the disks mounted. Let's check:
esschtolts @ cloudshell: ~ (essch) $ gcloud container clusters list --filter=name=mycluster
NAME LOCATION MASTER_IP MACHINE_TYPE NODE_VERSION NUM_NODES STATUS
mycluster europe-north1-a 35.228.37.100 n1-standard-1 1.10.9-gke.5 3 RUNNING
esschtolts @ cloudshell: ~ (essch) $ gcloud compute instances list
NAME MACHINE_TYPE EXTERNAL_IP STATUS
gke-mycluster-default-pool-43710ef9-0168 n1-standard-1 35.228.73.217 RUNNING
gke-mycluster-default-pool-43710ef9-39ck n1-standard-1 35.228.75.47 RUNNING
gke-mycluster-default-pool-43710ef9-g76k n1-standard-1 35.228.117.209 RUNNING
Let's connect to the virtual machine:
esschtolts @ cloudshell: ~ (essch) $ gcloud projects list
PROJECT_ID NAME PROJECT_NUMBER
agile-aleph-203917 My First Project 546748042692
essch app 283762935665
esschtolts @ cloudshell: ~ (essch) $ gcloud container clusters get-credentials mycluster \
--zone europe-north1-a \
--project essch
Fetching cluster endpoint and auth data.
kubeconfig entry generated for mycluster.
We don't have a cluster yet:
esschtolts @ cloudshell: ~ (essch) $ kubectl get pods
No resources found.
Let's create a cluster:
esschtolts @ cloudshell: ~ (essch) $ kubectl run Nginx --image=Nginx --replicas=3
deployment.apps "Nginx" created
Let's check its composition:
esschtolts @ cloudshell: ~ (essch) $ kubectl get deployments --selector=run=Nginx
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
Nginx 3 3 3 3 14s
esschtolts @ cloudshell: ~ (essch) $ kubectl get pods --selector=run=Nginx
NAME READY STATUS RESTARTS AGE
Nginx-65899c769f-9whdx 1/1 Running 0 43s
Nginx-65899c769f-szwtd 1/1 Running 0 43s
Nginx-65899c769f-zs6g5 1/1 Running 0 43s
Let's make sure that all three replicas of the cluster are distributed evenly across all three nodes:
esschtolts @ cloudshell: ~ (essch) $ kubectl describe pod Nginx-65899c769f-9whdx | grep Node:
Node: gke-mycluster-default-pool-43710ef9-g76k / 10.166.0.5
esschtolts @ cloudshell: ~ (essch) $ kubectl describe pod Nginx-65899c769f-szwtd | grep Node:
Node: gke-mycluster-default-pool-43710ef9-39ck / 10.166.0.4
esschtolts @ cloudshell: ~ (essch) $ kubectl describe pod Nginx-65899c769f-zs6g5 | grep Node:
Node: gke-mycluster-default-pool-43710ef9-g76k / 10.166.0.5
Now let's install the load balancer:
esschtolts @ cloudshell: ~ (essch) $ kubectl expose Deployment Nginx --type="LoadBalancer" --port=80
service "Nginx" exposed
Let's check that it was created:
esschtolts @ cloudshell: ~ (essch) $ kubectl get svc --selector=run=Nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
Nginx LoadBalancer 10.27.245.187 <pending> 80:31621/TCP 11s
esschtolts @ cloudshell: ~ (essch) $ sleep 60;
esschtolts @ cloudshell: ~ (essch) $ kubectl get svc --selector=run=Nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
Nginx LoadBalancer 10.27.245.187 35.228.212.163 80:31621/TCP 1m
Let's check its work:
esschtolts @ cloudshell: ~ (essch) $ curl 35.228.212.163:80 2> /dev/null | grep h1
In order not to copy the full names every time, save them in variables (more about the JSONpath format in the Go documentation: https://golang.org/pkg/text/template/#pkg-overview):
esschtolts @ cloudshell: ~ (essch) $ pod1=$(kubectl get pods -o jsonpath={.items[0].metadata.name});
esschtolts @ cloudshell: ~ (essch) $ pod2=$(kubectl get pods -o jsonpath={.items[1].metadata.name});
esschtolts @ cloudshell: ~ (essch) $ pod3=$(kubectl get pods -o jsonpath={.items[2].metadata.name});
esschtolts @ cloudshell: ~ (essch) $ echo $pod1 $pod2 $pod3
Nginx-65899c769f-9whdx Nginx-65899c769f-szwtd Nginx-65899c769f-zs6g5
Let's change the pages in each POD by copying the unique pages to each replica, and check the balancing by checking the distribution of requests across the POD:
esschtolts @ cloudshell: ~ (essch) $ echo 1 > test.html;
esschtolts @ cloudshell: ~ (essch) $ kubectl cp test.html ${pod1}:/usr/share/nginx/html/index.html
esschtolts @ cloudshell: ~ (essch) $ echo 2 > test.html;
esschtolts @ cloudshell: ~ (essch) $ kubectl cp test.html ${pod2}:/usr/share/nginx/html/index.html
esschtolts @ cloudshell: ~ (essch) $ echo 3 > test.html;
esschtolts @ cloudshell: ~ (essch) $ kubectl cp test.html ${pod3}:/usr/share/nginx/html/index.html
esschtolts @ cloudshell: ~ (essch) $ curl 35.228.212.163:80 && curl 35.228.212.163:80 && curl 35.228.212.163:80
3
2
1
esschtolts @ cloudshell: ~ (essch) $ curl 35.228.212.163:80 && curl 35.228.212.163:80 && curl 35.228.212.163:80
3
1
1
Let's check the failover of the cluster by deleting one POD:
esschtolts @ cloudshell: ~ (essch) $ kubectl delete pod $ {pod1} && kubectl get pods && sleep 10 && kubectl get pods
pod "Nginx-65899c769f-9whdx" deleted
NAME READY STATUS RESTARTS AGE
Nginx-65899c769f-42rd5 0/1 ContainerCreating 0 1s
Nginx-65899c769f-9whdx 0/1 Terminating 0 54m
Nginx-65899c769f-szwtd 1/1 Running 0 54m
Nginx-65899c769f-zs6g5 1/1 Running 0 54m
NAME READY STATUS RESTARTS AGE
Nginx-65899c769f-42rd5 1/1 Running 0 12s
Nginx-65899c769f-szwtd 1/1 Running 0 55m
Nginx-65899c769f-zs6g5 1/1 Running 0 55m
As we can see, immediately after the POD became unavailable (the process of deleting it began) its replacement began to be created. Soon, the cluster will fully restore its structure. After we have finished our experiments, remove the virtual machines with the cluster:
esschtolts @ cloudshell: ~ (essch) $ gcloud container clusters delete mycluster --zone europe-north1-a;
The following clusters will be deleted.
– [mycluster] in [europe-north1-a]
Do you want to continue (Y / n)? Y
Deleting cluster mycluster … done.
Deleted [https://container.googleapis.com/v1/projects/essch/zones/europe-north1-a/clusters/mycluster].
esschtolts @ cloudshell: ~ (essch) $ gcloud container clusters list --filter=name=mycluster
In total, we created a cluster and a load balancer with just two commands, run and expose; now we can go to the balancer's IP address and see the NGINX welcome page in the browser. The cluster also recovers itself: we emulated the failure of a pod by deleting it, and it was created again.
Cluster Reproducibility
Let's take a look at the situation from the previous chapter, in which we created a cluster, deleted a replica, and it recovered. The fact is that we do not manage the cluster with commands directly; rather, with the help of commands we create descriptions of the required configuration of the cluster and place them in distributed storage, after which the state of the nodes is maintained in accordance with this description. We can also get and edit these descriptions, or write them ourselves and then upload them to the distributed storage. This allows us to save the state on disk in the form of YAML files and restore it back, as is often done when moving from a production server to a test one. In addition, we get the opportunity to customize the state more flexibly, since we are not limited to the commands' options.
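A sketch of this round trip with the deployment from the previous chapter: dump the live description to a file, edit it, and apply it back (the file name is arbitrary):
kubectl get deployment/Nginx --output=yaml > Nginx-deployment.yaml
# ... edit the YAML file ...
kubectl apply -f Nginx-deployment.yaml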
esschtolts @ cloudshell: ~ (essch) $ kubectl get deployment/Nginx --output=yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: 2018-12-16T10:23:26Z
  generation: 1
  labels:
    run: Nginx
  name: Nginx
  namespace: default
  resourceVersion: "1612985"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/Nginx
  uid: 9fb3ad6a-011c-11e9-bfaa-42010aa60088
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      run: Nginx
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: Nginx
    spec:
      containers:
      - image: Nginx
        imagePullPolicy: Always
        name: Nginx
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2018-12-16T10:23:26Z
    lastUpdateTime: 2018-12-16T10:23:26Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: 2018-12-16T10:23:26Z
    lastUpdateTime: 2018-12-16T10:23:28Z
    message: ReplicaSet "Nginx-64f497f8fd" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
This will be superfluous for us, so I will delete the unnecessary, because when creating, we specified only the name and image, the rest was filled with default values:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    run: Nginx
  name: Nginx
spec:
  selector:
    matchLabels:
      run: Nginx
  template:
    metadata:
      labels:
        run: Nginx
    spec:
      containers:
      - image: Nginx
        name: Nginx
You can also create a template:
gcloud services enable compute.googleapis.com --project=${PROJECT}
gcloud beta compute instance-templates create-with-container ${TEMPLATE} \
  --machine-type=custom-1-4096 \
  --image-family=cos-stable \
  --image-project=cos-cloud \
  --container-image=gcr.io/kuar-demo/kuard-amd64:1 \
  --container-restart-policy=always \
  --preemptible \
  --region=${REGION} \
  --project=${PROJECT}
gcloud compute instance-groups managed create ${TEMPLATE} \
  --base-instance-name=${TEMPLATE} \
  --template=${TEMPLATE} \
  --size=${CLONES} \
  --region=${REGION} \
  --project=${PROJECT}
High service availability
To ensure high availability, you need to redirect traffic to a spare instance in the event of an application crash. Also, it is often important that the load is evenly distributed, since the application in a single instance is not able to handle all the traffic. To do this, a cluster is created; let's take a more complex image in order to cover a larger number of nuances:
esschtolts @ cloudshell: ~ / bitrix (essch) $ cat deploymnet.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: Nginxlamp
spec:
  selector:
    matchLabels:
      app: lamp
  replicas: 1
  template:
    metadata:
      labels:
        app: lamp
    spec:
      containers:
      - name: lamp
        image: mattrayner/lamp:latest-1604-php5
        ports:
        - containerPort: 80
esschtolts @ cloudshell: ~ / bitrix (essch) $ cat loadbalancer.yaml
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  type: LoadBalancer
  ports:
  - name: front
    port: 80
    targetPort: 80
  selector:
    app: lamp
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl get pods
NAME READY STATUS RESTARTS AGE
Nginxlamp-7fb6fdd47b-jttl8 2/2 Running 0 3m
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT (S) AGE
frontend LoadBalancer 10.55.242.137 35.228.73.217 80:32701/TCP,8080:32568/TCP 4m
kubernetes ClusterIP 10.55.240.1 <none> 443/TCP 48m
Now we can create identical copies of our clusters, for example, for Production and Develop, but balancing will not work as expected. The balancer finds PODs by label, and PODs in both the production and the developer clusters correspond to this label. Placing the clusters in different projects would not be an obstacle either. Although for many tasks this is a big plus, it is not in the case of a cluster for developers and production. Namespaces are used to delimit the scope. We use them without noticing: when we list PODs without specifying a namespace, we get the default one, but the PODs are not taken from the system namespaces:
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl get namespace
NAME STATUS AGE
default Active 5h
kube-public Active 5h
kube-system Active
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
event-exporter-v0.2.3-85644fcdf-tdt7h 2/2 Running 0 5h
fluentd-gcp-scaler-697b966945-bkqrm 1/1 Running 0 5h
fluentd-gcp-v3.1.0-xgtw9 2/2 Running 0 5h
heapster-v1.6.0-beta.1-5649d6ddc6-p549d 3/3 Running 0 5h
kube-dns-548976df6c-8lvp6 4/4 Running 0 5h
kube-dns-548976df6c-mcctq 4/4 Running 0 5h
kube-dns-autoscaler-67c97c87fb-zzl9w 1/1 Running 0 5h
kube-proxy-gke-bitrix-default-pool-38fa77e9-0wdx 1/1 Running 0 5h
kube-proxy-gke-bitrix-default-pool-38fa77e9-wvrf 1/1 Running 0 5h
l7-default-backend-5bc54cfb57-6qk4l 1/1 Running 0 5h
metrics-server-v0.2.1-fd596d746-g452c 2/2 Running 0 5h
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl get pods --namespace=default
NAMEREADY STATUS RESTARTS AGE
Nginxlamp-b5dcb7546-g8j5r 1/1 Running 0 4h
Let's create a scope:
esschtolts @ cloudshell: ~ / bitrix (essch) $ cat namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: development
  labels:
    name: development
esschtolts @ cloudshell: ~ (essch) $ kubectl create -f namespace.yaml
namespace "development" created
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl get namespace --show-labels
NAME STATUS AGE LABELS
default Active 5h <none>
development Active 16m name=development
kube-public Active 5h <none>
kube-system Active 5h <none>
The essence of working with namespaces is that for specific clusters we set the namespace and can execute commands specifying it, and they will apply only to it. At the same time, apart from the keys in commands such as kubectl get pods, the namespace does not appear in the configuration files of controllers (Deployment, DaemonSet and others) and services (LoadBalancer, NodePort and others), allowing them to be seamlessly transferred between namespaces, which is especially relevant for the development pipeline: developer server, test server, and production server. Namespaces are set in the cluster context file $HOME/.kube/config and can be seen with the kubectl config view command. So, in my cluster context entry, the namespace entry does not appear (the default is default):
– context:
cluster: gke_essch_europe-north1-a_bitrix
user: gke_essch_europe-north1-a_bitrix
name: gke_essch_europe-north1-a_bitrix
You can see something like this:
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl config view -o jsonpath='{.contexts[4]}'
{gke_essch_europe-north1-a_bitrix {gke_essch_europe-north1-a_bitrix gke_essch_europe-north1-a_bitrix []}}
Let's create a new context for this user and cluster:
esschtolts @ cloudshell: ~ (essch) $ kubectl config set-context dev \
> --namespace=development \
> --cluster=gke_essch_europe-north1-a_bitrix \
> --user=gke_essch_europe-north1-a_bitrix
Context "dev" modified.
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl config set-context dev \
> --namespace=development \
> --cluster=gke_essch_europe-north1-a_bitrix \
> --user=gke_essch_europe-north1-a_bitrix
Context "dev" modified.
As a result, the following was added:
– context:
cluster: gke_essch_europe-north1-a_bitrix
namespace: development
user: gke_essch_europe-north1-a_bitrix
name: dev
Now it remains to switch to it:
esschtolts @ cloudshell: ~ (essch) $ kubectl config use-context dev
Switched to context "dev".
esschtolts @ cloudshell: ~ (essch) $ kubectl config current-context
dev
esschtolts @ cloudshell: ~ (essch) $ kubectl get pods
No resources found.
esschtolts @ cloudshell: ~ (essch) $ kubectl get pods --namespace=default
NAMEREADY STATUS RESTARTS AGE
Nginxlamp-b5dcb7546-krkm2 1/1 Running 0 10h
You could add a namespace to the existing context:
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl config set-context $(kubectl config current-context) --namespace=development
Context "gke_essch_europe-north1-a_bitrix" modified.
Now let's create the objects again in the development namespace (it is now the default one and can be omitted) and remove them from the default namespace (it is no longer the default for our context, so it must be specified: --namespace=default):
esschtolts @ cloudshell: ~ (essch) $ cd bitrix /
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl create -f deploymnet.yaml -f loadbalancer.yaml
deployment.apps "Nginxlamp" created
service "frontend" created
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl delete -f deploymnet.yaml -f loadbalancer.yaml --namespace=default
deployment.apps "Nginxlamp" deleted
service "frontend" deleted
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl get pods
NAMEREADY STATUS RESTARTS AGE
Nginxlamp-b5dcb7546-8sl2f 1/1 Running 0 1m
Now let's look at the external IP address and open the page:
esschtolts @ cloudshell: ~ / bitrix (essch) $ curl $(kubectl get -f loadbalancer.yaml -o json | jq -r .status.loadBalancer.ingress[0].ip) 2> /dev/null | grep '<h1>'
Customization
Now we need to adapt the standard solution to our needs, namely, add configs and our application. For simplicity's sake, we will change (override the defaults in) the .htaccess file at the root of our application, reducing the task to simply placing our application in the /app folder. The first thing that suggests itself is to create a POD and then copy our application from the host into the container (I took Bitrix):
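A hypothetical sketch of this first approach (the POD name is taken from the earlier port-forward example; the source directory is illustrative):
# wait until the POD is Running, then copy the application into the container and fix the rights
kubectl cp ./bitrix/. Nginxlamp-74c8b5b7f-d2rsg:/app/
kubectl exec Nginxlamp-74c8b5b7f-d2rsg -- chmod -R 0777 /app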
While this solution works, it has a number of significant disadvantages. First, we need to constantly poll the POD from outside, waiting for it to raise the container, and only then copy the application into it; we must not do this if the container has not yet been raised, and we must also handle the situation when it breaks. Meanwhile, external services may rely on the status of the POD, although the POD itself will not really be ready until the script has been executed. The second drawback is that we have some external script that logically should not be separated from the POD, but at the same time has to be launched manually from outside, stored somewhere, and documented somewhere. And finally, we can have a lot of these PODs. At first glance, the logical solution is to put the code in the Dockerfile:
esschtolts @ cloudshell: ~ / bitrix (essch) $ cat Dockerfile
FROM mattrayner/lamp:latest-1604-php5
MAINTAINER ESSch <ESSchtolts@yandex.ru>
RUN cd /app/ && (\
    wget https://www.1c-bitrix.ru/download/small_business_encode.tar.gz \
    && tar -xf small_business_encode.tar.gz \
    && sed -i '5i php_value short_open_tag 1' .htaccess \
    && chmod -R 0777 . \
    && sed -i 's/# php_value display_errors 1/php_value display_errors 1/' .htaccess \
    && sed -i '5i php_value opcache.revalidate_freq 0' .htaccess \
    && sed -i 's/# php_flag default_charset UTF-8/php_flag default_charset UTF-8/' .htaccess \
    ) && cd ..;
EXPOSE 80 3306
CMD ["/run.sh"]
esschtolts @ cloudshell: ~ / bitrix (essch) $ docker build -t essch/app:0.12 . | grep Successfully
Successfully built f76e656dac53
Successfully tagged essch/app:0.12
esschtolts @ cloudshell: ~ / bitrix (essch) $ docker image push essch/app | grep digest
0.12: digest: sha256:75c92396afacefdd5a3fb2024634a4c06e584e2a1674a866fa72f8430b19ff69 size: 11309
esschtolts @ cloudshell: ~ / bitrix (essch) $ cat deploymnet.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: Nginxlamp
  namespace: development
spec:
  selector:
    matchLabels:
      app: lamp
  replicas: 1
  template:
    metadata:
      labels:
        app: lamp
    spec:
      containers:
      - name: lamp
        image: essch/app:0.12
        ports:
        - containerPort: 80
esschtolts @ cloudshell: ~ / bitrix (essch) $ IMAGE=essch/app:0.12 kubectl create -f deploymnet.yaml
deployment.apps "Nginxlamp" created
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl get pods -l app=lamp
NAME READY STATUS RESTARTS AGE
Nginxlamp-55f8cd8dbc-mk9nk 1/1 Running 0 5m
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl exec Nginxlamp-55f8cd8dbc-mk9nk -- ls /app/
index.php
This happens because the developer of the image (which is correct and written in its documentation) expected that a host directory would be mounted into it, and the app folder is deleted by the script launched at the end. Also, with this approach we will face the problem of constant updates of images and of the config (we cannot put the image version into a variable, since it would have to be evaluated on the cluster nodes), and of container updates; nor can we update the folder, since when the container is recreated the changes will be reverted to the original state.
The correct solution would be to mount the folder and include in the POD lifecycle the launch of a container that starts before the main container and performs the preparatory environment operations – often downloading the application from the repository, building, running tests, creating users and setting rights. For each operation it is correct to launch a separate init container in which this operation is the main process; they are executed sequentially, as a chain that is broken if one of the operations fails (returns a non-zero exit code). For such containers a separate description is provided in the POD – initContainers; listing them sequentially builds a chain of init container launches in the same order. In our case, we created an unnamed volume and, using an init container, delivered the installation files to it. After the successful completion of all init containers, of which there may be several, the main one starts. The main container already has our volume mounted, which already contains the installation files; we just need to go to the browser and complete the installation:
esschtolts @ cloudshell: ~ / bitrix (essch) $ cat deploymnet.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: Nginxlamp
  namespace: development
spec:
  selector:
    matchLabels:
      app: lamp
  replicas: 1
  template:
    metadata:
      labels:
        app: lamp
    spec:
      initContainers:
      - name: init
        image: ubuntu
        command:
        - /bin/bash
        - -c
        - |
          cd /app
          apt-get update && apt-get install -y wget
          wget https://www.1c-bitrix.ru/download/small_business_encode.tar.gz
          tar -xf small_business_encode.tar.gz
          sed -i '5i php_value short_open_tag 1' .htaccess
          chmod -R 0777 .
          sed -i 's/# php_value display_errors 1/php_value display_errors 1/' .htaccess
          sed -i '5i php_value opcache.revalidate_freq 0' .htaccess
          sed -i 's/# php_flag default_charset UTF-8/php_flag default_charset UTF-8/' .htaccess
        volumeMounts:
        - name: app
          mountPath: /app
      containers:
      - name: lamp
        image: essch/app:0.12
        ports:
        - containerPort: 80
        volumeMounts:
        - name: app
          mountPath: /app
      volumes:
      - name: app
        emptyDir: {}
You can watch events during POD creation with the watch kubectl get events command, and view the init container logs with kubectl logs {ID_CONTAINER} -c init or, more universally:
kubectl logs $(kubectl get pods -l app=lamp -o json | jq ".items[0].metadata.name" | sed 's/"//g') -c init
It is advisable to choose small images for one-off tasks, for example, alpine:3.5:
esschtolts @ cloudshell: ~ (essch) $ docker pull alpine 1> /dev/null
esschtolts @ cloudshell: ~ (essch) $ docker pull ubuntu 1> /dev/null
esschtolts @ cloudshell: ~ (essch) $ docker images
REPOSITORY TAGIMAGE ID CREATED SIZE
ubuntu latest 93fd78260bd1 4 weeks ago 86.2MB
alpine latest 196d12cf6ab1 3 months ago 4.41MB
By slightly changing the code, we significantly saved on the size of the image:
        image: alpine:3.5
        command:
        - /bin/sh
        - -c
        - |
          cd /app
          apk --update add wget && rm -rf /var/cache/apk/*
          wget https://www.1c-bitrix.ru/download/small_business_encode.tar.gz
          tar -xf small_business_encode.tar.gz
          rm -f small_business_encode.tar.gz
          sed -i '5i php_value short_open_tag 1' .htaccess
          sed -i 's/# php_value display_errors 1/php_value display_errors 1/' .htaccess
          sed -i '5i php_value opcache.revalidate_freq 0' .htaccess
          sed -i 's/# php_flag default_charset UTF-8/php_flag default_charset UTF-8/' .htaccess
          chmod -R 0777 .
        volumeMounts:
There are also minimalistic images with pre-installed packages, such as Alpine with git (axeclbr/git) and golang:1-alpine.
Ways to ensure fault tolerance
Any process can crash. In the case of a container, if the main process crashes, then the container containing it also stops. It is normal if this happens during a graceful shutdown. For example, our application in the container makes a backup of the database; in this case, after the container has finished, we have the work done. For demonstration purposes, let's take the sleep command:
vagrant @ ubuntu: ~ $ sudo docker pull ubuntu > /dev/null
vagrant @ ubuntu: ~ $ sudo docker run -d ubuntu sleep 60
0bd80651c6f97167b27f4e8df675780a14bd9e0a5c3f8e5e8340a98fc351bc64
vagrant @ ubuntu: ~ $ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES
0bd80651c6f9 ubuntu "sleep 60" 15 seconds ago Up 12 seconds distracted_kalam
vagrant @ ubuntu: ~ $ sleep 60
vagrant @ ubuntu: ~ $ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES
vagrant @ ubuntu: ~ $ sudo docker ps -a | grep ubuntu
0bd80651c6f9 ubuntu "sleep 60" 4 minutes ago Exited (0) 3 minutes ago distracted_kalam
In the case of backups this is the norm, but in the case of applications that should not terminate, it is not. A typical example is a web server. The easiest thing in this case is to restart it:
vagrant @ ubuntu: ~ $ sudo docker run -d --restart=always ubuntu sleep 10
c3bc2d2e37a68636080898417f5b7468adc73a022588ae073bdb3a5bba270165
vagrant @ ubuntu: ~ $ sleep 30
vagrant @ ubuntu: ~ $ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS
c3bc2d2e37a6 ubuntu sleep 10 "46 seconds ago Up 1 second
We see that when the container exits, it is restarted. As a result, we always have the application in one of two states – up or being brought up. If a web server crashes from some rare error, this is the norm, but more likely there is an error in request processing, and it will crash on every such request, while in monitoring we will see a running container. Such a web server is better dead than half alive. At the same time, a normal web server may fail to start due to rare errors, for example, due to a lost connection to the database caused by network instability. In such a case, the application must be able to handle errors and exit. And in case of a crash due to code errors, it should not be restarted, so that we see the inoperability and can send it to the developers for repair. In the case of a floating error, you can try several times:
vagrant @ ubuntu: ~ $ sudo docker run -d --restart=on-failure:3 ubuntu sleep 10
056c4fc6986a13936e5270585e0dc1782a9246f01d6243dd247cb03b7789de1c
vagrant @ ubuntu: ~ $ sleep 10
vagrant @ ubuntu: ~ $ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c3bc2d2e37a6 ubuntu "sleep 10" 9 minutes ago Up 2 seconds keen_sinoussi
vagrant @ ubuntu: ~ $ sleep 10
vagrant @ ubuntu: ~ $ sleep 10
vagrant @ ubuntu: ~ $ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c3bc2d2e37a6 ubuntu "sleep 10" 10 minutes ago Up 9 seconds keen_sinoussi
vagrant @ ubuntu: ~ $ sleep 10
vagrant @ ubuntu: ~ $ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c3bc2d2e37a6 ubuntu "sleep 10" 10 minutes ago Up 2 seconds keen_sinoussi
Another aspect is when to consider the container dead. By default, this is when the process crashes. But the application does not always crash itself in case of an error so as to allow the container to be restarted. For example, a server may be designed incorrectly and try to download the necessary libraries during its startup without having the opportunity to do so, for example, because the requests are blocked by a firewall. In such a scenario, the server can wait a long time if an adequate timeout is not specified. In this case, we need to check the functionality ourselves. For a web server, this is a response at a specific url, for example:
docker run --rm -d \
  --name=elasticsearch \
  --health-cmd="curl --silent --fail localhost:9200/_cluster/health || exit 1" \
  --health-interval=5s \
  --health-retries=12 \
  --health-timeout=20s \
  {image}
For demonstration, we will use a file creation command as the check. If the application has not reached the working state within the allotted time limit (set here to 0) – that is, has not created the file – then it is marked as unhealthy, but before that the specified number of checks is made:
vagrant @ ubuntu: ~ $ sudo docker run \
  -d --name healt \
  --health-timeout=0s \
  --health-interval=5s \
  --health-retries=3 \
  --health-cmd="ls /halth" \
  ubuntu bash -c 'sleep 1000'
c0041a8d973e74fe8c96a81b6f48f96756002485c74e51a1bd4b3bc9be0d9ec5
vagrant @ ubuntu: ~ $ sudo docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c0041a8d973e ubuntu "bash -c 'sleep 1000'" 4 seconds ago Up 3 seconds (health: starting) healt
vagrant @ ubuntu: ~ $ sleep 20
vagrant @ ubuntu: ~ $ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c0041a8d973e ubuntu "bash -c 'sleep 1000'" 38 seconds ago Up 37 seconds (unhealthy) healt
vagrant @ ubuntu: ~ $ sudo docker rm -f healt
healt
If at least one of the checks succeeds, then the container is marked as healthy immediately:
vagrant @ ubuntu: ~ $ sudo docker run \
  -d --name healt \
  --health-timeout=0s \
  --health-interval=5s \
  --health-retries=3 \
  --health-cmd="ls /halth" \
  ubuntu bash -c 'touch /halth && sleep 1000'
vagrant @ ubuntu: ~ $ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
160820d11933 ubuntu "bash -c 'touch / hal …" 4 seconds ago Up 2 seconds (health: starting) healt
vagrant @ ubuntu: ~ $ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
160820d11933 ubuntu "bash -c 'touch / hal …" 6 seconds ago Up 5 seconds (healthy) healt
vagrant @ ubuntu: ~ $ sudo docker rm -f healt
healt
In this case, the checks are repeated all the time at a given interval:
vagrant @ ubuntu: ~ $ sudo docker run \
  -d --name healt \
  --health-timeout=0s \
  --health-interval=5s \
  --health-retries=3 \
  --health-cmd="ls /halth" \
  ubuntu bash -c 'touch /halth && sleep 60 && rm -f /halth && sleep 60'
vagrant @ ubuntu: ~ $ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8ec3a4abf74b ubuntu "bash -c 'touch / hal …" 7 seconds ago Up 5 seconds (health: starting) healt
vagrant @ ubuntu: ~ $ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8ec3a4abf74b ubuntu "bash -c 'touch / hal …" 24 seconds ago Up 22 seconds (healthy) healt
vagrant @ ubuntu: ~ $ sleep 60
vagrant @ ubuntu: ~ $ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8ec3a4abf74b ubuntu "bash -c 'touch / hal …" About a minute ago Up About a minute (unhealthy) healt
Kubernetes provides (kubernetes.io/docs/tasks/configure-POD-container/configure-liveness-readiness-probes/) three tools that check the state of a container from outside. They are all the more important because they serve not only to inform, but also to manage the application life cycle, roll-out and rollback of updates. Configuring them incorrectly can, and often does, cause the application to malfunction. So, if the liveness probe is triggered before the application starts working, Kubernetes will kill the container, not allowing it to rise. Let's consider it in more detail. The liveness probe is used to determine the health of the application, and if the application crashes and does not respond to the liveness probe, Kubernetes restarts the container. As an example, we will take a shell probe, because of the simplicity of demonstration, but in practice it should be used only in extreme cases, for example, if the container is started not as a long-lived server, but as a JOB that does its work and ends its existence once the result is achieved. For server checks, it is better to use HTTP probes, which already have a built-in dedicated proxy, do not require curl in the container and do not depend on external kube-proxy settings. When using databases, you must use a TCP probe, as they usually do not support the HTTP protocol. Let's create a long-lived container at www.katacoda.com/courses/kubernetes/playground:
controlplane $ cat << EOF > liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness
spec:
  containers:
  - name: healtcheck
    image: alpine:3.5
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 10; rm -rf /tmp/healthy; sleep 60
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 15
      periodSeconds: 5
EOF
controlplane $ kubectl create -f liveness.yaml
pod / liveness created
controlplane $ kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness 1/1 Running 2 2m11s
controlplane $ kubectl describe pod / liveness | tail -n 10
Type Reason Age From Message
–– – – – –
Normal Scheduled 2m37s default-scheduler Successfully assigned default / liveness to node01
Normal Pulling 2m33s kubelet, node01 Pulling image "alpine: 3.5"
Normal Pulled 2m30s kubelet, node01 Successfully pulled image "alpine: 3.5"
Normal Created 33s (x3 over 2m30s) kubelet, node01 Created container healtcheck
Normal Started 33s (x3 over 2m30s) kubelet, node01 Started container healtcheck
Normal Pulled 33s (x2 over 93s) kubelet, node01 Container image "alpine: 3.5" already present on machine
Warning Unhealthy 3s (x9 over 2m13s) kubelet, node01 Liveness probe failed: cat: can't open '/ tmp / healthy': No such file or directory
Normal Killing 3s (x3 over 2m3s) kubelet, node01 Container healtcheck failed liveness probe, will be restarted
We see that the container is constantly being restarted.
controlplane $ cat << EOF > liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness
spec:
  containers:
  - name: healtcheck
    image: alpine:3.5
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 60
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 15
      periodSeconds: 5
EOF
controlplane $ kubectl create -f liveness.yaml
pod / liveness created
controlplane $ kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness 1/1 Running 2 2m53s
controlplane $ kubectl describe pod / liveness | tail -n 15
SecretName: default-token-9v5mb
Optional: false
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
–– – – – –
Normal Scheduled 3m44s default-scheduler Successfully assigned default / liveness to node01
Normal Pulled 68s (x3 over 3m35s) kubelet, node01 Container image "alpine: 3.5" already present on machine
Normal Created 68s (x3 over 3m35s) kubelet, node01 Created container healtcheck
Normal Started 68s (x3 over 3m34s) kubelet, node01 Started container healtcheck
Warning Unhealthy 23s (x9 over 3m3s) kubelet, node01 Liveness probe failed: cat: can't open '/ tmp / healthy': No such file or directory
Normal Killing 23s (x3 over 2m53s) kubelet, node01 Container healtcheck failed liveness probe, will be restarted
We also see on cluster events that when cat / tmp / health fails, the container is re-created:
controlplane $ kubectl get events
controlplane $ kubectl get events | grep pod / liveness
13m Normal Scheduled pod / liveness Successfully assigned default / liveness to node01
13m Normal Pulling pod / liveness Pulling image "alpine: 3.5"
13m Normal Pulled pod / liveness Successfully pulled image "alpine: 3.5"
10m Normal Created pod / liveness Created container healtcheck
10m Normal Started pod / liveness Started container healtcheck
10m Warning Unhealthy pod / liveness Liveness probe failed: cat: can't open '/ tmp / healthy': No such file or directory
10m Normal Killing pod / liveness Container healtcheck failed liveness probe, will be restarted
10m Normal Pulled pod / liveness Container image "alpine: 3.5" already present on machine
8m32s Normal Scheduled pod / liveness Successfully assigned default / liveness to node01
4m41s Normal Pulled pod / liveness Container image "alpine: 3.5" already present on machine
4m41s Normal Created pod / liveness Created container healtcheck
4m41s Normal Started pod / liveness Started container healtcheck
2m51s Warning Unhealthy pod / liveness Liveness probe failed: cat: can't open '/ tmp / healthy': No such file or directory
5m11s Normal Killing pod / liveness Container healtcheck failed liveness probe, will be restarted
Let's take a look at the readiness probe. Passing this check indicates that the application is ready to accept requests and the service can switch traffic to it:
controlplane $ cat << EOF > readiness.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readiness
spec:
  replicas: 2
  selector:
    matchLabels:
      app: readiness
  template:
    metadata:
      labels:
        app: readiness
    spec:
      containers:
      - name: readiness
        image: python
        args:
        - /bin/sh
        - -c
        - sleep 15 && (hostname > health) && python -m http.server 9000
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 1
          periodSeconds: 5
EOF
controlplane $ kubectl create -f readiness.yaml
deployment.apps / readiness created
controlplane $ kubectl get pods
NAME READY STATUS RESTARTS AGE
readiness-fd8d996dd-cfsdb 0/1 ContainerCreating 0 7s
readiness-fd8d996dd-sj8pl 0/1 ContainerCreating 0 7s
controlplane $ kubectl get pods
NAME READY STATUS RESTARTS AGE
readiness-fd8d996dd-cfsdb 0/1 Running 0 6m29s
readiness-fd8d996dd-sj8pl 0/1 Running 0 6m29s
controlplane $ kubectl exec -it readiness-fd8d996dd-cfsdb -- curl localhost:9000/health
readiness-fd8d996dd-cfsdb
Our containers work great. Let's add traffic to them:
controlplane $ kubectl expose deploy readiness \
  --type=LoadBalancer \
  --name=readiness \
  --port=9000 \
  --target-port=9000
service / readiness exposed
controlplane $ kubectl get svc readiness
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT (S) AGE
readiness LoadBalancer 10.98.36.51
controlplane $ curl localhost: 9000
controlplane $ for i in {1..5}; do curl $IP:9000/health; done
1
2
3
4
5
Each container has a delay. Let's check what happens if one of the containers is restarted – whether traffic will be redirected to it:
controlplane $ kubectl get pods
NAME READY STATUS RESTARTS AGE
readiness-5dd64c6c79-9vq62 0/1 CrashLoopBackOff 6 15m
readiness-5dd64c6c79-sblvl 0/1 CrashLoopBackOff 6 15m
kubectl exec -it .... -c .... bash -c "rm -f healt"
controlplane $ for i in {1..5}; do echo $i; done
1
2
3
4
5
controlplane $ kubectl delete deploy readiness
deployment.apps "readiness" deleted
Consider a situation when a container becomes temporarily unavailable for work:
(hostname > health) && (python -m http.server 9000 &) && sleep 60 && rm health && sleep 60 && (hostname > health) && sleep 6000
/bin/sh -c 'sleep 60 && (python -m http.server 9000 &) && PID=$! && sleep 60 && kill -9 $PID'
By default, the container enters the Running state upon completion of the execution of the scripts in the Dockerfile and the launch of the script specified in the CMD instruction (or overridden in the configuration in the Command section). But in practice, if we have a database, it still needs to come up (read data, load it into RAM and so on), and this can take a lot of time, during which it will not respond to connections, and other applications, although up and ready to accept connections, will not be able to work with it. Also, the container transitions to the Failed state when the main process in the container crashes. In the case of a database, it can endlessly try to execute an incorrect query and be unable to respond to incoming requests, while the container will not be restarted, since formally the database daemon (server) did not crash. For these cases, two probes have been invented: readinessProbe and livenessProbe, which check the transition of the container to a working state or its failure with a custom script or an HTTP request.
esschtolts @ cloudshell: ~ / bitrix (essch) $ cat health_check.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: healtcheck
  name: healtcheck
spec:
  containers:
  - name: healtcheck
    image: alpine:3.5
    args:
    - /bin/sh
    - -c
    - sleep 12; touch /tmp/healthy; sleep 10; rm -rf /tmp/healthy; sleep 60
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 15
      periodSeconds: 5
The container starts after 3 seconds, and after 5 seconds the readiness check starts, repeating every 5 seconds. On the second check (at the 15th second of life), the readiness check cat /tmp/healthy succeeds. At this time the livenessProbe check begins, and on its second check (at the 25th second) it fails, after which the container is recognized as not working and is recreated.
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl create -f health_check.yaml && sleep 4 && kubectl get
pods && sleep 10 && kubectl get pods && sleep 10 && kubectl get pods
pod "liveness-exec" created
NAME READY STATUS RESTARTS AGE
liveness-exec 0/1 Running 0 5s
NAME READY STATUS RESTARTS AGE
liveness-exec 0/1 Running 0 15s
NAME READY STATUS RESTARTS AGE
liveness-exec 1/1 Running 0 26s
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness-exec 0/1 Running 0 53s
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness-exec 0/1 Running 0 1m
esschtolts @ cloudshell: ~ / bitrix (essch) $ kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness-exec 1/1 Running 1 1m
Kubernetes also provides a startupProbe, which postpones the moment when the readiness and liveness probes are put to work. This is useful when, for example, the application takes a long time to load. Let's consider it in more detail. For the experiment, let's take www.katacoda.com/courses/Kubernetes/playground and Python. Probes can be TCP, EXEC and HTTP, but HTTP is preferable, since EXEC spawns processes and can leave them behind as "zombie processes". Besides, if the server provides interaction via HTTP, then it is HTTP that needs to be checked (https://www.katacoda.com/courses/kubernetes/playground):
controlplane $ kubectl version --short
Client Version: v1.18.0
Server Version: v1.18.0
cat << EOF > job.yaml
apiVersion: v1
kind: Pod
metadata:
  name: healt
spec:
  containers:
  - name: python
    image: python
    command: ['sh', '-c', 'sleep 60 && (echo "work" > health) && sleep 60 && python -m http.server 9000']
    readinessProbe:
      httpGet:
        path: /health
        port: 9000
      initialDelaySeconds: 3
      periodSeconds: 3
    livenessProbe:
      httpGet:
        path: /health
        port: 9000
      initialDelaySeconds: 3
      periodSeconds: 3
    startupProbe:
      exec:
        command:
        - cat
        - /health
      initialDelaySeconds: 3
      periodSeconds: 3
  restartPolicy: OnFailure
EOF
controlplane $ kubectl create -f job.yaml
pod/healt created
controlplane $ kubectl get pods # not loaded yet
NAME READY STATUS RESTARTS AGE
healt 0/1 Running 0 11s
controlplane $ sleep 30 && kubectl get pods # not ready yet, but the image has already been downloaded
NAME READY STATUS RESTARTS AGE
healt 0/1 Running 0 51s
controlplane $ sleep 60 && kubectl get pods
NAME READY STATUS RESTARTS AGE
healt 0/1 Running 1 116s
controlplane $ kubectl delete -f job.yaml
pod "healt" deleted
Self-diagnosis of a microservice application
Let's consider how the probes work on the example of the microservice application bookinfo, which is shipped with Istio as an example: https://github.com/istio/istio/tree/master/samples/bookinfo. The demo will run at www.katacoda.com/courses/istio/deploy-istio-on-kubernetes. After deployment, the application will become available.
Infrastructure management
Although Kubernetes has its own graphical interface – the UI dashboard – it provides little beyond monitoring and simple actions. More possibilities are given by OpenShift, which combines graphical and textual creation of resources. Google does not ship Kubernetes as a full-fledged product with a formed ecosystem, but offers it as a cloud solution – Google Cloud Platform. However, there are third-party solutions, such as OpenShift and Rancher, that allow you to use Kubernetes fully through a graphical interface on your own facilities. If desired, of course, you can sync with the cloud.
The products are often not API-compatible with each other; the only known exception is Mail.Cloud, which claims support for OpenShift. But there is a third-party solution that implements the infrastructure-as-code approach and supports the APIs of most well-known ecosystems – Terraform. Like Kubernetes, it applies the concept of infrastructure as code, but not to containerization, but to virtual machines (servers, networks, disks). The Infrastructure as Code principle implies a declarative configuration – that is, a description of the desired result without explicitly specifying the actions themselves. Upon activation of the configuration (in Kubernetes it is kubectl apply -f name_config.yml, and in Hashicorp Terraform it is terraform apply), the system is brought into line with the configuration files; when the configuration or the infrastructure changes, the conflicting parts of the infrastructure are brought into line with the declaration, and the system itself decides how to achieve this. The behavior can differ: for example, when the meta information of a POD changes, it is updated in place, but when the image changes, the POD is deleted and created anew. If before we created the server infrastructure for containers in an imperative form using the gcloud command of the Google Cloud Platform (GCP) public cloud, now we will consider how to create a similar configuration with a declarative description in the infrastructure-as-code pattern using the universal Terraform tool, which supports the GCP cloud.
Terraform did not appear out of nowhere, but became a continuation of a long history of software products for configuring and managing server infrastructure. I will list them in the order of their appearance and adoption:
** CFN;
** Puppet;
** Chef;
** Ansible;
** Cloud AWS API, Kubernetes API;
* IaC: Terraform does not depend on the type of infrastructure (it supports more than 120 providers, including not only clouds), in contrast to its cloud-specific counterparts that support only their own platform: CloudFormation for Amazon Web Services, Azure Resource Manager for Microsoft Azure, Google Cloud Deployment Manager for Google Cloud Engine.
CloudFormation is built by Amazon and is intended for its services only; it is also fully integrated into the CI/CD of its infrastructure hosted on AWS S3, which makes GIT versioning difficult. We will consider the platform-independent Terraform: the syntax of the basic functionality is the same everywhere, and the specifics are connected through Provider entities (https://www.terraform.io/docs/providers/index.html). Terraform is one binary file and supports a huge number of providers, including, of course, AWS and GCE. Like most products from Hashicorp, it is written in Go and is a single binary executable file; it does not require installation, you just need to download it into a folder on Linux:
(agile-aleph-203917) $ wget https://releases.hashicorp.com/terraform/0.11.13/terraform_0.11.13_linux_amd64.zip
(agile-aleph-203917) $ unzip terraform_0.11.13_linux_amd64.zip -d.
(agile-aleph-203917) $ rm -f terraform_0.11.13_linux_amd64.zip
(agile-aleph-203917) $ ./terraform version
Terraform v0.11.13
It supports splitting into modules that you can write yourself or use ready-made ones (https://registry.terraform.io/browse?offset=27&provider=google). To orchestrate and support changes in dependencies, you can use Terragrunt (https://davidbegin.github.io/terragrunt/), for example:
terragrunt = {
  terraform {
    source = "terraform-aws-modules/…"
  }
  dependencies {
    paths = ["../network"]
  }
}
name = "…"
ami = "…"
instance_type = "t3.large"
Unified semantics for the configurations of different providers (AWS, GCE, Yandex.Cloud and many others) allows you to create a hybrid infrastructure: for example, permanently loaded services can be located on your own capacities to save money, while variably loaded ones (for example, during a promotional period) run in public clouds. Because management is declarative and can be described by files (IaC, infrastructure as code), the creation of infrastructure can be added to the CI/CD pipeline (development, testing, delivery – everything automatic and under version control). Without CI/CD, state locking is supported to prevent concurrent editing when working together. The infrastructure is not created by a script, but is brought into conformity with the configuration, which is declarative and cannot contain logic, although it is possible to inject BASH scripts into it and use conditions (the ternary operator) for different environments.
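For example, a ternary condition lets one configuration serve several environments. A minimal sketch, assuming Terraform 0.12 syntax and a variable named environment that is not part of the configurations in this chapter:
variable "environment" {
  default = "staging"
}
resource "google_compute_instance" "cluster" {
  # three machines in production, one in any other environment
  count        = var.environment == "production" ? 3 : 1
  name         = "cluster-${count.index}"
  machine_type = var.environment == "production" ? "n1-standard-1" : "f1-micro"
  zone         = "europe-north1-a"
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-9"
    }
  }
  network_interface {
    network = "default"
  }
}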
Terraform reads all files in the current directory with the .tf extension in the HashiCorp Configuration Language (HCL) format, or with the .tf.json extension in JSON format. Often, instead of one file, the configuration is divided into several, at least two: the first containing the configuration, the second – private data in variables.
To demonstrate Terraform's capabilities, we will create a GitHub repository, since its authorization and API are simple. First, we get a token generated in the WEB interface: Settings -> Developer settings -> Personal access token -> Generate new token, and set the permissions. We will not create anything yet, just check the connection:
(agile-aleph-203917) $ ls *.tf
main.tf variables.tf
$ cat variables.tf
variable "github_token" {
  default = "630bc9696d0b2f4ce164b1cabb118eaaa1909838"
}
$ cat main.tf
provider "github" {
  token = "${var.github_token}"
}
(agile-aleph-203917) $ ./terraform init
(agile-aleph-203917) $ ./terraform apply
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
Now let's create an organization: Settings -> Organizations -> New organization -> Create organization. Then, using the Terraform repository API (www.terraform.io/docs/providers/github/r/repository.html), add a description of the repository to the config:
(agile-aleph-203917) $ cat main.tf
provider "github" {
  token = "${var.github_token}"
}
resource "github_repository" "terraform_repo" {
  name = "terraform-repo"
  description = "my terraform repo"
  auto_init = true
}
Now it remains to apply it: look at the plan for creating the repository and agree with it:
(agile-aleph-203917) $ ./terraform apply
provider.github.organization
The GitHub organization name to manage.
Enter a value: essch2
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
+ github_repository.terraform_repo
id:
allow_merge_commit: "true"
allow_rebase_merge: "true"
allow_squash_merge: "true"
archived: "false"
auto_init: "true"
default_branch:
description: "my terraform repo"
etag:
full_name:
git_clone_url:
html _url:
http_clone_url:
name: "terraform-repo"
ssh_clone_url:
svn_url:
Plan: 1 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
github_repository.terraform_repo: Creating …
allow_merge_commit: "" => "true"
allow_rebase_merge: "" => "true"
allow_squash_merge: "" => "true"
archived: "" => "false"
auto_init: "" => "true"
default_branch: "" => "
description: "" => "my terraform repo"
etag: "" => "
full_name: "" => "
git_clone_url: "" => "
html_url: "" => "
http_clone_url: "" => "
name: "" => "terraform-repo"
ssh_clone_url: "" => "
svn_url: "" => "
github_repository.terraform_repo: Creation complete after 4s (ID: terraform-repo)
Apply complete! Resources: 1 added, 0 changed, 0 destroyed
Now you can see an empty terraform-repo repository in the WEB interface. Reapplying will not create the repository again, because Terraform only applies the changes that have not been made yet:
(agile-aleph-203917) $ ./terraform apply
provider.github.organization
The GITHub organization name to manage.
Enter a value: essch2
github_repository.terraform_repo: Refreshing state … (ID: terraform-repo)
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
But if I change the name, Terraform will try to apply the change by deleting the repository and creating a new one with the new name. It is important to note that any data we had pushed into the repository would be deleted along with it. To check how updates will be performed, you can first ask for the list of actions to be performed with the command ./terraform plan. And so, let's get started:
(agile-aleph-203917) $ cat main.tf
provider "github" {
  token = "${var.github_token}"
}
resource "github_repository" "terraform_repo" {
  name = "terraform-repo2"
  description = "my terraform repo"
  auto_init = true
}
(agile-aleph-203917) $ ./terraform plan
provider.github.organization
The GITHub organization name to manage.
Enter a value: essch
Refreshing Terraform state in-memory prior to plan …
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
github_repository.terraform_repo: Refreshing state … (ID: terraform-repo)
–– –
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
+ github_repository.terraform_repo
id:
allow_merge_commit: "true"
allow_rebase_merge: "true"
allow_squash_merge: "true"
archived: "false"
auto_init: "true"
default_branch:
description: "my terraform repo"
etag:
full_name:
git_clone_url:
html_url:
http_clone_url:
name: "terraform-repo2"
ssh_clone_url:
svn_url:
"terraform apply" is subsequently run.
esschtolts @ cloudshell: ~ / terraform (agile-aleph-203917) $ ./terraform apply
provider.github.organization
The GITHub organization name to manage.
Enter a value: essch2
github_repository.terraform_repo: Refreshing state … (ID: terraform-repo)
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
– / + destroy and then create replacement
Terraform will perform the following actions:
– / + github_repository.terraform_repo (new resource required)
id: "terraform-repo" =>
allow_merge_commit: "true" => "true"
allow_rebase_merge: "true" => "true"
allow_squash_merge: "true" => "true"
archived: "false" => "false"
auto_init: "true" => "true"
default_branch: "master" =>
description: "my terraform repo" => "my terraform repo"
etag: "W / \" a92e0b300d8c8d2c869e5f271da6c2ab \ "" =>
full_name: "essch2 / terraform-repo" =>
git_clone_url: "git: //github.com/essch2/terraform-repo.git" =>
html_url: "https://github.com/essch2/terraform-repo" =>
http_clone_url: "https://github.com/essch2/terraform-repo.git" =>
name: "terraform-repo" => "terraform-repo2" (forces new resource)
ssh_clone_url: "git@github.com: essch2 / terraform-repo.git" =>
svn_url: "https://github.com/essch2/terraform-repo" =>
Plan: 1 to add, 0 to change, 1 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
github_repository.terraform_repo: Destroying … (ID: terraform-repo)
github_repository.terraform_repo: Destruction complete after 0s
github_repository.terraform_repo: Creating …
allow_merge_commit: "" => "true"
allow_rebase_merge: "" => "true"
allow_squash_merge: "" => "true"
archived: "" => "false"
auto_init: "" => "true"
default_branch: "" => "
description: "" => "my terraform repo"
etag: "" => "
full_name: "" => "
git_clone_url: "" => "
html_url: "" => "
http_clone_url: "" => "
name: "" => "terraform-repo2"
ssh_clone_url: "" => "
svn_url: "" => "
github_repository.terraform_repo: Creation complete after 5s (ID: terraform-repo2)
Apply complete! Resources: 1 added, 0 changed, 1 destroyed.
For the sake of clarity, I created a big security hole: I put the token in the configuration file, and therefore in the repository, so now anyone who can access it can delete all my repositories. Terraform provides several ways to set variables besides the one used here. I'll just recreate the token and override it with one passed on the command line:
(agile-aleph-203917) $ rm variables.tf
(agile-aleph-203917) $ sed -i 's/terraform-repo2/terraform-repo3/' main.tf
./terraform apply -var="github_token=f7602b82e02efcbae7fc915c16eeee518280cf2a"
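Besides the -var flag, a variable declared without a default can also be supplied through the environment or from a separate values file that is not committed to GIT. A small sketch of the options (the file name secret.tfvars is my example, not part of the text above):
variable "github_token" {
  type = "string"
  # no default value: Terraform will ask for it interactively or take it from
  #   the command line:          ./terraform apply -var="github_token=..."
  #   an environment variable:   export TF_VAR_github_token=...
  #   a file of values:          ./terraform apply -var-file="secret.tfvars"
}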
Building infrastructure in GCP with Terraform
Each cloud has its own set of services and its own APIs for them. To simplify the transition from one cloud to another, both for employees in terms of learning and for rewriting code, there are universal libraries that implement the Facade pattern. A facade here is a universal API that hides the peculiarities of the systems behind it.
One representative of the cloud API facades is KOPS, a tool for deploying Kubernetes to GCP, AWS and Azure. KOPS is similar to kubectl – it is a binary, it can work both with commands and with a YAML config, and it has a similar syntax, but unlike kubectl it creates not PODs, but cluster nodes. Another example is Terraform, which specializes in deployment by configuration in order to adhere to the IaC concept.
To create the infrastructure, we need a token. It is created in GCP for a service account to which access is granted. To do this, I went along the path: IAM and administration -> Service accounts -> Create a service account, assigned the Owner role on creation (full access, for test purposes), created a key in JSON format with the Create key button, and renamed the downloaded key to key.json. To describe the infrastructure, I used the documentation www.terraform.io/docs/providers/google/index.html:
(agil7e-aleph-20391) $ cat main.tf
provider "google" {
  credentials = "${file("key.json")}"
  project = "agile-aleph-203917"
  region = "us-central1"
}
resource "google_compute_instance" "terraform" {
  name = "terraform"
  machine_type = "n1-standard-1"
  zone = "us-central1-a"
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-9"
    }
  }
  network_interface {
    network = "default"
  }
}
Let's check the user rights:
(agile-aleph-203917) $ gcloud auth list
Credentialed Accounts
ACTIVE ACCOUNT
* esschtolts@gmail.com
To set the active account, run:
$ gcloud config set account `ACCOUNT`
Let's select the project as the current one (you can create the current one by default):
$ gcloud config set project agil7e-aleph-20391;
(agil7e-aleph-20391) $ ./terraform init | grep success
Terraform has been successfully initialized!
Now, after copying the key into the key.json file in the Terraform directory, let's create one instance:
machine_type: "" => "n1-standard-1"
metadata_fingerprint: "" => "
name: "" => "terraform"
network_interface. #: "" => "1"
network_interface.0.address: "" => "
network_interface.0.name: "" => "
network_interface.0.network: "" => "default"
network_interface.0.network_ip: "" => "
network_interface.0.network: "" => "default"
project: "" => "
scheduling. #: "" => "
self_link: "" => "
tags_fingerprint: "" => "
zone: "" => "us-central1-a"
google_compute_instance.terraform: Still creating … (10s elapsed)
google_compute_instance.terraform: Still creating … (20s elapsed)
google_compute_instance.terraform: Still creating … (30s elapsed)
google_compute_instance.terraform: Still creating … (40s elapsed)
google_compute_instance.terraform: Creation complete after 40s (ID: terraform)
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
That's it, we have created a server instance. Now let's remove it by deleting the resource from the configuration and applying it again:
~ / terraform (agil7e-aleph-20391) $ ./terraform apply
google_compute_instance.terraform: Refreshing state … (ID: terraform)
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
– destroy
Terraform will perform the following actions:
– google_compute_instance.terraform
Plan: 0 to add, 0 to change, 1 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
google_compute_instance.terraform: Destroying … (ID: terraform)
google_compute_instance.terraform: Still destroying … (ID: terraform, 10s elapsed)
google_compute_instance.terraform: Still destroying … (ID: terraform, 20s elapsed)
google_compute_instance.terraform: Still destroying … (ID: terraform, 30s elapsed)
google_compute_instance.terraform: Still destroying … (ID: terraform, 40s elapsed)
google_compute_instance.terraform: Still destroying … (ID: terraform, 50s elapsed)
google_compute_instance.terraform: Still destroying … (ID: terraform, 1m0s elapsed)
google_compute_instance.terraform: Still destroying … (ID: terraform, 1m10s elapsed)
google_compute_instance.terraform: Still destroying … (ID: terraform, 1m20s elapsed)
google_compute_instance.terraform: Still destroying … (ID: terraform, 1m30s elapsed)
google_compute_instance.terraform: Still destroying … (ID: terraform, 1m40s elapsed)
google_compute_instance.terraform: Still destroying … (ID: terraform, 1m50s elapsed)
google_compute_instance.terraform: Still destroying … (ID: terraform, 2m0s elapsed)
google_compute_instance.terraform: Still destroying … (ID: terraform, 2m10s elapsed)
google_compute_instance.terraform: Still destroying … (ID: terraform, 2m20s elapsed)
google_compute_instance.terraform: Destruction complete after 2m30s
Apply complete! Resources: 0 added, 0 changed, 1 destroyed.
Building infrastructure in AWS
To create the AWS cluster configuration, let's create a separate folder for it, and move the previous configuration into a parallel folder:
esschtolts @ cloudshell: ~ / terraform (agil7e-aleph-20391) $ mkdir gcp
esschtolts @ cloudshell: ~ / terraform (agil7e-aleph-20391) $ mv main.tf gcp / main.tf
esschtolts @ cloudshell: ~ / terraform (agil7e-aleph-20391) $ mkdir aws
esschtolts @ cloudshell: ~ / terraform (agil7e-aleph-20391) $ cd aws
A Role is an analogue of a user, only not for people, but for services such as AWS, and in our case these are the EKS servers. But I see groups, not users, as the analogue of roles: for example, a group for creating a cluster, a group for working with a database, etc. Only one role can be assigned to a server, and a role can contain multiple rights (Policies). As a result, we do not need to work with logins and passwords, tokens or certificates – to store, transfer and restrict access to them; we only indicate the rights in the WEB toolbar (IAM) or using the API (and, derived from it, in the configuration). Our cluster needs these rights in order to self-configure and replicate, since it consists of standard AWS services. To manage the components of the cluster – AWS EC2 (servers), AWS ELB (Elastic Load Balancer, the balancer) and AWS KMS (Key Management Service, key manager and encryption) – the AmazonEKSClusterPolicy access is needed; for monitoring – AmazonEKSServicePolicy using the CloudWatch Logs components (monitoring by logs), Route 53 (creating a network in the zone) and IAM (rights management). I did not describe the role in the config and created it through IAM according to the documentation: https://docs.aws.amazon.com/eks/latest/userguide/service_IAM_role.html#create-service-role.
For greater reliability, the nodes of the Kubernetes cluster should be located in different zones, that is, data centers. Each region contains several zones to maintain fault tolerance while keeping minimal latency (server response time) for the local population. It is important to note that some regions may be represented in several copies within the same country, for example, us-east-1 in US East (N. Virginia) and us-east-2 in US East (Ohio) – such regions are distinguished by numbers. So far, the creation of an EKS cluster is available only in the us-east zones.
The VPC for the developer, at its simplest, boils down to naming a subnet as a specific resource.
Let's write the configuration according to the documentation www.terraform.io/docs/providers/aws/r/eks_cluster.html:
esschtolts@cloudshell:~/terraform/aws (agile-aleph-203917)$ cat main.tf
provider "aws" {
  access_key = "${var.token}"
  secret_key = "${var.key}"
  region = "us-east-1"
}
# Params
variable "token" {
  default = ""
}
variable "key" {
  default = ""
}
# EKS
resource "aws_eks_cluster" "example" {
  enabled_cluster_log_types = ["api", "audit"]
  name = "exapmle"
  role_arn = "arn:aws:iam::177510963163:role/ServiceRoleForAmazonEKS2"
  vpc_config {
    subnet_ids = ["${aws_subnet.subnet_1.id}", "${aws_subnet.subnet_2.id}"]
  }
}
output "endpoint" {
  value = "${aws_eks_cluster.example.endpoint}"
}
output "kubeconfig-certificate-authority-data" {
  value = "${aws_eks_cluster.example.certificate_authority.0.data}"
}
# Role
data "aws_iam_policy_document" "eks-role-policy" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type = "Service"
      identifiers = ["eks.amazonaws.com"]
    }
  }
}
resource "aws_iam_role" "tf_role" {
  name = "tf_role"
  assume_role_policy = "${data.aws_iam_policy_document.eks-role-policy.json}"
  tags = {
    tag-key = "tag-value"
  }
}
resource "aws_iam_role_policy_attachment" "attach-cluster" {
  role = "tf_role"
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}
resource "aws_iam_role_policy_attachment" "attach-service" {
  role = "tf_role"
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
}
# Subnet
resource "aws_subnet" "subnet_1" {
  vpc_id = "${aws_vpc.main.id}"
  cidr_block = "10.0.1.0/24"
  availability_zone = "us-east-1a"
  tags = {
    Name = "Main"
  }
}
resource "aws_subnet" "subnet_2" {
  vpc_id = "${aws_vpc.main.id}"
  cidr_block = "10.0.2.0/24"
  availability_zone = "us-east-1b"
  tags = {
    Name = "Main"
  }
}
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}
After 9 minutes 44 seconds, I got a ready-made self-supporting infrastructure for a Kubernetes cluster:
esschtolts@cloudshell:~/terraform/aws (agile-aleph-203917)$ ./../terraform apply -var="token=AKIAJ4SYCNH2XVSHNN3A" -var="key=huEWRslEluynCXBspsul3AkKlin1ViR9+Mo
Now let's delete (it took me 10 minutes 23 seconds):
esschtolts@cloudshell:~/terraform/aws (agile-aleph-203917)$ ./../terraform destroy -var="token=AKIAJ4SYCNH2XVSHNN3A" -var="key=huEWRslEluynCXBspsul3AkKlin1ViR9+Mo
Destroy complete! Resources: 7 destroyed.
Establishing the CI / CD process
Amazon provides (aws.amazon.com/ru/devops/) a wide range of DevOps tools designed for its cloud infrastructure:
* AWS CodePipeline – the service allows you to create, in a visual editor, a chain of stages from a set of services through which the code must pass before it goes to production, for example, build and testing.
* AWS CodeBuild – the service provides an auto-scaling build queue, which may be needed for compiled programming languages, when adding features or making changes requires a long recompilation of the entire application; a single build server would become a bottleneck when rolling out changes.
* AWS CodeDeploy – automates deployment and rollback in case of errors.
* AWS CodeStar – the service combines the main features of the previous services.
Setting up remote control
artifact server
aws s3 ls s3://name_bucket; aws s3 sync s3://name_bucket name_folder --exclude *.tmp # the files from the bucket will be downloaded to the folder, for example, a website
Now, we need to download the AWS plugin:
esschtolts @ cloudshell: ~ / terraform / aws (agile-aleph-203917) $ ./../terraform init | grep success
Terraform has been successfully initialized!
Now we need to get access to AWS. To do this, click on your user name in the header of the WEB interface: in addition to My account, the My Security Credentials item will appear; selecting it, we go to Access Key -> Create New Access Key. Let's create an EKS (Elastic Kubernetes Service) cluster:
esschtolts @ cloudshell: ~ / terraform / aws (agile-aleph-203917) $ ./../terraform apply
–var = "token = AKIAJ4SYCNH2XVSHNN3A" -var = "key = huEWRslEluynCXBspsul3AkKlinAlR9 + MoU1ViY7"
Delete everything:
$ ../terraform destroy
Creating a cluster in GCP
node pool – a group of nodes with the same configuration, combined into the cluster:
resource "google_container_cluster" "primary" {
  name = "tf"
  location = "us-central1"
  initial_node_count = 1
}
$ cat main.tf # configuration state
terraform {
required_version = "> 0.10.0"
}
terraform {
backend "s3" {
bucket = "foo-terraform"
key = "bucket / terraform.tfstate"
region = "us-east-1"
encrypt = "true"
}
}
$ cat cloud.tf # cloud configuration
provider "hcloud" {
  token = "${var.hcloud_token}"
}
$ cat variables.tf # variables and getting tokens
variable "hcloud_token" {}
$ cat instances.tf # create resources
resource "hcloud_server" "server" {....
$ terraform import aws_acm_certificate.cert arn:aws:acm:eu-central-1:123456789012:certificate/7e7a28d2-163f-4b8f-b9cd-822f96c08d6a
$ terraform init # Initialize configs
$ terraform plan # Check actions
$ terraform apply # Running actions
Debugging:
essh@kubernetes-master:~/graylog$ sudo docker run --name graylog --link graylog_mongo:mongo --link graylog_elasticsearch:elasticsearch \
-p 9000:9000 -p 12201:12201 -p 1514:1514 \
-e GRAYLOG_HTTP_EXTERNAL_URI="http://127.0.0.1:9000/" \
-d graylog/graylog:3.0
0f21f39192440d9a8be96890f624c1d409883f2e350ead58a5c4ce0e91e54c9d
docker: Error response from daemon: driver failed programming external connectivity on endpoint graylog (714a6083b878e2737bd4d4577d1157504e261c03cb503b6394cb844466fb4781): Bind for 0.0.0.0:9000 failed: port is already allocated.
essh@kubernetes-master:~/graylog$ sudo netstat -nlp | grep 9000
tcp6 0 0 :::9000 :::* LISTEN 2505/docker-proxy
essh@kubernetes-master:~/graylog$ docker rm graylog
graylog
essh@kubernetes-master:~/graylog$ sudo docker run --name graylog --link graylog_mongo:mongo --link graylog_elasticsearch:elasticsearch \
-p 9001:9000 -p 12201:12201 -p 1514:1514 \
-e GRAYLOG_HTTP_EXTERNAL_URI="http://127.0.0.1:9001/" \
-d graylog/graylog:3.0
e5aefd6d630a935887f494550513d46e54947f897e4a64b0703d8f7094562875
https://blog.maddevs.io/terrafom-hetzner-a2f22534514b
For example, let's create one instance:
$ cat aws/provider.tf
provider "aws" {
  region = "us-west-1"
}
resource "aws_instance" "my_ec2" {
  ami = "${data.aws_ami.ubuntu.id}"
  instance_type = "t2.micro"
}
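The ami argument above refers to a data source that is not shown in the listing; for the configuration to be applied, a block like the following is also needed. A sketch: the Ubuntu 18.04 image name pattern and Canonical's owner id are my assumptions, not part of the original listing:
data "aws_ami" "ubuntu" {
  most_recent = true
  # assumed filter: the official Ubuntu 18.04 images published by Canonical
  filter {
    name = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-*"]
  }
  owners = ["099720109477"]
}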
$ cd aws
$ aws configure
$ terraform init
$ terraform apply --auto-approve
$ cd ..
provider "aws" {
region = "us-west-1"
}
resource "aws_sqs_queue" "terraform_queue" {
name = "terraform-queue"
delay_seconds = 90
max_message_size = 2048
message_retention_seconds = 86400
receive_wait_time_seconds = 10
}
data "aws_route53_zone" "vuejs_phalcon" {
name = "test.com."
private_zone = true
}
resource "aws_route53_record" "www" {
zone_id = "$ {data.aws_route53_zone.vuejs_phalcon.zone_id}"
name = "www. $ {data.aws_route53_zone.selected.name}"
type = "A"
ttl = "300"
records = ["10.0.0.1"]
}
resource "aws_elasticsearch_domain" "example" {
domain_name = "example"
elasticsearch_version = "1.5"
cluster_config {
instance_type = "r4.large.elasticsearch"
}
snapshot_options {
automated_snapshot_start_hour = 23
}
}
resource "aws_eks_cluster" "eks_vuejs_phalcon" {
name = "eks_vuejs_phalcon"
role_arn = "$ {aws_iam_role.eks_vuejs_phalcon.arn}"
vpc_config {
subnet_ids = ["$ {aws_subnet.eks_vuejs_phalcon.id}", "$ {aws_subnet.example2.id}"]
}
}
output "endpoint" {
value = "$ {aws_eks_cluster.eks_vuejs_phalcon.endpoint}"
}
output "kubeconfig-certificate-authority-data" {
value = "$ {aws_eks_cluster.eks_vuejs_phalcon.certificate_authority.0.data}"
}
provider "google" {
credentials = "$ {file (" account.json ")}"
project = "my-project-id"
region = "us-central1"
}
resource "google_container_cluster" "primary" {
name = "my-gke-cluster"
location = "us-central1"
remove_default_node_pool = true
initial_node_count = 1
master_auth {
username = ""
password = ""
}
}
output "client_certificate" {
value = "$ {google_container_cluster.primary.master_auth.0.client_certificate}"
}
output "client_key" {
value = "$ {google_container_cluster.primary.master_auth.0.client_key}"
}
output "cluster_ca_certificate" {
value = "$ {google_container_cluster.primary.master_auth.0.cluster_ca_certificate}"
}
$ cat deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: phalcon_vuejs
  namespace: development
spec:
  selector:
    matchLabels:
      app: vuejs
  replicas: 1
  template:
    metadata:
      labels:
        app: vuejs
    spec:
      initContainers:
      - name: vuejs_build
        image: vuejs/ci
        volumeMounts:
        - name: app
          mountPath: /app/public
        command:
        - /bin/bash
        - -c
        - |
          cd /app/public
          git clone essch/vuejs_phalcon:1.0.
          npm test
          npm build
      containers:
      - name: healtcheck
        image: mileschou/phalcon:7.2-cli
        args:
        - /bin/sh
        - -c
        - cd /usr/src/app && git clone essch/app_phalcon:1.0 && touch /tmp/healthy && sleep 10 && php script.php
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 15
          periodSeconds: 5
      volumes:
      - name: app
        emptyDir: {}
So we created an AWS EC2 instance. We omitted the access keys because aws configure had already authorized the AWS API, and Terraform uses that authorization.
Also, for code reuse, Terraform supports variables, data sources, and modules.
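For example, a group of resources can be moved into a folder and connected as a module. A sketch, assuming a ./network folder with its own configuration and an output named subnet_id (both the folder and the output are my assumptions):
module "network" {
  source = "./network"
  cidr_block = "190.160.0.0/16"
}
resource "aws_instance" "my_ec2" {
  ami = "${data.aws_ami.ubuntu.id}"
  instance_type = "t2.micro"
  # the value is taken from the module's output
  subnet_id = "${module.network.subnet_id}"
}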
Let's create a separate network:
resource "aws_vpc" "my_vpc" {
cidr_block = "190.160.0.0/16"
instance_target = "default"
}
resource "aws_subnet" "my_subnet" {
vpc_id = "$ {aws_vpc.my_vpc.id}"
cidr_block = "190.160.1.0/24"
}
$ cat gce/provider.tf
provider "google" {
  credentials = "${file("account.json")}"
  project = "my-project-id"
  region = "us-central1"
}
resource "google_compute_instance" "default" {
  name = "test"
  machine_type = "n1-standard-1"
  zone = "us-central1-a"
}
$ cd gce
$ terraform init
$ terraform apply
$ cd ..
For distributed work, let's put the state of the infrastructure in AWS S3 (you can also put other data there), but, for safety, in a different region:
terraform {
backend "s3" {
bucket = "tfstate"
key = "terraform.tfstate"
region = "us-state-2"
}
}
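When the state is shared through S3, locking can also be added so that two concurrent terraform apply runs do not modify it at the same time. A sketch, assuming a pre-created DynamoDB table named terraform-lock with the primary key LockID (the table name is my assumption):
terraform {
  backend "s3" {
    bucket = "tfstate"
    key = "terraform.tfstate"
    region = "us-east-1"
    # the DynamoDB table provides state locking for shared work
    dynamodb_table = "terraform-lock"
    encrypt = "true"
  }
}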
provider "kubernetes" {
host = "https://104.196.242.174"
username = "ClusterMaster"
password = "MindTheGap"
}
resource "kubernetes_pod" "my_pod" {
spec {
container {
image = "Nginx: 1.7.9"
name = "Nginx"
port {
container_port = 80
}
}
}
}
Commands:
terraform init # download the dependencies according to the configs and check them
terraform validate # syntax check
terraform plan # see in detail how the infrastructure will be changed and why exactly so; for example, whether only the service's meta information will be changed or the service itself will be re-created, which is often unacceptable for databases.
terraform apply # apply the changes
The common part for all providers is the core.
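The core itself only builds the execution plan; everything provider-specific is downloaded by terraform init as plugins, whose versions can be pinned in the configuration. A minimal sketch (the version numbers are my assumptions):
terraform {
  required_version = ">= 0.11.0"
}
provider "aws" {
  # pin the provider plugin version so terraform init does not pull breaking major upgrades
  version = "~> 2.0"
  region = "us-west-1"
}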
$ which aws
$ aws configure # https://www.youtube.com/watch?v=IxA1IPypzHs
$ cat aws.tf
# https://www.terraform.io/docs/providers/aws/r/instance.html
resource "aws_instance" "ec2instance" {
  ami = "${var.ami}"
  instance_type = "t2.micro"
}
resource "aws_security_group" "instance_gc" {
  …
}
$ cat run.js
export AWS_ACCESS_KEY_ID="anaccesskey"
export AWS_SECRET_ACCESS_KEY="asecretkey"
export AWS_DEFAULT_REGION="us-west-2"
terraform plan
terraform apply
$ cat gce.tf # https://www.terraform.io/docs/providers/google/index.html
# Google Cloud Platform Provider
provider "google" {
  credentials = "${file("account.json")}"
  project = "phalcon"
  region = "us-central1"
}
# https://www.terraform.io/docs/providers/google/r/app_engine_application.html
resource "google_project" "my_project" {
  name = "My Project"
  project_id = "your-project-id"
  org_id = "1234567"
}
resource "google_app_engine_application" "app" {
  project = "${google_project.my_project.project_id}"
  location_id = "us-central"
}
# google_compute_instance
resource "google_compute_instance" "default" {
  name = "test"
  machine_type = "n1-standard-1"
  zone = "us-central1-a"
  tags = ["foo", "bar"]
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-9"
    }
  }
  // Local SSD disk
  scratch_disk {
  }
  network_interface {
    network = "default"
    access_config {
      // Ephemeral IP
    }
  }
  metadata = {
    foo = "bar"
  }
  metadata_startup_script = "echo hi > /test.txt"
  service_account {
    scopes = ["userinfo-email", "compute-ro", "storage-ro"]
  }
}
Extensibility is achieved using the external data source, whose program can be a BASH or Python script that receives a JSON object on stdin and must print a JSON object to stdout:
data "external" "python3" {
  program = ["python3"]
}
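A slightly fuller sketch of how such an external data source is usually wired up; the script name script.py and the returned key answer are my assumptions, not part of the original example:
data "external" "python3" {
  # the script reads a JSON object from stdin and prints a JSON object of strings,
  # for example: {"answer": "42"}
  program = ["python3", "${path.module}/script.py"]
}
output "answer" {
  # the result map contains the keys printed by the script
  value = "${data.external.python3.result["answer"]}"
}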
Building a cluster of machines with Terraform
Clustering with Terraform is covered in the section Building infrastructure in GCP. Now let's pay more attention to the cluster itself, and not to the tools for creating it. I will create a project named node-cluster through the GCE admin panel (displayed in the interface header). I downloaded the key for Kubernetes via IAM and administration -> Service accounts -> Create a service account, selected the Owner role when creating it, and saved the key in the project folder as kubernetes_key.json:
essh@kubernetes-master:~/node-cluster$ cp ~/Downloads/node-cluster-243923-bbec410e0a83.json ./kubernetes_key.json
Downloaded terraform:
essh@kubernetes-master:~/node-cluster$ wget https://releases.hashicorp.com/terraform/0.12.2/terraform_0.12.2_linux_amd64.zip > /dev/null 2> /dev/null
essh@kubernetes-master:~/node-cluster$ unzip terraform_0.12.2_linux_amd64.zip && rm -f terraform_0.12.2_linux_amd64.zip
Archive: terraform_0.12.2_linux_amd64.zip
inflating: terraform
essh@kubernetes-master:~/node-cluster$ ./terraform version
Terraform v0.12.2
Add the GCE provider and start downloading the "drivers" (plugins) for it:
essh@kubernetes-master:~/node-cluster$ cat main.tf
provider "google" {
  credentials = "${file("kubernetes_key.json")}"
  project = "node-cluster"
  region = "us-central1"
}
essh@kubernetes-master:~/node-cluster$ ./terraform init
Initializing the backend …
Initializing provider plugins …
– Checking for available provider plugins …
– Downloading plugin for provider "google" (terraform-providers / google) 2.8.0 …
The following providers do not have any version constraints in configuration,
so the latest version was installed.
To prevent automatic upgrades to new major versions that may contain breaking
changes, it is recommended to add version = "…" constraints to the
corresponding provider blocks in configuration, with the constraint strings
suggested below.
* provider.google: version = "~> 2.8"
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Add a virtual machine:
essh@kubernetes-master:~/node-cluster$ cat main.tf
provider "google" {
  credentials = "${file("kubernetes_key.json")}"
  project = "node-cluster-243923"
  region = "europe-north1"
}
resource "google_compute_instance" "cluster" {
  name = "cluster"
  zone = "europe-north1-a"
  machine_type = "f1-micro"
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-9"
    }
  }
  network_interface {
    network = "default"
    access_config {}
  }
}
essh@kubernetes-master:~/node-cluster$ sudo ./terraform apply
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# google_compute_instance.cluster will be created
+ resource "google_compute_instance" "cluster" {
+ can_ip_forward = false
+ cpu_platform = (known after apply)
+ deletion_protection = false
+ guest_accelerator = (known after apply)
+ id = (known after apply)
+ instance_id = (known after apply)
+ label_fingerprint = (known after apply)
+ machine_type = "f1-micro"
+ metadata_fingerprint = (known after apply)
+ name = "cluster"
+ project = (known after apply)
+ self_link = (known after apply)
+ tags_fingerprint = (known after apply)
+ zone = "europe-north1-a"
+ boot_disk {
+ auto_delete = true
+ device_name = (known after apply)
+ disk_encryption_key_sha256 = (known after apply)
+ source = (known after apply)
+ initialize_params {
+ image = "debian-cloud / debian-9"
+ size = (known after apply)
+ type = (known after apply)
}
}
+ network_interface {
+ address = (known after apply)
+ name = (known after apply)
+ network = "default"
+ network_ip = (known after apply)
+ subnetwork = (known after apply)
+ subnetwork_project = (known after apply)
+ access_config {
+ assigned_nat_ip = (known after apply)
+ nat_ip = (known after apply)
+ network_tier = (known after apply)
}
}
+ scheduling {
+ automatic_restart = (known after apply)
+ on_host_maintenance = (known after apply)
+ preemptible = (known after apply)
+ node_affinities {
+ key = (known after apply)
+ operator = (known after apply)
+ values = (known after apply)
}
}
}
Plan: 1 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
google_compute_instance.cluster: Creating …
google_compute_instance.cluster: Still creating … [10s elapsed]
google_compute_instance.cluster: Creation complete after 11s [id = cluster]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
Add a public static IP address and SSH key to the node:
essh @ kubernetes-master: ~ / node-cluster $ ssh-keygen -f node-cluster
Generating public / private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in node-cluster.
Your public key has been saved in node-cluster.pub.
The key fingerprint is:
SHA256: vUhDe7FOzykE5BSLOIhE7Xt9o + AwgM4ZKOCW4nsLG58 essh @ kubernetes-master
The key's randomart image is:
+ – [RSA 2048] – +
| .o. +. |
| o. o. =. |
| * + o. =. |
| = *. … ... + o |
| B +. … S * |
| = + oo X +. |
| o. =. + = + |
| . = .... … |
| ..E. |
+ – [SHA256] – +
essh @ kubernetes-master: ~ / node-cluster $ ls node-cluster.pub
node-cluster.pub
essh@kubernetes-master:~/node-cluster$ cat main.tf
provider "google" {
  credentials = "${file("kubernetes_key.json")}"
  project = "node-cluster-243923"
  region = "europe-north1"
}
resource "google_compute_address" "static-ip-address" {
  name = "static-ip-address"
}
resource "google_compute_instance" "cluster" {
  name = "cluster"
  zone = "europe-north1-a"
  machine_type = "f1-micro"
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-9"
    }
  }
  metadata = {
    ssh-keys = "essh:${file("./node-cluster.pub")}"
  }
  network_interface {
    network = "default"
    access_config {
      nat_ip = "${google_compute_address.static-ip-address.address}"
    }
  }
}
essh@kubernetes-master:~/node-cluster$ sudo ./terraform apply
Let's check the SSH connection to the server:
essh @ kubernetes-master: ~ / node-cluster $ ssh -i ./node-cluster essh@35.228.82.222
The authenticity of host '35 .228.82.222 (35.228.82.222) 'can't be established.
ECDSA key fingerprint is SHA256: o7ykujZp46IF + eu7SaIwXOlRRApiTY1YtXQzsGwO18A.
Are you sure you want to continue connecting (yes / no)? yes
Warning: Permanently added '35 .228.82.222 '(ECDSA) to the list of known hosts.
Linux cluster 4.9.0-9-amd64 # 1 SMP Debian 4.9.168-1 + deb9u2 (2019-05-13) x86_64
The programs included with the Debian GNU / Linux system are free software;
the exact distribution terms for each program are described in the
individual files in / usr / share / doc / * / copyright.
Debian GNU / Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
essh @ cluster: ~ $ ls
essh @ cluster: ~ $ exit
logout
Connection to 35.228.82.222 closed.
Install packages:
essh@kubernetes-master:~/node-cluster$ curl https://sdk.cloud.google.com | bash
essh@kubernetes-master:~/node-cluster$ exec -l $SHELL
essh@kubernetes-master:~/node-cluster$ gcloud init
Let's choose a project:
You are logged in as: [esschtolts@gmail.com].
Pick cloud project to use:
[1] agile-aleph-203917
[2] node-cluster-243923
[3] essch
[4] Create a new project
Please enter numeric choice or text value (must exactly match list
item):
Please enter a value between 1 and 4, or a value present in the list: 2
Your current project has been set to: [node-cluster-243923].
Let's choose a zone:
[50] europe-north1-a
Did not print [12] options.
Too many options [62]. Enter "list" at prompt to print choices fully.
Please enter numeric choice or text value (must exactly match list
item):
Please enter a value between 1 and 62, or a value present in the list: 50
essh@kubernetes-master:~/node-cluster$ PROJECT_ID="node-cluster-243923"
essh@kubernetes-master:~/node-cluster$ echo $PROJECT_ID
node-cluster-243923
essh@kubernetes-master:~/node-cluster$ export GOOGLE_APPLICATION_CREDENTIALS=$HOME/node-cluster/kubernetes_key.json
essh@kubernetes-master:~/node-cluster$ sudo docker-machine create --driver google --google-project $PROJECT_ID vm01
sudo GOOGLE_APPLICATION_CREDENTIALS=$HOME/node-cluster/kubernetes_key.json docker-machine create --driver google --google-project $PROJECT_ID vm01
// https://docs.docker.com/machine/drivers/gce/
// https://github.com/docker/machine/issues/4722
essh @ kubernetes-master: ~ / node-cluster $ gcloud config list
[compute]
region = europe-north1
zone = europe-north1-a
[core]
account = esschtolts@gmail.com
disable_usage_reporting = False
project = node-cluster-243923
Your active configuration is: [default]
Let's add copying the file and executing the script:
essh@kubernetes-master:~/node-cluster$ cat main.tf
provider "google" {
  credentials = "${file("kubernetes_key.json")}"
  project = "node-cluster-243923"
  region = "europe-north1"
}
resource "google_compute_address" "static-ip-address" {
  name = "static-ip-address"
}
resource "google_compute_instance" "cluster" {
  name = "cluster"
  zone = "europe-north1-a"
  machine_type = "f1-micro"
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-9"
    }
  }
  metadata = {
    ssh-keys = "essh:${file("./node-cluster.pub")}"
  }
  network_interface {
    network = "default"
    access_config {
      nat_ip = "${google_compute_address.static-ip-address.address}"
    }
  }
}
resource "null_resource" "cluster" {
  triggers = {
    cluster_instance_ids = "${join(",", google_compute_instance.cluster.*.id)}"
  }
  connection {
    host = "${google_compute_address.static-ip-address.address}"
    type = "ssh"
    user = "essh"
    timeout = "2m"
    private_key = "${file("~/node-cluster/node-cluster")}"
    # agent = "false"
  }
  provisioner "file" {
    source = "client.js"
    destination = "~/client.js"
  }
  provisioner "remote-exec" {
    inline = [
      "cd ~ && echo 1 > test.txt"
    ]
  }
}
essh@kubernetes-master:~/node-cluster$ sudo ./terraform apply
google_compute_address.static-ip-address: Creating …
google_compute_address.static-ip-address: Creation complete after 5s [id = node-cluster-243923 / europe-north1 / static-ip-address]
google_compute_instance.cluster: Creating …
google_compute_instance.cluster: Still creating … [10s elapsed]
google_compute_instance.cluster: Creation complete after 12s [id = cluster]
null_resource.cluster: Creating …
null_resource.cluster: Provisioning with 'file' …
null_resource.cluster: Provisioning with 'remote-exec' …
null_resource.cluster (remote-exec): Connecting to remote host via SSH …
null_resource.cluster (remote-exec): Host: 35.228.82.222
null_resource.cluster (remote-exec): User: essh
null_resource.cluster (remote-exec): Password: false
null_resource.cluster (remote-exec): Private key: true
null_resource.cluster (remote-exec): Certificate: false
null_resource.cluster (remote-exec): SSH Agent: false
null_resource.cluster (remote-exec): Checking Host Key: false
null_resource.cluster (remote-exec): Connected!
null_resource.cluster: Creation complete after 7s [id = 816586071607403364]
Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
esschtolts @ cluster: ~ $ ls / home / essh /
client.js test.txt
[sudo] password for essh:
google_compute_address.static-ip-address: Refreshing state … [id = node-cluster-243923 / europe-north1 / static-ip-address]
google_compute_instance.cluster: Refreshing state … [id = cluster]
null_resource.cluster: Refreshing state … [id = 816586071607403364]
Enter a value: yes
null_resource.cluster: Destroying … [id = 816586071607403364]
null_resource.cluster: Destruction complete after 0s
google_compute_instance.cluster: Destroying … [id = cluster]
google_compute_instance.cluster: Still destroying … [id = cluster, 10s elapsed]
google_compute_instance.cluster: Still destroying … [id = cluster, 20s elapsed]
google_compute_instance.cluster: Destruction complete after 27s
google_compute_address.static-ip-address: Destroying … [id = node-cluster-243923 / europe-north1 / static-ip-address]
google_compute_address.static-ip-address: Destruction complete after 8s
To deploy the entire project, you can add it to a repository, upload it to the virtual machine by copying the installation script onto it, and then launch the script.
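A minimal sketch of such a step, following the same pattern as the null_resource above; the script name install.sh and the repository address are my assumptions, not part of the original text:
resource "null_resource" "deploy" {
  connection {
    host = "${google_compute_address.static-ip-address.address}"
    type = "ssh"
    user = "essh"
    private_key = "${file("~/node-cluster/node-cluster")}"
  }
  # copy the installation script to the virtual machine
  provisioner "file" {
    source = "install.sh"
    destination = "~/install.sh"
  }
  # clone the project and run the script on the machine
  provisioner "remote-exec" {
    inline = [
      "git clone https://github.com/user/project.git ~/project",
      "chmod +x ~/install.sh && ~/install.sh"
    ]
  }
}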
Moving on to Kubernetes
In the minimal version, creating a cluster of three nodes looks like this:
essh@kubernetes-master:~/node-cluster/Kubernetes$ cat main.tf
provider "google" {
  credentials = "${file("../kubernetes_key.json")}"
  project = "node-cluster-243923"
  region = "europe-north1"
}
resource "google_container_cluster" "node-ks" {
  name = "node-ks"
  location = "europe-north1-a"
  initial_node_count = 3
}
essh@kubernetes-master:~/node-cluster/Kubernetes$ sudo ../terraform init
essh@kubernetes-master:~/node-cluster/Kubernetes$ sudo ../terraform apply
The cluster was created in 2:15. After I added two additional zones, europe-north1-b and europe-north1-c, to europe-north1-a and set the number of instances created in each zone to one, the cluster was created in 3:13, because for higher availability the nodes were created in different data centers: europe-north1-a, europe-north1-b, europe-north1-c:
provider "google" {
credentials = "$ {file (" ../ kubernetes_key.json ")}"
project = "node-cluster-243923"
region = "europe-north1"
}
resource "google_container_cluster" "node-ks" {
name = "node-ks"
location = "europe-north1-a"
node_locations = ["europe-north1-b", "europe-north1-c"]
initial_node_count = 1
}
Now let's split our cluster into two node pools: a control pool with Kubernetes and a pool for our PODs. All of them will be distributed over the three data centers. The pool for our PODs can autoscale under load up to 2 nodes in each zone (from three to six in total):
essh@kubernetes-master:~/node-cluster/Kubernetes$ cat main.tf
provider "google" {
  credentials = "${file("../kubernetes_key.json")}"
  project = "node-cluster-243923"
  region = "europe-north1"
}
resource "google_container_cluster" "node-ks" {
  name = "node-ks"
  location = "europe-north1-a"
  node_locations = ["europe-north1-b", "europe-north1-c"]
  initial_node_count = 1
}
resource "google_container_node_pool" "node-ks-pool" {
  name = "node-ks-pool"
  cluster = "${google_container_cluster.node-ks.name}"
  location = "europe-north1-a"
  node_count = "1"
  node_config {
    machine_type = "n1-standard-1"
  }
  autoscaling {
    min_node_count = 1
    max_node_count = 2
  }
}
Let's see what happened and look for the IP address of the cluster entry point:
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ gcloud container clusters list
NAME LOCATION MASTER_VERSION MASTER_IP MACHINE_TYPE NODE_VERSION NUM_NODES STATUS
node-ks europe-north1-a 1.12.8-gke.6 35.228.20.35 n1-standard-1 1.12.8-gke.6 6 RECONCILING
essh@kubernetes-master:~/node-cluster/Kubernetes$ gcloud container clusters describe node-ks | grep '^endpoint'
endpoint: 35.228.20.35
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ ping 35.228.20.35 -c 2
PING 35.228.20.35 (35.228.20.35) 56 (84) bytes of data.
64 bytes from 35.228.20.35: icmp_seq = 1 ttl = 59 time = 8.33 ms
64 bytes from 35.228.20.35: icmp_seq = 2 ttl = 59 time = 7.09 ms
–– 35.228.20.35 ping statistics –
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min / avg / max / mdev = 7.094 / 7.714 / 8.334 / 0.620 ms
By adding variables, which I put into a separate file just for clarity, we parameterize our config for different uses and can use it, for example, to create test and production clusters. A variable named name_value is referenced as var.name_value and is inserted into strings similarly to JS: ${var.name_value}; path.root is also available.
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ cat variables.tf
variable "region" {
default = "europe-north1"
}
variable "project_name" {
type = string
default = ""
}
variable "gce_key" {
default = "./kubernetes_key.json"
}
variable "node_count_zone" {
default = 1
}
They can be passed through the -var switch, for example: sudo ./terraform apply -var="project_name=node-cluster-243923".
essh@kubernetes-master:~/node-cluster/Kubernetes$ cp ../kubernetes_key.json .
essh@kubernetes-master:~/node-cluster/Kubernetes$ sudo ../terraform apply -var="project_name=node-cluster-243923"
Our project in the folder is not only a project, but also a module ready to use:
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ cd ..
essh @ kubernetes-master: ~ / node-cluster $ cat main.tf
module "Kubernetes" {
source = "./Kubernetes"
project_name = "node-cluster-243923"
}
essh @ kubernetes-master: ~ / node-cluster $ sudo ./terraform apply
Or upload to the public repository:
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ git init
Initialized empty GIT repository in /home/essh/node-cluster/Kubernetes/.git/
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ echo "terraform.tfstate" >> .gitignore
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ echo "terraform.tfstate.backup" >> .gitignore
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ echo ".terraform /" >> .gitignore
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ rm -f kubernetes_key.json
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ git remote add origin https://github.com/ESSch/terraform-google-kubernetes.git
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ git add.
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ git commit -m 'create a k8s Terraform module'
[master (root-commit) 4f73c64] create a Kubernetes Terraform module
3 files changed, 48 insertions (+)
create mode 100644 .gitignore
create mode 100644 main.tf
create mode 100644 variables.tf
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ git push -u origin master
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ git tag -a v0.0.2 -m 'publish'
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ git push origin v0.0.2
After publishing in the module registry https://registry.terraform.io/, having met the requirements such as having a description, we can use it:
essh @ kubernetes-master: ~ / node-cluster $ cat main.tf
module "kubernetes" {
# source = "./Kubernetes"
source = "ESSch / kubernetes / google"
version = "0.0.2"
project_name = "node-cluster-243923"
}
essh @ kubernetes-master: ~ / node-cluster $ sudo ./terraform init
essh @ kubernetes-master: ~ / node-cluster $ sudo ./terraform apply
On the next creation of the cluster, I got the error ZONE_RESOURCE_POOL_EXHAUSTED "does not have enough resources available to fulfill the request. Try a different zone, or try again later", indicating that there are no servers of the required type in this region. For me this is not a problem and I do not need to edit the module's code, because I parameterized the module with the region: if I simply pass region = "europe-west2" to the module as a parameter, then after the initialization command ./terraform init and the apply command ./terraform apply Terraform will transfer my cluster to the specified region.
Let's improve our module a little by moving the provider from the Kubernetes child module to the main module (the main script is also a module). Having moved it to the main module, we will be able to use one more module; otherwise the provider in one module would conflict with the provider in another. Inheritance from the main module to child modules works only for providers and is transparent for them. The rest of the data is transferred from child to parent through output variables, and from parent to child by parameterizing the child – but that comes later, when we create another module. Also, moving the provider to the parent module will be useful for the next module that we will create, since it will create Kubernetes elements that do not depend on the provider; thus we can untie the Google provider from our module and it can be used with other providers supporting Kubernetes. Now we don't need to pass the project name in a variable – it is set in the provider. For the convenience of development, I will use the local connection of the module for now. I created a folder and file for the new module:
essh@kubernetes-master:~/node-cluster$ ls nodejs/
main.tf
essh@kubernetes-master:~/node-cluster$ cat main.tf
// module "kubernetes" {
//   source = "ESSch/kubernetes/google"
//   version = "0.0.2"
//
//   project_name = "node-cluster-243923"
//   region = "europe-west2"
// }
provider "google" {
  credentials = "${file("./kubernetes_key.json")}"
  project = "node-cluster-243923"
  region = "europe-west2"
}
module "Kubernetes" {
  source = "./Kubernetes"
  project_name = "node-cluster-243923"
  region = "europe-west2"
}
module "nodejs" {
  source = "./nodejs"
}
essh @ kubernetes-master: ~ / node-cluster $ sudo ./terraform init
essh @ kubernetes-master: ~ / node-cluster $ sudo ./terraform apply
Now let's transfer data from the Kubernetes infrastructure module to the application module:
essh @ kubernetes-master: ~ / node-cluster $ cat Kubernetes / outputs.tf
output "endpoint" {
value = google_container_cluster.node-ks.endpoint
sensitive = true
}
output "name" {
value = google_container_cluster.node-ks.name
sensitive = true
}
output "cluster_ca_certificate" {
value = base64decode(google_container_cluster.node-ks.master_auth.0.cluster_ca_certificate)
}
essh @ kubernetes-master: ~ / node-cluster $ cat main.tf
// module "kubernetes" {
// source = "ESSch / kubernetes / google"
// version = "0.0.2"
//
// project_name = "node-cluster-243923"
// region = "europe-west2"
//}
provider "google" {
credentials = file ("./ kubernetes_key.json")
project = "node-cluster-243923"
region = "europe-west2"
}
module "Kubernetes" {
source = "./Kubernetes"
project_name = "node-cluster-243923"
region = "europe-west2"
}
module "nodejs" {
source = "./nodejs"
endpoint = module.Kubernetes.endpoint
cluster_ca_certificate = module.Kubernetes.cluster_ca_certificate
}
essh @ kubernetes-master: ~ / node-cluster $ cat nodejs / variable.tf
variable "endpoint" {}
variable "cluster_ca_certificate" {}
To check the balancing of traffic across all nodes, let's start NGINX, replacing its standard page with the hostname. We'll replace it with a simple command call and restart the server. To start the server, let's look at its call in the Dockerfile: CMD ["nginx", "-g", "daemon off;"], which is equivalent to calling nginx -g 'daemon off;' at the command line. As you can see, the Dockerfile does not use BASH as the launch environment, but starts the server itself: this way, if the server process crashes, the container crashes with it and is re-created, whereas under a shell the shell would keep living after a process crash, and the container would neither crash nor be re-created. But for our experiments, BASH is fine:
essh @ kubernetes-master: ~ / node-cluster $ sudo docker run -it nginx:1.17.0 which nginx
/usr/sbin/nginx
sudo docker run -it --rm -p 8333:80 nginx:1.17.0 /bin/bash -c "echo \$HOSTNAME > /usr/share/nginx/html/index2.html && /usr/sbin/nginx -g 'daemon off;'"
Now let's create our PODs in triplicate with NGINX, which Kubernetes will try to distribute to different servers by default. Let's also add the service as a balancer:
essh @ kubernetes-master: ~ / node-cluster $ cat nodejs / main.tf
terraform {
required_version = "> = 0.12.0"
}
data "google_client_config" "default" {}
provider "kubernetes" {
host = var.endpoint
token = data.google_client_config.default.access_token
cluster_ca_certificate = var.cluster_ca_certificate
load_config_file = false
}
essh @ kubernetes-master: ~ / node-cluster $ cat nodejs / main.tf
resource "kubernetes_deployment" "nodejs" {
metadata {
name = "terraform-nodejs"
labels = {
app = "NodeJS"
}
}
spec {
replicas = 3
selector {
match_labels = {
app = "NodeJS"
}
}
template {
metadata {
labels = {
app = "NodeJS"
}
}
spec {
container {
image = "Nginx: 1.17.0"
name = "node-js"
command = ["/ bin / bash"]
args = ["-c", "echo $ HOSTNAME> /usr/share/Nginx/html/index.html && / usr / sbin / Nginx -g 'daemon off;'"]
}
}
}
}
}
resource "kubernetes_service" "nodejs" {
metadata {
name = "terraform-nodejs"
}
spec {
selector = {
app = kubernetes_deployment.nodejs.metadata.0.labels.app
}
port {
port = 80
target_port = var.target_port
}
type = "LoadBalancer"
}
}
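Note that the service above references var.target_port, which is not among the variables in nodejs/variable.tf shown earlier; a declaration along these lines would also be needed (the default of 80, matching the NGINX port, is an assumption):
variable "target_port" {
  description = "container port that the service forwards traffic to"
  default     = 80
}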
Let's check the work using kubectl; for this, we transfer the credentials from gcloud to kubectl.
essh @ kubernetes-master: ~ / node-cluster $ sudo ./terraform apply
essh @ kubernetes-master: ~ / node-cluster $ gcloud container clusters get-credentials node-ks --region europe-west2-a
Fetching cluster endpoint and auth data.
kubeconfig entry generated for node-ks.
essh @ kubernetes-master: ~ / node-cluster $ kubectl get deployments -o wide
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
terraform-nodejs 3 3 3 3 25m node-js Nginx: 1.17.0 app = NodeJS
essh @ kubernetes-master: ~ / node-cluster $ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
terraform-nodejs-6bd565dc6c-8768b 1/1 Running 0 4m45s 10.4.3.15 gke-node-ks-node-ks-pool-07115c5b-bw15 <none>
terraform-nodejs-6bd565dc6c-hr5vg 1/1 Running 0 4m42s 10.4.5.13 gke-node-ks-node-ks-pool-27e2e52c-9q5b <none>
terraform-nodejs-6bd565dc6c-mm7lh 1/1 Running 0 4m43s 10.4.2.6 gke-node-ks-default-pool-2dc50760-757p <none>
esschtolts @ gke-node-ks-node-ks-pool-07115c5b-bw15 ~ $ docker ps | grep node-js_terraform
152e3c0ed940 719cd2e3ed04
"/ bin / bash -c 'ech …" 8 minutes ago Up 8 minutes
Kubernetes_node-js_terraform-nodejs-6bd565dc6c-8768b_default_7a87ae4a-9379-11e9-a78e-42010a9a0114_0
esschtolts @ gke-node-ks-node-ks-pool-07115c5b-bw15 ~ $ docker exec -it 152e3c0ed940 cat /usr/share/Nginx/html/index.html
terraform-nodejs-6bd565dc6c-8768b
esschtolts @ gke-node-ks-node-ks-pool-27e2e52c-9q5b ~ $ docker exec -it c282135be446 cat /usr/share/Nginx/html/index.html
terraform-nodejs-6bd565dc6c-hr5vg
esschtolts @ gke-node-ks-default-pool-2dc50760-757p ~ $ docker exec -it 8d1cf9ef44e6 cat /usr/share/Nginx/html/index.html
terraform-nodejs-6bd565dc6c-mm7lh
esschtolts @ gke-node-ks-node-ks-pool-07115c5b-bw15 ~ $ curl 10.4.2.6
terraform-nodejs-6bd565dc6c-mm7lh
esschtolts @ gke-node-ks-node-ks-pool-07115c5b-bw15 ~ $ curl 10.4.5.13
terraform-nodejs-6bd565dc6c-hr5vg
esschtolts @ gke-node-ks-node-ks-pool-07115c5b-bw15 ~ $ curl 10.4.3.15
terraform-nodejs-6bd565dc6c-8768b
The balancer load-balances between the PODs that are filtered by matching their selector in the meta information with the selector specified in the balancer's description in the spec section. All nodes are connected to one common network, so you can connect to any node (I did this via SSH of the GCP WEB interface in the section with Compute Engine virtual machines). You can address both the IP address of the container or its node host and the host of the terraform-nodejs service: from inside a container, curl terraform-nodejs:80 works, because the record is created by the internal DNS from the name of the service. You can view the external IP address EXTERNAL-IP both with kubectl on the service and in the web interface: GCP -> Kubernetes Engine -> Services:
essh @ kubernetes-master: ~ / node-cluster $ kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.7.240.1 <none> 443/TCP 6h58m
terraform-nodejs LoadBalancer 10.7.246.234 35.197.220.103 80:32085/TCP 5m27s
esschtolts @ gke-node-ks-node-ks-pool-07115c5b-bw15 ~ $ curl 10.7.246.234
terraform-nodejs-6bd565dc6c-mm7lh
esschtolts @ gke-node-ks-node-ks-pool-07115c5b-bw15 ~ $ curl 10.7.246.234
terraform-nodejs-6bd565dc6c-mm7lh
esschtolts @ gke-node-ks-node-ks-pool-07115c5b-bw15 ~ $ curl 10.7.246.234
terraform-nodejs-6bd565dc6c-hr5vg
esschtolts @ gke-node-ks-node-ks-pool-07115c5b-bw15 ~ $ curl 10.7.246.234
terraform-nodejs-6bd565dc6c-hr5vg
esschtolts @ gke-node-ks-node-ks-pool-07115c5b-bw15 ~ $ curl 10.7.246.234
terraform-nodejs-6bd565dc6c-8768b
esschtolts @ gke-node-ks-node-ks-pool-07115c5b-bw15 ~ $ curl 10.7.246.234
terraform-nodejs-6bd565dc6c-mm7lh
essh @ kubernetes-master: ~ / node-cluster $ curl 35.197.220.103
terraform-nodejs-6bd565dc6c-mm7lh
essh @ kubernetes-master: ~ / node-cluster $ curl 35.197.220.103
terraform-nodejs-6bd565dc6c-mm7lh
essh @ kubernetes-master: ~ / node-cluster $ curl 35.197.220.103
terraform-nodejs-6bd565dc6c-8768b
essh @ kubernetes-master: ~ / node-cluster $ curl 35.197.220.103
terraform-nodejs-6bd565dc6c-hr5vg
essh @ kubernetes-master: ~ / node-cluster $ curl 35.197.220.103
terraform-nodejs-6bd565dc6c-8768b
essh @ kubernetes-master: ~ / node-cluster $ curl 35.197.220.103
terraform-nodejs-6bd565dc6c-mm7lh
Now let's move on to implementing the NodeJS server:
essh @ kubernetes-master: ~ / node-cluster $ sudo ./terraform destroy
essh @ kubernetes-master: ~ / node-cluster $ sudo ./terraform apply
essh @ kubernetes-master: ~ / node-cluster $ sudo docker run -it --rm node:12 which node
/usr/local/bin/node
sudo docker run -it --rm -p 8222:80 node:12 /bin/bash -c 'cd /usr/src/ && git clone https://github.com/fhinkel/nodejs-hello-world.git &&
/usr/local/bin/node /usr/src/nodejs-hello-world/index.js'
firefox http://localhost:8222
Let's replace the block in our container with:
container {
image = "node:12"
name = "node-js"
command = ["/bin/bash"]
args = [
"-c",
"cd /usr/src/ && git clone https://github.com/fhinkel/nodejs-hello-world.git && /usr/local/bin/node /usr/src/nodejs-hello-world/index.js"
]
}
If you comment out a module with Kubernetes resources while they remain in the Terraform state, you have to remove the leftovers from the state yourself:
essh @ kubernetes-master: ~ / node-cluster $ ./terraform apply
Error: Provider configuration not present
essh @ kubernetes-master: ~ / node-cluster $ ./terraform state list
data.google_client_config.default
module.Kubernetes.google_container_cluster.node-ks
module.Kubernetes.google_container_node_pool.node-ks-pool
module.nodejs.kubernetes_deployment.nodejs
module.nodejs.kubernetes_service.nodejs
essh @ kubernetes-master: ~ / node-cluster $ ./terraform state rm module.nodejs.kubernetes_deployment.nodejs
Removed module.nodejs.kubernetes_deployment.nodejs
Successfully removed 1 resource instance (s).
essh @ kubernetes-master: ~ / node-cluster $ ./terraform state rm module.nodejs.kubernetes_service.nodejs
Removed module.nodejs.kubernetes_service.nodejs
Successfully removed 1 resource instance (s).
essh @ kubernetes-master: ~ / node-cluster $ ./terraform apply
module.Kubernetes.google_container_cluster.node-ks: Refreshing state … [id = node-ks]
module.Kubernetes.google_container_node_pool.node-ks-pool: Refreshing state … [id = europe-west2-a / node-ks / node-ks-pool]
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
Terraform Cluster Reliability and Automation
For a general overview of automation, see https://codelabs.developers.google.com/codelabs/cloud-builder-gke-continuous-deploy/index.html#0. We will dwell on it in more detail. Now, if we run ./terraform destroy and try to recreate the entire infrastructure from the beginning, we will get errors. They occur because the order in which services are created is not specified, and Terraform by default sends requests to the API in 10 parallel threads (this can be changed with the -parallelism switch during apply or destroy). As a result, Terraform tries to create Kubernetes services (Deployment and Service) on servers (the node pool) that do not exist yet; the same happens when creating a Service that has to proxy a Deployment that has not been created yet. By telling Terraform to issue API requests in a single thread, ./terraform apply -parallelism=1, we reduce possible provider-side restrictions on the frequency of API calls, but we do not solve the problem of the missing creation order. We will not comment out dependent blocks and gradually uncomment them while running ./terraform apply, nor will we build our system piece by piece by specifying individual blocks with ./terraform apply -target=module.nodejs.kubernetes_deployment.nodejs. Instead, we will express the dependencies in the code through variable initialization: the first one is already defined as the external variable var.endpoint, and the second one we will create locally:
locals {
app = kubernetes_deployment.nodejs.metadata.0.labels.app
}
Now we can add the dependencies depends_on = [var.endpoint] and depends_on = [kubernetes_deployment.nodejs] to the code.
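For example, the service can take its selector from the local and state the dependency on the Deployment explicitly; a sketch (the remaining fields as in the resource shown earlier):
resource "kubernetes_service" "nodejs" {
  depends_on = [kubernetes_deployment.nodejs]
  metadata {
    name = "terraform-nodejs"
  }
  spec {
    selector = {
      app = local.app
    }
    port {
      port        = 80
      target_port = var.target_port
    }
    type = "LoadBalancer"
  }
}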
A service-unavailability error may also appear: Error: Get https://35.197.228.3/api/v1…: dial tcp 35.197.228.3:443: connect: connection refused. It means that the connection timeout (6 minutes by default) has been exceeded; in this case you can simply try again.
Now let's move on to solving the problem of the reliability of the container, the main process of which we run in the command shell. The first thing we will do is separate the creation of the application from the launch of the container. To do this, you need to transfer the entire process of creating a service into the process of creating an image, which can be tested, and by which you can create a service container. So let's create an image:
essh @ kubernetes-master: ~ / node-cluster $ cat app / server.js
const http = require('http');
const server = http.createServer(function (request, response) {
response.writeHead(200, {"Content-Type": "text/plain"});
response.end(`Nodejs_cluster is working! My host is ${process.env.HOSTNAME}`);
});
server.listen(80);
essh @ kubernetes-master: ~ / node-cluster $ cat Dockerfile
FROM node:12
WORKDIR /usr/src/
ADD ./app /usr/src/
RUN npm install
EXPOSE 3000
ENTRYPOINT ["node", "server.js"]
essh @ kubernetes-master: ~ / node-cluster $ sudo docker image build -t nodejs_cluster .
Sending build context to Docker daemon 257.4MB
Step 1/6: FROM node: 12
––> b074182f4154
Step 2/6: WORKDIR / usr / src /
––> Using cache
––> 06666b54afba
Step 3/6: ADD ./app / usr / src /
––> Using cache
––> 13fa01953b4a
Step 4/6: RUN npm install
––> Using cache
––> dd074632659c
Step 5/6: EXPOSE 3000
––> Using cache
––> ba3b7745b8e3
Step 6/6: ENTRYPOINT ["node", "server.js"]
––> Using cache
––> a957fa7a1efa
Successfully built a957fa7a1efa
Successfully tagged nodejs_cluster: latest
essh @ kubernetes-master: ~ / node-cluster $ sudo docker images | grep nodejs_cluster
nodejs_cluster latest a957fa7a1efa 26 minutes ago 906MB
Now let's put our image in the GCP registry rather than Docker Hub, because this immediately gives us a private repository to which our services automatically have access:
essh @ kubernetes-master: ~ / node-cluster $ IMAGE_ID="nodejs_cluster"
essh @ kubernetes-master: ~ / node-cluster $ sudo docker tag $IMAGE_ID:latest gcr.io/$PROJECT_ID/$IMAGE_ID:latest
essh @ kubernetes-master: ~ / node-cluster $ sudo docker images | grep nodejs_cluster
nodejs_cluster latest a957fa7a1efa 26 minutes ago 906MB
gcr.io/node-cluster-243923/nodejs_cluster latest a957fa7a1efa 26 minutes ago 906MB
essh @ kubernetes-master: ~ / node-cluster $ gcloud auth configure-docker
gcloud credential helpers already registered correctly.
essh @ kubernetes-master: ~ / node-cluster $ docker push gcr.io/$PROJECT_ID/$IMAGE_ID:latest
The push refers to repository [gcr.io/node-cluster-243923/nodejs_cluster]
194f3d074f36: Pushed
b91e71cc9778: Pushed
640fdb25c9d7: Layer already exists
b0b300677afe: Layer already exists
5667af297e60: Layer already exists
84d0c4b192e8: Layer already exists
a637c551a0da: Layer already exists
2c8d31157b81: Layer already exists
7b76d801397d: Layer already exists
f32868cde90b: Layer already exists
0db06dff9d9a: Layer already exists
latest: digest: sha256: 912938003a93c53b7c8f806cded3f9bffae7b5553b9350c75791ff7acd1dad0b size: 2629
essh @ kubernetes-master: ~ / node-cluster $ gcloud container images list
NAME
gcr.io/node-cluster-243923/nodejs_cluster
Only listing images in gcr.io/node-cluster-243923. Use --repository to list images in other repositories.
Now we can see it in the GCP admin panel: Container Registry -> Images. Let's replace the code of our container with the code of our image. For production, the launched image must be versioned in order to avoid its automatic update during system re-creation of PODs, for example when a POD is transferred from one node to another while the machine with our node is taken down for maintenance. For development it is better to use the latest tag, which will update the service when the image is updated. When you update the service, you need to recreate it, that is, delete and recreate it, since otherwise Terraform will simply update the parameters rather than recreate the container with the new image. Also, if we update the image and mark the service as modified with the command ./terraform taint ${NAME_SERVICE}, our service will simply be updated, which can be seen with ./terraform plan. Therefore, to update, for now, you need to use the commands ./terraform destroy -target=${NAME_SERVICE} and ./terraform apply, and the names of the services can be found in ./terraform state list:
essh @ kubernetes-master: ~ / node-cluster $ ./terraform state list
data.google_client_config.default
module.kubernetes.google_container_cluster.node-ks
module.kubernetes.google_container_node_pool.node-ks-pool
module.Nginx.kubernetes_deployment.nodejs
module.Nginx.kubernetes_service.nodejs
essh @ kubernetes-master: ~ / node-cluster $ ./terraform destroy -target=module.nodejs.kubernetes_deployment.nodejs
essh @ kubernetes-master: ~ / node-cluster $ ./terraform apply
Now let's replace the code of our container:
container {
image = "gcr.io/node-cluster-243923/nodejs_cluster:latest"
name = "node-js"
}
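For production, per the note above, the same block would pin an immutable version tag instead of latest, so that POD re-creation does not silently pull a newer build; a sketch (the tag value here is hypothetical), while for development we continue with latest:
container {
  # the tag v0.0.1 below is only an example of pinning a version
  image = "gcr.io/node-cluster-243923/nodejs_cluster:v0.0.1"
  name  = "node-js"
}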
Let's check the result of balancing across different nodes (there is no line break at the end of the response, so the next prompt is glued to the output):
essh @ kubernetes-master: ~ / node-cluster $ curl http://35.246.85.138:80
Nodejs_cluster is working! My host is terraform-nodejs-997fd5c9c-lqg48essh @ kubernetes-master: ~ / node-cluster $ curl http://35.246.85.138:80
Nodejs_cluster is working! My host is terraform-nodejs-997fd5c9c-lhgsnessh @ kubernetes-master: ~ / node-cluster $ curl http://35.246.85.138:80
Nodejs_cluster is working! My host is terraform-nodejs-997fd5c9c-lhgsnessh @ kubernetes-master: ~ / node-cluster $ curl http://35.246.85.138:80
Nodejs_cluster is working! My host is terraform-nodejs-997fd5c9c-lhgsnessh @ kubernetes-master: ~ / node-cluster $ curl http://35.246.85.138:80
Nodejs_cluster is working! My host is terraform-nodejs-997fd5c9c-lhgsnessh @ kubernetes-master: ~ / node-cluster $ curl http://35.246.85.138:80
Nodejs_cluster is working! My host is terraform-nodejs-997fd5c9c-lhgsnessh @ kubernetes-master: ~ / node-cluster $ curl http://35.246.85.138:80
Nodejs_cluster is working! My host is terraform-nodejs-997fd5c9c-lqg48essh @ kubernetes-master: ~ / node-cluster $ curl http://35.246.85.138:80
Nodejs_cluster is working! My host is terraform-nodejs-997fd5c9c-lhgsnessh @ kubernetes-master: ~ / node-cluster $ curl http://35.246.85.138:80
Nodejs_cluster is working! My host is terraform-nodejs-997fd5c9c-lhgsnessh @ kubernetes-master: ~ / node-cluster $ curl http://35.246.85.138:80
Nodejs_cluster is working! My host is terraform-nodejs-997fd5c9c-lhgsnessh @ kubernetes-master: ~ / node-cluster $ curl http://35.246.85.138:80
Nodejs_cluster is working! My host is terraform-nodejs-997fd5c9c-lqg48essh @ kubernetes-master: ~ / node-cluster $
We will automate the process of creating images, for this we will use the Google Cloud Build service (free for 5 users and traffic up to 50GB) to create a new image when creating a new version (tag) in the Cloud Source Repositories repository (free on Google Cloud Platform Free Tier). Google Cloud Platform -> Menu -> Tools -> Cloud Build -> Triggers -> Enable Cloud Build API -> Get Started -> Create a repository that will be available on Google Cloud Platform -> Menu -> Tools -> Source Code Repositories (Cloud Source Repositories):
essh @ kubernetes-master: ~ / node-cluster $ cd app /
essh @ kubernetes-master: ~ / node-cluster / app $ ls
server.js
essh @ kubernetes-master: ~ / node-cluster / app $ mv ./server.js ../
essh @ kubernetes-master: ~ / node-cluster / app $ gcloud source repos clone nodejs --project=node-cluster-243923
Cloning into '/home/essh/node-cluster/app/nodejs'…
warning: You appear to have cloned an empty repository.
Project [node-cluster-243923] repository [nodejs] was cloned to [/home/essh/node-cluster/app/nodejs].
essh @ kubernetes-master: ~ / node-cluster / app $ ls -a
. .. nodejs
essh @ kubernetes-master: ~ / node-cluster / app $ ls nodejs /
essh @ kubernetes-master: ~ / node-cluster / app $ ls -a nodejs /
. .. .git
essh @ kubernetes-master: ~ / node-cluster / app $ cd nodejs /
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ mv ../../server.js .
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git add server.js
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git commit -m 'test server'
[master (root-commit) 46dd957] test server
1 file changed, 7 insertions (+)
create mode 100644 server.js
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git push -u origin master
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 408 bytes | 408.00 KiB / s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://source.developers.google.com/p/node-cluster-243923/r/nodejs
* [new branch] master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.
Now it's time to set up image creation when a new version of the product is created: go to GCP -> Cloud Build -> Triggers -> Create trigger -> Google Cloud source code repository -> NodeJS. Choose the tag trigger type so that the image is not created on ordinary commits. I will change the name of the image from gcr.io/node-cluster-243923/nodejs:$SHORT_SHA to gcr.io/node-cluster-243923/nodejs:$TAG_NAME and the timeout to 60 seconds. Now I'll commit and add a tag:
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ cp ../../Dockerfile .
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git add Dockerfile
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git commit -m 'add Dockerfile'
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git remote -v
origin https://source.developers.google.com/p/node-cluster-243923/r/nodejs (fetch)
origin https://source.developers.google.com/p/node-cluster-243923/r/nodejs (push)
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git push origin master
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 380 bytes | 380.00 KiB / s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://source.developers.google.com/p/node-cluster-243923/r/nodejs
46dd957..b86c01d master -> master
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git tag
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git tag -a v0.0.1 -m 'test to run'
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git push origin v0.0.1
Counting objects: 1, done.
Writing objects: 100% (1/1), 161 bytes | 161.00 KiB / s, done.
Total 1 (delta 0), reused 0 (delta 0)
To https://source.developers.google.com/p/node-cluster-243923/r/nodejs
* [new tag] v0.0.1 -> v0.0.1
Now, if we press the start trigger button, we will see the image in the Container Registry with our tag:
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ gcloud container images list
NAME
gcr.io/node-cluster-243923/nodejs
gcr.io/node-cluster-243923/nodejs_cluster
Only listing images in gcr.io/node-cluster-243923. Use --repository to list images in other repositories.
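The same trigger can also be described in Terraform instead of clicking it together in the console; a sketch using the google provider's google_cloudbuild_trigger resource (the tag filter, the step arguments and the timeout field are assumptions mirroring the settings above, and the timeout attribute depends on the provider version):
resource "google_cloudbuild_trigger" "nodejs" {
  description = "build the nodejs image on every tag"
  trigger_template {
    project_id = "node-cluster-243923"
    repo_name  = "nodejs"
    tag_name   = ".*" # fire on tags only, not on ordinary commits
  }
  build {
    timeout = "60s"
    step {
      name = "gcr.io/cloud-builders/docker"
      args = ["build", "-t", "gcr.io/node-cluster-243923/nodejs:$TAG_NAME", "."]
    }
    images = ["gcr.io/node-cluster-243923/nodejs:$TAG_NAME"]
  }
}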
Now if we just add the changes and the tag, the image will be created automatically:
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ sed -i 's/HOSTNAME\}/HOSTNAME\}\n/' server.js
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git add server.js
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git commit -m 'fix'
[master 230d67e] fix
1 file changed, 2 insertions (+), 1 deletion (-)
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git push origin master
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 304 bytes | 304.00 KiB / s, done.
Total 3 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1)
To https://source.developers.google.com/p/node-cluster-243923/r/nodejs
b86c01d..230d67e master -> master
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git tag -a v0.0.2 -m 'fix'
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git push origin v0.0.2
Counting objects: 1, done.
Writing objects: 100% (1/1), 158 bytes | 158.00 KiB / s, done.
Total 1 (delta 0), reused 0 (delta 0)
To https://source.developers.google.com/p/node-cluster-243923/r/nodejs
* [new tag] v0.0.2 -> v0.0.2
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ sleep 60
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ gcloud builds list
ID CREATE_TIME DURATION SOURCE IMAGES STATUS
2b024d7e-87a9-4d2a-980b-4e7c108c5fad 2019-06-22T17:13:14+00:00 28S nodejs@v0.0.2 gcr.io/node-cluster-243923/nodejs:v0.0.2 SUCCESS
6b4ae6ff-2f4a-481b-9f4e-219fafb5d572 2019-06-22T16:57:11+00:00 29S nodejs@v0.0.1 gcr.io/node-cluster-243923/nodejs:v0.0.1 SUCCESS
e50df082-31a4-463b-abb2-d0f72fbf62cb 2019-06-22T16:56:48+00:00 29S nodejs@v0.0.1 gcr.io/node-cluster-243923/nodejs:v0.0.1 SUCCESS
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git tag -a latest -m 'fix'
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ git push origin latest
Counting objects: 1, done.
Writing objects: 100% (1/1), 156 bytes | 156.00 KiB / s, done.
Total 1 (delta 0), reused 0 (delta 0)
To https://source.developers.google.com/p/node-cluster-243923/r/nodejs
* [new tag] latest -> latest
essh @ kubernetes-master: ~ / node-cluster / app / nodejs $ cd ../..
Creating multiple environments with Terraform clusters
When trying to create several clusters from the same configuration, we will encounter duplicate identifiers that must be unique, so we isolate the clusters from each other by placing them in different projects. To create a project manually, go to GCP -> Products -> IAM and administration -> Resource management, create a NodeJS-prod project, switch to it and wait for its activation. Let's look at the state of the current project:
essh @ kubernetes-master: ~ / node-cluster $ cat main.tf
provider "google" {
credentials = file("./kubernetes_key.json")
project = "node-cluster-243923"
region = "europe-west2"
}
module "kubernetes" {
source = "./Kubernetes"
}
data "google_client_config" "default" {}
module "Nginx" {
source = "./nodejs"
image = "gcr.io/node-cluster-243923/nodejs_cluster:latest"
endpoint = module.kubernetes.endpoint
access_token = data.google_client_config.default.access_token
cluster_ca_certificate = module.kubernetes.cluster_ca_certificate
}
essh @ kubernetes-master: ~ / node-cluster $ gcloud config list project
[core]
project = node-cluster-243923
Your active configuration is: [default]
essh @ kubernetes-master: ~ / node-cluster $ gcloud config set project node-cluster-243923
Updated property [core / project].
essh @ kubernetes-master: ~ / node-cluster $ gcloud compute instances list
NAME ZONE INTERNAL_IP EXTERNAL_IP STATUS
gke-node-ks-default-pool-2e5073d4-csmg europe-north1-a 10.166.0.2 35.228.96.97 RUNNING
gke-node-ks-node-ks-pool-ccbaf5c6-4xgc europe-north1-a 10.166.15.233 35.228.82.222 RUNNING
gke-node-ks-default-pool-72a6d4a3-ldzg europe-north1-b 10.166.15.231 35.228.143.7 RUNNING
gke-node-ks-node-ks-pool-9ee6a401-ngfn europe-north1-b 10.166.15.234 35.228.129.224 RUNNING
gke-node-ks-default-pool-d370036c-kbg6 europe-north1-c 10.166.15.232 35.228.117.98 RUNNING
gke-node-ks-node-ks-pool-d7b09e63-q8r2 europe-north1-c 10.166.15.235 35.228.85.157 RUNNING
Switch gcloud and look at an empty project:
essh @ kubernetes-master: ~ / node-cluster $ gcloud config set project node-cluster-prod-244519
Updated property [core / project].
essh @ kubernetes-master: ~ / node-cluster $ gcloud config list project
[core]
project = node-cluster-prod-244519
Your active configuration is: [default]
essh @ kubernetes-master: ~ / node-cluster $ gcloud compute instances list
Listed 0 items.
The previous time, for node-cluster-243923, we created a service account on behalf of which the cluster was created. To work with multiple accounts from Terraform, we will create a service account for the new project through IAM and Administration -> Service Accounts. We will need to make two separate folders in order to run Terraform separately, so as to separate the SSH connections that have different authorization keys. If we placed both providers with different keys in one configuration, we would get a successful connection for the first project, but when Terraform proceeded to create the cluster for the second project, it would be rejected because the first project's key is not valid for the second. There is another possibility: activate the account as an organization account (you need a website and an email address, which Google verifies); then it becomes possible to create projects from code without using the admin panel. Let's set up the dev environment:
essh @ kubernetes-master: ~ / node-cluster $ ./terraform destroy
essh @ kubernetes-master: ~ / node-cluster $ mkdir dev
essh @ kubernetes-master: ~ / node-cluster $ cd dev /
essh @ kubernetes-master: ~ / node-cluster / dev $ gcloud config set project node-cluster-243923
Updated property [core / project].
essh @ kubernetes-master: ~ / node-cluster / dev $ gcloud config list project
[core]
project = node-cluster-243923
Your active configuration is: [default]
essh @ kubernetes-master: ~ / node-cluster / dev $ cp ../kubernetes_key.json ../main.tf .
essh @ kubernetes-master: ~ / node-cluster / dev $ cat main.tf
provider "google" {
alias = "dev"
credentials = file("./kubernetes_key.json")
project = "node-cluster-243923"
region = "europe-west2"
}
module "kubernetes_dev" {
source = "../Kubernetes"
node_pull = false
providers = {
google = google.dev
}
}
data "google_client_config" "default" {}
module "Nginx" {
source = "../nodejs"
providers = {
google = google.dev
}
image = "gcr.io/node-cluster-243923/nodejs_cluster:latest"
endpoint = module.kubernetes_dev.endpoint
access_token = data.google_client_config.default.access_token
cluster_ca_certificate = module.kubernetes_dev.cluster_ca_certificate
}
essh @ kubernetes-master: ~ / node-cluster / dev $ ../terraform init
essh @ kubernetes-master: ~ / node-cluster / dev $ ../terraform apply
essh @ kubernetes-master: ~ / node-cluster / dev $ gcloud compute instances list
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
gke-node-ks-default-pool-71afadb8-4t39 europe-north1-a n1-standard-1 10.166.0.60 35.228.96.97 RUNNING
gke-node-ks-node-ks-pool-134dada1-3cdf europe-north1-a n1-standard-1 10.166.0.61 35.228.117.98 RUNNING
gke-node-ks-node-ks-pool-134dada1-c476 europe-north1-a n1-standard-1 10.166.15.194 35.228.82.222 RUNNING
essh @ kubernetes-master: ~ / node-cluster / dev $ gcloud container clusters get-credentials node-ks
Fetching cluster endpoint and auth data.
kubeconfig entry generated for node-ks.
essh @ kubernetes-master: ~ / node-cluster / dev $ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
terraform-nodejs-6fd8498cb5-29dzx 1/1 Running 0 2m57s 10.12.3.2 gke-node-ks-node-ks-pool-134dada1-c476 <none>
terraform-nodejs-6fd8498cb5-jcbj6 0/1 Pending 0 2m58s <none> <none> <none>
terraform-nodejs-6fd8498cb5-lvfjf 1/1 Running 0 2m58s 10.12.1.3 gke-node-ks-node-ks-pool-134dada1-3cdf <none>
As you can see, the PODs were distributed across the pool of nodes, without landing on the node that hosts the Kubernetes system PODs, because there was no free capacity there. It is important to note that the number of nodes in the pool was increased automatically, and only the specified limit prevented a third node from being created in the pool. If we set remove_default_node_pool to true, the Kubernetes system PODs and our PODs end up in the same pool. Judging by the resource requests, the system Kubernetes PODs take up a little more than one core and our POD takes half a core, so the remaining PODs could not be created, but we saved on resources:
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ gcloud compute instances list
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
gke-node-ks-node-ks-pool-495b75fa-08q2 europe-north1-a n1-standard-1 10.166.0.57 35.228.117.98 RUNNING
gke-node-ks-node-ks-pool-495b75fa-wsf5 europe-north1-a n1-standard-1 10.166.0.59 35.228.96.97 RUNNING
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ gcloud container clusters get-credentials node-ks
Fetching cluster endpoint and auth data.
kubeconfig entry generated for node-ks.
essh @ kubernetes-master: ~ / node-cluster / Kubernetes $ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
terraform-nodejs-6fd8498cb5-97svs 1/1 Running 0 14m 10.12.2.2 gke-node-ks-node-ks-pool-495b75fa-wsf5 <none>
terraform-nodejs-6fd8498cb5-d9zkr 0/1 Pending 0 14m <none> <none> <none>
terraform-nodejs-6fd8498cb5-phk8x 0/1 Pending 0 14m <none> <none> <none>
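If we want the remaining PODs to fit on the merged pool, the container's resource request can be lowered explicitly; a sketch of such a block (the values are purely illustrative, and the nested-block syntax follows the kubernetes provider 1.x used here):
container {
  image = "gcr.io/node-cluster-243923/nodejs_cluster:latest"
  name  = "node-js"
  resources {
    requests {
      cpu    = "100m" # illustrative values
      memory = "128Mi"
    }
    limits {
      cpu    = "200m"
      memory = "256Mi"
    }
  }
}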
After creating a service account, add the key and check it:
essh @ kubernetes-master: ~ / node-cluster / dev $ gcloud auth login
essh @ kubernetes-master: ~ / node-cluster / dev $ gcloud projects create node-cluster-prod3
Create in progress for [https://cloudresourcemanager.googleapis.com/v1/projects/node-cluster-prod3].
Waiting for [operations/cp.7153345484959140898] to finish … done.
https://medium.com/@pnatraj/how-to-run-gcloud-command-line-using-a-service-account-f39043d515b9
essh @ kubernetes-master: ~ / node-cluster $ gcloud auth application-default login
essh @ kubernetes-master: ~ / node-cluster $ cp ~/Downloads/node-cluster-prod-244519-6fd863dd4d38.json ./kubernetes_prod.json
essh @ kubernetes-master: ~ / node-cluster $ echo "kubernetes_prod.json" >> .gitignore
essh @ kubernetes-master: ~ / node-cluster $ gcloud iam service-accounts list
NAME EMAIL DISABLED
Compute Engine default service account 1008874319751-compute@developer.gserviceaccount.com False
terraform-prod terraform-prod@node-cluster-prod-244519.iam.gserviceaccount.com False
essh @ kubernetes-master: ~ / node-cluster $ gcloud projects list | grep node-cluster
node-cluster-243923 node-cluster 26345118671
node-cluster-prod-244519 node-cluster-prod 1008874319751
Let's create a prod environment:
essh @ kubernetes-master: ~ / node-cluster $ mkdir prod
essh @ kubernetes-master: ~ / node-cluster $ cd prod /
essh @ kubernetes-master: ~ / node-cluster / prod $ cp ../main.tf ../kubernetes_prod_key.json .
essh @ kubernetes-master: ~ / node-cluster / prod $ gcloud config set project node-cluster-prod-244519
Updated property [core / project].
essh @ kubernetes-master: ~ / node-cluster / prod $ gcloud config list project
[core]
project = node-cluster-prod-244519
Your active configuration is: [default]
essh @ kubernetes-master: ~ / node-cluster / prod $ cat main.tf
provider "google" {
alias = "prod"
credentials = file("./kubernetes_prod_key.json")
project = "node-cluster-prod-244519"
region = "us-west2"
}
module "kubernetes_prod" {
source = "../Kubernetes"
providers = {
google = google.prod
}
}
data "google_client_config" "default" {}
module "Nginx" {
source = "../nodejs"
providers = {
google = google.prod
}
image = "gcr.io/node-cluster-243923/nodejs_cluster:latest"
endpoint = module.kubernetes_prod.endpoint
access_token = data.google_client_config.default.access_token
cluster_ca_certificate = module.kubernetes_prod.cluster_ca_certificate
}
essh @ kubernetes-master: ~ / node-cluster / prod $ ../terraform init
essh @ kubernetes-master: ~ / node-cluster / prod $ ../terraform apply
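As an alternative to copying main.tf into each folder, the project, key file and region can be turned into variables and passed per environment; a minimal sketch (the variable names are assumptions):
variable "project" {}
variable "credentials_file" {}
variable "region" {
  default = "europe-west2"
}
provider "google" {
  credentials = file(var.credentials_file)
  project     = var.project
  region      = var.region
}
Then ../terraform apply -var="project=node-cluster-prod-244519" -var="credentials_file=./kubernetes_prod_key.json" selects the environment without a second copy of the configuration; the separate folders, however, remain useful for keeping the state files of dev and prod apart.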