Saturday 3 November 2018

Docker for Developers - An Introduction

While Docker has been around for a few years, it is fairly new to me. The organization I work for has only been using it for about a year, but I've never had to delve too deeply into it.

In most cases, I run a command to start my containers and start working. It all just works out of the box, as intended.

However, I recently needed to set up a new project on my own, our standard setup wouldn't work for it, and I realized I really don't know much about Docker. So, this article covers a few key parts of Docker that I struggled with, in the hopes that others might learn something of benefit.

Images and Words

Dream Theater references aside, one key aspect of Docker I didn't understand was the difference between an image and a container. I thought the words were interchangeable but, alas, they most certainly are not.

One metaphor I read was that an image is to a container as a class is to an instance. I think that's fairly accurate. Saying that a container is just a running image isn't quite right, though, because it doesn't capture the full picture. A container is not only a running image but also carries additional configuration and may have a filesystem attached.

You can have two containers, two instances of an image, with different settings. This is important to note.
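For example, here is a rough sketch (the image name, container names and settings are placeholders, not from any real project) of building one image and then running two differently configured containers from it:
# Build a single image...
docker build -t myapp .
# ...then start two containers from that one image, each with its own settings.
docker run -d --name myapp-a -e APP_ENV=development -p 8080:80 myapp
docker run -d --name myapp-b -e APP_ENV=testing -p 8081:80 myapp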

Monolithic vs Multiple Containers

Another question I had was: why would one want to run multiple containers orchestrated to work together rather than a single, monolithic image?

At first, I thought the only real difference was the argument between statically linked binaries and dynamically linked ones. Why go through all the headache and require additional tools, like Docker Compose, just to achieve the same thing that can be done with a single image? Turns out, that's not the case.

Of course, reproducibility is a key part of developing an application. The ideal situation is having a development environment that perfectly reflects the production environment so that any bugs or issues can be replicated easily. A single docker image does that job just fine. In fact, in many organizations, web applications are deployed using containers so that the development environment and production environment are, quite literally, identical from a software point of view.

However, unlike a binary, where libraries are linked at compile time and become static dependencies, Docker containers behave more like plug and play peripherals.

A perfect example is adding a container to trap emails sent by an application. In a development environment, you can quickly spin up a container running a tool like MailHog to capture emails without affecting the rest of the environment.

Another advantage is that you can mock application services with a dummy container or even swap different versions of a dependent service without having to rebuild the image each time.

Imagine needing to swap between different versions of PHP. You could have 3 complete images of 500 MB each to cover PHP 5.6, 7.0 and 7.2 or, you could have 3 different PHP images that are 50 MB each. Need to add support for a new PHP version? Easy! Just pull down another PHP image rather than being forced to rebuild an entirely new image for your app.
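As a rough sketch (the image tags and paths here are illustrative), switching PHP versions in a Compose-based setup can be as small a change as editing one image tag:
services:
  php:
    # Swap this tag for, say, php:7.0-fpm-alpine or php:5.6-fpm to change versions
    # without touching the rest of the stack.
    image: php:7.2-fpm-alpine
    volumes:
      - ./src:/var/www/html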

Configuring Containers

I'm a bit embarrassed to admit it but I didn't understand how to configure a container. I assumed that you must have to build an image, instantiate a container, open an interactive terminal and perform the configuration. The idea felt all wrong, but I didn't understand why. The reason it felt wrong was because, well, it was wrong.

The problem was, I didn't know what Volumes were or how to use them. I also realized I didn't understand how containers were persistent or how to access files on a local computer. I assumed, again incorrectly, that you must have to copy the files into the image or container. Again, how wrong I was.

When starting an Nginx or Apache server, you need to be able to configure it. Whether it's core configuration or virtual host configuration, you need to supply this information to the web service. Did you create the file outside the image (yes) and then copy it in (no; well, not exactly)?

This is when the brilliance of Docker sank in: when I finally understood how this worked.

You can configure the web server with the same configuration files your deployment servers use, whether those servers run containers or not, by attaching the files to a container via volumes.

Volumes work much like filesystem mounting. Take a directory, or file, on your local system and mount it to a path in the container's filesystem. In the case of Nginx, you might attach "myapp/nginx.conf" to "/etc/nginx/nginx.conf" in the container. Then, when the container is started, it uses your local filesystem's file as if it were part of its own filesystem. A little like a chroot. It's brilliant!
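A minimal sketch of that Nginx example from the command line (the paths are hypothetical) might look like this:
# Mount a local configuration file over the container's default nginx.conf (read-only).
docker run -d --name web \
    -p 8080:80 \
    -v "$PWD/myapp/nginx.conf:/etc/nginx/nginx.conf:ro" \
    nginx:alpine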

This, of course, opens all kinds of wondrous doors! Need to import a database or assets for a web application? Use volumes to mount them where the live system would expect them to be! It's great!

This was a crucial part of the system I was missing and explains one aspect of why images are so reusable. You can have a single image where container A uses volumes for project A and container B uses volumes for project B. This comes with major space savings.

Communication is Key

If configuration, volumes specifically, was the biggest hurdle I needed to get over, then the second largest would be intercommunication. How in the hell do different containers communicate?

If one image contains Apache, a second contains PHP and a third contains MySQL, how in the world do they talk to one another? The web server acts as a proxy to PHP (presumably php-fpm) and forwards requests for scripts to that container. PHP runs the code attached to its container in a volume, but that code needs access to MySQL which is, again, in another container.

Within a single image or server, all these programs live in one space and so intercommunication happens (typically, but not always) through localhost. So, how in heck does it work with multiple images that have no knowledge of each other?

Enter networks and some Docker magic. To run multiple containers together, the simplest route is Docker Compose, hereinafter referred to only as Compose. Compose does two things:

  1. Creates a network for the containers to communicate over.
  2. Sets up name resolution so that each container can reach the others by their service names (much like entries in a hosts file).

Now, with some simple setup, different containers can communicate with each other as if they were all running in a single environment. Far out, solid and right on!
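As a sketch (the service names, images and credentials are made up for illustration), a Compose file wires this together so that each service can reach the others by name:
version: "3"
services:
  web:
    image: nginx:alpine
    ports:
      - "8080:80"
  php:
    image: php:7.2-fpm-alpine
  db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: secret
Inside the web container, the PHP service is reachable simply as php:9000 (php-fpm's default port), and the PHP code can reach the database by using db as the hostname.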

Alpine Skiing

Last, and I think a bit more minor, is what these Alpine images are. Alpine is a tiny, security-focused Linux distro whose base image weighs in at just a few megabytes, which has made it the go-to base for slim Docker images. An Ubuntu image, by comparison, is quite large. Unless you have a specific need to perfectly imitate an Ubuntu system, Alpine is an excellent choice.

However, I have discovered a potential best of both worlds that I haven't tested yet: https://blog.ubuntu.com/2018/07/09/minimal-ubuntu-released.

Anyway, that's the gist of Alpine and its primary use.

Wrapping Up

That about covers the things I wanted to talk about. With any luck, a fellow newb will learn something that helps them on their way to DevOps greatness!

So...ya. Bye!

Thursday 6 September 2018

C++ Text Template Engine - Part 1: Overview

This post will be the first in a series of undetermined length in my journey to write a text template engine in C++. No doubt, something like this already exists, but those I happened to look at didn't really whet my whistle. Not only that, there are aspects of C++ I want to explore and this gives me an excellent opportunity to do just that.

Motivation

I would classify myself as an intermediate programmer overall and somewhat of a novice at C++. Certainly, I am drastically behind the times when it comes to modern C++, and I was never a strong programmer in the language to begin with. So, the first motivation is to improve my C++ chops a bit. Why?

Well, that leads me to my second motivation: not long after I started working for my current employer, I was tasked with maintaining our in-house systems, which were authored almost 20 years ago. The main software functions much like a modern web server except that it runs as a CGI script. Each page load invokes a program that builds the page and outputs it over HTTP. Some of these pages are ludicrously complex and are built entirely through print (std::cout/std::ostream) statements. It's horrific to maintain.

As part of the effort to make the code base more maintainable, I want a template engine so I can create templates and inject data into them. Due to budgets and our main development focus, completely replacing the system isn't currently in the cards, but we are frequently making changes to it. So, in the meantime, I need to maintain it.

Recently, I created a stop-gap by using regex to perform search and replace on a template document. It has drastically improved the situation but falls far short of the mark. I'd like to further improve the system.
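To give a sense of the technique (this is only a sketch of the idea, not the actual in-house code, and all the names are made up), the stop-gap boils down to one regex search and replace per placeholder:
#include <iostream>
#include <map>
#include <regex>
#include <string>

// Replace every occurrence of {{key}} in the template with its value.
// Note: the value is used as a regex replacement string, which is fine for a sketch.
std::string render(std::string tmpl, const std::map<std::string, std::string>& values) {
    for (const auto& kv : values) {
        std::regex placeholder("\\{\\{" + kv.first + "\\}\\}");
        tmpl = std::regex_replace(tmpl, placeholder, kv.second);
    }
    return tmpl;
}

int main() {
    std::cout << render("Hello, {{name}}!", {{"name", "World"}}) << std::endl;
    return 0;
}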

General Overview

So, the plan is to document the entire process, including the mistakes. I want to give a sort of free-form exploration as I meander through the process and, hopefully, you, the reader, will learn something along the way.

The idea here is that I want to demonstrate a little about what agile programming looks like and demonstrate how the thought process works when scoping and building out a moderately sized piece of software.

I've already gone on long enough, so it's time to find a place to start.

User Stories

A good place to start would be to create user stories. These are just statements that describe what the user/client (in this case, me) wants. I used to think them quite silly and pointless but I've recently come to appreciate them.

They give you a starting point to begin a discussion and they help keep you on track. Too often I've found myself over-engineering or falling short of client expectations and it usually has been a result of not fully understanding their needs. That's where the user story comes in. It serves as a jumping off point to understanding what they want.

Story 1: As a developer, I would like to create a document, to act as a template, that can have portions of it replaced dynamically to create a final document. This would ease creating pages for my application by avoiding the need to construct these pages with print statements.

Story 2: As a developer, I will need to be able to use any basic data type as well as some standard library containers.

Story 3: As a developer, I need a method to perform conditionals and loops. I have lists of users and accounts I will need to loop over and print.

Story 4: As a front end developer, I want the syntax to be easy to learn and familiar to me. I don't want to spend a lot of time learning another templating language.

Story 5: As a developer, it would be useful if any classes I currently have in my code base were compatible with, or could easily be integrated into, this system.

Initial Thoughts

Perfect. So, what can we glean from that?

I need to have a source document, maybe a file on disk or a string. Reading files was not part of the user story so I think I should dismiss that for now. All I care about is the actual source, and either a data stream or a string will suffice, as that lets me take data from any source and transform it. I worry a little that a data stream might be a pain to work with, so I'm thinking I'll want the data source to be passed in as a string. That more or less covers story number one.

In the second story, I need to be able to access basic data types like an integer, float or character string. I also need to be concerned with more complex data structures like vectors or maps. Classes will be addressed in the last user story so I'll forget about that for now. Off the top of my head, I'm thinking I'll need to have some kind of base value type with several derived classes for each basic type. I'm thinking ahead a bit here so I'll leave it at that for the moment.
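A very rough sketch of that idea (every name here is provisional, nothing is a final design) might look something like this:
#include <string>
#include <vector>

// Base type for any value that can be injected into a template.
class Value {
public:
    virtual ~Value() = default;
    virtual std::string ToString() const = 0;
};

// One derived class per basic type...
class IntValue : public Value {
    int v;
public:
    explicit IntValue(int v) : v(v) {}
    std::string ToString() const override { return std::to_string(v); }
};

class StringValue : public Value {
    std::string v;
public:
    explicit StringValue(std::string v) : v(std::move(v)) {}
    std::string ToString() const override { return v; }
};

// ...and a container type to stand in for things like std::vector. Iteration support
// would come later, once loops are designed.
class ListValue : public Value {
    std::vector<const Value*> items;
public:
    void Add(const Value* item) { items.push_back(item); }
    std::string ToString() const override { return "[list]"; }
};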

User story number three asks about conditionals. That could be something like an if statement or a switch statement. Since I want to keep things as simple as possible, I will likely just use an if statement with an else clause. This story could use some fleshing out.

For loops, I feel I am faced with the classic for and while statements. In the interest of simplicity, I will only pick one. Since I will likely be passing data structures into the template engine, a for loop with an each or range behavior likely makes the most sense. Go's text/template package eschews the for and while keywords in favour of range, and I kind of like that, but it may be problematic for user story four. Perhaps a keyword of foreach might be apropos.

A familiar syntax for front end developers would be one like Mustache, Django or Twig. I think if I adopt something similar, that will ease adoption. While some systems use different tokens for commands and identifiers, I think I can safely use one token style and just enforce some simple naming rules common to almost every programming language.
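Purely as an illustration of the direction, and not a settled design, a template in that style might end up looking something like:
<h1>Hello, {{ name }}!</h1>
{{ if accounts }}
<ul>
    {{ foreach account in accounts }}
    <li>{{ account }}</li>
    {{ end }}
</ul>
{{ end }}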

Finally, I need to be able to support existing classes. To my knowledge, C++ does not have reflection and converting large classes of data into maps would not be fun or efficient. I'm thinking that I can maybe create an interface that my existing classes could fulfill in order to make them compatible with this system. I'm going to have to think more on it but I need more information first.

The Stage is Set

That completes the initial breakdown and user stories. The next phase will be discovery, which I'll cover in the next article.

Sunday 19 August 2018

Writing Interfaces in C++

After my last article, I got to thinking about utilizing interfaces in C++. I am not an expert in C++ by any means and most of the code I have to work on is both antiquated (C++ 98) and poorly written (even for its time). Most of my time is spent writing PHP and Go, so using interfaces is quite common.

Interfaces, abstract classes in C++, are not used at all in the code bases I work on regularly. It got me to thinking: "could Go or PHP style interfaces be done in C++?"

Virtual Recap

A virtual function is one that is defined in a class that is intended to be redefined by derived classes. When a base class calls a virtual function, the derived version (if it exists) is called.

A key point to make is that virtual functions do NOT need to be defined by a derived class.

To see an example, check out my previous post.

Pure Virtual Goodness

In PHP, you would call these abstract functions. In C++, they are called pure virtual functions. Perhaps a better way to read the term is "purely virtual". I feel that makes the purpose of the feature clearer: declare that a function exists as a method on a class but leave the definition for later. It exists purely in a virtual sense.

Abstract functions are those that are declared in a lower, base class and implemented by a higher, derived class. Class A provides a prototype for a function but class B, which extends class A, actually implements it.

A pure virtual function is a contract. A class which extends a class with an abstract, pure virtual function must implement it before it can be instantiated. This guarantees that the function will exist somewhere in the class hierarchy. Otherwise, it would be no different than calling a function that has not been defined.

A pure virtual function looks like this:
virtual void Read(std::string& s) = 0;
This declares a pure virtual function Read. The = 0 at the end (the pure-specifier) is what designates the function as purely virtual.

Abstract classes as Interfaces

A class with at least one pure virtual function is an abstract class; a class made up of nothing but pure virtual functions is about as close as C++ gets to an interface. Since C++ does not have interfaces in the same way other languages do, abstract classes can fulfill the same role.
class Reader {
public:
    virtual void Read(std::string& s) = 0;
};
The above class is fully abstract. Any class that extends it will need to implement the method Read.

Or does it?

Compound Interfaces

I am a proponent of small, simple interfaces that can be combined to create ever more complex ones. This is the Interface Segregation Principle from the SOLID design principles. The catch is that C++ requires a class to implement every inherited pure virtual function before it can be instantiated. Thankfully, that doesn't stop us from composing interfaces.

A class that extends an abstract class without implementing its pure virtual functions simply becomes abstract as well, so it can act as a combined interface in its own right. Using the virtual keyword on the base class list (virtual inheritance) also ensures that, if the same interface is ever inherited along more than one path, only a single copy of it ends up in the final class. This allows multiple interfaces to be combined.
class Reader {
public:
    virtual void Read(std::string& s) = 0;
};

class Writer {
public:
    virtual void Write(const std::string& s) = 0;
};

class ReadWriter : public virtual Reader, public virtual Writer {};
Here, the interfaces Reader and Writer get combined into a third abstract class ReadWriter. It too, could add further pure virtual functions if desired.

Implementing an Interface

Implementing an interface is the same as deriving from any class. So, to tie everything together, here's a complete example:
#include <iostream>
#include <string>

class Reader {
public:
    virtual void Read(std::string& s) = 0;
};

class Writer {
public:
    virtual void Write(const std::string& s) = 0;
};

class ReadWriter : public virtual Reader, public virtual Writer {};

class SomeClass : public ReadWriter {
    std::string buf;
public:
    void Read(std::string& s) override { s = buf; }
    void Write(const std::string& s) override { buf = s; }
};

void readAndWrite(ReadWriter& rw) {
    rw.Write("Hello");
    std::string buf;
    rw.Read(buf);
    std::cout << buf << std::endl;
}

int main() {
    SomeClass c;
    readAndWrite(c);
    return 0;
}
The one caveat is that an abstract class must be handled through some kind of indirection, either a pointer or a reference. This requirement makes sense since it cannot be instantiated directly.

Summary

Using pure virtual functions and abstract classes, it is indeed possible to describe behaviour the way other languages do with interfaces. Pure virtual functions also have a further use in multiple inheritance. To learn more, check out this StackOverflow answer.

Happy programming!

Saturday 18 August 2018

Understanding Virtual Functions in C++

Until I had to explain it to someone, I never appreciated how confusing the virtual keyword can be. After all, in certain situations there seems to be no functional difference between virtual and non-virtual functions except that your IDE or editor might complain at you.

This article is aimed at programmers who are new to C++ or an initiate to programming in general.

Virtual

The term conjures up a vision of something that is non-tangible. That is, something that doesn't exist in the physical world. In C++, this meaning is extended to describe functions whose actual definition may not live where they are declared.

A virtual function, therefore, is a function which may not be "real" where it is declared. While a class may call a function it defines, a virtual function could also be overridden somewhere else in its object hierarchy. The class doesn't know which version it's actually calling until runtime.

Essentially, the virtual keyword signals to a developer that the function is intended to be able to be implemented or overridden in a derived class.

The Base Class (Non-Virtual)

To demonstrate, start by creating a simple base class with two public functions: Foo() and Bar(). Inside the Foo() function, call Bar(). It may be helpful to add some output so you know which function is being called.

Something like this:


class Base {
public:
    void Foo() { cout << "Base::Foo" << endl; Bar(); }
    void Bar() { cout << "Base::Bar" << endl; }
};
In a main function, instantiate a new Base object and call Foo(). You should get output similar to this:
Base::Foo
Base::Bar
This is pretty standard fare and works as expected.

The Derived Class

Next, create a derived class that inherits from Base. In it, create a function with the same name and signature as Bar(). If printing out a statement, make sure you update it to report the new class name.
class Derived : public Base {
public:
    void Bar() { cout << "Derived::Bar" << endl; }
};
Call Foo() on this new class and you'll see that you get the same output as before. The Bar() that runs is still the one defined in the Base class.

The Base Class (Virtual)

Add the virtual keyword to the function Bar() in the Base class then try running the code again. The code should look like this:
class Base {
public:
    void Foo() { cout << "Base::Foo" << endl; Bar(); }
    virtual void Bar() { cout << "Base::Bar" << endl; }
};
You should get output like this:
Base::Foo
Derived::Bar
This time, Bar() was called in the Derived class instead of the Base class. Why?

Virtual Explained

The virtual keyword changed how the call to Bar() is resolved. A non-virtual call tells the compiler not to bother looking for another function with the same name; it simply calls the version in the class where the call appears. A virtual call is resolved against the actual type of the object at runtime: if a derived class provides a matching override, that version is called; otherwise, the version in the base class is used.
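Putting the pieces together, here is a complete, runnable version of the example (with the includes the snippets above assume):
#include <iostream>
using namespace std;

class Base {
public:
    void Foo() { cout << "Base::Foo" << endl; Bar(); }
    virtual void Bar() { cout << "Base::Bar" << endl; }
};

class Derived : public Base {
public:
    void Bar() { cout << "Derived::Bar" << endl; }
};

int main() {
    Derived d;
    d.Foo();   // Prints Base::Foo followed by Derived::Bar.
    return 0;
}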

Summary

Virtual functions are those that are intended to be overridden higher up in the class hierarchy, in derived classes, and those overrides are the ones called if they are defined.

Monday 7 May 2018

Domains, IPs, Ports and Virtual Hosts - How it All Fits Together

Developing web applications might seem fairly simple, and writing a basic web page is. If all you are doing is writing a static web page, you can freely go ahead and start writing code. But what if you want to develop a dynamic web page or mimic a production server?

Servers

Any single computer on any network, whether a local network or the Internet, is associated with an address. An address may be assigned to only one computer at any given time but multiple addresses could potentially point to the same server. Without going into the semantics of addresses, two common ones that will be encountered are:
  • 192.168.X.X - These are local, intranet addresses. Each computer on a home network would have an address in this range.
  • 127.0.0.1 - This is a special address called the loop-back address. It is used by a server in order to allow it to contact itself. Hence, loop back.
An IP (Internet Protocol) address is a 32-bit number (for IPv4) that is usually displayed as a series of numbers separated by dots, as seen above.

There are a lot of complexities to how addressing works, including internal (intranet) to external network addresses and how they interact.

Ports

To serve files or data, a program has to be set up to listen for new connections. However, if a computer has only a single address, how does it know which program any given connection should be handed to?

Ports allow connections to be routed to the correct program. Ports for a few common web services include:
  • 20 and 21 - File Server, FTP (File Transfer Protocol)
  • 25 - Mail Server, SMTP (Simple Mail Transfer Protocol)
  • 80 - Web Server, HTTP (Hyper-Text Transfer Protocol)
  • 443 - Web Server, HTTPS (Hyper-Text Transfer Protocol over SSL/TLS)
A list of reserved ports can be found on Wikipedia.

Only one program may listen on any given port but, for web servers at least, there is a way to work around this limitation.

Also of note is that even though many clients/services don't explicitly require you to add a port, it is always there. If no port is supplied, web browsers will default to port 80 for HTTP (and 443 for HTTPS). When testing web applications on a local system, it is common to use a port like 8080 or 8880, like so: www.example.com:8080.

Domains

A domain is usually a human-readable string of characters that acts as an alias for an IP address. Any single domain may be associated with a single IP address. Domains on the Internet must be registered with an authority called a domain registrar. You supply the registrar with the address of your host and they will associate it with your domain name.

Top level domains (TLDs) are those such as com, gov, net and dev, to name just a few. There are many others. As long as a made-up TLD is only used on your local machine, and a client like a web browser never asks a name server to resolve it (as it would for domains on the Internet), there are no real restrictions.

Unfortunately, some web browsers, Chrome and Firefox for example, can complicate matters and it is generally recommended to steer away from common TLDs.

Multiple domains may point to the same address.

Sub Domains

Any other domains are sub-domains. Without going into detail, when someone refers to a subdomain they usually mean the left-most label, such as www or mobile. Reading from right to left, the labels become less and less significant, so the least significant one sits on the far left.

  • www.example.com - the (implicit) root, com (top level domain), example (second level domain), www (least significant)

Virtual Hosts

A web server can host many different websites. If a web server is listening on port 80 but multiple different domains point to the same address, how would the web server know which site to route the request to?

Virtual hosts associate domain names with locations on the server. Even though both www.example.com and mobile.example.com might resolve to 192.168.0.1:80, the web server is told which domain was requested (via the HTTP Host header). The web server takes that domain and, provided an entry for it exists in its configuration, routes requests for each domain to the correct web directory.
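As a rough illustration (the domains and paths are made up), an Nginx configuration might declare two virtual hosts like this:
server {
    listen 80;
    server_name www.example.com;
    root /var/www/example.com/public;
}

server {
    listen 80;
    server_name mobile.example.com;
    root /var/www/mobile.example.com/public;
}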

Hosts File

When a client like a web browser is given a domain name, it normally contacts a service called DNS (the Domain Name System) to resolve the address the name is associated with. When doing local development, this isn't desirable, as an internal web server is usually not publicly accessible.

To overcome this limitation, most operating systems have a file that is used to define local domain names. If a domain exists in this list, then the associated IP address is used without ever connecting to a DNS server to resolve the name.

In this capacity, you could even do things like route www.google.com back at your own web server.

On most GNU/Linux systems you'll find the hosts file under the /etc directory. You would add an entry like:

127.0.0.1    www.example.com

When putting that address into your browser, the system will first check for a matching entry in /etc/hosts and, if it finds one, use the given address. In this case, that's the loop-back address, which sends the connection to the corresponding port on your own system. Remember, a port is always required even though it often doesn't need to be supplied explicitly.

Bringing It All Together

Armed with this knowledge, a developer can setup a testing environment that mimics a production server.

If you are developing web sites and need to set up a full stack for each one (by that I mean a web server, database server and any other accoutrements), then a process similar to this could be followed:

  1. Install/Copy any files from either an existing production or a framework.
  2. Add a VHost entry for your web server.
  3. Add a Domain entry to your hosts file.
  4. Create or Import a database.
  5. Work like crazy.
With such a setup, a developer can connect to a local web server as if it were a remote server.


Of course, there are a lot more things that can be done to provide even better environments. Here are a few closing thoughts to whet your appetite.

Docker

You can use Docker to mimic a production server almost exactly or, in some cases, run docker on both production and in development to ensure 100% compatibility. This can be very useful for addressing issues with software packages and libraries being incompatible versions.

Reverse Proxy

Some environments, like PHP for instance, may require using a technique called a reverse proxy. Go development can also benefit from one in instances where you need both a Go application and a web server like Nginx or Apache running. This involves having one server accept an incoming request and then forward it to the same address but on a different port.
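As a sketch (the backend address and port are arbitrary), an Nginx reverse proxy for such a setup might look like this:
server {
    listen 80;
    server_name www.example.com;

    location / {
        # Hand every request off to the application (for example, a Go binary)
        # listening on another port on the same machine.
        proxy_pass http://127.0.0.1:8080;
    }
}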