2015-10-31

A Joke

Why do coders think that Halloween and Christmas are the same holiday?
Because OCT 31 = DEC 25

An explanation

The joke plays on the fact that the number 31 in octal (base 8, or just OCT) equals 25 in decimal (base 10, or DEC - the number system most people are taught). In decimal, the second column from the right (the 2 in our joke) represents the number of "10s" in the number, while in octal that column (the 3) represents the number of "8s". So to verify, we would do 8*3 + 1 = 25.

But why is this a joke about coding? What is significant about base 8 that programmers would be interested in it?

We'll explore that by looking at a little about the binary number system (base 2) and the history of computer systems.

Binary

It is overly simplistic, but fundamentally, nearly all modern computers are built around base 2 - an electrical circuit can be on or off. This gets translated to a 0 or a 1. To build larger numbers, we use these two binary digits (bits) in much the same way we use 10 digits (0-9) to build base 10 numbers. With base 10, each column represents 10 times the column to the right, so in binary, each column represents twice the column to the right of it.

I like to build a table when I'm computing binary, so it might look something like this:

256   128    64    32    16     8     4     2     1

To convert a decimal number to binary, we find the leftmost column whose value fits into our number, mark a 1 in that column, and subtract that value from our number. We keep repeating this until we get to 0. Every column we don't put a 1 in, we mark with a 0.

To convert the number 42, for example, we might go through this process:
  • The largest number that fits is 32. We put a 1 in the 32 column, subtract it from 42 and get 10.
  • The largest number that fits into 10 is 8. We put a 1 in the 8 column, subtract, and get 2.
  • The largest number that fits into 2 is 2. We put a 1 in the 2 column, subtract, and get 0.
  • We'll then put a 0 into all the other columns.
So our table would look something like this:

256   128    64    32    16     8     4     2     1
  0     0     0     1     0     1     0     1     0

If we have a binary number and need to get the decimal number, we simply put it into columns and add up those columns that have 1s in them. So given the binary number 001000101, we would write it out like this:

256   128    64    32    16     8     4     2     1
  0     0     1     0     0     0     1     0     1

and add 64+4+1.
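The column method above can be sketched in code. This is a minimal sketch of my own (not from the original post), written in Python; Python's built-in bin() and int(s, 2) do the same jobs, but spelling it out shows the columns at work:

```python
def to_binary(n):
    """Convert a decimal number to a binary string using the column method."""
    if n == 0:
        return "0"
    bits = []
    column = 1
    while column * 2 <= n:        # find the leftmost column that fits
        column *= 2
    while column >= 1:
        if n >= column:           # the column fits: mark a 1 and subtract
            bits.append("1")
            n -= column
        else:                     # otherwise mark a 0
            bits.append("0")
        column //= 2
    return "".join(bits)

def to_decimal(bits):
    """Add up the columns that have 1s in them."""
    total = 0
    for bit in bits:
        total = total * 2 + int(bit)
    return total

print(to_binary(42))            # 101010
print(to_decimal("001000101"))  # 69
```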

That's all well and good. But what does this have to do with octal?

Moving up to Octal

Binary numbers are good for computers, but are a bit long for humans to write out. Decimal makes them shorter (more information dense - a subject for another time), but makes it harder to see how the computer represents the number, since you'd need to do the math every time. Decimal also requires somewhere between 3 and 4 bits to represent one digit, which adds to the complexity.

Octal is convenient since three binary digits completely represent one octit (the equivalent of a digit). To help compute octits, we can rewrite our table thusly, clustering every three columns:

256   128    64 |  32    16     8 |   4     2     1
  4     2     1 |   4     2     1 |   4     2     1

We will use the top row when converting between decimal and binary, and the second row when converting to or from octal. So given our joke's 25 (decimal), we would write it as

256   128    64 |  32    16     8 |   4     2     1
  4     2     1 |   4     2     1 |   4     2     1
  0     0     0 |   0     1     1 |   0     0     1

We can then go through each cluster and add up the columns that have 1s in them, using the values from the second row. Doing so gives us 031, which is the answer to our joke.
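The three-columns-per-octit clustering translates directly into code. A sketch of my own (Python, not from the original post); the built-in oct() does the same thing, but this version makes the 3-bit clusters explicit:

```python
def to_octal(n):
    """Convert a decimal number to an octal string by clustering bits in threes."""
    bits = format(n, "b")                        # binary string, e.g. 25 -> "11001"
    padded_len = (len(bits) + 2) // 3 * 3        # round length up to a multiple of 3
    bits = bits.zfill(padded_len)                # pad with leading zeros: "011001"
    octits = []
    for i in range(0, len(bits), 3):             # each 3-bit cluster is one octit
        cluster = bits[i:i + 3]
        octits.append(str(int(cluster, 2)))
    return "".join(octits)

print(to_octal(25))   # 31 -- OCT 31 = DEC 25
```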

That seems pretty easy, right? Certainly nothing to be scared about.

Who cares?

If you were using a PDP-8 computer, you would! (And 40 years ago, if you were using a computer, there was a good chance it was a PDP-8 - one of the most popular machines of its time, and one a lot of early coding was done on.) The hardware in a PDP-8 used 12 bits for most of its internal systems, which translates easily to 4 octits. Other systems used 18 or 36 bits, which corresponded to 6 or 12 octits.

Since UNIX was first written on some of these systems, you see traces of octal around the system. Most notably, the UNIX permission structure gives each file read, write, and execute permissions. Since each set of those can be represented as 3 bits, it translates nicely to an octit, and you can still see this reflected in the octal modes of the UNIX chmod command.
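For instance, the familiar mode 755 is just three octits, each describing one read/write/execute bit triplet. A quick illustration of my own (Python, which conveniently has both 0o octal and 0b binary literals):

```python
# Each octit in a UNIX file mode is one read/write/execute bit triplet:
# rwx r-x r-x  ->  111 101 101  ->  7 5 5
mode = 0o755
assert mode == 0b111_101_101

# Decompose the mode into its three triplets with shifts and masks:
owner = (mode >> 6) & 0b111
group = (mode >> 3) & 0b111
other = mode & 0b111
print(owner, group, other)   # 7 5 5
```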

But largely, this is a relic of computing history. The wide popularity and adoption of the IBM System/360 line of computers established the 8-bit byte as the de facto standard in the late 60s and early 70s. The early Internet developers similarly adopted the "octet" (an 8-bit byte) when describing their Internet Protocol. Since 8 bits can't be split evenly into three-bit groups, octal slowly fell out of favor, replaced by hexadecimal (base 16, which uses 4 bits for one hexadigit). But that is some math for another time.


Pretty long explanation for one joke, huh? Kinda ruins the punch line. Anyway - enjoy the holiday!

2015-10-14

Dates and Division

Most modern operating systems have a basic scheduler built in. In UNIX, this is the cron command, and it can run a program based on the minute, hour, day of week, day of month, and/or month. So it lets you set a program to run every hour at 15 minutes past the hour, or every Monday, or every month on the first of the month.
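For example, a crontab entry lists five time fields (minute, hour, day of month, month, day of week) followed by the command to run. These lines are illustrations of mine, with made-up paths, not from the original post:

```
# min  hour  dom  month  dow   command
15     *     *    *      *     /home/user/hourly-task     (15 minutes past every hour)
0      9     *    *      1     /home/user/monday-task     (every Monday at 9:00)
0      0     1    *      *     /home/user/monthly-task    (first of every month)
```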

But what if you want to run something every five days? For example, we may want a reminder that we need to make a phone call every so often, no matter the day of the week, or set our TARDIS to make a stop every other day. The cron scheduler doesn't have a way to specify this, and neither do many other scheduling systems. The reason is clear - our week has a prime number of days (seven) so isn't easily divisible into smaller units. And our months have inconsistent lengths, so we can't easily divide them up either.

Coding is about problem solving, and we can write some code to handle this. We'll take advantage of the built-in scheduler and then two things you can find in most programming environments:

  1. A way to determine the number of seconds since a known point in time.
  2. Integer mathematics, which allow us to do calculations against the number of seconds.
Let's look at each and how they help us solve our problem.

Epoch Time

Times, dates, and calendars are a tricky problem for computers because the methods we have evolved, as humans, to deal with them are somewhat inconsistent. The full reasons why would be good for a totally separate blog, but we'll learn about some of them as we explore how they impact us as coders.

For now, we need a way to measure how many consistent time units there have been since a fixed point in time. Months aren't consistent since they have a variable number of days. Weeks aren't bad, but they don't divide very well. Days would be good, but even better is the number of seconds, since it is fine-grained and can be used consistently. (Well, mostly consistently. Leap seconds are a bit of a problem... but one that most people ignore. We'll talk about them another time.)

Most systems can provide the number of seconds since some specific point in time. This point is known as the "epoch" and varies based on the programming environment and operating system. In UNIX, the epoch moment is 1 Jan 1970 at midnight UTC, while Windows uses 1 Jan 1601, and other systems provide other reference dates. I'm going to use UNIX epoch time in my examples, but the principles hold true for all of them.

Given this, 4:00 pm UTC on October 14th was 1444838400 seconds since the start of the UNIX epoch. If we want to verify this, we can run this command at the UNIX command line:
date -j -u 201510141600.00 "+%+ %s"

We can omit the -j and -u parameters and the time specification and change the format a bit to get the current number of seconds since the epoch. This can run in a cron job and gives us some representation of "now". That's a good start, but how can we turn this into something that happens every five days? For that, we'll need a touch of very simple math.
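If you'd rather compute "now" inside a program than with a shell command, most languages expose the same counter directly. In Python, for instance (a minimal sketch of mine):

```python
import time

# Whole seconds since the UNIX epoch (1 Jan 1970, midnight UTC) - i.e. "now".
now = int(time.time())
print(now)
```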

Divide and Conquer

Since we're going to be dealing with days, we'll need to convert seconds into days. Not a problem - we just need to know how many seconds are in a day (60 seconds * 60 minutes * 24 hours = 86400 seconds). Take the total number of seconds (1444838400 in our example above) and divide it by 86400 and we get... 16722.6666667.

That fractional part is irritating, however. It would be more useful if we knew the number of whole days. Most languages have a few ways to deal with this, and we just need to pick the best way to do so:
  • We can round the number to the nearest whole number of days (16723 in our example).
  • We can truncate the number by removing the fractional part (16722 in our example).
  • Some languages have "integer division", which won't return any fractional part at all and works much like dividing and then truncating.

We're going to assume integer division in this case since it is consistent with some other math we'll be doing in a minute.

We now have the number of days since the epoch. So what? This isn't really what we want - we want to know which day of our five-day cycle we're in. Well, we divided by the number of seconds in a day to get days... so we should just divide that by 5 (using integer division) to get what we need, right? Doing so, we get 3344, which tells us that there have been 3344 five-day cycles since the start of the epoch.
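In Python, // is integer division, so this chain of divisions looks like the following (a sketch of mine, using the example timestamp from above):

```python
seconds = 1444838400             # 4:00 pm UTC, 14 October 2015
SECONDS_PER_DAY = 60 * 60 * 24   # 86400

days = seconds // SECONDS_PER_DAY   # whole days since the epoch
cycles = days // 5                  # complete five-day cycles since the epoch
print(days, cycles)                 # 16722 3344
```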

Which doesn't really help us. What we need to know is which day in the cycle we're in. For this, we're going to need to use a different bit of integer math called the modulo (which we sometimes shorten as mod). This is a fancy way of saying "the remainder when we divide one number by another". So 16722 mod 5 is 2.

Not all environments have a modulo operator (though I'm baffled why not), but reproducing it is fairly easy given the other integer operators. Remember that 3344 result we got earlier that didn't seem useful? It turns out we can use it to get the remainder if we really need to. With integer division, if we take a number, divide it by a value, and then multiply the result by that same value again, we may get a smaller number than the one we started with. What is the difference between the two? The remainder. We can represent this as something like:
remainder = value - ( value / divisor ) * divisor
(Assuming, of course, you're doing integer division. Normal division isn't very useful here.)

Either way, we now have a remainder of 2 given our example. What does this mean? That it is the third day of the five-day-cycle we're interested in.

Wait... third day? Not the second? Nope. The first day would have a remainder of 0.

Putting it together

Now that we know which day in the cycle we're in, what do we do? Well, that depends on what we want to do. If we need a reminder to make a phone call every five days at noon, we might do the following:
  • Write a program that checks which day in the cycle it is (as above) and, if it is the day we need, sends us an email. If it isn't, it does nothing.
  • Set up a cron job to call that program every day at noon. The cron job should run every day and leave the cycle-checking to the program.
That's it!
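Putting those two pieces together might look something like this. This is a sketch of the idea in Python, written by me; the script path in the crontab comment is a hypothetical stand-in, and the print is standing in for actually sending an email:

```python
#!/usr/bin/env python3
"""Run daily from cron; acts only on one day of a five-day cycle."""
import time

SECONDS_PER_DAY = 86400
CYCLE_LENGTH = 5    # days in the cycle
TARGET_DAY = 2      # the third day of the cycle (day 0 is the first)

def day_in_cycle(now=None):
    """Return which day of the five-day cycle a timestamp falls on."""
    seconds = int(time.time() if now is None else now)
    days = seconds // SECONDS_PER_DAY      # whole days since the epoch
    return days % CYCLE_LENGTH             # position within the cycle

if __name__ == "__main__":
    if day_in_cycle() == TARGET_DAY:
        print("Time to make that phone call!")   # stand-in for sending email
    # otherwise: do nothing

# A crontab line to run this every day at noon might look like (path is hypothetical):
#   0 12 * * * /home/user/call_reminder.py
```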

This concept of breaking a problem down into smaller parts is a common one in coding - we'll see it many more times as we explore topics. And the trick of using modulo to break down a big counter into a smaller cycle is also a good one to keep in mind - you'll see this trick used, for example, where you want to group things into clusters of a known size and need to know when you're at the beginning of the cluster.

Most important, however, is developing a solid approach to problem solving and breaking down a task to figure out the best way to approach it. That is the core of good coding.


2015-10-08

hello, world

A traditional "first program" that people write when learning a new programming language is a simple task of displaying "hello, world" as output. It isn't a great first program - it rarely illustrates much of anything about the language. But it is tradition.

So, in keeping with tradition when starting this blog about writing programs - "hello, world".

What is this, anyway?

My goal for this blog is fairly simple. At a time when businesses, educators, and politicians are stressing that kids should "learn to code", I think it is important to take a step back and discuss what that means and why it is important. Or at least why it is important to me - I present these as my own thoughts and opinions.

More importantly - I want to talk about why coding excites me. What it is about this... thing... that I've been doing for over 2/3rds of my life that is so enjoyable?

I will be sharing a fairly wide variety of posts. Many will be technical, but will focus on a programming concept rather than syntax. I'm a strong believer that the "slinging code" part of coding is easy - the rest is the more difficult, and important, part. Other posts will be less technical. There will be some code examples, particularly when I'm talking about a particular language or toolkit, but at other times I may use "pseudocode" to illustrate an idea instead. Some posts will be for skilled programmers, while others will be for people who are just beginning to code or may not code at all.

Most of what I say will focus around five things that I think are most important when it comes to coding:
  • The art of writing code
  • The science behind the code we write
  • The engineering that applies the science
  • The philosophy that many coders seem to share
  • The foundations of coding reflected in the history of our profession
Let me talk about each briefly now, but you can expect that we'll be talking about all these aspects in the future.

Writing Code and Computer Science

Typically, when we talk about "science", we talk about discovering fundamental laws of nature or the universe. We seek universal truths. How can we have a "science" about something that is completely artificial and created by humans?

It turns out that there are fundamental laws that underlie the code that we write. Computer Scientists will talk about algorithms and how efficient these algorithms are. At a time when computer hardware gets faster and faster, it is tempting to use inefficient algorithms - but some are so inefficient that no matter how fast our hardware is, they may never solve the problem. Others may take more memory to process than even the most powerful systems today have. Understanding these pitfalls, and how to avoid them, is a hallmark of good coders.

Software Engineering

Again, historically the field of engineering has sought to take the fundamental laws discovered by science and apply them to building things of practical use. Civil engineers, for example, take the science of physics and apply it to creating things like buildings and bridges.

The same is true of software engineering, where we take the algorithms discovered by the computer scientist and apply them, along with our own knowledge, to create code that does something specific and practical. When most people think of "coding", they are probably thinking of doing something in the field of software engineering.

As I hope this blog will show, software engineering is really just a small part of the bigger picture.

The Noble Art of Writing Code

Most good coders will talk about the aesthetics of some chunk of code. We may praise our work by saying it is an "elegant hack" or criticize some code we are reviewing by condemning it as a "horrible kludge". We may even say that some code we're reviewing (never that we wrote, however) is "spaghetti code".

These are not precise terms, and good coders may disagree about the particular aesthetics of some code or a programming language, but the best coders do have a feel for how elegant code can be. It goes beyond the strict science and engineering that the code may reflect and has an inherent beauty.

Coding Philosophy - How we Think About Code

Related to the art and beauty of the code is an underlying philosophy about coding in general. In some ways, this is almost a "culture", a way of life, rather than a way of thinking. For many, it almost becomes a cliche or a stereotype, but there are some serious aspects of a coding philosophy that help us become better coders. These are principles that help us explain why some code might be more aesthetically pleasing than others.

Like many philosophies, however, these sometimes spawn "religious wars" - battles for mindshare between belief systems that, fundamentally, don't really make a difference. The Windows vs UNIX battles were (and still are) along these lines, but if you want to see a real war, ask a coder whether they prefer vi or emacs and beware of the answer.

The History of Computers and Coding

Some of you might be asking "what is vi or emacs and why do they have silly names?" And that is best answered by looking at the history of this field. In an industry where things are constantly changing, is it really worth looking at how we got here? I think so. And while some historical figures and actions in computing are well known and understood, others are less so, but still crucially important.

Not all of computing history is relevant to coding, but much of it has shaped the coding philosophy. I'll try to highlight some events that help explain how and why our code works and behaves the way it does today.

Let's Get Running

Hopefully that lays the groundwork for what we'll be exploring. I hope you'll join me on this journey, through your comments here or via the various links I've provided on how to get hold of me.

And most of all - get coding!

DLOAD "CODEBLOG"
LIST
RUN