Code is a Four Letter Word: 2015

2015-11-02

What do a 200-year-old professor from Ireland and a 900-year-old Jedi Master have in common?

This is not another joke. But the question in the title ponders what George Boole has in common with Yoda. The answer, of course, is one of Yoda's classic aphorisms: "Do. Or do not. There is no try." which parallels the mathematical logic system created by George Boole - what we now know as Boolean Logic.

George Boole was the first to express the mathematics behind the logical operations we use today: XOR (which he saw as binary addition modulo 2) and AND (which was binary multiplication). The OR and NOT operations were added later. And although he was something of a contemporary of Charles Babbage, and was very interested in automation, the two never met to discuss their similar ideas. His mathematical logic also foretold some AI operations, as he sought to use his theory to help explain how people thought and reasoned.

The XOR operation takes two binary digits, adds them together, and then just looks at the rightmost digit of the result. It is an "exclusive OR" - the result will be 1 if, and only if, exactly one of the values being added is 1. We can make a simple table:

0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 0

Even simpler, the AND operation takes two binary digits and multiplies them. The result will be 1 if, and only if, both of the values are 1. Another simple table:

0 x 0 = 0
1 x 0 = 0
0 x 1 = 0
1 x 1 = 1

Boolean logic in coding

Decades after Boole's work, Claude Shannon realized that Boole's math could be applied to digital circuits, and thus to the computing that was evolving in the 1930s. We're most familiar with these operators today in slightly different forms. Instead of the XOR of Boole's time, we have an inclusive OR which is more similar to how we use the word "or" in English.

0 v 0 = 0
0 v 1 = 1
1 v 0 = 1
1 v 1 = 1

The alert among you might figure out that a OR b == (a XOR b) XOR (a AND b).
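For the less alert, the identity can be checked by brute force. This is a minimal sketch in Python (chosen here purely for illustration), using its bitwise operators ^ for XOR, & for AND, and | for OR. The function name or_from_xor_and is made up for this example.

```python
from itertools import product


def or_from_xor_and(a: int, b: int) -> int:
    """Build inclusive OR out of XOR (addition mod 2) and AND (multiplication)."""
    return (a ^ b) ^ (a & b)


# Walk all four rows of the truth table and compare with the built-in OR.
for a, b in product((0, 1), repeat=2):
    assert or_from_xor_and(a, b) == (a | b)
    print(a, b, "XOR:", a ^ b, "AND:", a & b, "OR:", a | b)
```

With only four possible inputs, checking every row is a complete proof.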

Combined with NOT, which turns a 0 into a 1 and a 1 into a 0, we have the fundamental operations that most of us might use in an "if" statement, or many other conditional operations today in most programming languages.

Many languages are willing to play fast and loose with the concept of what represents false and true. Most agree that a zero will be false - but that anything non-zero ends up being true. Some languages are more strict - they have a boolean type and only things that evaluate to boolean types can be used with AND, OR, and NOT operations. Languages that aren't as strict are easier to work with - but also easier to make mistakes with. (Many is the time that a programmer meant to compare two values, ended up assigning one to the other, and got a comparison result that was wrong.)

Combining boolean logic with multiple bits allows computers to do masking and quick comparisons, and is at the heart of most graphics drivers and programming language implementations.
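As one illustration of masking, here is a small Python sketch that pulls the red, green, and blue channels out of a single packed number. The 0xRRGGBB layout and the function name split_rgb are assumptions made for this example, not something any particular graphics driver is known to use.

```python
def split_rgb(pixel: int) -> tuple[int, int, int]:
    """Split a packed 0xRRGGBB value into its three 8-bit channels."""
    # Shift the channel we want down to the lowest 8 bits, then AND with
    # 0xFF (eight 1 bits) to mask away everything above it.
    red = (pixel >> 16) & 0xFF
    green = (pixel >> 8) & 0xFF
    blue = pixel & 0xFF
    return red, green, blue


print(split_rgb(0xFF8000))  # (255, 128, 0)
```

The AND works on all eight bit positions at once, which is what makes this kind of operation so quick.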

Most, but not all

All of this seems pretty straightforward, but it was pretty revolutionary when Boole came up with it in the 19th century. And it isn't guaranteed to be valid even in all programming languages today. For example, SQL appears to implement boolean logic, but actually has a third possible value which throws all the logic conditions out the window. Since SQL values can be null, and comparing anything to null gives a result that is neither true nor false, you can get a condition where neither the test nor its negation comes out true. But that is a complication for another time.
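A quick way to see this is with SQLite, which ships with Python as the sqlite3 module. This is only a sketch: the table name readings, its single column, and the helper count_where are invented for the example, and other databases may differ in details.

```python
import sqlite3

# An in-memory table with two rows: one holding 1, one holding NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?)", [(1,), (None,)])


def count_where(condition: str) -> int:
    """Count the rows for which the given WHERE condition is true."""
    query = f"SELECT COUNT(*) FROM readings WHERE {condition}"
    return conn.execute(query).fetchone()[0]


matches = count_where("value = 1")
non_matches = count_where("NOT (value = 1)")

# Two rows in the table, but the NULL row satisfies neither condition.
print(matches, non_matches)
```

In two-valued logic the two counts would have to add up to the number of rows. Here they don't.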

2015-11-01

Don't count your seconds before they're hatched

or Why coders hate Daylight Saving Time.

Today marks the end of Daylight Saving Time in the US - what other countries sometimes call Summer Time. It is a legislative attempt to reconcile how people use a clock with how the sun behaves seasonally. I'm not really going to debate or discuss the merits vs pitfalls of DST since they're not relevant at the moment, but I'll explore how, as programmers, we can't always assume that certain physical constants are actually constant.

Like the number of seconds between noon on two consecutive days.

The Problem

Most of the time - this isn't really a problem. One day has 24 hours, each hour has 60 minutes, each minute has 60 seconds. That comes to 86400 seconds in a day. What could be simpler?

Except on days like today when DST ends. That means that at 2am Daylight Saving Time it became 1am Standard Time. So between the midnight when the day began and the midnight when it ends, there was an extra hour which corresponds to an extra 3600 seconds. So there are 90000 seconds today. The opposite is true when DST begins - at 2am Standard Time it becomes 3am Daylight Saving Time, so there are only 82800 seconds in the day.
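These numbers can be checked with a short Python sketch. It assumes Python 3.9 or later (for the zoneinfo module), a system with a time zone database installed, and the America/New_York zone as a stand-in for "the US". The function name seconds_in_local_day is made up for this example.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def seconds_in_local_day(day: date, tz_name: str = "America/New_York") -> int:
    """Count the seconds between local midnight and the next local midnight."""
    tz = ZoneInfo(tz_name)
    start = datetime.combine(day, time.min, tzinfo=tz)
    end = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
    # Convert to UTC before subtracting: Python ignores the UTC offset when
    # both datetimes share the same tzinfo object.
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return int(elapsed.total_seconds())


print(seconds_in_local_day(date(2015, 10, 31)))  # an ordinary day
print(seconds_in_local_day(date(2015, 11, 1)))   # the day DST ended
print(seconds_in_local_day(date(2015, 3, 8)))    # the day DST began
```

The conversion to UTC is the easy step to miss. Without it, the subtraction quietly reports 86400 for every day.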

It gets even more confusing when you consider that the rules for DST aren't consistent. The weekend they change has changed over time, and there were even some times when it was in effect year-round. Some locations didn't have DST and have adopted it, while others don't need it for their location.

And don't think this is just about days and odd times of the year. We've gotten used to it, but keep in mind that there are a variable number of days in the year when you consider leap years. And there is a hot debate about the use of the leap second to keep atomic clocks in sync with astronomical clocks.

But so what? Aren't coders wise enough to all this to avoid any problems? Well...

The Impact

There have been a surprising number of cases where this wasn't true. Time and again, operating systems have been wrong on the day after DST begins or ends because they computed it wrong. Many schedulers have failed to run jobs because an hour was skipped over during the change to DST, or have run them twice during the change away from it.

It gets even more confusing when you throw in time zones and consider that different time zones may apply the DST rules at different times of the year (such as just happened now - Europe and the US left summer time at different times).

For example, what if you're writing a calendar program? One common task is to schedule something for "2:00 every afternoon". A simplistic solution might have you do this:

You'll store this task as the time the event first happens in number of seconds since some date (an epoch time).
You'll then compute every other time this will happen by adding a multiple of 86400 seconds to that.

So what would that have scheduled?

October 30th, 2pm Daylight Saving Time
October 31st, 2pm Daylight Saving Time
November 1st, 1pm Standard Time
November 2nd, 1pm Standard Time

That doesn't look right!
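The simplistic approach is easy to reproduce. This Python sketch assumes Python 3.9 or later, an installed time zone database, and the America/New_York zone. The name naive_occurrence is hypothetical.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

SECONDS_PER_DAY = 86400
eastern = ZoneInfo("America/New_York")

# Store the first event as seconds since the UNIX epoch.
first = datetime(2015, 10, 30, 14, 0, tzinfo=eastern)
first_epoch = int(first.timestamp())


def naive_occurrence(n: int) -> datetime:
    """The n-th repeat, found by blindly adding multiples of 86400 seconds."""
    return datetime.fromtimestamp(first_epoch + n * SECONDS_PER_DAY, tz=eastern)


for n in range(4):
    print(naive_occurrence(n).strftime("%B %d, %I:%M %p %Z"))
```

The first two occurrences land at 2pm local time. Once the clocks go back, the rest land at 1pm.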

It shows up in other places as well. What if you're writing a program that lets you view how much energy you've used every hour for the day before? Most days will be no problem - 24 hours in the day, 24 lines. But one day a year will have 25 hours, and one day will have 23. How will you show this so it isn't confusing? If someone asks for the 1am reading, what will you return?

The Solution

Well... there isn't one.

There are several tricks we can try, of course, but there is no "one size fits all" solution to this issue. It really depends on the exact case.

For our calendar example above, we need to do the calculations and take Daylight Saving into account. If we're storing it in a database, we'll need to store what we want ("every day at 2pm") rather than the results of the calculation so we can re-calculate if DST rules change.
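One way this might look in Python is to keep the rule itself (a local time and a zone) and build each occurrence from the calendar date. This is a sketch under the same assumptions as before: Python 3.9 or later, an installed time zone database, and America/New_York. RULE_TIME, RULE_ZONE, and occurrences are illustrative names only.

```python
from datetime import date, datetime, time, timedelta
from zoneinfo import ZoneInfo

# What we actually want: "every day at 2pm", in a named time zone.
RULE_TIME = time(14, 0)
RULE_ZONE = ZoneInfo("America/New_York")


def occurrences(first_day: date, count: int) -> list[datetime]:
    """Apply the stored rule to each calendar date, one day at a time."""
    return [
        datetime.combine(first_day + timedelta(days=n), RULE_TIME, tzinfo=RULE_ZONE)
        for n in range(count)
    ]


for moment in occurrences(date(2015, 10, 30), 4):
    print(moment.strftime("%B %d, %I:%M %p %Z"), int(moment.timestamp()))
```

Every occurrence stays at 2pm on the wall clock, while the gap between the epoch values for October 31st and November 1st stretches to 90000 seconds. If the DST rules change, the same stored rule can be re-applied against the updated zone data.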

When it comes to displaying days with more than 24 hours, we just have to be aware that they will happen and leave room for them. For queries for 1am (rare, but they will happen, you can count on that), we need to be able to handle situations where more than one result will be returned or find a way to specify which version of 1am we want - the Standard Time version or the Saving Time version.
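Python's datetime has a built-in way to say which 1am is meant: the fold attribute, where 0 means the first pass through a repeated wall-clock time and 1 means the second. A small sketch, again assuming Python 3.9 or later, an installed time zone database, and America/New_York:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

eastern = ZoneInfo("America/New_York")

# 1am happened twice on November 1st, 2015. fold picks which one we mean.
first_1am = datetime(2015, 11, 1, 1, 0, tzinfo=eastern, fold=0)
second_1am = datetime(2015, 11, 1, 1, 0, tzinfo=eastern, fold=1)

for moment in (first_1am, second_1am):
    print(moment.tzname(), moment.utcoffset(), int(moment.timestamp()))
```

The two values look identical on a wall clock, but they sit a full hour apart on the epoch timeline.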

Above all, however, we have to keep in mind that this isn't a straightforward issue... and that the time that we're reporting has to be used by people, who are using time in their own ways and for their own purposes.

Don't waste time - keep quirks like these in mind as you program.

2015-10-31

A Joke

Why do coders think that Halloween and Christmas are the same holiday?
Because OCT 31 = DEC 25

An explanation

The joke plays on the fact that the number 31 in octal (base 8, or just OCT) equals 25 in decimal (base 10, or DEC - the number system most people are taught). In decimal, the second column (the 2 in our joke) represents the number of "10s" in the number, while in octal that column (the 3) represents the number of "8s". So to verify we would do 8*3 + 1 = 25.
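If you'd rather have a computer check the punch line, Python's built-in int() accepts a base, and oct() goes the other way. A minimal sketch:

```python
# Read the string "31" as a base 8 number.
halloween = int("31", 8)
print(halloween)  # 25

# The same arithmetic done by hand: three 8s plus one 1.
assert halloween == 3 * 8 + 1

# And back again. Python marks octal values with a 0o prefix.
print(oct(25))  # 0o31
```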

But why is this a joke about coding? What is significant about base 8 that programmers would be interested in it?

We'll explore that by looking at a little about the binary number system (base 2) and the history of computer systems.

Binary

It is overly simplistic, but fundamentally, nearly all modern computers are built around base 2 - an electrical circuit can be on or off. This gets translated to a 0 or a 1. To build larger numbers, we use these two binary digits (bits) in much the same way we use 10 digits (0-9) to build base 10 numbers. With base 10, each column represents 10 times the column to the right, so in binary, each column represents twice the column to the right of it.

I like to build a table when I'm computing binary, so it might look something like this:

256 128 64 32 16 8 4 2 1

To convert a decimal number to binary, we find the leftmost column whose value fits into our number, mark a 1 in that column, and subtract that value from our number. We then keep repeating this with what is left until we get to 0. For every column we don't put a 1 in, we mark it with a 0.

To convert the number 42, for example, we might go through this process:

The largest number that fits is 32. We put a 1 in the 32 column, subtract it from 42 and get 10.
The largest number that fits into 10 is 8. We put a 1 in the 8 column, subtract, and get 2.
The largest number that fits into 2 is 2. We put a 1 in the 2 column, subtract, and get 0.
We'll then put a 0 into all the other columns.

So our table would look something like this:

256 128 64 32 16 8 4 2 1

0 0 0 1 0 1 0 1 0
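The column-filling process translates fairly directly into code. This Python sketch assumes the nine columns shown in the table (so it only handles numbers from 0 to 511). The function name to_binary_columns is invented for the example.

```python
def to_binary_columns(number: int, width: int = 9) -> list[int]:
    """Fill the columns left to right, largest column value first."""
    bits = []
    for position in range(width - 1, -1, -1):
        column_value = 2 ** position  # 256, 128, 64, ... 2, 1 when width is 9
        if column_value <= number:
            bits.append(1)            # it fits: mark a 1 and subtract
            number -= column_value
        else:
            bits.append(0)            # it doesn't fit: mark a 0
    return bits


print(to_binary_columns(42))  # [0, 0, 0, 1, 0, 1, 0, 1, 0]
```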

If we have a binary number and need to get the decimal number, we simply put it into columns and add up those columns that have 1s in them. So given the binary number 001000101, we would write it out like this:

256 128 64 32 16 8 4 2 1

0 0 1 0 0 0 1 0 1

and add 64+4+1 to get 69.
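Going the other direction is just as mechanical. A Python sketch, with from_binary_string as a made-up name, that walks the digits from the right and adds up the columns that hold a 1:

```python
def from_binary_string(bits: str) -> int:
    """Add up the column values wherever the binary string has a 1."""
    total = 0
    column_value = 1  # the rightmost column is worth 1
    for bit in reversed(bits):
        if bit == "1":
            total += column_value
        column_value *= 2  # each column is worth twice the one to its right
    return total


print(from_binary_string("001000101"))  # 69
```

Python can also do this directly with int("001000101", 2), which is handy for checking the hand-rolled version.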

That seems well and good. But what does this have to do with octal?

Moving up to Octal

Binary numbers are good for computers, but are a bit long to always write out for humans. Decimal makes it shorter (more information dense - a subject for another time), but is more complex to figure out how the computer represents it since you'd need to do the math every time. Decimal also requires somewhere between 3 and 4 bits to represent one digit, which adds to the complexity.

Octal is convenient since three binary digits completely represent one octit (the equivalent of a digit). To help compute octits, we can rewrite our table thusly, clustering every three columns:

256 128 64 | 32 16 8 | 4 2 1

4 2 1 | 4 2 1 | 4 2 1

We will use the top row when converting between decimal and binary, and the second row when converting between binary and octal. So given our joke of 25 (decimal), we would write it as

256 128 64 | 32 16 8 | 4 2 1

4 2 1 | 4 2 1 | 4 2 1

0 0 0 | 0 1 1 | 0 0 1

We can then go through each cluster, and add up the columns that have 1s in them, using the values from the second row. Doing so gives us 031, which is the answer to our joke.
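The clustering trick can be sketched in Python as well. The function name binary_to_octal is hypothetical. It pads the binary string on the left so it splits evenly into groups of three, then uses the 4, 2, 1 row on each group.

```python
def binary_to_octal(bits: str) -> str:
    """Convert a binary string to octal, one three-bit cluster at a time."""
    # Pad with leading zeros so the length is a multiple of three.
    padded = bits.zfill((len(bits) + 2) // 3 * 3)
    octits = []
    for start in range(0, len(padded), 3):
        four, two, one = (int(bit) for bit in padded[start:start + 3])
        octits.append(str(4 * four + 2 * two + 1 * one))
    return "".join(octits)


print(binary_to_octal("000011001"))  # 031
```

Notice that no arithmetic ever crosses a cluster boundary, which is what makes octal so easy to read off a row of bits.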

That seems pretty easy, right? Certainly nothing to be scared about.

Who cares?

If you were using a PDP-8 computer, you would! (And 40 years ago, if you were using a computer, there was a good chance you were using a PDP-8.) A lot of early coding was done using a PDP-8, which was one of the most popular machines of its time. The hardware in a PDP-8 used 12 bits for most of its internal systems. This translates easily to 4 octits. Other systems used 18 or 36 bits, which corresponded to 6 or 12 octits.

Since UNIX was first written on some of these systems, you see traces of octal around the system. Most notably, the UNIX permission structure gives read, write, and execute permissions to a file. Since that can be represented as 3 bits, it translates nicely to an octit, and you can still see this reflected in the octal modes of the UNIX chmod command.
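To make the connection concrete, here is a Python sketch that expands an octal mode such as 754 into the familiar rwx letters. The helper names describe_octit and describe_mode are invented for illustration. They are not part of chmod or any library.

```python
def describe_octit(value: int) -> str:
    """Turn one octal digit (0-7) into its read/write/execute letters."""
    return (
        ("r" if value & 4 else "-")    # the 4 column: read
        + ("w" if value & 2 else "-")  # the 2 column: write
        + ("x" if value & 1 else "-")  # the 1 column: execute
    )


def describe_mode(mode: str) -> str:
    """Expand a three-digit octal mode: owner, then group, then others."""
    return "".join(describe_octit(int(octit, 8)) for octit in mode)


print(describe_mode("754"))  # rwxr-xr--
```

Each octit is tested with the same 4, 2, 1 columns used in the conversion table earlier.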

But largely, this is a relic of computing history. The wide popularity and adoption of the IBM System/360 line of computers established the 8-bit byte as the de-facto standard in the late 60s and early 70s. The early Internet developers similarly adopted the "octet" (or 8-bit byte) when describing their Internet Protocol. Since 8 bits can't evenly be represented by octal, octal slowly fell out of favor to be replaced by hexadecimal (base 16, which uses 4 bits for one hexadigit). But that is some math for another time.

Pretty long explanation for one joke, huh? Kinda ruins the punch line. Anyway - enjoy the holiday!

2015-10-14

Dates and Division

Most modern operating systems have a basic scheduler built in. In UNIX, this is the cron command, and it can run a program based on the minute, hour, day of week, day of month, and/or month. So it lets you set a program to run every hour at 15 minutes past the hour, or every Monday, or every month on the first of the month.

But what if you want to run something every five days? For example, we may want a reminder that we need to make a phone call every so often, no matter the day of the week, or set our TARDIS to make a stop every other day. The cron scheduler doesn't have a way to specify this, and neither do many other scheduling systems. The reason is clear - our week has a prime number of days (seven) so isn't easily divisible into smaller units. And our months have inconsistent lengths, so we can't easily divide them up either.

Coding is about problem solving, and we can write some code to handle this. We'll take advantage of the built-in scheduler and then two things you can find in most programming environments:

A way to determine the number of seconds since a known point in time.
Integer mathematics, which allow us to do calculations against the number of seconds.

Let's look at each and how they help us solve our problem.

Epoch Time

Times, dates, and calendars are a tricky problem for computers because the methods we have evolved, as humans, to deal with them are somewhat inconsistent. The full reasons why would be good for a totally separate blog, but we'll learn about some of them as we explore how they impact us as coders.

For now, we need a way to measure how many consistent time units there have been since a fixed point in time. Months aren't consistent since they have a variable number of days. Weeks aren't bad, but they don't divide very well. Days would be good, but even better would be the number of Seconds since it is of small granularity and can be used consistently. (Well, mostly consistently. Leap Seconds are a bit of a problem... but one that most people ignore. We'll talk about them another time.)

Most systems can provide the number of seconds since some specific point in time. This point is known as the "epoch time" and varies based on the programming environment and operating system. In UNIX, the epoch moment is 1 Jan 1970 at midnight UTC while Windows uses 1 Jan 1601 and other systems provide other reference point dates. I'm going to use UNIX epoch time in my examples, but the principles hold true for all of them.

Given this, 4:00 pm UTC on October 14th, 2015 was 1444838400 seconds since the start of the UNIX epoch. If we want to verify this, we can run this command at the command line of a system with a BSD-style date command (such as macOS or FreeBSD):

date -j -u 201510141600.00 "+%+ %s"

We can omit the -j and -u parameters and the time specification and change the format a bit to get the current number of seconds since the epoch. This can run in a cron job and gives us some representation of "now". That's a good start, but how can we turn this into something that happens every five days? For that, we'll need a touch of very simple math.
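The same check can be done from inside a program. A minimal Python sketch, using only the standard library:

```python
import time
from datetime import datetime, timezone

# 4:00 pm UTC on October 14th, 2015, as seconds since the UNIX epoch.
moment = datetime(2015, 10, 14, 16, 0, tzinfo=timezone.utc)
epoch_seconds = int(moment.timestamp())
print(epoch_seconds)  # 1444838400

# And "now", which is what a scheduled job would actually use.
now_seconds = int(time.time())
print(now_seconds)
```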

Divide and Conquer

Since we're going to be dealing with days, we'll need to convert seconds into days. Not a problem - we just need to know how many seconds are in a day (60 seconds * 60 minutes * 24 hours = 86400 seconds). Take the total number of seconds (1444838400 in our example above) and divide it by 86400 and we get... 16722.6666667.

That fractional part is irritating, however. It would be more useful if we knew the number of whole days. Most languages have a few ways to deal with this, and we just need to pick the best way to do so:

We can round the number to the nearest whole number of days (16723 in our example).
We can truncate the number by removing the fractional part (16722 in our example).
Some languages have "integer division", which won't return any fractional part at all, which will work much like dividing and truncating.

We're going to assume integer division in this case since it is consistent with some other math we'll be doing in a minute.

We now have the number of days since the epoch. So what? This isn't really what we want - we want to know which day of our five-day cycle we're in, given our example. Well, we divided by the number of seconds in a day to get days... so we should just divide it by 5 (using integer division) to get what we need, right? Doing so, we'll get 3344, which tells us that there have been 3344 five-day-cycles since the start of the epoch.

Which doesn't really help us. What we need to know is which day in the cycle we're in. For this, we're going to need to use a different bit of integer math called the modulo (which we sometimes shorten as mod). This is a fancy way of saying "the remainder when we divide one number by another". So 16722 mod 5 is 2.
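The arithmetic so far, written out as a Python sketch. In Python, // is integer division and % is modulo. The variable names are just for illustration.

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 86400
CYCLE_LENGTH = 5

epoch_seconds = 1444838400  # 4:00 pm UTC on October 14th, 2015

days = epoch_seconds // SECONDS_PER_DAY  # whole days since the epoch
cycles = days // CYCLE_LENGTH            # complete five-day cycles so far
day_in_cycle = days % CYCLE_LENGTH       # where we are in the current cycle

print(days, cycles, day_in_cycle)  # 16722 3344 2
```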

Not all environments have a modulo operator (though I'm baffled why not), but reproducing it is fairly easy given other integer operators. Remember that 3344 result we got earlier that didn't seem useful? Turns out, we can use it to get the remainder if we really need to. Integer division means that if we take a number, divide it by a value, and then multiply the result by that same value again, we may get a different result than the original number. What is the difference between the two? The remainder. We can represent this as something like:

remainder = value - ( ( value / divisor ) * divisor )

(Assuming, of course, you're doing integer division. Normal division isn't very useful here.)
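As a Python sketch, the do-it-yourself remainder divides with integer division, multiplies the quotient back by the divisor, and subtracts that from the original value. The function name remainder is made up for the example.

```python
def remainder(value: int, divisor: int) -> int:
    """Compute a remainder using only integer division, multiply, and subtract."""
    whole_cycles = value // divisor          # 16722 // 5 is 3344
    return value - whole_cycles * divisor    # 16722 - 16720 is 2


print(remainder(16722, 5))  # 2

# It should agree with the built-in modulo operator.
assert all(remainder(n, 5) == n % 5 for n in range(100))
```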

Either way, we now have a remainder of 2 given our example. What does this mean? That it is the third day of the five-day-cycle we're interested in.

Wait... third day? Not the second? Nope. The first day would have a remainder of 0.

Putting it together

Now that we know which day in the cycle we're in, what do we do? Well, that depends on what we want to do. If we needed to remind us to make a phone call every five days at noon, we might do the following:

Write a program that checks which day in the cycle it is (as above) and, if it is the day in the cycle we need, sends us an email. If it isn't - it should do nothing.
Set up a cron job to call that program every day at noon. The cron job should run every day and leave the cycle-checking to the program.
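The program half of that plan might look something like this Python sketch. The names is_reminder_day and TARGET_DAY, and the choice of 2 as the day in the cycle to act on, are assumptions for illustration. The print call stands in for sending the email.

```python
import time

SECONDS_PER_DAY = 86400
CYCLE_LENGTH = 5
TARGET_DAY = 2  # hypothetical: which remainder counts as "our" day


def is_reminder_day(epoch_seconds: float,
                    cycle_length: int = CYCLE_LENGTH,
                    target_day: int = TARGET_DAY) -> bool:
    """Return True when this moment falls on the chosen day of the cycle."""
    days = int(epoch_seconds) // SECONDS_PER_DAY
    return days % cycle_length == target_day


if __name__ == "__main__":
    # cron runs this every day; most days it quietly does nothing.
    if is_reminder_day(time.time()):
        print("Time to make that phone call!")
```

One caveat: the day count here rolls over at midnight UTC, not local midnight, so a job scheduled close to that boundary could land on the neighboring day of the cycle.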

That's it!

This concept of breaking a problem down into smaller parts is a common one in coding - we'll see it many more times as we explore topics. And the trick of using modulo to break down a big counter into a smaller cycle is also a good one to keep in mind - you'll see this trick used, for example, where you want to group things into clusters of a known size and need to know when you're at the beginning of the cluster.

Most important, however, is developing a solid approach to problem solving and breaking down a task to figure out the best way to approach it. That is the core of good coding.

2015-10-08

hello, world

A traditional "first program" that people write when learning a new programming language is a simple task of displaying "hello, world" as output. It isn't a great first program - it rarely illustrates much of anything about the language. But it is tradition.

So, in keeping with tradition when starting this blog about writing programs - "hello, world".

What is this, anyway?

My goal for this blog is fairly simple. At a time when businesses, educators, and politicians are stressing that kids should "learn to code", I think it is important to take a step back and discuss what that means and why it is important. Or at least why it is important to me - I present these as my own thoughts and opinions.

More importantly - I want to talk about why coding excites me. What is it about this... thing... that I've been doing for over 2/3rds of my life that is so enjoyable?

I will be sharing a fairly wide variety of posts. Many of my posts will be technical, but will focus on a programming concept rather than the syntax. I'm a strong believer that the "slinging code" part of coding is easy - the rest is the more difficult, and important, part. Other posts will be less technical. There will be some code examples given, particularly when I'm talking about a particular language or toolkit, but other times I may use "pseudocode" to illustrate an idea instead. Some posts will be for skilled programmers, while others will be for people who are just beginning to code or may not even code at all.

Most of what I say will focus around five things that I think are most important when it comes to coding:

The art of writing code
The science behind the code we write
The engineering that applies the science
A philosophy that many coders seem to share
Foundations of coding reflected in the history of our profession

Let me talk about each briefly now, but you can expect that we'll be talking about all these aspects in the future.

Writing Code and Computer Science

Typically, when we talk about "science", we talk about discovering fundamental laws of nature or the universe. We seek universal truths. How can we have a "science" about something that is completely artificial and created by humans?

It turns out that there are fundamental laws that underlie the code that we write. Computer Scientists will talk about algorithms and how efficient these algorithms are. At a time when computer hardware gets faster and faster, it is tempting to use inefficient algorithms - but some are so inefficient that no matter how fast our hardware is, they may never solve the problem. Others may take more memory to process than even the most powerful systems today have. Understanding these pitfalls, and how to avoid them, is a hallmark of good coders.

Software Engineering

Again, historically the field of engineering sought to take the fundamental laws discovered by science and apply them to building things of practical use. Civil engineers, for example, take the science of physics and apply it to creating things like buildings and bridges.

The same is true of software engineering, where we take the algorithms discovered by the computer scientist and apply them, and our own knowledge, to creating code that does something specific and practical. When most people think of "coding", they are probably thinking of doing something in the field of software engineering.

As I hope this blog will show, software engineering is really just a small part of the bigger picture.

The Noble Art of Writing Code

Most good coders will talk about the aesthetics of some chunk of code. We may praise our work by saying it is an "elegant hack" or criticize some code we are reviewing by condemning it as a "horrible kludge". We may even say that some code we're reviewing (never that we wrote, however) is "spaghetti code".

These are not precise terms, and good coders may disagree about the particular aesthetics of some code or a programming language, but the best coders do have a feel for how elegant code can be. It goes beyond the strict science and engineering that the code may reflect and has an inherent beauty.

Coding Philosophy - How we Think About Code

Related to the art and beauty of the code is an underlying philosophy about coding in general. In some ways, this is almost a "culture", a way of life, rather than a way of thinking. It almost becomes a cliche or a stereotype for many things, but there are some serious aspects of a coding philosophy that help us become better coders. These are principles that help us explain why some code might be more aesthetically pleasing than others.

Like many philosophies, however, sometimes these turn into "religious wars" - battles for mindshare over a belief system that, fundamentally, doesn't really make a difference. The Windows vs UNIX battles were (and still are) along these lines, but if you want to see a real war, ask a coder if they prefer vi or emacs and beware of the answer.

The History of Computers and Coding

Some of you might be asking "what is vi or emacs and why do they have silly names?" And that is best answered by looking at the history of this field. In an industry where things are constantly changing, is it really worth looking at how we got here? I think so. And while some historical figures and actions in computing are well known and understood, others are less so, but still crucially important.

Not all of computing history is relevant to coding, but much of it has shaped the coding philosophy. I'll try to highlight some events that help explain how and why our code works and behaves the way it does today.

Let's Get Running

Hopefully that lays the groundwork for what we'll be exploring. I hope you'll join me on this journey through your comments here or via the various links I've provided about how to get a hold of me.

And most of all - get coding!

DLOAD "CODEBLOG"

LIST

RUN