### Preliminaries

> "God made the integers, all the rest is the work of man." — Leopold Kronecker

Our computers are really good at doing math. When we write programs, we can just throw some equations at the computer and it will dutifully calculate them very fast. But is it fast enough? How can we make it faster? Our computers are good at math, and they are fast, but they are better and faster at some things than others. What I have to say here applies mostly to the smaller processors we tend to use on our robots and other embedded devices: AVR (Arduino) processors, PICs, and ARM chips are typical. It also applies to modern PC-type computers, but not as much.

I will be using some terms that you may not be very familiar with, so let's explain those first. First is the "clock cycle," or just "cycle" or "clock." I will use this a lot. Every (well, almost every, but that's another story) processor has a clock, or oscillator, that times all its actions. The faster the clock on any given processor, the faster it can get work done. On older processors it normally took from 2 to 16 clock cycles to perform a single assembly language operation. Most modern processors like the AVR and ARM can perform most instructions in a single clock cycle, though even on those there are almost always some instructions that take more than one clock. A 20 MHz AVR chip has 20 million clock cycles in one second. A 50 MHz LPC1114 has 50 million. A 3 GHz PC has 3 billion. Performance on different processors varies widely, so counting clock cycles is usually the best way to describe how long something will take, and I will use it frequently.

The next term I will use is "math unit" or "integer math unit" or "floating point math unit," abbreviated "FPU." I may also use the term "ALU" for "arithmetic logic unit." All the processors we use have one or more parts inside that actually do the math. Some are very simple and can't do much more than a simple addition. Some are very complex and can do very complex math, like calculating trig functions (sine, cosine, tangent) without help. The processors we will be discussing most tend to fall toward the lower end. What types of arithmetic the processor can do on its own, without help, plays a big part in how we write our code to make it efficient. Now that we have the preliminaries out of the way, let's get to it.

### Integers vs Floating Point

Do you remember learning about scientific notation in school? You probably found it a bit more difficult and confusing than working with "normal" numbers. Same for the computer: floating point is essentially scientific notation in binary, and every operation means aligning exponents, doing the arithmetic, and re-normalizing the result. One thing that should be noted: the code to perform floating point on our simple processors is rather large, often several kilobytes. If you don't perform ANY floating point math, those routines won't be included and the space will be available for other uses.

### Avoid Division Whenever Possible

How about division? Do you remember learning that in school? Addition, subtraction, and multiplication were fairly simple. They have simple rules that you follow to get the answer. But division took some guesswork, and chances are you probably still pull out a calculator whenever you need to divide large numbers. Again, the same is true with the computer. The process the computer uses to divide is much like the way we do it and even involves some guesswork. As a result, division is more complex and normally takes longer, even on processors that have hardware divide instructions. On those processors it is not unusual for a divide to take two to 32 times longer than a multiply, even for integers. And many processors that do have multiply instructions don't even have a divide instruction, which makes it take much longer: the division has to be done as a separate program piece. Floating point division is even worse. It typically takes at least twice as long as multiply on a machine with floating point hardware, and longer on machines without it, where a floating point division is a rather complex program all by itself. On top of that, it is full of chances to introduce inaccurate results.

It is often rather easy and almost always much faster to avoid division when your program is running by dividing ahead of time. Especially with floating point numbers. How do we do that? One thing I see a lot is code to convert the number of microseconds an ultrasonic module takes to receive an echo into a distance in millimeters. The code will usually look something like this:
```
distance_mm = microseconds / 171.0;
```

That code causes the program to (obviously) divide the number of microseconds by 171.0 (the .0 on the end makes sure the compiler does a floating point division). But what if we put this line up at the start of our program so it only runs once:

```
float distance_divisor = 1 / 171.0;
```

and use this line instead:

```
distance_mm = microseconds * distance_divisor;
```

We calculate the division only once and replace the division that gets done many times with a multiply! That will speed up our code. The result will be the same (or at least close enough), so we don't lose anything except wasted time. As a bonus, when the compiler sees that 1/171.0 it knows all the values. With integers it would actually do the division BEFORE your program ever runs, at the time it is compiling. Many compilers won't do that with floating point by default, for rounding-related reasons I won't go into here. This is called "constant folding," in case you care.

With integers it is a bit harder to convert division into multiplication. But there are some nifty tricks we can use. First, why is it harder? Well, as we saw above, dividing by a number n is the same as multiplying by 1/n. For any integer other than 1 (or -1), that will yield a value less than 1, which in integers can only be 0. But how about we use a form of number scaling to accomplish the same thing? Let me explain. What happens when you take a decimal number, say 110, and drop off the rightmost digit, moving (or shifting) the other digits one place to the right? You end up with 11, which happens to be 110 divided by the base of our number system, ten. So an easy way to divide by ten is to "shift" the number right one place. You can divide by any power of ten this way: 1 (power of 0), 10 (power of 1), 100 (2), 1000 (3), etc. What happens if you are using base 2 (binary), the number system used in computers? Well, let's have a look. Since computers deal with fixed numbers of bits, I will write the binary numbers that way. Take the number 27 in binary:
27 = 00011011b
shift all the bits one to the right and fill the empty space on the left with a 0:
00001101b = 13
We lose the one that drops off the right and end up with 27/2 dropping the remainder, or 13. We can divide by any power of 2 that way: 2, 4, 8, 16, 32, 64, etc. Well, that's handy if we need to divide by a power of 2. But what about other numbers? Sometimes a power of 2 is "close enough" and we can just use the closest one. But what if it isn't?

For instance, let's look at that ultrasonic calculation from earlier: distance_millimeters = microseconds / 171; The two closest powers of 2 are 128 and 256. Neither is very close, and probably not "close enough." What to do? Let's go back to how we did it with floating point. Start by dividing 1/171. My calculator gives me 5.8479532 × 10^-3, or 0.0058479532. Not much good for integer arithmetic. But let's multiply this by a convenient power of 2, say 1024 (2^10). I get 5.988 and some more digits that won't matter. Round that off to the nearest integer, 6, which is pretty close. Now, since we have multiplied by 1024, we can use this calculation:
```
distance_millimeters = microseconds * 6 >> 10;
```
We multiply by 6. The ">>" operator in many languages, including C, C++, C#, and Java, means "shift right" by the number of bits that follows, in this case 10. We saw above that this is division by 2^10, or 1024. We multiplied by 1024 ahead of time to get the 6, then divide by 1024 in the code. The two 1024s cancel in the end, we have some nice integer arithmetic that is probably close enough, and things end up well. One caution: make sure the intermediate result of microseconds * 6 fits in your integer type, or use a wider one. But what did we gain? We traded a division for a shift. Did that help? Yes. It turns out that pretty much any processor can perform a shift in one clock cycle per shifted bit. In this case, ten clock cycles. Sometimes, on say an 8 bit processor doing shifts on values larger than 8 bits, it may take 2 or 4 clocks per shifted bit. But that is the slowest it gets. A ten bit shift in a sixteen bit int on an Arduino will take less than 20 clocks. Much better than the several dozen for a divide. On other processors, like most ARM processors, there is special hardware called a "barrel shifter" that can shift any number of bits up to 32 in a single clock! That is a huge improvement! So much so, that the barrel shifter is a very common addition to processors, and most compilers will try to convert any division by a power of 2 to a shift.

This morning I was writing some code to convert millimeters to inches. The "correct" way is to divide by 25.4. But since 1/25.4 is 0.03937, I multiplied that number by 1024 to get 40.31488, which I rounded to 40. Then I did the calculation with this:

```
inches = mm * 40 >> 10; // multiply by 40 and divide by 1024
```

which is equivalent to multiplying by 0.0390625, or dividing by 25.6. Close enough. That was what prompted me to write this tip.

### Never Put off Until Runtime What You Can Do at Compile Time (or Earlier)

When we write a program in C, C++, Arduino's bastardized mix of the two, Java, or many other languages, it must be "compiled" before we can run it. Modern compilers are great and will do a lot to make our code more efficient. They can't read our minds, though, and sometimes we need to give them a little help, as we've seen above. I mentioned earlier that if you have an equation with all integer constants, the compiler will often perform that calculation at "compile time," that is, when it is compiling your code into machine language. That way, it doesn't have to perform it every time your program runs, over and over. It normally won't do that for floating point numbers because that might lead to rounding and calculation errors. But often we can be reasonably sure those errors won't be a problem, so we can do some of those calculations ourselves ahead of time. Or we can use integers where appropriate and simplify our equations so that the constants are together and available for the compiler to calculate ahead of time. Sometimes I see people write stuff like this:
```
distance = microseconds / 86 / 2.0;
```

If that were an all-integer calculation, the compiler would probably combine the two divisors (86 × 2 = 172) ahead of time and only have to do one division at runtime, when the program is actually running. With floating point, it won't. It will do BOTH divisions. We can help out a lot by simply doing this:

```
distance = microseconds / 172.0;
```
Then the compiler only has to do one division. As we saw above, there are better ways for that particular calculation, but similar opportunities pop up a lot. If you absolutely must use the more complex calculations, try to make them as simple for the computer as you can.

### The End

I have been at this game for a very long time. I learned to program on 2 MHz Z-80s and 1 MHz 6502s that took several clocks to perform the simplest instructions. They could usually only add or subtract 8-bit numbers. Anything more complex took loads of code to accomplish. Often the programs were written in assembly language, and the few compilers available were very primitive and didn't do much to make your code better. Learning to program efficiently was simply a necessity. Now our development tools are much better, but we can still help the compiler out and make our programs run better, or accomplish more in any given amount of time.

I hope what I've written here is helpful. There are lots of little tricks one can do to make programs more efficient. I have only scratched the surface and addressed things I often see in code on the web. Thanks for reading and good luck.