(Note: This document is not, and will never be, completed - so everything is not exactly tied up in a bow at the end. None the less, you may find some useful information in it. The source is complete. You have to enable the correct #define at the start for ATmega16 [#define MEGA16] vs. ATmega32 [#define MEGA32] and you have to either comment out the 2 lines after "; 8 legs" or the 1 line after "; 4 legs" to choose walking vs. trotting. The source is currently set for ATmega16 and walking.)

V8 Code

So, you want to know how to control a bunch of servos with your AVR, eh? And you think I'm just going to reveal all of my secrets, eh? Be careful what you wish for.

Theory

Standard RC servos expect a positive pulse every 20 msec. The length of the pulse represents a position command to the servo electronics. The exact pulse length required to command a specific angle depends on the brand and model of the servo. (Unfortunately it can also vary by individual servo, servo power-supply voltage, and probably other conditions as well.) For the Futaba S3004 servos used in V8, the servos are capable of a bit more than 180 degress of rotation, the full range of which can be commanded by using pulse widths between about 0.5 msec and 2.25 msec. The servo electronics will actually respond to pulse widths outside of those limits, but such commands result in the servo trying to go past its mechanical stop. (RC servos typically include a mechanical stop to prevent the servo from rotating its internal potentiometer beyond its breaking point and/or usable range. Commanding the servo to hit the stop is hard on the servo and should be avoided.)

V8's controller board is designed to accept an ATmega16 or ATmega32 microcontroller. These Atmel mcus (micro-controllers) are rated for operation up to 16 MHz. V8 uses a crystal oscillator to run the mcu at the full 16 MHz rating. The timing needed to command the servos can be derived from this crystal oscillator. For example, to command the servo to one end of its range would require a 0.5 msec positive pulse every 20 msec, or a repeating pattern of 0.5 msec positive followed by 20 - 0.5 = 19.5 msec negative. That corresponds to 0.5x10^-3 seconds * 16x10⁶ cycles/second = 8000 clock cycles of positive, followed by 19.5x10^-3 seconds * 16x10⁶ cycles/second = 312,000 clock cycles of negative. So to command the servo to one end, the mcu just needs to set an output pin high, wait 8000 clock cycles, set the output pin low, wait 312,000 cycles, and repeat. (The repeat part is important. The servo will stop moving towards and/or not hold the target angle if the signal is dropped.) Commanding a different angle to the servo just requires using a different division of clock cycles between the positive and negative periods. Driving 16 servos just requires replicating this process for 16 different output pins with 16 different timings.

Design

V8 has 16 servos. The simplest approach to control them would be to use 16 timers in PWM (Pulse Width Modulation) mode automatically driving 16 PWM output pins and let the mcu hardware do all of the work. Unfortunately, no AVR (as of today) has that many timers or PWM outputs, and most AVRs aren't even close. So instead, general purpose I/Os (Input/Output pins) need to be configured to be output pins and driven by software, and that software needs to be driven by some small number of interrupts. (It could also be done without using interrupts - however that requires cycle counting. Using interrupts, only the servo-timing code [for the most part] needs to deal with the trickiness of using interrupts. Using cycle counting would require that the timing of all of the code [not just the servo-timing code] be very carefully and dilligently controlled. That is unless you built a virtual machine on top of the mcu which could do this inherently. But I digress.)

Since interrupts must be used both to set each output pin high and to set it low, driving 16 servos actually requires 32 events to be scheduled. One idea would be to use a single interrupt, and schedule that interrupt to be triggered 32 times during each 20 msec period. To keep the scheduler simple (or more accurately, completely avoid the need for a proper scheduler), it would be nice if the order of the events were fixed. Since the length of the pulse required by each servo is (at least in theory) completely independent, this requires preventing the potential active periods from two servos from overlapping. (Otherwise their order might vary and then a dynamic scheduler is needed to sort things out.)

For example, we could try to repeat the following during each 20 msec period:

Interrupt Trigger Count Servo Output New Output Setting
1 1 high
2 low
3 2 high
4 low
5 3 high
6 low
... ... ...
31 16 high
32 low
Single Interrupt Drives 16 Servos

Single Interrupt Drives 16 Servos
Interrupt Trigger Count	Servo Output	New Output Setting
1	1	high
2	low
3	2	high
4	low
5	3	high
6	low
...	...	...
31	16	high
32	low

However, there is a problem with that idea. Say servo #1 turns on at time 0. Then servo #1 may turn off any time between 0.5 msec and 2.25 msec. So the period from 0 to 2.25 msec is the active period for servo #1. (This is modulus 20 msec. Since each servo needs to be triggered every 20 msec, the period from 20+0 to 20+2.25 msec is also the active period for servo #1.) In other words, using this scheme a 20 msec period only allows for the scheduling of at most floor(20 / 2.25) = 8 servos on a single interrupt. Since V8 has 16 servos, we need two interrupts. So the real picture looks like this:

InterruptA Trigger Count Servo Output New Output Setting Time (msec, modulus 20 msec)
1 1 high 0
2 low 0 + servo #1 pulse width
3 2 high 2.5
4 low 2.5 + servo #2 pulse width
5 3 high 5
6 low 5 + servo #3 pulse width
... ... ... ...
15 8 high 17.5
16 low 17.5 + servo #8 pulse width
First Interrupt (InterruptA) Drives First 8 Servos

First Interrupt (InterruptA) Drives First 8 Servos
InterruptA Trigger Count	Servo Output	New Output Setting	Time (msec, modulus 20 msec)
1	1	high	0
2	low	0 + servo #1 pulse width
3	2	high	2.5
4	low	2.5 + servo #2 pulse width
5	3	high	5
6	low	5 + servo #3 pulse width
...	...	...	...
15	8	high	17.5
16	low	17.5 + servo #8 pulse width

InterruptB Trigger Count Servo Output New Output Setting Time (msec, modulus 20 msec)
1 9 high 0
2 low 0 + servo #9 pulse width
3 10 high 2.5
4 low 2.5 + servo #10 pulse width
5 11 high 5
6 low 5 + servo #11 pulse width
... ... ... ...
15 16 high 17.5
16 low 17.5 + servo #16 pulse width
Second Interrupt (InterruptB) Drives Next 8 Servos

Second Interrupt (InterruptB) Drives Next 8 Servos
InterruptB Trigger Count	Servo Output	New Output Setting	Time (msec, modulus 20 msec)
1	9	high	0
2	low	0 + servo #9 pulse width
3	10	high	2.5
4	low	2.5 + servo #10 pulse width
5	11	high	5
6	low	5 + servo #11 pulse width
...	...	...	...
15	16	high	17.5
16	low	17.5 + servo #16 pulse width

Now for some (most serendipitous) details...

We can defer setting the next trigger until after the current trigger goes off. That way we never have to specify more than 2.5 msec (or 2.5x10^-3 * 16x10⁶ = 40,000 cycles) into the future. That allows a 16-bit timer to be used. (Like any number between 32,768 [2¹⁵] and 65,535 [2¹⁶-1], 40,000 requires 16 bits to encode.) The ATmega16 and ATmega32 chips just happen to have a 16-bit timer: "16-bit Timer/Counter1" (called Timer1 in this text).

(If they didn't have a 16-bit timer it wouldn't prevent this solution from being used. But it would either mean you couldn't use the full 16 MHz resolution, or it would get a lot uglier and CPU-intensive because you would have to synthesize a 16-bit timer from something less capable, like an 8-bit timer.)

So we've got a timer - but we need two independent interrupts and the chips only have a single 16-bit timer. No problem. Timer1 has two "Output Compare Units". These are controlled with "Output Compare Register A" (or OCR1A - the "1" comes from the timer #) and "Output Compare Register B" (or OCR1B). How it works is that I let Timer1 spin continuously (keep counting from 0 to 65,535 [2¹⁶-1] over and over again). I then use OCR1A to schedule triggers for the first interrupt, and OCR1B to schedule triggers for the second interrupt. When Timer1 becomes exactly equal to the value in OCR1A, it triggers the "Timer/Counter1 Compare Match A" interrupt (called InterruptA in this text). When Timer1 becomes exactly equal to the value in OCR1B, it triggers the "Timer/Counter1 Compare Match B" interrupt (called InterruptB in this text).

Using a cyclical timer may seem a bit odd the first time you are exposed to it. (You have been warned.) Now take a look:

InterruptA Trigger Count Servo Output New Output Setting Time (msec) OCR1A setting (mcu cycles)
1 1 high 0 0x10^-3 * 16x10⁶ mod 2¹⁶ = 0
2 low 0 + servo #1 pulse width (depends on pulse width - somewhere between 1 and 39,999)
3 2 high 2.5 2.5x10^-3 * 16x10⁶ mod 2¹⁶ = 40,000
4 low 2.5 + servo #2 pulse width (depends on pulse width - somewhere between 40,001 and 65,535, or between 0 and 14,463)
5 3 high 5 5x10^-3 * 16x10⁶ mod 2¹⁶ = 14,464
6 low 5 + servo #3 pulse width (depends on pulse width - somewhere between 14,465 and 54,463)
... ... ... ... ...
15 8 high 17.5 17.5x10^-3 * 16x10⁶ mod 2¹⁶ = 17,856
16 low 17.5 + servo #8 pulse width (depends on pulse width - somewhere between 17,867 and 57,855)
17 1 high 20 20x10^-3 * 16x10⁶ mod 2¹⁶ = 57,856
18 low 20 + servo #1 pulse width (depends on pulse width - somewhere between 57,857 and 65,535, or between 0 and 32,319)
19 2 high 22.5 22.5x10^-3 * 16x10⁶ mod 2¹⁶ = 32,320
20 low 22.5 + servo #2 pulse width (depends on pulse width - somewhere between 32,321 and 65,535, or between 0 and 6,783)
21 3 high 25 25x10^-3 * 16x10⁶ mod 2¹⁶ = 6,784
22 low 25 + servo #3 pulse width (depends on pulse width - somewhere between 6,785 and 46,783)
... ... ... ... ...
Cyclical Timing of InterruptA

Cyclical Timing of InterruptA
InterruptA Trigger Count	Servo Output	New Output Setting	Time (msec)	`OCR1A` setting (mcu cycles)
1	1	high	0	0x10^-3 * 16x10⁶ mod 2¹⁶ = 0
2	low	0 + servo #1 pulse width	(depends on pulse width - somewhere between 1 and 39,999)
3	2	high	2.5	2.5x10^-3 * 16x10⁶ mod 2¹⁶ = 40,000
4	low	2.5 + servo #2 pulse width	(depends on pulse width - somewhere between 40,001 and 65,535, or between 0 and 14,463)
5	3	high	5	5x10^-3 * 16x10⁶ mod 2¹⁶ = 14,464
6	low	5 + servo #3 pulse width	(depends on pulse width - somewhere between 14,465 and 54,463)
...	...	...	...	...
15	8	high	17.5	17.5x10^-3 * 16x10⁶ mod 2¹⁶ = 17,856
16	low	17.5 + servo #8 pulse width	(depends on pulse width - somewhere between 17,867 and 57,855)
17	1	high	20	20x10^-3 * 16x10⁶ mod 2¹⁶ = 57,856
18	low	20 + servo #1 pulse width	(depends on pulse width - somewhere between 57,857 and 65,535, or between 0 and 32,319)
19	2	high	22.5	22.5x10^-3 * 16x10⁶ mod 2¹⁶ = 32,320
20	low	22.5 + servo #2 pulse width	(depends on pulse width - somewhere between 32,321 and 65,535, or between 0 and 6,783)
21	3	high	25	25x10^-3 * 16x10⁶ mod 2¹⁶ = 6,784
22	low	25 + servo #3 pulse width	(depends on pulse width - somewhere between 6,785 and 46,783)
...	...	...	...	...

One thing which may be surprising is that when we finish processing all 8 servos the first time, and then come back around to servo #1, the correct OCR1A setting is not 0 like it was the first time. (It's 57,856.)

So what exactly does everything in that table mean? It means this:

At the start of time we set OCR1A to 0 (and do all of the other configuration required to set up and enable Timer1 and InterruptA).
At some point in time (arbitrarily call that time "0"), Timer1 reaches 0 and InterruptA is triggered.
- InterruptA sets servo output #1 to high.
- InterruptA then places a new value in OCR1A, for example if the pulse width for servo #1 is 1 msec it sets OCR1A to (0+1)x10^-3 seconds * 16x10⁶ cycles/second = 16,000.
At 1 msec Timer1 reaches 16,000 and InterruptA is triggered.
- InterruptA sets servo output #1 to low.
- InterruptA then places the next value in OCR1A, this time 40,000.
At 2.5 msec Timer1 reaches 40,000 and InterruptA is triggered.
- InterruptA sets servo output #2 to high.
- InterruptA then places a new value in OCR1A, for example if the pulse width for servo #2 is 1 msec it sets OCR1A to (2.5 + 1)x10^-3 seconds * 16x10⁶ cycles/second = 56,000.
At 3.5 msec Timer1 reaches 56,000 and InterruptA is triggered.
- InterruptA sets servo output #2 to low.
- InterruptA then places the next value in OCR1A, this time 14,464.
(At 4.096 msec Timer1 wraps around from 2¹⁶-1 to 0.)
At 5 msec Timer1 reaches 14,464 and InterruptA is triggered.

...and so forth.

Implementation

First things first. ATmega chips use a fixed-address jump vector for all of their interrupts. There's not enough room there to do much of anything, so you generally need to drop some type of branch statement there instead. rjmp has enough range for my purposes and is faster than a jmp:

   rjmp cma_interrupt ; timer/counter1 compare match A (InterruptA)
   nop
   rjmp cmb_interrupt ; timer/counter1 compare match B (InterruptB)
   nop

That was the easy part. Now, before getting started with any of the actual interrupt code, there are a whole series of tricks needed. (Actually they are not needed. But they make the interrupts fairly fast and accurate, and using them makes everyone think you're cool. Really!)

Trick #1: Have the interrupt code enable further interrupts ASAP.

The interrupt is going to have to do some book-keeping and update some port registers. The timing of the port-register updates is important, so allowing other interrupts before finishing that task would be a bad idea. On the other hand, the timing of the book-keeping activity is not important, so allowing other interrupts to happen during the book-keeping phase will prevent the timing-critical parts of other interrupts from being delayed.

So, update the ports first, then enable interrupts, then do any book-keeping.

(Yeah, this means your interrupts can be interrupted. Wussies will say you should never do this. That's what makes them wussies.)

Trick #2: Interrupts can be enabled sooner than you might realize.

The sei (Set Global Interrupt Flag) instruction which is used to enable interrupts guarantees that the instruction following the sei will be executed before any pending interrupts. This means that it can be placed one instruction earlier than the "obvious" place.

Trick #3: Reserve registers for the interrupts.

Saving and loading registers in interrupts is a real drag. ATmegas have 32 registers. That's a full ton on the register scale. (V8 doesn't even use all of those registers.) By reserving some registers for use by interrupts (and not using them in non-interrupt code), those registers don't need to be saved/restored in the interrupt code. In the specific case of V8, the time-critical portion of the interrupt is the updating of the port registers. So:

Reserve one register for each port.
In the port-register-upate phase of a given interrupt, just send the reserved register directly to the correct port via an out instruction.
During the book-keeping phase of a given interrupt load the reserved register with the correct value for the next invocation of the interrupt.

Simple, no? The registers (r2 through r5) are reserved thusly:

; these are reserved for the compare match interrupts
.def PORTA_reg=r2
.def PORTB_reg=r3
.def PORTC_reg=r4
.def PORTD_reg=r5

Remember, those aren't the actual ports - just registers used to buffer values on their way to the ports. The symbolic names for the actual port addresses do not include the _reg suffix:

; 1 is high (output)
.equ PORTA = $1B
.equ PORTB = $18
.equ PORTC = $15
.equ PORTD = $12

Trick #4: Divide the work cleanly.

On V8, the ATmega pins which generate the servo signals are spread across all four I/O ports. (Like many 8-bit mcus, ATmegas have the concept of an I/O port, which is a collection of up to 8 I/O pins which can be controlled by writing a single, byte-wide register. The board could have just used two ports for the servos, but when I had my PCB layout hat on I decided that was not a good idea. Wiring up the board like that on a two layer PCB would have required hacking away large chunks of the ground plane to route traces on the back of the board.)

So there are four ports and two interrupts. It just happens to be the case that exactly four pins of each port are wired as servo outputs. If the correct servo outputs are assigned to each interrupt, then each interrupt will only have to update two ports instead of all four ports. (Technically each interrupt invocation need only update one port. However figuring out which of the two ports to update would actually take more cycles than just updating both ports. There's a trick for getting around that, but the trick isn't applicable on AVRs and other MCUs whose interrupt vector table is in flash intead of RAM.)

Trick #5: Dynamic vector chaining.

Each time the interrupt is triggered it will need to do something different. Sometimes it will set a bit. Sometimes it will clear a bit. It may be this bit or that bit, this port or that port.

There are several ways to solve this issue. For example, each time the interrupt is triggered it could increment a counter, modulus the number of different tasks through which it must cycle. It could then use that counter to index into a table of ports, operations, and bit-masks which describe what must be done. Or it could use that counter to jump through a branch table to the correct piece of code for a given invocation of the interrupt. The branch table has a performance advantage over the data table - instead of having to interpret table data, it just executes code that does exactly what is needed. Interestly enough (cue Twilight Zone music), this advantage of branch tables can be pressed into optimizing (nearly out of existence) the branch table itself.

Instead of doing these operations on the "front end" (i.e., before reaching the task-specific code):

Load index.
Increase index (modulus number of tasks).
Save index.
Use index to get target address from branch table.
Jump to target address.

We can reduce that to:

Load target address from fixed location in memory.
Jump to target address.

This optimization incurs some (fairly small) cost on the "back end" (i.e., in the task-specific code):

Store (known, hard-coded) target address for next invocation of interrupt to fixed location in memory.

(This can actually be optimized even further - but that will be covered later.)

Here is some pseudo-code which should help make things more clear:

initialize_program:
   MEM[target_address] <- address_of(interrupt_task1)
   <configure and enable interrupt>
   <...>

interrupt:
   move reg <- MEM[target_address]
   jump reg

interrupt_task1:
   <task1-specific code goes here>
   MEM[target_address] <- address_of(interrupt_task2) ; we just executed task1 - make it so the next interrupt executes task2
   return

interrupt_task2:
   <task2-specific code goes here>
   MEM[target_address] <- address_of(interrupt_task3) ; we just executed task2 - make it so the next interrupt executes task3
   return

...

interrupt_taskN:
   <taskN-specific code goes here>
   MEM[target_address] <- address_of(interrupt_task1) ; we just executed taskN - make it so the next interrupt loops back to task1
   return

Taking a "Goedel, Escher, Bach" view of things, this solution for the interrupt task address, the solution for the interrupt timing, and the solution for the port values all bear a striking similarity to each other. (I.e., part of processing each interrupt is to define the when (timing), where (address) and what (port value) of the next invocation of the interrupt.)

Now for something completely different - actual code!

With all of those tricks under our belt, we are now ready to write a whopping 10 lines of code. (Hey, that's a full day's work in the biz.)

cma_interrupt:
   out PORTA, PORTA_reg   ; update servo signal
   sei                    ; (make atomic to guarantee timing -
   out PORTB, PORTB_reg   ;  don't let low-priority/slow interrupts execute)

   push ZH                ; save ZH
   push ZL                ; save ZL
   in ZL, SREG            ; get &
   push ZL                ;   save status register (SREG)
   lds ZL, cma_vector     ; get target address low-byte
   lds ZH, cma_vector + 1 ; get target address high-byte
   ijmp                   ; jump to target address (ZH:ZL)

Do it again (the next day of course) for the Compare Match B interrupt:

cmb_interrupt:
   out PORTC, PORTC_reg   ; update servo signal
   sei                    ; (make atomic to guarantee timing -
   out PORTD, PORTD_reg   ;  don't let low-priority/slow interrupts execute)

   push ZH                ; save ZH
   push ZL                ; save ZL
   in ZL, SREG            ; get &
   push ZL                ;    save status register (SREG)
   lds ZL, cmb_vector     ; get target address low-byte
   lds ZH, cmb_vector + 1 ; get target address high-byte
   ijmp                   ; jump to target address (ZH:ZL)

The above code represents the "generic, front end" portion of the interrupt code. The "task-specific, back end" code is not described by this document (see the source instead). However, the last part of the "back end" code is actually generic, and by necessity must be a sort of mirror image of part of the above code, so is presented here:

   pop ZL                 ; restore
   out SREG, ZL           ;    status register (SREG)
   pop ZL                 ; restore ZL
   pop ZH                 ; restore ZH
   reti                   ; return from interrupt

Trick #6: Glitch free, delay free updates via properly ordered double-updates.

Note that the "front end" code in the previous section just overwrites the entire 8 bits of each port, even though the interrupt code is only responsible for 4 bits of each port (the servo signals). How are you supposed to modify the other port pins in non-interrupt code without the interrupt code and the non-interrupt code messing each other up? Time for another trick.

There are three parts to this trick:

The sbi and cbi instructions can be used to atomically change individual port bits, which avoids blowing away changes made by interrupts. This doesn't solve the problem by itself though, because if you change a bit from the non-interrupt code, then when the interrupt code executes it will just blow away your change.
The or and and instructions can be used to atomically change individual bits in the registers reserved for the interrupts. That way, when the interrupt overwrites the port, it will preserve the port pin values written by the non-interrupt code.
Do #2 before #1. Doing #1 before #2 can cause a "glitch" - a couple of port pin transitions that were never intended. This can happen if #1 changes the port pin, an interrupt comes in and blows away the change, then #2 modifies the interrupt-reserved register, and then the next interrupt undoes the mistake of the first interrupt - but only after untold damage was already done.

Wrap it up in some macros (for safety's sake of course), and you get something like this:

; these are reserved for use by macros
.def mv1=r16
.def mv2=r17

; These macros set/clear port bits without disabling interrupts,
; without glitches, and without waiting.  (They are needed because
; the primary owners of the ports and associated registers are the
; compare match interrupts.)
;
; @0 = name of port (e.g., PORTA)
; @1 = number of bit (e.g., 3)
.macro safe_sbi
   ldi mv1, (1 << @1)
   or @0_reg, mv1
   sbi @0, @1
.endmacro
.macro safe_cbi
   ldi mv1, ~(1 << @1)
   and @0_reg, mv1
   cbi @0, @1
.endmacro

Note that when the first macro parameter (@0) passed in is, for example, PORTA, the @0_reg construct resolves to PORTA_reg (which in turn resolves to register r2). An example of invoking this macro looks like this:

   safe_sbi PORTA, 6 ; turn on green LED

There is a consequence to the use of this trick (which itself was only needed as a consequence of previous tricks). And that is that the interrupt code, when updating its reserved registers, must only use atomic updates targeted to the specific bits it "owns" (the servo signal bits). (For example, and instructions which only clear servo bits and or instructions which only set servo bits are OK. A ldi instruction which assigns all bits would be bad.) Otherwise the interrupt code could blow away changes made to a reserved register in part 2 above, and subsequently mess up the port values. Oh, what a tangled web we weave.

Trick #7: Vector optimization.

For the Dynamic Vector Chaining trick, pseudo-code was presented that looked like this:

   MEM[target_address] <- address_of(interrupt_task)

The pseudo-code storage location MEM[target_address] is actually cma_vector for InterruptA and cmb_vector for InterruptB, as shown in the "front end" interrupt code. A real implementation for InterruptA code could look like this:

   ldi ZL, LOW(interrupt_task)
   sts cma_vector, ZL
   ldi ZH, HIGH(interrupt_task)
   sts cma_vector + 1, ZH

With 16 servos, there will be 32 different instances of the "back end" interrupt code. They won't be all that big and will be right next to each other in memory. The result is that the above code would generally be replacing the high-byte of the target address with the same value it had last time. The following macro takes the old address (@0) and new address (@1) and writes optimized code for performing the update.

   ; note that conditional assembly expressions must not include forward references,
   ; so to handle forward references the vector_optimization must be disabled
.if vector_optimization
 .if LOW(@0) != LOW(@1)
   ldi ZL, LOW(@1)
   sts cm_vector, ZL
 .endif
 .if HIGH(@0) != HIGH(@1)
   ldi ZH, HIGH(@1)
   sts cm_vector + 1, ZH
 .endif
.else
   ldi ZL, LOW(@1)
   sts cm_vector, ZL
   ldi ZH, HIGH(@1)
   sts cm_vector + 1, ZH
.endif

The End

(This document left intentionally incomplete. Go read the source for the full scoop.)

Last Modified: June 3rd, 2006
V8 main page: http://my.execpc.com/~steidl/walkers/v8/index.html
home: http://my.execpc.com/~steidl/
e-mail: steidl@execpc.com