Mini microcontroller breakout board

bpiphany

06 Dec 2013, 11:51

One day I felt the need, the need to go smaller..
abraham.JPG
I've posted some about it in one of my threads on that other forum. Now that it is finished and tested I think it deserves a thread of its own. It's a bit smaller than the Teensy. It needs an external USB port and reset button.
halfsize.JPG
With right angle header pins it turns into an SMT device. The leads could be clipped a bit to make it narrower, and the legs turned inwards to make it look like a J-lead package.
smd.JPG
It has an ATmega32u2 microcontroller and three HC138 decoders, giving a total of 24 mutually exclusive selectable outputs (active low, or active high with HC238s) plus 15 general-purpose IO pins (3 of them with PWM). The rows are spaced 0.300" apart, and the pin pitch is 0.050". The only additional components required are the USB connector and some type of reset switch.
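A rough sketch of how 24 outputs can be addressed through three 3-to-8 decoders. It assumes the outputs are numbered 0 to 23, that the low three bits of the number drive the decoders' address inputs, and that the remaining bits pick which decoder is enabled. The names `strobe_select` and `decode_strobe` are made up for this example, and the real pin assignment is whatever the schematic says, not what this code assumes.

```c
#include <stdint.h>

/* Hypothetical split of an output number into decoder and address.
 * Assumption: outputs 0..7 sit on the first HC138, 8..15 on the
 * second and 16..23 on the third. The real board may differ. */
struct strobe_select {
    uint8_t decoder;  /* which of the three decoders to enable: 0..2 */
    uint8_t address;  /* value for the three address inputs: 0..7 */
};

static struct strobe_select decode_strobe(uint8_t output)
{
    struct strobe_select sel;
    sel.decoder = (uint8_t)(output >> 3);  /* upper bits choose the chip */
    sel.address = (uint8_t)(output & 7u);  /* lower three bits choose the pin */
    return sel;
}
```

Since only one decoder is enabled at a time and each decoder activates only one of its eight pins, at most one of the 24 outputs is active at any moment.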
The schematic for those interested. KiCAD schematic symbol and PCB module are attached in the zip file.
mini_schematic.png
output.png


I also built a test/programming board for it (since there are no applications to test it in, yet). Sorry about the poor light in the blinky video.
tester.JPG
Assembled, I think it could be very useful in DIY keyboard projects. I was able to build a couple by hand soldering, but that was far from smooth, and it would be too much to ask of people in general. With a solder paste stencil and hot air or an oven, I manage to get it almost right every time, though they usually need some touching up afterwards. The components are just so tiny and finely pitched. A good professional manufacturing line should of course have little trouble producing them.

Now it just needs an application to be used in. Perhaps I will have to make one myself to get things started... =D More images http://www.flickr.com/photos/67486915@N03/
Attachments
mini_controller_kicad.zip
KiCAD modules for the mini controller
Last edited by bpiphany on 08 Dec 2013, 12:07, edited 3 times in total.

matt3o
-[°_°]-

06 Dec 2013, 12:31

Incredible work bpiphany, thanks for sharing.

I do not have the skill to solder those micro components, but that tiny controller might be very useful for custom keyboards.

Broadmonkey
Fancy Rank

06 Dec 2013, 15:09

Looking good. I am impressed by your soldering skills, as this does not look like a walk in the park. Did you use the toaster oven you made?
I like that you have done away with the onboard USB port; it really made the Teensy bulky (if you could say that). It makes me think a board like this, but maybe a bit easier to solder, would be better suited for keyboard projects.

Findecanor

06 Dec 2013, 16:16

Hmm.. Pretty hardcoded for use as a keyboard controller: up to 24 strobing pins and 15 sending pins.
Hmm.. The ATmega32u2 has the same clock as the ATmega32u4 in the Teensy 2.0, but it has less than half the SRAM (1K vs 2.5K) and no multiply instructions. It also has a somewhat less capable USB interface, but probably more than enough for a USB keyboard.

bpiphany

08 Dec 2013, 12:26

Thanks all of you =) It did take some messing around to solder them with an iron.. I successfully made two that way. They are meant to be oven baked though, and come pre-assembled. It's small enough to solder the 0.050" pitch headers...

I haven't baked any in the oven yet. I hadn't built it by then. The test board on the other hand was made in the oven. I was a bit worried about the LEDs, but they took it without trouble. Very convenient for the 176 solder points!

I didn't know about it missing a multiply instruction. Are you saying multiplication is implemented in software? avr-gcc seems to take care of it, so no need for writing your own multiplication routines at least..

I added KiCAD modules and more schematic stuff to the first post.

Soarer

08 Dec 2013, 16:13

Damn you and your love of '2 series :evil:

Maybe when I release v2 of my firmware I can support these creations. I'm going to have to make my USB code far more flexible, but some of that was on the to-do list anyway.

Aside from the tiny amount of USB buffer memory (176 vs 832 in the '32U4), smaller RAM (1k vs 2.5k) and the reduced number of USB endpoints (4 vs 6), I've just noticed that only two endpoints can be double-buffered, which adds an extra complication :(

Findecanor

08 Dec 2013, 16:15

bpiphany wrote:I didn't know about it missing a multiply instruction. Are you saying multiplication is implemented in software? avr-gcc seems to take care of it, so no need for writing your own multiplication routines at least..
I suppose so, but I don't know specifically about avr-gcc. I expect the compiler to be smart and choose the solution that uses the least memory: a call to a library routine for multiplication between two arbitrary values, and a few inline shifts and additions for multiplication with a constant. Any multiplication of two constants should already have been computed by the compiler and not incur any extra instructions.

The thing to remember is that multiplication between two arbitrary values is slower on the ATmega32u2 than on the ATmega32u4.
Anyway, your AVR-Keyboard firmware does not seem to contain any multiplication of any arbitrary values. ;)
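As a rough illustration of the constant case: multiplying by 10 can be written as two shifts and an add, since 10 = 8 + 2. The function name `times10` is made up for this sketch, and whether avr-gcc emits exactly this sequence for a constant multiply depends on the compiler version and optimization flags, so treat it as an illustration of the idea rather than of the compiler's output.

```c
#include <stdint.h>

/* x * 10 written out as shifts and adds: 10 = 8 + 2.
 * Illustration only; not a claim about what avr-gcc generates. */
static uint16_t times10(uint8_t x)
{
    uint16_t v = x;                          /* widen first so nothing overflows */
    return (uint16_t)((v << 3) + (v << 1));  /* 8*x + 2*x */
}
```

Both shift counts are constants, so on a core with only single-bit shifts they become short fixed sequences with no loop.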

Soarer

08 Dec 2013, 16:29

Findecanor wrote:Anyway, your AVR-Keyboard firmware does not seem to contain any multiplication of any arbitrary values. ;)
It is fairly uncommon, but also, neither has a shift-by-n instruction (which is used quite often!). For example, var1 << var2 is a loop. It's not really an issue except in rare circumstances, and there's usually a workaround for those.
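A minimal sketch of what that means in practice, with made-up names (`shl_loop`, `bit_mask`, `mask_for_bit`). The first function spells out the loop that a variable shift turns into; the second shows one possible workaround for the common `1 << n` case, a small lookup table. This is an assumption about a reasonable workaround, not something taken from any particular firmware.

```c
#include <stdint.h>

/* Roughly what a variable shift costs on a core that can only shift
 * one bit per instruction: one pass through the loop per bit. */
static uint8_t shl_loop(uint8_t value, uint8_t count)
{
    while (count--) {
        value = (uint8_t)(value << 1);
    }
    return value;
}

/* One possible workaround for the 1 << n case: a small table. */
static const uint8_t bit_mask[8] = {
    0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80
};

static uint8_t mask_for_bit(uint8_t n)
{
    return bit_mask[n & 7u];  /* index is masked so it stays in range */
}
```

The table trades eight bytes of memory for a constant-time lookup. On an AVR it would sit in SRAM unless it is explicitly placed in flash.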

Findecanor

08 Dec 2013, 18:30

Soarer wrote:It is fairly uncommon, but also, neither has a shift-by-n instruction (which is used quite often!). For example, var1 << var2 is a loop.
Yeah, that bugs me a little. I chose to use bitfields for a data structure because I thought it would be fast and constant time, and then I read that it wouldn't be constant time ... not slow, though.

bpiphany

09 Dec 2013, 08:47

Soarer wrote:Damn you and your love of '2 series :evil:
At least in this application it couldn't have been done (or I wasn't able to do it?) with a u4 =D

Bit shifts feel quite funky to have been left out. I would have thought they were the easiest thing ever to implement... My code does row*col left-one-shifts per scan. I expect the compiler to be able to translate that to an x=x+x operation =) The LED setting function does a couple of shift operations as well. Perhaps it would be a good idea doing something smarter by hand there. Or simply run it less often =)

An 8-bit multiplication should be possible to set up in quite few lines without a loop. No ifs and buts or loop variables to handle. I don't know how the compiler does it though. I suspect there will be a function call adding to the overhead for each multiplication as well.

Soarer

09 Dec 2013, 15:08

They have bit shifts, but only by one bit at a time. It's not a big deal, just that it sometimes helps to be aware of it.

For example, this would not be a good pattern for dealing with a bit array:

Code:

for ( i = 0; i < 64; ++i ) {
    byte_offset = i >> 3; // not too bad, constant shift, but still 3 in a row
    bit_offset = i & 7;
    mask = 1 << bit_offset; // looped because of variable shift
    if ( bit_array[byte_offset] & mask ) {
        do_something_for_bit(i); // this could be bad if the routine also does the above offset calcs
    }
}
Nested loops would generate faster code:

Code:

for ( ibyte = 0; ibyte < 8; ++ibyte ) {
    mask = 1;
    for ( ibit = 0; ibit < 8; ++ibit ) {
        if ( bit_array[ibyte] & mask ) {
            do_something_for_bit(ibyte, ibit, mask);
        }
        mask = mask << 1;
    }
}
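The nested-loop pattern above, filled out into something that compiles on its own. The callback type `bit_handler`, the function `for_each_set_bit` and the example handler `count_bit` are hypothetical names for this sketch; the original `do_something_for_bit()` is not shown in the thread, so its signature here is an assumption.

```c
#include <stdint.h>

/* Hypothetical callback type, assumed for this sketch. */
typedef void (*bit_handler)(uint8_t ibyte, uint8_t ibit, void *ctx);

/* Walk a 64-bit array stored as 8 bytes and call handler for each
 * set bit. The mask only ever moves by single-bit shifts. */
static void for_each_set_bit(const uint8_t bit_array[8],
                             bit_handler handler, void *ctx)
{
    for (uint8_t ibyte = 0; ibyte < 8; ++ibyte) {
        uint8_t mask = 1;
        for (uint8_t ibit = 0; ibit < 8; ++ibit) {
            if (bit_array[ibyte] & mask) {
                handler(ibyte, ibit, ctx);
            }
            mask = (uint8_t)(mask << 1);
        }
    }
}

/* Example handler: count the set bits in an unsigned counter. */
static void count_bit(uint8_t ibyte, uint8_t ibit, void *ctx)
{
    (void)ibyte;
    (void)ibit;
    ++*(unsigned *)ctx;
}
```

Calling `for_each_set_bit(keys, count_bit, &count)` with `count` starting at zero leaves the number of set bits in `count`. The handler receives the byte and bit indexes directly, so it has no reason to redo the offset calculations.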

bpiphany

10 Dec 2013, 12:37

I'm sure there are infinitely smarter ways to do it, using fewer registers and tests. I'm pretty sure it only works for unsigned ints as well =)

Code:

#include <stdio.h>

int main(void) {
  unsigned int a, b, s, t;
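  if (scanf("%u %u", &a, &b) != 2) return 1;  /* need two unsigned numbers; no trailing \n in the format, or scanf keeps waiting for more input */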

  s = b; t = 0;
  t += (a & 0b00000001)? s: 0; s += s;
  t += (a & 0b00000010)? s: 0; s += s;
  t += (a & 0b00000100)? s: 0; s += s;
  t += (a & 0b00001000)? s: 0; s += s;
  t += (a & 0b00010000)? s: 0; s += s;
  t += (a & 0b00100000)? s: 0; s += s;
  t += (a & 0b01000000)? s: 0; s += s;
  t += (a & 0b10000000)? s: 0;
  
  printf("%u × %u = %u\n", a, b, t);
  return 0;
}
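The same unrolled idea as a standalone function, in case it helps to see it without the stdin/stdout parts. This is a sketch with a made-up name, `mul8_unrolled`, and it assumes 8-bit unsigned operands with a 16-bit result. The program above likewise only examines the low 8 bits of `a`.

```c
#include <stdint.h>

/* 8-bit by 8-bit unsigned multiply, fully unrolled, 16-bit result.
 * s holds b shifted left by the current bit position; each
 * "s += s" is a left shift by one done as an addition. */
static uint16_t mul8_unrolled(uint8_t a, uint8_t b)
{
    uint16_t s = b;
    uint16_t t = 0;

    if (a & 0x01) t += s;
    s += s;
    if (a & 0x02) t += s;
    s += s;
    if (a & 0x04) t += s;
    s += s;
    if (a & 0x08) t += s;
    s += s;
    if (a & 0x10) t += s;
    s += s;
    if (a & 0x20) t += s;
    s += s;
    if (a & 0x40) t += s;
    s += s;
    if (a & 0x80) t += s;

    return t;
}
```

There is no loop counter and no exit test, so the running time is the same whatever the operands are.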

Findecanor

10 Dec 2013, 18:59

I think that AVR-GCC should be able to unroll the loop itself, but you can't always be 100% sure.

bpiphany

10 Dec 2013, 20:42

Findecanor wrote:I think that AVR-GCC should be able to unroll the loop itself, but you can't always be 100% sure.
I'm just doing it as a homework assignment... avr-gcc even provides the functionality, just put a*b in the code. I'm pretty sure it does something very very smart already =D Perhaps even hand tweaked assembler.

Soarer

11 Dec 2013, 14:29

Actually, I think a multiply could be done as a loop which terminates when one side has run out of bits, so it completes faster when either operand is small. Something like:

Code:

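unsigned int mul(unsigned int a, unsigned int b)
{
    if ( a > b ) {
        // keep the smaller operand in a so the loop runs fewer times
        unsigned int tmp = a;
        a = b;
        b = tmp;
    }
    unsigned int s = b;
    unsigned int t = 0;
    while ( a ) {
        if ( a & 1 ) {
            t += s;
        }
        s = s << 1;
        a = a >> 1;
    }
    return t;
}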

cookie

11 Dec 2013, 15:28

Wow what a nice controller! SO SMALL!

Where could I get one?

Findecanor

11 Dec 2013, 16:34

My wish list for bpiphany's next board:
- An AVR with more SRAM/EEPROM, for macros. :evilgeek:
- LED driver for a TKL keyboard matrix, or just shift-registers that can drive them enough and resistors.
:)

bpiphany

12 Dec 2013, 16:16

Soarer wrote:Actually, I think a multiply could be done as a loop which terminates when one side has run out of bits, so it completes faster when either operand is small. Something like:
What was the status on those shift operations? A left shift is just an add by the value itself, but how do you do a right shift without shifting?..

My guess is that a fixed set of operations can be made shorter when omitting the extra tests for reaching zero. A constant operation highly optimized. It would surprise me if it was even remotely clear what the multiplication routine does looking at the code =)
cookie wrote:Wow what a nice controller! SO SMALL!

Where could I get one?
I ran out of components, but I have some PCBs left. I will order components for the remaining ones the next time I have a reason to reach a $100-free-shipping-order from Mouser or DigiKey =) In the meantime I may have one or two which came out alright from my prototyping so far... Do you have an idea what you would use it for? It's not very useful on its own =P
Findecanor wrote:My wish list for bpiphany's next board:
- An AVR with more SRAM/EEPROM, for macros. :evilgeek:
- LED driver for a TKL keyboard matrix, or just shift-registers that can drive them enough and resistors.
:)
LED driver I think should be a separate unit. The biggest drawback with the 32u2 to me is the lack of I2C. Obviously useful for split keyboards or other expansions, say a LED driver for example ;)



The Teensy is just a tad too narrow to straddle a row of MX switches effortlessly. Also, soldering it into place prohibits access to the switch solder joints below it. Putting it in sockets builds a lot of height, which is not very good for ease of case design or for overall height.
There are bottom entry sockets that would accept the header pins through the board. Most of them that I've found are a bit high to fit under a mounting plate. I still haven't found the ultimate option to have a detachable controller that doesn't take too much room. Placing controller components directly onto the main board is still the only option to get them out of the way. On keyboards with less dense areas this is a smaller problem.

Soarer

12 Dec 2013, 16:53

Soarer wrote:Actually, I think a multiply could be done as a loop which terminates when one side has run out of bits, so it completes faster when either operand is small. Something like:
bpiphany wrote:What was the status on those shift operations? A left shift is just an add by the value itself, but how do you do a right shift without shifting?..
Oh, they all have various shift and rotate instructions, but they only work one bit at a time. So shifting by some other constant might be done as a sequence of shifts or as a loop, depending on whether the compiler is optimizing for speed or space, and shifting by a variable would result in a loop (possibly unrolled with certain compiler options set, but not usually). The latter case is what I meant by "var1 << var2" :)
bpiphany wrote:My guess is that a fixed set of operations can be made shorter when omitting the extra tests for reaching zero. A constant operation highly optimized. It would surprise me if it was even remotely clear what the multiplication routine does looking at the code =)
I found a bug report about __mulhi3 which listed the code. With bug, but it is readable... and very much like my guess! :D

Code:

   +00000F0F: 2755      CLR     R21
   +00000F10: 2744      CLR     R20
   +00000F11: FF80      SBRS    R24,0
   +00000F12: C002      RJMP    +0x0002           ; Destination: 0x000F15
   +00000F13: 0F46      ADD     R20,R22
   +00000F14: 1F57      ADC     R21,R23
   +00000F15: 0F66      ADD     R22,R22
   +00000F16: 1F77      ADC     R23,R23
   +00000F17: 0561      CPC     R22,R1
   +00000F18: F021      BREQ    +0x04             ; Destination: 0x000F1D
   +00000F19: 9596      LSR     R25
   +00000F1A: 9587      ROR     R24
   +00000F1B: 0591      CPC     R25,R1
   +00000F1C: F7A1      BRNE    -0x0C             ; Destination: 0x000F11
   +00000F1D: 2F95      MOV     R25,R21
   +00000F1E: 2F84      MOV     R24,R20
   +00000F1F: 9508      RET
edit: Yeah, those 'compare with carry' (CPC) tests are clearly wrong! It's testing the low byte of b (R22) and then the high byte of a (R25), when it should be just comparing a (R25,R24) with 0 (without carry, I think). (R1 is just used to keep a zero handy, btw). So, corrected and translated back to C, it looks like:

Code:

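unsigned int mul(unsigned int a, unsigned int b)
{
    unsigned int t = 0;
    do {
        if ( a & 1 ) {
            t += b;     // add the shifted b when this bit of a is set
        }
        b += b;         // left shift by one, done as an add
        a = a >> 1;
    } while ( a );
    return t;
}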
Oh well. I liked my idea of swapping a and b to get the smaller value in a, clearly they didn't :lol:
