------- start of forwarded message -------
Newsgroups: comp.arch.fpga,comp.arch,comp.arch.embedded,sci.electronics.design
Path: news.interlog.com!newsfeed.interlog.com!cyclone.news.idirect.com!island.idirect.com!Supernews60!supernews.com!uunet!in2.uu.net!world!jhallen
From: jhallen@world.std.com (Joseph H Allen)
Subject: New free FPGA CPU
Message-ID:
Date: Tue, 27 Oct 1998 04:50:38 GMT
Organization: The World Public Access UNIX, Brookline, MA
Lines: 183
Xref: news.interlog.com comp.arch.fpga:12299 comp.arch:45862 comp.arch.embedded:42310 sci.electronics.design:910293

Remember the last comp.arch discussion about minimalist CPUs about six
months ago?  It started, as usual, with the discussion about 1-instruction
CPUs and the like, but it ended with a discussion about the best CPU you
could fit in a small FPGA?

Well XACT just finished the first successful place&route of my new small CPU
so I should be ready to put it up on the net in the next few days.  I'm
pretty sure that I'm going to release it as copylefted freeware.  Before I
make this first release I thought I'd get one cycle of comments on it,
before the instruction set is cast in stone.

The original goal was to make the best possible CPU in the limited resources
available in the cheapest Xilinx FPGA.  This is what I've come up with:

- Fits in Xilinx XC5202 FPGA.  This is Xilinx's cheapest FPGA: DIGIKEY sells
  them for $10.15 in single quantities or $7.40 in hundreds for the 84-pin
  PLCC version.  It has 64 CLBs, each with four FFs and four 4-input FMAPs.

- The design is in Xilinx XACT in OrCAD schematics.  I'll release the OrCAD
  source, postscript printouts of the schematics and a Xilinx macro of the
  CPU.

- It uses 143 FFs and 192 FMAPs and runs at 9MHz in the -5 part.  I should
  be able to get this a bit faster.  It would almost certainly nearly double
  in speed if I use 40 more FFs for a decode pipeline register, but I think
  the extra space is more valuable for peripherals.
  I'll use the remaining space for a timer, a serial port and parallel
  ports.  The minimal computer will be this FPGA, a byte-wide RAM and ROM,
  and an oscillator.  The FPGA loads itself from the ROM, which then also
  contains the boot code.  I'm hoping that it will compete with
  microcontrollers.  The extra space in the FPGA could be used with a huge
  variety of possible peripherals (PLL, LCD interface, IDE interface, floppy
  interface, printer interface, keyboard, video, who knows what else?).  I
  want to collect a library of these on my ftp site.

- It is not compatible with anything.  I was originally considering making a
  6502 clone, but this is smaller and better (and I didn't want to implement
  decimal mode).  I'll write an assembler for it, hopefully for this first
  release (I need to write one to test it anyway).

- Unlike the 6502, no operations occur in sequence with memory accesses.
  I.e., the ALU operation time does not detract from the memory performance.
  However, I do ALU operations on writes, but this is less of a problem than
  reads or address calculation.

- The CPU has the following programmable registers: 16-bit accumulator,
  16-bit program counter, 16-bit stack pointer, and five flag bits (the
  usual carry, overflow, negative and zero condition flags, plus interrupt
  enable).

- It can operate on 16-bit words, unsigned bytes and signed bytes.

- It has the following addressing modes: immediate, stack (stack pointer
  plus 8-bit offset), and indexed (pick one of the first four words on the
  stack and add a 7-bit offset).  Note that it does not have direct
  addressing: you have to put the address on the stack first.  I chose being
  powerful enough to use data structures easily over the convenience of
  direct addressing.

- It is by no means a RISC processor; instead it is more like a classic
  8-bit CPU.  Instructions take 2-9 cycles to execute, where one cycle
  equals one memory access.  Yes, the hard-coded schematic instruction
  sequencer was a big pain.

- It has interrupts.
  When there's an interrupt, all of the registers are pushed on the stack.
  To return from the interrupt, pop all of the registers off of the stack
  using the 'pop' instruction.  'pop' is also used for subroutine returns.

- Unconditional and conditional jumps take a 16-bit destination address.  I
  would have done the usual pc-relative branches, but this saves
  instruction sequencer space.

- Instructions are either 2 or 4 bytes and are encoded as follows:

    1 m w a a a a r r d d d d d d d   indexed
    0 m w a a a a 1 d d d d d d d d   stack
    0 1 w a a a a 0 i i i i i i i i   immediate
    0 0 1 c c c c 0 - - - - - - - -   jump on condition (jcc)
    0 0 0 1 1 1 1 0 d d d d d d d d   jump to subroutine indirect (jsr)
    0 0 0 1 1 1 0 0 d d d d d d d d   jump indirect (address on stack)
    0 0 0 1 1 0 1 0 - - - - - - - -   jump to subroutine direct (jsr)
    0 0 0 1 1 0 0 0 - - - - - - - -   jump direct (jmp)
    0 0 0 1 0 1 1 0 - - - - - a f p   push registers (psh)
    0 0 0 1 0 1 0 0 - - - - - a f p   pop registers (pop)
    0 0 0 1 0 0 1 0 v v v v v v v 0   software interrupt (brk)
    0 0 0 1 0 0 0 0 - - - - - - - -   extra op-code space
    0 0 0 0 - - - - - - - - - - - -   extra op-code space

  Where:

    m=1:       register/memory instructions
    m=0:       read-modify-write instructions
    cccc:      condition code
    rr:        base address no. (first four stack words)
    ddddddd:   7-bit offset
    dddddddd:  8-bit offset
    iiiiiiii:  8-bit immediate value.  16-bit immediate values appear in
               the following word.
    a f p:     accumulator, flags, program counter: i.e., which regs to
               push/pop
    vvvvvvv:   software interrupt vector no.
    aaaa:      op-code
    ----:      unused

  read-modify-write op codes:

    0000  sta   store accumulator word       stab    store accumulator byte
    0001  sts   store stack pointer
    0010  lsl   logical shift left word      lslb    logical shift left byte
    0011  rol   rotate left word             rolb    rotate left byte
    0100  lsr   logical shift right word     lsrb    logical shift right byte
    0101  ror   rotate right word            rorb    rotate right byte
    0110  asr   arithmetic shift right word  asrb    arithmetic shift right byte
    0111  com   1's complement word          comb    1's complement byte
    1000  neg   negate word                  negb    negate byte
    1001  cng   carry-negate                 cngb    carry-negate byte
    1010  dec   decrement word               decb    decrement byte
    1011  csb   subtract carry from word     csbb    subtract carry from byte
    1100  inc   increment word               incb    increment byte
    1101  cad   add carry to word            cadb    add carry to byte
    1110  clr   clear word                   clrb    clear byte
    1111  -     extra rmw code

  register-memory (and register-immediate) op codes:

    0000  nop
    0001  tst   test word                    tstbx   test byte sign extended
    0010  or    or word                      orb     or byte
    0011  xor   xor word                     xorbx   xor byte sign extended
    0100  bit   bit-test word                bitb    bit-test byte
    0101  and   and word                     andbx   and byte sign extended
    0110  add   add word                     addb    add byte
    0111  adc   add with carry word          addbx   add byte sign extended
    1000  cmp   compare word                 cmpb    compare byte
    1001  cwc   compare word with carry      cmpbx   compare byte sign extended
    1010  sub   subtract word                subb    subtract byte
    1011  sbc   subtract word with carry     subbx   subtract byte sign extended
    1100  lda   load word                    ldab    load byte
    1101  lda   load word                    ldabx   load byte sign extended
    1110  lds   load stack pointer           ldsb    load stack pointer from byte
    1111  adds  add word to stack pointer    addsbx  add sign extended byte to
                                                     stack pointer

- Loads, stores and add to stack pointer do not affect the flags; everything
  else does (except that logical instructions do not affect carry or
  overflow).

- I haven't decided where the reset or interrupt vectors or addresses are
  going to be yet.

- It uses a two-phase clock system for controlling the internal tri-state
  ALU B-side bus.
  Basically the oscillator needs to be double the clock frequency.

Anyway, that's it.  Let me know what you think.
--
/*  jhallen@world.std.com (192.74.137.5) */               /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}
------- end of forwarded message -------
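
The instruction encoding table in the forwarded post can be read as a small
decoder.  What follows is only a rough sketch of that reading, not anything
from Joseph's release: the names bits(), decode() and SPECIAL are made up
for illustration, bit 15 is assumed to be the leftmost column of the table,
and bit 8 is assumed to be tested before the "extra op-code space" rows
(the table leaves that case open, since those rows mark bit 8 as unused).
It only splits a 16-bit instruction word into fields; it does not fetch the
following word or execute anything.

```python
def bits(word, hi, lo):
    """Return bits hi..lo of word, inclusive (assumes bit 15 is the leftmost column)."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


# Rows of the table whose top four bits are 0001 and whose bit 8 is 0,
# selected by bits 11..9.  The labels follow the table's descriptions.
SPECIAL = {
    0b111: "jsr indirect",
    0b110: "jump indirect",
    0b101: "jsr direct",
    0b100: "jmp direct",
    0b011: "psh",
    0b010: "pop",
    0b001: "brk",
    0b000: "extra",
}


def decode(word):
    """Split one 16-bit instruction word into the fields named in the table."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")

    if bits(word, 15, 15):                      # 1 m w aaaa rr ddddddd
        return {"form": "indexed", "m": bits(word, 14, 14),
                "w": bits(word, 13, 13), "op": bits(word, 12, 9),
                "base": bits(word, 8, 7), "offset": bits(word, 6, 0)}

    if bits(word, 8, 8):                        # 0 m w aaaa 1 dddddddd
        # Assumption: bit 8 wins over the "extra op-code space" rows.
        return {"form": "stack", "m": bits(word, 14, 14),
                "w": bits(word, 13, 13), "op": bits(word, 12, 9),
                "offset": bits(word, 7, 0)}

    if bits(word, 14, 14):                      # 0 1 w aaaa 0 iiiiiiii
        return {"form": "immediate", "w": bits(word, 13, 13),
                "op": bits(word, 12, 9), "imm8": bits(word, 7, 0)}

    if bits(word, 13, 13):                      # 0 0 1 cccc 0 --------
        return {"form": "jcc", "cond": bits(word, 12, 9)}

    if not bits(word, 12, 12):                  # 0 0 0 0 ------------
        return {"form": "extra"}

    form = SPECIAL[bits(word, 11, 9)]           # 0 0 0 1 x x x 0 ........
    fields = {"form": form}
    if form in ("jsr indirect", "jump indirect"):
        fields["offset"] = bits(word, 7, 0)
    elif form in ("psh", "pop"):
        fields["a"] = bits(word, 2, 2)
        fields["f"] = bits(word, 1, 1)
        fields["p"] = bits(word, 0, 0)
    elif form == "brk":
        fields["vector"] = bits(word, 7, 1)
    return fields


if __name__ == "__main__":
    # Indexed form with m=1, w=1, aaaa=0110, rr=10, offset=5.
    print(decode(0b1_1_1_0110_10_0000101))
```

Testing bit 15 first, then bit 8, then bits 14, 13 and 12 follows the table
from top to bottom, so each row is reached by exactly one path.  A real
decoder in the FPGA would of course be a few FMAPs rather than a chain of
tests.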