Diary
From TriPU
29.01.2006
Still watching and surviving :) (PS: Putting stuff on someone else's wiki page is not really considered "hacking" or even "cracking" ;)
22.09.2005
Moan. My schedule sort of exploded during the last few weeks :( I'm just posting to assure you that TriPU is not abandoned; I will continue my work whenever there's time. Meanwhile I thank everyone who's still watching :) PS: You can also register and put the diary page on your watchlist to be notified whenever anything changes.
11.08.2005
I'm back. Again ;) Also, I finished setting up the Subversion repository, which will now hold any TriPU-related code. If you wanna see it, check it out [here (http://svn.triphoenix.de/listing.php?repname=TriPU+Project&path=%2F&sc=0)] or point your svn client at http://svn.triphoenix.de/repos/tripu
15.07.2005
The semester is over, at least the part with lectures. Coming up now are around 3.5 months without lectures, but with some exams and stuff :) Nevertheless, finally time for projects again. For the next two weeks I'll be off to a quiet place to work on things I like, finally :) So I don't know if I can keep you up to date during these weeks, but eventually I'll be back and shove in some results :)
03.07.2005
Finished migrating triphoenix.de to another server. Finally I can set up a Subversion repository for all kinds of code I write, especially for the TriPU files. Details will be announced when Subversion is up & running :)
23.06.2005
Reworked some of the specs, especially Instruction encoding. Started some documentation (although it's not much yet) in Microarchitecture. As mentioned before, having someone to work with suddenly points out all the documentation mistakes I have made :) Anyway, the register pictures have to be updated. I'm rebuilding hades right now (lost some files ;) to fix that, but probably not today anymore :)
16.06.2005
All machines up & running again, I seem to attract filesystem damage ;) On a side note, Justus started some work on the simulator. This has some benefit for the design process as well (beyond having a simulator ;): as I now have to describe exactly what TriPU does/should do to someone else, I sort of review my specifications and fix inconsistencies. Right now I'm re-checking the register part :)
10.06.2005
Darn, my main machine is breaking apart (the filesystem especially), working on backup/recovery/reinstallation :(
17.05.2005
Carried on development on another front. Currently my designs of the control unit aren't very satisfactory, so I put that aside to work on another part. When I first designed the I/O-Interface, especially the PC/MAR register, I was still designing an 8-Bit architecture. This resulted in 24-Bit address registers which could be loaded and used in 8-Bit blocks (e.g. you could load the middle 8 bits and then store the upper 8 bits). This isn't very useful in the new 32-Bit model, so I redesigned this block so that the PC and MAR registers can be loaded and stored as a stream of 8-Bit blocks, just like the rest of the values in TriPU. The new design is already on my whiteboard and will soon be in Hades; right now I'm fighting with some Java code ;)
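A minimal sketch of the byte-stream idea for the PC/MAR registers, just to illustrate it. The class name `AddressRegister` is made up for this example, and the low-byte-first order is an assumption: the entry does not say in which order the 8-Bit blocks travel.

```python
class AddressRegister:
    """Hypothetical 32-bit register that is loaded and stored as a stream of 8-bit blocks."""

    WIDTH_BYTES = 4

    def __init__(self):
        self.value = 0

    def load_stream(self, blocks):
        """Load four 8-bit blocks, low byte first (assumed order, not from the diary)."""
        blocks = list(blocks)
        if len(blocks) != self.WIDTH_BYTES:
            raise ValueError("expected exactly four 8-bit blocks")
        value = 0
        for position, block in enumerate(blocks):
            if not 0 <= block <= 0xFF:
                raise ValueError("each block must fit in 8 bits")
            value |= block << (8 * position)
        self.value = value

    def store_stream(self):
        """Return the register contents as four 8-bit blocks, low byte first."""
        return [(self.value >> (8 * position)) & 0xFF
                for position in range(self.WIDTH_BYTES)]


pc = AddressRegister()
pc.load_stream([0x78, 0x56, 0x34, 0x12])
```

Compared with the old 24-Bit design, there is no way to touch only the middle block here: the register is always moved as one complete stream.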
Also I made some calculations about the external bus; this one will be hell :) Currently we're talking about 32 address lines, 8 data lines, 16 IRQ lines and 6 other lines (Clk, Wait, R/W, Request, Gnd, Vcc), which makes a total of 62 lines. Interesting, even from the mechanical point of view :)
27.04.2005
Just discovered a flaw in the register block (see below): if I use four 8-Bit registers to form a 32-Bit value, I should expand the unwritable register 0 to four unwritable registers 0-3. Will be fixed :)
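Roughly what the fix means, as a sketch under assumptions: the bank size of 16, the idea that the unwritable registers read as zero, the silently ignored writes and the low-byte-first grouping are all guesses for illustration, and `RegisterBank` is a hypothetical name.

```python
class RegisterBank:
    """Hypothetical bank of 8-bit registers where registers 0-3 cannot be written."""

    UNWRITABLE = range(0, 4)  # four 8-bit registers forming one fixed 32-bit value

    def __init__(self, size=16):  # size is an assumption, not from the diary
        self.regs = [0] * size

    def write(self, index, value):
        if index in self.UNWRITABLE:
            return  # assumed behaviour: writes are silently ignored
        self.regs[index] = value & 0xFF

    def read32(self, base):
        """Combine four consecutive 8-bit registers, low byte at `base` (assumed order)."""
        return sum(self.regs[base + i] << (8 * i) for i in range(4))


bank = RegisterBank()
for index in range(8):
    bank.write(index, 0xAB)
```

With only register 0 protected, a 32-Bit read starting at register 0 would mix one fixed byte with three writable ones, which is the flaw described above.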
26.04.2005
Who needs time anyway, useless concept. You never have enough except when you really don't need it.
I feel like I'm more busy than ever but perhaps that's just because I've been getting up way too early because of university ;) Anyway I have to continue some time and that time is now ;)
Some things have been done around here since the last entry: Justus created some pages for the simulator, and the high-level part seems to be called TriSIM ;) (According to what I know ;) Justus wants to work on a high-level simulation environment as described before. I'll add some thoughts to the pages Justus created for this sub-project :)
I started work on the microcode loader (remember? The microcode should be stored in an EEPROM or be loadable via parallel port, and be run from SRAM). Here's some advice: _never_, _Never_, _NEVER_ make notes or schematics without documenting them. Currently I'm staring at some schematics which are about 2 months old and am still working on the last bits to recover what the signals meant :) Seriously, don't do it! Anyway, the plan contains a fatal flaw which will result in a lot more logic than before, or I will have to use more EEPROMs and still have a lot of logic for loading via parallel port. The problem is that while loading, all RAMs need the same data lines (as they're being selected one by one and filled with data), whereas in "normal" operation the data lines have to be parallel to form a 64 bit bus (which by the way isn't supported by Hades, but that's another story ;) I'm afraid I will have to go with the more-logic version. Also the timing is _quite_ critical and will have to be verified with a VHDL model. Basically the current version has a counter which increments on every clock, getting data from the EEPROM to the RAMs. When a certain value is reached, the counter stops, enabling the "external" interface as well as starting the system clock. This is some serious piece of shit :)
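To make the loader idea a bit more tangible, here is a behavioural sketch. It assumes eight 8-bit RAMs forming the 64 bit word and an EEPROM layout where consecutive bytes go to consecutive RAMs; neither detail is fixed in this entry, and the function names are hypothetical.

```python
def load_microcode(eeprom, ram_count=8):
    """Simulate the loader counter; returns (rams, system_clock_running).

    Assumed layout: EEPROM byte n goes to RAM (n % ram_count), address (n // ram_count).
    """
    depth = len(eeprom) // ram_count
    rams = [[0] * depth for _ in range(ram_count)]
    terminal_count = depth * ram_count
    counter = 0
    system_clock_running = False
    while not system_clock_running:       # one loop iteration = one loader clock
        if counter == terminal_count:
            # Counter stops: hand over to normal operation and start the system clock.
            system_clock_running = True
        else:
            # During loading all RAMs share the same 8 data lines, selected one by one.
            rams[counter % ram_count][counter // ram_count] = eeprom[counter]
            counter += 1
    return rams, system_clock_running


def microcode_word(rams, address):
    """Normal operation: all RAMs are read in parallel to form one wide word."""
    word = 0
    for lane, ram in enumerate(rams):
        word |= ram[address] << (8 * lane)
    return word


rams, running = load_microcode(list(range(16)))
```

The two functions show the conflict from the text: `load_microcode` writes through one shared byte-wide path, while `microcode_word` needs every RAM on its own lane of the wide bus.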
Finally, George suggested that every American should have a vacuum cleaner in their basement. Well, I'm not American and I don't have a basement, so no vacuum there ;)
Followups will follow :)
18.03.2005
Heads up, I'm still alive :) To be honest I've taken a break and it's really refreshing. After two weeks of teaching other students to microprogram a CPU model in Hades (the so-called T3-Praktikum on the hades homepage) and another two weeks of designing, synthesizing and placing a DCF77-compatible (German time sync signal) alarm clock I needed some sort of break. I've been doing just about nothing useful for the last one and a half weeks project-wise and am getting ready to work again these days :) Besides that, some personal stuff and a great restructuring at university keep me somewhat busy.
In other news, I just found an entry in the suggestion box from Justus (whom I also know personally) to create an emulator. I definitely will need an emulator, you can't have enough testing anyway. For this it's especially useful to have some external help, as two people probably (hopefully ;) won't make the same mistakes in implementation. Currently I'm going for three different levels of simulation:
- A simulator will provide high-level and high-speed testing. This is especially useful for the software development as flashing some EEPROMs won't be that much fun ;)
- I'm using a hades-level simulation model during the design process as this allows direct testing of designs while still keeping some abstraction so I won't get lost in the gates.
- A VHDL simulation model provides exact timing and gate-level resolution. This is sorta like the golden testing environment which should provide the most accurate simulation of the real machine. Any timing bug, glitch or whatever should be visible in this model.
These environments should provide the means for fast development and debugging.
So, gimme some more idle time, the brain likes it ;) More during the following days and weeks
10.02.2005
I got the info to implement my own components and here it is: the register bank. Checked out fine so far :)
I'll probably have to implement some other components but it is definitely worth it, designing in hades allows quite nice manual testing. Theoretically hades also supports automatic testing, although I still have to figure out how. Anyway, updated files will be online some other time; currently I'm implementing quite some stuff in hades, so I had to do a lot of updates ;)
09.02.2005
Good news everyone, TriPhoenix processor architecture theme weeks just started. Well, at least the lecture-free time at university. The first two weeks I watch over students working on the processor architecture practical ("T3-Praktikum"), then I have two weeks of a VHDL course. Finally four weeks for recreation and studying for my exam in computer architecture. TriPU fits in there quite well ;) That's why I extended my work today and put up another ALU design.
I think the ALU is close to final now. I added another feedback loop, for two reasons. One: symmetric looks better ;) Two: during one of the last computer architecture lectures I came up with a multiply step operation for the ALU which would improve unsigned 8-Bit multiplication by a lot. As ALU operations can be chained, you can essentially chain 8 multiply steps together, with the result only cycling in the ALU pipeline registers until you finally write it out.
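The entry doesn't spell out what a multiply step does, so the following is only a sketch of the classic shift-and-add step, which is one plausible reading. The function names and the split into a `high`/`low` register pair are assumptions for illustration.

```python
def multiply_step(high, low, multiplicand):
    """One shift-and-add step (assumed form of the multiply step operation).

    `low` initially holds the multiplier; `high` accumulates the partial product.
    """
    if low & 1:
        high += multiplicand                  # may temporarily need 9 bits (carry)
    low = ((high & 1) << 7) | (low >> 1)      # bit shifted out of high enters low
    high >>= 1
    return high, low


def multiply_u8(a, b):
    """Unsigned 8-bit multiply built from 8 chained multiply steps."""
    high, low = 0, b & 0xFF
    for _ in range(8):
        high, low = multiply_step(high, low, a & 0xFF)
    return (high << 8) | low                  # 16-bit product
```

The intermediate `high`/`low` pair never needs to leave the ALU between steps, which is why chaining fits this operation so well: only the final 16-bit product has to be written out.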
On other fronts I designed the current decoder (which needs some adaptation for 32 bit as the instruction format may change a little bit) and the register bank. The latter is not yet available because I'm still missing the register window counter and have to ask tomorrow how to compile and use self-made HADES components :)
That's it for now, enjoy the view and stay tuned for updates (probably some more of them during the non-lecture time, as I always want to ;)
09.01.2005
Work starts to flow again :) First of all I'd like to point at an article that someone called muddasheep wrote. It's about motivation and I recommend it to everyone who's working on some project: [Article (http://phq.muddasheep.com/cgi-bin/phq/phq_articles.cgi?show=39)] I liked it very much.
So, the brain worked and worked and out came something called an ALU compressor. Also I constructed the feedback loop.
First of all, Dieter recommended throwing away the 2:1-MUX where the result should come out and changing it into two tristate buffers, as they are integrated in some register chips anyway. Done :) I stretched out the carry resolution area a bit to make it a little more understandable, but it's still doing the same thing: selecting a carry value according to some flags.
All registers now come with an enable input. The gates at registers A and B were dropped, because I learned it's evil to put gates into the clock: it can lead to clock skew (i.e. the clock rises at different times in the design). That's probably not an issue in a low-speed design like TriPU, but I don't want to get used to bad habits, and this way I don't need the two AND gates. The rest of the ALU has enables, too, because of the instruction chaining. Between the instructions the processor has to fetch an instruction and this generates clocks. While I could have solved this with a chain of registers, this way is much cleaner. It also allows preserving the ALU contents when executing instructions that don't use the ALU.
Now for the compressor. The compressor consists of registers C1, C4 and everything in between (including the 1 bit registers on the left). Basically the compressor flow is fed with 1s until the first result has passed the logic units, i.e. it's ready to be stored in X and Y. Then a zero has to be fed into COMPRESSOR_FLOW for three clocks; after that it's 1 again. If you follow the flow, the multiplexers in the chain will be enabled in such a manner that the results will get out of C4 one byte per clock. This leads to a uniform view from the outside and is MUCH better than the old two-enables solution. On 1-Bit operations, or when chaining is used and a valid calculation has to be made every clock, COMPRESSOR_FLOW will be set to 1 and the multiplexers will pass the result straight to C4.
I just checked the whole design with a chained addition and the values came out right :) The HADES files for simulation are on the Schematics page right now. Also I decided not to update the real schematics for some time, until the design is a bit more settled. Currently it's just too dynamic :)
What's left to be done on the ALU part now? The zero calculation is not correct for 32 bits and I want to implement an overflow flag (overflow being derived from the carry from the 30th bit into the 31st bit, compared with the carry out of the 31st bit; it indicates whether a signed addition overflowed).
Stay here for more madness if you can stand it :) (Wednesday is computer architecture class again, who knows what will come out then ;)
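A small sketch of how both open points could look for a 32-bit addition done in four 8-bit steps. It assumes low-byte-first processing and uses the usual two's-complement rule for overflow (carry into the top bit XOR carry out of it); the function name is hypothetical and this is not taken from the actual ALU design.

```python
def add32_bytewise(a, b):
    """Add two 32-bit values in four 8-bit steps.

    Returns (result, carry, zero, overflow). Byte order (low byte first) is assumed.
    """
    result = 0
    carry = 0
    zero = True
    overflow = False
    for i in range(4):
        x = (a >> (8 * i)) & 0xFF
        y = (b >> (8 * i)) & 0xFF
        # Carry from bit 6 into bit 7 of this byte (bit 30 into 31 for the last byte).
        carry_into_msb = (((x & 0x7F) + (y & 0x7F) + carry) >> 7) & 1
        total = x + y + carry
        byte = total & 0xFF
        carry = total >> 8
        result |= byte << (8 * i)
        # The 32-bit zero flag must combine the zero result of all four bytes.
        zero = zero and byte == 0
        # Only the value computed for the last (highest) byte is the real overflow.
        overflow = bool(carry_into_msb ^ carry)
    return result, carry, zero, overflow
```

For example, `add32_bytewise(0x7FFFFFFF, 1)` reports an overflow, while `add32_bytewise(0xFFFFFFFF, 1)` gives a zero result with carry but no signed overflow.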
06.01.2005
New year, still here :)
First of all I visited the 21C3 and besides a lot of interesting stuff I learned the steps to etch PCBs. Probably this will take some tries, but in the end this method is probably much cheaper and quicker than wiring one by one. I'll have to get some basic stuff first, but as PCBs are quite final the construction process won't start very soon, as I cannot change the PCBs afterwards ;) More on my miserable encounters with etching once I've got the stuff :)
Now, every time I'm sitting in the university lecture about computer design concepts, ideas for TriPU pop into my mind :) This is what came out:
- I forgot an overflow flag (for signed integer calculations)
- TriPU will go 32 Bit. After building the 32-bit capable pipeline there's not much reason to stay at 8 Bit anymore, 32 are just much more comfortable. The inner core will stay at 8 bit but on the outside (i.e. assembler) the programmer will see a nice 32 bit machine. Strange? Sure, but I don't want to design and create a high speed processor for industrial production, I wanna learn and have some fun :)
- the ALU will be somewhat redesigned (again (again!))
Now about the ALU, because this is the interesting part. The problem so far was that the data bus needs two clock cycles per ALU operand because it has to load two distinct values. This results in some idle clocks inside the pipeline and (until now) resulted in the four register delay in the register tail on the bottom. This tail of registers will be replaced by some unit (for now called the compressor ;) which will turn the interleaved data (like 1 X 2 X 3 X 4 where X means "invalid") into a stream of valid data (i.e. 1 2 3 4) which will show at the output the moment the inputs don't need to be fed anymore. This results in an operation speedup and the ALU operations will take 12 clocks instead of 15, having the same speed with the pipeline as they'd have without it. The interesting part is that this involves only 3 2:1-MUXes (8 bit MUXes of course ;), so not so much material overhead, and with PCB etching this probably won't hurt much.
But wait, there's more. If the data come out at the output in the exact moment when the inputs don't need to be fed anymore, we can start another interesting technique, which (for now ;) I call ALU instruction chaining. By using a special begin, step and end command (which will exist for every ALU operation, as this in fact can be implemented with two flags) you can chain several ALU commands together, resulting in much more throughput and no more need for a temporary register. This is how it's done (roughly ;):
The output is fed back to one of the inputs via a multiplexer. When activated, the feedback value is fed back into one operand register (for example A) while the other can be fed from the data bus without interleaving. So while the result starts to turn up on the output, the program can already insert the next values. Because the pipeline has 4 stages, the new results will be available at the output (now with deactivated compressor) just when the last byte has been fed into the ALU. The result is only stored to the data bus when an end command is encountered. This results in the following speeds:
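A behavioural sketch of what the compressor does to the stream, assuming a chain of four registers (called C1..C4 here) where three 2:1 MUXes let each valid byte enter one stage further down the chain. This is a guess at the structure for illustration, not the actual schematic, and the control logic is reduced to a simple counter.

```python
INVALID = None  # stands for the "X" slots in the interleaved stream


def compress(interleaved):
    """Turn e.g. [1, X, 2, X, 3, X, 4] into a gap-free run at the output of C4.

    Returns what the last register (C4) holds after each clock.
    """
    chain = [INVALID] * 4              # C1, C2, C3, C4
    output = []
    valid_seen = 0
    for value in interleaved:          # one loop iteration = one clock
        next_chain = [INVALID] + chain[:3]       # everything shifts one stage
        if value is not INVALID:
            if valid_seen == 4:
                raise ValueError("sketch handles one 4-byte operand only")
            # The n-th valid byte enters at stage n, so later bytes take a shorter path.
            next_chain[valid_seen] = value
            valid_seen += 1
        chain = next_chain
        output.append(chain[3])
    return output


stream = compress([1, INVALID, 2, INVALID, 3, INVALID, 4])
```

In this model the four bytes leave C4 on consecutive clocks, and the last one appears on the same clock in which the last input byte is fed in.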
| normal ALU operation | 12 clocks |
| begin ALU chain | 8 clocks |
| step ALU chain | 4 clocks |
| end ALU chain | 8 clocks |
So, even chaining just two commands will result in a speedup of 33% (24 clocks vs. 16 clocks).
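The numbers from the table can be played with in a few lines. The sketch assumes a chain of n commands consists of one begin, n - 2 steps and one end, which matches the two-command example (8 + 8 = 16 clocks); the function names are made up.

```python
NORMAL, BEGIN, STEP, END = 12, 8, 4, 8  # clocks, taken from the table


def unchained_clocks(n):
    """Clocks for n separate normal ALU operations."""
    return n * NORMAL


def chained_clocks(n):
    """Clocks for n chained ALU operations (assumed: begin + (n - 2) steps + end)."""
    if n < 2:
        raise ValueError("a chain needs at least a begin and an end command")
    return BEGIN + (n - 2) * STEP + END


def saving(n):
    """Fraction of clocks saved by chaining n operations."""
    return 1 - chained_clocks(n) / unchained_clocks(n)
```

Longer chains save more: under the same assumption, eight chained operations (e.g. eight multiply steps) would take 40 clocks instead of 96.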
Images of the compressor and the new ALU design will be available shortly (at the moment I'm finishing the design on paper), as the hades files will be as well.
Will I go crazy in the process? Will the author accidentally develop a method for explosive ALU operations? Stay tuned for the next episode of "TriPU: processor madness" if you wanna see how it turns out ;)
22.12.2004
I promised it, here it is :) I finally had the time to digitize the new ALU. This time I chose another method for simulation/design: I used the HADES (http://tams-www.informatik.uni-hamburg.de/applets/hades/html/hades.html) software system (which by the way was written in the very same place where I study ;) HADES is a simulation framework in Java which is quite flexible; you might see it as a clickable VHDL (it even features VHDL export, although I still have to figure out how ;) Anyway, I implemented my idea of Dieter's ALU documents and this is what came out.
As you can (or cannot ;) see, the ALU is divided into several stages, each separated by a register (the boxes labelled A, B, X, Y, X', S, R, R1, R2, R3 are registers). This creates a pipelined design with 4 stages. I haven't yet decided if it's worth the trouble (it probably makes the calculations with 2 operands a bit slower but allows me to try out pipelining and may keep the critical path short), but that's the current design :) Other undescribed blocks in the design: the ADDC is an adder with carry, the >>1 is a shift right by one with in/output for the extra bits. The NOR-block is a NOR over all 8 data bus bits and the <7>-block only extracts bit 7 (the highest bit). Finally the LogU is the logic unit as proposed (http://people.freenet.de/dieter.02/alu_4.htm) by Dieter.
I already did a few tests (HADES allows interactive design tests) and although I needed some time to figure out the correct parameters, the tests came out clean, meaning it calculated the correct values :)
After these refreshing tests with HADES I'll probably continue to simulate TriPU's design with it, as the interactive test possibilities are great. Nevertheless I'll continue to use VHDL for the direct test of timing with 74LS-models, as most parts in HADES I use work with a behavioural model.
VHDL and eagle files will follow. This construction will be especially interesting as the ALU is much bigger than anything I've designed until now. Stay tuned && Merry Christmas apparently ;)
14.12.2004
Christmas holidays are coming :) Then (next week) I will implement Dieter's ALU idea with multiplexers (implement on schematics of course, probably simulation, too). After Christmas I will be off to the 21C3 (http://www.ccc.de/congress/2004). I'm looking forward to that, especially because there is a small course available about creating PCBs on your own at home. If the techniques are low-cost this will probably be the basis for TriPU construction; I will report about that during or after the congress, but first there's an ALU to construct :)
