Saturday, July 18, 2009

ALU Cache

This method uses some things observed in the fractal world.

If we use the free program Notepad++ and open the file obtained by generating the XOR table for all rows and columns smaller than a value VAL, then when we select a value the program automatically searches for it and highlights it everywhere it appears.
By doing this for a few example values, we can see how the numbers are arranged in the XOR table – in this case along the lines of the Peano filigree:


http://bocut-fractals-appl.blogspot.com/2009/07/xor-peano-filigree.html

We can implement a cache memory for the ALU in which to store, like in a matrix, all these values. Instead of computing an XOR on 8/16/32 bits and so on, we can go directly into this matrix and read the value which was already calculated.
If we have computed, for example, A xor B, then in the wait time of the microprocessor we can calculate all the smaller pairs which produce the same result, using only an algorithm on the bits of A and B.
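A minimal software sketch of this lookup idea (my illustration, assuming an 8-bit operand width; the table and function names are invented for the example, and in hardware the matrix would be a small ROM/SRAM array rather than a C array):

#include <stdio.h>
#include <stdint.h>

/* Software model of the proposed ALU cache for 8-bit XOR. */
static uint8_t xor_table[256][256];

/* Fill the matrix once, e.g. during idle/wait cycles. */
static void fill_table(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            xor_table[a][b] = (uint8_t)(a ^ b);
}

/* Afterwards, "computing" A xor B is only a row/column selection. */
static uint8_t cached_xor(uint8_t a, uint8_t b)
{
    return xor_table[a][b];
}

int main(void)
{
    fill_table();
    printf("19 xor 17 = %d\n", cached_xor(19, 17)); /* prints 2 */
    return 0;
}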

This is a small C language program to demonstrate that it is only necessary to know the representation of the numbers A and B – more exactly the length of A and B, i.e. the number of significant bits, ignoring the leading positions which are filled with 0.

http://infoarena.ro/operatii-pe-biti

#include <stdio.h>
#include <math.h>

FILE *pf;
int i, j;
int q;

int main(void)
{
    pf = fopen("xor.txt", "w+");
    for (i = 1; i < 100; i++)   /* start at 1 so that log10() is defined */
    {
        fprintf(pf, "\n\n");
        for (j = 1; j < 100; j++)
        {
            /* q-1 = position of the highest set bit of the larger operand,
               so that 2^q is greater than both i and j                    */
            q = (int)((log10(i) / log10(2) < log10(j) / log10(2))
                          ? log10(j) / log10(2)
                          : log10(i) / log10(2));
            ++q;

            /* difference between i^j and (i - 2^q) ^ (j - 2^q):
               the table is filled with 0, i.e. the XOR is preserved */
            fprintf(pf, "%3d ",
                    (i ^ j) - ((i - (int)pow(2, q)) ^ (j - (int)pow(2, q))));
        }
    }
    fclose(pf);
    return 0;
}

From this program it can be seen that the only operation needed to find a new pair A' and B' which produces the same result as A xor B is a subtraction by a power of 2; this means that it is enough to negate the respective bits, or to deselect the respective lines, when we want to write the result in its new position in the matrix – in memory.

The other binary operations can be verified by subtracting (int)pow(2,q), after making an analogous subtraction as in the program but using the respective operator instead of "^" (xor).
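As a quick check of this claim (an illustrative sketch of mine, not part of the original program), the loop below uses unsigned arithmetic so that the wrap-around of a - 2^q is well defined, and verifies that XOR is preserved exactly, while for AND and OR the result of the shifted pair differs from the original result by exactly 2^q:

#include <stdio.h>
#include <assert.h>

int main(void)
{
    for (unsigned a = 0; a < 100; a++)
        for (unsigned b = 0; b < 100; b++) {
            unsigned p = 1;            /* p = 2^q, larger than both operands */
            while (p <= a || p <= b)
                p <<= 1;

            /* XOR is preserved exactly ... */
            assert(((a - p) ^ (b - p)) == (a ^ b));
            /* ... while AND and OR only need 2^q added back afterwards */
            assert((((a - p) & (b - p)) + p) == (a & b));
            assert((((a - p) | (b - p)) + p) == (a | b));
        }

    printf("checked: XOR preserved, AND/OR recovered by adding back 2^q\n");
    return 0;
}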

While the microprocessor is in a wait state, we can calculate these values.
For numbers stored on 8 bits, the capacity of the memory will be 2^8 * 2^8 / 2 entries (we divide by 2 because the matrix is symmetric, so only half needs to be stored) of 8 bits each, i.e. 32 KB.
Instead of calculating an 8-bit XOR, we can simply select the line/column in this matrix – a step that is already part of the XOR process anyway, when the operands are fetched on the bus.
In this way the speed will grow, because the matrix is completely computed only once, and afterwards, every time we want an XOR, it is only necessary to select the lines, without computing the values.

Like a fuel engine, the microprocessor can consume less energy overall, because the wait states are avoided and, like an engine, it is not stopped and started again, which consumes more fuel.

An alternative solution is to use a replacement algorithm as in a classic cache memory, for example LRU (Least Recently Used), and to have a memory with a lower capacity.
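As a rough software model of that alternative (my own sketch: the number of entries, the fields and the eviction bookkeeping are arbitrary choices, not a specification), a tiny fully associative cache with LRU replacement could look like this:

#include <stdio.h>
#include <stdint.h>

#define ENTRIES 8                 /* toy size, for illustration only */

struct entry {
    uint8_t  a, b, result;
    int      valid;
    unsigned last_used;           /* timestamp used for LRU */
};

static struct entry cache[ENTRIES];
static unsigned tick;

static uint8_t cached_xor(uint8_t a, uint8_t b)
{
    int victim = 0;

    ++tick;
    for (int i = 0; i < ENTRIES; i++) {
        if (cache[i].valid && cache[i].a == a && cache[i].b == b) {
            cache[i].last_used = tick;      /* hit: refresh recency */
            return cache[i].result;
        }
        if (!cache[i].valid || cache[i].last_used < cache[victim].last_used)
            victim = i;                     /* prefer an empty or least recently used slot */
    }

    /* miss: compute once and replace the selected entry */
    cache[victim] = (struct entry){ a, b, (uint8_t)(a ^ b), 1, tick };
    return cache[victim].result;
}

int main(void)
{
    printf("%d %d %d\n", cached_xor(19, 17), cached_xor(19, 17), cached_xor(3, 5));
    return 0;
}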


ALU cache memory for ADD operations based on AND/XOR fractals
Making an ALU cache for microcontrollers / microprocessors.
Sierpinski fractal applied to the XOR operation in a HW cache memory.
This cache memory can be used for both XOR and ADD operations.

XOR, and then the more complex ADD operation, are basic ALU operations.

The total current consumption of a uC can decrease. Taking into account that many operations are nowadays automated and that computers and microcontrollers are present in almost every activity, making a cache for the arithmetic logic unit seems feasible and can reduce a considerable percentage of computer power consumption.

Making an ALU cache for microcontrollers / microprocessors. The idea resembles the principle of dynamic programming – the one used to shorten recursion in software: use a vector to store the results of the recursive function so that cascaded calls avoid the recursion whenever that call has already been calculated. The project aims to apply the same principle in HW. It is well known that the xor binary operation has at least 3 different classes of fractals associated with it. Based on one of these, i.e. Sierpinski / Peano, a logic block can calculate and store, in the idle time of the cores, the pairs of numbers that give the same result as A xor B. There are other pairs whose results can be predicted: pairs C xor D whose result, based on the mathematical relations given by the fractals, can be specified directly without calculating the xor – for example by adding powers of 2. With this list of pairs stored in the ALU cache, the most frequent xor operations no longer have to be calculated. While a program runs on a computer, it will check whether N xor M is already cached, and will achieve lower consumption by replacing the memory accesses for the 2 operands (fetching N and M) with a simple check of whether the pair is already cached, in which case a single memory access is enough.
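To make the dynamic-programming analogy concrete, this is the usual memoized recursion in software (the Fibonacci function is only a stand-in example chosen here, not part of the proposal); the ALU cache applies the same store-and-look-up idea to A xor B pairs in hardware:

#include <stdio.h>

/* Results of the recursive function are stored in a vector,
   so repeated calls are answered by a lookup instead of recomputation. */
static long long memo[90];

static long long fib(int n)
{
    if (n < 2)
        return n;
    if (memo[n] == 0)                     /* not cached yet */
        memo[n] = fib(n - 1) + fib(n - 2);
    return memo[n];
}

int main(void)
{
    printf("fib(50) = %lld\n", fib(50));  /* answered mostly from the cache */
    return 0;
}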

OEIS: https://oeis.org/A221146


0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 ...
0 0 4 4 0 0 4 4 0 0 4 4 0 0 4 4 ...
0 2 4 6 0 2 4 6 0 2 4 6 0 2 4 6 ...
0 0 0 0 8 8 8 8 0 0 0 0 8 8 8 8 ...
0 2 0 2 8 10 8 10 0 2 0 2 8 10 8 10 ...
0 0 4 4 8 8 12 12 0 0 4 4 8 8 12 12 ...
0 2 4 6 8 10 12 14 0 2 4 6 8 10 12 14 ...
0 0 0 0 0 0 0 0 16 16 16 16 16 16 16 16 ...
0 2 0 2 0 2 0 2 16 18 16 18 16 18 16 18 ...
0 0 4 4 0 0 4 4 16 16 20 20 16 16 20 20 ...
0 2 4 6 0 2 4 6 16 18 20 22 16 18 20 22 ...
0 0 0 0 8 8 8 8 16 16 16 16 24 24 24 24 ...
0 2 0 2 8 10 8 10 16 18 16 18 24 26 24 26 ...
0 0 4 4 8 8 12 12 16 16 20 20 24 24 28 28 ...
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 ...
...

This matrix is obtained from the row and column indexes: each entry is the ADD of the indexes minus the XOR of the indexes, i.e. (i + j) - (i xor j). The matrix therefore contains the differences that should be added/subtracted in order to pass from the ADD of the indexes to the XOR of the indexes, or vice versa.
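The entries of this matrix are (i + j) - (i xor j), which equals 2*(i AND j); this is the standard identity a + b = (a xor b) + 2*(a and b), and it is what allows passing between ADD and XOR. A short check that also regenerates the upper-left corner of the table quoted above:

#include <stdio.h>
#include <assert.h>

int main(void)
{
    /* (i + j) - (i ^ j) == 2 * (i & j): the carry bits, shifted left by one */
    for (unsigned i = 0; i < 16; i++) {
        for (unsigned j = 0; j < 16; j++) {
            unsigned diff = (i + j) - (i ^ j);
            assert(diff == 2 * (i & j));
            assert(i + j == (i ^ j) + 2 * (i & j));
            printf("%3u ", diff);
        }
        printf("\n");
    }
    return 0;
}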
This matrix shows that both the XOR and ADD operations have the same layout (following, i.e., this fractal), and that the ADD operation can be computed more easily if XOR is used, based on a cache memory which implements these fractal relations in HW. This mechanism can be used as a lookup-table principle in order to avoid the carry HW logic – the AND gates and the HW complexity introduced by using AND gates for the carry bits.

This lookup-table mechanism of having a cache memory using, e.g., the Sierpinski layout will avoid the AND computations and their HW complexity.
In this way the carry bits no longer have to be calculated with AND gates; they will be "linked" – HW wired, or through a HW logic mechanism – for all the pairs of numbers that produce the same results, or results that can be predicted.
This linkage will be stored in a cache memory, and any future pair of numbers to be added will first be flag-checked against this cache, in order to see whether another pair of numbers with the same fractal property/arrangement – the same difference in the matrix above – was previously calculated.

The ALU cache avoids leakage and needs fewer memory accesses, reducing consumption during the calculation of A xor B, which is used very often and is also part of addition, an important ALU operation for microprocessors / microcontrollers.
Every matrix operation – dot products, matrix multiplications, additions – uses the basic addition of 2 numbers.
1 full adder: 2 XOR, 2 AND, 1 OR gates
1 XOR gate: 2 AND and 2 OR gates
32-bit parallel adder: 31 full adders
AND, OR gates: 3 transistors
1-bit cache: 6 transistors
Classical style = at least 930 transistors
New style = 32*4*3 + 32*6 = 576 transistors, plus the transistors needed for the HW wiring and the flagged activation of the values with the same carry result
Power consumption reduction => longer battery life and lower fuel consumption & emissions when recharging the battery


New Karnaugh ICs method using Peano fractal

The classic method is used for circuit minimization.
http://en.wikipedia.org/wiki/Karnaugh_map
The new method is based on the digital comparator logic.
http://en.wikipedia.org/wiki/Digital_comparator
Let's suppose we want to compare two numbers M and N, each having X bits (we can take X = the maximum of the bit lengths of the two numbers M and N).
The electronic scheme of the comparator uses multiple parallel XOR gates. The XOR gates lead us to the study of the XOR logic table.
We can extend this table, in the sense that we are now interested in the result of the calculation M xor N.
If we build a table in which we compute i xor j, where i is the row index and j the column index, we can observe some repetitions in that table.
We can find in this table several different classes of fractals.
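One of these fractal classes is easy to make visible in a terminal (a small sketch of mine): marking the cells where i AND j is 0 – equivalently, where i + j equals i xor j – draws the Sierpinski triangle:

#include <stdio.h>

int main(void)
{
    /* cells with no carries (i & j == 0, so i + j == i ^ j) form the
       Sierpinski triangle when the table is printed row by row        */
    for (int i = 0; i < 32; i++) {
        for (int j = 0; j < 32; j++)
            putchar((i & j) == 0 ? '*' : ' ');
        putchar('\n');
    }
    return 0;
}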

http://www.math.umass.edu/~mconnors/fractal/generate/peano.html

If we use the free program Notepad++ and open the file obtained by generating the XOR table for all rows and columns smaller than a value VAL, then when we select a value the program automatically searches for it and highlights it everywhere it appears.
By doing this for a few example values, we can see how the numbers are arranged in the XOR table – in this case along the lines of the Peano filigree:

The advantage of using the XOR table is that when we want to compute the Karnaugh table for a circuit (e.g. the digital comparator that compares two numbers of X bits), instead of taking the numbers M and N and calculating and filling in all 2^X rows, we can apply the same Karnaugh method to other numbers – let us call them A and B – which have a binary representation (in base 2) of Y bits, with Y < X.
In the drawing shown before, we can consider M and N to be the coordinates where the mouse cursor is; then M = 19, N = 17 and M xor N is 2.
The values of A and B can be any other pair of coordinates highlighted with the same value, with A < M and B < N. To use this method, we can see that the smallest values of A and B are obtained with a modulo-2^power operation.
To find other coordinates, not the ones with the smallest values, we can subtract 2^m, where m is the biggest exponent for which 2^m does not exceed M and N. For this operation alone, without taking the modulo remainder, each of the results A = M - 2^m and B = N - 2^m has at most X-1 bits, X being the maximum of the lengths of the binary representations of M and N.
In this way the Karnaugh table will have 2^(1 gained bit + 1 gained bit) = 2^2 = 4 times fewer rows than the Karnaugh table obtained for numbers M and N with X bits.
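A sketch of this reduction step, using the example M = 19, N = 17 from above (the helper variables are my own; the extra observation here is that the reduction applies when M and N share the same most significant bit, and when they do not, the comparison is decided by that top bit directly):

#include <stdio.h>

int main(void)
{
    unsigned M = 19, N = 17;            /* the example used in the text */
    unsigned p = 1;

    while (p * 2 <= M && p * 2 <= N)    /* largest 2^m not exceeding M and N */
        p *= 2;

    if ((M & p) && (N & p)) {           /* top bit shared: reduction applies */
        unsigned A = M - p, B = N - p;
        printf("A=%u B=%u  A^B=%u  M^N=%u  (M>N)=%d  (A>B)=%d\n",
               A, B, A ^ B, M ^ N, M > N, A > B);
    } else {
        /* different bit lengths: the larger number is known immediately */
        printf("%u %s %u decided by the top bit\n", M, (M > N) ? ">" : "<", N);
    }
    return 0;
}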
This method of generating the disjunctive form of the circuit, represented with gates, must be used dynamically, because the method depends on the numbers M and N, and the result M xor N is not the same for every pair.
So for each number representing the result of M xor N – this number giving the number of bits in the Karnaugh table – the output scheme is different.
This is where the dynamic character of the method comes from.
This means that the HW must contain a matrix of gates, and these gates are connected at the moment of CPU execution; if we want to compare, for example, the numbers 17 and 16, we select from the matrix of gates only those that generate the output obtained by the Karnaugh method from the A and B derived from 17 and 16; for another pair of numbers, not 17 and 16, the output scheme will be different – it is not the same for all pairs as in the classic method.

In this way the execution time will be a little slower, but the energy consumption of the ICs – especially the main CPU – will be reduced.
The result can be: no need for such big coolers, and the possibility to increase the frequency of the circuits.