Talk:Binary GCD algorithm

This is an old revision of this page, as edited by SineBot (talk | contribs) at 16:17, 12 January 2012 (Signing comment by 78.31.46.6 - ""). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Latest comment: 13 years ago by 78.31.46.6

bug in wikipedia underlying framework?

It appears that there are two pages:

a) http://en.wikipedia.org/wiki/Binary_gcd

(last modified: "4 January 2012 at 17:09.")

trying to edit that page will take you to:

  http://en.wikipedia.org/w/index.php?title=Binary_GCD_algorithm&action=edit

b) http://en.wikipedia.org/wiki/Binary_GCD_algorithm

(last modified: "12 January 2012 at 15:34.")

trying to edit that page will take you to:

  http://en.wikipedia.org/w/index.php?title=Binary_GCD_algorithm&action=edit

However, a) and b) appear different as a) does not reflects changes to b).

By manually modifying the "edit" URL from a) to:

  http://en.wikipedia.org/w/index.php?title=Binary_gcd&action=edit

It appears that a) is just "#REDIRECTBinary GCD algorithm"

However, it does not seems to work.

I hope this helps (I could not find a "report a bug" thing so I'm posting here) — Preceding unsigned comment added by 78.31.46.6 (talk) 16:16, 12 January 2012 (UTC)Reply

gcd(0,0)

gcd(x,y) is the integer, that is divided by all common divisors of x and y. Thus gcd(0,0)=0, because the common divisors of 0 and 0 are all natural numbers, and these are also divisors of 0. HenningThielemann (talk) 20:22, 2 October 2009 (UTC)Reply

A quality source would be needed to confirm this. Cursory examination of the literature shows that, indeed, gcd(0,0) is typically not defined. (See, for instance, Niven, Montgomery, Zuckerman, "Introduction to the theory of numbers".) 71.182.236.76 (talk) 20:55, 31 October 2009 (UTC)Reply

GCD in hardware

The comparison of the Binary GCD with GCD implemented by hardware division lead me to the question, how fast GCD could be when implemented as hardware instruction. I assume it must be as fast or slow as one division. HenningThielemann (talk) 20:33, 2 October 2009 (UTC)Reply

unsigned integers

If don't see, why the algorithm is restricted to unsigned integers. As far as I can see it works equally on negative numbers. Even more the algorithm would be simplified because no v>=u comparison is necessary. 134.102.210.237 11:54, 31 March 2006 (UTC)Reply

ARM implementation

The ARM implementation lacks the initial u = 0 test. The code also does not appear to make good use of the conditional instruction set, so I have some optimizations to suggest:

remove_twos_loop optimization

gcd:
    @ Arguments arrive in registers r0 and r1
    mov     r3, #0            @ Initialize r3, the power of 2 in the result
    orr     r12, r0, r1       @ r12 = (r0 | r1)
    tst     r12, #1           @ Test least significant bit of (r0 | r1)
    bne     gcd_loop          @ If nonzero, either r0 or r1 are odd, jump into middle of next loop
remove_twos_loop:
    mov     r0, r0, lsr #1    @ Shift r0 right, dividing it by 2
    mov     r1, r1, lsr #1    @ Shift r1 right, dividing it by 2
    add     r3, r3, #1        @ Increment r3
    tst     r0, #1            @ Check least significant bit of r0
    bne     gcd_loop          @ If nonzero, r0 is odd, jump into middle of next loop
    tst     r1, #1            @ Check least significant bit of r1
    beq     remove_twos_loop  @ If zero, r0 and r1 still even, keep looping

optimization 1

gcd:
    @ Arguments arrive in registers r0 and r1
    mov     r3, #0            @ Initialize r3, the power of 2 in the result
    orr     r12, r0, r1       @ r12 = (r0 | r1)
remove_twos_loop:
    tst     r12, #1           @ Test least significant bit of (r0 | r1)
    moveq   r0, r0, lsr #1    @ Shift r0 right, dividing it by 2
    moveq   r1, r1, lsr #1    @ Shift r1 right, dividing it by 2
    moveq   r12, r12, lsr #1  @ Shift r12 right, dividing it by 2
    beq     remove_twos_loop  @ If zero, r0 and r1 still even, keep looping

optimization 2

gcd:
    @ Arguments arrive in registers r0 and r1
    mov     r3, #0            @ Initialize r3, the power of 2 in the result
remove_twos_loop:
    tst     r0, #1            @ Test least significant bit of r0
    tsteq   r1, #1            @ Test least significant bit of r1
    moveq   r0, r0, lsr #1    @ Shift r0 right, dividing it by 2
    moveq   r1, r1, lsr #1    @ Shift r1 right, dividing it by 2
    beq     remove_twos_loop  @ If zero, r0 and r1 still even, keep looping

gcd_loop optimization

gcd_loop:                     @ Loop finding gcd of r0, r1 not both even
    tst     r0, #1            @ Check least significant bit of r0
    moveq   r0, r0, lsr #1    @ If r0 is even, shift r0 right, dividing it by 2, and...
    beq     gcd_loop_continue @ ... continue to next iteration of loop.
    tst     r1, #1            @ Check least significant bit of r1
    moveq   r1, r1, lsr #1    @ If r1 is even, shift it right by 1 and...
    beq     gcd_loop_continue @ ... continue to next iteration of loop.
    cmp     r0, r1            @ Compare r0 to r1
    subcs   r2, r0, r1        @ If r0 >= r1, take r0 minus r1, and...
    movcs   r0, r2, lsr #1    @ ... shift it right and put it in r0.
    subcc   r2, r1, r0        @ If r0 < r1, take r1 minus r0, and...
    movcc   r1, r2, lsr #1    @ ... shift it right and put it in r1.
gcd_loop_continue:
    cmp     r0, #0            @ Has r0 dropped to zero?
    bne     gcd_loop          @ If not, iterate
    mov     r0, r1, asl r3    @ Put r1 << r3 in the result register
    bx      lr                @ Return to caller

optimization

gcd_loop:                     @ Loop finding gcd of r0, r1 not both even
    tst     r0, #1            @ Check least significant bit of r0
    moveq   r0, r0, lsr #1    @ If r0 is even, shift r0 right, dividing it by 2, and...
    beq     gcd_loop          @ ... continue to next iteration of loop.
gcd_loop_2:                   @ Loop finding gcd of r0, r1
    tst     r1, #1            @ Check least significant bit of r1
    moveq   r1, r1, lsr #1    @ If r1 is even, shift it right by 1 and...
    beq     gcd_loop_2        @ ... continue to next iteration of loop.
    cmp     r0, r1            @ Compare r0 to r1
    subcc   r2, r1, r0        @ If r0 < r1, take r1 minus r0, and...
    movcc   r1, r2, lsr #1    @ ... shift it right and put it in r1.
    bcc     gcd_loop_2        @ ... continue to next iteration of loop (r0 is still odd).
    sub     r2, r0, r1        @ If r0 >= r1, take r0 minus r1, and...
    movs    r0, r2, lsr #1    @ ... shift it right and put it in r0.
    bne     gcd_loop          @ Has r0 dropped to zero? If not, iterate
    mov     r0, r1, asl r3    @ Put r1 << r3 in the result register
    bx      lr                @ Return to caller
Sounds good to me. The point of this code sample is just to illustrate how efficient binary GCD is on a real machine. Feel free to plug in the most optimized version you can imagine. Deco 19:50, 31 October 2005 (UTC)Reply

Implementations

I've removed the ML implementation, because it teaches more about ML than about the algorithm. I have my doubts about the value of the assembly version, but I can't really assess it as I can't read it. IMHO, the C implementation is important because it exemplifies the use of bitwise operations for efficiency. Qwertyus 22:31, 13 April 2006 (UTC)Reply

I think you did the right thing. I just restored the C and ARM implementations after User:Arvindn blanked them, for these reasons: IMHO, C isn't important for showing bitwise operations; it should be obvious that you'd use bitwise ops where appropriate. But the C implementation is easier to read than the English algorithm at the moment, so it needs to stay at least until there's an appropriate substitute (pseudocode? Knuth-style algorithm description?). IMHO, the ARM implementation is really enlightening, because it shows the real size and speed benefit of the algorithm in a way that no higher-level language's compiler even approaches. (In particular, it totally stomps the C implementation.) Without some hand-optimized assembly implementation, the advantage of binary GCD over Euclid is not obvious, IMHO. --Quuxplusone 21:18, 1 January 2007 (UTC)Reply

The following pseudocode describes the algoritm probably best:

Input: Positive integers u and v
Output: gcd(u,v)
Note: division and multiplication by 2 can in many languages most efficiently be done by bitshifting.
START 
s:=1
while even(u) and even(v) do
   u:=u/2;v:=v/2;s:=s*2;
while even(u) do u:=u/2
while even(v) do v:=v/2
while u<>v do
begin
  if u>v then
     begin
        u:=u-v
        while even(u) do u:=u/2
     end
     else
     begin
        v:=v-u
        while even(v) do v:=v/2
     end
end
return(u*s)
END

Hkleinnl (talk) 20:07, 12 August 2010 (UTC)Reply

Implementation in assembly

Thanks Balabiot for pointing out the incorrect result (always zero) when u=0 or v=0. I have fixed the code to return the sum of the inputs when either or both are zero. The code now matches the output of the Euclidean algorithm.

I have reverted movne/movne to movs/movnes to ensure that r0 and r1 are both nonzero on entry to check_two_r0. (The previous code hung e.g. with u=0, v=3.)

I didn't understand why the case where u and v are powers of two needed fixing, could you please elaborate? -- Regregex (talk) 17:02, 10 October 2008 (UTC)Reply

I don't remember, probably I just screwed up. :-) Balabiot (talk) 17:03, 23 October 2008 (UTC)Reply

Is that page obsolete ?

I ran 10 000 000 random tests on Mac OS X.6.4 with C Code and got the following results:

  • Euclide : 6180 ms
  • Stein : 9038 ms

and compiling with option -O3

  • Euclide : 4728 ms
  • Stein : 5302 ms
#include <stdio.h>
#include <sys/time.h>
#include <stdlib.h>

#define N 10000000
#define MAX_UINTplus1 0x100000000LLU
#define uint64 unsigned long long
#define INIT_RANDOM srandom(3)

uint64 pgcdEuclide(uint64,uint64);
uint64 pgcdStein(uint64,uint64);


int main(void) {
	int i;
	uint64 a,b,d,t0,t1;
	struct  timeval tv;
	
	// Version Euclide
	INIT_RANDOM;
	gettimeofday(&tv,0);
	t0 = tv.tv_sec* 1000 *1000+tv.tv_usec;
	for(i=0;i<N;i++){
		a=random()*MAX_UINTplus1+random();
		b=random()*MAX_UINTplus1+random();		
		d=pgcdEuclide(a,b);
	}	
	gettimeofday(&tv,0);
	t1 = tv.tv_sec*1000*1000+tv.tv_usec;	
	printf("%llu,Euclide : %llu ms\n",d,(t1-t0)/1000);
	
	// Version Stein
	INIT_RANDOM;
	gettimeofday(&tv,0);
	t0 = tv.tv_sec* 1000 *1000+tv.tv_usec;
	for(i=0;i<N;i++){
		a=(uint64)random()*MAX_UINTplus1+random();
		b=(uint64)random()*MAX_UINTplus1+random();		
		d=pgcdStein(a,b);
	}	
	gettimeofday(&tv,0);
	t1 = tv.tv_sec*1000*1000+tv.tv_usec;	
	printf("%llu,Stein : %llu ms\n",d,(t1-t0)/1000);	
	
	return 0;
}

uint64 pgcdEuclide(uint64 a,uint64 b){
	if(b==0) return a;
	return pgcdEuclide(b,a%b);
}

uint64 pgcdStein(uint64 u,uint64 v){
		int shift;
		
		/* GCD(0,x) := x */
		if (u == 0 || v == 0)
			return u | v;
		
		/* Let shift := lg K, where K is the greatest power of 2
		 dividing both u and v. */
		for (shift = 0; ((u | v) & 1) == 0; ++shift) {
			u >>= 1;
			v >>= 1;
		}
		
		while ((u & 1) == 0)
			u >>= 1;
		
		/* From here on, u is always odd. */
		do {
			while ((v & 1) == 0)  /* Loop X */
				v >>= 1;
			
			/* Now u and v are both odd, so diff(u, v) is even.
			 Let u = min(u, v), v = diff(u, v)/2. */
			if (u < v) {
				v -= u;
			} else {
				uint64 diff = u - v;
				u = v;
				v = diff;
			}
			v >>= 1;
		} while (v != 0);
		
		return u << shift;
}

Why (u | v) & 1 == 0 is the correct test

Regarding edits by 94.43.147.234 (talk): Step 2 states:

If u and v are both even, then gcd(u, v) = 2·gcd(u/2, v/2), because 2 is a common divisor.

Therefore, before u and v may both be halved and shift incremented, the LSB of u must be zero, and the LSB of v must be zero. The test (in line 12 of the C code fragment) would look like this (avoiding the short-circuit operators for simplicity):

    for (shift = 0; (u & 1) == 0 & (v & 1) == 0; ++shift) {

This is an expression of the form  , which by de Morgan's laws can be rewritten as  . In other words, Halve u and v and increment shift if the following is false: that u is odd and/or v is odd. Notice the Boolean negation ! inserted at the front:

    for (shift = 0; !((u & 1) == 1 | (v & 1) == 1); ++shift) {

As the only non-zero result of x & 1 is 1, the equalities disappear:

    for (shift = 0; !((u & 1) | (v & 1)); ++shift) {

& is distributive over | so the two & 1 operations can be combined:

    for (shift = 0; !(((u) | (v)) & 1); ++shift) {

Comparing with zero is equivalent to the Boolean negation. After removing redundant parentheses we obtain the test as originally written:

    for (shift = 0; ((u | v) & 1) == 0; ++shift) {

As a further test, the code fragment as it stood (comprising ((u & v) & 1) == 0) was compiled, and gcd(1,2) was called. As u and v have no two bits set in the same column, the test never became false and the code hung in a loop between lines 12–15. u | v also corresponds to the ARM code (ORRS) which has been tested. – Regregex (talk) 16:28, 11 July 2011 (UTC)Reply