Gentoo Forums
Gentoo Forums
Gentoo Forums
Quick Search: in
Coding inefficiencies instead of efficiencies
View unanswered posts
View posts from last 24 hours

 
Reply to topic    Gentoo Forums Forum Index Off the Wall
View previous topic :: View next topic  
Author Message
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 16117
Location: Colorado

PostPosted: Sun Oct 20, 2013 5:23 am    Post subject: Coding inefficiencies instead of efficiencies Reply with quote

So there's no point to this except something to work with. I started with the Collatz Conjecture, just to have a bunch of data to generate.

My idea was to check each number, and if successful (ending with calculating to 1), add the numbers to a list of "solved" integers. So, for example, here is 1:
Code:
2 (/2)
1
So any time we encounter 2, we know it results in 1, so it doesn't need to be tested again. Next is 3:
Code:
3 (*3+1)
10 (/2)
5 (*3+1)
16 (/2)
8 (/2)
4 (/2)
2 (already solved, stop calculating)
So now 3, 10, 5, 16, 8 and 4 are "solved" in addition to 2. Any time those numbers are encountered, stop calculating and move to the next number.

Here's some output, number in brackets is the one being tested. Number ending with 's' indicates it has already been solved (so stop calculating):
Code:
[1]:
[2]: 2
[3]: 3 10 5.0 16.0 8.0 4.0 2.0s
[4]: 4s
[5]: 5s
[6]: 6 3.0s
[7]: 7 22 11.0 34.0 17.0 52.0 26.0 13.0 40.0 20.0 10.0s
[8]: 8s
[9]: 9 28 14.0 7.0s
[10]: 10s
And without checking to see if we've already solved for a number:
Code:
[1]:
[2]: 2
[3]: 3 10 5.0 16.0 8.0 4.0 2.0
[4]: 4 2.0
[5]: 5 16 8.0 4.0 2.0
[6]: 6 3.0 10.0 5.0 16.0 8.0 4.0 2.0
[7]: 7 22 11.0 34.0 17.0 52.0 26.0 13.0 40.0 20.0 10.0 5.0 16.0 8.0 4.0 2.0
[8]: 8 4.0 2.0
[9]: 9 28 14.0 7.0 22.0 11.0 34.0 17.0 52.0 26.0 13.0 40.0 20.0 10.0 5.0 16.0 8.0 4.0 2.0
[10]: 10 5.0 16.0 8.0 4.0 2.0


So now the part about inefficiencies. I was originally thinking that NOT calculating numbers already tested would improve performance. What I didn't think about was the performance hit to check if a number had been tested. The "solved" list grows rapidly.

So calculating to 13000 (arbitrary, it takes a while), here are the times with checking (first) and without checking if a number was already solved. The numbers before the times (28,168 and 1,136,294) are the size of the solved list -- len(solved). The second includes duplicates.
Code:
28168

real    0m46.635s
user    0m46.469s
sys     0m0.073s



1136294

real    0m4.603s
user    0m3.677s
sys     0m0.117s
I was surprised how much longer it took to check if a number had already been solved. Not printing output didn't save much (I think 39 seconds instead of 46).

I had originally thought of periodically cleaning up the list of solved numbers so a known "floor" of all numbers below it having been solved might speed up the checking. So once numbers 1 - 10 had all been solved, a check for <10 rather than a lookup to solved (if n in solved). Now I'm thinking this won't make much difference.

Is the check likely to ever help with performance with larger numbers?

Since it is an exercise, I'll probably still try the cleanup of the solved list. I also intended (and still plan) to work on saving completed data so it wouldn't have to start from the beginning each time.

Here's the code if anyone is curious. It isn't much:
Code:
#!/usr/bin/python

solved = [ ]

def cc(n):
        checked = []

        print("\n[", n, "]: ", end='', sep='')
        while n > 1:
                if n in solved:
                        print(n, "s ", end='', sep='')
                        break
                else:
                        checked.append(n)
                print(n, end=' ')
                if n % 2 == 0:
                        n = n / 2
                else:
                        n = 3 * n + 1

        solved.extend(checked)
        return

n = 1
while n < 13001:
        if n not in solved:
                r = cc(n)
                #print("\n   Solved: ", solved)
        n = n + 1

print("\n\n")
print("   Solved: ", solved)
print(len(solved))

_________________
lolgov. 'cause where we're going, you don't have civil liberties.

In Loving Memory
1787 - 2008
Back to top
View user's profile Send private message
Akkara
Administrator
Administrator


Joined: 28 Mar 2006
Posts: 5285
Location: &akkara

PostPosted: Sun Oct 20, 2013 6:01 am    Post subject: Reply with quote

It's a very interesting problem. There's a reason for the conjecture in its name: a lot of very smart people have worked on it and there's still a lot to be discovered about it.

Two thoughts come to mind:

When n is odd, after n = 3 * n + 1, you have 3*odd + 1 == even. So you can roll the next step into it since it's already known what it must be: n = (3 * n + 1) / 2.

Second thought is, what about only logging numbers that "took some time" to check. Having 2 in the solved-list doesn't help much because you'll reach that conclusion the very next step. But having 7 in there could help because its resolution is quite a few steps away.

So an idea like, "only log resolution chains longer than N", for some N to balance the inefficiency of logging and lookup against the cost of cycling the iteration N times. (Of course, that now means keeping a count, which injects its own source of inefficiency...)

Also: if you are checking the integers sequentially, you also know that all integers less than the one you're checking resolve, since if it didn't you would not have advanced yet.
_________________
echo 'long long long x;' | gcc -x c -c -
Back to top
View user's profile Send private message
Clad in Sky
l33t
l33t


Joined: 04 May 2007
Posts: 778
Location: Germany

PostPosted: Sun Oct 20, 2013 8:26 am    Post subject: Reply with quote

I like the Collatz conjecture. I've written some programmes (actually, the same programme in different languages) to calculate the results for a given number.
I was more interested in the maximum value for a given number, though, and didn't try to be efficient. Since I am neither much of a programmer nor a mathematician, I don't have as nice ideas as Akkara.

Keep us informed about your findings.
_________________
Kali Ma
Now it's autumn of the aeons
Dance with your sword
Now it's time for the harvest
Back to top
View user's profile Send private message
wswartzendruber
Veteran
Veteran


Joined: 23 Mar 2004
Posts: 1227
Location: Jefferson, USA

PostPosted: Sun Oct 20, 2013 3:41 pm    Post subject: Reply with quote

I think it takes longer to check the solved list from RAM or even cache than it does to just calculate them.
Back to top
View user's profile Send private message
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 16117
Location: Colorado

PostPosted: Sun Oct 20, 2013 3:48 pm    Post subject: Reply with quote

I'm not trying to do anything meaningful with the conjecture itself, just use it to work with python. Once I've implemented the few ideas I wanted to learn how to do, I'll probably move on (unless I think of something new).


@Akkara: Thanks. Especially for "n = (3 * n + 1) / 2" as I'm likely never to have thought of that. I'll try the others as well; they seem worth trying.
_________________
lolgov. 'cause where we're going, you don't have civil liberties.

In Loving Memory
1787 - 2008
Back to top
View user's profile Send private message
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 16117
Location: Colorado

PostPosted: Sun Oct 20, 2013 4:32 pm    Post subject: Reply with quote

Wow, just making the "n = (3 * n + 1) / 2" change improved it by more than I would have guessed.
Code:
20633

real    0m30.341s
user    0m30.089s
sys     0m0.081s


EDIT:

Adding a check for the linearly solved floor didn't make a difference up to 13000. But limiting the solved list to only those chains longer than 6 or 10 has made a difference. I think this mainly demonstrates that it would be quite a while before any meaningful gains could be achieved by tracking solved numbers -- at least with my lack of exptertise :)
Code:
13000 and len(checked) > 6

   Solved[ 13621 ]

real    0m25.520s
user    0m25.367s
sys     0m0.062s


   Solved[ 0 ]

real    0m2.921s
user    0m2.429s
sys     0m0.101s
Code:
50000 and len(checked) > 10

   Solved[ 41691 ]

real    6m16.084s
user    6m14.850s
sys     0m0.516s


   Solved[ 0 ]

real    0m13.671s
user    0m10.529s
sys     0m0.334s


EDIT2:

Removing linearly solved numbers from the list does help some:
Code:
13000 and len(checked) > 6
Solved[ 5609 ]

real    0m12.983s
user    0m12.858s
sys     0m0.051s
Code:
13000 and len(checked) > 10

   Solved[ 4160 ]

real    0m10.554s
user    0m10.411s
sys     0m0.079s
Code:
50000   >6

   Solved[ 21456 ]

real    3m17.854s
user    3m16.907s
sys     0m0.390s
Code:
50000   >10


   Solved[ 15936 ]

real    2m45.135s
user    2m43.697s
sys     0m0.860s
The difference between tracking solutions longer than 6 vs 10 seems to be noteworthy as well.
_________________
lolgov. 'cause where we're going, you don't have civil liberties.

In Loving Memory
1787 - 2008
Back to top
View user's profile Send private message
eccerr0r
Advocate
Advocate


Joined: 01 Jul 2004
Posts: 4060
Location: USA

PostPosted: Mon Oct 21, 2013 6:22 pm    Post subject: Reply with quote

Warning: I am not a python programmer, just some comments from the general programming point of view.

I do wonder how that "solved" data structure is implemented for python. Is it a hash table or sequentially ("grep") searched array? I would hope it would either be a hash table or some other O(1) mechanism to quickly determine whether it had been visited before.

In computing intensive applications I probably wouldn't use an interpreted language...
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed to be advocating?
Back to top
View user's profile Send private message
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 16117
Location: Colorado

PostPosted: Tue Oct 22, 2013 12:59 am    Post subject: Reply with quote

I don't know that much about Python, so I can't answer that.

As I mentioned, I just used this because it provides data and an opportunity to work with python without having to come up with data to manipulate. I don't know enough basic math principles to even consider attacking the actual conjecture.

As for computing intensive issues, supposedly NumPy is appropriate for many of those (I don't know if this would qualify).
_________________
lolgov. 'cause where we're going, you don't have civil liberties.

In Loving Memory
1787 - 2008
Back to top
View user's profile Send private message
eccerr0r
Advocate
Advocate


Joined: 01 Jul 2004
Posts: 4060
Location: USA

PostPosted: Tue Oct 22, 2013 6:16 am    Post subject: Reply with quote

That would have to be left to a python expert which I am not. Someday I want to learn python but every way I look at it, it looks so disorganized for some reason or another. Thus, not really knowing python I screwed up translating your program in perl, which I am a bit more versed (which is yet another very fscked up language):
Code:
#!/usr/bin/perl
$solved = ();
sub cc
{
        my $n=shift;
        print "\n[". $n. "]: ";
        while ($n > 1) {
                if ($solved{$n}) {
                        print "${n}s ";
                } else {
                       $solved{$n}+=1;
                }
                print "$n ";
                if (($n % 2) == 0) {
                        $n = $n / 2;
                } else {
                        $n = 3 * $n + 1;
                }
        }
}

$n = 1;
while ($n < 13001) {
        if (!$solved{$n}){
                $r = &cc($n);
#print("\n   Solved: ", solved)
        } else {
                print "\n[$n] previously solved - skip";
        }
        $n = $n + 1;
}
print "\n\n";
print "   Solved: ". (keys %solved )."\n";

Now I'm not sure what the python is doing in your original script but this seems to do what the original program was doing.
Instead I know that perl is using a hash array for memoization, so this should be O(1).

With the code as above and redirecting output to /dev/null
real 0m0.350s
user 0m0.350s
sys 0m0.000s

Changing the if($solved{$n}) in the main part of the script to if (1) (i.e., run all regardless)
real 0m0.778s
user 0m0.773s
sys 0m0.003s

Here understanding under the hood of the code that perl uses does result in a significant speedup. Understanding what python does would do similar. And yes my code is not exactly the same thing as what you did with python... I wish I knew of a fast way to do the same thing...
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed to be advocating?
Back to top
View user's profile Send private message
Dr.Willy
Guru
Guru


Joined: 15 Jul 2007
Posts: 352
Location: NRW, Germany

PostPosted: Tue Oct 22, 2013 10:29 am    Post subject: Reply with quote

eccerr0r wrote:
Is it a hash table or sequentially ("grep") searched array? I would hope it would either be a hash table or some other O(1) mechanism to quickly determine whether it had been visited before.

It's a list, so i guess it's sequential search.

Oh and for you pjp: http://docs.python.org/2/library/profile.html
Back to top
View user's profile Send private message
Yamakuzure
Veteran
Veteran


Joined: 21 Jun 2006
Posts: 1447
Location: Bardowick, Germany

PostPosted: Tue Oct 22, 2013 11:54 am    Post subject: Reply with quote

(Edit: Sorry, eccerr0r, I did not see that you already posted a perl conversion. However, mine below has a bit different approach, so I just leave it there.)

Checking an unordered list is the worst thing you can do, because it will always end up being O(n). Even it were sorted it would end up with O(n). The insane thing is that you have to traverse the whole list in the worst case, which is met on *every* number that is not tracked already.

The last point can be remedied with an ordered set. Every number can only be once in the set, and you can determine that a number can not be tracked if it is <head or >tail. However, if it is tail-1, and not tracked, you'd be traversing n-1 times anyway.

So the only options for the tracking of the solved numbers would be using a hash table or an AVL-tree. (Reordering is nothing compared to sequentially going through a list.) Doesn't Python support hashes like Perl does?

Here is a minimal perl program for 500,000 numbers:
Code:
#!/usr/bin/perl -w

use strict;
use warnings;

my %solved = ();

for (my $n = 1; $n < 500001; ++$n) {
        print "[$n]: ";
        defined($solved{$n})
                and print "solved\n"
                and next;

        my $x = $n;
        while ($x > 1) {
                if (defined($solved{$x})) {
                        print "${x}s";
                        last;
                }
                $solved{$x} = 1; ## Ends up being solved anyway
                print "$x ";
                ( $x % 2 )
                        and $x  = (3 * $x) + 1
                        or  $x /= 2;
        } # End while $x > 1
        print "\n";
} # end for

print("\n\n Solved: " . scalar (keys %solved) . "\n");
And here is the Output: (Only first and last ten lines!)
Code:
[1]:
[2]: 2
[3]: 3 10 5 16 8 4 2s
[4]: solved
[5]: solved
[6]: 6 3s
[7]: 7 22 11 34 17 52 26 13 40 20 10s
[8]: solved
[9]: 9 28 14 7s
[10]: solved
(...)
[499991]: solved
[499992]: 499992 249996s
[499993]: 499993 1499980 749990 374995s
[499994]: solved
[499995]: 499995 1499986 749993 2249980 1124990 562495 1687486 843743 2531230 1265615 3796846 1898423 5695270 2847635 8542906 4271453 12814360 6407180 3203590s
[499996]: solved
[499997]: solved
[499998]: 499998 249999s
[499999]: solved
[500000]: solved


 Solved: 1085233

real    0m4.614s
user    0m4.277s
sys     0m0.160s
Times with my approach for the initial 13,000 numbers are:
Code:
real    0m0.123s
user    0m0.096s
sys     0m0.008s

_________________
systemd - The biggest fallacies
Back to top
View user's profile Send private message
Dr.Willy
Guru
Guru


Joined: 15 Jul 2007
Posts: 352
Location: NRW, Germany

PostPosted: Tue Oct 22, 2013 12:51 pm    Post subject: Reply with quote

http://docs.python.org/2/library/stdtypes.html#set
Back to top
View user's profile Send private message
eccerr0r
Advocate
Advocate


Joined: 01 Jul 2004
Posts: 4060
Location: USA

PostPosted: Tue Oct 22, 2013 3:03 pm    Post subject: Reply with quote

Yes, removing that function call also speeds things up a bit (it's completely extraneous as it's used only once), there's a lot of tricks that can be done to speed up software.

Loops in interpreted software really kill. Perl is no better than python in the best case, both suffer the same issues. If someone could get a fast python version up, I'm sure it should be comparable, but both get blown away by a well coded C version.
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed to be advocating?
Back to top
View user's profile Send private message
wswartzendruber
Veteran
Veteran


Joined: 23 Mar 2004
Posts: 1227
Location: Jefferson, USA

PostPosted: Tue Oct 22, 2013 8:03 pm    Post subject: Reply with quote

Don't forget the "well-compiled" part.
Back to top
View user's profile Send private message
Yamakuzure
Veteran
Veteran


Joined: 21 Jun 2006
Posts: 1447
Location: Bardowick, Germany

PostPosted: Mon Oct 28, 2013 1:08 pm    Post subject: Reply with quote

eccerr0r wrote:
Yes, removing that function call also speeds things up a bit (it's completely extraneous as it's used only once), there's a lot of tricks that can be done to speed up software.

Loops in interpreted software really kill. Perl is no better than python in the best case, both suffer the same issues. If someone could get a fast python version up, I'm sure it should be comparable, but both get blown away by a well coded C version.
Both the function call and the "interpreted Language" Versus "Compiled Language" issue are irrelevant. relevant is that traversing an unordered list results in 46 seconds while using a hash table results in 0.123 seconds. If I used an unordered list in perl it would be at least as slow if not slower.

If the algorithm has a fundamental design problem, no optimization technique or compiler option will help you out.

My only advice is to get a copy of Mastering Algorithms with C (Paperback or ebook). It's worth the read, for every programmer, not only C programmers. (Most people groan about the overcommented writing style in the book. But it's great for understanding how which storing techniques works and where to use what. It's no copy&paste cookbook.)
_________________
systemd - The biggest fallacies
Back to top
View user's profile Send private message
eccerr0r
Advocate
Advocate


Joined: 01 Jul 2004
Posts: 4060
Location: USA

PostPosted: Mon Oct 28, 2013 2:53 pm    Post subject: Reply with quote

Yamakuzure wrote:
Both the function call and the "interpreted Language" Versus "Compiled Language" issue are irrelevant. relevant is that traversing an unordered list results in 46 seconds while using a hash table results in 0.123 seconds. If I used an unordered list in perl it would be at least as slow if not slower.

This is wrong in this case, I was comparing two perl programs, both with hash memoization. In this case an extra function call DOES make a difference. And it is still true that interpreted languages will take longer to process the function calls with how they deal with argument passing. I'm not sure how anyone can claim these facts to be irrelevant, they are key details to speeding up programs.

Bah. Proofreading.
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed to be advocating?


Last edited by eccerr0r on Mon Oct 28, 2013 4:00 pm; edited 1 time in total
Back to top
View user's profile Send private message
TheLexx
Tux's lil' helper
Tux's lil' helper


Joined: 04 Dec 2005
Posts: 86
Location: Austin Tx

PostPosted: Mon Oct 28, 2013 3:15 pm    Post subject: Reply with quote

One thing you have to keep in mind it that python itself is a very high level language, it has many efficacy issues within itself. Some data types themselves have efficiency issues.

Case in point you have the following in your code "if n in solved" where solved is of "type list". This statement is fine as long as the number of elements is solved is small, but for every new element in solved the python interpreter would have to preform another loop in it's own internal workings.

A better way to do this is to make solved of "type set". Data "type set" has it's own internal hash and the interpreter can search it very fast. The interpreter does have some internal overhead in setting up and maintaining "type set". Because of this small list are more efficient as list, but larger list are more efficient as set.

Dr.Willy wrote:
http://docs.python.org/2/library/stdtypes.html#set
Back to top
View user's profile Send private message
TheLexx
Tux's lil' helper
Tux's lil' helper


Joined: 04 Dec 2005
Posts: 86
Location: Austin Tx

PostPosted: Mon Oct 28, 2013 3:28 pm    Post subject: Reply with quote

I also noticed that it appears you are using python3. In python3 the operator "/" always returns a floating point number. In your program you should use the operator "//" because it always returns an integer which is much faster to work with floating points. You lucky your dividing by 2 if you were dividing by something that was not a power of 2, subtle errors would creep into your floating point number.

Python2 did things differently for "/", but if you use "//" it will work the same in both python2 and Python3.

"n = n / 2" becomes "n = n // 2"
Back to top
View user's profile Send private message
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 16117
Location: Colorado

PostPosted: Mon Oct 28, 2013 3:58 pm    Post subject: Reply with quote

Thanks, hadn't seen that mentioned. Seems odd, but then many things about python seem odd :)
_________________
lolgov. 'cause where we're going, you don't have civil liberties.

In Loving Memory
1787 - 2008
Back to top
View user's profile Send private message
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 16117
Location: Colorado

PostPosted: Mon Oct 28, 2013 3:59 pm    Post subject: Reply with quote

Yamakuzure wrote:
My only advice is to get a copy of Mastering Algorithms with C (Paperback or ebook). It's worth the read, for every programmer, not only C programmers. (Most people groan about the overcommented writing style in the book. But it's great for understanding how which storing techniques works and where to use what. It's no copy&paste cookbook.)
Thanks, I'll put it on my list.
_________________
lolgov. 'cause where we're going, you don't have civil liberties.

In Loving Memory
1787 - 2008
Back to top
View user's profile Send private message
Display posts from previous:   
Reply to topic    Gentoo Forums Forum Index Off the Wall All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum