in All availableThis forumThis topic
Author Message
pjp

Joined: 16 Apr 2002
Posts: 16117

Posted: Sun Oct 20, 2013 5:23 am    Post subject: Coding inefficiencies instead of efficiencies

So there's no point to this except something to work with. I started with the Collatz Conjecture, just to have a bunch of data to generate.

My idea was to check each number, and if successful (ending with calculating to 1), add the numbers to a list of "solved" integers. So, for example, here is 1:
 Code: 2 (/2) 1
So any time we encounter 2, we know it results in 1, so it doesn't need to be tested again. Next is 3:
 Code: 3 (*3+1) 10 (/2) 5 (*3+1) 16 (/2) 8 (/2) 4 (/2) 2 (already solved, stop calculating)
So now 3, 10, 5, 16, 8 and 4 are "solved" in addition to 2. Any time those numbers are encountered, stop calculating and move to the next number.

Here's some output, number in brackets is the one being tested. Number ending with 's' indicates it has already been solved (so stop calculating):
 Code: [1]: [2]: 2 [3]: 3 10 5.0 16.0 8.0 4.0 2.0s [4]: 4s [5]: 5s [6]: 6 3.0s [7]: 7 22 11.0 34.0 17.0 52.0 26.0 13.0 40.0 20.0 10.0s [8]: 8s [9]: 9 28 14.0 7.0s [10]: 10s
And without checking to see if we've already solved for a number:
 Code: [1]: [2]: 2 [3]: 3 10 5.0 16.0 8.0 4.0 2.0 [4]: 4 2.0 [5]: 5 16 8.0 4.0 2.0 [6]: 6 3.0 10.0 5.0 16.0 8.0 4.0 2.0 [7]: 7 22 11.0 34.0 17.0 52.0 26.0 13.0 40.0 20.0 10.0 5.0 16.0 8.0 4.0 2.0 [8]: 8 4.0 2.0 [9]: 9 28 14.0 7.0 22.0 11.0 34.0 17.0 52.0 26.0 13.0 40.0 20.0 10.0 5.0 16.0 8.0 4.0 2.0 [10]: 10 5.0 16.0 8.0 4.0 2.0

So now the part about inefficiencies. I was originally thinking that NOT calculating numbers already tested would improve performance. What I didn't think about was the performance hit to check if a number had been tested. The "solved" list grows rapidly.

So calculating to 13000 (arbitrary, it takes a while), here are the times with checking (first) and without checking if a number was already solved. The numbers before the times (28,168 and 1,136,294) are the size of the solved list -- len(solved). The second includes duplicates.
 Code: 28168 real    0m46.635s user    0m46.469s sys     0m0.073s 1136294 real    0m4.603s user    0m3.677s sys     0m0.117s
I was surprised how much longer it took to check if a number had already been solved. Not printing output didn't save much (I think 39 seconds instead of 46).

I had originally thought of periodically cleaning up the list of solved numbers so a known "floor" of all numbers below it having been solved might speed up the checking. So once numbers 1 - 10 had all been solved, a check for <10 rather than a lookup to solved (if n in solved). Now I'm thinking this won't make much difference.

Is the check likely to ever help with performance with larger numbers?

Since it is an exercise, I'll probably still try the cleanup of the solved list. I also intended (and still plan) to work on saving completed data so it wouldn't have to start from the beginning each time.

Here's the code if anyone is curious. It isn't much:
 Code: #!/usr/bin/python solved = [ ] def cc(n):         checked = []         print("\n[", n, "]: ", end='', sep='')         while n > 1:                 if n in solved:                         print(n, "s ", end='', sep='')                         break                 else:                         checked.append(n)                 print(n, end=' ')                 if n % 2 == 0:                         n = n / 2                 else:                         n = 3 * n + 1         solved.extend(checked)         return n = 1 while n < 13001:         if n not in solved:                 r = cc(n)                 #print("\n   Solved: ", solved)         n = n + 1 print("\n\n") print("   Solved: ", solved) print(len(solved))

_________________
lolgov. 'cause where we're going, you don't have civil liberties.

In Loving Memory
1787 - 2008
Akkara

Joined: 28 Mar 2006
Posts: 5319
Location: &akkara

 Posted: Sun Oct 20, 2013 6:01 am    Post subject: It's a very interesting problem. There's a reason for the conjecture in its name: a lot of very smart people have worked on it and there's still a lot to be discovered about it. Two thoughts come to mind: When n is odd, after n = 3 * n + 1, you have 3*odd + 1 == even. So you can roll the next step into it since it's already known what it must be: n = (3 * n + 1) / 2. Second thought is, what about only logging numbers that "took some time" to check. Having 2 in the solved-list doesn't help much because you'll reach that conclusion the very next step. But having 7 in there could help because its resolution is quite a few steps away. So an idea like, "only log resolution chains longer than N", for some N to balance the inefficiency of logging and lookup against the cost of cycling the iteration N times. (Of course, that now means keeping a count, which injects its own source of inefficiency...) Also: if you are checking the integers sequentially, you also know that all integers less than the one you're checking resolve, since if it didn't you would not have advanced yet._________________echo 'long long long x;' | gcc -x c -c -
l33t

Joined: 04 May 2007
Posts: 778
Location: Germany

 Posted: Sun Oct 20, 2013 8:26 am    Post subject: I like the Collatz conjecture. I've written some programmes (actually, the same programme in different languages) to calculate the results for a given number. I was more interested in the maximum value for a given number, though, and didn't try to be efficient. Since I am neither much of a programmer nor a mathematician, I don't have as nice ideas as Akkara. Keep us informed about your findings._________________Kali Ma Now it's autumn of the aeons Dance with your sword Now it's time for the harvest
wswartzendruber
Veteran

Joined: 23 Mar 2004
Posts: 1227
Location: Jefferson, USA

 Posted: Sun Oct 20, 2013 3:41 pm    Post subject: I think it takes longer to check the solved list from RAM or even cache than it does to just calculate them.
pjp

Joined: 16 Apr 2002
Posts: 16117

 Posted: Sun Oct 20, 2013 3:48 pm    Post subject: I'm not trying to do anything meaningful with the conjecture itself, just use it to work with python. Once I've implemented the few ideas I wanted to learn how to do, I'll probably move on (unless I think of something new). @Akkara: Thanks. Especially for "n = (3 * n + 1) / 2" as I'm likely never to have thought of that. I'll try the others as well; they seem worth trying._________________lolgov. 'cause where we're going, you don't have civil liberties. In Loving Memory 1787 - 2008
pjp

Joined: 16 Apr 2002
Posts: 16117

Posted: Sun Oct 20, 2013 4:32 pm    Post subject:

Wow, just making the "n = (3 * n + 1) / 2" change improved it by more than I would have guessed.
 Code: 20633 real    0m30.341s user    0m30.089s sys     0m0.081s

EDIT:

Adding a check for the linearly solved floor didn't make a difference up to 13000. But limiting the solved list to only those chains longer than 6 or 10 has made a difference. I think this mainly demonstrates that it would be quite a while before any meaningful gains could be achieved by tracking solved numbers -- at least with my lack of exptertise :)
 Code: 13000 and len(checked) > 6    Solved[ 13621 ] real    0m25.520s user    0m25.367s sys     0m0.062s    Solved[ 0 ] real    0m2.921s user    0m2.429s sys     0m0.101s
 Code: 50000 and len(checked) > 10    Solved[ 41691 ] real    6m16.084s user    6m14.850s sys     0m0.516s    Solved[ 0 ] real    0m13.671s user    0m10.529s sys     0m0.334s

EDIT2:

Removing linearly solved numbers from the list does help some:
 Code: 13000 and len(checked) > 6 Solved[ 5609 ] real    0m12.983s user    0m12.858s sys     0m0.051s
 Code: 13000 and len(checked) > 10    Solved[ 4160 ] real    0m10.554s user    0m10.411s sys     0m0.079s
 Code: 50000   >6    Solved[ 21456 ] real    3m17.854s user    3m16.907s sys     0m0.390s
 Code: 50000   >10    Solved[ 15936 ] real    2m45.135s user    2m43.697s sys     0m0.860s
The difference between tracking solutions longer than 6 vs 10 seems to be noteworthy as well.
_________________
lolgov. 'cause where we're going, you don't have civil liberties.

In Loving Memory
1787 - 2008
eccerr0r

Joined: 01 Jul 2004
Posts: 4094
Location: USA

 Posted: Mon Oct 21, 2013 6:22 pm    Post subject: Warning: I am not a python programmer, just some comments from the general programming point of view. I do wonder how that "solved" data structure is implemented for python. Is it a hash table or sequentially ("grep") searched array? I would hope it would either be a hash table or some other O(1) mechanism to quickly determine whether it had been visited before. In computing intensive applications I probably wouldn't use an interpreted language..._________________Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD What am I supposed to be advocating?
pjp

Joined: 16 Apr 2002
Posts: 16117

 Posted: Tue Oct 22, 2013 12:59 am    Post subject: I don't know that much about Python, so I can't answer that. As I mentioned, I just used this because it provides data and an opportunity to work with python without having to come up with data to manipulate. I don't know enough basic math principles to even consider attacking the actual conjecture. As for computing intensive issues, supposedly NumPy is appropriate for many of those (I don't know if this would qualify)._________________lolgov. 'cause where we're going, you don't have civil liberties. In Loving Memory 1787 - 2008
eccerr0r

Joined: 01 Jul 2004
Posts: 4094
Location: USA

Posted: Tue Oct 22, 2013 6:16 am    Post subject:

That would have to be left to a python expert which I am not. Someday I want to learn python but every way I look at it, it looks so disorganized for some reason or another. Thus, not really knowing python I screwed up translating your program in perl, which I am a bit more versed (which is yet another very fscked up language):
 Code: #!/usr/bin/perl \$solved = (); sub cc {         my \$n=shift;         print "\n[". \$n. "]: ";         while (\$n > 1) {                 if (\$solved{\$n}) {                         print "\${n}s ";                 } else {                        \$solved{\$n}+=1;                 }                 print "\$n ";                 if ((\$n % 2) == 0) {                         \$n = \$n / 2;                 } else {                         \$n = 3 * \$n + 1;                 }         } } \$n = 1; while (\$n < 13001) {         if (!\$solved{\$n}){                 \$r = &cc(\$n); #print("\n   Solved: ", solved)         } else {                 print "\n[\$n] previously solved - skip";         }         \$n = \$n + 1; } print "\n\n"; print "   Solved: ". (keys %solved )."\n";

Now I'm not sure what the python is doing in your original script but this seems to do what the original program was doing.
Instead I know that perl is using a hash array for memoization, so this should be O(1).

With the code as above and redirecting output to /dev/null
real 0m0.350s
user 0m0.350s
sys 0m0.000s

Changing the if(\$solved{\$n}) in the main part of the script to if (1) (i.e., run all regardless)
real 0m0.778s
user 0m0.773s
sys 0m0.003s

Here understanding under the hood of the code that perl uses does result in a significant speedup. Understanding what python does would do similar. And yes my code is not exactly the same thing as what you did with python... I wish I knew of a fast way to do the same thing...
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed to be advocating?
Dr.Willy
Guru

Joined: 15 Jul 2007
Posts: 353
Location: NRW, Germany

Posted: Tue Oct 22, 2013 10:29 am    Post subject:

 eccerr0r wrote: Is it a hash table or sequentially ("grep") searched array? I would hope it would either be a hash table or some other O(1) mechanism to quickly determine whether it had been visited before.

It's a list, so i guess it's sequential search.

Oh and for you pjp: http://docs.python.org/2/library/profile.html
Yamakuzure
Veteran

Joined: 21 Jun 2006
Posts: 1467
Location: Bardowick, Germany

Posted: Tue Oct 22, 2013 11:54 am    Post subject:

(Edit: Sorry, eccerr0r, I did not see that you already posted a perl conversion. However, mine below has a bit different approach, so I just leave it there.)

Checking an unordered list is the worst thing you can do, because it will always end up being O(n). Even it were sorted it would end up with O(n). The insane thing is that you have to traverse the whole list in the worst case, which is met on *every* number that is not tracked already.

The last point can be remedied with an ordered set. Every number can only be once in the set, and you can determine that a number can not be tracked if it is <head or >tail. However, if it is tail-1, and not tracked, you'd be traversing n-1 times anyway.

So the only options for the tracking of the solved numbers would be using a hash table or an AVL-tree. (Reordering is nothing compared to sequentially going through a list.) Doesn't Python support hashes like Perl does?

Here is a minimal perl program for 500,000 numbers:
 Code: #!/usr/bin/perl -w use strict; use warnings; my %solved = (); for (my \$n = 1; \$n < 500001; ++\$n) {         print "[\$n]: ";         defined(\$solved{\$n})                 and print "solved\n"                 and next;         my \$x = \$n;         while (\$x > 1) {                 if (defined(\$solved{\$x})) {                         print "\${x}s";                         last;                 }                 \$solved{\$x} = 1; ## Ends up being solved anyway                 print "\$x ";                 ( \$x % 2 )                         and \$x  = (3 * \$x) + 1                         or  \$x /= 2;         } # End while \$x > 1         print "\n"; } # end for print("\n\n Solved: " . scalar (keys %solved) . "\n");
And here is the Output: (Only first and last ten lines!)
 Code: [1]: [2]: 2 [3]: 3 10 5 16 8 4 2s [4]: solved [5]: solved [6]: 6 3s [7]: 7 22 11 34 17 52 26 13 40 20 10s [8]: solved [9]: 9 28 14 7s [10]: solved (...) [499991]: solved [499992]: 499992 249996s [499993]: 499993 1499980 749990 374995s [499994]: solved [499995]: 499995 1499986 749993 2249980 1124990 562495 1687486 843743 2531230 1265615 3796846 1898423 5695270 2847635 8542906 4271453 12814360 6407180 3203590s [499996]: solved [499997]: solved [499998]: 499998 249999s [499999]: solved [500000]: solved  Solved: 1085233 real    0m4.614s user    0m4.277s sys     0m0.160s
Times with my approach for the initial 13,000 numbers are:
 Code: real    0m0.123s user    0m0.096s sys     0m0.008s

_________________
systemd - The biggest fallacies
Dr.Willy
Guru

Joined: 15 Jul 2007
Posts: 353
Location: NRW, Germany

 Posted: Tue Oct 22, 2013 12:51 pm    Post subject: http://docs.python.org/2/library/stdtypes.html#set
eccerr0r

Joined: 01 Jul 2004
Posts: 4094
Location: USA

 Posted: Tue Oct 22, 2013 3:03 pm    Post subject: Yes, removing that function call also speeds things up a bit (it's completely extraneous as it's used only once), there's a lot of tricks that can be done to speed up software. Loops in interpreted software really kill. Perl is no better than python in the best case, both suffer the same issues. If someone could get a fast python version up, I'm sure it should be comparable, but both get blown away by a well coded C version._________________Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD What am I supposed to be advocating?
wswartzendruber
Veteran

Joined: 23 Mar 2004
Posts: 1227
Location: Jefferson, USA

 Posted: Tue Oct 22, 2013 8:03 pm    Post subject: Don't forget the "well-compiled" part.
Yamakuzure
Veteran

Joined: 21 Jun 2006
Posts: 1467
Location: Bardowick, Germany

Posted: Mon Oct 28, 2013 1:08 pm    Post subject:

 eccerr0r wrote: Yes, removing that function call also speeds things up a bit (it's completely extraneous as it's used only once), there's a lot of tricks that can be done to speed up software. Loops in interpreted software really kill. Perl is no better than python in the best case, both suffer the same issues. If someone could get a fast python version up, I'm sure it should be comparable, but both get blown away by a well coded C version.
Both the function call and the "interpreted Language" Versus "Compiled Language" issue are irrelevant. relevant is that traversing an unordered list results in 46 seconds while using a hash table results in 0.123 seconds. If I used an unordered list in perl it would be at least as slow if not slower.

If the algorithm has a fundamental design problem, no optimization technique or compiler option will help you out.

My only advice is to get a copy of Mastering Algorithms with C (Paperback or ebook). It's worth the read, for every programmer, not only C programmers. (Most people groan about the overcommented writing style in the book. But it's great for understanding how which storing techniques works and where to use what. It's no copy&paste cookbook.)
_________________
systemd - The biggest fallacies
eccerr0r

Joined: 01 Jul 2004
Posts: 4094
Location: USA

Posted: Mon Oct 28, 2013 2:53 pm    Post subject:

 Yamakuzure wrote: Both the function call and the "interpreted Language" Versus "Compiled Language" issue are irrelevant. relevant is that traversing an unordered list results in 46 seconds while using a hash table results in 0.123 seconds. If I used an unordered list in perl it would be at least as slow if not slower.

This is wrong in this case, I was comparing two perl programs, both with hash memoization. In this case an extra function call DOES make a difference. And it is still true that interpreted languages will take longer to process the function calls with how they deal with argument passing. I'm not sure how anyone can claim these facts to be irrelevant, they are key details to speeding up programs.

_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed to be advocating?

Last edited by eccerr0r on Mon Oct 28, 2013 4:00 pm; edited 1 time in total
TheLexx
Tux's lil' helper

Joined: 04 Dec 2005
Posts: 86
Location: Austin Tx

Posted: Mon Oct 28, 2013 3:15 pm    Post subject:

One thing you have to keep in mind it that python itself is a very high level language, it has many efficacy issues within itself. Some data types themselves have efficiency issues.

Case in point you have the following in your code "if n in solved" where solved is of "type list". This statement is fine as long as the number of elements is solved is small, but for every new element in solved the python interpreter would have to preform another loop in it's own internal workings.

A better way to do this is to make solved of "type set". Data "type set" has it's own internal hash and the interpreter can search it very fast. The interpreter does have some internal overhead in setting up and maintaining "type set". Because of this small list are more efficient as list, but larger list are more efficient as set.

 Dr.Willy wrote: http://docs.python.org/2/library/stdtypes.html#set
TheLexx
Tux's lil' helper

Joined: 04 Dec 2005
Posts: 86
Location: Austin Tx

 Posted: Mon Oct 28, 2013 3:28 pm    Post subject: I also noticed that it appears you are using python3. In python3 the operator "/" always returns a floating point number. In your program you should use the operator "//" because it always returns an integer which is much faster to work with floating points. You lucky your dividing by 2 if you were dividing by something that was not a power of 2, subtle errors would creep into your floating point number. Python2 did things differently for "/", but if you use "//" it will work the same in both python2 and Python3. "n = n / 2" becomes "n = n // 2"
pjp

Joined: 16 Apr 2002
Posts: 16117

 Posted: Mon Oct 28, 2013 3:58 pm    Post subject: Thanks, hadn't seen that mentioned. Seems odd, but then many things about python seem odd :)_________________lolgov. 'cause where we're going, you don't have civil liberties. In Loving Memory 1787 - 2008
pjp

Joined: 16 Apr 2002
Posts: 16117

Posted: Mon Oct 28, 2013 3:59 pm    Post subject:

 Yamakuzure wrote: My only advice is to get a copy of Mastering Algorithms with C (Paperback or ebook). It's worth the read, for every programmer, not only C programmers. (Most people groan about the overcommented writing style in the book. But it's great for understanding how which storing techniques works and where to use what. It's no copy&paste cookbook.)
Thanks, I'll put it on my list.
_________________
lolgov. 'cause where we're going, you don't have civil liberties.

In Loving Memory
1787 - 2008
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First
 All times are GMT Page 1 of 1

 Jump to: Select a forum Assistance----------------News & AnnouncementsFrequently Asked QuestionsInstalling GentooMultimediaDesktop EnvironmentsNetworking & SecurityKernel & HardwarePortage & ProgrammingGamers & PlayersOther Things GentooUnsupported Software Discussion & Documentation----------------Documentation, Tips & TricksGentoo ChatGentoo Forums FeedbackOff the WallDuplicate Threads International Gentoo Users----------------中文 (Chinese)DutchFinnishFrenchDeutsches Forum (German)  Diskussionsforum  Deutsche DokumentationGreekForum italiano (Italian)  Forum di discussione italiano  Risorse italiane (documentazione e tools)Polskie forum (Polish)  Instalacja i sprzęt  Polish OTWPortuguese  Documentação, Ferramentas e DicasRussianScandinavianSpanishOther Languages Architectures & Platforms----------------Gentoo on AMD64Gentoo on PPCGentoo on SparcGentoo on Alternative ArchitecturesGentoo for Mac OS X (Portage for Mac OS X)
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum