Re: apgsearch v4.0
Posted: February 3rd, 2019, 4:44 pm
I don't think my compiler optimizations should affect certain rules....
calcyman wrote: Which rule are you using, and on what CPU architecture?
It was actually Conway's Game of Life both times, with D8_1 and D8_4 symmetries. I should've clarified that, sorry. The processor I'm using is an AMD FX-6300.
Ian07 wrote: It was actually Conway's Game of Life both times, with D8_1 and D8_4 symmetries. I should've clarified that, sorry. The processor I'm using is an AMD FX-6300.
Thanks!
Apple Bottom wrote: I'm also seeing a new blip on the proverbial radar with 4.87-ll2.1.12: soups in B2k37/S1e2an3-k6n78 die out completely without leaving any debris behind. (I did run 'make clean' this time; didn't fiddle with the compiler flags yet, but since it worked fine in 4.86-ll2.1.11 I reckon this may be due to the latest lifelib changes instead.)
(I guess that explains that 80% increase in search speed...)
Thanks! Fixed in apgluxe v4.88-ll2.1.13 (commit d2a3344e).
Code: Select all
g++ -c -Wall -Wextra -pedantic -O3 -flto -funsafe-loop-optimizations -Wunsafe-loop-optimizations -frename-registers -march=native --std=c++11 main.cpp -o main.o
clang: warning: optimization flag '-funsafe-loop-optimizations' is not supported [-Wignored-optimization-argument]
clang: warning: optimization flag '-frename-registers' is not supported [-Wignored-optimization-argument]
warning: unknown warning option '-Wunsafe-loop-optimizations'; did you mean
'-Wunavailable-declarations'? [-Wunknown-warning-option]
1 warning generated.
g++ -flto -pthread main.o includes/md5.o includes/happyhttp.o -o apgluxe
clang: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]
rm -f main.op includes/md5.op includes/happyhttp.op apgluxe-profile *.gcda */*.gcda
true
true oo o
true oo ooo
true o
true oo ooo
true o o
true o o
true o
apgluxe v4.88-ll2.1.13: Rule b3s23 is correctly configured.
apgluxe v4.88-ll2.1.13: Symmetry C1 is correctly configured.
Greetings, this is apgluxe v4.88-ll2.1.13, configured for b3s23/C1.
calcyman wrote: I've released v4.9-ll2.1.14, which is 1.5x faster than v4.88-ll2.1.13.
The difference is due to escaping glider detection: if a rule is outer-totalistic and supports gliders, then any sufficiently isolated escaping gliders on the edge of the pattern's bounding diamond will be removed since they cannot subsequently interact with the rest of the pattern.
Yikes. I'm seeing just over a 50% speed improvement, as promised.
Code: Select all
Soup l_WiSegGLcr9c7583691 lasts an estimated 28270 generations; rerunning...
Soup l_WiSegGLcr9c7583691 actually lasts 36 generations.
dvgrn wrote: The only thing I'm a little worried by is the occasional wildly inaccurate estimate for methuselah longevity. I don't remember seeing anything with quite this large a mismatch in previous builds:
Code: Select all
Soup l_WiSegGLcr9c7583691 lasts an estimated 28270 generations; rerunning...
Soup l_WiSegGLcr9c7583691 actually lasts 36 generations.
As I suspected, it's because it contains a period-8 oscillator: https://catagolue.appspot.com/hashsoup/ ... 3691/b3s23
Code: Select all
b3s23/C1: 1000000 soups completed (8977.555 soups/second current, 9017.179 overall).
----------------------------------------------------------------------
1000000 soups completed.
Attempting to contact payosha256.
testing mode
testing
Connection was successful; starting new search...
----------------------------------------------------------------------
New seed: l_WMKAxyqQG9Xh; iterations = 1; quitByUser = 0
Terminating...
Performance counter stats for './apgluxe -n 1000000 -t 1 -v 0 -s test':
110900.704765 task-clock (msec) # 0.999 CPUs utilized
138 context-switches # 0.001 K/sec
0 cpu-migrations # 0.000 K/sec
10,521 page-faults # 0.095 K/sec
384,053,084,119 cycles # 3.463 GHz
730,759,750,253 instructions # 1.90 insn per cycle
85,375,418,692 branches # 769.837 M/sec
4,444,271,844 branch-misses # 5.21% of all branches
110.994994644 seconds time elapsed
Code: Select all
b3s23/C1: 1000000 soups completed (9416.388 soups/second current, 9428.049 overall).
----------------------------------------------------------------------
1000000 soups completed.
Attempting to contact payosha256.
testing mode
testing
Connection was successful; starting new search...
----------------------------------------------------------------------
New seed: l_7TDp2JjnLps4; iterations = 1; quitByUser = 0
Terminating...
Performance counter stats for './apgluxe -n 1000000 -t 1 -v 0 -s test':
106068.101402 task-clock (msec) # 0.999 CPUs utilized
131 context-switches # 0.001 K/sec
0 cpu-migrations # 0.000 K/sec
10,517 page-faults # 0.099 K/sec
367,314,725,588 cycles # 3.463 GHz
663,194,699,312 instructions # 1.81 insn per cycle
67,919,373,179 branches # 640.337 M/sec
4,273,100,815 branch-misses # 6.29% of all branches
106.166132540 seconds time elapsed
testitemqlstudop wrote: HOLY COW
Great job, I need to update when I get home!
Are you sure profiling doesn't do anything on AVX2?
EDIT: There seems to be more branch misses with profiling, however. I'm not sure this is the expected behaviour of profiling - it should decrease branch misses.
The *number* of branch misses has decreased with profiling; it's the proportion (out of total branches) that increased slightly.
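The difference between the count and the proportion can be checked from the two perf stat outputs above. This is a minimal sketch: the dictionary keys and the `miss_rate` helper are illustrative names, and it assumes the first output is the non-profiled build and the second the profiled one (the posts do not label them explicitly; this is inferred from the reply).

```python
# Branch counters copied from the two perf stat outputs above.
# Assumption: the first run is the non-profiled build, the second the profiled one.
runs = {
    "first run": {"branches": 85_375_418_692, "branch_misses": 4_444_271_844},
    "second run": {"branches": 67_919_373_179, "branch_misses": 4_273_100_815},
}

def miss_rate(run):
    """Fraction of all branches that were mispredicted."""
    return run["branch_misses"] / run["branches"]

for name, run in runs.items():
    print(f"{name}: {run['branch_misses']:,} misses, "
          f"{miss_rate(run):.2%} of all branches")
```

The second run executes far fewer branches in total (about 67.9 billion against 85.4 billion), so a slightly smaller number of misses makes up a larger share of them.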
Code: Select all
Using seed l_rkjhCzsjjPnE
Running 10000000 soups per haul:
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
b3s23/C1: 75592 soups completed (7559.052 soups/second current, 7559.052 overall).
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
b3s23/C1: 150994 soups completed (7540.198 soups/second current, 7549.616 overall).
testitemqlstudop wrote: By the way, when I was playing around a little with optimization flags (yet again), -mfpmath=both gives the (old) apgluxe a +100 soups/second, at the cost of (I heard from the gcc manuals) an iota more memory.
Useful?
EDIT: I just realized the credits section isn't in the README, it's on catagolue?
Code: Select all
Soup l_D2nHNvsxKXRa2318642 lasts an estimated 24730 generations; rerunning...
Soup l_D2nHNvsxKXRa2318642 actually lasts 5587 generations.
Code: Select all
Soup l_D2nHNvsxKXRa5405924 lasts an estimated 24730 generations; rerunning...
Soup l_D2nHNvsxKXRa5405924 actually lasts 5200 generations.
testitemqlstudop wrote: By the way, when I was playing around a little with optimization flags (yet again), -mfpmath=both gives the (old) apgluxe a +100 soups/second, at the cost of (I heard from the gcc manuals) an iota more memory.
Useful?
As I've said, apgluxe hardly uses any floating-point arithmetic. But you do raise a good point, which is that (even in AVX) there's a separate single-precision ALU, double-precision ALU, and integer ALU, and there are instructions duplicated between them. For instance, for bitwise AND, you can use either vpand, vandps, or vandpd. Apparently there might be benefit from mixing and matching them.
77topaz wrote: On an AVX1 machine, I can also report a major improvement going from v4.88 to v4.91: from ~2700 soups/sec/core to ~4000 soups/sec/core for b3s23/C1. Great!
Excellent to hear that!
77topaz wrote: Most of the v4.88 warnings have also disappeared, with only the "pthread" one remaining.
Yes, I moved some of the more hardcore optimisation flags into the optional set that's only activated by './recompile.sh --profile'. I guess that -pthread warning is harmless; clang must already include the pthreads library by default as otherwise there would be a linker error.
77topaz wrote: Am I understanding it correctly that this speed improvement would apply to any rule with xq4_153, but not extend to other spaceships?
Correct. It only applies to 'Glider 115', i.e. these 256 rules: http://fano.ics.uci.edu/ca/rules/b3s23/g2.html
Code: Select all
if (re.match('b36?7?8?s0?235?6?7?8?$', rulestring)):
    g.write('#define GLIDERS_EXIST 1\n')
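As a sanity check on the "256 rules" figure: the pattern has eight optional digits (6, 7, 8 in the birth part; 0, 5, 6, 7, 8 in the survival part), which gives 2^8 = 256 rulestrings. The sketch below uses the same regular expression; the helper names `gliders_exist` and `all_matching_rules` are illustrative and not part of apgsearch.

```python
import re
from itertools import product

# Same pattern as in the snippet above.
GLIDER_RULE = re.compile(r'b36?7?8?s0?235?6?7?8?$')

def gliders_exist(rulestring):
    """Hypothetical helper: True if the rulestring matches the pattern."""
    return GLIDER_RULE.match(rulestring) is not None

def all_matching_rules():
    """Build every rulestring by including or omitting each optional digit."""
    rules = set()
    for b6, b7, b8, s0, s5, s6, s7, s8 in product(
            ('', '6'), ('', '7'), ('', '8'),
            ('', '0'), ('', '5'), ('', '6'), ('', '7'), ('', '8')):
        rules.add('b3' + b6 + b7 + b8 + 's' + s0 + '23' + s5 + s6 + s7 + s8)
    return rules

rules = all_matching_rules()
print(len(rules))                              # number of distinct rulestrings
print(all(gliders_exist(r) for r in rules))    # every one matches the pattern
for r in ('b3s23', 'b38s23', 'b36s23', 'b35s2467'):
    print(r, gliders_exist(r))
```

Note that b35s2467 does not match, since B5 is not one of the optional birth conditions in the pattern.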
77topaz wrote: EDIT 2: Oh right, it said "outer-totalistic" also. So it would apply only to outer-totalistic rules with xq4_153? That still doesn't explain why the non-totalistic rule got slower, though.
Programs can get faster/slower on certain platforms for very unpredictable reasons due to minor changes: sometimes inserting NOP instructions (which do nothing) can actually increase speed, bizarrely. Also, certain CPUs throttle programs if they're too fast (to avoid overheating the CPU). So in general I make changes based on evidence from multiple architectures, rather than just one (and I can't reproduce the effect on my AVX2 machine).
77topaz wrote: Hmm... despite not featuring CGoL's glider, the outer-totalistic rule B35/S2467 also seems to have gained a significant speed increase between v4.86 and v4.91. I don't have the v4.86 search speed on hand anymore, but hauls of 5m soups took between four and four and a half minutes (and in v4.72 around five), but in v4.91 they take just two and a half minutes (with search speed ~34k soups/sec).
That could be any combination of the 'faster population determination in upattern' (which basically saves memory accesses), the early exiting when the number of tiles to update reaches zero, the inlined access of indirected_map, and the compiler optimisations made by testitemqlstudop.
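The haul times in the quote can be converted into search speeds to see how large the gain is. This is a rough sketch: the function name is illustrative and the times are the approximate figures from the quote, so the results are only indicative.

```python
def soups_per_second(soups, minutes):
    """Average search speed for a haul of `soups` soups taking `minutes` minutes."""
    return soups / (minutes * 60)

HAUL = 5_000_000  # "hauls of 5m soups"

old_fast = soups_per_second(HAUL, 4.0)   # v4.86, quicker end of the quoted range
old_slow = soups_per_second(HAUL, 4.5)   # v4.86, slower end of the quoted range
new = soups_per_second(HAUL, 2.5)        # v4.91

print(f"v4.86: {old_slow:.0f} to {old_fast:.0f} soups/sec")
print(f"v4.91: {new:.0f} soups/sec")
print(f"speedup: {new / old_fast:.2f}x to {new / old_slow:.2f}x")
```

Two and a half minutes for 5 million soups works out to roughly 33k soups/sec, in line with the ~34k quoted, and the speedup over v4.86 comes to about 1.6x to 1.8x.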
Code: Select all
./recompile.sh --mingw --rule b38s23 --symmetry C1
Code: Select all
Greetings, this is [1;33mapgluxe v4.95-ll2.1.15[0m, configured for [1;34mb3s23/C1[0m.
[32;1mLifelib version:[0m ll2.1.15
[32;1mCompiler version:[0m 6.3.0 20170516
[32;1mPython version:[0m '2.7.13 (default, Sep 26 2018, 18:42:22) [GCC 6.3.0 20170516]'
Peer-reviewing hauls:
No more hauls to verify.
Peer-review complete; proceeding search.
Using seed l_3De9txKtUB9r
Instruction set [1mAVX1[0m detected
0 soups processed...
100000 soups processed...
Linear-growth pattern detected: [1;32myl144_1_16_afb5f3db909e60548f086e22ee3353ac[0m
200000 soups processed...
Linear-growth pattern detected: [1;32myl144_1_16_afb5f3db909e60548f086e22ee3353ac[0m
300000 soups processed...
Linear-growth pattern detected: [1;32myl144_1_16_afb5f3db909e60548f086e22ee3353ac[0m
muzik wrote: This may be the stupidest question I've ever asked, but could a version of apgsearch be made for the Nintendo DSi or 3DS? I have numerous of them sitting about and it'd be cool to put them to good use while not being used.
It appears that the 3DS uses ARM processors (based on a cursory glance of https://3dbrew.org/wiki/Hardware#Common_hardware which I might have misinterpreted) rather than x86_64, so it would require considerable changes to lifelib (essentially writing pure C equivalents of the inline assembly routines).
Ian07 wrote: The executable worked fine for B3/S23/C1 for me, and parallelization actually worked properly for once!
Excellent!
Ian07 wrote: However, I couldn't find the mingw-w64 package for Cygwin in the list when I tried to reinstall it, instead seeing various packages prefixed with mingw64.
It's probably mingw64-g++ that you need.
Ian07 wrote: Also, do non-B3/S23 rules still have the 10M soup limit for hauls?
It's very much a soft limit: you can override it by calling the executable from Command Prompt and passing the usual flags. The main reason is to limit server load on Catagolue, especially if 77topaz's ethicacha idea becomes popular and results in many people running the executable. (There's a hard 100G upper limit, as beyond that a b3s23/C1 haul would begin to exceed the megabyte limit.)
Ian07 wrote: One more minor thing; the color codes don't work in the Windows terminal:
Good observation. I'm unsure as to the best way to address this. If you run the same executable from a different terminal (such as mintty, which is the terminal used by Cygwin, MSYS, and 'Git Bash'), then the ANSI colour codes are correctly interpreted, so it's not as simple as just testing whether it's been compiled for Windows or POSIX.
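One possible fallback is to strip the colour sequences at output time whenever colour is switched off. The sketch below is in Python purely for illustration (apgluxe itself is C++), and `strip_ansi` and `emit` are hypothetical names. It does not solve the detection problem described above: `isatty()` cannot tell Command Prompt from mintty, so an explicit switch would still be needed for that case.

```python
import re
import sys

# Matches SGR colour sequences such as ESC[1;33m and ESC[0m.
ANSI_SGR = re.compile(r'\x1b\[[0-9;]*m')

def strip_ansi(text):
    """Return text with ANSI colour sequences removed."""
    return ANSI_SGR.sub('', text)

def emit(text, stream=sys.stdout, use_colour=None):
    """Write a line, dropping colour codes unless colour is enabled.

    With use_colour=None this falls back to stream.isatty(), which is only
    a heuristic: it detects redirection to a file or pipe, not whether the
    terminal actually interprets ANSI codes.
    """
    if use_colour is None:
        use_colour = stream.isatty()
    stream.write((text if use_colour else strip_ansi(text)) + '\n')

banner = "Greetings, this is \x1b[1;33mapgluxe v4.95-ll2.1.15\x1b[0m."
print(strip_ansi(banner))
```

Calling `emit(banner, use_colour=False)` would give the same plain line regardless of the terminal in use.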
benetnasch85 wrote: Today I've been running v4.92 under cygwin vs. v4.95 in a Windows cmd window on our AVX1 machine, both at low priority.
v4.92 10101101 soups/haul
v4.95 10101106 soups/haul
Using command line options, v4.95 is running 1 to 3% faster, but it doesn't stop on "q".
Entering the options interactively results in a different output scheme (messages every 100000 soups) and may be very slightly faster, but also does not respond to "q".
Yes, responding to 'q' only works for POSIX (Cygwin / Linux / Mac); Windows handles terminals in a different manner.