Umm... Difficult to explain bug from running python script

Scorbie
Posts: 1692
Joined: December 7th, 2013, 1:05 am

Post by Scorbie » September 12th, 2016, 10:48 am

Code: Select all

import golly as g

g.autoupdate(True)
while True:
    g.new('')
    g.setmag(1)
    g.select([0, 0, 20, 20])
    g.randfill(30)
    while True:
        g.run(100)
        if int(g.getgen()) > 5000:
            break
If you run this code and watch, you can see a very strange agar pattern appearing. It's definitely not from the script, so it must be a bug in Golly.
I am using Golly 2.8b1 (64-bit) with Python 2.7.12 (64-bit).

biggiemac
Posts: 515
Joined: September 17th, 2014, 12:21 am
Location: California, USA

Post by biggiemac » September 12th, 2016, 4:12 pm

The script has the expected behavior on my computer with Golly 2.8, Python 2.7.12 and Windows 64bit. Perhaps the beta version of Golly 2.8 is the problem? Or some external corrupted file?
Physics: sophistication from simplicity.

dvgrn
Moderator
Posts: 10612
Joined: May 17th, 2009, 11:00 pm
Location: Madison, WI

Post by dvgrn » September 12th, 2016, 4:39 pm

biggiemac wrote:The script has the expected behavior on my computer with Golly 2.8, Python 2.7.12 and Windows 64bit. Perhaps the beta version of Golly 2.8 is the problem? Or some external corrupted file?
Same result here. The only odd behavior I noticed is that eventually the script will generate a diehard, and that makes everything stop (because the generation count never gets to 5000). Uncomment the g.note line in the code below to pause when that happens:

Code: Select all

import golly as g

g.autoupdate(True)
while True:
    g.new('')
    g.setmag(1)
    g.select([0, 0, 20, 20])
    g.randfill(30)
    while True:
        g.run(100)
        if int(g.getgen()) > 5000:
            break
        if int(g.getpop()) == 0:
            # g.note("Life ended.")
            break
Is the "strange agar" something you can get a screenshot of, or no?

Post by Scorbie » September 12th, 2016, 8:32 pm

The screen autoupdates too quickly, but I finally managed to get a screenshot of it.
The "agar" has various sizes and repeating units(?)

Edit: updating to golly 2.8 doesn't solve the problem, so I think it has something to do with corrupted files in my system. I am using Linux Mint 18 'Sarah' 64bit.
Edit2: After several attempts to take another screenshot, golly just crashes.
Attachments
finally.png (85.68 KiB)

Post by biggiemac » September 13th, 2016, 3:46 am

Very weird! Looking at the population, I imagine the "agar" is a glitch in displaying the pattern and not in generating it, possibly some bug to do with grid display. It's 8,8 periodic in the plane, and has a weird nontrivial interaction with the underlying pattern. Strangely, at the top left just to the right of the object, the agar is missing a piece. Fascinating, and from my perspective not explainable :P

rowett
Moderator
Posts: 3777
Joined: January 31st, 2013, 2:34 am
Location: UK

Post by rowett » September 13th, 2016, 5:30 am

The following will reproduce the agar fairly consistently from a single soup on my 64bit Ubuntu system:

Code: Select all

import golly as g
g.open("soup/soup4.rle")
g.setmag(3)
g.update()
while True:
    g.run(1000)
    if int(g.getgen()) >= 50000:
        g.update()
        break
    if int(g.getpop()) == 0:
        # g.note("Life ended.")
        break
soup/soup4.rle

Code: Select all

#CXRLE Pos=3,0
x = 20, y = 20, rule = B3/S23-a4ei6
ob2ob2o6bob4o$bobo7bo5bo$o2bob2o5bo3b2o$3b2o6b2o3b4o$b3o2b3ob3o$o2bo5b
2obo6bo$3b2obo2b2o6bo$o4b2o3bo2b3ob3o$2bo4bo5bo$o10bo4bo$ob2o5bobo3b2o
b2o$2bo3bo2bob2o2b2o2bo$4bobo$ob2o12bo$7bo6bo$3b3o3b4o4b2o$o3bo3bo$2b
2o2bo4bo2b2o$ob3o3bob2ob2o$8bobo5b2obo!

Post by rowett » September 13th, 2016, 8:10 am

Some more findings:
  • The issue appears (when it appears) on the first g.update(), which causes a DrawView() in Golly
  • The renderer looks fine, since we get the corruption at any zoom (and different routines are used for different zooms)
  • The corruption only seems to appear when using HashLife; if I switch to QuickLife I can't reproduce it at all
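
To make the HashLife/QuickLife comparison repeatable, the reproduction could be wrapped in a helper that selects the algorithm explicitly. This is only a sketch: run_soup and its parameters are made-up names, and the golly module is passed in as g so the function can also be exercised with a stand-in object outside Golly. Besides the calls already used in the script above, it assumes g.setalgo is available to switch algorithms.

```python
def run_soup(g, algo, path="soup/soup4.rle", step=1000, limit=50000):
    """Run the soup under `algo` and return (generation, population).

    `g` is the golly module, or any object exposing the same calls.
    The name and parameters of this helper are hypothetical.
    """
    g.open(path)
    g.setalgo(algo)   # e.g. "HashLife" or "QuickLife"
    g.setmag(3)
    g.update()        # first redraw: where the corruption was observed
    # Stop at the generation limit or when the pattern dies out.
    while int(g.getgen()) < limit and int(g.getpop()) > 0:
        g.run(step)
    g.update()
    return int(g.getgen()), int(g.getpop())
```

Inside Golly this would be called after `import golly as g`, once as `run_soup(g, "HashLife")` and once as `run_soup(g, "QuickLife")`, watching the display each time.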

Andrew
Moderator
Posts: 919
Joined: June 2nd, 2009, 2:08 am
Location: Melbourne, Australia

Post by Andrew » September 13th, 2016, 8:46 am

Very weird! I can't reproduce it on my Mac. To me it looks like some sort of bug in OpenGL or in the video card driver. An interesting test would be to try the same script in Golly 2.7 (which doesn't use OpenGL).
Use Glu to explore CA rules on non-periodic tilings: DominoLife and HatLife

Post by rowett » September 13th, 2016, 9:16 am

Andrew wrote:Very weird! I can't reproduce it on my Mac. To me it looks like some sort of bug in OpenGL or in the video card driver. An interesting test would be to try the same script in Golly 2.7 (which doesn't use OpenGL).
It also happens on 2.7 on Linux 64bit. (Note: if you're using the script I supplied above, you'll need to copy the new liferules.cpp and liferules.h into the 2.7 source to get support for non-totalistic rules.)

Could not reproduce on Windows 64bit on any Golly version.

Post by Andrew » September 13th, 2016, 7:12 pm

rowett wrote:It also happens on 2.7 on Linux 64bit. ...
Ok, so nothing to do with OpenGL. So it seems to be restricted to Linux and to HashLife, although I failed to reproduce it on my 64-bit Ubuntu system (perhaps because it's a VM on my Mac).

Maybe it's a bug in the code generated by the gcc optimizer? Chris, try changing the -O5 to -O1 (or remove it) in the CXXFLAGS in makefile-gtk, do a make clean and rebuild golly. Does that make any difference?

Post by rowett » September 13th, 2016, 7:52 pm

Andrew wrote:Maybe it's a bug in the code generated by the gcc optimizer? Chris, try changing the -O5 to -O1 (or remove it) in the CXXFLAGS in makefile-gtk, do a make clean and rebuild golly. Does that make any difference?
Nope, no difference.

Post by rowett » September 13th, 2016, 7:56 pm

Plus it's intermittent, which is annoying.

When you do a g.update() in a Python script does it render synchronously, or just schedule a render to happen at the next OnPaint?

In this case in the script, we load the pattern, call g.update() and then ask to run for 1000 generations. Will the pattern be rendered at generation 0, generation 1000, or somewhere in between?

Post by Andrew » September 13th, 2016, 10:56 pm

rowett wrote:When you do a g.update() in a Python script does it render synchronously, or just schedule a render to happen at the next OnPaint?
Funny you should ask that. Pattern rendering is only done via OnPaint, but an Update() call is meant to force that to occur immediately. That's true on Mac and Windows, but on Linux the OnPaint event is only generated the next time events are processed. While a script is running, that means the next time we call Yield(), which happens at most 10 times per second.

I only learned this a couple of weeks ago while writing a script that did some heavy animation -- the animation was nice and smooth on Mac/Win but jerky on Linux. I fixed the problem by adding some wxGTK-specific code in wxscript.cpp so that Yield() is called after a g.update or when g.autoupdate is turned on. I've just committed those changes so you might like to try them and see if they somehow prevent the weird bug.
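
The timing difference described here can be illustrated with a small simulation. This is a toy model rather than Golly's or wxWidgets' actual code: ToyApp, simulate, and the 100 ms interval are assumptions chosen to mirror "at most 10 times per second", and times are plain integers in milliseconds. With paint_on_update=True the paint runs as soon as update is called (roughly the Mac/Windows behaviour, or Linux with a Yield after g.update); with False it waits for the next rate-limited yield.

```python
class ToyApp:
    """Toy model of deferred painting (times in ms); not Golly's real code."""

    def __init__(self, paint_on_update, yield_interval_ms=100):
        self.paint_on_update = paint_on_update
        self.yield_interval_ms = yield_interval_ms
        self.paint_pending = False
        self.last_yield = None
        self.painted_at = []      # times at which a paint actually ran

    def update(self, now):
        """Request a redraw; paint right away only if configured to."""
        self.paint_pending = True
        if self.paint_on_update:
            self._process_events(now)

    def poll(self, now):
        """Called often during long runs; yields at most once per interval."""
        due = (self.last_yield is None
               or now - self.last_yield >= self.yield_interval_ms)
        if due:
            self.last_yield = now
            self._process_events(now)

    def _process_events(self, now):
        if self.paint_pending:
            self.paint_pending = False
            self.painted_at.append(now)


def simulate(paint_on_update):
    app = ToyApp(paint_on_update)
    app.poll(0)                      # a yield has just happened
    app.update(10)                   # script asks for a redraw at t = 10 ms
    for now in range(20, 300, 10):   # long computation polling every 10 ms
        app.poll(now)
    return app.painted_at
```

In this model `simulate(True)` returns `[10]` and `simulate(False)` returns `[100]`: the deferred paint lands in the middle of the long computation instead of right after the update request.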

Post by rowett » September 14th, 2016, 3:31 am

Andrew wrote:I've just committed those changes so you might like to try them and see if they somehow prevent the weird bug.
Appears to have fixed the problem!

Post by Andrew » September 14th, 2016, 6:38 am

rowett wrote:Appears to have fixed the problem!
Ok, I think I understand what was going wrong. If you look in hlifealgo.cpp you'll see a number of places (5 to be precise) where poller->poll() is called. This is the event checking routine that might call Yield(). Immediately after your script calls g.update there is a lengthy g.run. At some point within that g.run, poller->poll() is called, which calls Yield(), which fires an OnPaint event, and so the rendering code in DrawView gets called. One or more of those poller->poll() calls must be in an unsafe place, unsafe in the sense that the drawing code is relying on incomplete data.

I've no idea which poll call is dodgy (the hlifealgo code is Tom's baby) but there is a way you could find out. Comment out the Yield calls I added in wxscript.cpp, then comment out each poller->poll() call one at a time until the bug no longer occurs. If you do manage to isolate the dodgy call then best to notify Tom in the golly-test list so he can think about the best solution (it might be as simple as removing the call, or maybe moving it to a safer place).
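
The idea of an unsafe poll site, and the disable-one-at-a-time search for it, can be sketched with a toy model. Nothing here reflects the real hlifealgo.cpp: the three numbered sites, the state dictionary, and all function names are invented for illustration, and site 2 is deliberately placed in the middle of an update so that a "paint" triggered there sees half-finished data.

```python
def advance(state, poll, enabled_sites):
    """Toy 'generation step' with three poll sites; site 2 is mid-update."""
    if 1 in enabled_sites:
        poll(state)                      # safe: nothing modified yet
    state["cells"] = state["cells"] + 1  # first half of the update
    if 2 in enabled_sites:
        poll(state)                      # unsafe: count not refreshed yet
    state["count"] = state["cells"]      # second half of the update
    if 3 in enabled_sites:
        poll(state)                      # safe: update complete


def bug_reproduces(enabled_sites, steps=5):
    """Return True if a 'paint' ever observes inconsistent state."""
    state = {"cells": 0, "count": 0}
    glitches = []

    def poll(s):
        # Stand-in for poll() -> Yield() -> OnPaint -> DrawView.
        if s["cells"] != s["count"]:
            glitches.append(dict(s))

    for _ in range(steps):
        advance(state, poll, enabled_sites)
    return bool(glitches)


def find_unsafe_sites(all_sites=(1, 2, 3)):
    """Disable one poll site at a time; report those whose removal fixes it."""
    suspects = []
    for site in all_sites:
        remaining = set(all_sites) - {site}
        if not bug_reproduces(remaining):
            suspects.append(site)
    return suspects
```

Here `find_unsafe_sites()` returns `[2]`. The toy bug is deterministic; with an intermittent bug like the real one, each configuration would need to be run several times before a site could be ruled out.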

Post by rowett » September 15th, 2016, 6:39 am

Andrew wrote:I've no idea which poll call is dodgy (the hlifealgo code is Tom's baby) but there is a way you could find out. Comment out the Yield calls I added in wxscript.cpp, then comment out each poller->poll() call one at a time until the bug no longer occurs. If you do manage to isolate the dodgy call then best to notify Tom in the golly-test list so he can think about the best solution (it might be as simple as removing the call, or maybe moving it to a safer place).
Looks like it's the first poll call, but I'll do more testing because it could be that the first call is just more likely to cause the problem than the others.

Post by Scorbie » September 16th, 2016, 10:19 am

@Chris @Andrew Thanks for the work!
I guess (hopefully) this bug has little to do with script speed.
