It took me a while to figure out that the hash files were generated with the old octohash hash-generating code, not the new octo3obj function. It was probably a bad idea for me to ever allow two different versions of that function to exist, but I remember having what seemed like a good reason at the time.
Anyway, here's a script that works for me for searching through EvinZL's new 3g-collision hash files. With it I can find out, for example, that there are 54 "vanish collisions" in the 3g database that have a final obo$obo! spark, but only three collisions that end with a final o2bo$o2bo! spark.
Code: Select all
# find-octo3g.py
# Dave Greene, 15 August 2022 (Golly Python3)
######## Download hash files from https://drive.google.com/drive/folders/1l6TQEgNpXpFd6ATU7Tgrf7k-76MJcVi3
######## Then update basepath below with your chosen location for the downloaded files
import golly as g
import hashlib

basepath = "C:/path/to/3g/hashes/"  ###### UPDATE THIS TO MATCH YOUR DOWNLOAD LOCATION
# octohashes3g_0.txt through octohashes3g_19.txt
searchlist = ["octohashes3g_" + str(n) + ".txt" for n in range(20)]
NUMLINES = 464746

# 90-character alphabet: ASCII 37..126, with three characters swapped out
chardict = {}
for i in range(37, 127):
    chardict[i-37] = chr(i)
chardict[92-37] = "!"  # backslash
chardict[39-37] = "#"  # apostrophe
chardict[44-37] = "$"  # comma

def get9char(inputstr):
    h = hashlib.sha1()
    h.update(inputstr.encode())
    i = 0  # convert first seven bytes of SHA1 digest to an integer
    for char in h.digest()[:7]:
        i = i*256 + char
    s = ""
    while len(s)<9:  # write that integer as nine base-90 digits
        d = i//90
        r = i - d*90
        s = chardict[r] + s
        i = (i - r) // 90
    return s

def getoctohash(clist):
    # lay out all eight orientations of the pattern, 2048 cells apart
    ptr = 0
    g.new("Octotest"+str(count))
    for orientation in [[1,0,0,1],[0,-1,1,0],[-1,0,0,-1],[0,1,-1,0],[-1,0,0,1],[1,0,0,-1],[0,1,1,0],[0,-1,-1,0]]:
        g.putcells(clist,ptr*2048,0,*orientation)
        ptr += 1
    # hash the lexicographically smallest of the eight normalized cell lists
    for j in range(8):
        g.select([2048*j-1024,-1024,2048,2048])
        g.shrink()
        r = g.getselrect()
        if r == []: r = [0,0,1,1]
        pat = g.getcells(r)
        deltax, deltay = 0, 0
        if pat != []:
            deltax, deltay = -pat[0], -pat[1]
        if j==0:
            minstr = str(g.transform(pat, deltax, deltay))
        else:
            strpat = str(g.transform(pat, deltax, deltay))
            if strpat < minstr:
                minstr = strpat
    return " " + get9char(minstr)

g.setalgo("HashLife")
g.setrule("B3/S23")
try:
    g.fitsel()
except:
    pass

# use the selection as the search target, or the whole pattern if nothing is selected
r = g.getselrect()
if r==[]:
    r = g.getrect()
    if r==[]:  # getrect() returns an empty list when there is no pattern
        g.exit("No pattern found to search for.")
    g.select(r)

count = NUMLINES
outptr = 0
pat = g.getcells(r)
g.addlayer()  # do tests in a new layer, then put results there
hash = getoctohash(pat)
g.new("Output")
g.putcells(pat,-pat[0]-128,-pat[1])
g.fit()
g.update()

for fingerprintfile in searchlist:
    with open(basepath+fingerprintfile, "r") as f:
        for line in f:
            count -= 1
            if hash in line:
                matchingpat = line[:line.index(" ")]
                g.putcells(g.parse(matchingpat),outptr*64,0)
                outptr+=1
                g.fit()
                g.update()
            if count % 1000 == 0:
                g.show("Searching. Lines remaining: " + str(count//1000) + "K")

plural = "" if outptr==1 else "s"
g.show("Found " + str(outptr) + " line" + plural + " matching " + hash + " in " + str(NUMLINES) + " lines of the octo3obj database.")
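In case it's useful for checking hashes outside of Golly, here is a minimal standalone sketch of just the nine-character hash step from the script above. It should compute the same thing as get9char, only written with int.from_bytes and divmod. The sample cell list is a made-up placeholder, not an entry from the real database.

```python
import hashlib

# Same 90-symbol alphabet as the script: ASCII 37..126, with backslash,
# apostrophe and comma replaced by "!", "#" and "$"
chardict = {i - 37: chr(i) for i in range(37, 127)}
chardict[92 - 37] = "!"  # backslash
chardict[39 - 37] = "#"  # apostrophe
chardict[44 - 37] = "$"  # comma

def get9char(inputstr):
    """Return the first seven SHA-1 digest bytes as nine base-90 digits."""
    digest = hashlib.sha1(inputstr.encode()).digest()
    value = int.from_bytes(digest[:7], "big")
    s = ""
    while len(s) < 9:
        value, r = divmod(value, 90)
        s = chardict[r] + s
    return s

# Hypothetical input: the str() of a normalized Golly-style cell list
sample = str([0, 0, 2, 0, 0, 1, 2, 1])
code = get9char(sample)
print(code)
```

Seven bytes is 56 bits, which fits inside nine base-90 digits with room to spare, so no information from those seven bytes is lost in the conversion.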
The old synthesise-patt.py either gives no results at all, or lots of false positives, in any case like this where a recognizable population sequence can't be generated for the target pattern -- so this is a huge improvement on what we had before.
EDIT: Created an octo3g git repo for the above code. The current download location for the roughly one gigabyte of hash files is given in the script comments. Many of the individual files are over GitHub's "warning level" of 50MB, so I'm not too inclined to check such oversized files into the repo at the moment. Maybe if they were split into a total of 37 or so files of less than 30MB each, it might be worth it? -- along with adding octohash files for the 4G collisions used by synth-const-4G-Python3.py.
That last item should be pretty easy, so I'll probably take care of that in the next week or so if nobody else gets to it first.
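As for splitting the hash files into pieces under 30MB, here is a rough sketch of how that could be done, assuming it's fine to break the files anywhere between lines. The function name split_by_lines, the piece naming scheme, and the demo file contents are all made up for illustration; nothing here is in the repo.

```python
import os
import tempfile

def split_by_lines(src_path, out_prefix, max_bytes=30 * 1024 * 1024):
    """Split a text file into numbered pieces of at most max_bytes each,
    breaking only at line boundaries. Returns the list of piece paths.
    (A single line longer than max_bytes still gets a piece of its own.)"""
    pieces = []
    out = None
    written = 0
    with open(src_path, "rb") as src:
        for line in src:
            # start a new piece if there is none yet, or if this line won't fit
            if out is None or (written > 0 and written + len(line) > max_bytes):
                if out is not None:
                    out.close()
                path = "%s_%d.txt" % (out_prefix, len(pieces))
                out = open(path, "wb")
                pieces.append(path)
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return pieces

# Small demo on a throwaway file with placeholder lines (not real hash data)
demo_dir = tempfile.mkdtemp()
demo_src = os.path.join(demo_dir, "octohashes_demo.txt")
with open(demo_src, "wb") as f:
    for n in range(1000):
        f.write(("pattern%d hash%d\n" % (n, n)).encode())
demo_pieces = split_by_lines(demo_src, os.path.join(demo_dir, "piece"), max_bytes=4096)
print(len(demo_pieces), "pieces written")
```

The search script would then only need its list of filenames and NUMLINES adjusted, since it reads every file line by line anyway.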
We still definitely need versions of these octo* search scripts that mark where the output pattern is in each search result, rotate/reflect each search result so they're all in the same orientation and lined up, and maybe (optionally?) show the full LifeHistory reaction envelope as well. That's really not a particularly painful programming effort, though it will be slow for cases where thousands of results are being displayed, so we might need an option to turn off that reorienting/post-processing functionality.
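For the reorienting step, the core of it might look something like the sketch below. It works in plain Python on (x, y) tuples rather than Golly cell lists, and it assumes the output pattern has already been isolated from each search result, which is the part that still needs real work. The helper names are invented for this example; the eight orientation tuples are the same ones used in getoctohash.

```python
# Same eight (axx, axy, ayx, ayy) orientations as in getoctohash
ORIENTATIONS = [(1, 0, 0, 1), (0, -1, 1, 0), (-1, 0, 0, -1), (0, 1, -1, 0),
                (-1, 0, 0, 1), (1, 0, 0, -1), (0, 1, 1, 0), (0, -1, -1, 0)]

def orient(cells, axx, axy, ayx, ayy):
    """Apply a rotation/reflection to a list of (x, y) cells."""
    return [(x * axx + y * axy, x * ayx + y * ayy) for x, y in cells]

def normalize(cells):
    """Shift cells so the bounding box corner is at (0, 0)."""
    minx = min(x for x, y in cells)
    miny = min(y for x, y in cells)
    return frozenset((x - minx, y - miny) for x, y in cells)

def find_orientation(result_cells, target_cells):
    """Return the orientation that maps result_cells onto target_cells
    (up to translation), or None if no orientation matches."""
    want = normalize(target_cells)
    for o in ORIENTATIONS:
        if normalize(orient(result_cells, *o)) == want:
            return o
    return None

# Hypothetical example: an asymmetric target, and a rotated, shifted copy of it
target = [(0, 0), (1, 0), (2, 0), (2, 1)]
result = [(10, 5), (10, 6), (10, 7), (9, 7)]
found = find_orientation(result, target)
print(found)
```

Once the matching orientation is known, the same four numbers could be passed to g.putcells for the whole search result, plus an offset to line the outputs up. For symmetric targets more than one orientation will match, and this sketch just takes the first.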