Memory efficient deep salvo searches

biggiemac
Posts: 515
Joined: September 17th, 2014, 12:21 am
Location: California, USA

Memory efficient deep salvo searches

Post by biggiemac » October 6th, 2016, 6:53 am

Recently I posted a modification to chris_c's salvo search script that stored the salvo information in an extremely compact manner. In doing so, I also relied on a few specifics of the (27,1)c/72 project. Namely, there are fewer than 16 distinct glider options (14), so 4 bits is sufficient to encode each glider, and 64 bits is sufficient to reach depth 16 and build a database of allowed 16-glider slow salvos for testing glider 17.
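
To make the encoding concrete, here is a tiny sketch (not code from the script; the helper names are made up) of how a salvo packs into a single 64-bit integer, one nibble per glider, with lane choices stored as 1-14 so a zero nibble means "no glider here":

Code: Select all

# Sketch only: pack/unpack a slow salvo as one 64-bit integer, 4 bits per glider.
def pack_salvo(lane_choices):
  salvo_id = 0
  for depth, choice in enumerate(lane_choices):  # each choice is 1..14
    salvo_id |= choice << (4 * depth)
  return salvo_id

def unpack_salvo(salvo_id):
  choices = []
  while salvo_id > 0:
    choices.append(salvo_id & 15)  # lowest nibble is the earliest glider
    salvo_id >>= 4
  return choices

assert unpack_salvo(pack_salvo([3, 1, 14, 7])) == [3, 1, 14, 7]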

I want to try to remove any specifics so that a more general memory-efficient salvo search tool exists: something where desired_depth * log2(number of possible gliders) is rounded up to the nearest convenient multiple of 32, for example, and each salvo is stored in that many bits. This is all made harder by the fact that the language for this application is Python. (My main programming experience is in C and C++, so I am a bit fussy about efficient memory management, a point where Python falls flat.) It happens that numpy allows one to make C-like arrays in Python, so such a program will only work for users who have numpy in their Python distribution.
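
For the sizing, I am imagining something along these lines (a rough sketch with a hypothetical helper; the real tool would of course still need all the search machinery):

Code: Select all

import numpy as np

# Sketch: how many bits a salvo of the desired depth needs, rounded up to
# 32-bit words, with the IDs held in a fixed-width numpy array.
def salvo_storage(desired_depth, num_glider_options, capacity):
  bits_per_glider = max(1, (num_glider_options - 1).bit_length())  # ceil(log2)
  total_bits = desired_depth * bits_per_glider
  words_per_salvo = (total_bits + 31) // 32
  return np.zeros((capacity, words_per_salvo), dtype=np.uint32)

# 14 glider options -> 4 bits each; depth 16 -> 64 bits -> two 32-bit words.
ids = salvo_storage(16, 14, 1000)  # shape (1000, 2)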

Before I generalize, though, I would like to improve the memory use even further.

Looking at a memory dump of the running script (currently processing depth 13, 75M options), the bulk of the memory is taken up rather inefficiently by the hashtable of seen patterns. While the to_hashable function does a good job of generating a probably-unique 64-bit value for each pattern, Python doesn't respect the bit width and stores 40 bytes instead of 8 for every value. That factor of 5 isn't as much savings as the change in salvo representation, but given that my memory is still fuller than I'd like, I'm interested in alternatives.
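
The overhead is easy to see with sys.getsizeof; exact figures vary by interpreter version and build, so treat the numbers as indicative only:

Code: Select all

import sys
from hashlib import sha256
import numpy as np

h = int(sha256(b'example').hexdigest()[:16], 16)  # a 64-bit hash value
print(sys.getsizeof(h))                           # a few dozen bytes as a boxed Python int
print(np.zeros(10**6, dtype=np.uint64).nbytes)    # exactly 8 bytes per slot in numpy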

I implemented a DIY hashtable based on numpy arrays, and it appears to work after fixing some strange numpy behavior with uncast values. I also fixed a small bug in the display code and tweaked the guess of the expansion factor from 4 to 5, so the code below will run more smoothly. I still don't have support for dumping the list of acceptable salvos to a file after each iteration as insurance against an unexpected shutdown, which leaves me a little paranoid (that worry prompted the memory dump in the first place). I think I'll try to write something soon that can restore a search from such a file, including hashing all the intermediate patterns (a rough sketch of what I mean follows the script below).

Code: Select all

# OK let's make this more memory efficient shall we?  I know this isn't C, but we can
# still try to force python to behave.  Numpy arrays respect fixed variable width.

import golly as g
import numpy as np
from hashlib import sha256

#arbitrary numbers
MAX_GENERATIONS = 192
MAX_POPULATION = 50
MAX_HEIGHT = 27
MAX_LANES = 28

# MAX_GLIDERS has been set to hardcoded 17.  Execution can be interrupted at any time.

#NE glider
GLIDER = g.parse('3o$2bo$bo!')
# The piece that can't be harmed. 
START_KEEPERS = [(27,29),(28,28),(26,28),(28,27),(26,27),(27,26)]

# The piece that is to be manipulated
# Limitation: with this approach it is easiest to only allow a single initial target
START_TARGET = [(22,20),(21,21),(23,21),(21,22),(23,22),(22,23),
                (22,11),(21,12),(23,12),(21,13),(23,13),(22,14),
                (16,17),(17,18),(17,16),(18,18),(18,16),(19,17),
                (25,17),(26,18),(26,16),(27,18),(27,16),(28,17)]

# Any potential desired result
CONSTELLATION = [(20,21),(21,20),(20,22),(22,20),(21,23),(23,21),(22,23),(23,22),
                 (22,17),(22,16),(23,17),(23,16),(17,5),(17,6),(18,5),(18,6)]
OFFSETS = [(0,0),(1,1),(2,0),(0,4),(0,6)]
GOALS = [[(x0 + dx, y0 + dy) for x0, y0 in CONSTELLATION] for dx, dy in OFFSETS]

def to_pairs(cell_list):
  return zip(cell_list[::2], cell_list[1::2])

def from_pairs(cell_set):
  return list(sum(cell_set, ()))

TARGET = from_pairs(START_KEEPERS + START_TARGET)
btlist = [TARGET,[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]]

def find_wanted(cells_set):
  return any(all(j in cells_set for j in k) for k in GOALS)

# Return an empty cell list if the argument is invalid as a target, either not p2
# or if the height or population exceeds the specified values.  Nonempty cell
# lists continue on to the search tree.
def kosherify(cell_list):
  rest = set(to_pairs(cell_list))
  if not all(j in rest for j in START_KEEPERS):
    return []
  if len(rest) < 3 + len(START_KEEPERS) or len(rest) > MAX_POPULATION:
    return []
  rest = from_pairs(rest)
  if max(rest[1::2]) - min(rest[1::2]) >= MAX_HEIGHT or not is_p2(rest):
    return []
  return rest

def patterns_identical(cells1, cells2):
  if len(cells1) != len(cells2):
    return False
  if sum(cells1) != sum(cells2):
    return False
  return sorted(to_pairs(cells1)) == sorted(to_pairs(cells2))

def is_p2(cells):
  return patterns_identical(cells, g.evolve(cells, 2))

def get_shooting_range(cells):

  min_d1 = max_d1 = cells[0] + cells[1]
  min_d2 = cells[0] - cells[1]

  for i in range(2, len(cells), 2):
    min_d1 = min(min_d1, cells[i] + cells[i+1])
    max_d1 = max(max_d1, cells[i] + cells[i+1])
    min_d2 = min(min_d2, cells[i] - cells[i+1])
  
  min_lane = (min_d1 - 6) | 1 # monochromatic
  max_lane = max_d1 + 3
  shift = 6 - min_d2 // 2

  return min_lane, max_lane, shift

def get_pattern_to_try(cells, lane, offset=50):
  y = lane // 2 + offset
  return cells + g.transform(GLIDER, lane - y, y)

# Return the index of the first nibble (4 bits) in which two numbers differ.
def difnibble(currid,previd):
  if (previd == 0):
    return 0
  a = np.int64(currid)
  b = np.int64(previd)
  xorab = np.bitwise_xor(a,b)
  lsbmask = np.bitwise_and(xorab, -xorab)
  ret = 0
  while ((lsbmask & 15) == 0):
    ret += 1
    lsbmask >>= 4
  return ret

# Assumes all checks for well-behavedness already happened.  Blindly runs the salvo
# up to the current point to get the start constellation for the next iteration.
# The backtrace list btlist holds all intermediate constellations, and since there is
# significant expected overlap between consecutive ids in the array, it saves a bunch
# of computation to just check where the branch point is between them and start there
def salvoID_to_cells(currid, previd):
  global btlist
  starter = difnibble(currid, previd)
  startcells = btlist[starter]
  currid >>= (4*starter)
  while (currid > 0):
    lane, _, shift = get_shooting_range(startcells)
    lane += 2*((currid & 15) - 1)
    startcells = g.evolve(get_pattern_to_try(startcells, lane, shift), MAX_GENERATIONS)
    currid >>= 4
    starter += 1
    btlist[starter] = startcells # Update btlist for the next in line
  return startcells


# For viewing in golly
offset = 0
def display_solution(salvoID, laneID, n):
  # When this method is called, btlist holds all the intermediate constellations.
  global btlist
  global offset

  lanes = [get_shooting_range(btlist[i])[0] + 2*(((salvoID >> 4*i) & 15) - 1) for i in range(n)]
  lanes.append(get_shooting_range(btlist[n])[0] + 2*(laneID - 1))
  cells = TARGET

  i = 100
  for lane in lanes:
    cells = get_pattern_to_try(cells, lane, i)
    i += 100
  g.putcells(cells, 0, offset)
  for i in range(n + 1):
    cells = get_pattern_to_try(btlist[i], lanes[i])
    g.putcells(cells, 100 + 100 * i, offset)
  g.putcells(g.evolve(cells, MAX_GENERATIONS), 200 + 100*n, offset)
  g.select(g.getrect())
  g.copy()
  g.select([])
#  g.fit()
  g.update()
  offset += 400


# Table of 4096 pseudorandom 64-bit keys, one per cell of a 64x64 window,
# generated reproducibly from sha256.
randoms = []
for i in range(4096):
  randoms.append(int(sha256(str(i)).hexdigest()[:16], 16))

# Zobrist-style hash: XOR one key per live cell, with coordinates folded mod 64.
def to_hashable(cells):
  hashval = 0
  for i in range(0, len(cells), 2):
    hashval ^= randoms[64 * (cells[i] & 63) + (cells[i+1] & 63)]
  return hashval

LOG_TABLE_SIZE = 10
TABLE_COUNT = 0
# We ignore the ~10^-20 chance zero is the hash value.
SEEN = np.zeros(2 ** LOG_TABLE_SIZE, dtype=np.uint64)

# Also returns True/False, True if the pattern was seen before
def mark_seen(hashval, skipcount=False):
  global LOG_TABLE_SIZE
  offset = 0
  doffset = 0 # Quadratic probe with triangular numbers and power-of-two size
  success = False
  found = False
  while not success:
    # numpy does some weird stuff if some but not all arguments are cast..
    # I've seen np.mod(number, np.uint64(number)) give 0.0 before.
    idx = np.mod(np.add(np.uint64(hashval), np.uint64(offset)), np.uint64(2 ** LOG_TABLE_SIZE))
    val = SEEN[idx]
    if val == hashval:
      found = True
      success = True
    elif val == 0:
      SEEN[idx] = hashval
      success = True
      if skipcount: break
      global TABLE_COUNT
      TABLE_COUNT += 1
      if TABLE_COUNT > 2 ** (LOG_TABLE_SIZE - 1): # Half filled
        resize_seens()
    else:
      doffset += 1
      offset += doffset
  return found

def resize_seens():
    global SEEN
    global LOG_TABLE_SIZE
    LOG_TABLE_SIZE += 2
    temp = SEEN
    SEEN = np.zeros(2 ** LOG_TABLE_SIZE, dtype = np.uint64)
    for val in temp:
      if val != 0:
        mark_seen(val, True)

g.new('')
newids = np.zeros(1, dtype=np.int64)
count = 1

for n in range(17):

  ids = np.resize(newids, count)
  newids = np.zeros(5*count, dtype=np.int64) # Evidence suggests 4 wasn't enough
  count = 0
  previd = 0

  for idx, currid in enumerate(ids):

    if idx % 10 == 0:
      g.show("Depth "+str(n+1)+": {:.2f}%".format(100.*idx/len(ids)))

    last = salvoID_to_cells(currid, previd)
    min_lane, max_lane, shift = get_shooting_range(last)
    laneid = 0

    for lane in range(min_lane, min(min_lane + MAX_LANES, max_lane + 1), 2):

        laneid += 1

        start_cells = get_pattern_to_try(last, lane, shift)
        new_cells = kosherify(g.evolve(start_cells, MAX_GENERATIONS))

        if not new_cells:
          continue

        if mark_seen(to_hashable(new_cells)):
          continue

        # If we're here, the resulting pattern is both valid and new.  It
        # goes into the next iteration.
        cells_set = set(to_pairs(new_cells))

        if find_wanted(cells_set):
          display_solution(currid, laneid, n)

        if n < 16:
          newids[count] = np.bitwise_or(currid, np.int64(laneid) << (4 * n))
          count += 1
          if (count >= len(newids)):
            newids.resize(2*count) # Expensive, won't happen often if at all

    # Once all lanes have been checked:
    previd = currid
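
For reference, here is roughly what I mean by checkpointing and restoring. Nothing below is implemented yet; the file naming and helper functions are hypothetical, and the sketch leans on salvoID_to_cells, to_hashable and mark_seen from the script above:

Code: Select all

import numpy as np

# Sketch: dump the accepted salvo IDs to disk after each depth, and on restart
# rebuild the SEEN table by re-running every stored salvo and re-marking its
# resulting constellation.
def save_checkpoint(depth, ids, count):
  np.save('salvos_depth_%d.npy' % depth, ids[:count])

def restore_search(max_depth):
  # Re-marking the final constellation of every stored salvo at every depth
  # also covers the intermediates, since each prefix is itself a stored salvo.
  for depth in range(1, max_depth + 1):
    ids = np.load('salvos_depth_%d.npy' % depth)
    previd = 0
    for currid in ids:
      mark_seen(to_hashable(salvoID_to_cells(currid, previd)))
      previd = currid
  return ids  # the deepest list, ready to feed the next iteration
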
Physics: sophistication from simplicity.

chris_c
Posts: 966
Joined: June 28th, 2014, 7:15 am

Re: Memory efficient deep salvo searches

Post by chris_c » October 6th, 2016, 8:42 am

biggiemac wrote:Recently I posted a modification to chris_c's salvo search script that stored the salvo information in an extremely compact manner.
Great. I'm pleased to see the code being worked on.
biggiemac wrote: I want to try to remove any specifics so that a more general memory-efficient salvo search tool exists.
At the moment the compact way you are storing the salvo data looks like a bit of an over-optimisation. The reads and writes to the newids array are always sequential, so I think all of that stuff can be put on disk without any slowdown. When I run your code I am getting around 600 additions to the newids array per second. Even if you had an extremely bloated format that stored each salvo in 1 KB, a computer twice as fast as mine, and 8 cores running simultaneously, that would only be around 10 MB per second, so I think the disk should be fast enough to keep up.

I did a bit of a search and it seems like Python bytearrays are nice and simple to write to and from disk.
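
Something like the following, perhaps (just a sketch; the record format, the 32-byte cap, and the file names are made up):

Code: Select all

# Sketch: one lane choice per byte (0 = unused), fixed-length records appended
# to a per-depth file, then streamed back sequentially for the next depth.
RECORD_LEN = 32

def append_salvo(f, lanes):
  # assumes len(lanes) <= RECORD_LEN
  rec = bytearray(RECORD_LEN)
  rec[:len(lanes)] = bytearray(lanes)
  f.write(rec)

def read_salvos(path):
  with open(path, 'rb') as f:
    while True:
      rec = f.read(RECORD_LEN)
      if len(rec) < RECORD_LEN:
        break
      yield [b for b in bytearray(rec) if b != 0]

# e.g. write with open('salvos_depth_5.bin', 'ab') during depth 5, then
# iterate read_salvos('salvos_depth_5.bin') when expanding to depth 6.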

biggiemac
Posts: 515
Joined: September 17th, 2014, 12:21 am
Location: California, USA

Re: Memory efficient deep salvo searches

Post by biggiemac » October 6th, 2016, 12:31 pm

chris_c wrote:At the moment the compact way you are storing the salvo data looks like a bit of an over-optimisation. The reads and writes to the newids array are always sequential so I think all of that stuff can be put on disk without seeing any slow down.
That's fair; I coded up the densest packing I could think of, instead of considering the possibility of taking the salvo data out of RAM altogether. For the latter my code is certainly overkill.
chris_c wrote:I did a bit of a search and it seems like Python bytearray's are nice and simple to write to and from disk.
That would likely make for much friendlier code to generalize as well. At one glider per byte, the lane data is easy to read and allows 256 options.

Would you rather the code have no dependence on numpy? My DIY hashtable also relies on it, but other alternatives might also work. The native set type is worse by a factor of either 5 or 6 (I couldn't tell if it was storing the hashables or pointers).
Physics: sophistication from simplicity.

chris_c
Posts: 966
Joined: June 28th, 2014, 7:15 am

Re: Memory efficient deep salvo searches

Post by chris_c » October 6th, 2016, 1:15 pm

biggiemac wrote: Would you rather the code have no dependence on numpy? My DIY hashtable also relies on it, but other alternatives might also work. The native set type is worse by a factor of either 5 or 6 (I couldn't tell if it was storing the hashables or pointers).
Depending on numpy seems reasonable. Python's built-in arrays are memory-efficient but don't support 64-bit integers. I suppose you could use an array of doubles instead to get 53 bits of hash, but that's kind of ugly.
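
Just to spell out what that would look like (a tiny illustration, not a recommendation):

Code: Select all

from array import array

# Assuming no 64-bit integer typecode is available (as described above): a C
# double stores integers up to 2**53 exactly, so a truncated hash survives
# the round trip.
hashes = array('d', [0.0] * 8)
h = 0x1f3a9c2b7d4e5 & ((1 << 53) - 1)  # keep only 53 bits of the hash
hashes[0] = float(h)
assert int(hashes[0]) == h             # exact below 2**53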

EDIT: Actually, forget about python arrays, numpy seems much better. I tried to allocate a big array of doubles with python arrays and it failed. Numpy worked fine.

calcyman
Moderator
Posts: 2938
Joined: June 1st, 2009, 4:32 pm

Re: Memory efficient deep salvo searches

Post by calcyman » October 6th, 2016, 3:27 pm

When you realise you're using Python like C/C++, it's probably a good idea to switch to C/C++.
What do you do with ill crystallographers? Take them to the mono-clinic!

biggiemac
Posts: 515
Joined: September 17th, 2014, 12:21 am
Location: California, USA

Re: Memory efficient deep salvo searches

Post by biggiemac » October 6th, 2016, 4:07 pm

Oh, I am well aware that C/C++ would be a better language for this, especially given my coding preferences. I just find it a significant barrier to familiarize myself with some other way to interface with the Game of Life. At some point I'll just go ahead and learn the tools for evolving and pattern-matching GoL grids in C. The result will also inevitably be faster, like the apgsearch evolution.

In the meantime, while this is still my side project and nowhere near my main project, I will take the path of least resistance, which in this case is C-ifying a Python script.

Edit: OK, LifeAPI is similar to the Golly Python interface, so it would probably be easier to learn it and benefit from C libraries than to keep trying Python hacks.
Physics: sophistication from simplicity.

biggiemac
Posts: 515
Joined: September 17th, 2014, 12:21 am
Location: California, USA

Re: Memory efficient deep salvo searches

Post by biggiemac » October 14th, 2016, 4:59 am

The depth-13 iteration of the original search completed about a week after depth 12 did. The expansion factor is about 5. I aborted the process, since at this depth it was using about 50 GB of RAM (yes, you read that right).

There are 341 million candidate length-13 salvos starting from the honeyfarm and leaving the hive untouched. They comprise 2.6 GB of the 50 GB. Another 10 GB is the preallocated space for the next iteration. The rest is the hash table of seen constellations. With just the salvo list, one can easily extend one depth further and only check for success, meaning theoretically I could spend another month or so running segments of the list through a new script with essentially no memory footprint to scan depth 14. But I managed to muck up that possibility anyway; read on:

I forced a core dump before aborting the script, to try to pick out the length-13 salvo list so the work was not all lost, but unfortunately while reading the dump I managed to corrupt it and overwrite 40% of it (including the salvo list) with zeros. I have the analogous list at depth 12, but the last week's progress is lost. I'm very unhappy with UltraEdit, the program I used to open the file. In hindsight there were a number of protective measures I could have taken, but the fault for losing the data lies with the UE backend, which makes some unstated assumptions that a file read in binary format does not exceed 0xffffffff bytes.

In the future any deep searches will save data to files so users don't have to force core dumps to get at it. Only displaying to Golly in the event of a positive result is not helpful if the process can run for weeks without one. However, most of the reason for trying this at all was to get intuition about salvo searches, since I had never really set them up myself before now. I determined that placing 3 arbitrary still lifes simultaneously is likely beyond the range of doable searches, but a deep search has a decent chance of getting 2, which generally saves a few gliders over placing them one at a time. The same conclusion could probably have been reached far less resource-intensively with a counting argument.
Physics: sophistication from simplicity.
