Difference between revisions of "Tutorials/Coding Life simulators/bitwise SWAR Life"

From LifeWiki
Jump to navigation Jump to search
m (you did not see anything (it was a uhhh misclick))
m (minor corrections of forgone words (it seems as though I deleted them by accident))
Line 14: Line 14:
We know the amount to shift left and right to displace by one cell, four bits, so we add to the state itself shifted by 4 left and right. It will be encoded with consecutive rows concatenated, so to shift the state up and down we will shift by four bits times the width. (We can enact the summing in one dimension then the other to only have four shift-and-OR operations instead of eight.)
We know the amount to shift left and right to displace by one cell, four bits, so we add to the state itself shifted by 4 left and right. It will be encoded with consecutive rows concatenated, so to shift the state up and down we will shift by four bits times the width. (We can enact the summing in one dimension then the other to only have four shift-and-OR operations instead of eight.)


Once we have this summed list, we would like a function that maps specific sum values to 0001 and others to 0000, we can whether a cell's sum value matches a given value by XORing it with the value's NOT (so that only cells with the value become 1111) then "folding" by ANDing with itself shifted right by two bits (such that the third and fourth 1s remain so only if the first and second are also), then one bit (such that the fourth 1 depends on all four), then discarding the preceding three by ANDing with 0001. This all seems at first a very arbitrary way to do it but each operation can be done across all cells simultaneously by enacting it upon the entire integer. This can be done once for value 0011 and again for 0100 (ANDed with the original state to ensure only on cells with four neighbours including themselves remain on), then these can be ORed together to yield the state after an iteration.
Once we have this summed list, we would like a function that maps specific sum values to 0001 and others to 0000, we can determine whether a cell's sum value matches a given value by XORing it with the value's NOT (so that only cells with the value become 1111) then "folding" by ANDing with itself shifted right by two bits (such that the third and fourth 1s remain so only if the first and second are also), then one bit (such that the fourth 1 depends on all four), then discarding the preceding three by ANDing with 0001. This all seems at first a very arbitrary way to do it but each operation can be done across all cells simultaneously by enacting it upon the entire integer. This can be done once for value 0011 and again for 0100 (ANDed with the original state to ensure only on cells with four neighbours including themselves remain on), then these can be ORed together to yield the state after an iteration.


A problem with this way is that the leftmost cell in a row will believe the rightmost one in an adjacent row to be to its left, so there will be horizontal toroidal scrolling but not quite aligned with a square, and the top and bottom edges will be bounded. To fix this, we will create a one-cell margin along each edge, that we can put shifted copies of the inside of the opposing edge into to get aligned toroidal scrolling as well (and even reverse them to get other manifolds (Möbius strips, Klein bottles and real projective planes)).
A problem with this way is that the leftmost cell in a row will believe the rightmost one in an adjacent row to be to its left, so there will be horizontal toroidal scrolling but not quite aligned with a square, and the top and bottom edges will be bounded. To fix this, we will create a one-cell margin along each edge, that we can put shifted copies of the inside of the opposing edge into to get aligned toroidal scrolling as well (and even reverse them to get other manifolds (Möbius strips, Klein bottles and real projective planes)).
Line 66: Line 66:
manifold=(2,2)
manifold=(2,2)
FPS=10
FPS=10
</pre>Our simulator will work with arbitrary INT rules, but recognise OT ones as such to simulate in the more efficient way. (You could also make special-case optimisations for other things like INT rules [[von Neumann neighbourhood|von Neumann rules]].)
</pre>Our simulator will work with arbitrary INT rules, but recognise OT ones as such to simulate in the more efficient way. (You could also make special-case optimisations for other things like INT rules encoding [[von Neumann neighbourhood|von Neumann rules]].)
The index of each of the 512 neighbourhood states is the number of which it's the binary representation, ie.<pre>
The index of each of the 512 neighbourhood states is the number of which it's the binary representation, ie.<pre>
  oo
  oo

Revision as of 21:33, 27 December 2022

So you've decided to make a Life simulator that encodes states as integers' binary representations? Excellent!

Prerequisites

This will be teaching it in Python (using only the standard library), however the idea can be implemented irrespective of language. Note that though Python is often considered slow (due to being an interpreted language), bitwise operators on its integers take fixedly many Python instructions, so the only thing that changes as they become arbitrarily long is the underlying C implementation, which is much more efficient. This allowed David Buchanan's implementation to run at 1080p60fps (that is, 124416000 cells/second) on his machine.

Bitwise operators are functions represented by single infix symbols, like +, -, * and /, except mapping boolean logical functions across bits of numbers' binary representations. ~ maps NOT, and by two's complement is equivalent to taking -1 minus the input (which is useful for reflecting upper-exclusive zero-indexed list indices[n 1]), | maps OR, ^ maps XOR, & maps AND and << and >> shift the first parameter left and right respectively by numbers of bits corresponding with the second (ie. 0b101<<2=0b10100), right-shifts truncating noninteger parts (so they correspond with multiplications and floor-divisions respectively by 2**(their second input)).

A reasonably good grasp of Python, in particular its list comprehensions and lambda functions, is helpful.

Idea

We create an integer of binary length equal to the simulation area's width times height times cell length. For outer-totalistic rules, every cell will be a subsequence of four bits within the integer, off will be represented by 0000 and on by 0001. This seems inefficient but in the process of simulating an iteration we will generate an intermediate integer where each cell is instead the sum of its neighbourhood including itself (where 1001 is with all nine cells being on), then we will identify the cells that are off with three on neighbours or on with two or three (though because they include themselves, it is instead three or four) by generating other integers that have 0001 in only these cells, then we will OR these together to get the resultant state.

We know the amount to shift left and right to displace by one cell, four bits, so we add to the state itself shifted by 4 left and right. It will be encoded with consecutive rows concatenated, so to shift the state up and down we will shift by four bits times the width. (We can enact the summing in one dimension then the other to only have four shift-and-OR operations instead of eight.)

Once we have this summed list, we would like a function that maps specific sum values to 0001 and others to 0000, we can determine whether a cell's sum value matches a given value by XORing it with the value's NOT (so that only cells with the value become 1111) then "folding" by ANDing with itself shifted right by two bits (such that the third and fourth 1s remain so only if the first and second are also), then one bit (such that the fourth 1 depends on all four), then discarding the preceding three by ANDing with 0001. This all seems at first a very arbitrary way to do it but each operation can be done across all cells simultaneously by enacting it upon the entire integer. This can be done once for value 0011 and again for 0100 (ANDed with the original state to ensure only on cells with four neighbours including themselves remain on), then these can be ORed together to yield the state after an iteration.

A problem with this way is that the leftmost cell in a row will believe the rightmost one in an adjacent row to be to its left, so there will be horizontal toroidal scrolling but not quite aligned with a square, and the top and bottom edges will be bounded. To fix this, we will create a one-cell margin along each edge, that we can put shifted copies of the inside of the opposing edge into to get aligned toroidal scrolling as well (and even reverse them to get other manifolds (Möbius strips, Klein bottles and real projective planes)).

To simulate isotropic non-totalistic rules (and even MAP ones), we can either have each cell be instead a 3*3 square within the integer's row-wise representation (so that it can contain its neighbourhood shrunk to a bit per cell) or an 8*1 rectangle (to contain all neighbour cells excluding itself (for which we will use its initial value from before the summation)), our program will support both (determined by the byteINT variable).

We want a byte per cell for memory efficiency (a 12.5% improvement seems minor but lets cells at indexes be more efficiently queried in arrays packed without delimiters due to processor word sizes), and so that we can encode it to a YUV4MPEG video, which accepts a byte per pixel. (Buchanan's implementation of Conway's game of life uses four bits per cell and pipes its output through gzip, which is able to efficiently inflate it by inserting 0000s preceding cells.)

To fill each cell's 8*1 rectangle with its neighbours excluding itself, we can think of the problem equivalently as mapping it to a different index within each.

        |        |        
--------+--------+--------
        |       o|        
--------+--------+--------
        |        |        
First, we OR together itself shifted by (1,0), (0,1) and (-1,-1)
        |       o|        
--------+--------+--------
        |        |o       
--------+--------+--------
        |      o |        
Then we OR this with (itself shifted by (3,0) AND a constant that is 01100000 for each bit) and itself shifted by (-11,0)
    o   |       o|  o     
--------+--------+--------
     o  |        |o       
--------+--------+--------
   o    |      o | o      
The indexes of the state of the central cell within each neighbour are thus
    3   |       0|  5     
--------+--------+--------
     2  |        |7       
--------+--------+--------
   4    |      1 | 6      

So after this is enacted, each cell's last bit will be the cell south of it, the second-last will be north, then west, north-west, south-west, north-east, south-east and east.

First steps

Because we will be printing to the terminal (to begin with), the patterns will be small so will need to be simulated more slowly than the maximum speed the computer can handle (which will make all of the interesting things happen before we can see), so we will import the sleep() function.

functools' reduce() is a function that, given an input function (with two inputs), iterable (a list, tuple or generator expression) and optional initial value (without which it will defer to the iterable's first element), computes the function on the initial value and the first element of the iterable, then on the result of this and the second element, and so on (ie. reduce(int.__multiply,range(1,5))=((1*2)*3)*4), which we will use to OR together integers (more efficiently than by summing them). This will not be used during the simulation itself, only initialisation of constants.

itertools' groupby() intakes an iterable and returns a run-length encoding of it, which will be useful for generating RLEs of our patterns.

lap() is a function for our convenience, like map() but returning a list instead of a generator expression (like Python 2).

manifold will be, for axes (x,y), whether neighbourhoods exceeding the edge should perceive the outside as being empty (0), scroll around in the axis (1) or scroll and flip position in the other axis (2).

from time import sleep
from functools import reduce
ORsum=(lambda l: reduce(int.__or__,l,0))
from itertools import groupby
lap=(lambda func,*iterables: list(map(func,*iterables)))
digits={"0","1","2","3","4","5","6","7","8","9"}
manifold=(2,2)
FPS=10

Our simulator will work with arbitrary INT rules, but recognise OT ones as such to simulate in the more efficient way. (You could also make special-case optimisations for other things like INT rules encoding von Neumann rules.) The index of each of the 512 neighbourhood states is the number of which it's the binary representation, ie.

 oo
oo 
 o 

is 0b011_110_010=242.

transitions=tuple(a+t for a in ("b","s") for t in ("0c","1c","2c","3c","2n","4c","1e","2a","2k","3n","3i","4n","3q","3y","4y","5e","2e","3a","3j","4a","4w","5a","3k","4q","4k","5j","5k","6e","3e","4r","4j","5n","5i","6a","5q","5y","6k","7e","2i","3r","4i","4t","5r","4z","6i","4e","5c","6c","7c","6n","8c"))
unreducedTransitions=(0,1,6,7,1,2,7,10,6,7,16,17,8,9,18,19,51,52,57,58,52,53,58,61,57,58,67,68,59,60,69,70,6,8,16,18,7,9,17,19,38,39,28,29,39,40,29,32,57,59,67,69,58,60,68,70,89,90,79,80,90,91,80,83,1,2,8,9,4,3,12,11,7,10,18,19,12,11,20,21,52,53,59,60,55,54,63,62,58,61,69,70,63,62,71,72,8,13,22,24,12,14,23,25,39,41,30,31,43,42,34,33,59,64,73,75,63,65,74,76,90,92,81,82,94,93,85,84,6,8,38,39,8,13,39,41,16,18,28,29,22,24,30,31,57,59,89,90,59,64,90,92,67,69,79,80,73,75,81,82,16,22,28,30,18,24,29,31,28,30,45,46,30,35,46,47,67,73,79,81,69,75,80,82,79,81,96,97,81,86,97,98,7,9,39,40,12,14,43,42,17,19,29,32,23,25,34,33,58,60,90,91,63,65,94,93,68,70,80,83,74,76,85,84,18,24,30,35,20,26,34,36,29,31,46,47,34,36,49,48,69,75,81,86,71,77,85,87,80,82,97,98,85,87,100,99,1,4,8,12,2,3,9,11,8,12,22,23,13,14,24,25,52,55,59,63,53,54,60,62,59,63,73,74,64,65,75,76,7,12,18,20,10,11,19,21,39,43,30,34,41,42,31,33,58,63,69,71,61,62,70,72,90,94,81,85,92,93,82,84,2,3,13,14,3,5,14,15,9,11,24,25,14,15,26,27,53,54,64,65,54,56,65,66,60,62,75,76,65,66,77,78,9,14,24,26,11,15,25,27,40,42,35,36,42,44,36,37,60,65,75,77,62,66,76,78,91,93,86,87,93,95,87,88,7,12,39,43,9,14,40,42,18,20,30,34,24,26,35,36,58,63,90,94,60,65,91,93,69,71,81,85,75,77,86,87,17,23,29,34,19,25,32,33,29,34,46,49,31,36,47,48,68,74,80,85,70,76,83,84,80,85,97,100,82,87,98,99,10,11,41,42,11,15,42,44,19,21,31,33,25,27,36,37,61,62,92,93,62,66,93,95,70,72,82,84,76,78,87,88,19,25,31,36,21,27,33,37,32,33,47,48,33,37,48,50,70,76,82,87,72,78,84,88,83,84,98,99,84,88,99,101)
totalTransitions=[{"c"},
                  {"c","e"},
                  {"c","e","k","a","i","n"},
                  {"c","e","k","a","i","n","y","q","j","r"},
                  {"c","e","k","a","i","n","y","q","j","r","t","w","z"},
                  {"c","e","k","a","i","n","y","q","j","r"},
                  {"c","e","k","a","i","n"},
                  {"c","e"},
                  {"c"}]
  • transitions is the list of names of INT transitions.
  • unreducedTransitions is the index within transitions of each neighbourhood state (as computed by manually annotating each of the 102 transitions' canonical (lexicographically minimal) representation and reducing then indexing each neighbourhood state).
  • totalTransitions is the table from the INT article.

Initialisation

Now we will create a function for setting the rule from a rulestring.

def setRule(string):
    global rulestring,OT,byteINT,ruleTotalTransitions,rule,totalSums,minuses,rulestring,RLErulestring,dualRule,cellWidth,cellHeight,cellBits
    rulestring=string.lower().replace("/","").replace("b","").split("s")
    byteINT=True
    ruleTotalTransitions=[[set() for __ in range(9)] for _ in range(2)]
    for i,r in enumerate(rulestring):
        accumulator=""
        for t in reversed(r):
            if t in {"0","1","2","3","4","5","6","7","8"}:
                ruleTotalTransitions[i][int(t)]=(totalTransitions[int(t)] if accumulator=="" else totalTransitions[int(t)]-set(accumulator) if "-" in accumulator else set(accumulator))
                accumulator=""
            else:
                accumulator+=t
    #print(ruleTotalTransitions)
    rule={("s" if i else "b")+str(j)+t for i,s in enumerate(ruleTotalTransitions) for j,ss in enumerate(s) for t in ss}
    totalSums=tuple(tuple(sum((("s" if i else "b")+str(j)+t in rule) for t in s) for j,s in enumerate(totalTransitions)) for i in range(2))
    minuses=tuple(tuple(--0--len(t)//2<j<len(t) for j,t in zip(i,totalTransitions)) for i in totalSums)
    (rulestring,RLErulestring)=(b+s.join("".join("".join(str(j)*(y!=0)+("-"*m+"".join(t for k,t in enumerate(u) if (("s" if i else "b")+str(j)+str(t) in rule)^m))*(y!=len(u))) for j,(y,u,m) in enumerate(zip(w,totalTransitions,x))) for i,(w,x) in enumerate(zip(totalSums,minuses))) for b,s in zip(("b","B"),("s","/S")))
    OT=not(any(any((i in j) for j in rulestring) for i in totalTransitions[4]))
    dualRule=[lap(rule.__contains__,transitions[51*i:51*(i+1)]) for i in range(2)] #the rule that is dual
    rule=lap(rule.__contains__,transitions)
    checkTransition=(lambda transition: rule[transitions.index(transition)])
    rule=lap(rule.__getitem__,unreducedTransitions)
    cellWidth=(4 if OT else 8 if byteINT else 3)
    cellHeight=(1 if OT or byteINT else 3)
    cellBits=cellWidth*cellHeight
setRule(input("Which rule would you like? "))
  • rulestring is the sanitised input.
  • byteINT (as mentioned earlier) is whether the cells are 8*1 instead of 3*3 (True for our purposes).
  • ruleTotalTransitions is, for each OT transition, the set of INT ones included.
  • rule begins as the set of names of enabled INT transitions then becomes a list of 102 booleans of whether each INT transition is enabled, then 512 for each neighbourhood state (meaning this implementation works just as well with MAP rules).
  • totalSums is the number of enabled INT transitions in each OT transition.
  • minuses is, for each OT transition, whether there is conventionally a minus and a string of excluded INT transitions instead of included (ie. 'b3-qs23'), decided by whether it would minimise the length (true only if it's shorter to do so).
  • rulestring is in the Catagolue format (ie. 'b3s23').
  • RLErulestring is in the RLE format (ie. 'B3/S23').
  • OT is whether the rule is outer-totalistic.
  • dualRule is the rule split into birth and survival conditions.
  • cellBits is the number of bits in a cell, cellWidth is per row.

Now we will create another function, for setting the board's dimensions and some other parameters that we will use later for convenience given a width and height.

def setBoard(w,h):
    global boardWidth,boardHeight,WIDTH,HEIGHT,STATE_BYTE_LENGTH,innerCOLSHIFT,COLSHIFT,WRAPSHIFT,BIAS,WRAP_MASK,lastColumn,firstColumn,MASK_1,emptyColumns
    boardWidth=w;boardHeight=h
    WIDTH=boardWidth+2;HEIGHT=boardHeight+2
    STATE_BYTE_LENGTH=(boardWidth*boardHeight//2**OT if OT or byteINT else --0--9*boardWidth*boardHeight//8)
    innerCOLSHIFT=cellWidth*boardWidth
    COLSHIFT=cellBits*WIDTH
    WRAPSHIFT=COLSHIFT*boardHeight
    BIAS=((WIDTH+1)*cellWidth if OT or byteINT else (WIDTH*3+1)*4)
    unitNeighbourhood=(0xf if OT else 0xff if byteINT else ORsum(0b111<<i*WIDTH*3 for i in range(3)))
    WRAP_MASK=ORsum(unitNeighbourhood<<cellWidth*i+COLSHIFT for i in range(WIDTH))
    lastColumn=ORsum(unitNeighbourhood<<i*COLSHIFT for i in range(HEIGHT))
    firstColumn=ORsum(unitNeighbourhood<<(i+(OT or byteINT))*COLSHIFT-(cellWidth if OT or byteINT else 3*(1-WIDTH)) for i in range(HEIGHT))
    BLIT_MASK_1=( int.from_bytes(WIDTH*HEIGHT//2**OT*(b"\x11" if OT else b"\x01"),"little")
                 if OT or byteINT else
                  ORsum(1<<cellWidth*(x+cellHeight*WIDTH*y) for x in range(WIDTH) for y in range(HEIGHT))<<(1+3*WIDTH)*(not (OT or byteINT)))
    MASK_1=(1<<COLSHIFT*(HEIGHT-1))-1&~((1<<COLSHIFT)-1|firstColumn|lastColumn)&BLIT_MASK_1
    emptyColumns=ORsum(0b1100000<<8*x+COLSHIFT*y for x in range(WIDTH) for y in range(HEIGHT))

This seems like a lot but doesn't all need to be remembered at once.

  • boardWidth and boardHeight correspond with the simulation area within the margins, WIDTH and HEIGHT include them.
  • COLSHIFT is the number of bits to shift by a cell vertically.
  • innerCOLSHIFT is the number to shift from an outermost cell inside the margin in a column to the one in the opposing margin, and WRAPSHIFT correspondingly from a top inner edge cell to a bottom outer one (or bottom inner to top outer).
  • BIAS is the number of bits to displace the final bit in an integer (representing the smallest value) to correspond with the bit of the final cell within the margins.
  • unitNeighbourhood is only used for generating the others and is the amount by which to multiply a 1 at the index of each cell to get the cell's "footprint."
  • BLIT_MASK_1 is 0001 for each cell (or for the other modes, 1 in only the bit representing itself in the sums).
  • WRAP_MASK is the bottommost[n 2] row within the margins (we AND with it before shifting up to avoid spilling over above it), lastColumn and firstColumn are similar but for horizontal scrolling (we don't need one for downward scrolling due to bitshifts' truncation).
  • MASK_1 is BLIT_MASK_1 but only inside the margins.
  • emptyColumns is the byteINT-specific constant mentioned earlier, 01100000 for all cells.

Now we will create the masks by which to XOR before the folding to determine cells matching their neighbourhood sums. Remembering that sums for cells' survival transitions will contain themselves so are incremented, for Conway's game of life, the b3s23 rule becomes b3s34. For any OT rule, we can group these sums into three categories. 3 is mutual to both conditions, so once we have the integers for cells with neighbour sums, those with three neighbours are on irrespective of their initial state, 4 is survival-only so is ANDed with the initial state first, and if there were to be a birth-only one, it would be ANDed with the initial state's NOT first. When there are multiple, the sums of each category are ORed together to generate masks.

def setTransitions():
    global OTtransitions,INTtransitions,irr,andnts,ands
    if OT:
        OTtransitions=tuple(tuple((str(i) in s) for i in range(9)) for s in rulestring.replace("b","").split("s"))
        (irr,andnts,ands)=(tuple(MASK_1*(0xf^i) for i in range(9) if OTtransitions[0][i]^o and OTtransitions[1][i-1]^a) for o in range(2) for a in range(1+(not o))) #(tuple(MASK_1*(15^i) for i,(b,s) in enumerate(OTtransitions[0],OTtransitions[1][-1:]+OTtransitions[1][:7]) if b^o and s^a) for o in range(2) for a in range(1+(not o)))
    else:
        if byteINT:
            (irr,andnts,ands)=(tuple({MASK_1*ORsum((~i>>j&1)<<(5,0,3,7,-1,2,6,1,4)[j] for j in tuple(range(4))+tuple(range(5,9))) for k,(b,s) in enumerate(zip(*dualRule)) if b^o and s^a for i,r in enumerate(unreducedTransitions) if r%51==k and (s if r//51 else b)}) for o in range(2) for a in range(1+(not o)))
        else:
            INTtransitions=tuple(MASK_1*ORsum((~i&1<<3*j)<<3*(WIDTH-1)*j for j in range(3))>>WIDTH*3+1 for i,r in enumerate(rule) if r)

The OT one is simple enough (they're multiplied by MASK_1 because they're applied over all cells simultaneously, and remember, we NOT the desired neighbourhood states so that those that match will XOR with these NOTs for all-1s), the INT transitions for the 3*3 mode don't need to be categorised to be considered with the initial state because it's stored within each neighbourhood (which, you will see later, makes it easier to implement). The byteINT one iterates over zip(*dualRule), which transposes it (meaning that instead of being a list of lists of transitions for each of birth and survival, it's a list of lists of two booleans (whether its birth and survival conditions are met) for each transition), and for each one checks whether it satisfies the combination of birth and survival it's iterating over in the outermost loop (for irr,andnts,ands) and if so, iterates over unreducedTransitions and, for each that has the same transition and has its corresponding condition (survival or birth) satisfied by the dualRule element, generates the mask for each cell by converting the neighbour indexes from our previous indexing convention to the byteINT one with the otherwise-mysterious-looking tuple, then removes duplicates generated by this method by converting it to a set and back.

String representations

Now that we have our constants set up, we will make a function for printing a board state, given whether to exclude the cells in the margins and the symbols to use for off and on cells and line-joins.

boardStr=(lambda board,inner=True,b='b',o='o',j='\n': j.join((lambda i: "".join(o if i>>cellWidth*j&1 else b for j in range(inner,i.bit_length()//cellWidth+2-inner)))(board>>(cellWidth*WIDTH*i if OT or byteINT else 3*WIDTH*(i*3+1)+1)&(1<<cellWidth*WIDTH)-1) for i in range(inner,HEIGHT-inner)))

Do not fear, we will explain. We begin with

range(inner,HEIGHT-inner)

, then apply

board>>(cellWidth*WIDTH*i if OT or byteINT else 3*WIDTH*(i*3+1)+1)&(1<<cellWidth*WIDTH)-1

to get the integer for the row, then

lambda i: "".join(o if i>>cellWidth*j&1 else b for j in range(inner,i.bit_length()//cellWidth+2-inner))

to map it to o's and b's (note that we would use WIDTH-inner for the upper(-exclusive) bound but this way we truncate the iteration and don't print trailing off cell), then join these rows with the joining symbol

def printBoard(board,inner=True,symbol="="):
    print(symbol*(boardWidth if inner else WIDTH)+"\n"+boardStr(board,inner,' ','o','\n'))

We include a row of the symbol at the top so we have lines delimiting frames when printing them out consecutively, \n is the shortcut for the line-break character.


Our RLE-generating function will have a setting for whether to use the convention for RLEs seen in Mark Niemiec's posts (where two consecutive ons or offs are represented as 'oo' or 'bb' instead of '2o' or '2b') for additional whimsy.

niemiec=True
RLE=(lambda board,inner=True,includeRule=False: ("x ="+str(boardWidth)+", y = "+str(boardHeight)+", rule = "+RLErulestring+"\n")*includeRule+"".join((lambda l,q: l*2 if niemiec and q==2 else l if q==1 else str(q)+l)(l,len(list(q))) for l,q in groupby(boardStr(board,inner,'b','o','$')))+"!")

We use boardStr, put it through groupby() and convert each element (l,q) of its output to l,len(list(q)) (because it returns an iterable of tuples of element values and lists of occurrences (which here are all only copies of it but are useful to have instead of only the quantity when you use a function as a key)), then we apply

lambda l,q: l*2 if niemiec and q==2 else l if q==1 else str(q)+l

and join it together, and attach the header and exclam. To load RLEs, we can't do any such nice closed forms, because we must change variables in-place and iterate over things across text at intervals unknown prior. This is a function for proceeding to the numbers or rule in the header, for our convenience.

def proceed(RLE,i):
    while RLE[i] in set("x y, =rule"):
        i+=1
    return(i)

We will begin with the header-parsing part.

def loadRLE(RLE):
   if RLE[0]=='x':
       i=1
       size=[]
       for di in range(2):
           stringer=""
           i=proceed(RLE,i)
           for i,r in enumerate(RLE[i:],start=i):
               if r in digits:
                   stringer+=r
               else:
                   break
           size.append(int(stringer))
       setBoard(*size)
       i=proceed(RLE,i)
       setRule(RLE[i:(i:=RLE.index('\n'))])
   else:
i=0

For each dimension, it skips along with proceed(), then iterates through and appends characters to its stringer while they're digits, then sets the rule to the part remaining until the linebreak. Now for parsing the RLE itself!

    arm=0
    output=0
    endOfTheLine=False
    stringer="" #like finger
    #print(RLE[i:])
    for r in RLE[i:].replace(' ',''):
        if r in digits:
            stringer+=r
        else:
            stringer=(int(stringer) if stringer else 1)
            if r=="$":
                arm=(arm//boardWidth-endOfTheLine+stringer)*boardWidth
                endOfTheLine=False
            else:
                if r=="o":
                    output|=ORsum(1<<s%boardWidth*cellWidth for s in range(arm,arm+stringer))<<arm//boardWidth*COLSHIFT+BIAS
                elif r=="!":
                    break
                if r and r!='\n':
                    arm+=stringer
                    endOfTheLine=not(arm%boardWidth)
            stringer=""
    return(output)

arm tells us the location in the output integer to be appending to (by ORing with), stringer accumulates characters while they're digits then converts to an int to be the number of times to enact whichever character breaks the run of digits. We want arm to be a multiple of boardWidth, so we floor-divide it by boardWidth then multiply it back up, but in between add the stringer (to go down stringer times). However, if the arm has already been incremented to reach the end of the line, it will wrap around and floor-divide to one greater than the line index, so endOfTheLine is used to tell whether it has just been shifted by a $ or such a case has occurred. The output-ORing part isn't too complicated either, it inflates the range to an integer with cellWidth intervals, then shifts it by the y position (which from arm's perspective is inside the margins so must be converted) and shifts by the BIAS.

Manifolds

This section can be skipped over if you would like only to get to the simple implementation, but the parts are explained in roughly the order you will have to define them.

As mentioned earlier in setBoard(), we are going to shift the board and AND with "masks" to copy inner edge cells into opposing margin ones, to include in the sums on the opposite side of the board. This is reasonably simple for toruses, but the other ones require reversing the cells' order within the edges. This would be trivial if they were stored in lists or tuples, we would only call reversed() on them, but with bits in a binary representation we must be slightly more clever. The Stanford bit-twiddling hacks page explains that you can do it for a length-2**n number in 5*n operations by swapping pairs of substrings of increasing length, for instance:

v=v>> 1&0x55555555|(v&0x55555555)<< 1
v=v>> 2&0x33333333|(v&0x33333333)<< 2
v=v>> 4&0x0F0F0F0F|(v&0x0F0F0F0F)<< 4
v=v>> 8&0x00FF00FF|(v&0x00FF00FF)<< 8
v=v>>16&0x0000FFFF|(v&0x0000FFFF)<<16

For lengths that are not powers of 2, you can reverse them by shifting them left to the centre of an integer of the next power of 2 up, enacting it for that and shifting them back right. For instance, for length 48,

x<<=8 x=x>> 1&0x5555555555555555|(x&0x5555555555555555)<< 1 x=x>> 2&0x3333333333333333|(x&0x3333333333333333)<< 2 x=x>> 4&0x0F0F0F0F0F0F0F0F|(x&0x0F0F0F0F0F0F0F0F)<< 4 x=x>> 8&0x00FF00FF00FF00FF|(x&0x00FF00FF00FF00FF)<< 8 x=x>>16&0x0000FFFF0000FFFF|(x&0x0000FFFF0000FFFF)<<16 x=x>>32&0x00000000FFFFFFFF|(x&0x00000000FFFFFFFF)<<32 x>>=8

However, by recursively applying simple arithmetic across each of these lines (bringing common shifts through), you can simplify it from 14 to 9 bitshifts.

x=x &0x00AAAAAAAAAAAAAA|(x&0x0055555555555555)<< 2 x=x &0x0199999999999999|(x&0x0066666666666666)<< 4 x=x &0x0787878787878787|(x&0x0078787878787878)<< 8 x=x>> 7&0x00FF00FF00FF00FF|(x&0x007F807F807F807F)<< 9 x=x>>16&0x0000FFFF0000FFFF|(x&0x0000FFFF0000FFFF)<<16 x=x>>40&0x0000000000FFFFFF|(x&0x00000000FFFFFFFF)<<24

Note that with both functions, the input integer doesn't remain its original length, in the intermediate representation they must 'cast' some bits to indexes at the left edge of the next power of 2 up from the input width (ie. a 48-bit integer will become 64-bit), however this isn't a problem because most programming languages only provide builtin types for integers of power-of-2 lengths anyway. To run it as quickly as possible, we would like to make a function like this second one, for a specified length. It will be faster if it has no loops and uses constants instead of variables, so we will make a function for generating its definition, then execute it to define the function. First, a function for generating the mask to use in each iteration of the unsimplified version

andMasks=(lambda inputWidth: (lambda bitWidth: tuple(reduce(lambda n,j: n|n<<(1<<j),range(i+1,bitWidth),(1<<(1<<i))-1) for i in range(bitWidth)))((inputWidth-1).bit_length()))

Do not worry, this is our only use of reduce() outside ORsum(). We first find the exponent of the smallest power of 2 that is greater than or equal to the width with

(inputWidth-1).bit_length()

(that is the number of iterations the function we generate must run), use it as a range, use

(1<<(1<<i))-1

(that creates an integer of 2**i consecutive 1s), then call

reduce(lambda n,j: n|n<<(1<<j),range(i+1,bitWidth))

initialised with this to tile it to the entire width (like the idea with the folding with ANDs but in reverse). Now with this, we can begin.

Our convention is to encode each line (element of the masks list) as two tuples (one for either side of the OR) of three elements, the 0th of which being the mask with which to AND, the 1th the bitshift and the 2th the boolean of whether the AND is to be enacted before the bitshift. When called with axis=0, it returns only a function for reversing an integer of the specified length, however with axis=1, it will generate one for Möbius scrolling in the x axis (reversing the vertical edges on the left and right sides), and with axis=2 in the y axis. We use the same process but then 'inflate' the bits we move so there are three we ignore in between each.

def reversalParameters(axis=0,inputWidth=WIDTH):
    if axis:
        inputWidth=(WIDTH if axis==2 else HEIGHT)
    masks=[[[m,1<<i,False],[m,0,True]] for i,m in enumerate(andMasks(inputWidth))]
    exceeding=(1<<(inputWidth-1).bit_length())-inputWidth
    for i,m in enumerate(masks):
        m[1][1]=m[0][1]-2*(1<<i)
    bytewise=(lambda m: [ORsum((m[0]&1<<i)<<(cellWidth-1)*i for i in range(1<<(inputWidth-1).bit_length()))<<(0 if OT or byteINT else 3*WIDTH+1),cellWidth*m[1],m[2]])
    if axis:
        masks=[[( [(lambda n: n|n<<cellWidth*(WIDTH-1))(ORsum((n&1<<(cellWidth*i if OT or byteINT else 3*(WIDTH+i)+1))<<cellWidth*(cellHeight*WIDTH-1)*i for i in range(1<<(HEIGHT-1).bit_length()))),s*cellHeight*WIDTH,o]
                 if axis==1 else
                  [n|n<<COLSHIFT*(HEIGHT-1),s,o]) for n,s,o in map(bytewise,m)] for m in masks]
    conditionalSwap=(lambda x,b,t,f,br: '('*br+x+(t+')'*br+f if b else f+')'*br+t))
    indent=(lambda i,t: " "*((i-len(t))*(i>len(t)))+t)
    marge=[max(len(str((abs if i else hex)(n[i]))) for m in masks for n in m) for i in range(2)]
    return("def reverseBits"+("Y" if axis==2 else "X" if axis else "")+"(x):\n    "+"\n    ".join("x="+'|'.join(conditionalSwap('x',(n[2] and n[1]),"&"+indent(marge[0],str(hex(n[0]))),((">>" if n[1]>0 else "<<")+indent(marge[1],str(abs(n[1]))) if n[1] else ' '*(2+marge[1])),(n[1] and n[2])) for n in m) for m in masks)+"\n    return(x)")
  • exceeding tells us by how much the first power of 2 up from the input width exceeds it (and is 0 if it's a power of 2 itself).
  • bytewise enacts the bitwise inflation mentioned earlier. Ironically, this could be done in logarithmic time instead of linear similarly to the reversal that it's for (except instead shifting first the left half then the left half of each half in its new position and so on), but it only runs once at the beginning of the program so doesn't matter. Note that due to the maximum length in an intermediate phase being the first power of two up, we must use that as our range's upper limit, not inputWidth itself.
  • conditionalSwap creates the function, given a name for the variable (x), two operations (t and f) a boolean (b) of whether to enact t inside (the aforementioned 2th element of the tuple for this side of the OR), and whether to include brackets (not necessary if the bitshifts are inside due to them taking precedence over bitwise AND).
  • indent is for formatting our output function definitions nicely (with operations lined up across rows :-).
  • marge has the margins for the hexadecimal bitmasks and denary bitshifts.

If we're enacting scrolling in the y axis, the bytewise-inflated one is already lined up with the bottom edge, we could generate this function again for the top edge by shifting the masks up to the top of the board and run one version on the bottom row and the other on the top, however because the shifts are by the same values we can OR these functions' masks together and input the rows ORed together. For the x axis (vertical edges on the left and right sides), we enact inflation again, from intervals of cellWidth to cellWidth*cellHeight*WIDTH (COLSHIFT) (from the bottom row to the rightmost column), then OR it with itself shifted to the leftmost column to be able to do the same thing.

It seems odd that we should set the second shift value to 0 then subtract twice the calculation for the first one's value from its value instead of only setting it to -(1<<i), however this function shown above only works for power-of-2 values of WIDTH and HEIGHT, it allows us to make others work (and implement our optimisations for them) by adding to and subtracting from only the first shift value then generating the second from that. In between the line assigning exceeding and the one amending the shifts, we will add

    if exceeding:
        endShifts=[-exceeding//2,exceeding//2]
        for i,m in enumerate(masks):
            #print(m[0][1])
            shifter=(lambda i,m: min(m[0][1],m[0][1]-2*(1<<i),key=abs))
            if -endShifts[0]<=shifter(i,m):
                m[0][1]+=endShifts[0]
                m[1][0]>>=-endShifts[0]
                break
            else:
                m[0][0]>>=m[0][1]-endShifts[0]
                m[0][1]-=endShifts[0]
                m[0][2]=True
                m[1][0]>>=-endShifts[0]
                common=(shifter(i,m) if m[0][1]^(m[0][1]-2*(1<<i))>0 else 0)
                if common:
                    endShifts[0]=-common
                    if endShifts[0]<m[0][1]:
                        m[0][1]=0
                else:
                    break
        masks[-1][0][0]>>=endShifts[1]
        masks[-1][0][1]+=endShifts[1]

Our convention is that positive shift values are rightwards and negative leftwards.

  • endShifts are the amount by which to bitshift at the beginning and end, the for loop is continually applying the first one to the masks to move the bitshifts to their outside so that the mutual one can become endShifts' next value.
  • shifter returns the minimum absolute value of the two shifts.
  • common is the minimum value if they have the same sign, after the end shift has been brought forwards, the loop from the beginning stops once they have not and there is no longer anything to carry through.

The endShifts line works because, while floor division (that rounds towards minus infinity) shares precedence with regular so would be enacted before subtraction (ie. 0-exceeding//2=0-(exceeding//2)), negation supersedes it (ie. -exceeding//2=(-exceeding)//2). (See the spaceship operator.)

Now we must only execute it to define our functions reverseBitsX() and reverseBitsY()

for i in range(3):
    #print(reversalParameters(i))
    exec(reversalParameters(i))

Simulation

This is the exciting part, and due to all of our earlier work can be implemented very elegantly

def iterateCellular(state):
    if manifold[0]:
        state|=(lambda x: reverseBitsX(x) if manifold[0]==2 else x)(state>>innerCOLSHIFT&lastColumn|state<<innerCOLSHIFT&firstColumn)
    if manifold[1]:
        state|=(lambda x: reverseBitsY(x) if manifold[1]==2 else x)(state>>WRAPSHIFT|(state&WRAP_MASK)<<WRAPSHIFT)
    summed=( (lambda s: (s>>COLSHIFT)+s+(s<<COLSHIFT))((state>>4)+state+(state<<4))
            if OT else
             (lambda s: s<<11|s|s>>3&emptyColumns)(state>>1|state>>COLSHIFT|state<<COLSHIFT+1)
            if byteINT else
             (lambda s: summed>>6*WIDTH|s|summed<<6*WIDTH)(state>>2|state|state<<2))
    masks=( [ORsum((lambda s: s if OT else s&s>>4)((lambda s: s&s>>2)((lambda s: s&s>>1)(summed^r))) for r in o) for o in (irr,andnts,ands)]
           if OT or byteINT else
            ORsum((lambda s: s>>3*WIDTH&s&s<<3*WIDTH)((lambda s: s>>1&s&s<<1)(summed^r)) for r in INTtransitions))
    return((state&masks[2]|masks[0]|masks[1]&~state if OT or byteINT else masks)&MASK_1)

The scrolling is done with single functions per axis on both edges, as explained, summed is calculated by adding in the x axis followed by the y for OT rules (or ORing for INT ones), and in the byteINT mode with the inner-exclusive neighbourhood mapping explained at the beginning. Then for each transition we XOR the summed state with it, 'fold' it twice (so that the final bit depends on the matching) and OR these together by category, then AND these with the initial state or its NOT or neither, before ORing these together (or not doing any of that in the non-byteINT mode (for which this function is thus slightly simpler to implement than an OT one)) and ANDing with the MASK_1 to get the next state. Now, to simulate it, we need only

while True:
    state=iterateCellular(state,manifold==(0,0))
    print(RLE(state))
    printBoard(state)
    sleep(1/FPS)

Footnotes

  1. ie. if you have an x position, x, in a width-n chessboard you're reflecting horizontally, it will be mapped to n+~x
  2. The use of 'up' and 'down' as continuations of left and right bitshifts to columns instead of rows is somewhat misleading, the printing functions are little-endian to be slightly shorter so rightwards and downwards in the representations you will see when running it correspond with leftshifts, but fortunately our rules are isotropic so it doesn't cause too much confusion.

External links