Pattern Storage Discussion Thread

A forum for topics that don't fit elsewhere. Introduce yourselves to other members of the forums, discuss how your name evolves when written out in the Game of Life, or just tell us how you found it. Forum rules still apply.
Post Reply
Mathemagician314
Posts: 153
Joined: November 15th, 2023, 3:15 pm
Location: Toroidal Universe MKA-84

Pattern Storage Discussion Thread

Post by Mathemagician314 » February 24th, 2024, 2:54 pm

I don't really know if this belongs in the sandbox or not, but I've been kind of obsessed with finding different ways of storing patterns, and I kinda just wanted to discuss it here. Here are some basic ones, to start.

RLE (run length encoded) is the most common way of storing patterns; here's Wikipedia's say on how that works. They're a very basic way to store patterns, listing the bounding box and rule, and then listing the data of the pattern (again, see the Wikipedia article for how that works). The most obvious weakness is that they're extremely long and unwieldy; they are meant to be simple for storing and reading, not a compact identifier for specific objects.

Only slightly less common than RLEs are apgcodes, which are primarily used on Catalogue. Lifewiki has a very detailed and helpful article about how they work (they primarily use extended Wechsler format) linked here. They are much more compact than RLEs, and also give detail about what kind of object you're looking at. For me, the one downside of apgcodes is that the prefixes seem a little random -- for example, why xp2 instead of just p2? This is a very small nitpick (as I'm having trouble finding downsides), and otherwise apgcodes are extremely useful in my opinion.

These two are most commonly used, but SOF (Small Object Format -- both human-readable and compact), Macrocell (for larger patterns), and Mcell are sometimes used as well. A full list of commonly used file formats is linked here on the wiki.

Now, I was trying to think about what other kind of data storage could work to be compact and efficient, and I devised this simple method which I call base-10 encoding, or B10E for short (although it may have been thought of before -- sorry about this post if it has):
  • The string will be in the form x-y-n where x and y are the length and height of the bounding box, respectively. Then, to find n, replace all live cells with 1's and all dead cells with 0's. Then, read this as a string, starting from the top left and reading row by row, left to right and top to bottom. This will be a number in binary. Convert this number to base-10, and that number will be n.
I'm not so good at explaining things :wink: , so maybe an example will help:
  • Take a glider. It has a three-by-three bounding box. Converting cells to 0's and 1's, we get

    Code: Select all

    010
    001
    111
    Putting this together, we get the number 010001111. Converting to base 10, this becomes 128 + 8 + 4 + 2 + 1 = 143. So the glider is base-10 encoded as 3-3-143.
The above is fully reversible -- 143 converts to binary as 10001111. Because we know the bounding box has an area of 9, but the number is 8 digits long, we know that we have to put one 0 at the beginning. Then, we know there are three cells per row, so we get an array of 0's and 1's that we can convert to live and dead cells easily.

I like this process because it seems very natural and almost no subjective decisions are made. Also, it can work in any base -- for example, a glider in base-3 encoding would become 10-10-12022, and a glider in base-16 encoding would become 3-3-8F. So technically, you could make it as compact as you want (of course, after base 36 or something like that you run out of letters to substitute for digits).

In addition, you could also add information for it if you wanted to; for example, adding the rule integer of an outer-totalistic rule to the beginning of the string, or adding a suffix somehow specifiying the type of object, much in the way that apgcodes do (for example, S11 for an 11-bit still life, O3P2 for a oscillator with an average population of 3 and a period of 2 or F9H2V0P4 for a 2c/4 orthogonal spaceship with an minimum population of 9*). With these two things, a glider in CGOL would be base-10 encoded as 6152-3-3-143-F5H1V1P4 -- a little unwieldy, but it conveys a lot of information and the prefix and suffix would seem less prominent when encoding a larger object.

Anyway, this thread can also be used just to discuss these kind of things, which I find slightly fascinating (along with anything else information-theory related). I don't know if The Sandbox is the right place for this, but this doesn't seem totally academic, so I put it here. A mod can move it if they feel it's necessary. :) Is my idea of base-10 encoding far-fetched, or is the system reasonable? Tips and suggestions appreciated.

*The F is for "fish", which spaceships are sometimes referred to as -- yes, it might just be to be different from apgcodes, but I like the look of the F, and it's not totally unrelated to spaceships like Q in apgcodes. Yes, I am way to fond of nitpicking apgcodes, and yes, I will fight you (in a kind and polite manner).
Can we make a (28,3)c/84 spaceship??

Code: Select all

x = 3, y = 4, rule = B3-e4i5-a/S2-i3-a4cr5e6c
o$obo$b2o$2o!
[[ THEME PCA ]]

Code: Select all

x = 6, y = 5, rule = 2-ak34/2kn3-r4aijnr5c/5
.3A$.ABA$DAD2A$.ABADC$.3A2B!
[[ THEME BLUES ]]
[currently inactive]

User avatar
confocaloid
Posts: 4643
Joined: February 8th, 2022, 3:15 pm
Location: https://catagolue.hatsya.com/census/b3s234c/C4_4/xp62

Re: Pattern Storage Discussion Thread

Post by confocaloid » February 24th, 2024, 11:28 pm

Relevant discussion: viewtopic.php?p=159330#p159375
Mathemagician314 wrote:
February 24th, 2024, 2:54 pm
In addition, you could also add information for it if you wanted to; for example, adding the rule integer of an outer-totalistic rule to the beginning of the string, or adding a suffix somehow specifiying the type of object, much in the way that apgcodes do (for example, S11 for an 11-bit still life, O3P2 for a oscillator with an average population of 3 and a period of 2 or F9H2V0P4 for a 2c/4 orthogonal spaceship with an minimum population of 9*). With these two things, a glider in CGOL would be base-10 encoded as 6152-3-3-143-F5H1V1P4 -- a little unwieldy, but it conveys a lot of information and the prefix and suffix would seem less prominent when encoding a larger object.
Why not reuse existing Pentadecathlon ID descriptors for that? You could still prepend a single letter at the beginning if you wanted to simplify distinguishing still lives, oscillators and spaceships. For example, "O48P3" for the pulsar (period 3, minimum population 48) and "F9P4H2V0" for the lightweight spaceship.

There are also rotor descriptors:
hkoenig wrote:
October 29th, 2022, 4:55 pm
There have been a number of posts lately from relative newcomers asking if a small oscillator they just encountered is known. Usually it is, but there's no easy way to know that. So its probably time to remind people about the "knownrotors.txt" file that's part of dr. It is an encoded list of oscillator rotors in a format devised by Dean Hickerson.

I don't know if LifeLib includes routines to encode and decode Rotor Descriptors. If it doesn't, then adding that would probably be a worthwhile contribution by someone. I found it pretty easy to write/rewrite them for my own Life toolbox (Originally in C++, then Objective C, now Swift).

Maybe Catalogue should also include it with when it displays oscillators, too.

From dr.documentation.txt:

Code: Select all

Rotor descriptors
-----------------

A rotor descriptor is a compact, but human-readable, description of the
rotor of an oscillator.  It shows the size and shape of the rotor, how
many neighbors in the stator each rotor cell has, and what gen 0 looks
like.  For example, the rotor descriptor for aVerage (LifeLine 9.3,
SC 5.1.2) is:

p5 r9 3x5 .2..B 30@@@ .2..B

The first 3 'words' mean period 5, with a rotor consisting of 9 cells in
a 3x5 rectangle.  (The orientation is always chosen so that the rectangle
is at least as wide as it is tall.)  The rest of it is a picture of the
rotor in one generation, with each word representing 1 row:

 .2..B
 30@@@
 .2..B

The characters other than periods indicate rotor cells.  Digits 0 to 3
indicate cells that are dead in gen 0; characters '@', 'A', 'B', and 'C'
indicate live cells in gen 0.  (If you're using a rule other than Life,
you may see digits larger than 3 and letters after 'C'.)  If you change
'@' to '0', 'A' to '1', 'B' to '2', and 'C' to '3', you get:

 .2..2
 30000
 .2..2

Each number tells how many live neighbors that rotor cell has in the
stator.  With this information, it's fairly easy to determine what
the rotor looks like in every generation, without having to construct
the stator.  (Finding a minimal stator is still pretty tedious, even
with the help of lifesrc.)

A given oscillator will have many different rotor descriptors:  You can
change the orientation and choose which phase is gen 0.  The program
computes all of them and chooses the one that's lexicographically
smallest, with  '.' < '0' < '1' < '2' < '3' < '@' < 'A' < 'B' < 'C'.

The file "knownrotors" contains a list of rotor descriptors and names
of oscillators.  Usually these are given on 1 line, with 1 or more tabs
in between.  But if the rotor is large, then the rotor descriptor can
have carriage returns within it or after it.  (The spaces must still be
present.)  The name must fit on 1 line.  For example:

p3 r7 3x4 .1.C A.A. 1A1.			protein (DJB #6)

p4 r28 7x7 .1.0.1. A.101.A .A.@.A. 0@@.@@0 .A.@.A. A.101.A .1.0.1.
							monogram (SC 4.0.2)

p14 r24 8x9 ........3 ....A.A.1 ...10@@1. ...3.21A. .AA....3. A.1......
 0@@2..... B.A......						new 14.0.1

The name usually contains a reference to where the oscillator can be found,
either in the stamp collection ("SC"), or in David Buckingham's oscillator
collection ("DJB #"), or in the collection of billiard tables found by
dr.c ("new").
One other question is, how to represent patterns in universes other than the square tiling with Schläfli symbol {4,4}? Hex rules are handled by representing them on the square grid (and already one has to commit to some arbitrary choices regarding coordinates). Triangular rules are more complicated, since the triangles come in two types. For uniform non-regular tilings (such as snub square tiling), non-Euclidean tilings (such as heptagonal tiling), non-periodic tilings, three-dimensional patterns, one needs some other ideas.

Of course there are also multistate rules. Apgcodes were extended to support multistate rules, but they can become awkward; there may be a more compact and intuitive format.
127:1 B3/S234c User:Confocal/R (isotropic CA, incomplete)
Unlikely events happen.
My silence does not imply agreement, nor indifference. If I disagreed with something in the past, then please do not construe my silence as something that could change that.

unname4798
Posts: 1202
Joined: July 15th, 2023, 10:27 am
Location: On the highest skyscraper

Re: Pattern Storage Discussion Thread

Post by unname4798 » February 25th, 2024, 2:07 am

Base 64 for Patterns (B64fP):
Encoding:
unname4798 wrote: 1) Take a pattern, example: a glider.
2) Reading dead and alive cells as state 0 and 1 yields:

Code: Select all

010
001
111
3) Group cells into 6 cells:

Code: Select all

010001
111
4) Complete an incomplete group by adding zeroes after it:

Code: Select all

010001
111000
5) Convert every group from binary (base 2) to base 64 and add width to the resulting string:

Code: Select all

03Hu
6) (Optional) Include metadata to convey more information:

Code: Select all

0310C01!!0505Hu
Decoding:
unname4798 wrote: 1) Take a base 64 string:

Code: Select all

03Hu
2) Exclude width from the original string:

Code: Select all

Hu
3) Convert it to binary:

Code: Select all

010001
111000
4) Group cells into (width=3) cells:

Code: Select all

010
001
111
000
5) Remove extra lines:

Code: Select all

010
001
111
The resulting pattern is: a glider!
Metadata:
0310C01!!0505Hu
Red=width (2 digits)
Green=rulestring (3 digits)
Blue=in order: vertical speed (2), horizontal speed (2), period+1 (2), population (2)
This profile is sponsored by Unname Inc. (2022-2024)
Status: none.
Companies: NOT (Nihonium Orange Team)

Mathemagician314
Posts: 153
Joined: November 15th, 2023, 3:15 pm
Location: Toroidal Universe MKA-84

Re: Pattern Storage Discussion Thread

Post by Mathemagician314 » February 25th, 2024, 8:01 am

Cool! I don't know if there'd be anyway to use that B64tP in a practical way, since it seems like apgcodes are now the standard for compact representations of patterns, but it seems like B64tP would be shorter than apgcodes in the long run, even with metadata. As for the question of other tilings, I suppose you would have to convert them to the square plane to use B10E or B64tP (of course, you would then add one letter with the metadata, such as T for the triangular tiling and H for the hexagonal tiling). With triangles, I suppose you could merge every two triangles into one rhombus, then kind of shift or stretch the plane to make those rhombi into squares. Then you can encode the square normally, except you double the numbers representing the bounding box, and every square has two values (whose order matters). This might not be reversible -- I'd have to dig a little deeper to see if that would work. For the hexagonal tiling, take the bounding parallelogram of the pattern and then kinda shift the bottom and stretch the whole shape to make it a square. Then you can encode as usual. Again, this is just an idea, and I don't know if this would actually work.

A follow-up question: what's the minimum number of bits needed to store a pattern? Obviously, you can extend B10E to any arbitarily large base, just by adding arbitrarily many symbols, and so you can make any compact representation of any arbitrarily large patterns. But if you just had a string in binary, how short could you make it such that you can find the pattern? Would a variant of B10E or B64tP work, or would you have to try something different?
Can we make a (28,3)c/84 spaceship??

Code: Select all

x = 3, y = 4, rule = B3-e4i5-a/S2-i3-a4cr5e6c
o$obo$b2o$2o!
[[ THEME PCA ]]

Code: Select all

x = 6, y = 5, rule = 2-ak34/2kn3-r4aijnr5c/5
.3A$.ABA$DAD2A$.ABADC$.3A2B!
[[ THEME BLUES ]]
[currently inactive]

Post Reply