Pattern Storage Discussion Thread
Posted: February 24th, 2024, 2:54 pm
I don't really know if this belongs in the sandbox or not, but I've been kind of obsessed with finding different ways of storing patterns, and I kinda just wanted to discuss it here. Here are some basic ones, to start.
RLE (run length encoded) is the most common way of storing patterns; here's Wikipedia's say on how that works. They're a very basic way to store patterns, listing the bounding box and rule, and then listing the data of the pattern (again, see the Wikipedia article for how that works). The most obvious weakness is that they're extremely long and unwieldy; they are meant to be simple for storing and reading, not a compact identifier for specific objects.
Only slightly less common than RLEs are apgcodes, which are primarily used on Catalogue. Lifewiki has a very detailed and helpful article about how they work (they primarily use extended Wechsler format) linked here. They are much more compact than RLEs, and also give detail about what kind of object you're looking at. For me, the one downside of apgcodes is that the prefixes seem a little random -- for example, why xp2 instead of just p2? This is a very small nitpick (as I'm having trouble finding downsides), and otherwise apgcodes are extremely useful in my opinion.
These two are most commonly used, but SOF (Small Object Format -- both human-readable and compact), Macrocell (for larger patterns), and Mcell are sometimes used as well. A full list of commonly used file formats is linked here on the wiki.
Now, I was trying to think about what other kind of data storage could work to be compact and efficient, and I devised this simple method which I call base-10 encoding, or B10E for short (although it may have been thought of before -- sorry about this post if it has):
, so maybe an example will help:
I like this process because it seems very natural and almost no subjective decisions are made. Also, it can work in any base -- for example, a glider in base-3 encoding would become 10-10-12022, and a glider in base-16 encoding would become 3-3-8F. So technically, you could make it as compact as you want (of course, after base 36 or something like that you run out of letters to substitute for digits).
In addition, you could also add information for it if you wanted to; for example, adding the rule integer of an outer-totalistic rule to the beginning of the string, or adding a suffix somehow specifiying the type of object, much in the way that apgcodes do (for example, S11 for an 11-bit still life, O3P2 for a oscillator with an average population of 3 and a period of 2 or F9H2V0P4 for a 2c/4 orthogonal spaceship with an minimum population of 9*). With these two things, a glider in CGOL would be base-10 encoded as 6152-3-3-143-F5H1V1P4 -- a little unwieldy, but it conveys a lot of information and the prefix and suffix would seem less prominent when encoding a larger object.
Anyway, this thread can also be used just to discuss these kind of things, which I find slightly fascinating (along with anything else information-theory related). I don't know if The Sandbox is the right place for this, but this doesn't seem totally academic, so I put it here. A mod can move it if they feel it's necessary.
Is my idea of base-10 encoding far-fetched, or is the system reasonable? Tips and suggestions appreciated.
*The F is for "fish", which spaceships are sometimes referred to as -- yes, it might just be to be different from apgcodes, but I like the look of the F, and it's not totally unrelated to spaceships like Q in apgcodes. Yes, I am way to fond of nitpicking apgcodes, and yes, I will fight you (in a kind and polite manner).
RLE (run length encoded) is the most common way of storing patterns; here's Wikipedia's say on how that works. They're a very basic way to store patterns, listing the bounding box and rule, and then listing the data of the pattern (again, see the Wikipedia article for how that works). The most obvious weakness is that they're extremely long and unwieldy; they are meant to be simple for storing and reading, not a compact identifier for specific objects.
Only slightly less common than RLEs are apgcodes, which are primarily used on Catalogue. Lifewiki has a very detailed and helpful article about how they work (they primarily use extended Wechsler format) linked here. They are much more compact than RLEs, and also give detail about what kind of object you're looking at. For me, the one downside of apgcodes is that the prefixes seem a little random -- for example, why xp2 instead of just p2? This is a very small nitpick (as I'm having trouble finding downsides), and otherwise apgcodes are extremely useful in my opinion.
These two are most commonly used, but SOF (Small Object Format -- both human-readable and compact), Macrocell (for larger patterns), and Mcell are sometimes used as well. A full list of commonly used file formats is linked here on the wiki.
Now, I was trying to think about what other kind of data storage could work to be compact and efficient, and I devised this simple method which I call base-10 encoding, or B10E for short (although it may have been thought of before -- sorry about this post if it has):
- The string will be in the form x-y-n where x and y are the length and height of the bounding box, respectively. Then, to find n, replace all live cells with 1's and all dead cells with 0's. Then, read this as a string, starting from the top left and reading row by row, left to right and top to bottom. This will be a number in binary. Convert this number to base-10, and that number will be n.
- Take a glider. It has a three-by-three bounding box. Converting cells to 0's and 1's, we get
Putting this together, we get the number 010001111. Converting to base 10, this becomes 128 + 8 + 4 + 2 + 1 = 143. So the glider is base-10 encoded as 3-3-143.
Code: Select all
010 001 111
I like this process because it seems very natural and almost no subjective decisions are made. Also, it can work in any base -- for example, a glider in base-3 encoding would become 10-10-12022, and a glider in base-16 encoding would become 3-3-8F. So technically, you could make it as compact as you want (of course, after base 36 or something like that you run out of letters to substitute for digits).
In addition, you could also add information for it if you wanted to; for example, adding the rule integer of an outer-totalistic rule to the beginning of the string, or adding a suffix somehow specifiying the type of object, much in the way that apgcodes do (for example, S11 for an 11-bit still life, O3P2 for a oscillator with an average population of 3 and a period of 2 or F9H2V0P4 for a 2c/4 orthogonal spaceship with an minimum population of 9*). With these two things, a glider in CGOL would be base-10 encoded as 6152-3-3-143-F5H1V1P4 -- a little unwieldy, but it conveys a lot of information and the prefix and suffix would seem less prominent when encoding a larger object.
Anyway, this thread can also be used just to discuss these kind of things, which I find slightly fascinating (along with anything else information-theory related). I don't know if The Sandbox is the right place for this, but this doesn't seem totally academic, so I put it here. A mod can move it if they feel it's necessary.
*The F is for "fish", which spaceships are sometimes referred to as -- yes, it might just be to be different from apgcodes, but I like the look of the F, and it's not totally unrelated to spaceships like Q in apgcodes. Yes, I am way to fond of nitpicking apgcodes, and yes, I will fight you (in a kind and polite manner).