dvgrn wrote: It's only mostly true that [apgcodes] can grow without bounds. A pattern over 40 cells wide would need a definite official ruling on how to encode longer strings of 0s. Quoting [...] from the apgcode article:
(Theoretically, runs of more than 39 zeroes should be replaced by 'yz' followed by the coding for the remaining zeroes. At the moment apgsearch labels everything larger than 40x40 as 'oversized' and refuses to process it.)
I'd agree that one or more yz, followed by one more y-something to get to the required length, is the easiest and most obvious way to extend the apgsearch format to encode patterns bigger than 40x40. But it might be worth thinking about a little bit, to see if it might be worth adding a marker that allows arbitrarily large (base 40?) counts to be encoded. At the moment, sufficiently large sparse patterns can be encoded much more efficiently in RLE format, because RLE can encode long distances of emptiness in a logarithmic number of characters, where apgcode is currently stuck with a linear encoding... a much more efficient encoding, but still linear.
I think most current Golly scripts that deal with apgcodes are lifted fairly directly from the old Python apgsearch, so they would similarly fail to handle patterns that need more than thirty-nine 0s, so it would be good to publish "official" encoder and decoder scripts somewhere -- maybe in Lua as well as Python.
And I replied:
Apple Bottom wrote: I think this is an issue of decoding vs. encoding. Given an apgcode of the form ...yzy1..., it's obvious that this corresponds to a run of 44 zeros, and even absent a formal specification or test suite I think decoders should treat it as such. It's the encoding part that's problematic, since there's no Word of God on which apgcode should be canonical.
Indeed, although a run of 44 zeros can be encoded as yzy1, it can also be encoded as y1yz, which retains the overall code's length but precedes it lexicographically, so a crafty encoder could emit this instead when faced with such a pattern. In fact it could also emit wxyz. (0y0yz would yield a longer code and should thus be rejected.)
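To make the ambiguity concrete, here's a minimal decoding sketch; the token values ('0' = 1 dead cell, 'w' = 2, 'x' = 3, 'y0'..'yz' = 4..39) are as given in the apgcode article, and the function names are my own, not part of any official tooling:

```python
# Minimal sketch of decoding zero-run tokens in extended Wechsler format.
# Only dead-cell tokens are handled; live-cell characters are not.

def zero_run_length(s, i):
    """Return (run_length, next_index) for the dead-cell token at s[i],
    or (None, i) if s[i] is not a dead-cell token."""
    c = s[i]
    if c == '0':
        return 1, i + 1
    if c == 'w':
        return 2, i + 1
    if c == 'x':
        return 3, i + 1
    if c == 'y':
        # 'y' is followed by one base-36 digit: y0 = 4 zeros ... yz = 39 zeros
        return 4 + int(s[i + 1], 36), i + 2
    return None, i

def total_zeros(fragment):
    """Sum the dead cells encoded by a fragment of zero-run tokens."""
    total, i = 0, 0
    while i < len(fragment):
        run, i = zero_run_length(fragment, i)
        if run is None:
            raise ValueError("not a zero-run token: " + fragment[i])
        total += run
    return total
```

All four spellings above decode to the same run: total_zeros returns 44 for each of "yzy1", "y1yz", "wxyz" and "0y0yz", which is exactly why a canonical choice among them is needed.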
I'm tempted to suggest encoding long runs of zeros using a base-36 notation, where y# encodes # + 4 consecutive zeros, with # given in base 36; but since arbitrary base-36 digits that aren't part of the count may follow, this would require a stop marker. No existing code contains such a marker, so this would break them all; it's not feasible.
What is feasible is repurposing unused letters to indicate how many base-36 digits the number # has. For instance, if the z digit were used, with n z's indicating n + 1 digits, code fragments such as yzzz#### would be unambiguous, and the codes y0 ... yy would remain unchanged. A run of 44 zeros would be encoded as yz14 with this scheme.
This could still be done: a quick check of my files reveals that there are not currently any objects on Catagolue containing either yz or yy (though there are two containing yx: an xp412 in B358/S23, and one variant of the ship-pulling xq190 in B38/S23).
So there are two ways out, as I see it:
- clarify how runs of more than 39 zeroes are encoded (greedy, e.g. yzy1; based on length/lexicographic order, e.g. wxyz; or other, to be specified), or
- repurpose yz... to encode longer runs (e.g. yz14).
I personally prefer the first, since it requires no changes to existing decoders, and since I'm not convinced people will want to encode extremely large patterns as apgcodes anyway (as you point out, file formats such as RLE are better suited to that task). For that matter, I also prefer the greedy algorithm, since it would be easiest to implement.
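The greedy encoder really is a one-liner's worth of work; a sketch (again, the function name is mine, and the token values follow the apgcode article):

```python
DIGITS36 = "0123456789abcdefghijklmnopqrstuvwxyz"

def greedy_zero_run(n):
    """Greedily encode a run of n dead cells: emit as many 'yz' tokens
    (39 zeros each) as fit, then one token for the remainder."""
    out = []
    while n >= 39:
        out.append("yz")
        n -= 39
    if n == 1:
        out.append("0")          # a single dead cell
    elif n == 2:
        out.append("w")
    elif n == 3:
        out.append("x")
    elif n >= 4:
        out.append("y" + DIGITS36[n - 4])
    return "".join(out)
```

This produces yzy1 for 44 zeros, matching the "obvious" extension dvgrn describes, and any decoder that simply sums zero-run tokens already handles its output.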
What do others think? Although it's not urgent, I think this is a matter that should be dealt with at some point; one of the nice features of apgcodes is that although there are many ways to encode the same object, only one such code is THE canonical apgcode.
Since this is what allows us to meaningfully use apgcodes to discuss patterns in the first place, I'd like to see this feature retained for larger patterns: out of the various possible encodings in the presence of long runs of zeros, there needs to be one canonical one.
Thoughts?