dvgrn wrote:@Apple Bottom, is what you said at the top of that thread still true, that
Apple Bottom wrote: ...a quick check of my files reveals that there are not currently any objects on Catagolue containing either yz or yy (though there are two containing yx, an xp412 in B358/S23 and the ship-pulling xq190 in B38/S23 (one variant, anyway)).
If so, it seems as if it's not technically too late to support ridiculously large sparse patterns in a way that's completely backwards compatible -- practically speaking. If no apgcode has been officially recorded yet that contains a "yz", then we could still agree to update encoders and decoders everywhere, to allow variable-length base-36 encodings of long strings of 0s.
Yes, this is still true to the best of my knowledge. Using data freshly grabbed this morning just for this, covering 936 different rules that I know have been investigated:
Code: Select all
sqlite> select distinct code, rule from census where code like "%yx%";
xp412_y7444y04bp3zzwezgg8gyrgg08oz32011yr1211zyxezzyeojq4y0444|b358s23
xq18_1a4zyx665|b2ce3aiys12-ckn3r
xq18_335zyxe95|b2ce3aiys12-ckn3r
xq190_33y1g88gzy133433zyeskszzyjcmm8zyx352|b38s23
xq190_sm3fk8cszw11zyecscwgzyi303w4zyj4s8zyx352|b38s23
sqlite> select distinct code, rule from census where code like "%yy%";
sqlite> select distinct code, rule from census where code like "%yz%";
sqlite>
That said, I added greedy apgcodes for a number of previously-unencodable patterns to the LifeWiki a while ago, using the modification of biggiemac's encoder script that I posted above; these might contain yy or yz.
OTOH they should be easy to find: they all use Template:LinkCatagolue and pass format=extended. I've added a tracking category, so they could easily be fixed up if necessary.
dvgrn wrote:
TL;DR
I guess for the moment my vote is for the simple greedy algorithm, but my mind might not be entirely made up yet. If someone can suggest a backwards-compatible apgext format that also allows logarithmic compression of vertical gaps, I might find that to be pretty much irresistible.
Might even volunteer to write revised encoders and decoders for the format, but I won't promise anything yet...!
I'm still leaning towards greedy codes, too; the fact that they strictly extend apgcodes (in their current form) and that existing decoders should require no modification are killer features IMO.
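To make the "strictly extends" point concrete, here is a minimal Python sketch of how a greedy encoder could handle a run of empty columns. It assumes the usual apgcode tokens (0, w and x for 1, 2 and 3 zeros; y0 through yz for 4 to 39 zeros) and assumes that "greedy" simply means emitting yz for as long as more than 39 zeros remain. The function names are made up for illustration; this is not the modified biggiemac script.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"  # base-36 digits used after 'y'
MAX_RUN = 39  # longest run a single token ("yz") can express


def encode_zero_run(n):
    """Greedily encode a run of n empty columns (hypothetical helper)."""
    if n < 0:
        raise ValueError("run length must be non-negative")
    out = []
    # Emit the largest token for as long as more than 39 zeros remain.
    while n > MAX_RUN:
        out.append("yz")
        n -= MAX_RUN
    if n >= 4:
        out.append("y" + DIGITS[n - 4])
    elif n > 0:
        out.append({1: "0", 2: "w", 3: "x"}[n])
    return "".join(out)


def decode_zero_run(tokens):
    """Sum up the zeros in a string made only of 0, w, x and y? tokens."""
    total = 0
    i = 0
    while i < len(tokens):
        c = tokens[i]
        if c == "0":
            total += 1
        elif c == "w":
            total += 2
        elif c == "x":
            total += 3
        elif c == "y":
            total += 4 + DIGITS.index(tokens[i + 1])
            i += 1  # skip the length character that follows 'y'
        else:
            raise ValueError("not a zero-run token: " + c)
        i += 1
    return total
```

For runs of up to 39 zeros the output is the same as a classic code, and a longer run just becomes several ordinary tokens in a row (for example, 100 zeros come out as yzyzyi), which is why a decoder that already understands yz needs no changes.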
I also still think there are better ways of passing large patterns around, namely RLE, Macrocell and possibly other future file formats.
Finally, I think that one of the purposes of apgcodes is allowing people to unambiguously talk about patterns -- and while it's perfectly reasonable to say e.g. "xp5_idiidiz01w1 is a known oscillator", nobody would do the same if the code were a million characters, or even a thousand. Large, extremely sparse patterns might get short codes, but are most large patterns really that sparse? (Sure, some are, but are these really the norm?)
THAT SAID, I think there is one more purpose that apgcodes serve: they provide the machinery to unambiguously compare patterns. Given two RLE files, it's not a trivial task to figure out whether they contain the same pattern or not. Two (canonical) apgcodes, however, will by definition encode the same object if and only if the codes themselves are identical. And this is a use where it won't matter whether the code is short enough to be used, remembered or quoted by human users.
On the subject of encoding vertical gaps:
dvgrn wrote:Would have to only start encoding vertical gaps after the first 38 z's have gone by, I suppose, to be perfectly safe... but probably after a quick search of current Catagolue contents we could pick a reasonable number that's smaller than 38.
This little problem definitely pushes me back toward voting for the simple greedy linear encoding algorithm, since our "reasonable number" would have to be a fairly contrived choice to match existing uses, not anything obvious or elegant.
Keep in mind that Catagolue only properly encodes patterns up to 40x40 right now, and that there are at most eight strips of height 5 in such a pattern -- so the maximum number of consecutive z's you can have in a code on Catagolue right now is 7.
A few codes with seven consecutive z's exist:
Code: Select all
sqlite> select distinct code, rule from census where code like "%zzzzzzz%";
xp160_3vy9v3zzzzzzzovy9vo|b3s2-i34q
xp160_voy9ovzzzzzzzv3y93v|b3s2-i34q
xq11_6917zzzzzzzyeci2e|b2c34e6cs2-i3-jn5e
xq12_4fzzzzzzzyv896|b2ik367s127
xq12_f2zzzzzzzyw5041|b2ik367s127
xq52_cjoo7zzzzzzzyt4a1a6|b36ce7-es23-y
xq74_okb7zzzzzzzyb25534jv|b3s23-e4e
sqlite>
The ships were just improperly left unseparated, I think, but the xp160's in b3s2-i34q are genuine. In any case, since 8 consecutive z's are not currently possible on Catagolue, this would be a reasonable cut-off. (And it would not be completely arbitrary, either, insofar as it derives from the specification of "classic" apgcodes rather than just the objects that happen to have been recorded on Catagolue at an arbitrary point in time.)
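The arithmetic behind that cut-off can be sketched in a few lines of Python. This assumes that z acts as the separator between consecutive 5-row strips and that a z directly after a y is a run length rather than a separator; the helper names are made up for illustration.

```python
STRIP_HEIGHT = 5   # rows per strip in a classic apgcode
MAX_SIDE = 40      # largest bounding box side Catagolue encodes properly


def max_consecutive_z(height=MAX_SIDE, strip_height=STRIP_HEIGHT):
    """Upper bound on consecutive z's: one separator between each pair of strips."""
    strips = -(-height // strip_height)  # ceiling division
    return strips - 1


def longest_z_run(code):
    """Length of the longest run of strip separators in an apgcode."""
    body = code.split("_", 1)[-1]  # drop the "xp160_"-style prefix if present
    best = run = 0
    i = 0
    while i < len(body):
        if body[i] == "y":
            # 'y' plus the next character is one zero-run token, so a 'z'
            # in that position is not a strip separator.
            run = 0
            i += 2
            continue
        if body[i] == "z":
            run += 1
            best = max(best, run)
        else:
            run = 0
        i += 1
    return best


print(max_consecutive_z())                         # 40 rows -> 8 strips -> 7
print(longest_z_run("xp160_3vy9v3zzzzzzzovy9vo"))  # one of the codes listed above
```

Any code in which longest_z_run comes out as 8 or more could not have been produced by the classic encoder, which is what would make 8 a safe threshold.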