Creating a Better CA notation (because all the others suck)

For discussion of other cellular automata.
Post Reply
User avatar
muzik
Posts: 5648
Joined: January 28th, 2016, 2:47 pm
Location: Scotland

Creating a Better CA notation (because all the others suck)

Post by muzik » September 2nd, 2019, 5:58 pm

Since there's a ton of different cellular automaton families out there, with a bunch of different notations for each of them, I figured I'd try my hand at creating a sort of master notation that describes all of these families without the need to use like seven notations. I intend to only cover 2-dimensional isotropic rule families which take place on square grids, since these are the most commonly played with rules, however more may be covered in the future (hexagonal and triangular could probably be supported very easily). The rule families I plan to cover with this are the following:

- 2-state outer-totalistic rules: Range-1 outer-totalistic rules, HROT rules (including LTL)
- 3+-state outer-totalistic rules: Range-1 outer-totalistic genext rules (including Generations and snoitareneG), HROT genext rules (includes HROT Generations and snoitareneG, and LTL Generations and snoitareneG as subsets)

The neighbourhoods supported for these rules are as follows:

- NM: Moore
- NN: von Neumann
- NC: circular
- NL: Euclidean/L2
- NX: saltire
- NT: cross
- NS: star

Also supported, with fixed neighbourhoods, are the following:
- 2-state isotropic non-totalistic rules - range-1 Moore (through Hensel notation), range-2 von Neumann (through Feb24 notation)
- 3+-state isotropic non-totalistic rules- range-1 Moore genext (through Hensel notation), range-2 von Neumann genext (through Feb24 notation)

Like with HROT notation, a - can be used to specify a range of birth/survival conditions. ~ can also be used instead of -, given that the numbers at both ends are of identical parity, to define an on-and-off range, so that every second condition in the range is ignored.

Examples of notation follow:

R1,B3,S2-3,G1,NM (Life)

R1,B1~7,S0~8,G1,NM (Fredkin)

R5,B34-45,S34-58,G1,NM (Bosco's Rule)

R1,B3-4,S1-2,G1-1,NM (Frogs)

R1,B3-4,S1-2,G0-1-1,NM (sgorF)

R1,B3-5,7,S4-6,8,G1,NM (Gems)

R0,B0,S0,G1,NM ("EVERYBODY LIVE FOREVER")

R1,B3,6-7,S2-i,3,4q,G1,NM (SilverLife)


As far as I know, extended-range genext and the range-2 isotropic non-totalistic von Neumann rulespaces have never actually been explored yet by anyone, so this opens up a lot of new stuff to be discovered (assuming we get the appropriate software support for it).

For isotropic non-totalistic rulestrings, N and R could potentially be ditched for the sake of brevity (since they're pretty redundant anyway), although the rulestring wouldn't be canonical.

Gamedziner
Posts: 795
Joined: May 30th, 2016, 8:47 pm
Location: Milky Way Galaxy: Planet Earth

Re: Creating a Better CA notation (because all the others suck)

Post by Gamedziner » September 3rd, 2019, 2:05 pm

Here, have a notation that describes a lot more CA:
Start with "G=" followed by "H," "S," or "T" to indicate a hexagonal, square, or triangular grid, respectively.
To describe the neighborhood, use "N=" followed by an RLE in which "A" represents a neighbor and "B" represents the cell itself. After this, replace the "!" at the end of the RLE with a semicolon and add "C=" (number of cell states in rule). Continue with "D=" (number of neighbors plus one) to indicate the depth of the tree. End the non-tree part of the notation with "V=" followed by the number of nodes.

The tree part works similar to how it does in Golly, with one key difference: Here, the neighborhood is checked from left to right, then from top to bottom (excluding the cell being affected, which is last).

For example, the following lines would describe Seeds:

Code: Select all

G=S; N=3A$ABA$3A; C=2; D=9; V=27
1 0 0
1 1 0
2 0 1
2 1 0
2 0 0
3 4 2
3 2 3
3 3 4
3 4 4
4 5 6
4 6 7
4 7 8
4 8 8
5 9 10
5 10 11
5 11 12
5 12 12
6 13 14
6 14 15
6 15 16
6 16 16
7 17 18
7 18 19
7 19 20
8 21 22
8 22 23
9 24 25

Code: Select all

x = 81, y = 96, rule = LifeHistory
58.2A$58.2A3$59.2A17.2A$59.2A17.2A3$79.2A$79.2A2$57.A$56.A$56.3A4$27.
A$27.A.A$27.2A21$3.2A$3.2A2.2A$7.2A18$7.2A$7.2A2.2A$11.2A11$2A$2A2.2A
$4.2A18$4.2A$4.2A2.2A$8.2A!

User avatar
muzik
Posts: 5648
Joined: January 28th, 2016, 2:47 pm
Location: Scotland

Re: Creating a Better CA notation (because all the others suck)

Post by muzik » September 4th, 2019, 3:07 am

...and how are you supposed to fit that onto a single line and memorise it?

User avatar
dvgrn
Moderator
Posts: 10672
Joined: May 17th, 2009, 11:00 pm
Location: Madison, WI
Contact:

Re: Creating a Better CA notation (because all the others suck)

Post by dvgrn » September 7th, 2019, 1:39 pm

muzik wrote:...and how are you supposed to fit that onto a single line and memorise it?
The same question, or at least the "memorise" part, would seem to apply quite well to your own suggestion.
standards.png
standards.png (95.03 KiB) Viewed 5161 times
(xkcd.com)
muzik wrote:... (assuming we get the appropriate software support for it)
That does seem like the key assumption. Your new standard looks an awful lot like Larger than Life syntax, to the point where it might be a bit of a pain to adapt existing parser code to decide whether the rule string is canonical LtL or this new standard format. Existing pattern collections aren't all going to magically going to switch themselves over to a new universal format, so we'll always have to deal with the new vs. old question.

As a writer of parser code, I'd be much happier if a rule string could be easily recognized as the new format -- say the first five letters were always "xkcd!" or "MUZIK" or "G=" or whatever -- something unique. Then I'd only have to start worrying if patterns actually started showing up with rule strings in that format.

Wanted: Interesting New Rulespaces
In general there doesn't seem to be too much point in worrying about being able to support all existing rule types in the new system. We already have perfectly good parser code to load those rules with, and thirty years' worth of existing pattern collections and email threads and so on, full of patterns compatible with that parser code. It seems unlikely that anyone will ever actually start using "R1,B3,S2-3,G1,NM" instead of "B3/S23".

(Having to type an unnecessary but traditional forward slash is bad enough -- why switch to having to type four more letters in the right order, plus four commas and a hyphen?)

The point of a new rule syntax is to be able to support new types of rules. But for that you need simulator code, or you need to get someone interested in the project of writing the simulator code.

Is there any reason to think that rules that MuzikSyntax or GamedzinerSyntax can uniquely support, will behave in a qualitatively different way from rules in the incredibly vast unexplored rulespaces that Golly and LifeViewer already support? If not, then how will you convince someone to spend the highly nontrivial amount of time needed to implement one of these new syntaxes correctly?

User avatar
Hdjensofjfnen
Posts: 1743
Joined: March 15th, 2016, 6:41 pm
Location: re^jθ

Re: Creating a Better CA notation (because all the others suck)

Post by Hdjensofjfnen » September 7th, 2019, 5:05 pm

dvgrn wrote:standards.png
That image is edited.

Code: Select all

x = 5, y = 9, rule = B3-jqr/S01c2-in3
3bo$4bo$o2bo$2o2$2o$o2bo$4bo$3bo!

Code: Select all

x = 7, y = 5, rule = B3/S2-i3-y4i
4b3o$6bo$o3b3o$2o$bo!

User avatar
muzik
Posts: 5648
Joined: January 28th, 2016, 2:47 pm
Location: Scotland

Re: Creating a Better CA notation (because all the others suck)

Post by muzik » September 10th, 2019, 5:24 am

dvgrn wrote:
muzik wrote:...and how are you supposed to fit that onto a single line and memorise it?
The same question, or at least the "memorise" part, would seem to apply quite well to your own suggestion.
Red Berries Sometimes Give Nausea - should be enough to get the order of each field memorised easily. And it harmonises more or less with LtL/HROT notation, so it shouldn't be too difficult for anyone familiar with it to figure out what each one does. The only really exotic thing here is G being there instead of C and using genext notation, which took me (entirely too big of) a while to get my head around, but it's not a hard concept either.
dvgrn wrote:
muzik wrote:... (assuming we get the appropriate software support for it)
That does seem like the key assumption. Your new standard looks an awful lot like Larger than Life syntax, to the point where it might be a bit of a pain to adapt existing parser code to decide whether the rule string is canonical LtL or this new standard format. Existing pattern collections aren't all going to magically going to switch themselves over to a new universal format, so we'll always have to deal with the new vs. old question.

As a writer of parser code, I'd be much happier if a rule string could be easily recognized as the new format -- say the first five letters were always "xkcd!" or "MUZIK" or "G=" or whatever -- something unique. Then I'd only have to start worrying if patterns actually started showing up with rule strings in that format.
This notation does have a significant difference early on from HROT and LtL notations. With those, the value after R is C, while here it's B, so unless I'm doing something very stupid here, all that needs to be done is to check whether that character is a B or not.
dvgrn wrote:In general there doesn't seem to be too much point in worrying about being able to support all existing rule types in the new system. We already have perfectly good parser code to load those rules with, and thirty years' worth of existing pattern collections and email threads and so on, full of patterns compatible with that parser code. It seems unlikely that anyone will ever actually start using "R1,B3,S2-3,G1,NM" instead of "B3/S23".

(Having to type an unnecessary but traditional forward slash is bad enough -- why switch to having to type four more letters in the right order, plus four commas and a hyphen?)
That is true. But don't forget that B3/S23 can be equivalently notated as 23/3/2, R1,C2,M0,S2..3,B3..3,NM, R1,C2,S2-3,B3,NM, R1,C2,M0,S2..3,B3..3,NC, R1,C2,S2-3,B3,NC, and much more. Of course, nobody uses these notations in practice for much the same reason nobody says "The United Kingdom of Great Britain and Northern Ireland" in informal discourse - they're intended for completely different circumstances and would be far too much of a mouthful for viable common use. It probably shouldn't become canonical for rule families that can be notated through much simpler means, it just happily and conveniently generalises them in line with the bigger boys.

This could potentially replace LtL and (the very new, don't forget) HROT notation, though, since it tends to be similar enough in length to them to the point where it could be preferable to use this notation for them instead.
dvgrn wrote:The point of a new rule syntax is to be able to support new types of rules. But for that you need simulator code, or you need to get someone interested in the project of writing the simulator code.

Is there any reason to think that rules that MuzikSyntax or GamedzinerSyntax can uniquely support, will behave in a qualitatively different way from rules in the incredibly vast unexplored rulespaces that Golly and LifeViewer already support? If not, then how will you convince someone to spend the highly nontrivial amount of time needed to implement one of these new syntaxes correctly?
This notation is intended as a proposal for how higher-range genext rules could be notated (with a few neighbourhood notation suggestions thrown in there as well, but those can be ignored). As far as I'm aware, higher-range genext isn't supported by any program that I can think of - LifeViewer doesn't support genext at all yet, and I'm pretty sure Golly's ruletables are capped at range 1 (correct me if I'm wrong though). I'd say that high range genext is a pretty massive unexplored rulespace, since all we've seen of it is the Generations side, which is a tiny fraction of all the rules that could be supported by this. Let's not forget range-2 von Neumann isotropic, either - it might not be as big but it does open up quite a lot, with theoretical rules like SkewLife that can support two Life grids simultaneously. What Golly supports currently with range-1 Moore ruletables is indeed massive, but ask: how easy is it to explore all of them, and how quick can someone read a usually very large ruletable and understand what each transition is like and how the rule will roughly behave? With a notation, it's easy to get at least somewhat of a grasp of what the rule will behave like, even if this notation does have the potential to spiral out of control with super-complex definitions.

User avatar
muzik
Posts: 5648
Joined: January 28th, 2016, 2:47 pm
Location: Scotland

Re: Creating a Better CA notation (because all the others suck)

Post by muzik » September 10th, 2019, 7:02 pm

To summarise for everyone:

- The notation is meant for the currently vastly unexplored Higher-Range Outer-Totalistic Extended Generations rulespace, which currently has no software support to my knowledge

- Since any LTL rule can be expressed as a HROT rule (and conventional decay can be expressed in genext notation), this notation could potentially also replace the currently supported LTL notation, which may be desirable as using this notation for them would take up less space due to the absence of the redundant M parameter

- It could also replace the current HROT notation, since the only difference between this new notation and the current canonical HROT notation is that the new notation uses genext as opposed to strictly decay, and the parameters are in a slightly different order

- Range-1 outer-totalistic rules and their Generations counterparts can also be expressed as range-1 HROT rules, which means they can also be described using this notation, however this is not recommended since it just takes up more space than it needs to

- Isotropic non-totalistic rules are a similar case to the above - we would be better off just using the current notation for those as well unless we need to describe genext variants of them

All good?

User avatar
EvinZL
Posts: 854
Joined: November 8th, 2018, 4:15 pm
Location: A tungsten pool travelling towards the sun
Contact:

Re: Creating a Better CA notation (because all the others suck)

Post by EvinZL » November 18th, 2019, 8:04 pm

Well, I have a notation based on Hensel notation, which can describe all isotropic rules with any number of states on the Moore neighborhood through a straightforward generalization. However, the only description is here and isn't very clear.
Edit: And it looks ugly too. Brian's Brain = "3,2*,,,*,*," Some rules are worse. I remember someone made OWSSLife, I figured it would be "3,3*6*,,0*1*-e**4*5*6*7*8*,1*e**,**," or something like that I didn't check the whole rule table.
Edit2: I'll just post Life, it adjusts pretty well, to make "2,3,0145678".

User avatar
Entity Valkyrie 2
Posts: 1758
Joined: February 26th, 2019, 7:13 pm
Contact:

Re: Creating a Better CA notation (because all the others suck)

Post by Entity Valkyrie 2 » November 20th, 2019, 3:56 pm

Is there any need to change the name of StateInvestigator?
User:Entity Valkyrie 2/StateInvestigator
Bx222 IS MY WORST ENEMY.

Please click here for my own pages.

My recent rules:
StateInvestigator 3.0
B3-kq4ej5i6ckn7e/S2-i34q6a7
B3-kq4ej5y6c/S2-i34q5e
Move the Box

User avatar
HactarCE
Posts: 8
Joined: March 16th, 2019, 4:41 pm

One String to Rule Them All

Post by HactarCE » October 21st, 2020, 1:51 am

I am soon getting to the point where I will need to parse rulestrings in my own CA simulator, which has the additional challenge of supporting higher-dimensional CA, which no existing notation does sufficiently.

Here is my proposal:

One String to Rule Them All

Code: Select all

B<births>/S<survivals>/C<state_count>/N<neighborhood>
  • Components are separated by slashes, and may appear in any order. We could decide on a canonical order, but it doesn't matter much.
  • All components are optional except for B and S. By default, everything is 2-state 2D range=1 Moore.
  • Components MUST have the prefix letter, with the following exceptions:
    • The legacy `S/B` (e.g. `23/3`) and `S/B/C` (e.g. `345/2/4`) formats are accepted, but are discouraged and non-canonical.
    • The legacy suffixes `H` and `V` following `S/B` notation (e.g. `2m34/B2oH`) or at the end of the `B` or `S` component (e.g. `B2o/S2m34H`) are accepted for hexagonal and von Neumann neighborhoods respectively, but are discouraged and non-canonical. (New notation is `B.../S.../NH` for hex and `B.../S.../NN` for vN.)
  • Birth and survival conditions use comma-separated nonnegative integers, except for certain neighborhoods (such as 2D Moore/hex) where Hensel notation (classic INT) or similar notations are also accepted.
  • Birth/survival ranges may be represented using a hyphen. `..` is also accepted, but is discouraged and non-canonical. Example: `S6-8`
  • `N` component describes the shape, radius, and dimensionality of the neighborhood using a comma-separated list of the following:
    • Dimensionality: `1D`, `2D`, `3D`, etc. (default = 2D)
    • Radius: `R1`, `R2`, etc. (default = R1)
    • Neighborhood shape: any existing HROT symbol from `MN2#*+HCBXA3L` (default = M)
Examples
  • B3/S23 (Life)
  • B0/S0/NR0 ("Everybody lives forever")
  • B367/S2-i34q (SilverLife)
  • B2/S345/C4 (Star Wars)
  • B7-8/S6-9/C2/NR2 (some random HROT rule)
  • B2,4,6/S1,3,5/N3D (3D "Life" on cubes.io)
Rationale
  • Backwards-compatible with Hensel notation, which is by far the most common rulestring format
  • Generalization of existing formats; doesn't conflict with any existing format (that I'm aware of)
  • Easy for computers to parse
  • Easy for humans to parse
    • Intuitive operator precedence: `/` is lowest, followed by `,` and then `-`
    • Comma always separates items within a component, never separating components
    • Explicit, but not verbose; anyone with any knowledge of existing formats should be able to immediately and unambiguously understand the overall structure of this one
    • No need to memorize arbitrary orders, for reading or writing
  • Easy to extend in the future: just add more components with new prefix letters
  • Uses few symbols, and only those that are already accepted in most rulestring notations
    • Avoids `=`, which may cause problems in RLE header
  • Legacy S/B/C generations syntax is only supported because that notation is so common already
  • I ordered components in the original example roughly by what I expect to provide the most relevant information, or what is most subtle. For example, the rule neighborhood, number of dimensions, or number of states of a rule is less likely to change throughout the course of discussion compared to the birth and survival conditions. This is very subjective though -- and like I said, the order of components really doesn't matter.
Open questions/extensions
  • `G` instead of `C` for number of states? I think `C` is more common, but I could be wrong.
  • Higher-dimensional hexagonal/triangular support? Perhaps `Hxy` for "flat-topped" hexagons and `Hyx` for "pointy-topped" hexagons, which generalizes to higher dimensions.
  • Split dimensionality, radius, and grid shape into their own components?
  • Omit commas in neighborhood component, but force canonical order? Example: `N2DR1M` instead of `N2D,R1,M` to represent 2D range=1 Moore neighborhood. This causes problems with HROT neighborhoods `2` and `3` though.
  • Suffixes like `_History` and `_Super`? (underscore required to separate from other components)
  • Alternating ranges for birth/survival using a tilde instead of hyphen, maybe? I think this is more likely to be overlooked, and would probably cause more errors due to being seldom used and looking very similar to hyphen.
  • The symbols `/,~` require special encoding in URLs. What URL-safe substitutions should we use, if that's something that should be supported?
  • Custom neighborhood notations, beyond single letters?
  • Weighted neighborhoods? (using `W` component)
Conclusion

This is not a complete description (e.g. what about INT range-2 vN?) but I think this is a strong framework to work within, and is stylistically consistent and backwards-compatible with the most commonly-used existing formats. I don't mind bike-shedding about how to represent particular rule families, but they all have enough commonalities to be expressed using the same general notation.

I'm not completely sure about the notation for neighborhood, and this is all pretty hastily thrown together, so please do give feedback! With the recent support for HROT (which is exciting!) in Golly, this new rulespace will become more popular, but hopefully this is our chance to avoid fragmenting rule representation even further. :?
Last edited by HactarCE on October 21st, 2020, 11:52 am, edited 4 times in total.

User avatar
blah
Posts: 311
Joined: April 9th, 2016, 7:22 pm

Re: One String to Rule Them All

Post by blah » October 21st, 2020, 5:27 am

HactarCE wrote:
October 21st, 2020, 1:51 am
B0/S0/NR0 ("Everybody lives forever")
Support for range 0 rules should not be required for compliance with the standard. I'd even say 'R0' should be syntactically invalid.
HactarCE wrote:
October 21st, 2020, 1:51 am
Dimensionality: `1D`, `2D`, `3D`, etc. (default = 2D)
Provisions or specifications for 1D rules should not exist. Nobody would use that.
HactarCE wrote:
October 21st, 2020, 1:51 am
`G` instead of `C` for number of states? I think `C` is more common, but I could be wrong.
'C' is more intuitive to me. And while I'm dictating lower bounds for parameters, C(n<3) should be invalid.
HactarCE wrote:
October 21st, 2020, 1:51 am
Omit commas in neighborhood component, but force canonical order? Example: `N2DR1M` instead of `N2D,R1,M` to represent 2D range=1 Moore neighborhood. This causes problems with HROT neighborhoods `2` and `3` though.
I prefer omitting commas because I think they're ugly. But that's just personal preference. (Neighbourhoods '2' and '3' could be disambiguated by putting them right after N or something.)
HactarCE wrote:
October 21st, 2020, 1:51 am
Legacy support for `B.../S...h` hexagonal rules? (New notation would be `B.../S.../NH`)
The notation of "B[0-8]*/S[0-8]*[HV]" is already well accepted. If backwards-compatibility is a concern, and extreme ease of implementation is not, this is a necessity. The characters H and V (I'm not sure I've ever seen them lowercase) introduce no syntactic ambiguity here, so I see no reason not to include this. It should also work for generations notation, like "/12/3V".

Also, if we're really serious about backwards-compatibility, 23/3 should be a valid substitute for B3/S23. You haven't specified how much of a concern this is, though. There's also the concern of whether CA software should automatically convert rulestrings to their canonical forms, like opening an RLE with the rule set to "/2/3", doing some editing, then saving it, but being unable to open it in another program because the rule is now "B2/S/C3".

Somebody should write a formal grammar for this.
(EDIT: Removed typo corrections, since they don't matter for archival readers of this thread.)
Last edited by blah on October 21st, 2020, 1:41 pm, edited 1 time in total.
succ

User avatar
HactarCE
Posts: 8
Joined: March 16th, 2019, 4:41 pm

Re: One String to Rule Them All

Post by HactarCE » October 21st, 2020, 12:06 pm

Thank you for the corrections! I've updated the original post accordingly.
blah wrote:
October 21st, 2020, 5:27 am
Support for range 0 rules should not be required for compliance with the standard. I'd even say 'R0' should be syntactically invalid.
That's fair. Range-0 rules are pretty useless.
blah wrote:
October 21st, 2020, 5:27 am
Provisions or specifications for 1D rules should not exist. Nobody would use that.
Existing integer codes do a fine job for a computer, but are not very readable for a human. This notation generalizes down to 1D, so it may as well support 1D. If the consensus is that it's useless, however, we can remove it.
blah wrote:
October 21st, 2020, 5:27 am
And while I'm dictating lower bounds for parameters, C(n<3) should be invalid.
I disagree. Omitting the `C` component should imply `C2`; after all, `C2` is just the base case for `Cn`. `C(n<2)` may be invalid.
blah wrote:
October 21st, 2020, 5:27 am
I prefer omitting commas because I think they're ugly. But that's just personal preference. (Neighbourhoods '2' and '3' could be disambiguated by putting them right after N or something.)
`N32DR1`? `N3R12D`? `N2D3R1` might work. I agree that it looks nicer without commas, but it needs to be unambiguous -- both to computers, and at a glance to humans. IMO the `2` and `3` neighborhoods should have different symbols, but it's a little late for that. This is a good argument in favor of splitting them into separate components though: `B3/S23/N3/R1/D2`. (For what it's worth, `2D` could work instead of `D2` since `D` isn't used in Hensel notation.)
blah wrote:
October 21st, 2020, 5:27 am
Also, if we're really serious about backwards-compatibility, 23/3 should be a valid substitute for B3/S23. You haven't specified how much of a concern this is, though.
I've just added that, as well as the `H` and `V` suffixes, to the list of exceptions. I think it's possible/reasonable to maintain backwards compatibility with all notations using slashes to separate components, since those are similar enough are I'm not aware of any two slash-notations that conflict with one another.
blah wrote:
October 21st, 2020, 5:27 am
There's also the concern of whether CA software should automatically convert rulestrings to their canonical forms, like opening an RLE with the rule set to "/2/3", doing some editing, then saving it, but being unable to open it in another program because the rule is now "B2/S/C3".
This is tricky. Ideally, the major CA software (Golly, LifeViewer, Lifelib, etc.) would all support this new standard and so it would be generally ok to canonicalize. (In case it's relevant, Golly automatically converts a rulestring like `sb2ekin` to `B2-ac/S`.) I would say that it is up to the implementations whether they want to automatically convert to canonical rulestrings, and which legacy extensions to support, but if anyone has other opinions please share! The main thing I want is a single format that supports all rules and is easy to parse; applications can support as much or as little legacy as they like.
blah wrote:
October 21st, 2020, 5:27 am
Somebody should write a formal grammar for this.
I don't think this would generally be parsed using a formal grammar (CA formats are usually easier to handwrite code for -- I'm speaking from experience), but here's one that gives the general outline of a canonical rulestring:

Code: Select all

rule_string       = component     ('/' component)*
component         = prefix_letter subcomponent_list?
subcomponent_list = subcomponent  (',' subcomponent)*
prefix_letter     = basic_symbol
subcomponent      = basic_symbol*
basic_symbol      = 'A'..'Z' | 'a'..'z' | '0'..'9' | '-'
And here's one that supports legacy prefixless `S/B` and `S/B/C` notation:

Code: Select all

rule_string       = std_rule_string | legacy_sb

std_rule_string   = component     ('/' component)*
component         = prefix_letter subcomponent_list?
subcomponent_list = subcomponent  (',' subcomponent)*
prefix_letter     = basic_symbol
subcomponent      = basic_symbol*
basic_symbol      = 'A'..'Z' | 'a'..'z' | '0'..'9' | '-'

legacy_sb         = subcomponent_list '/' subcomponent_list ('/' number)?
number            = ('0'..'9')+
For a computer, parsing might look like the following:
  1. Split components by `/`
  2. For each component:
    • If the first character is a letter, it should be `B`, `S`, `C`, `N`, etc. (case-insensitive) This tells what type of component this is, and thus how to parse the rest of it
    • If the first character is a number (EDIT: or the component is empty), this is either survival or birth conditions or the number of states using the legacy format; save these separately and assign to survival/birth/count at the end
    • Anything else is invalid
Last edited by HactarCE on October 21st, 2020, 11:49 pm, edited 1 time in total.

User avatar
rowett
Moderator
Posts: 3815
Joined: January 31st, 2013, 2:34 am
Location: UK
Contact:

Re: One String to Rule Them All

Post by rowett » October 21st, 2020, 12:50 pm

HactarCE wrote:
October 21st, 2020, 12:06 pm
Ideally, the major CA software (Golly, LifeViewer, Lifelib, etc.) would all support this new standard and so it would be generally ok to canonicalize.
I think this is actually the hardest problem to solve if you want to introduce a new "standard". There are countless tools, scripts and patterns out there that use the various current "legacy" rule formats. Not only do you need to be able to read all the legacy formats but also you need to be careful to choose which format to canonicalize to.

At the moment there are canonical formats per rule family (at least in Golly and LifeViewer). Lifelib supports some other variants, a few of which LifeViewer can read but will convert to a different canonical representation.

A standard is helpful when it has broad adoption.

When I wrote LifeViewer I was careful to adopt the rule formats that Golly supported. When I then added new rule families to LifeViewer over time I added the same to Golly to keep the parsers and canonical formats in sync.

User avatar
blah
Posts: 311
Joined: April 9th, 2016, 7:22 pm

Re: One String to Rule Them All

Post by blah » October 21st, 2020, 7:09 pm

The algorithm at the end of your last post would trip up on /2/3 because the first character of the survival conditions is not a digit, but is the null character since the component is length 0.
HactarCE wrote:
October 21st, 2020, 12:06 pm
Omitting the `C` component should imply `C2`; after all, `C2` is just the base case for `Cn`. `C(n<2)` may be invalid.
My reasoning for the lower limit being 3 is that allowing C2 creates an edge case, and does not serve any actual function. What algorithm and colour scheme does 23/3/2 use? Is it different from B3/S23/C2? Is that different from B3/S23? What if a program is designed in such a way that assumes all Generations rules have at least one refractory state, and that a rule is a Generations rule if the Cn component is present? I can't think of any examples of this causing bugs but I think as a rule it's best to avoid edge cases and complexity if possible.

Taking this to its (il?)logical conclusion, "2D" and "R1" are also unneeded.
HactarCE wrote:
October 21st, 2020, 12:06 pm
`N32DR1`? `N3R12D`? `N2D3R1` might work. I agree that it looks nicer without commas, but it needs to be unambiguous -- both to computers, and at a glance to humans. IMO the `2` and `3` neighborhoods should have different symbols, but it's a little late for that. This is a good argument in favor of splitting them into separate components though: `B3/S23/N3/R1/D2`. (For what it's worth, `2D` could work instead of `D2` since `D` isn't used in Hensel notation.)
Yeah, having them be separate components seems reasonable, especially if you're usually just specifying R or N. Also, bringing up that 'D' doesn't conflict with Hensel notation raises an important question: Is this case sensitive? I don't see why it shouldn't be. Also I don't think "2D" should be a valid component, or if it is you should restrict where it can occur, because then you wouldn't be able to use "first character is digit or slash" to determine whether to parse a rulestring as legacy notation.

In general, I don't see why there should be no order. Having a required order of components makes parsing simpler.

As for the issue of whether or not this has any chance of catching on, it's probably up to whether your simulator ends up seeing any use. Personally I'm doubtful that this community could be drawn away from Golly, although I'd like to see more diversity in common CA implementations.
succ

Post Reply