Scorbie wrote:2. Subpattern searching
Although searching for subpatterns is used a lot by scripts, people don't seem to feel uncomfiness often as most of the pattern searching is done by custom rules. This is really fast but only works for "specific" targets. (99% of the time gliders) On the other hand, one could generalize to searching arbitrary patterns (just like str.find(substring)) but that's really slow with the {(cell, cell)} form. So we would have to make a new datatype anyway, or any suggestions for a fast subpattern searching algorithm? I'm not talking about big, big arrays, even a 2-digit x 2-digit bounding box pattern is slow.
That's an item that has been on my wish list for many years, actually. Imagine a database containing every pattern in the old and new glider gun collections, for example. How can I make some kind of 2D subpattern index, so that I can easily find all the patterns that use a particular p7 pipsquirter oscillator? (Maybe I'm swapping in a new smaller version of the pipsquirter, or some other similarly obsessive little project.)
Subpattern searching is maybe not
quite hopeless. If storage space isn't too much of an issue, a possible new datatype might be a hashlife-inspired pattern format that stores all possible ways of chopping a large pattern up into NxN pieces. Macrocell-ish compression could be used on each of the NxN overlapping tilings.
Then if you're looking for a smaller-than-NxN subpattern, you only need to make one pass through the list of NxN tiles. If the subpattern doesn't show up in a tile, it's not in the pattern. Conversely, if it does show up, the quadtree structure will quickly tell you where the tile is used in the larger pattern.
Short of this kind of pre-indexing trickery, here is the most workable algorithm I've used so far. Don't expect anything too amazing -- it's pretty straightforward:
1) Translate the pattern to be searched into nested-list form. Let's call this list searchpattern[]. Maybe a set or some other Python collection type would be more efficient, I haven't done thorough timing tests. Simple lists of coord tuples seem to work quite well.
2) Translate the subpattern to search for into a normal-form nested list, let's call it subpat_ON[]. First ON cell in the upper left is at (0,0). You can actually remove the (0,0) cell, but that's a minor detail.
3) Make a list -- subpat_OFF[] -- of all OFF cells adjacent to the cells in the subpattern, to prevent false-positive matches. Depending on the search, you might use all OFF cells within two cells of an ON cell, or whatever. For larger patterns, using all OFF cells in the bounding box slows down the search too much, and anyway you start to miss good matches due to other nearby objects.
4) Generate and normalize the other orientations of the subpattern, if needed, and run the same search for each.
Then the actual search:
5) For each ON cell (x,y) in searchpattern[], check to see that every cell in subpat_ON[] can be translated by (x,y) to match another cell in searchpattern[]. Just use plain Python "in" or "not in". Stop as soon as a subpat_ON[] cell doesn't match, and go on to the next ON cell in searchpattern[].
6) If all subpat_ON[] cells match, check that all the subpat_OFF[] cells, again translated by (x,y), are
not in searchpattern[]. Stop as soon as an exception is found.
7) If everything matches, remove all the matching cells from searchpattern[], then go back to step 4 to try and match the next remaining ON cell.
Steps 5 and 6 are pretty fast, because usually the mismatch shows up in the first few ON cells. With this method, even searching for any orientation of dozens of different objects, my recently reworked recognizer script can run through a big pattern like the
glider-to-weekender-to-glider converter, and find all of the pieces in fifteen seconds or so.
This works best if you have a complete library of matching subpatterns, because then you can always find a match for the upper left cell, and remove the other associated cells which would all otherwise be unrecognizable, and therefore very slow to process. If you're trying to recognize large circuitry components, the library has to be ordered with larger subpatterns first -- e.g., find Snark reflector before eater, otherwise the Snark reflector won't be recognizable by the time that subpattern is tried.
I've been thinking about trying a trivial modification of steps 5 and 6, to allow up to M mismatches before giving up. Too high a value of M would probably slow the search down unbearably, but at low values, welded objects and other nonstandard variants could still be recognized.
It could be handy to be able to figure out that some weird object is actually two overlapping Snarks, plus some pattern that can be XORed onto the combination to complete the weld. For example, a GUI that allows you to move Snarks around would just need an XOR weld pattern for every known workable overlap.
-- This is probably getting a little far afield from the original question, though. For the linear propagator project I applied the same algorithm to find all the interesting still lifes in random ash constellations... with "interesting" defined as "anything that isn't in the subpattern library yet".