"Posts per page" URL parameter?

MikeP · Post by **MikeP** » May 2nd, 2020, 8:02 pm

Is there an URL parameter I can set to display more posts on each page? Ideally, all of them.

I'd like to download the entire Oscillator Discussion Thread and try to extract all the oscillators automatically.

Post by **dvgrn** » May 2nd, 2020, 9:25 pm

MikeP wrote: ↑
May 2nd, 2020, 8:02 pm
Is there an URL parameter I can set to display more posts on each page? Ideally, all of them.

I'd like to download the entire Oscillator Discussion Thread and try to extract all the oscillators automatically.

I don't know of one. However, here's some code on GitHub that's pretty easy to adapt to collect HTML for all the posts.

Here are the contents of all the code blocks in the Oscillator Discussion Thread

patterns-from-Osc-Discussion.zip: everything that looked like an oscillator pattern on 2 May 2020; (1.13 MiB) Downloaded 164 times

-- according to this boiled-down version of the get-all-patterns code:

Code: Select all

# get-all-patterns.py, version 1.618033
# Version 1: initial working version
# Version 1.618033: hacked version specific to Oscillator Discussion thread

import golly as g
import urllib2
import re
import os

url = "https://conwaylife.com/forums/viewtopic.php?f=2&t=1437&start="

outfolder = "c:/REPLACE/THIS/WITH/YOUR/PATH/"

maxpost = 1268
ptr =  0

count = 0
addthis = 100000

while ptr<maxpost:  # current most recent post
  g.show("Retrieving data for " + str(ptr+1) + " through " + str(ptr+25))
  try:
    resp = urllib2.urlopen(url + str(ptr))
    html = resp.read()
  except:
    g.note("Got an error on ptr=" + str(ptr))
    continue
  lastindex = 0
  notdone=1
  while notdone:
    g.show("Retrieved html for post page starting with " + str(ptr) + ". Last index = " + str(lastindex))
    index = html.find("<code>",lastindex) + 6
    if index==5: # should really have added 6 separately after checking for -1
      notdone=0
      break
    if html[index] in ['#', 'x', 'b', 'o', '$', '[', '.', 'O', '*']: # skip any code blocks that don't look like RLE or macrocell or ASCII files
      if html[index]=='#':
        if html[index+1] not in ['C','D','N','O']:
          # don't export this one, it's probably a Python script
          lastindex = index
          continue
      index2 = html.find("</code>", index)
    
      if html[index]=="[": # macrocell
        ind1 = html.find("#R",index)
        offset = 2
        ext = ".mc"
      else:
        ind1 = html.find("rule =")
        offset=6
        if ind1==-1:
          ind1 = html.find("rule=")
          offset=5
        ext = ".rle"
        if html[index] in ['.', 'O', '*']:
          ext = ".txt"
      if ind1>-1:
        ind2 = html.find("\n",ind1)
      
      patstr = html[index:index2]
      if patstr[-1]!="\n": patstr+="\n"
      fname = os.path.join(outfolder,"pat" + str(count+addthis)[1:]) + ext
      with open(fname,'w') as f:
        if ext==".mc":
          f.write("[M2] (golly VERSION)\n#C " + url + str(ptr) +"\n"+patstr)
        else:
          f.write("#C " + url + str(ptr) + "\n" + patstr)

        count += 1
    lastindex = index
  ptr+=25

g.note("Total patterns written: " + str(count))

The full get-all-patterns script uses a somewhat different method of retrieving all the patterns from every thread. Basically it starts with the first post and collects all the posts on the page that comes up, then repeats that for every post number up to the newest post.

Doing something like that would allow the comments in the pattern files to point to the specific post containing the pattern. The fastest way to produce this ZIP file was to get rid of the code that hunts down the specific post number, so the URLs in each pattern file just point to the page of 25 posts that contains the pattern. It wouldn't be too hard to add that post-hunting code back in, though, if that would be useful.

MikeP · Post by **MikeP** » May 3rd, 2020, 7:25 am

Fantastic - that's exactly what I wanted to do - thanks very much!

ConwayLife.com

"Posts per page" URL parameter?

"Posts per page" URL parameter?

Re: "Posts per page" URL parameter?

Re: "Posts per page" URL parameter?