Is there an URL parameter I can set to display more posts on each page? Ideally, all of them.
I'd like to download the entire Oscillator Discussion Thread and try to extract all the oscillators automatically.
"Posts per page" URL parameter?
Re: "Posts per page" URL parameter?
I don't know of one. However, here's some code on GitHub that's pretty easy to adapt to collect HTML for all the posts.
Here are the contents of all the code blocks in the Oscillator Discussion Thread
-- according to this boiled-down version of the get-all-patterns code:
Code: Select all
# get-all-patterns.py, version 1.618033
# Version 1: initial working version
# Version 1.618033: hacked version specific to Oscillator Discussion thread
import golly as g
import urllib2
import re
import os
url = "https://conwaylife.com/forums/viewtopic.php?f=2&t=1437&start="
outfolder = "c:/REPLACE/THIS/WITH/YOUR/PATH/"
maxpost = 1268
ptr = 0
count = 0
addthis = 100000
while ptr<maxpost: # current most recent post
g.show("Retrieving data for " + str(ptr+1) + " through " + str(ptr+25))
try:
resp = urllib2.urlopen(url + str(ptr))
html = resp.read()
except:
g.note("Got an error on ptr=" + str(ptr))
continue
lastindex = 0
notdone=1
while notdone:
g.show("Retrieved html for post page starting with " + str(ptr) + ". Last index = " + str(lastindex))
index = html.find("<code>",lastindex) + 6
if index==5: # should really have added 6 separately after checking for -1
notdone=0
break
if html[index] in ['#', 'x', 'b', 'o', '$', '[', '.', 'O', '*']: # skip any code blocks that don't look like RLE or macrocell or ASCII files
if html[index]=='#':
if html[index+1] not in ['C','D','N','O']:
# don't export this one, it's probably a Python script
lastindex = index
continue
index2 = html.find("</code>", index)
if html[index]=="[": # macrocell
ind1 = html.find("#R",index)
offset = 2
ext = ".mc"
else:
ind1 = html.find("rule =")
offset=6
if ind1==-1:
ind1 = html.find("rule=")
offset=5
ext = ".rle"
if html[index] in ['.', 'O', '*']:
ext = ".txt"
if ind1>-1:
ind2 = html.find("\n",ind1)
patstr = html[index:index2]
if patstr[-1]!="\n": patstr+="\n"
fname = os.path.join(outfolder,"pat" + str(count+addthis)[1:]) + ext
with open(fname,'w') as f:
if ext==".mc":
f.write("[M2] (golly VERSION)\n#C " + url + str(ptr) +"\n"+patstr)
else:
f.write("#C " + url + str(ptr) + "\n" + patstr)
count += 1
lastindex = index
ptr+=25
g.note("Total patterns written: " + str(count))
Doing something like that would allow the comments in the pattern files to point to the specific post containing the pattern. The fastest way to produce this ZIP file was to get rid of the code that hunts down the specific post number, so the URLs in each pattern file just point to the page of 25 posts that contains the pattern. It wouldn't be too hard to add that post-hunting code back in, though, if that would be useful.
Re: "Posts per page" URL parameter?
Fantastic - that's exactly what I wanted to do - thanks very much!