Speedup ideas:
- if patch fails, try parsing without patch, then erase patch if that works
- make it easier to erase patch and force reparse - special patchtool command line option
- Add auto-updating ministerships, so the date ranges for the Solicitor General et al. don't have to be moved manually (see member-aliases.xml)
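
The first idea above (fall back to the unpatched text when a patch no longer applies) could be sketched roughly as follows. This is only an illustration, not the existing patchtool behaviour: PatchError, apply_patch, parse and parse_with_patch_fallback are all hypothetical names, and the caller is assumed to be the one that actually deletes the patch file.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


class PatchError(Exception):
    """Hypothetical error: the stored patch no longer applies to the page."""


def parse_with_patch_fallback(
    raw_text: str,
    apply_patch: Callable[[str], str],
    parse: Callable[[str], T],
) -> Tuple[T, bool]:
    """Parse raw_text, preferring the patched version.

    Returns (result, patch_is_stale). patch_is_stale is True only when the
    patch failed to apply AND the unpatched text parsed cleanly, which is
    the case where the patch can safely be erased.
    """
    try:
        patched = apply_patch(raw_text)
    except PatchError:
        # Patch did not apply: the source page has probably been corrected.
        # If the raw text still does not parse, the parser's own error
        # propagates and the patch is left alone for manual inspection.
        result = parse(raw_text)
        return result, True
    return parse(patched), False
```

A caller would erase the patch file and log the fact whenever the second element of the returned pair is True.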
These are the errors found each day when running the scraping job.
Out-of-sequence dates are when the rescraper picked up new stuff?
Dates are quite rough, as sometimes I put today's date rather than the error date.
2004-07-23 Advocate General disambiguation by office, date moved forward
2004-07-21 Alias for new MP (Parmjit Gill often includes middle name)
2004-07-21 Missing "." in time.
2004-07-20 Missing column/date at start of day's debate
2004-07-20 "Ivor Caplin)" extra close bracket in name
2004-07-16 Beginning headers, missing center tag
2004-07-15 FONT tag in middle of paragraph
2004-07-14 Missing (1) in wrans
2004-07-13 Terry Davis listed as voting when he had resigned, removed him
2004-07-07 Solicitor-General date range moved
2004-06-?? Missing "To ask the" (twice)
2004-06-?? Space in the middle of name "ji m"
2004-06-28 Missing (1) in first para multi-part wrans
2004-06-28 Spurious in wrans
2004-06-17 "Dame" added to list of titles in resolvemembernames.py
2004-06-16 Gareth Thomas disambiguation of office, date moved forward
2004-06-14 Confusing in table
2004-06-14 Style padding attributes in in a wrans
2004-06-11 Four part question broken into two sections
2004-06-10 Heading between two parts of a single written question
2004-06-08 Jim Marshall voted, when he was dead
2004-06-08 Office of name broken on separate line to partial name
2004-06-08 Solicitor-General date range moved
2004-05-25 Advocate General disambiguation of office, date moved forward
2004-05-21 Entirely missing speaker of wrans answer (fixed with broken-name)
2003-10-27 Totally spurious tag
2003-11-06 Extra 4 times
2003-11-06 Mangled "Opik"
2004-05-20 Missing (1) in first para multi-part wrans
2004-05-18 Missing "." in time stamp "4 58 pm"
2004-05-12 Missing "Robertson:" at end of name, limit to ask the
2004-05-17 Missing "to" in "to ask"
2004-04-29 Missing bold on name
2004-05-04 A couple of missing "to ask the"
2004-05-?? Some names needing tags
2004-05-?? Wrans multiple-questions (with one answer) split by heading
2004-05-06 Warning: pargraph numbers not consecutive
2004-05-06 Missing "to ask the"
2004-05-10 Solicitor-General disambiguation of office, date moved forward
2004-10-20 (page changed) Incorrect date heading
2004-05-12 Gareth Thomas disambiguation of office, date moved forward
2004-04-27 Missing bold formatting on name, some wrans without "to ask" at start of question
2004-04-28 Missing k in "to ask", missing "(1)" in multi-part question
2004-04-27 New entity ÷ which is division symbol
2004-04-26 Bad paragraph number in wrans
2004-04-?? Added alias with hyphen in name
2004-04-23 Word from text got into bold round name
2004-04-20 Missing (1) in two wrans, typo of "M.r", missing bold round name
2004-04-19 House of Commons header merged with next
2004-03-31 Gareth Thomas disambiguation of office, date moved forward
2004-03-19 Slightly misspelt wrans major heading CHURCH COMMISSIONER
2004-03-18 Part name inside bold: Multiple matches Gareth Thomas (Clwyd, West) (Lab)
2004-03-16 Wrans with two answers (although really a messup and the questions should be separate)
2004-03-04 Missing (1) in wrans which appears otherwise fine
2004-03-08 Marginal colnum case (missing tag after colnum)
2004-03-10 Slightly misspelt wrans major heading INTERNATIONAL DEVEOPMENT
2004-03-08 Missing "To ask the secretary of...." on wrans
2004-03-05 'all' rule broke, because there is one extra case (also /UL)
2004-03-05 Missing chunk in middle of wrans, detected because numeral (2) missing
2004-03-04 In the morning, 5 pages were repeated, fixed by afternoon
2004-03-03 Gareth Thomas disambiguation of office, date moved forward
2004-02-26 Mangled column number text: Missing formatting
2004-02-25 Mangled multi-part question: Missing (1) marker
2004-02-25 Mangled "to ask": To the Secretary of State for Environment
2004-02-23 Mangled name: Ms Hazel Blears)
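
Several entries above are a missing "." in a time stamp (for example "4 58 pm"). The entries in this log were fixed by patching the source text. Purely as an illustration of how such stamps could be recognised, here is a minimal sketch; TIME_RE and parse_time are hypothetical names and are not part of the existing parser.

```python
import re
from typing import Optional, Tuple

# Accept "4.58 pm" as well as the broken form "4 58 pm", where the "."
# between hours and minutes has been replaced by whitespace.
TIME_RE = re.compile(r"^\s*(\d{1,2})[.\s]+(\d{2})\s*(am|pm)\s*$", re.IGNORECASE)


def parse_time(stamp: str) -> Optional[Tuple[int, int]]:
    """Return (hour, minute) on a 24-hour clock, or None if unrecognised."""
    match = TIME_RE.match(stamp)
    if match is None:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    if not (1 <= hour <= 12 and minute < 60):
        return None
    # 12 am is midnight (hour 0); 12 pm is noon (hour 12).
    hour24 = hour % 12 + (12 if match.group(3).lower() == "pm" else 0)
    return hour24, minute
```

Anything that does not look like a time returns None, so a caller can still raise an error and fall back to a manual patch.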