Gary dos Santos d40fff5dd0 Created efault ini file 11 months ago
README.md initial commit 11 months ago
database.ini.default Created efault ini file 11 months ago
database.py Removed backticks in sql 11 months ago
dokument.py initial commit 11 months ago
download.py initial commit 11 months ago
requirements.txt initial commit 11 months ago
riksdagen.db initial commit 11 months ago
riksdagen.py initial commit 11 months ago

README.md

Riksdagen scraper

This program downloads debates from riksdagen.se along with their metadata.

Prerequisites

Install the Python 3 requirements:

pip3 install -r requirements.txt

Usage

All available arguments

$ python3 riksdagen.py -h
usage: riksdagen.py [-h] [--start START] [--end END] [--update UPDATE]
                    [-o OUTPUT]

Archive debates from riksdagen.se

optional arguments:
  -h, --help                    show this help message and exit
  --start START                 Starting page
  --end END                     Ending page
  --update UPDATE               Year (2XXX) or all
  -o OUTPUT, --output OUTPUT    Set download directory
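
For readers who want to see how the options fit together, here is a minimal sketch of an argparse parser that would accept the same flags as the help text above. It is not taken from riksdagen.py: the function name build_parser is hypothetical, and parsing --start and --end as integers is an assumption based on the page numbers described in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags listed in the help output.

    Hypothetical reconstruction; the real script may differ.
    """
    parser = argparse.ArgumentParser(
        prog="riksdagen.py",
        description="Archive debates from riksdagen.se",
    )
    # Assumption: page numbers are parsed as integers.
    parser.add_argument("--start", type=int, help="Starting page")
    parser.add_argument("--end", type=int, help="Ending page")
    # Kept as a string because the value is either a year or "all".
    parser.add_argument("--update", help="Year (2XXX) or all")
    parser.add_argument("-o", "--output", help="Set download directory")
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["--update", "2019", "--start", "3"])
print(args.update, args.start, args.end, args.output)
```

Options that are not given default to None, so a script built this way can tell "not set" apart from any real value.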

Download all available programs

Note that this takes a considerable amount of time: the script must iterate through every page of every year to collect the data it needs before the video downloads can start.

$ python3 riksdagen.py --update all

Download from a specific year

This downloads debates from the end of 2019 to the beginning of 2020. To start or end on a specific page, use --start <int> or --end <int> respectively, replacing <int> with the desired page number.

$ python3 riksdagen.py --update 2019
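
To archive several years in one go, the command above can be wrapped in a small driver script. The sketch below only uses the flags documented in this README. The helper names build_command and archive_years are hypothetical, and it assumes that riksdagen.py is in the current directory and accepts --start and --end together with --update.

```python
import subprocess
import sys
from typing import Iterable, List, Optional


def build_command(year, start: Optional[int] = None,
                  end: Optional[int] = None,
                  output: Optional[str] = None) -> List[str]:
    """Assemble the riksdagen.py command line for one year (or "all")."""
    cmd = [sys.executable, "riksdagen.py", "--update", str(year)]
    if start is not None:
        cmd += ["--start", str(start)]
    if end is not None:
        cmd += ["--end", str(end)]
    if output is not None:
        cmd += ["--output", output]
    return cmd


def archive_years(years: Iterable, output: Optional[str] = None) -> None:
    """Run the scraper once per year, stopping at the first failure."""
    for year in years:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_command(year, output=output), check=True)
```

For example, archive_years([2018, 2019], output="debates") would run the scraper for 2018 and then 2019, passing the same download directory to both runs.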

Download all documents

This downloads all available documents in HTML and SQL form from http://data.riksdagen.se/data/dokument/ and sorts them into folders.

$ python3 dokument.py

Optional parameters for dokument.py

$ python3 dokument.py -h
usage: dokument.py [-h] [-o OUTPUT] [-p PROCESSES]

Archive documents from riksdagen.se

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Set download directory
  -p PROCESSES, --processes PROCESSES
                        Set number of concurrent downloads
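
The idea behind the -p option, a fixed cap on the number of simultaneous downloads, can be illustrated with a short sketch. This is not dokument.py's implementation: it uses a thread pool from the standard library as a stand-in, whereas the option name suggests the script itself may use processes. The function download_all and its fetch parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List


def download_all(urls: Iterable[str], output_dir: str, workers: int,
                 fetch: Callable[[str], bytes]) -> List[Path]:
    """Save every URL under output_dir with at most `workers` in flight.

    `fetch` is injected so the sketch stays independent of any HTTP library.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    def save(url: str) -> Path:
        # Name the file after the last path segment of the URL.
        target = out / url.rstrip("/").rsplit("/", 1)[-1]
        target.write_bytes(fetch(url))
        return target

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(save, urls))
```

A real fetch function could be as simple as lambda url: urllib.request.urlopen(url).read(). Raising the worker count shortens the total run time only until the network or the server becomes the bottleneck.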