ao3downloader (a script by nianeyna)
What is this?
This is a program intended to help you download fanfiction from the Archive of Our Own in bulk. This program is primarily intended to work with links to the Archive of Our Own itself, but has a secondary function of downloading any Pinboard bookmarks that link to the Archive of Our Own. You can ignore the Pinboard functionality if you don't know what Pinboard is or don't use Pinboard.
Quick Start
Dislike reading? Go directly to either Windows install or Mac and Linux install.
Table of Contents
- Announcements: List of changes that may be of note for returning users (not a complete changelog).
- Instructions: How to install and run ao3downloader.
- Menu Options: Explanation of the options you will see when you start ao3downloader and what they do. Note that most of these options will in turn present you with a series of prompts. These should largely be self-explanatory, however, if you are confused by any of the prompts your question may be answered in the notes.
- Notes: Explanation of some of ao3downloader's features and quirks that may not be immediately obvious. I recommend reading this.
- Known Issues: List of bugs that I know about but haven't yet been able to fix. If you encounter strange behavior, there may be a workaround here.
- Troubleshooting: If you encounter a problem running the script, please read this section carefully and do all of the steps in order to the best of your ability before sending a bug report.
- Contact: How to get in contact with me. Don't be shy!
Announcements
New, easier installation instructions are here! You no longer need to install python manually (no more version worries) or unzip any folders or any of that stuff. Just download and run one file. Plus, it can detect updates to the script and install them for you, so you don't have to check back here for updates anymore!
Instructions
Install Automatically
For the easiest experience, click on one of these links based on your operating system:
This will take you to a page that contains a downloadable installation script and some instructions for how to use it. (They're very simple instructions, I promise.)
Install With PyPi
If you know what you're about and don't care for install scripts, you can go directly to the ao3downloader PyPi package which is available here.
Menu Options Explanation
- 'download from ao3 link' - this works for most links to ao3. for example, you can use this to download a single work, a series, or any ao3 page that contains links to works or series (such as your bookmarks or an author's works). the program will download multiple pages automatically without the need to enter the next page link manually.
- 'get all work links from an ao3 listing (saves links only)' - instead of downloading works, this will simply get a list of all the work links on the page you specify (as well as subsequent pages) and save them in a .txt file inside the downloads folder (one link on each line). this is useful if you prefer to download fics through FanFicFare or some other method, rather than using the ao3 download buttons. this option is much, much faster than a full download - usually only a few seconds per page. when using this option you can also choose to download a csv (spreadsheet) file containing detailed work metadata, instead of a plain text file containing links only. Should you need to cancel a links download in the middle, please do so by pressing ctrl+c before closing the window - this will allow the script to save the metadata it has collected so far, so that you don't have to completely start over when/if you choose to resume.
- 'download links from file' - allows downloading links from a text file with one work or series link on each line. good if you have already harvested the links you want to download via some other method.
- 'download latest version of incomplete fics' - you can use this to check a folder on your computer (and any subfolders) for files downloaded from ao3 that are incomplete works. for each incomplete fic found, the program will check ao3 to see if there are any new chapters, and if so, will download the new version to the downloads folder.
- 'download missing fics from series' - checks for files downloaded from ao3 that are part of a series, and for each series found, checks the series page on ao3 and downloads any fics in the series that are not already in your library.
- 're-download fics saved in one format in a different format' - checks for all files downloaded from ao3 and redownloads every fic it finds (if possible - failed downloads due to deletion or other reasons will be logged). good if you change your mind about what format you want your library to be in. (file type choices for this option are not saved to settings.)
- 'download marked for later list and mark all as read (requires login)' - for those who like to use their marked for later as a download queue, this option takes the headache out of clearing the list after a download. note that this option does not generate 'starting page x' notifications in the console, but will still download all pages.
- 'download bookmarks from pinboard' - download ao3 bookmarks from pinboard. ignore this if you don't use pinboard. to get the api token go to settings -> password on the pinboard website.
- 'convert logfile into interactable html' - all downloads from ao3 (and some other actions) are logged in a file called log.jsonl in the 'logs' folder (if this folder does not exist it means no logs have been generated yet), along with information such as whether or not the download was successful, details about errors encountered, and so on. this option converts log.jsonl into a much more human-readable, searchable and sortable (click on the column headers to sort) html file that can be opened in any browser. the file is called 'logvisualization.html' (filename will also include some numbers indicating the timestamps of the first and last log messages it contains) and is saved in the same place as log.jsonl. If your log file is particularly large, it may get split up across several html files. Note that the searching and sorting functionality (searchbox, filters, etc) may take some time to load in after the page opens. (If it never loads, you can try refreshing the page in your browser.)
- 'configure ignore list (list of links to never try to download)' - creates (if it does not already exist) a file in the main script folder which allows you to specify links to works or series that you never want the script to attempt to download. particularly good if the work or series update option is perpetually grabbing junk you don't want. this option also gives you a chance to auto-add links to the ignore list if they were previously tagged in the log file as failed downloads due to deletion.
Notes
- IMPORTANT: some of your input choices are saved in a file called data.json. In some cases you will not be able to change these choices unless you clear your settings by deleting data.json (or editing it, if you are comfortable with json). In addition, please note that saved settings include passwords and keys and are saved in plain text. Use appropriate caution with this file.
- You may change certain behaviors of the script by editing the file settings.ini. Some of the current configurable options are:
- Whether the script should save your password - if set to 'false', you will need to re-enter your password every time you log in via the script. (Defaults to false.)
- How many seconds to pause between requests to Ao3 - the default is 0 seconds, which means that pauses will only be initiated when Ao3 requests them. Normally you should not need to adjust this, but it can be useful if you are running into odd behavior related to the rate limit.
- The file naming pattern to use. For most people ao3downloader's default file names should work fine, but if you don't like them, you can change that here.
- The purpose of entering your ao3 login information is to download archive-locked works or anything else that is not visible when you are not logged in. If you don't care about that, there is no need to enter your login information.
- Ao3 limits the number of requests a single user can make to the site in a given time period. When this limit is reached, the script will pause for the amount of time (usually a few minutes) that Ao3 requests. When this happens, the start time, end time, and length of the pause in seconds will be printed to the console. If you try to access Ao3 from your browser during this period, you will see a "Retry later" message. Don't be alarmed by this - it's normal, and you aren't in trouble. Simply wait for the specified amount of time and then refresh the page. Other than during these required pauses, you can use Ao3 as normal while the script is running.
- If you choose to 'get works from all encountered series links' then if the script encounters a work that is part of a series, it will also download the entire series that the work is a part of. This can dramatically extend the amount of time the script takes to run. If you don't want this, choose 'n' when you get this prompt. (Series that you have bookmarked directly will always be fully downloaded, regardless of what you choose here.)
- If you choose to 'download embedded images' the script will look for image links on all works it downloads and attempt to save those images to an 'images' subfolder. Images will be titled with the name of the fic + 'imgxxx' to distinguish them.
- Note that this feature does not encode any association between the downloaded images and the fic file aside from the file name.
- Most file formats will include embedded image files anyway, regardless of whether you choose this option. I have confirmed this for PDF, EPUB, MOBI, and AZW3 file formats. (If you saw me contradict this in an earlier version of this readme... no you didn't)
- Should an image download fail, the details of the failure will be logged in the log file with the message 'Problem getting image' along with the work link and the image link. It's a good idea to check the log file for these messages, since you may still be able to download the image manually or track it down some other way.
- If you need to stop a download in the middle, you can just close the window. When you restart the script:
- If you are using the option 'download from ao3 link', you will be given an option to restart the download from the page you left off on. The program will attempt to avoid re-downloading works that are already in the downloads folder.
- If you are using the option 'download bookmarks from pinboard' or 're-download fics saved in one format in a different format', the list of fics to download will be retrieved as normal but will then be filtered to remove work links that meet the following conditions:
- A record of a download attempt for that link is present in the log file AND
- There is a fic with the same title already in the downloads folder OR
- The download was marked as unsuccessful
- A record of a download attempt for that link is present in the log file AND
- If you are using the option 'download latest version of incomplete fics' or 'download missing fics from series', just make sure to add any fics you don't want to download again to your library (that is, the folder you entered when prompted 'input path to folder containing files you want to check for updates') and clean up any old versions before re-starting the download.
- Most methods of avoiding repeat downloads rely on a file called log.jsonl which is generated by the script. Make sure not to move, delete, or modify log.jsonl if you want these features to work. (Using the option to generate the log visualization file is fine.)
- When checking for incomplete fics, the code makes certain assumptions about how fic files are formatted. I have tried to make this logic as flexible as possible, but there is still some possibility that not all incomplete fics will be properly identified by the updater, especially if the files are old (since ao3 may have made changes to how they format fics for download over time) or have been edited.
- Custom work skins are not preserved in downloaded files. I don't currently have a way around that, however, when a work is downloaded the log entry for the download will contain a column (called 'workskin') indicating whether the work had a custom skin or not, so you can at least know which fics are in danger of looking garbled.
- The reason the installation instructions are on a separate site is because I didn't want to have to explain how to download the install scripts from github. The install scripts themselves (as well as the complete source code for the instructions site) are hosted in this repository. You can find them at
site/public/install
.
Known Issues
- The script will enter an infinite loop if you give it a link to an ao3 page that contains links to works or series, but does not support multiple pages of results. The most common example of this is user dashboard pages. (To download an author's works, make sure to put in the link to their "works" page, and not their "dashboard" page.) If you get into an infinite loop, you can simply close the window to get out of it.
- When downloading missing fics from series, if you are logged in, and the downloader finds a link to a series that is inaccessible because you do not have permission to access the series page, the downloader will download all of the works linked on your user dashboard page, instead. Yes... really.
- Links containing more than 4095 characters may cause issues on Mac and Linux. To work around this (on Mac and Linux only!) enter
stty -icanon
into your terminal before running ao3downloader. When you are finished running ao3downloader, enterstty icanon
to restore the default behavior. H/t github user verotheelf for this workaround. - Links containing more than 8191 characters will cause problems on Windows. There is no workaround, other than using a different link. Thankfully, it is unlikely you will run into this problem, as 8191 characters is quite a lot.
Troubleshooting
- If you are able to create logvisualization.html (menu option 'v'), take a look through the logs to see if there are any helpful error messages.
- There's nothing else here right now! The new install process is so new that I don't yet know in which exciting ways it might break. If you encounter problems, please see below for how to send me a bug report.
Questions? Comments? Bug reports?
Feel free to head over to the discussion board and make a post, or create an issue. I prefer to communicate through the above channels if possible, however I understand many of my users don't have github accounts and may not want to make one just for this, so you can also email me at nianeyna@gmail.com if you prefer. Please include "ao3downloader" in the subject line of emails about the downloader. If you are reporting a bug, please describe exactly what you did to make the bug happen to the best of your ability. (More is more! Be as detailed as possible.)
(Please note that while I will absolutely do my best to get back to you, I can't make any promises - I have a job, etc.)