Getting arXiv data
For performance reasons, let's access arXiv via its bulk data API. There are some 30k physics preprints per year, so even if we limit ourselves to the last 5 to 10 years, that is still roughly 150k to 300k papers.
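Here is a rough sketch of what a bulk request could look like, assuming the bulk API meant here is arXiv's OAI-PMH interface. It builds the first `ListRecords` request for a date window. The base URL, the `physics` set name and the `arXiv` metadata prefix are assumptions to check against arXiv's OAI documentation, and `list_records_url` is a hypothetical helper.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed OAI-PMH endpoint; verify the current base URL in arXiv's OAI docs.
OAI_BASE = "https://export.arxiv.org/oai2"


def list_records_url(start: date, end: date, set_spec: str = "physics",
                     prefix: str = "arXiv") -> str:
    """Build the first ListRecords request for records datestamped in [start, end].

    Hypothetical helper: set_spec and prefix defaults are assumptions.
    """
    params = {
        "verb": "ListRecords",
        "metadataPrefix": prefix,   # "oai_dc" is the generic OAI-PMH alternative
        "set": set_spec,            # restrict the harvest to one arXiv set
        "from": start.isoformat(),  # OAI-PMH day granularity: YYYY-MM-DD
        "until": end.isoformat(),
    }
    return f"{OAI_BASE}?{urlencode(params)}"
```

For example, `list_records_url(date(2015, 1, 1), date(2024, 12, 31))` targets a ten-year window. Keep in mind that OAI-PMH `from`/`until` filter on each record's datestamp, which tracks the last metadata change rather than the original submission, so such a window will likely also pick up some older papers that were updated recently.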
Relevant links:
- Terms of use (we should be fine on these, but the rate limits need care).
- OAI-PMH API page
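Since the rate limits are the part that needs care, here is a minimal sketch of a polite harvesting loop, again assuming the OAI-PMH endpoint below (verify it on the OAI API page). It follows `resumptionToken`s page by page, sleeps between requests, and retries on HTTP 503, which is how OAI-PMH servers conventionally signal "come back later" through a `Retry-After` header. The 3-second delay, the retry count and the helper names `fetch` and `harvest` are illustrative assumptions, not documented arXiv values.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
OAI_BASE = "https://export.arxiv.org/oai2"  # assumption: check the OAI API page


def fetch(url: str, max_tries: int = 5) -> bytes:
    """GET a URL, waiting and retrying when the server answers 503."""
    for _ in range(max_tries):
        try:
            with urlopen(url, timeout=60) as resp:
                return resp.read()
        except HTTPError as err:
            if err.code != 503:
                raise
            # Assumes Retry-After is given in seconds (it may also be a date).
            time.sleep(int(err.headers.get("Retry-After", "10")))
    raise RuntimeError(f"gave up after {max_tries} tries: {url}")


def harvest(first_url: str, get: Callable[[str], bytes] = fetch,
            delay_s: float = 3.0) -> Iterator[ET.Element]:
    """Yield <record> elements, following resumptionTokens until the list ends."""
    url = first_url
    while True:
        root = ET.fromstring(get(url))
        yield from root.iterfind(".//oai:record", OAI_NS)
        token = root.find(".//oai:resumptionToken", OAI_NS)
        # An absent or empty token marks the last page of the list.
        if token is None or not (token.text or "").strip():
            return
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
        url = f"{OAI_BASE}?{urlencode(params)}"
        time.sleep(delay_s)  # pause between pages to stay within rate limits
```

Passing a fake `get` function makes the paging logic testable offline; in real use, `harvest(first_url)` with a first `ListRecords` URL streams records one page at a time, so the whole 5 to 10 year window never has to sit in memory at once.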