
How to query arXiv for a specific year?

I'm using the code shown below to retrieve papers from arXiv. I want to retrieve papers that have the words "machine" and "learning" in the title. The number of matching papers is large, so I want to slice the results by publication year (the published field).

How can I request only the records from 2020 and 2019 through search_query? Please note that I'm not interested in post-filtering.

import urllib.parse
import urllib.request
import time
import feedparser

# Base API query URL
base_url = 'http://export.arxiv.org/api/query?'

# Search parameters
search_query = urllib.parse.quote("ti:machine learning")
start = 0
total_results = 5000
results_per_iteration = 1000
wait_time = 3

papers = []

print('Searching arXiv for %s' % search_query)

for i in range(start, total_results, results_per_iteration):
    print("Results %i - %i" % (i, i + results_per_iteration))

    query = 'search_query=%s&start=%i&max_results=%i' % (search_query,
                                                         i,
                                                         results_per_iteration)

    # Perform a GET request using the base_url and query
    response = urllib.request.urlopen(base_url + query).read()

    # Parse the response using feedparser
    feed = feedparser.parse(response)

    # Run through each entry and collect its information
    for entry in feed.entries:
        #print('arxiv-id: %s' % entry.id.split('/abs/')[-1])
        #print('Title:  %s' % entry.title)
        # feedparser v4.1 only grabs the first author
        #print('First Author:  %s' % entry.author)
        paper = {}
        paper["date"] = entry.published
        paper["title"] = entry.title
        paper["first_author"] = entry.author
        paper["summary"] = entry.summary
        papers.append(paper)

    # Sleep a bit before calling the API again
    print('Bulk: %i' % 1)
    time.sleep(wait_time)
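For reference, here is a rough sketch of the kind of query I have in mind. It assumes the submittedDate:[YYYYMMDDHHMM TO YYYYMMDDHHMM] range filter described in the arXiv API user manual can be combined with the title terms using AND. I have not confirmed that it lines up exactly with the published field of each entry. The helper name build_year_query and its parameters are hypothetical, and the function only builds the URL without calling the API.

```python
import urllib.parse

BASE_URL = 'http://export.arxiv.org/api/query?'


def build_year_query(terms, year, start=0, max_results=1000):
    """Build an arXiv API URL restricted to one submission year.

    Hypothetical helper: assumes the submittedDate range filter
    accepts bounds in YYYYMMDDHHMM form (GMT).
    """
    # Require every term to appear in the title
    title_part = ' AND '.join('ti:%s' % term for term in terms)
    # Cover the whole year, from Jan 1 00:00 to Dec 31 23:59
    date_part = 'submittedDate:[%d01010000 TO %d12312359]' % (year, year)
    search_query = '(%s) AND %s' % (title_part, date_part)
    params = {
        'search_query': search_query,
        'start': start,
        'max_results': max_results,
    }
    # urlencode percent-encodes the brackets, colons and spaces
    return BASE_URL + urllib.parse.urlencode(params)


# One URL per year, to be fetched with the same paging loop as above
urls = [build_year_query(['machine', 'learning'], year) for year in (2020, 2019)]
```

Each URL could then be passed to urllib.request.urlopen in place of base_url+query, with start advanced in steps of results_per_iteration for every year separately.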
