Get all CPAN distributions using scroll API for MetaCPAN
This uses the Elasticsearch scroll API to get all CPAN distributions; see
<https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-request-scroll.html>.

Fixes <librariesio#1961>.
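
The scroll pattern described above can be sketched in isolation as follows. This is a minimal illustration rather than the code in this commit: `scroll_all`, `distribution_names`, the `fetch` callable, and the canned pages are hypothetical names introduced here, and the response shape (`_scroll_id` plus `hits.hits`) is assumed from the fields the diff reads.

```ruby
# Drain a scroll cursor. `fetch` is a hypothetical callable: called with nil
# it returns the first page, called with a scroll id it returns the next one.
# Each page is assumed to look like
#   { "_scroll_id" => "...", "hits" => { "hits" => [...] } }
def scroll_all(fetch)
  hits = []
  page = fetch.call(nil)
  # An empty batch of hits signals that the cursor is exhausted.
  until page["hits"]["hits"].empty?
    hits.concat(page["hits"]["hits"])
    # Always continue with the scroll id from the most recent response.
    page = fetch.call(page["_scroll_id"])
  end
  hits
end

# Field values may arrive as a string or as an array, so normalise with
# Array() before flattening and de-duplicating.
def distribution_names(hits)
  hits.flat_map { |hit| Array(hit["fields"]["distribution"]) }.uniq
end

# Canned pages standing in for HTTP responses (illustrative data only).
pages = {
  nil => { "_scroll_id" => "a", "hits" => { "hits" => [
    { "fields" => { "distribution" => ["Moose"] } },
    { "fields" => { "distribution" => ["Plack"] } }
  ] } },
  "a" => { "_scroll_id" => "b", "hits" => { "hits" => [
    { "fields" => { "distribution" => ["Moose"] } }
  ] } },
  "b" => { "_scroll_id" => "b", "hits" => { "hits" => [] } }
}
fake_fetch = ->(scroll_id) { pages.fetch(scroll_id) }

p distribution_names(scroll_all(fake_fetch)) # => ["Moose", "Plack"]
```

Passing the fetcher in as a callable keeps the loop testable without network access; in real use it would wrap whatever HTTP helper the caller already has.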
zmughal committed May 30, 2021
1 parent 82926d2 commit 64fed36
Showing 1 changed file with 10 additions and 6 deletions.
16 changes: 10 additions & 6 deletions app/models/package_manager/cpan.rb
@@ -13,16 +13,20 @@ def self.package_link(project, _version = nil)
     end
 
     def self.project_names
-      page = 1
       projects = []
+      size = 5000
+      time = '1m'
+      scroll_start_r = get("https://fastapi.metacpan.org/v1/release/_search?scroll=#{time}&size=#{size}&q=status:latest&fields=distribution")
+      projects += scroll_start_r["hits"]["hits"]
+      scroll_id = scroll_start_r['_scroll_id']
       loop do
-        r = get("https://fastapi.metacpan.org/v1/release/_search?q=status:latest&fields=distribution&sort=date:desc&size=5000&from=#{page * 5000}")["hits"]["hits"]
-        break if r == []
+        r = get("https://fastapi.metacpan.org/v1/_search/scroll?scroll=#{time}&scroll_id=#{scroll_id}")
+        break if r["hits"]["hits"] == []
 
-        projects += r
-        page += 1
+        projects += r["hits"]["hits"]
+        scroll_id = r['_scroll_id']
       end
-      projects.map { |project| project["fields"]["distribution"] }.uniq
+      projects.map { |project| project["fields"]["distribution"] }.flatten.uniq
     end
 
     def self.recent_names