Get all CPAN distributions using scroll API for MetaCPAN
This uses the ElasticSearch scroll API to get all CPAN distributions
<https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-request-scroll.html>.

Fixes <librariesio#1961>.
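
The scroll pattern described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `scroll_all`, the injected `fetch` callable, the `example.invalid` host, and the stubbed responses are all hypothetical, and the sketch assumes each response carries `_scroll_id` and `hits.hits` as in the Elasticsearch scroll documentation linked above.

```ruby
require "uri"

# Hypothetical helper, not part of the commit. `fetch` is any callable that
# takes a URL and returns the parsed JSON response as a Hash.
def scroll_all(base_url, first_path, fetch)
  # The first request opens the scroll and already returns the first batch.
  page = fetch.call("#{base_url}#{first_path}")
  hits = []
  until page["hits"]["hits"].empty?
    hits.concat(page["hits"]["hits"])
    # Use the most recent scroll id for the next request.
    scroll_id = URI.encode_www_form_component(page["_scroll_id"])
    page = fetch.call("#{base_url}/_search/scroll?scroll=1m&scroll_id=#{scroll_id}")
  end
  hits
end

# Stubbed responses standing in for the remote API (made-up data).
pages = [
  { "_scroll_id" => "abc", "hits" => { "hits" => [{ "fields" => { "distribution" => ["Foo"] } }] } },
  { "_scroll_id" => "abc", "hits" => { "hits" => [{ "fields" => { "distribution" => ["Bar"] } }] } },
  { "_scroll_id" => "abc", "hits" => { "hits" => [] } }
]
requested = []
fake_fetch = lambda do |url|
  requested << url
  pages.shift
end

hits = scroll_all("https://example.invalid/v1", "/release/_search?scroll=1m&size=1", fake_fetch)
```

Passing the fetcher in as an argument keeps the loop testable without network access. The loop stops on the first empty batch, which is how the scroll API signals that the result set is exhausted.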
zmughal committed May 18, 2021
1 parent 03154bb commit e737a90
Showing 1 changed file with 9 additions and 6 deletions.
15 changes: 9 additions & 6 deletions app/models/package_manager/cpan.rb
@@ -13,16 +13,19 @@ def self.package_link(project, _version = nil)
     end
 
     def self.project_names
-      page = 1
       projects = []
+      size = 5000
+      scroll_start_r = get("https://fastapi.metacpan.org/v1/release/_search?scroll=1m&size=#{size}&q=status:latest&fields=distribution")
+      projects += scroll_start_r["hits"]["hits"]
+      scroll_id = scroll_start_r['_scroll_id']
       loop do
-        r = get("https://fastapi.metacpan.org/v1/release/_search?q=status:latest&fields=distribution&sort=date:desc&size=5000&from=#{page * 5000}")["hits"]["hits"]
-        break if r == []
+        r = get("https://fastapi.metacpan.org/v1/_search/scroll?scroll=1m&scroll_id=#{ scroll_id }")
+        break if r["hits"]["hits"] == []
 
-        projects += r
-        page += 1
+        projects += r["hits"]["hits"]
+        scroll_id = r['_scroll_id']
       end
-      projects.map { |project| project["fields"]["distribution"] }.uniq
+      projects.map { |project| project["fields"]["distribution"] }.flatten.uniq
     end
 
     def self.recent_names
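
The last changed line of `project_names` adds `.flatten` before `.uniq`. A small sketch of what that does, using made-up hits and assuming each `distribution` value arrives wrapped in a one-element array (the response shape is an assumption here, not something the commit states):

```ruby
# Made-up hits; assumes each "distribution" value is a one-element array.
hits = [
  { "fields" => { "distribution" => ["Moose"] } },
  { "fields" => { "distribution" => ["Plack"] } },
  { "fields" => { "distribution" => ["Moose"] } }
]

# Without flatten, the result is an array of one-element arrays.
nested = hits.map { |hit| hit["fields"]["distribution"] }

# Flattening first yields plain strings, which uniq then deduplicates.
names = nested.flatten.uniq
```

Under that assumption, `names` is a flat list of distribution name strings, whereas `nested.uniq` alone would still return nested arrays.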