444 add clustering methods for npi data #883

Open: wants to merge 34 commits into main

mknaranja
Member

Changes and Information

Please briefly list the changes made, additional Information and what the Reviewer should look out for:

  • Add clustering for NPI data set

Merge Request - Guideline Checklist

Please check our git workflow. Use the draft feature if the Pull Request is not yet ready to review.

Checks by code author

  • Every addressed issue is linked (use the "Closes #ISSUE" keyword below)
  • New code adheres to coding guidelines
  • No large data files have been added (in total, files should not exceed 100 KB; avoid PDFs, Word docs, etc.)
  • Tests are added for new functionality and a local test run was successful
  • Appropriate documentation for new functionality has been added (Doxygen in the code and Markdown files if necessary)
  • Proper attention to licenses, especially no new third-party software with conflicting license has been added
  • (For ABM development) Checked benchmark results, and ran a local test before and after development and posted the results above to ensure performance is monitored.

Checks by code reviewer(s)

  • Corresponding issue(s) is/are linked and addressed
  • Code is clean of development artifacts (no deactivated or commented code lines, no debugging printouts, etc.)
  • Appropriate unit tests have been added, CI passes, and code coverage and performance are acceptable (did not decrease)
  • No large data files have been added in the whole history of commits (in total, files should not exceed 100 KB; avoid PDFs, Word docs, etc.)

Closes #444

mknaranja linked an issue on Jan 5, 2024 that may be closed by this pull request
Comment on lines +116 to +118
# NOTE: if changing method, pay attention to linkage methods;
# 'centroid', 'median', and 'ward' are correctly defined only if
# Euclidean pairwise metric is used in distance matrix that we used as input.
Member Author

Does this still hold? For 'ward', I don't see this directly at all; for the others, only the meaning changes.
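The NOTE appears to mirror the caveat in SciPy's `scipy.cluster.hierarchy.linkage` documentation. As a minimal sketch, assuming the clustering is built on SciPy's `pdist`/`linkage` (the toy data below is hypothetical, not the NPI data set), the following shows that `'ward'` on a condensed Euclidean distance matrix yields the same tree as passing the raw observations, while a condensed matrix built with a non-Euclidean metric such as `'cityblock'` is accepted without error. When a precomputed condensed matrix is passed, SciPy cannot check which metric produced it, so the tree is still built but loses the minimum-variance interpretation of Ward's method.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical toy data: 10 observations in 3 features, two well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(5, 3)),
               rng.normal(1.0, 0.1, size=(5, 3))])

# Condensed Euclidean distances: the metric for which 'centroid',
# 'median' and 'ward' are well defined.
d_euclid = pdist(X, metric='euclidean')
Z_ward = linkage(d_euclid, method='ward')

# Passing raw observations gives the same tree, because linkage then
# computes Euclidean distances internally.
Z_raw = linkage(X, method='ward')

# A precomputed non-Euclidean matrix is accepted silently: a tree is built,
# but Ward's minimum-variance interpretation no longer applies.
d_city = pdist(X, metric='cityblock')
Z_city = linkage(d_city, method='ward')

# Cut the Euclidean Ward tree into two flat clusters (labels are 1 and 2).
labels = fcluster(Z_ward, t=2, criterion='maxclust')
```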

all_subcodes = [x for x in npis.NPI_code if len(x) <= 8]
if not npi_codes_considered:
    npi_codes_considered = [
        x for x in npis[dd.EngEng['npiCode']] if len(x.split('_')) == 2]
Member Author

len == 3?

Member

I think 2 is the right option here.
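The `len(x.split('_')) == 2` filter keeps codes with exactly one underscore. A minimal sketch with hypothetical code strings (the real NPI codes may be formatted differently) shows what count-based filters select and compares them with the length-based `len(x) <= 8` filter from the same snippet; `codes_with_n_parts` is a hypothetical helper, not part of the PR.

```python
# Hypothetical NPI codes; the real codes in the data set may differ.
codes = ["M01a", "M01a_010", "M01a_020", "M01a_010_2", "M02b_030_4"]


def codes_with_n_parts(codes, n):
    """Return the codes that split into exactly n parts on '_'."""
    return [c for c in codes if len(c.split('_')) == n]


main_codes = codes_with_n_parts(codes, 1)  # ['M01a']
subcodes = codes_with_n_parts(codes, 2)    # ['M01a_010', 'M01a_020']
refined = codes_with_n_parts(codes, 3)     # ['M01a_010_2', 'M02b_030_4']

# Length-based filter from the snippet, for comparison.
short_codes = [c for c in codes if len(c) <= 8]  # ['M01a', 'M01a_010', 'M01a_020']
```

On these example strings, the length filter also keeps the one-part code `M01a`, so it is not equivalent to the `== 2` filter.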


codecov bot commented Mar 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.34%. Comparing base (8ddb863) to head (d3d1d5c).
Report is 23 commits behind head on main.

Current head d3d1d5c differs from pull request most recent head c2fab29

Please upload reports for the commit c2fab29 to get more accurate results.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #883   +/-   ##
=======================================
  Coverage   96.34%   96.34%           
=======================================
  Files         129      129           
  Lines       10056    10056           
=======================================
  Hits         9688     9688           
  Misses        368      368           

Development

Successfully merging this pull request may close these issues:

  • Add clustering methods for NPI data
3 participants