HierCC (Hierarchical clustering of cgMLST)

HierCC is a multi-level clustering scheme for population assignments based on core genome Multi-Locus Sequence Types (cgMLSTs). As a standalone Python package, HierCC works with any cgMLST scheme, and it has also been implemented in EnteroBase since 2018.

HierCC is open-source software made available under the GPL-3.0 License.

  • If you use HierCC in work contributing to a scientific publication, we ask that you cite our preprint below:

Zhou Z, Charlesworth J, Achtman M (2020) HierCC: A multi-level clustering scheme for population assignments based on core genome MLST. bioRxiv. DOI: https://doi.org/10.1101/2020.11.25.397539

  • If you use HierCC assignments that are hosted in EnteroBase, we ask that you cite our publication:

Zhou Z, Alikhan NF, Mohamed K, the Agama Study Group, Achtman M (2020) The EnteroBase user's guide, with case studies on Salmonella transmissions, Yersinia pestis phylogeny and Escherichia core genomic diversity. Genome Res. 30:138-152. DOI: https://dx.doi.org/10.1101%2Fgr.251678.119

Installation

  • With Python 3.6 or later, HierCC can be installed or upgraded via pip with a single terminal command:

    pip install HierCC
    pip install --upgrade HierCC
  • HierCC is also made available as an Anaconda package, and can be installed via conda with the following command:

    conda install -c zhemin hiercc

Alternatively, you may wish to download the GitHub repo and install the dependencies yourself as shown below.

Python version

HierCC is currently supported and tested on three Python versions:

  • 3.6
  • 3.7
  • 3.8 (recommended)

Python 3.9 is currently NOT supported, because Numba, one of the libraries that HierCC depends on, is not compatible with Python 3.9. This issue is expected to be resolved in early 2021, according to this thread.
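If you want to check an environment before installing, the version list above can be expressed as a small test. This is only a sketch: the `SUPPORTED` set and the `is_supported` helper are hypothetical names introduced here and are not part of HierCC.

```python
import sys

# (major, minor) pairs that this README lists as supported and tested.
# Python 3.9 is deliberately absent because the README lists it as unsupported.
SUPPORTED = {(3, 6), (3, 7), (3, 8)}


def is_supported(version_info=None):
    """Return True if the interpreter version is one the README lists as tested."""
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) in SUPPORTED


if __name__ == "__main__":
    status = "listed" if is_supported() else "not listed"
    print("Python %d.%d is %s as supported" % (sys.version_info[0], sys.version_info[1], status))
```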

Python libraries

HierCC requires:

Download dataset

A toy dataset of cgMLST profiles is hosted in this repository. It can be downloaded with a single command:

curl -o YERwgMLST.cgMLSTv1.profile.gz https://raw.githubusercontent.com/zheminzhou/HierCC/master/examples/YERwgMLST.cgMLSTv1.profile.gz
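Once downloaded, the profile can be inspected before running HierCC. The sketch below assumes the file is tab-delimited with one header line (an assumption based on the PubMLST-style profiles this README refers to, not something the README states). The helper names `open_profile` and `peek_profile` are hypothetical.

```python
import gzip


def open_profile(path):
    """Open a profile as text, whether or not it is gzip-compressed."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # first two bytes of every gzip stream
        return gzip.open(path, "rt")
    return open(path, "rt")


def peek_profile(path, n_rows=3):
    """Return (header_fields, first n_rows rows), assuming tab-delimited text."""
    with open_profile(path) as fh:
        header = fh.readline().rstrip("\n").split("\t")
        rows = []
        for line in fh:
            if len(rows) >= n_rows:
                break
            rows.append(line.rstrip("\n").split("\t"))
    return header, rows
```

For example, `peek_profile("YERwgMLST.cgMLSTv1.profile.gz")` would return the column names and the first three profiles of the toy dataset, provided the tab-delimited assumption holds for that file.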

Run HierCC

HierCC can be run on the toy dataset with the following command:

HierCC -p YERwgMLST.cgMLSTv1.profile.gz -o YERwgMLST.cgMLSTv1.HierCC

The full usage of HierCC is:

HierCC --help
Usage: HierCC [OPTIONS]

  HierCC takes allelic profile (as in https://pubmlst.org/data/) and work
  out hierarchical clusters of all the profiles based on a minimum-spanning
  tree.

Options:
  -p, --profile TEXT           [INPUT; REQUIRED] name of the profile file. Can
                               be GZIPed.  [required]

  -o, --output TEXT            [OUTPUT; REQUIRED] Prefix for the output files.
                               These include a NUMPY and TEXT verions of the
                               same clustering result  [required]

  -a, --append TEXT            [INPUT; optional] The NUMPY version of an
                               existing HierCC result

  -m, --allowed_missing FLOAT  [INPUT; optional] Allowed proportion of missing
                               genes in pairwise comparison (Default: 0.03).

  -n, --n_proc INTEGER         [INPUT; optional] Number of processors
                               (Default: 4).

  --help                       Show this message and exit.

HierCC inputs

HierCC runs in two modes. 'Development mode' builds multi-level clusters from scratch, whilst 'Production mode' incrementally assigns cluster designations to newly added genomes without changing the cluster assignments of any existing genome. You can find technical details in the Supplementary Text of the bioRxiv preprint.

  • 'Development mode' requires only one file (profile) containing allelic profiles of cgMLST STs, in either plain text or GZIP format. You can find additional examples of the allelic profiles in https://pubmlst.org/data.
  • 'Production mode' requires two input files. In addition to the profile file, this mode requires an NPZ file (via --append) containing a pre-existing multi-level assignment; this file is part of the output (see below) of a previous HierCC run.
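The difference between the two modes comes down to whether `--append` is passed. The sketch below builds the argument list for either mode using only the options shown in the `--help` output above. The `hiercc_command` helper and the file names `new_profiles.gz` and `round2` are hypothetical and used purely for illustration.

```python
import shlex


def hiercc_command(profile, output_prefix, append=None, allowed_missing=None, n_proc=None):
    """Build the argument list for a HierCC run.

    Without `append` the command corresponds to 'Development mode';
    with `append` (an NPZ file from a previous run) to 'Production mode'.
    """
    cmd = ["HierCC", "-p", profile, "-o", output_prefix]
    if append is not None:
        cmd += ["-a", append]
    if allowed_missing is not None:
        cmd += ["-m", str(allowed_missing)]
    if n_proc is not None:
        cmd += ["-n", str(n_proc)]
    return cmd


# Development mode: cluster the toy dataset from scratch.
dev = hiercc_command("YERwgMLST.cgMLSTv1.profile.gz", "YERwgMLST.cgMLSTv1.HierCC")

# Production mode: reuse the NPZ output of the run above (hypothetical new profile file).
prod = hiercc_command("new_profiles.gz", "round2", append="YERwgMLST.cgMLSTv1.HierCC.npz")

print(" ".join(shlex.quote(arg) for arg in dev))
print(" ".join(shlex.quote(arg) for arg in prod))
```

Either list can be handed to `subprocess.run(cmd, check=True)` once HierCC is installed and on the `PATH`.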

HierCC outputs

Both modes of HierCC generate two outputs:

  • <prefix>.npz
  • <prefix>.HierCC.gz

The two outputs contain the same multi-level clustering assignment for every cgMLST ST. The NPZ file is used in production mode, whilst the HierCC.gz file is human-readable. The first three lines of <prefix>.HierCC.gz look like this:

#ST_id  HC0  HC1  HC2  HC3  ...  HC1552  HC1553
1       1    1    1    1    ...  1       1
2       2    2    2    2    ...
      2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2      
 2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2   
    2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2 22       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       2       1       1       1       1       1       1       1

The first column is the cgMLST ST, and the remaining columns are the clustering results, from almost identical (HC0) to completely different.
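As a minimal sketch of how such a table could be loaded for downstream analysis, the snippet below parses tab-delimited text into a list of level names and a mapping from ST to cluster assignments. The function name `read_hiercc`, the header names, and the toy rows are illustrative assumptions, not part of HierCC itself.

```python
import csv
import io

def read_hiercc(handle):
    """Parse tab-delimited HierCC-style text into (levels, {ST: [cluster ids]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    levels = header[1:]                 # every column after the ST column
    assignments = {}
    for row in reader:
        if not row:                     # skip blank lines
            continue
        assignments[row[0]] = [int(x) for x in row[1:]]
    return levels, assignments

# Made-up three-level table, used only to exercise the parser.
toy = "#ST_id\tHC0\tHC1\tHC2\n1\t1\t1\t1\n2\t2\t1\t1\n3\t3\t3\t1\n"
levels, assignments = read_hiercc(io.StringIO(toy))
print(levels, assignments["2"])
```

For a gzipped file, the same function could be given a handle opened with `gzip.open(path, "rt")`.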

Run HCCeval

HCCeval evaluates the thousands of clustering levels generated by HierCC. It identifies potentially biologically meaningful clustering levels, whose ST clusters are less prone to change with modest changes of the cutoff. It can be run on the HierCC results of the toy dataset with the following command:

HCCeval -p YERwgMLST.cgMLSTv1.profile.gz -c YERwgMLST.cgMLSTv1.HierCC.HierCC.gz -o YERwgMLST.cgMLSTv1.HierCC.eval

The full usage of HCCeval is:

$ HCCeval --help
Usage: HCCeval [OPTIONS]

  evalHCC evaluates HierCC results using varied statistic summaries.

Options:
  -p, --profile TEXT      [INPUT; REQUIRED] name of the profile file. Can be
                          GZIPed.  [required]

  -c, --cluster TEXT      [INPUT; REQUIRED] name of the HierCC file. Can be
                          GZIPed.  [required]

  -o, --output TEXT       [OUTPUT; REQUIRED] Prefix for the output files.
                          [required]

  -s, --stepwise INTEGER  [DEFAULT: 10] Evaluate every <stepwise> levels.
  -n, --n_proc INTEGER    [DEFAULT: 4] Number of processors.
  --help                  Show this message and exit.
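If HCCeval needs to be driven from a Python script, one possible approach is to assemble the argument list from the options listed above and pass it to `subprocess`. This is only a sketch: the helper names `hcceval_command` and `run_hcceval` are hypothetical, and it assumes the `HCCeval` executable is on the `PATH`.

```python
import subprocess

def hcceval_command(profile, cluster, output, stepwise=10, n_proc=4):
    """Build the HCCeval argument list using only the documented options."""
    return ["HCCeval",
            "-p", profile,          # allelic profile file
            "-c", cluster,          # HierCC result file
            "-o", output,           # prefix for the output files
            "-s", str(stepwise),    # evaluate every <stepwise> levels
            "-n", str(n_proc)]      # number of processors

def run_hcceval(*args, **kwargs):
    """Run HCCeval; raises CalledProcessError if it exits with an error."""
    return subprocess.run(hcceval_command(*args, **kwargs), check=True)

cmd = hcceval_command("YERwgMLST.cgMLSTv1.profile.gz",
                      "YERwgMLST.cgMLSTv1.HierCC.HierCC.gz",
                      "YERwgMLST.cgMLSTv1.HierCC.eval")
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names.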

HCCeval inputs

HCCeval requires two inputs:

  • (profile) A file containing allelic profiles, in plain text or gzipped (see HierCC inputs).
  • (cluster) The human-readable <prefix>.HierCC.gz output by HierCC (see HierCC outputs).

HCCeval outputs

HCCeval generates two outputs containing the same evaluation results:

  • <prefix>.val.tsv
  • <prefix>.val.pdf

The PDF file is a visualization of the TSV file. You can find examples of the PDF outputs in Supplemental Figure S1 of the preprint. Both files contain two statistical evaluations of the clustering levels:

  1. Normalized Mutual Information (NMI) (Kvalseth TO 1987). NMI measures the similarity of two different clusterings of a dataset as a harmonic mean of homogeneity and completeness. It is similar to the better-known Rand Index, but gives more accurate estimates for datasets that contain many small clusters, which is often the case for HierCC results. HCCeval uses NMI to compare the clustering results of all pairwise combinations of HierCC levels.
  2. Silhouette score (Rousseeuw PJ 1987). The Silhouette score estimates the cohesiveness of a clustering result by measuring how similar an object is to its own cluster (cohesion) compared to other clusters (separation). The Silhouette score ranges between -1 and +1, where a high value indicates a robust clustering.
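The two statistics above can be illustrated with a small pure-Python sketch. It is not HCCeval's own code: the function names are made up for illustration, NMI is normalized by the arithmetic mean of the two entropies (which is equivalent to the harmonic mean of homogeneity and completeness), and the Silhouette score is computed from a precomputed distance matrix with singleton clusters scored as 0.

```python
from collections import Counter
from math import log

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """NMI of two labelings: 2 * I(a; b) / (H(a) + H(b))."""
    n = len(a)
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0:                    # both labelings have a single cluster
        return 1.0
    ca, cb, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * log(c * n / (ca[x] * cb[y]))
             for (x, y), c in joint.items())
    return 2 * mi / (ha + hb)

def silhouette(dist, labels):
    """Mean silhouette over all objects, given a square distance matrix."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:               # singleton: contributes 0
            continue
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)

# Made-up distances: two tight pairs that are far apart from each other.
dist = [[0, 1, 10, 10],
        [1, 0, 10, 10],
        [10, 10, 0, 1],
        [10, 10, 1, 0]]
same = nmi([1, 1, 2, 2], [5, 5, 7, 7])       # identical partitions
unrelated = nmi([1, 1, 2, 2], [1, 2, 1, 2])  # independent partitions
score = silhouette(dist, [1, 1, 2, 2])
```

In this toy case, relabeling the clusters does not change NMI, and the well-separated pairs give a Silhouette score close to +1.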

In practice, 'stable blocks' are identified from HierCC results using NMI. Every stable block consists of a contiguous set of HierCC levels that define highly similar clusters (NMI >= 0.9). This indicates that the clusters generated by these HierCC levels are less prone to change with modest changes of the cutoff. The most cohesive HierCC level in every stable block, as defined by its greatest Silhouette score, is likely to represent a natural separation of microbial populations.
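One way to read this procedure is sketched below. It assumes a block is a maximal run of consecutive evaluated levels in which every pair of levels has NMI >= 0.9; this is an interpretation, not necessarily how HCCeval delimits blocks. The function names, level names, and all numbers are made up for illustration.

```python
def stable_blocks(nmi_matrix, threshold=0.9):
    """Split level indices into maximal runs whose pairwise NMI >= threshold."""
    blocks, start = [], 0
    n = len(nmi_matrix)
    for j in range(1, n + 1):
        # Close the current block at the end, or when level j disagrees
        # with any level already in the block.
        if j == n or any(nmi_matrix[k][j] < threshold for k in range(start, j)):
            blocks.append((start, j - 1))
            start = j
    return blocks

def best_levels(levels, nmi_matrix, silhouettes, threshold=0.9):
    """Pick the level with the greatest Silhouette score in each multi-level block."""
    picks = []
    for lo, hi in stable_blocks(nmi_matrix, threshold):
        if hi > lo:                     # ignore blocks holding a single level
            best = max(range(lo, hi + 1), key=lambda i: silhouettes[i])
            picks.append(levels[best])
    return picks

# Hypothetical evaluation of five levels.
levels = ["HC0", "HC10", "HC20", "HC30", "HC40"]
nmi_matrix = [[1.00, 0.95, 0.92, 0.50, 0.40],
              [0.95, 1.00, 0.96, 0.55, 0.45],
              [0.92, 0.96, 1.00, 0.60, 0.50],
              [0.50, 0.55, 0.60, 1.00, 0.97],
              [0.40, 0.45, 0.50, 0.97, 1.00]]
silhouettes = [0.2, 0.6, 0.4, 0.7, 0.3]
blocks = stable_blocks(nmi_matrix)
picks = best_levels(levels, nmi_matrix, silhouettes)
```

Here the first three levels form one block and the last two another, so one representative level is chosen from each.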
