
Change Log

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Note that since version 1.7.0, the changelog is maintained exclusively in GitHub releases.

1.7.0

This release is only compatible with PyTorch 1.10+.

New Models

  • Add BoxE by @ralphabb in #618
  • Add TripleRE by @mberr in #712
  • Add AutoSF by @mberr in #713
  • Add Transformer by @mberr in #714
  • Add Canonical Tensor Decomposition by @mberr in #663
  • Add (novel) Fixed Model by @cthoyt in #691
  • Add NodePiece model by @mberr in #621

Updated Models

  • Update R-GCN configuration by @mberr in #610
  • Update ConvKB to ERModel by @cthoyt in #425
  • Update ComplEx to ERModel by @mberr in #639
  • Rename TranslationalInteraction to NormBasedInteraction by @mberr in #651
  • Fix generic slicing dimension by @mberr in #683
  • Rename UnstructuredModel to UM and StructuredEmbedding to SE by @cthoyt in #721
  • Allow passing an unresolved loss to ERModel's __init__ by @mberr in #717

Representations and Initialization

  • Add low-rank embeddings by @mberr in #680
  • Add NodePiece representation by @mberr in #621
  • Add label-based initialization using a transformer (e.g., BERT) by @mberr in #638 and #652
  • Add label-based representation (e.g., to update language model using KGEM) by @mberr in #652
  • Remove literal representations (use label-based initialization instead) by @mberr in #679

Training

  • Fix displaying previous epoch's loss by @mberr in #627
  • Fix kwargs transmission on MultiTrainingCallback by @Rodrigo-A-Pereira in #645
  • Extend Callbacks by @mberr in #609
  • Add gradient clipping by @mberr in #607
  • Fix negative score shape for sLCWA by @mberr in #624
  • Fix epoch loss for loss reduction != "mean" by @mberr in #623
  • Add sLCWA support for Cross Entropy Loss by @mberr in #704

Inference

  • Add uncertainty estimate functions via MC dropout by @mberr in #688
  • Fix predict top k by @mberr in #690
  • Fix indexing in predict_* methods when using inverse relations by @mberr in #699
  • Move tensors to device for predict_* methods by @mberr in #658

Trackers

  • Fix wandb logging by @mberr in #647
  • Add multi-result tracker by @mberr in #682
  • Add Python result tracker by @mberr in #681
  • Update file trackers by @cthoyt in #629

Evaluation

  • Store rank count by @mberr in #672
  • Extend evaluate() for easier relation filtering by @mberr in #391
  • Rename sklearn evaluator and refactor evaluator code by @cthoyt in #708
  • Add additional classification metrics via rexmex by @cthoyt in #668

Triples and Datasets

  • Add helper dataset with internal batching for Schlichtkrull sampling by @mberr in #616
  • Refactor splitting code and improve documentation by @mberr in #709
  • Switch np.loadtxt to pandas.read_csv by @mberr in #695
  • Add binary I/O to triples factories by @cthoyt in #665

Torch Usage

  • Use torch.finfo to determine suitable epsilon values by @mberr in #626
  • Use torch.isin instead of own implementation by @mberr in #635
  • Switch to using torch.inference_mode instead of torch.no_grad by @sbonner0 in #604
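The three changes above can be illustrated with a short sketch (assuming PyTorch 1.10+ is installed; this is illustrative usage, not PyKEEN's actual code):

```python
import torch

# torch.finfo reports machine limits for a floating-point dtype,
# e.g. a suitable epsilon for numerical stability.
eps = torch.finfo(torch.float32).eps

# torch.isin tests membership element-wise, replacing a hand-rolled loop.
elements = torch.tensor([1, 2, 3, 4])
mask = torch.isin(elements, torch.tensor([2, 4]))

# torch.inference_mode disables autograd bookkeeping more aggressively
# than torch.no_grad, which can speed up pure inference code.
with torch.inference_mode():
    scores = torch.randn(3) + eps
```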

Miscellaneous

  • Add YAML experiment format by @mberr in #612
  • Add comparison with reproduction results during replication, if available by @mberr in #642
  • Adapt hello_world notebook to API changes by @dobraczka in #649
  • Add testing configuration for Jupyter notebooks by @mberr in #650
  • Add empty default loss_kwargs by @mali-git in #656
  • Optional extra config for reproduce by @mberr in #692
  • Store pipeline configuration in pipeline result by @mberr in #685
  • Fix upgrade to sequence by @mberr in #697
  • Fix pruner use in hpo_pipeline by @mberr in #724

Housekeeping

  • Automatically lint with black by @cthoyt in #605
  • Documentation and style guide cleanup by @cthoyt in #606

1.6.0

This release is only compatible with PyTorch 1.9+. Because of internal changes, supporting both versions is now non-trivial, so moving forward PyKEEN will support the latest version of PyTorch and keep backwards compatibility on a best-effort basis.

New Models

  • DistMA (#507)
  • TorusE (#510)
  • Frequency Baselines (#514)
  • Gated DistMult Literal (#591, thanks @Rodrigo-A-Pereira)

New Datasets

  • WD50K (#511)
  • Wikidata5M (#528)
  • BioKG (#585, thanks @sbonner0)

New Losses

  • Double Margin Loss (#539)
  • Focal Loss (#542)
  • Pointwise Hinge Loss (#540)
  • Soft Pointwise Hinge Loss (#540)
  • Pairwise Logistic Loss (#540)

Added

  • Tutorial in using checkpoints when bringing your own data (#498)
  • Learning rate scheduling (#492)
  • Checkpoints include entity/relation maps (#498)
  • QuatE reproducibility configurations (#486)

Changed

  • Reimplement SE (#521) and NTN (#522) with new-style models
  • Generalize pairwise loss and pointwise loss hierarchies (#540)
  • Update to use PyTorch 1.9 functionality (#489)
  • Generalize generator strategies in LCWA (#602)

Fixed

  • FileNotFoundError on Windows/Anaconda (#503, thanks @Hao-666)
  • Fixed docstring for ComplEx interaction (#504)
  • Make DistMult the default interaction function for R-GCN (#548)
  • Fix gradient error in CompGCN buffering (#573)
  • Fix splitting of numeric triples factories (#594, thanks @Rodrigo-A-Pereira)
  • Fix determinism in splitting of triples factory (#500)
  • Fix documentation and improve HPO suggestion (#524, thanks @kdutia)

1.5.0 - 2021-06-13

New Metrics

  • Adjusted Arithmetic Mean Rank Index (#378)
  • Add harmonic, geometric, and median rankings (#381)
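As a sketch of what these aggregations compute (plain Python, not PyKEEN's implementation; the adjusted index below follows the common definition 1 - (MR - 1) / (E[MR] - 1), where the expected mean rank of a uniform random ranker over N candidates is (N + 1) / 2):

```python
import statistics

ranks = [1, 2, 2, 5, 10]   # 1-based ranks of the true entities
num_candidates = 100       # candidate entities per ranking task

arithmetic_mr = statistics.mean(ranks)
harmonic_mr = statistics.harmonic_mean(ranks)   # reciprocal of the MRR
geometric_mr = statistics.geometric_mean(ranks)
median_r = statistics.median(ranks)

# Adjusted arithmetic mean rank index: 1 for a perfect ranking,
# approximately 0 for a random one.
expected_mr = (num_candidates + 1) / 2
amri = 1 - (arithmetic_mr - 1) / (expected_mr - 1)
```

The harmonic and geometric means are less dominated by a few very large ranks than the arithmetic mean, which is why they were added as complementary aggregations.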

New Trackers

  • Console Tracker (#440)
  • Tensorboard Tracker (#416; thanks @sbonner0)

New Models

  • QuatE (#367)
  • CompGCN (#382)
  • CrossE (#467)
  • Reimplementation of LiteralE with arbitrary combination (g) function (#245)

New Negative Samplers

  • Pseudo-typed Negative Sampler (#412)

Datasets

  • Removed invalid datasets (OpenBioLink filtered sets; #439)
  • Added WK3l-15K (#403)
  • Added WK3l-120K (#403)
  • Added CN3l (#403)

Added

  • Documentation on using PyKEEN in Google Colab and Kaggle (#379, thanks @jerryIsHere)
  • Pass custom training loops to pipeline (#334)
  • Compatibility layer for the fft module (#288)
  • Official Python 3.9 support, now that PyTorch has it (#223)
  • Utilities for dataset analysis (#16, #392)
  • Filtering of negative sampling now uses a bloom filter by default (#401)
  • Optional embedding dropout (#422)
  • Added more HPO suggestion methods and docs (#446)
  • Training callbacks (#429)
  • Class resolver for datasets (#473)
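The bloom-filter-based filtering (#401) can be sketched in plain Python (a toy stand-alone bloom filter, not PyKEEN's implementation): candidate negative triples are checked against a compact probabilistic set of known positives, where false positives are possible but false negatives are not.

```python
import hashlib

class ToyBloomFilter:
    """A minimal bloom filter over triples (illustrative only)."""

    def __init__(self, size: int = 8192, num_hashes: int = 4):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        # Derive several bit positions from seeded hashes of the item.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))

# Filter candidate negatives against the known positive triples.
positives = {(0, 0, 1), (1, 2, 3)}
bf = ToyBloomFilter()
for triple in positives:
    bf.add(triple)

candidates = [(0, 0, 1), (4, 5, 6)]
filtered = [t for t in candidates if t not in bf]
```

Because membership tests are constant-time over a fixed-size bit array, this scales to large training sets better than exact set lookups of string triples.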

Updated

  • R-GCN implementation now uses new-style models and is super idiomatic (#110)
  • Enable passing of interaction function by string in base model class (#384, #387)
  • Bump scipy requirement to 1.5.0+
  • Updated interfaces of models and negative samplers to enforce kwargs (#445)
  • Reorganize filtering, negative sampling, and remove triples factory from most objects (#400, #405, #406, #409, #420)
  • Update automatic memory optimization (#404)
  • Flexibly define positive triples for filtering (#398)
  • Completely reimplemented negative sampling interface in training loops (#427)
  • Completely reimplemented loss function in training loops (#448)
  • Forward-compatibility of embeddings in old-style models and updated docs on how to use embeddings (#474)

Fixed

  • Regularizer passing in the pipeline and HPO (#345)
  • Saving results when using multimodal models (#349)
  • Add missing diagonal constraint on MuRE Model (#353)
  • Fix early stopper handling (#419)
  • Fixed saving results from pipeline (#428, thanks @kantholtz)
  • Fix OOM issues with early stopper and AMO (#433)
  • Fix ER-MLP functional form (#444)

1.4.0 - 2021-03-04

New Datasets

New Models

  • MuRE (#311)
  • PairRE (#309)
  • Monotonic affine transformer (#324)

New Algorithms

If you're interested in any of these, please get in touch with us regarding an upcoming publication.

  • Dataset Similarity (#294)
  • Dataset Deterioration (#295)
  • Dataset Remix (#296)

Added

  • New-style models (#260) for direct usage of interaction modules
  • Ability to train pipeline() using an Interaction module rather than a Model (#326, #330).

Changes

  • Lookup of assets is now mediated by the class_resolver package (#321, #327)
  • The docdata package is now used to parse structured information out of the model and dataset documentation in order to make a more informative README with links to citations (#303).

1.3.0 - 2021-02-15

We skipped version 1.2.0 because we made an accidental release before this version was ready. We're only human, and we're looking into moving our release workflow into CI/CD so something like this doesn't happen again. As an end user, this won't affect you.

New Datasets

New Trackers

  • General file-based Tracker (#254)
  • CSV Tracker (#254)
  • JSON Tracker (#254)

Fixed

  • Fixed ComplEx's implementation (#313)
  • Fixed OGB's reuse entity identifiers (#318, thanks @tgebhart)

Added

  • pykeen version command for more easily reporting your environment in issues (#251)
  • Functional forms of all interaction models (e.g., TransE, RotatE) (#238, pykeen.nn.functional documentation). These can be generally reused, even outside of the typical PyKEEN workflows.
  • Modular forms of all interaction models (#242, pykeen.nn.modules documentation). These wrap the functional forms of interaction models and store hyper-parameters such as the p value for the L_p norm in TransE.
  • The initializer, normalizer, and constrainer for the entity and relation embeddings are now exposed through the __init__() function of each KGEM class and can be configured. A future update will enable HPO on these as well (#282).

Refactoring and Future Preparation

This release contains a few big refactors. Most won't affect end-users, but if you're writing your own PyKEEN models, these are important. Many of them are motivated to make it possible to introduce a new interface that makes it much easier for researchers (who shouldn't have to understand the inner workings of PyKEEN) to make new models.

  • The regularizer has been refactored (#266, #274). It no longer accepts a torch.device when instantiated.
  • The pykeen.nn.Embedding class has been improved in several ways:
    • Embedding Specification class makes it easier to write new classes (#277)
    • Refactor to make shape of embedding explicit (#287)
    • Specification of complex datatype (#292)
  • Refactoring of the loss model class to provide a meaningful class hierarchy (#256, #262)
  • Refactoring of the base model class to provide a consistent interface (#246, #248, #253, #257). This allowed for simplification of the loss computation based on the new hierarchy and also new implementation of regularizer class.
  • More automated testing of typing with MyPy (#255) and automated checking of documentation with doctests (#291)

Triples Loading

We've made some improvements to the pykeen.triples.TriplesFactory to facilitate loading even larger datasets (#216). However, this required an interface change. This will affect any code that loads custom triples. If you're loading triples from a path, you should now use:

path = ...

# Old (doesn't work anymore)
tf = TriplesFactory(path=path)

# New
tf = TriplesFactory.from_path(path)
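For context, loading labeled triples boils down to mapping string labels to integer IDs before training; a minimal pure-Python sketch of that conversion (a hypothetical helper, not PyKEEN's actual code):

```python
def label_triples_to_ids(triples):
    """Map (head, relation, tail) label triples to integer-ID triples."""
    # Collect and sort labels so the ID assignment is deterministic.
    entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    relations = sorted({r for _, r, _ in triples})
    entity_to_id = {label: i for i, label in enumerate(entities)}
    relation_to_id = {label: i for i, label in enumerate(relations)}
    mapped = [
        (entity_to_id[h], relation_to_id[r], entity_to_id[t])
        for h, r, t in triples
    ]
    return mapped, entity_to_id, relation_to_id

triples = [("brussels", "capital_of", "belgium"), ("belgium", "part_of", "eu")]
mapped, e2id, r2id = label_triples_to_ids(triples)
```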

Predictions

While refactoring the base model class, we moved the prediction functionality to a new module pykeen.models.predict (docs: https://pykeen.readthedocs.io/en/latest/reference/predict.html#functions). We also renamed some of the prediction functions inside the base model to make them more consistent, but we now recommend using the functions from pykeen.models.predict instead.

  • Model.predict_heads() -> Model.get_head_prediction_df()
  • Model.predict_relations() -> Model.get_relation_prediction_df()
  • Model.predict_tails() -> Model.get_tail_prediction_df()
  • Model.score_all_triples() -> Model.get_all_prediction_df()

Fixed

  • Do not create inverse triples for validation and testing factory (#270)
  • Treat nonzero applied to large tensor error as OOM for batch size search (#279)
  • Fix bug in loading ConceptNet (#290). If your experiments relied on this dataset, you should rerun them.
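The batch-size search referenced above (which treats certain errors as out-of-memory) follows a simple halving pattern; a hedged sketch of the idea, not PyKEEN's actual implementation:

```python
def search_batch_size(try_batch, start: int = 1024, minimum: int = 1):
    """Halve the batch size until `try_batch` stops raising a memory error."""
    batch_size = start
    while batch_size >= minimum:
        try:
            try_batch(batch_size)
        except MemoryError:
            # Treat the failure as out-of-memory and retry with half the size.
            batch_size //= 2
        else:
            return batch_size
    raise RuntimeError("even the minimum batch size ran out of memory")

# Toy workload that "fits" only at 256 or below.
def toy_try_batch(batch_size):
    if batch_size > 256:
        raise MemoryError

print(search_batch_size(toy_try_batch))  # -> 256
```

The fix in #279 amounts to widening the set of exceptions treated as "retry with a smaller batch", since large tensors can fail inside operations such as nonzero before a plain out-of-memory error is raised.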

1.1.0 - 2021-01-20

New Datasets

New Trackers

Added

  • Add MLFlow set tags function (#139; thanks @sunny1401)
  • Add score_t/h function for ComplEx (#150)
  • Add proper testing for literal datasets and literal models (#199)
  • Checkpoint functionality (#123)
  • Random triple generation (#201)
  • Make negative sampler corruption scheme configurable (#209)
  • Add predict with inverse triples pipeline (#208)
  • Add generalize p-norm to regularizer (#225)

Changed

  • New harness for resetting parameters (#131)
  • Modularize embeddings (#132)
  • Update first steps documentation (#152; thanks @TobiasUhmann)
  • Switched testing to GitHub Actions (#165 and #194)
  • No longer support Python 3.6
  • Move automatic memory optimization (AMO) option out of model and into training loop (#176)
  • Improve hyper-parameter defaults and HPO defaults (#181 and #179)
  • Switch internal usage to ID-based triples (#193 and #220)
  • Optimize triples splitting algorithm (#187)
  • Generalize metadata storage in triples factory (#211)
  • Add drop_last option to data loader in training loop (#217)

Fixed

  • Whitelist support in HPO pipeline (#124)
  • Improve evaluator instantiation (#125; thanks @kantholtz)
  • CPU fallback on AMO (#232)
  • Fix HPO save issues (#235)
  • Fix GPU issue in plotting (#207)

1.0.5 - 2020-10-21

Added

  • Added testing on Windows with AppVeyor and documentation for installation on Windows (#95)
  • Add ability to specify custom datasets in HPO and ablation studies (#54)
  • Add functions for plotting entities and relations (as well as an accompanying tutorial) (#99)

Changed

  • Replaced BCE loss with BCEWithLogits loss (#109)
  • Store default HPO ranges in loss classes (#111)
  • Use entrypoints for datasets (#115) to allow registering of custom datasets
  • Improved WANDB results tracker (#117, thanks @kantholtz)
  • Reorganized ablation study generation and execution (#54)

Fixed

  • Fixed bug in the initialization of ConvE (#100)
  • Fixed cross-platform issue with random integer generation (#98)
  • Fixed documentation build on ReadTheDocs (#104)

1.0.4 - 2020-08-25

Added

  • Enable restricted evaluation on a subset of entities/relations (#62, #83)

Changed

  • Use number of epochs as step instead of number of checks (#72)

Fixed

  • Fix bug in early stopping (#77)

1.0.3 - 2020-08-13

Added

  • Side-specific evaluation (#44)
  • Grid Sampler (#52)
  • Weights & Biases Tracker (#68), thanks @migalkin!

Changed

  • Update to Optuna 2.0 (#52)
  • Generalize specification of tracker (#39)

Fixed

  • Fix bug in triples factory splitter (#59)
  • Device mismatch bug (#50)

1.0.2 - 2020-07-10

Added

  • Add default values for margin and adversarial temperature in NSSA loss (#29)
  • Added FTP uploader (#35)
  • Add AWS S3 uploader (#39)
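For reference, the NSSA (self-adversarial negative sampling) loss can be sketched in plain Python, following the RotatE-paper formulation; the function and defaults here are illustrative, not PyKEEN's exact code:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def nssa_loss(pos_score, neg_scores, margin=9.0, temperature=1.0):
    """Self-adversarial negative sampling loss (RotatE-style).

    Scores are 'higher is better'; negatives are weighted by a softmax
    over their scores, scaled by the adversarial temperature, so that
    harder negatives contribute more to the loss.
    """
    exps = [math.exp(temperature * s) for s in neg_scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    positive_term = -math.log(sigmoid(margin + pos_score))
    negative_term = -sum(
        w * math.log(sigmoid(-margin - s))
        for w, s in zip(weights, neg_scores)
    )
    return positive_term + negative_term
```

The change in #29 supplied sensible defaults for the margin and the adversarial temperature so users no longer have to set both explicitly.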

Changed

  • Improved MLflow support (#40)
  • Lots of improvements to documentation!

Fixed

  • Fix triples factory splitting bug (#21)
  • Fix problem with tensors' device during prediction (#41)
  • Fix RotatE relation embeddings re-initialization (#26)

1.0.1 - 2020-07-02

Added

  • Add fractional hits@k (#17)
  • Add link prediction pipeline (#10)

Changed

  • Update documentation (#10)