
Change Log

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Note that since version 1.7.0, the changelog is maintained exclusively in GitHub releases.

1.7.0

This release is only compatible with PyTorch 1.10+.

New Models

  • Add BoxE by @ralphabb in #618
  • Add TripleRE by @mberr in #712
  • Add AutoSF by @mberr in #713
  • Add Transformer by @mberr in #714
  • Add Canonical Tensor Decomposition by @mberr in #663
  • Add (novel) Fixed Model by @cthoyt in #691
  • Add NodePiece model by @mberr in #621

Updated Models

  • Update R-GCN configuration by @mberr in #610
  • Update ConvKB to ERModel by @cthoyt in #425
  • Update ComplEx to ERModel by @mberr in #639
  • Rename TranslationalInteraction to NormBasedInteraction by @mberr in #651
  • Fix generic slicing dimension by @mberr in #683
  • Rename UnstructuredModel to UM and StructuredEmbedding to SE by @cthoyt in #721
  • Allow passing an unresolved loss to ERModel's __init__ by @mberr in #717

Representations and Initialization

  • Add low-rank embeddings by @mberr in #680
  • Add NodePiece representation by @mberr in #621
  • Add label-based initialization using a transformer (e.g., BERT) by @mberr in #638 and #652
  • Add label-based representation (e.g., to update language model using KGEM) by @mberr in #652
  • Remove literal representations (use label-based initialization instead) by @mberr in #679

Training

  • Fix displaying previous epoch's loss by @mberr in #627
  • Fix kwargs transmission on MultiTrainingCallback by @Rodrigo-A-Pereira in #645
  • Extend Callbacks by @mberr in #609
  • Add gradient clipping by @mberr in #607
  • Fix negative score shape for sLCWA by @mberr in #624
  • Fix epoch loss for loss reduction != "mean" by @mberr in #623
  • Add sLCWA support for Cross Entropy Loss by @mberr in #704

Inference

  • Add uncertainty estimate functions via MC dropout by @mberr in #688
  • Fix predict top k by @mberr in #690
  • Fix indexing in predict_* methods when using inverse relations by @mberr in #699
  • Move tensors to device for predict_* methods by @mberr in #658

Trackers

  • Fix wandb logging by @mberr in #647
  • Add multi-result tracker by @mberr in #682
  • Add Python result tracker by @mberr in #681
  • Update file trackers by @cthoyt in #629

Evaluation

  • Store rank count by @mberr in #672
  • Extend evaluate() for easier relation filtering by @mberr in #391
  • Rename sklearn evaluator and refactor evaluator code by @cthoyt in #708
  • Add additional classification metrics via rexmex by @cthoyt in #668

Triples and Datasets

  • Add helper dataset with internal batching for Schlichtkrull sampling by @mberr in #616
  • Refactor splitting code and improve documentation by @mberr in #709
  • Switch np.loadtxt to pandas.read_csv by @mberr in #695
  • Add binary I/O to triples factories by @cthoyt in #665

Torch Usage

  • Use torch.finfo to determine suitable epsilon values by @mberr in #626
  • Use torch.isin instead of own implementation by @mberr in #635
  • Switch to using torch.inference_mode instead of torch.no_grad by @sbonner0 in #604
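The three changes above can be illustrated with a short sketch (assuming PyTorch 1.10+ is installed; this is illustrative usage, not PyKEEN's actual code):

```python
import torch

# torch.finfo reports machine limits for a floating-point dtype,
# e.g. a suitable epsilon for numerical stability.
eps = torch.finfo(torch.float32).eps

# torch.isin tests membership element-wise, replacing a hand-rolled loop.
elements = torch.tensor([1, 2, 3, 4])
mask = torch.isin(elements, torch.tensor([2, 4]))

# torch.inference_mode disables autograd bookkeeping more aggressively
# than torch.no_grad, which can speed up pure inference code.
with torch.inference_mode():
    scores = torch.randn(3) + eps
```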

Miscellaneous

  • Add YAML experiment format by @mberr in #612
  • Add comparison with reproduction results during replication, if available by @mberr in #642
  • Adapt hello_world notebook to API changes by @dobraczka in #649
  • Add testing configuration for Jupyter notebooks by @mberr in #650
  • Add empty default loss_kwargs by @mali-git in #656
  • Optional extra config for reproduce by @mberr in #692
  • Store pipeline configuration in pipeline result by @mberr in #685
  • Fix upgrade to sequence by @mberr in #697
  • Fix pruner use in hpo_pipeline by @mberr in #724

Housekeeping

  • Automatically lint with black by @cthoyt in #605
  • Documentation and style guide cleanup by @cthoyt in #606

1.6.0

This release is only compatible with PyTorch 1.9+. Because of internal changes, supporting both versions is now non-trivial, so moving forward PyKEEN will support the latest version of PyTorch and keep backwards compatibility on a best-effort basis.

New Models

  • DistMA (#507)
  • TorusE (#510)
  • Frequency Baselines (#514)
  • Gated DistMult Literal (#591, thanks @Rodrigo-A-Pereira)

New Datasets

  • WD50K (#511)
  • Wikidata5M (#528)
  • BioKG (#585, thanks @sbonner0)

New Losses

  • Double Margin Loss (#539)
  • Focal Loss (#542)
  • Pointwise Hinge Loss (#540)
  • Soft Pointwise Hinge Loss (#540)
  • Pairwise Logistic Loss (#540)

Added

  • Tutorial in using checkpoints when bringing your own data (#498)
  • Learning rate scheduling (#492)
  • Checkpoints include entity/relation maps (#498)
  • QuatE reproducibility configurations (#486)

Changed

  • Reimplement SE (#521) and NTN (#522) with new-style models
  • Generalize pairwise loss and pointwise loss hierarchies (#540)
  • Update to use PyTorch 1.9 functionality (#489)
  • Generalize generator strategies in LCWA (#602)

Fixed

  • FileNotFoundError on Windows/Anaconda (#503, thanks @Hao-666)
  • Fixed docstring for ComplEx interaction (#504)
  • Make DistMult the default interaction function for R-GCN (#548)
  • Fix gradient error in CompGCN buffering (#573)
  • Fix splitting of numeric triples factories (#594, thanks @Rodrigo-A-Pereira)
  • Fix determinism in splitting of triples factory (#500)
  • Fix documentation and improve HPO suggestion (#524, thanks @kdutia)

1.5.0 - 2021-06-13

New Metrics

  • Adjusted Arithmetic Mean Rank Index (#378)
  • Add harmonic, geometric, and median rankings (#381)
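As a sketch of what these aggregations compute (plain Python, not PyKEEN's implementation; the adjusted index below follows the common definition 1 - (MR - 1) / (E[MR] - 1), where the expected mean rank of a uniform random ranker over N candidates is (N + 1) / 2):

```python
import statistics

ranks = [1, 2, 2, 5, 10]   # 1-based ranks of the true entities
num_candidates = 100       # candidate entities per ranking task

arithmetic_mr = statistics.mean(ranks)
harmonic_mr = statistics.harmonic_mean(ranks)   # reciprocal of the MRR
geometric_mr = statistics.geometric_mean(ranks)
median_r = statistics.median(ranks)

# Adjusted arithmetic mean rank index: 1 for a perfect ranking,
# approximately 0 for a random one.
expected_mr = (num_candidates + 1) / 2
amri = 1 - (arithmetic_mr - 1) / (expected_mr - 1)
```

The harmonic and geometric means are less dominated by a few very large ranks than the arithmetic mean, which is why they were added as complementary aggregations.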

New Trackers

  • Console Tracker (#440)
  • Tensorboard Tracker (#416; thanks @sbonner0)

New Models

  • QuatE (#367)
  • CompGCN (#382)
  • CrossE (#467)
  • Reimplementation of LiteralE with arbitrary combination (g) function (#245)

New Negative Samplers

  • Pseudo-typed Negative Sampler (#412)

Datasets

  • Removed invalid datasets (OpenBioLink filtered sets; #439)
  • Added WK3l-15K (#403)
  • Added WK3l-120K (#403)
  • Added CN3l (#403)

Added

  • Documentation on using PyKEEN in Google Colab and Kaggle (#379, thanks @jerryIsHere)
  • Pass custom training loops to pipeline (#334)
  • Compatibility layer for the fft module (#288)
  • Official Python 3.9 support, now that PyTorch has it (#223)
  • Utilities for dataset analysis (#16, #392)
  • Filtering of negative sampling now uses a bloom filter by default (#401)
  • Optional embedding dropout (#422)
  • Added more HPO suggestion methods and docs (#446)
  • Training callbacks (#429)
  • Class resolver for datasets (#473)
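The bloom-filter-based filtering (#401) can be sketched in plain Python (a toy stand-alone bloom filter, not PyKEEN's implementation): candidate negative triples are checked against a compact probabilistic set of known positives, where false positives are possible but false negatives are not.

```python
import hashlib

class ToyBloomFilter:
    """A minimal bloom filter over triples (illustrative only)."""

    def __init__(self, size: int = 8192, num_hashes: int = 4):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        # Derive several bit positions from seeded hashes of the item.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))

# Filter candidate negatives against the known positive triples.
positives = {(0, 0, 1), (1, 2, 3)}
bf = ToyBloomFilter()
for triple in positives:
    bf.add(triple)

candidates = [(0, 0, 1), (4, 5, 6)]
filtered = [t for t in candidates if t not in bf]
```

Because membership tests are constant-time over a fixed-size bit array, this scales to large training sets better than exact set lookups of string triples.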

Updated

  • R-GCN implementation now uses new-style models and is super idiomatic (#110)
  • Enable passing of interaction function by string in base model class (#384, #387)
  • Bump scipy requirement to 1.5.0+
  • Updated interfaces of models and negative samplers to enforce kwargs (#445)
  • Reorganize filtering, negative sampling, and remove triples factory from most objects (#400, #405, #406, #409, #420)
  • Update automatic memory optimization (#404)
  • Flexibly define positive triples for filtering (#398)
  • Completely reimplemented negative sampling interface in training loops (#427)
  • Completely reimplemented loss function in training loops (#448)
  • Forward-compatibility of embeddings in old-style models and updated docs on how to use embeddings (#474)

Fixed

  • Regularizer passing in the pipeline and HPO (#345)
  • Saving results when using multimodal models (#349)
  • Add missing diagonal constraint on MuRE Model (#353)
  • Fix early stopper handling (#419)
  • Fixed saving results from pipeline (#428, thanks @kantholtz)
  • Fix OOM issues with early stopper and AMO (#433)
  • Fix ER-MLP functional form (#444)

1.4.0 - 2021-03-04

New Datasets

New Models

  • MuRE (#311)
  • PairRE (#309)
  • Monotonic affine transformer (#324)

New Algorithms

If you're interested in any of these, please get in touch with us regarding an upcoming publication.

  • Dataset Similarity (#294)
  • Dataset Deterioration (#295)
  • Dataset Remix (#296)

Added

  • New-style models (#260) for direct usage of interaction modules
  • Ability to train pipeline() using an Interaction module rather than a Model (#326, #330).

Changes

  • Lookup of assets is now mediated by the class_resolver package (#321, #327)
  • The docdata package is now used to parse structured information out of the model and dataset documentation in order to make a more informative README with links to citations (#303).

1.3.0 - 2021-02-15

We skipped version 1.2.0 because we made an accidental release before this version was ready. We're only human, and we're looking into moving our release workflow into CI/CD so something like this doesn't happen again. As an end user, this won't affect you.

New Datasets

New Trackers

  • General file-based Tracker (#254)
  • CSV Tracker (#254)
  • JSON Tracker (#254)

Fixed

  • Fixed ComplEx's implementation (#313)
  • Fixed OGB's reuse entity identifiers (#318, thanks @tgebhart)

Added

  • pykeen version command for more easily reporting your environment in issues (#251)
  • Functional forms of all interaction models (e.g., TransE, RotatE) (#238, pykeen.nn.functional documentation). These can be generally reused, even outside of the typical PyKEEN workflows.
  • Modular forms of all interaction models (#242, pykeen.nn.modules documentation). These wrap the functional forms of interaction models and store hyper-parameters such as the p value for the L_p norm in TransE.
  • The initializer, normalizer, and constrainer for the entity and relation embeddings are now exposed through the __init__() function of each KGEM class and can be configured. A future update will enable HPO on these as well (#282).

Refactoring and Future Preparation

This release contains a few big refactors. Most won't affect end-users, but if you're writing your own PyKEEN models, these are important. Many of them are motivated to make it possible to introduce a new interface that makes it much easier for researchers (who shouldn't have to understand the inner workings of PyKEEN) to make new models.

  • The regularizer has been refactored (#266, #274). It no longer accepts a torch.device when instantiated.
  • The pykeen.nn.Embedding class has been improved in several ways:
    • Embedding Specification class makes it easier to write new classes (#277)
    • Refactor to make shape of embedding explicit (#287)
    • Specification of complex datatype (#292)
  • Refactoring of the loss model class to provide a meaningful class hierarchy (#256, #262)
  • Refactoring of the base model class to provide a consistent interface (#246, #248, #253, #257). This allowed for simplification of the loss computation based on the new hierarchy and also new implementation of regularizer class.
  • More automated testing of typing with MyPy (#255) and automated checking of documentation with doctests (#291)

Triples Loading

We've made some improvements to the pykeen.triples.TriplesFactory to facilitate loading even larger datasets (#216). However, this required an interface change. This will affect any code that loads custom triples. If you're loading triples from a path, you should now use:

path = ...

# Old (doesn't work anymore)
tf = TriplesFactory(path=path)

# New
tf = TriplesFactory.from_path(path)
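For context, loading labeled triples boils down to mapping string labels to integer IDs before training; a minimal pure-Python sketch of that conversion (a hypothetical helper, not PyKEEN's actual code):

```python
def label_triples_to_ids(triples):
    """Map (head, relation, tail) label triples to integer-ID triples."""
    # Collect and sort labels so the ID assignment is deterministic.
    entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    relations = sorted({r for _, r, _ in triples})
    entity_to_id = {label: i for i, label in enumerate(entities)}
    relation_to_id = {label: i for i, label in enumerate(relations)}
    mapped = [
        (entity_to_id[h], relation_to_id[r], entity_to_id[t])
        for h, r, t in triples
    ]
    return mapped, entity_to_id, relation_to_id

triples = [("brussels", "capital_of", "belgium"), ("belgium", "part_of", "eu")]
mapped, e2id, r2id = label_triples_to_ids(triples)
```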

Predictions

While refactoring the base model class, we moved the prediction functionality to a new module pykeen.models.predict (docs: https://pykeen.readthedocs.io/en/latest/reference/predict.html#functions). We also renamed some of the prediction functions inside the base model to make them more consistent, but we now recommend using the functions from pykeen.models.predict instead.

  • Model.predict_heads() -> Model.get_head_prediction_df()
  • Model.predict_relations() -> Model.get_relation_prediction_df()
  • Model.predict_tails() -> Model.get_tail_prediction_df()
  • Model.score_all_triples() -> Model.get_all_prediction_df()

Fixed

  • Do not create inverse triples for validation and testing factory (#270)
  • Treat nonzero applied to large tensor error as OOM for batch size search (#279)
  • Fix bug in loading ConceptNet (#290). If your experiments relied on this dataset, you should rerun them.
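The batch-size search referenced above (which treats certain errors as out-of-memory) follows a simple halving pattern; a hedged sketch of the idea, not PyKEEN's actual implementation:

```python
def search_batch_size(try_batch, start: int = 1024, minimum: int = 1):
    """Halve the batch size until `try_batch` stops raising a memory error."""
    batch_size = start
    while batch_size >= minimum:
        try:
            try_batch(batch_size)
        except MemoryError:
            # Treat the failure as out-of-memory and retry with half the size.
            batch_size //= 2
        else:
            return batch_size
    raise RuntimeError("even the minimum batch size ran out of memory")

# Toy workload that "fits" only at 256 or below.
def toy_try_batch(batch_size):
    if batch_size > 256:
        raise MemoryError

print(search_batch_size(toy_try_batch))  # -> 256
```

The fix in #279 amounts to widening the set of exceptions treated as "retry with a smaller batch", since large tensors can fail inside operations such as nonzero before a plain out-of-memory error is raised.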

1.1.0 - 2021-01-20

New Datasets

New Trackers

Added

  • Add MLFlow set tags function (#139; thanks @sunny1401)
  • Add score_t/h function for ComplEx (#150)
  • Add proper testing for literal datasets and literal models (#199)
  • Checkpoint functionality (#123)
  • Random triple generation (#201)
  • Make negative sampler corruption scheme configurable (#209)
  • Add predict with inverse triples pipeline (#208)
  • Add generalize p-norm to regularizer (#225)

Changed

  • New harness for resetting parameters (#131)
  • Modularize embeddings (#132)
  • Update first steps documentation (#152; thanks @TobiasUhmann)
  • Switched testing to GitHub Actions (#165 and #194)
  • No longer support Python 3.6
  • Move automatic memory optimization (AMO) option out of model and into training loop (#176)
  • Improve hyper-parameter defaults and HPO defaults (#181 and #179)
  • Switch internal usage to ID-based triples (#193 and #220)
  • Optimize triples splitting algorithm (#187)
  • Generalize metadata storage in triples factory (#211)
  • Add drop_last option to data loader in training loop (#217)

Fixed

  • Whitelist support in HPO pipeline (#124)
  • Improve evaluator instantiation (#125; thanks @kantholtz)
  • CPU fallback on AMO (#232)
  • Fix HPO save issues (#235)
  • Fix GPU issue in plotting (#207)

1.0.5 - 2020-10-21

Added

  • Added testing on Windows with AppVeyor and documentation for installation on Windows (#95)
  • Add ability to specify custom datasets in HPO and ablation studies (#54)
  • Add functions for plotting entities and relations (as well as an accompanying tutorial) (#99)

Changed

  • Replaced BCE loss with BCEWithLogits loss (#109)
  • Store default HPO ranges in loss classes (#111)
  • Use entrypoints for datasets (#115) to allow registering of custom datasets
  • Improved WANDB results tracker (#117, thanks @kantholtz)
  • Reorganized ablation study generation and execution (#54)

Fixed

  • Fixed bug in the initialization of ConvE (#100)
  • Fixed cross-platform issue with random integer generation (#98)
  • Fixed documentation build on ReadTheDocs (#104)

1.0.4 - 2020-08-25

Added

  • Enable restricted evaluation on a subset of entities/relations (#62, #83)

Changed

  • Use number of epochs as step instead of number of checks (#72)

Fixed

  • Fix bug in early stopping (#77)

1.0.3 - 2020-08-13

Added

  • Side-specific evaluation (#44)
  • Grid Sampler (#52)
  • Weights & Biases Tracker (#68), thanks @migalkin!

Changed

  • Update to Optuna 2.0 (#52)
  • Generalize specification of tracker (#39)

Fixed

  • Fix bug in triples factory splitter (#59)
  • Device mismatch bug (#50)

1.0.2 - 2020-07-10

Added

  • Add default values for margin and adversarial temperature in NSSA loss (#29)
  • Added FTP uploader (#35)
  • Add AWS S3 uploader (#39)
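For reference, the NSSA (self-adversarial negative sampling) loss can be sketched in plain Python, following the RotatE-paper formulation; the function and defaults here are illustrative, not PyKEEN's exact code:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def nssa_loss(pos_score, neg_scores, margin=9.0, temperature=1.0):
    """Self-adversarial negative sampling loss (RotatE-style).

    Scores are 'higher is better'; negatives are weighted by a softmax
    over their scores, scaled by the adversarial temperature, so that
    harder negatives contribute more to the loss.
    """
    exps = [math.exp(temperature * s) for s in neg_scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    positive_term = -math.log(sigmoid(margin + pos_score))
    negative_term = -sum(
        w * math.log(sigmoid(-margin - s))
        for w, s in zip(weights, neg_scores)
    )
    return positive_term + negative_term
```

The change in #29 supplied sensible defaults for the margin and the adversarial temperature so users no longer have to set both explicitly.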

Changed

  • Improved MLflow support (#40)
  • Lots of improvements to documentation!

Fixed

  • Fix triples factory splitting bug (#21)
  • Fix problem with tensors' device during prediction (#41)
  • Fix RotatE relation embeddings re-initialization (#26)

1.0.1 - 2020-07-02

Added

  • Add fractional hits@k (#17)
  • Add link prediction pipeline (#10)

Changed

  • Update documentation (#10)