All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Note that since version 1.7.0, the changelog is exclusively in GitHub releases.
This release is only compatible with PyTorch 1.10+.
- Add BoxE by @ralphabb in #618
- Add TripleRE by @mberr in #712
- Add AutoSF by @mberr in #713
- Add Transformer by @mberr in #714
- Add Canonical Tensor Decomposition by @mberr in #663
- Add (novel) Fixed Model by @cthoyt in #691
- Add NodePiece model by @mberr in #621
- Update R-GCN configuration by @mberr in #610
- Update ConvKB to ERModel by @cthoyt in #425
- Update ComplEx to ERModel by @mberr in #639
- Rename TranslationalInteraction to NormBasedInteraction by @mberr in #651
- Fix generic slicing dimension by @mberr in #683
- Rename UnstructuredModel to UM and StructuredEmbedding to SE by @cthoyt in #721
- Allow passing an unresolved loss to ERModel's __init__ by @mberr in #717
- Add low-rank embeddings by @mberr in #680
- Add NodePiece representation by @mberr in #621
- Add label-based initialization using a transformer (e.g., BERT) by @mberr in #638 and #652
- Add label-based representation (e.g., to update language model using KGEM) by @mberr in #652
- Remove literal representations (use label-based initialization instead) by @mberr in #679
- Fix displaying previous epoch's loss by @mberr in #627
- Fix kwargs transmission on MultiTrainingCallback by @Rodrigo-A-Pereira in #645
- Extend Callbacks by @mberr in #609
- Add gradient clipping by @mberr in #607
- Fix negative score shape for sLCWA by @mberr in #624
- Fix epoch loss for loss reduction != "mean" by @mberr in #623
- Add sLCWA support for Cross Entropy Loss by @mberr in #704
- Add uncertainty estimate functions via MC dropout by @mberr in #688
- Fix predict top k by @mberr in #690
- Fix indexing in predict_* methods when using inverse relations by @mberr in #699
- Move tensors to device for predict_* methods by @mberr in #658
- Fix wandb logging by @mberr in #647
- Add multi-result tracker by @mberr in #682
- Add Python result tracker by @mberr in #681
- Update file trackers by @cthoyt in #629
- Store rank count by @mberr in #672
- Extend evaluate() for easier relation filtering by @mberr in #391
- Rename sklearn evaluator and refactor evaluator code by @cthoyt in #708
- Add additional classification metrics via rexmex by @cthoyt in #668
- Add helper dataset with internal batching for Schlichtkrull sampling by @mberr in #616
- Refactor splitting code and improve documentation by @mberr in #709
- Switch np.loadtxt to pandas.read_csv by @mberr in #695
- Add binary I/O to triples factories by @cthoyt in #665
- Use torch.finfo to determine suitable epsilon values by @mberr in #626
- Use torch.isin instead of own implementation by @mberr in #635
- Switch to using torch.inference_mode instead of torch.no_grad by @sbonner0 in #604
- Add YAML experiment format by @mberr in #612
- Add comparison with reproduction results during replication, if available by @mberr in #642
- Adapt hello_world notebook to API changes by @dobraczka in #649
- Add testing configuration for Jupyter notebooks by @mberr in #650
- Add empty default loss_kwargs by @mali-git in #656
- Optional extra config for reproduce by @mberr in #692
- Store pipeline configuration in pipeline result by @mberr in #685
- Fix upgrade to sequence by @mberr in #697
- Fix pruner use in hpo_pipeline by @mberr in #724
- Automatically lint with black by @cthoyt in #605
- Documentation and style guide cleanup by @cthoyt in #606
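Regarding the gradient clipping added in #607: norm-based clipping rescales the whole gradient whenever its global norm exceeds a threshold. The following is a plain-Python sketch of the idea only, not PyKEEN's implementation (which builds on PyTorch's utilities); the function name is hypothetical.

```python
import math

def clip_grad_norm(grads, max_norm):
    """Rescale a list of gradient vectors so that their global L2 norm
    does not exceed ``max_norm``; returns the (possibly scaled) gradients
    and the pre-clipping norm."""
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total > max_norm:
        scale = max_norm / total
        grads = [[g * scale for g in vec] for vec in grads]
    return grads, total

# A gradient with global norm 5.0 gets scaled down to norm 1.0.
clipped, norm = clip_grad_norm([[3.0, 0.0], [0.0, 4.0]], max_norm=1.0)
```

Clipping the global norm (rather than each parameter tensor separately) bounds the update's magnitude while preserving its direction.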
This release is only compatible with PyTorch 1.9+. Because of several internal changes, supporting both the previous and the latest PyTorch versions became non-trivial, so going forward PyKEEN will support the latest version of PyTorch and make a best effort to keep backwards compatibility.
- DistMA (#507)
- TorusE (#510)
- Frequency Baselines (#514)
- Gated Distmult Literal (#591, thanks @Rodrigo-A-Pereira)
- Double Margin Loss (#539)
- Focal Loss (#542)
- Pointwise Hinge Loss (#540)
- Soft Pointwise Hinge Loss (#540)
- Pairwise Logistic Loss (#540)
- Tutorial in using checkpoints when bringing your own data (#498)
- Learning rate scheduling (#492)
- Checkpoints include entity/relation maps (#498)
- QuatE reproducibility configurations (#486)
- Reimplement SE (#521) and NTN (#522) with new-style models
- Generalize pairwise loss and pointwise loss hierarchies (#540)
- Update to use PyTorch 1.9 functionality (#489)
- Generalize generator strategies in LCWA (#602)
- FileNotFoundError on Windows/Anaconda (#503, thanks @Hao-666)
- Fixed docstring for ComplEx interaction (#504)
- Make DistMult the default interaction function for R-GCN (#548)
- Fix gradient error in CompGCN buffering (#573)
- Fix splitting of numeric triples factories (#594, thanks @Rodrigo-A-Pereira)
- Fix determinism in splitting of triples factory (#500)
- Fix documentation and improve HPO suggestion (#524, thanks @kdutia)
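Among the losses added above, focal loss (#542) down-weights well-classified examples so that training focuses on hard ones. A minimal plain-Python sketch of the binary case follows; it is illustrative only (not PyKEEN's implementation), and the usual α class-weighting term is omitted for brevity.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def focal_loss(logit, label, gamma=2.0):
    """Binary focal loss: cross entropy scaled by (1 - p_t) ** gamma,
    where p_t is the probability assigned to the true label."""
    p = sigmoid(logit)
    p_t = p if label == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# A confident, correct prediction contributes far less loss than a poor one.
easy = focal_loss(4.0, 1)
hard = focal_loss(-1.0, 1)
```

With gamma = 0 the modulating factor disappears and the loss reduces to plain binary cross entropy.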
1.5.0 - 2021-06-13
- QuatE (#367)
- CompGCN (#382)
- CrossE (#467)
- Reimplementation of LiteralE with arbitrary combination (g) function (#245)
- Pseudo-typed Negative Sampler (#412)
- Removed invalid datasets (OpenBioLink filtered sets; #439)
- Added WK3k-15K (#403)
- Added WK3l-120K (#403)
- Added CN3l (#403)
- Documentation on using PyKEEN in Google Colab and Kaggle (#379, thanks @jerryIsHere)
- Pass custom training loops to pipeline (#334)
- Compatibility layer for the fft module (#288)
- Official Python 3.9 support, now that PyTorch has it (#223)
- Utilities for dataset analysis (#16, #392)
- Filtering of negative sampling now uses a bloom filter by default (#401)
- Optional embedding dropout (#422)
- Added more HPO suggestion methods and docs (#446)
- Training callbacks (#429)
- Class resolver for datasets (#473)
- R-GCN implementation now uses new-style models and is super idiomatic (#110)
- Enable passing of interaction function by string in base model class (#384, #387)
- Bump scipy requirement to 1.5.0+
- Updated interfaces of models and negative samplers to enforce kwargs (#445)
- Reorganize filtering, negative sampling, and remove triples factory from most objects (#400, #405, #406, #409, #420)
- Update automatic memory optimization (#404)
- Flexibly define positive triples for filtering (#398)
- Completely reimplemented negative sampling interface in training loops (#427)
- Completely reimplemented loss function in training loops (#448)
- Forward-compatibility of embeddings in old-style models and updated docs on how to use embeddings (#474)
- Regularizer passing in the pipeline and HPO (#345)
- Saving results when using multimodal models (#349)
- Add missing diagonal constraint on MuRE Model (#353)
- Fix early stopper handling (#419)
- Fixed saving results from pipeline (#428, thanks @kantholtz)
- Fix OOM issues with early stopper and AMO (#433)
- Fix ER-MLP functional form (#444)
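On the bloom-filter-based filtering of negative samples (#401): a bloom filter answers membership queries with possible false positives but never false negatives, which is the safe direction for discarding corrupted triples that might actually be true. The class below is a self-contained sketch of the data structure's role; PyKEEN's real filter lives in its sampling code and is not shown here.

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: each item sets/checks k positions in a bit
    array. Lookups can yield false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

# Filter candidate negatives against the set of known-true triples.
known = BloomFilter()
for triple in [(0, 0, 1), (1, 1, 2)]:
    known.add(triple)

negatives = [(0, 0, 2), (0, 0, 1), (1, 1, 0)]
filtered = [t for t in negatives if t not in known]
```

Because false negatives are impossible, a true triple is never mistakenly kept as a negative; the occasional false positive merely drops a valid negative, which is harmless.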
1.4.0 - 2021-03-04
If you're interested in any of these, please get in touch with us regarding an upcoming publication.
- New-style models (#260) for direct usage of interaction modules
- Ability to train `pipeline()` using an `Interaction` module rather than a `Model` (#326, #330)
- Lookup of assets is now mediated by the `class_resolver` package (#321, #327)
- The `docdata` package is now used to parse structured information out of the model and dataset documentation in order to make a more informative README with links to citations (#303)
1.3.0 - 2021-02-15
We skipped version 1.2.0 because we made an accidental release before this version was ready. We're only human, and are looking into improving our release workflow to live in CI/CD so something like this doesn't happen again. However, as an end user, this won't have an effect on you.
- `pykeen version` command for more easily reporting your environment in issues (#251)
- Functional forms of all interaction models (e.g., TransE, RotatE) (#238, `pykeen.nn.functional` documentation). These can be generally reused, even outside of the typical PyKEEN workflows.
- Modular forms of all interaction models (#242, `pykeen.nn.modules` documentation). These wrap the functional forms of interaction models and store hyper-parameters such as the `p` value for the L_p norm in TransE.
- The initializer, normalizer, and constrainer for the entity and relation embeddings are now exposed through the `__init__()` function of each KGEM class and can be configured. A future update will enable HPO on these as well (#282).
This release contains a few big refactors. Most won't affect end users, but if you're writing your own PyKEEN models, these are important. Many of them are motivated by the goal of introducing a new interface that makes it much easier for researchers (who shouldn't have to understand the inner workings of PyKEEN) to make new models.
- The regularizer has been refactored (#266, #274). It no longer accepts a `torch.device` when instantiated.
- The `pykeen.nn.Embedding` class has been improved in several ways.
- Refactoring of the loss model class to provide a meaningful class hierarchy (#256, #262)
- Refactoring of the base model class to provide a consistent interface (#246, #248, #253, #257). This allowed for simplification of the loss computation based on the new hierarchy and also new implementation of regularizer class.
- More automated testing of typing with MyPy (#255) and automated checking of documentation with `doctests` (#291)
We've made some improvements to `pykeen.triples.TriplesFactory` to facilitate loading even larger datasets (#216). However, this required an interface change that will affect any code loading custom triples. If you're loading triples from a path, you should now use:
```python
path = ...

# Old (doesn't work anymore)
tf = TriplesFactory(path=path)

# New
tf = TriplesFactory.from_path(path)
```
While refactoring the base model class, we excised the prediction functionality to a new module, `pykeen.models.predict` (docs: https://pykeen.readthedocs.io/en/latest/reference/predict.html#functions). We also renamed some of the prediction methods on the base model to make them more consistent, but we now recommend you use the functions from `pykeen.models.predict` instead:

- `Model.predict_heads()` -> `Model.get_head_prediction_df()`
- `Model.predict_relations()` -> `Model.get_relation_prediction_df()`
- `Model.predict_tails()` -> `Model.get_tail_prediction_df()`
- `Model.score_all_triples()` -> `Model.get_all_prediction_df()`
- Do not create inverse triples for validation and testing factory (#270)
- Treat the "nonzero applied to large tensor" error as an OOM error during batch size search (#279)
- Fix bug in loading ConceptNet (#290). If your experiments relied on this dataset, you should rerun them.
1.1.0 - 2021-01-20
- Neptune.ai (#183)
- Add MLFlow set tags function (#139; thanks @sunny1401)
- Add score_t/h function for ComplEx (#150)
- Add proper testing for literal datasets and literal models (#199)
- Checkpoint functionality (#123)
- Random triple generation (#201)
- Make negative sampler corruption scheme configurable (#209)
- Add predict with inverse triples pipeline (#208)
- Add generalized p-norm to regularizer (#225)
- New harness for resetting parameters (#131)
- Modularize embeddings (#132)
- Update first steps documentation (#152; thanks @TobiasUhmann)
- Switched testing to GitHub Actions (#165 and #194)
- No longer support Python 3.6
- Move automatic memory optimization (AMO) option out of model and into training loop (#176)
- Improve hyper-parameter defaults and HPO defaults (#181 and #179)
- Switch internal usage to ID-based triples (#193 and #220)
- Optimize triples splitting algorithm (#187)
- Generalize metadata storage in triples factory (#211)
- Add drop_last option to data loader in training loop (#217)
- Whitelist support in HPO pipeline (#124)
- Improve evaluator instantiation (#125; thanks @kantholtz)
- CPU fallback on AMO (#232)
- Fix HPO save issues (#235)
- Fix GPU issue in plotting (#207)
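On the triples splitting optimization (#187): a key constraint when splitting a knowledge graph is that every entity in the evaluation split should already appear in training, otherwise its embedding is never learned. The sketch below illustrates a simple coverage-first strategy; it is not PyKEEN's algorithm (which also covers relations and is considerably more careful), and the function name is hypothetical.

```python
import random

def split_triples(triples, train_ratio=0.8, seed=0):
    """Split (head, relation, tail) triples so that every entity in the
    test split also appears in training: cover each entity with at least
    one training triple first, then top up to the requested ratio."""
    rng = random.Random(seed)
    shuffled = list(triples)
    rng.shuffle(shuffled)
    seen, train, rest = set(), [], []
    for h, r, t in shuffled:
        if h not in seen or t not in seen:
            # Triple introduces an unseen entity: must go to training.
            train.append((h, r, t))
            seen.update((h, t))
        else:
            rest.append((h, r, t))
    target = max(len(train), int(train_ratio * len(triples)))
    extra = target - len(train)
    train.extend(rest[:extra])
    test = rest[extra:]
    return train, test

triples = [(0, 0, 1), (1, 0, 2), (2, 0, 3), (3, 0, 0), (0, 1, 2), (1, 1, 3)]
train, test = split_triples(triples)
```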
1.0.5 - 2020-10-21
- Added testing on Windows with AppVeyor and documentation for installation on Windows (#95)
- Add ability to specify custom datasets in HPO and ablation studies (#54)
- Add functions for plotting entities and relations (as well as an accompanying tutorial) (#99)
- Replaced BCE loss with BCEWithLogits loss (#109)
- Store default HPO ranges in loss classes (#111)
- Use entrypoints for datasets (#115) to allow registering of custom datasets
- Improved WANDB results tracker (#117, thanks @kantholtz)
- Reorganized ablation study generation and execution (#54)
- Fixed bug in the initialization of ConvE (#100)
- Fixed cross-platform issue with random integer generation (#98)
- Fixed documentation build on ReadTheDocs (#104)
1.0.4 - 2020-08-25
- Use number of epochs as step instead of number of checks (#72)
- Fix bug in early stopping (#77)
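The early stopping fixes above (#72, #77) concern how evaluation results are counted: progress is reported by epoch rather than by the number of metric checks. A minimal patience-based stopper sketch follows; the class and method names are hypothetical, not PyKEEN's API.

```python
class EarlyStopper:
    """Stop when the validation metric has not improved for ``patience``
    consecutive evaluations; tracks the epoch (not the check count) at
    which the best result was seen."""

    def __init__(self, patience=2, delta=0.0):
        self.patience = patience
        self.delta = delta
        self.best = None
        self.best_epoch = None
        self.bad_checks = 0

    def report(self, metric, epoch):
        """Record one evaluation; return True if training should stop."""
        if self.best is None or metric > self.best + self.delta:
            self.best, self.best_epoch = metric, epoch
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience

# Metric peaks at epoch 20, then stalls for two checks -> stop at epoch 40.
stopper = EarlyStopper(patience=2)
history = [(10, 0.50), (20, 0.55), (30, 0.54), (40, 0.53)]
stops = [stopper.report(metric, epoch) for epoch, metric in history]
```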
1.0.3 - 2020-08-13
1.0.2 - 2020-07-10
- Add default values for margin and adversarial temperature in NSSA loss (#29)
- Added FTP uploader (#35)
- Add AWS S3 uploader (#39)
- Improved MLflow support (#40)
- Lots of improvements to documentation!
- Fix triples factory splitting bug (#21)
- Fix problem with tensors' device during prediction (#41)
- Fix RotatE relation embeddings re-initialization (#26)
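For context on the NSSA loss defaults (#29): the self-adversarial negative sampling loss from the RotatE paper weights each negative by a softmax over the negatives' scores, sharpened by an adversarial temperature. A plain-Python sketch of the formula follows; the margin and temperature values below are illustrative only, and `nssa_loss` is a hypothetical helper, not PyKEEN's implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def nssa_loss(pos_distance, neg_distances, margin=9.0, temperature=1.0):
    """Self-adversarial negative sampling loss: harder negatives (smaller
    distance, i.e. higher score) receive larger softmax weights."""
    # Numerically stable softmax over the negatives' scores.
    logits = [-temperature * d for d in neg_distances]
    shift = max(logits)
    exps = [math.exp(l - shift) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    positive_term = -math.log(sigmoid(margin - pos_distance))
    negative_term = -sum(
        w * math.log(sigmoid(d - margin))
        for w, d in zip(weights, neg_distances)
    )
    return positive_term + negative_term

loss = nssa_loss(pos_distance=1.0, neg_distances=[8.0, 12.0])
```

Setting the temperature to zero recovers uniform weighting over negatives.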
1.0.1 - 2020-07-02
- Update documentation (#10)