Releases: cleanlab/cleanvision

v0.3.6 Improved issue type odd_size, minor bug fix and version updates

13 Feb 19:00
8d8ffaf
  • Odd size issue
    Odd sized images are now detected with the IQR method instead of a hard threshold, so each image is judged relative to the rest of the dataset. An image is marked as odd sized if size > q3 + 3 * IQR or size < q1 - 3 * IQR, where q1 and q3 are the 25th and 75th percentiles of size respectively and IQR = q3 - q1.
  • Statistics
    imagelab.info['statistics'] now provides key statistics (mean, std, min, max, and the 25th, 50th, and 75th percentiles) for every image property computed while looking for issues.
  • Bug fix
    Fixed a bug where an image with an unusual aspect ratio was resized to zero width or height during the blurry issue check.
  • CI pipeline
    Version updates for black, flake8, docs requirements, and the datasets library.
    Added cron schedule for running tests.
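
The IQR rule described in the odd size item above can be sketched in Python as follows. This is a minimal illustration rather than CleanVision's own implementation: the function name `odd_size_mask`, the `factor` parameter, and the sample sizes are hypothetical, and the code uses the conventional outlier fences (below q1 - 3 * IQR or above q3 + 3 * IQR).

```python
import numpy as np


def odd_size_mask(sizes, factor=3.0):
    """Return a boolean mask that is True for sizes outside the IQR fences."""
    sizes = np.asarray(sizes, dtype=float)
    q1, q3 = np.percentile(sizes, [25, 75])  # 25th and 75th percentiles
    iqr = q3 - q1
    lower = q1 - factor * iqr  # anything smaller is unusually small
    upper = q3 + factor * iqr  # anything larger is unusually large
    return (sizes < lower) | (sizes > upper)


# Hypothetical image sizes: six similar images and one much larger one.
sizes = [100, 110, 120, 130, 140, 150, 2000]
mask = odd_size_mask(sizes)
print(mask)  # only the last entry is flagged
```

Because the fences are derived from the dataset's own quartiles, the same code adapts to datasets of small thumbnails or large photographs without retuning a fixed threshold.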

Related PRs

Full Changelog: v0.3.5...v0.3.6

v0.3.5 Improved documentation, enhanced testing, and codebase refinements

30 Nov 15:38
d1334e0
  • Improved the README, added an FAQ page, and updated the Development guide with instructions for building docs.
  • Added tests for truncating titles in visualization, updated dev requirements, and fixed a type-checking issue.
  • An exception is now raised when the filepaths argument contains duplicate files.
  • Added a PR template.
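
The duplicate check on the filepaths argument mentioned above could look roughly like this. It is a sketch under assumptions: the helper name `check_no_duplicate_filepaths` and the choice of `ValueError` are hypothetical, since the release notes do not say which exception is raised.

```python
from collections import Counter


def check_no_duplicate_filepaths(filepaths):
    """Return the paths unchanged, or raise if any path appears more than once."""
    counts = Counter(filepaths)
    duplicates = sorted(path for path, n in counts.items() if n > 1)
    if duplicates:
        # Assumed exception type; the real library may raise something else.
        raise ValueError(f"Duplicate filepaths found: {duplicates}")
    return list(filepaths)


print(check_no_duplicate_filepaths(["a.jpg", "b.jpg"]))
```

Failing early on duplicated paths avoids reporting the same file as an exact duplicate of itself later on.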

Detailed changes

New Contributors

Full Changelog: v0.3.4...v0.3.5

v0.3.4 Improved time taken to find issues in large datasets.

01 Sep 11:12
723abc7

Reduced the time taken to find issues in large datasets.

What's Changed

Full Changelog: v0.3.3...v0.3.4

v0.3.3 Added support for cloud datasets

09 Aug 06:00
6ba83ca

Added support for running cleanvision on datasets stored in the cloud (AWS, Google storage, and Azure storage).

What's Changed

New Contributors

Full Changelog: v0.3.2...v0.3.3

v0.3.2 Visualization can also show an ID of the image along with score of the image

17 Jul 03:21
ff59d69

The ID of an image can now be shown alongside its score when visualizing issues in the report. This was added to make the cleanvision integration in cleanlab more seamless.

Detailed changes

Full Changelog: v0.3.1...v0.3.2

v0.3.1 Added a new issue check, improvements in visualization and support for integration in cleanlab

07 Jul 19:56
3219e79

What's Changed

  • Added a new issue check, odd_size, for detecting images whose area is too small or too large relative to the rest of the dataset.
  • Added support for cleanvision integration in the cleanlab repo. This enables checking for image issues from the cleanlab package as well.
  • Long image titles in visualization are now truncated based on their longest common prefix/suffix.
  • Added support for more hash types in the near_duplicates issue check.
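
One plausible reading of truncating titles by their longest common prefix/suffix is sketched below. The function name `strip_common_affixes` and the sample titles are hypothetical, and the actual truncation rule used in visualization may differ (for example, it may only apply above a length limit).

```python
import os


def strip_common_affixes(titles):
    """Drop the longest prefix and suffix shared by every title."""
    if len(titles) < 2:
        return list(titles)
    prefix = os.path.commonprefix(titles)
    # commonprefix compares character by character, so applying it to the
    # reversed strings yields the longest common suffix.
    suffix = os.path.commonprefix([t[::-1] for t in titles])[::-1]
    shortened = []
    for t in titles:
        end = len(t) - len(suffix)
        # Guard against the prefix and suffix overlapping in a short title.
        core = t[len(prefix):end] if end >= len(prefix) else ""
        shortened.append(core or t)  # never return an empty title
    return shortened


titles = ["/data/train/cat_001.png", "/data/train/dog_002.png"]
print(strip_common_affixes(titles))
```

Only the part that distinguishes one image from another is kept, which is what a reader needs under a thumbnail.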

Detailed changes

New Contributors

Full Changelog: v0.3.0...v0.3.1

v0.3.0 Major improvements in dark and blurry issue types, scoring for duplicate issues

24 May 22:25
c332a25

Improvement in blurry check
Improved the blurry check logic to produce fewer false positives and to catch blurry images in the dataset that previously went undetected.

Here are some examples of images that were previously falsely identified as blurry:
[Screenshots: 6 example images]

Here are some examples of blurry images that were discovered after the improvement:
[Screenshots: 4 example images]

Improvement in dark check
Images that were previously falsely identified as dark
[Screenshot: example images]

Scoring for near and exact duplicate issue types
Introduced scores for near and exact duplicate checks. The score of an image is inversely proportional to the number of images identified as its duplicates. Here's an example of what the scores look like: the top 3 images are exact duplicates of each other, and so on.
[Screenshot: duplicate images with their scores]
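
To make "inversely proportional" concrete, here is a minimal sketch that assumes the score is 1 divided by the size of the duplicate set an image belongs to, with images outside any set scoring 1.0. That exact formula, the function name `duplicate_scores`, and the sample data are assumptions for illustration; the release notes only state the proportionality.

```python
def duplicate_scores(duplicate_sets, all_images):
    """Score each image as 1 / (size of its duplicate set); 1.0 if unique."""
    scores = {image: 1.0 for image in all_images}
    for group in duplicate_sets:
        for image in group:
            # Larger duplicate sets give lower (worse) scores.
            scores[image] = 1.0 / len(group)
    return scores


# Hypothetical dataset: three copies of one image, two of another, one unique.
all_images = ["a", "b", "c", "d", "e", "f"]
duplicate_sets = [["a", "b", "c"], ["d", "e"]]
scores = duplicate_scores(duplicate_sets, all_images)
print(scores)
```

Under this assumption, sorting by ascending score puts the most heavily duplicated images first.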

Changelog

New Contributors

Full Changelog: v0.2.1...v0.3.0

v0.2.1 Updated cleanvision for torch datasets, pre-commit hooks and documentation

19 Apr 21:55
376ecfb

What's Changed

New Contributors

Full Changelog: v0.2.0...v0.2.1

v0.2.0 -- Added support for HuggingFace and Torchvision datasets

11 Apr 03:31
bab8f3a

What's Changed

Added support for running CleanVision on HuggingFace and torchvision datasets.

v0.1.1 -- Bugfix - Images loaded twice in Windows OS

31 Mar 18:37
6335370

What's Changed

  • Fixed a bug where images were loaded twice on Windows, caused by glob.glob() behaving differently across operating systems. #143
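
The sketch below shows one way such double loading can be avoided. It assumes the duplication came from globbing with patterns that differ only in letter case (such as `*.jpg` and `*.JPG`), which both match the same file where matching is case-insensitive, as it is on Windows. That cause, the helper name `list_images`, and the patterns are assumptions; the actual fix in #143 may differ.

```python
import glob
import os
import tempfile


def list_images(root, patterns=("*.jpg", "*.JPG")):
    """Collect files matching any pattern, keeping each path only once."""
    seen = set()
    paths = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(root, pattern)):
            # normcase lowercases paths on Windows and is a no-op on POSIX,
            # so the same file found through two patterns gets one key.
            key = os.path.normcase(os.path.abspath(path))
            if key not in seen:
                seen.add(key)
                paths.append(path)
    return sorted(paths)


# Demo: the repeated pattern stands in for two patterns matching one file.
with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "a.jpg"), "w").close()
    found = list_images(root, patterns=("*.jpg", "*.jpg"))
    print(len(found))
```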