Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor bad smell ToArrayCallWithZeroLengthArrayArgument #1447

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

MartinWitt
Copy link

Repairing Code Style Issues

ToArrayCallWithZeroLengthArrayArgument

The performance of the empty array version is the same, and sometimes even better, compared
to the pre-sized version. Also, passing a pre-sized array is dangerous for a concurrent or
synchronized collection as a data race is possible between the size and toArray
calls. This may result in extra nulls at the end of the array if the collection was concurrently
shrunk during the operation.


See https://shipilev.net/blog/2016/arrays-wisdom-ancients/ for more details.

Repairing Code Style Issues

  • ToArrayCallWithZeroLengthArrayArgument (1)

- ToArrayCallWithZeroLengthArrayArgument
The performance of the empty array version is the same, and sometimes even better, compared
to the pre-sized version. Also, passing a pre-sized array is dangerous for a concurrent or
synchronized collection as a data race is possible between the <code>size</code> and <code>toArray</code>
calls. This may result in extra <code>null</code>s at the end of the array if the collection was concurrently
shrunk during the operation.</p>
See https://shipilev.net/blog/2016/arrays-wisdom-ancients/ for more details.
@mohrezaei
Copy link
Member

The linked performance data has some serious issues. The largest collection size tested is 1000. The data shows a clear trend where the sized variant is getting faster and faster (relative to the zero), but they stop short of where it gets interesting. At 1M, I would bet it's entirely reversed.

Also, the tests are run with a 5 year old JDK, with a GC that's no longer default. The benefits of the zero method (eliding zeroing) is something that may be optimized in newer JDKs.

My recommendation: re-run the benchmarks with much larger sizes and newer JDK's before using it to change production code.

@mohrezaei
Copy link
Member

Sorry, I didn't realize this was limited to sizes 4 or less. That's likely much harder to optimize in a meaningful way. The most relevant line from the benchmark is this:

ToArrayBench.sized       10    hashset  avgt   30    43.808 ±   0.629  ns/op
ToArrayBench.zero        10    hashset  avgt   30    44.192 ±   0.823  ns/op

So there is no measurable difference between the two(significant overlap of avg with error), so all this PR is doing is causing more garbage creation.

@motlin
Copy link
Contributor

motlin commented Jun 19, 2023

While the performance argument may not be convincing, the point about toArray(emptyArray) being an atomic way of dealing with concurrent collections is compelling to me.

@mohrezaei
Copy link
Member

The statement "passing a pre-sized array is dangerous for a concurrent or synchronized collection" seems to contain a misunderstanding of the contract of toArray. Here is the javadoc:

Returns an array containing all of the elements in this collection; the runtime type of the returned array is that of the specified array. If the collection fits in the specified array, it is returned therein. Otherwise, a new array is allocated with the runtime type of the specified array and the size of this collection.
If this collection fits in the specified array with room to spare (i.e., the array has more elements than this collection), the element in the array immediately following the end of the collection is set to null. (This is useful in determining the length of this collection only if the caller knows that this collection does not contain any null elements.)

Most of this is handled in java.util.AbstractCollection.toArray (which is the ultimate parent of say, org.eclipse...ConcurrentHashMap.KeySet):

    public <T> T[] toArray(T[] a) {
        // Estimate size of array; be prepared to see more or fewer elements
        int size = size();
        T[] r = a.length >= size ? a :
                  (T[])java.lang.reflect.Array
                  .newInstance(a.getClass().getComponentType(), size);
        ... // more resizing checks

Please also note the specifics of this code: the very first thing it does is call size() and allocate an array if you pass in a zero length array. No synchronization.

@motlin
Copy link
Contributor

motlin commented Jun 19, 2023

I have this impression that on concurrent collections, the only method that can conveniently and atomically give you O(n) elements that were at least in the collection at some point is toArray(emptyArray). I'm looking around and not finding this documented anywhere. I don't see it on a Javadoc contract or in Effective Java or JCIP. But it's clear that java.util.concurrent.ConcurrentHashMap.CollectionView#toArray() was written this way. It returns a trimmed array with no nulls at the end.

@mohrezaei
Copy link
Member

on java.util.concurrent.ConcurrentHashMap calling toArray(new X[0]) and toArray(new X[c.size()]) will return identical results, as the function, once again, allocates new X[c.size()] in the first case. So it's trimmed regardless (which seems unnecessary, given the contract of the function).

@motlin
Copy link
Contributor

motlin commented Jun 19, 2023

I am seeing different behaviors for different sizes.

public static void main(String[] args)
{
    testWithPreSize(0);
    testWithPreSize(2);
}

private static void testWithPreSize(int preSize)
{
    ConcurrentHashMap<String, Integer> map = new java.util.concurrent.ConcurrentHashMap<String, Integer>();
    map.put("One", 1);
    Object[] sizeTwoArray = new Object[preSize];
    Object[] result = map.entrySet().toArray(sizeTwoArray);
    System.out.println("Result is " + Arrays.toString(result) + " for pre-sized to " + preSize);
}

Prints:

Result is [One=1] for pre-sized to 0
Result is [One=1, null] for pre-sized to 2

I'm sure you know this though, so I'm trying to guess where our disconnect is. I think part of it might be, why would the size ever be greater than the number of entries (2 here).

In org.eclipse.collections.impl.map.immutable.ImmutableMapFactoryImpl#withAll in the expression map.entrySet().toArray(new Map.Entry[map.entrySet().size()]) I'm supposing that map.entrySet().size() returns N and because of a concurrent removal, toArray() returns an array with N-1 entries plus a null at the end.

@mohrezaei
Copy link
Member

Sorry, I misread the code: it only trims if the array sent in was too small. So it honors the contract of the function as specified in Collection.
So the caller has two choices:

  1. toArray(new X[c.size()]) (or even size()*1.1): put stuff into my array. Don't reallocate unless you need space. I'll handle nulls, if any, at the end, because I'm trying to get this done with as little GC as possible.
  2. toArray(new X[0]): allocate a new array based on the initial size of the collection. Trim it if entries were removed and couldn't be found to fill it.

The guarantees around the entries in the array are so weak (it may have values that are no longer in the map, and values that are missing) that I'm not sure "it can have nulls, but it'll allocate less" is a real issue.

@motlin
Copy link
Contributor

motlin commented Jun 19, 2023

I agree that this isn't a real issue in practice. ConcurrentHashMap doesn't allow null keys nor null values, so if keySet().toArray(preSizedArray) or values().toArray(preSizedArray) results contain nulls, it unambiguously means that the map shrunk.

And map.entrySet().toArray(preSizedArray) is unambiguous since we can distinguish between null and an Entry containing null. This even works for maps like HashMap which allow nulls.

Result is [null=null] for pre-sized to 0
Result is [null=null, null] for pre-sized to 2

If anyone ever ran into this issue in practice, it would manifest as a NullPointerException inside ImmutableMapFactory.withAll(). If we want to fix that, I think the 0 size thing would work. If that's a performance problem, another option might be:

Map.Entry<K, V>[] entries = map.entrySet().toArray(new Map.Entry[map.entrySet().size()]);
if (entries.length == 0 || entries[0] == null)
{
    return this.empty();
}
if (entries.length == 1 || entries[1] == null)
{
    return this.of(entries[0].getKey(), entries[0].getValue());
}
if (entries.length == 2 || entries[2] == null)
{
    return this.of(
            entries[0].getKey(), entries[0].getValue(),
            entries[1].getKey(), entries[1].getValue());
}
if (entries.length == 3 || entries[3] == null)
{
    return this.of(
            entries[0].getKey(), entries[0].getValue(),
            entries[1].getKey(), entries[1].getValue(),
            entries[2].getKey(), entries[2].getValue());
}
if (entries.length == 4 || entries[4] == null)
{
    return this.of(
            entries[0].getKey(), entries[0].getValue(),
            entries[1].getKey(), entries[1].getValue(),
            entries[2].getKey(), entries[2].getValue(),
            entries[3].getKey(), entries[3].getValue());
}
return new ImmutableUnifiedMap<>(map);

@mohrezaei
Copy link
Member

That works. A bit less "or-y":

int realLength = entries.length;
while (realLength > 0 && entries[realLength - 1] == null) realLength--;

Then switch on realLength.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants