refactor bad smell ToArrayCallWithZeroLengthArrayArgument #1447

MartinWitt · 2023-03-19T13:31:21Z

Repairing Code Style Issues

ToArrayCallWithZeroLengthArrayArgument

The performance of the empty array version is the same, and sometimes even better, compared
to the pre-sized version. Also, passing a pre-sized array is dangerous for a concurrent or
synchronized collection as a data race is possible between the size and toArray
calls. This may result in extra nulls at the end of the array if the collection was concurrently
shrunk during the operation.

See https://shipilev.net/blog/2016/arrays-wisdom-ancients/ for more details.

Repairing Code Style Issues

ToArrayCallWithZeroLengthArrayArgument (1)

- ToArrayCallWithZeroLengthArrayArgument The performance of the empty array version is the same, and sometimes even better, compared to the pre-sized version. Also, passing a pre-sized array is dangerous for a concurrent or synchronized collection as a data race is possible between the <code>size</code> and <code>toArray</code> calls. This may result in extra <code>null</code>s at the end of the array if the collection was concurrently shrunk during the operation.</p> See https://shipilev.net/blog/2016/arrays-wisdom-ancients/ for more details.

mohrezaei · 2023-03-19T13:56:13Z

The linked performance data has some serious issues. The largest collection size tested is 1000. The data shows a clear trend where the sized variant is getting faster and faster (relative to the zero), but they stop short of where it gets interesting. At 1M, I would bet it's entirely reversed.

Also, the tests are run with a 5 year old JDK, with a GC that's no longer default. The benefits of the zero method (eliding zeroing) is something that may be optimized in newer JDKs.

My recommendation: re-run the benchmarks with much larger sizes and newer JDK's before using it to change production code.

mohrezaei · 2023-03-19T17:35:52Z

Sorry, I didn't realize this was limited to sizes 4 or less. That's likely much harder to optimize in a meaningful way. The most relevant line from the benchmark is this:

ToArrayBench.sized       10    hashset  avgt   30    43.808 ±   0.629  ns/op
ToArrayBench.zero        10    hashset  avgt   30    44.192 ±   0.823  ns/op

So there is no measurable difference between the two(significant overlap of avg with error), so all this PR is doing is causing more garbage creation.

motlin · 2023-06-19T15:16:50Z

While the performance argument may not be convincing, the point about toArray(emptyArray) being an atomic way of dealing with concurrent collections is compelling to me.

mohrezaei · 2023-06-19T15:50:01Z

The statement "passing a pre-sized array is dangerous for a concurrent or synchronized collection" seems to contain a misunderstanding of the contract of toArray. Here is the javadoc:

Returns an array containing all of the elements in this collection; the runtime type of the returned array is that of the specified array. If the collection fits in the specified array, it is returned therein. Otherwise, a new array is allocated with the runtime type of the specified array and the size of this collection.
If this collection fits in the specified array with room to spare (i.e., the array has more elements than this collection), the element in the array immediately following the end of the collection is set to null. (This is useful in determining the length of this collection only if the caller knows that this collection does not contain any null elements.)

Most of this is handled in java.util.AbstractCollection.toArray (which is the ultimate parent of say, org.eclipse...ConcurrentHashMap.KeySet):

    public <T> T[] toArray(T[] a) {
        // Estimate size of array; be prepared to see more or fewer elements
        int size = size();
        T[] r = a.length >= size ? a :
                  (T[])java.lang.reflect.Array
                  .newInstance(a.getClass().getComponentType(), size);
        ... // more resizing checks

Please also note the specifics of this code: the very first thing it does is call size() and allocate an array if you pass in a zero length array. No synchronization.

motlin · 2023-06-19T17:31:11Z

I have this impression that on concurrent collections, the only method that can conveniently and atomically give you O(n) elements that were at least in the collection at some point is toArray(emptyArray). I'm looking around and not finding this documented anywhere. I don't see it on a Javadoc contract or in Effective Java or JCIP. But it's clear that java.util.concurrent.ConcurrentHashMap.CollectionView#toArray() was written this way. It returns a trimmed array with no nulls at the end.

mohrezaei · 2023-06-19T18:16:32Z

on java.util.concurrent.ConcurrentHashMap calling toArray(new X[0]) and toArray(new X[c.size()]) will return identical results, as the function, once again, allocates new X[c.size()] in the first case. So it's trimmed regardless (which seems unnecessary, given the contract of the function).

motlin · 2023-06-19T18:31:08Z

I am seeing different behaviors for different sizes.

public static void main(String[] args)
{
    testWithPreSize(0);
    testWithPreSize(2);
}

private static void testWithPreSize(int preSize)
{
    ConcurrentHashMap<String, Integer> map = new java.util.concurrent.ConcurrentHashMap<String, Integer>();
    map.put("One", 1);
    Object[] sizeTwoArray = new Object[preSize];
    Object[] result = map.entrySet().toArray(sizeTwoArray);
    System.out.println("Result is " + Arrays.toString(result) + " for pre-sized to " + preSize);
}

Prints:

Result is [One=1] for pre-sized to 0
Result is [One=1, null] for pre-sized to 2

I'm sure you know this though, so I'm trying to guess where our disconnect is. I think part of it might be, why would the size ever be greater than the number of entries (2 here).

In org.eclipse.collections.impl.map.immutable.ImmutableMapFactoryImpl#withAll in the expression map.entrySet().toArray(new Map.Entry[map.entrySet().size()]) I'm supposing that map.entrySet().size() returns N and because of a concurrent removal, toArray() returns an array with N-1 entries plus a null at the end.

mohrezaei · 2023-06-19T18:40:39Z

Sorry, I misread the code: it only trims if the array sent in was too small. So it honors the contract of the function as specified in Collection.
So the caller has two choices:

toArray(new X[c.size()]) (or even size()*1.1): put stuff into my array. Don't reallocate unless you need space. I'll handle nulls, if any, at the end, because I'm trying to get this done with as little GC as possible.
toArray(new X[0]): allocate a new array based on the initial size of the collection. Trim it if entries were removed and couldn't be found to fill it.

The guarantees around the entries in the array are so weak (it may have values that are no longer in the map, and values that are missing) that I'm not sure "it can have nulls, but it'll allocate less" is a real issue.

motlin · 2023-06-19T19:13:55Z

I agree that this isn't a real issue in practice. ConcurrentHashMap doesn't allow null keys nor null values, so if keySet().toArray(preSizedArray) or values().toArray(preSizedArray) results contain nulls, it unambiguously means that the map shrunk.

And map.entrySet().toArray(preSizedArray) is unambiguous since we can distinguish between null and an Entry containing null. This even works for maps like HashMap which allow nulls.

Result is [null=null] for pre-sized to 0
Result is [null=null, null] for pre-sized to 2

If anyone ever ran into this issue in practice, it would manifest as a NullPointerException inside ImmutableMapFactory.withAll(). If we want to fix that, I think the 0 size thing would work. If that's a performance problem, another option might be:

Map.Entry<K, V>[] entries = map.entrySet().toArray(new Map.Entry[map.entrySet().size()]);
if (entries.length == 0 || entries[0] == null)
{
    return this.empty();
}
if (entries.length == 1 || entries[1] == null)
{
    return this.of(entries[0].getKey(), entries[0].getValue());
}
if (entries.length == 2 || entries[2] == null)
{
    return this.of(
            entries[0].getKey(), entries[0].getValue(),
            entries[1].getKey(), entries[1].getValue());
}
if (entries.length == 3 || entries[3] == null)
{
    return this.of(
            entries[0].getKey(), entries[0].getValue(),
            entries[1].getKey(), entries[1].getValue(),
            entries[2].getKey(), entries[2].getValue());
}
if (entries.length == 4 || entries[4] == null)
{
    return this.of(
            entries[0].getKey(), entries[0].getValue(),
            entries[1].getKey(), entries[1].getValue(),
            entries[2].getKey(), entries[2].getValue(),
            entries[3].getKey(), entries[3].getValue());
}
return new ImmutableUnifiedMap<>(map);

mohrezaei · 2023-06-19T22:35:31Z

That works. A bit less "or-y":

int realLength = entries.length;
while (realLength > 0 && entries[realLength - 1] == null) realLength--;

Then switch on realLength.

MartinWitt mentioned this pull request Mar 19, 2023

laughing-train-summary MartinWitt/laughing-train#125

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor bad smell ToArrayCallWithZeroLengthArrayArgument #1447

refactor bad smell ToArrayCallWithZeroLengthArrayArgument #1447

MartinWitt commented Mar 19, 2023

mohrezaei commented Mar 19, 2023

mohrezaei commented Mar 19, 2023

motlin commented Jun 19, 2023

mohrezaei commented Jun 19, 2023

motlin commented Jun 19, 2023

mohrezaei commented Jun 19, 2023

motlin commented Jun 19, 2023

mohrezaei commented Jun 19, 2023

motlin commented Jun 19, 2023

mohrezaei commented Jun 19, 2023

refactor bad smell ToArrayCallWithZeroLengthArrayArgument #1447

Are you sure you want to change the base?

refactor bad smell ToArrayCallWithZeroLengthArrayArgument #1447

Conversation

MartinWitt commented Mar 19, 2023

Repairing Code Style Issues

ToArrayCallWithZeroLengthArrayArgument

Repairing Code Style Issues

mohrezaei commented Mar 19, 2023

mohrezaei commented Mar 19, 2023

motlin commented Jun 19, 2023

mohrezaei commented Jun 19, 2023

motlin commented Jun 19, 2023

mohrezaei commented Jun 19, 2023

motlin commented Jun 19, 2023

mohrezaei commented Jun 19, 2023

motlin commented Jun 19, 2023

mohrezaei commented Jun 19, 2023