Error occurred when using ETL to slice the raster TIFF file in HDFS #3508

KiktMa · 2023-03-22T10:29:08Z

An error occurs when I use the GeoTrellis ETL to build a pyramid for raster data stored in HDFS and write it to the Accumulo database.

I am using GeoTrellis 2.1.0, Scala 2.11.7, Hadoop 2.7.7, Spark 2.3.4, JDK 1.8.

After writing input.json, output.json and backend-profiles.json, I use spark-submit to submit the geotrellis.spark.etl.SinglebandIngest job:

./bin/spark-submit --class geotrellis.spark.etl.SinglebandIngest --master yarn /usr/local/app/spark/spark-2.3.4/jars/geotrellis-spark-etl_2.11-2.1.0.jar --input file:///app/tif/json/input.json --output file:///app/tif/json/output.json --backend-profiles file:///app/tif/json/backend-profiles.json

Error output:

TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, node1, executor 2): java.lang.NegativeArraySizeException
        at scala.reflect.ManifestFactory$$anon$6.newArray(Manifest.scala:93)
        at scala.reflect.ManifestFactory$$anon$6.newArray(Manifest.scala:91)
        at scala.Array$.ofDim(Array.scala:218)
        at geotrellis.raster.UByteArrayTile$.ofDim(UByteArrayTile.scala:239)
        at geotrellis.raster.UByteArrayTile$.empty(UByteArrayTile.scala:267)
        at geotrellis.raster.ArrayTile$.empty(ArrayTile.scala:431)
        at geotrellis.raster.io.geotiff.GeoTiffTile.mutable(GeoTiffTile.scala:698)
        at geotrellis.raster.io.geotiff.GeoTiffTile.toArrayTile(GeoTiffTile.scala:690)
        at geotrellis.spark.io.RasterReader$$anon$1.readFully(RasterReader.scala:67)
        at geotrellis.spark.io.RasterReader$$anon$1.readFully(RasterReader.scala:63)
        at geotrellis.spark.io.hadoop.HadoopGeoTiffRDD$$anonfun$apply$5$$anonfun$apply$6.apply(HadoopGeoTiffRDD.scala:148)
        at geotrellis.spark.io.hadoop.HadoopGeoTiffRDD$$anonfun$apply$5$$anonfun$apply$6.apply(HadoopGeoTiffRDD.scala:147)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185)
        at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$14.apply(RDD.scala:1021)
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$14.apply(RDD.scala:1019)
        at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2130)
        at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2130)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

My TIFF file is only 180 MB. How can I solve this problem? I increased the driver memory to 2 GB, but the error persists.


pomadchin · 2023-03-22T11:01:39Z

Hello @KiktMa, we dropped support for the ETL module; it now lives, unmaintained, in a separate repo: https://github.com/geotrellis/spark-etl

Many of the old GeoTrellis issues have already been addressed, so I'd recommend trying one of the most up-to-date versions.

Could you also post the gdalinfo output of the TIFF here? (GIS and other sensitive data can be omitted.)

KiktMa · 2023-03-22T13:07:41Z

Thank you for your reply, @pomadchin. Unfortunately this TIFF is a confidential file, so I'm sorry, but I can't publish it.

pomadchin · 2023-03-22T13:14:19Z

@KiktMa Only the gdalinfo output without the sensitive data is needed: no tags, extent, etc.

The point of the gdalinfo output is to understand how the TIFF is segmented. To help, I only need the Size and the Band metadata (block size and type), for example:

gdalinfo file.tif
Driver: GTiff/GeoTIFF
Files: file.tif
Size is 6000, 6000
Coordinate System is <removed>
Metadata:
  <removed>
Image Structure Metadata:
  <removed>
Corner Coordinates:
<removed>
Band 1 Block=6000x1 Type=Int16, ColorInterp=<removed>

KiktMa · 2023-03-22T13:23:18Z

Files: D:\test_tif\caijian.tif
Size is 63472, 61105
Coordinate System is:
Metadata:
Image Structure Metadata:
Corner Coordinates:
Band 1 Block=63472x1 Type=Byte, ColorInterp=
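This size most likely explains the stack trace. GeoTiffTile.toArrayTile allocates one array for the whole band, and 63472 x 61105 Byte cells (about 3.9 billion) is more than a single JVM array can hold, because array lengths are Int values capped at roughly Int.MaxValue (2,147,483,647). If cols * rows is computed in Int arithmetic, the product wraps around to a negative number, which matches the NegativeArraySizeException thrown on the executor. That would also explain why more driver memory did not help and why the 180 MB (compressed) file size is misleading. A minimal, library-free sketch of the arithmetic; CellCountCheck is a hypothetical helper, not a GeoTrellis API:

```scala
// Sketch: why a 63472 x 61105 single-band Byte raster cannot be held in one JVM array.
// Array lengths are Int values, so a single array holds at most about Int.MaxValue cells.
object CellCountCheck {
  // Product computed in Int arithmetic, as a naive cols * rows would be: it silently wraps around.
  def intCellCount(cols: Int, rows: Int): Int = cols * rows

  // Same product computed in Long arithmetic: exact for any Int inputs.
  def longCellCount(cols: Int, rows: Int): Long = cols.toLong * rows.toLong

  // True when the whole band could be materialized as one array (one element per cell).
  def fitsInOneArray(cols: Int, rows: Int): Boolean =
    longCellCount(cols, rows) <= Int.MaxValue
}

val cols = 63472 // Size reported by gdalinfo
val rows = 61105
println(CellCountCheck.longCellCount(cols, rows))  // 3878456560
println(CellCountCheck.intCellCount(cols, rows))   // -416510736, a negative array size
println(CellCountCheck.fitsInOneArray(cols, rows)) // false
```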

I'd also like to ask: if I use GeoTrellis 3.5, how do I read the raster TIFF from HDFS, build a pyramid, and write the results to Accumulo? @pomadchin

pomadchin · 2023-03-22T14:02:26Z

@KiktMa I think this is a duplicate of the known issue #1691.

The workaround is to use GDALRasterSource and/or re-tile the TIFF so that it is TILED, via the command gdal_translate in.tif out.tif -co BIGTIFF=YES -co TILED=YES -co COMPRESS=LZW
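Re-tiling helps because a TILED TIFF (with TILED=YES, gdal_translate writes 256 x 256 blocks by default) lets a reader fetch the raster one small window at a time instead of materializing the whole band. The idea behind windowed reads, as a library-free sketch; Window and Windows are hypothetical names, and GeoTrellis uses its own GridBounds type for this:

```scala
// Library-free sketch of windowed reading: cover a large grid with small windows
// so that no single allocation needs cols * rows cells.
// Window and Windows are hypothetical names used only for illustration.
final case class Window(colMin: Int, rowMin: Int, colMax: Int, rowMax: Int) {
  // Bounds are inclusive, hence the + 1.
  def width: Int = colMax - colMin + 1
  def height: Int = rowMax - rowMin + 1
}

object Windows {
  // Split a cols x rows grid into windows of at most size x size cells, row by row.
  // Windows on the right and bottom edges are clipped to the grid.
  def split(cols: Int, rows: Int, size: Int): Iterator[Window] = {
    require(cols > 0 && rows > 0 && size > 0, "dimensions must be positive")
    for {
      r <- Iterator.range(0, rows, size)
      c <- Iterator.range(0, cols, size)
    } yield Window(c, r, math.min(c + size, cols) - 1, math.min(r + size, rows) - 1)
  }
}

val windows = Windows.split(63472, 61105, 256).toVector
println(windows.size)                             // 59272
println(windows.map(w => w.width * w.height).max) // 65536
```

Every window holds at most 65,536 cells, so each allocation stays far below the array limit regardless of the size of the full raster.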

An example of reading TIFFs via the RasterSource API and building a pyramid: https://github.com/pomadchin/vlm-performance/blob/feature/gt-3.x/src/main/scala/geotrellis/contrib/performance/IngestRasterSource.scala#L52-L72
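For intuition about what building a pyramid produces, here is a simplified sketch that halves the resolution level by level until the raster fits in a single 256 x 256 tile. It is an illustration under assumptions, not GeoTrellis's algorithm: GeoTrellis's ZoomedLayoutScheme derives zoom levels from the world extent of the target CRS rather than from pixel counts, so its zoom numbers will differ. SimplePyramid is a hypothetical name.

```scala
// Sketch of a simple power-of-two pyramid: each level halves the resolution
// (rounding up) until the whole raster fits in one tile.
// Simplification: this is not how GeoTrellis's ZoomedLayoutScheme numbers zoom levels.
object SimplePyramid {
  // Returns (cols, rows) for every level, from full resolution down to a single tile.
  def levels(cols: Int, rows: Int, tileSize: Int): List[(Int, Int)] = {
    require(cols > 0 && rows > 0 && tileSize > 0, "dimensions must be positive")
    @annotation.tailrec
    def loop(c: Int, r: Int, acc: List[(Int, Int)]): List[(Int, Int)] = {
      val next = (c, r) :: acc
      if (c <= tileSize && r <= tileSize) next.reverse
      else loop((c + 1) / 2, (r + 1) / 2, next)
    }
    loop(cols, rows, Nil)
  }
}

val pyramid = SimplePyramid.levels(63472, 61105, 256)
pyramid.zipWithIndex.foreach { case ((c, r), i) => println(s"level $i: $c x $r") }
```

For this raster it yields nine levels, from 63472 x 61105 down to 248 x 239.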

KiktMa · 2023-04-21T02:20:41Z

@pomadchin Hello, I'm sorry to bother you again, but I have another question. I have already stored the pyramid in Accumulo, but I cannot understand the structure of the table:

\x00\x00\x00\x00\x00\x1E"\xCB layer_slope:11: []    x\x9C\xED\xD1\xB1\x0D\xC2@\x00\x04A\xCB\x11\x01\xE4H$TBK\xDF\x82k\xA0\x18\xDA{L\x11\xE8\x83\x9D\xB9\x06N\xDA\xFD}\xFF\xDC\xAE\xC7\xE5\xDC\xF1\xDC\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xFE\xE01\xC6X\xFD\x81u^s\xCE\xD5\x1FXG\xFF6\

This is part of the table. I know that \x00\x00\x00\x00\x00\x1E"\xCB is the row ID, but I am not familiar with this encoding. How should I decode the value? There is also a timestamp in the table. I asked ChatGPT, but it gave a different answer, and now I am very confused.
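A sketch of how such an entry can be decoded, under assumptions about GeoTrellis's Accumulo layout that should be checked against the version in use: the row ID is the tile's key-index value written as an 8-byte big-endian integer, the column family is name:zoom (here layer_slope:11), and the value is Avro-encoded tile data compressed with Deflate. The leading x\x9C bytes (0x78 0x9C) are the standard zlib header. The timestamp is Accumulo's own per-entry version timestamp. The Z-order decoding also assumes the layer was written with a Z-order key index (a Hilbert or row-major index decodes differently), and the final Avro step needs the matching Avro schema, so it is left out; reading the layer back through GeoTrellis's Accumulo layer reader avoids hand-decoding altogether. AccumuloEntrySketch is a hypothetical helper.

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.Inflater

// Sketch of decoding one entry under the assumptions stated in the text.
// AccumuloEntrySketch is a hypothetical helper, not part of GeoTrellis or Accumulo.
object AccumuloEntrySketch {
  // Row ID: assumed to be the key-index value as a big-endian unsigned integer (8 bytes).
  def rowIdToIndex(rowId: Array[Byte]): BigInt = BigInt(1, rowId)

  // Z-order (Morton) decoding, assuming a Z-order key index:
  // even bits of z form the tile column, odd bits form the tile row.
  def zDecode(z: Long): (Int, Int) = {
    var col = 0L
    var row = 0L
    var i = 0
    while (i < 32) {
      col |= ((z >>> (2 * i)) & 1L) << i
      row |= ((z >>> (2 * i + 1)) & 1L) << i
      i += 1
    }
    (col.toInt, row.toInt)
  }

  // Value: 0x78 0x9C marks a zlib (Deflate) stream; inflating it yields the still Avro-encoded bytes.
  def inflate(compressed: Array[Byte]): Array[Byte] = {
    val inflater = new Inflater()
    try {
      inflater.setInput(compressed)
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](8192)
      while (!inflater.finished()) {
        val n = inflater.inflate(buf)
        // No progress and more input needed: the stream is cut off, e.g. copied from truncated output.
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
          throw new IllegalArgumentException("truncated or invalid zlib stream")
        out.write(buf, 0, n)
      }
      out.toByteArray
    } finally inflater.end()
  }
}

// Row ID from the scan: \x00\x00\x00\x00\x00\x1E"\xCB, where '"' is byte 0x22.
val rowId = Array(0x00, 0x00, 0x00, 0x00, 0x00, 0x1E, 0x22, 0xCB).map(_.toByte)
val index = AccumuloEntrySketch.rowIdToIndex(rowId)
println(index)                                     // 1974987
println(AccumuloEntrySketch.zDecode(index.toLong)) // (1545,859) if the layer uses a Z-order index
```

The shell prints printable bytes as plain characters and the value shown is truncated, so the value bytes should be read through the Accumulo client API rather than copied from the scan output.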

pomadchin added the bug label Mar 22, 2023
