Error occurred when using Etl to slice the grid tiff file in hdfs #3508

Open
KiktMa opened this issue Mar 22, 2023 · 6 comments

KiktMa commented Mar 22, 2023

An error occurred while using the GeoTrellis ETL module to build a pyramid from raster data in HDFS and store it in Accumulo.

I am using GeoTrellis 2.1.0, Scala 2.11.7, Hadoop 2.7.7, Spark 2.3.4, and JDK 1.8.

After writing input.json, output.json, and backend-profiles.json, I used spark-submit to run geotrellis.spark.etl.SinglebandIngest:

./bin/spark-submit --class geotrellis.spark.etl.SinglebandIngest --master yarn /usr/local/app/spark/spark-2.3.4/jars/geotrellis-spark-etl_2.11-2.1.0.jar --input file:///app/tif/json/input.json --output file:///app/tif/json/output.json --backend-profiles file:///app/tif/json/backend-profiles.json

Error Reporting Results:

TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, node1, executor 2): java.lang.NegativeArraySizeException
        at scala.reflect.ManifestFactory$$anon$6.newArray(Manifest.scala:93)
        at scala.reflect.ManifestFactory$$anon$6.newArray(Manifest.scala:91)
        at scala.Array$.ofDim(Array.scala:218)
        at geotrellis.raster.UByteArrayTile$.ofDim(UByteArrayTile.scala:239)
        at geotrellis.raster.UByteArrayTile$.empty(UByteArrayTile.scala:267)
        at geotrellis.raster.ArrayTile$.empty(ArrayTile.scala:431)
        at geotrellis.raster.io.geotiff.GeoTiffTile.mutable(GeoTiffTile.scala:698)
        at geotrellis.raster.io.geotiff.GeoTiffTile.toArrayTile(GeoTiffTile.scala:690)
        at geotrellis.spark.io.RasterReader$$anon$1.readFully(RasterReader.scala:67)
        at geotrellis.spark.io.RasterReader$$anon$1.readFully(RasterReader.scala:63)
        at geotrellis.spark.io.hadoop.HadoopGeoTiffRDD$$anonfun$apply$5$$anonfun$apply$6.apply(HadoopGeoTiffRDD.scala:148)
        at geotrellis.spark.io.hadoop.HadoopGeoTiffRDD$$anonfun$apply$5$$anonfun$apply$6.apply(HadoopGeoTiffRDD.scala:147)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185)
        at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$14.apply(RDD.scala:1021)
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$14.apply(RDD.scala:1019)
        at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2130)
        at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2130)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

My TIFF file is only 180 MB. How can I solve this problem? I increased the driver memory to 2 GB, but the error persists.

Member

pomadchin commented Mar 22, 2023

Hello @KiktMa, we dropped support for the ETL module; it now lives, unmaintained, in a separate repo: https://github.com/geotrellis/spark-etl

Many of the old GeoTrellis issues have already been addressed, so I'd recommend trying one of the most up-to-date versions.

Could I also ask you to post the gdalinfo output of the TIFF here? (GIS / other sensitive data can be omitted)

Author

KiktMa commented Mar 22, 2023

Thank you for your reply. This TIFF is a confidential file, so I'm sorry, I can't publish it. @pomadchin

Member

pomadchin commented Mar 22, 2023

@KiktMa gdalinfo with no sensitive data is needed; no tags / extent / etc.

The point of the gdalinfo output is to understand the TIFF segment structure. The data I need in order to help is the Size and Band metadata (size & type):

gdalinfo file.tif
Driver: GTiff/GeoTIFF
Files: file.tif
Size is 6000, 6000
Coordinate System is <removed>
Metadata:
  <removed>
Image Structure Metadata:
  <removed>
Corner Coordinates:
<removed>
Band 1 Block=6000x1 Type=Int16, ColorInterp=<removed>
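
For anyone who needs to share this information without exposing the rest of the file metadata, here is a minimal Python sketch that keeps only the structural lines. The function name `structural_lines` and the sample text are illustrative only; it assumes the gdalinfo text layout shown above, where the relevant lines start with "Size is" and "Band N Block=".

```python
import re

def structural_lines(gdalinfo_text):
    """Keep only the 'Size is ...' and 'Band N Block=...' lines of gdalinfo output."""
    keep = re.compile(r"^(Size is |Band \d+ Block=)")
    stripped = (line.strip() for line in gdalinfo_text.splitlines())
    return [line for line in stripped if keep.match(line)]

# Sample input modeled on the output above (not a real file).
sample = """Driver: GTiff/GeoTIFF
Files: file.tif
Size is 6000, 6000
Coordinate System is <removed>
Band 1 Block=6000x1 Type=Int16, ColorInterp=<removed>
"""

for line in structural_lines(sample):
    print(line)
```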

Author

KiktMa commented Mar 22, 2023

Files: D:\test_tif\caijian.tif
Size is 63472, 61105
Coordinate System is:
Metadata:
Image Structure Metadata:
Corner Coordinates:
Band 1 Block=63472x1 Type=Byte, ColorInterp=

I also want to ask: if I use GeoTrellis 3.5, how do I read the raster TIFF from HDFS, build a pyramid, and upload the results to Accumulo? @pomadchin

Member

pomadchin commented Mar 22, 2023

@KiktMa I think this is related to #1691 and it is a known dup issue.
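
For context, the arithmetic below shows why a 180 MB file can still produce a NegativeArraySizeException. This is a sketch in Python that emulates a JVM 32-bit integer; it assumes the array length is computed as cols * rows in a 32-bit Int, which the Array.ofDim frame in the stack trace suggests but this thread does not confirm. The helper `as_int32` is illustrative only.

```python
def as_int32(n):
    """Wrap an integer the way a signed 32-bit (JVM Int) result would wrap."""
    return (n + 2**31) % 2**32 - 2**31

cols, rows = 63472, 61105      # from "Size is 63472, 61105" in the gdalinfo output
cells = cols * rows            # true number of cells in the raster
wrapped = as_int32(cells)      # value after 32-bit overflow

print(cells)                   # 3878456560
print(wrapped)                 # -416510736
print(cells > 2**31 - 1)       # True: exceeds the largest 32-bit signed value
```

With Type=Byte, each cell takes one byte, so the uncompressed raster is roughly 3.9 GB regardless of how small the compressed file is on disk.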

The solution is to try GDALRasterSource and/or to re-tile the TIFF with gdal_translate so that it is tiled: gdal_translate in.tif out.tif -co BIGTIFF=YES -co TILED=YES -co COMPRESS=LZW

An example of reading TIFFs via the RasterSource API and building a pyramid: https://github.com/pomadchin/vlm-performance/blob/feature/gt-3.x/src/main/scala/geotrellis/contrib/performance/IngestRasterSource.scala#L52-L72

pomadchin added the bug label Mar 22, 2023
Author

KiktMa commented Apr 21, 2023

@pomadchin Hello, I'm sorry to bother you again, but I have another question. I have already stored the pyramid in Accumulo, but I cannot understand the structure of the table:

\x00\x00\x00\x00\x00\x1E"\xCB layer_slope:11: []    x\x9C\xED\xD1\xB1\x0D\xC2@\x00\x04A\xCB\x11\x01\xE4H$TBK\xDF\x82k\xA0\x18\xDA{L\x11\xE8\x83\x9D\xB9\x06N\xDA\xFD}\xFF\xDC\xAE\xC7\xE5\xDC\xF1\xDC\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xFE\xE01\xC6X\xFD\x81u^s\xCE\xD5\x1FXG\xFF6\

This is part of the table. I know that \x00\x00\x00\x00\x00\x1E"\xCB is the row ID, but I am not familiar with this encoding. How should I parse the encoded value? There is also a timestamp in the table. I asked ChatGPT, but it gave a different answer, and now I am very confused.
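
As a starting point for inspecting these bytes, here is a minimal Python sketch. It rests on assumptions this thread does not confirm: that the 8-byte row ID is a big-endian integer (presumably the tile's space-filling-curve index), and that "layer_slope:11" is the layer name plus zoom level. The value begins with the bytes 0x78 0x9C, the usual header of a zlib stream at the default compression level, so the payload is probably compressed. The format of the decompressed payload is not shown here. The timestamp is not specific to GeoTrellis: every Accumulo key carries one.

```python
import struct

# Row ID exactly as printed by the Accumulo shell (the '"' is the byte 0x22).
row_id = b'\x00\x00\x00\x00\x00\x1E"\xCB'

# Assumption: the row ID is one 8-byte big-endian signed integer.
(index,) = struct.unpack(">q", row_id)
print(index)            # 1974987

# First bytes of the value shown above ('x' is the byte 0x78).
value_prefix = b"x\x9C\xED\xD1"

# 0x78 0x9C is the common zlib header for default compression.
looks_like_zlib = value_prefix[:2] == b"\x78\x9c"
print(looks_like_zlib)  # True
```

Since the pasted value is truncated, it cannot be decompressed from this excerpt; reading the layer back through GeoTrellis itself is likely easier than decoding the bytes by hand.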
