Cannot read the Tiff file by GDALRasterSource. Unable to construct dataset dimensions. GDAL Error Code: 4 #3465

Open
qw845602 opened this issue May 21, 2022 · 21 comments
Labels
bug question Further information is requested

Comments

@qw845602

qw845602 commented May 21, 2022

Describe the bug

Cannot read the Tiff file by GDALRasterSource. Unable to construct dataset dimensions. GDAL Error Code: 4

To Reproduce

Provide as able:

  • Steps to reproduce the behavior
  • Code example
package com.example.gdalread
import cats.syntax.option._

import geotrellis.layer.{FloatingLayoutScheme, KeyExtractor, LayoutLevel, SpatialKey}
import geotrellis.proj4.LatLng
import geotrellis.raster.RasterSource
import geotrellis.raster.gdal.GDALRasterSource
import geotrellis.raster.resample.{Bilinear, PointResampleMethod}
import geotrellis.spark.{MultibandTileLayerRDD, RasterSourceRDD, RasterSummary}
import org.apache.log4j.{Level, Logger}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import geotrellis.raster.gdal._
import com.azavea.gdal._

object gdal_read_test {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.ERROR)
    System.load("/root/anaconda3/envs/gdal-3.1.2/lib/libgdal.so.27")
    System.getProperty("java.library.path")
    GDALWarp.init(100)
    print("enter country_pop_sgdal_read_test  tatus_____________21.12________________")
    var startTime = System.currentTimeMillis();
    implicit val conf =
      new SparkConf()
        .setAppName("gdal_read_test")
        .setMaster("spark://master:7077")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryo.registrator", "geotrellis.spark.store.kryo.KryoRegistrator")
        .set("spark.executor.cores", "6")
        .set("spark.executor.memory", "4g")
        .set("spark.driver.memory", "2g")
        .set("spark.num.executors", "3")
        .set("spark.cores.max", "20")
        .set("spark.executorEnv.LD_LIBRARY_PATH", "/root/anaconda3/envs/gdal-3.1.2/lib/:/usr/local/lib")
        .set("spark.dynamicAllocation.enabled","false")
        .set("spark.default.parallelism","600")
        .set("spark.repartitioning","true")
        .set("spark.sql.shuffle.partitions","600")

    implicit val sc = new SparkContext(conf)
    val Path="/geo/file/raster/RS/Landsat/L71149033_03320030531_B10.TIF"
    println("path",Path)
    val targetCRS = LatLng
    println("targetCRS",targetCRS)
    val method: PointResampleMethod = Bilinear
    val tilesize= 256 // 256
    val layoutScheme = FloatingLayoutScheme(tilesize)
    val raster_source_single=GDALRasterSource(Path)
    val raster_source=Seq(raster_source_single)
    val sourceRDD: RDD[RasterSource] =sc.parallelize(raster_source)

    val summary = RasterSummary.fromRDD(sourceRDD)
    val LayoutLevel(zoom, layout) = summary.levelFor(layoutScheme)
    val context_rdd: MultibandTileLayerRDD[SpatialKey] = RasterSourceRDD.tiledLayerRDD(sourceRDD, layout, KeyExtractor.spatialKeyExtractor, rasterSummary = summary.some)
    val sum_resu_rdd: RDD[Int] =context_rdd.map{ single_rdd=>
      single_rdd._2.band(0).toArray().sum

    }
    val sum_resu=sum_resu_rdd.collect()
    println("result: ",sum_resu)

    val endTime = System.currentTimeMillis
    println("total time is ",(endTime - startTime) / 1000,"s")
    sc.stop()
  }
}
  • Inputs
  • Actual output
  • An error is encountered when reading any TIFF
  • Expected output
  • Read the TIFF and output the sum of the values in each tile


Environment

CentOS Linux release 7.9.2009 (Core)

  • Java version:

  • java version "11.0.12" 2021-07-20 LTS
    Java(TM) SE Runtime Environment 18.9 (build 11.0.12+8-LTS-237)
    Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.12+8-LTS-237, mixed mode)

  • Scala version:

  • 2.12.8

  • GeoTrellis version:

  • 3.5.2

Additional context

bugreport.zip
bugreport2.zip
bugreport3.zip
bugreport4.zip

@pomadchin added the question (Further information is requested) label May 21, 2022
@pomadchin
Member

pomadchin commented May 21, 2022

Hey @qw845602, could you minimize the example? E.g.:

GDALRasterSource("path/to/tiff").rasterExtent

The other thing is that GDAL Error Code: 4 can mean many things. Could you post a stack trace here as well? It usually shows, further down, which function caused the problem.

Also, for context from Gitter: there is a good chance that GDAL is improperly installed or java.library.path is improperly set, and these may all be connected.

@qw845602
Author

What is a stack trace? Does it just indicate which function caused the problem, or where the error occurred?

@pomadchin
Member

The stack trace is the full error output, including the chain of function calls that led to it; you already sent it on Gitter.
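
For readers unfamiliar with the term, here is a minimal Scala sketch of capturing a stack trace as text so it can be pasted into an issue. `StackTraces`, `render`, and `attempt` are hypothetical names introduced for illustration; only the JDK and the Scala standard library are used, nothing from GeoTrellis.

```scala
import java.io.{PrintWriter, StringWriter}
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of GeoTrellis.
object StackTraces {

  /** Renders a Throwable, including its "Caused by:" chain, as plain text. */
  def render(error: Throwable): String = {
    val buffer = new StringWriter()
    val writer = new PrintWriter(buffer)
    error.printStackTrace(writer)
    writer.flush()
    buffer.toString
  }

  /** Runs `body`; returns Right(result), or Left(full stack trace) if it throws a non-fatal error. */
  def attempt[A](body: => A): Either[String, A] =
    Try(body) match {
      case Success(value) => Right(value)
      case Failure(error) => Left(render(error))
    }
}
```

Wrapping the single failing read in `StackTraces.attempt { ... }` and printing the `Left` value would give the kind of output requested here, without the rest of the Spark job getting in the way.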

@pomadchin
Member

Ok, here it is:

Caused by: geotrellis.raster.gdal.MalformedDataException: Unable to construct dataset dimensions. GDAL Error Code: 4
    at geotrellis.raster.gdal.GDALDataset$.$anonfun$dimensions$1(GDALDataset.scala:160)
    at geotrellis.raster.gdal.GDALDataset$.$anonfun$dimensions$1$adapted(GDALDataset.scala:157)
    at geotrellis.raster.gdal.GDALDataset$.errorHandler$extension(GDALDataset.scala:406)
    at geotrellis.raster.gdal.GDALDataset$.dimensions$extension1(GDALDataset.scala:157)
    at geotrellis.raster.gdal.GDALDataset$.rasterExtent$extension1(GDALDataset.scala:197)
    at geotrellis.raster.gdal.GDALRasterSource.gridExtent$lzycompute(GDALRasterSource.scala:93)
    at geotrellis.raster.gdal.GDALRasterSource.gridExtent(GDALRasterSource.scala:93)
    at geotrellis.raster.RasterMetadata.extent(RasterMetadata.scala:52)
    at geotrellis.raster.RasterMetadata.extent$(RasterMetadata.scala:52)
    at geotrellis.raster.RasterSource.extent(RasterSource.scala:43)
    at geotrellis.spark.RasterSummary$.$anonfun$collect$1(RasterSummary.scala:108)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:194)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.lang.Thread.run(Thread.java:834)

@qw845602
Author

The error was uploaded in bugreport2. I didn't find a GDALRasterSource("path/to/tiff").rasterExtent function, so I used GDALRasterSource("path/to/tiff").dimensions instead. The error is "[1 of 1000] FAILURE(3) CPLE_OpenFailed(4) "Open failed." /geo/file/raster/RS/Landsat/L71149033_03320030531_B10.TIF no such file or directory". It seems that GDAL could not find the TIFF. It is worth mentioning that the TIFF file is stored in HDFS.

@pomadchin
Member

pomadchin commented May 21, 2022

@qw845602 well, that's a different issue: you'd need a GDAL build with HDFS support, and I don't believe it's enabled by default.

Also, even if this approach works, it may add overhead from the extra JVM that GDAL will create to establish the HDFS connection.

@qw845602
Author

Can GDALRasterSource read a TIFF directly from local disk in a Spark cluster? Does the TIFF need to be placed on each node at the same location? I remember that HadoopGeotiffRDD could not read a TIFF file from local disk in Spark cluster mode.

@pomadchin
Member

@qw845602 yes, both HadoopGeotiffRDD and GDALRasterSource can read files directly from cluster-local disks; in that case you'd need copies of the files on every node.

We've never encountered these issues since we were relying mostly on S3 storage, and GDAL supports S3 reads by default.
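
Since the "Open failed" error above came down to a path that did not exist where GDAL looked for it, a quick local-filesystem check can rule that out before any raster code runs. This is a minimal sketch using only the JDK; `LocalFileCheck` and `isReadableLocalFile` are hypothetical names, and the check only says something about the machine it runs on.

```scala
import java.nio.file.{Files, Paths}

// Hypothetical helper, not part of GeoTrellis or GDAL.
object LocalFileCheck {

  /** True when `path` names a regular, readable file on this machine's local filesystem. */
  def isReadableLocalFile(path: String): Boolean = {
    val candidate = Paths.get(path)
    Files.isRegularFile(candidate) && Files.isReadable(candidate)
  }
}
```

On a cluster the check would have to run on every worker, not only on the driver, for example from inside a Spark task; a file present on the driver can still be missing on the executors.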

@qw845602
Author

Yes, I have tried reading from the cluster-local disks; the code and error are in bugreport3. The error is "java.lang.IllegalArgumentException: requirement failed: x-aligned: offset by CellSize". The same error also occurs when the code is run in local mode, as I mentioned earlier in the Gitter thread.

@pomadchin
Member

@qw845602 could you post minimized code that reproduces requirement failed: x-aligned: offset by CellSize? I believe this is related to tiling to layout, though, not to reading.

@pomadchin
Member

pomadchin commented May 21, 2022

@qw845602 let me summarize:

  1. There are problems with the GDAL installation cluster-wide
  2. TIFFs are located in HDFS so that's problematic to read, and it definitely explains the GDAL Error 4 that you had
    • Solution to that is to have GDAL with HDFS support installed on all nodes, and to have them configured so GDAL has access to HDFS
  3. GDALRasterSource works fine with local reads, however you experience issues with tiling it to layout
  4. The initial reason why GDALRasterSource is used related to

Is it a correct summary?

@qw845602
Author

1 and 2 are correct. For summary 3, I am not quite sure whether it is related to tiling to layout. I found that a layout scheme needs to be specified in https://github.com/pomadchin/vlm-performance/blob/feature/gt-3.x/src/main/scala/geotrellis/contrib/performance/IngestRasterSource.scala#L52:L59. I only know two types of layout scheme, ZoomedLayoutScheme and FloatingLayoutScheme. Since the TIFF needs to be processed as a pyramid, I chose FloatingLayoutScheme. Are there any other ways to create a "TileLayerRDD[SpatialKey]" using GDALRasterSource? I have read the link in summary 3, but I have not found a solution there. For summary 4, yes, the TIFF file is very large, several hundred GB, but I am not quite sure about the reason. It hits an ArrayIndexOutOfBoundsException when using HadoopGeotiffRdd.

@pomadchin
Member

pomadchin commented May 21, 2022

@qw845602 yea, 3. is exactly about it; 👍

I'm afraid there are no quick or easy solutions to your problem: either figure out the GDAL issues and get really deep into them, or use GDAL to convert the TIFFs into tiled and compressed TIFFs: gdal_translate in.tif out.tif -co TILED=YES -co COMPRESS=LZW

The latter would not hurt to try, at least to check that it can work as expected with your data.

@qw845602
Author

I have translated the TIFF using the command gdal_translate in.tif out.tif -co TILED=YES -co COMPRESS=LZW; however, the error "java.lang.IllegalArgumentException: requirement failed: x-aligned: offset by CellSize" still occurs. It is very strange.

@pomadchin
Member

@qw845602 is that with non-GDAL reads? Try it without GDAL.

@qw845602
Author

qw845602 commented May 21, 2022

I have uploaded the translated TIFF, together with the code and the error, in bugreport4. I have also tried ZoomedLayoutScheme, but it causes the same error, so I don't know how to deal with the layout scheme.

@qw845602
Author

How do I try it without GDAL?

@qw845602
Author

qw845602 commented May 21, 2022

An error occurred while uploading bugreport4; it has now been uploaded successfully. Does that mean I need to translate the TIFF that caused the ArrayIndexOutOfBoundsException and see whether it can be read by HadoopGeoTiffRDD?

@pomadchin
Member

pomadchin commented May 21, 2022

@qw845602 yes, you may try HadoopGeoTiffRDD, but you can also replace GDALRasterSource with RasterSource; it will use a non-GDAL underlying reader.

@pomadchin added the bug label May 22, 2022
@qw845602
Author

qw845602 commented Jun 15, 2022

Yes, it works with the command "gdal_translate in.tif out.tif -co BIGTIFF=YES -co TILED=YES -co COMPRESS=LZW". After translating the TIFF, I can read it as an RDD using the function hadoopGeoTiffRDD.
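
For anyone who wants to drive the same conversion from Scala rather than a shell, here is a minimal sketch that builds and runs the gdal_translate command quoted above through `scala.sys.process`. `TranslateToTiledTiff`, `command`, and `run` are hypothetical names; the sketch assumes `gdal_translate` is on the PATH and uses only the creation options mentioned in this thread.

```scala
import scala.sys.process._

// Hypothetical wrapper around the gdal_translate CLI; not part of GeoTrellis.
object TranslateToTiledTiff {

  /** Builds the argument list: gdal_translate <in> <out> -co BIGTIFF=YES -co TILED=YES -co COMPRESS=LZW. */
  def command(input: String, output: String, bigTiff: Boolean = true): Seq[String] = {
    val creationOptions =
      (if (bigTiff) Seq("BIGTIFF=YES") else Seq.empty[String]) ++ Seq("TILED=YES", "COMPRESS=LZW")
    Seq("gdal_translate", input, output) ++ creationOptions.flatMap(option => Seq("-co", option))
  }

  /** Runs the command and returns the process exit code. */
  def run(input: String, output: String): Int =
    command(input, output).!
}
```

Passing the arguments as a `Seq` rather than one string avoids shell quoting problems with paths that contain spaces. `TranslateToTiledTiff.run("in.tif", "out.tif")` would return 0 when the conversion succeeds.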

@RunBoo

RunBoo commented Oct 27, 2023

@pomadchin Hello, I'm currently working on using the geotrellis-server project to publish a WMTS service. I'm providing a data link as the source: "file:///E:/Geotrellis/Tiles/attributes?layers=tiles&zoom=10&band_count=1". Under this path, I have pre-cut tile data using Geotrellis.

I'm using Scala 2.12.8, GeoTrellis 3.6.1, and GDAL 3.0.4 on Windows.
My stack trace is as follows:
17:36:31.296 [raster-io-0] DEBUG geotrellis.server.ogc.Main - GetCapabilities: /?SERVICE=WMS&REQUEST=GetCapabilities
17:36:31.369 [raster-io-0] ERROR org.http4s.server.service-errors - Error servicing request: GET / from 127.0.0.1
geotrellis.raster.gdal.MalformedDataException: Unable to construct dataset dimensions. GDAL Error Code: 4
at geotrellis.raster.gdal.GDALDataset$.$anonfun$dimensions$1(GDALDataset.scala:160)
at geotrellis.raster.gdal.GDALDataset$.$anonfun$dimensions$1$adapted(GDALDataset.scala:157)
at geotrellis.raster.gdal.GDALDataset$.errorHandler$extension(GDALDataset.scala:422)
at geotrellis.raster.gdal.GDALDataset$.dimensions$extension1(GDALDataset.scala:157)
at geotrellis.raster.gdal.GDALDataset$.rasterExtent$extension1(GDALDataset.scala:197)
at geotrellis.raster.gdal.GDALRasterSource.gridExtent$lzycompute(GDALRasterSource.scala:93)
at geotrellis.raster.gdal.GDALRasterSource.gridExtent(GDALRasterSource.scala:93)
at geotrellis.server.ogc.wms.CapabilitiesView$.$anonfun$modelAsLayer$2(CapabilitiesView.scala:277)
at scala.collection.immutable.List.map(List.scala:293)
at geotrellis.server.ogc.wms.CapabilitiesView$.$anonfun$modelAsLayer$1(CapabilitiesView.scala:265)
at map @ geotrellis.server.ogc.wms.CapabilitiesView$.modelAsLayer(CapabilitiesView.scala:264)
at mapN @ geotrellis.server.ogc.wms.CapabilitiesView$.modelAsLayer(CapabilitiesView.scala:291)
at mapN @ geotrellis.server.ogc.wms.CapabilitiesView$.modelAsLayer(CapabilitiesView.scala:291)
at map @ geotrellis.server.ogc.wms.CapabilitiesView.toXML(CapabilitiesView.scala:111)
at flatMap @ geotrellis.server.ogc.wms.WmsView.$anonfun$responseFor$5(WmsView.scala:142)
at delay @ io.chrisdavenport.log4cats.slf4j.internal.Slf4jLoggerInternal$Slf4jLogger.$anonfun$debug$4(Slf4jLoggerInternal.scala:68)
at delay @ io.chrisdavenport.log4cats.slf4j.internal.Slf4jLoggerInternal$Slf4jLogger.isDebugEnabled(Slf4jLoggerInternal.scala:50)
at ifM$extension @ io.chrisdavenport.log4cats.slf4j.internal.Slf4jLoggerInternal$Slf4jLogger.info(Slf4jLoggerInternal.scala:76)
at >>$extension @ geotrellis.server.ogc.wms.WmsView.responseFor(WmsView.scala:141)
at sequence @ org.http4s.HttpRoutes$.$anonfun$of$2(HttpRoutes.scala:79)
at defer @ org.http4s.HttpRoutes$.$anonfun$of$1(HttpRoutes.scala:79)
at $anonfun$combineK$1 @ org.http4s.syntax.KleisliResponseOps.$anonfun$orNotFound$1(KleisliSyntax.scala:49)
at getOrElse @ org.http4s.syntax.KleisliResponseOps.$anonfun$orNotFound$1(KleisliSyntax.scala:49)
at defer @ org.http4s.server.blaze.Http1ServerStage$$anon$2.run(Http1ServerStage.scala:200)
at flatMap @ org.http4s.server.blaze.Http1ServerStage$$anon$2.run(Http1ServerStage.scala:202)
[1 of 1000] FAILURE(3) CPLE_OpenFailed(4) "Open failed." `/E:/Geotrellis/Tiles/attributes?layers=tiles&zoom=10&band_count=1' does not exist in the file system, and is not recognized as a supported dataset name.

How can I solve this problem? Thank you very much!
