Cannot create calculated column on azure blob table #23

Closed
sztuka-billtech opened this issue May 13, 2024 · 1 comment

Comments

@sztuka-billtech

When trying to create a calculated column on a JSON, gzipped, hive-partitioned table read from Azure Blob Storage, DQOps throws the error below while collecting statistics on that calculated column.

The calculated column uses this query:

dayname(scraped_at::timestamp)

This calculated column works perfectly on an S3 hive-partitioned Parquet table. Manually setting the column data type to STRING or VARCHAR does not help.
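For context, the expression itself is valid DuckDB SQL; a minimal sketch (with a hypothetical literal standing in for the `scraped_at` value) runs fine in a plain DuckDB session:

```sql
-- dayname() on a casted timestamp, as used in the calculated column
SELECT dayname('2024-05-13 09:41:41'::timestamp);  -- 'Monday'
```

So the failure appears to be in how DQOps builds the table options string for the Azure JSON source, not in the expression.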

Error stacktrace:

2024-05-13 09:41:41.120 [pool-5-thread-2] ERROR c.d.c.jobqueue.BaseDqoJobQueueImpl -- Failed to execute a job: com.dqops.execution.statistics.jobs.DqoStatisticsCollectionJobFailedException: Cannot collect statistics on the table *redacted* on the connection azure, the first error: Cannot invoke "com.dqops.metadata.sources.ColumnTypeSnapshotSpec.getColumnType()" because "typeSnapshot" is null
java.lang.NullPointerException: Cannot invoke "com.dqops.metadata.sources.ColumnTypeSnapshotSpec.getColumnType()" because "typeSnapshot" is null
	at com.dqops.metadata.sources.fileformat.TableOptionsFormatter.lambda$formatColumns$1(TableOptionsFormatter.java:97)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.stream.SliceOps$1$1.accept(SliceOps.java:200)
	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
	at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEachOrdered(ReferencePipeline.java:601)
	at com.dqops.metadata.sources.fileformat.TableOptionsFormatter.formatColumns(TableOptionsFormatter.java:95)
	at com.dqops.metadata.sources.fileformat.JsonFileFormatSpec.buildSourceTableOptionsString(JsonFileFormatSpec.java:96)
	at com.dqops.metadata.sources.fileformat.FileFormatSpec.buildTableOptionsString(FileFormatSpec.java:170)
	at com.dqops.execution.sqltemplates.rendering.JinjaTemplateRenderParameters.createFromTrimmedObjects(JinjaTemplateRenderParameters.java:161)
	at com.dqops.execution.sqltemplates.rendering.JinjaSqlTemplateSensorRunner.prepareSensor(JinjaSqlTemplateSensorRunner.java:104)
	at com.dqops.execution.sensors.DataQualitySensorRunnerImpl.prepareSensor(DataQualitySensorRunnerImpl.java:93)
	at com.dqops.execution.statistics.TableStatisticsCollectorsExecutionServiceImpl.prepareSensors(TableStatisticsCollectorsExecutionServiceImpl.java:264)
	at com.dqops.execution.statistics.TableStatisticsCollectorsExecutionServiceImpl.executeCollectorsOnTable(TableStatisticsCollectorsExecutionServiceImpl.java:154)
	at com.dqops.execution.statistics.StatisticsCollectorsExecutionServiceImpl.executeStatisticsCollectorsOnTable(StatisticsCollectorsExecutionServiceImpl.java:171)
	at com.dqops.execution.statistics.jobs.CollectStatisticsOnTableQueueJob.onExecute(CollectStatisticsOnTableQueueJob.java:82)
	... 8 common frames omitted
Wrapped by: com.dqops.execution.statistics.jobs.DqoStatisticsCollectionJobFailedException: Cannot collect statistics on the table *redacted* on the connection azure, the first error: Cannot invoke "com.dqops.metadata.sources.ColumnTypeSnapshotSpec.getColumnType()" because "typeSnapshot" is null
	at com.dqops.execution.statistics.jobs.CollectStatisticsOnTableQueueJob.onExecute(CollectStatisticsOnTableQueueJob.java:99)
	at com.dqops.execution.statistics.jobs.CollectStatisticsOnTableQueueJob.onExecute(CollectStatisticsOnTableQueueJob.java:39)
	at com.dqops.core.jobqueue.DqoQueueJob.execute(DqoQueueJob.java:128)
	... 6 common frames omitted
Wrapped by: com.dqops.core.jobqueue.exceptions.DqoQueueJobExecutionException: com.dqops.execution.statistics.jobs.DqoStatisticsCollectionJobFailedException: Cannot collect statistics on the table *redacted* on the connection azure, the first error: Cannot invoke "com.dqops.metadata.sources.ColumnTypeSnapshotSpec.getColumnType()" because "typeSnapshot" is null
	at com.dqops.core.jobqueue.DqoQueueJob.execute(DqoQueueJob.java:142)
	at com.dqops.core.jobqueue.BaseDqoJobQueueImpl.jobProcessingThreadLoop(BaseDqoJobQueueImpl.java:203)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)

Steps to reproduce:

  1. Create a hive-partitioned, gzipped, newline-delimited JSON Azure Blob data source table where some field contains a timestamp
  2. Create a calculated column using the query mentioned above
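The table layout from step 1 can be sketched as a direct DuckDB query (the `az://` path, container, and column names are hypothetical; reading `az://` paths requires the DuckDB Azure extension):

```sql
-- Hive-partitioned, gzipped, newline-delimited JSON read from Azure Blob Storage,
-- with the calculated column expression applied on top
SELECT dayname(scraped_at::timestamp) AS scraped_day
FROM read_json_auto(
    'az://container/table/*/*.json.gz',
    format = 'newline_delimited',
    compression = 'gzip',
    hive_partitioning = true
);
```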

Bug observed in:

@dqops
Owner

dqops commented May 15, 2024

Problem fixed. DuckDB is case-sensitive, and the schema definition for nested fields must be very strict. The current version on the develop branch no longer tries to align data types and change them to upper case.
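An illustrative sketch of the pitfall the fix addresses (schema and file name are hypothetical): when an explicit schema is passed to DuckDB's `read_json`, field names inside nested STRUCT types must match the JSON keys' casing exactly, so rewriting the schema to upper case can break the match for nested fields.

```sql
-- Explicit schema: the nested field name "scrapedId" must keep its original casing;
-- upper-casing it to SCRAPEDID would no longer match the JSON key
SELECT *
FROM read_json(
    'data.json.gz',
    columns = {scraped_at: 'TIMESTAMP', payload: 'STRUCT("scrapedId" BIGINT, name VARCHAR)'}
);
```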

@dqops dqops closed this as completed May 15, 2024