When trying to create a calculated column on a gzipped, hive-partitioned JSON table read from Azure Blob Storage, DQOps throws the error below when collecting statistics on that calculated column.
The calculated column uses this expression:
dayname(scraped_at::timestamp)
This calculated column works perfectly for an S3 hive-partitioned Parquet table. Manually setting the column data type to STRING or VARCHAR does not seem to help.
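For context, the expression itself is plain DuckDB SQL and evaluates fine in isolation. A minimal sketch over the DuckDB JDBC driver, assuming an in-memory database and a made-up sample timestamp (nothing here is taken from the issue's actual data):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DaynameCheck {
    public static void main(String[] args) throws Exception {
        // In-memory DuckDB database; no files or cloud storage needed for this check.
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
             Statement st = conn.createStatement();
             // Cast a sample timestamp and ask for the weekday name, mirroring
             // the calculated-column expression from the issue.
             ResultSet rs = st.executeQuery(
                 "SELECT dayname(TIMESTAMP '2024-05-13 09:41:41') AS day")) {
            rs.next();
            System.out.println(rs.getString("day")); // prints: Monday
        }
    }
}

So the failure is not in the expression but in how the source table's column schema is assembled before the query runs, as the stack trace below shows.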
Error stacktrace:
2024-05-13 09:41:41.120 [pool-5-thread-2] ERROR c.d.c.jobqueue.BaseDqoJobQueueImpl -- Failed to execute a job: com.dqops.execution.statistics.jobs.DqoStatisticsCollectionJobFailedException: Cannot collect statistics on the table *redacted* on the connection azure, the first error: Cannot invoke "com.dqops.metadata.sources.ColumnTypeSnapshotSpec.getColumnType()" because "typeSnapshot" is null
java.lang.NullPointerException: Cannot invoke "com.dqops.metadata.sources.ColumnTypeSnapshotSpec.getColumnType()" because "typeSnapshot" is null
at com.dqops.metadata.sources.fileformat.TableOptionsFormatter.lambda$formatColumns$1(TableOptionsFormatter.java:97)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.SliceOps$1$1.accept(SliceOps.java:200)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEachOrdered(ReferencePipeline.java:601)
at com.dqops.metadata.sources.fileformat.TableOptionsFormatter.formatColumns(TableOptionsFormatter.java:95)
at com.dqops.metadata.sources.fileformat.JsonFileFormatSpec.buildSourceTableOptionsString(JsonFileFormatSpec.java:96)
at com.dqops.metadata.sources.fileformat.FileFormatSpec.buildTableOptionsString(FileFormatSpec.java:170)
at com.dqops.execution.sqltemplates.rendering.JinjaTemplateRenderParameters.createFromTrimmedObjects(JinjaTemplateRenderParameters.java:161)
at com.dqops.execution.sqltemplates.rendering.JinjaSqlTemplateSensorRunner.prepareSensor(JinjaSqlTemplateSensorRunner.java:104)
at com.dqops.execution.sensors.DataQualitySensorRunnerImpl.prepareSensor(DataQualitySensorRunnerImpl.java:93)
at com.dqops.execution.statistics.TableStatisticsCollectorsExecutionServiceImpl.prepareSensors(TableStatisticsCollectorsExecutionServiceImpl.java:264)
at com.dqops.execution.statistics.TableStatisticsCollectorsExecutionServiceImpl.executeCollectorsOnTable(TableStatisticsCollectorsExecutionServiceImpl.java:154)
at com.dqops.execution.statistics.StatisticsCollectorsExecutionServiceImpl.executeStatisticsCollectorsOnTable(StatisticsCollectorsExecutionServiceImpl.java:171)
at com.dqops.execution.statistics.jobs.CollectStatisticsOnTableQueueJob.onExecute(CollectStatisticsOnTableQueueJob.java:82)
... 8 common frames omitted
Wrapped by: com.dqops.execution.statistics.jobs.DqoStatisticsCollectionJobFailedException: Cannot collect statistics on the table *redacted* on the connection azure, the first error: Cannot invoke "com.dqops.metadata.sources.ColumnTypeSnapshotSpec.getColumnType()" because "typeSnapshot" is null
at com.dqops.execution.statistics.jobs.CollectStatisticsOnTableQueueJob.onExecute(CollectStatisticsOnTableQueueJob.java:99)
at com.dqops.execution.statistics.jobs.CollectStatisticsOnTableQueueJob.onExecute(CollectStatisticsOnTableQueueJob.java:39)
at com.dqops.core.jobqueue.DqoQueueJob.execute(DqoQueueJob.java:128)
... 6 common frames omitted
Wrapped by: com.dqops.core.jobqueue.exceptions.DqoQueueJobExecutionException: com.dqops.execution.statistics.jobs.DqoStatisticsCollectionJobFailedException: Cannot collect statistics on the table *redacted* on the connection azure, the first error: Cannot invoke "com.dqops.metadata.sources.ColumnTypeSnapshotSpec.getColumnType()" because "typeSnapshot" is null
at com.dqops.core.jobqueue.DqoQueueJob.execute(DqoQueueJob.java:142)
at com.dqops.core.jobqueue.BaseDqoJobQueueImpl.jobProcessingThreadLoop(BaseDqoJobQueueImpl.java:203)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
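The trace points at TableOptionsFormatter.formatColumns (TableOptionsFormatter.java:97), which calls getColumnType() on each column's typeSnapshot while building the JSON source-table options; a calculated column has no physical type snapshot attached, so the call dereferences null. A self-contained sketch of what the failing loop reduces to and the kind of null guard that avoids the NPE; the record, method shape, and output format are illustrative assumptions, not the actual DQOps code:

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class ColumnsClauseSketch {
    // Hypothetical stand-in for DQOps' ColumnTypeSnapshotSpec: just a type name.
    record TypeSnapshot(String columnType) {}

    // Build the columns={...} fragment of a DuckDB read_json() call, skipping
    // columns whose type snapshot is missing (e.g. calculated columns) instead
    // of dereferencing null as in the reported stack trace.
    static String formatColumns(Map<String, TypeSnapshot> columns) {
        return columns.entrySet().stream()
            .filter(e -> e.getValue() != null && e.getValue().columnType() != null)
            .map(e -> "'" + e.getKey() + "': '" + e.getValue().columnType() + "'")
            .collect(Collectors.joining(", ", "columns = {", "}"));
    }

    public static void main(String[] args) {
        Map<String, TypeSnapshot> columns = new LinkedHashMap<>();
        columns.put("scraped_at", new TypeSnapshot("VARCHAR"));
        columns.put("day_of_week", null); // calculated column: no physical type
        System.out.println(formatColumns(columns));
        // prints: columns = {'scraped_at': 'VARCHAR'}
    }
}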
Steps to reproduce:
1. Create a hive-partitioned, gzipped, newline-delimited JSON table on an Azure Blob Storage data source, where some field contains a timestamp.
2. Create a calculated column using the expression mentioned above.
3. Collect statistics on the calculated column; a repro sketch of the underlying read follows this list.
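What DQOps effectively asks DuckDB to do can be approximated directly over JDBC. A sketch assuming DuckDB's azure extension; the az:// URL, container name, partition layout, and authentication setup are placeholders, not taken from the issue:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class AzureJsonRepro {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
             Statement st = conn.createStatement()) {
            // Load DuckDB's azure extension; credential configuration is omitted.
            st.execute("INSTALL azure; LOAD azure;");
            // Read the gzipped, newline-delimited, hive-partitioned JSON files
            // and evaluate the calculated-column expression on top of them.
            try (ResultSet rs = st.executeQuery(
                    "SELECT dayname(scraped_at::timestamp) AS day_of_week " +
                    "FROM read_json_auto('az://container/table/*/*.json.gz', " +
                    "  format = 'newline_delimited', compression = 'gzip', " +
                    "  hive_partitioning = true)")) {
                while (rs.next()) {
                    System.out.println(rs.getString("day_of_week"));
                }
            }
        }
    }
}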
Problem fixed. DuckDB is case-sensitive, and the schema definition for nested fields must be very strict. The current version on the develop branch no longer tries to align data types by changing them to upper case.
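For illustration, the case sensitivity the fix works around shows up when an explicit nested schema is passed to DuckDB's JSON reader. A sketch with a made-up one-record file, not the issue's actual schema:

import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class NestedSchemaCase {
    public static void main(String[] args) throws Exception {
        // Write a one-record newline-delimited JSON file with a nested field.
        Path file = Files.createTempFile("nested", ".json");
        Files.writeString(file, "{\"meta\": {\"scraped_at\": \"2024-05-13 09:41:41\"}}\n");
        String path = file.toString().replace('\\', '/');

        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
             Statement st = conn.createStatement()) {
            // Declaring the nested schema with the field's original casing maps it
            // correctly; an upper-cased 'SCRAPED_AT' would name a different field
            // than the lower-case JSON key, which is reportedly what upper-casing
            // the generated schema ran into.
            try (ResultSet rs = st.executeQuery(
                    "SELECT meta.scraped_at AS ts FROM read_json('" + path + "', " +
                    "columns = {'meta': 'STRUCT(scraped_at VARCHAR)'}, " +
                    "format = 'newline_delimited')")) {
                rs.next();
                System.out.println(rs.getString("ts")); // 2024-05-13 09:41:41
            }
        }
    }
}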