Welcome

This is the Cascading.Hive module.

It provides a Cascading Tap/Scheme for HCatalog and Schemes for the Hive native file formats (RCFile and ORC).
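For example, one of these Schemes can be plugged into a standard Cascading Hfs tap and used like any other source. The sketch below is a minimal, hypothetical flow that copies an RCFile-backed table to tab-delimited text; the paths, the column spec, and the RCFile import package are assumptions, and the RCFile constructor follows the two-argument form shown under "Projection pushdown" below.

import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.hive.RCFile; // package assumed; adjust to where this module provides RCFile
import cascading.pipe.Pipe;
import cascading.scheme.Scheme;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;

public class RcFileToTsv {
  public static void main(String[] args) {
    // Column spec mirrors the Hive table layout; "0,1,2,3" selects all four columns.
    Scheme rcScheme = new RCFile("col1 int, col2 string, col3 string, col4 long", "0,1,2,3");

    // Standard Cascading HDFS taps; both paths are placeholders.
    Tap source = new Hfs(rcScheme, "/user/hive/warehouse/my_table");
    Tap sink = new Hfs(new TextDelimited(true, "\t"), "/tmp/my_table_tsv", SinkMode.REPLACE);

    // A pass-through pipe copies every tuple from the RCFile source to the text sink.
    Pipe copy = new Pipe("rcfile-to-tsv");

    Flow flow = new HadoopFlowConnector(new Properties()).connect(source, sink, copy);
    flow.complete();
  }
}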

Notes

Maven dependency

<dependency>
  <groupId>com.squareup.cascading-hive</groupId>
  <artifactId>cascading-hive</artifactId>
  <version>0.0.2</version>
  <scope>compile</scope>
</dependency> 

Hive version

Currently, this module only works with Apache Hive 0.12.x. If you want to use it with other versions of Hive, you need to patch a few classes.

Projection pushdown

Both RCFile and ORC support projection pushdown, which reduces read I/O when only a subset of the fields is needed.

You can enable this either by creating the scheme with an additional argument that indicates the selected columns, e.g.

//only col1 and col4 will be read
Scheme rcScheme = new RCFile("col1 int, col2 string, col3 string, col4 long", "0,3");

Scheme orcScheme = new ORCFile("col1 int, col2 string, col3 string, col4 long", "0,3");

or by setting the Hive-specific properties on your flow:

hive.io.file.read.all.columns=false
hive.io.file.readcolumn.ids=0,3
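When running on Cascading's Hadoop planner, these properties can be passed to the flow connector as regular job properties. A minimal sketch, assuming the flow is otherwise built as usual:

import java.util.Properties;

import cascading.flow.FlowConnector;
import cascading.flow.hadoop.HadoopFlowConnector;

public class ProjectionPushdown {
  // Returns a connector whose jobs read only columns 0 and 3 (col1 and col4).
  public static FlowConnector projectedConnector() {
    Properties properties = new Properties();
    properties.setProperty("hive.io.file.read.all.columns", "false");
    properties.setProperty("hive.io.file.readcolumn.ids", "0,3");
    return new HadoopFlowConnector(properties);
  }
}

Either approach selects the same columns; the scheme-level argument keeps the selection next to the scheme definition, while the property approach applies job-wide to the flow it configures.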

HCatalog usage

To talk to your production HCatalog, you have to include the real hive-site.xml in your artifact. Once you have built a fat jar, you need to add the DataNucleus libraries to the CLASSPATH, because they are excluded from this artifact:

hadoop jar $your_fat_jar -libjars $HIVE_HOME/lib/datanucleus-core-3.2.2.jar,$HIVE_HOME/lib/datanucleus-rdbms-3.2.1.jar,$HIVE_HOME/lib/datanucleus-api-jdo-3.2.2.jar $your_options

Scalding usage

To use RCFile/ORC with Scalding, check out ColumnarSerDeSource.scala. It requires Scalding 0.9.1.
