[BUG] materialize_features fails with some combinations of features #920

Open
2 of 4 tasks
loomlike opened this issue Dec 13, 2022 · 0 comments
Labels: bug (Something isn't working), good first issue (Good for newcomers)

Comments

@loomlike
Collaborator

loomlike commented Dec 13, 2022

Willingness to contribute

No. I cannot contribute a bug fix at this time.

Feathr version

0.9.0

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 20.0): Ubuntu 20.0
  • Python version: 3.10
  • Spark version, if reporting runtime issue: 3.2.x and 3.3.1

Describe the problem

The materialization job fails for some combinations of features, throwing the following error:

Caused by: java.lang.NullPointerException
        at com.linkedin.feathr.common.types.protobuf.FeatureValueOuterClass$FeatureValue$Builder.setStringValue(FeatureValueOuterClass.java:1728)
        at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.$anonfun$getConversionFunction$4(RedisOutputUtils.scala:110)
        at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.$anonfun$encodeDataFrame$2(RedisOutputUtils.scala:51)
        at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.$anonfun$encodeDataFrame$2$adapted(RedisOutputUtils.scala:48)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.immutable.Range.foreach(Range.scala:158)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.$anonfun$encodeDataFrame$1(RedisOutputUtils.scala:48)
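The trace ends in `FeatureValue$Builder.setStringValue`, and protobuf-java's generated setters throw `NullPointerException` when passed `null`. One plausible but unconfirmed reading is that combining the non-aggregated string feature with the window-aggregated one yields rows where `account_country` is null, which the Redis encoder in `RedisOutputUtils` then hands to the setter. The Python sketch below only simulates that hypothesis: the outer-join step, the sample keys and values, and all function names are illustrative assumptions, not Feathr's actual Scala code. It also does not explain why other pairs, such as `account_country` with `num_transaction_count_in_day`, succeed.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative per-key feature outputs (hypothetical data, not taken from the issue).
ACCOUNT_COUNTRY: Dict[str, str] = {"a1": "US", "a2": "KR"}
AVG_TRANSACTION_AMOUNT: Dict[str, float] = {"a1": 12.5, "a3": 40.0}

Row = Tuple[str, Optional[str], Optional[float]]


def full_outer_join(left: Dict[str, str], right: Dict[str, float]) -> List[Row]:
    """Assumed join step: keys missing on one side get None, like a SQL full outer join."""
    keys = sorted(set(left) | set(right))
    return [(key, left.get(key), right.get(key)) for key in keys]


def set_string_value(value: Optional[str]) -> str:
    """Stand-in for a protobuf-java string setter, which rejects null.

    Python has no NullPointerException, so TypeError plays that role here.
    """
    if value is None:
        raise TypeError("null passed to setStringValue")
    return value


def encode_row_strict(row: Row) -> dict:
    """Mirrors the suspected failure: every column is passed to its setter unconditionally."""
    key, country, amount = row
    return {
        "key": key,
        "account_country": set_string_value(country),
        "avg_transaction_amount": amount,
    }


def encode_row_null_safe(row: Row) -> dict:
    """Possible mitigation: omit a feature from the encoded row when its value is null."""
    key, country, amount = row
    encoded: dict = {"key": key}
    if country is not None:
        encoded["account_country"] = set_string_value(country)
    if amount is not None:
        encoded["avg_transaction_amount"] = amount
    return encoded


if __name__ == "__main__":
    for row in full_outer_join(ACCOUNT_COUNTRY, AVG_TRANSACTION_AMOUNT):
        print(encode_row_null_safe(row))
        try:
            encode_row_strict(row)
        except TypeError as err:
            print(f"strict encoding failed for {row[0]}: {err}")
```

Only key `a3`, which has an average amount but no country, fails the strict encoder; the null-safe variant simply leaves the field out. Whether nulls should be skipped or replaced with a default value is a design choice for the fix.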

Tracking information

No response

Code to reproduce bug

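from feathr import (
    FLOAT,
    STRING,
    Feature,
    MaterializationSettings,
    RedisSink,
    WindowAggTransformation,
)

# Not shown in this report: `client` (a FeathrClient), `account_id` (a TypedKey),
# `backfill_time` and ACCOUNT_FEATURE_TABLE_NAME.

# anchored (non-aggregated) string feature
account_country = Feature(
    name="account_country",
    key=account_id,
    feature_type=STRING,
    transform="accountCountry",
)
...

# average amount of transaction in that week
avg_transaction_amount = Feature(
    name="avg_transaction_amount",
    key=account_id,
    feature_type=FLOAT,
    transform=WindowAggTransformation(
        agg_expr="cast_float(transactionAmount)", agg_func="AVG", window="7d"
    ),
)
...

# materialize both features into the same Redis table in one job
client.materialize_features(
    MaterializationSettings(
        ACCOUNT_FEATURE_TABLE_NAME,
        backfill_time=backfill_time,
        sinks=[RedisSink(table_name=ACCOUNT_FEATURE_TABLE_NAME)],
        feature_names=["account_country", "avg_transaction_amount"],
    ),
    # needed because account_country has no WindowAggTransformation
    allow_materialize_non_agg_feature=True,
)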

Materializing feature_names=["account_country"] alone, feature_names=["avg_transaction_amount"] alone, or other combinations such as ["account_country", "num_transaction_count_in_day"] works without errors.

Only the combination ["account_country", "avg_transaction_amount"] fails.

What component(s) does this bug affect?

  • Python Client: This is the client users use to interact with most of our API. Mostly written in Python.
  • Computation Engine: The computation engine that executes the actual feature join and generation work. Mostly in Scala and Spark.
  • Feature Registry API: The frontend API layer supports SQL and Purview (Atlas) as storage. The API layer is in Python (FastAPI).
  • Feature Registry Web UI: The Web UI for feature registry. Written in React