MongoDB's Decimal128 seems to be returned as fixed_size_binary[16] #203

Open
K-to-the-D opened this issue Mar 13, 2024 · 3 comments

K-to-the-D commented Mar 13, 2024

Hi,

When I use pymongoarrow.api.aggregate_arrow_all(), Decimal128 values seem to be returned as FixedSizeBinary once context.finish() is called.
Looking at the code, my assumption is that this stems from lib.pyx, where return pyarrow_wrap_array(out).cast(Decimal128Type_()) on line 784 does not cast the fixed_size_binary back to Decimal128.

pymongo==4.6.2
pymongoarrow==1.3.0
pyarrow==15.0.1

@blink1073
Member

Hi @K-to-the-D, can you please share some example code?

I set a debug point in this test and the resulting data types were:

pyarrow.Table
Int64: int32
float: double
int: int32
datetime: timestamp[ms]
ObjectId: extension<pymongoarrow.objectid<ObjectIdType>>
Decimal128: extension<pymongoarrow.decimal128<Decimal128Type>>
str: string
bool: bool
Binary: extension<pymongoarrow.binary<BinaryType>>
Code: extension<pymongoarrow.code<CodeType>>

K-to-the-D (Author) commented Mar 22, 2024

Hi @blink1073,

thanks for the response. You are right, it works for top-level Decimal128.
Unfortunately, I have to deal with Objects that contain nested Decimal128 fields.

Example code:

from pymongo import MongoClient
from bson.decimal128 import Decimal128
from pymongoarrow.api import aggregate_arrow_all

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["my_dummy_database"]
collection = db["my_dummy_collection"]

# Insert object with Decimal128
collection.insert_one(
    {
        "name": "Product",
        "price": {
            "net": Decimal128("29.99"),
            "gross": Decimal128("35.99"),
        },
    }
)

pipeline = [
    {"$match": {"price.gross": {"$lt": Decimal128("50.00")}}},
]

# Execute aggregation and retrieve PyArrow Table
arrow_table = aggregate_arrow_all(collection, pipeline)

# Display the result and type
print(f"types:\t{arrow_table['price'].type}")
print(f"values:\t{arrow_table['price'][0]}")

Output:

types:  struct<net: fixed_size_binary[16], gross: fixed_size_binary[16]>
values: [('net', b'\xb7\x0b\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00<0'), ('gross', b'\x0f\x0e\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00<0')]

@blink1073
Member

Ah, understood, this will be fixed by https://jira.mongodb.org/browse/ARROW-179.
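
As a possible stopgap for nested fields, the 16 raw bytes shown above can be decoded by hand. The sketch below assumes the bytes are the little-endian BID (Binary Integer Decimal) encoding that BSON uses for Decimal128, which matches the sample values in this thread. The helper name bid128_to_decimal is hypothetical and not part of pymongoarrow. It only handles finite, canonically encoded values and raises on NaN and Infinity.

```python
from decimal import Decimal

# Exponent bias of the IEEE 754-2008 decimal128 format.
_EXPONENT_BIAS = 6176


def bid128_to_decimal(raw: bytes) -> Decimal:
    """Decode 16 little-endian BID bytes into a decimal.Decimal.

    Hypothetical helper (not a pymongoarrow API). Handles only finite,
    canonically encoded values.
    """
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")

    bits = int.from_bytes(raw, "little")
    sign = bits >> 127

    # Bits 126-125 both set marks NaN, Infinity, or the non-canonical
    # large-coefficient form; this sketch does not decode those.
    if (bits >> 125) & 0b11 == 0b11:
        raise ValueError("special or non-canonical encoding not supported")

    exponent = ((bits >> 113) & 0x3FFF) - _EXPONENT_BIAS
    coefficient = bits & ((1 << 113) - 1)
    digits = tuple(int(d) for d in str(coefficient))
    return Decimal((sign, digits, exponent))


# Raw values copied from the output reported in this issue.
net_raw = b"\xb7\x0b" + b"\x00" * 12 + b"<0"
gross_raw = b"\x0f\x0e" + b"\x00" * 12 + b"<0"

print(bid128_to_decimal(net_raw))    # 29.99
print(bid128_to_decimal(gross_raw))  # 35.99
```

Applied to the table from the example, something like bid128_to_decimal(arrow_table["price"][0].as_py()["net"]) should give the net price. If a bson Decimal128 is preferred over decimal.Decimal, the Decimal128.from_bid classmethod in bson.decimal128 takes the same 16 bytes.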

blink1073 added the duplicate label Mar 22, 2024