Stream support for exporting pdbs #108
Comments
Hey @djberenberg I've actually done this already. Code to follow once I find it :) I agree this would be a nice feature for biopandas
Here you go:

def to_pdb_stream(df: pd.DataFrame) -> StringIO:
    """Writes a PDB dataframe to a stream.

    :param df: PDB dataframe
    :type df: pandas.DataFrame
    :return: StringIO Buffer
    :rtype: StringIO
    """
    df = df.copy().drop(columns=["model_id"])
    df.residue_number = df.residue_number.astype(int)
    records = [r.strip() for r in list(set(df.record_name))]
    dfs = {r: df.loc[df.record_name == r] for r in records}
    for r in dfs:
        for col in pdb_records[r]:
            dfs[r][col["id"]] = dfs[r][col["id"]].apply(col["strf"])
            dfs[r]["OUT"] = pd.Series("", index=dfs[r].index)
        for c in dfs[r].columns:
            # fix issue where coordinates with four or more digits would
            # cause issues because the columns become too wide
            if c in {"x_coord", "y_coord", "z_coord"}:
                for idx in range(dfs[r][c].values.shape[0]):
                    if len(dfs[r][c].values[idx]) > 8:
                        dfs[r][c].values[idx] = str(
                            dfs[r][c].values[idx]).strip()
            if c not in {"line_idx", "OUT"}:
                dfs[r]["OUT"] = dfs[r]["OUT"] + dfs[r][c]
    df = pd.concat(dfs, sort=False)
    df.sort_values(by="line_idx", inplace=True)
    output = StringIO()
    s = df["OUT"].tolist()
    for idx in range(len(s)):
        if len(s[idx]) < 80:
            s[idx] = f"{s[idx]}{' ' * (80 - len(s[idx]))}"
    to_write = "\n".join(s)
    output.write(to_write)
    output.write("\n")
    return output
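One thing to watch when consuming the returned buffer: after the writes, the stream position sits at the end, so a plain read() returns an empty string until the stream is rewound. The sketch below isolates just the final padding-and-writing step using only the standard library; lines_to_stream is a hypothetical stand-in for illustration, not part of biopandas.

```python
from io import StringIO
from typing import List


def lines_to_stream(lines: List[str]) -> StringIO:
    """Hypothetical helper: pad each line to 80 columns and write it to a buffer."""
    output = StringIO()
    # ljust(80) pads short lines with spaces and leaves longer lines unchanged,
    # which matches the explicit length check in the function above.
    output.write("\n".join(line.ljust(80) for line in lines))
    output.write("\n")
    return output


stream = lines_to_stream(["ATOM      1  N   MET A   1", "END"])

text_at_end = stream.read()  # position is at the end of the buffer here
stream.seek(0)               # rewind before handing the stream to a reader
text = stream.read()
```

Calling stream.getvalue() would also return the full contents regardless of the current position.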
Thank you @a-r-j !!!
Wow, thanks @a-r-j . If this works for you @djberenberg , it'd be great to add this to biopandas as a PR :)
Sure @rasbt , I'll add it to the open PR once I've got a moment.
Describe the workflow you want to enable
I'd like to be able to export a pdb to a stream instead of to disk. In particular, I'd like to pass the stream directly to wandb.Molecule.
Describe your proposed solution
The PandasPdb.to_pdb method could accept a path_or_stream: typing.Union[io.StringIO, str] argument instead of just a path: str argument. Internally, if path_or_stream happens to be an io.StringIO object, we don't need an openf function and instead can just execute the internal loops seen here, where f is now the io.StringIO object.
Making this change would enable filling the stream in place with the pdb text.
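A minimal sketch of the dispatch described above, assuming the pdb text is already available as a list of formatted lines. The name write_records and its signature are hypothetical and only illustrate the path_or_stream idea; this is not the actual PandasPdb.to_pdb implementation.

```python
import io
from typing import Iterable, Union


def write_records(
    lines: Iterable[str], path_or_stream: Union[io.StringIO, str]
) -> None:
    """Hypothetical helper: write lines to a file path or to an in-memory stream."""

    def _write(f) -> None:
        # The same loop serves both targets; only the handle differs.
        for line in lines:
            f.write(line + "\n")

    if isinstance(path_or_stream, io.StringIO):
        # Fill the caller's stream in place; the caller keeps ownership of it.
        _write(path_or_stream)
    else:
        # Treat anything else as a filesystem path and open it for writing.
        with open(path_or_stream, "w") as f:
            _write(f)


stream = io.StringIO()
write_records(["ATOM      1  N   MET A   1", "END"], stream)
pdb_text = stream.getvalue()
```

Leaving the stream open means the caller decides when to rewind or close it, which fits the use case of passing it straight to another consumer.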
Describe alternatives you've considered, if relevant
Currently I am needlessly writing to disk temporarily, reopening the file, and passing its contents to the wandb.Molecule object.
Additional context