Namespaces and the package system #9

Open
flying-sheep opened this issue Apr 11, 2022 · 7 comments
@flying-sheep

flying-sheep commented Apr 11, 2022

Hi, I did my whole PhD doing mostly R and stopped using it then. I authored a few packages, and am the maintainer of the R kernel for Jupyter.

I think you overlooked the number 1 reason why R probably won’t become a serious programming language:

Its packages have two flat namespaces (pkg:::internal and pkg::exported).

That means that

  1. Writing packages will leave you in a symbol jungle of your own creation. All your package’s files are essentially concatenated and evaluated in the internal namespace (with access to things you imported in the NAMESPACE file).

    This means you’ll never build bigger, more complex packages that aren’t a complete mess. If you want clean, complex functionality, you’ll have to maintain several smaller packages, which is a high burden for individuals. In contrast, Python’s subpackages solve this effortlessly.

  2. Many useful packages are designed to be library()d, not having their items accessed via ::. This means point 1 is very hard to fix, even if there was interest to do so: If you e.g. have pkg1 define a generic gen <- ...; setGeneric('gen') and pkg2 define both Cls <- setClass('Cls') and setMethod('gen', 'Cls'), you can’t just do pkg1::gen(pkg2::Cls(...)), since the method isn’t in a visible namespace. This was S4, but S3 has the same problem (and others).
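
To make the subpackage contrast from point 1 concrete, here is a minimal sketch of hierarchical namespaces in Python. The package name `demo_pkg`, its subpackages and the `load` functions are all made up for illustration; the script writes them to a temporary directory only so that it can run as a single file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: one package, two subpackages, each defining its own `load`.
FILES = {
    "demo_pkg/__init__.py": "",
    "demo_pkg/io/__init__.py": "def load():\n    return 'io.load'\n",
    "demo_pkg/stats/__init__.py": "def load():\n    return 'stats.load'\n",
}


def write_package(root: Path) -> None:
    """Write the illustrative package tree below `root`."""
    for relative_path, source in FILES.items():
        target = root / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)


root = Path(tempfile.mkdtemp())
write_package(root)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

io_module = importlib.import_module("demo_pkg.io")
stats_module = importlib.import_module("demo_pkg.stats")

# The two functions share a name but live in separate namespaces.
print(io_module.load())     # prints "io.load"
print(stats_module.load())  # prints "stats.load"
```

In a single flat internal namespace, the second `load` definition would silently replace the first; here each one stays reachable under its own dotted path.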

I think CRAN’s more vetted package publishing process (compared to PyPI’s), combined with Python’s namespace system, would be ideal. But given the choice, Python is just the better language for building things.

PS: the extremely implicit and underdocumented package building process is also an R problem. Python fixed its own packaging mess by replacing their system. In R, you still have a giant pile of possible files and variables that modify how your package is built, especially if you use Rcpp or so.

@ReeceGoding
Owner

Thanks for this. You could be on to something, but I'm afraid that I have to admit complete and utter ignorance on this topic.

@t-wojciech

> PS: the extremely implicit and underdocumented package building process is also an R problem.

Personally, I think R has the best system for creating packages. There's the usual guide, Writing R Extensions, but have you seen the great book R Packages by Hadley Wickham and Jenny Bryan? This is much easier than in Python. In addition, RStudio has created usethis for automated package development, which makes life even easier.

> This means you’ll never build bigger, more complex packages that aren’t a complete mess.

What about data.table, quanteda, terra? They're pretty big, and the code looks solid.

@flying-sheep
Author

flying-sheep commented Apr 12, 2022

> Personally, I think R has the best system for creating packages.

I don’t want to argue about this. My criticism is specifically referring to all the little screws in different places that you can turn. There are several possible Makevars and Makefiles that you can put in different places and whose presence will affect how the package is built. This is mostly a problem if you “want to do things right” and are building a package with nontrivial compiled code.

But it’s indicative of a greater problem with the system: There’s only the one and it grew over years. It’s not designed from the ground up to make sense. All the usethis stuff is of course great, but the underlying packaging system has decades of legacy.

For Python, they got to build a new system recently, with all the hindsight from their own mess, R, Ruby, node’s npm, Rust’s cargo, … It’s pretty neat since it only specifies some minimal metadata that defines a build backend. The build backend does the actual work, so once a build backend has accumulated too much legacy (such as setuptools), new packages can decide to use another that’s simpler, faster and easier. This system has much less potential to become messy.

> What about data.table, quanteda, terra? They're pretty big, and the code looks solid.

The first file I opened has 3280 lines: https://github.com/Rdatatable/data.table/blob/master/R/data.table.R

The complexity of the average function is pretty high. There’s a lot of branching and early returns in big function bodies. /edit: Jesus Christ, [.data.table is like 1800 lines, wtf.

I don’t think this is a very maintainable code base judged by the standards of languages that can afford breaking things down into smaller, hierarchical units. It might be one by R’s standards, but that distinction is the whole problem.

@eddelbuettel

> that modify how your package is built, especially if you use Rcpp or so.

Nice unsubstantiated conjecture you have there. How R builds things is well documented and understood, and seems to work for the 2500+ CRAN packages using Rcpp. [ This is an utter aside to the document but as I am somewhat involved with Rcpp I couldn't let this stand. ]

@flying-sheep
Author

flying-sheep commented Apr 13, 2022

Dirk, of course you know the ins and outs like few other people. I’m talking from my experience of “just adding a little bit of compiled code that needs to modify compiler flags”. Even after reading (what I thought was) everything on the topic, it wasn’t all too clear which files to modify those flags in and where to put those files.

“Writing R Extensions” is comprehensive and complete, and has examples for individual concepts, but none on how to put it all together. Technical documentation is not all that a beginner needs to get started; there need to be example repositories/folders to understand where things are, there need to be use cases and screenshots, sections like “Things crashed, how to compile with debug symbols”, … Otherwise people will have to read and understand several sections and piece things together to know where exactly a behavior-modifying file is to be put and what to put in it.

“R packages” says the above is “beyond its scope”: https://r-pkgs.org/src.html#make

I’m not blaming anyone or anything. I respect your work a lot. I’m just saying people could benefit from more example packages, and that R’s package system has no room to start new. Be aware that I probably wasn’t the noobiest noob to try what I did. I’m sure that where I prevailed in the end, others gave up or ended up with a suboptimal hack. Don’t handwave my experience away by saying I just wasn’t good enough.

@eddelbuettel

Without a minimally complete verifiable example it is just hearsay amounting to FUD as we have nothing to work with. If you truly want to make things better (given how passionate your writing is), you could help with steps that can actually aid in improving things.

@flying-sheep
Author

flying-sheep commented Apr 13, 2022

You’re funny, how am I to reproduce the frustration of trying to understand something? (Especially when that was like 5 years ago)

I figured it out eventually, after asking people and bashing my head against “Writing R extensions”. Years later I gave up trying to make R work for me.

“Writing R Extensions” still has no screenshots or links to example repos, and “R Packages” doesn’t address compiled code. Given my current involvement, I’ll call my job done by vaguely gesturing in that direction and trusting that people understand that this state of affairs is not nearly as noob friendly as it could be.


4 participants