Non-numbers #5

Open
kwstat opened this issue Mar 25, 2022 · 5 comments
@kwstat
Contributor

kwstat commented Mar 25, 2022

Your article is extremely thorough. You touched on a few of these points, but maybe there's something interesting below. This is not really an "issue", more of a comment.

# Things that are not floating point numbers ---

# In theory:
NaN  # Not A (floating point) Number.  IEEE 754 standard.
NA   # Placeholder for an unknown value. Invented by R. Logical.
NULL # Empty object (like an empty set).  Nothing.
Inf  # Infinity
-Inf # Negative infinity

# In practice:
# Any mental model of NA/NaN will fail you. Dante's Inferno.

length(NA)   # 1  Something there, but we don't know what.
length(NaN)  # 1  Something there, but not representable.
length(NULL) # 0  Nothing there.

sqrt(-1)             # NaN. 'i' in mathematics, not defined in floating point.

# NaN is an NA, but NA is not an NaN
is.nan(NA) # FALSE
is.na(NaN) # TRUE

min(c())               # Inf
min(c(NA), na.rm=TRUE) # Inf
min(NaN)               # NaN

max(c())               # -Inf
max(c(NA), na.rm=TRUE) # -Inf
max(NaN)               # NaN

# https://en.wikipedia.org/wiki/Empty_sum
sum(NA)               # NA
sum(NA, na.rm=TRUE)   # 0    # Horrible

mean(NA)              # NA
mean(NA, na.rm=TRUE)  # NaN

var(NA)               # NA
var(NA, na.rm=TRUE)   # NA

# https://en.wikipedia.org/wiki/Empty_product
prod(NA)              # NA
prod(NA, na.rm=TRUE)  # 1    # Horrible

NA | TRUE   # TRUE
NA & FALSE  # FALSE

# https://en.wikipedia.org/wiki/Division_by_zero
0/0   # NaN
1/0   # Inf.  Shouldn't it be NaN?!

Inf >= NA  # NA.  If NA is placeholder, this should be TRUE!

NA * 0      # NA. Because NA could be Inf, and Inf*0 is NaN. Right???
NA ^ 0      # 1
NaN ^ 0     # 1

NA %in% 1:3 # FALSE
match(NA, 1:3) # NA

matrix(nrow=2,ncol=2)  # matrix initializes with NAs
vector(mode="numeric", length=2) # vector initializes with 0s

# NULL can be assigned to an object.
x <- NULL
x
# NULL assigned to list elements removes them.
x <- list(1,"a",TRUE)
x[[1]] <- NULL
x
# NULL assigned to data.frame columns removes them
x <- data.frame(a=1:2, b=3:4)
x
x$a <- NULL
x

# https://blog.revolutionanalytics.com/2016/07/understanding-na-in-r.html
# https://stats.stackexchange.com/questions/5686/what-is-the-difference-between-nan-and-na
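
A side note on the listing above: several of these calls also emit warnings that the comments do not show. Here is a minimal sketch for checking which ones do. The helper `warns()` is hypothetical (it is not part of the original post or of base R); it just wraps `tryCatch()`.

```r
# Hypothetical helper: TRUE if evaluating `expr` signals a warning.
warns <- function(expr) {
  tryCatch({
    expr    # forces the lazily evaluated argument inside tryCatch
    FALSE
  }, warning = function(w) TRUE)
}

warns(sqrt(-1))                  # TRUE
warns(min(c()))                  # TRUE
warns(max(c(NA), na.rm = TRUE))  # TRUE
warns(sum(NA, na.rm = TRUE))     # FALSE
warns(mean(NA, na.rm = TRUE))    # FALSE
warns(0/0)                       # FALSE
```

So the empty `min`/`max` cases at least announce themselves, while the empty `sum` and `mean` cases stay silent.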
@EmilHvitfeldt

Very thorough post, and good notes here too! I couldn't help but point out that you CAN take the square root of -1 if you represent it as a complex number. I don't remember the last time I saw complex numbers being used in R, but they are there 😄

sqrt(0i-1)
#> [1] 0+1i

@ReeceGoding
Owner

ReeceGoding commented Mar 26, 2022

A good read! I'll definitely link to this in a future version. I fully admit ignorance regarding these non-numbers, so it's no wonder that I never discovered these issues myself. I know that NA must always be handled with great care, but I always take NaN as a warning sign that I've committed a grave error and must fix it before taking any other steps.

Speaking of warnings, I will defend R by saying that many of these examples throw warnings that you've not shown. However, a lot of them don't, so it's not like R is totally innocent. There's also a handful that I can kind of explain. For example,

sum(NA)               # NA
sum(NA, na.rm=TRUE)   # 0    # Horrible

is bad, but I can see their reasoning. The sum of NA being NA makes sense, and if you remove the NA (as in the second example), you are left with an empty sum, which is conventionally 0. The mean example probably returns NaN because it boils down to sum(empty_set)/length(empty_set), which is 0/0. I've got no such defence for the var example, but the use parameter in its documentation seems very relevant.
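
The reasoning above can be sketched by doing the NA removal by hand. This is only an illustration of the arithmetic, not a claim about how `sum()` or `mean()` are implemented internally; the variable names `x` and `kept` are made up for the example.

```r
x <- c(NA_real_, NA_real_)
kept <- x[!is.na(x)]        # numeric(0): what na.rm = TRUE leaves behind

length(kept)                # 0
sum(kept)                   # 0, the empty sum
prod(kept)                  # 1, the empty product
sum(kept) / length(kept)    # NaN, i.e. 0/0
mean(x, na.rm = TRUE)       # NaN, matching the line above

# One possible reading of the var case: the sample variance divides
# by n - 1, so even a single observed value already gives NA.
var(5)                      # NA
```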

@ReeceGoding
Owner

I've linked to this page in the latest version. I'll keep the issue open just in case it attracts similar interesting comments.

@kwstat
Contributor Author

kwstat commented Mar 27, 2022

I sorta understand the point of view about

sum(NA, na.rm=TRUE)

But suppose x is a vector of monthly sales per seller. If a seller is not on the payroll for a year, you probably want the yearly total to be NA, not zero, so I've had to write code like:

if (all(is.na(x))) total <- NA else total <- sum(x, na.rm = TRUE)
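
The same pattern can be wrapped in a small function so it does not have to be repeated. This is a sketch only: the name `sum_or_na` is hypothetical and not from base R or any package mentioned here.

```r
# Hypothetical helper: a sum that stays NA when every value is missing,
# instead of collapsing to the empty sum 0.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

sum_or_na(c(NA, NA, NA))     # NA
sum_or_na(c(100, NA, 250))   # 350
```

Note that `all()` of an empty logical vector is `TRUE`, so this sketch also returns NA for a zero-length `x`; whether that is the right answer depends on what an empty vector means in your data.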

@ReeceGoding
Owner

I think we agree. I get why they thought it made sense, but whether or not it was a good idea is a totally different question that I have no answer for.
