I have read a lot about how Julia handles missing values:
https://docs.julialang.org/en/v1/manual/missing/
but what’s still not clear to me is how this differs from the way Python and R treat them, and what the pros and cons are. Could someone help a total newbie to Julia by shedding some light?
-
A big bugbear of mine used to be that pandas didn’t support nullable ints, only nullable floats. This has now been addressed, but the fix is not ideal, as it has introduced differences between pandas.NA and numpy.nan. Is there any comparable confusion in Julia?
-
Another bugbear used to be that pandas would drop records in a groupby when the grouping column contained nulls. Say the field “city” contains “New York” 5 times and then 3 null records; a count by city would show only “New York: 5”. This has been addressed in pandas with the dropna argument, but maybe not in R (not sure). How does Julia do it?
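To make the groupby case concrete, here is a minimal sketch of what I mean, assuming pandas 1.1 or later (where, as far as I know, the dropna argument to groupby was introduced). The DataFrame and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical data: "New York" 5 times, then 3 null records.
df = pd.DataFrame({"city": ["New York"] * 5 + [None] * 3})

# Default behaviour: rows whose group key is null are silently dropped.
default_counts = df.groupby("city").size()

# With dropna=False the null key gets its own group.
with_nulls = df.groupby("city", dropna=False).size()

print(default_counts)
print(with_nulls)
```

With the default, the 3 null rows vanish from the count; with dropna=False they show up as a separate group.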
-
What are the advantages of having a ‘missing’ which is different from NaN? Is it that it lets you distinguish between data which is missing (e.g. not collected, not known at all) and data which is the result of an invalid calculation? Or is there something else?
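The distinction I have in mind can be sketched in plain Python, using None as a stand-in for "never collected" and a float NaN for "calculation went wrong". This is only an illustration of the idea, not a claim about how pandas or Julia represent either case; the names are hypothetical.

```python
import math

# NaN produced by an invalid floating-point operation.
nan_result = float("inf") - float("inf")

# Hypothetical stand-in for a value that was never collected.
not_recorded = None

readings = [1.5, nan_result, not_recorded]

labels = []
for r in readings:
    if r is None:
        labels.append("missing")   # no data at all
    elif math.isnan(r):
        labels.append("nan")       # data exists but the calculation was invalid
    else:
        labels.append("value")

print(labels)
```

If both cases were collapsed into NaN, the last two entries could no longer be told apart.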
-
Can you think of any other meaningful differences among the three languages when it comes to missing values?