I’m reading in a JSON file and putting it into a DataFrame to clean it up before dumping it into MongoDB (via Mongoc.jl). It is almost there. My only problem is that one of the fields is a timestamp, which comes in as a string of the form “2019-11-18T13:09:31Z”. DateTime doesn’t like the trailing “Z” (Zulu time zone) designator, and TimeZones.jl doesn’t help. So I am now trying to strip off the last character (since in practice all the timestamps are recorded in UTC). The files actually contain JSON only in the last line; the first few lines are HTTP headers, which I strip out.
I am using DrWatson, JSONTables, DataFrames, and Dates. I have this function:
function read_jsondump(filename)
    cap_file = readlines(datadir("import", filename))
    df = DataFrame(jsontable(cap_file[end]))
    transform(df, :time => chop.(:time))
    df[!, :time] = convert.(DateTime, df[:, :time])
    return df
end
I’m still getting confused about the use of broadcasting in places like this. The transform line doesn’t work; it doesn’t like chop. For reference, in pandas what I do here is:
df['timestamp'] = pd.to_datetime(df['time'])
which actually creates a new field and I then delete the original, but that doesn’t matter. The point is that pandas to_datetime can handle the ‘Z’, but Julia DateTime can’t, so I have to work around it.
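A minimal sketch of one way the workaround might be written, not a definitive fix. In the function above, `chop.(:time)` broadcasts `chop` over the Symbol `:time` rather than over the column, `transform` returns a new DataFrame that is never assigned, and `convert` has no method from String to DateTime. The sample data and the name `ZULU_FORMAT` below are assumptions made for illustration; the real column would come from `jsontable`.

```julia
using DataFrames, Dates

# Hypothetical sample standing in for the parsed JSON column.
df = DataFrame(time = ["2019-11-18T13:09:31Z", "2019-11-18T13:10:02Z"])

# ByRow wraps a function so that transform applies it to each element of :time.
# chop drops the trailing 'Z'; what remains parses with DateTime's default ISO format.
# transform returns a new DataFrame, so the result is assigned back to df.
df = transform(df, :time => ByRow(s -> DateTime(chop(s))) => :time)

# Alternative: leave the string intact and treat 'T' and 'Z' as escaped literals
# in an explicit format, so a string without the trailing 'Z' would fail to parse.
const ZULU_FORMAT = dateformat"yyyy-mm-dd\THH:MM:SS\Z"
parsed = DateTime("2019-11-18T13:09:31Z", ZULU_FORMAT)
```

Both variants produce a plain DateTime with no time zone attached, which relies on the assumption that every timestamp really is in UTC.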
I am working my way through the Introduction to DataFrames tutorial, and have also just bought Tom Kwong’s book, so hopefully my confusion about handling things like this will disappear soon.