Let’s say I have a data frame with columns c1,c2,…,cn and I run a regression of c1 on c2, c3, c4 and c2.*c4:
sm = GLM.lm( @formula(c1 ~ c2 + c3 +c4 +c2&c4), df)
If I were in Python, I could then do
formula = 'c1 ~ c2 + c3 +c4 +c2&c4'
import patsy
y, X = patsy.dmatrices(formula, df)
This would give me X
, the design matrix of the regression–i.e. columns corresponding to c2, c3, c4, and c2.*c4.
How can I get this matrix in Julia? I could probably do this manually, but if I had a large number of regressors and interactions, it might get tricky, so I’d like a systematic approach.
The reason I’m asking is that I eventually want to do counterfactuals for what would happen if I change c2 to 0 or to 1. You could throw this in as a bonus, but I think once I get the first part, I can figure that out.
6 posts - 3 participants