Channel: First steps - JuliaLang

Help with ChisqTest


@darrencl wrote:

Hi,

I want to test whether my variable is independent of the target y using ChisqTest from HypothesisTests.jl, so I think I need the contingency-table form of the test rather than the goodness-of-fit form (like sklearn's).

First, I one-hot encoded my categorical variables, then passed them to the ChisqTest function. I saw there is a k parameter which affects the degrees of freedom (it seems degrees of freedom = (k - 1)^2). I am not a statistician, so what value should I use?
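
For reference, the standard degrees of freedom for Pearson's test of independence on a contingency table are given below. The symbols r (number of row categories) and c (number of column categories) are introduced here for illustration only; when both variables share the same k levels, the expression reduces to the (k - 1)^2 mentioned above.

```latex
\mathrm{df} = (r - 1)(c - 1), \qquad r = c = k \;\Rightarrow\; \mathrm{df} = (k - 1)^2
```

With k = 2 this gives 1 degree of freedom.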

Anyway, with my one-hot-encoded features, this produces NaN p-values for every feature. Why is that? I am using the Titanic dataset from RDatasets.jl. Here is a sample that produces NaN when testing one of my features against the target 'y' (Survived).

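julia> using RDatasets, DataFrames, CategoricalArrays, HypothesisTests  # one_hot_encode (used below) is not defined in this post

julia> titanic = dataset("datasets", "Titanic");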

julia> X = one_hot_encode(titanic[:, [:Class, :Sex, :Age]]; drop_original=true)
32×8 DataFrame
│ Row │ Class_1st │ Class_2nd │ Class_3rd │ Class_Crew │ Sex_Female │ Sex_Male │ Age_Adult │ Age_Child │
│     │ Bool      │ Bool      │ Bool      │ Bool       │ Bool       │ Bool     │ Bool      │ Bool      │
├─────┼───────────┼───────────┼───────────┼────────────┼────────────┼──────────┼───────────┼───────────┤
│ 1   │ 1         │ 0         │ 0         │ 0          │ 0          │ 1        │ 0         │ 1         │
│ 2   │ 0         │ 1         │ 0         │ 0          │ 0          │ 1        │ 0         │ 1         │
│ 3   │ 0         │ 0         │ 1         │ 0          │ 0          │ 1        │ 0         │ 1         │
│ 4   │ 0         │ 0         │ 0         │ 1          │ 0          │ 1        │ 0         │ 1         │
│ 5   │ 1         │ 0         │ 0         │ 0          │ 1          │ 0        │ 0         │ 1         │
│ 6   │ 0         │ 1         │ 0         │ 0          │ 1          │ 0        │ 0         │ 1         │
│ 7   │ 0         │ 0         │ 1         │ 0          │ 1          │ 0        │ 0         │ 1         │
│ 8   │ 0         │ 0         │ 0         │ 1          │ 1          │ 0        │ 0         │ 1         │
│ 9   │ 1         │ 0         │ 0         │ 0          │ 0          │ 1        │ 1         │ 0         │
│ 10  │ 0         │ 1         │ 0         │ 0          │ 0          │ 1        │ 1         │ 0         │
│ 11  │ 0         │ 0         │ 1         │ 0          │ 0          │ 1        │ 1         │ 0         │
│ 12  │ 0         │ 0         │ 0         │ 1          │ 0          │ 1        │ 1         │ 0         │
│ 13  │ 1         │ 0         │ 0         │ 0          │ 1          │ 0        │ 1         │ 0         │
│ 14  │ 0         │ 1         │ 0         │ 0          │ 1          │ 0        │ 1         │ 0         │
│ 15  │ 0         │ 0         │ 1         │ 0          │ 1          │ 0        │ 1         │ 0         │
│ 16  │ 0         │ 0         │ 0         │ 1          │ 1          │ 0        │ 1         │ 0         │
│ 17  │ 1         │ 0         │ 0         │ 0          │ 0          │ 1        │ 0         │ 1         │
│ 18  │ 0         │ 1         │ 0         │ 0          │ 0          │ 1        │ 0         │ 1         │
│ 19  │ 0         │ 0         │ 1         │ 0          │ 0          │ 1        │ 0         │ 1         │
│ 20  │ 0         │ 0         │ 0         │ 1          │ 0          │ 1        │ 0         │ 1         │
│ 21  │ 1         │ 0         │ 0         │ 0          │ 1          │ 0        │ 0         │ 1         │
│ 22  │ 0         │ 1         │ 0         │ 0          │ 1          │ 0        │ 0         │ 1         │
│ 23  │ 0         │ 0         │ 1         │ 0          │ 1          │ 0        │ 0         │ 1         │
│ 24  │ 0         │ 0         │ 0         │ 1          │ 1          │ 0        │ 0         │ 1         │
│ 25  │ 1         │ 0         │ 0         │ 0          │ 0          │ 1        │ 1         │ 0         │
│ 26  │ 0         │ 1         │ 0         │ 0          │ 0          │ 1        │ 1         │ 0         │
│ 27  │ 0         │ 0         │ 1         │ 0          │ 0          │ 1        │ 1         │ 0         │
│ 28  │ 0         │ 0         │ 0         │ 1          │ 0          │ 1        │ 1         │ 0         │
│ 29  │ 1         │ 0         │ 0         │ 0          │ 1          │ 0        │ 1         │ 0         │
│ 30  │ 0         │ 1         │ 0         │ 0          │ 1          │ 0        │ 1         │ 0         │
│ 31  │ 0         │ 0         │ 1         │ 0          │ 1          │ 0        │ 1         │ 0         │
│ 32  │ 0         │ 0         │ 0         │ 1          │ 1          │ 0        │ 1         │ 0         │

julia> y = Vector{Int64}(recode(titanic.Survived,
                                    "No"=> 1,
                                    "Yes"=> 2)
                                    );

julia> X_data=convert(Matrix, X);

julia> ChisqTest(Int.(X_data[:,1]), y,2)
Pearson's Chi-square Test
-------------------------
Population details:
    parameter of interest:   Multinomial Probabilities
    value under h_0:         [0.5, 0.0, 0.5, 0.0]
    point estimate:          [0.5, 0.0, 0.5, 0.0]
    95% confidence interval: Tuple{Float64,Float64}[(0.25, 0.8761), (0.0, 0.3761), (0.25, 0.8761), (0.0, 0.3761)]

Test summary:
    outcome with 95% confidence: reject h_0
    one-sided p-value:           NaN

Details:
    Sample size:        8
    statistic:          NaN
    degrees of freedom: 1
    residuals:          [0.0, NaN, 0.0, NaN]
    std. residuals:     [NaN, NaN, NaN, NaN]
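
The output above reports a sample size of 8 rather than 32, and a point estimate of [0.5, 0.0, 0.5, 0.0]. Both are consistent with the rows where the feature equals 0 being left out of the k×k table when k = 2 is taken to mean "category codes 1 and 2". That reading is an inference from the printed numbers, not a statement about the library's documented behaviour. The sketch below reproduces the effect in plain Julia without any packages. The helpers `contingency` and `pearson_statistic` are hypothetical names written for this illustration, and the ordering of `y` (16 "No" rows followed by 16 "Yes" rows) is an assumption about the dataset.

```julia
# Hypothetical helper: count pairs (x[i], y[i]) into a k×k table,
# keeping only pairs whose codes both fall in 1:k.
function contingency(x::AbstractVector{<:Integer}, y::AbstractVector{<:Integer}, k::Integer)
    table = zeros(Int, k, k)
    for (xi, yi) in zip(x, y)
        if 1 <= xi <= k && 1 <= yi <= k
            table[xi, yi] += 1
        end
    end
    return table
end

# Hypothetical helper: Pearson's chi-square statistic for independence.
function pearson_statistic(table::AbstractMatrix{<:Integer})
    n = sum(table)
    rowsum = sum(table, dims=2)          # k×1 row totals
    colsum = sum(table, dims=1)          # 1×k column totals
    expected = rowsum * colsum ./ n      # expected counts under independence
    return sum((table .- expected) .^ 2 ./ expected)
end

# Class_1st as shown in the post: 1 in every fourth row, 0 elsewhere.
x = repeat([1, 0, 0, 0], 8)
# Assumed ordering of Survived after recoding: 16 times "No" => 1, then 16 times "Yes" => 2.
y = vcat(fill(1, 16), fill(2, 16))

# Codes 0/1 with k = 2: the zeros fall outside 1:2, so the second row stays empty.
t_raw = contingency(x, y, 2)             # [4 4; 0 0], only 8 observations
stat_raw = pearson_statistic(t_raw)      # NaN, because an empty row gives 0/0

# Shifting the codes to 1/2 keeps all 32 rows in the table.
t_shifted = contingency(x .+ 1, y, 2)    # [12 12; 4 4]
stat_shifted = pearson_statistic(t_shifted)  # 0.0

println(t_raw, " => ", stat_raw)
println(t_shifted, " => ", stat_shifted)
```

The shifted version no longer yields NaN, but its statistic is exactly zero because every category combination appears once per row. If the dataset stores one row per combination together with a frequency column (as R's Titanic table does), the cell counts would need to be weighted by that column before the test says anything about survival.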
