Metadata-Version: 2.1
Name: skcredit
Version: 0.0.3
Summary: scorecard
Home-page: https://github.com/JYLFamily/scikit-credit
Author: JYLFamily
Author-email: jiangyilanf@gmail.com
License: UNKNOWN
Description: # scikit-credit
        scikit-credit 是用于信贷风控评分卡建模的自动化工具。如下是 scikit-credit 的简介：
        * 自动分箱
        * 特征选择
        * 模型训练 
        * 评分生成
        
        # 玩家
        信贷风控模型分析师
        
        # 安装
        
        ```
        pip install skcredit
        ```
        
        # 示例
        
        ## 完整代码请参见 [example.ipynb](https://github.com/JYLFamily/scikit-credit/blob/master/examples/example.ipynb)
        
        ## 单特征分箱
        
        ### 分类单特征分箱
        
        #### 默认参数
        ```
        from skcredit.feature_discretization import SplitCat
        
        # min_bin_cnt_negative=75
        # min_bin_cnt_positive=75
        # min_information_value_split_gain=0.015
        
        sc = SplitCat()
        sc.fit(dataset["EDUCATION"], dataset["default.payment.next.month"])
        print(sc.table.to_markdown())
        ```
        |    | Column    | Bucket       |   CntPositive |   CntNegative |       WoE |        IvS |
        |---:|:----------|:-------------|--------------:|--------------:|----------:|-----------:|
        |  0 | EDUCATION | {0, 1, 4, 5} |          1529 |          6756 | -0.229193 | 0.0181077  |
        |  1 | EDUCATION | {2, 3, 6}    |          3456 |         10759 |  0.120993 | 0.00955926 |
        |  2 | EDUCATION | {nan}        |             0 |             0 |  0        | 0          |
        
        
        #### 调整参数
        
        ```
        from skcredit.feature_discretization import SplitCat
        
        sc = SplitCat(min_information_value_split_gain=0.0001)
        sc.fit(dataset["EDUCATION"], dataset["default.payment.next.month"])
        print(sc.table.to_markdown())
        ```
        |    | Column    | Bucket          |   CntPositive |   CntNegative |        WoE |        IvS |
        |---:|:----------|:----------------|--------------:|--------------:|-----------:|-----------:|
        |  0 | EDUCATION | {0, 1, 4, 5, 6} |          2069 |          8984 | -0.209693  | 0.0152528  |
        |  1 | EDUCATION | {2}             |          3330 |         10700 |  0.0914156 | 0.00400755 |
        |  2 | EDUCATION | {3}             |          1237 |          3680 |  0.168463  | 0.00486862 |
        |  3 | EDUCATION | {nan}           |             0 |             0 |  0         | 0          |
        
        ### 连续单特征分箱
        
        #### 默认参数 
        
        ```
        from skcredit.feature_discretization import SplitNum
        
        # min_bin_cnt_negative=75
        # min_bin_cnt_positive=75
        # min_information_value_split_gain=0.015
        
        sn = SplitNum()
        sn.fit(dataset["LIMIT_BAL"], dataset["default.payment.next.month"])
        print(sn.table.to_markdown())
        ```
        |    | Column    | Bucket             |   CntPositive |   CntNegative |       WoE |       IvS |
        |---:|:----------|:-------------------|--------------:|--------------:|----------:|----------:|
        |  0 | LIMIT_BAL | (-inf,40000.0]     |          1555 |          2756 |  0.686382 | 0.0798734 |
        |  1 | LIMIT_BAL | (40000.0,140000.0] |          2767 |          8212 |  0.170854 | 0.0111888 |
        |  2 | LIMIT_BAL | (140000.0,+inf)    |          2314 |         12396 | -0.419709 | 0.0763266 |
        |  3 | LIMIT_BAL | [nan]              |             0 |             0 |  0        | 0         |
        
        #### 调整参数
        
        ```
        from skcredit.feature_discretization import SplitNum
        
        sn = SplitNum(min_information_value_split_gain=0.0001)
        sn.fit(dataset["LIMIT_BAL"], dataset["default.payment.next.month"])
        print(sn.table.to_markdown())
        ```
        |    | Column    | Bucket              |   CntPositive |   CntNegative |          WoE |         IvS |
        |---:|:----------|:--------------------|--------------:|--------------:|-------------:|------------:|
        |  0 | LIMIT_BAL | (-inf,10000.0]      |           197 |           296 |  0.851531    | 0.0144909   |
        |  1 | LIMIT_BAL | (10000.0,40000.0]   |          1358 |          2460 |  0.664539    | 0.0660227   |
        |  2 | LIMIT_BAL | (40000.0,70000.0]   |          1328 |          3593 |  0.263374    | 0.0122039   |
        |  3 | LIMIT_BAL | (70000.0,120000.0]  |          1112 |          3468 |  0.121269    | 0.00232077  |
        |  4 | LIMIT_BAL | (120000.0,140000.0] |           327 |          1151 |  0.000260766 | 3.35032e-09 |
        |  5 | LIMIT_BAL | (140000.0,220000.0] |          1103 |          5184 | -0.288856    | 0.0160792   |
        |  6 | LIMIT_BAL | (220000.0,240000.0] |           223 |          1133 | -0.366765    | 0.00546071  |
        |  7 | LIMIT_BAL | (240000.0,260000.0] |           138 |           733 | -0.411205    | 0.00434948  |
        |  8 | LIMIT_BAL | (260000.0,360000.0] |           556 |          3164 | -0.480137    | 0.0247926   |
        |  9 | LIMIT_BAL | (360000.0,460000.0] |           165 |          1160 | -0.691543    | 0.0171397   |
        | 10 | LIMIT_BAL | (460000.0,+inf)     |           129 |          1022 | -0.811017    | 0.0197102   |
        | 11 | LIMIT_BAL | [nan]               |             0 |             0 |  0           | 0           |
        
        ## 多特征分箱
        
        ### 手动分箱
        
        ```
        from skcredit.feature_discretization import DiscreteCust
        
        sc = SplitCat(min_information_value_split_gain=0.0001)
        sc.fit(train_x["EDUCATION"], train_y)
        sn = SplitNum(min_information_value_split_gain=0.0001)
        sn.fit(train_x["LIMIT_BAL"], train_y)
        
        cust = DiscreteCust(keep_columns=["ID"], date_columns=[], cat_spliter={"EDUCATION": sc}, num_spliter={"LIMIT_BAL":sn})
        cust.fit(train_x, train_y)
        train_x = cust.transform(train_x)
        test_x  = cust.transform(test_x )
        ```
        
        ### 自动分箱
        
        ```
        auto = DiscreteAuto(keep_columns=["ID"], date_columns=[], cat_columns=cat_columns, num_columns=num_columns)
        auto.fit(train_x, train_y)
        train_x = auto.transform(train_x)
        test_x  = auto.transform(test_x )
        ```
        
        ### 查看分箱后信息
        
        ```
        # cust.information_value_score.head() 
        print(auto.information_value_score.head().to_markdown())
        ```
        |       |      IvS |
        |:------|---------:|
        | PAY_0 | 0.874626 |
        | PAY_2 | 0.560568 |
        | PAY_3 | 0.420644 |
        | PAY_4 | 0.364907 |
        | PAY_5 | 0.330149 |
        
        ```
        # cust.information_value_table.head()
        print(auto.information_value_table.head().to_markdown())
        ```
        |    | Column   | Bucket    |   CntPositive |   CntNegative |       WoE |       IvS |
        |---:|:---------|:----------|--------------:|--------------:|----------:|----------:|
        |  0 | PAY_0    | (-inf,0]  |          2391 |         14987 | -0.578847 | 0.217663  |
        |  1 | PAY_0    | (0,1]     |           959 |          1821 |  0.615374 | 0.0544047 |
        |  2 | PAY_0    | (1,+inf)  |          1635 |           707 |  2.09499  | 0.602558  |
        |  3 | PAY_0    | [nan]     |             0 |             0 |  0        | 0         |
        |  0 | PAY_2    | (-inf,0]  |          3105 |         16066 | -0.387067 | 0.113953  |
        
        ## 特征选择
        ```
        select = SelectBins(keep_columns=["ID"], date_columns=[])
        select.fit(train_x, train_y)
        train_x = select.transform(train_x)
        test_x  = select.transform(test_x )
        ```
        ```
        select = SelectCIFE(keep_columns=["ID"], date_columns=[], nums_feature=10)
        select.fit(train_x, train_y)
        train_x = select.transform(train_x)
        test_x  = select.transform(test_x )
        ```
        
        ## 模型训练
        
        ```
        lmclassifier = LMClassifier(keep_columns=["ID"], date_columns=[])
        lmclassifier.fit(train_x, train_y)
        print("train ks {}".format(lmclassifier.score(train_x, train_y)))
        print("test  ks {}".format(lmclassifier.score(test_x,  test_y )))
        ```
        train ks 0.41442 \
        test  ks 0.39140
        
        ## 评分卡生成
        ```
        lmcreditcard = LMCreditcard(
                keep_columns=["ID"], date_columns=[], discrete=discrete, lmclassifier=lmclassifier, BASE=500,  PDO=20,  ODDS=1)
        lmcreditcard.show_scorecard()
        ```
        |    | Column   | Bucket           |        WoE |   Coefficients |   PartialScore |   OffsetScores |
        |---:|:---------|:-----------------|-----------:|---------------:|---------------:|---------------:|
        |  0 | PAY_AMT2 | (-inf,30.0]      |  0.555291  |      0.284123  |      -4.55231  |        536.095 |
        |  1 | PAY_AMT2 | (30.0,4931.0]    |  0.0159422 |      0.284123  |      -0.130696 |        536.095 |
        |  2 | PAY_AMT2 | (4931.0,15282.0] | -0.424907  |      0.284123  |       3.48342  |        536.095 |
        |  3 | PAY_AMT2 | (15282.0,+inf)   | -1.21984   |      0.284123  |      10.0003   |        536.095 |
        |  4 | PAY_AMT2 | [nan]            |  0         |      0.284123  |      -0        |        536.095 |
        |  0 | PAY_AMT1 | (-inf,10.0]      |  0.688936  |      0.283557  |      -5.63668  |        536.095 |
        |  1 | PAY_AMT1 | (10.0,4861.0]    | -0.0125914 |      0.283557  |       0.10302  |        536.095 |
        |  2 | PAY_AMT1 | (4861.0,+inf)    | -0.591584  |      0.283557  |       4.84017  |        536.095 |
        |  3 | PAY_AMT1 | [nan]            |  0         |      0.283557  |      -0        |        536.095 |
        |  0 | PAY_0    | (-inf,0]         | -0.578847  |      0.749677  |      12.5211   |        536.095 |
        |  1 | PAY_0    | (0,1]            |  0.615374  |      0.749677  |     -13.3112   |        536.095 |
        |  2 | PAY_0    | (1,+inf)         |  2.09499   |      0.749677  |     -45.317    |        536.095 |
        |  3 | PAY_0    | [nan]            |  0         |      0.749677  |      -0        |        536.095 |
        |  0 | PAY_3    | (-inf,0]         | -0.318517  |      0.204782  |       1.88204  |        536.095 |
        |  1 | PAY_3    | (0,+inf)         |  1.36739   |      0.204782  |      -8.07961  |        536.095 |
        |  2 | PAY_3    | [nan]            |  0         |      0.204782  |      -0        |        536.095 |
        |  0 | PAY_2    | (-inf,0]         | -0.387067  |      0.0994362 |       1.11054  |        536.095 |
        |  1 | PAY_2    | (0,+inf)         |  1.51702   |      0.0994362 |      -4.35252  |        536.095 |
        |  2 | PAY_2    | [nan]            |  0         |      0.0994362 |      -0        |        536.095 |
        |  0 | PAY_4    | (-inf,0]         | -0.265818  |      0.331433  |       2.54205  |        536.095 |
        |  1 | PAY_4    | (0,+inf)         |  1.41463   |      0.331433  |     -13.5283   |        536.095 |
        |  2 | PAY_4    | [nan]            |  0         |      0.331433  |      -0        |        536.095 |
        |  0 | PAY_AMT4 | (-inf,0.0]       |  0.475069  |      0.407116  |      -5.58058  |        536.095 |
        |  1 | PAY_AMT4 | (0.0,1813.0]     |  0.0656848 |      0.407116  |      -0.771592 |        536.095 |
        |  2 | PAY_AMT4 | (1813.0,4000.0]  | -0.128344  |      0.407116  |       1.50764  |        536.095 |
        |  3 | PAY_AMT4 | (4000.0,+inf)    | -0.51447   |      0.407116  |       6.04342  |        536.095 |
        |  4 | PAY_AMT4 | [nan]            |  0         |      0.407116  |      -0        |        536.095 |
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Operating System :: OS Independent
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Unix
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Description-Content-Type: text/markdown
