featureprocessing Package¶
ComBat Module¶
Decomposition Module¶
- WORC.featureprocessing.Decomposition.Decomposition(features, patientinfo, config, output, label_type=None, verbose=True)[source]¶
Perform decompositions to two components of the feature space.
Usage is similar to StatisticalTestFeatures.
Parameters¶
- features: string, mandatory
Contains the paths to all .hdf5 feature files used: modalityname1=file1,file2,file3,… modalityname2=file1,… Thus, modality names are always between a space and an equal sign, and files are separated by commas. We assume that the lists of files for each modality have the same length. Files at the same position in each list should belong to the same patient.
- patientinfo: string, mandatory
Contains the path referring to a .txt file containing the patient label(s) and value(s) to be used for learning. See the Github Wiki for the format.
- config: string, mandatory
path referring to a .ini file containing the parameters used for feature extraction. See the Github Wiki for the possible fields and their description.
# TODO: outputs
- verbose: boolean, default True
print final feature values and labels to command line or not.
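The features string format described above can be illustrated with a small parser. This is only a sketch of the documented format, not the parsing code WORC itself uses; the function name parse_feature_string is hypothetical, and it assumes that file paths contain no spaces or commas.

```python
def parse_feature_string(features):
    """Split 'mod1=f1,f2 mod2=f3,f4' into {'mod1': ['f1', 'f2'], ...}.

    Hypothetical helper illustrating the documented format only.
    """
    modalities = {}
    # Modalities are separated by spaces, name and files by an equal sign.
    for chunk in features.split():
        name, _, files = chunk.partition("=")
        # Files within one modality are separated by commas.
        modalities[name] = [f for f in files.split(",") if f]

    # Files at the same position belong to the same patient, so every
    # modality has to list the same number of files.
    lengths = {len(files) for files in modalities.values()}
    if len(lengths) > 1:
        raise ValueError("All modalities must list the same number of files")
    return modalities


parsed = parse_feature_string(
    "MR=pat1_mr.hdf5,pat2_mr.hdf5 CT=pat1_ct.hdf5,pat2_ct.hdf5"
)
print(parsed["CT"])
```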
FeatureConverter Module¶
- WORC.featureprocessing.FeatureConverter.FeatureConverter(feat_in, toolbox, config, feat_out)[source]¶
Convert features as extracted by a third-party toolbox to WORC format.
Parameters¶
- feat_in: string
Path to input feature file as outputted by the feature extraction toolbox.
- toolbox: string
Name of toolbox from which features are extracted.
- config: string
Path to .ini file containing the configuration for this function.
- feat_out: string
Path to .hdf5 file to which converted features should be saved
- WORC.featureprocessing.FeatureConverter.convert_PREDICT(features, feat_out)[source]¶
Convert features from PREDICT toolbox to WORC compatible format.
As PREDICT is the WORC default toolbox, we only need to add the name of the toolbox.
ICCThreshold Module¶
- class WORC.featureprocessing.ICCThreshold.ICCThreshold(ICCtype='intra', threshold=0.75)[source]¶
Bases: BaseEstimator, SelectorMixin
Object to fit feature selection based on intra- or inter-class correlation coefficient as defined by
Shrout, Patrick E., and Joseph L. Fleiss. “Intraclass correlations: uses in assessing rater reliability.” Psychological bulletin 86.2 (1979): 420. http://rokwa.x-y.net/Shrout-Fleiss-ICC.pdf
For the intra-class ICC, we use ICC(3,1). For the inter-class ICC, we should use ICC(2,1) according to the definitions of the paper, but following the radiomics literature (https://www.tandfonline.com/doi/pdf/10.1080/0284186X.2018.1445283?needAccess=true, https://www.tandfonline.com/doi/pdf/10.3109/0284186X.2013.812798?needAccess=true), we use ICC(3,1) anyway.
The default threshold of 0.75 is also based on the literature mentioned above.
- __abstractmethods__ = frozenset({})¶
- __annotations__ = {}¶
- __init__(ICCtype='intra', threshold=0.75)[source]¶
Parameters¶
- ICCtype: string, default ‘intra’
Type of ICC used. intra results in ICC(3,1), inter in ICC(2,1)
- threshold: float, default 0.75
Threshold for ICC-value in order for feature to be selected
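For reference, ICC(3,1) as defined by Shrout and Fleiss can be computed from a two-way ANOVA decomposition as (BMS - EMS) / (BMS + (k - 1) * EMS), with BMS the between-objects mean square, EMS the residual mean square and k the number of observers. The sketch below is a plain-Python illustration of that formula for a single feature, not the WORC implementation; the names icc_3_1 and select_reliable are hypothetical, and whether the threshold comparison is strict is an assumption.

```python
def icc_3_1(ratings):
    """ICC(3,1) for one feature: one row per object, one column per observer."""
    n = len(ratings)       # number of objects
    k = len(ratings[0])    # number of observers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-objects mean square
    ems = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)


def select_reliable(iccs, threshold=0.75):
    """Indices of features whose ICC exceeds the threshold (assumed strict)."""
    return [j for j, icc in enumerate(iccs) if icc > threshold]
```

The function needs at least two objects and two observers; with fewer, the mean squares are undefined.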
- __module__ = 'WORC.featureprocessing.ICCThreshold'¶
- fit(X_trains)[source]¶
Select only features specified by the metric and threshold per patient.
Parameters¶
- X_trains: numpy array, mandatory
Array containing feature values used for model_selection. Number of objects on first axis, features on second axis, observers on third axis.
- Y_train: numpy array, mandatory
Array containing the binary labels for each object in X_train.
- set_fit_request(*, X_trains: bool | None | str = '$UNCHANGED$') ICCThreshold¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- X_trains: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for X_trains parameter in fit.
Returns¶
- self: object
The updated object.
- set_transform_request(*, inputarray: bool | None | str = '$UNCHANGED$') ICCThreshold¶
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- inputarray: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for inputarray parameter in transform.
Returns¶
- self: object
The updated object.
- WORC.featureprocessing.ICCThreshold.convert_features_ICC_threshold(features_in, csv_out=None, features_out=None, threshold=0.75)[source]¶
For features from multiple observers, compute ICC, return values, and optionally apply thresholding and save output.
Parameters¶
- features_in: list
Contains one list per observer.
- csv_out: csv file
Name of the file to which the ICC values should be written.
- features_out: list
Contains the file names of the output features.
Imputer Module¶
- class WORC.featureprocessing.Imputer.Imputer(missing_values='nan', strategy='mean', n_neighbors=5)[source]¶
Bases: object
Module for feature imputation.
- __dict__ = mappingproxy({'__module__': 'WORC.featureprocessing.Imputer', '__doc__': 'Module for feature imputation.', '__init__': <function Imputer.__init__>, 'fit': <function Imputer.fit>, 'transform': <function Imputer.transform>, '__dict__': <attribute '__dict__' of 'Imputer' objects>, '__weakref__': <attribute '__weakref__' of 'Imputer' objects>, '__annotations__': {}})¶
- __init__(missing_values='nan', strategy='mean', n_neighbors=5)[source]¶
Imputation of feature values using either sklearn, missingpy or (WIP) fancyimpute approaches.
Parameters¶
- missing_values: number, string, np.nan (default) or None
The placeholder for the missing values. All occurrences of missing_values will be imputed.
- strategy: string, optional (default=”mean”)
The imputation strategy.
Supported using sklearn:
- If “mean”, then replace missing values using the mean along each column. Can only be used with numeric data.
- If “median”, then replace missing values using the median along each column. Can only be used with numeric data.
- If “most_frequent”, then replace missing values using the most frequent value along each column. Can be used with strings or numeric data.
- If “constant”, then replace missing values with fill_value. Can be used with strings or numeric data.
Supported using missingpy:
- If ‘knn’, then use a nearest neighbor search. Can be used with strings or numeric data.
WIP: More strategies using fancyimpute
- n_neighbors: int, optional (default = 5)
Number of neighboring samples to use for imputation if method is knn.
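To make the “mean” and “median” strategies concrete, the following plain-Python sketch imputes NaN values column by column. It only illustrates the idea and does not use or reproduce the sklearn or missingpy code that this class wraps; the function name impute_columns is hypothetical.

```python
import math
import statistics


def impute_columns(rows, strategy="mean"):
    """Replace NaN entries in each column by that column's mean or median."""
    fill_functions = {"mean": statistics.fmean, "median": statistics.median}
    fill_function = fill_functions[strategy]

    fills = []
    for j in range(len(rows[0])):
        # Only the observed (non-NaN) values of a column define its fill value.
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        fills.append(fill_function(observed))

    return [
        [fills[j] if math.isnan(value) else value for j, value in enumerate(row)]
        for row in rows
    ]


features = [[1.0, float("nan")], [3.0, 4.0], [float("nan"), 8.0]]
print(impute_columns(features, strategy="mean"))
```

A column that consists only of NaN values has no observed values to average, so this sketch raises a statistics.StatisticsError in that case.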
- __module__ = 'WORC.featureprocessing.Imputer'¶
- __weakref__¶
list of weak references to the object (if defined)
OneHotEncoderWrapper Module¶
- class WORC.featureprocessing.OneHotEncoderWrapper.OneHotEncoderWrapper(feature_labels_tofit, handle_unknown='ignore', verbose=False)[source]¶
Bases: object
Module for OneHotEncoding features.
- __dict__ = mappingproxy({'__module__': 'WORC.featureprocessing.OneHotEncoderWrapper', '__doc__': 'Module for OneHotEncoding features.', '__init__': <function OneHotEncoderWrapper.__init__>, 'fit': <function OneHotEncoderWrapper.fit>, 'transform': <function OneHotEncoderWrapper.transform>, '__dict__': <attribute '__dict__' of 'OneHotEncoderWrapper' objects>, '__weakref__': <attribute '__weakref__' of 'OneHotEncoderWrapper' objects>, '__annotations__': {}})¶
- __init__(feature_labels_tofit, handle_unknown='ignore', verbose=False)[source]¶
Init preprocessor of features.
- __module__ = 'WORC.featureprocessing.OneHotEncoderWrapper'¶
- __weakref__¶
list of weak references to the object (if defined)
- transform(inputarray)[source]¶
Transform feature array.
Transform the inputarray to select only the features based on the result from the fit function.
Parameters¶
- inputarray: numpy array, mandatory
Array containing the items to use selection on. The type of item in this list does not matter, e.g. floats, strings etc.
Preprocessor Module¶
- class WORC.featureprocessing.Preprocessor.Preprocessor(verbose=True)[source]¶
Bases: object
Module for feature preprocessing.
Currently implemented:
- Remove features with > 80% NaNs
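The NaN criterion above can be sketched as a boolean mask over the feature columns. This is an illustration under assumptions, not the WORC code: the function name nan_feature_mask and the max_nan_fraction argument are hypothetical, and a feature with exactly 80% NaNs is kept because the rule says "more than 80%".

```python
import math


def nan_feature_mask(rows, max_nan_fraction=0.8):
    """Return one boolean per feature: True if the feature is kept."""
    n_objects = len(rows)
    mask = []
    for j in range(len(rows[0])):
        n_nan = sum(1 for row in rows if math.isnan(row[j]))
        # Keep the feature unless more than max_nan_fraction of it is NaN.
        mask.append(n_nan / n_objects <= max_nan_fraction)
    return mask


def apply_mask(rows, mask):
    """Select only the kept features from every object."""
    return [[value for value, keep in zip(row, mask) if keep] for row in rows]
```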
- __dict__ = mappingproxy({'__module__': 'WORC.featureprocessing.Preprocessor', '__doc__': 'Module for feature preprocessing.\n\n Currently implemented:\n - Remove features with > 80% NaNs\n ', '__init__': <function Preprocessor.__init__>, 'fit': <function Preprocessor.fit>, 'transform': <function Preprocessor.transform>, '__dict__': <attribute '__dict__' of 'Preprocessor' objects>, '__weakref__': <attribute '__weakref__' of 'Preprocessor' objects>, '__annotations__': {}})¶
- __module__ = 'WORC.featureprocessing.Preprocessor'¶
- __weakref__¶
list of weak references to the object (if defined)
- transform(inputarray)[source]¶
Transform feature array.
Transform the inputarray to select only the features based on the result from the fit function.
Parameters¶
- inputarray: numpy array, mandatory
Array containing the items to use selection on. The type of item in this list does not matter, e.g. floats, strings etc.
Relief Module¶
- class WORC.featureprocessing.Relief.SelectMulticlassRelief(n_neighbours=3, sample_size=1, distance_p=2, numf=None, random_state=None)[source]¶
Bases: BaseEstimator, SelectorMixin
Object to fit feature selection based on the type group the feature belongs to. The label for the feature is used for this procedure.
- __abstractmethods__ = frozenset({})¶
- __annotations__ = {}¶
- __init__(n_neighbours=3, sample_size=1, distance_p=2, numf=None, random_state=None)[source]¶
Parameters¶
- n_neighbours: integer
Number of nearest neighbours used.
- sample_size: float
Percentage of samples used to calculate score
- distance_p: integer
Parameter in Minkowski distance used for nearest neighbour calculation
- numf: integer, default None
Number of important features to be selected with respect to their ranking. If None, all are used.
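As an illustration of how these parameters interact, the sketch below shows a Minkowski distance with exponent distance_p, a nearest-neighbour lookup, and the selection of the numf highest-ranked features. It is not the Relief scoring used by this class; the function names are hypothetical and the feature scores are assumed to be given.

```python
def minkowski(a, b, p=2):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def nearest_neighbours(samples, index, n_neighbours=3, distance_p=2):
    """Indices of the n_neighbours samples closest to samples[index]."""
    others = [i for i in range(len(samples)) if i != index]
    others.sort(key=lambda i: minkowski(samples[index], samples[i], distance_p))
    return others[:n_neighbours]


def top_features(scores, numf=None):
    """Feature indices ranked by descending score; all of them if numf is None."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked if numf is None else ranked[:numf]
```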
- __module__ = 'WORC.featureprocessing.Relief'¶
- fit(X, y, random_state=None)[source]¶
Select only features specified by parameters per patient.
Parameters¶
- feature_values: numpy array, mandatory
Array containing feature values used for model_selection. Number of objects on first axis, features on second axis.
- feature_labels: list, mandatory
Contains the labels of all features used. The index in this list will be used in the transform function to select features.
- multi_class_relief(feature_set, label_set, nb=3, sample_size=1, distance_p=2, numf=None, random_state=None)[source]¶
- set_fit_request(*, random_state: bool | None | str = '$UNCHANGED$') SelectMulticlassRelief¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- random_state: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for random_state parameter in fit.
Returns¶
- self: object
The updated object.
- set_transform_request(*, inputarray: bool | None | str = '$UNCHANGED$') SelectMulticlassRelief¶
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- inputarray: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for inputarray parameter in transform.
Returns¶
- self: object
The updated object.
Scalers Module¶
- class WORC.featureprocessing.Scalers.LogStandardScaler(*, copy=True, with_mean=True, with_std=True)[source]¶
Bases: StandardScaler
Scale features using z-score and a log transform.
This scaler first applies a log transform to each feature before applying a z-score, i.e. the standard scaler. To handle negative and zero values, a constant is added before applying the log transform:
l_ij = log(f_ij - min(F_j) + median(F_j) - min(F_j))
Z_ij = (l_ij - mu) / sigma
Based on https://arxiv.org/pdf/2012.06875v1.pdf.
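The formula above can be written out for a single feature column as follows. This is a minimal plain-Python sketch rather than the class implementation; the function name log_standard_scale is hypothetical, and the population standard deviation is assumed for sigma, as in the sklearn StandardScaler.

```python
import math
import statistics


def log_standard_scale(column):
    """Shift a feature column, take the logarithm, then z-score the result."""
    lowest = min(column)
    median = statistics.median(column)
    # Shift by the constant (median - min) on top of subtracting the minimum,
    # so that the argument of the logarithm stays positive.
    logged = [math.log(f - lowest + median - lowest) for f in column]
    mu = statistics.fmean(logged)
    sigma = statistics.pstdev(logged)  # population std (assumption)
    return [(value - mu) / sigma for value in logged]


scaled = log_standard_scale([-3.0, 0.0, 2.0, 40.0])
print(scaled)
```

If the median of a feature equals its minimum, the smallest value maps to log(0), which is undefined; this sketch raises a ValueError there and does not try to handle that case.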
- __annotations__ = {}¶
- __module__ = 'WORC.featureprocessing.Scalers'¶
- fit(X, y=None)[source]¶
Compute the mean and std to be used for later scaling.
Parameters¶
- X{array-like, sparse matrix}, shape [n_samples, n_features]
The data used to compute the mean and standard deviation used for later scaling along the features axis.
- y
Ignored
- set_inverse_transform_request(*, copy: bool | None | str = '$UNCHANGED$') LogStandardScaler¶
Request metadata passed to the inverse_transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- copy: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for copy parameter in inverse_transform.
Returns¶
- self: object
The updated object.
- set_partial_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') LogStandardScaler¶
Request metadata passed to the partial_fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in partial_fit.
Returns¶
- self: object
The updated object.
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') LogStandardScaler¶
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- copy: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for copy parameter in transform.
Returns¶
- self: object
The updated object.
- class WORC.featureprocessing.Scalers.RobustStandardScaler(*, copy=True, with_mean=True, with_std=True)[source]¶
Bases: StandardScaler
Scale features using z-score that is robust to outliers.
This scaler removes outliers (<5th and >95th percentile) and afterwards uses z-scoring to scale the features.
This scaler is thus a combination of the RobustScaler and StandardScaler from sklearn, hence please see those respective documentations for more information:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html
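One way to read the description above is that the mean and standard deviation are estimated only from values between the 5th and 95th percentile, after which all values are z-scored with those statistics. The sketch below follows that reading for a single feature column; it is an interpretation, not the class implementation. The function names are hypothetical, the percentile uses linear interpolation, and the population standard deviation is assumed.

```python
import statistics


def percentile(values, q):
    """Linearly interpolated percentile of values, with q in [0, 100]."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def robust_z_score(column, low=5, high=95):
    """Z-score a column using statistics estimated without the outliers."""
    lower_bound = percentile(column, low)
    upper_bound = percentile(column, high)
    inliers = [v for v in column if lower_bound <= v <= upper_bound]
    mu = statistics.fmean(inliers)
    sigma = statistics.pstdev(inliers)  # population std (assumption)
    return [(v - mu) / sigma for v in column]
```

With this reading, a single extreme value no longer inflates the standard deviation, so the remaining values keep a usable spread after scaling.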
- __annotations__ = {}¶
- __module__ = 'WORC.featureprocessing.Scalers'¶
- fit(X, y=None)[source]¶
Compute the mean and std to be used for later scaling.
Note: if over 80% of the features are excluded in robustness, we switch to the standardscaler, as otherwise all numbers will be NaN after scaling.
Parameters¶
- X{array-like, sparse matrix}, shape [n_samples, n_features]
The data used to compute the mean and standard deviation used for later scaling along the features axis.
- y
Ignored
- set_inverse_transform_request(*, copy: bool | None | str = '$UNCHANGED$') RobustStandardScaler¶
Request metadata passed to the inverse_transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- copy: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for copy parameter in inverse_transform.
Returns¶
- self: object
The updated object.
- set_partial_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') RobustStandardScaler¶
Request metadata passed to the partial_fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in partial_fit.
Returns¶
- self: object
The updated object.
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') RobustStandardScaler¶
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- copy: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for copy parameter in transform.
Returns¶
- self: object
The updated object.
- class WORC.featureprocessing.Scalers.WORCScaler(method='robust_z_score', skip_features=None, verbose=False)[source]¶
Bases: TransformerMixin, BaseEstimator
Scale features using an sklearn scaler.
Additionally, several features can be excluded. This is mostly useful when also using categorical features such as patient sex.
- __annotations__ = {}¶
- __init__(method='robust_z_score', skip_features=None, verbose=False)[source]¶
Initialize object.
Parameters¶
- method: string
Name of scaler used: robust_z_score, z_score, robust, or minmax
- skip_features: list of strings
If any of these elements occur as substring in a feature label, this feature is excluded.
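The substring rule for skip_features can be sketched as a split of the feature indices into those that are scaled and those that are left untouched. This only illustrates the rule as documented; the function name split_scaled_and_skipped and the example feature labels are hypothetical.

```python
def split_scaled_and_skipped(feature_labels, skip_features=None):
    """Return (indices to scale, indices to skip) based on substring matches."""
    skip_features = skip_features or []
    scale_indices, skip_indices = [], []
    for index, label in enumerate(feature_labels):
        # A feature is skipped as soon as any skip string occurs in its label.
        if any(skip in label for skip in skip_features):
            skip_indices.append(index)
        else:
            scale_indices.append(index)
    return scale_indices, skip_indices


labels = ["hf_mean", "semf_sex", "sf_volume"]
print(split_scaled_and_skipped(labels, skip_features=["semf_"]))
```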
- __module__ = 'WORC.featureprocessing.Scalers'¶
- set_fit_request(*, X_train: bool | None | str = '$UNCHANGED$', feature_labels: bool | None | str = '$UNCHANGED$') WORCScaler¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- X_train: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for X_train parameter in fit.
- feature_labels: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for feature_labels parameter in fit.
Returns¶
- self: object
The updated object.
- set_transform_request(*, X_test: bool | None | str = '$UNCHANGED$') WORCScaler¶
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- X_test: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for X_test parameter in transform.
Returns¶
- self: object
The updated object.
SelectGroups Module¶
- class WORC.featureprocessing.SelectGroups.SelectGroups(parameters, toolboxes=['PREDICT'])[source]¶
Bases: BaseEstimator, SelectorMixin
Object to fit feature selection based on the type group the feature belongs to. The label for the feature is used for this procedure.
The following groups can be selected, and are detected through looking for the following substrings in the feature label:
histogram_features: hf_
shape_features: sf_
orientation_features: of_
semantic_features: semf_
dicom_features: df_
phase_features: phasef_
vessel_features: vf_
texture_Gabor_features: Gabor
texture_GLCM_features: GLCM_
texture_GLCMMS_features: GLCMMS_
texture_GLRLM_features: GLRLM_
texture_GLSZM_features: GLSZM_
texture_GLDZM_features: GLDZM_
texture_NGTDM_features: NGTDM_
texture_NGLDM_features: NGLDM_
texture_LBP_features: LBP_
fractal_features: fracf_
location_features: locf_
RGRD_features: rgrdf_
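The group detection listed above can be sketched as a lookup from group name to substring, followed by a match against each feature label. This is an illustration, not the class implementation: the function name select_by_groups is hypothetical, and the settings in parameters are assumed to be plain booleans, which may differ from the actual configuration format.

```python
# Group name to substring, taken from the list above.
GROUP_SUBSTRINGS = {
    "histogram_features": "hf_",
    "shape_features": "sf_",
    "orientation_features": "of_",
    "semantic_features": "semf_",
    "dicom_features": "df_",
    "phase_features": "phasef_",
    "vessel_features": "vf_",
    "texture_Gabor_features": "Gabor",
    "texture_GLCM_features": "GLCM_",
    "texture_GLCMMS_features": "GLCMMS_",
    "texture_GLRLM_features": "GLRLM_",
    "texture_GLSZM_features": "GLSZM_",
    "texture_GLDZM_features": "GLDZM_",
    "texture_NGTDM_features": "NGTDM_",
    "texture_NGLDM_features": "NGLDM_",
    "texture_LBP_features": "LBP_",
    "fractal_features": "fracf_",
    "location_features": "locf_",
    "RGRD_features": "rgrdf_",
}


def select_by_groups(feature_labels, parameters):
    """Indices of features whose label contains the substring of an enabled group."""
    enabled = [
        substring
        for group, substring in GROUP_SUBSTRINGS.items()
        if parameters.get(group, False)  # assumed boolean setting
    ]
    return [
        index
        for index, label in enumerate(feature_labels)
        if any(substring in label for substring in enabled)
    ]


labels = ["hf_mean", "sf_area", "GLCM_contrast", "of_theta"]
settings = {"histogram_features": True, "texture_GLCM_features": True}
print(select_by_groups(labels, settings))
```

Note that plain substring matching, as used in this sketch, also matches df_ inside labels that contain rgrdf_, so a stricter rule such as prefix matching may be needed when both groups occur.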
- __abstractmethods__ = frozenset({})¶
- __annotations__ = {}¶
- __init__(parameters, toolboxes=['PREDICT'])[source]¶
Parameters¶
- parameters: dict, mandatory
Contains the settings for the groups to be selected. Should contain the settings for the following groups:
- histogram_features
- shape_features
- orientation_features
- semantic_features
- dicom_features
- coliage_features
- phase_features
- vessel_features
- texture_Gabor_features
- texture_GLCM_features
- texture_GLCMMS_features
- texture_GLRLM_features
- texture_GLSZM_features
- texture_GLDZM_features
- texture_NGTDM_features
- texture_NGLDM_features
- texture_LBP_features
- fractal_features
- location_features
- RGRD_features
Also, should contain a parameter for selecting per feature toolbox:
- PREDICT
- PyRadiomics
And a parameter to select whether transformations have been applied:
- original_features
- wavelet_features
- log_features
- __module__ = 'WORC.featureprocessing.SelectGroups'¶
- fit(feature_labels)[source]¶
Select only features specified by parameters per patient.
Parameters¶
- feature_labels: list, optional
Contains the labels of all features used. The index in this list will be used in the transform function to select features.
- set_fit_request(*, feature_labels: bool | None | str = '$UNCHANGED$') SelectGroups¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- feature_labels: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for feature_labels parameter in fit.
Returns¶
- self: object
The updated object.
- set_transform_request(*, inputarray: bool | None | str = '$UNCHANGED$') SelectGroups¶
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- inputarray: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for inputarray parameter in transform.
Returns¶
- self: object
The updated object.
SelectIndividuals Module¶
- class WORC.featureprocessing.SelectIndividuals.SelectIndividuals(parameters=['hf_mean', 'sf_compactness'])[source]¶
Bases: BaseEstimator, SelectorMixin
Object to fit feature selection based on the type group the feature belongs to. The label for the feature is used for this procedure.
- __abstractmethods__ = frozenset({})¶
- __annotations__ = {}¶
- __init__(parameters=['hf_mean', 'sf_compactness'])[source]¶
Parameters¶
- parameters: dict, mandatory
Contains the settings for the groups to be selected. Should contain the settings for the following groups: histogram_features, shape_features, orientation_features, semantic_features, patient_features, coliage_features, phase_features, vessel_features, log_features, texture_features.
- __module__ = 'WORC.featureprocessing.SelectIndividuals'¶
- fit(feature_labels)[source]¶
Select only features specified by parameters per patient.
Parameters¶
- feature_labels: list, optional
Contains the labels of all features used. The index in this list will be used in the transform function to select features.
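The fit and transform steps can be sketched in plain Python. This is an illustrative sketch, not the WORC implementation: ToySelectIndividuals is a hypothetical class, and it assumes that a feature is kept when any entry of parameters occurs as a substring of its label.

```python
class ToySelectIndividuals:
    """Hypothetical sketch of label-based selection of individual features."""

    def __init__(self, parameters=("hf_mean", "sf_compactness")):
        self.parameters = list(parameters)

    def fit(self, feature_labels):
        # Remember the positions of the labels that match a requested name.
        # Assumption: substring matching; exact matching is equally plausible.
        self.selected_ = [
            index
            for index, label in enumerate(feature_labels)
            if any(name in label for name in self.parameters)
        ]
        return self

    def transform(self, rows):
        # Keep only the remembered columns, one output row per patient.
        return [[row[index] for index in self.selected_] for row in rows]


labels = ["hf_mean", "hf_std", "sf_compactness", "tf_GLCM_contrast"]
selector = ToySelectIndividuals().fit(labels)
reduced = selector.transform([[1, 2, 3, 4], [5, 6, 7, 8]])
print(selector.selected_, reduced)
```

Storing indices during fit and applying them in transform mirrors the description above, where the position of a label in feature_labels determines which column is selected.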
- set_fit_request(*, feature_labels: bool | None | str = '$UNCHANGED$') SelectIndividuals¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- feature_labels: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for feature_labels parameter in fit.
Returns¶
- selfobject
The updated object.
- set_transform_request(*, inputarray: bool | None | str = '$UNCHANGED$') SelectIndividuals¶
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- inputarray: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for inputarray parameter in transform.
Returns¶
- selfobject
The updated object.
StatisticalTestFeatures Module¶
- WORC.featureprocessing.StatisticalTestFeatures.StatisticalTestFeatures(features, patientinfo, config, output_csv=None, output_png=None, output_tex=None, plot_test='MWU', Bonferonni=True, fontsize='small', yspacing=1, threshold=0.05, verbose=True, label_type=None)[source]¶
Perform several statistical tests on features, such as a Student's t-test.
Parameters¶
- features: string, mandatory
Contains the paths to all .hdf5 feature files used, formatted as modalityname1=file1,file2,file3,… modalityname2=file1,… Thus, modality names are always between a space and an equals sign, and files are separated by commas. We assume that the lists of files for each modality have the same length. Files at the same position in each list should belong to the same patient.
- patientinfo: string, mandatory
Path to a .txt file containing the patient label(s) and value(s) to be used for learning. See the GitHub Wiki for the format.
- config: string, mandatory
Path to a .ini file containing the parameters used for feature extraction. See the GitHub Wiki for the possible fields and their description.
# TODO: outputs
- verbose: boolean, default True
Whether to print the final feature values and labels to the command line.
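The format of the features string can be made concrete with a small parser. This is a minimal sketch under stated assumptions: parse_feature_string is a hypothetical helper, not a WORC function, and it assumes that file paths contain no spaces, commas, or equals signs.

```python
from typing import Dict, List


def parse_feature_string(features: str) -> Dict[str, List[str]]:
    """Split 'mod1=f1,f2 mod2=f3,f4' into {'mod1': [f1, f2], 'mod2': [f3, f4]}."""
    modalities: Dict[str, List[str]] = {}
    for chunk in features.split():
        # Each whitespace-separated chunk is 'modalityname=file1,file2,...'.
        name, separator, files = chunk.partition("=")
        if not separator or not name:
            raise ValueError(f"Expected 'modality=file1,file2', got {chunk!r}")
        modalities[name] = [path for path in files.split(",") if path]

    # Every modality must list the same number of files, one per patient.
    if len({len(paths) for paths in modalities.values()}) > 1:
        raise ValueError("All modalities must list the same number of files")
    return modalities


parsed = parse_feature_string("CT=pat1_ct.hdf5,pat2_ct.hdf5 MR=pat1_mr.hdf5,pat2_mr.hdf5")
print(parsed)
```

With this layout, the files for patient i are the i-th entries of every list, which is the pairing the parameter description relies on.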
StatisticalTestThreshold Module¶
- class WORC.featureprocessing.StatisticalTestThreshold.StatisticalTestThreshold(metric='ttest', threshold=0.05)[source]¶
Bases: BaseEstimator, SelectorMixin
Object to fit feature selection based on statistical tests.
- __abstractmethods__ = frozenset({})¶
- __annotations__ = {}¶
- __init__(metric='ttest', threshold=0.05)[source]¶
Parameters¶
- metric: string, default ‘ttest’
Statistical test used for selection. Options are ttest, Welch, Wilcoxon, and MannWhitneyU.
- threshold: float, default 0.05
Threshold for the p-value in order for a feature to be selected.
- __module__ = 'WORC.featureprocessing.StatisticalTestThreshold'¶
- fit(X_train, Y_train)[source]¶
Select only features specified by the metric and threshold per patient.
Parameters¶
- X_train: numpy array, mandatory
Array containing feature values used for model_selection. Number of objects on first axis, features on second axis.
- Y_train: numpy array, mandatory
Array containing the binary labels for each object in X_train.
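The selection rule can be illustrated with SciPy. This is a hedged sketch, not the WORC code: select_by_pvalue is a hypothetical helper, it assumes binary labels encoded as 0 and 1, and it assumes a feature is kept when its p-value falls below the threshold. The paired Wilcoxon option is omitted because it needs equally sized, paired groups.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind


def select_by_pvalue(X_train, Y_train, metric="ttest", threshold=0.05):
    """Return indices of features whose two-class p-value is below threshold."""
    X_train = np.asarray(X_train, dtype=float)
    Y_train = np.asarray(Y_train)
    group0 = X_train[Y_train == 0]  # objects with label 0
    group1 = X_train[Y_train == 1]  # objects with label 1

    selected = []
    for index in range(X_train.shape[1]):
        a, b = group0[:, index], group1[:, index]
        if metric == "ttest":
            pvalue = ttest_ind(a, b).pvalue
        elif metric == "Welch":
            pvalue = ttest_ind(a, b, equal_var=False).pvalue
        elif metric == "MannWhitneyU":
            pvalue = mannwhitneyu(a, b, alternative="two-sided").pvalue
        else:
            raise ValueError(f"Unsupported metric in this sketch: {metric}")
        if pvalue < threshold:
            selected.append(index)
    return selected


# Feature 0 separates the classes; feature 1 has identical values in both classes.
X = [[0.1, 1], [0.2, 2], [0.0, 3], [0.3, 4], [5.0, 1], [5.1, 2], [4.9, 3], [5.2, 4]]
Y = [0, 0, 0, 0, 1, 1, 1, 1]
selected = select_by_pvalue(X, Y)
print(selected)
```

Objects are on the first axis and features on the second, matching the layout described for X_train.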
- set_fit_request(*, X_train: bool | None | str = '$UNCHANGED$', Y_train: bool | None | str = '$UNCHANGED$') StatisticalTestThreshold¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- X_train: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for X_train parameter in fit.
- Y_train: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for Y_train parameter in fit.
Returns¶
- selfobject
The updated object.
- set_transform_request(*, inputarray: bool | None | str = '$UNCHANGED$') StatisticalTestThreshold¶
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- inputarray: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for inputarray parameter in transform.
Returns¶
- selfobject
The updated object.
VarianceThreshold Module¶
- class WORC.featureprocessing.VarianceThreshold.VarianceThresholdMean(threshold)[source]¶
Bases: BaseEstimator, SelectorMixin
Select features based on variance among objects. Similar to VarianceThreshold from sklearn, but takes the mean of the feature into account.
- __abstractmethods__ = frozenset({})¶
- __annotations__ = {}¶
- __module__ = 'WORC.featureprocessing.VarianceThreshold'¶
- set_fit_request(*, image_features: bool | None | str = '$UNCHANGED$') VarianceThresholdMean¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- image_features: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for image_features parameter in fit.
Returns¶
- selfobject
The updated object.
- set_transform_request(*, inputarray: bool | None | str = '$UNCHANGED$') VarianceThresholdMean¶
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- inputarray: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for inputarray parameter in transform.
Returns¶
- selfobject
The updated object.
- WORC.featureprocessing.VarianceThreshold.selfeat_variance(image_features, labels=None, thresh=0.99, method='nomean')[source]¶
Select features using a variance threshold.
Parameters¶
- image_features: numpy array, mandatory
Array containing the feature values to apply the variance threshold selection on. The rows correspond to the patients, the columns to the features.
- labels: numpy array, optional
Array containing the labels of the corresponding features. Array should therefore have the same shape as the image_features array.
- thresh: float, default 0.99
Threshold to be used as lower boundary for feature variance among patients.
- method: string, default ‘nomean’
Method to use for selection. Default: do not use the mean of the features. Other valid option is ‘mean’.
Returns¶
- image_features: numpy array
Transformed features array.
- labels: list or None
When labels are given, returns the transformed labels. That object contains a list of all label names kept.
- sel: VarianceThreshold object
The fitted variance threshold object.
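Variance-based selection can be sketched with the standard library alone. This is an illustrative sketch, not the WORC implementation: select_by_variance is a hypothetical helper that keeps a feature when its population variance exceeds thresh. For method 'mean' it assumes, purely for illustration, that the boundary is scaled by the absolute mean of the feature; the exact formula used by WORC is not stated here. It returns the kept column indices in place of a fitted selector object.

```python
from statistics import fmean, pvariance


def select_by_variance(rows, labels=None, thresh=0.99, method="nomean"):
    """Keep columns whose variance across patients exceeds a lower boundary."""
    if method not in ("nomean", "mean"):
        raise ValueError(f"Unknown method: {method}")

    columns = list(zip(*rows))  # one tuple of values per feature
    keep = []
    for index, column in enumerate(columns):
        boundary = thresh
        if method == "mean":
            # Assumption: scale the boundary by the feature's absolute mean.
            boundary = thresh * abs(fmean(column))
        if pvariance(column) > boundary:
            keep.append(index)

    features = [[row[index] for index in keep] for row in rows]
    kept_labels = None if labels is None else [labels[index] for index in keep]
    return features, kept_labels, keep


rows = [[1.0, 0.0, 10.0], [1.0, 2.0, 10.5], [1.0, 4.0, 9.5]]
names = ["constant", "spread", "large_mean"]
features, kept, keep = select_by_variance(rows, names, thresh=0.1)
_, kept_mean, _ = select_by_variance(rows, names, thresh=0.1, method="mean")
print(kept, kept_mean)
```

In this toy data the constant feature is always dropped, while the feature with a large mean and small spread survives only when the mean is ignored.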