beexai.training package

Submodules

beexai.training.models module

Architectures for neural networks.

class beexai.training.models.NNModel(input_dim: int, output_dim: int, task: str, n_neurons: int = 32, device: str = 'cpu', batch_norm: bool = True, use_dropout: bool = True, dropout_rate: float = 0.1, n_hidden_layers: int = 1)[source]

Bases: NeuralNetwork

Inherits from NeuralNetwork and provides the fit, predict, and predict_proba methods.

Attributes:

output_dim (int) – output dimension
device (str) – device to use

Methods:

fit() – fit the model
predict() – predict the output
predict_proba() – predict the output probabilities

fit(x_train: DataFrame | ndarray | Tensor, y_train: DataFrame | ndarray | Tensor, learning_rate: float = 0.005, epochs: int = 1000, loss_file: str | None = None, x_val: DataFrame | ndarray | Tensor | None = None, y_val: DataFrame | ndarray | Tensor | None = None) → Any[source]

predict(x_test: DataFrame | ndarray | Tensor) → Tensor[source]

predict_proba(x_test: DataFrame | ndarray | Tensor) → Tensor[source]

train_step(x_train: Tensor, y_train: Tensor, criterion: Any, optimizer: Any) → float[source]

Train the model for one epoch.

Parameters:

x_train (torch.Tensor) – features
y_train (torch.Tensor) – labels
criterion (any) – loss function
optimizer (any) – optimizer

Returns:

loss

Return type:

float

val_step(x_val: DataFrame | ndarray | Tensor, y_val: DataFrame | ndarray | Tensor, criterion: Any) → float[source]

Validate the model for one epoch.

Parameters:

x_val (torch.Tensor) – features
y_val (torch.Tensor) – labels
criterion (any) – loss function

Returns:

loss

Return type:

float
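The two steps above can be illustrated with a minimal PyTorch sketch. This is not the beexai source: the standalone functions, the full-batch update, the Adam optimizer, and the toy model are assumptions made for illustration. Only the argument names and the float return value follow the documented signatures.

```python
import torch
from torch import nn


def train_step(model: nn.Module, x_train: torch.Tensor, y_train: torch.Tensor,
               criterion, optimizer) -> float:
    """Run one full-batch optimisation pass and return the scalar loss."""
    model.train()                      # enable dropout and batch-norm updates
    optimizer.zero_grad()              # clear gradients from the previous step
    loss = criterion(model(x_train), y_train)
    loss.backward()                    # backpropagate
    optimizer.step()                   # update the weights
    return loss.item()


def val_step(model: nn.Module, x_val: torch.Tensor, y_val: torch.Tensor,
             criterion) -> float:
    """Compute the loss on held-out data without tracking gradients."""
    model.eval()                       # use inference behaviour for dropout / batch norm
    with torch.no_grad():
        loss = criterion(model(x_val), y_val)
    return loss.item()


# Toy data and model (hypothetical, for demonstration only).
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
x, y = torch.randn(16, 4), torch.randn(16, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)

train_loss = train_step(model, x, y, criterion, optimizer)
val_loss = val_step(model, x, y, criterion)
```

Switching between model.train() and model.eval() matters here because the documented architecture can include batch normalisation and dropout, which behave differently in the two modes.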

class beexai.training.models.NeuralNetwork(input_dim: int, output_dim: int, task: str, n_neurons: int = 32, batch_norm: bool = True, use_dropout: bool = True, dropout_rate: float = 0.1, n_hidden_layers: int = 1)[source]

Bases: Module

Neural network class.

forward(x_in: Tensor) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Define the forward pass in this method, but call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() silently skips them.

class beexai.training.models.NeuralNetworkBlock(n_neurons: int = 32, batch_norm: bool = True, use_dropout: bool = True, dropout_rate: float = 0.1)[source]

Bases: Module

Neural network block class.

forward(x_in: Tensor) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Define the forward pass in this method, but call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() silently skips them.
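The note on hooks applies to any torch.nn.Module. As a small illustration, the sketch below uses a plain nn.Linear layer (a stand-in, not a beexai class) to show that a forward hook fires when the instance is called but not when forward() is invoked directly.

```python
import torch
from torch import nn

layer = nn.Linear(3, 2)
calls = []

# A forward hook receives (module, inputs, output) after each forward pass.
handle = layer.register_forward_hook(
    lambda mod, inp, out: calls.append(tuple(out.shape))
)

x = torch.zeros(5, 3)
layer(x)            # goes through __call__, so the hook fires
layer.forward(x)    # bypasses __call__, so the hook is skipped
handle.remove()     # detach the hook once it is no longer needed

print(calls)        # [(5, 2)]
```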

beexai.training.train module

Training models and evaluating their performance.

class beexai.training.train.Trainer(model_name: str, task: str, model_params: dict | None = None, device: str = 'cpu')[source]

Bases: object

Trainer class

Attributes:

models (dict) – dictionary of available models
model_name (str) – name of the model
model_params (dict) – parameters of the model
task (str) – task to perform
device (str) – device to use
model (callable) – model object

Methods:

cross_val() – cross validation for the model
train() – train the model
get_metrics() – get the metrics of the model
save_model() – save the model
load_model() – load the model

Parameters:

model_name (str) – Name of the model from models dict. Must be one of ‘LogisticRegression’, ‘LinearRegression’, ‘DecisionTreeClassifier’, ‘RandomForestClassifier’, ‘GradientBoostingClassifier’, ‘XGBClassifier’, ‘DecisionTreeRegressor’, ‘RandomForestRegressor’, ‘GradientBoostingRegressor’, ‘XGBRegressor’, ‘NeuralNetwork’, ‘HistGradientBoostingClassifier’, ‘HistGradientBoostingRegressor’
task (str) – “classification” or “regression”.
model_params (dict) – Parameters for the model
device (str) – device to use. Defaults to “cpu”.
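One plausible way to resolve model_name against a dictionary of available models is a name-to-class registry. The sketch below is an assumption, not the beexai implementation: MODELS, build_model, and the choice to raise ValueError are hypothetical, and only a scikit-learn subset of the documented names is included.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical registry: a subset of the documented names mapped to
# scikit-learn estimator classes. The real `models` dict may differ.
MODELS = {
    "LogisticRegression": LogisticRegression,
    "LinearRegression": LinearRegression,
    "DecisionTreeClassifier": DecisionTreeClassifier,
    "DecisionTreeRegressor": DecisionTreeRegressor,
    "RandomForestClassifier": RandomForestClassifier,
    "RandomForestRegressor": RandomForestRegressor,
}


def build_model(model_name, model_params=None):
    """Instantiate the estimator registered under model_name."""
    if model_name not in MODELS:
        raise ValueError(f"Unknown model: {model_name!r}")
    # Treat a missing parameter dict as "use the estimator defaults".
    return MODELS[model_name](**(model_params or {}))


model = build_model("RandomForestClassifier", {"n_estimators": 10})
```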

cross_val(x_train: DataFrame, y_train: DataFrame, param_grid: dict | None = None, scoring: str | None = None, kfold: int | KFold = 5, search_type: str = 'grid') → Callable[source]

Run a cross-validated hyperparameter search (grid or random) for the model.

Parameters:

x_train (pd.DataFrame) – train set
y_train (pd.DataFrame) – target
param_grid (dict, optional) – grid search parameters. Defaults to None.
scoring (str, optional) – scoring metric. Defaults to None.
kfold (Union[int, KFold], optional) – number of folds or kfold object. Defaults to 5.
search_type (str, optional) – “grid” or “random”. Defaults to “grid”.

Returns:

best model

Return type:

callable
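The search_type and kfold parameters map naturally onto scikit-learn's search classes. The following is a minimal sketch under that assumption; search_best is a hypothetical helper, n_iter=2 is an arbitrary choice, and the actual cross_val method may be implemented differently.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, RandomizedSearchCV


def search_best(model, x_train, y_train, param_grid, scoring=None,
                kfold=5, search_type="grid"):
    """Return the best estimator found by a grid or random search."""
    if search_type == "grid":
        search = GridSearchCV(model, param_grid, scoring=scoring, cv=kfold)
    elif search_type == "random":
        # n_iter=2 is an arbitrary illustrative budget.
        search = RandomizedSearchCV(model, param_grid, n_iter=2,
                                    scoring=scoring, cv=kfold, random_state=0)
    else:
        raise ValueError("search_type must be 'grid' or 'random'")
    search.fit(x_train, y_train)
    return search.best_estimator_      # refit on the full training set


# Synthetic data (hypothetical, for demonstration only).
x, y = make_classification(n_samples=100, n_features=5, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)   # kfold may also be an int
grid = {"C": [0.1, 1.0, 10.0]}

best_grid = search_best(LogisticRegression(max_iter=200), x, y, grid,
                        scoring="accuracy", kfold=cv)
best_random = search_best(LogisticRegression(max_iter=200), x, y, grid,
                          scoring="accuracy", kfold=5, search_type="random")
```

Passing a KFold object instead of an integer gives control over shuffling and the random seed, which is the usual reason to accept both types.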

get_metrics(x: DataFrame, y: DataFrame) → dict[source]

Compute metrics for the model: accuracy and F1 score for classification, MSE and R² score for regression.

Parameters:

x (pd.DataFrame) – test set
y (pd.DataFrame) – target

Raises:

Exception – if task is neither “classification” nor “regression”

Returns:

dictionary of metrics

Return type:

dict
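The metric selection described above could be sketched with scikit-learn as follows. The function name compute_metrics, the dictionary keys, and the macro averaging for F1 are assumptions for illustration; the keys and averaging used by get_metrics are not stated in this reference.

```python
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error, r2_score


def compute_metrics(task, y_true, y_pred):
    """Return task-appropriate metrics as a dictionary."""
    if task == "classification":
        return {
            "accuracy": accuracy_score(y_true, y_pred),
            "f1": f1_score(y_true, y_pred, average="macro"),  # averaging is an assumption
        }
    if task == "regression":
        return {
            "mse": mean_squared_error(y_true, y_pred),
            "r2": r2_score(y_true, y_pred),
        }
    raise Exception("Task must be either classification or regression")


clf_metrics = compute_metrics("classification", [0, 1, 1, 0], [0, 1, 0, 0])
reg_metrics = compute_metrics("regression", [1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```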

load_model(path: str)[source]: Load the model

save_model(path: str)[source]: Save the model

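train(x_train: DataFrame, y_train: DataFrame, learning_rate: float = 0.005, epochs: int = 1000, loss_file: str | None = None, x_val: DataFrame | None = None, y_val: DataFrame | None = None) → Callable[source]

Perform training on the whole training set.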

Parameters:

x_train (pd.DataFrame) – training features
y_train (pd.DataFrame) – training target
learning_rate (float, optional) – learning rate. Defaults to 0.005.
epochs (int, optional) – number of epochs. Defaults to 1000.
loss_file (str, optional) – path to save the loss plot. Defaults to None.
x_val (pd.DataFrame, optional) – validation set. Defaults to None.
y_val (pd.DataFrame, optional) – validation target. Defaults to None.

Returns:

trained model

Return type:

callable

beexai.training.train.grid_search_all_models(x_train: DataFrame, y_train: DataFrame, task: str, params_dict: dict | None = None, params_grid_dict: dict | None = None, scoring: str | None = None, kfold: int | KFold = 5, search_type: str = 'grid') → Tuple[dict, dict][source]

Run a hyperparameter search (grid or random) for all models.

Parameters:

x_train (pd.DataFrame) – training features
y_train (pd.DataFrame) – training target
task (str) – “classification” or “regression”
params_dict (dict, optional) – parameters for each model. Defaults to None.
params_grid_dict (dict, optional) – grid search parameters for each model. Defaults to None.
scoring (str, optional) – scoring metric. Defaults to None.
kfold (Union[int, KFold], optional) – number of folds or kfold object. Defaults to 5.
search_type (str, optional) – “grid” or “random”. Defaults to “grid”.

Returns:

best models and best parameters

Return type:

Tuple[dict, dict]
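The Tuple[dict, dict] return value suggests one dictionary of fitted models and one of selected parameters, both keyed by model name. The sketch below shows that shape with scikit-learn's GridSearchCV; search_all, the two-model registry, and the parameter grids are hypothetical and stand in for whatever the library iterates over.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor


def search_all(models, x_train, y_train, params_grid_dict, scoring=None, kfold=5):
    """Grid-search every model; return (best models, best parameters) by name."""
    best_models, best_params = {}, {}
    for name, model in models.items():
        search = GridSearchCV(model, params_grid_dict[name],
                              scoring=scoring, cv=kfold)
        search.fit(x_train, y_train)
        best_models[name] = search.best_estimator_
        best_params[name] = search.best_params_
    return best_models, best_params


# Synthetic regression data and a two-model registry (illustrative only).
x, y = make_regression(n_samples=80, n_features=4, noise=0.1, random_state=0)
models = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
}
grids = {
    "LinearRegression": {"fit_intercept": [True, False]},
    "DecisionTreeRegressor": {"max_depth": [2, 4]},
}

best_models, best_params = search_all(models, x, y, grids)
```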

beexai.training.train.test_all_models(task: str, x_train: DataFrame, x_test: DataFrame, y_train: DataFrame, y_test: DataFrame, params_dict: dict | None = None) → None[source]

Train and test all models on the whole training set

Parameters:

task (str) – “classification” or “regression”
x_train (pd.DataFrame) – train set
x_test (pd.DataFrame) – test set
y_train (pd.DataFrame) – train target
y_test (pd.DataFrame) – test target
params_dict (dict, optional) – parameters for each model. Defaults to None.
