beexai.training package

Submodules

beexai.training.models module

Architectures for neural networks.

class beexai.training.models.NNModel(input_dim: int, output_dim: int, task: str, n_neurons: int = 32, device: str = 'cpu', batch_norm: bool = True, use_dropout: bool = True, dropout_rate: float = 0.1, n_hidden_layers: int = 1)

Bases: NeuralNetwork

Inherits from NeuralNetwork to override the fit and predict methods.

output_dim

output dimension

Type:

int

device

device to use

Type:

str

fit()

fit the model

predict()

predict the output

predict_proba()

predict the output probabilities
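For a classification task, predict_proba returns per-class probabilities and predict returns one label per row. This reference does not say how NNModel derives them, so the sketch below assumes, without confirmation, that probabilities come from a softmax over the network's raw outputs and that predict picks the most probable class. It uses plain Python lists in place of torch tensors, and the function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert one row of raw network outputs into probabilities."""
    shift = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(batch_logits: List[List[float]]) -> List[List[float]]:
    """Assumed predict_proba behaviour (hypothetical): one probability vector per row."""
    return [softmax(row) for row in batch_logits]


def predict(batch_logits: List[List[float]]) -> List[int]:
    """Assumed predict behaviour (hypothetical): index of the most probable class per row."""
    return [max(range(len(p)), key=p.__getitem__) for p in predict_proba(batch_logits)]


if __name__ == "__main__":
    logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 3.0]]
    print(predict(logits))  # [0, 2]
```

Because softmax is monotonic, taking the argmax of the probabilities gives the same label as taking the argmax of the raw outputs.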

fit(x_train: DataFrame | ndarray | Tensor, y_train: DataFrame | ndarray | Tensor, learning_rate: float = 0.005, epochs: int = 1000, loss_file: str | None = None, x_val: DataFrame | ndarray | Tensor | None = None, y_val: DataFrame | ndarray | Tensor | None = None) → Any

predict(x_test: DataFrame | ndarray | Tensor) → Tensor

predict_proba(x_test: DataFrame | ndarray | Tensor) → Tensor

train_step(x_train: Tensor, y_train: Tensor, criterion: Any, optimizer: Any) → float

Train the model for one epoch.

Parameters:
  • x_train (torch.Tensor) – features

  • y_train (torch.Tensor) – labels

  • criterion (any) – loss function

  • optimizer (any) – optimizer

Returns:

loss

Return type:

float

val_step(x_val: DataFrame | ndarray | Tensor, y_val: DataFrame | ndarray | Tensor, criterion: Any) → float

Validate the model for one epoch.

Parameters:
  • x_val (torch.Tensor) – features

  • y_val (torch.Tensor) – labels

  • criterion (any) – loss function

Returns:

loss

Return type:

float
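train_step and val_step split each epoch into an update pass and an evaluation-only pass, and fit takes a learning rate, an epoch count and an optional validation set. The pure-Python sketch below mirrors that flow for a one-feature linear model with mean-squared-error loss. It is not beexai's code: a hand-written gradient step stands in for the criterion and optimizer objects (assumed here to be a torch loss and optimizer), and train_step, val_step and fit are hypothetical stand-ins.

```python
from typing import Dict, List, Tuple

Params = Tuple[float, float]  # (weight, bias) of the model y_hat = w * x + b


def train_step(params: Params, x: List[float], y: List[float], lr: float) -> Tuple[Params, float]:
    """Hypothetical: one full-batch gradient-descent update on the MSE loss.

    Returns the updated parameters and the loss measured before the update,
    as a typical PyTorch training step reports it.
    """
    w, b = params
    n = len(x)
    residuals = [w * xi + b - yi for xi, yi in zip(x, y)]
    loss = sum(r * r for r in residuals) / n
    grad_w = 2.0 * sum(r * xi for r, xi in zip(residuals, x)) / n  # d(loss)/dw
    grad_b = 2.0 * sum(residuals) / n  # d(loss)/db
    return (w - lr * grad_w, b - lr * grad_b), loss


def val_step(params: Params, x: List[float], y: List[float]) -> float:
    """Hypothetical: evaluation only, computes the MSE loss without touching the parameters."""
    w, b = params
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


def fit(x_train, y_train, x_val, y_val, lr: float = 0.05, epochs: int = 500):
    """Run train_step then val_step once per epoch and record both losses."""
    params: Params = (0.0, 0.0)
    history: Dict[str, List[float]] = {"train": [], "val": []}
    for _ in range(epochs):
        params, train_loss = train_step(params, x_train, y_train, lr)
        history["train"].append(train_loss)
        history["val"].append(val_step(params, x_val, y_val))
    return params, history


# Toy data drawn from y = 2x + 1.
params, history = fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], [4.0, 5.0], [9.0, 11.0])
```

The recorded per-epoch losses are the kind of curve a loss_file plot would show; keeping the validation pass free of parameter updates is what makes its loss an honest estimate.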

class beexai.training.models.NeuralNetwork(input_dim: int, output_dim: int, task: str, n_neurons: int = 32, batch_norm: bool = True, use_dropout: bool = True, dropout_rate: float = 0.1, n_hidden_layers: int = 1)

Bases: Module

Neural network class.

forward(x_in: Tensor) → Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
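The note above is inherited from PyTorch's torch.nn.Module documentation. The toy class below is not PyTorch: it is a self-contained analogue, with hypothetical names, showing why calling the instance differs from calling forward directly. Only __call__ runs the registered hooks.

```python
from typing import Callable, List


class ToyModule:
    """Hypothetical stand-in for torch.nn.Module's hook dispatch (illustrative only)."""

    def __init__(self) -> None:
        self._forward_hooks: List[Callable] = []

    def register_forward_hook(self, hook: Callable) -> None:
        # hook(module, input, output) runs after every forward pass made through __call__
        self._forward_hooks.append(hook)

    def forward(self, x: float) -> float:
        raise NotImplementedError  # subclasses define the computation

    def __call__(self, x: float) -> float:
        output = self.forward(x)
        for hook in self._forward_hooks:
            hook(self, x, output)
        return output


class Doubler(ToyModule):
    def forward(self, x: float) -> float:
        return 2 * x


calls: List[float] = []
model = Doubler()
model.register_forward_hook(lambda module, inp, out: calls.append(out))
model(3)          # runs forward, then the hook records 6
model.forward(4)  # bypasses the hook: nothing is recorded
```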

class beexai.training.models.NeuralNetworkBlock(n_neurons: int = 32, batch_norm: bool = True, use_dropout: bool = True, dropout_rate: float = 0.1)

Bases: Module

Neural network block class.

forward(x_in: Tensor) → Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
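NeuralNetworkBlock exposes batch_norm, use_dropout and dropout_rate, but this reference does not list the block's layers. The sketch below therefore only illustrates the two standard operations those flags presumably switch on: batch normalization of one feature across a batch, and inverted dropout, which zeroes each activation with probability dropout_rate during training and rescales the survivors by 1 / (1 - dropout_rate). It is plain Python, omits batch normalization's learnable scale and shift, and its function names are hypothetical.

```python
import random
from typing import List


def batch_norm(values: List[float], eps: float = 1e-5) -> List[float]:
    """Hypothetical: normalize one feature over the batch to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance, as in training mode
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def dropout(values: List[float], rate: float, training: bool, rng: random.Random) -> List[float]:
    """Hypothetical inverted dropout: active only in training mode."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    # Survivors are scaled up so the expected activation is unchanged.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(activations)
dropped = dropout(normalized, rate=0.1, training=True, rng=rng)
```

Rescaling during training (rather than at inference) is what lets evaluation mode return activations unchanged.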

beexai.training.train module

Training models and evaluating their performance.

class beexai.training.train.Trainer(model_name: str, task: str, model_params: dict | None = None, device: str = 'cpu')

Bases: object

Trainer class for training, cross-validating, evaluating, saving, and loading a model selected by name.

models

dictionary of available models

Type:

dict

model_name

name of the model

Type:

str

model_params

parameters of the model

Type:

dict

task

task to perform

Type:

str

device

device to use

Type:

str

model

model object

Type:

callable

cross_val()

cross validation for the model

train()

train the model

get_metrics()

get the metrics of the model

save_model()

save the model

load_model()

load the model

Parameters:
  • model_name (str) – Name of the model from models dict. Must be one of ‘LogisticRegression’, ‘LinearRegression’, ‘DecisionTreeClassifier’, ‘RandomForestClassifier’, ‘GradientBoostingClassifier’, ‘XGBClassifier’, ‘DecisionTreeRegressor’, ‘RandomForestRegressor’, ‘GradientBoostingRegressor’, ‘XGBRegressor’, ‘NeuralNetwork’, ‘HistGradientBoostingClassifier’, ‘HistGradientBoostingRegressor’

  • task (str) – “classification” or “regression”.

  • model_params (dict, optional) – Parameters for the model. Defaults to None.

  • device (str) – device to use. Defaults to “cpu”.
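The constructor restricts model_name to the names listed above and task to "classification" or "regression". A hypothetical validation helper (not part of beexai) could look like the following. Pairing each name with a task by its Classifier or Regressor suffix, and allowing NeuralNetwork for both tasks, are assumptions of this sketch.

```python
# Hypothetical registries, not part of beexai; the task pairing is an assumption.
CLASSIFICATION_MODELS = {
    "LogisticRegression", "DecisionTreeClassifier", "RandomForestClassifier",
    "GradientBoostingClassifier", "XGBClassifier", "HistGradientBoostingClassifier",
    "NeuralNetwork",
}
REGRESSION_MODELS = {
    "LinearRegression", "DecisionTreeRegressor", "RandomForestRegressor",
    "GradientBoostingRegressor", "XGBRegressor", "HistGradientBoostingRegressor",
    "NeuralNetwork",
}


def check_model_name(model_name: str, task: str) -> None:
    """Hypothetical helper: raise ValueError for an unknown task or an incompatible model."""
    allowed = {"classification": CLASSIFICATION_MODELS, "regression": REGRESSION_MODELS}
    if task not in allowed:
        raise ValueError(f"task must be 'classification' or 'regression', got {task!r}")
    if model_name not in allowed[task]:
        raise ValueError(f"{model_name!r} is not available for {task}")
```

Failing fast at construction time surfaces a misspelled name or a mismatched task before any training starts.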

cross_val(x_train: DataFrame, y_train: DataFrame, param_grid: dict | None = None, scoring: str | None = None, kfold: int | KFold = 5, search_type: str = 'grid') → Callable

Cross-validated hyperparameter search (grid or random) for the model.

Parameters:
  • x_train (pd.DataFrame) – train set

  • y_train (pd.DataFrame) – target

  • param_grid (dict, optional) – grid search parameters. Defaults to None.

  • scoring (str, optional) – scoring metric. Defaults to None.

  • kfold (Union[int, KFold], optional) – number of folds or kfold object. Defaults to 5.

  • search_type (str, optional) – “grid” or “random”. Defaults to “grid”.

Returns:

best model

Return type:

callable
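cross_val pairs a hyperparameter search with k-fold cross-validation. The plain-Python sketch below shows the grid-search case under stated assumptions: folds are contiguous and unshuffled, the score is negative mean-squared error so that higher is better, and it returns the best parameters rather than a fitted model. The helper names and the toy ridge model are hypothetical; the KFold type in the signature suggests beexai relies on scikit-learn, which this sketch deliberately does not use.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, validation indices)


def kfold_indices(n_samples: int, n_folds: int) -> List[Split]:
    """Contiguous, unshuffled folds; the first n_samples % n_folds folds get one extra sample."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0) for i in range(n_folds)]
    splits: List[Split] = []
    start = 0
    for size in sizes:
        stop = start + size
        val = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        splits.append((train, val))
        start = stop
    return splits


def expand_grid(param_grid: Dict[str, Sequence]) -> List[dict]:
    """Every combination of the listed values, as an exhaustive grid search would try."""
    keys = list(param_grid)
    return [dict(zip(keys, combo)) for combo in product(*(param_grid[k] for k in keys))]


def grid_search_cv(
    fit_and_score: Callable[[dict, List[int], List[int]], float],
    n_samples: int,
    param_grid: Dict[str, Sequence],
    n_folds: int = 5,
) -> Tuple[dict, float]:
    """Hypothetical: return the parameter set with the highest mean validation score."""
    best_params: dict = {}
    best_score = float("-inf")
    for params in expand_grid(param_grid):
        score = mean(fit_and_score(params, tr, va) for tr, va in kfold_indices(n_samples, n_folds))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy problem: y = 2x, fitted by ridge regression through the origin with penalty alpha.
xs = [float(i) for i in range(10)]
ys = [2.0 * x for x in xs]


def ridge_fit_and_score(params: dict, train: List[int], val: List[int]) -> float:
    slope = sum(xs[i] * ys[i] for i in train) / (sum(xs[i] ** 2 for i in train) + params["alpha"])
    return -mean((slope * xs[i] - ys[i]) ** 2 for i in val)  # negative MSE: higher is better


best_params, best_score = grid_search_cv(ridge_fit_and_score, len(xs), {"alpha": [0.0, 10.0, 100.0]})
```

On noise-free data the unpenalized fit (alpha = 0) wins; a random search would differ only in sampling a subset of expand_grid's combinations instead of trying them all.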

get_metrics(x: DataFrame, y: DataFrame) → dict

Get metrics for the model: accuracy and F1 score for classification, MSE and R² score for regression.

Parameters:
  • x (pd.DataFrame) – test set

  • y (pd.DataFrame) – target

Raises:

Exception – Task must be either classification or regression

Returns:

dictionary of metrics

Return type:

dict
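The four metrics named above have short closed forms. The plain-Python sketch below spells them out; unlike the documented get_metrics(x, y), this hypothetical version takes the task plus true and predicted values directly, and it computes a binary F1 with label 1 as the positive class (how beexai averages F1 over several classes is not stated in this reference).

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy and binary F1 = 2*TP / (2*TP + FP + FN) for the given positive label."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"accuracy": accuracy, "f1": f1}


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Mean squared error and R² = 1 - SS_res / SS_tot."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {"mse": ss_res / n, "r2": 1.0 - ss_res / ss_tot}


def get_metrics(task: str, y_true, y_pred) -> Dict[str, float]:
    """Hypothetical dispatcher, mirroring the documented Exception for an unknown task."""
    if task == "classification":
        return classification_metrics(y_true, y_pred)
    if task == "regression":
        return regression_metrics(y_true, y_pred)
    raise Exception("Task must be either classification or regression")
```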

load_model(path: str)

Load the model

save_model(path: str)

Save the model
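The on-disk format used by save_model and load_model is not documented in this reference. As an illustration only, a pickle round trip of an arbitrary Python object looks like this; save_object and load_object are hypothetical names, and a pickle file should only be loaded from a trusted source because unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile
from typing import Any


def save_object(obj: Any, path: str) -> None:
    """Hypothetical: serialize obj to path with pickle."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_object(path: str) -> Any:
    """Hypothetical: restore an object saved by save_object."""
    # Only load trusted files: unpickling can run arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)


model_state = {"model_name": "LinearRegression", "coef": [2.0], "intercept": 1.0}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    save_object(model_state, path)
    restored = load_object(path)
```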

train(x_train: DataFrame | ndarray | Tensor, y_train: DataFrame | ndarray | Tensor, learning_rate: float = 0.005, epochs: int = 1000, loss_file: str | None = None, x_val: DataFrame | ndarray | Tensor | None = None, y_val: DataFrame | ndarray | Tensor | None = None) → Callable

Perform training on the whole training set.

Parameters:
  • x_train (pd.DataFrame) – training features

  • y_train (pd.DataFrame) – training target

  • learning_rate (float, optional) – learning rate. Defaults to 0.005.

  • epochs (int, optional) – number of epochs. Defaults to 1000.

  • loss_file (str, optional) – path to save the loss plot. Defaults to None.

  • x_val (pd.DataFrame, optional) – validation set. Defaults to None.

  • y_val (pd.DataFrame, optional) – validation target. Defaults to None.

Returns:

trained model

Return type:

callable

beexai.training.train.grid_search_all_models(x_train: DataFrame, y_train: DataFrame, task: str, params_dict: dict | None = None, params_grid_dict: dict | None = None, scoring: str | None = None, kfold: int | KFold = 5, search_type: str = 'grid') → Tuple[dict, dict]

Hyperparameter search (grid or random) for all models

Parameters:
  • x_train (pd.DataFrame) – training features

  • y_train (pd.DataFrame) – training target

  • task (str) – “classification” or “regression”

  • params_dict (dict, optional) – parameters for each model. Defaults to None.

  • params_grid_dict (dict, optional) – grid search parameters for each model. Defaults to None.

  • scoring (str, optional) – scoring metric. Defaults to None.

  • kfold (Union[int, KFold], optional) – number of folds or KFold object. Defaults to 5.

  • search_type (str, optional) – “grid” or “random”. Defaults to “grid”.

Returns:

best models and best parameters

Return type:

Tuple[dict, dict]
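grid_search_all_models repeats the per-model search for every available model and collects the results in two dictionaries keyed by model name. A minimal sketch of that bookkeeping follows, assuming a caller-supplied score function in place of cross-validation. Unlike the documented function, it returns each model's best parameters and best score rather than fitted models, and the toy scorer and parameter grids are invented for illustration.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


def search_all_models(
    param_grids: Dict[str, Dict[str, Sequence]],
    score: Callable[[str, dict], float],
) -> Tuple[Dict[str, dict], Dict[str, float]]:
    """Hypothetical: for each model name, keep the parameter combination with the highest score."""
    best_params: Dict[str, dict] = {}
    best_scores: Dict[str, float] = {}
    for name, grid in param_grids.items():
        keys = list(grid)
        for combo in product(*(grid[k] for k in keys)):
            params = dict(zip(keys, combo))
            s = score(name, params)
            if name not in best_scores or s > best_scores[name]:
                best_params[name], best_scores[name] = params, s
    return best_params, best_scores


def toy_score(name: str, params: dict) -> float:
    """Invented scorer standing in for a cross-validated score."""
    if name == "DecisionTreeClassifier":
        return -abs(params["max_depth"] - 4)  # pretend depth 4 is best
    return -abs(params["n_estimators"] - 200) / 100  # pretend 200 trees is best


grids = {
    "DecisionTreeClassifier": {"max_depth": [2, 4, 8]},
    "RandomForestClassifier": {"n_estimators": [100, 200], "max_depth": [None, 8]},
}
params_by_model, scores_by_model = search_all_models(grids, toy_score)
```

Ties keep the first combination encountered, because the comparison is strict.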

beexai.training.train.test_all_models(task: str, x_train: DataFrame, x_test: DataFrame, y_train: DataFrame, y_test: DataFrame, params_dict: dict | None = None) → None

Train all models on the whole training set and evaluate them on the test set.

Parameters:
  • task (str) – “classification” or “regression”

  • x_train (pd.DataFrame) – train set

  • x_test (pd.DataFrame) – test set

  • y_train (pd.DataFrame) – train target

  • y_test (pd.DataFrame) – test target

  • params_dict (dict, optional) – parameters for each model. Defaults to None.
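test_all_models fits each model on the whole training set and evaluates it on the held-out test set. The sketch below shows that loop with two toy single-feature regressors standing in for the library's models. Unlike the documented function, which returns None, it also returns the test MSE per model so the results can be inspected; the model names and fit functions are hypothetical.

```python
from typing import Callable, Dict, List

Predictor = Callable[[float], float]


def fit_mean(x: List[float], y: List[float]) -> Predictor:
    """Hypothetical baseline: always predict the training mean."""
    mean_y = sum(y) / len(y)
    return lambda _x: mean_y


def fit_line(x: List[float], y: List[float]) -> Predictor:
    """Hypothetical model: ordinary least squares for a single feature."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return lambda v: slope * v + intercept


def evaluate_all(models, x_train, y_train, x_test, y_test) -> Dict[str, float]:
    """Fit every model on the full training set and report its test MSE."""
    results: Dict[str, float] = {}
    for name, fit in models.items():
        predict = fit(x_train, y_train)
        results[name] = sum((predict(a) - b) ** 2 for a, b in zip(x_test, y_test)) / len(x_test)
        print(f"{name}: test MSE = {results[name]:.4f}")
    return results


x_train, y_train = [0.0, 1.0, 2.0, 3.0], [1.0, 4.0, 7.0, 10.0]  # y = 3x + 1
x_test, y_test = [4.0, 5.0], [13.0, 16.0]
results = evaluate_all({"Mean": fit_mean, "Line": fit_line}, x_train, y_train, x_test, y_test)
```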

Module contents