PyEGRO GPR Module API Reference

This document provides detailed API reference for the Gaussian Process Regression (GPR) module in the PyEGRO package.

Table of Contents

MetaTraining Class

The MetaTraining class provides a high-level interface for training and managing Gaussian Process Regression models.

Constructor

MetaTraining(
    test_size=0.3,
    num_iterations=1000,
    prefer_gpu=True,
    show_progress=True,
    show_hardware_info=True,
    show_model_info=True,
    output_dir='RESULT_MODEL_GPR',
    data_dir='DATA_PREPARATION',
    data_info_file=None,
    data_training_file=None,
    kernel='matern15',
    learning_rate=0.01,
    patience=50
)

Parameters

  • test_size (float, optional): Fraction of data to use for testing if no test data is provided. Default: 0.3
  • num_iterations (int, optional): Number of training iterations. Default: 1000
  • prefer_gpu (bool, optional): Whether to use GPU if available. Default: True
  • show_progress (bool, optional): Whether to show detailed progress. Default: True
  • show_hardware_info (bool, optional): Whether to show system hardware info. Default: True
  • show_model_info (bool, optional): Whether to show model architecture info. Default: True
  • output_dir (str, optional): Directory for saving results. Default: 'RESULT_MODEL_GPR'
  • data_dir (str, optional): Directory containing input data. Default: 'DATA_PREPARATION'
  • data_info_file (str, optional): Path to data info JSON file. Default: None
  • data_training_file (str, optional): Path to training data CSV file. Default: None
  • kernel (str, optional): Kernel to use for GPR model. Options: 'matern25', 'matern15', 'matern05', 'rbf'. Default: 'matern15'
  • learning_rate (float, optional): Learning rate for optimizer. Default: 0.01
  • patience (int, optional): Number of iterations to wait for improvement before early stopping. Default: 50

Methods

train

Train the GPR model with more flexible data options.

train(X=None, y=None, X_test=None, y_test=None, feature_names=None, custom_data=False)
Parameters
  • X (numpy.ndarray, pandas.DataFrame, or str, optional): Training features or path to training data CSV. Default: None
  • y (numpy.ndarray, pandas.DataFrame, or pandas.Series, optional): Training targets (only needed if custom_data=True). Default: None
  • X_test (numpy.ndarray or pandas.DataFrame, optional): Test features. Default: None
  • y_test (numpy.ndarray, pandas.DataFrame, or pandas.Series, optional): Test targets. Default: None
  • feature_names (list of str, optional): List of feature names. Default: None
  • custom_data (bool, optional): Whether using custom data instead of loading from files. Default: False
Returns
  • model (GPRegressionModel): Trained model
  • scaler_X (StandardScaler): Feature scaler
  • scaler_y (StandardScaler): Target scaler

predict

Make predictions using the trained model.

predict(X)
Parameters
  • X (numpy.ndarray or pandas.DataFrame): Input features
Returns
  • mean (numpy.ndarray): Mean predictions
  • std (numpy.ndarray): Standard deviations of predictions

load_model

Load a trained model from disk.

load_model(model_path=None)
Parameters
  • model_path (str, optional): Path to the saved model file. If None, uses the default path. Default: None
Returns
  • model (GPRegressionModel): Loaded model

Print the learned hyperparameters of the model.

print_hyperparameters()

GPRegressionModel Class

The GPRegressionModel class implements a Gaussian Process Regression model with configurable kernels.

Constructor

GPRegressionModel(train_x, train_y, likelihood, kernel='matern15')

Parameters

  • train_x (torch.Tensor): Training input data
  • train_y (torch.Tensor): Training target data
  • likelihood (gpytorch.likelihoods.Likelihood): GP likelihood function
  • kernel (str, optional): Kernel type. Options: 'matern25', 'matern15', 'matern05', 'rbf'. Default: 'matern15'

Methods

forward

Computes the mean and covariance of the GP posterior.

forward(x)
Parameters
  • x (torch.Tensor): Input data
Returns
  • gpytorch.distributions.MultivariateNormal: Distribution with predicted mean and covariance

DeviceAgnosticGPR Class

The DeviceAgnosticGPR class provides a device-agnostic handler for GPR models, making it easier to use models on both CPU and GPU.

Constructor

DeviceAgnosticGPR(prefer_gpu=False)

Parameters

  • prefer_gpu (bool, optional): Whether to use GPU if available. Default: False

Methods

load_model

Load model using state dict approach.

load_model(model_dir='RESULT_MODEL_GPR')
Parameters
  • model_dir (str, optional): Directory containing the model files. Default: 'RESULT_MODEL_GPR'
Returns
  • bool: True if model was loaded successfully, False otherwise

predict

Make predictions with the loaded GPR model.

predict(X, batch_size=1000)
Parameters
  • X (numpy.ndarray): Input features (n_samples, n_features)
  • batch_size (int, optional): Batch size for processing large datasets. Default: 1000
Returns
  • Tuple of (mean_predictions, std_predictions) as numpy arrays

Visualization Functions

visualize_gpr

Create comprehensive visualizations for GPR model performance.

visualize_gpr(
    meta,
    X_train,
    y_train,
    X_test=None,
    y_test=None, 
    variable_names=None,
    bounds=None,
    savefig=False,
    output_dir=None
)

Parameters

  • meta (MetaTraining): Trained MetaTraining instance
  • X_train (numpy.ndarray or pandas.DataFrame): Training inputs
  • y_train (numpy.ndarray, pandas.DataFrame, or pandas.Series): Training targets
  • X_test (numpy.ndarray or pandas.DataFrame, optional): Test inputs. Default: None
  • y_test (numpy.ndarray, pandas.DataFrame, or pandas.Series, optional): Test targets. Default: None
  • variable_names (list of str, optional): Names of input variables. Default: None
  • bounds (numpy.ndarray, optional): Bounds of input variables for sampling. Default: None
  • savefig (bool, optional): Whether to save figures to disk. Default: False
  • output_dir (str, optional): Directory to save figures. Default: None, uses meta.output_dir

Returns

  • figures (dict): Dictionary of figure handles