PyEGRO GPR Examples¶

This document provides practical examples of using the PyEGRO GPR module for Gaussian Process Regression modeling tasks.

Table of Contents¶

Basic Usage
Working with Synthetic Data
Working with CSV Data
Custom Data Preparation
Visualization Examples
Model Loading and Prediction
Advanced Usage

Basic Usage¶

Here's a minimal example of training a GPR model with the PyEGRO module:

import numpy as np
from PyEGRO.meta.gpr import MetaTraining

# Generate some synthetic data
X = np.random.rand(100, 2)  # 100 samples, 2 features
y = np.sin(X[:, 0]) + np.cos(X[:, 1]) + 0.1 * np.random.randn(100)

# Initialize meta-training
meta = MetaTraining(
    num_iterations=500,
    prefer_gpu=True,
    kernel='matern15'
)

# Train the model
model, scaler_X, scaler_y = meta.train(X=X, y=y)

# Make predictions
X_new = np.random.rand(10, 2)
y_pred, y_std = meta.predict(X_new)

print("Predictions:", y_pred.flatten())
print("Uncertainties:", y_std.flatten())

Working with Synthetic Data¶

The following example demonstrates how to use the GPR module with synthetic data and visualize the results:

import numpy as np
import matplotlib.pyplot as plt
from PyEGRO.meta.gpr import MetaTraining
from PyEGRO.meta.gpr.visualization import visualize_gpr

# Set random seed for reproducibility
np.random.seed(42)

# Define a true function to sample from
def true_function(x):
    return x * np.sin(x)

# Generate synthetic data
# Training data
n_train = 30
X_train = np.random.uniform(0, 10, n_train).reshape(-1, 1)
y_train = true_function(X_train) + 0.5 * np.random.randn(n_train, 1)  # Add noise

# Testing data
n_test = 50
X_test = np.linspace(0, 12, n_test).reshape(-1, 1)
y_test = true_function(X_test) + 0.25 * np.random.randn(n_test, 1)  # Add less noise

# Define bounds for visualization
bounds = np.array([[0, 12]])
variable_names = ['x']

# Initialize and train GPR model
print("Training GPR model with synthetic data...")
meta = MetaTraining(
    num_iterations=500,
    prefer_gpu=True,
    show_progress=True,
    output_dir='RESULT_MODEL_GPR_SYNTHETIC',
    kernel='matern05',
    learning_rate=0.01,
    patience=50
)

# Train model with synthetic data
model, scaler_X, scaler_y = meta.train(
    X=X_train,
    y=y_train,
    X_test=X_test,
    y_test=y_test,
    feature_names=variable_names
)

# Generate visualization
figures = visualize_gpr(
    meta=meta,
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    variable_names=variable_names,
    bounds=bounds,
    savefig=True
)

# Display figures
plt.show()

Working with CSV Data¶

This example demonstrates how to use the GPR module with data stored in CSV files:

import numpy as np
import pandas as pd
import json
import os
from PyEGRO.meta.gpr import MetaTraining
from PyEGRO.meta.gpr.visualization import visualize_gpr

# Load initial data and problem configuration
with open('DATA_PREPARATION/data_info.json', 'r') as f:
    data_info = json.load(f)

# Load training data
training_data = pd.read_csv('DATA_PREPARATION/training_data.csv')

# Load testing data (if available)
test_data = pd.read_csv('DATA_PREPARATION/testing_data.csv')

# Get problem configuration
bounds = np.array(data_info['input_bound'])
variable_names = [var['name'] for var in data_info['variables']]

# Get target column name (default to 'y' if not specified)
target_column = data_info.get('target_column', 'y')

# Extract features and targets
X_train = training_data[variable_names].values
y_train = training_data[target_column].values.reshape(-1, 1)

# Extract testing data
X_test = test_data[variable_names].values
y_test = test_data[target_column].values.reshape(-1, 1)

# Initialize and train GPR model
print("Training GPR model with CSV data...")
meta = MetaTraining(
    num_iterations=500,
    prefer_gpu=True,
    show_progress=True,
    output_dir='RESULT_MODEL_GPR',
    kernel='matern05',
    learning_rate=0.01,
    patience=50
)

# Train model
model, scaler_X, scaler_y = meta.train(
    X=X_train,
    y=y_train,
    X_test=X_test,
    y_test=y_test,
    feature_names=variable_names
)

# Generate visualization
figures = visualize_gpr(
    meta=meta,
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    variable_names=variable_names,
    bounds=bounds,
    savefig=True
)

Alternatively, you can let the MetaTraining class load the data automatically:

# Initialize with data paths
meta = MetaTraining(
    data_dir='DATA_PREPARATION',
    data_info_file='DATA_PREPARATION/data_info.json',
    data_training_file='DATA_PREPARATION/training_data.csv',
    num_iterations=500,
    prefer_gpu=True,
    kernel='matern05'
)

# Train the model (it will load data from the specified files)
model, scaler_X, scaler_y = meta.train()

Custom Data Preparation¶

Example of preparing a data_info.json file:

import json
import numpy as np
import pandas as pd

# Sample data generation
np.random.seed(42)
n_samples = 100
X1 = np.random.uniform(0, 10, n_samples)
X2 = np.random.uniform(-5, 5, n_samples)
y = 2 * X1 + 3 * X2 + np.sin(X1) * np.cos(X2) + np.random.randn(n_samples) * 0.5

# Create a DataFrame
data = pd.DataFrame({
    'x1': X1,
    'x2': X2,
    'output': y
})

# Split into training (80%) and testing (20%)
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)

# Calculate bounds
x1_min, x1_max = data['x1'].min(), data['x1'].max()
x2_min, x2_max = data['x2'].min(), data['x2'].max()

# Create data_info.json
data_info = {
    "variables": [
        {"name": "x1", "type": "continuous"},
        {"name": "x2", "type": "continuous"}
    ],
    "input_bound": [
        [x1_min, x1_max],
        [x2_min, x2_max]
    ],
    "target_column": "output"
}

# Create DATA_PREPARATION directory
import os
os.makedirs("DATA_PREPARATION", exist_ok=True)

# Save files
with open("DATA_PREPARATION/data_info.json", "w") as f:
    json.dump(data_info, f, indent=4)

train_data.to_csv("DATA_PREPARATION/training_data.csv", index=False)
test_data.to_csv("DATA_PREPARATION/testing_data.csv", index=False)

print("Data preparation complete.")

Visualization Examples¶

Creating visualizations for 1D and 2D models:

import numpy as np
import matplotlib.pyplot as plt
from PyEGRO.meta.gpr import MetaTraining
from PyEGRO.meta.gpr.visualization import visualize_gpr

# 1D Example (using previous synthetic data example)
# ...

# Generate visualizations
figures = visualize_gpr(
    meta=meta,
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    variable_names=['x'],
    bounds=np.array([[0, 12]]),
    savefig=True,
    output_dir='visualizations/1d_model'
)

# 2D Example
n_samples = 100
X_train = np.random.rand(n_samples, 2) * np.array([10, 8])
y_train = np.sin(X_train[:, 0]) * np.cos(X_train[:, 1]) + 0.1 * np.random.randn(n_samples)
y_train = y_train.reshape(-1, 1)

meta_2d = MetaTraining(
    num_iterations=300,
    kernel='rbf',
    output_dir='RESULT_MODEL_GPR_2D'
)

model_2d, _, _ = meta_2d.train(X=X_train, y=y_train)

figures_2d = visualize_gpr(
    meta=meta_2d,
    X_train=X_train,
    y_train=y_train,
    variable_names=['x1', 'x2'],
    bounds=np.array([[0, 10], [0, 8]]),
    savefig=True,
    output_dir='visualizations/2d_model'
)

plt.show()

Model Loading and Prediction¶

Example of loading a trained model and making predictions:

import numpy as np
from PyEGRO.meta.gpr.gpr_utils import DeviceAgnosticGPR

# Initialize device-agnostic loader
gpr_loader = DeviceAgnosticGPR(prefer_gpu=True)

# Load the trained model
loaded = gpr_loader.load_model(model_dir='RESULT_MODEL_GPR')

if loaded:
    # Generate new input data for prediction
    X_new = np.random.rand(10, 2) * np.array([10, 8])

    # Make predictions
    y_pred, y_std = gpr_loader.predict(X_new)

    # Print results
    for i in range(len(X_new)):
        print(f"Input: {X_new[i]}, Prediction: {y_pred[i][0]:.4f} ± {y_std[i][0]:.4f}")
else:
    print("Failed to load model")

Alternatively, use the MetaTraining class to load and use the model:

from PyEGRO.meta.gpr import MetaTraining

# Initialize meta
meta = MetaTraining()

# Load model
meta.load_model('RESULT_MODEL_GPR/gpr_model.pth')

# Make predictions
X_new = np.random.rand(10, 2) * np.array([10, 8])
y_pred, y_std = meta.predict(X_new)

# Print results
for i in range(len(X_new)):
    print(f"Input: {X_new[i]}, Prediction: {y_pred[i][0]:.4f} ± {y_std[i][0]:.4f}")

# Print model hyperparameters
meta.print_hyperparameters()

Advanced Usage¶

Customizing Kernels¶

The GPR module supports different kernel types for different kinds of data:

# For smooth functions
meta_smooth = MetaTraining(kernel='rbf')

# For less smooth functions with continuous derivatives
meta_matern25 = MetaTraining(kernel='matern25')  # Matérn 5/2

# For functions with continuous first derivatives
meta_matern15 = MetaTraining(kernel='matern15')  # Matérn 3/2

# For continuous but non-differentiable functions
meta_matern05 = MetaTraining(kernel='matern05')  # Matérn 1/2

Batch Processing for Large Datasets¶

For larger datasets, you can use the DeviceAgnosticGPR class with batch processing:

import numpy as np
from PyEGRO.meta.gpr.gpr_utils import DeviceAgnosticGPR

# Generate a large dataset
n_samples = 10000
X_large = np.random.rand(n_samples, 5)  # 10,000 samples with 5 features

# Load a trained model
gpr = DeviceAgnosticGPR(prefer_gpu=True)
gpr.load_model('RESULT_MODEL_GPR')

# Make predictions with batch processing
y_pred, y_std = gpr.predict(X_large, batch_size=500)  # Process 500 samples at a time

print(f"Processed {n_samples} samples with shapes: {y_pred.shape}, {y_std.shape}")

Early Stopping and Learning Rate Scheduling¶

The MetaTraining class includes built-in early stopping and learning rate scheduling:

meta = MetaTraining(
    num_iterations=1000,  # Maximum iterations
    learning_rate=0.01,   # Initial learning rate
    patience=50           # Patience for early stopping
)

# The optimizer will reduce the learning rate when progress plateaus
# Early stopping will trigger if no improvement after 'patience' iterations
model, _, _ = meta.train(X=X_train, y=y_train)