PyEGRO Co-Kriging Examples¶
This document provides practical examples of using the PyEGRO Co-Kriging module for multi-fidelity modeling tasks.
Table of Contents¶
- Introduction to Co-Kriging
- Installation
- Basic Usage
- Working with Synthetic Data
- Working with CSV Data
- 2D Examples
- Visualization Examples
- Model Loading and Prediction
- Advanced Usage
Introduction to Co-Kriging¶
Co-Kriging is a multi-fidelity modeling approach that combines data from different fidelity levels, typically: - Low-fidelity data: Larger dataset that is cheaper to obtain but less accurate - High-fidelity data: Smaller dataset that is more expensive to obtain but more accurate
The Kennedy & O'Hagan (2000) approach used in this module creates a statistical relationship between the fidelity levels, allowing for more accurate predictions with fewer high-fidelity samples than would be needed with standard Gaussian Process Regression.
Installation¶
The PyEGRO Co-Kriging module depends on the following packages: - numpy - pandas - torch - gpytorch - scikit-learn - matplotlib - joblib - rich (for enhanced progress displays)
Ensure these dependencies are installed before using the module:
Basic Usage¶
Here's a minimal example of training a Co-Kriging model with the PyEGRO module:
import numpy as np
from PyEGRO.meta.cokriging import MetaTrainingCoKriging
# Define high and low fidelity functions (for demonstration)
def high_fidelity_function(x):
return np.sin(8 * x) + 0.2 * x
def low_fidelity_function(x):
return 0.5 * np.sin(8 * x) + 0.15 * x + 0.5
# Generate synthetic data
n_high = 15 # Fewer high-fidelity points
n_low = 50 # More low-fidelity points
X_high = np.random.uniform(0, 1, n_high).reshape(-1, 1)
y_high = high_fidelity_function(X_high) + 0.05 * np.random.randn(n_high, 1)
X_low = np.random.uniform(0, 1, n_low).reshape(-1, 1)
y_low = low_fidelity_function(X_low) + 0.1 * np.random.randn(n_low, 1)
# Initialize meta-training for Co-Kriging
meta = MetaTrainingCoKriging(
num_iterations=300,
prefer_gpu=True,
kernel='matern25'
)
# Train the model
model, scaler_X, scaler_y = meta.train(
X_low=X_low,
y_low=y_low,
X_high=X_high,
y_high=y_high
)
# Make predictions
X_new = np.linspace(0, 1, 100).reshape(-1, 1)
y_pred_high, y_std_high = meta.predict(X_new, fidelity='high')
y_pred_low, y_std_low = meta.predict(X_new, fidelity='low')
print("Co-Kriging Model trained successfully")
Working with Synthetic Data¶
The following example demonstrates how to use the Co-Kriging module with synthetic data and visualize the results:
import numpy as np
import matplotlib.pyplot as plt
from PyEGRO.meta.cokriging import MetaTrainingCoKriging
from PyEGRO.meta.cokriging.visualization import visualize_cokriging
# Set random seed for reproducibility
np.random.seed(42)
# Define high and low fidelity functions
def high_fidelity_function(x):
return (6*x - 2)**2 * np.sin(12*x - 4)
def low_fidelity_function(x):
return 0.5 * high_fidelity_function(x) + 10 * (x - 0.5) - 5
# Generate synthetic data
# Low fidelity: more samples but less accurate
n_low = 80
X_low = np.random.uniform(0, 1, n_low).reshape(-1, 1)
y_low = low_fidelity_function(X_low) + np.random.normal(0, 1.0, X_low.shape) # More noise
# High fidelity: fewer samples but more accurate
n_high = 20
X_high = np.random.uniform(0, 1, n_high).reshape(-1, 1)
y_high = high_fidelity_function(X_high) + np.random.normal(0, 0.5, X_high.shape) # Less noise
# Test data
n_test = 40
X_test = np.linspace(0, 1, n_test).reshape(-1, 1)
y_test = high_fidelity_function(X_test) + np.random.normal(0, 0.3, X_test.shape) # Even less noise
# Define bounds and variable names
bounds = np.array([[0, 1]])
variable_names = ['x']
# Initialize and train Co-Kriging model
print("Training Co-Kriging model with synthetic data...")
meta = MetaTrainingCoKriging(
num_iterations=300,
prefer_gpu=True,
show_progress=True,
output_dir='RESULT_MODEL_COKRIGING_SYNTHETIC'
)
# Train model with synthetic data
model, scaler_X, scaler_y = meta.train(
X_low=X_low,
y_low=y_low,
X_high=X_high,
y_high=y_high,
X_test=X_test,
y_test=y_test,
feature_names=variable_names
)
# Generate visualization
figures = visualize_cokriging(
meta=meta,
X_low=X_low,
y_low=y_low,
X_high=X_high,
y_high=y_high,
X_test=X_test,
y_test=y_test,
variable_names=variable_names,
bounds=bounds,
savefig=True
)
# Display figures
plt.show()
Working with CSV Data¶
This example demonstrates how to use the Co-Kriging module with data stored in CSV files:
import numpy as np
import pandas as pd
import json
import os
from PyEGRO.meta.cokriging import MetaTrainingCoKriging
from PyEGRO.meta.cokriging.visualization import visualize_cokriging
# Load initial data and problem configuration
with open('DATA_PREPARATION/data_info.json', 'r') as f:
data_info = json.load(f)
# Load high and low fidelity training data
high_fidelity_data = pd.read_csv('DATA_PREPARATION/training_data_high.csv')
low_fidelity_data = pd.read_csv('DATA_PREPARATION/training_data_low.csv')
# Load testing data
test_data = pd.read_csv('DATA_PREPARATION/testing_data.csv')
# Get problem configuration
bounds = np.array(data_info['input_bound'])
variable_names = [var['name'] for var in data_info['variables']]
# Get target column name (default to 'y' if not specified)
target_column = data_info.get('target_column', 'y')
# Extract features and targets
X_high = high_fidelity_data.drop([target_column], axis=1, errors='ignore').values
y_high = high_fidelity_data[target_column].values.reshape(-1, 1)
X_low = low_fidelity_data.drop([target_column], axis=1, errors='ignore').values
y_low = low_fidelity_data[target_column].values.reshape(-1, 1)
# Extract testing data
X_test = test_data.drop([target_column], axis=1, errors='ignore').values
y_test = test_data[target_column].values.reshape(-1, 1)
# Initialize and train model
print("Training Co-Kriging model with CSV data...")
meta = MetaTrainingCoKriging(
num_iterations=300,
prefer_gpu=True,
show_progress=True,
output_dir='RESULT_MODEL_COKRIGING'
)
# Train model
model, scaler_X, scaler_y = meta.train(
X_low=X_low,
y_low=y_low,
X_high=X_high,
y_high=y_high,
X_test=X_test,
y_test=y_test,
feature_names=variable_names
)
# Generate visualization
figures = visualize_cokriging(
meta=meta,
X_low=X_low,
y_low=y_low,
X_high=X_high,
y_high=y_high,
X_test=X_test,
y_test=y_test,
variable_names=variable_names,
bounds=bounds,
savefig=True
)
Custom Data Preparation¶
Example of preparing data files for Co-Kriging:
import json
import numpy as np
import pandas as pd
import os
# Define fidelity functions
def high_fidelity_function(x1, x2):
return np.sin(x1) * np.cos(x2) + 0.2 * x1 * x2
def low_fidelity_function(x1, x2):
return 0.5 * high_fidelity_function(x1, x2) + 0.2 * x1 - 0.1 * x2 + 0.5
# Generate data
np.random.seed(42)
# High-fidelity data (fewer samples)
n_high = 30
X1_high = np.random.uniform(-2, 2, n_high)
X2_high = np.random.uniform(-2, 2, n_high)
y_high = high_fidelity_function(X1_high, X2_high) + np.random.normal(0, 0.05, n_high)
# Low-fidelity data (more samples)
n_low = 100
X1_low = np.random.uniform(-2, 2, n_low)
X2_low = np.random.uniform(-2, 2, n_low)
y_low = low_fidelity_function(X1_low, X2_low) + np.random.normal(0, 0.1, n_low)
# Test data
n_test = 50
X1_test = np.random.uniform(-2, 2, n_test)
X2_test = np.random.uniform(-2, 2, n_test)
y_test = high_fidelity_function(X1_test, X2_test) + np.random.normal(0, 0.05, n_test)
# Create DataFrames
high_fidelity_df = pd.DataFrame({
'x1': X1_high,
'x2': X2_high,
'y': y_high
})
low_fidelity_df = pd.DataFrame({
'x1': X1_low,
'x2': X2_low,
'y': y_low
})
test_df = pd.DataFrame({
'x1': X1_test,
'x2': X2_test,
'y': y_test
})
# Create data_info.json
data_info = {
"variables": [
{"name": "x1", "type": "continuous"},
{"name": "x2", "type": "continuous"}
],
"input_bound": [
[-2, 2],
[-2, 2]
],
"target_column": "y"
}
# Create directory
os.makedirs("DATA_PREPARATION", exist_ok=True)
# Save files
with open("DATA_PREPARATION/data_info.json", "w") as f:
json.dump(data_info, f, indent=4)
high_fidelity_df.to_csv("DATA_PREPARATION/training_data_high.csv", index=False)
low_fidelity_df.to_csv("DATA_PREPARATION/training_data_low.csv", index=False)
test_df.to_csv("DATA_PREPARATION/testing_data.csv", index=False)
print("Data preparation complete for Co-Kriging.")
2D Examples¶
Example of using Co-Kriging with 2D inputs:
import numpy as np
import matplotlib.pyplot as plt
from PyEGRO.meta.cokriging import MetaTrainingCoKriging
from PyEGRO.meta.cokriging.visualization import visualize_cokriging
# Define high and low fidelity 2D functions
def high_fidelity_function(x1, x2):
return np.sin(x1) * np.cos(x2) + 0.2 * x1 * x2
def low_fidelity_function(x1, x2):
return 0.5 * high_fidelity_function(x1, x2) + 0.2 * x1 - 0.1 * x2 + 0.5
# Generate synthetic data
# Low fidelity: more samples but less accurate
n_low = 100
X1_low = np.random.uniform(-2, 2, n_low)
X2_low = np.random.uniform(-2, 2, n_low)
X_low = np.column_stack([X1_low, X2_low])
y_low = low_fidelity_function(X1_low, X2_low).reshape(-1, 1) + np.random.normal(0, 0.1, (n_low, 1))
# High fidelity: fewer samples but more accurate
n_high = 25
X1_high = np.random.uniform(-2, 2, n_high)
X2_high = np.random.uniform(-2, 2, n_high)
X_high = np.column_stack([X1_high, X2_high])
y_high = high_fidelity_function(X1_high, X2_high).reshape(-1, 1) + np.random.normal(0, 0.05, (n_high, 1))
# Test data: grid for visualization
n_test = 16
X1_test = np.linspace(-2, 2, 4)
X2_test = np.linspace(-2, 2, 4)
X1_grid, X2_grid = np.meshgrid(X1_test, X2_test)
X1_test = X1_grid.flatten()
X2_test = X2_grid.flatten()
X_test = np.column_stack([X1_test, X2_test])
y_test = high_fidelity_function(X1_test, X2_test).reshape(-1, 1) + np.random.normal(0, 0.02, (n_test, 1))
# Define bounds and variable names
bounds = np.array([[-2, 2], [-2, 2]])
variable_names = ['x1', 'x2']
# Initialize and train Co-Kriging model
print("Training Co-Kriging model with 2D synthetic data...")
meta = MetaTrainingCoKriging(
num_iterations=300,
prefer_gpu=True,
show_progress=True,
output_dir='RESULT_MODEL_COKRIGING_2D'
)
# Train model with synthetic data
model, scaler_X, scaler_y = meta.train(
X_low=X_low,
y_low=y_low,
X_high=X_high,
y_high=y_high,
X_test=X_test,
y_test=y_test,
feature_names=variable_names
)
# Generate visualization
figures = visualize_cokriging(
meta=meta,
X_low=X_low,
y_low=y_low,
X_high=X_high,
y_high=y_high,
X_test=X_test,
y_test=y_test,
variable_names=variable_names,
bounds=bounds,
savefig=True
)
plt.show()
Visualization Examples¶
The Co-Kriging module provides comprehensive visualization tools, accessible through the visualize_cokriging function:
from PyEGRO.meta.cokriging.visualization import visualize_cokriging
# After training a model (continuing from previous examples)
figures = visualize_cokriging(
meta=meta,
X_low=X_low,
y_low=y_low,
X_high=X_high,
y_high=y_high,
X_test=X_test,
y_test=y_test,
variable_names=variable_names,
bounds=bounds,
savefig=True,
output_dir='visualization_results'
)
# Access individual figures
actual_vs_predicted = figures['actual_vs_predicted']
r2_comparison = figures['r2_comparison']
response_surface = figures['response_surface']
# Customize and save a specific figure
import matplotlib.pyplot as plt
fig = figures['response_surface']
fig.suptitle('Custom Title for Response Surface')
fig.savefig('custom_response_surface.png', dpi=300)
Model Loading and Prediction¶
Example of loading a trained model and making predictions:
import numpy as np
from PyEGRO.meta.cokriging.cokriging_utils import DeviceAgnosticCoKriging
# Initialize device-agnostic loader
cokriging_loader = DeviceAgnosticCoKriging(prefer_gpu=True)
# Load the trained model
loaded = cokriging_loader.load_model(model_dir='RESULT_MODEL_COKRIGING')
if loaded:
# Generate new input data for prediction
X_new = np.random.rand(10, 2) * 4 - 2 # Values between -2 and 2
# Make high-fidelity predictions
y_pred_high, y_std_high = cokriging_loader.predict(X_new, fidelity='high')
# Make low-fidelity predictions
y_pred_low, y_std_low = cokriging_loader.predict(X_new, fidelity='low')
# Print results
print("High-fidelity predictions:")
for i in range(len(X_new)):
print(f"Input: {X_new[i]}, Prediction: {y_pred_high[i][0]:.4f} ± {y_std_high[i][0]:.4f}")
print("\nLow-fidelity predictions:")
for i in range(len(X_new)):
print(f"Input: {X_new[i]}, Prediction: {y_pred_low[i][0]:.4f} ± {y_std_low[i][0]:.4f}")
else:
print("Failed to load model")
Alternatively, use the MetaTrainingCoKriging class to load and use the model:
from PyEGRO.meta.cokriging import MetaTrainingCoKriging
# Initialize meta
meta = MetaTrainingCoKriging()
# Load model
meta.load_model('RESULT_MODEL_COKRIGING/cokriging_model.pth')
# Make predictions
X_new = np.random.rand(10, 2) * 4 - 2 # Values between -2 and 2
# High-fidelity predictions
y_pred_high, y_std_high = meta.predict(X_new, fidelity='high')
# Low-fidelity predictions
y_pred_low, y_std_low = meta.predict(X_new, fidelity='low')
# Print results
for i in range(len(X_new)):
print(f"Input: {X_new[i]}")
print(f" High-fidelity: {y_pred_high[i][0]:.4f} ± {y_std_high[i][0]:.4f}")
print(f" Low-fidelity: {y_pred_low[i][0]:.4f} ± {y_std_low[i][0]:.4f}")
# Print model hyperparameters
meta.print_hyperparameters()
Advanced Usage¶
Comparing Different Kernels¶
Co-Kriging can benefit from different kernel choices depending on the smoothness of your function:
import numpy as np
import matplotlib.pyplot as plt
from PyEGRO.meta.cokriging import MetaTrainingCoKriging
# Generate data (continuing with previous synthetic data example)
# ...
# List of kernels to compare
kernels = ['matern25', 'matern15', 'matern05', 'rbf']
models = {}
metrics = {}
# Train a model with each kernel
for kernel in kernels:
print(f"Training with {kernel} kernel...")
meta = MetaTrainingCoKriging(
num_iterations=300,
kernel=kernel,
output_dir=f'RESULT_MODEL_COKRIGING_{kernel}'
)
model, _, _ = meta.train(
X_low=X_low,
y_low=y_low,
X_high=X_high,
y_high=y_high,
X_test=X_test,
y_test=y_test
)
models[kernel] = model
metrics[kernel] = meta.metrics
# Compare test R² scores
plt.figure(figsize=(10, 6))
plt.bar(kernels, [metrics[k]['test_r2'] for k in kernels])
plt.ylim(0, 1)
plt.title('Test R² Score by Kernel Type')
plt.ylabel('R² Score')
plt.grid(axis='y', alpha=0.3)
plt.show()
Handling Large-Scale Multi-Fidelity Data¶
For larger datasets, you can use batch processing:
import numpy as np
from PyEGRO.meta.cokriging.cokriging_utils import DeviceAgnosticCoKriging
# Generate or load a large dataset
n_samples = 10000
X_large = np.random.rand(n_samples, 5) * 4 - 2 # 10,000 samples, 5 features
# Load a trained model
cokriging = DeviceAgnosticCoKriging(prefer_gpu=True)
cokriging.load_model('RESULT_MODEL_COKRIGING')
# Make predictions with batch processing
y_pred, y_std = cokriging.predict(X_large, fidelity='high', batch_size=500)
print(f"Processed {n_samples} samples with shapes: {y_pred.shape}, {y_std.shape}")
Uncertainty Quantification¶
Co-Kriging is particularly useful for uncertainty quantification in multi-fidelity simulations:
import numpy as np
import matplotlib.pyplot as plt
from PyEGRO.meta.cokriging import MetaTrainingCoKriging
# After training a model (continuing from previous examples)
# Create a fine grid for predictions
x_grid = np.linspace(-2, 2, 200).reshape(-1, 1)
# Get predictions at both fidelity levels
y_high, std_high = meta.predict(x_grid, fidelity='high')
y_low, std_low = meta.predict(x_grid, fidelity='low')
# Plot with uncertainty bounds
plt.figure(figsize=(10, 6))
plt.plot(x_grid, y_high, 'r-', label='High-fidelity prediction')
plt.fill_between(x_grid.flatten(),
(y_high - 2*std_high).flatten(),
(y_high + 2*std_high).flatten(),
alpha=0.2, color='red', label='95% confidence interval')
plt.plot(x_grid, y_low, 'b--', label='Low-fidelity prediction')
plt.fill_between(x_grid.flatten(),
(y_low - 2*std_low).flatten(),
(y_low + 2*std_low).flatten(),
alpha=0.1, color='blue')
# Plot training data
plt.scatter(X_high, y_high, c='red', s=60, label='High-fidelity data',
marker='o', edgecolor='black', zorder=5)
plt.scatter(X_low, y_low, c='blue', s=40, label='Low-fidelity data',
marker='s', alpha=0.7, zorder=4)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Multi-fidelity predictions with uncertainty quantification')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
Cross-Validation for Co-Kriging¶
Implementing cross-validation for Co-Kriging models:
import numpy as np
from sklearn.model_selection import KFold
from PyEGRO.meta.cokriging import MetaTrainingCoKriging
# Assuming X_high, y_high, X_low, y_low are already defined
# Set up cross-validation
n_splits = 5
kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
# Metrics storage
high_r2_scores = []
low_r2_scores = []
# Perform cross-validation for high-fidelity data
for train_idx, test_idx in kf.split(X_high):
# Split data
X_high_train, X_high_test = X_high[train_idx], X_high[test_idx]
y_high_train, y_high_test = y_high[train_idx], y_high[test_idx]
# Use all low-fidelity data (this is a common approach in multi-fidelity modeling)
meta = MetaTrainingCoKriging(num_iterations=200, show_progress=False)
# Train model
meta.train(
X_low=X_low,
y_low=y_low,
X_high=X_high_train,
y_high=y_high_train
)
# Make predictions
y_high_pred, _ = meta.predict(X_high_test, fidelity='high')
y_low_pred, _ = meta.predict(X_low, fidelity='low')
# Calculate R² scores
high_r2 = 1 - np.sum((y_high_test - y_high_pred) ** 2) / np.sum((y_high_test - np.mean(y_high_test)) ** 2)
low_r2 = 1 - np.sum((y_low - y_low_pred) ** 2) / np.sum((y_low - np.mean(y_low)) ** 2)
high_r2_scores.append(high_r2)
low_r2_scores.append(low_r2)
print(f"High-fidelity CV R² scores: {high_r2_scores}")
print(f"Mean high-fidelity CV R²: {np.mean(high_r2_scores):.4f} ± {np.std(high_r2_scores):.4f}")
print(f"Low-fidelity CV R² scores: {low_r2_scores}")
print(f"Mean low-fidelity CV R²: {np.mean(low_r2_scores):.4f} ± {np.std(low_r2_scores):.4f}")