Hi. Thanks for the nice package. I would like to use Optuna to find the points that minimize a function. As a simple example: suppose that we have a function of two variables, f(x,y), and I want to find the value of y that minimizes f for a given x. I implemented it like this:
import numpy as np
import optuna

def objective(trial, x):
    y = trial.suggest_uniform('y', -1., 1.)
    return 100.*(y - x**2)**2 + (1 - x)**2

def optimal_y(X, n_trials):
    optuna.logging.set_verbosity(optuna.logging.WARNING)
    y_optimal = np.empty(X.shape)
    counter = 0
    for x in X:
        study = optuna.create_study()
        study.optimize(lambda trial: objective(trial, x), n_trials=n_trials)
        best_params = study.best_params
        y_optimal[counter] = np.asarray(list(best_params.values()))
        del study
        counter += 1
    return y_optimal

if __name__ == '__main__':
    X = np.linspace(start=-1, stop=1, num=20)
    y_optimal = optimal_y(X, 100)
    print(y_optimal)
It works fine, but this version finds the minimum for each given x serially. I would like to change the code so that the optimization runs for the whole vector X at once, i.e. for all x values in parallel. Is there any way to do that? Is study.add_trial meant to be used for this purpose?
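To make the intent concrete, here is a minimal sketch of what I have in mind (just my own idea, not an official Optuna pattern): run one independent study per x value in a process pool. The helper name optimize_single_x is hypothetical.

from multiprocessing import Pool

import numpy as np
import optuna

def objective(trial, x):
    y = trial.suggest_uniform('y', -1., 1.)
    return 100.*(y - x**2)**2 + (1 - x)**2

def optimize_single_x(x, n_trials=100):
    # Each worker process optimizes y for a single fixed x.
    optuna.logging.set_verbosity(optuna.logging.WARNING)
    study = optuna.create_study()
    study.optimize(lambda trial: objective(trial, x), n_trials=n_trials)
    return study.best_params['y']

if __name__ == '__main__':
    X = np.linspace(start=-1, stop=1, num=20)
    with Pool(processes=4) as pool:  # one study per x, several in flight
        y_optimal = pool.map(optimize_single_x, X)
    print(y_optimal)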
Hi. First of all, thanks a lot for developing Optuna, it's amazing software! I'm doing hyperparameter optimization using a vanilla model:
study_name = 'wd_dr_hidden_lr_e3'
storage = 'sqlite:///e3_%s.db'%field
n_trials = 50
objective = Objective(device, seed, f_maps, f_params, batch_size, splits,
arch, min_lr, beta1, beta2, epochs, root_out, field)
sampler = optuna.samplers.TPESampler(n_startup_trials=10)
study = optuna.create_study(study_name=study_name, sampler=sampler, storage=storage,
load_if_exists=True)
study.optimize(objective, n_trials)
I'm using 2 GPUs (running the script in two different terminals), and after a few trials I get this error:
Traceback (most recent call last):
File "main_hyperparams.py", line 222, in <module>
study.optimize(objective, n_trials)
File "/mnt/home/fvillaescusa/.local/lib/python3.7/site-packages/optuna/study.py", line 315, in optimize
show_progress_bar=show_progress_bar,
File "/mnt/home/fvillaescusa/.local/lib/python3.7/site-packages/optuna/_optimize.py", line 65, in _optimize
progress_bar=progress_bar,
File "/mnt/home/fvillaescusa/.local/lib/python3.7/site-packages/optuna/_optimize.py", line 156, in _optimize_sequential
trial = _run_trial(study, func, catch)
File "/mnt/home/fvillaescusa/.local/lib/python3.7/site-packages/optuna/_optimize.py", line 238, in _run_trial
study._tell(trial, TrialState.COMPLETE, value)
File "/mnt/home/fvillaescusa/.local/lib/python3.7/site-packages/optuna/study.py", line 603, in _tell
self._storage.set_trial_state(trial._trial_id, state)
File "/mnt/home/fvillaescusa/.local/lib/python3.7/site-packages/optuna/storages/_cached_storage.py", line 200, in set_trial_state
return self._flush_trial(trial_id)
File "/mnt/home/fvillaescusa/.local/lib/python3.7/site-packages/optuna/storages/_cached_storage.py", line 404, in _flush_trial
datetime_complete=updates.datetime_complete,
File "/mnt/home/fvillaescusa/.local/lib/python3.7/site-packages/optuna/storages/_rdb/storage.py", line 604, in _update_trial
raise RuntimeError("Cannot change attributes of finished trial.")
RuntimeError: Cannot change attributes of finished trial.
I was wondering if I'm doing something wrong here. Thanks for your help!
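For reference, a minimal sketch of the multi-worker pattern with a server-based backend; the Optuna docs caution that SQLite can misbehave under concurrent access, and the MySQL URL below is only a placeholder:

import optuna

def objective(trial):
    x = trial.suggest_uniform('x', -10., 10.)
    return (x - 2.) ** 2

# Placeholder URL: a server-based RDB (MySQL/PostgreSQL) instead of SQLite,
# which the docs advise against for parallel optimization.
storage = 'mysql://user:password@dbserver/optuna'

# Each terminal/GPU runs this same script; load_if_exists lets the
# processes share a single study through the RDB server.
study = optuna.create_study(study_name='wd_dr_hidden_lr_e3',
                            storage=storage, load_if_exists=True)
study.optimize(objective, n_trials=50)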
Happy New Year! Kicking off with the release of v2.4.0, thanks to everyone who was involved. It's a minor version bump but still a big release. Please check out the highlights and release note at https://github.com/optuna/optuna/releases/tag/v2.4.0 or via the Tweet https://twitter.com/OptunaAutoML/status/1348897690545840135?s=20. In short, it contains:
Python 3.9 support (with the exclusion of integration modules)
Multi-objective optimization that’s now stable as a first-class citizen
Sampler that wraps BoTorch for Bayesian optimization. This sampler opens up Optuna for constrained optimization using slack variables, i.e. outcome constraints such as x0 + x1 < y. See https://github.com/optuna/optuna/blob/release-v2.4.0/examples/botorch_simple.py (a rough sketch follows after this list)
Richer and more easily extensible tutorial https://optuna.readthedocs.io/en/v2.4.0/tutorial/index.html
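As promised above, a rough sketch of the constrained-optimization usage, adapted from the linked botorch_simple.py example (see the example itself for the authoritative version); BoTorchSampler treats constraint values <= 0 as feasible:

import optuna

def objective(trial):
    x0 = trial.suggest_uniform("x0", 0.0, 1.0)
    x1 = trial.suggest_uniform("x1", 0.0, 1.0)
    # Outcome constraint x0 + x1 <= 1, encoded so that c <= 0 means feasible.
    c = x0 + x1 - 1.0
    trial.set_user_attr("constraint", (c,))
    return x0 + x1

def constraints(trial):
    return trial.user_attrs["constraint"]

sampler = optuna.integration.BoTorchSampler(constraints_func=constraints)
study = optuna.create_study(direction="maximize", sampler=sampler)
study.optimize(objective, n_trials=20)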
May I ask if there is a CmaEsSampler usage example? My understanding is that CMA-ES is only used to sample relative parameters, while a random sampler is used to sample independent parameters. Is the relative search space determined by the backend automatically if we follow the demo code?
import optuna

def objective(trial):
    x = trial.suggest_uniform("x", -1, 1)
    y = trial.suggest_int("y", -1, 1)
    return x ** 2 + y

sampler = optuna.samplers.CmaEsSampler()
study = optuna.create_study(sampler=sampler)
study.optimize(objective, n_trials=20)
Hi Team, I am using a dataset of about 140,000 rows and 300 features (after categorical encoding), together with the Optuna integration for XGBoost (using xgb.cv()). First I tried xgboost 1.3.3 with optuna 2.4.0: the program ran for 18 hours on 32 CPUs and not a single trial completed. I then ran xgboost 1.3.1 with optuna 2.4.0: it ran for 3 hours on 60 CPUs and again not a single trial completed. I am now trying xgboost 1.2.1 with optuna 2.3.0 on 60 CPUs. Can anyone help me understand whether there are any compatibility issues?
The code that I am using is given below:
import optuna
import xgboost as xgb
from optuna.samplers import TPESampler

def objective(trial):
    # Define the search space (ratio1..ratio4, dtrain and skfolds are
    # defined elsewhere in my script)
    param_sp = {
        'base_score': 0.5,
        'booster': 'gbtree',
        'colsample_bytree': trial.suggest_categorical('colsample_bytree', [0.7, 0.8, 0.9, 1.0]),
        'learning_rate': trial.suggest_categorical('learning_rate', [0.1]),
        'max_depth': trial.suggest_categorical('max_depth', [6, 8, 10]),
        'objective': 'binary:logistic',
        'scale_pos_weight': trial.suggest_categorical('scale_pos_weight', [ratio1, ratio2, ratio3, ratio4, 1, 10, 30, 50, 75, 99, 100]),
        'subsample': trial.suggest_categorical('subsample', [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]),
        'verbosity': 1,
        'tree_method': 'auto',
        'predictor': 'cpu_predictor',
        'eval_metric': 'aucpr',
    }
    # Add the pruning callback
    pruning_callback = optuna.integration.XGBoostPruningCallback(trial, "test-aucpr")
    # Perform native-API cross-validation
    xgb_cv_results = xgb.cv(param_sp, dtrain, stratified=True, folds=skfolds,
                            metrics='aucpr', num_boost_round=500,
                            early_stopping_rounds=50, as_pandas=True,
                            verbose_eval=False, seed=42, shuffle=True,
                            callbacks=[pruning_callback])
    # Set n_estimators as a trial attribute
    trial.set_user_attr("n_estimators", len(xgb_cv_results))
    # Extract the best score
    best_score = xgb_cv_results["test-aucpr-mean"].values[-1]
    return best_score

pruner = optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=20, interval_steps=10)
study = optuna.create_study(study_name='XGB_Optuna_0.1_Iter1', direction='maximize',
                            sampler=TPESampler(consider_magic_clip=True, seed=42, multivariate=False),
                            pruner=pruner)
# Perform the search
print('\nPerforming Bayesian hyperparameter optimization...')
study.optimize(objective, n_trials=100, n_jobs=-1)
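One thing I would check first (an assumption on my part, not a confirmed diagnosis): with n_jobs=-1 every trial runs in its own thread, while xgb.cv itself also uses all CPU threads by default, so the two layers can oversubscribe the machine. A cheap experiment is to cap both:

# Sketch (assumption): avoid CPU oversubscription between Optuna's trial
# threads and XGBoost's internal threads.
# Inside objective(), add to param_sp:
#     'nthread': 8,
# and then run the trials sequentially:
study.optimize(objective, n_trials=100, n_jobs=1)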
Hi Team, I am using XGBoost 1.3.3 and Optuna 2.4.0. My dataset has 138k rows and 300 columns (after categorical encoding). I am trying to replicate the example at https://github.com/optuna/optuna/blob/master/examples/pruning/xgboost_integration.py (but only for booster='gbtree'). When I run the code, I get the message 'segmentation fault' and the program returns to the $ prompt (I am using Amazon Linux). Can anyone please help me understand why I am getting this 'segmentation fault'?
The code that I am using is given below:
import numpy as np
import optuna
import xgboost as xgb
from optuna.samplers import TPESampler
from sklearn import metrics

# Import data into xgb.DMatrix form (X_train, y_train, X_test, y_test and
# ratio1 are defined earlier in my script)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Define the search space and the objective function
def objective(trial):
    param_sp = {
        'base_score': 0.5,
        'booster': 'gbtree',
        'colsample_bylevel': trial.suggest_categorical('colsample_bylevel', [0.7, 0.8, 0.9]),
        'colsample_bynode': trial.suggest_categorical('colsample_bynode', [0.7, 0.8, 0.9]),
        'colsample_bytree': trial.suggest_categorical('colsample_bytree', [0.7, 0.8, 0.9]),
        'gamma': trial.suggest_categorical('gamma', [0.0000001, 0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]),
        'learning_rate': trial.suggest_categorical('learning_rate', [0.1]),
        'max_delta_step': trial.suggest_categorical('max_delta_step', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]),
        'max_depth': trial.suggest_categorical('max_depth', [10]),
        'min_child_weight': trial.suggest_categorical('min_child_weight', [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21]),
        'objective': 'binary:logistic',
        'reg_alpha': trial.suggest_categorical('reg_alpha', [0.000000001, 0.00000001, 0.0000001, 0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100]),
        'reg_lambda': trial.suggest_categorical('reg_lambda', [0.000000001, 0.00000001, 0.0000001, 0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100]),
        'scale_pos_weight': trial.suggest_categorical('scale_pos_weight', [ratio1, 1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 1000]),
        'seed': 42,
        'subsample': trial.suggest_categorical('subsample', [0.5, 0.6, 0.7, 0.8, 0.9]),
        'verbosity': 1,
        'tree_method': 'auto',
        'predictor': 'cpu_predictor',
        'eval_metric': 'error',
    }
    # Add the pruning callback
    pruning_callback = optuna.integration.XGBoostPruningCallback(trial, "validation-error")
    # Train with a held-out validation set for early stopping
    xgb_bst = xgb.train(param_sp, dtrain, num_boost_round=1000,
                        evals=[(dtest, "validation")],
                        early_stopping_rounds=100, verbose_eval=False,
                        callbacks=[pruning_callback])
    # Set n_estimators as a trial attribute
    trial.set_user_attr("n_estimators", xgb_bst.best_ntree_limit)
    # Compute the F1 score on the test set
    preds = xgb_bst.predict(dtest)
    pred_labels = np.rint(preds)
    f1 = metrics.f1_score(y_test, pred_labels)
    return f1

pruner = optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=20, interval_steps=10)
study = optuna.create_study(study_name='XGB_Optuna_0.1_max_depth_10_Error_Val_500_trials',
                            direction='minimize',
                            sampler=TPESampler(consider_magic_clip=True, seed=42, multivariate=False),
                            pruner=pruner)
# Perform the search
print('\nPerforming Bayesian hyperparameter optimization...')
study.optimize(objective, n_trials=500, n_jobs=16)
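In case it helps to isolate the crash (an assumption on my part, not a confirmed cause): n_jobs=16 runs 16 trials as threads inside one process, each launching a fully multi-threaded xgb.train. Running single-threaded first can rule that out:

# Sketch (assumption): remove thread-level parallelism to see whether the
# segmentation fault persists. Inside objective(), cap XGBoost's own
# threads by adding to param_sp:
#     'nthread': 4,
study.optimize(objective, n_trials=500, n_jobs=1)  # one trial at a time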
Hello
I'm trying to visualize the study output in a Jupyter notebook:
optuna.visualization.plot_optimization_history(study)
optuna.visualization.plot_slice(study)
optuna.visualization.plot_contour(study, params=['epochs', 'learning_rate'])
Nothing happens when I run these commands. Has anybody tried doing visualizations in a similar environment?
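If it's of any use, a minimal sketch under the assumption that the figure object is simply not being rendered: the plot_* functions return Plotly Figure objects, which only auto-display when they are the last expression in a cell.

# Sketch (assumption): force the Plotly figure to render explicitly.
fig = optuna.visualization.plot_optimization_history(study)
fig.show()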
My intermediate metric is val_loss, which is supposed to be minimised. How can I be sure that the pruner is minimising val_loss while still having a maximise optimisation step?
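A sketch of one workaround, under my assumption that pruners compare intermediate values according to the study's direction: with direction='maximize', report a value where larger is better, e.g. the negated validation loss, while still returning the maximised metric. The helper train_one_epoch is hypothetical.

import optuna
import random

def train_one_epoch():
    # Hypothetical stand-in for a real training step.
    acc = random.random()
    return acc, 1.0 - acc

def objective(trial):
    for epoch in range(10):
        accuracy, val_loss = train_one_epoch()
        # Report a value where larger is better under 'maximize'.
        trial.report(-val_loss, step=epoch)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return accuracy  # the quantity the study maximises

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)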
My loss has the form loss = weight_1 * loss_component_1 + weight_2 * loss_component_2 + ..., with the constraints sum(weight_i) = 1 and 0 < weight_i < 1 for all i. I want to find the optimal combination of weight_i, so essentially I need to sample the parameters from a multinoulli-style distribution over the simplex. Of course I can sample the parameters from a uniform distribution and then normalise them, but I don't feel this is the right way.
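For concreteness, a sketch of the normalisation workaround I mentioned (my own code, not an official Optuna feature); the realised weights are stored as a user attribute, since the raw suggested values are what the sampler actually sees:

import optuna

N_COMPONENTS = 3  # illustrative

def objective(trial):
    # Suggest raw positive numbers, then project them onto the simplex.
    raw = [trial.suggest_uniform('raw_w%d' % i, 1e-6, 1.0) for i in range(N_COMPONENTS)]
    total = sum(raw)
    weights = [r / total for r in raw]       # sum(weights) == 1, each in (0, 1)
    trial.set_user_attr('weights', weights)  # record the realised weights
    loss_components = [1.0, 2.0, 0.5]        # placeholder component losses
    return sum(w * c for w, c in zip(weights, loss_components))

study = optuna.create_study()
study.optimize(objective, n_trials=50)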
Hi everyone. I am using Optuna for hyperparameter optimization of an iterative process in which the number of samples grows with each iteration. I start Optuna from scratch at iteration 0, but for later iterations I reuse the accumulated trials from all previous iterations. With this warm-start scheme, after some iterations the search concentrates on a very small region of the parameter space. Now I need to give it the chance to explore other regions again. One idea I have is to force it to forget old trials: for example, when iteration 5 starts, ignore the trials from iterations 0 and 1, and so on. To do so, I use the following code to manually change the state of those trials from 'COMPLETE' to 'FAIL'; when the study is loaded, only trials with state='COMPLETE' are taken into account.
import sqlite3

import pandas as pd

def makefailSqliteTable(storage):
    # Mark every trial in the storage as 'FAIL'
    sqliteConnection = None
    try:
        sqliteConnection = sqlite3.connect(storage)
        cursor = sqliteConnection.cursor()
        sql_update_query = """UPDATE trials SET state = 'FAIL' """
        cursor.execute(sql_update_query)
        sqliteConnection.commit()
        cursor.close()
    except sqlite3.Error as error:
        print("Failed to update sqlite table", error)
    finally:
        if sqliteConnection:  # guard against a failed connect()
            sqliteConnection.close()
            print("The SQLite connection is closed")

def updateSqliteTable(storage, N):
    # Flip the last N trials back to 'COMPLETE'
    sqliteConnection = None
    try:
        sqliteConnection = sqlite3.connect(storage)
        cursor = sqliteConnection.cursor()
        df = pd.read_sql_query("SELECT * FROM trials", sqliteConnection)
        sql_update_query = ("UPDATE trials SET state = 'COMPLETE' "
                            "WHERE number > " + str(len(df) - N))
        cursor.execute(sql_update_query)
        sqliteConnection.commit()
        cursor.close()
    except sqlite3.Error as error:
        print("Failed to update sqlite table", error)
    finally:
        if sqliteConnection:
            sqliteConnection.close()
            print("The SQLite connection is closed")
I would like to know whether this procedure does what I want. I mean, does it really forget the history from a long time ago?
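For comparison, a sketch of an alternative that stays inside the Optuna API (my own suggestion, not from the docs): copy only the most recent N finished trials into a fresh study with study.add_trial, instead of rewriting the SQLite table directly.

import optuna

def recent_history_study(old_study_name, storage, N):
    # Load the accumulated study and keep only its last N trials.
    old_study = optuna.load_study(study_name=old_study_name, storage=storage)
    new_study = optuna.create_study()   # fresh in-memory study
    for trial in old_study.trials[-N:]:
        new_study.add_trial(trial)      # re-register the finished trial
    return new_study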