I have created a folder br_index inside data_collector in order to implement code that keeps track of the historical composition of Brazil's largest index (Ibovespa), very similar to what has been done in us_index and cn_index.
I have opened issue #956 and would like to upload the code I've written so it can be reviewed and, if approved, merged into the main repository. However, when I tried to
git push my branch, I found I don't have permission.
How should I proceed?
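Pushing directly to the main repository requires write access, which outside contributors normally don't have; the usual route is to fork the repository on GitHub, push your branch to the fork, and open a pull request against the main repo. A sketch of the remote setup, assuming the upstream is microsoft/qlib (YOUR_USER is a placeholder; in practice you would run these inside your existing clone rather than a throwaway repo):

```shell
set -e
# Demo inside a throwaway repo so the commands are runnable anywhere;
# in your case, run the `git remote` commands inside your existing clone.
tmp=$(mktemp -d) && cd "$tmp" && git init -q
git remote add upstream https://github.com/microsoft/qlib.git   # main repo (read-only for you)
git remote add origin   https://github.com/YOUR_USER/qlib.git   # your fork (you have write access)
git remote -v
# After committing your br_index work on its branch, push it to the fork
# and open the pull request on GitHub against upstream:
#   git push -u origin <your-branch>
```

Once the branch is on your fork, GitHub offers a "Compare & pull request" button that targets the upstream repository.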
While trying to submit PR #990, it failed one test:
Test MacOS / build (macos-latest, 3.7) (pull_request).
The error is a consequence of executing the command
python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --interval 1d --region cn
which fails with
zipfile.BadZipFile: File is not a zip file Error: Process completed with exit code 1.
However, when I create a virtual environment on my macOS with Python 3.7, which is the same version used to test the code, my script runs successfully.
2022-03-21 16:18:00.530 | WARNING | qlib.tests.data:_download_data:57 - The data for the example is collected from Yahoo Finance. Please be aware that the quality of the data might not be perfect. (You can refer to the original data source: https://finance.yahoo.com/lookup.)
2022-03-21 16:18:00.531 | INFO | qlib.tests.data:_download_data:59 - qlib_data_cn_1d_latest.zip downloading......
216677376it [01:30, 2393653.69it/s]
2022-03-21 16:19:31.063 | WARNING | qlib.tests.data:_unzip:82 - will delete the old qlib data directory(features, instruments, calendars, features_cache, dataset_cache): /Users/igorlimarochaazevedo/.qlib/qlib_data/cn_data
2022-03-21 16:19:31.064 | INFO | qlib.tests.data:_unzip:85 - /Users/.qlib/qlib_data/cn_data/20220321161759_qlib_data_cn_1d_latest.zip unzipping......
100%|████████| 43788/43788 [00:07<00:00, 5577.22it/s]
How could I resolve this error?
It is worth mentioning that, in my understanding at least, the code changes submitted in this PR do not interfere with the execution of the command above.
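The traceback suggests the CI runner downloaded something that isn't a valid zip archive (commonly a truncated download or an HTML error page returned by the server) rather than a problem in the submitted code. One quick local diagnostic is to validate the file before unzipping; a minimal sketch (the file name below is made up for the demonstration):

```python
import zipfile
from pathlib import Path

def check_zip(path: str) -> bool:
    """Return True only if `path` exists, is non-empty, and is a real zip archive."""
    p = Path(path)
    if not p.exists() or p.stat().st_size == 0:
        return False
    return zipfile.is_zipfile(p)

# Simulate a broken download: an HTML error page saved under a .zip name.
bad = Path("qlib_data_cn_1d_latest.zip")
bad.write_text("<html>503 Service Unavailable</html>")
print(check_zip(str(bad)))   # -> False: the invalid archive is caught before unzipping
bad.unlink()
```

If the file fails this check on CI but a fresh download succeeds locally, the cause is on the download side (transient server error, rate limiting), not in the PR.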
In the DataLayer : Data Framework & Usage documentation under Multiple Stock Modes section it says
The trade unit defines the unit number of stocks can be used in a trade, and the limit threshold defines the bound set to the percentage of ups and downs of a stock.
Can anyone give me further explanation of, or references on, what trade unit and limit threshold are?
I don't understand exactly how they work in the stock market, and consequently how they affect qlib's execution when it is initialized with one region or the other, as in the given example.
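In plain market terms: the trade unit is the lot size, i.e. orders must be placed in multiples of it (China A-shares trade in lots of 100 shares, while US stocks can trade single shares), and the limit threshold models the daily price-limit rule on some exchanges (roughly ±10% for China A-shares): once a stock has moved beyond that fraction in a day, further orders in that direction can't be filled, so a backtest should reject them. A minimal sketch of how a simulator might apply both rules (illustrative logic, not qlib's actual implementation):

```python
from typing import Optional

def max_tradable_shares(cash: float, price: float, trade_unit: int) -> int:
    """Shares we can buy: the largest affordable multiple of the lot size."""
    lots = int(cash // (price * trade_unit))
    return lots * trade_unit

def order_allowed(prev_close: float, price: float,
                  limit_threshold: Optional[float]) -> bool:
    """Reject orders once the day's move exceeds the exchange's price limit."""
    if limit_threshold is None:          # e.g. US-style: no daily price limit
        return True
    change = price / prev_close - 1
    return abs(change) < limit_threshold

# China-like settings: 100-share lots, ~10% daily limit
print(max_tradable_shares(10_000, 9.8, 100))   # -> 1000 (10 lots of 100)
print(order_allowed(10.0, 10.99, 0.095))       # -> False (up ~9.9%, beyond the limit)
# US-like settings: single shares, no daily limit
print(max_tradable_shares(10_000, 9.8, 1))     # -> 1020
print(order_allowed(10.0, 11.5, None))         # -> True
```

This is why initializing qlib with one region or the other changes backtest behavior: the region preset bundles these market-microstructure assumptions.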
I'm executing XGBoost with my own data downloaded from Brazil's stock exchange. In order to better understand the data the DataHandler module provides for training XGBoost, I exported it to .csv format and opened it with pandas.
It has a column named label, which, if I understood correctly, is defined by the workflow variable set by the user. In other words, this label variable is
label: ["Ref($close, -2) / Ref($close, -1) - 1"], as the sample from qlib shows.
However, when I use qlib's Data Retrieval to calculate the same formula,
"Ref($close, -2) / Ref($close, -1) - 1", it returns a different value.
Why is that?
Dataframe used for training in XGBoost:
Ref($close, -2) / Ref($close, -1) - 1: 1.08125
Value returned by qlib’s data retrieval:
Ref($close, -2) / Ref($close, -1) - 1: -0.011585
Code used for qlib’s data retrieval:
df_ = D.features(D.instruments('ibov'), ['Ref($close, -2)/Ref($close, -1)-1'], '2008-01-02', '2014-12-30')
df_.loc[(['BBAS3.SA'], '2008-01-02'), :]
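One way to narrow this down is to recompute the label by hand from raw close prices: Ref($close, -2)/Ref($close, -1) - 1 evaluated at day t is close[t+2]/close[t+1] - 1, which in pandas is shift(-2)/shift(-1) - 1 per instrument. If the hand computation matches one source but not the other, the discrepancy is in the data, not the formula. A sketch with made-up prices:

```python
import pandas as pd

# Made-up close prices for one instrument
close = pd.Series([10.0, 10.5, 10.4, 11.0, 10.8],
                  index=pd.date_range("2008-01-02", periods=5))

# Ref($close, -2) / Ref($close, -1) - 1  ==  close[t+2] / close[t+1] - 1
label = close.shift(-2) / close.shift(-1) - 1

print(label.iloc[0])   # close[2]/close[1] - 1 = 10.4/10.5 - 1
```

Note also that DataHandler configurations often post-process the label (for example, cross-sectional normalization in learn_processors), in which case the label column in the exported CSV is a transformed value, not the raw expression; that alone could explain a large mismatch like 1.08125 vs -0.011585.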
Qlib comes with some benchmark example models for the Chinese stock market, such as XGBoost. Those models already come with predefined parameters that, if I understood correctly, are optimal parameters.
However, if I modify the Chinese dataset, or use a dataset from a different stock market such as the US or BR (Brazil), how can I obtain optimal parameters like the ones shown below?
model:
    class: XGBModel
    module_path: qlib.contrib.model.xgboost
    kwargs:
        eval_metric: rmse
        colsample_bytree: 0.8879
        eta: 0.0421
        max_depth: 8
        n_estimators: 647
        subsample: 0.8789
        nthread: 20
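Parameters like these are typically found by hyperparameter search against a validation split, and the search should be rerun whenever the dataset changes (different market, different universe). A minimal random-search sketch over ranges similar to the knobs above, with a stand-in objective in place of actually training XGBModel (the ranges and the scoring function are illustrative, not qlib's tuning code):

```python
import random

random.seed(0)

def sample_params():
    """Draw one candidate configuration (ranges are illustrative)."""
    return {
        "colsample_bytree": random.uniform(0.5, 1.0),
        "eta": 10 ** random.uniform(-3, -1),   # log-uniform learning rate
        "max_depth": random.randint(3, 10),
        "n_estimators": random.randint(100, 1000),
        "subsample": random.uniform(0.5, 1.0),
    }

def validation_score(params):
    """Stand-in for: train the model with `params` on the training split and
    return a validation metric (e.g. IC or negative RMSE). Replace this with a
    real train-and-evaluate call on your own data."""
    return -abs(params["eta"] - 0.04) - abs(params["subsample"] - 0.9)

# Keep the best of N random draws
best = max((sample_params() for _ in range(50)), key=validation_score)
print(best)
```

In practice, libraries such as Optuna or hyperopt automate this loop (and add smarter samplers and pruning); the structure stays the same: sample a config, train, score on validation, keep the best.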
In the following documentation it says:
In the Alpha158, Qlib uses the label
Ref($close, -2)/Ref($close, -1) - 1, which means the change from T+1 to T+2, rather than Ref($close, -1)/$close - 1; the reason is that when getting the T day close price of a China stock, the stock can be bought on day T+1 and sold on day T+2.
However, in Alpha360 should we use the same equation,
Ref($close, -2)/Ref($close, -1) - 1, for the label? Or should we use
Ref($close, -1)/$close - 1?
In this other page from the documentation it says,
The Pearson correlation coefficient series between label and prediction score. In the above example, the label is formulated as
Ref($close, -1)/$close - 1. Please refer to Data Feature for more details.
Are those two formulas related? Shouldn't they be the same?
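The two formulas are the same one-day return measured at different offsets: Ref($close, -2)/Ref($close, -1) - 1 at day t equals Ref($close, -1)/$close - 1 evaluated at day t+1, i.e. one is the other shifted by one day to encode the buy-on-T+1, sell-on-T+2 constraint. A quick check with made-up prices:

```python
import pandas as pd

close = pd.Series([10.0, 10.5, 10.4, 11.0, 10.8, 11.2],
                  index=pd.date_range("2020-01-01", periods=6))

label_t2 = close.shift(-2) / close.shift(-1) - 1   # Ref($close,-2)/Ref($close,-1) - 1
label_t1 = close.shift(-1) / close - 1             # Ref($close,-1)/$close - 1

# label_t2 at day t equals label_t1 at day t+1:
print((label_t2.dropna() == label_t1.shift(-1).dropna()).all())   # -> True
```

So neither formula is wrong in itself; the choice encodes which execution-day convention the backtest assumes, and that reasoning applies to Alpha360's label the same way as to Alpha158's.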
Issue (Label: Question) #1060
Why do some models require a
learn_processors definition while others don't?
I've read the documentation available at link and the code, but I couldn't understand why some models require
infer_processors and others don't. CatBoost doesn't define any
infer_processor, while MLP does define some. Why is that?
I've found this explanation of the difference between inference and
learning; however, I'm still not able to understand why some models need inference processors and others don't.
rec = R.get_recorder()
rid = rec.id  # save the record id
# Inference and saving signal
sr = SignalRecord(model, dataset, rec)
sr.generate()
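For intuition on the infer/learn split (the sketch below illustrates the idea, it is not qlib's actual API): processors are transform steps applied to the data, and a label-dependent step, such as dropping rows whose label is NaN, only makes sense at learning time, because at inference time future labels don't exist yet. Whether a model needs any infer_processors at all depends on the model: tree ensembles like CatBoost are largely insensitive to feature scaling, while an MLP typically needs normalized inputs to train and predict well.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "feature": [1.0, 2.0, 3.0, 4.0],
    "label":   [0.01, -0.02, np.nan, np.nan],   # most recent days: label unknown
})

def zscore(d):
    """Infer-time style processor: scale features (MLP-like models need this)."""
    out = d.copy()
    out["feature"] = (d["feature"] - d["feature"].mean()) / d["feature"].std()
    return out

def drop_na_label(d):
    """Learn-time style processor: only meaningful when labels exist."""
    return d.dropna(subset=["label"])

infer_data = zscore(df)                  # all 4 rows: we still predict on them
learn_data = drop_na_label(zscore(df))   # 2 rows: only labeled rows can train
print(len(infer_data), len(learn_data))  # -> 4 2
```

So a model omits infer_processors when its algorithm doesn't care about feature scale, and omits learn_processors when it needs no label-side cleanup; neither omission means the processing stages themselves are optional in general.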