    Freddy Boulton
    @freddyaboulton
    Thanks for uploading the wheels @mloning! Our Windows CI is now passing
    Out of curiosity - what went wrong with your CD pipeline?
    Markus Löning
    @mloning
    See alan-turing-institute/sktime#1131 (some tag-related issues with AppVeyor)
    Kishen Sharma
    @KishenSharma6
    Hi everyone! My name is Kishen Sharma and I am an aspiring Machine Learning Engineer. I am looking forward to contributing and learning more about designing open-source applications, very excited to meet you all!
    Markus Löning
    @mloning
    Hi @KishenSharma6 - welcome and great to have you on board! Check out the mentorship program on sktime.org in case you're interested!
    MariaWaleska
    @MariaWaleska
    Hello everyone! @mloning I am very glad; this is my first time using sktime and I would like to use this package on a real-world dataset that I'm currently working with. The goal is to implement multivariate time series classification, and the data I am working with is a series of different CSV files, where each CSV file happens to be a multivariate dataset. My problem is that I need to convert all these files into nested dataframes, and I was going over the code implemented by @Fleaurent in this link: https://gist.github.com/Fleaurent/74046aee6592ff791e6b7872866be7bc However, what happens is that each cell of the "converted" dataframe is a list instead of a pd.Series, and when trying to fit this into the model I get the following error:
    "If passed as a pd.DataFrame, X must be a nested pd.DataFrame, with pd.Series or np.arrays inside cells." I was hoping there is a way to convert multivariate dataframes into nested ones; is there any implementation for this, or a new update for this function?
    Markus Löning
    @mloning
    Hi @MariaWaleska, an easier option may be to convert it into a 3D numpy array with shape (n_instances, n_dimensions, n_timepoints), where instances are the different units of observation and dimensions are the different variables
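    For illustration, a minimal sketch of that conversion (plain numpy/pandas, not a sktime utility), assuming dfs is a list of pandas DataFrames, one per CSV file, all with the same columns and the same number of rows:

    import numpy as np
    import pandas as pd

    def to_3d_array(dfs):
        # each DataFrame has shape (n_timepoints, n_dimensions); transpose so the
        # variable axis comes before the time axis, then stack along a new instance axis
        return np.stack([df.to_numpy().T for df in dfs], axis=0)

    # hypothetical usage with two equal-length files, 4 variables, 100 time points each
    dfs = [pd.DataFrame(np.random.rand(100, 4)), pd.DataFrame(np.random.rand(100, 4))]
    X = to_3d_array(dfs)
    print(X.shape)  # (2, 4, 100)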
    MariaWaleska
    @MariaWaleska
    Thank you for your quick response @mloning! I implemented it that way in the past, and I have a question that might be dumb about working with 3D numpy arrays for this purpose. Since each file has a different number of rows, and the numpy array has to be declared before being used, I end up with arrays where most of the rows are filled with NaNs. For example, if I declare an array with dimensions (2, 387, 4), where 387 represents the dataset with the largest number of rows, another dataset with only 79 rows will be filled with NaNs for the remaining rows. In other words, not all the datasets have the same number of time points. Does this represent any problem when fitting a model?
    Franz Király
    @fkiraly
    yes, this is correct - currently, there are no "good" (as in: performant, subsettable, etc.) data containers that can represent unequal-length sequences. awkward-array is a very promising candidate here, but it is still in development and not yet supported by sktime (we plan to support it)

    Does this represent any problem when fitting a model?

    really depends on the model - performance may vary...

    you could try using a pipeline where the first element is a binner or summarizer, which will result in an equal number of features per series
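    As a rough illustration of the binning idea (plain numpy, not the sktime transformer API), each series of arbitrary length can be reduced to a fixed number of summary features:

    import numpy as np

    def bin_series(values, n_bins=10):
        # split a 1-D array of arbitrary length into n_bins chunks and summarise
        # each chunk by its mean, giving a fixed-length feature vector
        chunks = np.array_split(np.asarray(values, dtype=float), n_bins)
        return np.array([chunk.mean() for chunk in chunks])

    print(bin_series(np.random.rand(387)).shape)  # (10,)
    print(bin_series(np.random.rand(79)).shape)   # (10,)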
    MariaWaleska
    @MariaWaleska
    Thank you for your response @fkiraly. I will certainly try to do that! Will update in case I find a solution for this.
    Dmitry Labazkin
    @labdmitriy
    Hello everyone! First of all, thank you for your package!
    Could you please help: can we pass fit_params to the regressor while fitting after we've applied reduction? I've found that the evaluate() function has a fit_params argument, but the fit method does not.
    For example, I need to pass a list of categorical features to a gradient boosting regressor while fitting.
    Dmitry Labazkin
    @labdmitriy
    To be more specific, I mean the forecasting task.
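    For reference, a purely hypothetical sketch of what this could look like; fit_params is not an existing argument of the forecaster's fit method at the time of writing, and the choice of LGBMRegressor and categorical_feature is only an example:

    from lightgbm import LGBMRegressor
    from sktime.datasets import load_airline
    from sktime.forecasting.compose import make_reduction

    y = load_airline()
    forecaster = make_reduction(LGBMRegressor(), window_length=12, strategy="recursive")

    # hypothetical API: forward keyword arguments to the wrapped regressor's fit,
    # e.g. a list of categorical feature indices for a gradient boosting regressor
    forecaster.fit(y, fh=[1, 2, 3], fit_params={"categorical_feature": [0, 3]})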
    Freddy Boulton
    @freddyaboulton
    Hello, I'm interested in adding Python 3.9 support for sktime. The issue is listed here: alan-turing-institute/sktime#1147. I should talk to a core developer before embarking. Who should I reach out to?
    Franz Király
    @fkiraly
    @labdmitriy, yes, this is a good question - would you mind opening an issue on the sktime repo, explaining how you think it should work (perhaps with some hypothetical code) and tagging in @aiwalter (wrote evaluate function) as well as @lovkush-a (refactored reduction) and @mloning (wrote reduction)?
    @freddyaboulton, I think @mloning knows most about this. I think currently no one is working on it, so until the discussion gets rolling you could look at what would happen if you just bump the version naively and see where things break in a local developer version.
    the issue for this is here: alan-turing-institute/sktime#664
    oh, and please PM me your email address so I can invite you to the dev Slack, @freddyaboulton - we should be discussing this closely. Thanks for your great help on conda, btw!
    Kishen Sharma
    @KishenSharma6

    Having trouble installing sktime in my venv. Is sktime supported on Mac M1 chips at this time?

    [image attached]

    Pranav-India
    @Pranav-India

    Hi, I am working on a price-prediction kind of scenario and I have 6 parameters that will have an impact on the prediction, along with the historic values of the price. I tried prediction with "forecast = ThetaForecaster(sp=12)" and I got the results out. But when I checked the backend implementation of ThetaForecaster's fit function, the description says that exogenous variables are ignored:

    "Parameters

        y : pd.Series
            Target time series to which to fit the forecaster.
        fh : int, list or np.array, optional (default=None)
            The forecasters horizon with the steps ahead to to predict.
        X : pd.DataFrame, optional (default=None)
            Exogenous variables are ignored
        Returns

    "
    So is there a different way to include x_train so that it can be used for predictions?

    Franz Király
    @fkiraly
    @Pranav-India, in theory, if you use all_estimators (from the registry module) to look for estimators with the tag "univariate-only" set to False, this will give you the forecasters that make use of exogenous data. You can get explanations of forecaster tags using all_tags. Unfortunately, the "univariate-only" tag is not named very descriptively; it should rather be called "uses_exogenous" or similar. It's also not always correct - I just checked AutoARIMA, and it should be False but it's set to True, even though AutoARIMA makes use of exogenous variables. (I just opened an issue to fix this: alan-turing-institute/sktime#1267)
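    A small sketch of that lookup (the exact tag spelling may vary between sktime versions):

    from sktime.registry import all_estimators, all_tags

    # forecasters whose tag indicates they are not restricted to univariate input,
    # i.e. candidates that can make use of exogenous data X
    exog_forecasters = all_estimators(
        estimator_types="forecaster",
        filter_tags={"univariate-only": False},
    )
    print([name for name, _ in exog_forecasters])

    # descriptions of the available forecaster tags
    print(all_tags(estimator_types="forecaster"))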
    Franz Király
    @fkiraly

    is sktime supported on Mac M1 chips at this time?

    Does anyone have an answer to this?

    Pranav-India
    @Pranav-India
    Hi @fkiraly, thank you for your response. Also, the library is super helpful, thank you for this.
    Brian
    @Data-drone
    I am trying to use cross-validation with sktime. I have set up a CV with SlidingWindowSplitter with the forecasting horizon set to 7; however, when I look at the results object, under y_pred and y_test I just see one value, not 7?
    Markus Löning
    @mloning
    Hi @Data-drone what do you mean by results object? Do you use evaluate or ForecastingGridSearchCV?
    Brian
    @Data-drone
    I used evaluate which I ran like evaluation_results = evaluate(forecaster=forecaster, y=train['target'].astype('float64'), cv=cv, return_data=True)
    Markus Löning
    @mloning
    @Data-drone okay your code looks fine to me, could you post a code snippet to reproduce that behaviour?
    Brian
    @Data-drone
    import pandas as pd
    import numpy as np
    from sktime.datasets import load_airline
    from sktime.forecasting.base import ForecastingHorizon
    from sktime.forecasting.theta import ThetaForecaster
    from sktime.forecasting.model_selection import SlidingWindowSplitter
    from sktime.forecasting.model_evaluation import evaluate
    
    y = load_airline()
    
    forecaster = ThetaForecaster(sp=12)  # monthly seasonal periodicity
    
    cv = SlidingWindowSplitter(fh=7, window_length=28, step_length=7)
    
    evaluation_results = evaluate(forecaster=forecaster, y=y, 
             cv=cv, return_data=True)
    
    print("Length train: {0}".format(len(evaluation_results.y_train[0])))
    
    print("Length pred: {0}".format(len(evaluation_results.y_pred[0])))
    Here is some working code. Based on the settings in SlidingWindowSplitter, I thought that len(evaluation_results.y_pred[0]) would be 7.
    Chaim Yosef Glancz
    @chaimglancz
    Hi Brian @Data-drone, I think this works: fh=[1,2,3,4,5,6,7]
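    A minimal sketch of that fix applied to the snippet above (same imports, forecaster and data assumed):

    # pass the full horizon [1, ..., 7] instead of the single step 7
    cv = SlidingWindowSplitter(fh=[1, 2, 3, 4, 5, 6, 7], window_length=28, step_length=7)

    evaluation_results = evaluate(forecaster=forecaster, y=y, cv=cv, return_data=True)

    print("Length pred: {0}".format(len(evaluation_results.y_pred[0])))  # now 7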
    Chaim Yosef Glancz
    @chaimglancz
    Hi @mloning, I'm working on the sliding vs. expanding WindowSplitter and it looks like there is a mistake in there, but maybe I'm wrong.
    I put a print statement in the model's files and here is how it looks.
    We see that the sliding window doesn't get a small window the first time, as you can see from the errors:
    https://drive.google.com/file/d/1qsuTRKFJcL4eFXiBIuguM-Mu0wkpQrRP/view?usp=sharing
    https://drive.google.com/file/d/1yD25rlej5sheo9nWcrfEFBA_sEfdf19B/view?usp=sharing
    Brian
    @Data-drone
    @chaimglancz Ah okay, thanks. I thought if I set the horizon to 7 it would forecast 7 periods, not just the value for 7 periods ahead.
    Ilyas Moutawwakil
    @IlyasMoutawwakil
    Hi, I was wondering why we can't get past predictions (by that I mean predictions made by the model on the train set) in sktime. One of the most fundamental ways to compare models is to compare their performance on the train set relative to the test set to see if they're over-fitting: if the model's performance on the train set is much better than on the test/validation set, it is over-fitting. But in sktime we can only (I guess) see future predictions, because the fh argument starts at 1 (the first prediction after the train set), which means we can only compare y_pred to y_test (as in cross-validation).
    Most of the forecasters are from statsmodels or return a statsmodels (or Prophet) forecaster, where it's possible to make the predictions I'm mentioning above.
    Franz Király
    @fkiraly
    @ilyasmoutawwakil, yes, you can do backtesting etc.; check out the forecasting tutorial (a function you can use for that is evaluate). For forecasters, the general model is that a forecaster can look at everything it has seen, so using a negative fh will not result in proper forecasts; the evaluation would be overly optimistic.
    Ilyas Moutawwakil
    @IlyasMoutawwakil
    So I'm trying to do some in-sample predictions. It works with some forecasters (Prophet) but not all of them (ARIMA). It raises an error saying "In-sample predictions undefined for start=0 when d=1".
    Can someone explain why, please?
    Franz Király
    @fkiraly
    this was discussed in alan-turing-institute/sktime#1076
    the gist of it, I believe, is that ARIMA - as an abstract method - can't do this, as opposed to being an issue with the implementation
    in your opinion: what should be happening, if you ask for in-sample predictions for the earliest point (the one with index "0")?
    would be nice to see an example
    note that, naively, an ARIMA model with difference parameter d will already need to have seen d data points before you can apply it to forecast, so it's not clear what a forecast would be before you have seen d points
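    A tiny numeric illustration of that point, assuming d=1 for simplicity:

    import numpy as np

    y = np.array([10.0, 12.0, 11.0, 15.0])
    dy = np.diff(y)  # first differences y[t] - y[t-1], only defined for t >= 1

    # inverting the differencing needs the previous level: y[t] = y[t-1] + dy[t];
    # there is no y[-1], so a level forecast at index 0 is undefined when d=1
    print(dy)  # [ 2. -1.  4.]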
    Ilyas Moutawwakil
    @IlyasMoutawwakil
    Oh right, now I see. Since I used to use ARIMA models from statsmodels with d=0, I forgot that AutoARIMA might return a model with d>0.
    What about returning None for points a model can't predict? (That makes sense in lots of contexts.)
    Franz Király
    @fkiraly
    hmmmmmmm - yes, that sounds very sensible! I've updated alan-turing-institute/sktime#859 to be a "good first issue" with that request.
    (PS: if you haven't seen it, kindly resolve the merge conflict in your nice prophet upgrade alan-turing-institute/sktime#1378 so we can merge the PR)