what's going on with sktime?
It is super late at night and I have fallen into an internet rabbit hole of the timeseries machine learning community. I've been working with a lot of time domain data for a while now, and of the several good python libraries available, sktime is one that I haven't used much. My go-to ones have been: 1. tsfresh for its feature extraction capabilities, 2. tslearn for standard machine learning algorithms, 3. stumpy for all things matrix profile, and 4. tsai for deep learning based algorithms.
I have been playing around with a short project idea: examining clustering performance on all available timeseries datasets in the mighty UCR/UEA timeseries classification archive using various timeseries feature-sets, like the ones available in tsfresh and the increasingly popular Catch-22 features from the pycatch22 package¹. So imagine my delight when I went to timeseriesclassification.com to fetch all the datasets and read: "The scikit-learn compatible aeon toolkit contains the state of the art algorithms for time series classification. All of the datasets and results stored here are directly accessible in code using aeon." I thought, "Awesome, everything I need in one place!". So I went and checked out aeon, and it is fantastic. But my brain must have done a random access of some forgotten recess of my mind, because I found myself thinking, "Huh, this looks familiar. Almost like sktime…". And indeed that's when I fell into the current rabbit hole from whence I write this post.
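For the curious, here is roughly the shape of the experiment I have in mind. This is just a minimal sketch, assuming aeon's load_classification loader, pycatch22's catch22_all function, and plain scikit-learn k-means, with "GunPoint" standing in as a hypothetical example dataset; the real plan is to loop over the whole archive.

```python
# Rough sketch: fetch a UCR dataset with aeon, compute Catch-22 features per
# series with pycatch22, cluster the feature vectors, and compare the cluster
# assignments against the known class labels.
import numpy as np
from aeon.datasets import load_classification  # fetches from timeseriesclassification.com
import pycatch22
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# "GunPoint" is just a stand-in; the plan is to iterate over every dataset.
X, y = load_classification("GunPoint")  # X: (n_cases, n_channels, n_timepoints)

# One 22-dimensional Catch-22 feature vector per (univariate) series.
features = np.array(
    [pycatch22.catch22_all(series[0].tolist())["values"] for series in X]
)

# Cluster with as many clusters as there are classes, then check agreement.
n_classes = len(np.unique(y))
labels = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit_predict(features)
print("Adjusted Rand index vs. class labels:", adjusted_rand_score(y, labels))
```

The adjusted Rand index (or something like it) would then be compared against published classification accuracies to see where unsupervised structure and supervised separability agree.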
Turns out there is some drama-llama stuff around this whole sktime vs. aeon saga. aeon is a fork of sktime created by Tony Bagnall of UEA, who departed(?) from sktime after another core developer allegedly took over the project and kicked others out(?) following a fallout over some financial issues(?). The whole thing sounds like a bit of a mess. Anyhoo, I have no horse in the race; preliminary examination suggests either library is fine for my purposes. I am going to go with aeon for the aforementioned short project. And this Alice needs to climb out of this hole and go to bed now.
-
1. I know, I know, some of the datasets available in the UCR/UEA archive are perhaps not amenable to feature-based classification and are more separable in terms of shapes, or a combination of both, but I digress. My plan is to see how much consensus there is between clustering and classification performance on the datasets.