data format of UCR2018 #27

Open
guoxishu opened this issue May 11, 2020 · 8 comments

Comments

@guoxishu

In utils.py, the code calls "pd.read_csv(..._TRAIN.tsv)", but the official website now only provides the data in .ts format. If the .ts data is used, "y_train = df_train.values[:,0]" obviously fails. Could you add a comment about the expected data shape? I'm quite confused.
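For reference, the layout that the `pd.read_csv(..._TRAIN.tsv)` / `df_train.values[:,0]` code expects can be sketched as follows. This assumes the UCR 2018 .tsv convention of a tab-separated file with no header row and the class label in the first column; `read_ucr_tsv` is a hypothetical helper, not a function from utils.py, and the sample values are made up.

```python
import io

import numpy as np
import pandas as pd


def read_ucr_tsv(path_or_buffer):
    """Read one split of a UCR 2018 .tsv file (hypothetical helper).

    Assumed row layout: label <TAB> x_1 <TAB> x_2 ... <TAB> x_n, no header row.
    """
    df = pd.read_csv(path_or_buffer, sep="\t", header=None)
    y = df.values[:, 0]                          # first column: class label
    x = df.values[:, 1:].astype(np.float64)      # remaining columns: the series
    return x, y


# Tiny in-memory stand-in for a real Coffee_TRAIN.tsv file (made-up values).
sample = "1\t0.5\t0.7\t0.9\n2\t0.1\t0.2\t0.3\n"
x_train, y_train = read_ucr_tsv(io.StringIO(sample))
print(x_train.shape, y_train)
```

So each row of the .tsv is one series, and the array handed to the classifiers has shape (number of series, series length).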

@hfawaz
Owner

hfawaz commented May 11, 2020

I am not quite sure which format is now available, I will get back to you once I re-check the UCR archive.

@oceanfly

oceanfly commented Jun 3, 2020

Hi Sir, I have the same question here. Do you have any updates on the data format? Thanks!

@oceanfly

oceanfly commented Jun 3, 2020

More specifically, this is the error I get:

python3 main.py TSC Coffee fcn _itr_8
Method: TSC Coffee fcn _itr_8
Traceback (most recent call last):
  File "main.py", line 150, in
    datasets_dict = read_dataset(root_dir, archive_name, dataset_name)
  File "/Users/taosun/Documents/GitHub/dl-4-tsc/utils/utils.py", line 105, in read_dataset
    x_train, y_train = readucr(file_name + '_TRAIN')
  File "/Users/taosun/Documents/GitHub/dl-4-tsc/utils/utils.py", line 33, in readucr
    data = np.loadtxt(filename, delimiter=',')
  File "/Users/taosun/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1146, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "/Users/taosun/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1074, in read_data
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/Users/taosun/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1074, in
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/Users/taosun/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 781, in floatconv
    return float(x)
ValueError: could not convert string to float: '@problemName Coffee'
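The `ValueError` comes from `np.loadtxt` hitting the `@problemName Coffee` header line at the top of the .ts file. A minimal sketch of reading such a file instead is below. It assumes an equal-length, univariate .ts file whose header lines start with `@` (ending with `@data`), whose comment lines start with `#`, and whose data lines look like `v1,v2,...,vn:label`; multivariate, variable-length or timestamped files are not handled. `read_univariate_ts` is a hypothetical name and the sample is made up.

```python
import io

import numpy as np


def read_univariate_ts(lines):
    """Minimal reader for an equal-length, univariate .ts file (hypothetical helper).

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    xs, ys = [], []
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not in_data:
            # Skip header tags such as '@problemName Coffee' until '@data'.
            if line.lower() == "@data":
                in_data = True
            continue
        # Assumed data line format: comma-separated values, then ':' and the label.
        values, label = line.rsplit(":", 1)
        xs.append([float(v) for v in values.split(",")])
        ys.append(label)
    return np.array(xs, dtype=np.float64), np.array(ys)


# Hypothetical two-series sample in the assumed format.
sample = (
    "# made-up example\n"
    "@problemName Coffee\n"
    "@univariate true\n"
    "@classLabel true 0 1\n"
    "@data\n"
    "0.1,0.2,0.3:0\n"
    "0.4,0.5,0.6:1\n"
)
x, y = read_univariate_ts(io.StringIO(sample))
print(x.shape, y)
```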

@L-Medici

I had the same problem and solved it by using Coffee_TRAIN.txt and Coffee_TEST.txt instead of the ones in .ts format. On line 33 of utils.py you can see data = np.loadtxt(filename, delimiter=','). There you just need to replace the "," with two spaces, since the .txt files use a different separator. Now I get a different error, but at least that gets you past this one XD
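Rather than hard-coding two spaces, a sketch that lets `np.loadtxt` split on any run of whitespace (its behaviour when `delimiter` is left as `None`) is below, so leading spaces or a varying number of spaces between values do not matter. It assumes, like `readucr`, that the first column of each row is the class label; `read_ucr_txt` is a hypothetical name and the sample values are made up.

```python
import io

import numpy as np


def read_ucr_txt(path_or_file):
    """Read a whitespace-separated .txt split: label first, then the series."""
    # delimiter=None splits on any whitespace run; ndmin=2 keeps a
    # single-row file two-dimensional so the slicing below still works.
    data = np.loadtxt(path_or_file, delimiter=None, ndmin=2)
    y = data[:, 0]   # first column: class label (assumed, as in readucr)
    x = data[:, 1:]  # remaining columns: the series
    return x, y


# Made-up sample with irregular spacing, similar in style to the .txt files.
sample = (
    "  1.0000000e+00  -5.0000000e-01   2.5000000e-01\n"
    "  2.0000000e+00   1.0000000e+00   0.0000000e+00\n"
)
x, y = read_ucr_txt(io.StringIO(sample))
print(x.shape, y)
```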

@andrew128

andrew128 commented Oct 28, 2020

I think the problem is that the UCR dataset was updated in 2018, which changed the formatting (and added new datasets). I found the old dataset through this link, which appears to work. Hope this helps anyone running into this!

@guoxishu
Author

guoxishu commented Oct 28, 2020 via email

@xuyxu

xuyxu commented Dec 28, 2020

I managed to run the baselines on the UCR datasets in .arff format with the following modifications:

  • Install liac-arff: pip install liac-arff;
  • Add import arff in the header of utils.py;
  • Add the following function for reading .arff data to utils.py:
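import os

import arff  # liac-arff
import numpy as np
from sklearn.preprocessing import LabelEncoder


def load_data(datapath):
    """ Load .arff dataset on univariate time series classification """
    # Dataset name = last directory of datapath (works with or without a trailing '/')
    name = os.path.basename(os.path.normpath(datapath))
    trainfile = name + '_TRAIN.arff'
    testfile = name + '_TEST.arff'

    # liac-arff returns a dict; 'data' holds one list of values per instance
    with open(os.path.join(datapath, trainfile), 'r') as f:
        train = arff.load(f)['data']
    with open(os.path.join(datapath, testfile), 'r') as f:
        test = arff.load(f)['data']

    # Post-processing: every attribute but the last is a time step, the last is the class
    x_train, y_train = [], []
    for row in train:
        x_train.append(row[:-1])
        y_train.append(row[-1])
    x_train = np.vstack(x_train)
    enc = LabelEncoder()
    y_train = enc.fit_transform(y_train)  # label mapping is fitted on the training split only

    x_test, y_test = [], []
    for row in test:
        x_test.append(row[:-1])
        y_test.append(row[-1])
    x_test = np.vstack(x_test)
    y_test = enc.transform(y_test)

    return x_train, y_train, x_test, y_test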
  • Replace the code snippet on lines 76-90 of utils.py with x_train, y_train, x_test, y_test = load_data(root_dir_dataset)
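If scikit-learn is not available, the post-processing step of `load_data` (split off the last attribute as the label, then encode labels with a mapping fitted on the training split) can be sketched with NumPy alone. `split_rows` and `encode_labels` are hypothetical names, and the rows below are made-up stand-ins for what an .arff reader returns; `np.unique` sorts the classes, which matches the ordering `LabelEncoder` uses.

```python
import numpy as np


def split_rows(rows):
    """Split ARFF-style rows [v1, ..., vn, label] into a float matrix and labels."""
    x = np.array([row[:-1] for row in rows], dtype=np.float64)
    labels = np.array([str(row[-1]) for row in rows])
    return x, labels


def encode_labels(train_labels, test_labels):
    """Map labels to 0..k-1 using the sorted classes of the training split."""
    classes, y_train = np.unique(train_labels, return_inverse=True)
    y_test = np.searchsorted(classes, test_labels)
    # searchsorted only returns insertion positions, so check each test label really is a known class.
    in_range = y_test < len(classes)
    matches = classes[np.minimum(y_test, len(classes) - 1)] == test_labels
    if not np.all(in_range & matches):
        raise ValueError("test split contains a label not present in the training split")
    return classes, y_train, y_test


train_rows = [[0.1, 0.2, "b"], [0.3, 0.4, "a"]]
test_rows = [[0.5, 0.6, "a"]]
x_train, train_labels = split_rows(train_rows)
x_test, test_labels = split_rows(test_rows)
classes, y_train, y_test = encode_labels(train_labels, test_labels)
print(classes, y_train, y_test)
```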

Hope this could be helpful for someone else ;-)

@YHY-10

YHY-10 commented May 9, 2022

Hello, I have found the data in .tsv format on this website: https://www.cs.ucr.edu/~eamonn/time_series_data_2018/
Hope this helps!
I also tried changing the code to run on the data in .arff format using "arff.loadarff", and that worked. Running on the .txt format failed, however.
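For the "arff.loadarff" route, a sketch using SciPy's `scipy.io.arff.loadarff` (assumed here to be the function meant, since liac-arff's reader is `arff.load`) might look like this. It assumes the class label is the last attribute and every other attribute is one time step; `load_arff_split` is a hypothetical name and the sample file is made up. SciPy returns nominal values as bytes, hence the decode step.

```python
import io

import numpy as np
from scipy.io import arff


def load_arff_split(path_or_file):
    """Read one .arff split with SciPy; assumes the class is the last attribute."""
    data, meta = arff.loadarff(path_or_file)
    names = data.dtype.names
    # Every attribute except the last is one time step of the series.
    x = np.column_stack([data[name] for name in names[:-1]]).astype(np.float64)
    # Nominal (class) values come back as bytes, so decode them to str.
    y = np.array([v.decode("utf-8") if isinstance(v, bytes) else str(v)
                  for v in data[names[-1]]])
    return x, y


# Made-up two-instance file in ARFF syntax.
sample = (
    "@relation sample\n"
    "@attribute att1 numeric\n"
    "@attribute att2 numeric\n"
    "@attribute target {a,b}\n"
    "@data\n"
    "0.1,0.2,a\n"
    "0.3,0.4,b\n"
)
x, y = load_arff_split(io.StringIO(sample))
print(x.shape, y)
```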
My setup is an RTX 3060 with CUDA 11.5 and cuDNN 8.4, and with it I ran into this error:
"Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node model/conv1d/conv1d (defined at C:\Users\11642\Desktop\科研\第四周\dl-4-tsc-master_origin\classifiers\fcn.py:73) ]] [Op:__inference_train_function_1608]"
Don't worry! This is not a version mismatch but insufficient GPU memory. Add the code below at the top of main.py:
import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
for dev in physical_devices:  # when using multiple GPUs
    tf.config.experimental.set_memory_growth(dev, True)
This makes TensorFlow allocate GPU memory as needed instead of reserving all of it up front.
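TensorFlow also documents an environment-variable way of enabling the same memory-growth behaviour, `TF_FORCE_GPU_ALLOW_GROWTH`. A sketch, assuming it is placed at the very top of main.py so the variable is set before TensorFlow is imported:

```python
import os

# Must run before TensorFlow initialises the GPU, i.e. before `import tensorflow`.
# Setting it to "true" enables memory growth for every visible GPU.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import TensorFlow only after the variable is set
```

This avoids touching each device in code, at the cost of the setting being less visible than the explicit loop.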
