Welcome to clana’s documentation!
How to use clana with MNIST
Prerequisites
Install clana and execute the example:
$ pip install clana
$ python mnist_example.py
This will generate the clana files.
Usage
distribution
$ clana distribution --gt gt-test.csv
11.35% 1 (1135 elements)
10.32% 2 (1032 elements)
10.28% 7 (1028 elements)
10.10% 3 (1010 elements)
10.09% 9 (1009 elements)
9.82% 4 ( 982 elements)
9.80% 0 ( 980 elements)
9.74% 8 ( 974 elements)
9.58% 6 ( 958 elements)
8.92% 5 ( 892 elements)
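The listing above is a per-class share of the ground truth, sorted by frequency. As a rough illustration of that computation (not clana's own code; the function names `class_distribution` and `format_distribution` are hypothetical, and the input is assumed to be a plain list with one label per element), it could be sketched like this:

```python
from collections import Counter


def class_distribution(labels):
    """Return (share, label, count) tuples, most frequent class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(count / total, label, count) for label, count in counts.most_common()]


def format_distribution(rows):
    """Render the rows in a layout similar to the listing above."""
    # Right-align the counts so that shorter numbers are padded with spaces.
    width = max(len(str(count)) for _, _, count in rows)
    return [
        f"{share:.2%} {label} ({count:>{width}} elements)"
        for share, label, count in rows
    ]


if __name__ == "__main__":
    # Toy input, not the MNIST ground truth.
    for line in format_distribution(class_distribution(["1", "1", "1", "0"])):
        print(line)
```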
get-cm
This is an intermediate step required for the visualization.
$ clana get-cm --predictions train-pred.csv --gt gt-train.csv --n 10
2019-07-02 21:53:40,547 - root - INFO - cm was written to 'cm.json'
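A confusion matrix counts, for every true class, how often each class was predicted. The sketch below shows that idea in plain Python. It is an illustration only and makes assumptions that the text above does not confirm: labels are integer class indices, rows stand for the true class and columns for the predicted class, and the helper name `confusion_matrix` is hypothetical rather than part of clana.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count how often each true class (row) was predicted as each class (column)."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        cm[true][predicted] += 1
    return cm


if __name__ == "__main__":
    # Toy data with three classes; one element of class 0 is mistaken for class 1.
    cm = confusion_matrix([0, 0, 1, 2], [0, 1, 1, 2], n_classes=3)
    for row in cm:
        print(row)
```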
visualize
$ clana visualize --cm cm.json
Score: 12634
2019-07-02 22:13:54,987 - root - INFO - n=10
2019-07-02 22:13:54,987 - root - INFO - ## Starting Score: 12634.00
2019-07-02 22:13:54,988 - root - INFO - Current: 12249.00 (best: 12249.00, hot_prob_thresh=100.0000%, step=0, swap=False)
2019-07-02 22:13:54,988 - root - INFO - Current: 10457.00 (best: 10457.00, hot_prob_thresh=100.0000%, step=1, swap=False)
2019-07-02 22:13:54,988 - root - INFO - Current: 10453.00 (best: 10453.00, hot_prob_thresh=100.0000%, step=3, swap=False)
2019-07-02 22:13:54,988 - root - INFO - Current: 10340.00 (best: 10340.00, hot_prob_thresh=100.0000%, step=6, swap=True)
2019-07-02 22:13:54,989 - root - INFO - Current: 10166.00 (best: 10166.00, hot_prob_thresh=100.0000%, step=14, swap=True)
2019-07-02 22:13:54,989 - root - INFO - Current: 9644.00 (best: 9644.00, hot_prob_thresh=100.0000%, step=17, swap=True)
2019-07-02 22:13:54,989 - root - INFO - Current: 9617.00 (best: 9617.00, hot_prob_thresh=100.0000%, step=19, swap=True)
2019-07-02 22:13:54,990 - root - INFO - Current: 9528.00 (best: 9528.00, hot_prob_thresh=100.0000%, step=38, swap=False)
2019-07-02 22:13:54,992 - root - INFO - Current: 9297.00 (best: 9297.00, hot_prob_thresh=100.0000%, step=86, swap=True)
2019-07-02 22:13:54,993 - root - INFO - Current: 9092.00 (best: 9092.00, hot_prob_thresh=100.0000%, step=109, swap=True)
2019-07-02 22:13:54,994 - root - INFO - Current: 9018.00 (best: 9018.00, hot_prob_thresh=100.0000%, step=123, swap=True)
Score: 9018
Perm: [0, 6, 5, 3, 8, 1, 2, 7, 9, 4]
2019-07-02 22:13:55,029 - root - INFO - Classes: [0, 6, 5, 3, 8, 1, 2, 7, 9, 4]
Accuracy: 94.34%
2019-07-02 22:13:55,152 - root - INFO - Save figure at '/home/moose/confusion_matrix.tmp.pdf'
2019-07-02 22:13:55,269 - root - INFO - Found threshold for local connection: 258
2019-07-02 22:13:55,269 - root - INFO - Found 9 clusters
2019-07-02 22:13:55,270 - root - INFO - silhouette_score=-0.0067092812311967
1: [0]
1: [6]
1: [5]
1: [3]
1: [8]
1: [1]
1: [2]
2: [7, 9]
1: [4]
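The output reports a class permutation and an accuracy. Reordering classes moves rows and columns of the confusion matrix together, so the diagonal entries, and therefore the accuracy, stay the same. The following sketch demonstrates this on a toy matrix. It is not clana's implementation: the helper names are hypothetical, and the matrix is assumed to be a square nested list with true classes as rows.

```python
def accuracy(cm):
    """Share of elements on the diagonal, i.e. correctly classified elements."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def apply_permutation(cm, perm):
    """Reorder rows and columns of cm so that class perm[k] ends up at position k."""
    return [[cm[i][j] for j in perm] for i in perm]


if __name__ == "__main__":
    cm = [[5, 1, 0], [2, 3, 0], [0, 1, 4]]
    permuted = apply_permutation(cm, [2, 0, 1])
    # Both values are identical: the permutation only changes the layout.
    print(accuracy(cm), accuracy(permuted))
```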
The following file formats are used within clana.
Label Format
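The label file format is a text format. It is used to make sense of the prediction: the outputs of a classifier are given in the same order as the labels in this file, so the order of the lines matters.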
Specification
One label per line.
It is a CSV file with ; as the delimiter and " as the quoting character.
The first value is a short version of the label. It has to be unique over all short versions.
The second value is a long version of the label. It has to be unique over all long versions.
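The rules above map directly onto Python's standard `csv` module. The sketch below reads a label file and checks both uniqueness requirements. The function name `read_labels` is hypothetical and this is not how clana itself necessarily loads the file.

```python
import csv
import io


def read_labels(lines):
    """Parse label lines into (short, long) tuples, keeping the file order."""
    labels = []
    seen_short, seen_long = set(), set()
    for row in csv.reader(lines, delimiter=";", quotechar='"'):
        if not row:
            continue  # tolerate blank lines (an assumption, not part of the spec)
        short_name, long_name = row[0], row[1]
        if short_name in seen_short:
            raise ValueError(f"duplicate short label: {short_name!r}")
        if long_name in seen_long:
            raise ValueError(f"duplicate long label: {long_name!r}")
        seen_short.add(short_name)
        seen_long.add(long_name)
        labels.append((short_name, long_name))
    return labels


if __name__ == "__main__":
    # Any iterable of lines works, for example an open file object.
    print(read_labels(io.StringIO("car;car\ncat;cat\ndog;dog\n")))
```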
Example
Computer Vision
car;car
cat;cat
dog;dog
mouse;mouse
mnist.csv:
0;0
1;1
2;2
3;3
4;4
5;5
6;6
7;7
8;8
9;9
Language Identification
German;de
English;en
French;fr
Classification Dump Format
TODO: THIS IS WAY TOO BIG!
The classification dump format is a text format. It describes the output of a classifier for some inputs.
Specification
The Classification Dump Format is a text format.
Each line contains exactly one output of the classifier for one input.
It is a CSV file with ; as the delimiter and " as the quoting character.
The first value is an identifier for the input. It is no longer than 60 characters.
The second and following values are the outputs for each label. Each of those values is a number in [0, 1].
The outputs are in the same order as in the related label.csv file.
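A reader for this format can enforce the constraints while parsing. The sketch below is one possible way to do that, not clana's own loader. The function name `read_classification_dump` and the `n_labels` parameter (the number of lines in the related label file) are assumptions made for illustration.

```python
import csv
import io


def read_classification_dump(lines, n_labels):
    """Parse dump lines into (identifier, scores) tuples and validate each one."""
    results = []
    for row in csv.reader(lines, delimiter=";", quotechar='"'):
        if not row:
            continue  # tolerate blank lines (an assumption, not part of the spec)
        identifier = row[0]
        scores = [float(value) for value in row[1:]]
        if len(identifier) > 60:
            raise ValueError(f"identifier longer than 60 characters: {identifier!r}")
        if len(scores) != n_labels:
            raise ValueError(f"expected {n_labels} outputs, got {len(scores)}")
        if any(not 0.0 <= score <= 1.0 for score in scores):
            raise ValueError(f"output outside [0, 1] for {identifier!r}")
        results.append((identifier, scores))
    return results


if __name__ == "__main__":
    dump = io.StringIO("identifier 1;0.1;0.3;0.6\nident 2;0.8;0.1;0.1\n")
    for identifier, scores in read_classification_dump(dump, n_labels=3):
        print(identifier, scores)
```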
Example
identifier 1;0.1;0.3;0.6
ident 2;0.8;0.1;0.1
Ground Truth Format
The Ground Truth Format is a text file format. It is used to describe the ground truth of data.
Specification
Each line contains the ground truth of exactly one element.
It is a CSV file with ; as the delimiter and " as the quoting character.
The first value is an identifier for the input. It is no longer than 60 characters.
The second and following values are the outputs for each label. Each of those values is a number in [0, 1].
The outputs are in the same order as in the related label.csv file.
Example
identifier 1;1;0;1
identifier 1;0.5;0;0.5