I have a question about the clustering part of the package. I did not
find an answer in the package manuals, so I hope you can help me.
I have 3 clusters: {2,12,30}, {4,3,11}, and {10,20,25}. I want to use
the K-means learning scheme.
My problem is how to put this data into a csv or arff file so that the
software interprets it correctly.
I did not have any luck with an .arff file that looks like the following:
@relation Kmeans
@attribute cluster1 real
@attribute cluster2 real
@attribute cluster3 real
@data
2,12,30
4,3,11
10,20,25
Therefore, I decided to use a .csv file:
@relation Kmeans
2,4,10
12,3,20
30,11,25
The Classify part of the package recognized 3 clusters and calculated
their means. But when I applied the K-means learning scheme to this data,
the final clusters stayed the same and were not regrouped according to
the new means.
I am also not sure what my arff and csv files should look like.
What would you suggest to solve this problem?
Your help is greatly appreciated.
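
One possible reading of the question, offered as an assumption rather
than a confirmed answer: each row of the @data section is one instance
and each @attribute is one measured feature, so the nine numbers would
go in as nine rows of a single numeric attribute rather than as three
columns named after clusters. The sketch below is plain Python, not
Weka's own k-means implementation; the helper names `to_arff` and
`kmeans_1d` and the attribute name `value` are hypothetical. It writes
the values in that layout and runs a basic k-means loop seeded with the
means of the three original groups.

```python
def to_arff(values, relation="Kmeans", attribute="value"):
    """Return ARFF text with one numeric attribute and one row per value."""
    lines = [f"@relation {relation}", f"@attribute {attribute} real", "@data"]
    lines += [str(v) for v in values]
    return "\n".join(lines)


def kmeans_1d(values, centers, max_iter=100):
    """Basic k-means (Lloyd's algorithm) on one-dimensional data."""
    clusters = []
    for _ in range(max_iter):
        # Assignment step: put each value into the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # centers stopped moving, so the assignment is stable
        centers = new_centers
    return clusters, centers


initial_groups = [[2, 12, 30], [4, 3, 11], [10, 20, 25]]
values = [v for group in initial_groups for v in group]
start = [sum(g) / len(g) for g in initial_groups]  # means of the original groups

arff_text = to_arff(values)
clusters, centers = kmeans_1d(values, start)
print(arff_text)
print(clusters)
```

Starting from those means, the loop regroups the values into {2,3,4},
{10,11,12}, and {20,25,30}. A clusterer that picks its own starting
centers may follow a different path.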

Hello All,
Leave-one-out problem:
I wanted to apply leave-one-out in Weka. I thought that setting the
number of folds to 1 in cross-validation (either in the Explorer or
the Experimenter) would do the job, but the number of folds must be
at least 2. So how can I apply leave-one-out in Weka?
Summary problem:
The summary field in the Experimenter results contains "?", and
because of this I cannot get the summary value. Why does the summary
field contain "?"?
Header display:
How can I display the column key field names in the test output
area? For example, I select training number, percent correct, etc.
in the column key field, but Weka just displays the values. It is
difficult to read the values without the key field names.
Shuffling patterns:
In cross-validation, does Weka take the instances (patterns or
samples) at random, or does it take them in order?
I look forward to your reply,
Haleh
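
On the leave-one-out and shuffling questions: leave-one-out is the
special case of k-fold cross-validation in which the number of folds
equals the number of instances, so each fold holds out exactly one
instance. The sketch below illustrates that in plain Python; it is not
Weka code, and the function name `kfold_indices` and its `seed`
parameter are hypothetical. Whether and how Weka shuffles is not stated
in this thread, so the seeded shuffle is only an assumption about one
common approach.

```python
import random


def kfold_indices(n_instances, n_folds, seed=None):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= n_folds <= n_instances:
        raise ValueError("n_folds must be between 2 and n_instances")
    order = list(range(n_instances))
    if seed is not None:
        # Assumed behaviour: shuffle once, reproducibly, before splitting.
        random.Random(seed).shuffle(order)
    for fold in range(n_folds):
        test = order[fold::n_folds]  # every n_folds-th instance goes to this fold
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        yield train, test


# Leave-one-out: as many folds as instances.
n = 5
loo_splits = list(kfold_indices(n, n_folds=n, seed=1))
for train, test in loo_splits:
    print("train:", train, "test:", test)
```

Passing `seed=None` keeps the instances in their original order, which
makes the difference between ordered and shuffled folds easy to compare.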

I tried to use the database access (it is already set up), but I could
not get it to work. It did not read the data and only says
'Couldn't read from database: null'.
I want to use a MySQL database, but I don't know what is wrong.
This is urgent; I need your help.

Hello,
I plan to make a contribution to the Weka toolbox. I thought there was
an email address for that purpose, but I can no longer find it on the
website. How should I proceed?
Best regards,
Nicolas

Phyto Ner wrote:
> When comparing multiple classifiers using ROC curves (ThresholdCurve),
> I can find an optimal classifier for my practical constraints. Let's
> say it is a given classifier at a threshold of 0.1 on class C.
>
> How should I interpret this threshold of 0.1?
>
> Does it mean that the classifier should always guess class C when the
> probability of class C in
> "classifier.distributionForInstance(Instance)" is higher than 0.1
> (even if another class has a higher probability)?
Yes. You can equivalently view the threshold as a re-weighting of the
posterior. So a threshold of 0.1 on class + means that 0.1 for + ties
with 0.9 for -; in other words, the class + probabilities should be
weighted with a factor 5 and the class - probabilities with a factor
5/9. After re-weighting you can again choose the class which maximises
the (re-weighted) posterior.
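
To make the arithmetic concrete, here is a small sketch in plain Python
(not Weka code; the function names are hypothetical). It assumes two
classes and normalises the weights so that the tie point scores 0.5 on
both sides, which reproduces the factors 5 and 5/9 for a threshold of
0.1, and it compares the threshold rule with the re-weighted argmax
away from the tie.

```python
def weights_from_threshold(t):
    """Class weights (w_pos, w_neg) under which p_pos = t ties with p_neg = 1 - t."""
    if not 0.0 < t < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return 1.0 / (2.0 * t), 1.0 / (2.0 * (1.0 - t))


def predict_by_threshold(p_pos, t):
    """Predict '+' when the posterior for '+' exceeds the threshold."""
    return "+" if p_pos > t else "-"


def predict_by_weights(p_pos, w_pos, w_neg):
    """Predict the class with the larger re-weighted posterior."""
    return "+" if p_pos * w_pos > (1.0 - p_pos) * w_neg else "-"


w_pos, w_neg = weights_from_threshold(0.1)
print(w_pos, w_neg)  # approximately 5 and 5/9

for p in (0.05, 0.2, 0.5, 0.95):
    print(p, predict_by_threshold(p, 0.1), predict_by_weights(p, w_pos, w_neg))
```

Only the ratio of the two weights matters for the decision; the
normalisation used here is just one convenient choice.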
It is often more natural to think in terms of weights, particularly
because thresholds don't generalise to more than two classes. See
@inproceedings{lachiche-flach-icml03,
author={N. Lachiche and P.A. Flach},
title={Improving accuracy and cost of two-class and multi-class
probabilistic classifiers using ROC curves},
booktitle={Proc. 20th International Conference on Machine Learning
(ICML'03)},
ISBN={1-57735-189-4},
publisher={AAAI Press},
pages={416--423},
month={January},
year={2003},
abstract={The probability estimates of a naive Bayes classifier are
inaccurate if some of its underlying independence assumptions are
violated. The decision criterion for using these estimates for
classification therefore has to be learned from the data. This paper
proposes the use of ROC curves for this purpose. For two classes, the
algorithm is a simple adaptation of the algorithm for tracing a ROC
curve by sorting the instances according to their predicted probability
of being positive. As there is no obvious way to upgrade this
algorithm to the multi-class case, we propose a hill-climbing
approach which adjusts the weights for each class in a pre-defined
order. Experiments on a wide range of datasets show the proposed
method leads to significant improvements over the naive Bayes
classifier's accuracy. Finally, we discuss a method to find the
global optimum, and show how its computational complexity would make
it intractable.},
abstract-url={http://www.cs.bris.ac.uk/Publications/pub_info.jsp?id=1000706},
url={http://www.cs.bris.ac.uk/Publications/Papers/1000706.pdf},
pubtype={102}
}
Hope this helps,
--Peter

Hello,
My name is Julien Velcin and I have been using Weka for a year and a
half for my research. I am currently having trouble because I need to
save clustering results to an ARFF file in order to evaluate the
categorization, but using command-line instructions. I know that this
is possible (and easy) with the GUI, but I have to automate the whole
process, so I must use the command line instead. Could somebody help,
please? Thank you,
Julien

Hello,
When comparing multiple classifiers using ROC curves (ThresholdCurve),
I can find an optimal classifier for my practical constraints. Let's
say it is a given classifier at a threshold of 0.1 on class C.
How should I interpret this threshold of 0.1?
Does it mean that the classifier should always guess class C when the
probability of class C in
"classifier.distributionForInstance(Instance)" is higher than 0.1
(even if another class has a higher probability)?
thanks,
David
U. of Ottawa

Hi all,
First, thanks to the weka team for making a very
good tool.
I have a question about Bayes nets. According to the Weka Bayes
net tutorial, it is possible to input a BIF file to compare
against. Is it possible to pass this BIF file as input to the
local search algorithm so that, instead of starting from
scratch, the algorithm starts from an existing graph?
Thank you,
Amira