weka.clusterers
Class sIB

java.lang.Object
  extended by weka.clusterers.AbstractClusterer
      extended by weka.clusterers.RandomizableClusterer
          extended by weka.clusterers.sIB
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, Clusterer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

public class sIB
extends RandomizableClusterer
implements TechnicalInformationHandler

Cluster data using the sequential information bottleneck algorithm.

Note: only the hard clustering scheme is supported. sIB assigns each instance to the cluster that has the minimum cost/distance to that instance. The trade-off parameter beta is set to infinity, so 1/beta is zero.
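
The hard assignment described in the note can be sketched in plain Java. This is a minimal illustration, not the Weka implementation: it assumes the merge cost from the cited paper, d(x, t) = (p(x) + p(t)) * JS(p(y|x), p(y|t)), where JS is the weighted Jensen-Shannon divergence. The class name HardAssignmentSketch and all method names are hypothetical.

```java
/** Illustrative only; names are hypothetical and this is not the Weka code. */
final class HardAssignmentSketch {

    /** Kullback-Leibler divergence KL(p || q) in nats; terms with p[i] == 0 contribute 0. */
    static double kl(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0.0) {
                sum += p[i] * Math.log(p[i] / q[i]);
            }
        }
        return sum;
    }

    /** Weighted Jensen-Shannon divergence; the weights w1 and w2 must sum to 1. */
    static double js(double[] p, double w1, double[] q, double w2) {
        double[] mix = new double[p.length];
        for (int i = 0; i < p.length; i++) {
            mix[i] = w1 * p[i] + w2 * q[i];
        }
        return w1 * kl(p, mix) + w2 * kl(q, mix);
    }

    /** Assumed cost of merging an instance (prior px, distribution pyx) into a cluster (prior pt, distribution pyt). */
    static double cost(double px, double[] pyx, double pt, double[] pyt) {
        double total = px + pt;
        return total * js(pyx, px / total, pyt, pt / total);
    }

    /** Hard assignment: returns the index of the cluster with the minimum cost. */
    static int assign(double px, double[] pyx, double[] clusterPriors, double[][] clusterDists) {
        int best = 0;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int t = 0; t < clusterPriors.length; t++) {
            double c = cost(px, pyx, clusterPriors[t], clusterDists[t]);
            if (c < bestCost) {
                bestCost = c;
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] clusterDists = {{0.8, 0.1, 0.1}, {0.1, 0.1, 0.8}};
        double[] clusterPriors = {0.5, 0.4};
        double[] instance = {0.7, 0.2, 0.1};
        System.out.println(assign(0.1, instance, clusterPriors, clusterDists)); // prints 0
    }
}
```

Because beta is infinite, no soft membership probabilities are computed; each instance simply goes to the single cheapest cluster.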

For more information, see:

Noam Slonim, Nir Friedman, Naftali Tishby: Unsupervised document classification using sequential information maximization. In: Proceedings of the 25th International ACM SIGIR Conference on Research and Development in Information Retrieval, 129-136, 2002.

BibTeX:

 @inproceedings{Slonim2002,
    author = {Noam Slonim and Nir Friedman and Naftali Tishby},
    booktitle = {Proceedings of the 25th International ACM SIGIR Conference on Research and Development in Information Retrieval},
    pages = {129-136},
    title = {Unsupervised document classification using sequential information maximization},
    year = {2002}
 }
 

Valid options are:

 -I <num>
  maximum number of iterations
  (default 100).
 -M <num>
  minimum number of changes in a single iteration
  (default 0).
 -N <num>
  number of clusters.
  (default 2).
 -R <num>
  number of restarts.
  (default 5).
 -U
  do not normalize the data
  (default: the data is normalized).
 -V
  set to output debug info
  (default false).
 -S <num>
  Random number seed.
  (default 1)
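
One plausible reading of how -I and -M interact is sketched below in plain Java. It is an assumption, not the Weka source: each pass draws every instance out of its cluster and reassigns it to the cheapest cluster, and the loop stops once a pass makes no more than minChange reassignments or maxIterations passes have run. The cost used here (distance to the cluster mean of one-dimensional values) is a toy stand-in for the sIB cost, and SequentialLoopSketch is a hypothetical name.

```java
import java.util.Arrays;

/** Illustrative only; a toy stand-in, not the Weka implementation. */
final class SequentialLoopSketch {

    /** Toy cost: distance from values[i] to the mean of the current members of cluster t. */
    static double cost(double[] values, int[] assignment, int i, int t) {
        double sum = 0.0;
        int count = 0;
        for (int j = 0; j < values.length; j++) {
            if (assignment[j] == t) {
                sum += values[j];
                count++;
            }
        }
        return count == 0 ? Double.POSITIVE_INFINITY : Math.abs(values[i] - sum / count);
    }

    static int clusterSize(int[] assignment, int t) {
        int count = 0;
        for (int a : assignment) {
            if (a == t) count++;
        }
        return count;
    }

    /** Updates assignment in place and returns the number of passes performed. */
    static int optimize(double[] values, int[] assignment, int numClusters,
                        int maxIterations, int minChange) {
        int iterations = 0;
        while (iterations < maxIterations) {           // assumed meaning of -I
            int changes = 0;
            for (int i = 0; i < values.length; i++) {
                int old = assignment[i];
                if (clusterSize(assignment, old) == 1) {
                    continue;                          // do not empty a singleton cluster
                }
                assignment[i] = -1;                    // draw the instance out of its cluster
                int best = old;
                double bestCost = Double.POSITIVE_INFINITY;
                for (int t = 0; t < numClusters; t++) {
                    double c = cost(values, assignment, i, t);
                    if (c < bestCost) {
                        bestCost = c;
                        best = t;
                    }
                }
                assignment[i] = best;
                if (best != old) changes++;
            }
            iterations++;
            if (changes <= minChange) break;           // assumed meaning of -M
        }
        return iterations;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 1.2, 5.0, 5.3};
        int[] assignment = {0, 1, 0, 1};
        int passes = optimize(values, assignment, 2, 100, 0);
        // prints [1, 1, 0, 0] after 2 pass(es)
        System.out.println(Arrays.toString(assignment) + " after " + passes + " pass(es)");
    }
}
```

Under this reading, the default of 0 for -M means the loop runs until a full pass changes nothing, while a larger value trades some clustering quality for an earlier stop.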

Version:
$Revision: 5538 $
Author:
Noam Slonim, Anna Huang
See Also:
Serialized Form

Constructor Summary
sIB()
           
 
Method Summary
 void buildClusterer(Instances data)
          Generates a clusterer.
 int clusterInstance(Instance instance)
          Clusters a given instance. This is the method defined in the Clusterer interface; it simply returns the cluster assigned to the instance.
 java.lang.String debugTipText()
          Returns the tip text for this property
 Capabilities getCapabilities()
          Returns default capabilities of the clusterer.
 boolean getDebug()
          Get debug mode
 int getMaxIterations()
          Get the max number of iterations
 int getMinChange()
          Get the minimum number of changes
 boolean getNotUnifyNorm()
          Get whether normalizing instances to unify prior probability is skipped before building the clusterer
 int getNumClusters()
          Get the number of clusters
 int getNumRestarts()
          Get the number of restarts
 java.lang.String[] getOptions()
          Gets the current settings.
 java.lang.String getRevision()
          Returns the revision string.
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 java.lang.String globalInfo()
          Returns a string describing this clusterer
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
           
 java.lang.String maxIterationsTipText()
          Returns the tip text for this property.
 java.lang.String minChangeTipText()
          Returns the tip text for this property.
 java.lang.String notUnifyNormTipText()
          Returns the tip text for this property.
 int numberOfClusters()
          Get the number of clusters
 java.lang.String numClustersTipText()
          Returns the tip text for this property.
 java.lang.String numRestartsTipText()
          Returns the tip text for this property.
 void setDebug(boolean v)
          Set debug mode - verbose output
 void setMaxIterations(int i)
          Set the max number of iterations
 void setMinChange(int m)
          Set the minimum number of changes
 void setNotUnifyNorm(boolean b)
          Set whether to skip normalizing instances to unify prior probability before building the clusterer
 void setNumClusters(int n)
          Set the number of clusters
 void setNumRestarts(int i)
          Set the number of restarts
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 java.lang.String toString()
           
 
Methods inherited from class weka.clusterers.RandomizableClusterer
getSeed, seedTipText, setSeed
 
Methods inherited from class weka.clusterers.AbstractClusterer
distributionForInstance, forName, makeCopies, makeCopy
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

sIB

public sIB()
Method Detail

buildClusterer

public void buildClusterer(Instances data)
                    throws java.lang.Exception
Generates a clusterer.

Specified by:
buildClusterer in interface Clusterer
Specified by:
buildClusterer in class AbstractClusterer
Parameters:
data - the training instances
Throws:
java.lang.Exception - if something goes wrong

clusterInstance

public int clusterInstance(Instance instance)
                    throws java.lang.Exception
Clusters a given instance. This is the method defined in the Clusterer interface; it simply returns the cluster assigned to the instance.

Specified by:
clusterInstance in interface Clusterer
Overrides:
clusterInstance in class AbstractClusterer
Parameters:
instance - the instance to be assigned to a cluster
Returns:
the number of the assigned cluster as an integer
Throws:
java.lang.Exception - if instance could not be clustered successfully

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are:

 -I <num>
  maximum number of iterations
  (default 100).
 -M <num>
  minimum number of changes in a single iteration
  (default 0).
 -N <num>
  number of clusters.
  (default 2).
 -R <num>
  number of restarts.
  (default 5).
 -U
  do not normalize the data
  (default: the data is normalized).
 -V
  set to output debug info
  (default false).
 -S <num>
  Random number seed.
  (default 1)

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class RandomizableClusterer
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
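
To illustrate the shape of the options array, the following plain-Java sketch reads the flags and defaults listed above into a simple settings object. It is a hypothetical stand-in written for this page, not Weka's own parser; the class SibOptionsSketch, its fields, and the use of IllegalArgumentException are all assumptions.

```java
/** Illustrative only; a hypothetical holder for the documented flags and defaults. */
final class SibOptionsSketch {
    int maxIterations = 100;   // -I
    int minChange = 0;         // -M
    int numClusters = 2;       // -N
    int numRestarts = 5;       // -R
    boolean normalize = true;  // cleared by -U
    boolean debug = false;     // set by -V
    int seed = 1;              // -S

    /** Reads the integer value that must follow a flag such as -N. */
    private static int intArg(String[] options, int i) {
        if (i >= options.length) {
            throw new IllegalArgumentException("Missing value for " + options[i - 1]);
        }
        return Integer.parseInt(options[i]);
    }

    /** Parses an array such as {"-N", "3", "-U"}; flags and values are separate elements. */
    static SibOptionsSketch parse(String[] options) {
        SibOptionsSketch s = new SibOptionsSketch();
        for (int i = 0; i < options.length; i++) {
            switch (options[i]) {
                case "-I": s.maxIterations = intArg(options, ++i); break;
                case "-M": s.minChange = intArg(options, ++i); break;
                case "-N": s.numClusters = intArg(options, ++i); break;
                case "-R": s.numRestarts = intArg(options, ++i); break;
                case "-S": s.seed = intArg(options, ++i); break;
                case "-U": s.normalize = false; break;
                case "-V": s.debug = true; break;
                default:
                    throw new IllegalArgumentException("Unsupported option: " + options[i]);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        SibOptionsSketch s = parse(new String[] {"-N", "3", "-I", "50", "-U"});
        System.out.println(s.numClusters + " clusters, " + s.maxIterations
                + " iterations, normalize=" + s.normalize);
    }
}
```

Options that are left out keep their documented defaults, so an empty array corresponds to the default configuration.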

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class RandomizableClusterer
Returns:
an enumeration of all the available options.

getOptions

public java.lang.String[] getOptions()
Gets the current settings.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class RandomizableClusterer
Returns:
an array of strings suitable for passing to setOptions()

debugTipText

public java.lang.String debugTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the Explorer/Experimenter GUI

setDebug

public void setDebug(boolean v)
Set debug mode - verbose output

Parameters:
v - true for verbose output

getDebug

public boolean getDebug()
Get debug mode

Returns:
true if debug mode is set

maxIterationsTipText

public java.lang.String maxIterationsTipText()
Returns the tip text for this property.

Returns:
tip text for this property

setMaxIterations

public void setMaxIterations(int i)
Set the max number of iterations

Parameters:
i - max number of iterations

getMaxIterations

public int getMaxIterations()
Get the max number of iterations

Returns:
max number of iterations

minChangeTipText

public java.lang.String minChangeTipText()
Returns the tip text for this property.

Returns:
tip text for this property

setMinChange

public void setMinChange(int m)
Set the minimum number of changes

Parameters:
m - the minimum number of changes

getMinChange

public int getMinChange()
Get the minimum number of changes

Returns:
the minimum number of changes

numClustersTipText

public java.lang.String numClustersTipText()
Returns the tip text for this property.

Returns:
tip text for this property

setNumClusters

public void setNumClusters(int n)
Set the number of clusters

Parameters:
n - number of clusters

getNumClusters

public int getNumClusters()
Get the number of clusters

Returns:
the number of clusters

numberOfClusters

public int numberOfClusters()
Get the number of clusters

Specified by:
numberOfClusters in interface Clusterer
Specified by:
numberOfClusters in class AbstractClusterer
Returns:
the number of clusters

numRestartsTipText

public java.lang.String numRestartsTipText()
Returns the tip text for this property.

Returns:
tip text for this property

setNumRestarts

public void setNumRestarts(int i)
Set the number of restarts

Parameters:
i - number of restarts

getNumRestarts

public int getNumRestarts()
Get the number of restarts

Returns:
number of restarts

notUnifyNormTipText

public java.lang.String notUnifyNormTipText()
Returns the tip text for this property.

Returns:
tip text for this property

setNotUnifyNorm

public void setNotUnifyNorm(boolean b)
Set whether to skip normalizing instances to unify prior probability before building the clusterer

Parameters:
b - true to skip normalization (as with the -U option), false to normalize

getNotUnifyNorm

public boolean getNotUnifyNorm()
Get whether normalizing instances to unify prior probability is skipped before building the clusterer

Returns:
true if normalization is skipped, false otherwise

globalInfo

public java.lang.String globalInfo()
Returns a string describing this clusterer.

Returns:
a description of the clusterer suitable for displaying in the Explorer/Experimenter GUI

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

getCapabilities

public Capabilities getCapabilities()
Returns default capabilities of the clusterer.

Specified by:
getCapabilities in interface Clusterer
Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class AbstractClusterer
Returns:
the capabilities of this clusterer
See Also:
Capabilities

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class AbstractClusterer
Returns:
the revision

main

public static void main(java.lang.String[] argv)