Instance Representation¶
The primary classes in the concept_formation package
(CobwebTree
,
Cobweb3Tree
, and
TrestleTree
) learn from
instances that are represented as python dictionaries (i.e., lists of attribute
values). This representation is different from the feature vector representation
that is used by most machine learning packages (e.g., ScikitLearn). The concept_formation package uses the
dictionary format instead of feature vectors for two reasons: dictionaries are
more human readable and dictionaries offer more flexibility in the kinds of
data that can be represented (e.g., attributes in dictionaries can have other
dictionaries as values). Furthermore, it is a general format that many other
representations, such as JSON, can be easily converted into. In fact, the
concept_formation package has methods for facilitating such conversions.
Attributes¶
The concept_formation package supports four kinds of attributes:
- Constant Attributes
- The default attribute type. Constant attributes are typically strings but any attribute that does not satisfy the conditions for the other categories will be assumed to be constant.
- Variable Attributes
- Any attribute that can be renamed to maximize mapping between an instance
and a concept. This allows for matching attributes based on the similarity
of their values rather than strictly on their attribute names. Variable are
denoted with a question mark
'?'
as their first element (e.g.,'?variable-attribute'
).
- Relational Attributes
- An attribute that represents a relationship between other attributes or
values of the instance. Relation attributes are represented as tuples (e.g.,
('relation', 'obj1', 'obj2')
). Relations can only be in the top level of the instance (i.e., component values, described below, cannot contain relations). If a relationship needs to be expressed between attributes of component values, then preorder unary relations can be used. For example, to express a relationship of feature1 of subobject1 I might have:('relation', ('feature1', 'subobject1')
).
- Hidden Attributes
- Attributes that are maintained in the concept knowledge base but are not
considered during concept formation. These are useful for propagating unique
ids or other bookkeeping labels into the knoweldge base without biasing
concept formation. Hidden attributes are denoted as constant or relational
attributes that have an
'_'
as their first element (i.e.,attribute[0] == '_'
). For constants, this means that the first character is an underscore (e.g.,"_hidden"
). For relations, this means that the first element in the tuple is an string underscore (e.g.,('_', 'hidden-relation', 'obj')
).
Only the constant and hidden attributes are supported by
CobwebTree
and
Cobweb3Tree
.
TrestleTree
supports all
attribute types.
In general attribute names must be hashable (so they can be used in a
dictionary and must be zero index-able (e.g., attribute[0]
, so that they
can be tested to determine if they are hidden.
Values¶
For each of these attribute type, the concept_formation package supports three kinds of values:
- Nominal Values
- All non-numerical values (typically strings or booleans).
- Numerical Values
- All values that are recognized by Python as numbers (i.e.,
isinstance(val, Number)
).
- Component Values
- All dictionary values (i.e., sub-instances). All component values are
internally converted into unary relations, so unary relations can also be
used directly. For example
{'subobject: {'attr': 'value'}}
is equivalent to{('attr', 'subobject'): 'value'}
. Note that sub-instances cannot contain relations. Instead include the relations in the top-level instance and use unary relations to refer to elements of sub-instances (e.g.,('relation1' ('att1', 'subobject'))
).
The CobwebTree
class supports
only nominal values. The
Cobweb3Tree
supports both
nominal and numeric values. Finally, the
TrestleTree
supports all value
types.
Example Instance¶
Here is an instance that provides an example of each of these different attribute-value type combinations:
# Data is stored in a list of dictionaries where values can be either nominal,
# numeric, hidden, component, unbound attributes, or relational.
In [1]: instance = {'f1': 'v1', # constant attribute with nominal value
...: 'f2': 2.6, # constant attribute with numerical value
...: 'f3': {'sub-feature1': 'v1'}, # constant attribute with component value
...: '?f4': 'v1', # variable attribute with nominal value
...: '?f5': 2.6, # variable attribute with numerical value
...: '?f6': {'sub-feature1': 'v1'}, # variable attribute with component value
...: ('some-relation', 'f3', '?f4'): True, #relation attribute with nominal value
...: ('some-relation2', 'f3', '?f4'): 2.6, #relation attribute with numeric value
...: ('some-relation3', 'f3', '?f4'): {'sub-feature1': 'v1'}, #relation attribute with component value
...: ('some-relation4', 'f3', ('sub-feature1', '?f4')): True, # relation attribute that uses unary relation to access sub-feature1 of ?f4. It also has a nominal value.
...: '_f7': 'v1', # hidden attribute with nominal value
...: '_f8': 2.6, # hidden attribute with numeric value
...: '_f9': {'sub-feature1': 'v1'}, # hidden attribute with component value
...: }
...: