Rule discovery from time series
Content
Introduction
Rule Discovery Process
Review on Rule Discovery from Time Series
Problem Statement
Conclusion
m time series, 2007
34
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
Complex temporal abstractions
Used to detect patterns characterized by behaviors which can’t be represented by basic TAs
Corresponding to patterns whose time intervals are based on temporal relationships defined in Allen’s algebra (13 temporal operators)
Originating from the same and from different time series
35
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
36
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
Allen’s temporal operators on two periods of time
BEFORE (period1, period2) 1111 1 2 2222
EQUAL (period1, period2) 11111
22222
MEETS (period1, period2) 1111 12 2222
OVERLAPS (period1, period2) 11111
22222
DURING (period1, period2) 11111
22222222
STARTS (period1, period2) 1 1111
2 2222222
FINISHES (period1, period2) 1111 1
2222222 2
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
37
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
Temporal association rule
Temporal relationships (OVERLAPS, FINISHED-BY, MEETS, BEFORE, EQUALS, STARTS PRECEDES) between the complex temporal patterns
Rule format: A p C
A: the set {a1, a2, a3, , an} AoI (set of abstractions of interest)
p : a triple (LS-left shift, G-gap, RS-right shift)
C: an abstraction AoI
Interpretation: a temporal association rule holds when all the patterns in the antecedent intersect and when the relation PRECEDES holds between the intervals of this intersection and an episode of the pattern in the consequent.
38
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
Definition of the set AoI of abstractions of interest
39
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
40
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
41
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
42
Support = RTS / TSO
RTS: the rule time span corresponding to the union of the episodes in which both the antecedent and the consequent of the rule occur
TSO: the time span, i.e. the total duration, of the observation period over which the rule is derived
Interpretation: a measure of how the rule is ‘spread’ over the observation time span
Confidence = NARTS / NAT
NARTS: the number of times (episodes) the antecedent occurs during RTS
NAT: the number of time (episodes) the antecedent occurs during TSO
Interpretation: the frequency of the rule over the total amount of episodes of the antecedent
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
43
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
44
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
Algorithm for extracting temporal association rules
45
Review on [Nam08] - Identification of temporal association rules from time-series microarray data set, 2008
46
Discrete values: up, down, and none
Temporal item: a discrete value + its position in rules + time stamp
Example: g 1L is an “up” discrete value at the time point “1” expected to appear on the left hand side of temporal association rules.
Temporal item set: a non-empty set of temporal items
Temporal frequent item set: a temporal item set whose support is greater than or equal to a support threshold ( MinSup )
Review on [Nam08] - Identification of temporal association rules from time-series microarray data set, 2008
47
Rule format: LHS ( ∆ ) RHS
LHS & RHS: disjoint temporal frequent item sets.
∆ (association interval): the interval of different two time stamps of the LHS and RHS. ∆ = 0..n.
Interpretation: RHS follows LHS after ∆.
Temporal transaction set: a combination of the transaction at the time t (t=0..n-∆) and the transaction at the time t + ∆.
Review on [Nam08] - Identification of temporal association rules from time-series microarray data set, 2008
48
Support of a temporal item set = the number of occurrences of the temporal item set / the number of temporal transaction sets
Confidence of a temporal association rule = support (LHS RHS)/support (LHS)
Precision = (# of matched rules)/(# of extracted rules)
A measure of correctness of the extracted rules
Precision [0, 1]
The KEGG cell cycle regulation path (known)
Review on [Nam08] - Identification of temporal association rules from time-series microarray data set, 2008
49
Review on [Nam08] - Identification of temporal association rules from time-series microarray data set, 2008
a temporal transaction set
# of temporal transaction sets = 5
Support ({g 1L }) = 3/5 = 60%
Support ({g 2R }) = 3/5 = 60%
Support ({g 3R }) = 3/5 = 60%
Support ({g 2L }) = 3/5 = 60%
Support ({g 1L , g 2R }) = 3/5 = 60%
Support ({g 1L , g 3R }) = 1/5 = 20%
Confidence (g 1 (2) g 2 ) =
Support ({g 1L , g 2R }) / support({g 1L })
Confidence (g 1 (2) g 2 ) = 60%/60% Confidence (g 1 (2) g 2 ) = 1
50
Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009
Rule format:
H: an event corresponding to an appearance of a pattern P within the time range of the rule head
B: an event corresponding to the characteristics of stock prices within the time range of the rule body. Investor-specified rule body conditions decide the characteristics of stock prices (INCREASE, DECREASE, UNCHANGED).
s = support (H) = (number of occurrences of a pattern P corresponding to H)*100/(number of occurrences of patterns whose lengths are same as that of the pattern P corresponding to H)
c = confidence (H, B) = (number of occurrences of patterns that are matched to H and also satisfied with the condition on B)*100/(number of occurrences of patterns that are matched to H)
Interpretation: B happens after time t since H has occurred.
51
Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009
Changing patterns of stock prices
52
Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009
Process flow for finding frequent patterns
53
Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009
54
Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009
Categorization
A symbol sequence S’’ =
55
Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009
The Apriori algorithm to discover frequent patterns
Discover a set of frequent patterns F 1 of length 1 by scanning S’’
Perform the self-join of F k-1 (2<=k<= length(S ’’)) to produce a set of candidate patterns Ck of length k
For each pattern in C k , scan S’’ to decide if the support of is greater than or equal to MinSup
If yes, then is inserted into F k .
56
Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009
57
Review – Summary on Patterns
Refs
Meaning
Length
Shape
Overlapping
Temporal aspect
Extraction
[Das98]
Not specified
Fixed
Fixed
Yes
No
Subsequence clustering
[Au04]
Not specified
Fixed
Fixed
No
No
Fuzzifying
[Daf05]
Pre-specified
Arbitrary
Fixed
No (almost)
No
Subsequence clustering
[Sac07]
Pre-specified
Arbitrary
Not fixed
No
Yes
A sliding window algorithm that recursively expand segments by adding new points until some error bound is exceeded
[Nam08]
Pre-specified
Arbitrary
Not fixed
No
Yes
A discrete value conversion algorithm
[Ha09]
Not specified
Arbitrary
Not fixed
Yes
No
Categorization and the Apriori algorithm
58
Review – Summary on Rules
Refs
Point/interval-based rules
Inter/intra-transaction rules
Predictive rules
Single/multiple time series rules
Association (dependence, correlation, casual, classification) rules
[Das98]
Point
Inter
Yes (not always)
Both
correlation
[Au04]
None
None
None
Single
classification
[Daf05]
Point
Inter
Yes (not always)
Both
correlation
[Sac07]
Interval
Both
Yes (not always)
Both
correlation
[Nam08]
Point
Both
Yes (not always)
Both
correlation
[Ha09]
Point
Inter
Yes
Single
classification
59
Review – Summary on Algorithms
Refs
In-memory
Index structures
Measures
Algorithm analysis
[Das98]
Yes
No
Frequency, confidence, J-measure
[Au04]
Yes
No
Interestingness, weight of evidence
[Daf05]
Yes
No
Frequency, confidence, J-measure
[Sac07]
Yes
No
Support, confidence, precision, recall
[Nam08]
Yes
No
Support, confidence, precision
[Ha09]
Yes
Yes (B-tree)
Support, confidence
60
Review – Summary on User Support
Refs
Rule interpretation
Rule visualization
End-users/data mining application developers
[Das98]
Yes
No
None
[Au04]
No
No
None
[Daf05]
Yes
No
None
[Sac07]
Yes
No
None
[Nam08]
Yes
No
None
[Ha09]
Yes
No
End-users
61
Problem Statement
Temporal aspect investigation for rules
Items (Patterns) of interest
Rule model
Rule discovery technique evaluation (analysis)
Choice of measures (support, confidence, j-measure, weight of evidence, )
User support
62
Conclusion
Rule discovery from time series
Rule discovery process
Several issues regarding time series rule discovery
Patterns
Rules
Rule discovery algorithms
User support
Problem statement
63
Thank you.
64
File đính kèm:
rule_discovery_from_time_series.ppt



