Rule discovery from time series
Content
Introduction
Rule Discovery Process
Review on Rule Discovery from Time Series
Problem Statement
Conclusion
m time series, 2007 34 Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007 Complex temporal abstractions Used to detect patterns characterized by behaviors which can’t be represented by basic TAs Corresponding to patterns whose time intervals are based on temporal relationships defined in Allen’s algebra (13 temporal operators) Originating from the same and from different time series 35 Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007 36 Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007 Allen’s temporal operators on two periods of time BEFORE (period1, period2) 1111 1 2 2222 EQUAL (period1, period2) 11111 22222 MEETS (period1, period2) 1111 12 2222 OVERLAPS (period1, period2) 11111 22222 DURING (period1, period2) 11111 22222222 STARTS (period1, period2) 1 1111 2 2222222 FINISHES (period1, period2) 1111 1 2222222 2 TRUE TRUE TRUE TRUE TRUE TRUE TRUE 37 Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007 Temporal association rule Temporal relationships (OVERLAPS, FINISHED-BY, MEETS, BEFORE, EQUALS, STARTS PRECEDES) between the complex temporal patterns Rule format: A p C A: the set {a1, a2, a3, , an} AoI (set of abstractions of interest) p : a triple (LS-left shift, G-gap, RS-right shift) C: an abstraction AoI Interpretation: a temporal association rule holds when all the patterns in the antecedent intersect and when the relation PRECEDES holds between the intervals of this intersection and an episode of the pattern in the consequent. 38 Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007 Definition of the set AoI of abstractions of interest 39 Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007 40 Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007 41 Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007 42 Support = RTS / TSO RTS: the rule time span corresponding to the union of the episodes in which both the antecedent and the consequent of the rule occur TSO: the time span, i.e. the total duration, of the observation period over which the rule is derived Interpretation: a measure of how the rule is ‘spread’ over the observation time span Confidence = NARTS / NAT NARTS: the number of times (episodes) the antecedent occurs during RTS NAT: the number of time (episodes) the antecedent occurs during TSO Interpretation: the frequency of the rule over the total amount of episodes of the antecedent Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007 43 Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007 44 Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007 Algorithm for extracting temporal association rules 45 Review on [Nam08] - Identification of temporal association rules from time-series microarray data set, 2008 46 Discrete values: up, down, and none Temporal item: a discrete value + its position in rules + time stamp Example: g 1L is an “up” discrete value at the time point “1” expected to appear on the left hand side of temporal association rules. Temporal item set: a non-empty set of temporal items Temporal frequent item set: a temporal item set whose support is greater than or equal to a support threshold ( MinSup ) Review on [Nam08] - Identification of temporal association rules from time-series microarray data set, 2008 47 Rule format: LHS ( ∆ ) RHS LHS & RHS: disjoint temporal frequent item sets. ∆ (association interval): the interval of different two time stamps of the LHS and RHS. ∆ = 0..n. Interpretation: RHS follows LHS after ∆. Temporal transaction set: a combination of the transaction at the time t (t=0..n-∆) and the transaction at the time t + ∆. Review on [Nam08] - Identification of temporal association rules from time-series microarray data set, 2008 48 Support of a temporal item set = the number of occurrences of the temporal item set / the number of temporal transaction sets Confidence of a temporal association rule = support (LHS RHS)/support (LHS) Precision = (# of matched rules)/(# of extracted rules) A measure of correctness of the extracted rules Precision [0, 1] The KEGG cell cycle regulation path (known) Review on [Nam08] - Identification of temporal association rules from time-series microarray data set, 2008 49 Review on [Nam08] - Identification of temporal association rules from time-series microarray data set, 2008 a temporal transaction set # of temporal transaction sets = 5 Support ({g 1L }) = 3/5 = 60% Support ({g 2R }) = 3/5 = 60% Support ({g 3R }) = 3/5 = 60% Support ({g 2L }) = 3/5 = 60% Support ({g 1L , g 2R }) = 3/5 = 60% Support ({g 1L , g 3R }) = 1/5 = 20% Confidence (g 1 (2) g 2 ) = Support ({g 1L , g 2R }) / support({g 1L }) Confidence (g 1 (2) g 2 ) = 60%/60% Confidence (g 1 (2) g 2 ) = 1 50 Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009 Rule format: H: an event corresponding to an appearance of a pattern P within the time range of the rule head B: an event corresponding to the characteristics of stock prices within the time range of the rule body. Investor-specified rule body conditions decide the characteristics of stock prices (INCREASE, DECREASE, UNCHANGED). s = support (H) = (number of occurrences of a pattern P corresponding to H)*100/(number of occurrences of patterns whose lengths are same as that of the pattern P corresponding to H) c = confidence (H, B) = (number of occurrences of patterns that are matched to H and also satisfied with the condition on B)*100/(number of occurrences of patterns that are matched to H) Interpretation: B happens after time t since H has occurred. 51 Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009 Changing patterns of stock prices 52 Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009 Process flow for finding frequent patterns 53 Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009 54 Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009 Categorization A symbol sequence S’’ = 55 Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009 The Apriori algorithm to discover frequent patterns Discover a set of frequent patterns F 1 of length 1 by scanning S’’ Perform the self-join of F k-1 (2<=k<= length(S ’’)) to produce a set of candidate patterns Ck of length k For each pattern in C k , scan S’’ to decide if the support of is greater than or equal to MinSup If yes, then is inserted into F k . 56 Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009 57 Review – Summary on Patterns Refs Meaning Length Shape Overlapping Temporal aspect Extraction [Das98] Not specified Fixed Fixed Yes No Subsequence clustering [Au04] Not specified Fixed Fixed No No Fuzzifying [Daf05] Pre-specified Arbitrary Fixed No (almost) No Subsequence clustering [Sac07] Pre-specified Arbitrary Not fixed No Yes A sliding window algorithm that recursively expand segments by adding new points until some error bound is exceeded [Nam08] Pre-specified Arbitrary Not fixed No Yes A discrete value conversion algorithm [Ha09] Not specified Arbitrary Not fixed Yes No Categorization and the Apriori algorithm 58 Review – Summary on Rules Refs Point/interval-based rules Inter/intra-transaction rules Predictive rules Single/multiple time series rules Association (dependence, correlation, casual, classification) rules [Das98] Point Inter Yes (not always) Both correlation [Au04] None None None Single classification [Daf05] Point Inter Yes (not always) Both correlation [Sac07] Interval Both Yes (not always) Both correlation [Nam08] Point Both Yes (not always) Both correlation [Ha09] Point Inter Yes Single classification 59 Review – Summary on Algorithms Refs In-memory Index structures Measures Algorithm analysis [Das98] Yes No Frequency, confidence, J-measure [Au04] Yes No Interestingness, weight of evidence [Daf05] Yes No Frequency, confidence, J-measure [Sac07] Yes No Support, confidence, precision, recall [Nam08] Yes No Support, confidence, precision [Ha09] Yes Yes (B-tree) Support, confidence 60 Review – Summary on User Support Refs Rule interpretation Rule visualization End-users/data mining application developers [Das98] Yes No None [Au04] No No None [Daf05] Yes No None [Sac07] Yes No None [Nam08] Yes No None [Ha09] Yes No End-users 61 Problem Statement Temporal aspect investigation for rules Items (Patterns) of interest Rule model Rule discovery technique evaluation (analysis) Choice of measures (support, confidence, j-measure, weight of evidence, ) User support 62 Conclusion Rule discovery from time series Rule discovery process Several issues regarding time series rule discovery Patterns Rules Rule discovery algorithms User support Problem statement 63 Thank you. 64
File đính kèm:
rule_discovery_from_time_series.ppt