TY - GEN
T1 - Believe it today or tomorrow? Detecting untrustworthy information from dynamic multi-source data
AU - Xiao, Houping
AU - Li, Yaliang
AU - Gao, Jing
AU - Wang, Fei
AU - Ge, Liang
AU - Fan, Wei
AU - Vu, Long H.
AU - Turaga, Deepak S.
N1 - Publisher Copyright: Copyright © SIAM.
PY - 2015
Y1 - 2015
N2 - A vast ocean of data is collected every day, and numerous applications call for the extraction of actionable insights from data. One important task is to detect untrustworthy information because such information usually indicates critical, unusual, or suspicious activities. In this paper, we study the important problem of detecting untrustworthy information from a novel perspective of correlating and comparing multiple sources that describe the same set of items. Different from existing work, we recognize the importance of time dimension in modeling the commonalities among multiple sources. We represent dynamic multi-source data as tensors and develop a joint non-negative tensor factorization approach to capture the common patterns across sources. We then conduct a comparison between source input and common patterns to identify inconsistencies as an indicator of untrustworthiness. An incremental factorization approach is developed to improve the computational efficiency on dynamically arriving data. We also propose a method to handle data sparseness. Experiments are conducted on hotel rating, network traffic flow, and weather forecast data that are collected from multiple sources. Results demonstrate the advantages of the proposed approach in detecting inconsistent and untrustworthy information.
UR - https://www.scopus.com/pages/publications/84961938125
U2 - 10.1137/1.9781611974010.45
DO - 10.1137/1.9781611974010.45
M3 - Conference contribution
T3 - SIAM International Conference on Data Mining 2015, SDM 2015
SP - 397
EP - 405
BT - SIAM International Conference on Data Mining 2015, SDM 2015
A2 - Venkatasubramanian, Suresh
A2 - Ye, Jieping
PB - Society for Industrial and Applied Mathematics Publications
T2 - SIAM International Conference on Data Mining 2015, SDM 2015
Y2 - 30 April 2015 through 2 May 2015
ER -