TY - GEN
T1 - Empath
T2 - 2011 8th International Conference and Expo on Emerging Technologies for a Smarter World, CEWIT 2011
AU - Ward, Charles B.
AU - Choi, Yejin
AU - Skiena, Steven
AU - Xavier, Eduardo C.
PY - 2011
Y1 - 2011
N2 - Sentiment analysis is the fundamental component in text-driven monitoring or forecasting systems, where the general sentiment towards real-world entities (e.g., people, products, organizations) is analyzed based on the sentiment signals embedded in the myriad of web text available today. Building such systems involves several practically important problems, from data cleansing (e.g., boilerplate removal, web-spam detection) and sentiment analysis at the individual mention level (e.g., phrase-, sentence-, document-level) to the aggregation of sentiment for entity-level (e.g., person, company) analysis. Most previous research in sentiment analysis, however, has focused only on individual mention-level analysis, and relatively little work has addressed the other practically important problems involved in enabling a large-scale sentiment monitoring system. In this paper, we propose Empath, a new framework for evaluating entity-level sentiment analysis. Empath leverages objective measurements of entities in various domains, such as people, companies, countries, movies, and sports, to facilitate entity-level sentiment analysis and tracking. We demonstrate the utility of Empath for the evaluation of a large-scale sentiment system by applying it to various lexicons using Lydia, our own large-scale text-analytics tool, over a corpus consisting of more than a terabyte of newspaper data. We expect that Empath will encourage research that encompasses end-to-end pipelines to enable large-scale text-driven monitoring and forecasting systems.
AB - Sentiment analysis is the fundamental component in text-driven monitoring or forecasting systems, where the general sentiment towards real-world entities (e.g., people, products, organizations) is analyzed based on the sentiment signals embedded in the myriad of web text available today. Building such systems involves several practically important problems, from data cleansing (e.g., boilerplate removal, web-spam detection) and sentiment analysis at the individual mention level (e.g., phrase-, sentence-, document-level) to the aggregation of sentiment for entity-level (e.g., person, company) analysis. Most previous research in sentiment analysis, however, has focused only on individual mention-level analysis, and relatively little work has addressed the other practically important problems involved in enabling a large-scale sentiment monitoring system. In this paper, we propose Empath, a new framework for evaluating entity-level sentiment analysis. Empath leverages objective measurements of entities in various domains, such as people, companies, countries, movies, and sports, to facilitate entity-level sentiment analysis and tracking. We demonstrate the utility of Empath for the evaluation of a large-scale sentiment system by applying it to various lexicons using Lydia, our own large-scale text-analytics tool, over a corpus consisting of more than a terabyte of newspaper data. We expect that Empath will encourage research that encompasses end-to-end pipelines to enable large-scale text-driven monitoring and forecasting systems.
UR - https://www.scopus.com/pages/publications/84857221210
U2 - 10.1109/CEWIT.2011.6135866
DO - 10.1109/CEWIT.2011.6135866
M3 - Conference contribution
SN - 9781457715914
T3 - 2011 8th International Conference and Expo on Emerging Technologies for a Smarter World, CEWIT 2011
BT - 2011 8th International Conference and Expo on Emerging Technologies for a Smarter World, CEWIT 2011
Y2 - 2 November 2011 through 3 November 2011
ER -