Toward order-of-magnitude cascade prediction
Proceedings of the 2015 IEEE/ACM International Conference on Advances in …, 2015 · dl.acm.org
When a piece of information (microblog, photograph, video, link, etc.) starts to spread in a social network, an important question arises: will it spread to "viral" proportions, where "viral" is defined as an order-of-magnitude increase? However, several previous studies have established that cascade size and frequency are related through a power law, which leads to a severe imbalance in this classification problem. In this paper, we devise a suite of measurements based on "structural diversity": the variety of social contexts (communities) in which individuals partaking in a given cascade engage. We demonstrate that these measures can distinguish viral from non-viral cascades, despite the severe imbalance of the data for this problem. Further, we leverage these measurements as features in a classification approach, successfully predicting microblogs that grow from 50 to 500 reposts with a precision of 0.69 and a recall of 0.52 for the viral class, despite this class comprising under 2% of samples. This significantly outperforms our baseline approach as well as the current state of the art. Our work also demonstrates how to trade off between precision and recall.
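The three ingredients the abstract names (a structural-diversity measure over a cascade's participants, an order-of-magnitude "viral" label, and a precision/recall trade-off for a rare positive class) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the user-to-community map `community_of` is assumed to be precomputed by some community-detection method, and the two features (`num_communities` and the Shannon `entropy` of how adopters spread across communities), the factor-of-10 rule in `is_viral`, and all helper names are hypothetical choices made for this sketch.

```python
from collections import Counter
import math


def structural_diversity(adopters, community_of):
    """Hypothetical structural-diversity features for one cascade.

    adopters: users who have reposted the item so far.
    community_of: dict mapping user -> community id (assumed precomputed).
    Returns the number of distinct communities touched and the Shannon
    entropy (in bits) of how the adopters are spread across them.
    """
    counts = Counter(community_of[u] for u in adopters if u in community_of)
    total = sum(counts.values())
    if total == 0:
        return {"num_communities": 0, "entropy": 0.0}
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {"num_communities": len(counts), "entropy": entropy}


def is_viral(observed_size, final_size, factor=10):
    """Order-of-magnitude label (assumed rule): a cascade observed at 50
    reposts counts as viral if it eventually reaches factor * 50 = 500."""
    return final_size >= factor * observed_size


def precision_recall(scores, labels, threshold):
    """Precision and recall for the positive (viral) class when every
    cascade with score >= threshold is predicted viral."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy community assignment (hypothetical data).
    communities = {"a": 0, "b": 0, "c": 1, "d": 2}
    print(structural_diversity(["a", "b", "c", "d"], communities))
    # {'num_communities': 3, 'entropy': 1.5}
    print(is_viral(50, 500), is_viral(50, 499))  # True False

    # Toy classifier scores and true viral labels (illustrative only).
    scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
    labels = [True, True, False, True, False, False]
    for t in (0.7, 0.35):
        print(t, precision_recall(scores, labels, t))
```

On the toy scores, a threshold of 0.7 gives precision 1.0 and recall of about 0.67, while lowering it to 0.35 gives precision 0.75 and recall 1.0: moving the decision threshold is one simple way to trade precision for recall. Reporting these per-class metrics instead of overall accuracy matters when the viral class is under 2% of samples, since a model that always predicts "non-viral" would already exceed 98% accuracy.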