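KDD Cup 2012: This year's data mining conference is being held in China. It is a clear sign that our IT companies, having left their primitive era behind, are taking a first step forward and starting to take machine learning seriously, a field that foreign tech giants already treat as highly coveted territory. The gold sponsors are Huawei, Tencent and Baidu. The competition has two tracks. One uses Weibo users' follow relationships as its base data and asks participants to predict user relationships that are not given. The other appears to involve predictions for search-engine ad recommendation; its data has not been released yet, so stay tuned for March 1. Registration, data release and submission: http://www.kddcup2012.org/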
This year's KDD Cup is sponsored by Tencent Inc., China's largest Internet company in terms of active users (over 700 million as of Jan. 2012). Tencent Inc. owns a full portfolio of popular products in China, including instant messaging, email, news portals, a search engine, online games, blogging and micro-blogging, offering a rich opportunity to build user models for highly effective user-intent prediction and result recommendation. This year's KDD Cup consists of two separate tasks.
User Modeling based on Microblog Data and Search Click Data
Task 1. Social Network Mining on Microblogs (Weibo)
Tencent Weibo (http://t.qq.com/) offers a wealth of social-networking information. For the 2012 KDD Cup, the released data is a sampled snapshot of Tencent Weibo users' preferences for various items: the items recommended to users and the users' follow-relation history. In addition, items are organized in a hierarchy: each person, organization or group belongs to specific categories, and each category belongs to higher-level categories. In the competition, both users and items (persons, organizations and groups) are represented as anonymized numbers that carry no meaning, so no identifying information is revealed. The data covers 10 million users and 50,000 items, with over 300 million recommendation records and about three million social-networking "following" actions. Besides the item hierarchy, the privacy-protected user information is also very rich, and user activities are timestamped.
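To make the item hierarchy concrete, here is a minimal Python sketch of walking an item up through its categories. The category names, item IDs and the parent-map representation are all hypothetical; the released data uses anonymized numbers, and its actual file format is not described here.

```python
# Hypothetical toy hierarchy: each category maps to its parent category (None = top level).
CATEGORY_PARENT = {
    "sports": None,
    "football": "sports",
    "football_clubs": "football",
    "tech": None,
}

# Hypothetical assignment of anonymized item IDs to their most specific category.
ITEM_CATEGORY = {
    100044: "football_clubs",
    100213: "tech",
    100318: "football",
}


def category_path(item_id: int) -> list:
    """Return the item's categories, from most specific up to the top level."""
    path = []
    category = ITEM_CATEGORY[item_id]
    while category is not None:
        path.append(category)
        category = CATEGORY_PARENT[category]
    return path


def share_category(a: int, b: int) -> bool:
    """True if two items share a category at any level of the hierarchy."""
    return bool(set(category_path(a)) & set(category_path(b)))


print(category_path(100044))           # ['football_clubs', 'football', 'sports']
print(share_category(100044, 100318))  # True
print(share_category(100044, 100213))  # False
```

Shared ancestors like these are one possible way for a model to generalize from items a user already follows to related items in the same branch.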
Task 1 is to predict which users a given user will follow, among all potential users.
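A natural first baseline for Task 1, not the organizers' method, is to recommend the most-followed items that a user does not already follow. The sketch below assumes the follow history has been loaded into a dictionary of sets; the function name `recommend_follows` and this data layout are hypothetical.

```python
from collections import Counter


def recommend_follows(user, follows, k=3):
    """Popularity baseline: suggest the most-followed items the user does not follow yet.

    follows maps a user ID to the set of item IDs that user already follows
    (a hypothetical in-memory layout, not the competition's file format).
    """
    popularity = Counter(item for items in follows.values() for item in items)
    already = follows.get(user, set())
    candidates = [
        (item, count)
        for item, count in popularity.items()
        if item not in already and item != user  # skip existing follows and the user itself
    ]
    # Highest follower count first; break ties by smaller ID for a deterministic order.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in candidates[:k]]


follows = {1: {10, 11}, 2: {10, 12}, 3: {10, 11, 13}, 4: {12}}
print(recommend_follows(4, follows))  # [10, 11, 13]
print(recommend_follows(1, follows))  # [12, 13]
```

This baseline uses the user's own history only as a filter; the timestamps, item hierarchy and user attributes described above are what a stronger model would add.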
Task 2. User Click Modeling based on Search Engine Log Data
Online advertising has been the financial backbone of the Internet industry for years. Three successful kinds of computational ad systems are search ad, contextual ad and social-network ad systems. Search ad systems retrieve and rank ads for a given query and display the resulting ads alongside the search engine's results. When a user clicks on an ad, the advertiser pays the search engine for the promotion. Ads are ranked to maximize users' satisfaction, advertisers' return on investment and the search engine's revenue. Contextual ad systems involve an additional role, the publishers, who own Internet properties such as websites, forums or mobile apps. Programs embedded in these properties request ads from the ad system, which finds ads that semantically match the content of the properties. Recently, the third kind, social-network ad systems, has gained a lot of attention; here the ad system ranks ads taking users' social relationships into account.
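The contextual-ad step of finding ads that semantically match a page can be illustrated, under heavy simplification, with bag-of-words cosine similarity. This is a swapped-in textbook technique chosen only for illustration; the announcement does not say how Tencent's system matches content, and the page text, ad texts and ad IDs below are made up.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased token counts; a crude stand-in for real semantic features."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_ads(page_text, ads, k=2):
    """Return the IDs of the k ads whose text is most similar to the page."""
    page = bag_of_words(page_text)
    ranked = sorted(ads, key=lambda ad_id: cosine(page, bag_of_words(ads[ad_id])), reverse=True)
    return ranked[:k]


page = "Cheap football boots and football shirts"
ads = {
    "a1": "Football boots sale",
    "a2": "Cloud hosting plans",
    "a3": "Football shirts for fans",
}
print(match_ads(page, ads))  # ['a1', 'a3']
```

Production systems would use richer semantic representations than raw word overlap, but the shape of the task (score every candidate ad against the page, keep the top few) is the same.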
In all of the systems above, a key algorithmic component is predicting the click-through rate of ads (pCTR). This is because all such systems optimize monetization under economic rules (e.g., the Generalized Second Price auction behind Google AdWords and others), and these rules require pCTR values both to rank ads and to price clicks. The closer pCTR is to the true click-through rate, the more effective the monetization. User information, including demographics and historical behavior on search engines, e-business platforms, social networks and micro-blogs, is likely to help improve the accuracy of ad pCTR in all of these systems.
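To see why pCTR enters both ranking and pricing, here is a minimal sketch of a textbook Generalized Second Price auction weighted by click-through rate: ads are ranked by bid × pCTR, and each winner pays the smallest price per click that would still keep its position, i.e. the next ad's bid × pCTR divided by its own pCTR. Reserve prices, quality factors beyond pCTR, and other details of real systems such as AdWords are deliberately left out; the ad names, bids and pCTR values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float   # advertiser's maximum price per click
    pctr: float  # predicted click-through rate


def gsp_auction(ads, slots):
    """Rank by bid * pCTR and charge each winner the minimum CPC that keeps its rank."""
    ranked = sorted(ads, key=lambda ad: ad.bid * ad.pctr, reverse=True)
    results = []
    for i, ad in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            below = ranked[i + 1]
            price = below.bid * below.pctr / ad.pctr
        else:
            price = 0.0  # nobody below; real systems would charge a reserve price instead
        results.append((ad.name, round(price, 4)))
    return results


ads = [Ad("A", bid=2.0, pctr=0.05), Ad("B", bid=3.0, pctr=0.02), Ad("C", bid=1.0, pctr=0.04)]
print(gsp_auction(ads, slots=2))  # [('A', 1.2), ('B', 2.0)]
```

If A's pCTR were overestimated, A would both outrank ads it should not and pay less per click than it should (its price is divided by its own pCTR), which is why pCTR accuracy feeds directly into monetization.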
Task 2's aim is to accurately predict the ads' click-through rate in online computational ad systems.
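As a point of reference for Task 2, the simplest pCTR estimate is an ad's historical clicks divided by its impressions, smoothed toward a prior so that rarely shown ads do not get extreme values. The sketch below uses a Beta prior (additive smoothing); the prior CTR of 3%, the prior strength of 100 impressions and the `(ad_id, clicked)` log format are arbitrary illustrative assumptions, and a competitive model would also use the user and query features mentioned above.

```python
from collections import defaultdict


def smoothed_ctr(clicks, impressions, prior_ctr=0.03, prior_strength=100.0):
    """Posterior-mean CTR under a Beta(alpha, beta) prior with mean prior_ctr."""
    alpha = prior_ctr * prior_strength          # pseudo-clicks
    beta = (1.0 - prior_ctr) * prior_strength   # pseudo-non-clicks
    return (clicks + alpha) / (impressions + alpha + beta)


def ctr_table(log, prior_ctr=0.03, prior_strength=100.0):
    """Aggregate (ad_id, clicked) impression events into smoothed per-ad CTRs."""
    counts = defaultdict(lambda: [0, 0])  # ad_id -> [clicks, impressions]
    for ad_id, clicked in log:
        counts[ad_id][0] += int(clicked)
        counts[ad_id][1] += 1
    return {
        ad_id: smoothed_ctr(c, n, prior_ctr, prior_strength)
        for ad_id, (c, n) in counts.items()
    }


# Hypothetical impression log: "x" shown twice and clicked once; "y" shown 4 times, never clicked.
log = [("x", True), ("x", False)] + [("y", False)] * 4
table = ctr_table(log)
print(round(table["x"], 4))  # 0.0392, far below the raw 1/2
print(round(table["y"], 4))  # 0.0288, not the raw 0
```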
Feb 20, 2012: Competition announcement linked to the official KDD site
Mar 1, 2012: Registration opens (dataset ready for the public)
Mar 15, 2012: Competition begins
Jun 1, 2012: Competition ends (submission deadline)
Jun 5, 2012: Results compiled
Jun 8, 2012: Winners notified
Aug 12, 2012: Workshop
*Note that this is only an initial announcement. Stay tuned for more detailed announcements.
KDD Cup 2012 Organizers
- Dr. Gordon Sun, Chief Scientist, Tencent Inc.
- Dr. Yading Aden Yue, Expert Researcher, Tencent Inc.
- Dr. Yi Wang, Deputy Director, Contextual Advertising Platform, Tencent Inc.
- Mr. Jian Jimmy Hu, Scientist, Tencent Inc.
- Dr. Yong Nicky Li, Leader, Data Mining Group, Tencent Inc.