{"id":10875,"date":"2012-07-24T10:48:16","date_gmt":"2012-07-24T05:18:16","guid":{"rendered":"https:\/\/simplify360.com\/blog\/2012\/07\/24\/top-challenges-of-social-media-data\/"},"modified":"2023-08-17T04:43:49","modified_gmt":"2023-08-17T04:43:49","slug":"top-challenges-of-social-media-data","status":"publish","type":"post","link":"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/","title":{"rendered":"Top Challenges of Social Media Data"},"content":{"rendered":"<h2 style=\"font-size: 30px;\">Introduction<\/h2>\n<p>Social Media can be defined as a set of channels through which people communicate in a many-to-many relationship, as opposed to the one-to-many relationship of traditional media like radio, television or magazines. Broadly speaking, Social Media consists of micro-blogging (like Twitter), social networking (like Facebook), blogging (like WordPress and Blogger), forums (where users ask questions and post complaints), photo sharing (like Flickr) and video sharing (like YouTube). As Social Media has grown, so has the demand for extracting meaningful information from its raw data. Doing this requires solving two different kinds of problems: the first is data collection and the second is information retrieval. This article addresses the second kind of problem, namely the challenges faced in Social Media Data Analytics.<\/p>\n<h2>What data analytics are we talking about?<\/h2>\n<p>Consider a case where data about a new brand is collected from Twitter, Facebook, Blogger and YouTube. Viewed in its raw form, this data looks like a mosaic. A trained analyst can quickly spot patterns in it, draw conclusions and make business decisions based on them. This exercise can be carried out occasionally, but because of its high cost it is not always feasible. Instead, a system can be built that partially emulates the work of the analyst. 
Such a system can estimate the polarity of a message (that is, the author\u2019s sentiment), list what authors in a particular country are saying about the brand, and list the popular positive and negative topics related to the brand. It can also offer some fancier, albeit highly speculative, features such as guessing the age group and gender of the posters.<\/p>\n<h2>Machine Learning \u2013 Supervised or Unsupervised<\/h2>\n<p style=\"text-align: justify;\">Most of these challenges can be addressed with standard machine learning techniques, but the problems faced here differ from typical machine learning problems. For instance, obtaining good training data is extremely difficult. Unsupervised approaches are hard to apply here, while a completely supervised approach, which relies on hand-labeled examples, is tedious and time-consuming to train. One good solution to this problem is shown in [1], where the authors collect positive-sentiment tweets by searching for the emoticon :) and negative-sentiment tweets by searching for the emoticon :(. Once the data problem has been solved, a training algorithm can be chosen.<\/p>\n<p><a style=\"text-align: center;\" href=\"https:\/\/simplify360.com\/blog\/wp-content\/uploads\/2012\/07\/MP900406576.jpg\"><img fetchpriority=\"high\" decoding=\"async\" class=\"aligncenter size-full wp-image-989\" title=\"Test Tubes of Colored Liquid\" src=\"https:\/\/simplify360.com\/blog\/wp-content\/uploads\/2012\/07\/MP900406576.jpg\" alt=\"\" width=\"461\" height=\"307\" \/><\/a><\/p>\n<p>At this point, it is also worth noting that, rather than a very complex algorithm with a long training time and high accuracy, it may be better to use a simple algorithm that trains quickly and runs faster, even if its accuracy is only moderate. 
An even more desirable trait is that the algorithm adapts to user feedback on incorrectly tagged messages, which requires it to retrain quickly. Hence simple algorithms like Na\u00efve Bayes and Maximum Entropy are good candidates, while most tree-based algorithms are less well suited, as they are oversensitive to noisy data.<\/p>\n<h2>Storage Requirement \u2013 Speed and Volume<\/h2>\n<p>Some systems solve the data volume problem very elegantly, such as Google\u2019s MapReduce and Bigtable and their open-source implementations, Hadoop and HBase. Others, such as relational databases (RDBMS) like MySQL, solve the speed problem through redundant storage and indexing. However, neither kind is quite suitable on its own for Social Media data storage, which needs both fast retrieval and support for large volumes. A common requirement for such systems is to ingest around a hundred entries per second while still supporting real-time retrieval. Indexing solutions like Lucene and Solr handle this well and are well suited to the task. Expensive RDBMS products like Oracle and Microsoft SQL Server might also do the job.<\/p>\n<h2>Language Handling<\/h2>\n<p>This is not a difficult problem in principle: once a system has been prepared for English, the same process can be repeated for other languages. The task is, however, large and tedious. The pipeline has to be organized so that, for any message, the language is detected first and then the model corresponding to the detected language is applied to the message. 
This means that every learning model in the system has to be trained for all the required languages.<\/p>\n<p><a href=\"https:\/\/simplify360.com\/blog\/wp-content\/uploads\/2012\/07\/ml-result.png\"><img decoding=\"async\" class=\"alignright size-full wp-image-992\" title=\"ml-result\" src=\"https:\/\/simplify360.com\/blog\/wp-content\/uploads\/2012\/07\/ml-result.png\" alt=\"\" width=\"259\" height=\"185\" \/><\/a><\/p>\n<h2>Picture and Video<\/h2>\n<p>Extracting meaningful data from pictures and video is a desirable feature of such a system. It poses a new challenge, because this processing is CPU-intensive and cannot be done in real time. It can instead be handled by a separate subsystem, possibly built on the MapReduce architecture. YouTube has solved a related problem very well when checking uploaded videos for copyright infringement; see [2].<\/p>\n<p>In summary, Social Media Data offers huge scope for innovation. Much can be done in this field to connect social media data to real business value, and we are only seeing the start of it.<\/p>\n<p>References:<\/p>\n<p>[1] <a href=\"http:\/\/www.stanford.edu\/~alecmgo\/papers\/TwitterDistantSupervision09.pdf\"><em>http:\/\/www.stanford.edu\/~alecmgo\/papers\/TwitterDistantSupervision09.pdf<\/em><\/a><\/p>\n<p>[2] <a href=\"http:\/\/www.ted.com\/talks\/margaret_stewart_how_youtube_thinks_about_copyright.html\"><em>http:\/\/www.ted.com\/talks\/margaret_stewart_how_youtube_thinks_about_copyright.html<\/em><\/a><\/p>\n<p>Image source: <em>filosophy.org<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Social Media can be defined as a set of channels using which people can communicate in a many-to-many relationship as opposed to one-to-many relationship of the traditional media like Radio, Television or Magazines. 
Broadly speaking, Social Media consists of micro-blogging (like Twitter and\u00a0 Facebook), blogging (like WordPress and Blogger), forums (where users ask questions &hellip;<\/p>\n<p class=\"read-more\"> <a class=\"\" href=\"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/\"> <span class=\"screen-reader-text\">Top Challenges of Social Media Data<\/span> Read More \u00bb<\/a><\/p>\n","protected":false},"author":13,"featured_media":3219,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"default","adv-header-id-meta":"","stick-header-meta":"default","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","footnotes":""},"categories":[1643,1685],"tags":[1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991],"class_list":["post-10875","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-social-crm","category-social-media-analytics","tag-big-data","tag-hadoop","tag-hbase","tag-machine-learning-in-social-media","tag-mysql","tag-oracle","tag-rdbms","tag-sentiment-analysis-of-conversation","tag-social-media-data","tag-supervised-machine-learning","tag-unsupervised-machine-learning"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Top Challenges of Social Media Data - Simplify 360<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, 
max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Top Challenges of Social Media Data - Simplify 360\" \/>\n<meta property=\"og:description\" content=\"Introduction Social Media can be defined as a set of channels using which people can communicate in a many-to-many relationship as opposed to one-to-many relationship of the traditional media like Radio, Television or Magazines. Broadly speaking, Social Media consists of micro-blogging (like Twitter and\u00a0 Facebook), blogging (like WordPress and Blogger), forums (where users ask questions &hellip; Top Challenges of Social Media Data Read More \u00bb\" \/>\n<meta property=\"og:url\" content=\"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/\" \/>\n<meta property=\"og:site_name\" content=\"Simplify 360\" \/>\n<meta property=\"article:published_time\" content=\"2012-07-24T05:18:16+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-08-17T04:43:49+00:00\" \/>\n<meta name=\"author\" content=\"Simplify360\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/\"},\"author\":{\"name\":\"Simplify360\",\"@id\":\"https:\/\/simplify360.com\/blog\/#\/schema\/person\/90bc4f8d55a2c63512c7e124a2007967\"},\"headline\":\"Top Challenges of Social Media 
Data\",\"datePublished\":\"2012-07-24T05:18:16+00:00\",\"dateModified\":\"2023-08-17T04:43:49+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/\"},\"wordCount\":863,\"publisher\":{\"@id\":\"https:\/\/simplify360.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/#primaryimage\"},\"thumbnailUrl\":\"\",\"keywords\":[\"Big data\",\"Hadoop\",\"Hbase\",\"Machine Learning in Social Media\",\"MySQL\",\"Oracle\",\"RDBMS\",\"Sentiment Analysis of Conversation\",\"Social Media Data\",\"Supervised Machine Learning\",\"Unsupervised Machine Learning\"],\"articleSection\":[\"Social CRM\",\"Social Media Analytics\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/\",\"url\":\"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/\",\"name\":\"Top Challenges of Social Media Data - Simplify 
360\",\"isPartOf\":{\"@id\":\"https:\/\/simplify360.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/#primaryimage\"},\"thumbnailUrl\":\"\",\"datePublished\":\"2012-07-24T05:18:16+00:00\",\"dateModified\":\"2023-08-17T04:43:49+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/#primaryimage\",\"url\":\"\",\"contentUrl\":\"\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/simplify360.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Top Challenges of Social Media 
Data\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/simplify360.com\/blog\/#website\",\"url\":\"https:\/\/simplify360.com\/blog\/\",\"name\":\"Simplify360\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/simplify360.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/simplify360.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/simplify360.com\/blog\/#organization\",\"name\":\"Simplify360\",\"url\":\"https:\/\/simplify360.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/simplify360.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/simplify360.com\/blog\/wp-content\/uploads\/2023\/08\/cropped-simpify-360-logo.png\",\"contentUrl\":\"https:\/\/simplify360.com\/blog\/wp-content\/uploads\/2023\/08\/cropped-simpify-360-logo.png\",\"width\":800,\"height\":161,\"caption\":\"Simplify360\"},\"image\":{\"@id\":\"https:\/\/simplify360.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/simplify360.com\/blog\/#\/schema\/person\/90bc4f8d55a2c63512c7e124a2007967\",\"name\":\"Simplify360\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/simplify360.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/ef5f2955c2f7f2e929791ca3e8cb6ae021f05e4e6d53352524aad18145c995db?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/ef5f2955c2f7f2e929791ca3e8cb6ae021f05e4e6d53352524aad18145c995db?s=96&d=mm&r=g\",\"caption\":\"Simplify360\"},\"url\":\"https:\/\/simplify360.com\/blog\/author\/simplify360\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Top Challenges of Social Media Data - Simplify 360","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/","og_locale":"en_US","og_type":"article","og_title":"Top Challenges of Social Media Data - Simplify 360","og_description":"Introduction Social Media can be defined as a set of channels using which people can communicate in a many-to-many relationship as opposed to one-to-many relationship of the traditional media like Radio, Television or Magazines. Broadly speaking, Social Media consists of micro-blogging (like Twitter and\u00a0 Facebook), blogging (like WordPress and Blogger), forums (where users ask questions &hellip; Top Challenges of Social Media Data Read More \u00bb","og_url":"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/","og_site_name":"Simplify 360","article_published_time":"2012-07-24T05:18:16+00:00","article_modified_time":"2023-08-17T04:43:49+00:00","author":"Simplify360","twitter_card":"summary_large_image","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/#article","isPartOf":{"@id":"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/"},"author":{"name":"Simplify360","@id":"https:\/\/simplify360.com\/blog\/#\/schema\/person\/90bc4f8d55a2c63512c7e124a2007967"},"headline":"Top Challenges of Social Media 
Data","datePublished":"2012-07-24T05:18:16+00:00","dateModified":"2023-08-17T04:43:49+00:00","mainEntityOfPage":{"@id":"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/"},"wordCount":863,"publisher":{"@id":"https:\/\/simplify360.com\/blog\/#organization"},"image":{"@id":"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/#primaryimage"},"thumbnailUrl":"","keywords":["Big data","Hadoop","Hbase","Machine Learning in Social Media","MySQL","Oracle","RDBMS","Sentiment Analysis of Conversation","Social Media Data","Supervised Machine Learning","Unsupervised Machine Learning"],"articleSection":["Social CRM","Social Media Analytics"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/","url":"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/","name":"Top Challenges of Social Media Data - Simplify 360","isPartOf":{"@id":"https:\/\/simplify360.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/#primaryimage"},"image":{"@id":"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/#primaryimage"},"thumbnailUrl":"","datePublished":"2012-07-24T05:18:16+00:00","dateModified":"2023-08-17T04:43:49+00:00","breadcrumb":{"@id":"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/#primaryimage","url":"","contentUrl":""},{"@type":"BreadcrumbList","@id":"https:\/\/simplify360.com\/blog\/top-challenges-of-social-media-data\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/simplify360.com\/blog\/"},{"@type":"ListItem","position":2,"name"
:"Top Challenges of Social Media Data"}]},{"@type":"WebSite","@id":"https:\/\/simplify360.com\/blog\/#website","url":"https:\/\/simplify360.com\/blog\/","name":"Simplify360","description":"","publisher":{"@id":"https:\/\/simplify360.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/simplify360.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/simplify360.com\/blog\/#organization","name":"Simplify360","url":"https:\/\/simplify360.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/simplify360.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/simplify360.com\/blog\/wp-content\/uploads\/2023\/08\/cropped-simpify-360-logo.png","contentUrl":"https:\/\/simplify360.com\/blog\/wp-content\/uploads\/2023\/08\/cropped-simpify-360-logo.png","width":800,"height":161,"caption":"Simplify360"},"image":{"@id":"https:\/\/simplify360.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/simplify360.com\/blog\/#\/schema\/person\/90bc4f8d55a2c63512c7e124a2007967","name":"Simplify360","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/simplify360.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/ef5f2955c2f7f2e929791ca3e8cb6ae021f05e4e6d53352524aad18145c995db?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/ef5f2955c2f7f2e929791ca3e8cb6ae021f05e4e6d53352524aad18145c995db?s=96&d=mm&r=g","caption":"Simplify360"},"url":"https:\/\/simplify360.com\/blog\/author\/simplify360\/"}]}},"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false},"uagb_author_info":{"display_name":"Simplify360","author_link":"https:\/\/simplify360.com\/blog\/author\/simplify360\/
"},"uagb_comment_info":0,"uagb_excerpt":"Introduction Social Media can be defined as a set of channels using which people can communicate in a many-to-many relationship as opposed to one-to-many relationship of the traditional media like Radio, Television or Magazines. Broadly speaking, Social Media consists of micro-blogging (like Twitter and\u00a0 Facebook), blogging (like WordPress and Blogger), forums (where users ask questions&hellip;","_links":{"self":[{"href":"https:\/\/simplify360.com\/blog\/wp-json\/wp\/v2\/posts\/10875","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/simplify360.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/simplify360.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/simplify360.com\/blog\/wp-json\/wp\/v2\/users\/13"}],"replies":[{"embeddable":true,"href":"https:\/\/simplify360.com\/blog\/wp-json\/wp\/v2\/comments?post=10875"}],"version-history":[{"count":0,"href":"https:\/\/simplify360.com\/blog\/wp-json\/wp\/v2\/posts\/10875\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/simplify360.com\/blog\/wp-json\/"}],"wp:attachment":[{"href":"https:\/\/simplify360.com\/blog\/wp-json\/wp\/v2\/media?parent=10875"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/simplify360.com\/blog\/wp-json\/wp\/v2\/categories?post=10875"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/simplify360.com\/blog\/wp-json\/wp\/v2\/tags?post=10875"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}