At InRev, we are always pushing the boundaries to bring intelligence to our systems and products. Simplify360 is one of our best enterprise applications doing just that. In addition to integrating with multiple third-party APIs, a major portion of our time is spent crunching the raw data with sharp algorithms to draw insights out of it.
So let me share the basics of the intelligence behind some of our analytical engines. The algorithms used by Simplify360’s automatic classifiers are developed in-house by the team.
Sentiment Classifier
The sentiment classifier is based on textual analysis using unigram (single word) and bigram (word pair) features, and calculates the overall sentiment using three algorithms: SVM, Maximum Entropy and Naive Bayes.
If all three algorithms agree on a common sentiment (i.e. either all positive or all negative), the system classifies the text with the corresponding sentiment (positive/negative). The accuracy we observe is greater than 75%.
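To make the agreement rule concrete, here is a minimal sketch in Python. It is not our production code: the function names are made up for illustration, the tokenizer is a plain whitespace split, and returning None when the classifiers disagree is an assumption about how the undecided case might be handled.

```python
from typing import List, Optional


def ngram_features(text: str) -> List[str]:
    """Return unigram and bigram features for a whitespace-tokenized text."""
    tokens = text.lower().split()
    # Each bigram is a pair of adjacent tokens joined by a space.
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def unanimous_sentiment(predictions: List[str]) -> Optional[str]:
    """Return the shared label if every classifier agrees, else None.

    `predictions` holds one label per classifier, e.g. the outputs of
    the SVM, Maximum Entropy and Naive Bayes models.
    Returning None on disagreement is an assumption of this sketch.
    """
    if predictions and all(p == predictions[0] for p in predictions):
        return predictions[0]
    return None
```

For instance, `unanimous_sentiment(["positive", "positive", "positive"])` yields `"positive"`, while a 2-to-1 split yields `None` and the text is left unclassified.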
Gender and Age Classifier
The gender classification is based on a hybrid method. The first method looks up the name in an internal database and classifies the gender.
The second method is based on basic heuristics to predict whether the name belongs to a male or a female. For example, names ending with a consonant are male about 80% of the time. If there is no indication of the user’s name (as in blogs), the system uses textual analysis to determine gender.
Textual analysis is used to determine the age group. Certain features are specific to a certain age group. For example, teenagers use a lot of short forms as well as words like school, teacher and homework, while middle-aged groups use words like children and marriage. Our classifier relies on these features.
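The fallback order described above could be sketched roughly like this. The tiny `NAME_DB` dictionary is a hypothetical stand-in for the internal database, and the only heuristic shown is the consonant-ending rule; returning None means "undecided, hand over to textual analysis", which is an assumption of this sketch.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the internal name database.
NAME_DB: Dict[str, str] = {"priya": "female", "rahul": "male"}
VOWELS = set("aeiou")


def classify_gender(name: Optional[str]) -> Optional[str]:
    """Guess gender from a name: database lookup first, then a heuristic."""
    if not name or not name.strip():
        return None  # no name available, caller falls back to textual analysis
    first = name.strip().split()[0].lower()
    # Step 1: direct lookup in the name database.
    if first in NAME_DB:
        return NAME_DB[first]
    # Step 2: heuristic, names ending with a consonant are mostly male.
    last_char = first[-1]
    if last_char.isalpha() and last_char not in VOWELS:
        return "male"
    return None  # undecided in this sketch
```

Since the heuristic is right only about 80% of the time, a real system would probably attach a confidence to the answer rather than a bare label.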
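As a rough illustration of keyword features for age, the sketch below counts cue words per group and picks the group with the most hits. The cue lists contain only the example words mentioned above, the group names and the `guess_age_group` function are invented for this example, and the real classifier uses a richer feature set (short forms, for one, are left out here).

```python
import re
from typing import Dict, Optional, Set

# Cue words taken from the examples in the text; the real lists are larger.
AGE_CUES: Dict[str, Set[str]] = {
    "teen": {"school", "teacher", "homework"},
    "mid-age": {"children", "marriage"},
}


def guess_age_group(text: str) -> Optional[str]:
    """Return the age group whose cue words occur most often, or None."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Count how many tokens match each group's cue words.
    scores = {
        group: sum(1 for t in tokens if t in cues)
        for group, cues in AGE_CUES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```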
Accuracy is about 60% for age and 70% for gender analysis. We are still working to improve the model.
Brand Influencer
In addition to the Klout score, which comes from a third-party API (only for volume-based systems), the system calculates its own score based on various parameters of the profile. Currently, the system only classifies the influence of Twitter profiles. It considers the age of the account, the number of tweets made, the follower/following ratio, the retweets received, etc.
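To give a feel for how such parameters can be folded into one number, here is a purely illustrative sketch. The `TwitterProfile` fields, the equal weighting, the log scaling and the caps are all assumptions made for this example and do not reflect the actual formula.

```python
import math
from dataclasses import dataclass


@dataclass
class TwitterProfile:
    account_age_days: int
    tweet_count: int
    followers: int
    following: int
    retweets_received: int


def influence_score(p: TwitterProfile) -> float:
    """Illustrative score in [0, 100]; weights and caps are assumptions."""
    # Guard against division by zero for accounts that follow nobody.
    ratio = p.followers / max(p.following, 1)
    parts = [
        min(p.account_age_days / 1825, 1.0),                 # caps at about 5 years
        min(math.log10(p.tweet_count + 1) / 5, 1.0),         # caps near 100k tweets
        min(math.log10(ratio + 1) / 3, 1.0),                 # caps near a 1000:1 ratio
        min(math.log10(p.retweets_received + 1) / 5, 1.0),   # caps near 100k retweets
    ]
    # Equal weights: average the four normalized parts.
    return round(100 * sum(parts) / len(parts), 1)
```

Log scaling keeps a handful of very large accounts from dominating the scale, and capping each part at 1.0 keeps the final score within 0 to 100.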