Part-of-Speech Tagging with Trigram Hidden Markov Models and the Viterbi Algorithm

Posted on June 07 2017 in Natural Language Processing • Tagged with pos tagging, markov chain, viterbi algorithm, natural language processing, machine learning, python

Hidden Markov Model

The hidden Markov model, or HMM for short, is a probabilistic sequence model that assigns a label to each unit in a sequence of observations. The model computes a probability distribution over possible sequences of labels and chooses the label sequence that maximizes the probability of …
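The idea in the excerpt above, scoring candidate label sequences and keeping the most probable one, can be sketched with the Viterbi algorithm. The block below is a minimal sketch and not the post's implementation: it assumes a first-order (bigram) HMM rather than the trigram model named in the title, and the function name `viterbi`, the two-tag tag set, the words, and every probability are made-up toy values.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log-probability, tag sequence) of the best path in a first-order HMM.

    Assumes every probability that is looked up is non-zero (no smoothing here).
    """
    # best[t][s]: log-probability of the best tag sequence ending in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    # back[t][s]: previous state on that best sequence
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes the score of reaching s
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Follow the back-pointers from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Toy model: all values below are illustrative assumptions, not estimates from data
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.8, "VERB": 0.2}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
           "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit_p = {"NOUN": {"dogs": 0.7, "bark": 0.3},
          "VERB": {"dogs": 0.1, "bark": 0.9}}

log_prob, tags = viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p)
print(tags)  # ['NOUN', 'VERB']
```

Working in log space keeps long sequences from underflowing. A trigram tagger would condition each transition on the two previous tags instead of one, which enlarges the state space but leaves the dynamic-programming structure unchanged.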

Continue reading

Multilevel URL Community Detection with Infomap

Posted on March 13 2017 in Machine Learning • Tagged with community detection, network, graph, machine learning, python

Recently I discovered a dataset online: a collection of 20M web queries from about 650K users over three months, from March 1 to May 31, 2006.

The columns are listed as follows:

  • User ID - an anonymous user ID number
  • Query - the query submitted by the user
  • Query Time …

Continue reading

Bayesian Approach to Ranking Movies

Posted on March 03 2017 in Bayesian Statistics • Tagged with ranking, recommender system, statistics, machine learning, r

Many people like watching movies, and I do too. Recently I've discovered this somewhat sketchy Korean site that hosts quite a few movies online. I call it sketchy because I almost always see ads asking me to download a piece of software that's supposed to protect my computer from a …

Continue reading

Estimating the Total Daily Number of Customers at Bon Me

Posted on January 30 2016 in Bayesian Statistics • Tagged with bayesian statistics, hierarchical bayes, monte carlo simulation, statistics, machine learning, r, jags

Bon Me is one of my favorite places to grab a quick bite for lunch. Surprisingly, I didn't know it existed until early last year, despite it being so close to my workplace. I really recommend you try the miso-braised pulled pork on brown rice without cilantro. It's really addictive.

Anyhow …

Continue reading

Predicting the Fitbit Challenge Winner with Hierarchical Bayes

Posted on November 28 2015 in Bayesian Statistics • Tagged with bayesian statistics, hierarchical bayes, monte carlo simulation, statistics, machine learning, r, jags

Hi there! It's been a little more than a month since my last post. Thanksgiving was two days ago, and I had a good time with my family. How's everyone doing?

With the upcoming Cyber Monday, I'm sure many of you have cool electronics and gadgets in mind to look …

Continue reading

Automatic Text Summarization

Posted on September 10 2015 in Natural Language Processing • Tagged with text summarization, natural language processing, machine learning, r

Background

Automatic text summarization is an area of machine learning that has made significant progress in recent years. We read hundreds, even thousands, of articles on our desktops, tablets, or mobile devices, and we simply don't have the time to peruse all of them. As such, the problem of …

Continue reading

Clustering Burger King Menu with the Dirichlet Process

Posted on August 18 2015 in Machine Learning • Tagged with clustering, machine learning, burger king

Burger King Cluster Sample

I was going to write part two of the previous post on A/B testing, this time using Bayesian methods, but I'll do that another time. Today I'm going to write about clustering, a widely used machine learning technique, specifically clustering Burger King menu …

Continue reading

A/B Testing Part I

Posted on April 03 2015 in Statistics • Tagged with a/b testing, bayesian statistics, statistics, machine learning

This is my first post on a data science topic. Over the next few posts, including this one, I will talk about A/B testing, specifically how to do it right using Bayesian methods as compared with traditional frequentist hypothesis testing.

the basic concept …

Continue reading