Overparametrized Linear Models

Posted on July 09 2017 in Statistics • Tagged with overparametrization, linear models, statistics, r

Surprisingly, quite a few data scientists overlook the importance of linear regression and the problem of overparametrization. In this post, I'm going to describe mathematically what it means for a model to be overparametrized and the general strategies used by a statistician to resolve non-uniqueness.
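As a minimal sketch of what non-uniqueness looks like, assume a one-way layout with two groups and two observations per group, fitted with an intercept plus one indicator per group. The design matrix $X$ and the symbols $\mu$, $\alpha_1$, $\alpha_2$, and $c$ are illustrative assumptions, not notation taken from the post.

```latex
X =
\begin{pmatrix}
1 & 1 & 0 \\
1 & 1 & 0 \\
1 & 0 & 1 \\
1 & 0 & 1
\end{pmatrix},
\qquad
\beta =
\begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \end{pmatrix}.

% The first column is the sum of the other two, so X is rank deficient:
\operatorname{rank}(X) = 2 < 3.

% Shifting the intercept up by c and both group effects down by c
% leaves the fitted values unchanged, for every real c:
X \begin{pmatrix} \mu + c \\ \alpha_1 - c \\ \alpha_2 - c \end{pmatrix}
= X \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \end{pmatrix}.
```

In this toy case infinitely many coefficient vectors give the same fit, so a constraint such as $\alpha_1 = 0$ or $\alpha_1 + \alpha_2 = 0$ is needed to pin down a single solution.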

Ranks, Subspaces, and Bases

Suppose …

Continue reading

Bayesian Zero-Inflated Poisson Model

Posted on July 05 2017 in Bayesian Statistics • Tagged with zero inflated poisson, mcmc, bayesian statistics, statistics, r

Wikipedia defines the model as follows:

The zero-inflated Poisson model concerns a random event containing excess zero-count data in unit time. For instance, the number of insurance claims within a population for a certain type of risk would be zero-inflated by those people who have not taken out insurance against the risk and thus …

Continue reading

Foursquare Location-Content-Aware Recommender System

Posted on June 26 2017 in Bayesian Statistics • Tagged with latent dirichlet allocation, hierarchical bayes, recommender system, statistics, r

Foursquare uses its unique location technology and foot traffic panel to produce personalized recommendations of spatial items such as restaurants. I recently read the paper "LCARS: A Spatial Item Recommender System" and implemented it from scratch in R.

The recommender system combines the querying user's interest and the local preference …

Continue reading

Predicting Free Public WiFi Locations in Seoul with Point Process Models using Foursquare API

Posted on June 25 2017 in Spatial Statistics • Tagged with point process models, spatial statistics, r

Introduction

The field of spatial statistics is experiencing a period of rapid growth, though less rapid than that of machine learning and artificial intelligence. This growth comes with the advent of location technology and efficient solution techniques for high performance computing, which has begun to replace classical frequentist inference procedures developed years …

Continue reading

Bayesian Approach to Ranking Movies

Posted on March 03 2017 in Bayesian Statistics • Tagged with ranking, recommender system, statistics, machine learning, r

Many people like watching movies, and I do too. Recently I've discovered this somewhat sketchy Korean site that hosts quite a few movies online. I call it sketchy because I almost always see ads asking me to download a piece of software that's supposed to protect my computer from a …

Continue reading

Estimating the Total Daily Number of Customers at Bon Me

Posted on January 30 2016 in Bayesian Statistics • Tagged with bayesian statistics, hierarchical bayes, monte carlo simulation, statistics, machine learning, r, jags

Bon Me is one of my favorite places to grab a quick bite for lunch. Surprisingly, I didn't know it existed until early last year, despite it being so close to my workplace. I really recommend you try the miso-braised pulled pork on brown rice without cilantro. It's really addictive.

Anyhow …

Continue reading

Predicting the Fitbit Challenge Winner with Hierarchical Bayes

Posted on November 28 2015 in Bayesian Statistics • Tagged with bayesian statistics, hierarchical bayes, monte carlo simulation, statistics, machine learning, r, jags

Hi there! It's been a little more than a month since my last post. Thanksgiving was two days ago, and I had a good time with my family. How's everyone doing?

With the upcoming Cyber Monday, I'm sure many of you have cool electronics and gadgets in mind to look …

Continue reading

Automatic Text Summarization

Posted on September 10 2015 in Natural Language Processing • Tagged with text summarization, natural language processing, machine learning, r

Background

Automatic text summarization is an area of machine learning that has made significant progress over the past few years. We read hundreds and thousands of articles on our desktops, tablets, or mobile devices, and we simply don't have the time to peruse all of them. As such problem of …

Continue reading