data-newsletter-5 data-science best-practices
5 Tips for Moving Your Data Science Operation to the Next Level
26 Sep 2017 Principles for disciplined data science include: Discoverability, Automation, Collaboration, Empowerment and Deployment.
recommender-systems data-newsletter-5
Highlights of the Talk with Dr. Konstan on Recommender Systems
24 Sep 2017 Some highlights of the Podcast Episode with Dr. Joseph Konstan on interesting topics related to Recommender Systems. Discussed topics include serendipity, serpentining, diversity and temporal effects.
python data-visualization plotting
Seaborn by Example: Data Visualization and Plotting using Python
09 Sep 2017 Seaborn is a higher-level interface to Matplotlib. It has a more convenient API and has useful data visualization functions right out of the box.
data-newsletter-5 data-science
Data Provenance: Quick Summary + Reasons Why
07 Sep 2017 Data Provenance (also called Data Lineage) is version control for data. It refers to keeping track of modifications to datasets you use and train models on. This is crucial in data science projects if you need to ensure data quality and reproducibility.
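One simple way to picture "version control for data" is to record a content hash of the dataset after every transformation. The sketch below is only an illustration under that assumption: the `fingerprint` and `record_step` helpers and the log format are hypothetical and not taken from the post.

```python
import hashlib
import json


def fingerprint(rows):
    """Return a SHA-256 hex digest for a dataset given as a list of dicts."""
    # Serialize deterministically so identical data always hashes the same.
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def record_step(log, description, rows):
    """Append a lineage entry (hypothetical format) describing one step."""
    log.append({"step": description, "sha256": fingerprint(rows)})
    return log


# Toy dataset: one record has a missing value.
raw = [{"id": 1, "age": 34}, {"id": 2, "age": None}]
cleaned = [row for row in raw if row["age"] is not None]

lineage = []
record_step(lineage, "load raw data", raw)
record_step(lineage, "drop rows with missing age", cleaned)

for entry in lineage:
    print(entry["step"], entry["sha256"][:12])
```

If a model is later retrained, comparing the stored digest with a freshly computed one shows whether the training data has changed since the step was logged.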
technology social-tagging
Thoughts on Engaging Users in Social Tagging Systems
05 Sep 2017 Some thoughts on how social tagging systems can foster user engagement with appropriate incentives.
data-newsletter-4 recommender-systems
Lessons from the Netflix Prize: Changing Requirements and Cost-Effectiveness
04 Sep 2017 Netflix never really used the #1 winning solution to the Netflix Prize. One reason was that it just wasn't cost-effective to implement the full thing; another was that requirements had changed.
data-newsletter-4 kaggle data-science
Winning Solutions Overview: Kaggle Instacart Competition
04 Sep 2017 The Instacart "Market Basket Analysis" competition focused on predicting repeated orders based upon past behaviour. Among the best-ranking solutions, there were many approaches based on gradient boosting and feature engineering and one approach based on end-to-end neural networks.
technology data-newsletter-4 machine-learning
A Quick Summary of Ensemble Learning Strategies
01 Sep 2017 Ensemble learning refers to mixing the outputs of several classifiers in various ways, so as to get a better result than each classifier individually.
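As a minimal sketch of one such strategy, the snippet below combines three classifiers by majority vote. The predictions and true labels are made-up toy data, and `majority_vote` is a hypothetical helper, not code from the post.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists (all the same length) by majority vote."""
    combined = []
    # zip(*predictions) yields, for each sample, the labels from every classifier.
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples where the prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 0, 1, 1]

# Each toy classifier makes exactly one mistake, on a different sample.
clf_a = [1, 0, 1, 0]
clf_b = [1, 1, 1, 1]
clf_c = [0, 0, 1, 1]

ensemble = majority_vote([clf_a, clf_b, clf_c])

for name, preds in [("a", clf_a), ("b", clf_b), ("c", clf_c), ("ensemble", ensemble)]:
    print(name, accuracy(y_true, preds))
```

Because the three classifiers err on different samples, the vote corrects each individual mistake. With an even number of classifiers, ties would need an explicit rule, which this sketch does not handle.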
technology data-newsletter-4 machine-learning model-evaluation
Evaluation Metrics for Classification Problems: Quick Examples + References
31 Aug 2017 There are multiple ways to measure your model's performance in machine learning, depending upon what objectives you have in mind. Some of the most important are Accuracy, Precision, Recall, F1 and AUC.
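To make the first four metrics concrete, here is a small sketch that computes them from scratch for a binary problem. The labels are invented toy data and `classification_metrics` is a hypothetical helper. AUC is left out because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy example, precision is higher than recall: most predicted positives are correct, but half of the actual positives are missed. This is the kind of trade-off that accuracy alone hides.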
Pandas for Large Data: Examples and Tips
13 Aug 2017 To work successfully with large data in Pandas, there are several ways to reduce memory usage and improve speed.