Entries by tag: data-science

Including child/synonym tags

Scikit-Learn Pipeline Examples  21 Oct 2017    scikit-learn
Examples of how to use classifier pipelines in scikit-learn. Includes examples of cross-validating regular classifiers, meta-classifiers such as one-vs-rest, and Keras models using the scikit-learn wrappers. Read More ›

Kaggle NYC Taxi Trips Competition: Overview and Results  17 Oct 2017    kaggle
Overview of Kaggle competition: New York City Taxi Trip Duration. Read More ›

Pandas DataFrame by Example: GroupBy Examples  11 Oct 2017    pandas groupby
Examples of specific ways to use groupby on Pandas DataFrames. Read More ›

Scaling Data Teams  09 Oct 2017    data-science data-newsletter-5
The needs of data teams are mostly around data access and sharing; columnar databases are often more efficient for analytics; MS Excel is useful at many scales; stakeholder communication is important to make your work more relevant; use metrics to learn how data products are being used. Read More ›

Paper Summary: Recursive Neural Language Architecture for Tag Prediction  04 Oct 2017    paper-summary tags neural-nets embeddings
Summary of the 2016 article "Recursive Neural Language Architecture for Tag Prediction" by Kataria. Read More ›

Paper Summary: Translating Embeddings for Modeling Multi-relational Data  30 Sep 2017    embeddings structure paper-summary neural-networks
Summary of the 2013 article "Translating Embeddings for Modeling Multi-relational Data" by Bordes et al. Read More ›

5 Tips for moving your Data Science Operation to the next Level  25 Sep 2017    data-newsletter-5 data-science best-practices
Principles for disciplined data science include: Discoverability, Automation, Collaboration, Empowerment and Deployment. Read More ›

Data Provenance: Quick Summary + Reasons Why  07 Sep 2017    data-newsletter-5 data-science
Data Provenance (also called Data Lineage) is version control for data. It refers to keeping track of modifications to datasets you use and train models on. This is crucial in data science projects if you need to ensure data quality and reproducibility. Read More ›

Winning Solutions Overview: Kaggle Instacart Competition  03 Sep 2017    data-newsletter-4 kaggle data-science
The Instacart "Market Basket Analysis" competition focused on predicting repeated orders based on past behaviour. Among the best-ranking solutions, there were many approaches based on gradient boosting and feature engineering, and one approach based on end-to-end neural networks. Read More ›

A Quick Summary of Ensemble Learning Strategies  01 Sep 2017    data-newsletter-4 machine-learning
Ensemble learning refers to mixing the outputs of several classifiers in various ways, so as to get a better result than each classifier individually. Read More ›

Evaluation Metrics for Classification Problems: Quick Examples + References  31 Aug 2017    data-newsletter-4 machine-learning
There are multiple ways to measure your model's performance in machine learning, depending upon what objectives you have in mind. Some of the most important are Accuracy, Precision, Recall, F1 and AUC. Read More ›

Pandas for Large Data  13 Aug 2017    data-newsletter-4 pandas performance
To work successfully with large data in Pandas, there are several ways to reduce memory usage and keep processing fast. Read More ›

Python Pickling for Data Science: Examples and Tips on how to use Pickle as part of your Data Work  12 Jul 2017    python pickle data-science
Pickle is a well-known Python tool for saving arbitrary variable contents to a file. Here are a couple of examples and tips on how you can use it to make your data science work more efficient and easily reproducible. Read More ›

Machine Learning and Data Science: Generally Applicable Tips and Tricks  18 May 2017    machine-learning data-science best-practices
A couple of general, practical tips and tricks that may be used when dealing with data science and/or machine learning problems. Read More ›

Data-related Job Descriptions: Making of a Data Team  19 Mar 2017    data-science
A simple description of some common job titles and positions you may come across when looking at the data work landscape. See which positions may be best suited to you and your company. Read More ›

Scikit-Learn Cheatsheet: Reference and Examples  10 Mar 2017    wip scikit-learn
Just a couple of things you may find yourself doing over and over again when working with scikit-learn. Read More ›

Tricks for Training Neural Nets Faster  20 Feb 2017    wip neural nets
Tricks and practical tips for training neural nets faster. Credit goes mostly to Geoff Hinton and Yann LeCun. Read More ›

Pandas DataFrame by Example  15 Dec 2015    pandas python
Lots of examples of ways to use one of the most versatile data structures in the whole Python data analysis stack. Learn how to slice and dice, select and perform commonly used operations on DataFrames. Read More ›

One-Hot Encoding a Feature on a Pandas Dataframe: an Example  27 Nov 2015    pandas
One-hot encoding is a simple way to transform categorical features into vectors that are easy to deal with. Learn how to do this on a Pandas DataFrame. Read More ›
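
As a quick taste of that last entry, here is a minimal sketch of one-hot encoding a column with `pandas.get_dummies`. It is not taken from the linked post: the DataFrame, the column names (`id`, `color`) and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "color": ["red", "green", "blue", "green"],
})

# Replace "color" with one indicator column per distinct category.
encoded = pd.get_dummies(df, columns=["color"], prefix="color")

# The indicator dtype differs across pandas versions (integer or boolean),
# so cast to int to get a stable 0/1 view.
indicator_cols = [c for c in encoded.columns if c.startswith("color_")]
encoded = encoded.astype({c: int for c in indicator_cols})

print(encoded)
```

Each row ends up with exactly one indicator set to 1, which is what makes the encoded feature easy to feed into most models.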