queirozf.com

Paper Summary: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning 19 Apr 2025 paper-summary language-modeling reinforcement-learning

Summary of the 2025 article "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" by DeepSeek AI. Read More ›

Paper Summary: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models 06 Apr 2025 paper-summary reinforcement-learning language-modeling

Summary of the 2024 article "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" by Shao et al. Read More ›

Paper Summary: Proximal Policy Optimization Algorithms 06 Apr 2025 paper-summary reinforcement-learning language-modeling

Summary of the 2017 article "Proximal Policy Optimization Algorithms" by Schulman et al. Read More ›

Paper Summary: Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study 06 Oct 2024 paper-summary alignment instruction-tuning

Summary of the 2024 article "Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study" by Xu et al. Read More ›

Paper Summary: The Science of Detecting LLM-Generated Texts 28 Jul 2024 paper-summary language-modeling

Summary of the 2023 article "The Science of Detecting LLM-Generated Texts" by Tang et al. Read More ›

Paper Summary: Multitask Prompted Training Enables Zero-Shot Task Generalization 31 Mar 2024 paper-summary instruction-tuning language-modeling

Summary of the 2021 article "Multitask Prompted Training Enables Zero-Shot Task Generalization" by Sanh et al., AKA the T0 (T-zero) article. Read More ›

Paper Summary: Constitutional AI 16 Nov 2023 paper-summary instruction-tuning language-models

Summary of the 2022 article "Constitutional AI" by Anthropic. Read More ›

Paper Summary: Llama 2: Open Foundation and Fine-Tuned Chat Models 01 Aug 2023 paper-summary instruction-following language-modeling

Summary of the 2023 article "Llama 2: Open Foundation and Fine-Tuned Chat Models" by Touvron et al. Read More ›

Paper Summary: Finetuned Language Models Are Zero-Shot Learners 02 Jul 2023 paper-summary instruction-following

Summary of the 2022 article "Finetuned Language Models Are Zero-Shot Learners" by Wei et al., AKA the FLAN article. Read More ›

Paper Summary: Direct Preference Optimization: Your Language Model is Secretly a Reward Model 23 Jun 2023 paper-summary instruction-following

Summary of the 2023 article "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" by Rafailov et al. Read More ›

Paper Summary: Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling 18 Jun 2023 paper-summary language-models

Summary of the 2023 article "Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling" by Biderman et al. Read More ›

Paper Summary: LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention 04 Jun 2023 paper-summary language-modeling instruction-following

Summary of the 2023 article "LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention" by Zhang et al. Read More ›

Paper Summary: LLaMA: Open and Efficient Foundation Language Models 04 Jun 2023 paper-summary llms

Summary of the 2023 article "LLaMA: Open and Efficient Foundation Language Models" by Touvron et al. Read More ›

Paper Summary: Self-instruct: Aligning Language Models with Self-generated Instructions 03 Jun 2023 paper-summary language-modeling alignment

Summary of the 2022 article "Self-instruct: Aligning Language Models with Self-generated Instructions" by Wang et al. Read More ›

Paper Summary: Training language models to follow instructions with human feedback 05 Feb 2023 paper-summary language-models alignment

Summary of the 2022 article "Training language models to follow instructions with human feedback" by Ouyang et al., AKA the InstructGPT article. Read More ›

Paper Summary: Language Models are Few-Shot Learners 01 Jan 2023 paper-summary language-models

Summary of the 2020 article "Language Models are Few-Shot Learners" by Brown et al. AKA the GPT-3 Paper. Read More ›

Paper Summary: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 01 Jan 2023 paper-summary language-models

Summary of the 2018 article "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Devlin et al. Read More ›

Paper Summary: Long Short-Term Memory-Networks for Machine Reading 25 Dec 2022 paper-summary attention sequence-learning

Summary of the 2016 article "Long Short-Term Memory-Networks for Machine Reading" by Cheng et al., AKA the "Self-attention" article. Read More ›

Pandas Fillna Examples: Filling in Missing Data 16 Oct 2022 pandas

Examples on the most common ways you will find yourself using fillna and related functions in pandas. Read More ›

Pandas Dataframe examples: Plotting Histograms 31 Jul 2022 matplotlib pandas

Several examples on how to draw histograms based on pandas dataframes. Read More ›

Pandas Examples: Looping over Dataframe Rows 13 Jun 2022 pandas

Everything you need to know about how to loop and/or iterate over rows in a pandas dataframe, as efficiently as possible. Read More ›

Pandas Examples: Plotting Date/Time data with Matplotlib/Pyplot 24 Apr 2022 pandas matplotlib

Examples on how to plot time-series or general date or time data from a pandas dataframe, using matplotlib behind the scenes. Read More ›

Paper Summary: Exploring the Limits of Transfer Learning with a Unified Text-to-text Transformer 29 Aug 2021 paper-summary natural-language-processing

Summary of the 2020 article "Exploring the Limits of Transfer Learning with a Unified Text-to-text Transformer" by Raffel et al. AKA the T5 article. Read More ›

Paper Summary: Identifying Mislabeled Instances in Classification Datasets 28 Jun 2021 paper-summary machine-learning-engineering machine-learning

Summary of the 2019 article "Identifying Mislabeled Instances in Classification Datasets" by Mueller and Markert. Read More ›

Pandas Dataframe Examples: Styling Cells and Conditional Formatting 09 May 2021 python pandas

Some examples on how to highlight and style cells in pandas dataframes when certain criteria are met. Useful for analytics and presenting data. Read More ›

Normalize Text for Natural Language Processing Tasks: Reference and Examples 02 May 2021 nlp preprocessing python

A couple of common preprocessing tasks you need in order to be able to use raw text in NLP tools. Read More ›

Paper Summary: The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets 29 Mar 2021 paper-summary model-evaluation

Summary of the 2015 article "The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets" by Saito and Rehmsmeier. Read More ›

Pandas Dataframes: Apply Examples 26 Sep 2020 pandas

Examples on how to use pandas apply on columns, dataframes, etc., with best practices and warnings about performance. Read More ›

11 Types of Data Products, with Examples 22 Sep 2020 product-management data-science data-products

Here is a list of data products you can build using various types of data science methods. Includes use cases and main techniques for each. Read More ›

Paper Summary: Improving Language Understanding by Generative Pre-Training 11 Sep 2020 paper-summary natural-language-processing sequence-learning transformer-architecture

Summary of the 2018 article "Improving Language Understanding by Generative Pre-Training" by Radford et al. Read More ›

Paper Summary: ULMFIT: Universal Language Model Fine-tuning for Text Classification 22 Jul 2020 paper-summary natural-language-processing embeddings sequence-learning

Summary of the 2018 article "ULMFIT: Universal Language Model Fine-tuning for Text Classification" by Howard and Ruder. Read More ›

Paper Summary: Attention is All you Need 27 Jun 2020 paper-summary sequence-learning attention transformer-architecture

Summary of the 2017 article "Attention is All you Need" by Vaswani et al. Read More ›

Project Review: Text Classification of Legal Documents (Another one) 25 Apr 2020 project-review natural-language-processing

Short review with lessons learned for a contract project worked on during early 2020. The aim of the project was to classify documents into classes, with some peculiarities and specific rules. Read More ›

Pandas Display Options: Examples and Reference 24 Mar 2020 pandas

Variety of examples on how to set display options on Pandas, to control things like the number of rows, columns, number formatting, etc. Especially useful for working in Jupyter notebooks. Read More ›

Pandas Dataframes: CSV Quoting and Escaping Strategies 24 Mar 2020 pandas

Reading and writing pandas dataframes to CSV files safely, avoiding problems caused by quoting, escaping and encoding. Read More ›

Paper Summary: Hidden Technical Debt in Machine Learning Systems 23 Mar 2020 paper-summary machine-learning-engineering technical-debt

Summary of the 2015 article "Hidden Technical Debt in Machine Learning Systems" by Sculley et al. Read More ›

Scikit-learn Pipelines: Custom Transformers and Pandas integration 08 Mar 2020 pandas scikit-learn

Examples and reference on how to write custom transformers and how to create a single sklearn pipeline that includes both preprocessing steps and a final classifier, in a way that lets you pass pandas dataframes directly to fit. Read More ›

Numpy Sampling: Reference and Examples 07 Mar 2020 numpy statistics

Sample from probability distributions and from lists, with and without weights. Examples using Python, Numpy and Scipy. Read More ›

Paper Summary: Software Engineering for Machine Learning: A Case Study 25 Jan 2020 paper-summary machine-learning-engineering software-engineering

Summary of the 2019 article "Software Engineering for Machine Learning: A Case Study" by Amershi et al. Read More ›

Paper Summary: Neural Machine Translation by Jointly Learning to Align and Translate 11 Jan 2020 paper-summary attention sequence-learning machine-translation

Summary of the 2014 article "Neural Machine Translation by Jointly Learning to Align and Translate" by Bahdanau et al. Read More ›

Paper Summary: Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift 23 Dec 2019 paper-summary machine-learning-engineering

Summary of the 2019 article "Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift" by Rabanser et al. Read More ›

Pandas Dataframe Examples: Duplicated Data 17 Nov 2019 pandas

Deal with duplicated data in pandas: drop, count, show and mark duplicates in pandas dataframes. Read More ›

Paper Summary: Long Short-Term Memory 16 Nov 2019 paper-summary neural-networks sequence-learning

Summary of the 1997 article "Long Short-Term Memory" by Hochreiter and Schmidhuber. Read More ›

Paper Summary: 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com 09 Nov 2019 paper-summary machine-learning-engineering

Summary of the 2019 article "150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com" by Bernardi et al. Read More ›

Using Command-line Tools for Text Data Preprocessing: Examples and Reference 09 Nov 2019 gnu macos unix linux command-line data-science

Use native command-line tools for common tasks related to text preprocessing, like stripping bad characters, normalizing whitespace/newlines, replacing regular expressions, text normalization, etc. They're very fast and work surprisingly well. Read More ›

Paper Summary: TextRank: Bringing Order into Texts 16 Sep 2019 paper-summary natural-language-processing

Summary of the 2004 article "TextRank: Bringing Order into Texts" by Mihalcea and Tarau. Read More ›

People Skills for Data Science Projects: Lessons Learned 14 Sep 2019 data-science project-work

Lessons on seeing a project through from start to finish and on creating value with data science and machine learning. Read More ›

Paper Summary: Language Models are Unsupervised Multitask Learners 31 Aug 2019 paper-summary language-models

Summary of the 2019 article "Language Models are Unsupervised Multitask Learners" by Radford et al. AKA the GPT-2 Article. Read More ›

Pandas Indexing Examples: Accessing and Setting Values on DataFrames 21 Aug 2019 pandas dataframes

Some common ways to access rows in a pandas dataframe, including label-based (loc) and position-based (iloc) access. Read More ›

Choosing C Hyperparameter for SVM Classifiers: Examples with Scikit-Learn 20 Jun 2019 scikit-learn svm

Analysis of the effect of the C parameter on learning SVM models under noisy data. With examples using the Python library scikit-learn. Read More ›

Michelangelo Palette Overview 08 Jun 2019 machine-learning-engineering

Overview of Palette, the feature store that is part of Uber's Michelangelo Machine Learning Platform. Based on the talk given at QCon.ai. Read More ›

Helping Data Science Projects Succeed: 5 Tips on how to Avoid Becoming a Statistic 01 Jun 2019 projects data-science project-work

5 real-world tips to help you avoid failures in data science projects. Suitable for both practitioners and project leads. Read More ›

Pandas Dataframe Examples: String Functions 01 Jun 2019 pandas

Pandas exposes a set of string methods, via the .str accessor, that you can use on Series containing strings. These are useful for filtering dataframes, among other uses. Read More ›

Paper Summary: Scaling Distributed Machine Learning with the Parameter Server 25 May 2019 paper-summary machine-learning-engineering distributed-computing

Summary of the 2014 article "Scaling Distributed Machine Learning with the Parameter Server" by Li et al. Read More ›

Pandas Dataframe Examples: Create and Append data 25 Mar 2019 pandas

Examples on how to create dataframes using lists and dicts, and how to create empty dataframes and then fill them with data. Read More ›

The Calibration-Accuracy Plot: Introduction and Examples 17 Mar 2019 data-science calibration

Model scores don't always tell the whole story. The outputs of machine learning models are much easier to interpret when the scores are well-calibrated probabilities. A model whose scores match actual probabilities is said to be well-calibrated. Read More ›

Pandas Time Series Examples: DatetimeIndex, PeriodIndex and TimedeltaIndex 10 Mar 2019 datetime pandas time-series

How and when to use special pandas Indexes such as DatetimeIndex, PeriodIndex and TimedeltaIndex. These will help you deal with and perform simple operations on time-series data. Read More ›

Pandas Concepts: Reference and Examples 10 Mar 2019 pandas

Short explanations with examples on the main concepts you'll find when using the Pandas library. Read More ›

Evaluation Metrics for Ranking problems: Introduction and Examples 24 Jan 2019 machine-learning model-evaluation

Explanation and examples on how to calculate the performance of ranked predictions for machine learning. Read More ›

Pandas Dataframe Examples: Manipulating Date and Time 15 Jan 2019 pandas datetime

Some examples on how to manipulate dates and times in pandas Dataframes, perform date arithmetic, etc. Read More ›

Paper Summary: The Tradeoffs of Large Scale Learning 15 Dec 2018 paper-summary machine-learning

Summary of the 2007 article "The Tradeoffs of Large Scale Learning" by Bottou and Bousquet. Read More ›

Pandas Dataframe Examples: Column Operations 09 Dec 2018 pandas dataframes

Examples on how to modify pandas DataFrame columns, append columns to dataframes and otherwise transform individual columns. Read More ›

Quick Summary + Thoughts on Bighead: Airbnb's ML Platform 03 Dec 2018 ml-platforms

Notes on Airbnb's Bighead ML platform, based on videos and presentations. Read More ›

Thoughts on Michelangelo: Uber's Machine Learning Platform 20 Nov 2018 machine-learning-platforms

Reading and dissecting the way Uber does Machine Learning. Read More ›

Project Review: Text Classification of Legal Documents 02 Nov 2018 project-review natural-language-processing

Lessons learned from a data science project. Read More ›

Paper Summary: Statistical Modeling: The Two Cultures 02 Nov 2018 paper-summary machine-learning

Summary of the 2001 article "Statistical Modeling: The Two Cultures" by Leo Breiman. Read More ›

Risk in Machine Learning Models 06 Sep 2018 data-science

Machine Learning models can make actual decisions that affect your business. However, things can go wrong, which introduces risk that must be dealt with. Read More ›

Heads-up for Deploying Scikit-learn Models to Production: Quick Checklist 01 Sep 2018 scikit-learn production machine-learning-engineering

A couple of tips for addressing common problems and unexpected situations when using scikit-learn models in production. Read More ›

Cross-Validation Examples with Scikit-Learn 01 Sep 2018 scikit-learn

Using cross-validation within scikit-learn. Read More ›

Mutate for Pandas Dataframes: Examples with Assign 15 Jul 2018 pandas

Assign is a DataFrame method that returns a new dataframe with added or modified columns (the original is left unchanged), which makes it well suited to chained operations. Read More ›

Pandas Query Examples: SQL-like queries in dataframes 05 Jul 2018 pandas

Use SQL-like syntax to query and filter pandas dataframes. Read More ›

Paper Summary: Multi-Label Classification on Tree- and DAG-Structured Hierarchies 02 Jul 2018 paper-summary multi-label structured-learning hierarchical-learning natural-language-processing

Summary of the 2011 article "Multi-Label Classification on Tree- and DAG-Structured Hierarchies" by Bi and Kwok. Read More ›

Paper Summary: The Natural Language Decathlon: Multitask Learning as Question Answering 30 Jun 2018 paper-summary natural-language-processing

Summary of the 2018 article "The Natural Language Decathlon: Multitask Learning as Question Answering" by McCann et al. Read More ›

Example Project Template: Serve a Scikit-learn Model via a Flask API 27 Jun 2018 flask scikit-learn

Full (albeit simple) example on how to create a Flask API that serves predictions from a pre-trained scikit-learn model. Includes supporting features such as logging, error handling, input validation, etc. Full code available on GitHub. Read More ›

Pandas Dataframe: Union and Concat Examples 14 Jun 2018 pandas

Emulate SQL UNION and UNION ALL behaviour, among other things. Read More ›

Evaluation Metrics for Regression Problems: Quick examples + Reference 26 May 2018 machine-learning metrics

Regression problems are evaluated against specific metrics that analyze whether the residuals (difference between actual and predicted values) indicate that a fitted model is a good fit for the data. Here are some of the most commonly-used metrics in that domain. Read More ›

Paper Summary: A Simple but Tough-to-beat Baseline for Sentence Embeddings 13 May 2018 paper-summary embeddings compositionality natural-language-processing

Summary of the 2017 article "A Simple but Tough-to-beat Baseline for Sentence Embeddings" by Arora et al. Read More ›

Scikit-Learn examples: Making Dummy Datasets 02 May 2018 scikit-learn

Make dummy datasets to test out classifiers and/or parameter configurations in Scikit-learn. Read More ›

Paper Summary: Context is Everything: Finding Meaning Statistically in Semantic Spaces 01 May 2018 paper-summary compositionality embeddings natural-language-processing

Summary of the 2018 article "Context is Everything: Finding Meaning Statistically in Semantic Spaces" by Zelikman, where the author introduces CoSal weighting for bag-of-words vectors. Read More ›

Podcast Episode Overview: What Machine Learning Engineers need to Know 23 Apr 2018 data-science peopleware data-newsletter-5 machine-learning-engineering

Overview of a great podcast episode on how much (if at all) we need a new role for data teams, namely Machine Learning Engineers. Read More ›

Visualizing Machine Learning Models: Examples with Scikit-learn, XGB and Matplotlib 23 Apr 2018 matplotlib machine-learning scikit-learn

Examples on how to use matplotlib and Scikit-learn together to visualize the behaviour of machine learning models, conduct exploratory analysis, etc. Read More ›

Pandas Dataframe: Merge and Join Examples 17 Apr 2018 pandas

Examples on how to use pandas.merge to do SQL-style joins on pandas dataframes. Read More ›

Introduction to AUC and Calibrated Models with Examples using Scikit-Learn 15 Apr 2018 machine-learning data-science model-evaluation

Inspired by an episode of the Linear Digressions podcast about what AUC is and what it is not, and why you need well-calibrated models if you want to treat their outputs as probabilities. Read More ›

Measuring how far apart two points are is not as simple as you might think, and knowing which measure to use can make predictive or exploratory models perform either very poorly or very well. Reference and examples including Euclidean distance, Manhattan distance, Mahalanobis distance, etc. Read More ›

Pandas Dataframe: Plot Examples with Matplotlib and Pyplot 22 Dec 2017 pandas pyplot matplotlib dataframes

Examples on how to plot data directly from a Pandas dataframe, using matplotlib and pyplot. Read More ›

Churn Analysis 101: Quick Introduction and Key Concepts 27 Nov 2017 churn data-science

Simple definitions for churn analysis. Read More ›

Gaussian Processes for Classification and Regression: Introduction and Usage 19 Nov 2017 machine-learning statistics

Study guide for understanding Gaussian Processes (also Sparse Gaussian Processes) as applied to classification in machine learning. Read More ›

Scikit-Learn Pipeline Examples 21 Oct 2017 scikit-learn

Examples of how to use classifier pipelines in scikit-learn. Includes examples on cross-validating regular classifiers, meta-classifiers such as one-vs-rest, and Keras models using the scikit-learn wrappers. Read More ›

Kaggle NYC Taxi Trips Competition: Overview and Results 17 Oct 2017 kaggle

Overview of the Kaggle competition "New York City Taxi Trip Duration". Read More ›

Pandas DataFrame: GroupBy Examples 11 Oct 2017 pandas groupby

Examples of specific ways to do what you want using groupby on Pandas Dataframes. Read More ›

Scaling Data Teams 09 Oct 2017 data-science data-newsletter-5

Needs of data teams are mostly around data access and sharing; Columnar databases are often more efficient for analytics; MS Excel is useful at many scales; Stakeholder communication is important to make your work more relevant; Use metrics to get to know how data products are being used. Read More ›

Paper Summary: Recursive Neural Language Architecture for Tag Prediction 05 Oct 2017 paper-summary tags neural-nets embeddings

Summary of the 2016 article "Recursive Neural Language Architecture for Tag Prediction" by Kataria. Read More ›

Paper Summary: Translating Embeddings for Modeling Multi-relational Data 01 Oct 2017 embeddings structure paper-summary neural-networks

Summary of the 2013 article "Translating Embeddings for Modeling Multi-relational Data" by Bordes et al. Read More ›

Feature Scaling: Quick Introduction and Examples using Scikit-learn 27 Sep 2017 data-science python data-preprocessing

Feature scaling techniques (rescaling, standardization, mean normalization, etc.) are useful for all sorts of machine learning approaches and *critical* for things like k-NN, neural networks and anything that uses SGD (stochastic gradient descent), not to mention text processing systems. Included examples: rescaling, standardization and scaling to unit length, using scikit-learn. Read More ›

5 Tips for moving your Data Science Operation to the next Level 26 Sep 2017 data-newsletter-5 data-science best-practices

Principles for disciplined data science include: Discoverability, Automation, Collaboration, Empowerment and Deployment. Read More ›

Data Provenance: Quick Summary + Reasons Why 07 Sep 2017 data-newsletter-5 data-science

Data Provenance (also called Data Lineage) is version control for data. It refers to keeping track of modifications to datasets you use and train models on. This is crucial in data science projects if you need to ensure data quality and reproducibility. Read More ›

Winning Solutions Overview: Kaggle Instacart Competition 04 Sep 2017 data-newsletter-4 kaggle data-science

The Instacart "Market Basket Analysis" competition focused on predicting repeated orders based upon past behaviour. Among the best-ranking solutions, many approaches were based on gradient boosting and feature engineering, and one was based on end-to-end neural networks. Read More ›

A Quick Summary of Ensemble Learning Strategies 01 Sep 2017 data-newsletter-4 machine-learning

Ensemble learning refers to mixing the outputs of several classifiers in various ways, so as to get a better result than each classifier individually. Read More ›

Evaluation Metrics for Classification Problems: Quick Examples + References 31 Aug 2017 data-newsletter-4 machine-learning model-evaluation

There are multiple ways to measure your model's performance in machine learning, depending upon what objectives you have in mind. Some of the most important are Accuracy, Precision, Recall, F1 and AUC. Read More ›

Pandas for Large Data: Examples and Tips 13 Aug 2017 pandas performance

To work successfully with large data in pandas, there are ways to reduce memory usage and keep processing fast. Read More ›

Machine Learning and Data Science: Generally Applicable Tips and Tricks 18 May 2017 machine-learning data-science best-practices

A couple of general, practical tips and tricks that may be used when dealing with data science and/or machine learning problems. Read More ›

A simple description of some common job titles / positions you may come across when looking at the data work landscape. See which positions may be best suited for you and your company. Read More ›

Scikit-Learn Cheatsheet: Reference and Examples 10 Mar 2017 scikit-learn

Just a couple of things you may find yourself doing over and over again when working with scikit-learn. Read More ›

Tricks for Training Neural Nets Faster 20 Feb 2017 neural-nets performance

Tricks and practical tips for training neural nets faster. Credit goes mostly to Geoff Hinton and Yann LeCun. Read More ›

Numpy/Scipy Distributions and Statistical Operations: Examples & Reference 10 Sep 2016 numpy statistics

A couple of examples of things you will probably want to do when using numpy and scipy for data work, such as probability distributions, PDFs, CDFs, etc. Read More ›

Pandas DataFrame by Example 15 Dec 2015 pandas python

Lots of examples of ways to use one of the most versatile data structures in the whole Python data analysis stack. Learn how to slice and dice, select and perform commonly used operations on DataFrames. Read More ›

One-Hot Encoding a Feature on a Pandas Dataframe: Examples 27 Nov 2015 pandas

One-hot encoding is a simple way to transform categorical features into vectors that are easy to deal with. Learn how to do this on a Pandas DataFrame. Read More ›

Word2vec Quick Tutorial using the Default Implementation in C 23 May 2015 word2vec word-embeddings

Read More ›

Entries by tag: data-science

Including child/synonym tags