# Tag: Statistics (284)

**Find the Best-Matching Distribution for Your Data Effortlessly**- Oct 22, 2021.

How to find the best-matching statistical distributions for your data points in an automated and easy way, and then how to extend the utility further.

**How to calculate confidence intervals for performance metrics in Machine Learning using an automatic bootstrap method**- Oct 15, 2021.

Are your model performance measurements very precise due to a “large” test set, or very uncertain due to a “small” or imbalanced test set?

**How to do “Limitless” Math in Python**- Oct 7, 2021.

How to perform arbitrary-precision computation and much more math (and fast too) than what is possible with the built-in math library in Python.

**How to Determine the Best Fitting Data Distribution Using Python**- Sep 30, 2021.

Approaches to data sampling, modeling, and analysis can vary based on the distribution of your data, and so determining the best fit theoretical distribution can be an essential step in your data exploration process.

**Advanced Statistical Concepts in Data Science**- Sep 30, 2021.

The article contains some of the most commonly used advanced statistical concepts along with their Python implementation.

**Important Statistics Data Scientists Need to Know**- Sep 29, 2021.

Several fundamental statistical concepts must be well appreciated by every data scientist -- from the enthusiast to the professional. Here, we provide Python code snippets to increase understanding and bring you key tools that offer early insight into your data.

**Real-Time Histogram Plots on Unbounded Data**- Sep 24, 2021.

Using histograms on real-time data is not possible in most of the popular data science libraries. In this article you will learn how to dynamically compute and display a histogram within a Python notebook.

**How to Find Weaknesses in your Machine Learning Models**- Sep 20, 2021.

FreaAI: a new method from researchers at IBM.

**Paradoxes in Data Science**- Sep 17, 2021.

Have a look into some of the main paradoxes associated with Data Science and its statistical foundations.

**KDnuggets™ News 21:n34, Sep 8: Do You Read Excel Files with Python? There is a 1000x Faster Way; Hypothesis Testing Explained**- Sep 8, 2021.

Do You Read Excel Files with Python? There is a 1000x Faster Way; Hypothesis Testing Explained; Data Science Cheat Sheet 2.0; 6 Cool Python Libraries That I Came Across Recently; Best Resources to Learn Natural Language Processing in 2021

**Antifragility and Machine Learning**- Sep 6, 2021.

Our intuition for most products, processes, and even some models might be that they either will get worse over time, or if they fail, they will experience a cascade of more failures. But, what if we could intentionally design systems and models to only get better, even as the world around them gets worse?

**Hypothesis Testing Explained**- Sep 3, 2021.

This brief overview of the concept of Hypothesis Testing covers its classification into parametric and non-parametric tests, and when to use the most popular ones, including means, correlation, and distribution, in the case of one sample and two samples.

**What is Noise?**- Aug 25, 2021.

We might have a reasonable sense for what "noise" is as some statistically random phenomenon that occurs in Nature. But, how can this same characteristic be defined--and understood--within the context of making judgements, such as in human behavior, corporate decision-making, medicine, the law, and AI systems?

**Learning Data Science and Machine Learning: First Steps After The Roadmap**- Aug 24, 2021.

Just getting into learning data science may seem as daunting as (if not more than) trying to land your first job in the field. With so many options and resources online and in traditional academia to consider, these pre-requisites and pre-work are recommended before diving deep into data science and AI/ML.

**Introduction to Statistical Learning Second Edition**- Aug 13, 2021.

The second edition of the classic "An Introduction to Statistical Learning, with Applications in R" was published very recently, and is now freely available via PDF on the book's website.

**Be Wary of Automated Feature Selection — Chi Square Test of Independence Example**- Aug 5, 2021.

When Data Scientists use the chi-square test for feature selection, they merely go by the ritualistic “If your p-value is low, the null hypothesis must go”. The automated function they use behaves no differently.

**A Brief Introduction to the Concept of Data**- Jul 29, 2021.

Every aspiring data scientist must know the concept of data and the kind of analysis they can run. This article introduces the concept of data (quantitative and qualitative) and the types of analysis.

**The Lost Art of Decile Analysis**- Jul 22, 2021.

Classification is a primary and widely used application of machine learning algorithms. However, if the subtleties in the results of even an apparently straightforward binary classifier are not carefully considered through additional analysis, then the deeper meaning of your prediction may be obscured.

**WHT: A Simpler Version of the fast Fourier Transform (FFT) you should know**- Jul 21, 2021.

The fast Walsh Hadamard transform is a simple and useful algorithm for machine learning that was popular in the 1960s and early 1970s. This useful approach should be more widely appreciated and applied for its efficiency.

**11 Important Probability Distributions Explained**- Jul 20, 2021.

There are many distribution functions considered in statistics and machine learning, which can seem daunting to understand at first. Many are actually closely related, and with these intuitive explanations of the most important probability distributions, you can begin to appreciate the observations of data these distributions communicate.

**Why Saying “We Accept the Null Hypothesis” is Wrong: An Intuitive Explanation**- Jul 19, 2021.

“The opposite of ‘Rejecting the Null’ is ‘Accepting’, isn’t it?” Well, it is not as simple as it is construed. We need to rise above antonyms and understand one crucial concept.

**This Data Visualization is the First Step for Effective Feature Selection**- Jun 8, 2021.

Understanding the most important features to use is crucial for developing a model that performs well. Knowing which features to consider requires experimentation, and proper visualization of your data can help clarify your initial selections. The scatter pairplot is a great place to start.

**A Guide On How To Become A Data Scientist (Step By Step Approach)**- May 24, 2021.

Becoming a Data Scientist is an exciting path, but you cannot learn data science within one year or six months. Instead, it’s a lifetime process that you have to follow with proper dedication and hard work. To guide your journey, the skills outlined here are the first you must acquire to become a data scientist.

**Confidence Intervals for XGBoost**- May 11, 2021.

Read this article about building a regularized Quantile Regression objective.

**KDnuggets™ News 21:n16, Apr 28: Data Science Books You Should Start Reading in 2021; Top 10 Must-Know Machine Learning Algorithms for Data Scientists**- Apr 28, 2021.

Data science is not about data – applying Dijkstra principle to data science; Data Science Books You Should Start Reading in 2021; How to ace A/B Testing Data Science Interviews; Top 10 Must-Know Machine Learning Algorithms for Data Scientists – Part 1; Production-Ready Machine Learning NLP API with FastAPI and spaCy

**10 Must-Know Statistical Concepts for Data Scientists**- Apr 21, 2021.

Statistics is a building block of data science. If you are working or plan to work in this field, then you will encounter the fundamental concepts reviewed for you here. Certainly, there is much more to learn in statistics, but once you understand these basics, then you can steadily build your way up to advanced topics.

**Data Science 101: Normalization, Standardization, and Regularization**- Apr 20, 2021.

Normalization, standardization, and regularization all sound similar. However, each plays a unique role in your data preparation and model building process, so you must know when and how to use these important procedures.

**Top 3 Statistical Paradoxes in Data Science**- Apr 15, 2021.

Observation bias and sub-group differences generate statistical paradoxes.

**Data Science Curriculum for Professionals**- Mar 25, 2021.

If you are looking to expand or transition your current professional career that is buried in spreadsheet analysis into one powered by data science, then you are in for an exciting but complex journey with much to explore and master. To begin your adventure, follow this complete road map, which guides you from a gnome in the forest of spreadsheets to an AI wizard known far and wide throughout the kingdom.

**Rejection Sampling with Python**- Mar 24, 2021.

Read this article on rejection sampling with examples using the Normal and Cauchy Distributions.

**KDnuggets™ News 21:n11, Mar 17: Is Data Scientist still a satisfying job? How To Overcome The Fear of Math and Learn Math For Data Science**- Mar 17, 2021.

Must Know for Data Scientists and Data Analysts: Causal Design Patterns; Know your data much faster with the new Sweetviz Python library; The Inferential Statistics Data Scientists Should Know; Natural Language Processing Pipelines, Explained

**Must Know for Data Scientists and Data Analysts: Causal Design Patterns**- Mar 12, 2021.

Industry is a prime setting for observational causal inference, but many companies are blind to causal measurement beyond A/B tests. This formula-free primer illustrates analysis design patterns for measuring causal effects from observational data.

**The Inferential Statistics Data Scientists Should Know**- Mar 11, 2021.

The foundations of Data Science and machine learning algorithms are in mathematics and statistics. To be the best Data Scientist you can be, your skills in statistical understanding should be well-established. The more you appreciate statistics, the better you will understand how machine learning performs its apparent magic.

**How To Overcome The Fear of Math and Learn Math For Data Science**- Mar 10, 2021.

Many aspiring Data Scientists, especially when self-learning, fail to learn the necessary math foundations. These recommendations for learning approaches along with references to valuable resources can help you overcome a personal sense of not being "the math type" or belief that you "always failed in math."

**10 Statistical Concepts You Should Know For Data Science Interviews**- Feb 23, 2021.

Data Science is founded on time-honored concepts from statistics and probability theory. Having a strong understanding of the ten ideas and techniques highlighted here is key to your career in the field, and also a favorite topic for concept checks during interviews.

**Want to Be a Data Scientist? Don’t Start With Machine Learning**- Jan 26, 2021.

Machine learning may appear like the go-to topic to start learning for the aspiring data scientist. But thinking these techniques are the key aspects of the role is the biggest misconception. So much more goes into becoming a successful data scientist, and machine learning is only one component of broader skills around processing, managing, and understanding the science behind the data.

**Null Hypothesis Significance Testing is Still Useful**- Jan 25, 2021.

Even in the aftermath of the replication crisis, statistical significance lingers as an important concept for Data Scientists to understand.

**Comprehensive Guide to the Normal Distribution**- Jan 18, 2021.

Drop in for some tips on how this fundamental statistics concept can improve your data science.

**15 Free Data Science, Machine Learning & Statistics eBooks for 2021**- Dec 31, 2020.

We present a curated list of 15 free eBooks compiled in a single location to close out the year.

**Monte Carlo integration in Python**- Dec 24, 2020.

A famous Casino-inspired trick for data science, statistics, and all of science. How to do it in Python?

**5 Free Books to Learn Statistics for Data Science**- Dec 8, 2020.

Learn all the statistics you need for data science for free.

**Essential Math for Data Science: Probability Density and Probability Mass Functions**- Dec 7, 2020.

In this sample, we’ll cover probability mass and probability density functions. You’ll see how to understand and represent these distribution functions and their link with histograms.

**10 Principles of Practical Statistical Reasoning**- Nov 3, 2020.

Practical Statistical Reasoning is a term that covers the nature and objective of applied statistics/data science, principles common to all applications, and practical steps/questions for better conclusions. The following principles have helped me become more efficient with my analyses and clearer in my conclusions.

**The Best Free Data Science eBooks: 2020 Update**- Sep 30, 2020.

The author has updated their list of best free data science books for 2020. Read on to see what books you should grab.

**Causal Inference: The Free eBook**- Sep 25, 2020.

Here's another free eBook for those looking to up their skills. If you are seeking a resource that exhaustively treats the topic of causal inference, this book has you covered.

**What is Simpson’s Paradox and How to Automatically Detect it**- Sep 18, 2020.

Looking at data one way can tell one story, but sometimes looking at it another way will tell the opposite story. Understanding this paradox and why it happens is essential, and new tools are available to help automatically detect this tricky issue in your datasets.

**Statistics with Julia: The Free eBook**- Sep 14, 2020.

This free eBook is a draft copy of the upcoming Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence. Interested in learning Julia for data science? This might be the best intro out there.

**Modern Data Science Skills: 8 Categories, Core Skills, and Hot Skills**- Sep 8, 2020.

We analyze the results of the Data Science Skills poll, including 8 categories of skills, 13 core skills that over 50% of respondents have, the emerging/hot skills that data scientists want to learn, and the top skill that Data Scientists want to learn.

**Book Chapter: The Art of Statistics: Learning from Data**- Sep 3, 2020.

Get a free book chapter from "The Art of Statistics: Learning from Data" by leading researcher Sir David John Spiegelhalter. This excerpt takes a forensic look at data surrounding the victims of the UK's most prolific serial killer and shows how a simple search for patterns reveals critical details.

**Which methods should be used for solving linear regression?**- Sep 2, 2020.

As a foundational set of algorithms in any machine learning toolbox, linear regression can be solved with a variety of approaches. Here, we discuss, with code examples, four methods and demonstrate how they should be used.

**These Data Science Skills will be your Superpower**- Aug 20, 2020.

Learning data science means learning the hard skills of statistics, programming, and machine learning. To complete your training, a broader set of soft skills will round out your capabilities as an effective and successful professional Data Scientist.

**KDnuggets™ News 20:n32, Aug 19: The List of Top 10 Data Science Lists; Data Science MOOCs with Substance**- Aug 19, 2020.

The List of Top 10 Lists in Data Science; Going Beyond Superficial: Data Science MOOCs with Substance; Introduction to Statistics for Data Science; Content-Based Recommendation System using Word Embeddings; How Natural Language Processing Is Changing Data Analytics

**Hypothesis Test for Real Problems**- Aug 14, 2020.

Hypothesis tests are significant for evaluating answers to questions concerning samples of data.

**Introduction to Statistics for Data Science**- Aug 12, 2020.

Statistics is foundational for Data Science and a crucial skill to master for any practitioner. This advanced introduction reviews with examples the fundamental concepts of inferential statistics by illustrating the differences between Point Estimators and Confidence Interval Estimates.

**R squared Does Not Measure Predictive Capacity or Statistical Adequacy**- Jul 31, 2020.

The fact that R-squared shouldn't be used for deciding if you have an adequate model is counter-intuitive and is rarely explained clearly. This demonstration overviews how R-squared goodness-of-fit works in regression analysis and correlations, while showing why it is not a measure of statistical adequacy, so should not suggest anything about future predictive performance.

**A Complete Guide To Survival Analysis In Python, part 3**- Jul 30, 2020.

Concluding this three-part series covering a step-by-step review of statistical survival analysis, we look at a detailed example implementing the Kaplan-Meier fitter based on different groups, a Log-Rank test, and Cox Regression, all with examples and shared code.

**Essential Resources to Learn Bayesian Statistics**- Jul 28, 2020.

If you are interested in becoming better at statistics and machine learning, then some time should be invested in diving deeper into Bayesian Statistics. While the topic is more advanced, applying these fundamentals to your work will advance your understanding and success as an ML expert.

**Demystifying Statistical Significance**- Jul 17, 2020.

With more professionals from a wide range of less technical fields diving into statistical analysis and data modeling, these experimental techniques can seem daunting. To help with these hurdles, this article clarifies some misconceptions around p-values, hypothesis testing, and statistical significance.

**Before Probability Distributions**- Jul 16, 2020.

Why do we use probability distributions, and why do they matter?

**A Complete Guide To Survival Analysis In Python, part 2**- Jul 14, 2020.

Continuing with the second of this three-part series covering a step-by-step review of statistical survival analysis, we look at a detailed example implementing the Kaplan-Meier fitter theory as well as the Nelson-Aalen fitter theory, both with examples and shared code.

**A Complete Guide To Survival Analysis In Python, part 1**- Jul 7, 2020.

This three-part series covers a review with step-by-step explanations and code for how to perform statistical survival analysis used to investigate the time some event takes to occur, such as patient survival during the COVID-19 pandemic, the time to failure of engineering products, or even the time to closing a sale after an initial customer contact.

**The 8 Basic Statistics Concepts for Data Science**- Jun 24, 2020.

Understanding the fundamentals of statistics is a core capability for becoming a Data Scientist. Review these essential ideas that will be pervasive in your work and raise your expertise in the field.

**4 Free Math Courses to do and Level up your Data Science Skills**- Jun 22, 2020.

Just as there is no Data Science without data, there's no science in data without mathematics. Strengthening your foundational skills in math will level you up as a data scientist and enable you to perform with greater expertise.

**Overview of data distributions**- Jun 10, 2020.

With so many types of data distributions to consider in data science, how do you choose the right one to model your data? This guide will overview the most important distributions you should be familiar with in your work.

**KDnuggets™ News 20:n23, Jun 10: Largest Dataset you analyzed? If you start statistics all over again, where would you start? GPT-3**- Jun 10, 2020.

#BlackLivesMatter. In this issue: If you had to start statistics all over again, where would you start? New Poll: What was the largest dataset you analyzed? Another Great NLP Course from Stanford; Naive Bayes: Everything you need to know; GPT-3 - a giant leap for Deep Learning and NLP?

**If you had to start statistics all over again, where would you start?**- Jun 5, 2020.

If you are just diving into learning statistics, then where do you begin? Find insight from those who have trodden these waters before, and see what they might have done differently along their personal journeys in statistics.

**STIPS – Statistical Thinking for Industrial Problem Solving – A free online statistics course**- Jun 2, 2020.

This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.

**Appropriately Handling Missing Values for Statistical Modelling and Prediction**- May 22, 2020.

Many statisticians in industry agree that blindly imputing the missing values in your dataset is a dangerous move and should be avoided without first understanding why the data is missing in the first place.

**Looking Normal(ly Distributed)**- May 20, 2020.

This article investigates when some probability distributions look normal "enough" for a statistical test.

**Evidence Counterfactuals for explaining predictive models on Big Data**- May 18, 2020.

Big Data generated by people -- such as social media posts, mobile phone GPS locations, and browsing history -- provide enormous prediction value for AI systems. However, explaining how these models predict with the data remains challenging. This interesting explanation approach considers how a model would behave if it didn't have the original set of data to work with.

**Were 21% of New York City residents really infected with the novel coronavirus?**- May 6, 2020.

Understanding the types of statistical bias that pop up in popular media and reporting is especially important during this pandemic where the data -- and our global response to the data -- directly impact people's lives.

**Statistical Thinking for Industrial Problem Solving – a free online statistics course**- May 5, 2020.

This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.

**A Concise Course in Statistical Inference: The Free eBook**- Apr 27, 2020.

Check out this freely available book, All of Statistics: A Concise Course in Statistical Inference, and learn the probability and statistics needed for success in data science.

**Should Data Scientists Model COVID19 and other Biological Events**- Apr 22, 2020.

Biostatisticians use statistical techniques that your current everyday data scientists have probably never heard of. This is a great example where lack of domain knowledge exposes you as someone who does not know what they are doing and is merely hopping on a trend.

**Statistical Thinking for Industrial Problem Solving – a free online statistics course**- Apr 9, 2020.

This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.

**Free online statistics course – Improve your analytics knowledge**- Mar 26, 2020.

This online course is available – for free – to anyone interested in using data to solve problems better.

**Data Science Curriculum for self-study**- Feb 26, 2020.

Are you asking the question, "how do I become a Data Scientist?" This list recommends the best essential topics to gain an introductory understanding for getting started in the field. After learning these basics, keep in mind that doing real data science projects through internships or competitions is crucial to acquiring the core skills necessary for the job.

**Statistical Thinking for Industrial Problem Solving: a free online course.**- Jan 13, 2020.

**Statistical Thinking for Industrial Problem Solving: a free online course**- Dec 3, 2019.

**An Eight-Step Checklist for An Analytics Project**- Nov 6, 2019.

Follow these eight headings of an audit sheet that business analysts should address before submitting the results of their analytics project. One recommended approach is to rewrite each step as a question, answer it, and then attach it to your project.

**KDnuggets™ News 19:n42, Nov 6: 5 Statistical Traps Data Scientists Should Avoid; 10 Free Must-Read Books on AI**- Nov 6, 2019.

Learn about statistical fallacies Data Scientists should avoid; New and quite amazing Deep Learning capabilities FB has been quietly open-sourcing; Top Machine Learning tools for Developers; How to build a Neural Network from scratch and more.

**Probability Learning: Maximum Likelihood**- Nov 5, 2019.

The maths behind Bayes will be better understood if we first cover the theory and maths underlying another fundamental method of probabilistic machine learning: Maximum Likelihood. This post will be dedicated to explaining it.

**5 Statistical Traps Data Scientists Should Avoid**- Oct 30, 2019.

Here are five statistical fallacies, or data traps, which data scientists should be aware of and definitely avoid.

**How to Become a (Good) Data Scientist – Beginner Guide**- Oct 16, 2019.

A guide covering the things you should learn to become a data scientist, including the basics of business intelligence, statistics, programming, and machine learning.

**An Overview of Density Estimation**- Oct 14, 2019.

Density estimation is estimating the probability density function of the population from the sample. This post examines and compares a number of approaches to density estimation.

**Statistical Thinking for Industrial Problem Solving: a free online course**- Oct 2, 2019.

**6 bits of advice for Data Scientists**- Sep 25, 2019.

As a data scientist, you can get lost in your daily dives into the data. Consider following this advice in your work to be more diligent and more impactful for your organization.

**Beta Distribution: What, When & How**- Sep 25, 2019.

This article covers the beta distribution, and explains it using baseball batting averages.

**Which Data Science Skills are core and which are hot/emerging ones?**- Sep 17, 2019.

We identify two main groups of Data Science skills: A: 13 core, stable skills that most respondents have and B: a group of hot, emerging skills that most do not have (yet) but want to add. See our detailed analysis.

**How Bad is Multicollinearity?**- Sep 17, 2019.

For some people anything below 60% is acceptable and for certain others, even a correlation of 30% to 40% is considered too high because one variable may just end up exaggerating the performance of the model or completely messing up parameter estimates.

**What’s the difference between analytics and statistics?**- Sep 6, 2019.

From asking the best questions about data to answering those questions with certainty, understanding the value of these two seemingly different professions is clarified when you see how they should work together.

**Statistical Modelling vs Machine Learning**- Aug 14, 2019.

At times it may seem Machine Learning can be done these days without a sound statistical background, but people who think so do not really understand the different nuances. Code written to make it easier does not negate the need for an in-depth understanding of the problem.

**What is Poisson Distribution?**- Aug 14, 2019.

A solid overview of the Poisson distribution, starting from why it is needed, how it stacks up to the binomial distribution, deriving its formula mathematically, and more.

**Statistical Thinking for Industrial Problem Solving (STIPS) – a free online course.**- Aug 2, 2019.

**P-values Explained By Data Scientist**- Jul 30, 2019.

This article is designed to give you a full picture, from constructing a hypothesis test to understanding the p-value and using it to guide our decision-making process.

**Annotated Heatmaps of a Correlation Matrix in 5 Simple Steps**- Jul 9, 2019.

A heatmap is a graphical representation of data in which data values are represented as colors. That is, it uses color in order to communicate a value to the reader. This is a great tool to guide the audience towards the areas that matter the most when you have a large volume of data.

**How do you check the quality of your regression model in Python?**- Jul 2, 2019.

Linear regression is rooted strongly in the field of statistical learning and therefore the model must be checked for the ‘goodness of fit’. This article shows you the essential steps of this task in a Python ecosystem.

**Top KDnuggets Tweets, Jun 12 – 18: The biggest mistake while learning #Python for #datascience; 5 practical statistical concepts for data scientists**- Jun 19, 2019.

Also: Resources for developers transitioning into data science; Best Data Visualization Techniques for small and large data; Top Data Science and Machine Learning Methods Used in 2018, 2019

**KDnuggets™ News 19:n23, Jun 19: Useful Stats for Data Scientists; Python, TensorFlow & R Winners in Latest Job Report**- Jun 19, 2019.

This week on KDnuggets: 5 Useful Statistics Data Scientists Need to Know; Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS; How to Learn Python for Data Science the Right Way; The Machine Learning Puzzle, Explained; Scalable Python Code with Pandas UDFs; and much more!

**5 Useful Statistics Data Scientists Need to Know**- Jun 14, 2019.

A data scientist should know how to effectively use statistics to gain insights from data. Here are five useful and practical statistical concepts that every data scientist must know.

**All Models Are Wrong – What Does It Mean?**- Jun 12, 2019.

During your adventures in data science, you may have heard “all models are wrong.” Let’s unpack this famous quote to understand how we can still make models that are useful.

**Top 10 Statistics Mistakes Made by Data Scientists**- Jun 7, 2019.

The following are some of the most common statistics mistakes made by data scientists. Check this list often to make sure you are not making any of these while applying statistics to data science.

**Statistical Thinking for Industrial Problem Solving (STIPS): a free online course.**- Jun 4, 2019.

**Separating signal from noise**- Jun 4, 2019.

When we are building a model, we are making the assumption that our data has two parts, signal and noise. Signal is the real pattern, the repeatable process that we hope to capture and describe. The noise is everything else that gets in the way of that.

**What Does a Lady Tasting Tea Have to Do with Science?**- May 31, 2019.

Design of Experiments (DOE) is a statistical concept used to find cause-and-effect relationships. Surprisingly, an experiment arising from a casual conversation about tea-drinking is one of the first examples of an experiment designed using statistical ideas.

**Probability Mass and Density Functions**- May 21, 2019.

This content is part of a series about chapter 3 on probability from the Deep Learning Book by Goodfellow, I., Bengio, Y., and Courville, A. (2016). It aims to provide intuitions/drawings/python code on mathematical theories and is constructed as my understanding of these concepts.

**Modeling 101**- May 13, 2019.

In the past couple of decades, innovation in statistics and machine learning has been increasing at a rapid pace and we are now able to do things unimaginable when I began my career.

**Naive Bayes: A Baseline Model for Machine Learning Classification Performance**- May 7, 2019.

We can use Pandas to apply Bayes' Theorem and scikit-learn to implement the Naive Bayes algorithm. We take a step-by-step approach to understanding Bayes and implementing the different options in scikit-learn.

**Statistical Thinking for Industrial Problem Solving (STIPS) – a free online course.**- May 3, 2019.

**How to correctly select a sample from a huge dataset in machine learning**- May 1, 2019.

We explain how choosing a small, representative dataset from a large population can improve model training reliability.

**Statistical Thinking for Industrial Problem Solving (STIPS) – a free online course**- Apr 5, 2019.

**Spatio-Temporal Statistics: A Primer**- Apr 5, 2019.

Marketing scientist Kevin Gray asks University of Missouri Professor Chris Wikle about Spatio-Temporal Statistics and how it can be used in science and business.

**Wake Forest University: Teaching Professor/Professor of the Practice in Statistics/Analytics [Winston-Salem, NC]**- Mar 18, 2019.

The Wake Forest University School of Business is seeking qualified candidates for a Teaching Professor/Professor of the Practice in Statistics/Analytics. This individual will be expected to teach graduate courses in areas such as Data Analysis & Business Modeling, Data Mining & Machine Learning, and Forecasting.**The 7 Myths of Data Anonymisation**- Mar 12, 2019.

Anonymisation has long been seen as a necessary evil rather than a helpful tool. That’s why plenty of myths have arisen around that technology over the years.**Beating the Bookies with Machine Learning**- Mar 8, 2019.

We investigate how to use a custom loss function to identify fair odds, including a detailed example using machine learning to bet on the results of a darts match and how this can assist you in beating the bookmaker.**Statistical Thinking for Industrial Problem Solving – a free online course**- Feb 6, 2019.

**From Good to Great Data Science, Part 1: Correlations and Confidence**- Feb 5, 2019.

With the aid of some hospital data, part one describes how just a little inexperience in statistics could result in two common mistakes.**The Essential Data Science Venn Diagram**- Feb 4, 2019.

A deeper examination of the interdisciplinary interplay involved in data science, focusing on automation, validity and intuition.**Southern Illinois University Edwardsville: Director of the Center for Predictive Analytics/(Associate) Professor of Mathematics and Statistics [Edwardsville, IL]**- Jan 4, 2019.

Southern Illinois University Edwardsville (SIUE) is establishing the Center for Predictive Analytics (C-PAN), and is seeking an innovative, visionary director for the center who will provide centralized leadership in establishing research and educational initiatives across academic units at SIUE.**Introduction to Statistics for Data Science**- Dec 17, 2018.

This tutorial explains the central limit theorem, covering populations and samples, sampling distributions, and intuition, and includes a useful video so you can continue your learning.**A comprehensive list of Machine Learning Resources: Open Courses, Textbooks, Tutorials, Cheat Sheets and more**- Dec 7, 2018.

A thorough collection of useful resources covering statistics, classic machine learning, deep learning, probability, reinforcement learning, and more.**The 5 Basic Statistics Concepts Data Scientists Need to Know**- Nov 13, 2018.

Today, we’re going to look at 5 basic statistics concepts that data scientists need to know and how they can be applied most effectively!**Quantum Machine Learning: A look at myths, realities, and future projections**- Nov 5, 2018.

An overview of quantum computing and quantum algorithm design, including current state of the hardware and algorithm design within the existing systems.**How I Learned to Stop Worrying and Love Uncertainty**- Oct 24, 2018.

This is a written version of Data Scientist Adolfo Martínez’s talk at Software Guru’s DataDay 2017. There is a link to the original slides (in Spanish) at the top of this post.**University of San Francisco: Assistant Professor, Tenure Track, Mathematics and Statistics [San Francisco, CA]**- Oct 17, 2018.

The University of San Francisco invites applications for a tenure-track Assistant Professor position to begin August 2019. We seek well-qualified candidates in the areas of applied mathematics or statistics, with a focus on the extraction of knowledge from data.**Mindstrong Health: Sr Data Scientist / Machine Learning, Statistics, Coding [Palo Alto, CA]**- Oct 17, 2018.

Mindstrong Health is seeking a Sr Data Scientist in Palo Alto, CA, who is passionate about our mission, committed to excellence and excited to build a company that will address one of the greatest health challenges of our time.**Unfolding Naive Bayes From Scratch**- Sep 25, 2018.

Whether you are a beginner in Machine Learning or you have been trying hard to understand the Super Natural Machine Learning Algorithms and you still feel that the dots do not connect somehow, this post is definitely for you!**Machine Learning Cheat Sheets**- Sep 11, 2018.

Check out this collection of machine learning concept cheat sheets based on Stanford CS 229 material, including supervised and unsupervised learning, neural networks, tips & tricks, probability & stats, and algebra & calculus.**5 Things to Know About A/B Testing**- Sep 7, 2018.

This article presents 5 things to know about A/B testing, from appropriate sample sizes, to statistical confidence, to A/B testing usefulness, and more.**Essential Math for Data Science: ‘Why’ and ‘How’**- Sep 6, 2018.

It always pays more to know the machinery under the hood (even at a high level) than to be just the guy behind the wheel with no knowledge about the car.