Intro to Statistics: 7 Key Concepts to Get You Started
A crash course in fundamental stats concepts
Hey everyone 👋
I was never good at math in school.
Often, I had to struggle just to squeeze out a D or C.
But for some reason, statistics has always made more sense to me.
Perhaps this is because statistics is narrative. It tells a story about the numbers.
For those new to it, statistics can seem intimidating.
But at its core, statistics is about understanding data and making informed decisions based on it.
Let's break down some of the fundamental statistical concepts that every data professional should be familiar with.
1. Mean (Average)
The mean is what most people typically think of as the average. It's calculated by adding up all numbers in a dataset and dividing by the number of entries.
Example: If you have test scores of 85, 90, 80, and 95, the mean is: (85 + 90 + 80 + 95) ÷ 4 = 87.5
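If you'd like to check the arithmetic yourself, here is a minimal Python sketch using the same four test scores. The variable names are just illustrative choices, and `statistics.mean` comes from Python's standard library.

```python
from statistics import mean

scores = [85, 90, 80, 95]

# Manual calculation: add everything up, then divide by the number of entries
manual_mean = sum(scores) / len(scores)

# Same calculation using the standard library
library_mean = mean(scores)

print(manual_mean)   # 87.5
print(library_mean)  # 87.5
```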
2. Median
The median is the middle number in a dataset when the numbers are arranged in ascending or descending order. If there's an even number of data points, the median is the average of the two middle numbers.
Example: For the numbers 5, 7, 9, 12, and 15, the median is 9. For 5, 7, 9, 12, 15, and 18, the median is (9 + 12) ÷ 2 = 10.5.
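Here is one way the odd and even cases might look in Python. `manual_median` is a hypothetical helper written for this example; `statistics.median` is the standard library version and should agree with it.

```python
from statistics import median

def manual_median(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value
        return ordered[mid]
    # Even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_count = [5, 7, 9, 12, 15]
even_count = [5, 7, 9, 12, 15, 18]

print(manual_median(odd_count), median(odd_count))    # 9 9
print(manual_median(even_count), median(even_count))  # 10.5 10.5
```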
3. Mode
The mode is the number that appears most frequently in a dataset. A dataset can have one mode, more than one mode, or no mode at all.
Example: In the set {4, 5, 5, 6, 6, 6, 7}, the mode is 6 since it appears the most.
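A quick sketch of the same idea in Python, assuming Python 3.8 or newer for `statistics.multimode`. The `two_modes` list is made-up data added only to show a tie. One caveat: when every value appears exactly once, `multimode` returns all of the values, so whether you call that "no mode" is a matter of convention.

```python
from statistics import mode, multimode

values = [4, 5, 5, 6, 6, 6, 7]

# The single most frequent value
print(mode(values))       # 6

# Every value tied for most frequent, as a list
print(multimode(values))  # [6]

# Made-up data with a tie: both 1 and 2 appear twice
two_modes = [1, 1, 2, 2, 3]
print(multimode(two_modes))  # [1, 2]
```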
4. Standard Deviation
The standard deviation measures the amount of variation or dispersion in a set of values. A low standard deviation indicates that the values tend to be close to the mean, while a high standard deviation indicates that the values are spread out over a wider range.
Example: Consider the test scores of two classes: Class A has scores of 85, 86, 85, 87, and 86, while Class B has scores of 70, 90, 60, 100, and 80. Class A has a mean of 85.8, and its scores are clustered closely around it, giving a small standard deviation of about 0.75. Class B has a mean of 80, and its scores are much more spread out, giving a larger standard deviation of about 14.1. (Both figures treat each class as the whole population.)
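To see the difference in spread for yourself, here is a small sketch that computes each class's mean and standard deviation. It assumes each class is the entire population, so it uses `statistics.pstdev`. If the scores were a sample from a larger group, `statistics.stdev` would be the usual choice and would give slightly larger values.

```python
from statistics import mean, pstdev

class_a = [85, 86, 85, 87, 86]
class_b = [70, 90, 60, 100, 80]

for name, scores in [("Class A", class_a), ("Class B", class_b)]:
    # pstdev is the population standard deviation (divides by n)
    print(name, "mean:", mean(scores), "std dev:", round(pstdev(scores), 2))

# Class A's scores sit close to its mean, Class B's do not
spread_a = pstdev(class_a)
spread_b = pstdev(class_b)
print(spread_a < spread_b)  # True
```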
5. Correlation
Correlation indicates the strength and direction of a relationship between two variables. The most common measure of correlation is Pearson's r, which ranges between -1 and 1. A value close to 1 implies a strong positive correlation: as one variable increases, the other tends to increase too. A value close to -1 implies a strong negative correlation: as one variable increases, the other tends to decrease. A value close to 0 implies little to no linear relationship.
Example: Imagine you're studying the relationship between hours studied and test scores among students. If you find that as the number of hours studied increases, test scores also tend to increase, this suggests a positive correlation. On the other hand, if you find that as the number of hours spent playing video games increases, test scores decrease, this suggests a negative correlation.
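Here is a rough sketch of how Pearson's r could be computed by hand in Python. All of the numbers below are invented for illustration and are not real student data, and `pearson_r` is a hypothetical helper written for this example. (Python 3.10+ also ships `statistics.correlation`, which should give the same result.)

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: how strongly two lists move together in a straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much x and y each vary on their own
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(variation_x * variation_y)

# Invented data: hours studied vs. test score
hours_studied = [1, 2, 3, 4, 5]
study_scores = [62, 68, 75, 81, 90]

# Invented data: hours of video games vs. test score
hours_gaming = [1, 2, 3, 4, 5]
gaming_scores = [90, 84, 77, 70, 61]

r_study = pearson_r(hours_studied, study_scores)
r_gaming = pearson_r(hours_gaming, gaming_scores)

print(round(r_study, 3))   # close to +1: strong positive correlation
print(round(r_gaming, 3))  # close to -1: strong negative correlation
```

Keep in mind that a strong correlation in data like this does not by itself show that one variable causes the other.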
6. Variance
Variance measures how far the numbers in a set are spread out from the mean. It's calculated by taking the average of the squared differences from the mean, and the standard deviation is simply its square root. (Dividing by the number of values, as in the example here, gives the population variance. For a sample drawn from a larger group, it's standard to divide by one less than the number of values.)
Example: Let's say we have a set of test scores: 70, 80, 90. First, find the mean: (70 + 80 + 90) ÷ 3 = 80. Next, find the squared difference from the mean for each score: (70-80)² = 100, (80-80)² = 0, and (90-80)² = 100. Now, average those squared differences: (100 + 0 + 100) ÷ 3 ≈ 66.67. This value, 66.67, is the variance: the average squared difference of the scores from the mean.
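The same steps can be sketched in Python. The manual version follows the example line by line. `statistics.pvariance` divides by the number of values (population variance), while `statistics.variance` divides by one less than that (sample variance), which is why it returns a different number for the same scores.

```python
from statistics import pvariance, variance

scores = [70, 80, 90]

# Step 1: find the mean
mean_score = sum(scores) / len(scores)

# Step 2: squared difference from the mean for each score
squared_diffs = [(s - mean_score) ** 2 for s in scores]

# Step 3: average the squared differences
manual_variance = sum(squared_diffs) / len(scores)

print(f"{manual_variance:.2f}")     # 66.67
print(f"{pvariance(scores):.2f}")   # 66.67 (population variance)
print(f"{variance(scores):.2f}")    # 100.00 (sample variance, divides by n - 1)
```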
7. Probability
At its simplest, probability measures the likelihood that an event will occur. It's expressed as a number between 0 (an impossible event) and 1 (an event certain to occur).
Example: The probability of flipping a fair coin and it landing heads up is 0.5.
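One way to build intuition for this is to simulate a lot of coin flips and see how often heads comes up. The sketch below assumes a fair coin with two equally likely outcomes. The seed value and the number of flips are arbitrary choices made so the run is repeatable.

```python
import random

outcomes = ["heads", "tails"]

# Theoretical probability: favorable outcomes divided by all equally likely outcomes
theoretical = outcomes.count("heads") / len(outcomes)
print(theoretical)  # 0.5

# Simulated probability: flip a virtual coin many times and count the heads
rng = random.Random(42)  # arbitrary seed, only for repeatability
flips = [rng.choice(outcomes) for _ in range(10_000)]
estimated = flips.count("heads") / len(flips)

# The estimate should land near 0.5, though rarely exactly on it
print(estimated)
```

The more flips you simulate, the closer the estimate tends to get to the theoretical value.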
Conclusion
A basic understanding of these statistical concepts is a great starting point for any budding data professional. As you continue your journey in the data world, these foundational ideas will serve as stepping stones to more advanced topics and analyses.
Remember, statistics is more than crunching numbers—it's about uncovering stories and insights hidden within the data.
This week’s YouTube video:
In this week’s video, I break down 7 ways to optimize your LinkedIn profile for getting a job in data.
That’s it for this week.
See you next time
Matt ✌️