Comprehensive Guide to Descriptive Statistics and Data Analysis, Lecture notes of Data Analysis & Statistical Methods

These lecture notes give a clear and concise introduction to descriptive statistics for students of business analytics or quantitative methods. They cover: Measures of Central Tendency (mean, median, mode, and their significance); Measures of Variation (range, variance, and standard deviation for analyzing data spread); Distribution Shape Analysis (skewness, Z-scores, and identifying outliers); Quartiles and Interquartile Range (IQR) for understanding data spread; and Population Parameters vs. Sample Statistics (their differences and practical applications). The notes are intended to be used alongside the textbook: Levine, D, Stephan, D, & Szabat, K 2020, Statistics for Managers Using Microsoft Excel, Global Edition, Pearson Education, Limited, Harlow. Available from: ProQuest Ebook Central.

Typology: Lecture notes

2023/2024

Available from 12/19/2024

amnah-asghar
07/10/2024
Quantitative Skills for Business - Lecture 3: Descriptive Measures and Variables

Central Tendency:

  • The extent to which the values of a numerical variable group around a central value.

Variation:

  • The amount of scattering away from a central value that the values of a numerical variable show.

Shape:

  • The pattern of the distribution of values from the lowest to the highest.

Measures of Central Tendency

The Mean:

  • Definition: The sum of all values divided by the number of values.
  • Effect of Outliers: The mean can be significantly affected by extreme values (outliers).
  • Calculation Example:
    o For the set {11, 12, 13, 14, 15, 16, 17, 18, 19, 20}, the mean = (11+12+13+14+15+16+17+18+19+20) / 10 = 155 / 10 = 15.5.
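
The calculation above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the second data set, in which 20 is replaced by 200, is a hypothetical value added here to illustrate the outlier effect and does not come from the lecture.

```python
from statistics import mean

values = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

# Mean = sum of all values / number of values
manual_mean = sum(values) / len(values)
library_mean = mean(values)

# Hypothetical outlier: replace the largest value (20) with 200
with_outlier = values[:-1] + [200]
outlier_mean = mean(with_outlier)

print(manual_mean)    # 15.5
print(library_mean)   # 15.5
print(outlier_mean)   # 33.5
```

A single extreme value moves the mean from 15.5 to 33.5, even though nine of the ten observations are unchanged.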

The Median:

  • Definition: The middle value when the data is ordered from smallest to largest.
  • Odd Data Set: The median is the middle value.
  • Even Data Set: The median is the average of the two middle values.
  • Formula: Position of the median = (n+1)/2, where n is the number of observations.
  • Calculation Example:
    o For {11, 12, 13, 14, 15, 16, 17, 18, 19, 20}, n = 10, so the median sits at position (10+1)/2 = 5.5, halfway between the 5th and 6th values: (15+16)/2 = 15.5.
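
The position rule can be sketched in Python as follows. The helper name `median_by_position` is an illustrative choice rather than anything defined in the lecture; the standard library's `statistics.median` is shown alongside it as a cross-check.

```python
from statistics import median

def median_by_position(data):
    """Median via the ranked-position rule (n + 1) / 2."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) / 2              # 1-based rank of the median
    if position.is_integer():           # odd n: a single middle value
        return ordered[int(position) - 1]
    lower = ordered[int(position) - 1]  # value just below the position
    upper = ordered[int(position)]      # value just above the position
    return (lower + upper) / 2          # even n: average the two

values = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
even_median = median_by_position(values)       # position 5.5
odd_median = median_by_position(values[:-1])   # 11..19, position 5

print(even_median)      # 15.5
print(odd_median)       # 15
print(median(values))   # 15.5
```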

The Mode:

  • Definition: The value that occurs most frequently.
  • Characteristics:
    o There may be no mode, or more than one mode.
    o The mode is not affected by outliers.

Geometric Mean:

  • Definition: The nth root of the product of n values. It is used mainly for data sets involving growth rates (e.g., financial returns).
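
Both measures can be tried out with Python's standard `statistics` module (version 3.8 or later). This is a sketch under stated assumptions: the data sets and the three yearly returns are hypothetical, and converting each return r to a growth factor 1 + r before taking the geometric mean is the usual convention for rates of return rather than something spelled out in these notes.

```python
from statistics import multimode, geometric_mean

# multimode returns every value that shares the highest frequency
two_modes = multimode([1, 2, 2, 3, 3, 4])       # [2, 3]
all_tied = multimode([1, 2, 3])                 # [1, 2, 3]: no single mode
with_outlier = multimode([2, 2, 3, 1000])       # [2]: the outlier has no effect

# Hypothetical yearly returns: +10%, -5%, +20%
returns = [0.10, -0.05, 0.20]
factors = [1 + r for r in returns]              # growth factors 1.10, 0.95, 1.20

avg_factor = geometric_mean(factors)            # nth root of the product
avg_rate = avg_factor - 1                       # average growth rate per year

print(two_modes, all_tied, with_outlier)
print(round(avg_rate * 100, 2), "% per year")
```

Compounding the average factor three times reproduces the overall growth (1.10 × 0.95 × 1.20 = 1.254), which is why the geometric mean, not the arithmetic mean, is the natural average for growth rates.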

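Coefficient of Variation (CV):

  • Definition: A relative measure of variation, expressed as a percentage, that shows the size of the standard deviation relative to the mean.
  • Formula: $CV = \left( \frac{S}{\bar{X}} \right) \times 100\%$, where $S$ is the sample standard deviation and $\bar{X}$ is the sample mean.
  • Usage: Useful for comparing variability between data sets, especially when they are measured in different units.

Z-Score (for Outliers):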
  • Definition: The number of standard deviations a data value is from the mean.
  • Formula: $Z = \frac{X - \bar{X}}{S}$, where $X$ is the data value, $\bar{X}$ is the sample mean, and $S$ is the sample standard deviation.
  • Outliers:
    o If $Z < -3$ or $Z > 3$, the data point is considered an extreme outlier.
    o The larger the absolute value of the Z-score, the farther the value is from the mean.

Shape of Distribution

Skewness:
  • Definition: Measures the asymmetry of the data.
    o Symmetric Distribution: Mean = Median.
    o Left-Skewed (Negative Skew): Mean < Median.
    o Right-Skewed (Positive Skew): Mean > Median.
  • Values:
    o In practice, skewness usually falls between −4 and +4.
    o Close to zero = symmetrical.
    o Far from zero = highly skewed.

Quartile Measures

Quartiles:
  • Q1: The first quartile (25th percentile).
  • Q2: The second quartile, or the median (50th percentile).
  • Q3: The third quartile (75th percentile).

Interquartile Range (IQR):

  • Definition: Measures the spread of the middle 50% of the data.
  • Formula: IQR = Q3 - Q1.
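
The coefficient of variation, Z-score, and IQR definitions above can be sketched together in Python. Several things here are assumptions rather than part of the lecture: the helper names, the hypothetical data set with one extreme value (100), and the quartile convention. `statistics.quantiles` with `method="exclusive"` interpolates at position (n+1)·p, so textbooks that use rounding rules for quartile positions may give slightly different Q1 and Q3.

```python
from statistics import mean, stdev, quantiles

def coefficient_of_variation(data):
    """CV = (S / X-bar) * 100, using the sample standard deviation S."""
    return stdev(data) / mean(data) * 100

def z_scores(data):
    """Z = (X - X-bar) / S for every value in the data."""
    x_bar, s = mean(data), stdev(data)
    return [(x - x_bar) / s for x in data]

values = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
cv = coefficient_of_variation(values)

# Hypothetical data: the same values twice, plus one extreme value
data_with_extreme = values + values + [100]
outliers = [
    x
    for x, z in zip(data_with_extreme, z_scores(data_with_extreme))
    if abs(z) > 3                      # the |Z| > 3 rule for extreme outliers
]

# Quartiles by linear interpolation at position (n + 1) * p
q1, q2, q3 = quantiles(values, n=4, method="exclusive")
iqr = q3 - q1

print(round(cv, 2))        # CV as a percentage
print(outliers)            # [100]
print(q1, q2, q3, iqr)     # 12.75 15.5 18.25 5.5
```

The extreme-value example needs a reasonably large data set: with only ten observations, no single value can reach |Z| > 3 when Z is computed from the sample standard deviation.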

Population Parameters:

  • Definition: Population parameters are descriptive measures that summarize the characteristics of an entire population.
  • Notation: Greek letters (e.g., μ, σ², σ) are typically used to represent population parameters.
  • Key Population Parameters:
    1. Population Mean (μ):
       ▪ Formula: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$, where $N$ is the population size.
       ▪ Interpretation: The mean gives the average value of the population.
    2. Population Variance (σ²):
       ▪ Formula: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
       ▪ Interpretation: The variance measures how much individual values in the population differ from the mean (i.e., the spread of the data).
    3. Population Standard Deviation (σ):
       ▪ Formula: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
       ▪ Interpretation: The standard deviation is the square root of the variance and is in the same units as the original data. It gives a direct sense of the typical distance of values from the mean.

Sample Statistics:
  • Definition: Sample statistics are measures that describe and summarize the characteristics of a subset of the population (the sample). They are used to estimate population parameters.
  • Notation: Latin letters (e.g., $\bar{X}$, $S^2$, $S$) are typically used to represent sample statistics.
  • Key Sample Statistics:
    1. Sample Mean ($\bar{X}$):
       ▪ Formula: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where $n$ is the sample size.
       ▪ Interpretation: It represents the average value of the sample.
    2. Sample Variance ($S^2$):
       ▪ Formula: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
       ▪ Interpretation: The sum of the squared deviations from the sample mean, divided by n − 1 rather than n. It is used to estimate the population variance.
    3. Sample Standard Deviation ($S$):
       ▪ Formula: $S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
       ▪ Interpretation: The square root of the sample variance, which is in the same units as the original data. It gives a sense of how spread out the sample data is.
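
The only computational difference between the population and sample versions of the variance is the divisor (N versus n − 1). The sketch below makes that explicit on a small set of hypothetical ages, which are illustrative values and not data from the lecture, and cross-checks the hand-written formulas against the `statistics` module.

```python
import math
from statistics import mean, pvariance, pstdev, variance, stdev

ages = [18, 19, 19, 20, 21, 22, 23, 26]   # hypothetical ages
centre = mean(ages)                        # 21
squared_devs = sum((x - centre) ** 2 for x in ages)   # 48

# Treating the data as the entire population: divide by N
sigma_sq = squared_devs / len(ages)        # 48 / 8 = 6.0
sigma = math.sqrt(sigma_sq)

# Treating the same data as a sample: divide by n - 1
s_sq = squared_devs / (len(ages) - 1)      # 48 / 7
s = math.sqrt(s_sq)

# The standard library implements the same two conventions
assert math.isclose(sigma_sq, pvariance(ages))
assert math.isclose(sigma, pstdev(ages))
assert math.isclose(s_sq, variance(ages))
assert math.isclose(s, stdev(ages))

print(sigma_sq, round(s_sq, 3))   # 6.0 6.857
```

The sample variance is always slightly larger than the population-style figure for the same data, and the gap shrinks as n grows.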

  1. Population Mean vs. Sample Mean:
    o Population Mean: If you survey every student in a university and calculate the average age, that’s the population mean (μ).
    o Sample Mean: If you only survey 100 students from the university and calculate the average age, that’s the sample mean.
  2. Population Variance vs. Sample Variance:
    o Population Variance: If you know the exact ages of all students in the university and compute how much they vary from the population mean, you’re calculating the population variance.
    o Sample Variance: If you only have the ages of 100 students and compute the variance from their mean, you’re calculating the sample variance.

Summary:
  • Population Parameters are exact values describing the entire population, but they are often difficult or impossible to calculate because they require complete data for the entire population.
  • Sample Statistics are used to estimate population parameters and are subject to variability depending on the sample.
  • Bessel’s Correction (dividing by n − 1 instead of n) ensures that the sample variance is an unbiased estimator of the population variance, which is especially important when making inferences from samples.
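
The unbiasedness claim can be checked exactly on a tiny example by listing every possible sample instead of simulating. This is a sketch under stated assumptions: the four-value population is hypothetical, and samples of size 2 are drawn with replacement so that each draw is independent.

```python
from itertools import product
from statistics import fmean, pvariance, variance

population = [2.0, 4.0, 6.0, 8.0]      # hypothetical population
sigma_sq = pvariance(population)        # true population variance: 5.0

# Every ordered sample of size 2 drawn with replacement (4 * 4 = 16 samples)
samples = list(product(population, repeat=2))

# Average of the variance estimates over all equally likely samples
avg_divide_by_n = fmean(pvariance(s) for s in samples)           # divisor n
avg_divide_by_n_minus_1 = fmean(variance(s) for s in samples)    # divisor n - 1

print(sigma_sq)                  # 5.0
print(avg_divide_by_n)           # 2.5
print(avg_divide_by_n_minus_1)   # 5.0
```

Dividing by n underestimates the population variance on average (2.5 instead of 5.0), while dividing by n − 1 recovers it exactly.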