- Objective: This project aims to conduct a comprehensive statistical analysis of European Football Player data, emphasizing data quality analysis, exploratory data analysis (EDA), and statistical modeling.
- The analysis provides insights into factors influencing a player's potential and their likelihood of earning a weekly wage above 8000 Euro.
- Key statistical analysis include:
- ANCOVA Analysis: Assessing factors influencing a player's potential.
- Logistic Regression Analysis: Investigating factors affecting a player's likelihood to earn a high weekly wage.
- R libraries used: dplyr, ggplot2, gridExtra, grid, corrplot, reshape2.
- Access the document in RMarkdown, HTML, and PDF formats.
- Objective: Ensure data accuracy, consistency, and reliability.
- Key steps:
- Data summary
- Identifying numerical and categorical variables
- Handling outliers
- Addressing missing data
- Visualizing data anomalies
- Objective: Summarise dataset characteristics through visualisations.
- Visualisations include histograms, box plots, heatmaps, and hypothesis testing (Shapiro-Wilk normality test and Pearson's correlation coefficient).
- The heatmap illustrates correlations between attributes.
- Objective: Address two research questions.
- ANCOVA tests variables affecting player potential.
- Hypotheses:
-
$H_0$ : No variables affect player potentials. -
$H_1$ : Variables influence player potentials.
-
- Linear equation:
- $y = (4.626e+01) + (3.02e-13) \times (wage) + (6.560e-05) \times age - (5.718e-01) \times height + (6.509e-02) \times shooting + (1.085e-01) \times (2.769e-01) \times dribbling (6.416e-02) \times defending + (7.931e-02) \times physics - (1.021e-01) \times
power long shots
+ (3.154e+00) \times high.wage.indTRUE
- $y = (4.626e+01) + (3.02e-13) \times (wage) + (6.560e-05) \times age - (5.718e-01) \times height + (6.509e-02) \times shooting + (1.085e-01) \times (2.769e-01) \times dribbling (6.416e-02) \times defending + (7.931e-02) \times physics - (1.021e-01) \times
- Conclusion: The alternative hypothesis is accepted, but this model needs improvement due to inconsistency and multicollinearity.
Logistic Regression: "What factors affect the likelihood of a football player earning a weekly wage above 8000 Euro?"
- Logistic Regression models factors influencing a player's likelihood to earn a high wage.
- Hypotheses:
-
$H_0$ : No factors affect the likelihood of high wages. -
$H_1$ : Factors influence high wages.
-
- Logistic transformation equation:
$log(\frac{p}{1-p})=a + b1 \times x1 + b2 \times x2 ....bk \times xk$
- Conclusion: The Null Hypothesis is rejected, and the minimal adequate model is a good fit.
Conclusion:
- In conclusion, this project underscores the significance of data quality, exploratory data analysis, and statistical modeling in extracting valuable insights from European football player data.
- By following these analytical steps, the report ensures reproducibility and provides valuable insights into the world of football statistics.
- These findings can be invaluable for clubs, analysts, and enthusiasts looking to understand the factors influencing a player's potential and earnings in the dynamic world of European football.