Exploratory Data Analysis Project: Obesity Levels Based on Eating Habits and Physical Condition


I chose to analyze a dataset that estimates obesity levels based on the eating habits and physical condition of people from Mexico, Peru, and Colombia. I was interested in the relationship between body fat levels and lifestyle habits, as the findings apply to the health and wellbeing of many, if not most, of us. The topic is especially relevant because BMI has been in the news a lot recently: people with a high BMI are eligible to get the COVID-19 vaccine earlier, as obese individuals are three times more likely to be hospitalized from the virus. I used a dataset from UC Irvine’s Machine Learning Repository, which includes data on 2111 individuals ages 14 to 61.

  • Age: numeric
  • Height: numeric, in meters
  • Weight: numeric, in kilograms
  • family_history (family history of obesity): yes or no
  • FCHCF (frequent consumption of high caloric food): yes or no
  • FCV (frequency of consumption of vegetables): 1, 2, or 3; 1 = never, 2 = sometimes, 3 = always
  • NMM (number of main meals): 1, 2, 3 or 4
  • CFBM (consumption of food between meals): 1, 2, 3, or 4; 1 = no, 2 = sometimes, 3 = frequently, 4 = always
  • Smoke: yes or no
  • CW (consumption of water): 1, 2, or 3; 1 = less than a liter, 2 = 1–2 liters, 3 = more than 2 liters
  • CCM (calorie consumption monitoring): yes or no
  • PAF (physical activity frequency per week): 0, 1, 2, or 3; 0 = none, 1 = 1 to 2 days, 2 = 2 to 4 days, 3 = 4 to 5 days
  • TUT (time using technology devices a day): 0, 1, or 2; 0 = 0–2 hours, 1 = 3–5 hours, 2 = more than 5 hours
  • CA (consumption of alcohol): 1, 2, 3, or 4; 1 = never, 2 = sometimes, 3 = frequently, 4 = always
  • Transportation: automobile, motorbike, bike, public transportation, or walking
  • Obesity: insufficient weight, normal weight, level I overweight, level II overweight, type I obesity, type II obesity, type III obesity; these categories are listed from lowest to highest body fat
  1. Can BMI be used as a quantitative substitute for the qualitative weight classification category?
  2. Which eating habit and physical condition variables are most related to obesity levels? This question has many subquestions related to individual variables and groups of variables.
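For the first question, a minimal sketch of one way to compare BMI with the weight categories: it assumes BMI is computed with the standard formula (weight in kilograms divided by height in meters squared) from the Height and Weight columns. The rows and the exact label strings below are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Categories ordered from lowest to highest body fat.
# The exact label strings are assumptions made for illustration.
LEVELS = [
    "insufficient weight", "normal weight",
    "level I overweight", "level II overweight",
    "type I obesity", "type II obesity", "type III obesity",
]

def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2

# Hypothetical rows (height in m, weight in kg, category); not real data.
rows = [
    (1.70, 50.0, "insufficient weight"),
    (1.65, 60.0, "normal weight"),
    (1.80, 70.0, "normal weight"),
    (1.75, 85.0, "level II overweight"),
    (1.60, 95.0, "type II obesity"),
]

# Collect the BMI values observed in each category.
bmi_by_level = defaultdict(list)
for height, weight, level in rows:
    bmi_by_level[level].append(bmi(weight, height))

# Mean BMI per category, reported in the ordinal order defined above.
mean_bmi = {lvl: mean(bmi_by_level[lvl]) for lvl in LEVELS if lvl in bmi_by_level}
for level, value in mean_bmi.items():
    print(f"{level}: {value:.1f}")
```

If the mean BMI rises steadily from one category to the next on the real data, that would support treating BMI as a numeric stand-in for the category.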

Examine and Prepare the Data

To analyze the dataset, I first needed to load it into Colaboratory and check it for errors and data quality problems before creating visualizations.
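As a rough illustration of this kind of check (not necessarily the one used in this project), the sketch below counts empty cells per column and exact duplicate rows using only the Python standard library. The small `SAMPLE` text and the `quality_report` helper are hypothetical stand-ins for the real file and workflow.

```python
import csv
import io

# Hypothetical three-row sample standing in for the real file; values are invented.
SAMPLE = """Age,Height,Weight,Smoke
21,1.62,64.0,no
23,1.80,,no
21,1.62,64.0,no
"""

def quality_report(rows):
    """Return (empty-cell count per column, number of exact duplicate rows)."""
    missing = {}
    seen = set()
    duplicates = 0
    for row in rows:
        # Count cells that are absent or contain only whitespace.
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] = missing.get(column, 0) + 1
        # A row counts as a duplicate if an identical row appeared earlier.
        key = tuple(row.items())
        if key in seen:
            duplicates += 1
        seen.add(key)
    return missing, duplicates

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
missing, duplicates = quality_report(rows)
print("rows:", len(rows))
print("missing values per column:", missing)
print("duplicate rows:", duplicates)
```

In a notebook, pandas offers the same checks more compactly through `DataFrame.isna()` and `DataFrame.duplicated()`.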

Data Analysis

Graph 1: How are the respondents broken down by weight classification?
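The counts behind a breakdown like this can be tallied in a few lines. This is a sketch with a hypothetical list of labels, not the actual column from the dataset.

```python
from collections import Counter

# Hypothetical labels standing in for the real weight classification column.
labels = [
    "normal weight", "type I obesity", "normal weight",
    "level I overweight", "type I obesity", "normal weight",
]

counts = Counter(labels)
total = sum(counts.values())

# Print each category with its count and share of respondents, largest first.
for level, n in counts.most_common():
    print(f"{level}: {n} ({n / total:.0%})")
```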


This analysis suggests that factors such as a family history of obesity and frequent consumption of high-calorie food are strongly associated with weight classification, while other factors such as age showed a weaker association, and factors such as gender showed none.