AlgoDaily - Introduction to Data Cleaning and Wrangling


This code snippet walks through five data wrangling steps on two sample pandas DataFrames:

Data Discovery: Call the info() method to print each column's name, non-null count, and dtype.
Data Organization: Sort the DataFrame based on the 'Age' column.
Data Enriching: Merge two DataFrames based on the 'ID' column.
Data Validating: Drop rows that have any missing values. After the merge, this leaves only the IDs that appear in both DataFrames.
Data Publishing: Save the final, cleaned DataFrame to a CSV file.

PYTHON

import pandas as pd

# Original DataFrames
data1 = {
    'ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Age': [25, 30, 35, 40]
}
data2 = {
    'ID': [3, 4, 5, 6],
    'Score': [85, 90, 88, 76],
    'Country': ['Canada', 'US', 'UK', 'Australia']
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
print("Original DataFrame 1:")
print(df1)
print("\nOriginal DataFrame 2:")
print(df2)

# Data Wrangling Steps
# Step 1: Data Discovery - Get basic info
print("\nDataFrame 1 Info:")
df1.info()  # info() prints its summary directly and returns None

# Step 2: Data Organization - Sort by Age
df1_sorted = df1.sort_values(by='Age')

# Step 3: Data Enriching - Merge DataFrames
# An outer join keeps every ID from both frames and fills the gaps with NaN
df_merged = pd.merge(df1_sorted, df2, on='ID', how='outer')

# Step 4: Data Validating - Drop rows with missing values
# Only IDs 3 and 4 appear in both frames, so only those rows survive
df_validated = df_merged.dropna()

# Step 5: Data Publishing - Save to CSV
df_validated.to_csv('cleaned_and_enriched_data.csv', index=False)

# Final DataFrame
print("\nFinal DataFrame:")
print(df_validated)
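
The snippet above merges with how='outer' and then calls dropna(), which together discard every ID that is missing from either DataFrame. As a minimal sketch (not part of the original lesson), the code below rebuilds the same two sample DataFrames and uses the `indicator=True` option of `pd.merge` to show where each row came from, counts the missing values per column, and compares the result with a plain inner join. The variable names `outer` and `inner` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Age': [25, 30, 35, 40],
})
df2 = pd.DataFrame({
    'ID': [3, 4, 5, 6],
    'Score': [85, 90, 88, 76],
    'Country': ['Canada', 'US', 'UK', 'Australia'],
})

# indicator=True adds a '_merge' column whose values are
# 'left_only', 'right_only', or 'both'
outer = pd.merge(df1, df2, on='ID', how='outer', indicator=True)
print(outer[['ID', '_merge']])

# Count missing values per column before deciding what to drop
print(outer.isna().sum())

# An inner join keeps only the IDs present in both frames, so it
# introduces no NaN and the integer columns keep an integer dtype
inner = pd.merge(df1, df2, on='ID', how='inner')
print(inner)
print(inner.dtypes)
```

For this sample data, the inner join keeps the same IDs as the outer join followed by dropna(). One difference worth knowing: the NaN values introduced by the outer join convert integer columns such as 'Age' and 'Score' to floats, and dropna() does not convert them back.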

The result is a two-row DataFrame (IDs 3 and 4, the only ones present in both sources) that has been enriched and validated, and is ready for data analysis.
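
Before relying on the published file, it can help to run a few explicit checks and confirm that the CSV reads back as expected. The following is a hedged sketch under stated assumptions: `df_validated` is rebuilt by hand here with the two rows the pipeline keeps, the file is written to a temporary directory rather than the working directory, and the specific checks (unique IDs, no missing values) are illustrative choices rather than checks the lesson prescribes.

```python
import os
import tempfile

import pandas as pd

# Hand-built stand-in for the pipeline's final DataFrame (assumption)
df_validated = pd.DataFrame({
    'ID': [3, 4],
    'Name': ['Cindy', 'David'],
    'Age': [35, 40],
    'Score': [85, 90],
    'Country': ['Canada', 'US'],
})

# Illustrative pre-publish checks
assert df_validated['ID'].is_unique, "duplicate IDs found"
assert not df_validated.isna().any().any(), "missing values remain"

# Write to a temporary directory and read the file back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'cleaned_and_enriched_data.csv')
    df_validated.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Compare shape, column order, and key values after the round trip
print(reloaded.shape == df_validated.shape)
print(list(reloaded.columns) == list(df_validated.columns))
print(reloaded['ID'].tolist() == df_validated['ID'].tolist())
```

Passing index=False matters for the round trip: without it, to_csv writes the row index as an extra unnamed column that appears when the file is read back.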
