Language: Python
Data Science
Pandas was created by Wes McKinney in 2008 to provide a high-performance, user-friendly data analysis tool for Python. It has become the de facto standard for data manipulation in Python, widely used in data science, finance, research, and analytics.
Pandas is a powerful Python library for data manipulation and analysis. It provides fast, flexible, and expressive data structures such as Series and DataFrame for working with structured data.
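To make the two data structures concrete, here is a minimal sketch. The names and values (`ages`, `people`, the index labels) are invented for illustration and do not come from any real dataset.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
ages = pd.Series([34, 28, 45], index=["alice", "bob", "carol"], name="age")

# A DataFrame is a two-dimensional table; each column is itself a Series.
people = pd.DataFrame(
    {
        "age": [34, 28, 45],
        "department": ["sales", "engineering", "sales"],
    },
    index=["alice", "bob", "carol"],
)

print(ages["bob"])            # label-based lookup on a Series
print(type(people["age"]))    # selecting one column returns a Series
print(people.shape)           # (rows, columns)
```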
pip install pandas
conda install pandas
Pandas allows for easy reading, writing, and manipulation of data from multiple sources including CSV, Excel, SQL databases, and more. You can filter, aggregate, group, pivot, merge, and reshape datasets efficiently.
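As a small illustration of reading and writing, the sketch below round-trips a DataFrame through CSV text held in memory, so it runs without any file on disk. The column names and values are made up for the example.

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "score": [9.5, 7.0, 8.25]})

# Write to CSV text; index=False leaves the row labels out of the output.
csv_text = df.to_csv(index=False)

# Read it back. read_csv accepts a file path or any file-like object.
restored = pd.read_csv(io.StringIO(csv_text))

print(restored)
```

Passing a path such as 'data.csv' instead of the `StringIO` buffer works the same way for files on disk.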
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head())
Reads a CSV file into a DataFrame and displays the first five rows.
print(df['column_name'])
print(df.iloc[0])
Access a column by name and a row by integer position.
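The difference between positional and label-based row access is easy to miss when the index is the default 0, 1, 2. The sketch below uses a deliberately non-default index (an assumption made for the example) so that `iloc` and `loc` can be told apart.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cy"], "age": [31, 25, 40]},
    index=[10, 20, 30],
)

ages = df["age"]               # single column name -> Series
subset = df[["name", "age"]]   # list of column names -> DataFrame
first_row = df.iloc[0]         # row by integer position
row_20 = df.loc[20]            # row by index label
cell = df.loc[30, "age"]       # one value by row label and column name

print(first_row["name"], row_20["name"], cell)
```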
filtered = df[df['age'] > 30]
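grouped = filtered.groupby('department').mean(numeric_only=True)
Filter rows on a condition, then group by a column and average the numeric columns. Passing numeric_only=True avoids a TypeError in pandas 2.0 and later when the DataFrame also contains non-numeric columns.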
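Here is the same filter-then-group pattern on a small, self-contained table. The data is invented for illustration. Selecting the numeric columns explicitly before calling `mean()` is one way to keep text columns such as `name` out of the aggregation.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cy", "Di", "Ed"],
        "department": ["sales", "sales", "ops", "ops", "ops"],
        "age": [35, 28, 41, 33, 52],
        "salary": [50000, 42000, 61000, 58000, 70000],
    }
)

# Boolean mask: keep only rows where age is above 30 (drops Ben).
filtered = df[df["age"] > 30]

# Average the chosen numeric columns within each department.
grouped = filtered.groupby("department")[["age", "salary"]].mean()

print(grouped)
```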
merged = pd.merge(df1, df2, on='id', how='inner')
Combine two DataFrames on a common column using an inner join.
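The merge above assumes `df1` and `df2` already exist. A minimal sketch with two invented tables shows what an inner join keeps, with a left join alongside for contrast:

```python
import pandas as pd

# Hypothetical inputs; only ids 2 and 3 appear in both tables.
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "salary": [42000, 61000, 58000]})

# Inner join: keep only ids present in both DataFrames.
inner = pd.merge(df1, df2, on="id", how="inner")

# Left join: keep every row of df1; unmatched salaries become NaN.
left = pd.merge(df1, df2, on="id", how="left")

print(inner)
print(left)
```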
pivot = df.pivot_table(index='department', columns='gender', values='salary', aggfunc='mean')
Create a pivot table to summarize data efficiently.
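To show the shape of the result, the sketch below runs the same call on a small invented table. Each distinct `department` becomes a row, each distinct `gender` becomes a column, and each cell holds the mean salary for that combination.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "department": ["sales", "sales", "sales", "ops", "ops"],
        "gender": ["F", "M", "F", "F", "M"],
        "salary": [50000, 42000, 54000, 61000, 58000],
    }
)

pivot = df.pivot_table(
    index="department",   # one output row per department
    columns="gender",     # one output column per gender
    values="salary",      # the column being aggregated
    aggfunc="mean",       # how values in each cell are combined
)

print(pivot)
```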
Use vectorized operations instead of loops for performance.
Clean data before analysis: handle missing values, duplicates, and inconsistent types.
Use descriptive column names for readability.
Leverage built-in aggregation functions for efficiency.
Profile large datasets with df.info() and df.describe() before processing.
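Several of the practices above can be sketched together. The table and its column names are invented for illustration. It contains one duplicate row, one missing value, and a numeric column stored as text.

```python
import pandas as pd

raw = pd.DataFrame(
    {
        "price": ["10.5", "3", None, "3"],
        "qty": [2, 4, 1, 4],
    }
)

raw.info()  # quick profile: dtypes and non-null counts per column

clean = (
    raw.drop_duplicates()               # the last row repeats the second
    .dropna(subset=["price"])           # drop the row with a missing price
    .astype({"price": "float64"})       # convert the text column to numbers
    .reset_index(drop=True)
)

# Vectorized: one expression over whole columns, no Python-level loop.
clean["total"] = clean["price"] * clean["qty"]

# Equivalent row-by-row loop, shown only for comparison; this style is
# typically much slower on large DataFrames.
loop_total = [row.price * row.qty for row in clean.itertuples()]

print(clean)
```

Whether to drop or fill missing values depends on the dataset. `dropna` is used here only to keep the sketch short.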