Quick reference for Pandas — DataFrame creation, selection, filtering, groupby, merging, handling missing data and common operations.
| Method | Example | Description |
|---|---|---|
| pd.DataFrame(dict) | pd.DataFrame({'a':[1,2],'b':[3,4]}) | From dictionary |
| pd.DataFrame(list) | pd.DataFrame([[1,2],[3,4]], columns=['a','b']) | From list of lists |
| pd.read_csv() | pd.read_csv('file.csv') | From CSV file |
| pd.read_json() | pd.read_json('file.json') | From JSON file |
| pd.read_excel() | pd.read_excel('file.xlsx') | From Excel file |
| pd.read_sql() | pd.read_sql(query, conn) | From SQL query |
| Method | Description |
|---|---|
| df.head(n) | First n rows (default 5) |
| df.tail(n) | Last n rows (default 5) |
| df.shape | Tuple (rows, columns) |
| df.info() | Column types, non-null counts, memory |
| df.describe() | Summary statistics for numeric columns |
| df.dtypes | Data type of each column |
| df.columns | List of column names |
| df.index | Row index |
| df.value_counts() | Count of unique values |
| df.nunique() | Number of unique values per column |
| Method | Example | Description |
|---|---|---|
| df['col'] | df['name'] | Select single column → Series |
| df[['col1','col2']] | df[['name','age']] | Select multiple columns → DataFrame |
| df.loc[row, col] | df.loc[0, 'name'] | Label-based selection |
| df.iloc[row, col] | df.iloc[0, 1] | Integer position-based selection |
| df.loc[rows, cols] | df.loc[0:5, 'a':'c'] | Slice by label (inclusive) |
| df.iloc[rows, cols] | df.iloc[0:5, 0:3] | Slice by position (exclusive end) |
| df.at[row, col] | df.at[0, 'name'] | Fast scalar label access |
| df.iat[row, col] | df.iat[0, 1] | Fast scalar position access |
| Method | Example | Description |
|---|---|---|
| Boolean mask | df[df['age'] > 30] | Filter rows by condition |
| Multiple conditions | df[(df['age']>30) & (df['city']=='NYC')] | AND condition |
| OR condition | df[(df['age']<20) | (df['age']>60)] | OR condition |
| df.query() | df.query('age > 30 and city == "NYC"') | SQL-style string filter |
| df.isin() | df[df['city'].isin(['NYC','LA'])] | Filter by list of values |
| df.between() | df[df['age'].between(20, 40)] | Filter by range (inclusive) |
| str.contains() | df[df['name'].str.contains('Ali')] | Filter by string match |
| Method | Example | Description |
|---|---|---|
| df.isnull() | df.isnull().sum() | Count missing values per column |
| df.dropna() | df.dropna() | Drop rows with any NaN |
| df.dropna(subset) | df.dropna(subset=['age']) | Drop rows where specific col is NaN |
| df.fillna(value) | df.fillna(0) | Fill NaN with value |
| df.fillna(method) | df.fillna(method='ffill') | Forward fill |
| df.interpolate() | df['col'].interpolate() | Interpolate missing values |
| Method | Example | Description |
|---|---|---|
| df.groupby(col).agg() | df.groupby('city')['age'].mean() | Group and aggregate |
| Multiple aggs | df.groupby('city').agg({'age':'mean','salary':'sum'}) | Multiple aggregations |
| df.groupby().size() | df.groupby('city').size() | Count rows per group |
| df.pivot_table() | df.pivot_table(values='sales', index='city', aggfunc='sum') | Pivot table |
| df.crosstab() | pd.crosstab(df['city'], df['gender']) | Frequency cross-table |
| Method | Example | Description |
|---|---|---|
| pd.merge() | pd.merge(df1, df2, on='id') | Merge on column (inner join by default) |
| left join | pd.merge(df1, df2, on='id', how='left') | Keep all left rows |
| outer join | pd.merge(df1, df2, on='id', how='outer') | Keep all rows from both |
| pd.concat() | pd.concat([df1, df2]) | Stack DataFrames vertically |
| pd.concat(axis=1) | pd.concat([df1, df2], axis=1) | Stack DataFrames horizontally |
| df.join() | df1.join(df2, on='id') | Join on index by default |
| Method | Example | Description |
|---|---|---|
| df.sort_values() | df.sort_values('age', ascending=False) | Sort by column |
| df.rename() | df.rename(columns={'old':'new'}) | Rename columns |
| df.drop() | df.drop(columns=['col1']) | Drop column(s) |
| df.reset_index() | df.reset_index(drop=True) | Reset row index |
| df.set_index() | df.set_index('id') | Set column as index |
| df.apply() | df['col'].apply(lambda x: x*2) | Apply function to column |
| df.map() | df['col'].map({'a':1,'b':2}) | Map values using dict |
| df.astype() | df['age'].astype(float) | Cast column type |
| df.duplicated() | df.drop_duplicates() | Remove duplicate rows |
| df.copy() | df2 = df.copy() | Deep copy (avoid SettingWithCopyWarning) |