Data Structures
Series:
- A one-dimensional array-like object that can hold any data type
- Can be created from a list, NumPy array or dictionary
- Each element in a Series has an index label
import pandas as pd
data = [1, 2, 3, 4]
series = pd.Series(data)  # gets a default integer index 0..3
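A minimal sketch of the three creation routes listed above. The labels "a", "b", "c" and the keys "x", "y" are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a list: pandas assigns a default integer index 0..n-1
from_list = pd.Series([1, 2, 3, 4])

# From a NumPy array, with explicit (illustrative) index labels
from_array = pd.Series(np.array([1.5, 2.5, 3.5]), index=["a", "b", "c"])

# From a dictionary: the keys become the index
from_dict = pd.Series({"x": 10, "y": 20})

print(from_list.index.tolist())  # [0, 1, 2, 3]
print(from_array["b"])           # 2.5
print(from_dict.index.tolist())  # ['x', 'y']
```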
DataFrame:
- A two-dimensional table of data with rows and columns
- Can be thought of as a collection of Series
- Can be created from a dictionary, NumPy array or another DataFrame
data = {
    "name": ["Alice", "Bob", "Charlie"],
    "age": [12, 23, 34],
}
df = pd.DataFrame(data)
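A short sketch of the other points above: each column of a DataFrame is a Series, and a DataFrame can also be built from a NumPy array or from another DataFrame. The column names "a" and "b" are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob", "Charlie"], "age": [12, 23, 34]})

# Selecting one column returns a Series
age = df["age"]
print(type(age).__name__)  # Series

# From a 2-D NumPy array; column names are supplied explicitly
arr_df = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["a", "b"])

# From another DataFrame; copy=True makes the new frame independent
df_copy = pd.DataFrame(df, copy=True)
```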
Basic Operations
- Indexing and Slicing
- Use square brackets [] to access columns by name
- Use .loc[] for label-based indexing and .iloc[] for integer position-based indexing
- Use .describe() to get summary statistics of numeric columns
- .mean(), .median(), and .std() provide individual statistics
df.describe()
df['age'].mean()
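The operations above might look like this in practice. This is a sketch only: the row labels "a", "b", "c" are hypothetical, added so that label-based and position-based indexing give visibly different calls.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "age": [12, 23, 34]},
    index=["a", "b", "c"],  # hypothetical row labels
)

ages = df["age"]               # column by name, returns a Series
bob_age = df.loc["b", "age"]   # label-based: row "b", column "age"
first_row = df.iloc[0]         # position-based: first row
first_two = df.iloc[0:2]       # position slice, end excluded
through_b = df.loc["a":"b"]    # label slice, end included

mean_age = df["age"].mean()      # 23.0
median_age = df["age"].median()  # 23.0
std_age = df["age"].std()        # 11.0 (sample standard deviation)
summary = df.describe()          # count, mean, std, min, quartiles, max
```

Note that .iloc slices exclude the end position while .loc slices include the end label.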
Data Cleaning
- Use .dropna() to drop rows that contain missing values, or .fillna(value=0) to replace missing values with a fixed value
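A small sketch of both approaches, assuming a frame where one age is missing (the missing value is invented for illustration). Both methods return a new DataFrame and leave the original unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob", "Charlie"], "age": [12, np.nan, 34]})

missing_count = df["age"].isna().sum()  # number of missing ages
dropped = df.dropna()                   # removes the row with the missing age
filled = df.fillna(value=0)             # replaces the missing age with 0

print(missing_count)  # 1
print(len(dropped))   # 2
```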