Reading CSV

pandas.read_csv('path or URL' [, sep='separator', header=row index or None, names=[list of column names], usecols=[list of column names]])

(usecols=[list of column names] → only read these columns)

Selecting columns

data['Column Name']
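A minimal sketch of the read and select calls above. The CSV text, the column names and the use of io.StringIO in place of a real path or URL are assumptions made for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file path or URL.
csv_text = "name;age;city\nAna;31;Madrid\nLuis;25;Lima\n"

# sep sets the delimiter, header=0 takes the first line as column names,
# usecols restricts parsing to the listed columns.
data = pd.read_csv(io.StringIO(csv_text), sep=";", header=0, usecols=["name", "age"])

# Selecting a single column returns a Series.
ages = data["age"]
print(data.columns.tolist())
print(ages.tolist())
```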

Selection and indexing

data.iloc[3[, 4]] → Selection by position: the row at position 3, which is the fourth row because positions start at 0 (and, optionally, the column at position 4)

data.loc[3[, 4]] → Selection by index label: the row labelled 3 (and, optionally, the column labelled 4)
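The difference between position and label only shows when the index is not 0, 1, 2, ... A small sketch, assuming a hypothetical frame whose index labels run in reverse:

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions.
data = pd.DataFrame(
    {"a": [10, 20, 30, 40], "b": [1, 2, 3, 4]},
    index=[3, 2, 1, 0],
)

by_position = data.iloc[3]          # row at position 3 (the fourth row)
by_label = data.loc[3]              # row whose index label is 3 (the first row)
cell_by_position = data.iloc[3, 1]  # row position 3, column position 1
cell_by_label = data.loc[3, "b"]    # row label 3, column label "b"

print(by_position["a"], by_label["a"], cell_by_position, cell_by_label)
```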

Creating column

data['New Column Name'] = PandasSeriesData
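A short sketch of column creation, assuming hypothetical price and qty columns. Note that an assigned Series is aligned on the index, not on row order.

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame({"price": [2, 3], "qty": [10, 4]})

# New column computed from existing columns.
data["total"] = data["price"] * data["qty"]

# An assigned Series is matched by index label: label 0 receives 2, label 1 receives 1.
data["rank"] = pd.Series([1, 2], index=[1, 0])

print(data)
```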

Describing columns

data.describe([include=['types'…]])
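As an illustration of the include argument, the sketch below uses a hypothetical two-column frame; include=["number"] and include="all" are only two of the values pandas accepts.

```python
import pandas as pd

# Hypothetical data: one text column and one numeric column.
data = pd.DataFrame({"name": ["Ana", "Luis", "Eva"], "age": [25, 30, 35]})

# Restrict the summary to numeric columns.
numeric_summary = data.describe(include=["number"])

# Summarise every column, whatever its type.
full_summary = data.describe(include="all")

print(numeric_summary)
print(full_summary)
```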

Rename columns

data.rename(columns={'old': 'new', 'old2': 'new2'} [, inplace=True])

data.columns = ['name', 'name2', 'name3']
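A minimal sketch of both renaming styles, with hypothetical column names. Without inplace=True, rename returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical frame for illustration.
data = pd.DataFrame({"old": [1], "old2": [2]})

# rename maps selected names and returns a new DataFrame.
renamed = data.rename(columns={"old": "new", "old2": "new2"})

# Assigning to .columns replaces every name at once, in place;
# the list must have one entry per column.
relabelled = data.copy()
relabelled.columns = ["name", "name2"]

print(renamed.columns.tolist(), relabelled.columns.tolist())
```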

Remove columns

data.drop([list of column names or single name], axis=1 [, inplace=True])

Remove rows

data.drop([list of row index labels or single label], axis=0 [, inplace=True])
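The two drop calls can be sketched as follows, assuming a hypothetical three-column frame with the default 0, 1, 2 index.

```python
import pandas as pd

# Hypothetical frame for illustration.
data = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# axis=1 drops columns by name.
without_columns = data.drop(["b", "c"], axis=1)

# axis=0 drops rows by index label.
without_rows = data.drop([0, 2], axis=0)

print(without_columns.columns.tolist(), without_rows.index.tolist())
```

Both calls return a new DataFrame; data keeps its original shape unless inplace=True is passed.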

Sorting

data.sort_values([list of column names or single name] [, ascending=True]) → DataFrame sorting

data['Column name'].sort_values([ascending=True]) → Series sorting
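A brief sketch of DataFrame and Series sorting, using hypothetical city and age columns. Passing a list to ascending gives one direction per sort column.

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame({"city": ["b", "a", "b"], "age": [30, 25, 20]})

# DataFrame sorting by one column.
by_age = data.sort_values("age")

# DataFrame sorting by several columns: city ascending, then age descending.
by_city_then_age = data.sort_values(["city", "age"], ascending=[True, False])

# Series sorting.
ages_descending = data["age"].sort_values(ascending=False)

print(by_age.index.tolist(), by_city_then_age.index.tolist(), ages_descending.tolist())
```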

Filtering

data[data['Column name'] > 20] → Single filtering

data[(data['Column name'] > 20) & (data['Column name'] < 30)] → Multiple filtering

data[data['Column name'].isin([values list])] → Filtering by value in a set
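The three filters can be sketched like this, with hypothetical age and city columns. Each condition needs its own parentheses because & binds more tightly than the comparisons.

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame(
    {"age": [18, 25, 29, 40], "city": ["Lima", "Madrid", "Quito", "Lima"]}
)

# Single condition.
over_20 = data[data["age"] > 20]

# Two conditions combined with & (element-wise AND).
in_twenties = data[(data["age"] > 20) & (data["age"] < 30)]

# Membership in a set of values.
in_cities = data[data["city"].isin(["Lima", "Quito"])]

print(len(over_20), in_twenties["age"].tolist(), in_cities.index.tolist())
```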

Str methods

data['column_name'].str.any_str_method()
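Here any_str_method stands for any method of the .str accessor. A small sketch with a hypothetical name column, using strip, title, lower and contains as examples:

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame({"name": [" ana ", "LUIS"]})

# .str methods apply to every element and can be chained.
cleaned = data["name"].str.strip().str.title()

# Methods that test each string return a boolean Series, usable as a filter.
has_a = data["name"].str.lower().str.contains("a")

print(cleaned.tolist(), has_a.tolist())
```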

Group by

data.groupby('column_name')['column_name_operation'].mean() → single operation over data group

data.groupby('column_name')['column_name_operation'].agg(['mean', 'max', 'min']) → multiple operations over data group
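A minimal sketch of both group-by forms, assuming hypothetical team and score columns.

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame({"team": ["x", "x", "y"], "score": [10, 20, 30]})

# Single operation: one value per group, returned as a Series.
mean_by_team = data.groupby("team")["score"].mean()

# Multiple operations: one column per operation, returned as a DataFrame.
stats_by_team = data.groupby("team")["score"].agg(["mean", "max", "min"])

print(mean_by_team)
print(stats_by_team)
```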

Exploring

data['column_name'].value_counts([normalize=True]) → returns the count of every single value

data['column_name'].unique() → returns an array with unique values

data['column_name'].nunique() → returns the number of unique values

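data['column_name'].describe() → Gives summary statistics of the column: count, mean, std, min, quartiles and max for numeric data; count, unique, top and freq for text data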

pd.crosstab(data['column_name1'], data['column_name2']) → Cross-tabulates the two indicated columns: counts how often each pair of values occurs

(Series).plot(kind='type_plot')
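The counting and cross-tabulation calls in this section can be sketched as follows, with hypothetical color and size columns. Plotting is left out because it needs matplotlib installed.

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame(
    {"color": ["red", "blue", "red", "red"], "size": ["S", "M", "S", "M"]}
)

counts = data["color"].value_counts()                 # occurrences per value
shares = data["color"].value_counts(normalize=True)   # fractions instead of counts
distinct = data["color"].unique()                     # array, in order of appearance
n_distinct = data["color"].nunique()                  # number of distinct values

# Frequency table: rows are colors, columns are sizes.
table = pd.crosstab(data["color"], data["size"])

print(counts, shares, distinct, n_distinct, table, sep="\n")
```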

Missing values

data.isnull()

→ returns True for every cell that is NaN and False otherwise.

data.isnull().sum()

→ returns the number of NaN for every column

data.dropna(how='any' or 'all' [, subset=['column_name1', 'column_name2']]) → Drop the rows with NaN values

data['column_name'].fillna(value='your value') → fills the NaN values of a column with the specified value
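A short sketch of the missing-value calls, assuming a hypothetical frame with gaps in both columns. how='any' drops a row with at least one NaN, how='all' only drops rows where every value is NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is entirely missing, row 2 is partly missing.
data = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, None]})

nan_per_column = data.isnull().sum()

drop_any = data.dropna(how="any")                   # keeps only complete rows
drop_all = data.dropna(how="all")                   # drops only fully empty rows
drop_on_a = data.dropna(how="any", subset=["a"])    # looks at column "a" only

filled_a = data["a"].fillna(value=0)

print(nan_per_column, drop_any, drop_all, drop_on_a, filled_a, sep="\n")
```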

Indexing

data.set_index('column_name')

→ Set that column as the index

data.reset_index()

→ Reset the index of the dataframe
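A minimal sketch of setting and resetting the index, with hypothetical id and v columns. Both calls return a new DataFrame by default.

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame({"id": ["a", "b"], "v": [1, 2]})

# The "id" column becomes the index and can be used with .loc.
indexed = data.set_index("id")
value_b = indexed.loc["b", "v"]

# reset_index moves the index back into a column and restores 0, 1, 2, ...
restored = indexed.reset_index()

print(indexed.index.tolist(), value_b, restored.columns.tolist())
```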

Add column(s)

pd.concat([dataframe, dataframe/column], axis=1) → Add the Series or DataFrame to the current DataFrame as column(s)
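As a sketch of column-wise concatenation, the example below joins a hypothetical one-column frame with a named Series. Rows are matched by index label, so both objects should share the same index.

```python
import pandas as pd

# Hypothetical data for illustration.
left = pd.DataFrame({"a": [1, 2]})
extra = pd.Series([3, 4], name="b")  # the Series name becomes the column name

# axis=1 places the objects side by side.
combined = pd.concat([left, extra], axis=1)

print(combined)
```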