简书大神SeanCheney的译作,我作了些格式调整和文章目录结构的变化,更适合自己阅读,以后翻阅是更加方便自己查找吧
import pandas as pdimport numpy as np
设定最大列数和最大行数
pd.set_option('max_columns',5 , 'max_rows', 5)
1 选取Series数据
college = pd.read_csv('data/college.csv', index_col='INSTNM')
city = college['CITY']
1.1 iloc整数选取
标量
city.iloc[3]
'Huntsville'
多行 - 整数列表
city.iloc[[10,20,30]]
INSTNMBirmingham Southern College BirminghamGeorge C Wallace State Community College-Hanceville HancevilleJudson College MarionName: CITY, dtype: object
多行 - 整数切片
city.iloc[4:50:10]
INSTNMAlabama State University MontgomeryEnterprise State Community College EnterpriseHeritage Christian University FlorenceMarion Military Institute MarionReid State Technical College EvergreenName: CITY, dtype: object
1.2 loc标签选取
标量
city.loc['Heritage Christian University']
'Florence'
多行 - 标签列表
随机选择4个标签
labels = list(np.random.choice(city.index, 4))labels
['University of St Thomas', 'Paul Mitchell the School-Woodbridge', 'San Francisco Conservatory of Music', 'Trinity Bible College']
city.loc[labels]
INSTNMUniversity of St Thomas Saint PaulPaul Mitchell the School-Woodbridge WoodbridgeSan Francisco Conservatory of Music San FranciscoTrinity Bible College EllendaleName: CITY, dtype: object
多行 - 标签切片
city.loc['Alabama State University':'Reid State Technical College':10]
INSTNMAlabama State University MontgomeryEnterprise State Community College EnterpriseHeritage Christian University FlorenceMarion Military Institute MarionReid State Technical College EvergreenName: CITY, dtype: object
也可以切片逆序选取
city.loc['Reid State Technical College':'Alabama State University':-10]
INSTNMReid State Technical College EvergreenMarion Military Institute MarionHeritage Christian University FlorenceEnterprise State Community College EnterpriseAlabama State University MontgomeryName: CITY, dtype: object
2 选取DataFrame数据
2.1 选取DataFrame的行
注意iloc是整数索引,loc是标签索引即可
2.2 同时选取DataFrame的行和列
2.2.1 选取前3行和前4列
college.iloc[:3, :4]
CITY | STABBR | HBCU | MENONLY | |
---|---|---|---|---|
INSTNM | ||||
Alabama A & M University | Normal | AL | 1.0 | 0.0 |
University of Alabama at Birmingham | Birmingham | AL | 0.0 | 0.0 |
Amridge University | Montgomery | AL | 0.0 | 0.0 |
college.loc[:'Amridge University', :'MENONLY']
CITY | STABBR | HBCU | MENONLY | |
---|---|---|---|---|
INSTNM | ||||
Alabama A & M University | Normal | AL | 1.0 | 0.0 |
University of Alabama at Birmingham | Birmingham | AL | 0.0 | 0.0 |
Amridge University | Montgomery | AL | 0.0 | 0.0 |
2.2.2 选取两列的所有的行
college.iloc[:, [4,6]].head()
WOMENONLY | SATVRMID | |
---|---|---|
INSTNM | ||
Alabama A & M University | 0.0 | 424.0 |
University of Alabama at Birmingham | 0.0 | 570.0 |
Amridge University | 0.0 | NaN |
University of Alabama in Huntsville | 0.0 | 595.0 |
Alabama State University | 0.0 | 425.0 |
college.loc[:, ['WOMENONLY', 'SATVRMID']]
WOMENONLY | SATVRMID | |
---|---|---|
INSTNM | ||
Alabama A & M University | 0.0 | 424.0 |
University of Alabama at Birmingham | 0.0 | 570.0 |
... | ... | ... |
Bay Area Medical Academy - San Jose Satellite Location | NaN | NaN |
Excel Learning Center-San Antonio South | NaN | NaN |
7535 rows × 2 columns
2.2.3 选取不连续的行和列
college.iloc[[100, 200], [7, 15]]
SATMTMID | UGDS_NHPI | |
---|---|---|
INSTNM | ||
GateWay Community College | NaN | 0.0029 |
American Baptist Seminary of the West | NaN | NaN |
rows = ['GateWay Community College', 'American Baptist Seminary of the West']columns = ['SATMTMID', 'UGDS_NHPI']college.loc[rows, columns]
SATMTMID | UGDS_NHPI | |
---|---|---|
INSTNM | ||
GateWay Community College | NaN | 0.0029 |
American Baptist Seminary of the West | NaN | NaN |
2.2.4 选取一个标量值
college.iloc[5, -4]
0.401
college.loc['The University of Alabama', 'PCTFLOAN']
0.401
2.2.5 行切片选取一列
college.iloc[90:80:-2, 5]
INSTNMEmpire Beauty School-Flagstaff 0Charles of Italy Beauty College 0Central Arizona College 0University of Arizona 0Arizona State University-Tempe 0Name: RELAFFIL, dtype: int64
start = 'Empire Beauty School-Flagstaff'stop = 'Arizona State University-Tempe'college.loc[start:stop:-2, 'RELAFFIL']
INSTNMEmpire Beauty School-Flagstaff 0Charles of Italy Beauty College 0Central Arizona College 0University of Arizona 0Arizona State University-Tempe 0Name: RELAFFIL, dtype: int64
3 用整数和标签的互换
用索引方法get_loc,找到指定列的整数位置
col_start = college.columns.get_loc('UGDS_WHITE')col_end = college.columns.get_loc('UGDS_UNKN') + 1col_start, col_end
(10, 19)
college.iloc[:5, col_start:col_end]
UGDS_WHITE | UGDS_BLACK | ... | UGDS_NRA | UGDS_UNKN | |
---|---|---|---|---|---|
INSTNM | |||||
Alabama A & M University | 0.0333 | 0.9353 | ... | 0.0059 | 0.0138 |
University of Alabama at Birmingham | 0.5922 | 0.2600 | ... | 0.0179 | 0.0100 |
Amridge University | 0.2990 | 0.4192 | ... | 0.0000 | 0.2715 |
University of Alabama in Huntsville | 0.6988 | 0.1255 | ... | 0.0332 | 0.0350 |
Alabama State University | 0.0158 | 0.9208 | ... | 0.0243 | 0.0137 |
5 rows × 9 columns
获得整数行对应的标签名
row_end = college.index[4]col_start,col_end = college.columns[10],college.columns[19]col_start,col_end
('UGDS_WHITE', 'PPTUG_EF')
college.loc[:row_end, col_start:col_end]
UGDS_WHITE | UGDS_BLACK | ... | UGDS_UNKN | PPTUG_EF | |
---|---|---|---|---|---|
INSTNM | |||||
Alabama A & M University | 0.0333 | 0.9353 | ... | 0.0138 | 0.0656 |
University of Alabama at Birmingham | 0.5922 | 0.2600 | ... | 0.0100 | 0.2607 |
Amridge University | 0.2990 | 0.4192 | ... | 0.2715 | 0.4536 |
University of Alabama in Huntsville | 0.6988 | 0.1255 | ... | 0.0350 | 0.2146 |
Alabama State University | 0.0158 | 0.9208 | ... | 0.0137 | 0.0892 |
5 rows × 10 columns
4 切片操作
4.1 惰性行切片
惰性,我的理解就就是省去loc和iloc吧,但是惰性切片不能用于列,只能用于DataFrame的行和Series,也不能同时选取行和列。
Series选取数据
city = college['CITY']
city[10:20:2]
INSTNMBirmingham Southern College BirminghamConcordia College Alabama SelmaEnterprise State Community College EnterpriseFaulkner University MontgomeryNew Beginning College of Cosmetology AlbertvilleName: CITY, dtype: object
dataframe行选取
college[10:20:2]
CITY | STABBR | ... | MD_EARN_WNE_P10 | GRAD_DEBT_MDN_SUPP | |
---|---|---|---|---|---|
INSTNM | |||||
Birmingham Southern College | Birmingham | AL | ... | 44200 | 27000 |
Concordia College Alabama | Selma | AL | ... | 19900 | PrivacySuppressed |
Enterprise State Community College | Enterprise | AL | ... | 24600 | 8273 |
Faulkner University | Montgomery | AL | ... | 37200 | 22000 |
New Beginning College of Cosmetology | Albertville | AL | ... | NaN | 5500 |
5 rows × 26 columns
Series和DataFrame都可以用标签进行切片。
start = 'Mesa Community College'stop = 'Spokane Community College'college[start:stop:1500]
CITY | STABBR | ... | MD_EARN_WNE_P10 | GRAD_DEBT_MDN_SUPP | |
---|---|---|---|---|---|
INSTNM | |||||
Mesa Community College | Mesa | AZ | ... | 35200 | 8000 |
Hair Academy Inc-New Carrollton | New Carrollton | MD | ... | 15200 | 9666 |
National College of Natural Medicine | Portland | OR | ... | NaN | PrivacySuppressed |
3 rows × 26 columns
4.2 按照字母切片
需先对索引进行排序
college = college.sort_index()college.head()
CITY | STABBR | ... | MD_EARN_WNE_P10 | GRAD_DEBT_MDN_SUPP | |
---|---|---|---|---|---|
INSTNM | |||||
A & W Healthcare Educators | New Orleans | LA | ... | NaN | 19022.5 |
A T Still University of Health Sciences | Kirksville | MO | ... | 219800 | PrivacySuppressed |
ABC Beauty Academy | Garland | TX | ... | NaN | PrivacySuppressed |
ABC Beauty College Inc | Arkadelphia | AR | ... | PrivacySuppressed | 16500 |
AI Miami International University of Art and Design | Miami | FL | ... | 29900 | 31000 |
5 rows × 26 columns
选取字母顺序在‘Sp’和‘Su’之间的学校
college.loc['Sp':'Su']
CITY | STABBR | ... | MD_EARN_WNE_P10 | GRAD_DEBT_MDN_SUPP | |
---|---|---|---|---|---|
INSTNM | |||||
Spa Tech Institute-Ipswich | Ipswich | MA | ... | 21500 | 6333 |
Spa Tech Institute-Plymouth | Plymouth | MA | ... | 21500 | 6333 |
... | ... | ... | ... | ... | ... |
Styles and Profiles Beauty College | Selmer | TN | ... | PrivacySuppressed | PrivacySuppressed |
Styletrends Barber and Hairstyling Academy | Rock Hill | SC | ... | PrivacySuppressed | 9495.5 |
201 rows × 26 columns