博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
Pandas CookBook -- 04选取数据子集
阅读量:4308 次
发布时间:2019-06-06

本文共 7555 字,大约阅读时间需要 25 分钟。

简书大神SeanCheney的译作,我作了些格式调整和文章目录结构的变化,更适合自己阅读,以后翻阅是更加方便自己查找吧

import pandas as pdimport numpy as np

设定最大列数和最大行数

pd.set_option('max_columns',5 , 'max_rows', 5)

1 选取Series数据

college = pd.read_csv('data/college.csv', index_col='INSTNM')
city = college['CITY']

1.1 iloc整数选取

标量

city.iloc[3]
'Huntsville'

多行 - 整数列表

city.iloc[[10,20,30]]
INSTNMBirmingham Southern College                            BirminghamGeorge C Wallace State Community College-Hanceville    HancevilleJudson College                                             MarionName: CITY, dtype: object

多行 - 整数切片

city.iloc[4:50:10]
INSTNMAlabama State University              MontgomeryEnterprise State Community College    EnterpriseHeritage Christian University           FlorenceMarion Military Institute                 MarionReid State Technical College           EvergreenName: CITY, dtype: object

1.2 loc标签选取

标量

city.loc['Heritage Christian University']
'Florence'

多行 - 标签列表

随机选择4个标签

labels = list(np.random.choice(city.index, 4))labels
['University of St Thomas', 'Paul Mitchell the School-Woodbridge', 'San Francisco Conservatory of Music', 'Trinity Bible College']
city.loc[labels]
INSTNMUniversity of St Thomas                   Saint PaulPaul Mitchell the School-Woodbridge       WoodbridgeSan Francisco Conservatory of Music    San FranciscoTrinity Bible College                      EllendaleName: CITY, dtype: object

多行 - 标签切片

city.loc['Alabama State University':'Reid State Technical College':10]
INSTNMAlabama State University              MontgomeryEnterprise State Community College    EnterpriseHeritage Christian University           FlorenceMarion Military Institute                 MarionReid State Technical College           EvergreenName: CITY, dtype: object

也可以切片逆序选取

city.loc['Reid State Technical College':'Alabama State University':-10]
INSTNMReid State Technical College           EvergreenMarion Military Institute                 MarionHeritage Christian University           FlorenceEnterprise State Community College    EnterpriseAlabama State University              MontgomeryName: CITY, dtype: object

2 选取DataFrame数据

2.1 选取DataFrame的行

注意iloc是整数索引,loc是标签索引即可

2.2 同时选取DataFrame的行和列

2.2.1 选取前3行和前4列

college.iloc[:3, :4]
CITY STABBR HBCU MENONLY
INSTNM
Alabama A & M University Normal AL 1.0 0.0
University of Alabama at Birmingham Birmingham AL 0.0 0.0
Amridge University Montgomery AL 0.0 0.0
college.loc[:'Amridge University', :'MENONLY']
CITY STABBR HBCU MENONLY
INSTNM
Alabama A & M University Normal AL 1.0 0.0
University of Alabama at Birmingham Birmingham AL 0.0 0.0
Amridge University Montgomery AL 0.0 0.0

2.2.2 选取两列的所有的行

college.iloc[:, [4,6]].head()
WOMENONLY SATVRMID
INSTNM
Alabama A & M University 0.0 424.0
University of Alabama at Birmingham 0.0 570.0
Amridge University 0.0 NaN
University of Alabama in Huntsville 0.0 595.0
Alabama State University 0.0 425.0
college.loc[:, ['WOMENONLY', 'SATVRMID']]
WOMENONLY SATVRMID
INSTNM
Alabama A & M University 0.0 424.0
University of Alabama at Birmingham 0.0 570.0
... ... ...
Bay Area Medical Academy - San Jose Satellite Location NaN NaN
Excel Learning Center-San Antonio South NaN NaN

7535 rows × 2 columns

2.2.3 选取不连续的行和列

college.iloc[[100, 200], [7, 15]]
SATMTMID UGDS_NHPI
INSTNM
GateWay Community College NaN 0.0029
American Baptist Seminary of the West NaN NaN
rows = ['GateWay Community College', 'American Baptist Seminary of the West']columns = ['SATMTMID', 'UGDS_NHPI']college.loc[rows, columns]
SATMTMID UGDS_NHPI
INSTNM
GateWay Community College NaN 0.0029
American Baptist Seminary of the West NaN NaN

2.2.4 选取一个标量值

college.iloc[5, -4]
0.401
college.loc['The University of Alabama', 'PCTFLOAN']
0.401

2.2.5 行切片选取一列

college.iloc[90:80:-2, 5]
INSTNMEmpire Beauty School-Flagstaff     0Charles of Italy Beauty College    0Central Arizona College            0University of Arizona              0Arizona State University-Tempe     0Name: RELAFFIL, dtype: int64
start = 'Empire Beauty School-Flagstaff'stop = 'Arizona State University-Tempe'college.loc[start:stop:-2, 'RELAFFIL']
INSTNMEmpire Beauty School-Flagstaff     0Charles of Italy Beauty College    0Central Arizona College            0University of Arizona              0Arizona State University-Tempe     0Name: RELAFFIL, dtype: int64

3 用整数和标签的互换

用索引方法get_loc,找到指定列的整数位置

col_start = college.columns.get_loc('UGDS_WHITE')col_end = college.columns.get_loc('UGDS_UNKN') + 1col_start, col_end
(10, 19)
college.iloc[:5, col_start:col_end]
UGDS_WHITE UGDS_BLACK ... UGDS_NRA UGDS_UNKN
INSTNM
Alabama A & M University 0.0333 0.9353 ... 0.0059 0.0138
University of Alabama at Birmingham 0.5922 0.2600 ... 0.0179 0.0100
Amridge University 0.2990 0.4192 ... 0.0000 0.2715
University of Alabama in Huntsville 0.6988 0.1255 ... 0.0332 0.0350
Alabama State University 0.0158 0.9208 ... 0.0243 0.0137

5 rows × 9 columns

获得整数行对应的标签名

row_end = college.index[4]col_start,col_end = college.columns[10],college.columns[19]col_start,col_end
('UGDS_WHITE', 'PPTUG_EF')
college.loc[:row_end, col_start:col_end]
UGDS_WHITE UGDS_BLACK ... UGDS_UNKN PPTUG_EF
INSTNM
Alabama A & M University 0.0333 0.9353 ... 0.0138 0.0656
University of Alabama at Birmingham 0.5922 0.2600 ... 0.0100 0.2607
Amridge University 0.2990 0.4192 ... 0.2715 0.4536
University of Alabama in Huntsville 0.6988 0.1255 ... 0.0350 0.2146
Alabama State University 0.0158 0.9208 ... 0.0137 0.0892

5 rows × 10 columns

4 切片操作

4.1 惰性行切片

惰性,我的理解就就是省去loc和iloc吧,但是惰性切片不能用于列,只能用于DataFrame的行和Series,也不能同时选取行和列。

Series选取数据

city = college['CITY']
city[10:20:2]
INSTNMBirmingham Southern College              BirminghamConcordia College Alabama                     SelmaEnterprise State Community College       EnterpriseFaulkner University                      MontgomeryNew Beginning College of Cosmetology    AlbertvilleName: CITY, dtype: object

dataframe行选取

college[10:20:2]
CITY STABBR ... MD_EARN_WNE_P10 GRAD_DEBT_MDN_SUPP
INSTNM
Birmingham Southern College Birmingham AL ... 44200 27000
Concordia College Alabama Selma AL ... 19900 PrivacySuppressed
Enterprise State Community College Enterprise AL ... 24600 8273
Faulkner University Montgomery AL ... 37200 22000
New Beginning College of Cosmetology Albertville AL ... NaN 5500

5 rows × 26 columns

Series和DataFrame都可以用标签进行切片。

start = 'Mesa Community College'stop = 'Spokane Community College'college[start:stop:1500]
CITY STABBR ... MD_EARN_WNE_P10 GRAD_DEBT_MDN_SUPP
INSTNM
Mesa Community College Mesa AZ ... 35200 8000
Hair Academy Inc-New Carrollton New Carrollton MD ... 15200 9666
National College of Natural Medicine Portland OR ... NaN PrivacySuppressed

3 rows × 26 columns

4.2 按照字母切片

需先对索引进行排序

college = college.sort_index()college.head()
CITY STABBR ... MD_EARN_WNE_P10 GRAD_DEBT_MDN_SUPP
INSTNM
A & W Healthcare Educators New Orleans LA ... NaN 19022.5
A T Still University of Health Sciences Kirksville MO ... 219800 PrivacySuppressed
ABC Beauty Academy Garland TX ... NaN PrivacySuppressed
ABC Beauty College Inc Arkadelphia AR ... PrivacySuppressed 16500
AI Miami International University of Art and Design Miami FL ... 29900 31000

5 rows × 26 columns

选取字母顺序在‘Sp’和‘Su’之间的学校

college.loc['Sp':'Su']
CITY STABBR ... MD_EARN_WNE_P10 GRAD_DEBT_MDN_SUPP
INSTNM
Spa Tech Institute-Ipswich Ipswich MA ... 21500 6333
Spa Tech Institute-Plymouth Plymouth MA ... 21500 6333
... ... ... ... ... ...
Styles and Profiles Beauty College Selmer TN ... PrivacySuppressed PrivacySuppressed
Styletrends Barber and Hairstyling Academy Rock Hill SC ... PrivacySuppressed 9495.5

201 rows × 26 columns

转载于:https://www.cnblogs.com/shiyushiyu/p/9742232.html

你可能感兴趣的文章
Python基础实践-密码管理系统实例
查看>>
iOS工作笔记之NSClassFromString
查看>>
PING检查网络是否畅通
查看>>
李宁-2015年7月13日-个人文档
查看>>
C# 通过URL获取图片并显示在PictureBox上的方法
查看>>
矩阵学习摘记,欢迎指正
查看>>
2018.08.02 hdu1558 Segment set(并查集+计算几何)
查看>>
2019.03.29 NOIP训练 友好国度(点分治+容斥)
查看>>
实验1.1
查看>>
2015 多校第三场
查看>>
CSS基础
查看>>
浅蓝色设计类网站模板
查看>>
QT中的pro文件
查看>>
幂等和高并发在电商系统中的使用
查看>>
手机操控全站仪安卓版 测量员.app
查看>>
mysql 模块使用
查看>>
解决android中Layout文件下的xml文件配好后,R类中不能自动生成相应代码
查看>>
[iOS] photoKit获取所有照片
查看>>
下载安装webstrom及激活
查看>>
Android Studio 之 NDK篇
查看>>