pandas: powerful Python data analysis toolkit, Release 0.18.1
In [66]: df.mean(1)
Out[66]:
a   -0.489066
b    0.273355
c    0.008348
d    0.011457
dtype: float64
All such methods have a skipna option signaling whether to exclude missing data (True by default):
In [67]: df.sum(0, skipna=False)
Out[67]:
one           NaN
three         NaN
two     -0.765684
dtype: float64
In [68]: df.sum(axis=1, skipna=True)
Out[68]:
a   -0.978131
b    0.820066
c    0.025044
d    0.022914
dtype: float64
Combined with the broadcasting / arithmetic behavior, one can describe various statistical procedures, like
standardization (rendering data zero mean and standard deviation 1), very concisely:
In [69]: ts_stand = (df - df.mean()) / df.std()
In [70]: ts_stand.std()
Out[70]:
one      1.0
three    1.0
two      1.0
dtype: float64
In [71]: xs_stand = df.sub(df.mean(1), axis=0).div(df.std(1), axis=0)
In [72]: xs_stand.std(1)
Out[72]:
a    1.0
b    1.0
c    1.0
d    1.0
dtype: float64
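As a self-contained illustration of the two standardization idioms above, here is a minimal sketch. The frame df_demo and its values are made up for this example; only the column and index labels mirror the session.

```python
import numpy as np
import pandas as pd

# Hypothetical data; not the frame used in the session above.
df_demo = pd.DataFrame(
    {"one": [1.0, 2.0, 4.0, 7.0],
     "two": [3.0, 1.0, 0.0, 2.0],
     "three": [5.0, 8.0, 6.0, 9.0]},
    index=list("abcd"),
)

# Column-wise: mean() and std() return Series indexed by column label,
# so plain arithmetic aligns them with the columns of every row.
ts_stand = (df_demo - df_demo.mean()) / df_demo.std()

# Row-wise: the per-row statistics are indexed by row label, so axis=0
# tells sub()/div() to align them on the index instead of the columns.
xs_stand = df_demo.sub(df_demo.mean(axis=1), axis=0).div(df_demo.std(axis=1), axis=0)

col_means = ts_stand.mean()
col_stds = ts_stand.std()
row_stds = xs_stand.std(axis=1)
```

After this runs, every column of ts_stand should have mean 0 and standard deviation 1, and every row of xs_stand should have standard deviation 1, up to floating point error.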
Note that methods like cumsum() and cumprod() preserve the location of NA values:
In [73]: df.cumsum()
Out[73]:
        one     three       two
a -0.626544       NaN -0.351587
b -0.765438 -0.177289  0.784662
c -0.753821  0.284925  0.335874
d       NaN  1.409398 -0.765684
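A small sketch of what "preserving the location of NA values" means in practice. The Series s_demo is a made-up example; the skipna=False call is shown only for contrast.

```python
import numpy as np
import pandas as pd

# Hypothetical input with one missing value in the middle.
s_demo = pd.Series([1.0, np.nan, 2.0, 3.0])

# Default (skipna=True): the NaN stays where it was, and the running
# total continues from the last valid value.
running = s_demo.cumsum()

# skipna=False: once a NaN is met, every later entry is NaN as well.
strict = s_demo.cumsum(skipna=False)
```

With the default, the result is 1.0, NaN, 3.0, 6.0; with skipna=False, everything from the NaN onward is NaN.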
Here is a quick reference summary table of common functions. Each also takes an optional level parameter which
applies only if the object has a hierarchical index.
Function    Description
count       Number of non-null observations
sum         Sum of values
mean        Mean of values
mad         Mean absolute deviation
median      Arithmetic median of values
min         Minimum
max         Maximum
mode        Mode
abs         Absolute Value
prod        Product of values
std         Bessel-corrected sample standard deviation
var         Unbiased variance
sem         Standard error of the mean
skew        Sample skewness (3rd moment)
kurt        Sample kurtosis (4th moment)
quantile    Sample quantile (value at %)
cumsum      Cumulative sum
cumprod     Cumulative product
cummax      Cumulative maximum
cummin      Cumulative minimum
Note that by chance some NumPy methods, like mean, std, and sum, will exclude NAs on Series input by default:
In [74]: np.mean(df['one'])
Out[74]: -0.25127365175839511
In [75]: np.mean(df['one'].values)
Out[75]: nan
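The difference between the two calls above can be reproduced on a tiny Series. This is a sketch with made-up data; np.nanmean is NumPy's own NaN-skipping reduction and is included only for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value.
s_demo = pd.Series([1.0, np.nan, 3.0])

# On a Series, np.mean defers to Series.mean, which skips NaN by default.
series_mean = np.mean(s_demo)

# On the underlying ndarray, the NaN propagates into the result.
array_mean = np.mean(s_demo.values)

# NumPy's explicit NaN-aware variant gives the same answer as the Series call.
nan_aware_mean = np.nanmean(s_demo.values)
```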
Series also has a method nunique() which will return the number of unique non-null values:
In [76]: series = pd.Series(np.random.randn(500))
In [77]: series[20:500] = np.nan
In [78]: series[10:20] = 5
In [79]: series.nunique()
Out[79]: 11
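A deterministic variant of the same idea, using made-up values. The dropna keyword used in the second call is an assumption about the installed pandas version; the default call is the behavior described above.

```python
import numpy as np
import pandas as pd

# Hypothetical Series: three distinct non-null values plus two NaNs.
s_demo = pd.Series([5, 5, 1, 2, np.nan, np.nan])

# Default: NaN is not counted as a value.
n_non_null = s_demo.nunique()

# Assumes a pandas version where nunique() accepts dropna;
# with dropna=False, NaN counts as one additional distinct value.
n_with_nan = s_demo.nunique(dropna=False)
```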
10.5.1 Summarizing data: describe
There is a convenient describe() function which computes a variety of summary statistics about a Series or the
columns of a DataFrame (excluding NAs of course):
In [80]: series = pd.Series(np.random.randn(1000))
In [81]: series[::2] = np.nan
In [82]: series.describe()
Out[82]:
count    500.000000
mean      -0.039663
std        1.069371
min       -3.463789
25%            NaN
50%            NaN
75%            NaN
max       3.120271
dtype: float64
In [83]: frame = pd.DataFrame(np.random.randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
In [84]: frame.ix[::2] = np.nan
In [85]: frame.describe()
Out[85]:
                a           b           c           d           e
count  500.000000  500.000000  500.000000  500.000000  500.000000
mean     0.000954   -0.044014    0.075936   -0.003679    0.020751
std      1.005133    0.974882    0.967432    1.004732    0.963812
min     -3.010899   -2.782760   -3.401252   -2.944925   -3.794127
25%           NaN         NaN         NaN         NaN         NaN
50%           NaN         NaN         NaN         NaN         NaN
75%           NaN         NaN         NaN         NaN         NaN
max      3.007143    2.627688    2.702490    2.850852    3.072117
You can select specific percentiles to include in the output:
In [86]: series.describe(percentiles=[.05, .25, .75, .95])
Out[86]:
count    500.000000
mean      -0.039663
std        1.069371
min       -3.463789
5%              NaN
25%             NaN
50%             NaN
75%             NaN
95%             NaN
max        3.120271
dtype: float64
By default, the median is always included.
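The percentiles argument can be tried on data with no missing values, where the rows are easy to check by hand. This sketch uses a made-up Series of the integers 1 to 100 and passes the median explicitly, so it does not depend on whether a given pandas version adds it automatically.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the values 1.0 .. 100.0, no NaNs.
s_demo = pd.Series(np.arange(1, 101, dtype=float))

# Request the 5th, 50th and 95th percentiles explicitly.
summary = s_demo.describe(percentiles=[.05, .5, .95])

row_labels = list(summary.index)
median = summary["50%"]
```

The requested percentiles appear as rows labelled 5%, 50% and 95%, between min and max.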
For a non-numerical Series object, describe() will give a simple summary of the number of unique values and
most frequently occurring values:
In [87]: s = pd.Series(['a', 'a', 'b', 'b', 'a', 'a', np.nan, 'c', 'd', 'a'])
In [88]: s.describe()
Out[88]:
count     9
unique    4
top       a
freq      5
dtype: object
Note that on a mixed-type DataFrame object, describe() will restrict the summary to include only numerical
columns or, if none are, only categorical columns:
In [89]: frame = pd.DataFrame({'a': ['Yes', 'Yes', 'No', 'No'], 'b': range(4)})
In [90]: frame.describe()
Out[90]:
              b
count  4.000000
mean   1.500000
std    1.290994
min    0.000000
25%    0.750000
50%    1.500000
75%    2.250000
max    3.000000
This behaviour can be controlled by providing a list of types as include/exclude arguments. The special value
all can also be used:
In [91]: frame.describe(include=['object'])
Out[91]:
         a
count    4
unique   2
top     No
freq     2
In [92]: frame.describe(include=['number'])
Out[92]:
              b
count  4.000000
mean   1.500000
std    1.290994
min    0.000000
25%    0.750000
50%    1.500000
75%    2.250000
max    3.000000
In [93]: frame.describe(include='all')
Out[93]:
          a         b
count     4  4.000000
unique    2       NaN
top      No       NaN
freq      2       NaN
mean    NaN  1.500000
std     NaN  1.290994
min     NaN  0.000000
25%     NaN  0.750000
50%     NaN  1.500000
75%     NaN  2.250000
max     NaN  3.000000
That feature relies on select_dtypes. Refer to its documentation for details about accepted inputs.
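To make the link between describe() and select_dtypes concrete, here is a minimal sketch. The frame frame_demo and its extra column 'c' are made up for this example.

```python
import pandas as pd

# Hypothetical mixed-type frame: one text column, two numeric columns.
frame_demo = pd.DataFrame({
    "a": ["Yes", "Yes", "No", "No"],
    "b": range(4),
    "c": [0.5, 1.5, 2.5, 3.5],
})

# select_dtypes picks columns by dtype ...
numeric_cols = list(frame_demo.select_dtypes(include=["number"]).columns)

# ... and describe() summarizes the same column sets for the same selectors.
described_numeric = list(frame_demo.describe(include=["number"]).columns)
described_other = list(frame_demo.describe(exclude=["number"]).columns)
```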
10.5.2 Index of Min/Max Values
The idxmin() and idxmax() functions on Series and DataFrame compute the index labels with the minimum and
maximum corresponding values:
In [94]: s1 = pd.Series(np.random.randn(5))
In [95]: s1
Out[95]:
0   -0.872725
1    1.522411
2    0.080594
3   -1.676067
4    0.435804
dtype: float64
In [96]: s1.idxmin(), s1.idxmax()
Out[96]: (3, 1)
In [97]: df1 = pd.DataFrame(np.random.randn(5,3), columns=['A','B','C'])
In [98]: df1
Out[98]:
          A         B         C
0  0.445734 -1.649461  0.169660
1  1.246181  0.131682 -2.001988
2 -1.273023  0.870502  0.214583
3  0.088452 -0.173364  1.207466
4  0.546121  0.409515 -0.310515
In [99]: df1.idxmin(axis=0)
Out[99]:
A    2
B    0
C    1
dtype: int64
In [100]: df1.idxmax(axis=1)
Out[100]:
0    A
1    A
2    B
3    C
4    A
dtype: object
When there are multiple rows (or columns) matching the minimum or maximum value, idxmin() and idxmax()
return the first matching index:
In [101]: df3 = pd.DataFrame([2, 1, 1, 3, np.nan], columns=['A'], index=list('edcba'))
In [102]: df3
Out[102]:
     A
e  2.0
d  1.0
c  1.0
b  3.0
a  NaN
In [103]: df3['A'].idxmin()
Out[103]: 'd'
Note: idxmin and idxmax are called argmin and argmax in NumPy.
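The relationship to NumPy can be sketched as follows: idxmin() returns an index label, while the NumPy functions return an integer position. The example rebuilds a frame like df3 above; np.nanargmin is used here (rather than np.argmin) because the column contains a NaN.

```python
import numpy as np
import pandas as pd

# Same shape of data as df3 above: a tie at value 1 and a trailing NaN.
df3_demo = pd.DataFrame([2, 1, 1, 3, np.nan], columns=["A"], index=list("edcba"))

# pandas: label of the first minimum, ignoring NaN.
first_min_label = df3_demo["A"].idxmin()

# NumPy: integer position of the first minimum, ignoring NaN.
first_min_pos = int(np.nanargmin(df3_demo["A"].values))

# Mapping the position back through the index gives the same label.
label_from_pos = df3_demo.index[first_min_pos]
```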
10.5.3 Value counts (histogramming) / Mode
The value_counts() Series method and top-level function compute a histogram of a 1D array of values. It can
also be used as a function on regular arrays:
In [104]: data = np.random.randint(0, 7, size=50)
In [105]: data
Out[105]:
array([5, 3, 2, 2, 1, 4, 0, 4, 0, 2, 0, 6, 4, 1, 6, 3, 3, 0, 2, 1, 0, 5, 5,
3, 6, 1, 5, 6, 2, 0, 0, 6, 3, 3, 5, 0, 4, 3, 3, 3, 0, 6, 1, 3, 5, 5,
0, 4, 0, 6])
In [106]: s = pd.Series(data)
In [107]: s.value_counts()
Out[107]:
0    11
3    10
6     7
5     7
4     5
2     5
1     5
dtype: int64
In [108]: pd.value_counts(data)
Out[108]:
0    11
3    10
6     7
5     7
4     5
2     5
1     5
dtype: int64
Similarly, you can get the most frequently occurring value(s) (the mode) of the values in a Series or DataFrame:
In [109]: s5 = pd.Series([1, 1, 3, 3, 3, 5, 5, 7, 7, 7])
In [110]: s5.mode()
Out[110]:
0    3
1    7
dtype: int64
In [111]: df5 = pd.DataFrame({"A": np.random.randint(0, 7, size=50),
   .....:                     "B": np.random.randint(-10, 15, size=50)})
   .....:
In [112]: df5.mode()
Out[112]:
   A  B
0  1 -5
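The mode is simply the set of values whose count in value_counts() is the largest. A minimal sketch of that relationship, reusing the values of s5 from above:

```python
import pandas as pd

s5_demo = pd.Series([1, 1, 3, 3, 3, 5, 5, 7, 7, 7])

# Frequency of each distinct value, most frequent first.
counts = s5_demo.value_counts()

# mode() returns every value that reaches the highest count.
modes = s5_demo.mode().tolist()

# The same answer derived by hand from the counts.
top_count = counts.max()
modes_from_counts = sorted(counts[counts == top_count].index)
```

Both 3 and 7 occur three times, so mode() returns two values rather than one.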
10.5.4 Discretization and quantiling
Continuous values can be discretized using the cut() (bins based on values) and qcut() (bins based on sample
quantiles) functions:
In [113]: arr = np.random.randn(20)
In [114]: factor = pd.cut(arr, 4)
In [115]: factor
Out[115]:
[(-0.645, 0.336], (-2.61, -1.626], (-1.626, -0.645], (-1.626, -0.645], (-1.626, -0.645], ..., (0.336, 1.316], (0.336, 1.316], (0.336, 1.316], (0.336, 1.316], (-2.61, -1.626]]
Length: 20
Categories (4, object): [(-2.61, -1.626] < (-1.626, -0.645] < (-0.645, 0.336] < (0.336, 1.316]]
In [116]: factor = pd.cut(arr, [-5, -1, 0, 1, 5])
In [117]: factor
Out[117]:
[(-1, 0], (-5, -1], (-1, 0], (-5, -1], (-1, 0], ..., (0, 1], (1, 5], (0, 1], (0, 1], (-5, -1]]
Length: 20
Categories (4, object): [(-5, -1] < (-1, 0] < (0, 1] < (1, 5]]
qcut() computes sample quantiles. For example, we could slice up some normally distributed data into equal-size
quartiles like so:
In [118]: arr = np.random.randn(30)
In [119]: factor = pd.qcut(arr, [0, .25, .5, .75, 1])
In [120]: factor
Out[120]:
[(-0.139, 1.00736], (1.00736, 1.976], (1.00736, 1.976], [-1.0705, -0.439], [-1.0705, -0.439], ..., (1.00736, 1.976], [-1.0705, -0.439], (-0.439, -0.139], (-0.439, -0.139], (-0.439, -0.139]]
Length: 30
Categories (4, object): [[-1.0705, -0.439] < (-0.439, -0.139] < (-0.139, 1.00736] < (1.00736, 1.976]]
In [121]: pd.value_counts(factor)
Out[121]:
(1.00736, 1.976]     8
[-1.0705, -0.439]    8
(-0.139, 1.00736]    7
(-0.439, -0.139]     7
dtype: int64
We can also pass infinite values to define the bins:
In [122]: arr = np.random.randn(20)
In [123]: factor = pd.cut(arr, [-np.inf, 0, np.inf])
In [124]: factor
Out[124]:
[(-inf, 0], (0, inf], (0, inf], (0, inf], (-inf, 0], ..., (-inf, 0], (0, inf], (-inf, 0], (-inf, 0], (0, inf]]
Length: 20
Categories (2, object): [(-inf, 0] < (0, inf]]
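The difference between cut() and qcut() is easier to see on deterministic input. The following is a sketch with made-up values; the labels argument and the label strings are choices made for this example only.

```python
import numpy as np
import pandas as pd

# cut(): bin edges are given as values. Intervals are closed on the right
# by default, so 0.0 falls into (-inf, 0].
values = np.array([-2.5, -0.3, 0.0, 0.4, 3.1])
signs = pd.cut(values, [-np.inf, 0, np.inf], labels=["non-positive", "positive"])
sign_list = list(signs)

# qcut(): bin edges are computed from sample quantiles, so each bin
# receives (roughly) the same number of observations.
data = np.arange(1, 21)
quartiles = pd.qcut(data, [0, .25, .5, .75, 1])
quartile_counts = pd.Series(quartiles).value_counts()
```

For the twenty evenly spaced values, each quartile bin should contain exactly five observations.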
10.6 Function application
To apply your own or another library’s functions to pandas objects, you should be aware of the three methods below.
The appropriate method to use depends on whether your function expects to operate on an entire DataFrame or
Series, row- or column-wise, or elementwise.
1. Tablewise Function Application: pipe()
2. Row or Column-wise Function Application: apply()
3. Elementwise function application: applymap()
10.6.1 Tablewise Function Application
New in version 0.16.2.
DataFrames and Series can of course just be passed into functions. However, if the function needs to be called
in a chain, consider using the pipe() method. Compare the following
# f, g, and h are functions taking and returning ``DataFrames``
>>> f(g(h(df), arg1=1), arg2=2, arg3=3)
with the equivalent
>>> (df.pipe(h)
       .pipe(g, arg1=1)
       .pipe(f, arg2=2, arg3=3)
    )
Pandas encourages the second style, which is known as method chaining. pipe makes it easy to use your own or
another library’s functions in method chains, alongside pandas’ methods.
In the example above, the functions f, g, and h each expected the DataFrame as the first positional argument. What
if the function you wish to apply takes its data as, say, the second argument? In this case, provide pipe with a tuple
of (callable, data_keyword). .pipe will route the DataFrame to the argument specified in the tuple.
For example, we can fit a regression using statsmodels. Their API expects a formula first and a DataFrame as the
second argument, data. We pass in the function, keyword pair (sm.poisson, 'data') to pipe:
In [125]: import statsmodels.formula.api as sm
In [126]: bb = pd.read_csv('data/baseball.csv', index_col='id')
In [127]: (bb.query('h > 0')
   .....:    .assign(ln_h = lambda df: np.log(df.h))
   .....:    .pipe((sm.poisson, 'data'), 'hr ~ ln_h + year + g + C(lg)')
   .....:    .fit()
   .....:    .summary()
   .....: )
   .....:
Optimization terminated successfully.
         Current function value: 2.116284
         Iterations 24
Out[127]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                          Poisson Regression Results
==============================================================================
Dep. Variable:                     hr   No. Observations:                   68
Model:                        Poisson   Df Residuals:                       63
Method:                           MLE   Df Model:                            4
Date:                Tue, 03 May 2016   Pseudo R-squ.:                  0.6878
Time:                        09:03:28   Log-Likelihood:                -143.91
converged:                       True   LL-Null:                       -460.91
                                        LLR p-value:                6.774e-136
===============================================================================
                  coef    std err          z      P>|z|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------
Intercept   -1267.3636    457.867     -2.768      0.006     -2164.767  -369.960
C(lg)[T.NL]    -0.2057      0.101     -2.044      0.041        -0.403    -0.008
ln_h            0.9280      0.191      4.866      0.000         0.554     1.302
year            0.6301      0.228      2.762      0.006         0.183     1.077
g               0.0099      0.004      2.754      0.006         0.003     0.017
===============================================================================
"""
The pipe method is inspired by unix pipes and more recently dplyr and magrittr, which have introduced the popular
(%>%) (read pipe) operator for R. The implementation of pipe here is quite clean and feels right at home in python.
We encourage you to view the source code (pd.DataFrame.pipe?? in IPython).
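The (callable, data_keyword) form of pipe can also be tried without statsmodels. In the following sketch, add_offset and scale are hypothetical helper functions written for this example; scale deliberately takes its data as a keyword argument named data rather than as the first positional argument.

```python
import pandas as pd

def add_offset(df, offset):
    # Data arrives as the first positional argument (the usual case).
    return df + offset

def scale(factor, data):
    # Data arrives as the second argument, under the keyword 'data'.
    return data * factor

df_demo = pd.DataFrame({"x": [1, 2, 3]})

# Chained form: the tuple tells pipe to pass the frame as data=...
chained = (df_demo.pipe(add_offset, 1)
                  .pipe((scale, "data"), 10))

# Equivalent nested calls, for comparison.
nested = scale(10, data=add_offset(df_demo, 1))
```

Both forms produce the same frame; the chained version reads in the order the operations are applied.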
10.6.2 Row or Column-wise Function Application
Arbitrary functions can be applied along the axes of a DataFrame or Panel using the apply() method, which, like
the descriptive statistics methods, takes an optional axis argument:
In [128]: df.apply(np.mean)
Out[128]:
one     -0.251274
three    0.469799
two     -0.191421
dtype: float64
In [129]: df.apply(np.mean, axis=1)
Out[129]:
a   -0.489066
b    0.273355
c    0.008348
d    0.011457
dtype: float64
In [130]: df.apply(lambda x: x.max() - x.min())
Out[130]:
one      0.638161
three    1.301762
two      2.237808
dtype: float64
In [131]: df.apply(np.cumsum)
Out[131]:
        one     three       two
a -0.626544       NaN -0.351587
b -0.765438 -0.177289  0.784662
c -0.753821  0.284925  0.335874
d       NaN  1.409398 -0.765684
In [132]: df.apply(np.exp)
Out[132]:
        one     three       two
a  0.534436       NaN  0.703570
b  0.870320  0.837537  3.115063
c  1.011685  1.587586  0.638401
d       NaN  3.078592  0.332353
Depending on the return type of the function passed to apply(), the result will either be of lower dimension or the
same dimension.
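The two cases can be seen side by side in a short sketch. The frame df_demo is made up for this example: a function that reduces each column to a scalar yields a Series, while a function that returns a Series of the same length yields a DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with perfect squares, so the results are easy to read.
df_demo = pd.DataFrame(
    {"one": [1.0, 4.0, 9.0], "two": [16.0, 25.0, 36.0]},
    index=list("abc"),
)

# Reducing function: one scalar per column -> lower dimension (a Series).
col_range = df_demo.apply(lambda col: col.max() - col.min())

# Elementwise-style function: one value per input value -> same dimension.
roots = df_demo.apply(np.sqrt)
```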
apply() combined with some cleverness can be used to answer many questions about a data set. For example,
suppose we wanted to extract the date where the maximum value for each column occurred:
In [133]: tsdf = pd.DataFrame(np.random.randn(1000, 3), columns=['A', 'B', 'C'],
   .....:                     index=pd.date_range('1/1/2000', periods=1000))
   .....:
In [134]: tsdf.apply(lambda x: x.idxmax())
Out[134]:
A   2001-04-27
B   2002-06-02
C   2000-04-02
dtype: datetime64[ns]
You may also pass additional arguments and keyword arguments to the apply() method. For instance, consider the
following function you would like to apply:
def subtract_and_divide(x, sub, divide=1):
    return (x - sub) / divide
You may then apply this function as follows:
df.apply(subtract_and_divide, args=(5,), divide=3)
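A runnable version of this call, on a small made-up frame, shows how the extra arguments are routed: args supplies the positional argument sub, and the remaining keyword is forwarded as divide.

```python
import pandas as pd

def subtract_and_divide(x, sub, divide=1):
    # x is one column of the frame (a Series) on each call.
    return (x - sub) / divide

# Hypothetical data chosen so that the results are whole numbers.
df_demo = pd.DataFrame({"one": [5.0, 8.0], "two": [11.0, 14.0]})

# Each column c becomes (c - 5) / 3.
result = df_demo.apply(subtract_and_divide, args=(5,), divide=3)
```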
Another useful feature is the ability to pass Series methods to carry out some Series operation on each column or row:
In [135]: tsdf
Out[135]:
                   A         B         C
2000-01-01  1.796883 -0.930690  3.542846
2000-01-02 -1.242888 -0.695279 -1.000884
2000-01-03 -0.720299  0.546303 -0.082042
2000-01-04       NaN       NaN       NaN
2000-01-05       NaN       NaN       NaN
2000-01-06       NaN       NaN       NaN
2000-01-07       NaN       NaN       NaN
2000-01-08 -0.527402  0.933507  0.129646
2000-01-09 -0.338903 -1.265452 -1.969004
2000-01-10  0.532566  0.341548  0.150493
In [136]: tsdf.apply(pd.Series.interpolate)
Out[136]:
                   A         B         C
2000-01-01  1.796883 -0.930690  3.542846
2000-01-02 -1.242888 -0.695279 -1.000884
2000-01-03 -0.720299  0.546303 -0.082042
2000-01-04 -0.681720  0.623743 -0.039704
2000-01-05 -0.643140  0.701184  0.002633
2000-01-06 -0.604561  0.778625  0.044971
2000-01-07 -0.565982  0.856066  0.087309