541
12.8Prediction
8
Recallthethirdexam/finalexamexample.
Weexaminedthescatterplotandshowedthatthecorrelationcoefficientissignificant.Wefoundtheequa-
tionofthebestfitlineforthefinalexamgradeasafunctionofthegradeonthethirdexam. Wecannow
usetheleastsquaresregressionlineforprediction.
Supposeyouwanttoestimate,orpredict,thefinalexamscoreofstatisticsstudentswhoreceived73onthe
thirdexam. Theexamscores(x-values)rangefrom65to75. Since73isbetweenthex-values65and75,
substitutex=73intotheequation.Then:
^
y
173.51+4.83(73)=179.08
(12.8)
Wepredictthatstatisticstudentswhoearnagradeof73onthethirdexamwillearnagradeof179.08on
thefinalexam,onaverage.
Example12.11
Recallthethirdexam/finalexamexample.
Problem1
Whatwouldyoupredictthefinalexamscoretobeforastudentwhoscoreda66onthethird
exam?
Solution
145.27
Problem2
(Solutiononp.579.)
Whatwouldyoupredictthefinalexamscoretobeforastudentwhoscoreda90onthethird
exam?
**WithcontributionsfromRobertaBloom
12.9Outliers
9
Insomedatasets, there arevalues(observeddatapoints)called outliersOutliersareobserveddata
pointsthatarefarfromtheleastsquaresline.Theyhavelarge"errors",wherethe"error"orresidualisthe
verticaldistancefromthelinetothepoint.
Outliersneedtobeexaminedclosely.Sometimes,forsomereasonoranother,theyshouldnotbeincluded
intheanalysisofthedata.Itispossiblethatanoutlierisaresultoferroneousdata.Othertimes,anoutlier
mayholdvaluableinformationaboutthepopulationunderstudyandshouldremainincludedinthedata.
Thekeyistocarefullyexaminewhatcausesadatapointtobeanoutlier.
Besidesoutliers,asamplemaycontainoneorafewpointsthatarecalled influentialpoints. Influential
pointsareobserveddatapointsthatarefarfromtheotherobserveddatapointsinthehorizontaldirection.
Thesepointsmayhaveabigeffectontheslopeoftheregressionline. Tobegintoidentifyaninfluential
point,youcanremoveitfromthedatasetandseeiftheslopeoftheregressionlineischangedsignificantly.
8
Thiscontentisavailableonlineat<http://cnx.org/content/m17095/1.8/>.
9Thiscontentisavailableonlineat<http://cnx.org/content/m17094/1.14/>.
AvailableforfreeatConnexions<http://cnx.org/content/col10522/1.40>
Pdf to powerpoint converter online - C# Create PDF from PowerPoint Library to convert pptx, ppt to PDF in C#.net, ASP.NET MVC, WinForms, WPF
Online C# Tutorial for Creating PDF from Microsoft PowerPoint Presentation
converter pdf to powerpoint; convert pdf to ppt online without email
Pdf to powerpoint converter online - VB.NET Create PDF from PowerPoint Library to convert pptx, ppt to PDF in vb.net, ASP.NET MVC, WinForms, WPF
VB.NET Tutorial for Export PDF file from Microsoft Office PowerPoint
pdf to powerpoint slide; convert pdf to ppt
542
CHAPTER12. LINEARREGRESSIONANDCORRELATION
Computersand manycalculatorscanbe usedtoidentifyoutliersfromthe data. . Computeroutputfor
regressionanalysiswilloftenidentifybothoutliersandinfluentialpointssothatyoucanexaminethem.
IdentifyingOutliers
Wecouldguessatoutliersbylookingatagraphofthescatterplotandbestfitline.Howeverwewouldlike
someguidelineastohowfarawayapointneedstobeinordertobeconsideredanoutlier. Asaroughrule
ofthumb,wecanflaganypointthatislocatedfurtherthantwostandarddeviationsaboveorbelowthe
bestfitlineasanoutlier.Thestandarddeviationusedisthestandarddeviationoftheresidualsorerrors.
Wecandothisvisuallyinthescatterplotbydrawinganextrapairoflinesthataretwostandarddeviations
aboveandbelowthebestfitline. Anydatapointsthatareoutsidethisextrapairoflinesareflaggedas
potentialoutliers.Orwecandothisnumericallybycalculatingeachresidualandcomparingittotwicethe
standarddeviation. OntheTI-83,83+,or84+,thegraphicalapproachiseasier. Thegraphicalprocedure
isshownfirst,followedbythenumericalcalculations. Youwouldgenerallyonlyneedtouseoneofthese
methods.
Example12.12
Inthethirdexam/finalexamexample,youcandetermineifthereisanoutlierornot. Ifthereis
anoutlier,asanexercise,deleteitandfittheremainingdatatoanewline. Forthisexample,the
newlineoughttofittheremainingdatabetter. ThismeanstheSSEshouldbesmallerandthe
correlationcoefficientoughttobecloserto1or-1.
Solution
GraphicalIdentificationofOutliers
WiththeTI-83,83+,84+graphingcalculators,itiseasytoidentifytheoutliergraphicallyandvisu-
ally. Ifweweretomeasuretheverticaldistancefromanydatapointtothecorrespondingpoint
onthelineofbestfitandthatdistancewasequalto2sorfarther,thenwewouldconsiderthedata
pointtobe"toofar"fromthelineofbestfit. Weneedtofindandgraphthelinesthataretwo
standarddeviationsbelowandabovetheregressionline. Anypointsthatareoutsidethesetwo
linesareoutliers.WewillcalltheselinesY2andY3:
Aswedidwiththe equationoftheregressionlineandthecorrelationcoefficient,we willuse
technologyto calculatethisstandarddeviationforus. . UsingtheLinRegTTestwiththisdata,
scrolldownthroughtheoutputscreenstofinds=16.412
LineY2=-173.5+4.83x-2(16.4)andlineY3=-173.5+4.83x+2(16.4)
where
^
y
=-173.5+4.83x is the line of best t fit. . Y2 2 and d Y3 3 have the same e slope as s the line of
bestfit.
GraphthescatterplotwiththebestfitlineinequationY1,thenenterthetwoextralinesasY2and
Y3inthe"Y="equationeditorandpressZOOM9.Youwillfindthattheonlydatapointthatisnot
betweenlinesY2andY3isthepointx=65,y=175.Onthecalculatorscreenitisjustbarelyoutside
theselines.Theoutlieristhestudentwhohadagradeof65onthethirdexamand175onthefinal
exam;thispointisfurtherthan2standarddeviationsawayfromthebestfitline.
Sometimesapointissoclosetothelinesusedtoflagoutliersonthegraphthatitisdifficulttotell
ifthepointisbetweenoroutsidethelines. Onacomputer,enlargingthegraphmayhelp;ona
smallcalculatorscreen,zoominginmaymakethegraphclearer. Notethatwhenthegraphdoes
notgiveaclearenoughpicture,youcanusethenumericalcomparisonstoidentifyoutliers.
AvailableforfreeatConnexions<http://cnx.org/content/col10522/1.40>
Online Convert PowerPoint to PDF file. Best free online export
Online Powerpoint to PDF Converter. Download Free Trial. Convert a PPTX/PPT File to PDF. Just upload your file by clicking on the blue
pdf to ppt converter online; convert pdf back to powerpoint
XDoc.Converter for .NET, Support Documents and Images Conversion
Convert PowerPoint to ODP. Convert to Tiff. Convert Word, Excel and PDF to image. Next Steps. Download and try Converter for .NET with online support. See Pricing
how to convert pdf to powerpoint on; convert pdf file to ppt online
543
Figure12.16
NumericalIdentificationofOutliers
Inthe table below, , the e first two columns s are e the third examandfinalexamdata. . The e third
columnshowsthepredicted
^
y
valuescalculatedfromthelineofbestfit:
^
y
=-173.5+4.83x. The
residuals,orerrors, havebeencalculatedinthefourthcolumnofthetable: : observedyvalue
predictedyvalue=y
^
y
.
sisthestandarddeviationofallthey
^
y=
evalueswheren=thetotalnumberofdatapoints.If
eachresidualiscalculatedandsquared,andtheresultsareadded,wegettheSSE.Thestandard
deviationoftheresidualsiscalculatedfromtheSSEas:
s=
q
SSE
2
Ratherthancalculatethevalueofsourselves,wecanfindsusingthecomputerorcalculator.For
thisexample,thecalculatorfunctionLinRegTTestfound16.4asthestandarddeviationofthe
residuals35;-17;16;-6;-19;9;3;-1;-10;-9;-1.
AvailableforfreeatConnexions<http://cnx.org/content/col10522/1.40>
C#: How to Use SDK to Convert Document and Image Using XDoc.
You may use our converter SDK to easily convert PDF, Word, Excel, PowerPoint, Tiff, and Dicom files to raster images like Jpeg, Png, Bmp and Gif.
convert pdf slides to powerpoint online; how to convert pdf slides to powerpoint presentation
RasterEdge XDoc.PowerPoint for .NET - SDK for PowerPoint Document
Add image to specified position on PowerPoint page. Next Steps. Download Free Trial Download and try PDF for .NET with online support. See Pricing
how to convert pdf to powerpoint; pdf to powerpoint converter
544
CHAPTER12. LINEARREGRESSIONANDCORRELATION
x
y
^
y
y
^
y
65
175
140
175 140=35
67
133
150
133 15017
71
185
169
185 169=16
71
163
169
163 1696
66
126
145
126 14519
75
198
189
198 189=9
67
153
150
153 150=3
70
163
164
163 1641
71
159
169
159 16910
69
151
160
151 1609
69
159
160
159 1601
Table12.1
Wearelookingforalldatapointsforwhichtheresidualisgreaterthan2s=2(16.4)=32.8orlessthan
-32.8.Comparethesevaluestotheresidualsincolumn4ofthetable.Theonlysuchdatapointis
thestudentwhohadagradeof65onthethirdexamand175onthefinalexam;theresidualfor
thisstudentis35.
Howdoestheoutlieraffectthebestfitline?
Numericallyandgraphically,wehaveidentifiedthepoint(65,175)asanoutlier. Weshouldre-
examinethedataforthispointtoseeifthereareanyproblemswiththedata. Ifthereisanerror
weshouldfixtheerrorifpossible,ordeletethedata. Ifthedataiscorrect,wewouldleaveitin
thedataset. Forthisproblem,wewillsupposethatweexaminedthedataandfoundthatthis
outlierdatawasanerror. Thereforewewillcontinueonanddeletetheoutlier,sothatwecan
explorehowitaffectstheresults,asalearningexperience.
Computeanewbest-fitlineandcorrelationcoefficientusingthe10remainingpoints:
OntheTI-83,TI-83+,TI-84+calculators,deletetheoutlierfromL1andL2.UsingtheLinRegTTest,
thenewlineofbestfitandthecorrelationcoefficientare:
^
y=
355.19+7.39xandr=0.9121
Thenewlinewith0.9121isastrongercorrelationthantheoriginal(r=0.6631)because=
0.9121iscloserto1. Thismeansthatthenewlineisabetterfittothe10remainingdatavalues.
Thelinecanbetterpredictthefinalexamscoregiventhethirdexamscore.
NumericalIdentificationofOutliers:CalculatingsandFindingOutliersManually
IfyoudonothavethefunctionLinRegTTest, thenyoucancalculatethe outlierinthefirst exampleby
doingthefollowing.
First,squareeachjy
^
y
j(SeetheTABLEabove):
AvailableforfreeatConnexions<http://cnx.org/content/col10522/1.40>
VB.NET PDF - Convert PDF Online with VB.NET HTML5 PDF Viewer
All Formats. XDoc.HTML5 Viewer. XDoc.Windows Viewer. XDoc.Converter. View & Process. Adobe PDF. XDoc.PDF. Scanning. XDoc.Word. XDoc.Excel. XDoc.PowerPoint. Barcoding
convert pdf to ppt; conversion of pdf into ppt
VB.NET PDF- View PDF Online with VB.NET HTML5 PDF Viewer
All Formats. XDoc.HTML5 Viewer. XDoc.Windows Viewer. XDoc.Converter. View & Process. Adobe PDF. XDoc.PDF. Scanning. XDoc.Word. XDoc.Excel. XDoc.PowerPoint. Barcoding
convert pdf file to powerpoint; how to convert pdf to powerpoint slides
545
Thesquaresare35
2
;17
2
;16
2
;6
2
;19
2
;9
2
;3
2
;1
2
;10
2
;9
2
;1
2
Then,add(sum)allthejy
^
y
jsquaredtermsusingtheformula
11
S
i=1
jy
i
^
y
i
j
!
2
=
11
S
i=1
e
i
2
(Recallthaty
i
^
y
i
=e
i
.)
=35
2
+17
2
+16
2
+6
2
+19
2
+9
2
+3
2
+1
2
+10
2
+9
2
+1
2
=2440=SSE.Theresult,SSEistheSumofSquaredErrors.
Next,calculates,thestandarddeviationofallthey
^
y
evalueswheren=thetotalnumberofdata
points.
Thecalculationiss=
q
SSE
2
Forthethirdexam/finalexamproblem,s=
q
2440
11 2
=16.47
Next,multiplysby1.9:
(1.9)(16.47)=31.29
31.29isalmost2standarddeviationsawayfromthemeanofthey
^
y
values.
Ifweweretomeasuretheverticaldistancefromanydatapointtothecorrespondingpointonthelineof
bestfitandthatdistanceisatleast1.9s,thenwewouldconsiderthedatapointtobe"toofar"fromtheline
ofbestfit.Wecallthatpointapotentialoutlier.
Fortheexample, ifanyofthe jy
^
y
jvaluesareatleast31.29,thecorresponding(x,y)datapointisa
potentialoutlier.
Forthethirdexam/finalexamproblem,allthejy
^
y
j’sarelessthan31.29exceptforthefirstonewhichis
35.
35>31.29
Thatis,jy
^
y
j(1.9)(s)
Thepointwhichcorrespondstojy
^
y
j=35is(65,175).Therefore,thedatapoint(65,175)isapotential
outlier.Forthisexample,wewilldeleteit.(Remember,wedonotalwaysdeleteanoutlier.)
The next step is s to o compute a a new best-fit t line e usingthe e 10remaining g points. . The e new line ofbest
fitandthecorrelationcoefficientare:
^
y
355.19+7.39xandr=0.9121
Example12.13
Usingthisnewlineofbestfit(basedontheremaining10datapoints),whatwouldastudent
whoreceivesa73onthethirdexamexpecttoreceiveonthefinalexam? Isthisthesameasthe
predictionmadeusingtheoriginalline?
AvailableforfreeatConnexions<http://cnx.org/content/col10522/1.40>
C# HTML5 PDF Viewer SDK to convert and export PDF document to
All Formats. XDoc.HTML5 Viewer. XDoc.Windows Viewer. XDoc.Converter. View & Process. Adobe PDF. XDoc.PDF. Scanning. XDoc.Word. XDoc.Excel. XDoc.PowerPoint. Barcoding
converting pdf to powerpoint slides; images from pdf to powerpoint
DocImage SDK for .NET: Web Document Image Viewer Online Demo
Try Online Demo Now. Please click Browse to upload a file to display in web viewer. Suppported files are Word, Excel, PowerPoint, PDF, Tiff, Dicom and main
change pdf to powerpoint on; convert pdf to editable ppt
546
CHAPTER12. LINEARREGRESSIONANDCORRELATION
Solution
Usingthenewlineofbestfit,
^
y
355.19+7.39(73) 184.28.Astudentwhoscored73points
onthethirdexamwouldexpecttoearn184pointsonthefinalexam.
The original l line e predicted
^
y
 173.51+4.83(73) =
179.08 so o the prediction n using the
newlinewiththeoutliereliminateddiffersfromtheoriginalprediction.
Example12.14
(FromTheConsumerPriceIndexesWebsite)TheConsumerPriceIndex(CPI)measurestheaver-
agechangeovertimeinthepricespaidbyurbanconsumersforconsumergoodsandservices.The
CPIaffectsnearlyallAmericansbecauseofthemanywaysitisused.Oneofitsbiggestusesisas
ameasureofinflation.ByprovidinginformationaboutpricechangesintheNation’seconomyto
government,business,andlabor,theCPIhelpsthemtomakeeconomicdecisions.ThePresident,
Congress,andtheFederalReserveBoardusetheCPI’strendstoformulatemonetaryandfiscal
policies.Inthefollowingtable,xistheyearandyistheCPI.
Data:
x
y
1915
10.1
1926
17.7
1935
13.7
1940
14.7
1947
24.1
1952
26.5
1964
31.0
1969
36.7
1975
49.3
1979
72.6
1980
82.4
1986
109.6
1991
130.7
1999
166.6
Table12.2
Problem
 Makeascatterplotofthedata.
 Calculatetheleastsquaresline.Writetheequationintheform
^
y
=a+bx.
 Drawthelineonthescatterplot.
 Findthecorrelationcoefficient.Isitsignificant?
 WhatistheaverageCPIfortheyear1990?
AvailableforfreeatConnexions<http://cnx.org/content/col10522/1.40>
547
Solution
 Scatterplotandlineofbestfit.
^
y
3204+1.662xistheequationofthelineofbestfit.
 r=0.8694
 Thenumberofdatapointsisn=14.Usethe95%CriticalValuesoftheSampleCorrelation
CoefficienttableattheendofChapter12. 2=12. Thecorrespondingcriticalvalueis
0.532.Since0.8694>0.532,rissignificant.
^
y
3204+1.662(1990)=103.4CPI
 UsingthecalculatorLinRegTTest,wefindthats=25.4;graphingthelinesY2=-3204+1.662X-
2(25.4)andY3=-3204+1.662X+2(25.4)showsthatnodatavaluesareoutsidethoselines,iden-
tifyingnooutliers. (Notethattheyear1999wasveryclosetotheupperline,butstillinside
it.)
Figure12.17
NOTE
:Intheexample,noticethepatternofthepointscomparedtotheline.Althoughthecorrela-
tioncoefficientissignificant,thepatterninthescatterplotindicatesthatacurvewouldbeamore
appropriatemodeltousethanaline. Inthisexample,astatisticianshouldprefertouse e other
methodstofitacurvetothisdata,ratherthanmodelthedatawiththelinewefound.Inaddition
todoingthecalculations,itisalwaysimportanttolookatthescatterplotwhendecidingwhether
alinearmodelisappropriate.
Ifyouareinterestedinseeingmoreyearsofdata,visittheBureauofLaborStatisticsCPIwebsite
ftp://ftp.bls.gov/pub/special.requests/cpi/cpiai.txt;ourdataistakenfromthecolumnentitled
"AnnualAvg."(thirdcolumnfromtheright). Forexampleyoucouldaddmorecurrentyearsof
data. Tryaddingthemorerecentyears2004: : CPI=188.9,2008: CPI=215.3and2011:CPI=224.9.
Seehowitaffectsthemodel. (Check:
^
y
4436+2.295x0.9018. Isrsignificant? Isthefit
betterwiththeadditionofthenewpoints?)
**WithcontributionsfromRobertaBloom
AvailableforfreeatConnexions<http://cnx.org/content/col10522/1.40>
548
CHAPTER12. LINEARREGRESSIONANDCORRELATION
12.1095%CriticalValuesoftheSampleCorrelationCoefficientTable
10
10
Thiscontentisavailableonlineat<http://cnx.org/content/m17098/1.6/>.
AvailableforfreeatConnexions<http://cnx.org/content/col10522/1.40>
549
DegreesofFreedom:2
CriticalValues:(+and )
1
0.997
2
0.950
3
0.878
4
0.811
5
0.754
6
0.707
7
0.666
8
0.632
9
0.602
10
0.576
11
0.555
12
0.532
13
0.514
14
0.497
15
0.482
16
0.468
17
0.456
18
0.444
19
0.433
20
0.423
21
0.413
22
0.404
23
0.396
24
0.388
25
0.381
26
0.374
continuedonnextpage
AvailableforfreeatConnexions<http://cnx.org/content/col10522/1.40>
550
CHAPTER12. LINEARREGRESSIONANDCORRELATION
27
0.367
28
0.361
29
0.355
30
0.349
40
0.304
50
0.273
60
0.250
70
0.232
80
0.217
90
0.205
100
0.195
Table12.3
AvailableforfreeatConnexions<http://cnx.org/content/col10522/1.40>
Documents you may be interested
Documents you may be interested