ContentAvailabilityandBundlinginSwarmingSystems
DanielS.Menasche
UniversityofMassachusetts,
Amherst,MA,USA
sadoc@cs.umass.edu
AntonioA.A.Rocha
FederalUniversityofRiode
Janeiro,RJ,Brazil
arocha@land.ufrj.br
BinLi
TsinghuaUniversity,
Beijing,China
libinhere@gmail.com
DonTowsley
UniversityofMassachusetts,
Amherst,MA,USA
towsley@cs.umass.edu
ArunVenkataramani
UniversityofMassachusetts,
Amherst,MA,USA
arun@cs.umass.edu
ABSTRACT
BitTorrent, the immensely y popular r file e swarming system,
suffersafundamentalproblem: content t unavailability. . Al-
thoughswarmingscaleswelltotolerateflashcrowdsforpop-
ularcontent,itislessusefulforunpopularcontentaspeers
arrivingaftertheinitialrushfindthecontentunavailable.
Ourprimarycontribution isamodeltoquantifycontent
availabilityin swarmingsystems. . Weusethemodeltoan-
alyze the availability and the performance implications s of
bundling, a strategy commonly adopted by many BitTor-
rentpublisherstoday.Wefindthatevenalimitedamountof
bundlingexponentiallyreducescontentunavailability. Sur-
prisingly,forswarmswithhighlyunavailablepublishers,the
availability gain of bundling can result t in a net improve-
ment in download time, i.e., peers obtain more content in
less time. . Weempiricallyconfirm m the model’s s conclusions
through experiments onPlanetLabusingthemainlineBit-
Torrentclient.
CategoriesandSubjectDescriptors
C.4[PerformanceofSystems]: Modelingtechniques
GeneralTerms
Measurement;Performance;Reliability;Theory
1. INTRODUCTION
DespitethetremendoussuccessofBitTorrent(estimated
toaccountfor 30–50%ofallInternet traffictoday),itsuf-
fers from a fundamental l problem: : availability. . Although
peer-to-peerswarminginBitTorrentscalesimpressively to
toleratemassiveflashcrowdsforpopularcontent,swarming
does littletodisseminateunpopular contentas their avail-
abilityislimitedbythepresenceofaseedorpublisher.The
Permissiontomakedigitalorhard copiesofallorpartofthisworkfor
personalorclassroomuseisgrantedwithoutfeeprovidedthatcopiesare
notmadeordistributedforprofitorcommercialadvantageandthatcopies
bearthisnoticeandthefullcitationonthefirstpage.Tocopyotherwise,to
republish,topostonserversortoredistributetolists,requirespriorspecific
permissionand/orafee.
CoNEXT’09,December1–4,2009,Rome,Italy.
Copyright2009ACM978-1-60558-636-6/09/12...$10.00.
extentofpublisherunavailabilityissevere,e.g.,ourmeasure-
mentstudyshowsthat40%oftheswarmshavenopublishers
availablemorethan50%ofthetime.
Toappreciatetheavailabilityproblem,consideraswarm
foranepisodeofapopularTVshow. Whenapublisherfirst
poststheepisode,aflashcrowdofpeersjoinstheswarmto
download it. . The e original publisher r goes offline at t some
point, but peers may continue toobtain the content from
otherpeerswhiletheswarmis active. . If f apeer arrivesaf-
tertheinitialpopularitywave, whenthepopulation ofthe
swarmhasdwindleddowntonear-zero,itfindsthecontent
unavailableandmustwaituntilapublisherreappears.
Our primary y contribution n is s a mathematical l model to
studycontentavailabilityinswarmingsystemssuchasBit-
Torrent.WeuseanM/G/1queuetomodeltheself-scaling
property of BitTorrent swarms, i.e., more peers bring g in
morecapacitytothesystem. The e key insight is tomodel
uninterruptedintervalsduringwhichthecontentisavailable
asbusyperiodsofthatqueue. Thebusyperiodincreasesex-
ponentiallywiththearrivalrateofpeersandpublishersand
thetimespentbypeersandpublishersintheswarm.
Ourmodelenablesustoanalyzetheimpactofbundling,a
commonstrategyadoptedbyBitTorrentpublisherswherein,
insteadofdisseminatingindividualfilesviaisolatedswarms,
apublisher packages anumberof relatedfilesanddissemi-
natesitviaasinglelargerswarm. Toappreciatewhybundling
improves contentavailability, consider abundleof K files.
AssumethatthepopularityofthebundleisroughlyKtimes
thepopularityofanindividualfileasapeerrequestingany
file requests s the entire bundle. . The e size of the bundle is
roughlyK timesthesizeofanindividualfile. . Ourmodel
suggeststhatthebusyperiodofthebundledswarmisafac-
tore
£(K
2
)
largerthanthatofanindividualswarm. Indeed,
if busy periods supported bypeers alonelastuntila a pub-
lisherreappears,thecontentwillbeavailablethroughout.
Surprisingly,insomecases,theimprovedavailabilitycan
reducethedownloadtimeexperienced bypeers,i.e.,peers
download morecontent inlesstime. . Thedownloadtime e of
peersinthesystemconsistsofthewaitingtime spentwhile
contentisunavailableandtheservicetime spentinactively
downloadingcontent. If f thereduction in waitingtimedue
to bundling is greater r than n the e corresponding increase in
servicetime,thedownloadtimedecreases. Wevalidatethis
conclusioninSection4throughlarge-scalecontrolledexper-
imentsusingthemainlineBitTorrentclientover Planetlab.
121
Pdf combine pages - Merge, append PDF files in C#.net, ASP.NET, MVC, Ajax, WinForms, WPF
Provide C# Demo Codes for Merging and Appending PDF Document
combine pdfs online; pdf mail merge plug in
Pdf combine pages - VB.NET PDF File Merge Library: Merge, append PDF files in vb.net, ASP.NET, MVC, Ajax, WinForms, WPF
VB.NET Guide and Sample Codes to Merge PDF Documents in .NET Project
reader combine pdf; reader merge pdf
Ourexperimentsalsoshowthattheconclusionsofourmodel
qualitativelyholdeven with realisticarrival patterns, peer
uploadcapacities,andheterogeneouspopularities.
Insummary,wemakethefollowingcontributions.
Measurement: Wepresentalarge-scalemeasurementstudy
ofrealBitTorrentswarmsthatshowsthat(1)contentavail-
abilityisaserious problemduetopublisher unavailability,
(2)bundlingofcontentis awidelyprevalent,and (3)bun-
dledcontentismoreavailablethanunbundledcontent.
Availabilitymodel:Wepresentanovelqueuing-theoretic
modeltoanalyzecontentavailabilityanddownloadtimesin
BitTorrent-likeswarmingsystems. Toour r knowledge, this
isthefirstmodelthatrelatescontentavailabilitytoarrivals
anddeparturesofpeersaswellaspublishers.
Implications ofbundling: Weusethemodeltoanalyze
the implications s of f bundling, , a a widely y prevalent t yet little
studiedphenomenon,andshowthat(1)bundlingimproves
availability,and(2)bundlingcanreducedownloadtimesfor
unpopularcontentwhenpublishersarehighlyunavailable.
Experimental validation: : We e validatethe modelusing
large-scalecontrolledexperimentswiththemainlineBitTor-
rentclientonPlanetLabshowingthatthemodelaccurately
predictsdownloadtimesinswarmswithintermittentlyavail-
ablepublishersforbothbundledandindividualcontent.
2. MEASURINGCONTENTAVAILABILITY
ANDBUNDLINGINBITTORRENT
Inthissection,wepresentalarge-scalemeasurementstudy
ofBitTorrentthatshows that1)contentunavailabilityisa
seriousprobleminBitTorrenttoday,and2)bundlingofcon-
tentiswidelyprevalentandbundledcontentsshowsgreater
availability. Webeginwithabriefoverviewofhowswarming
inBitTorrentworksandwhycontentbecomesunavailable.
2.1 Whyunavailability?
Aswarm consists s of a a set t of peers concurrently sharing
(downloadingor uploading) ) content (a a file or r a a bundle of
files) of f common interest t with the e help p of f a a coordinating
tracker. Content t is divided into o blocks and peers s obtain
meta-dataabout constituent blocks aswellas identities of
otherpeersintheswarmfromthetracker. Apeerexchanges
blockswithotherpeersusingatit-for-tatincentivestrategy
until it completes s its s download. . Peers s that t have not t yet
completed their downloads s are called d leecherswhile e peers
thatpossessallblocksinthecontentarecalledseeds.
Contentisavailableifeitheratleastoneseedispresentor
sufficientlymanyactiveleechersarepresentsoastocollec-
tivelymakeallconstituent blocksavailable. . Seedsmaybe-
comeunavailableinpracticeduetoseveralreasons. Publish-
ingsitesservingalargenumberoffilesmaytakedownseeds
aftertheinitialpopularitywavesubsidesinordertoreduce
bandwidthcosts. Aseedmayalsobeanaverageuserpub-
lishinghome-generatedcontent that cannotaffordtostay
online all the time. . Seeds s illegally uploading g copyrighted
materialoftendisappearquicklyforobvious reasons. . Even
forlegitimatecontent,maintaininghighlyavailableseedsen-
tails administrative effort and cost, which runs counter to
the goals of content publishers that value BitTorrent as a
cheapalternativetoaclient-serverapproach.
Throughoutthis section,wemeasurecontentavailability
byequatingitwithseedavailability. Inthenextsection,we
modelcontentavailabilityresultingbothfromseedsaswell
asfromleechersalone. Intherestofthispaper,weusethe
terms publishersand peersinterchangeably with seedsand
leechersrespectively.
2.2 Measuringunavailability
HowavailableiscontentinBitTorrentswarmstoday? To
answerthisquestion,weconductedaseven-monthlongmea-
surement study of f BitTorrent swarms as s follows. . We e de-
veloped anddeployed BitTorrentmonitoringagentsat 300
nodeson PlanetlabfromAugust 3,2008toMarch6,2009.
Onceeveryhour,ahostattheUniversityofMassachusetts
Amherstreceives anRSSfeedadvertised byGoogleReader
ofrecentlycreatedtorrentURLsfromMininova(alargetor-
rent hostingsite), and sends each URLtoa a subset t of the
monitoringagentsonPlanetlab. Theagents s fetch thetor-
rent metadatabyjoiningtheswarmandbegin tomonitor
its peers. . Our r agents s leverage the Peer r Exchange e (PEX)
protocol extension, that enables s it t to o discover r new neigh-
borsfromotherpeers in addition tothetracker. . Toavoid
copyrightissues, ouragents collect information onlyabout
thecontrolplanewithoutactuallyuploadingordownloading
content,whichsufficesforourpurposesasweequatecontent
availabilitywithseedavailability.
Todistinguishseedsfromleechers,ouragentsrecordthe
bitmaps received from m connected peers. . The e bitmaps s are
part of the BitTorrent protocol and d a a peer r uses s them to
convey the blocks it possesses to its neighbors. . Each h en-
tryinthetracecollectedbytheagentsconsistsofaswarm
identifier, a peer r identifier (IP P address s and d port t number)
and its bitmap recorded roughly periodicallyfor each dis-
coveredpeerintheswarm. Ourtracesconsistofmorethan
14milliondistinctIPaddressesand66Kdistinctswarms.
Figure1showsthedistributionofseedavailabilityforthe
monitored swarms. . Thesolid d curve shows the availability
inthefirstmonthafterthecreationoftheswarm,whenwe
expectthecontenttobemorepopular. Theextentofpub-
lisher unavailabilityissevere: : lessthan35%oftheswarms
hadatleastoneseedavailableallthetime. Theavailability
of swarms over theentire duration of the measurement is
evenlowerasshownbythedottedcurve: almost80%ofthe
swarmsareunavailable80%ofthetime.
0
0.2
0.4
0.6
0.8
1
0
0.2
0.4
0.6
0.8
1
entire measurement
first month after each  torrent publication
F(x) = P(X < x)
Fraction of time with at least one seed available 
Figure1: CDFofseedavailabilityin45,693swarms
eachmonitoredforatleastonemonth.
122
C# PDF: C#.NET PDF Document Merging & Splitting Control SDK
List<BaseDocument> docList, String destFilePath) { PDFDocument.Combine(docList, destFilePath); }. For example, if the target PDF file has 8 pages and you
append pdf files reader; pdf mail merge
C# PDF File Split Library: Split, seperate PDF into multiple files
which C# developers can split target PDF document file by specifying a page or pages. If needed, developers can also combine generated split PDF document files
c# combine pdf; best pdf merger
2.3 Contentbundling
Bundling ofcontent t is acommon practiceinBitTorrent
today. Inthissection,westudytheextentofbundlingand
itsimpactonavailability. Thetraceusedinthissectionisa
snapshot ofBitTorrent swarms taken onMay6,2009. . For
eachofthe1,087,933swarmsinthissnapshot,werecordits
contentcategory(e.g.,movies,TV,booksetc.),namesand
sizes of constituent t files, , creation n date, , and instantaneous
numberofseedsandleechers. Notethatwecouldaffordto
monitormanymoreswarmsinthisdatasetthantheprevious
oneas wedid not havetomeasure detailsofpeer arrivals
anddeparturesinsideeachswarm.
2.3.1 Extentofbundling
Weanalyzetheextentofbundlinginthreeoftheninecat-
egories present in Mininova,namely,music,TVshows and
books. Thesethreecategoriestogetheraccountfor45.98%
oftheswarmsand 31.93%of thepeers in thesystem. . We
chosethesethreecategoriesbecauseitiseasiertoautomati-
callydetectbundlingbycheckingforthepresenceofmultiple
files with known extensions (e.g., .mp3for songs, .mpg g for
TVshowsand .pdfforbooks). Detectingbundlingisnon-
trivialinsomecategories,e.g.,aDVDforasinglemovieis
oftenorganizedas acollection ofvideofiles that arenever
distributedindividually,makingitdifficulttocheckforthe
presenceofmultiplemovieswithoutmanualinspection.
Amongmusicswarms,albums arecommon. . Weclassify
amusic swarmasabundleif ithas twoormorefiles with
commonaudiofileextensionssuchas.mp3,.midand.wav,
which results in 193,491 1 of f the 267,117monitored swarms
beingclassifiedasbundles.
AmongTVshowswarms,manybundlesconsistofsetsof
episodesinaseason. Weclassifyswarms s that havetwoor
morefiles withcommonvideofileextensions such as.mpg
and.aviasbundles,whichresultsin25,990ofthe164,930
monitoredswarmsbeingclassifiedasbundles.
Amongbookswarms,weobservethatcollections,i.e.,tor-
rentscontainingthekeyword“collection”intheirtitles,usu-
ally consist of a bundleof contents connected bya broad
theme,e.g.,the“UltimateMathCollection(1)”ofsize5.81
GBhas642books. Weclassified841ofthe66,387monitored
swarmsascollections. Classifyingswarmsthatcontain2or
more files s with h common n file extensions such as .pdf and
.djvuasbundlesresultsinanadditional6,270bundles.
2.3.2 Bundledcontentismoreavailable
Inthissection,wepresentevidencesuggestingthatbundling
iscorrelatedwithhigheravailability.Wefirstconsiderbook
swarms. Wefindthat62%ofallbookswarmshad d noseed
available on n May y 6, , 2009, whereas that t number drops s to
36%ifweconsideronlycollections. Furthermore,theaver-
agenumberofdownloadsforatypicalbookswarmis2,578,
whereasforcollectionsitis4,216.
Onereason forhigherseedavailabilitymaybethatcon-
tentpublishersareintrinsicallymorewillingtosupportseeds
forbundledcontent. Thehigher r numberofdownloads for
bundledcontentmaybeeitherbecauseofhigherdemandfor
bundledcontent(asanypeerseekinganyoftheconstituent
filesmayopttodownloadthebundle),orbecauseofhigher
availability, or both. . Higher r seed availability in turn may
in part t be e because of the increased number of downloads
as somepeers maychoose toaltruisticallydisseminatethe
contentfurther. Althoughitisdifficulttodiscerncauseand
Figure2: Illustrationofbusyandidleperiods.
effect in ourmeasurement data, our analytic modelin the
next section quantifies howthehigher demand and higher
seedavailabilityforbundledcontentproduceimprovedcon-
tentavailability.
Wenext analyzeourtraces morecloselyforcontentthat
isavailablebothinisolationandaspartofalargerbundle.
Weobservethatamongtheunavailablecollections,someof
themweresubsetsofbiggercollections,e.g.,the23swarms
consistingofcollectionsofGarfieldcomicsfrom1978to2000
had no seeds. . However, , each of these e collections s can be
found in n a a single super-collection aggregatingall Garfield
comics. Thesuper-collectionhadsevenseeds. Afteraman-
ualinspectionofall841bookcollections,weconcludedthat
210hadnoseedsandwerenotsubsetsofother collections,
which results in 210/841= 25%unavailability for content
disseminated through collections (compared to o 62% above
foratypicalswarm).
Asanotherexample,weconsiderswarmsforthepopular
TVshow“Friends”. Therewereatotalof52swarmsasso-
ciated with this show. . Amongthem, , 23 had oneor more
seeds available, and the remaining g 29 had d noseeds. . The
23available swarms s consisted d of 21bundles (and d 2single
episodes), whereas the 29 unavailable swarms s consisted d of
only7bundles. Theseobservationssuggest t astrongcorre-
lation between bundling g and d higher availability. . The e next
sectionpresentsananalyticmodelthatquantifiesthecausal
relationshipbetweenthetwo.
3. MODEL
Inthissection,wedevelopamodelforcontentavailabil-
ityinBitTorrent. The e keyinsight underlyingthemodelis
toviewBitTorrentasacoverageprocessorequivalentlyan
M/G/1queuingsystem. Themodelshowsthat1)bundling
improvesavailability,and2)forswarmswithhighlyunavail-
able publishers, , the e availability benefit of bundling more
than offsets the increased time toactively download more
content,resultinginanetdecreaseinuser-perceiveddown-
loadtimes.
3.1 Modeloverview
Figure2illustrateshowcontentavailabilityinBitTorrent
dependsuponthearrivalsanddeparturesofpublishersand
peers. Eachhorizontallinesegmentrepresentsthetimein-
tervalduringwhichapeer(representedusingthinlines)or
123
C# Word - Merge Word Documents in C#.NET
Combine and Merge Multiple Word Files into One Using C#. This part illustrates how to combine three Word files into a new file in C# application.
pdf merge documents; apple merge pdf
C# PowerPoint - Merge PowerPoint Documents in C#.NET
Combine and Merge Multiple PowerPoint Files into One Using C#. This part illustrates how to combine three PowerPoint files into a new file in C# application.
best pdf combiner; scan multiple pages into one pdf
apublisher (represented usingthicklines) stays online. . A
swarmis initiatedbythearrivalofapublisher,whichalso
marksthestartofthefirstbusyperiod. Theswarm’slifetime
is divided intoalternatingbusy and idle periods. . Content
isavailableduringbusyperiodsandunavailableduringidle
periods.Ifapublisherisalwaysonline,thefirstbusyperiod
lastsforeverandcontentremainsalwaysavailable.
Abusyperiodendswhenthefollowingtwoconditionsare
satisfied:1)therearenopublishersonline,and2)thecover-
age,i.e.,thenumberofpeerscurrentlyonline,dropsbelow
afixedsmallthreshold(causingsomeblockstobecomeun-
available). Forexample,Figure2showsthatafter r allpub-
lishers leaveat timet
1
,thebusyperiodcontinueswiththe
help of peers aloneuntilapublisherreappears at timet
2
.
Abusyperiodmayalternateanynumberoftimesbetween
aphaseconsistingofoneormorepublishers(Phase1)and
aphaseconsistingof peers alone(Phase2). . Peers s arriving
during either r phase in abusy period willfind the content
available. At t t
4
, there are no publishers and the number
ofpeers drops belowthecoveragethreshold (assumed3in
thisexample).Thisinitiatesanidleperiodthatlastsuntila
publisherreappearsattimet
5
. Extantpeersattheendofa
busyperiodaswellaspeers arrivingduringtheidleperiod
find the content unavailable(representedby dottedlines).
Becauseofidlewaiting,thesepeersexperiencelongerdown-
loadtimesdefined asthetimessinceapeer arrives untilit
completesthedownload.
Our goal is s tounderstand d how content t availability y and
thedownloadtimesexperiencedbypeersinaswarmdepend
upon1)itspopularityorthepeerarrivalrate‚;2)themean
times=„thatapeertakesduringabusyperiodtoactively
downloadthecontentofsizesatarateequaltotheeffective
averagecapacity „oftheswarm;and 3) thearrivalrate r
of publishers and d the e mean time u that apublisher stays
online. We e have implicitly assumed that u must t be e long
enoughforatleastonecopyofthefiletobeservedineach
busyperiod. Forsimplicity,wehaveassumedthatpeersare
selfish andleaveassoonasthey completetheirdownload;
x3.3.4extendsthemodeltoincorporatealtruisticlingering.
Toappreciatewhybundlingimprovescontentavailability,
considerthespecialcase ofahighlyunavailablepublisher,
i.e.,itsarrivalraterandmeanresidencetimeuaresmall.
Then,thelengthofabusyperiod is determined primarily
by peer arrivals and departures. . Assuming g Poisson peer
arrivals and a coverage e threshold d of one, the e length h of f a
busyperiod canbeshowntobe
e
‚s=„
¡1
. BundlingK K files
increasesthepeerarrivalrateforthebundletoK‚aseach
peerdesiringanyoftheconstituentfilesrequeststheentire
bundle, and d increases s the time spent t by y each peer r in n the
swarmtoKs=„. Asaresult,thelengthofthebusyperiod
for the e bundled swarm m is
e
K
2
‚s=„
¡1
K‚
, which translates to
areductioninunavailabilitybyafactore
£(K
2
)
. Forhighly
unavailablepublishers,theavailabilitygainsofbundlingcan
outweighthecostoftheincreasedtimetodownloadKtimes
as much content resulting g in n a a reduction n in the download
time,i.e.,peersobtainmorecontentinlesstime.
The rest ofthis section formalizes the aboveclaims and
derivesclosed-formexpressionsforthetotaldownloadtime
experiencedbypeers with andwithout bundling. . Unless
otherwisestated,weassumethatinter-arrivaltimesofpeers
andpublishers,residencetimeofpublishers,andfiledown-
loadtimesareallexponentiallydistributed.
Variable
Description(units)
k
peerarrivalrate(1/s)
⁄=
P
K
i=1
k
bundledpeerarrivalrate(1/s)
s
k
fllesize(bits)
S=
P
K
i=1
s
k
bundlesize(bits)
meandownloadrateofpeers(bits/s)
r
k
arrivalrateof publishers(1/s)
R
arrivalrateof publishers
forthebundle(1/s)
u
k
meanpublisherresidencetime(s)
U
meanbundledpublisherresidencetime(s)
Metric
Description(units)
P
k
unavailability
P
unavailabilityofbundle
T
k
downloadtime(s)
T
bundledownloadtime(s)
Table 1: : Variables s denoted d by y lower r case e charac-
terize swarm k k 2 2 f1;2;¢¢¢;Kg in isolation, , while
variablesdenotedbycapitalletterscharacterizethe
bundleofKfiles. Metricsfortheswarmsinisolation
and for bundles are e denoted d by y plain n and stylized
letters, respectively. . Subscripts s are dropped d when
homogeneousfilesareconsidered.
3.2 Asimplemodelforcontentavailability
Wepresent asimpleinstance oftheabovemodeltoan-
alyzecontentavailabilityandshowthatbundlingimproves
availability. The e model makes several simplifyingassump-
tions(whichweprogressivelyrelaxinsubsequentsections),
butbringsoutthekeyinsightunderlyingallofourresults.
Assumptions: Content t is availableifand only if there
isatleastonepublisheronline. Apeerarrivingduringan
idleperiodfindsthefileunavailableandimmediatelyleaves,
i.e.,itdoesnotqueueupuntilapublisherarrives. ƒ
Availabilityofanindividualswarm.
In swarm m k, let r
k
and u
k
denote the arrival rate and
residencetimeofpublishers(refertoTable1fornotation).
Swarmkcyclesthroughbusyandidleperiods,withaverage
lengthE[B
k
]and1=r
k
,respectively. TheprobabilityP
k
that
apeerarrivestoswarmktofindthecontentunavailableis
P
k
=
1=r
k
E[B
k
]+1=r
k
;
k=1;:::;K
(1)
and
E[B
k
]=
e
r
k
u
k
¡1
r
k
(2)
Theabovefollowsfromclassicalresultsforthebusyperiod
ofanM/G/1queue.
Availabilityofabundledswarm.
LetR andU U denotethearrivalrateand d residencetime
ofpublishersfor thebundle, respectively. . The e probability
P thatapeer arrivestofindthecontentunavailableinthe
bundledswarmis
P=
1=R
E[B]+1=R
(3)
wheretheaveragelengthofabusyperiodforabundleofK
124
VB.NET PDF Page Insert Library: insert pages into PDF file in vb.
to add and insert one or multiple pages to existing simple ways to create VB application to combine .NET Imaging Processing and PDF document libraries.
merge pdf files; split pdf into multiple files
VB.NET PDF: Use VB.NET Code to Merge and Split PDF Documents
VB.NET program and it includes all pages information in APIs for Merging PDF Documents in VB.NET. Private Sub Combine(source As List(Of BaseDocument), destn As
break pdf into multiple files; add pdf together one file
filesis
E[B]=
e
RU
¡1
R
(4)
Considerthespecialcasewhenthepublisherarrivalrates
are the same for r all l files, i.e., , r
k
= r r and their residence
timesarealsothesame,i.e.,u
k
=uforallKfiles. IfRand
UscaleasR=KrandU=Ku,then
E[B] =
e
K
2
ru
¡1
Kr
(5)
P
=
1=(Kr)
(eK
2
ru ¡1)=(Kr)+1=(Kr)
(6)
NotethatE[B]isafactore
£(K
2
)
largerthanthecorrespond-
ingvalue for an individual swarm. . It t can also o be verified
that¡logP
k
=Θ(1)and¡logP=Θ(K
2
). Thus,bundling
reducescontentunavailabilitybyafactore
¡£(K
2
)
.
Availabilitywithpublishersandpeers.
Assumptions: Thebusyperiodisdefinedw.r.t.acov-
erage threshold of one, i.e., a peer arrivingduringabusy
periodalwaysfinishesthedownloadinthatbusyperiodand
thelastpeertofinishendsthebusyperiod.ƒ
Contentmaybeavailableeven iftherearenopublishers
online.Lettheaggregatearrivalrateofpeersandpublishers
totheindividualswarmandtothebundlebe‚
k
+r
k
and
Λ+R, respectively. . For r simplicity, , we first consider r the
specialscenarioinwhichpublishersstayforatimeequalto
that ofdisseminating g onecopy of f the file, i.e., u
k
= s
k
=„
(anassumptionwerelaxinthefollowingsection),
E[B
k
]=
e
(r
k
+‚
k
)s
k
=„
¡1
r
k
+‚
k
;
k=1;:::;K
(7)
and
E[B]=
e
(R+⁄)S=„
¡1
Λ+R
(8)
If, for allK K files, ‚
k
= ‚ and s
k
= s s then Λ = K‚ ‚ and
S=Ks. ThebundledbusyperiodisE[B]=e
£(K
2
)
. Thus,
bundlingreduces theunavailabilitybye
¡£(K
2
)
evenifthe
bundledpublisherarrivalrateisequaltothe publisher ar-
rivalrateoftheindividualswarms.
3.3 Amodelforcontentavailabilityanddown-
loadtime
Next,wequantifycontentavailabilityandthemeandown-
loadtimeexperiencedbypeerswhen1)peersmaywaitfor
contenttobecomeavailable,2)themeanresidencetimeof
thepublisher maydifferfromtheservicetimeofpeersand
3)thecoveragethresholdmaybegreaterthanone.
We begin by presenting g the theoretical l background re-
quired by our model. . Our r results s rely y on those reported
byBrowneandSteele[2]onthebusyperiodofanM/G/1
queuewherethecustomerinitiatingthebusyperiodhasan
exceptionalresidencetime.
LetcustomersarriveaccordingtoaPoissonprocesswith
rate fl.
The residence time of the customer initiatinga
busy periodis draw from anexponentialdistribution with
meanµ. Theresidencetimeofallothercustomers,X,takes
the form m of one of two exponentially y distributed random
variables,X
1
orX
2
,withaveragesfi
1
andfi
2
,respectively;
X =X
1
with probabilityq
1
andX =X
2
with probability
q
2
=1¡q
1
. Theexpectedbusyperiodis
E[B]=µ+
X1
i=1
i
i!
Xi
j=0
i
j
!
q
j
1
q
i¡j
2
1+j
1
1¡j+i
2
µ
1
2
+jµfi
2
+µfi
1
i¡µfi
1
j
(9)
Thereadercanfindthederivationof (9)intheAppendix.
The proofs s of the e results that follow, when n not t included
in the Appendix, are in the technical report t [10]. . In n the
restofthissection,unlessotherwisestated,weassumethat
allfiles havethesamesizeanddemand,andthatthepub-
lisherarrivalratesandresidencetimesarethesameacrossall
swarms. Assuminghomogeneousswarmsallowsustodrop
the subscripts of variables s referring toindividual l swarms.
In[10] weshowthatmostofourresultsextendtothecase
wheredifferentswarmshavedifferentcharacteristics.
3.3.1 Availabilitywithimpatientpeers
Assumptions: Publishersarrivetoindividualswarms
atraterandstayinthesystemforameantimeu.Forthe
bundledswarm,publishersarrivewithrateRandstayfora
meantimeU. Peersthatarriveduringanidleperiod d leave
immediately.ƒ
We areinterested in n determiningtheprobability y that a
requestleaves withoutbeingserved.DenotethisasPandP
fortheindividualandbundledsystems,respectively. Then
P =
1=r
E[B]+1=r
P=
1=R
E[B]+1=R
(10)
Theaveragebusyperiodforeachindividualswarm,E[B],
is obtained d from (9) ) by settingthe parameters as follows:
fl=‚+r,µ=u,fi
1
=s=„,q
1
=‚=(‚+r),fi
2
=u.
Forthebundledswarm,theaggregatepeerarrivalrateis
Λ=K‚andthesizeis S=Ks. . Theaveragebusyperiod,
E[B], is obtainedfrom(9) as follows: : fl fl =Λ+R, , µ =U,
1
=S=„,q
1
=Λ=(Λ+R),fi
2
=U.
Thefollowinglemmaconcernsthenumberofpeersserved
inabusyperiod. Assumingthatboththebundlepublisher
arrivalrate, R, and publisher residencetime, U,are inde-
pendentofK,wehave
Lemma 3.1. . The meannumberofpeersservedinabusy
period,E[N],increasesase
£(K
2
)
bybundlingKflles.
Note that this s result t is s qualitatively similar to o the case
when publishers and peers stay online for the same mean
time(Section3.2).
We now consider the scenariowhere peers have skewed
preferences. GivenK K contents,letp
k
denotetheprobabil-
ity thatarequest isfor content k, k =1;:::;K. . Assume
that p
k
= c=k
, – > 0(Zipf’s law). . LettingΛ Λ denotethe
aggregatepeerarrivalrateforallKswarms,thearrivalrate
for swarm m kis‚
k
= p
k
Λ. Under r theassumption thatthe
mean timetodownloadthebundlescalesas K=„,onecan
showthatthelemmaabovestillholds(detailsin[10]).
In the theorem m below w we relate the e asymptotics s of the
busyperiodtotheprobabilitythatarequestisnotserved.
UnderthesameassumptionsofLemma3.1wehave,
Theorem 3.1. (Availabilitytheorem)BundlingKflles
togetherdecreasesunavailabilitybyafactore
£(K
2
)
.
Although the above result assume the publisher r arrival
ratefor thebundle, R, tobeconstantand independentof
125
Online Merge PDF files. Best free online merge PDF tool.
the editor area you can rearrange them or delete single pages. Also you can add more PDFs to combine them and merge as easy as possible to merge your PDF files
attach pdf to mail merge; break pdf file into multiple files
VB.NET PowerPoint: Merge and Split PowerPoint Document(s) with PPT
Just like we need to combine PPT files, sometimes, we also the split PPT document will contain slides/pages 1-4 If you want to see more PDF processing functions
c# pdf merge; add pdf files together reader
K,weshowin[10]thateven if R=Ω(e
¡cK
2
),c>0,the
availabilityofthebundleisstillgreaterthantheavailability
oftheindividualswarmbyafactore
£(K
2
)
. Whenenough
filesarebundled,thelongbusyperiodsofthebundledswarm
make it t nearly y self-sustaining, so peers can almost always
downloadthecontentevenintheabsenceofpublishers.
3.3.2 Meandownloadtimewithpatientpeers
Assumptions:
Peers that t arrive during g an n idle pe-
riod waitfor r a a publisher r tobecome available. . Theother
assumptionsarethesameasinSection3.3.1.ƒ
Wewishtocomparethedownloadtimeofpeerswithand
withoutbundling. Tothisaim,wefirstcomputetheaver-
agebusyperiodlengthinanindividualswarm,E[B]. When
contentisunavailableandapublisherarrivestostartabusy
period,thegroupofwaitingpeersimmediatelybeginstobe
served.
Neglecting the possible impact t of f this s group p of
peersonthedurationofthebusyperiod,theaveragebusy
periodE[B]canbeobtainedfrom(9)bysettingfl=‚+r,
1
=s=„,q
1
=‚=(‚+r),fi
2
=µ=u. Inthetechnicalre-
port[10]wealsoprovideanexpressionforE[B]accounting
forthepossibleimpactofthegroupofpeersthatbeginsto
beservedwhenthepublisherarrives.
Themeandownloadtime,E[T],isgivenby
Lemma 3.2. Themeandownloadtimeofafllewhenpeers
arepatientis
E[T]=
s
+
1
r
P
(11)
whereP =
1=r
1=r+E[B]
.
For the bundled swarm, the mean n busy y period length,
E[B], can be e obtained d from m (9) ) by setting g fl fl = = Λ+R,
1
= S=„, q
1
= Λ=(Λ+R), fi
2
= µ µ = = U. . Once e E[B] is
obtained, the mean n download d time for the e bundle, , E[T],
canbederivedfrom(11)replacings,randE[B] bytheir
bundlecounterpartsS,RandE[B].
In the following theorem we relate e the mean n download
timesofbundlesandindividualswarms,
Theorem 3.2. (Download d time e theorem) Bundling
Kflles,
(a) canincreasethemeandownloadtimeofeachfllebyat
mostafactorK;
(b) candecreasethemeandownloadtimeof f eachfllebya
factorΘ(1=R)whichgrowsunboundedasR!0.
Whenservicetimesdominatedownloadtimes,bundlingcan
increasethedownloadtimebyuptoafactorofK aspeers
download K K times s as much h content. . Nevertheless, , when
wait times dominate download times as is the case with
highly unavailable publishers, peers may y experience arbi-
trarilysmallerdownloadtimeswhendownloadingbundles.
3.3.3 Thresholdcoverage
Assumptions: Sameasthosedescribedinx3.1. ƒ
If a peer r leaves the system m carrying the last t copy y of f a
chunk,contentmaybecomeunavailableevenifthenumber
ofpeersonline,i.e.,thecoverage,is greater thanone. . Our
aimnowistodeterminetheavailabilityandthemeandown-
load time e experienced d by y peers in the general case where
contentbecomesunavailablewhennopublisherisonlineand
thecoveragereachesathresholdm.
LetB(n;m)betheexpectedlengthofaresidualbusype-
riod that t begins with h n n leechers and ends as s soon as s the
population sizereachesm. . Themean n busyperiodcorre-
spondstoB(1;0). B(n;m)isgivenby
Lemma 3.3. . Foralln,
B(n;0)=
Xn
i=1
s
i„
+
s
X1
i=1
s‚
i(n+i)!¡n!i!
i!(n+i)!i
(12)
Form<n,B(n;m)isobtainedusingtherecursionB(n;m)=
B(n;0)¡B(m;0).
WeuseLemma3.3toestimatetheunavailabilityprobabil-
ityandtheexpecteddownloadtimeofpeersinthescenario
describedinx3.1anddepictedinFigure2. Weassumethat
1) the distribution of the residual busy period d is concen-
trated around itsmeanand2)publishersstaylongenough
inthesystemsothat,whenPhase2begins,thepopulation
ofpeers is in steadystate. . We e denotethemeanresidual
busyperiod startingwhen thesystemtransitionstoPhase
2by B(m),
B(m)=
X1
i=0
e
¡
‚s
(
‚s
)
i
i!
B(i;m)
(13)
Noting that the e number of times that the e system cycles
through Phases 1and 2beforetransitioningtoPhase 3is
describedbyageometricrandomvariablewithmeane
rB(m)
yields
Theorem 3.3. Forathresholdcoverage e of m, the mean
downloadtimeofafllewhenpeersarepatientiss=„+P=r
where
P =exp(¡r(u+B(m)))
(14)
Thecorrespondingexpression forbundled swarmsisob-
tained by replacing g s, , ‚, , r and d uby y their bundled coun-
terparts, S, Λ, R R and d U. . In n particular, if R R = Kr r and
U =Kutheavailabilityanddownloadtimetheorems s still
hold.InSection4.3.1,wevalidatethemeandownloadtime
estimatedusingtheabovetheorem against experiments.
3.3.4 Altruisticlingering
Assumptions:
Peersremaininthesystemforanav-
erageamountoftime1=°aftercompletingtheirdownloads.
TheotherassumptionsarethesameasinSection3.3.2. ƒ
Peersmaystayonlineasseedsaftercompletingtheirdown-
loads,eitherbecausetheyarealtruisticorbecausepublish-
ers provide themincentives to o doso. . In n the technical re-
port[10]weshowhowtoparameterizeageneralversionof
equation (9) toderive the availability probability and the
meandownloadtimeofpeersthatstayonlineasseedsafter
completingtheirdownloads.Furthermore,weshowthatthe
availabilityandthedownloadtimetheorems stillhold.
Toillustratetheconsequences of peers stayinglonger in
thesystem,considertwoswarmswithfilesizess
1
ands
2
and
popularities‚
1
and‚
2
.Wewishtocomparetheperformance
oftheindividualswarmswiththatofabundlewithsimilar
availability[s
1
1
=„+‚
1
=° = (‚
1
+‚
2
)(s
1
+s
2
)=„]. The
meanresidencetimeforrequestorsofcontent1isequalto
s
1
+
1
°
=
(‚
1
+‚
2
)(s
1
+s
2
)
„‚
1
=
s
1
+s
2
1+
2
1
(15)
126
0
200
400
600
800
1000
1200
1
2
3
4
5
6
7
8
expected download time
bundle size (K)
u
k
=300
λ
k
=1/60
s
k
/µ=121
m=7
1/r
k
=1100
1/r
k
=1000
1/r
k
=900
1/r
k
=800
1/r
k
=700
1/r
k
=600
1/r
k
=500
1/r
k
=400
1/r
k
=300
1/r
k
=200
1/r
k
=100
1/r
k
=1100
1/r
k
=100
1/R=1/r
k
Figure3: Bundlesmayreducedownload d time.
Forthebundledswarm,themeandownloadtimeofpeersis
givenby(s
1
+s
2
)=„.
Assumeswarm1isassociatedwithasmallandunpopular
contentwhiletheswarm2contentislargeandpopular,s
1
¿
s
2
,‚
1
¿1¿‚
2
. Sincecontent1is s veryunpopular (peer
interarrivaltimevery large), , high h availabilitydepends on
peersstayingforalongtimeinthesystemafterconcluding
theirdownloads(inequation(15),1+‚
2
=‚
1
!1as‚
1
!
0).
If swarm 1 is bundled d with swarm m 2, , on the e other
hand,theoverheadincurredbythepeersonlyinterestedin
content2ismarginal(sinces
1
¿s
2
)butthegainsforpeers
interested in n content 1 1 is s remarkable, since requestors s for
content1experiencethesameavailabilityandperformance
asthoserequestingfile2.
3.4 Whencanbundlingreducedownloadtime?
In this s section n we e use the proposed d model l to illustrate
whenbundlingreducesmean downloadtime. . Wenumer-
icallyevaluateequations (11) and (9) ) bysettingthe e pa-
rameters asdescribed in thelegend of Figure3. . Figure3
showstheexpecteddownloadtimeasafunctionofthebun-
dlesize. Forseven n ofthescenarios(1=R2[500¡1100]),
increasingKtoitsoptimalvalue,K=3,leadstoadecrease
in theexpected downloadtime,whilesettingK=1isthe
best strategyfor theremaining g four. . In n each curve, as K
increases the meandownload time first increases,then de-
creasesandfinallyincreasesagain. Theinitialperformance
degradationoccursbecausesmallbundlesmayincreaseser-
vice times without sufficiently increasingthe busy period.
Figure 3alsoshowsthatthebenefitsofbundlingincrease
asthevalueofRdecreases.
4. EXPERIMENTALEVALUATION
In this section, wereport on controlled experiments us-
ingrealBitTorrentclientstovalidatethetwomainconclu-
sions of our model: : 1) ) bundlingimprovesavailability, and
2)bundlingcanreducedownloadtimeswhenpublishersare
highlyunavailable. Weuse e aninstrumented version ofthe
mainlineBitTorrentclient [8] andexperiment withprivate
torrents deployed on Planetlab. . Our r experimental setup
thusemulatesrealisticwide-areanetworkconditions,client
implementationartifacts,andtheimpactofrealisticupload
capacitydistributionsandarrivalpatternsthataredifficult
tocaptureinananalyticmodel.
4.1 Experimentalsetup
Ourexperimentswereconductedusingapproximately150
Planetlab hosts and d two o hosts s at the e University of f Mas-
sachusettsatAmherstoneofwhichisdesignatedasthecon-
trolleroftheexperimentandanotherasaBittorrenttracker.
The controller causes peer arrivals, publisherarrivals,and
publisherdeparturesbydispatchingviasshacommandto
start or stop the BitTorrent client t on arandomly y chosen
unused Planetlab host. . Attheend d of theexperiment,the
controller collects s the remote traces logged d by y the instru-
mented BitTorrent clients. . Each h client’s tracelogs the in-
stantaneousdownloadanduploadrateseverysecondaswell
asthefractionofthefiledownloadeduptothattime.
Experimentalparameters.
Ourexperimentsconsistoftor-
rents that publish either asinglefile ofsize S =4MBor
abundleofK filesofaggregatesize KS. . Thepeer r arrival
rate for abundle is assumed to be the sumof the arrival
rates of its s constituent t files. . The e uplinkcapacity of f each
peer is „ „ = = 33 3 KBps s („ „ = = 50 KBps s in n x4.3). . The e pub-
lisher’suploadcapacityis50KBpsfor individualaswellas
bundled torrents. . There e is only one publisher that t alter-
natesbetweenbeingonandoff. Thepeerarrivalrate‚and
on/offbehaviorofthepublisherarevariedaccordingtothe
experimentalgoalsasdescribedbelow.
4.2 Bundlingimprovesavailability
Ourmodelsuggeststhatbundlingincreasesavailabilityby
increasingthelengthofbusyperiods and therebyreducing
therelianceonastablepublisher. Asan n extremecase,we
consider a a publisher r that initiates aswarmand then goes
offlinenevertocomeback,andlookathowlongtheswarm
remainsavailableafterthepublishergoesoffline.Weensure
thatthepublisherstaysonlinelongenoughforatleastone
peertofullydownloadthefile. Eachpeerleavesthesystem
immediatelyafterdownloadingthefile.
Weset‚=1/150peers/second for eachfileandallother
parameterstotheirdefaultvalues,andstudyhowtheavail-
ability of the publisher-less swarmvaries with thelevelof
bundlingK.
0
5
10
15
20
25
30
35
40
0
200
400
600
800
1000
1200
1400
Completed Downloads
Real Experiment Time (sec.)
K=1
K=2
K=4
K=6
K=8
K=10
K=10
K=6
Figure 4: : Availability y of seedless swarms and the
tradeoffinthechoiceofthebundlesize.
Figure4showsthenumberofpeersservedbetween0and
1500 seconds of the experiment for K=1, , 2, 4, 6, 8 8 and
10. Nopeercompletesitsdownloadinthefirst300seconds
of the experiment: : the e publisher is either waiting for the
firstpeer toarriveorisservingthefirstpeer in each case.
However, when n the e first peer completes s its s download and
thepublishergoesoffline,thecurvesforK=1;2;4exhibit
averydifferenttrend comparedtoK =6;8;10. . For r K =
1;2;4,only asmallnumber ofadditionalpeersare ableto
127
completetheirdownloadbeforepartsofthecontentbecome
unavailable.Ontheotherhand,forK=6;8;10,thenumber
ofcompleteddownloadsincreaseslinearly,i.e.,theswarmis
self-sustainingevenintheabsenceofapublisher.
In steady y state, , the e length h of f time the swarm m remains
self-sustainingafterthepublishergoesofflineisgivenbythe
mean residual busy period, B(m). . To o compute B(m) we
use eq. (13) with „ = 33KBps, s = 4MB and ‚ = 1=150
peers/s. A A threshold d coverage of m m = = 9leads to the fol-
lowingvaluesof B(m)forK=1to8,(0,0,47,569,2816,
8835,256446,75276).Thesevaluescapturethefactthatfor
K‚5theswarmsremainedself-sustainingthroughoutour
measurement.
Althoughthesystemgoesfrombeingunavailabletobeing
availableas K increasesfrom4to6, further increasingK
only results in n increased download d times.
The average
downloadtimeofpeerswhenK=10isroughly66%higher
thanthatforK=6(notshowninFigure4). Thissuggests
adelicatetradeoffinchoosingK—itshouldbelargeenough
tobridgegaps in publisherunavailability,butbeyondthat
point bundling g only y increases s download d times. . We e study
thistradeoffinmoredetailnext.
4.3 Bundlingcanimprovedownloadtime
In this section, we e consider r an intermittently y available
publisher with capacity 100KBps that alternately remains
onandofffor(exponentiallydistributed)meantimesof300s
and900srespectively. Thearrivalrateofpeersforeachfile
is ‚=1=60peers/second and the capacityof eachpeeris
„=50KBps. We e studyhow theaveragedownload time of
peersvarieswiththelevelofbundling.
Figures 5(a)–(c) show peer r arrivals s and departures over
time. Eachlinesegmentstartsattheinstantthatthepeer
arrives and terminates when the e peer r departs. . For r each
valueof K theexperimentlasts for 10runs of1200s each.
Figure 5(a) ) shows that for r K K = = 2, , many peers s complete
their downloads atroughlythesametime. . Theseflashde-
parturesindicatethattheswarmisnotself-sustaining. They
happen becauseextant as well as newly arrivingpeers get
stucksoonafterthepublishergoesoff,andmustwaituntil
thepublisherreappearsandservesthemissingblocksallow-
ingthemtocompletetheirdownloads. Ontheother hand,
settingK=3 (Figure5(b)) ) reduces thelikelihood d of f peers
beingblocked,andsettingK=4(Figure5(c))nearlyelim-
inates blockingastheswarmsustains itself duringperiods
ofpublisherunavailability.
Figure6(a) showsthemeandownloadtime asfunctionof
K. ForK=1and2,themeandownloadtimeremainslarge
asitisdominatedbythetimepeersspendwaitingforthe
publisher. Thelargevarianceisduetothevarianceinthe
downtimeof thepublisher. . WhenK=3,themeandown-
loadtimereducessignificantly,howeverthevarianceremains
large asthedownload timesare stillpartly determinedby
peers waiting g for the publisher r toreappear. . The e optimal
bundlesizeisK=4. Themeanandthemediandownload
timeaswellasthevariancearethelowestforthis valueof
Kasbundlingeliminatesgapsinpublisheravailability. For
values of K>4thedownloadtimeincreases linearly with
respecttoKasthedownloadtimeisdominatedbythetime
toactivelydownloadincreasinglybiggerbundles. Thevari-
ance continues toremainlowasthe swarmis increasingly
self-sustainingwithincreasingK.
4.3.1 Evaluationoftheanalyticalmodel
Next, we e validate our analytical l model l (Section 3.3.3)
against the experimental results s above. . We e compute e the
meandownloadtimeusingTheorem3.3, adapting (14)to
accountforthefact that thereisonlyonepublisherinthe
systemtoobtain
P=
exp
¡R
P
1
i=0
exp(¡
K
2
‚s
)(
K
2
‚s
)
i
i!
B(i;m)
UR+1
(16)
Thederivationoftheformulaisin[10]. Settings=„=80s,
‚=1=60peers/s,1=r=900arrivals/s,u=300sandm=9,
ourmodelpredictstheresultsobservedinFigure6(a)pretty
well. ThemodelleadstoanoptimalbundlesizeofK=5,
whereastheoptimalobservedintheexperimentswasK=4,
andcorrectlycapturesthetrendofthedownloadtimecurve.
4.3.2 Heterogeneousuploadrates
Next, we e repeat t the e above experiment t with h heteroge-
neouspeeruploadcapacities. Theuploadratedistribution
was taken from the e measured d data used to generate Fig-
ure1intheBitTyrantstudy[14]. Theaverageuploadrate
is280KBpsandthemedianis50KBps. Usingrealisticpeer
uploadcapacities doesnot qualitativelychange thebehav-
iorofthesystem(compareFigures6(a)and Figures 6(b)).
However, theoptimal bundle size is s now K K = = 5. . This s is
consistentwiththeincreaseintheaverageuploadcapacity
comparedtothevaluesobtainedfromtheexperimentswith
homogeneous capacities („=50KBps). . The e larger upload
capacity implies thatalarger bundleis neededtoincrease
thelengthofitsbusyperiodssoastomaketheswarmself-
sustainingduringperiodsofpublisherunavailability—acon-
clusionthatagreeswithourmodel.
4.3.3 Heterogeneousfilepopularites
Next,westudytheimpactofbundlingwhendifferentfiles
have different popularities. . We e consider r abundle e of f four
files. Weassumethatthepopularitiesofthefilesinsidethe
bundlearedistributedasfollows:‚
1
=1=8,‚
2
=1=16,‚
3
=
1=24and‚
4
=1=32. Werun5experiments,the e first four
correspondingtoswarms withindividualfiles(experiments
1,2, 3and 4) andthelastone toabundle of all the files
(experiment 5). . In n experimenti i (1 •i• 4) ) we set ‚
i
as
describedabove,andinexperiment5weset‚=
P
4
i=1
i
=
1=3:84.Allotherparametersaresettotheirdefaultvalues.
Themeandownload timesareillustrated inFigure6(c).
Theboxplotsandlines show thedistribution quartilesand
5th and 95th percentiles. . For r the e individual l files, as we
movetotherightinFigure6(c)(i.e.,asthepopularityofthe
filesdecreases)themeandownloadtimeincreases.Whenwe
considerabundleoffourfiles(experiment5,extremerightin
6(c))themeandownloadtimeis405s. Themeandownload
timeof the bundle is largerthanthemean downloadtime
of329s experienced for file 1in isolation butsmaller than
the mean download times for files 2, 3 and 4in isolation.
These results are explained as follows. . File e 1is the most
popularandstandslittletogaininavailability,sothecostof
downloadingmorecontentoutweighstheavailabilitybenefit
ofbundling. However,fortheless s popularfiles2,3and 4,
bundlingreducesthedownloadtimebykeepingtheswarm
self-sustainingduringperiodsofpublisherunavailability. In
summary, if f contents s have different popularities, bundling
128
(a)
(b)
0
50
100
150
200
250
300
350
400
450
0
2000
4000
6000
8000
10000
12000
Peer ID
Real Experiment Time (sec.)
0
100
200
300
400
500
600
0
2000
4000
6000
8000
10000
12000
Peer ID
Real Experiment Time (sec.)
0
100
200
300
400
500
600
700
800
0
2000
4000
6000
8000
10000
12000
Peer ID
Real Experiment Time (sec.)
(c)
Figure 5: : An n intermittent publisher: : (a)K=2;(b) ) K=3;(c)K=4. . Eachline e startswhena peer arrives and
endswhenitleaves. As s K increases,blockingprobability decreases.
0
500
1000
1500
2000
2500
1
2
3
4
5
6
7
8
K
(c)
(b)
0
500
1000
1500
2000
2500
1
2
3
4
5
Download time (sec.)
Experiment
Download time (sec.)
1
2
3
4
5
6
7
8
K
(a)
0
500
1000
1500
2000
2500
3000
Download time (sec.)
model
Figure 6: : Download d timeversus bundling g strategy. . (a) ) exponential l up and down times;(b) ) heterogeneous
upload rates;(c)heterogeneousdemand(‚
i
=
1
8i
,i=1;:::;4),filesbundledin experiment5.
mayincreasethedownloadtimesofpeersdownloadingthe
most popular contents but can benefit those downloading
unpopularfiles. Inthisexample,bundlingslightlyincreases
thedownloadtimesof48%ofpeerswhodownloadthemost
popularcontentbutsignificantlybenefitsthemajorityofthe
population.
4.3.4 Arrivalpatterns
OurmodelaswellasexperimentssofarassumedPoisson
peer arrivals at asteady rate. . To o evaluate the sensitivity
ofour conclusions tothePoissonassumption, werepeated
experimentssimilartothoseinFigure6usingscaledversions
ofrealarrivalpatternsobservedinourmeasurementtraces
collected in x2. . We e found that t using trace-driven n arrivals
doesnotqualitativelychangeourconclusions(referto [10]
fordetails).
However,webelieveourmodel’sconclusionsmaynothold
ifthemeanarrivalrateisnotsteadyforalongenoughdura-
tionoftime. Inparticular,ourmodelwilloverestimatethe
lengthofthebusyperiodandconsequentlyavailabilityifthe
arrivalratedecreasessignificantlybeforetheendofthebusy
period determined d by the current arrival rate. . Neverthe-
less,wefoundasignificantnumberofswarmswithrelatively
steadyarrivalratesinourmeasurementtraces.Forexample,
outofthe1,155swarmsassociatedwiththeTVshow“Lost”,
911werepublishedmorethanonemonthbeforewestarted
ourmeasurement. Figure e 7(a)shows atypicalnew swarm
in its firstmonth and atypicaloldswarmafter twoyears
ofits creation. . Thearrivalratesofoldswarmsshow w much
less variationcomparedtothearrivalratesofnewswarms.
Ourmodelcanbeusedtopredicttheavailability,download
times,andtheimpactofbundlingforsuchswarms.
0
2000
4000
6000
8000
10000
5
10
15
20
25
arrivals
0
5
10
15
20
870
880
890
900
910
920
930
arrivals
days since creation
days since creation
short-lived swarm
long-lived swarm
Figure7: Typicalpeerarrivalpatternsofshort-lived
and long-livedswarms.
5. RELATEDWORK
Alargebodyofpriorworkhasstudiedavailability,perfor-
manceandincentiveissuesinBitTorrent[3]. Toourknowl-
edge,thispaperpresentsthefirstanalyticalmodelforcon-
tent availability in BitTorrent-like swarmingsystems. . We
werealsounabletofindpriorworkstudyingtheavailability
andperformanceimplicationsofbundlinginBitTorrent.
Ramachandranetal.[18]studytheblockedleecherprob-
lem,whereextantaswellasarrivingpeersmayhavetowait
foralongperiodoftimeforsomeblocksofthefilethatare
nolonger available. . Toaddress s the problem, theypropose
BitStore,atoken-basedincentivearchitecturetoobtainthe
missing blocks cached at t other r peers that had previously
downloadedthefile.
Negliaetal. [13]performalarge-scalemeasurementstudy
to investigate availability in n BitTorrent. . They y find d that
trackeravailabilityis aserious enough problemthat many
torrentsusereplicatedorDHT-basedtrackers forfaulttol-
129
erance.Ourfocusisnotontheavailabilityofthetracker(or
controlplane),butoncontentavailability(ordataplane).
BothSusitaivaletal.[20]andWongetal.[22]relatethe
busyperiodoftheM/G/1queuetocontentavailabilityin
BitTorrent. Menasche e et al.[12] modelthe availability of
chunks in n a a swarm using aset t of M/G/1 queues in tan-
dem,underassumptionssimilartothoseofcouponcollector
systems. Ourmodeldiffersfrom[20,22,12]intwowaysas
it1)quantifiescontentavailabilitywhileaccountingforpub-
lisherdynamics,and2)quantifiestheimpactofbundlingon
availabilityanddownloadtime.
Inthecontextofenterpriseswarming,Menascheetal.[11]
studiedthestrategicinteractionbetweenpublishers,whoare
alwaysavailable,andpeers.Publisherscontroltheirpricing
andbundlingstrategieswhilepeersdecidewhichcontentto
download.Theauthorsshowthatinthemonopolythereal-
waysexistauniqueequilibriumbetweenthesinglepublisher
andthepeers.
QiuandSrikant[17]buildinguponearlierworkbyVeciana
et al. . [21] present a fluid d model l toanalyze the download
timeperformanceofBitTorrentinsteady-state.Incontrast,
our model accounts for r both h performance and availability
similar in spirit to o performability [7]. . Anaive e adaptation
ofthefluidmodelin[17]tobundlessuggestsstrictlylonger
downloadtimes under bundling,whereasour modelshows
that bundling can decrease e download d times by improving
availability.
Manyrecentworkshavestudiedperformanceandfairness
ofasingleswarm[8,9,1,4]. Collaborationacrossswarms
was studiedbyGuoet al. . [6]suggestingmanyunexplored
inter-torrent opportunities for block exchanges. . Piatek k et
al. [15] ] suggest that propagatingpeer reputations limited
toonehopcanincentexchangesacross swarms. . Sirivianos
etal.[19]proposeanarchitecturewhereacommercialcon-
tent providerprovides“credits”toincentmorecooperation
between peers. . Bundlingis s complementarytointer-swarm
collaboration based d on n micropayment schemes to o improve
contentavailability. Micropayment t schemesrequire acen-
tralbanktoenabletransactionsandatrackingmechanism
across swarms for peers tolocate eachother. . In n contrast,
bundlesareeasytosetupandrequirenochangetoexisting
trackersorclientsandisalreadyinwidespreaduse.
Economicsofbundling.
Productbundlingisacommoncommercialmarketingstrat-
egy. The e economics s literature e distinguishes s between two
forms of f bundling [5]. . In n pure bundling or r tying, , a a con-
sumercanpurchasetheentirebundleornothingatall. In
mixed bundling,consumershaveachoicetoselectparts of
thepackage.
Bothformsofbundlingexistandhavetheirprosandcons
inBitTorrent’s“bandwidthmarket”aswell.Publisherscan
implementpurebundlingbydistributingbundledcontentas
aziparchive.Byforcingpeerstodownloadthewholebun-
dle,purebundlingmaymakeunpopularfilesmoreavailable,
whilesubsidizingbandwidthcosts forthepublisher. . How-
ever,itcandelaythoseseekingexclusivelypopular filesby
forcingthemtodownloadcontenttheydonotwant.
Mixed bundlingis s more e common n and d can also improve
availability. Publishers s typically y bundle files according to
userinterests,thusbundlingcanserveasamutuallybenefi-
cial recommendation system. . A A user seeking one episode
of a TV V show w may decide to fetch the entire e season n for
possible future viewing. . A A publisher r might recommend a
movieaspartofabundletoauserwhomaypreviewitand
choose to pay for it after all [16, 22]. . Even n asmall frac-
tion of f users s opting g to download d more content t than they
strictlysought can significantlyimprove availability. . Both
mixed and pure e bundling in n BitTorrent have e a beneficial
side-effect: theyreplicateunpopularorrarecontentimplic-
itlyincreasingtheirdurabilityinthelongrun,i.e.,itreduces
thelikelihoodofrarecontentbeinglostpermanently.
Ourworkopensupseveralavenuesoffuturework. First,
bundlingmayincreasethetrafficinthenetwork,whichmo-
tivatesstudyingtheimplicationsofbundlingforISPpricing
aswellasits impacton content locality. . Second,although
our worksheds somelighton whatfiles make goodcandi-
datesforbundling,moreworkisneededtounderstandhow
acontentprovidershouldoptimallybundlefilestomeetper-
formanceorcostobjectives,especiallywhenthedemandfor
abundlemaybedifferentfromtheaggregatedemandforits
constituentfiles.
6. CONCLUSIONS
Peer-to-peer swarming in BitTorrent t scales impressively
to tolerate massive e flash crowds, , but falls s short on avail-
ability. Although h itiscommonlyobserved thatBitTorrent
accountsforuptohalfofallInternettraffictoday,itisless
wellknown that half oftheswarmsare unavailablehalfof
the time—an n observation n that does not bode well for the
increasingcommercialinterestinintegratingswarmingwith
server-basedcontentdissemination. Ourworkisafirststep
towardsdevelopingafoundationalunderstandingofcontent
availabilityinswarmingsystems.
By viewing BitTorrent as s a a queueing g system, , we e were
abletomodelcontentavailability. Themodelsuggeststwo
importantimplications forbundlingofcontent,acommon
practice among swarm m publishers today. . First, , bundling
improves content availability. . Second, , when the publisher
is highly unavailable, bundling g reduces s the download time
experiencedbypeerstoobtainunpopularcontent. Thelat-
ter implication is particularly intriguingas peers take less
timetodownloadmorecontent. Althoughthemodelmakes
severalsimplifyingassumptions,wewereabletoempirically
validateitsconclusionsthroughlarge-scalecontrolledexper-
imentswiththemainlineBitTorrentclientoverPlanetlab.
7. ACKNOWLEDGEMENT
WethankPeterKey,JohnLui,GiovanniNegliaandShlomo
Zilberstein for fruitful discussion on n bundling. . This s work
was supported in part by y the NSF under award d numbers
CNS-0519922andCNS-0721779. Research h ofD.Menasche
alsofundedbyascholarshipfromCAPES/Fulbright(Brazil).
8. REFERENCES
[1] Bharambe, A. R.,Herley,C.,and
Padmanabhan,V.N.Someobservationson
Bittorrentperformance.InSIGMETRICS (2005).
[2] Browne,S., andSteele,J.Transientbehaviorof
coverageprocesseswithapplicationstotheinfinite
serverqueue.J.Appl.Prob.30(3)(1993),589–601.
[3] Cohen,B.IncentivesbuildrobustnessinBittorrent.
InP2PEcon (2003).
130
Documents you may be interested
Documents you may be interested