An Introduction to Bioconductor’s ExpressionSet Class
Seth Falcon, Martin Morgan, and Robert Gentleman
6 October, 2006; revised 9 February, 2007
1 Introduction
Biobase is part of the Bioconductor project, and is used by many other packages. Biobase
contains standardized data structures to represent genomic data. The ExpressionSet class is
designed to combine several different sources of information into a single convenient structure.
An ExpressionSet can be manipulated (e.g., subsetted, copied) conveniently, and is the input
or output from many Bioconductor functions.
The data in an ExpressionSet is complicated, consisting of expression data from microarray
experiments (assayData; assayData is used to hint at the methods used to access different
data components, as we will see below), ‘meta-data’ describing samples in the experiment
(phenoData), annotations and meta-data about the features on the chip or technology used
for the experiment (featureData, annotation), information related to the protocol used
for processing each sample (and usually extracted from manufacturer files, protocolData),
and a flexible structure to describe the experiment (experimentData). The ExpressionSet
class coordinates all of this data, so that you do not usually have to worry about the
details. However, an ExpressionSet needs to be created in the first place, and creation can be
complicated.
In this introduction we learn how to create and manipulate ExpressionSet objects, and
practice some basic R skills.
2 Preliminaries
2.1 Installing Packages
If you are reading this document and have not yet installed any software on your computer,
visit http://bioconductor.org and follow the instructions for installing R and Bioconductor.
Once you have installed R and Bioconductor, you are ready to go with this document.
In the future, you might find that you need to install one or more additional packages. The
best way to do this is to start an R session and evaluate commands like
> source("http://bioconductor.org/biocLite.R")
> biocLite(c("Biobase"))
2.2 Loading Packages
The denition of the ExpressionSet class along with many methods for manipulating Ex-
pressionSet objects are dened in the Biobase package. In general, youneed to loadclass
andmethoddenitionsbeforeyouusethem. WhenusingBioconductor,thismeansloading
Rpackagesusinglibraryorrequire.
> library("Biobase")
Exercise 1
What happens when you try to load a package that is not installed?
When using library, you get an error message. With require, the
return value is FALSE and a warning is printed.
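The difference between library and require described in the answer above can be checked directly. The following is a minimal sketch that uses only base R; the package name is a made-up string that is assumed not to be installed on your system.

```r
# Hypothetical package name, assumed NOT to be installed.
pkg <- "notARealPackage12345"

# require() returns FALSE (and normally prints a warning) when the
# package cannot be found; the warning is silenced here.
ok <- suppressWarnings(require(pkg, character.only = TRUE))

# library() signals an error instead; tryCatch() turns it into a value.
res <- tryCatch({
    library(pkg, character.only = TRUE)
    "loaded"
}, error = function(e) "error")

ok
res
```

Because require returns a logical value, it is convenient inside functions that need to test for an optional package, whereas library is the usual choice at the top of a script, where a missing package should stop execution.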
3 Building an ExpressionSet From .CEL and other files
Many users have access to .CEL or other files produced by microarray chip manufacturer
hardware. Usually the strategy is to use a Bioconductor package such as affyPLM, affy, oligo,
or limma, to read these files. These Bioconductor packages have functions (e.g., ReadAffy,
expresso, or justRMA in affy) to read CEL files and perform preliminary preprocessing, and
to represent the resulting data as an ExpressionSet or other type of object. Suppose the
result from reading and preprocessing CEL or other files is named object, and object is
different from ExpressionSet; a good bet is to try, e.g.,
> library(convert)
> as(object, "ExpressionSet")
It might be the case that no converter is available. The path then is to extract relevant data
from object and use this to create an ExpressionSet using the instructions below.
4 Building an ExpressionSet From Scratch
As mentioned in the introduction, the data from many high-throughput genomic experiments,
such as microarray experiments, usually consist of several conceptually distinct parts:
assay data, phenotypic meta-data, feature annotations and meta-data, and a description of
the experiment. We’ll construct each of these components, and then assemble them into an
ExpressionSet.
4.1 Assay data
One important part of the experiment is a matrix of ‘expression’ values. The values are
usually derived from microarrays of one sort or another, perhaps after initial processing by
manufacturer software or Bioconductor packages. The matrix has F rows and S columns,
where F is the number of features on the chip and S is the number of samples.
A likely scenario is that your assay data is in a ‘tab-delimited’ text file (as exported from
a spreadsheet, for instance) with rows corresponding to features and columns to samples.
The strategy is to read this file into R using the read.table command, converting the result
to a matrix. A typical command to read a tab-delimited file that includes column ‘headers’
is
> dataDirectory <- system.file("extdata", package="Biobase")
> exprsFile <- file.path(dataDirectory, "exprsData.txt")
> exprs <- as.matrix(read.table(exprsFile, header=TRUE, sep="\t",
+                               row.names=1,
+                               as.is=TRUE))
Thersttwolinescreatealepathpointingtowheretheassaydataisstored;replacethese
withacharacterstringpointingtoyourownle,e.g,
> exprsFile <- "c:/path/to/exprsData.txt"
(Windows users: note the use of / rather than \; this is because R treats the \ character as
an ‘escape’ sequence to change the meaning of the subsequent character). See the help pages
for read.table for more detail. A common variant is that the character separating columns
is a comma ("comma-separated values", or "csv" files), in which case the sep argument might
be sep=",".
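As an illustration of the comma-separated variant, the sketch below writes a tiny made-up file to a temporary location and reads it back as a matrix. The file contents and the names csvFile and toyExprs are assumptions for this example only; they are not part of the Biobase example data.

```r
# Write a small, made-up comma-separated file to a temporary location.
csvFile <- tempfile(fileext = ".csv")
writeLines(c("probe,A,B,C",
             "f1,1.5,2.5,3.5",
             "f2,4.0,5.0,6.0"), csvFile)

# Same call as for tab-delimited input, but with sep=",".
# row.names=1 uses the first column (feature identifiers) as row names.
toyExprs <- as.matrix(read.table(csvFile, header = TRUE, sep = ",",
                                 row.names = 1, as.is = TRUE))

dim(toyExprs)        # features by samples
colnames(toyExprs)   # sample names taken from the header line
```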
It is always important to verify that the data you have read matches your expectations.
At a minimum, check the class and dimensions of exprs and take a peek at the first
several rows:
> class(exprs)
[1] "matrix"
> dim(exprs)
[1] 500 26
> colnames(exprs)
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O"
[16] "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
> head(exprs[,1:5])
                       A         B        C        D        E
AFFX-MurIL2_at  192.7420  85.75330 176.7570 135.5750 64.49390
AFFX-MurIL10_at  97.1370 126.19600  77.9216  93.3713 24.39860
AFFX-MurIL4_at   45.8192   8.83135  33.0632  28.7072  5.94492
AFFX-MurFAS_at   22.5445   3.60093  14.6883  12.3397 36.86630
AFFX-BioB-5_at   96.7875  30.43800  46.1271  70.9319 56.17440
AFFX-BioB-M_at   89.0730  25.84610  57.2033  69.9766 49.58220
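Checks like these can also be collected into a single scripted test, which is useful when the same import is repeated for several files. The sketch below is one possible set of checks on a small made-up matrix; the particular checks chosen and the name toyExprs are assumptions, not requirements imposed by Biobase.

```r
# Made-up stand-in for an expression matrix read from a file.
toyExprs <- matrix(c(192.7, 97.1, 85.8, 126.2), nrow = 2,
                   dimnames = list(c("f1", "f2"), c("A", "B")))

# Each element is TRUE when the corresponding expectation holds.
checks <- c(isMatrix       = is.matrix(toyExprs),
            isNumeric      = is.numeric(toyExprs),
            noMissing      = !anyNA(toyExprs),
            uniqueFeatures = !anyDuplicated(rownames(toyExprs)),
            uniqueSamples  = !anyDuplicated(colnames(toyExprs)))

# Stop with an error if any expectation is violated.
stopifnot(all(checks))
checks
```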
At this point, we can create a minimal ExpressionSet object using the ExpressionSet
constructor:
> minimalSet <- ExpressionSet(assayData=exprs)
We’ll get more benefit from expression sets by creating a richer object that coordinates
phenotypic and other data with our expression data, as outlined in the following sections.
4.2 Phenotypic data
Phenotypic data summarizes information about the samples (e.g., sex, age, and treatment
status; referred to as ‘covariates’). The information describing the samples can be represented
as a table with S rows and V columns, where V is the number of covariates. An example of
phenotypic data can be input with
> pDataFile <- file.path(dataDirectory, "pData.txt")
> pData <- read.table(pDataFile,
+                     row.names=1, header=TRUE, sep="\t")
> dim(pData)
[1] 26 3
> rownames(pData)
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O"
[16] "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
> summary(pData)
    gender        type        score
 Female:11   Case   :15   Min.   :0.1000
 Male  :15   Control:11   1st Qu.:0.3275
                          Median :0.4150
                          Mean   :0.5369
                          3rd Qu.:0.7650
                          Max.   :0.9800
There are three columns of data, and 26 rows. Note that the number of rows of phenotypic
data matches the number of columns of expression data, and indeed that the row and column
names are identically ordered:
> all(rownames(pData)==colnames(exprs))
[1] TRUE
This is an essential feature of the relationship between the assay and phenotype data;
ExpressionSet will complain if these names do not match.
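When the two sets of names contain the same identifiers but in a different order, the phenotype table can be reordered by name before the ExpressionSet is built. The following is a minimal base R sketch on made-up data; toyExprs and toyPheno are hypothetical stand-ins for exprs and pData.

```r
# Made-up stand-ins for the assay matrix and the phenotype table.
toyExprs <- matrix(1:6, nrow = 2,
                   dimnames = list(c("f1", "f2"), c("A", "B", "C")))
toyPheno <- data.frame(score = c(0.3, 0.1, 0.2),
                       row.names = c("C", "A", "B"))  # deliberately shuffled

# Same set of names, but not identically ordered.
setequal(rownames(toyPheno), colnames(toyExprs))
all(rownames(toyPheno) == colnames(toyExprs))

# Reorder the phenotype rows by name so they follow the assay columns.
# drop=FALSE keeps the result a data.frame even with a single column.
toyPheno <- toyPheno[colnames(toyExprs), , drop = FALSE]
aligned <- all(rownames(toyPheno) == colnames(toyExprs))
aligned
```

Indexing by name rather than by position makes the reordering safe: each phenotype row stays attached to its own sample identifier.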
Phenotypic data can take on a number of different forms. For instance, some covariates
might reasonably be represented as numeric values. Other covariates (e.g., gender, tissue
type, or cancer status) might better be represented as factor objects (see the help page
for factor for more information). It is especially important that the phenotypic data are
encoded correctly; the colClasses argument to read.table can be helpful in correctly
inputting (and ignoring, if desired) columns from the file.
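The sketch below shows one way colClasses might be used: each column of a small made-up file is given an explicit class, and the class "NULL" causes a column to be skipped. The file contents, the column called notes, and the name toyPheno are assumptions for illustration.

```r
# Write a small, made-up tab-delimited phenotype file.
phenoFile <- tempfile(fileext = ".txt")
writeLines(c("sample\tgender\tnotes\tscore",
             "A\tFemale\tfirst batch\t0.75",
             "B\tMale\tsecond batch\t0.40"), phenoFile)

# One colClasses entry per column in the file, including the column
# used for row names; "NULL" drops the free-text notes column.
toyPheno <- read.table(phenoFile, header = TRUE, sep = "\t", row.names = 1,
                       colClasses = c("character", "factor", "NULL", "numeric"))

sapply(toyPheno, class)
```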
Exercise 2
What class does read.table return?
Exercise 3
Determine the column names of pData. Hint: apropos("name").
> names(pData)
[1] "gender" "type"   "score"
Exercise 4
Use sapply to determine the classes of each column of pData. Hint: read the help page for
sapply.
> sapply(pData, class)
   gender      type     score
 "factor"  "factor" "numeric"
Exercise 5
What is the sex and Case/Control status of the 15th and 20th samples? And for the sample(s)
with score greater than 0.8?
> pData[c(15, 20), c("gender", "type")]
  gender type
O Female Case
T Female Case
> pData[pData$score>0.8,]
  gender    type score
E Female    Case  0.93
G   Male    Case  0.96
X   Male Control  0.98
Y Female    Case  0.94
Investigatorsoftenndthatthemeaningofsimplecolumnnamesdoesnotprovideenough
information about thecovariate {Whatis the cryptic name supposed torepresent? What
unitsarethecovariatesmeasuredin? Wecancreateadataframecontainingsuchmeta-data
(orreadthe informationfrom aleusingread.table)with
> metadata <- data.frame(labelDescription=
+                        c("Patient gender",
+                          "Case/control status",
+                          "Tumor progress on XYZ scale"),
+                        row.names=c("gender", "type", "score"))
This creates a data.frame object with a single column called labelDescription, and with
row names identical to the column names of the data.frame containing the phenotypic data.
The column labelDescription must be present; other columns are optional.
Bioconductor’s Biobase package provides a class called AnnotatedDataFrame that
conveniently stores and manipulates the phenotypic data and its metadata in a coordinated
fashion. Create and view an AnnotatedDataFrame instance with:
> phenoData <- new("AnnotatedDataFrame",
+                  data=pData, varMetadata=metadata)
> phenoData
An object of class AnnotatedDataFrame
  rowNames: A B ... Z (26 total)
  varLabels: gender type score
  varMetadata: labelDescription
Some useful operations on an AnnotatedDataFrame include sampleNames, pData (to extract
the original pData data.frame), and varMetadata. In addition, AnnotatedDataFrame objects
can be subset much like a data.frame:
> head(pData(phenoData))
  gender    type score
A Female Control  0.75
B   Male    Case  0.40
C   Male Control  0.73
D   Male    Case  0.42
E Female    Case  0.93
F   Male Control  0.22
> phenoData[c("A","Z"),"gender"]
An object of class AnnotatedDataFrame
  rowNames: A Z
  varLabels: gender
  varMetadata: labelDescription
> pData(phenoData[phenoData$score>0.8,])
  gender    type score
E Female    Case  0.93
G   Male    Case  0.96
X   Male Control  0.98
Y Female    Case  0.94
4.3 Annotations and feature data
Meta-data on features is as important as meta-data on samples, and can be very large and
diverse. A single chip design (i.e., collection of features) is likely to be used in many different
experiments, and it would be inefficient to repeatedly collect and coordinate the same
meta-data for each ExpressionSet instance. Instead, the idea is to construct specialized meta-data
packages for each type of chip or instrument. Many of these packages are available from the
Bioconductor web site. These packages contain information such as the gene name, symbol
and chromosomal location. There are other meta-data packages that contain the information
that is provided by other initiatives such as GO and KEGG. The annotate and AnnotationDbi
packages provide basic data manipulation tools for the meta-data packages.
The appropriate way to create annotation data for features is very straightforward: we
provide a character string identifying the type of chip used in the experiment. For instance,
the data we are using is from the Affymetrix hgu95av2 chip:
> annotation <- "hgu95av2"
It is also possible to record information about features that are unique to the experiment
(e.g., flagging particularly relevant features). This is done by creating or modifying an
AnnotatedDataFrame like that for phenoData but with row names of the AnnotatedDataFrame
matching rows of the assay data.
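A sketch of such experiment-specific feature data is given below. It uses a small made-up assay matrix rather than the example data, and the covariate name flagged, its description, and the object names are hypothetical choices for illustration. It assumes the Biobase package is installed.

```r
library(Biobase)

# Made-up assay data: 3 features (rows) by 2 samples (columns).
toyExprs <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
                   dimnames = list(c("f1", "f2", "f3"), c("A", "B")))

# Feature-level covariates; row names match the rows of the assay data.
fdata <- data.frame(flagged = c(TRUE, FALSE, TRUE),
                    row.names = rownames(toyExprs))
fmeta <- data.frame(labelDescription = "Feature of particular interest",
                    row.names = "flagged")
toyFeatures <- new("AnnotatedDataFrame", data = fdata, varMetadata = fmeta)

# Supply the feature data when constructing the ExpressionSet.
toySet <- ExpressionSet(assayData = toyExprs, featureData = toyFeatures)

# fData() extracts the feature-level data.frame.
flaggedNames <- featureNames(toySet)[fData(toySet)$flagged]
flaggedNames
```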
4.4 Experiment description
A basic description of the experiment (e.g., the investigator or lab where the experiment
was done, an overall title, and other notes) can be recorded by creating a MIAME object.
One way to create a MIAME object is to use the new function:
> experimentData <- new("MIAME",
+                       name="Pierre Fermat",
+                       lab="Francis Galton Lab",
+                       contact="pfermat@lab.not.exist",
+                       title="Smoking-Cancer Experiment",
+                       abstract="An example ExpressionSet",
+                       url="www.lab.not.exist",
+                       other=list(
+                         notes="Created from text files"
+                       ))
Usually, new takes as arguments the class name and pairs of names and values corresponding
to different slots in the class; consult the help page for MIAME for details of available slots.
4.5 Assembling an ExpressionSet
An ExpressionSet object is created by assembling its component parts and calling the
ExpressionSet constructor:
> exampleSet <- ExpressionSet(assayData=exprs,
+                             phenoData=phenoData,
+                             experimentData=experimentData,
+                             annotation="hgu95av2")
Note that the names on the right of each equal sign can refer to any object of appropriate
class for the argument. See the help page for ExpressionSet for more information.
We created a rich data object to coordinate diverse sources of information. Less rich
objects can be created by providing less information. As mentioned earlier, a minimal
expression set can be created with
> minimalSet <- ExpressionSet(assayData=exprs)
Of course this object has no information about phenotypic or feature data, or about the chip
used for the assay.
5 ExpressionSet Basics
Now that you have an ExpressionSet instance, let’s explore some of the basic operations.
You can get an overview of the structure and available methods for ExpressionSet objects
by reading the help page:
> help("ExpressionSet-class")
When you print an ExpressionSet object, a brief summary of the contents of the object
is displayed (displaying the entire object would fill your screen with numbers):
> exampleSet
ExpressionSet (storageMode: lockedEnvironment)
assayData: 500 features, 26 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: A B ... Z (26 total)
  varLabels: gender type score
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hgu95av2
5.1 Accessing Data Elements
A number of accessor functions are available to extract data from an ExpressionSet instance.
You can access the columns of the phenotype data (an AnnotatedDataFrame instance) using
$:
> exampleSet$gender[1:5]
[1] Female Male   Male   Male   Female
Levels: Female Male
> exampleSet$gender[1:5] == "Female"
[1] TRUE FALSE FALSE FALSE TRUE
You can retrieve the names of the features using featureNames. For many microarray
datasets, the feature names are the probe set identifiers.
> featureNames(exampleSet)[1:5]
[1] "AFFX-MurIL2_at" "AFFX-MurIL10_at" "AFFX-MurIL4_at"
[4] "AFFX-MurFAS_at" "AFFX-BioB-5_at"
The unique identiers of the samples in the data set are available via the sampleNames
method. ThevarLabels methodlists thecolumnnames ofthephenotypedata:
> sampleNames(exampleSet)[1:5]
[1] "A" "B" "C" "D" "E"
> varLabels(exampleSet)
[1] "gender" "type"   "score"
Extract the expression matrix of sample information using exprs:
> mat <- exprs(exampleSet)
> dim(mat)
[1] 500 26
5.1.1 Subsetting
Probably the most useful operation to perform on ExpressionSet objects is subsetting.
Subsetting an ExpressionSet is very similar to subsetting the expression matrix that is contained
within the ExpressionSet: the first argument subsets the features and the second argument
subsets the samples. Here are some examples. Create a new ExpressionSet consisting of the
first 5 features and the first 3 samples:
> vv <- exampleSet[1:5, 1:3]
> dim(vv)
Features  Samples
       5        3
> featureNames(vv)
[1] "AFFX-MurIL2_at" "AFFX-MurIL10_at" "AFFX-MurIL4_at"
[4] "AFFX-MurFAS_at" "AFFX-BioB-5_at"
> sampleNames(vv)
[1] "A" "B" "C"
Create a subset consisting of only the male samples:
> males <- exampleSet[ , exampleSet$gender == "Male"]
> males
ExpressionSet (storageMode: lockedEnvironment)
assayData: 500 features, 15 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: B C ... X (15 total)
  varLabels: gender type score
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hgu95av2
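Feature and sample subsetting can be combined in one step, and several phenotype conditions can be joined with the usual logical operators. The sketch below illustrates this on a small made-up ExpressionSet so that the result is easy to follow by eye; the data values and object names are assumptions, and the Biobase package is assumed to be installed.

```r
library(Biobase)

# Made-up assay data: 3 features by 4 samples.
toyExprs <- matrix(as.numeric(1:12), nrow = 3,
                   dimnames = list(c("f1", "f2", "f3"),
                                   c("A", "B", "C", "D")))

# Made-up phenotype data, one row per sample, in the same order.
toyPheno <- data.frame(gender = factor(c("Female", "Male", "Male", "Female")),
                       score  = c(0.90, 0.40, 0.85, 0.20),
                       row.names = colnames(toyExprs))

toySet <- ExpressionSet(assayData = toyExprs,
                        phenoData = AnnotatedDataFrame(toyPheno))

# Rows selected by feature name, columns by a phenotype condition.
sub <- toySet[c("f1", "f3"), toySet$gender == "Male" & toySet$score > 0.8]

dim(sub)
sampleNames(sub)
exprs(sub)
```

The phenotype data is subset along with the expression values, so pData(sub) describes exactly the samples that remain.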