system that prescribes how to translate 128 characters into single bytes (where the first bit of
each byte is necessarily 0). The ASCII characters include the upper and lower case letters of the
Latin alphabet (a-z, A-Z), Arabic numerals (0-9), a number of punctuation characters and a
number of invisible so-called control characters such as newline and carriage return.
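To make the byte-level picture concrete, the following minimal sketch inspects the bytes behind an ASCII string using base R's charToRaw and utf8ToInt. The example string and variable names are illustrative only.

```r
# Inspect the bytes behind a plain ASCII string.
x <- "Hello, R!"
bytes <- charToRaw(x)        # one raw byte per ASCII character
codes <- as.integer(bytes)   # the same bytes as integers

print(bytes)

# Every ASCII code fits in seven bits, so each value is below 128
# and the leading bit of each byte is 0.
all_ascii <- all(codes < 128L)
print(all_ascii)

# Control characters are ASCII too: newline and carriage return.
control_codes <- utf8ToInt("\n\r")
print(control_codes)
```

Applying the same check to a string with accented or non-Latin characters would show byte values of 128 or higher.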
Although it is widely used, ASCII is obviously incapable of encoding characters outside the
Latin alphabet, so you can say ``hello'', but not ``γεια σας'' in this encoding. For this reason, a
number of character encoding systems have been developed that extend ASCII or replace it
altogether. Some well-known schemes include UTF-8 and latin1. The character encoding
scheme that is used by default by your operating system is defined in your locale settings.
Most Unix-alikes use UTF-8 by default, while older Windows applications, including the Windows
version of R, use latin1. The UTF-8 encoding standard is widely used to encode web pages:
according to a frequently repeated survey of w3techs, about 75% of the 10 million most
visited web pages are encoded in UTF-8.
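As an illustration of how the same character is stored differently under two schemes, the sketch below encodes an accented letter in UTF-8 and in latin1 and compares the bytes. It relies on base R's enc2utf8, iconv and charToRaw. The character is written as a Unicode escape so that the snippet itself stays ASCII, and the variable names are illustrative.

```r
# "\u00e9" is the letter e with an acute accent.
x <- "\u00e9"

# In UTF-8 the character takes two bytes.
utf8_bytes <- charToRaw(enc2utf8(x))

# Converted to latin1, the same character takes a single byte.
latin1_bytes <- charToRaw(iconv(x, from = "UTF-8", to = "latin1"))

print(utf8_bytes)
print(latin1_bytes)
```

A program that interprets one of these byte sequences under the other scheme displays the wrong characters, which is why the encoding of a file has to be known when reading it.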
You can find out the character encoding of your system by typing (not copy-pasting!) a
non-ASCII character and asking for the encoding scheme, like so.
Encoding("Queensrÿche")
## [1] "unknown"
If the answer is "unknown", this means that the local native encoding is used. The default
encoding used by your OS can be requested by typing
Sys.getlocale("LC_CTYPE")
## [1] "en_US.UTF-8"
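As a complementary check, base R's l10n_info() reports whether the native encoding of the current session is UTF-8. The sketch below combines it with Sys.getlocale. The values returned depend on the system, so no particular output is assumed, and the variable names are illustrative.

```r
# Query the locale that determines R's native encoding.
ctype <- Sys.getlocale("LC_CTYPE")   # value is system-dependent
info  <- l10n_info()                 # list describing the current locale

# TRUE when the session runs in a UTF-8 locale.
native_is_utf8 <- isTRUE(info[["UTF-8"]])

print(ctype)
print(native_is_utf8)
```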
For R to be able to correctly read in a text file, it must understand which character encoding
scheme was used to store it. By default, R assumes that a text file is stored in the encoding
scheme defined by the operating system's locale setting. This may fail when the file was not
generated on the same computer that R is running on but was obtained from the web, for
example. To make things worse, it is impossible to determine automatically with certainty from
a file what encoding scheme has been used (although for some encodings it is possible). This
means that you may run into situations where you have to tell R explicitly in which encoding a file
has been stored. Once a file has been read into R, a character vector will internally be translated
to either UTF-8 or latin1.
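A partial check is possible in some cases. The sketch below uses base R's validUTF8 (available in recent versions of R) to test whether a byte sequence is well-formed UTF-8. The two example strings are constructed by hand for illustration: the second one contains the latin1 byte for an accented e, which is not valid UTF-8 on its own.

```r
# The word "cafe" with an accented final letter, stored in two encodings.
utf8_text   <- "caf\u00e9"                                   # UTF-8 bytes
latin1_text <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))  # latin1 bytes

# validUTF8 checks the bytes, not the declared encoding.
is_valid <- validUTF8(c(utf8_text, latin1_text))
print(is_valid)
```

A failed check rules out UTF-8, but a passed check proves little: pure ASCII text, for example, is valid under ASCII, latin1 and UTF-8 alike.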
The fileEncoding argument of read.table and its relatives tells R what encoding scheme
was used to store the file. For readLines the file encoding must be specified when the file is
opened, before calling readLines, as in the example below.
# 1. Open a connection to your file, specifying its encoding.
f <- file("myUTF16file.txt", encoding = "UTF-16")
# 2. Read the data with readLines. Text read from the file is converted to
# UTF-8 or latin1.
input <- readLines(f)
# 3. Close the file connection.
close(f)
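For tabular data, the fileEncoding argument mentioned above plays the same role. The following self-contained sketch writes a small table to a temporary file in latin1 and reads it back with read.csv. The data frame and variable names are made up for illustration, and the round trip is only expected to preserve the accented names when the session's native encoding can represent them (for example, a UTF-8 locale).

```r
# A small table with non-ASCII names, written as Unicode escapes.
df <- data.frame(name = c("Zo\u00eb", "Ren\u00e9"),
                 age  = c(31, 45),
                 stringsAsFactors = FALSE)

# Write the table to a temporary file, stored in latin1.
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE, fileEncoding = "latin1")

# Read it back, telling R which encoding the file was stored in.
back <- read.csv(tmp, fileEncoding = "latin1", stringsAsFactors = FALSE)
unlink(tmp)

print(back)
```

Omitting fileEncoding when reading makes R fall back on the locale's encoding, which gives garbled names whenever that encoding differs from the one the file was written in.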
When reading the file, R will not translate the encoding to UTF-8 or latin1 by itself, but instead
relies on an external iconv library. Depending on the operating system, R either uses the
conversion service offered by the OS, or uses a third-party library included with R. R's iconv
function allows users to translate character representations, because of the OS-dependencies