




版權說明:本文檔由用戶提供并上傳,收益歸屬內容提供方,若內容存在侵權,請進行舉報或認領
文檔簡介
-
PAGE
1
-
ASimplifiedParallelDecodingSchemeforTurboCodes
WangLe,HuangYu,YangHongwen
WirelessCommunicationCenter,BeijingUniversityofPostsandTelecommunications,Beijing,China(100876)
Abstract
ParalleldecodingofTurbocodesisvitaltotheapplicationswithveryhighdatarates.Intheoverlappingparalleldecodingscheme,thedecodingthroughputcannotincreaselinearlywiththenumberofsegments.Aschemewherethethroughputcanincreaselinearlywasproposed,wheretheboundaryinformationofeachiterationisstoredtobetheinitialconditioninnextiteration.Inthispaper,wehavesimplifiedtheschemebyonlystoringtheindexofthemostprobablestateoftheboundary.Withthecostofsomesmallperformanceloss,theschemecanreducelargelytheextramemoryrequiredtostoretheboundaryinformation.
Keywords: Turbocodes;MAPalgorithm;paralleldecoding
Introduction
Withtheever-increasingdemandofthedatarate,turbocodeisfacingstringentchallenges.ITUsuggeststhatthedatarateof4Gsystembeatleast100Mbps,inmobileconditions.Asthelocaloperatingclockfrequencyinhardwaresystemistypicallywithin100MHznowadays,itisreported[1]thatthethroughputofonly6.17Mbpscanbeachievedforturbocodeswithconventionalserialdecoding[2].Obviously,thedecodingalgorithmofturbocodehastobeimprovedsignificantlytosurviveinthefuturecommunicationsystems.
Thebottleneckofthethroughputliesinitsdecodingscheme.Theconvolutionalcodesistypicallyemployedastheconstituentcodes,leadingtothedecodingalgorithmemployssomerecursiveschemesbasedontrellis,suchassymbol-by-symbolMaximumAPosterioriprobability(MAP)algorithm[3].Itrequireslotsofrecursivecomputationsinbothcomponentdecoders.Moreover,thedecodingprocedureisiterative.Hence,thedecodingdelayisquitelarge.
Toreducethedecodingdelayandimprovethedecodingthroughput,theparalleldecodingmethodwasproposed[4]bydividingthewholeframeintomultiplesub-blocksanddecodingthesegmentswithabankofidenticalsub-blockMAPprocessors.Withtheparalleldecodingscheme,thethroughputcanbeimprovedbyafactorofaboutK,thenumberofsub-blockssegmented.Underthesameconditionsasmentionedpreviously,thethroughputof100MbpscanbeachievedwithK=16.
However,asaresultofthesegmentation,eachsub-blocksuffersthelostofboundaryinformation(initialconditions),whichisvitalfortheperformance.Togettheboundaryinformation,amethodwithsegmentsoverlappingisproposed[4],bycomputingsomemoreredundanttrellisstages.However,suchoverlappingsufferssomeextracomputationaldelay,andleadstolowerdecodingspeed.TheactualspeedupfactorissmallerthanK,especiallywhenKislargeandtheframelengthissmall.Anotherwaytogettheboundarydistributioninformationwasproposedin[5].Itsuggeststousetheforwardandbackwardvariablescomputedinthepreviousiterationtoprovideanapproximationtotheboundary.Simulationresultsshowthatitsperformanceisveryclosetotheserialdecodingschemein[2].Thecostofthismethodissomeadditionalmemoryneededtostoretheintermediateboundaryinformation.
Toreducethesizeoftheadditionalmemoryin[5],wesuggestthatinsteadofthewholeboundarydistributioninformation,onlytheindexofthemostprobablestatebestored.Withthissimplifiedstoragescheme,wecansavealotofmemoryatthecostofsomeacceptableperformanceloss.
Therestofthepaperisorganizedasfollows.First,section2givesabriefdescriptionofthestandardserialdecodingtoshowitsweakpointindecodingspeed.Section3showsthetwoparalleldecodingmethodsproposedin[4-5].Ourproposedschemeisgiveninsection4andsection5isthesimulationresults.Finally,section6concludesthepaper.
SerialdecodingschemeofTurbocodes
TurbocodesintroducedbyBerrouetal[2]haveanear-Shannoncapability,thusbeingveryattractivetovariousapplications.However,thelargedecodingdelayprohibitsitsapplications,especiallyforthosetimeconstraintorhighdatarateservices.
ThedecoderofTurbocodesemploysaseriallyconcatenatedstructure,whichconsistsoftwocomponentdecoders,asshowninFig.1.Thedecodingproceedsinaniterativemanner.Ineachiteration,thefirstcomponentdecoder,correspondingtothefirstcomponentencoder,deliversitscalculatedextrinsicinformation(Le1)tothe2ndcomponentdecoderafterinterleavingtoserveastheapriori
information(La2).Thenthesecondcomponentdecoderfeedbacksitsextrinsicinformation(Le2)tothefirstoneafterde-interleaving,workingasaprioriinformation(La1).xandxinFig.1are
receivedinformationbitsanditsinterleavedversion,respectively.The
y1and
y2arereceived
redundantbitsproducedbythecomponentRecursiveSystematicConvolutional(RSC)encoders,respectively.
La1x
y1
Le1
La2
x
y2
Le2
LLRs
Figure1.DecoderofTurboCodes
ThefunctionofthecomponentdecoderistocalculatetheLLRs(LogarithmofLikelihoodRatio)fortheinformationbits.ThecalculationofLLRsgenerallyemploystheMAPalgorithm[3]wheretheLLRsfortheithinformationbitofalengthNframecanbeexpressedas
i(s)i(s)i(s,1)
LLRiLog
1
i(s)i(s)i(s,0)
0
(1)
wherethesummationinnumeratoranddenominatorisoverthesetofbranchesintrellisdiagram,inwhichtheinformationbitis1or0,respectively.NotethatwearenotgoingtoexplainthedetailsoftheMAPalgorithmoritsvariants;rather,wesimplyshowtherecursivenatureinthealgorithmsthat
prohibitthedecodingspeed.Therearethreekindsofvariablesin(1).
i(s)
and
i(s)
denotethe
forwardandbackwardvariablesintheithtrellisstage,
is,d
isthebranchmetriccorresponding
thebranchwhichstartsfromstatesandtheinformationbitisd.
Thei
in(1)canbedirectlycalculatedwithchannelinputsandtheaprioriinputswithinoneclock
cycle,providedthatalltheinputsnecessaryarereadytouse.Itistherequiresrecursivecalculationasshownin(2)and(3)forbinaryRSC:
i(s)
and
i(s)
which
i(s)i1(s1)i(s1,1)i1(s0)i(s0,0),
i1,2...N1
(2)
1
1
0
0
i(s)i1(s)i1(s,1)i1(s)i1(s,0),
iN1,N2...1
(3)
wheresd
isthepreviousstatecomingtostateswiththeinputinformationbitsasd.
isthe
s
d
arrivingstateviathecurrentstateswithinputbitasd.Theinitialconditionfortherecursivecalculationis
0
0
(s)1
(s)1
s0
else
s0
(4)
(5)
0
N else
Notethattheinitialcondition(5)onlyappliesfortheRSCwhichisterminated[6].IfRSCisopen[2],theinitialconditionshouldbe
1
N(s)2m, s
(6)
where2misthenumberofthetrellisstates.ThewholeprocessisshowninFig.2.
LLRs
Figure2.theoperationflowchartforMAPalgorithm
Apparently,tocalculatei,weshouldcalculate
i1
first.Andtocalculate
N1,wehavetowait
forallpastibeenobtainedandthesituationissimilartoi.Thus,theprocessingforone
componentdecoderinoneiterationcannotbefinishedwithinNclockcycles(someextraclock
cyclesmayberequiredtoexecutesomeotheroperations).Iftheclockfrequencyis
fc,thenthedelay
forprocessingoftheMAPismorethan
Tmap
N
f
.Supposethatthemaximumiterationis
c
Imax,then
thedelayfordecodingonecodewordofturbocodeismorethanT
2T I
2NI
,leadsto
dec map
max
max
c
ainformationthroughputlessthan
RNTdec
fc
2Imax
f
.Forexample,when
fc100MHz
and
Imax8,thethroughputcannotbelargerthan6.25Mbps.Notethatwithsomeearly-stopiterationstrategy[7],theactualnumberofiterationstosuccessfullydecodeonecodewordmaybemuchless
than
Imax,butthehardwarehavetobedesignedwith
Imax.
Paralleldecodingmethods
Toreducethedecodingdelayandimproveitsthroughput,wecandividethewholecodewordintomultiplesub-blocksanddecodethemwithmultipleMAPprocessorsinparallel.However,thesegmentsinthemiddlehavenoinitialcondition[4-5].Withrandomorarbitraryinitializationas(6)willleadtoaperformancelossthatisunacceptable.Sodirectlyworkingateachsegmentfacesaproblemofhowto
initializetheforwardandbackwardvariablesattheboundaryofeachsegment.Asshownin(2),
ks
dependsonallpast
js,jk.Notethatthefarawaythejleavesk,thelesseffect
will
j(s)
haveon
k(s).Thus,tohaveagoodaccuracyforthe
k(s)
atthebeginningofeach
sub-block,weshouldstarttheprocessingof(2)earlieratjkL
withLlargeenough[4].This
isshowninFig.3.Foracertainsub-blockoflengthMlocatingfrompositionktokM1,we
startthecalculationfromkL
with
kLs
beingarbitrarilyinitializedas(6).Theposition
kL
tok1
belongstotheprevioussub-block,hencetheparallelprocessersarepartially
overlapped.Thecalculationof
k(s)
isexactlythesame.Tohaveagoodaccuracyforthe
kM1(s)
attheright-handbeginningpointofthesegment,weshouldstartfromjkM1L.
overlappingarea
kM(s)
…
kL(s)
…
k1(s)
k(s)
…
kM1(s)
Figure3.theoverlappingmethodin[4]
Therefore,toproducealltheforwardvariables
i(s)
inonecomponentdecoderwithinoneiteration,
K
thewholeprocessingofthecodewordwithKsegmentsandLoverlappingrequiresNL
clock
cyclesinsteadofN cycles,andthedecodingspeedwillbesloweddownbyafactorof N .
K NLK
LargertheLandKis,slowerthedecodingspeedwillbe.Forinstance,assumingthattheblocklengthis2298[8],ifwesegmentitinto50sub-blocks,andtheoverlappinglengthis30,thenitrequires76clockcyclestoproducealltheforwardvariables,insteadof46cycles.Hence,theMAPdecodingspeedcanonlybeimprovedby30times,farlessthanK50,whichisoriginallysupposedtobe.
Tosolvesuchproblem,amethodwithstoragefeaturewasproposedin[5].Insteadofoverlappingtechnique[4],itutilizestheforwardandbackwardvariablescomputedinthepreviousiterationtogettheboundaryinformationforeachsub-block.Forexample,forthep+1thiterationandthesame
sub-blocklocatingfromkto
kM1,thecalculationof(2)willstartatkwithinitialcondition
k1
p1s
beapproximatedwith
ps,thefinalresultsinlastiteration.ThisisshowninFig.4.
(s)
(p)
k1
k1
Thebackwardvariableisprocessedinthesameway.
(p)(s)
kM
(p)(s)
k1
(p1)(s)
k
(p1)(s)
kM1
Figure4.themethodin[5]
Comparedwiththeoverlappingmethod,suchmethodrequiresnoredundantcomputations.Hence,thedecodingspeedcanincreaselinearlywiththenumberofsegments.However,itrequiresanextra
memoryofsize2m(K1)v
tostorethefinalresultsof
ps
inlastiteration,wherevisthe
k1
k1
numberofbitsusedtoquantizetheps.
Proposedmethod
k1
Forsomeapplicationssuchassmallmobileterminals,itispreferredthatthememoryrequiredshouldbeassmallaspossible.Tokeepthefeatureofthemethodsin[5]butsavethememory,inthispaperweproposedtostoretheindexofthemostprobablestateinsteadofalltherealvaluesofintermediate
boundary(e.g.
(p)(s)inFig.4),computedinthepreviousiteration.Withthestatenumberof2m,
weonlyneedtostorembits,ratherthan2mvbits.Thismaybealargesavingifmandvislarge.
Atp+1thiteration,forthesamesub-blocklocatingfromkto
kM1,thecalculationof(2)will
startatkandtheinitialconditionissetas(inlogarithmdomain)
logp1s 0
ss*
(7)
k1
p1
else
whereisapredeterminedparameterwhichonlydependsonthenumberofiterationsandcanbe
k1
optimizedviasimulation,s*argmaxlogpsistheindexofthemostprobablestateinthe
s
previousiteration.Withthismethod,theonlythingweneedtostoredfornextiterationisrequiresmbits.
Simulationresults
s*,which
Inthissectionwewillshowtheperformanceoftheproposedmethod.Thesimulationadoptstheturbocodedefinedincdma2000[9],theinterleaverlengthischosenasN=2298,themaximumiteration
numberis
Imax8.Thenumberofsub-blockis
K20
implyingthatthehardwareMAPprocesser
consistsof20parallelprocessorsandeachprocesses
2298/20115
bits.Bothlogmapand
max-logmapdecodingalgorithmsareconsideredandFER(frameerrorrate)andBER(biterrorrate)areobserved.
Figure5.FERperformanceofcoderate=1/3
Fig.5includesfourFERperformancecurvesforcoderate1/3,usingmax-logmaporlogmapdecodingalgorithmrespectively.AtFER=0.01,formax-logmapalgorithm,theperformancelossoftheproposedmethodisabout0.07dB,comparedwiththemethodin[5].Andforlogmap,theperformancelossatFER=0.01isabout0.1dB.
Figure6.FERperformanceofcoderate=3/4
InFig.6,wecanseethatforcoderate3/4,theperformancelossislargerthanthatofcoderate1/3.AtFFR=0.01,theperformancelossisabout0.3dBforbothlogmapandmax-logmapdecodingalgorithm.WehavealsothesimulatedBERperformancewhicharenotshownhereduetolimitedspace.TheconclusionforBERperformanceisroughlythesame.
Conclusion
Inthispaper,wesimplifiedthemethodin[5],byonlystoringtheindexofthemostprobablestate,insteadofallvaluesofintermediateboundary.Theproposedschemecanreducethesizeofmemorylargely,withsomeacceptableFERorBERperformanceloss.
References
“Throughput,latencyandcomplexityestimationfortheturbodecoder,”3GPPTSGRANWG1#46,R1-062155,
Tallinn,Estonia,Aug
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯系上傳者。文件的所有權益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網頁內容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
- 4. 未經權益所有人同意不得將文件中的內容挪作商業或盈利用途。
- 5. 人人文庫網僅提供信息存儲空間,僅對用戶上傳內容的表現方式做保護處理,對用戶上傳分享的文檔內容本身不做任何修改或編輯,并不能對任何下載內容負責。
- 6. 下載文件中如有侵權或不適當內容,請與我們聯系,我們立即糾正。
- 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 工業自動化技術及裝備升級
- 工業遺產旅游的開發與運營模式研究
- 工業設計原理與實踐操作指南
- 工業節能的先進技術與策略
- 工業風家居設計探索
- 工作場所的效能提升技巧
- 工作與生活平衡的策略與心理健康的關聯
- 工作流程優化與工作效率提升
- 工廠安全生產與職業病防護
- 工程教育與實踐培訓方法
- 2025年人工智能應用技術職業資格考試試卷及答案
- 2025年一級建造師《市政實務》考點精粹
- 融資專員測試題及答案
- 河北秦皇島事業單位招聘中小學教師類D類考試模擬題帶答案2024年
- T-ZZB 2218-2021 燃氣用具脈沖點火器
- 好讀書讀好書課件
- 以科技創新為導向的醫療人才培養計劃
- 《中華人民共和國公務員法概述》課件
- 2025年ASQ質量經理(CMQ.OE)認證考試練習題庫(350題)
- 裝修驗房合同協議
- 專業市場營銷咨詢服務合同
評論
0/150
提交評論