High Level Design For High Speed FPGA Devices
99
Imperial College
June 13, 2002
Acknowledgement
Before starting the report, I would like to thank the following people for helping me throughout the project. Without their help, it would have been impossible for me to finish the project:
Abstract
In the project, [...]is approach, I successfully implemented the sophisticated gel image processing on high speed hardware. In the report, I will also introduce a new technique which can automate the process of high level hardware performance optimization by rearranging the code sequence so that it can be run at [...]. The report will be split into 4 chapters:
[...] includes the background, all the related works and my contribution to the project.
[...] chapter, [...] also demonstrate some techniques which can automate the optimization process.
[...] chapter, I will generalize the steps of converting a [...] include several techniques which can improve the performance or save hardware resources.
[...]udes the assessed achievements and expected future work.
There is also an online version of this report available; the URL is:
Chapter 1
Introduction
1.1 Background and Related Works
In this section, I am going to present the material that is necessary to understand the content of this report.
1.1.1 Field Programmable Gate Arrays (FPGAs) [1]
1.1.2 Pilchard [2]
1.1.3 RC1000 [3]
1.1.4 VHDL [4]
1.1.5 Handel-C [5]
1.1.6 Extending the Handel-C language [7]
1.2 Contribution
I have developed an easy but efficient optimization method which can rearrange code so that it can be run in a minimum number of cycles.
I have developed a systematic design flow for high level hardware design targeting high speed devices.
Chapter 2
Optimization
In this chapter, I am going to discuss various methods to optimize the high level code. Optimization is the main part, in which we try to exploit and utilize parallelism to achieve speedup which PCs n[...] also discuss some evaluation equations to measure the speedup we can achieve after optimization.
2.1 Performance Optimization
2.1.1 Balance The Delay Of Each Path
Balancing the delay of each path is important because the hardware clock speed will at most be the [...]. Therefore, if the delay of one particular path is much longer than the others, we have wasted resources, as the other paths are capable of running at a much higher speed. Balancing the delay makes sure that the parallel optimization will be optimal in [...]. The delay of a path can be defined as:
$$T_{delay} = T_{logic} + T_{routing} \tag{2.1}$$
where $T_{delay}$ is the total delay of the path,
$T_{logic}$ is the delay due to logic, and
$T_{routing}$ is the delay due to routing.
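As a small illustration of equation (2.1), the following Python sketch computes the delay of each path and the clock frequency that the slowest path would permit. The path names and nanosecond figures are invented for illustration, and the relation "maximum clock frequency = 1 / longest path delay" is the usual timing rule, assumed here rather than quoted from the report.

```python
def path_delay(t_logic_ns, t_routing_ns):
    """Equation (2.1): total path delay = logic delay + routing delay."""
    return t_logic_ns + t_routing_ns


def max_clock_mhz(paths):
    """Clock frequency (MHz) permitted by the slowest path.

    paths maps a path name to a (t_logic_ns, t_routing_ns) pair.
    Assumption: the clock period must be at least the longest path delay.
    """
    worst_ns = max(path_delay(logic, routing) for logic, routing in paths.values())
    return 1000.0 / worst_ns  # a period of 1 ns corresponds to 1000 MHz


# Hypothetical design: one slow path limits every other path.
unbalanced = {"adder": (3.0, 2.0), "multiplier": (6.0, 4.0)}
# The same design after the slow path has been split into shorter ones.
balanced = {"adder": (3.0, 2.0), "multiplier_stage": (3.0, 2.0)}

print(max_clock_mhz(unbalanced), max_clock_mhz(balanced))
```

In the unbalanced case the adder path could run twice as fast as the clock allows, which is the wasted capacity the text describes.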
Therefore, [...]re 2 main steps to achieve this:
Possibility of Automating This Process
2.1.2 Basic Parallelism
2.1.3 Re-arrange Code Sequence
[...]y, choose a group of code to start with, preferably in the innermost loop.
[...] label is in the format var:n, where var is the name of a variable and n is a number specifying the operation sequence.
For each variable assignment (either modification or initialization), assign a label to the operation following the rules listed below:
step 1 Search the table to find the label of the variable being assigned to.
step 2a If no entry is found, [...] entry in the table; the content (label) is specified as:
step 3ai If the variable is assigned a constant value or a signal from outside the block we are working with, specify the label as varname:1, where varname is the name of the variable being assigned.
step 3aii If the variable value depends on other variables, get the labels of these variables from the [...] the label of the variable to be the same as the label with the biggest order among them, but with the [...], for a = b + c, if the label for b is d:3 and the label for c is e:4, then the label for a should be e:5.
step 2b If an entry is found, [...] label of that variable.
step 4 Associate the operation with the label we just specified.
[...] labelling all the operations, we will rearrange the operations such that operations with the same order are placed together.
[...] "basic parallelism" method will return code which can run in the minimum number of cycles, which is the same as the highest order of the labels within the block.
[...] can work with one outer loop, repeat from step two again until the whole program is covered.
Figure 2.4 shows examples of how the method works.
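The labelling rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the report's implementation: labels are held as (name, order) pairs, the "order plus one" rule is inferred from the e:4 to e:5 example, and because the text of step 2b is incomplete, the sketch assumes that a reassigned variable is simply ordered after its previous label. The function names and the example block are hypothetical.

```python
from collections import defaultdict


def label_assignments(assignments, table=None):
    """Label each assignment with a (name, order) pair, in program order.

    assignments: list of (target, sources) pairs, e.g. ("a", ["b", "c"]).
    table: optional existing label table {variable: (name, order)}.
    """
    table = dict(table or {})
    labelled = []
    for target, sources in assignments:
        # Step 3aii: collect the labels of operands already in the table.
        # Operands without an entry count as constants or outside signals.
        deps = [table[s] for s in sources if s in table]
        if target in table:
            # ASSUMPTION (step 2b is incomplete in the source): a reassignment
            # is ordered after the previous label of the same variable.
            deps.append(table[target])
        if deps:
            name, order = max(deps, key=lambda label: label[1])
            label = (name, order + 1)
        else:
            # Step 3ai: constant or outside signal gives varname:1.
            label = (target, 1)
        table[target] = label  # step 4: associate the operation with its label
        labelled.append((target, label))
    return labelled


def group_by_order(labelled):
    """Place operations with the same order together, one group per cycle."""
    groups = defaultdict(list)
    for target, (_name, order) in labelled:
        groups[order].append(target)
    return [groups[order] for order in sorted(groups)]


# Hypothetical block: x = 1; y = 2; z = x + y; w = z + x; v = 3
block = [("x", []), ("y", []), ("z", ["x", "y"]), ("w", ["z", "x"]), ("v", [])]
schedule = group_by_order(label_assignments(block))
print(schedule)
```

Each inner list of `schedule` holds assignments that could share one cycle, so the number of groups equals the highest order in the block. The sketch does not check for a variable being overwritten while an earlier group still needs its old value; a real tool would have to.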
2.1.4 Add Register To Store Intermediate Result
A possible solution is to add registers to store the intermediate result of the variable and run the calcul[...]chnique doesn't [...]ee that before modifying the code, it takes 3 cycles to finish the operations, but only 2 after the code is modified.
2.1.5 Pipelining
Pipelining is an implementation technique whereby multiple tasks are overlapped in execution. Hennessy and Patterson [6] described pipelining in chapter 3 of their book as:
Pipelining requires extra control logic, which increases overhead.
Pipelining requires extra registers to store the intermediate results, which increases the delay.
Moreover, developers need to remember that each task won't [...]. Pipelining is ideal f[...]re some points we need to care about when implementing a pipeline:
Structural hazard occurs when there are not enough hardware resources to deal with the overlapped [...]mple, we cannot read data multiple times from the same RAM instance in the same [...]. The solution to it is to make sure enough resources have been created for the pipeline.
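The effect of overlapping tasks can be estimated with a short cycle count. The sketch below is an idealised model, not taken from the report: it assumes every stage takes exactly one cycle and that no hazards stall the pipeline. The function names are hypothetical.

```python
def cycles_unpipelined(num_tasks, num_stages):
    """Each task runs all its stages before the next task starts."""
    return num_tasks * num_stages


def cycles_pipelined(num_tasks, num_stages):
    """The first task fills the pipeline; each later task finishes one cycle after the previous one."""
    if num_tasks == 0:
        return 0
    return num_stages + (num_tasks - 1)


def ideal_speedup(num_tasks, num_stages):
    """Ratio of the two cycle counts under the no-stall assumption."""
    return cycles_unpipelined(num_tasks, num_stages) / cycles_pipelined(num_tasks, num_stages)


print(cycles_unpipelined(100, 4), cycles_pipelined(100, 4), ideal_speedup(100, 4))
```

A single task still needs all `num_stages` cycles, so the gain comes only from throughput over many tasks. Any stall caused by a structural hazard adds cycles to the pipelined figure.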
2.2 Space Optimization
2.2.1 Optimal Width Variables
2.3 Evaluation
We won't know what the optimization can achieve if we don't have a method to evaluate the result. Therefore, [...] introduce some[...]rthy to implement the program in hardware? If the result is not as desirable currently, will it be worthwhile to do so when technology advances? Or can we think of another approach which can achieve a much higher speedup? These are the questions we want to answer in this step.
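2.3.1 Equations
Now we will introduce some extra equations which can help us to evaluate the result.
$T_{proc}$ can be determined by:
$$T_{proc} = n \cdot t \tag{2.4}$$
where $n$ is the number of cycles and $t$ is the clock delay.
$t$ can be determined by:
$$t = \frac{1}{c} \tag{2.5}$$
where $c$ is the clock speed. Therefore:
$$T_{proc} = \frac{n}{c} \tag{2.6}$$
Substituting them all together:
$$T_{exec} = \frac{w}{b} + \frac{n}{c} \tag{2.8}$$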
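Equation (2.8) is easy to evaluate numerically. The Python sketch below assumes that w is the amount of data to be transferred and b is the transfer bandwidth in matching units per second; the surviving text does not define b at this point, so that reading is an assumption, as are the example figures.

```python
def processing_time(n, c):
    """Equation (2.6): T_proc = n / c, with n cycles at clock speed c (Hz)."""
    return n / c


def execution_time(w, b, n, c):
    """Equation (2.8): T_exec = w / b + n / c.

    Assumption: w is the data volume and b the transfer bandwidth,
    so w / b is the time spent moving data to and from the hardware.
    """
    return w / b + processing_time(n, c)


# Hypothetical figures: 1 MB of data over a 100 MB/s link,
# 2 million cycles on a 50 MHz design.
t_exec = execution_time(w=1_000_000, b=100_000_000, n=2_000_000, c=50_000_000)
print(t_exec)
```

With these figures the transfer term is a quarter of the processing term, so reducing n or raising c would still pay off.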
2.3.2 Reasoning By Using the Equation
[...] will know whether our hardware implementation will [...], we will then try to reason about it:
[...]/b is considerably smaller than the expected $T_{exec}$, [...] clock speed remains the same, [...] think about whether it is possible to exploit more parallelism to achieve this number of cycles.
[...] find that there is not much you can do with n, then try to fix n and calculate the c required to achieve the speedup. Then think about whether current technology allows your program to run at that speed. Is it possible that the speedup will be achieved if you use the chip with the most advanced technology? Or do you expect that a device which can run at the speed you require will soon enter the market?
[...]y, it is worth thinking about whether you should choose another part of the program to be implemented in hardware, one which potentially has more parallelism to be exploited.
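The reasoning above amounts to rearranging $T_{exec} = w/b + n/c$ for one unknown while the others are fixed. The Python sketch below shows both rearrangements. The function names are hypothetical, and b is again assumed to be the transfer bandwidth.

```python
def cycle_budget(w, b, c, t_target):
    """Largest n that meets t_target at a fixed clock speed c: n = (t_target - w/b) * c."""
    remaining = t_target - w / b
    if remaining <= 0:
        raise ValueError("data transfer alone already exceeds the target time")
    return remaining * c


def required_clock(w, b, n, t_target):
    """Clock speed c needed to meet t_target with a fixed cycle count n: c = n / (t_target - w/b)."""
    remaining = t_target - w / b
    if remaining <= 0:
        raise ValueError("data transfer alone already exceeds the target time")
    return n / remaining


# Hypothetical figures, chosen so that the arithmetic is exact.
print(cycle_budget(w=1.0, b=4.0, c=8.0, t_target=0.75))
print(required_clock(w=1.0, b=4.0, n=4.0, t_target=0.75))
```

If the transfer time w/b alone exceeds the target, neither more parallelism nor a faster clock can help. That is the case in which another part of the program should be considered.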
2.4 Summary
Chapter 3
Systematic High Level
Hardware Design
In this chapter, I am going to introduce five general steps of converting a software programme into [...] go through each step in detail in the next few chapters.
3.1 Design Flow
Figure 3.1 shows the general development steps for converting a software program to hardware circuits [...]e listed as follows:
[...] converting the software program directly into [...] step ensures that the hardware will behave exactly the same as the software.
[...] importa[...]. Therefore, if the delay of one particular path is much longer than the others, we have wasted resources, as the other paths are capable of running at a much higher speed.
[...] evaluating the actual result of this hardware version of the [...] draw a conclusion which considers the following questions: Does the hardware provide a speedup of the program? If not, in what circumstances could it? Should we try a different approach to implement the program? We can see that the last 3 steps are indeed [...] remaining two will be discussed in this chapter.
3.2 Program Analysis
In this [...]. Program Analysis is the most important among the five steps, as the effect of the implementation will be insignificant or even negligible if the wrong part of the program is chosen to be implemented in hardware.
3.2.1 Four Guidelines For Program Analysis
[...] guidelines are not mandatory, but they can give developers an idea of what kind of programs are preferable to convert.
[...] more possible now.
Guideline 2: Choose the part with low data dependency
There shouldn't [...] in mind that FPGA chips are normally slower than PC's [...] because extra logic is [...], speedup is achieved purely on exploiting [...]ta dependency will reduce the potential parallelism we could achieve, as data at one part depends on the data at the other parts, which restricts us from processing them in parallel.
3.3 Direct Conversion
In this section, [...] direct conversion from [...] step doesn't involve the application of any kind of [...] reason for leaving it to a later stage is that we want to make sure that the hardware [...] apply the optimization techniques at the beginning, it is hard to debug the program when something goes wrong, because you will not be able to find out in which step the error occurred.
3.4 Summary
Chapter 4
Case Study: 2-D Gel Image
Processing
[...] explain it in detail in a later section.
4.1 2-D Gel Image Processing
4.2 Program Analysis
As mentioned in chapter 3, the first step of the hardware development is program analysis.
step 1 Set detail level $l$ to 0
step 2 Blur images $I_1$, $I_2$ by setting resolution to 5
step 3 Optimize parameters from an initial rigid transformation $t$ in $T_{rigid}$
step 4 While $l$ is smaller than 5, go to step 5; otherwise finish
step 5 Subdivide $t$
step 6 If not all squares $a_{i,j}$, $a_{i+1,j}$, $a_{i,j+1}$, $a_{i+1,j+1}$ in the grid of $t$ are finished, go to step 7; otherwise go to step 8
step 7 Optimize the control points $c_{i,j}$, $c_{i+1,j}$, $c_{i,j+1}$, $c_{i+1,j+1}$ by using BFGS to maximize $f(c) = corr(I_1, t_c(I_2))$ in the affected area, then return to step 6
step 8 Increment detail level $l$ and return to step 4
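The control flow of the eight steps can be written as two nested loops. The Python skeleton below only mirrors that structure. Every callable passed in (`blur`, `optimize_rigid`, `subdivide`, `squares_of`, `optimize_square`) is a hypothetical placeholder for work the report describes elsewhere, and it assumes that step 8 loops back to the test in step 4.

```python
def register_images(i1, i2, blur, optimize_rigid, subdivide,
                    squares_of, optimize_square, max_level=5):
    """Skeleton of steps 1 to 8; the image processing itself is supplied by the caller."""
    level = 0                                        # step 1
    i1, i2 = blur(i1, 5), blur(i2, 5)                # step 2
    t = optimize_rigid(i1, i2)                       # step 3
    while level < max_level:                         # step 4
        t = subdivide(t)                             # step 5
        for square in squares_of(t):                 # step 6
            t = optimize_square(t, square, i1, i2)   # step 7 (BFGS in the report)
        level += 1                                   # step 8
    return t


# Toy stand-ins that only count how often the inner optimization runs.
calls = []
result = register_images(
    "I1", "I2",
    blur=lambda image, resolution: image,
    optimize_rigid=lambda a, b: 0,
    subdivide=lambda t: t + 1,
    squares_of=lambda t: range(4),
    optimize_square=lambda t, square, a, b: (calls.append(square) or t),
)
print(result, len(calls))
```

Written this way, it is clear that step 7 sits in the innermost loop and is therefore the part executed most often.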
[...] work is dropping these bits won't [...], I realize that it is not the [...]al to the fact that some intermediate results of the algorithm involve multiplication of floating point values with large values, especially when the program is processing at [...]'s mean dropping of the least [...]t's not the end of the story.
When I reasoned more deeply about the program, I found that the optimization can indeed be separated into 2 parts:
Transformation and Calculation of Similarity and Derivatives, and
BFGS Optimization
[...]st part can be implemented mostly [...], we have to ask the question: Is the first part still the part which is executed most? Before answering the question, let's look at the following equation:
c = wiCPI (4.1)
[...] amount of data needed to be transferred is:
$$w = I_1 + I_2 + cp + T_c(I_2) + d + s$$