
High Level Design For High Speed FPGA Devices

99

Imperial College

June 13, 2002

Acknowledgement

Before starting the report, I would like to thank the following people for helping me throughout the [...] Without their help, it would be impossible for me to finish the project:

Abstract

In the project, [...] this approach, I successfully implemented the sophisticated gel image processing on high-speed hardware.

In the report, I will also introduce a new technique which can automate the process of high-level hardware performance optimization by rearranging the code sequence so that it can be run at [...] The report will be split into 4 chapters:

[...] includes the background, all the related works and my contribution to the project.

[...] chapter, [...] also demonstrate some techniques which can automate the optimization process.

[...] chapter, I will generalize the steps of converting a [...] include several techniques which can improve the performance or save hardware resources.

[...] includes the assessed achievements and expected future works.

There is also an online version available for this report, the URL is:

Chapter 1

Introduction

1.1 Background and Related Works

In this section, I am going to present the materials that are necessary to understand the content of this report.

1.1.1 Field Programmable Gate Arrays (FPGAs) [1]

1.1.2 Pilchard [2]

1.1.3 RC1000 [3]

1.1.4 VHDL [4]

1.1.5 Handel-C [5]

1.1.6 Extending the Handel-C Language [7]

1.2 Contribution

I have developed an easy but efficient optimization method which can rearrange code so that it can be run in the minimum number of cycles.

I have developed a systematic design flow for high-level hardware design targeted at high-speed devices.

Chapter 2

Optimization

In this chapter, I am going to discuss various methods to optimize the high-level code. Optimization is the main part, in which we try to exploit and utilize parallelism to achieve speedup which PCs [...] also discuss some evaluation equations so as to measure the speedup we can achieve after optimization.

2.1 Performance Optimization

2.1.1 Balance the Delay of Each Path

Balancing the delay of each path is important because the hardware clock speed will at most be the [...] Therefore, if the delay of one particular path is much longer than the others, then it means we have wasted resources, as the other paths are capable of running at much higher [...] Balancing the delay makes sure that the parallel optimization will be optimal in [...] The delay of a path can be defined as:

$T_{delay} = T_{logic} + T_{routing}$ (2.1)

where $T_{delay}$ is the total delay of the path,

$T_{logic}$ is the delay due to logic,

$T_{routing}$ is the delay due to routing.
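
As a rough illustration of equation 2.1, the following Python sketch computes the total delay of each path and the clock ceiling it implies, assuming the usual rule that the clock period cannot be shorter than the slowest path. The path names and nanosecond figures are invented for illustration, and splitting the slow path in two is only one hypothetical way of balancing it.

```python
def path_delay(t_logic_ns, t_routing_ns):
    """Equation 2.1: total delay of one path, in nanoseconds."""
    return t_logic_ns + t_routing_ns


def max_clock_mhz(paths):
    """Clock ceiling in MHz, limited by the slowest path in the design."""
    worst_ns = max(path_delay(logic, routing) for logic, routing in paths.values())
    return 1000.0 / worst_ns


# Hypothetical (logic, routing) delays in ns; not measured values.
unbalanced = {"adder": (4.0, 1.0), "multiplier": (14.0, 6.0)}
balanced = {
    "adder": (4.0, 1.0),
    "mult_stage1": (7.0, 3.0),
    "mult_stage2": (7.0, 3.0),
}

print(max_clock_mhz(unbalanced))  # limited by the 20 ns multiplier path
print(max_clock_mhz(balanced))    # limited by the 10 ns stages
```

In the unbalanced case the 5 ns adder path has to wait for the 20 ns multiplier path on every cycle, which is the wasted capacity the paragraph above describes.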

Therefore, [...] There are 2 main steps to achieve this:

Possibility of Automating This Process

2.1.2 Basic Parallelism

2.1.3 Re-arrange Code Sequence

[...]y, choose a group of codes to start with, preferably in the innermost loop.

[...] label is of the format var:n, where var is the name of a variable and n is the number specifying the operation sequence.

[...] each variable assignment (either modification or initialization), assign a label to the operation following the rules listed below:

step 1 Search the table to find out the label of the variable being assigned to.

step 2a If no entry is found, [...] entry in the table, the content (label) is specified as:

step 3ai If the variable is assigned a constant value or a signal from outside the block we are working with, specify the label as varname:1, where varname is the name of the variable being assigned.

step 3aii If the variable value depends on other variables, get the labels of these variables from the [...] the label of the variable to be the same as the label with the biggest order among those we got, but with the [...], for a = b + c, if the label for b is d:3 and the label for c is e:4, then the label for a should be e:5.

step 2b If an entry is found, [...] label of that variable.

step 4 Associate the operation with the label we just specified.

[...] labelling all the operations, we will rearrange the operations such that operations with the same order are placed together.

[...] "basic parallelism" method will return code which can run in the minimum number of cycles, which is the same as the highest order of the labels within the block.

[...] can work with one outer loop, repeat from step two again until the whole program is covered.

Figure 2.4 shows examples of how the method works.
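
The labelling rules above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not the implementation used in the project: each assignment is represented as a hypothetical (target, operands) pair, operands without a table entry are treated as constants or outside signals, and because the rule for a variable that already has an entry (step 2b) is truncated in the text, the sketch simply rejects re-assignments.

```python
from collections import defaultdict


def label_block(assignments, table=None):
    """Label each (target, operands) assignment in program order.

    A label is a (variable name, order) pair, written var:n in the text.
    Operands missing from the table count as constants or outside signals.
    """
    table = dict(table or {})
    labelled = []
    for target, operands in assignments:
        if target in table:
            # Step 2b is truncated in the text, so re-assignment is not handled.
            raise ValueError(f"{target} is assigned more than once")
        known = [table[v] for v in operands if v in table]
        if not known:
            label = (target, 1)  # step 3ai: constant or outside signal
        else:
            # Step 3aii: take the label with the biggest order, add one to it.
            name, order = max(known, key=lambda lab: lab[1])
            label = (name, order + 1)
        table[target] = label  # steps 2a and 4: record the label
        labelled.append((target, label))
    return labelled


def group_by_order(labelled):
    """Place operations with the same order together, lowest order first."""
    groups = defaultdict(list)
    for target, (_, order) in labelled:
        groups[order].append(target)
    return [groups[order] for order in sorted(groups)]


# Example from the text: b is d:3 and c is e:4, so a = b + c becomes e:5.
print(label_block([("a", ["b", "c"])], {"b": ("d", 3), "c": ("e", 4)}))

# Hypothetical block: x = 1; y = 2; s = x + y; z = 3; p = s * z
block = [("x", []), ("y", []), ("s", ["x", "y"]), ("z", []), ("p", ["s", "z"])]
print(group_by_order(label_block(block)))
```

For the hypothetical block, the three independent initializations share order 1, so the five statements fit into three groups, matching the highest order among the labels.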

2.1.4 Add Register to Store Intermediate Result

The possible solution is to add registers to store the intermediate result of the variable and run the calcul[...] technique doesn't [...] see that before modifying the code, it takes 3 cycles to finish the operations, but only 2 after we have modified the code.

2.1.5 Pipelining

Pipelining is an implementation technique whereby multiple tasks are overlapped in execution. Hennessy and Patterson [6] described pipelining in chapter 3 of their book as:

Pipelining requires extra control logic, which increases the overhead.

Pipelining requires extra registers to store the intermediate results, which increases the delay.
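
To make the effect of overlapping concrete, the following Python sketch compares cycle counts with and without pipelining. It assumes an idealized pipeline in which every stage takes one cycle and no hazards or stalls occur; the function names are illustrative and do not come from the report.

```python
def cycles_sequential(tasks, stages):
    """Each task runs through all stages before the next one starts."""
    return tasks * stages


def cycles_pipelined(tasks, stages):
    """Fill the pipeline once, then one task completes per cycle."""
    if tasks == 0:
        return 0
    return stages + tasks - 1


def speedup(tasks, stages):
    """Ratio of sequential to pipelined cycle counts (tasks must be > 0)."""
    return cycles_sequential(tasks, stages) / cycles_pipelined(tasks, stages)


for tasks in (1, 10, 1000):
    print(tasks, cycles_sequential(tasks, 4), cycles_pipelined(tasks, 4))
```

A single task still needs all 4 cycles, so the gain comes only from throughput when many tasks are streamed through, and the extra registers and control logic mentioned above reduce it further in practice.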

Moreover, developers need to remember that each task won't [...] Pipelining is ideal f[...] There are some points we need to care about when implementing a pipeline:

A structural hazard occurs when there are not enough hardware resources to deal with the overlapped [...] For example, we cannot perform multiple reads from the same RAM instance in the same [...] solution to it is to make sure enough resources have been created for the pipeline.

2.2 Space Optimization

2.2.1 Optimal Width Variables

2.3 Evaluation

We won't know what the optimization can achieve if we don't have a method to evaluate the result. Therefore, [...] introduce some [...] worthwhile to implement the program in hardware? If the result is not as good as desired at present, will it be worthwhile to do so when technology advances? Or can we think of another approach which can achieve a much higher speedup? These are the questions we want to answer in this step.

2.3.1 Equations

Now we will introduce some extra equations which can help us to evaluate the result.

$T_{proc}$ can be determined by:

$T_{proc} = n \cdot t$ (2.4)

where $n$ is the number of cycles and $t$ is the clock period.

$t$ can be determined by:

$t = 1/c$ (2.5)

where $c$ is the clock speed, therefore

$T_{proc} = n/c$ (2.6)

Substituting them all together:

$T_{exec} = w/b + n/c$ (2.8)
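
Equations 2.6 and 2.8 can be evaluated directly, as in the Python sketch below. It assumes that w is the amount of data to transfer and b is the transfer bandwidth, since their definitions are not part of the surviving text here; the units (bytes, bytes per second, Hz) and all the numbers are made up for illustration.

```python
def t_proc(n_cycles, clock_hz):
    """Equation 2.6: processing time T_proc = n / c, in seconds."""
    return n_cycles / clock_hz


def t_exec(w_bytes, b_bytes_per_s, n_cycles, clock_hz):
    """Equation 2.8: T_exec = w / b + n / c, in seconds.

    w / b is assumed to be the data transfer time.
    """
    return w_bytes / b_bytes_per_s + t_proc(n_cycles, clock_hz)


# Hypothetical design: 1 MB transferred at 100 MB/s, 2 million cycles at 50 MHz.
total = t_exec(1e6, 100e6, 2e6, 50e6)
print(f"T_exec = {total * 1000:.1f} ms")
```

Comparing this figure with the running time of the software version gives the speedup that the evaluation step is looking for.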

2.3.2 Reasoning by Using the Equation

[...] will know whether our hardware implementation will [...], we will then try to reason about it:

[...] $w/b$ is considerably smaller than the expected $T_{exec}$, [...] clock speed remains the same, [...] think about whether it is possible to exploit more parallelism to achieve this number of cycles.

[...] find that there is not much you can do with $n$, then try to fix $n$ and calculate the $c$ required to achieve the speedup. Then think about whether current technology allows your program to run at that speed. Is it possible that the speedup will be achieved if you use the chip with the most advanced technology? Or do you expect that a device which can run at the speed you require will soon enter the market?

[...]y, it is worth thinking about whether you should choose another part of the program to be implemented in hardware, one which potentially has more parallelism to be exploited.
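
The reasoning above amounts to rearranging equation 2.8 for either n or c. The Python sketch below shows both rearrangements under the assumption that w/b is the data transfer time; the function names and figures are hypothetical.

```python
def cycle_budget(t_target, w, b, clock_hz):
    """Fix c and solve equation 2.8 for n: n = c * (T_exec - w / b)."""
    compute_time = t_target - w / b
    return max(0.0, compute_time) * clock_hz


def required_clock_hz(t_target, w, b, n_cycles):
    """Fix n and solve equation 2.8 for c: c = n / (T_exec - w / b).

    Returns None when the transfer time alone already exceeds the target,
    in which case no clock speed can reach it.
    """
    compute_time = t_target - w / b
    if compute_time <= 0:
        return None
    return n_cycles / compute_time


# Hypothetical target of 0.75 s with 1 MB of data over a 4 MB/s link.
print(cycle_budget(0.75, 1e6, 4e6, 2e6))       # cycles allowed at 2 MHz
print(required_clock_hz(0.75, 1e6, 4e6, 1e6))  # clock needed for 1e6 cycles
print(required_clock_hz(0.2, 1e6, 4e6, 1e6))   # transfer alone takes 0.25 s
```

A None result corresponds to the last case in the text: no amount of parallelism or clock speed helps, so a different part of the program should be considered.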

2.4 Summary

Chapter 3

Systematic High Level Hardware Design

In this chapter, I am going to introduce five general steps of converting a software programme into [...] go through each step in detail in the next few chapters.

3.1 Design Flow

Figure 3.1 shows the general development steps for converting a software program to hardware circuits [...] are listed as follows:

[...] converting the software program directly into [...] step ensures that the hardware will behave exactly the same as the software.

[...] important [...] Therefore, if the delay of one particular path is much longer than the others, then it means we have wasted resources, as the other paths are capable of running at much higher speed.

[...] evaluating the actual result of this hardware version of the [...] draw a conclusion which considers the following questions: Does the hardware provide a speedup of the program? If not, in what circumstances could it? Should we try a different approach to implement the program? etc.

We can see that the last 3 steps are indeed [...] remaining two will be discussed in this chapter.

3.2 Program Analysis

In this [...] Program Analysis is the most important among the five steps, as the effect of the implementation will be insignificant or even negligible if the wrong part of the program is chosen to be implemented in hardware.

3.2.1 Four Guidelines for Program Analysis

[...] guidelines are not forced to be followed, but can give developers an idea of what kind of programs are preferable to be converted.

[...] more possible now.

Guideline 2: Choose the part with low data dependency

There shouldn't [...] in mind that FPGA chips are normally slower than PCs because extra logic is [...], speedup is achieved purely by exploiting [...] data dependency will reduce the potential parallelism we could achieve, as data at one part depends on the data at the other parts, which restricts us from processing them in parallel.

3.3 Direct Conversion

In this section, [...] direct conversion from [...] step doesn't involve the application of any kind of [...] reason for leaving it for a later stage is that we want to make sure that the hardware [...] apply the optimization techniques at the beginning, it is hard to debug the program when something goes wrong because you will not be able to find out in which step the error occurs.

3.4 Summary

Chapter 4

Case Study: 2-D Gel Image Processing

[...] explain it in detail in a later section.

4.1 2-D Gel Image Processing

4.2 Program Analysis

As mentioned in chapter 3, the first step of the hardware development is program analysis.

step 1 Set detail level l to 0

step 2 Blur images I1, I2 by setting resolution to 5

step 3 Optimize parameters from an initial rigid transformation t in T_rigid

step 4 If l is smaller than 5, go to step 5, otherwise finish

step 5 Subdivide t

step 6 If not finished with all squares a(i,j), a(i+1,j), a(i,j+1), a(i+1,j+1) in the grid of t, then go to step 7, otherwise go to step 8

step 7 Optimize the control points c(i,j), c(i+1,j), c(i,j+1), c(i+1,j+1) by using BFGS to maximize f(c) = corr(I1, t_c(I2)) in the affected area, return to step 6

step 8 Increment detail level l, return to step 4
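
The control flow of steps 1 to 8 can be written as two nested loops, as in the Python sketch below. It shows only the loop structure: every callable passed in (blur, rigid_init, subdivide, squares, optimize_square) is a hypothetical placeholder and not a function from the original program, and the stubs in the demo only count iterations.

```python
def register_images(blur, rigid_init, subdivide, squares, optimize_square,
                    max_level=5):
    """Skeleton of steps 1-8; all image work is delegated to the callables."""
    level = 0                            # step 1: detail level l = 0
    blur(resolution=5)                   # step 2: blur images I1, I2
    t = rigid_init()                     # step 3: initial rigid transformation
    while level < max_level:             # step 4: loop while l < 5
        t = subdivide(t)                 # step 5: refine the grid of t
        for square in squares(t):        # step 6: visit every square
            t = optimize_square(t, square)  # step 7: BFGS on 4 control points
        level += 1                       # step 8: next detail level
    return t


# Demo with stub callables: the "transformation" is just a grid size here.
visited = []


def count_square(t, square):
    """Stub for step 7 that records the visit and leaves t unchanged."""
    visited.append(square)
    return t


final = register_images(
    blur=lambda resolution: None,
    rigid_init=lambda: 1,
    subdivide=lambda t: t * 2,
    squares=lambda t: range(t),
    optimize_square=count_square,
)
print(final, len(visited))
```

Written this way, it is easy to see that step 7 sits in the innermost loop, which is why it is the natural first candidate when looking for the part of the program that is executed most.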

[...] work is dropping these bits won't [...], I realize that it is not the [...]al to the fact that some intermediate results of the algorithm involve multiplication of floating point values with large values, especially when the program is processing at [...] mean dropping of the least [...]t's not the end of the story.

When I reasoned more deeply about the program, I found out that the optimization can in fact be separated into 2 parts:

Transformation and Calculation of Similarity and Derivatives, and

BFGS Optimization

[...] first part can be implemented mostly [...], we have to ask the question: Is the first part still the part which is executed most? Before answering the question, let's look at the following equation:

$c = w_i \, CPI$ (4.1)

[...] amount of data needed to be transferred is:

$w = I_1 + I_2 + cp + T_c(I_2) + d + s$
