From 036b0c74c8f712e9fbf55ef41b8d2ae13feb2baf Mon Sep 17 00:00:00 2001
From: Leonard Kugis
Date: Sat, 7 Jan 2023 14:54:34 +0100
Subject: Finished presentation slides

---
 resources/basis_projection.svg | 1673 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1673 insertions(+)
 create mode 100644 resources/basis_projection.svg

diff --git a/resources/basis_projection.svg b/resources/basis_projection.svg
new file mode 100644
index 0000000..119302e
--- /dev/null
+++ b/resources/basis_projection.svg
@@ -0,0 +1,1673 @@

Fig. 11. Sparse weights after basis projection [50].

... research on exploring the impact of bit width on accuracy [51]. In fact, commercial hardware for DNNs has recently been reported to support 8-bit integer operations [52]. As bit widths can vary by layer, hardware optimizations have been explored to exploit the reduced bit width for 2.56× energy savings [53] or a 2.24× increase in throughput [54] compared to a 16-bit fixed-point implementation. With more significant changes to the network, it is possible to reduce the bit width down to 1 bit for either weights [55] or both weights and activations [56, 57] at the cost of reduced accuracy. The impact of 1-bit weights on hardware is explored in [58].

B. Sparsity

For SVM classification, the weights can be projected onto a basis such that the resulting weights are sparse, giving a 2× reduction in the number of multiplications [50] (Fig. 11). For feature extraction, the input image can be made sparse by pre-processing, for a 24% reduction in power consumption [48].

For DNNs, the number of MACs and weights can be reduced by removing weights through a process called pruning. This was first explored in [59], where weights with minimal impact on the output were removed. In [60], pruning is applied to modern DNNs by removing small weights. However, removing weights does not necessarily lead to lower energy. Accordingly, in [61] weights are removed based on an energy model to directly minimize energy consumption. The tool used for energy modeling can be found at [62].

Specialized hardware has been proposed in [47, 50, 63, 64] to exploit sparse weights for increased speed or reduced energy consumption. In Eyeriss [47], the processing elements are designed to skip reads and MACs when the inputs are zero, resulting in a 45% energy reduction. In [50], by using specialized hardware to skip the sparse (zero-valued) weights, the energy and storage cost are reduced by 43% and 34%, respectively.

C. Compression

Data movement and storage are important factors in both energy and cost. Feature extraction can result in sparse data (e.g., the gradient in HOG and ReLU outputs in DNNs), and the weights used in classification can also be made sparse by pruning. As a result, compression can be applied to exploit these data statistics and reduce data movement and storage cost.

Various forms of lightweight compression have been explored to reduce data movement. Lossless compression can be used to reduce the transfer of data on and off chip [11, 53, 64]. Simple run-length coding of the activations in [65] provides up to a 1.9× bandwidth reduction, which is within 5-10% of the theoretical entropy limit.
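As a concrete illustration of this kind of lightweight compression, the Python sketch below implements a simple zero run-length coder for a sparse activation vector. It is a minimal sketch of the general technique, not the exact encoding format used in [65]; the (zero_run, value) symbol layout and the 5-bit run-length cap are illustrative assumptions.

    # Minimal sketch of zero run-length coding for sparse activations
    # (illustrative; not the exact format of [65]).
    # Each symbol is (zero_run, value): the number of zeros preceding a
    # non-zero activation, followed by that activation. A symbol with
    # value == 0 carries only zeros.

    MAX_RUN = 31  # assumption: 5-bit run-length field

    def rle_encode(activations):
        symbols, run = [], 0
        for a in activations:
            if a == 0:
                run += 1
                if run == MAX_RUN:          # flush a full run of zeros
                    symbols.append((run, 0))
                    run = 0
            else:
                symbols.append((run, a))
                run = 0
        if run:
            symbols.append((run, 0))        # trailing zeros
        return symbols

    def rle_decode(symbols):
        activations = []
        for run, value in symbols:
            activations.extend([0] * run)
            if value != 0:
                activations.append(value)
        return activations

    # Example: a ReLU output with long zero runs compresses to a few symbols.
    acts = [0, 0, 0, 7, 0, 0, 0, 0, 3, 0, 0, 1, 0, 0, 0, 0, 0]
    enc = rle_encode(acts)
    assert rle_decode(enc) == acts

Decoding simply expands each symbol back into its zero run, so the encoder and decoder stay cheap enough to sit on the on-chip/off-chip interface.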
Lossy compression such as vector quantization can also be used on feature vectors [50] and weights [8, 12, 66] such that they can be stored on-chip at low cost. Generally, the cost of the compression/decompression is on the order of a few thousand kgates with minimal energy overhead. In the lossy compression case, it is also important to evaluate the impact on performance accuracy.

VII. OPPORTUNITIES IN MIXED-SIGNAL CIRCUITS

Most of the data movement is between the memory and the processing element (PE), and between the sensor and the PE. In this section, we discuss how this is addressed using mixed-signal circuit design. However, circuit non-idealities should also be factored into the algorithm design; these circuits can benefit from the reduced-precision algorithms discussed in Section VI. In addition, since the training often occurs in the digital domain, the ADC and DAC conversion overhead should also be accounted for when evaluating the system.

While spatial architectures bring the memory closer to the computation (i.e., into the PE), there have also been efforts to integrate the computation into the memory itself. For instance, in [67] the classification is embedded in the SRAM. Specifically, the word line (WL) is driven by a 5-bit feature vector using a DAC, while the bit-cells store the binary weights ±1. The bit-cell current is effectively the product of the feature-vector value and the weight stored in the bit-cell; the currents from the column are added together to discharge the bit line (BL or BLB). A comparator is then used to compare the resulting dot product to a threshold, specifically a sign threshold on the differential bit lines. Due to the variations in the bit-cells, this is considered a weak classifier, and boosting is needed to combine the weak classifiers into a strong classifier [68]. This approach gives 12× energy savings over reading the 1-bit weights from the SRAM.

Recent work has also explored the use of mixed-signal circuits to reduce the computation cost of the MAC. It was shown in [69] that performing the MAC using switched capacitors can be more energy-efficient than digital circuits despite the ADC and DAC conversion overhead. Accordingly, the matrix multiplication can be integrated into the ADC, as demonstrated in [70], where the most significant bits of the multiplications for Adaboost classification are performed using switched capacitors in an 8-bit successive-approximation format. This is extended in [71] to perform not only the multiplications but also the accumulation in the analog domain. It is assumed that 3 bits and 6 bits are sufficient to represent the weights and input vectors, respectively. This enables the computation to move closer to the sensor and reduces the number of ADC conversions by 21×.
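To give a rough, bit-true feel for that precision assumption, the Python sketch below quantizes weights to 3 bits and inputs to 6 bits and computes the dot product as an integer MAC. The uniform quantizer, the scale factors, and the vector length are assumptions made for this example; in [71] the equivalent accumulation is carried out in the analog domain with switched capacitors rather than in software.

    import numpy as np

    # Bit-true sketch of a reduced-precision MAC with 3-bit signed weights
    # and 6-bit unsigned inputs (the precisions assumed in [71]).
    # The uniform quantizer and scale factors are illustrative assumptions.

    def quantize(x, bits, signed):
        """Uniform quantization to a `bits`-wide integer code plus a scale."""
        if signed:
            qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        else:
            qmin, qmax = 0, 2 ** bits - 1
        scale = np.max(np.abs(x)) / max(abs(qmin), qmax)
        codes = np.clip(np.round(x / scale), qmin, qmax).astype(np.int32)
        return codes, scale

    rng = np.random.default_rng(0)
    w = rng.normal(size=64)            # full-precision weights
    x = rng.random(size=64)            # non-negative inputs (e.g., pixel values)

    w_q, w_scale = quantize(w, bits=3, signed=True)    # 3-bit weights
    x_q, x_scale = quantize(x, bits=6, signed=False)   # 6-bit inputs

    # Integer MAC (what the analog array effectively computes), rescaled back.
    y_int = int(np.dot(w_q, x_q))
    y_hat = y_int * w_scale * x_scale

    print(f"full precision: {np.dot(w, x):+.3f}, 3b x 6b MAC: {y_hat:+.3f}")

Comparing y_hat against the full-precision dot product gives a quick sense of the quantization error that the classifier must tolerate at these bit widths.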