Memory Chapter 3
The document discusses several problems with using DRAM memory cells. First, the tiny charge stored in each DRAM cell leaks over time, requiring the cell to be refreshed regularly to avoid losing data. Second, reading from a cell depletes its charge, so it must be recharged, adding time and energy costs. Third, charging and discharging a capacitor is not instantaneous, so there is a delay before the output is usable. Fourth, the small cell size allows for many to be packed densely but introduces scaling issues in addressing them all.
The use of a capacitor means that reading the cell discharges the capacitor. The procedure cannot be repeated indefinitely; the capacitor must be recharged at some point. Even worse, to accommodate the huge number of cells (chips with 10^9 or more cells are now common) the capacitance of the capacitor must be low (in the femto-farad range or lower). A fully charged capacitor holds only a few tens of thousands of electrons. Even though the resistance of the capacitor is high (a couple of tera-ohms), it takes only a short time for the charge to dissipate. This problem is called "leakage".

This leakage is why a DRAM cell must be constantly refreshed. For most DRAM chips these days this refresh must happen every 64ms. During the refresh cycle no access to the memory is possible, since a refresh is simply a memory read operation where the result is discarded. For some workloads this overhead might stall up to 50% of the memory accesses (see [3]).

A second problem resulting from the tiny charge is that the information read from the cell is not directly usable. The data line must be connected to a sense amplifier which can distinguish between a stored 0 and 1 over the whole range of charges which still have to count as 1.

A third problem is that reading a cell causes the charge of the capacitor to be depleted. This means every read operation must be followed by an operation to recharge the capacitor. This is done automatically by feeding the output of the sense amplifier back into the capacitor. It does mean, though, that reading memory content requires additional energy and, more importantly, time.

A fourth problem is that charging and draining a capacitor is not instantaneous. The signals received by the sense amplifier are not rectangular, so a conservative estimate as to when the output of the cell is usable has to be used. The formulas for charging and discharging a capacitor are

    Q_Charge(t)    = Q_0 (1 - e^{-t/RC})
    Q_Discharge(t) = Q_0 e^{-t/RC}

This means it takes some time (determined by the capacitance C and resistance R) for the capacitor to be charged and discharged. It also means that the current which can be detected by the sense amplifiers is not immediately available. Figure 2.6 shows the charge and discharge curves. The X-axis is measured in units of RC (resistance multiplied by capacitance), which is a unit of time.

[Figure 2.6: Capacitor Charge and Discharge Timing. Percentage of full charge on the Y-axis versus time in units of RC on the X-axis, with one curve for charging and one for discharging.]

Unlike the static RAM case, where the output is immediately available when the word access line is raised, it will always take a bit of time until the capacitor discharges sufficiently. This delay severely limits how fast DRAM can be.
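As a quick sanity check of these formulas, here is a minimal C sketch (added for illustration; it is not taken from the paper and assumes nothing about any particular chip, since working with the ratio Q(t)/Q_0 and measuring time in units of RC makes R, C, and Q_0 cancel out):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* Relative charge after t = n * RC, following
           Q_Charge(t)    = Q_0 * (1 - exp(-t / (R * C)))
           Q_Discharge(t) = Q_0 * exp(-t / (R * C)).
           Dividing by Q_0 and measuring t in units of RC removes
           all device-specific constants. */
        for (int n = 1; n <= 5; ++n) {
            double charge    = 1.0 - exp(-(double) n);
            double discharge = exp(-(double) n);
            printf("t = %dRC: charge %5.1f%%  discharge %5.1f%%\n",
                   n, 100.0 * charge, 100.0 * discharge);
        }
        return 0;
    }

After one RC the cell has reached only about 63% of its full charge (and, when discharging, still holds about 37%); getting above 99% takes roughly 5RC. This is exactly why a conservative estimate of when the cell's output is usable has to be built into the timing.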
The simple approach has its advantages, too. The main advantage is size. The chip real estate needed for one DRAM cell is many times smaller than that of an SRAM cell. The SRAM cells also need individual power for the transistors maintaining the state. The structure of the DRAM cell is also simpler and more regular, which means packing many of them close together on a die is simpler. Overall, the (quite dramatic) difference in cost wins. Except in specialized hardware (network routers, for example) we have to live with main memory which is based on DRAM. This has huge implications for the programmer which we will discuss in the remainder of this paper. But first we need to look into a few more details of the actual use of DRAM cells.

2.1.3 DRAM Access

A program selects a memory location using a virtual address. The processor translates this into a physical address and finally the memory controller selects the RAM chip corresponding to that address. To select the individual memory cell on the RAM chip, parts of the physical address are passed on in the form of a number of address lines.

It would be completely impractical to address memory locations individually from the memory controller: 4GB of RAM would require 2^32 address lines. Instead the address is passed encoded as a binary number using a smaller set of address lines. The address passed to the DRAM chip this way must be demultiplexed first. A demultiplexer with N address lines has 2^N output lines, which can be used to select the memory cell. Using this direct approach is no big problem for chips with small capacities. But if the number of cells grows this approach is not suitable anymore. A chip with 1Gbit capacity would need 30 address lines and 2^30 select lines. [Footnote 6: I hate those SI prefixes. For me a giga-bit will always be 2^30 and not 10^9 bits.] The size of a demultiplexer increases exponentially with the number of input lines when speed is not to be sacrificed. A demultiplexer for 30 address lines needs a whole lot of chip real estate in addition to the complexity (size and time) of the demultiplexer. Even more importantly, transmitting 30 impulses on the address lines synchronously is much harder than transmitting "only" 15 impulses. Fewer lines have to be laid out at exactly the same length or timed appropriately. [Footnote 7: Modern DRAM types like DDR3 can automatically adjust the timing, but there is a limit as to what can be tolerated.]

[Figure 2.7: Dynamic RAM Schematic. Address lines a0 and a1 feed the Row Address Selection demultiplexer, address lines a2 and a3 feed the Column Address Selection multiplexer, and the selected cell is connected to the Data pin.]

Figure 2.7 shows a DRAM chip at a very high level. The DRAM cells are organized in rows and columns. They could all be aligned in one row, but then the DRAM chip would need a huge demultiplexer. With the array approach the design can get by with one demultiplexer and one multiplexer of half the size. [Footnote 8: Multiplexers and demultiplexers are equivalent, and the multiplexer here needs to work as a demultiplexer when writing, so we will drop the differentiation from now on.] This is a huge saving on all fronts. In the example the address lines a0 and a1 through the row address selection (RAS) demultiplexer select the address lines of a whole row of cells. [Footnote 9: In the original schematic the line over the RAS and CAS names indicates that the signal is negated.] When reading, the content of all cells is thus made available to the column address selection (CAS) multiplexer. Based on the address lines a2 and a3 the content of one column is then made available to the data pin of the DRAM chip. This happens many times in parallel on a number of DRAM chips to produce a total number of bits corresponding to the width of the data bus. For writing, the new cell value is put on the data bus and, when the cell is selected using the RAS and CAS, it is stored in the cell. A pretty straightforward design.
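To make the row and column selection of Figure 2.7 a bit more concrete, here is a small C sketch. The geometry (14 row bits, 10 column bits) is invented for the example, and in reality it is the memory controller, not the program, that performs this split; the only point is that one group of address bits picks the row (selected via RAS) and the other picks the column (selected via CAS):

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical chip geometry, for illustration only:
       2^14 rows x 2^10 columns = 2^24 cells. */
    #define ROW_BITS 14
    #define COL_BITS 10

    int main(void)
    {
        uint32_t cell = 0xABCDEFu & ((1u << (ROW_BITS + COL_BITS)) - 1u);

        /* High-order bits select the row, low-order bits the column. */
        uint32_t row = cell >> COL_BITS;
        uint32_t col = cell & ((1u << COL_BITS) - 1u);

        printf("cell 0x%06x -> row 0x%04x (RAS), column 0x%03x (CAS)\n",
               (unsigned) cell, (unsigned) row, (unsigned) col);
        return 0;
    }

Which physical address bits end up as row bits, column bits, or chip-select bits is entirely up to the memory controller and varies between systems.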
There are, in reality, many more complications. There need to be specifications for how much delay there is after the signal before the data will be available on the data bus for reading. The capacitors do not unload instantaneously, as described in the previous section. The signal from the cells is so weak that it needs to be amplified. For writing it must be specified how long the data must be available on the bus after the RAS and CAS is done to successfully store the new value in the cell (again, capacitors do not fill or drain instantaneously). These timing constants are crucial for the performance of the DRAM chip. We will talk about them in the next section.

A secondary scalability problem is that having 30 address lines connected to every RAM chip is not feasible either. Pins of a chip are precious resources. It is "bad" enough that the data must be transferred as much as possible in parallel (e.g., in 64 bit batches). The memory controller must be able to address each RAM module (collection of RAM chips). If parallel access to multiple RAM modules is required for performance reasons and each RAM module requires its own set of 30 or more address lines, then the memory controller needs to have, for 8 RAM modules, a whopping 240+ pins only for the address handling.

To counter these secondary scalability problems DRAM chips have, for a long time, multiplexed the address itself. That means the address is transferred in two parts. The first part, consisting of address bits (a0 and a1 in the example in Figure 2.7), selects the row. This selection remains active until revoked. Then the second part, address bits a2 and a3, selects the column. The crucial difference is that only two external address lines are needed. A few more lines are needed to indicate when the RAS and CAS signals are available, but this is a small price to pay for cutting the number of address lines in half. This address multiplexing brings its own set of problems, though. We will discuss them in section 2.2.

2.1.4 Conclusions

Do not worry if the details in this section are a bit overwhelming. The important things to take away from this section are:

• there are reasons why not all memory is SRAM
• memory cells need to be individually selected to be used
• the number of address lines is directly responsible for the cost of the memory controller, motherboards, DRAM modules, and DRAM chips
• it takes a while before the results of a read or write operation are available

The following section will go into more details about the actual process of accessing DRAM memory. We are not going into more details of accessing SRAM, which is usually directly addressed. This happens for speed and because the SRAM memory is limited in size. SRAM is currently used in CPU caches and on-die where the connections are small and fully under control of the CPU designer. CPU caches are a topic which we discuss later, but all we need to know here is that SRAM cells have a certain maximum speed which depends on the effort spent on the SRAM. The speed can vary from only slightly slower than the CPU core to orders of magnitude slower.