Hardware
This is to get us thinking about and more
familiar with what's inside the magic boxes in the legacy: hardware and data.
It's loosely organized along lines of: memory, protecting what's
in memory, data representation, CPU, I/O, data on disc, and interfacing
system components.
- Memory, Main Memory, Main
Storage, Primary Storage, and RAM (Random Access Memory) are used
interchangeably to refer to the circuitry on the bus,
near the CPU, that holds data temporarily as the
CPU needs it. In '09 we need to be aware that 'Thumb Drives' or 'Memory Sticks'
are _not_ this 'Primary Memory', although they can be used to augment
primary memory in some cases. Where access to the RAM on a mainboard
is nearly instantaneous, access to data on a USB Thumbdrive is tar-pit
slow in comparison.
Data in memory are represented using binary digits, 0 & 1, called
bits. Eight bits make a byte.
Data are exchanged between memory and CPU, with the unit of exchange a
'word'. 'Word size' for desktop computers these days is
commonly 4 bytes (32 bit CPU) and in '09 CPUs of 8 bytes (64 bit)
are becoming more affordable and will help catapult Windows
into mainframe-class roles as hardware is built to accommodate
the relatively _huge_ amounts of RAM directly addressable by 64-bit CPUs.
CPU word size, and the width of the bus between memory and the CPU are
usually expressed in bits: early PCs (Apple, Commodore, TRS80, &c)
had 8 bit words, the first IBM PCs used 16-bit CPUs from Intel's
8086 series (the 8088, then the 80286), the 80386 and 80486 moved to
32 bits, and Intel's Pentiums got us firmly into 32-bit CPUs.
The Itanium & other variants of Intel's
multi-core 64-bit CPUs get Intel into true 'mainframe class'
situations as the hardware manufacturers provide chassis with hot-swappable
components, _huge_ RAMS, multi-bus, and dedicated I/O processors that
are today associated with mid-range and mainframe product lines.
Mainframes have used 64 bit processors for decades, but until
recently they could only be located in 'industrial neighborhoods',
requiring 3 phase power. Mainframes were
liquid cooled
to keep them from melting down, with chillers in the basement (typically three of them)
to remove the heat from the CPU circuitry. Now, since the turn of the Millennium
Mainframes provided by IBM are air cooled, and run on ordinary 110 volt a/c power.
Some of their competitors' (Fujitsu, Siemens, Hitachi, &c)
products are still water-cooled and are priced
to add value to the legacy of the enterprises that deploy them.
It appears that 128 bit processors
will be feasible before too long, and it remains to be seen if one of
'the competition' will leap-frog IBM into a new class of Mainframe-class
computer.
Here's an opportunity for confusion: Sometimes capacity of a data processing
component is expressed in terms
of bits: kilobits, megabits, gigabits, terabits. Other times it's
expressed in terms of bytes: kilobytes, megabytes, gigabytes,
terabytes. Sometimes the prefixes are decimal and sometimes they are
binary: for memory, a 'Kilo' is not exactly 1,000, it is 1,024, and a
'Mega' is not exactly a Million, it is 1,024 X 1,024...
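The bits-versus-bytes and decimal-versus-binary confusion can be worked through with a little arithmetic. This is only a sketch: the function names, the 100 megabit link, and the 500 'GB' disk are made up for illustration, and it assumes the usual convention that network speeds and disk labels use decimal prefixes while RAM uses binary ones.

```python
def decimal_prefix(power):
    """1000 ** power: kilo = 1, mega = 2, giga = 3 (decimal convention)."""
    return 1000 ** power


def binary_prefix(power):
    """1024 ** power: the 'Kilo is 1,024' convention used for RAM."""
    return 1024 ** power


def bits_to_bytes(bits):
    """Eight bits make a byte."""
    return bits / 8


# A binary 'Mega' is 1,024 X 1,024.
mega_binary = binary_prefix(2)

# A hypothetical 100 megabit-per-second link, expressed in megabytes per second.
link_megabytes = bits_to_bytes(100 * decimal_prefix(2)) / decimal_prefix(2)

# A hypothetical disk sold as 500 (decimal) gigabytes, counted in binary gigabytes.
disk_in_binary_giga = 500 * decimal_prefix(3) / binary_prefix(3)

print(f"binary mega: {mega_binary:,}")
print(f"100 megabits/sec = {link_megabytes} megabytes/sec")
print(f"500 decimal giga = {disk_in_binary_giga:.2f} binary giga")
```

The gap between the two conventions grows with each prefix, which is why a disk always looks smaller once the operating system reports it in binary units.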
This will soon get us into a discussion of binary, octal, decimal, and
hexadecimal number systems and ascii & ebcdic codes for data...
For a desktop perspective: A PC or notebook built for Windows XP in the past
few years needs 512 MBytes of RAM as a minimum,
and running several applications can make use of the full complement of 2
GBytes that would fit on the mainboard.
In '09, Vista is pushing us to need 2 GBytes of RAM to start, and we want the Maximum
4 GBytes that a 32-bit CPU can use if we're running more challenging
applications than a browser. Vista needs in excess of 1 GByte just to
support a good experience at the GUI.
For the mid-range & mainframe perspective on RAM: These machines
might be running a large enterprise with thousands and thousands
of processes active and have boards on their bus able to hold
_multi-TeraBytes_ of RAM. Access to data in RAM is hundreds of thousands of times
faster than disk access -- so machines with 64-bit CPUs can 'keep everything in RAM'
and do their work hundreds of times faster than machines limited to 4 GBytes of RAM which
must 'keep everything on disk'.
Before a CPU can 'operate on data' it must get the data into RAM, close to the CPU,
and then into its
'internal registers' where the CPU can
'operate on the data'. Machines with huge RAM can eliminate the very slow
disk access and service hundreds of thousands, or millions, of users on-line.
RAM is still slow compared to the 'Cache
Memory' where the CPU stores recently accessed data in a smaller, faster memory
that is even closer to the CPU. Since much of computing is repetitious,
this Cacheing scheme can improve application performance. (See more about Cacheing
later...)
Most primary memory today is 'volatile'. It is lightning fast, but
only provides temporary storage for data.
If the power goes off, or the computer
'crashes' for some reason, all work in progress will be lost.
Where a person editing a Word document might be inconvenienced by
such a loss, an Enterprise depending on a huge RAM to operate
might experience a devastating loss if the RAM is lost. To mitigate
this risk, Mid-range & Mainframe manufacturers engineer solutions to provide
practically 100% 'up time' for their systems, making all components, even RAM,
redundant, and providing redundant power supplies in the chassis, UPS close by to
handle instantaneous failures,
and generators to kick in if the power's off for more than an instant.
Some types of RAM are 'non-volatile' and will hold
the data when the power goes off, but they are more
expensive and sluggish relative to ordinary RAM. These are used
in portable devices.
In '09 we are seeing SSD-Solid State Disk technology become very affordable.
These are memory devices that mimic the signals of equivalent Hard Disk
devices, and may be attached to an IDE, SATA, or other modern mainboard
disk interface. They're still slow compared to the RAM adjacent to the CPU,
and operate at speeds comparable to a disk drive,
but for many reasons they are desirable in portable technology.
SSDs have been available for decades for wearable & other 'rugged' devices,
but are only recently becoming affordable for use in Notebook & Netbook
computers.
ROM (Read Only Memory, _NOT_ CD-ROM!) is permanent memory &
will hold its data without
power, but can't be re-written, and the data or program it holds is
'burned
into it' using a 'ROM burner' that quite literally
burns
the data into the ROM by using high current to burn out circuits representing 0 and
leaving the 1s intact. These are very cheap and are used to program drink-machines,
radios, & other devices where a program is written once and used
for the life of the product. In early days of PCs, BIOS was ordinarily on a ROM --
to change the BIOS required swapping the ROM.
EPROMs are 'erasable, programmable' ROMS; the electrically erasable kind
(EEPROM, or 'flash') can be 'flashed' with new
data or 'firmware' as needed without removing them from their socket on
a mainboard or other device. This is the way the BIOS is typically stored for
today's PCs and 'server class' machines. We're used to seeing 'the BIOS flash by'
as our PCs boot. Since the late '80s it's become common to have a
'flashable BIOS' kept on EPROM so the BIOS can be updated, or downdated, without
having to swap out the 'BIOS Chip', a ROM, as was previously required.
Like the flash memory in an SSD, an EPROM can only be flashed a limited number of times.
There are several other variations on memory, and their description
could make a good term-paper topic...
Backup Power Systems
- Primary memory's volatility must be considered by
system managers so that business isn't interrupted by power failures.
Today, it's relatively economical to get past the 'power has to be on or
you'll lose it' limitations of RAM by using UPSs (Uninterruptible Power Supplies)
and backup generators so that momentary or longer power failures don't cause
network failure or data
loss. A 'racking system' or mid-range or larger computer
can provide means to connect several UPS systems,
so that they can be easily swapped into & out of service as their batteries
wear out every year, or two, or three.
Twenty years ago a UPS could cost as much as a computer so they were
hard to sell. So, we spent a lot of time repairing data corrupted
by momentary power losses in those days, and the cost for programmers (we got $65 an
hour for this tedious work, would be $100+ today) was justified because
the UPSs were expensive. One 200+ user system I provided in
about 1985 had
a $30,000 UPS to support a $295,000 minicomputer -- the owner wanted
to
leave it out, so we made it a condition of sale, knowing that the
alternative
would have us spending hours patching data every time the power
flickered. The machine it powered eventually ran without interruption
for seven years.
A UPS for a large machine, and all else in the computer room, is still
a costly item but it's not wise to run a mission critical computer
without a UPS -- they are very inexpensive today relative to the loss
of work, or fried equipment, that is inevitable without one.
I've got a $500, 1500 VA, UPS in my office at school that will support
the five or six machines there for about forty minutes. A similar
one at home supports one
machine for a few hours. Cheap UPSs, starting at $50, will
protect a PC and keep it running for several minutes.
Most power failures
are momentary, and the UPS is the first line of defense against momentary
failures. Many incidents where networking equipment is 'fried' happen
when power is restored to a failed circuit, and the UPS acts like a powerful
'surge suppressor' whether the power failure was momentary or longer-term.
Neither UPSs nor Surge Suppressors last indefinitely -- depending on circumstances
they may last a year or less. A Surge Suppressor can only absorb so many joules of
energy before it fails to suppress surges. Always buy 'power strips' with beefy
surge suppression and a light indicating if it's protecting or not.
- Really critical, multi-user, applications running
on large machines or 'server farms' require at least two 'levels of
backup power'. A big, battery-powered UPS provides continuous power
& supports 'sags and brownout' or power-outage for a few minutes. A backup
generator, usually propane or natural gas, kicks in as the UPS is
discharging. Hydrogen Fuel Cells are now an option.
Together they can provide uninterrupted power
for hours
or days until power is restored by the utility company. An example sits
at the SouthWest corner of the old Business Building, this to support one
of our Commonwealth's backup sites in the old computer room on the 4th
floor. At Snead Hall, they're below the steel grates on the Southern side
of the building, exhaust goes to the roof thru the offices on the North.
Our network room has multiple 20KVA battery UPS and gets power from circuits
supplied by the generator.
'Fault-tolerant' machines often have several UPSs powering each chassis so
that they can survive a few minutes without power. For server or blade farms,
racking power systems can provide this extra line of defense, before the 'big UPS'
and generator...
Several of the systems that I have worked in over the decades have
run for their entire lives without ever having an unplanned shutdown.
A fault-tolerant unix machine or mainframe with redundant power sources might
run for its entire life without rebooting. It appears that Windows may
someday be to the point where they can make this claim.
Backup Power Does Not Replace Backing Up Data!
The text doesn't get into the issues of 'backing up', or keeping copies
of data so a system can be restored if there is some 'disaster' that
destroys the hardware where it resides. So, I should mention 'disaster
avoidance and recovery' strategies here, and introduce the ideas of:
encrypted backups, tapes
carried off-site daily or multiple generations of backups sent via
VPN or The Internet to remote backup sites, transaction logging to a remote device, warm &
hot sites, and schemes that involve replicated facilities that are widely
separated geographically and connected by a fast network to provide uninterrupted
nation-wide or world-wide system access in the wake of a building fire or
collapse, or some regional disaster like a tornado or an ice storm.
- How data are represented
This is covered in the text's appendix A. Discussion on the board
in class will cover: binary, octal, decimal, & hex numbering
systems; integer, real, and decimal numbers; string data.
This is a _huge_ issue for anybody in IT. Exchanging data with a 'foreign'
system is likely to require some 'translation' from one encoding scheme
to another. Going from Power to Intel, for example, involves swapping the
order of the bytes in each word from big-endian to little-endian orientation. Web developers need to worry
about a myriad of character sets & encodings.
Please take time to get your eyes on at least the several references here
to start developing the ability of identifying how data are encoded
to be ready for the time a foreign dataset arrives on your desk to be
taken on to your systems' DBMS or web site.
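The big-endian versus little-endian issue is easy to see with Python's standard struct module. This is a minimal sketch: the value 0x12345678 is an arbitrary example, and it shows only a single 4-byte unsigned integer, not a whole foreign dataset.

```python
import struct

value = 0x12345678

# Pack the same 32-bit integer both ways.
big = struct.pack(">I", value)     # big-endian: most significant byte first
little = struct.pack("<I", value)  # little-endian: least significant byte first

print("big-endian   :", big.hex())
print("little-endian:", little.hex())

# The bits inside each byte are untouched; only the order of the bytes differs.
same_bytes_reversed = big[::-1] == little

# What happens if big-endian bytes are read as if they were little-endian.
misread = struct.unpack("<I", big)[0]
print("misread value:", hex(misread))

# Reading with the right byte order recovers the original number.
recovered = struct.unpack(">I", big)[0]
```

The misread number is not random garbage, it is the original with its bytes in reverse order, which is a handy clue when a foreign file produces absurd integer values.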
The text gives a table showing ASCII & EBCDIC values for 'printing
characters' 0 thru 9 and A thru Z. Other printing characters include
the decimal point, comma, and the relatively few other symbols we get
when we shift a number key.
As a practical matter into the '90s, EBCDIC is the coding system used for mainframes
and ASCII is used almost everywhere else -- both fit in an 8-bit byte
and can only represent 128 or 256 characters.
ASCII has lots of variations, with the several ISO-8859 standards and UTF-8
causing riffles on web pages for those who are not encoding-savvy.
MicroSoft adopted UniCode (a standard from the Unicode Consortium)
for Windows, using 2 bytes for each character, which makes
65,000+++ characters possible and was important for them to gain
market share in countries with languages that don't
use alphabets with a limited number of characters,
such as Chinese, traditional Japanese, and other written languages
that draw a picture for each _word_ instead of each _letter_.
Collating Sequences: The EBCDIC code for the number 0 is actually
_larger_ (11110000) than the EBCDIC code for the letter A (11000001).
This isn't the case in ASCII. So, mainframes sort letters
before numbers and PCs and minicomputers sort numbers before letters.
Issues like this need to be considered and appropriate
accommodations made when data are exchanged among environments
with different collating sequences.
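The two collating sequences can be compared directly, since Python ships an EBCDIC codec. This is a sketch under one assumption: it uses code page 'cp037', one common EBCDIC variant, and other EBCDIC code pages may place some symbols differently. The sample strings are made up.

```python
items = ["B2", "9", "A", "10", "a"]

# Sort by the bytes each string would have in each coding system.
ascii_order = sorted(items, key=lambda s: s.encode("ascii"))
ebcdic_order = sorted(items, key=lambda s: s.encode("cp037"))

print("ASCII  order:", ascii_order)
print("EBCDIC order:", ebcdic_order)

# The individual codes behind the difference, shown in binary.
for ch in "0Aa":
    ascii_code = ch.encode("ascii")[0]
    ebcdic_code = ch.encode("cp037")[0]
    print(f"{ch!r}: ASCII {ascii_code:08b}  EBCDIC {ebcdic_code:08b}")
```

Note that lower case and upper case swap places too: ASCII puts capitals before lower case, and this EBCDIC code page puts lower case first.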
'Control Characters' in data are used to control devices like printers,
POS terminals, gas pumps, &c. They allow a host computer or server
to make a bell ring (more likely a beep today), make cash drawers slide
open (maybe DC1), move the printhead to the left side (carriage
return), move
the paper up one line, (line feed), or eject a page (form feed) on the
printers and other devices attached to it. Just as important, 'control characters'
are sent from a printer to a computer to tell the computer to stop transmitting when
the buffer is full, or when the lid is up.
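Here is a small sketch of what control characters look like in the data stream. The numeric codes are the standard ASCII ones; the printer_line and is_control helpers and the sample text are hypothetical, and what any given device does with a code (a cash drawer, for instance) depends on that device. DC3 and DC1 are shown because they are the codes commonly used as 'stop sending' and 'resume sending' signals.

```python
# Standard ASCII control codes mentioned above.
CONTROL = {
    "BEL": 0x07,  # bell / beep
    "LF": 0x0A,   # line feed: move the paper up one line
    "FF": 0x0C,   # form feed: eject a page
    "CR": 0x0D,   # carriage return: printhead to the left side
    "DC1": 0x11,  # device control 1, commonly 'resume sending' (XON)
    "DC3": 0x13,  # device control 3, commonly 'stop sending' (XOFF)
}


def printer_line(text):
    """Hypothetical helper: one printed line, ended with CR then LF."""
    return text.encode("ascii") + bytes([CONTROL["CR"], CONTROL["LF"]])


def is_control(byte):
    """ASCII control characters are codes below 32, plus DEL (127)."""
    return byte < 0x20 or byte == 0x7F


# A made-up line of output followed by a page eject.
page = printer_line("TOTAL  12.50") + bytes([CONTROL["FF"]])
control_count = sum(1 for b in page if is_control(b))

print(page)
print("control characters in stream:", control_count)
```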
Here are a couple of web pages that show some of
the character encoding schemes: ASCII, EBCDIC, UniCode, &c:
asciitable.com;
JimPrice.com.
- Memory Addressing Schemes
Many diagrams of memory, especially those used to introduce arrays and
other data structures, show memory arranged in rows and columns.
Most CPUs, however, 'see' or 'address' memory as one 'row' or
'vector'
of storage locations that can hold either one byte or one word of data.
The addresses represent the offset from the first location,
starting
with 0 and continue to 11111111, 1111111111111111,
11111111111111111111111111111111,
or 1111111111111111111111111111111111111111111111111111111111111111, or
more, depending on the size of the CPU's word.
The relative size of memory increases at a geometric rate relative to
the word size. Doubling the length of the word from 8 bits to 16
gets 256 times more discrete memory addresses, not just twice as
many. Getting to 128 bit processors (and IPv6) gets us to a
number we can't pronounce yet. (anybody discovering the proper
term please let me know)
| Bits | Power of Two & Era | Number of discrete memory or IP addresses using this many bits, and common lingo |
| 11111111 | 2 ^ 8; Early hobby micro & Mini | 256; Two hundred & fifty six, the number of characters in an 8-bit (extended ASCII) code |
| 1111111111111111 | 2 ^ 16; Early Mini & IBM PC in the '80s | 65,536; 64 Kilo |
| 11111111111111111111111111111111 | 2 ^ 32; Late Mini, Desktop & midrange in the 90's | 4,294,967,296; 4 Giga |
| 11111111111111111111111111111111 11111111111111111111111111111111 | 2 ^ 64; Mainframe from the 70's, desktop & mid-range since '04 or so | 18,446,744,073,709,551,616; 17.2 billion gigabytes; 16.8 million tera; or 16 exabytes |
| 11111111111111111111111111111111 11111111111111111111111111111111 11111111111111111111111111111111 11111111111111111111111111111111 | 2 ^ 128; Some mid-range platforms provide _registers_ this size now, and we suspect that NSA's Carnivore-class heavy iron can do this, and that the next mainframes and big *ix machines, 2012+, may evolve 128-bit CPUs when the 64-bit's 17.2 billion gigabytes becomes a limit. Already IPv6 routers can handle these 128 bit addresses. | 3.4028236692093846346337460743177e+38 or 340,282,366,920,938,463,463,374,607,431,768,211,456. Wikipedia's article on 128-bit calls this value about 340.3 undecillion, and relates that in DEC's VAX line a 128-bit quantity is(/was) an OctaWord. In slightly more human terms this will allow 128-bit processors to directly reference about 281 trillion yobibytes (2 ^ 48 of them, each 2 ^ 80 bytes). As it is applied to IPv6, 340.3 undecillion provides approximately 665,570,793,348,866,943,898,599 addresses per square meter of the surface of the planet Earth, or 665 sextillion addresses per square meter. |
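The numbers in the table can be checked with a few lines of arithmetic, since Python integers don't overflow. This is a sketch only: address_count is a made-up helper name, and the 'giga' and 'yobi' units here are the binary ones (2 ^ 30 and 2 ^ 80).

```python
def address_count(bits):
    """Number of discrete addresses that fit in a word of this many bits."""
    return 2 ** bits


for bits in (8, 16, 32, 64, 128):
    print(f"{bits:>3} bits: {address_count(bits):,}")

# Doubling the word from 8 to 16 bits multiplies the addresses by 256.
ratio_8_to_16 = address_count(16) // address_count(8)

# 64-bit address space counted in binary gigabytes (2 ** 30 bytes each).
giga_in_64 = address_count(64) // 2 ** 30

# 128-bit address space counted in yobibytes (2 ** 80 bytes each).
yobi_in_128 = address_count(128) // 2 ** 80

print(f"16-bit vs 8-bit: {ratio_8_to_16} times as many addresses")
print(f"2^64 bytes  = {giga_in_64:,} binary gigabytes")
print(f"2^128 bytes = {yobi_in_128:,} yobibytes")
```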
If a CPU addresses memory by words, memory might be 'wasted' when data
elements are less than a word. If a CPU addresses bytes, it can't
reference as much memory...
To get around limitations of treating memory like a simple 'vector'
with the end point limited by the size of the CPU's word,
some early computers used a 'page' and 'offset' scheme where one word stores
a 'page number' in memory and another word stores the offset from
the beginning of that page. This resulted in slower memory access, since 'pages' had to be moved into the CPU's RAM
to be processed,
but it allowed big machines to address much larger amounts of RAM.
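The 'page' and 'offset' arithmetic can be sketched in a few lines. The sizes here are assumptions picked for illustration, not those of any particular machine: an 8-bit page number and an 8-bit offset, so each page holds 256 locations.

```python
PAGE_SIZE = 256   # assumed: an 8-bit offset reaches 256 locations per page
PAGE_COUNT = 256  # assumed: an 8-bit page number selects one of 256 pages


def physical_address(page, offset):
    """Combine a page number and an offset into one location number."""
    if not 0 <= page < PAGE_COUNT:
        raise ValueError("page number does not fit in one word")
    if not 0 <= offset < PAGE_SIZE:
        raise ValueError("offset does not fit in one word")
    return page * PAGE_SIZE + offset


def split_address(address):
    """Go the other way: which page, and how far into it?"""
    return divmod(address, PAGE_SIZE)


location = physical_address(3, 16)
highest = physical_address(PAGE_COUNT - 1, PAGE_SIZE - 1)

print("page 3, offset 16 ->", location)
print("back again        ->", split_address(location))
print("highest location  ->", highest)
```

With two 8-bit words the machine reaches 65,536 locations instead of the 256 a single 8-bit address allows, at the cost of handling two words per reference.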
Today, many computers make use of 'Cache memory', which is usually
very fast, very expensive, and relatively small memory that 'sits
between' the CPU and the primary memory. Where RAM is usually a simple vector where the address
is the offset from the beginning, cache memory can be thought of as a 'table' with two 'columns'
and many 'rows'. The first column holds an address in RAM and the second holds (duplicates)
the data stored at that address.
Cache works as a 'staging
area' for data moving between RAM and the CPU, allowing frequently
accessed memory locations (some processes, like spreadsheets, graphics
or database, do a lot of this) to be accessed much quicker than if they
were in the larger but slower RAM. Since many CPU operations are iterative in nature,
there's a good chance that recently accessed addresses will need to be accessed again.
But, to make use of cache, the CPU has to add a step where it has to 'look up' the addresses stored in the cache
first to see if an address's data is _already_ in the cache before
fetching it from RAM. This 'lookup' is an extra step in front of every access to RAM and can have an adverse effect on speed if
cache is too
large for the processor or it is mismanaged. In the old days of desktop computing, the mid '80s and early '90s, cache was purchased
and installed separately in little slots next to the CPU on the mainboard, and it was possible to install too much cache.
As CPUs have become faster & faster the size of cache has become
larger and larger. And, it is built onto the chip directly adjacent to the CPU and is engineered for optimum performance.
Since the arrival of multi-core CPUs cacheing is a multi-level scheme,
where each of the cores has a little cache of its own and there is another cache shared by all the cores.
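The 'table with two columns' picture of cache can be sketched as a toy model. This is not how any real CPU is built: the TinyCache class is made up, the capacity of 4 rows is arbitrary, and throwing out the least recently used row when the table is full is an assumption (real caches use a variety of replacement schemes in hardware).

```python
from collections import OrderedDict


class TinyCache:
    """Toy cache: a small table of (RAM address -> copy of the data there)."""

    def __init__(self, ram, capacity):
        self.ram = ram
        self.capacity = capacity
        self.table = OrderedDict()  # column 1: address, column 2: data
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Look up the address in the cache first.
        if address in self.table:
            self.hits += 1
            self.table.move_to_end(address)  # mark as recently used
            return self.table[address]
        # Not there: fetch from the slower RAM and keep a copy.
        self.misses += 1
        data = self.ram[address]
        self.table[address] = data
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)   # drop the least recently used row
        return data


ram = list(range(100, 200))  # pretend RAM: location n holds 100 + n
cache = TinyCache(ram, capacity=4)

# An iterative process that keeps coming back to the same three locations.
for _ in range(10):
    for address in (0, 1, 2):
        cache.read(address)

print("hits:", cache.hits, "misses:", cache.misses)
```

Only the first pass goes out to RAM; every later pass is served from the cache, which is the payoff for repetitious work.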
Here is an example of multi-level caching, using IBM's
Power SOC - System On a Chip.
These RISC chips are optimized for servers where there is a constant demand of
relatively light, data-oriented services _without_ GUI eye-candy.
CPUs like this are also deployed for blazing fast RAID and networked storage.
Intel CPUs are CISC chips optimized for
a constant demand for fewer, heavier, processes _with_ GUI eye-candy, connecting a
fast lane on the North Bridge to the CPU's Frontside bus to maximize thruput
to/from graphics and RAM. Slower, secondary storage and i/o processors
compete for bandwidth on the South Bridge.
The amount of cache is one of the major
differences between the Gamer's 'top of the line' Pentium or Athlon and
a cheaper Celeron or Duron that works OK for his grandmother.
A user
who does nothing but get email and browse the internet doesn't need the
cache. A gamer, or graphics or spreadsheet artist, needs a big cache
to feel fulfilled.
- The Processor
A computer's CPU moves and manipulates data, always under the control
of some program, or software, or firmware. When a modern computer
is powered on a 'bootstrap loader' initializes the memory and the CPU's
registers and instructs the CPU to look to a ROM that holds
the BIOS (Basic Input & Output System). The BIOS initializes
mainboard interfaces with keyboard and mouse, then checks in with any
devices that may be present, like IDE drives, and PCI video, SCSI,
parallel, game, or serial devices. Then, control is passed to the
program referenced at the Master Boot Record, the Operating System
loads, and we expect it will run our applications soon after that.
A program, whether it's on a ROM, an EPROM, or a disk, is a series of
instructions each of which contains an Operation Code (OpCode, or
function) from the CPU's instruction set and an Operand, which may be a
register, accumulator, data, or address for the function to use.
Basic CPU functions are like: move data from memory to a
register, add, subtract, multiply, divide, compare, branch if equal,
branch if not equal, branch unconditionally, start input, start output,
&c.
One difference between CISC
and RISC processors is that RISC implements only a limited
number of simple instructions which execute in about the same time,
always less than a 'clock tick'.
CISC
has a much expanded instruction set which mixes geometric and other
more
complex instructions in with the simple instructions. A CISC
processor
may have to waste time waiting for the next clock tick if a lot of
simple
instructions are executed, so the GigaHz are not utilized unless a lot of
the complex functions are used -- which, happily enough they are in
many
desktop environments today.
A RISC processor wastes little time waiting for the routine
instructions to complete, so it can accomplish more simple instructions
faster. When it has to do something geometric or otherwise
complex the OS software or application code has to provide the functionality.
There is an
almost religious zeal influencing most of the value judgements about
RISC or CISC at any point in time. Either works where
appropriately applied. In 2012 RISC processors are being
deployed in an increasing number of 'personal computing devices'
where cooler operation and lower power consumption are desirable
features.
- Instruction sets
All instructions are put to the CPU in the binary machine language
that is made up of elements of the 'instruction set' for the particular
CPU, or family of CPUs. Very few people 'think' in binary instructions, and their
job is to engineer a more accessible 'Assembler Language' that is easier
for people to think about, and an 'Assembler' to
produce binary instructions for the CPU.
Assembler languages provide mnemonics, abbreviations, that reflect a one-to-one
correspondence to the binary instructions that are engineered into a
particular processor. There are many more 'Assembler Programmers' (2GL) than
'Machine Code Programmers' (1GL), but they still have to 'think like a CPU' to get
good results. Their job is often to make higher-level languages and 'compilers' or
'interpreters' (3 & 4GL) used by most application developers these days.
Here's a site that provides 'Assembler Programming cards'
for a variety of (relatively old) CPUs. Compare two early chips,
like Intel's 8088 and Motorola's 68000 (both are generally classed as CISC).
Wikipedia provides this list of instruction sets
showing the several CPUs still active in the legacy and lots that aren't.
Here is a 'pinout' of a Pentium to demo the complexity
at the hardware side of the CPU.
Attempts to make an OS or application built for one architecture run on
another have been somewhat successful.
For example: before Macs got Intel, a 'PC Emulator' would allow
software written for an Intel chip to run on a Mac with a Power chip, and a 'Mac
Emulator' would go the other way around if needed. The results weren't always
satisfactory, especially when a complex development environment like 'Visual Basic'
or 'PowerBuilder' tried to run on a Mac, or a complex Mac application
like PhotoShop tried to run on a PC. Students attempting application development
with Windows tools on their Mac were usually disappointed or stymied,
and operations that were quick on the real Mac were sluggish on the Mac-emulator.
- Processor components
A question like 'name and describe the components of a CPU'
is asking for the architecture of our Von Neumann machines, in the link above: clock, alu, icu, registers, ram.
A sketch in class will update single CPU to modern multi-core and multi-CPU
that handle SMP. (RISC's been doing SMP since the '70s, Intel not quite as long, has caught up to the state of the art...)
A question like 'describe the components of a PC' is asking for
components of a PC
including neighborhoods of a modern PCI bus: North & South bridges, divers controllers, PCIExpress/AGP,
&c...
- Generations of computers
Although programmable 'data processing equipment' goes way back to the early 1900s, the architecture of
the 'digital computers' we use today only goes back to the 40s or 50s.
Across these fifty or so years of advances in digital computing
there have been four or more 'generations' of computers, depending on who is counting. Some
articles claim that we are at perhaps a 7th generation. But, most I
see describe four generations, the first 3 spanning the 40s thru the 70's,
and the 4th spanning three or four decades of amazing strides in miniaturization since the 70's:
- 1st - Processor and other components made of
relays and vacuum tubes, programs 'hardwired' or on punched cards,
ENIAC, mid-40s thru mid-50s, are roughly equivalent to Babbage's architecture
except done with electronics instead of cogs & clockwork.
- 2nd - Transistors replaced vacuum tubes
as discrete components soldered to circuit boards, Magnetic
Core RAM was state of the art. Better core reference. Punched cards, tape, magnetic drums, and disks provided 2ndary storage.
Most processes were 'batch jobs' involving punched cards and tapes, and huge amounts of carbon-paper.
- 3rd - Transistors built into Integrated
Circuits, 1958, put the 'CPU on a board'.
Wired.com posted a good article Fall of '08 on
50 Years of the IC.
- 4th - VLSI-Very Large Scale Integration, put the 'CPU on a
chip'. This is the generation of the Microprocessor,
about 1971. Solid-state RAM replaces Core. Magnetic Disk and Data Terminals become less and less expensive,
punched cards fade from use, tape is less involved in batch processing -- becomes king of archival storage media.
An excellent history of microprocessors and a detailed look at their manufacture
is at Intel's web museum.
- Everything since then has been nothing really
new as far as the CPU is concerned, just smaller & faster, and
usually less expensive. 'Personal Computing' started in the late '70s, is really interesting here 30 years after.
Enterprise computing pretty much relies on a legacy of systems developed in the '80s,
miniaturized at the new millennium, and extended to the web in the past decade.
Real-time, 100% available, paperless, workflow management, CRM, web-based and lots of other adjectives describe the
media these days.
Disk is cheap. RAM is cheap. Networking is cheap.
Tape is king for archiving, is mixing with SSD and other 'cloud' techniques.
- History of computing
Here's a detailed
History of Computing that covers early, mechanical computing devices thru the 2nd generation.
Here's a Wired Magazine take on
Low Tech Computing that gets the lineage about right.
Here's a Short
History of Computing that does a good job of highlighting the
important points and uses of computers thru their history.
This
brief history goes back to Shamans needing to keep track of the
seasons so they can know when to have their ceremonies and when to
plant things. Being able to
predict the moon's phases or an eclipse was considered heap big magic,
and
people did it without computers for a long time...
The History of
Computing Project, a work in progress I'm pleased to have googled
into, provides good images and insights.
- Machine Cycles
The text addresses this concept in the figures starting at 2.6.
Sit with these pages for a while and figure out concepts like:
clock, instruction counter, fetch, instruction time, execution time,
copying data from memory to registers & the other way, accumulator.
The text doesn't mention 'wait', but that's a good one to add on.
- The Little Man Computer
This is an exercise we'll do in class as another approach to answering
'how does a CPU work?' It assumes there's a 'little man' in the
computer that
does what the ICU and ALU do.
There is a simple set of 11 instructions
for the little man to follow.
This example follows one in Englander: The Architecture of Computer
Hardware and Systems Software, which was the prior text used for this
course. It has been adapted by Nottingham University and put on-line as
a pdf.
This
is a jpg we'll use in class to run thru a couple of LMC programs.
RAM is at the right, using a two-digit address to reference 100 memory
locations from 0 thru 99. Before the Start Button is pushed to get the
Little Man's attention, a program is loaded as 'program data' starting
at address 0. RAM can also hold 'plain data' after the program, it
being the programmer's
responsibility to keep program instructions from being overwritten by
data.
The Instruction Counter is a special register that holds the address of
the next instruction to fetch. When the Start button is pushed, it
drops 0 into the Instruction Counter just before a bell rings
to signal the Little Man to start his Instruction Cycles.
The 'Register' is a general purpose register used to hold temporary
values. The LMC can only do arithmetic on data while they're in the
register, so complex calculations, like an average, will need to
shuffle data in & out of the register.
An InBox and OutBox are used for inputting data, up to three digits,
and for displaying results as output.
The instruction set is built in the computer where the Little Man can
reference it easily. Some instructions have two parts: the 'Op Code'
and an 'Operand'. In the LMC, instructions 1 thru 8 use the last two
'bytes' of a three 'byte' word to reference a memory address.
So, '311' means 'Write register contents to memory location 11'.
The instructor or some volunteer from the class plays the part of the
Little Man,
who has a lot more to do than the little man in the refrigerator.
LM has been trained to follow this Instruction Cycle:
- Read the Address in the Instruction Counter;
- Fetch the instruction at that address;
- Execute the instruction;
- Increment the Instruction Counter;
- Repeat.
The User of the LMC writes the program into RAM, starting at 0, then
presses the start button, setting a 0 in the instruction counter
and LM to work...
Here's a program that inputs two numbers and displays the difference
between them. We'll run this in class and work on a more complex program:
901
311
901
211
902
000
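The Instruction Cycle and the little program above can be sketched in a few lines of Python. This is only a rough sketch, not the LMC we run in class: the name run_lmc is made up for illustration, and the op codes for Add (1xx) and Load (5xx) are assumed from the usual LMC instruction set, since only 211, 311, 901, 902, and 000 show up in these notes. It leaves out branching and the three-digit limit on values.

```python
def run_lmc(program, inbox):
    """Run a Little Man Computer program and return what lands in the OutBox."""
    ram = [0] * 100                  # 100 memory locations, addresses 0 thru 99
    ram[:len(program)] = program     # the program is loaded starting at address 0
    counter = 0                      # Start button drops 0 in the Instruction Counter
    register = 0                     # the single general purpose register
    inputs = list(inbox)
    outbox = []
    while True:
        instruction = ram[counter]                # read the counter, fetch the instruction
        op, operand = divmod(instruction, 100)    # first digit is Op Code, last two the Operand
        if instruction == 0:                      # 000: halt
            break
        elif instruction == 901:                  # 901: InBox to register
            register = inputs.pop(0)
        elif instruction == 902:                  # 902: register to OutBox
            outbox.append(register)
        elif op == 1:                             # 1xx: add memory to register (assumed)
            register += ram[operand]
        elif op == 2:                             # 2xx: subtract memory from register
            register -= ram[operand]
        elif op == 3:                             # 3xx: write register to memory
            ram[operand] = register
        elif op == 5:                             # 5xx: load memory into register (assumed)
            register = ram[operand]
        else:
            raise ValueError(f"instruction {instruction} is not in this sketch")
        counter += 1                              # increment the Instruction Counter
    return outbox


difference_program = [901, 311, 901, 211, 902, 0]
print(run_lmc(difference_program, [3, 10]))       # [7]
```

With 3 and then 10 in the InBox, the first number is parked at address 11 and subtracted from the second, so the OutBox gets 10 - 3.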
The assignment following it is to sit with an LMC for a while and work
out a couple of other programs, maybe even something that does
branches and loops.
The next 'Pop Quiz' will be to write another
program. A simple one like this will get 1 point. A more complex one
will be a candidate for 2 points, should use a branching instruction to
do something
like accumulate a total of entries until the user enters a zero,
or count the entries, maybe even average them if you want to obsess on
this for a while...
- The text doesn't have many pictures, staying at a
relatively high level. HowStuffWorks.com has an excellent section
about 'How PCs Work'
that will
help fill in for those who are interested in more details.
- Microcode
Here is another example of 'firmware', a program 'burned into' a chip.
A CPU's 'instruction set' is engineered into the circuitry.
'Microcode' is burned into PAL (Programmable Array Logic)
chips and other
main board components to add functionality not
engineered into
the CPU, perhaps as an oversight, or perhaps as a proprietary feature.
In the 'good old days' putting most of an OS's code into firmware was a
common way to save RAM for application programs and data on
minicomputers. OS upgrades
were few and far between, and most machines were maintained by their
manufacturers -- a new revision of the OS might require opening the box
and changing out the chips.
Today, microcode is found in all kinds of places depending on the computer
at hand. On a gamer's PC it's in video processors so that
functions like 'apply texture' can be applied at the video card instead
of the main board's CPU. On a minicomputer it's involved in
managing multiple CPUs, perhaps on multiple main boards. Other
components that are stressed as host computers get larger to serve more
& more users are likely to be managed by firmware to free up the
CPU from i/o tasks, as with 'intelligent' i/o boards that allow
keystrokes of 100's, or 1000's of users to be ignored by the CPU -- vs.
the Desktop PC's CPU, which gets an interrupt for each keystroke or
mouse movement.
- I/O Devices
The textbook runs thru these right quickly and doesn't
provide any visuals. HowStuffWorks.com comes thru again,
providing good support for I/O Devices that are common on a PC, with How Does a
Parallel Printer Work as a good starting point. Use this and
other on-line resources to gain a good understanding of topics like
these:
- Input & Output are relative to the
computer. A keyboard is a basic input device, a monitor is basic
for output. Most OS's are rigged to support a 'console' with this
combination and perhaps a mouse.
- Other input devices read magnetic, optical,
sound, or electrical signals such as serial, network, or telephone.
Stripes on credit cards and the routing numbers on checks are examples
of magnetic data, so are floppy & hard disks.
Routing numbers at the bottom of a check are 'human readable', but
they are
shaped for easy reading by MICR (Magnetic Ink Character Recognition)
equipment, which has been in use for decades. If you have an application
that requires printing checks for multiple accounts you can run blank
checkstock thru a laser printer that uses magnetic toner so that
the routing numbers are MICR compliant, otherwise the bank will
charge a fee for having to handle the check manually. In '09,
some banks are doing optical processing of checks, and we may see
the requirement for magnetic ink & toner go away in some years.
OCR (Optical Character Recognition) devices read zip codes &
addresses off envelopes, and the grid sheets used for
tests. They can read a large proportion of tax and other
forms that provide blocks for us to enter numbers, saving the first,
labor-intensive, data entry step -- those that the software determines
are in error are shunted aside for a person to figure out.
Bar Code scanners have a nearly 100% 'first read rate' now and are so
easy to use that supermarkets let customers scan their own
merchandise. A barcode scanner on a PDA or other small computer
is very easy for an application's programmer to support. Often,
the scanner supplies data at the keyboard's interface.
Devices that answer incoming telephone calls provide an interface with
the PSTN. Speech as output has been very easy for computers for
decades, at least since 1975, and a rack of DECTalk processors has been
a feature in the computer rooms of banks and other enterprises that
have replaced persons with machines to provide customer support of one
kind or another. Since the middle 90s, cards on an ISA or PCI slot in a
standard PC chassis have replaced the stand-alone DECTalks. Touchtones
have been very easy to use as input
for just as long, and all of us are familiar with the functions as a
result. Speech recognition has been more-or-less easy for at
least a decade, and coupled with the touchtone keypad and speech out
makes a telephone a versatile i/o device.
Modems provide serial data for a computer, either by plugging into a
serial port on the back of the machine if it is an 'external modem', or
by 'spoofing' this interface on the bus if an 'internal modem' is
used. Fax uses a similar technology and 'fax modems' allow a PC
user with appropriate software to send and receive faxes and may also
provide other 'telephony' features by plugging in a headset to the sound card's
mike and speaker jacks. Although DSL & Broadband Internet
have drastically reduced the number of modems in use, those
in the boonies still need them. Banking machines, credit-card
authorization, and other legacy applications that are tied
to POTS will keep modems
in use for the foreseeable future.
Laboratory equipment, gas pumps, and other devices for process control use 'serial networks', with RS232 the main flavor.
A computer that interfaces with lots of these devices may
have 100s of 'serial ports' for POS and other equipment. For example, a busy service station
is likely to have a Linux machine in
the back room with a gang of dozens or hundreds of serial ports to control the pumping and account for it.
Gas pumps have several
displays and devices that take advantage of robust and relatively
inexpensive serial network cabling and interface. There are many applications in
the systems legacy where ethernet is slowly replacing RS232 & other serial networking technology.
Where serial devices are located remotely, it's common to use a 'terminal server' that attaches on
one 'side' to a fast Ethernet, and on the other side to dozens or hundreds of serial ports.
Sloth is one of the defining features of RS232 networks, which commonly operate at
speeds of 300, 1200, 2400, 9600, or 19,200 _bits_ per second. It doesn't
take much speed to get data from a gas pump or send it to an LCD display or ticket printer.
The best connection speed we can expect from a modem is 30,000+ to 56,000 bits per second,
and it's likely to be slower if the 'last mile' of the POTS connection is more than a few miles.
Today's Ethernet Network Interface Cards (NICs) allow I/O with a LAN,
which may easily be attached to a Wide Area Network, or The
Internet, and enjoy bandwidths of 10, 100, or 1000 megabits per second. (Bandwidth & Throughput are related, but not equal!)
The ordinary NIC today accepts the familiar
RJ45 connector on a CAT5, CAT5e, or CAT6 ethernet cable.
But there are NICs for any digital media, whether
it's optical, RF, or copper coaxial cable.
Ethernet and other digital networks are generally much easier to administer and configure than
the older serial & 'current loop' networks, and they provide the increased bandwidth required for today's networked computing applications.
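To get a feel for the difference between these speeds, here's a rough sketch that figures how long it takes to move a megabyte over each kind of link. The function name transfer_seconds is made up for illustration, and the arithmetic assumes 8 bits per byte with no protocol overhead, so real throughput will be lower (an RS232 link, for instance, usually adds start and stop bits to each byte).

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Best-case seconds to move size_bytes over a link, ignoring overhead."""
    return size_bytes * 8 / bits_per_second


links = [
    ("RS232 9600", 9_600),
    ("56K modem", 56_000),
    ("10 Mb LAN", 10_000_000),
    ("100 Mb LAN", 100_000_000),
    ("1000 Mb LAN", 1_000_000_000),
]

for name, bps in links:
    print(f"{name:>12}: {transfer_seconds(1_000_000, bps):10.3f} seconds per megabyte")
```

The serial link needs the better part of a quarter hour for what the fast Ethernet moves in a fraction of a second, which is fine for a gas pump but not for a file server.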
The speed and flexibility of the USB combined with less
expensive 'nonvolatile memory' has made it easy to package removable memory devices
like 'thumb drives'
and 'smart
media' for computers and digital cameras, respectively.
A thumb drive is easy to carry from computer to computer and will plug
directly into the USB of either. A smart media card will fit in
a digital camera, and may be put into an inexpensive adaptor that plugs
into the USB.
- Printers are output devices that attach to
computers with a variety of interfaces. Serial & Parallel
ports provided the 'traditional' interfaces with printers for PCs until
the turn of the millennium, and now USB is the preferred interface.
Many printers 'in the legacy' use a serial interface with a host or
server computer, which may be difficult to configure since it is not
'as standard' as other printer
interfaces. In larger systems there
may be dozens or hundreds of printers attached via serial interface.
More recently it's become economical to
plug printers into a wired LAN, or buy one with built-in wireless, so
they can be shared easily by the network's users. Some 'network
ready' printers 'plug and play' with Windows networking, others need an
administrator to assign them an IP address.
The parallel interface is still commonly found on Intel-based PCs,
although it disappeared from the Mac a few years back. It was for
decades the most common printer interface for the PC and Apple, and for
longer than that for the small computers in use before the IBM
PC. If a parallel printer needs to be used on a
network, a 'print
server' may be used, where the printer plugs into the server's parallel
port and the
server attaches to the LAN's hub or switch.
Printers are further categorized as 'impact' and 'non-impact'.
Applications that need to print on multi-part forms, perhaps to gather
signatures when services are performed or to make the customer's copy of a signed delivery ticket, require an impact printer
where the
print head
strikes thru a ribbon, either with a 'dot matrix'
or whole characters on a ball or chain. Impact printers range
from the ubiquitous Panasonic
KX-P3200, printing rental car agreements at something like 300 characters a second, thru a
'Line Printer' that prints 1000s of lines per minute. Line printers are big & expensive,
and are in the legacy of equipment for medical & other billing, direct mail, and to meet
requirements for putting out lots of print on paper. In the past, a CFO might get 100's of pages printed
and dropped on her desk early each morning -- now the report's more likely to be on-line or in
a spreadsheet.
If there is no requirement for multi-part forms,
Laser
and InkJet
printers are popular. Other technologies, like ion
deposition, might show up where large volumes of bills, statements, or
other business documents are printed.
Lasers (and LCD Array) give the sharpest edges and print quickly, so they are
preferred for business correspondence. Where color lasers were very expensive
thru the '90s, thousands of dollars for the printer and upwards of 50 cents per
page, they are now much more affordable and the cost per page is less.
High-speed laser & laser-like printers, such as ion-deposition, are replacing traditional
'line printers' where there is no requirement for multi-part forms. These printers
can print thousands of lines per minute and are found in organizations with large numbers of invoices, statements, insurance statements,
or other 'personalized' forms that need to be sent via post office. Nearby there are
likely to be found machines to remove perforations, 'burst' continuous forms, stuff envelopes
with the printed forms, return envelopes, and sales material. 'Mailing houses', 'medical billing',
and other service-providers are motivated to keep these expensive and cantankerous machines operating at
full capacity, so often the 'printing needs' for a large organization or a group of them are
handled from one location.
InkJets print color
(somewhat) economically and are good general purpose printers for home
or business applications. Disposable ink jets and laser
components make these printers easy to keep running in the field.
Ink for the inkjet printers varies in its ability to resist moisture, some
smears if touched with anything wet and some are more 'water resistant'.
Inkjet ink is transparent, so color is enhanced by printing on very white stock
or on special 'photo paper'. Inkjets generally can't print on colored stock.
Dye-sublimation, dye-sub,
or wax sublimation printers print with opaque 'ink' and some can print true colors
on colored stock, or lay down metallic-appearing patterns. An electronic process is
used to blast a waxy, solid ink off a ribbon onto the paper.
'Sublimation' is when a substance changes state without going thru the normally intervening
state, as when ice changes directly to vapor without going thru the liquid state. Dye-sub
printers heat the solid dye to a vapor that settles into the paper as solid color, without the dye becoming liquid.
This technology is commonly used for 'photo printers', whether personal
or in photo-kiosks in stores. Dye-sublimation is also adaptable to printing directly or transferring onto fabrics, competing with
the traditional silk-screening processes by removing the costly 'set up time' for a screen press.
Thermal printers are also non-impact printers, and are ubiquitous in point-of-sale devices such as
cash registers and anytime banking machines. They allow easy replacement of paper without the problems
of changing ribbons or keeping ink residue out of print mechanisms. Thermal paper doesn't store very
well so is not 'archival', but for temporary purposes like printed receipts it works fine.
Here's a new class of
'printer' that's reminiscent of the replicators on Starship Enterprise.
The low end of ZCorp's line is about $30,000.
HP is working on a prototype 3D printer that will be more affordable. Only a few
thousand dollars and you can print out your coffee mug in the
morning.
Out at USC, they're building prototype devices that shoot blobs of concrete
instead of ink, then trowel them to make adobe-like structures.
They're exploiting techniques of
'Sintering' & 'Contour Crafting' to
'print' out buildings directly from the architect's plans.
They want to couple this 'concrete jet printer' with a gantry that can position joists into the mix to make multi-floor buildings.
Here's competition from
those clever Italians.
Here's more recent competition from the Open Source community:
Cupcake CNC
- SCSI interfaces
are still in the legacy and provide fast data transfer, usually for hard disk, tape drives, or
scanners and sometimes for printers.
But, other technologies have 'leap frogged' SCSI today, where better performance and simpler
administration are found with new, fast IDE or SATA drives. Scanners are likely to be USB today.
Tape drives are likely to be SCSI.
There are several variations on SCSI and costly
mistakes can be avoided by checking specs carefully to make sure that
the required combination of interface card, cable, and terminator resistor
for your SCSI device is on hand. Otherwise you're likely to have to button the
chassis back up and try on another day.
SCSI can address lots of devices relative to other interfaces.
Where an IDE disk controller can address only two disk drives, SCSI can
address 7 or 15 per controller, and as many controllers can be used as
slots are available. A 'tower' holding 15 CDROMs is likely to be
a SCSI device, as is a RAID.
- 'MultiMedia' interfaces make a PC a good
candidate for running a home entertainment center, a recording studio,
games, or other graphic & sound intensive tasks.
Although there are 'built for purpose' devices for any of these
applications, specialized interfaces and software allow a PC to be used
instead.
- Secondary Storage, Data on Disk
& Tape
The text clearly, if not very colorfully, describes Secondary Storage
as a 'fast, accurate, inexpensive, high-capacity, nonvolatile extension
of main memory.' A computer doesn't operate _directly_ on data in
secondary storage, but it can quickly move data from disk to memory and
CPU registers, make changes there, then copy what's in memory back to
the disk.
Modern operating systems, both desktop and
multi-user, include functions
to 'swap' contents of RAM in & out of disk storage so that
there is very large 'virtual memory' available for applications.
The rub is, of course, that access to data in RAM is thousands of times
faster than access to data on a disk or other secondary storage device,
and the swapping makes the system run
visibly slower. A Windows user with minimum RAM soon learns to
keep minimum applications open at a time to avoid the swapping
activity.
Important terms for organizing data on disks are: read/write heads,
track, sector, cylinder, seek time, and rotational delay.
Seek time is the amount of time it takes the read/write arm to move to the
track that's wanted. The move from the outermost to the innermost track is the
'worst case' time since in many cases the head only needs to move one
track at a time and the circuitry can pull this off very quickly. Rotational
delay is the time spent waiting for the wanted sector to rotate under the head.
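Rotational delay, the wait for the wanted sector to come around under the head, is easy to estimate from the spindle speed. Here's a small sketch; the function name is made up for illustration, and it assumes that on average the sector is half a revolution away.

```python
def average_rotational_delay_ms(rpm):
    """Average wait for a sector, assuming it is half a revolution away."""
    ms_per_revolution = 60_000 / rpm    # 60,000 milliseconds in a minute
    return ms_per_revolution / 2


for rpm in (5_400, 7_200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: about {average_rotational_delay_ms(rpm):.2f} ms")
```

A few milliseconds doesn't sound like much, but it's an eternity compared to RAM, and it's paid again every time the head has to wait on a new sector.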
Since most hard disks have multiple 'platters' on their spindle making
a 'disk pack',
'surface' is another important term. All the read/write heads move over or
sit
on the same track at the same time, describing a 'cylinder'.
Hard Disks organize data in cylinders to minimize movement of the
read/write heads by a factor determined by how many surfaces are in the cylinder. This example assumes 4 platters in a pack,
providing a
read/write head (0-7) for each disk surface, each surface has sectors (0 - 7), and 'contiguous
space' large enough to hold the file written to disk:
- It locates free space starting at sector 2 on
track 101 on
head zero and keeps track of the file's location in a 'directory'
for the disk.
- It will fill sectors 2 thru 7, the rest of track 101 on
surface zero.
- Then, it doesn't move the arm to track 102,
instead it continues on
to fill the other surfaces in the current cylinder: track 101 on head
one, two, three, four, five, six, & seven
are filled.
- The head moves to cylinder 102 (a very small
movement, very little seek time) and fills all the tracks in cylinder
102 from surface 0 thru 7 before moving to cylinder 103.
- It continues on like this until the entire file is
written
to the disk.
- When the file needs to be read in the future,
the starting point (Cylinder, Head, & Sector) is looked up
in the directory, the head moves to the cylinder, waits for the
starting sector to rotate under the head servicing that surface, and then the
data spill back to RAM very
quickly, reading a complete cylinder before scooching the head over just a tiny
bit to
the next cylinder.
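The order in which sectors get filled can be sketched with a little generator. This is just an illustration of cylinder-by-cylinder ordering, not how any particular drive or file system does it. The name cylinder_order is made up, and it assumes the geometry of the example: 8 surfaces (heads 0 thru 7) and 8 sectors (0 thru 7) on each track.

```python
from itertools import islice


def cylinder_order(cylinder, head, sector, heads=8, sectors=8):
    """Yield (cylinder, head, sector) addresses in the order a file fills them."""
    while True:
        yield (cylinder, head, sector)
        sector += 1
        if sector == sectors:       # track is full: switch to the next surface
            sector = 0
            head += 1
            if head == heads:       # cylinder is full: now the arm has to move
                head = 0
                cylinder += 1


# Start at cylinder 101, head 0, sector 2 and look at the first 63 sectors used.
addresses = list(islice(cylinder_order(101, 0, 2), 63))
print(addresses[5])     # (101, 0, 7)  last sector on surface zero
print(addresses[6])     # (101, 1, 0)  same cylinder, next head, no arm movement
print(addresses[62])    # (102, 0, 0)  first time the arm moves
```

Sixty-two sectors get written before the arm moves at all, where filling one surface at a time would have needed a seek after every track.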
The minimum unit of data transferred from secondary storage is
referred to as a 'block'. On a disk blocksize is likely to
correspond to a sector's size or be a multiple of sector
size. On a tape the blocksize is determined by the tape
drive's buffer size. Common blocksizes on tapes run like 512,
5120, 16384.
On a hard disk the sector might
be 512 or 2048 bytes, with more modern PC drives using sectors of 4,096
bytes. Since IDE drives arrived, 'clustering' of several sectors to make 'logical blocks'
and 'soft-sectoring' have become the standard
and the drive electronics are able to make 'block sizes' multiples of the actual size
of data in the sectors.
If a record is smaller than the block size, the leftover space in the block is
'wasted' or 'slack space' (since directory entries only reference starting sector) and
isn't available for another file. Computers storing
a lot of small records are better off with small blocks. A
desktop PC that's always storing large documents or other datasets is
better off with large blocks.
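The trade-off between small and large blocks shows up in a quick calculation of slack space. This is a minimal sketch with a made-up function name, assuming a file always occupies a whole number of blocks.

```python
def slack_bytes(file_size, block_size):
    """Bytes left unused in the last block allocated to a file."""
    blocks_needed = -(-file_size // block_size)    # round up to whole blocks
    return blocks_needed * block_size - file_size


for file_size in (100, 10_000):
    for block_size in (512, 4_096):
        wasted = slack_bytes(file_size, block_size)
        print(f"{file_size:>6} byte file, {block_size:>4} byte blocks: {wasted} bytes of slack")
```

A 100 byte record wastes almost a whole 4,096 byte block, so a million of them would tie up close to 4 gigabytes of slack.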
Modern disks can be run using either CHS-Cylinder Head & Sector addressing or
LBA-Logical Block Addressing.
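The two addressing schemes describe the same sectors, and converting between them is simple arithmetic. The sketch below assumes the usual PC convention where cylinders and heads count from 0 but sectors count from 1. The function names and the 16-head, 63-sector geometry are only examples.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert a Cylinder/Head/Sector address to a logical block number."""
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Convert a logical block number back to Cylinder/Head/Sector."""
    cylinder, remainder = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1


print(chs_to_lba(0, 0, 1, 16, 63))    # 0     the very first sector
print(chs_to_lba(1, 0, 1, 16, 63))    # 1008  first sector of the next cylinder
print(lba_to_chs(1008, 16, 63))       # (1, 0, 1)
```

Notice that the block numbers run thru a whole cylinder before moving on, the same order used for filling cylinders when a file is written.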
Mainframe and large storage networks may
access their disks so quickly, with dedicated channels and i/o
processors, that they block a whole cylinder at a
time.
One of today's ordinary PC hard disks is the IDE (Integrated Drive
Electronics) Drive, or the more modern SATA-Serial ATA drives.
These Drives have an 'on board controller', the IDE, that
allows
a PC's BIOS to automatically adjust itself to suit the drive. They also handle
most of the computations for reading & writing data and managing
bad sectors on the drive. This frees up the CPU from these
mundane tasks, and frees up the machine's administrator, or owner, from manually
configuring BIOS and manually configuring drives as sectors and tracks inevitably go bad. At this point
in time IDE and SATA drives with 10,000+ RPM and 166 mbps transfer rates are
quick, and newer SATA technology is providing much quicker transfer.
Most drives today, SCSI, IDE, or SATA, have SMART-Self-Monitoring, Analysis and Reporting Technology,
which maintains a little
database with drive performance and the results of little 'self tests' the drive conducts.
Linux, Windows, and other OSs provide access to SMART reports which can give a system administrator
early warning of incipient disk failures. Hard drives in servers that stay on 100% and are
never moved seldom fail without kicking out warning
messages, so it pays to watch the SMART logs, or to use a tool like Nagios
or IBM's or Windows' Hypervisor to watch them for you...
The 'standard' (at Winter '05) HDD was fast becoming 'Serial ATA' (Advanced Technology
Attachment). SATA entered the market able to burst at 150 MB/s (megabytes per second) on the CPU side, and
able to handle a 1.5 Gb/s (gigabits per second) stream at the storage unit's interface. This is becoming more and
more important as disk storage is becoming more 'pooled' and schemes for RAID and Grid computing
depend on autonomous storage devices. We can expect 600 MB/s by 2007 since the technology appears
to be doubling speeds and capacity in three-year cycles.
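The two SATA figures, 150 and 1.5, are the same speed seen from two sides. SATA sends every 8 bits of data as 10 bits on the wire (8b/10b encoding), so a 1.5 gigabit per second line rate carries 150 megabytes of data per second. A quick sketch of the arithmetic, with a made-up function name:

```python
def sata_payload_mbytes_per_second(line_rate_gbits):
    """Data rate in megabytes per second for a SATA line rate in gigabits per second."""
    line_bits = line_rate_gbits * 1e9      # raw bits per second on the cable
    data_bits = line_bits * 8 / 10         # 8 data bits ride in every 10 line bits
    return data_bits / 8 / 1e6             # 8 bits per byte, a million bytes per megabyte


for gbits in (1.5, 3.0, 6.0):
    print(f"{gbits} gigabits per second carries {sata_payload_mbytes_per_second(gbits):.0f} megabytes per second")
```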
SATA drives offer another benefit, since they have a very thin cable terminating
in a keyed 8mm plug, where IDE and SCSI require a wide ribbon cable with connectors that are
nearly as bulky as a new, small drive.
This makes SATA desirable for
making small computers smaller, and for managing cables in a workstation or server machine with lots of drives.
A SATA cable can be up to 1 meter
in length, which helps in mid-range, rack-mounted or blade server applications where disk drives may be
located in a chassis near the CPU using eSATA (External SATA) technology, similar to SCSI.
SATA technology inherited the SMART and other functionality of the IDE drives of the prior
generation -- the transition has been very easy.
'SSD', Solid State Disk, was becoming very affordable in 2010 and the trend continues into 2012.
Since there is no 'rotational delay'
and minimum 'seek time' they can deliver data faster than a traditional magnetic disk drive.
They save data slower than they serve it up, and some have a limit as to how many times the data
can be updated -- from tens of thousands to 100 thousand. So, they're best suited for storing
'archival data' that's not going to be changed. Several manufacturers are making chassis to accommodate
huge numbers of SSD devices, and they are becoming feasible for use in enterprise backup systems.
A few years back, at Y2K, the more expensive SCSI drives were the choice for
high speed performance, but now the more affordable IDE and SATA drives are as fast or faster than
the fastest SCSI drives. SCSI drives are more likely to be used where
the OS requires them (some *ix for Motorola platforms don't like IDE drives), or where more
than two drives are required on a controller, perhaps to make a RAID. One SCSI controller
can handle 7 or 15 devices.
- Backup
Here's another opportunity to harp on this important topic:
Data are not 'backed up' unless the backup media are stored off-site asap!
It is plainly stupid to keep all backup copies near the backed-up system!
Business backup/recovery requires multiple 'backup sets' so
files or tables can be recovered to a prior point in time!
Half of businesses that experience a 'data disaster' without proper backup fail!
It might be inconvenient or a bummer for a person to lose their hard
disk at home due to hardware failure or some cracker or virus getting
to it.
It would be disastrous for an enterprise to lose their data. Insurance for
'lost records' is difficult to collect, and the expense of a computer
disaster may be fatal for a company.
Knowing this, there are several schemes for backing
up corporate data (tape/SSD, transaction logging, hot-sites, clouds, grids)
as discussed earlier that help to make data practically 100% safe.
There are many inexpensive options for
backing up your personal data, and a good rule for any system is to
keep multiple copies of the data on multiple media stored
off-site. I often suggest that students keep an email account with a large
inbox so they can email themselves a copy of their project's files to
protect against failure of a zip disk or a thumbdrive.
Even better, pay $50+ a year and get an account at carbonite.com or mozy.com.
Backing Up Business & Enterprise Data
In a class like this we need to be considering caring
for other people's data, whether for a small business or huge enterprise.
A casual user of a PC can get along fine with a backup scheme that makes
it easy to recover _the most recent copy_ of a lost file or a whole PC or notebook.
Services like Carbonite or Dell cost about $50 a year and work automatically
to keep a backup copy in a cloud.
This single, most recent, backup copy isn't sufficient for business
or enterprise-class computing where it's important to be able to recover
data from some time ago. 'Problems' with business data are often not discovered
for some days after they are caused, perhaps when reconciling accounts or reviewing
monthly or other periodic reports. So, it's desirable to keep at least
several months of backup copies available, a year or more is even better.
I like this response I got back on a quiz
question about backup a few years back: Systems Administrators cover
their butts by choosing appropriate backup methods and making sure that
all enterprise data are backed up on (at least) a daily basis.
They get the backup media (probably tape) off-site, keep it secure, and
regularly verify that the backup of their system can be reliably read
on another system, sometimes practicing a disaster recovery using it. They
document the procedures and log this activity so that another Systems
Administrator can easily restore the system in the wake of some
disaster, even if they meet their demise in it.
A reference like Hewlett-Packard's 'Storage'
section of their website suggests quite a range of storage devices including
disk, tape, and network devices. Similar can be found at IBM's website, or by googling.
Tandberg has been building 'real' tape units for decades
and also manufactures tape drives and components for other 'original equipment manufacturers'.
Their home page shows an interesting twist in modern backup systems, where they make large
disk storage devices that 'emulate' the benefits of traditional tape storage systems, allowing the storage
of lots and lots of sets of data. Where an individual might be happy keeping a backup of only the
most recent version of a file, diligence in business or enterprise backup systems often requires
keeping backup copies for the prior 90 days or longer so that any file, or table, can be recovered
when needed.
'Tape Storage' devices range from the six-tape $2700 jukebox demo'd in class to really _huge_
jukeboxes like this Tape Storage Robot that
can store multi-petabytes of data 'near line', where the robot can find a tape and mount it on a
drive while the user waits a short time.
For a look closer to home, check out the Services page of
VCU's Computer Center.
Wikipedia's article on
Magnetic Tape Storage
gives good coverage of this important, somewhat retro, technology.
Here in 2012 there is no more
economical way to archive lots of data than tape,
but other factors like quick, direct access to archived data instead of slower,
sequential access also enter into the equation.
Emerging technologies are supplementing and/or replacing tape in some enterprises.
SSD-Solid State Disk devices are becoming more affordable and a large chassis can hold lots of them.
They will offer much faster access to archived data since each SSD is always 'on-line' and
doesn't have to be found and mounted to be read. Here's a Ziff-Davis article on
growing trends in SSD.
- RAID
(Redundant Array of Inexpensive Disks) devices help make data on disk
'highly available', promising that one disk in the array can fail and
the array will still provide accurate data until the array can be taken
down and the failed disk replaced (some are hot-swappable so they don't need to be powered
off).
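One way an array can keep serving accurate data with a disk missing is parity: an extra block that is the XOR of the data blocks, so any one lost block can be rebuilt from the survivors. This is a minimal sketch of that idea as it's used in parity-based RAID levels such as RAID 5, not a description of any particular controller. The function name and the sample data are made up.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Three data 'disks' and one parity 'disk' holding the XOR of the other three.
disks = [b"ACCT", b"0042", b"PAID"]
parity = xor_blocks(disks)

# Disk 1 dies. XOR of the survivors and the parity gives back its contents.
rebuilt = xor_blocks([disks[0], disks[2], parity])
print(rebuilt)    # b'0042'
```

The same trick can't rebuild two lost blocks at once, and it does nothing for a file that was deleted or corrupted on purpose, since the parity faithfully follows whatever is written.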
RAIDs are so reliable that it's tempting to think that a RAID is all
that's needed to
keep data permanently... But, RAID doesn't provide the most
important part of keeping data
safe & secure, and that is 'get backup copies off-site asap'. Then, the theft of the
RAID or a fire that destroys everything in the computer room won't
destroy the only existing copies
of the enterprise's data. These events are unlikely, but stuff happens,
and the
possibility of a disaster must be considered so that there is minimum effect
when it hits.
The more likely cause of a 'data disaster' is some human factor. An
incompetent or
disgruntled employee with system access can wreak a data disaster. Or it may be 'socially
engineered' by some cracker intent on vandalizing
the system.
Lightning, rain, snow, floods, errant fork-lifts, fires, earthquakes and other 'natural disasters'
need not cause a 'data disaster'. If disaster avoidance and planning has been commensurate with the
value of the enterprise's data 'the system' will be back up without interrupting business.
- Tape is the classic backup medium for computers. Depending on
the class of computer, there will be a tape that is more-or-less
standard. For example, Unix machines almost all use 4mm
DAT like these from HP, which can write up to 72 GBytes of data on
a $9 DAT tape at a rate of from about 8 to 20 GBytes an
hour. A PC might use an inexpensive tape drive connected to
the floppy controller that can backup a couple of Gigabytes in six or
eight hours.
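Whether a backup fits in the overnight window is simple division, sketched here with the DAT figures above. The function name is made up, and the calculation assumes the drive holds its rated speed for the whole tape.

```python
def backup_hours(gbytes_to_write, gbytes_per_hour):
    """Hours needed to write a backup at a steady rate."""
    return gbytes_to_write / gbytes_per_hour


print(f"72 GBytes at  8 GBytes an hour: {backup_hours(72, 8):.1f} hours")
print(f"72 GBytes at 20 GBytes an hour: {backup_hours(72, 20):.1f} hours")
```

At the slow end a full tape takes most of a night shift, which is why the rate matters as much as the capacity when choosing a drive.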
HP, Seagate, Tandberg, and a couple of
other manufacturers provide popular tape drives that will work with
most computers. IBM leads the
world's technology for getting data from disk to tape and a properly
selected set of their proprietary tape drives and operating system can
get terabytes of data on tape in an overnight shift.
- DVD & CD always pop up when discussing backup, and they are a good
option for a personal desktop computer. But, they are
still slow and expensive compared to tape for backing up an enterprise's
data. They do provide inexpensive and relatively quick
access to relatively 'permanent' data. For example, many
municipalities have copied their old record books to CD and keep adding
new records to them, allowing them to store original records and save
the wear & tear of the public's fingers on them -- and in the event
the courthouse burns down the public records remain safe.
- Backing up data to another machine via a network or The Internet is an
option where it is feasible. If there is enough bandwidth available
it's easy to make a 'full backup' producing a 'zipped tarball' of the enterprise's database and
use a copy utility to place a copy on a server in another section of town or another state.
'Transaction
logging' at the DBMS or OS level (rsync) is often used over a fast WAN link to make an immediate backup of each user's
activity at a remote 'hot site' or 'warm site' so that users' keystrokes made since the last 'full backup' aren't lost.
The diligence of a systems administrator is to oversee a combination of full backups and
transaction logs on tapes and/or internetworked machines, regularly verify them, and be
able to recover the system using them in the wake of a data disaster.
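Recovery from a full backup plus a transaction log amounts to restoring the full backup and then replaying the logged changes in order. Here's a toy sketch of that idea. The record format ('put' or 'delete', a key, a value) and the function name are invented for illustration and don't match any real DBMS's log.

```python
def recover(full_backup, transaction_log):
    """Rebuild current data from the last full backup plus the logged changes."""
    data = dict(full_backup)                  # restore the full backup first
    for action, key, value in transaction_log:
        if action == "put":                   # replay each change in the order it happened
            data[key] = value
        elif action == "delete":
            data.pop(key, None)
    return data


last_nights_backup = {"acct1": 100, "acct2": 50}
todays_log = [
    ("put", "acct1", 75),
    ("put", "acct3", 20),
    ("delete", "acct2", None),
]
print(recover(last_nights_backup, todays_log))    # {'acct1': 75, 'acct3': 20}
```

Replaying only part of the log is what makes it possible to recover to a point in time just before a problem was caused.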
- The only claim that can honestly be made for any disk device is that it
may fail at anytime. A 'mean time between failure' of five, or fifteen, years
quoted by a manufacturer means the _average_ is five years or fifteen. Some
may fail within days, others after ten years, and the average isn't a
good thing to bet on without backup.
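To see why the average isn't a good thing to bet on, here is a rough sketch in Python. It assumes drives fail at a constant rate (an exponential lifetime model), which is a simplification chosen for illustration and not a claim about any manufacturer's drives; the function name is made up.

```python
import math


def fraction_failed(years_in_service, mtbf_years):
    """Expected fraction of drives failed by a given age, assuming a constant failure rate."""
    return 1.0 - math.exp(-years_in_service / mtbf_years)


for age in (1, 5, 10):
    share = fraction_failed(age, 5)
    print(f"after {age:2d} years: {share:.0%} of drives expected to have failed")
```

Under this assumption nearly one drive in five is gone in the first year of a five-year MTBF, and well over half have failed by the time the 'average' arrives.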
Even a RAID may fail. A couple years back, VCUNet had a RAID
controller
fail and it took a full day to recover our data (email) from
tape. This is a pro-run shop, though, and they had a tape backup
from before the failure and were able to capture and keep all the
email that arrived after it --
otherwise all our email on the server would be gone as would much of
what arrived while it was down.
The $250,000 Sun we've got in the 4th floor computer room has a pair of
dual-processor computers and a pair of RAIDs
that may be rigged to provide a 'redundant redundant array', but no
SysAdmin would consider leaving out the tape backup. The system
is resilient to the failure of a processor or a disk in a RAID, but it
is not going to stand up to a fire in the computer room.
Tape is still key in the backup of large systems as of Spring '09.
A discussion of RAID and 'storage' in general needs to mention SAN-Storage Area Networks and
NAS-Network Attached Storage technologies that are becoming more prevalent.
In short, these are devices that handle data and serve it up over a LAN
so that it can be accessed by more than
one computer, as in a server-farm or a blade chassis. Blazing-fast networks these days
make it quick to get data from a NAS or SAN instead of housing the disks in the same
chassis as the computer. Google 'SAN vs NAS'
for a discussion of the similarities and differences between these important technologies.
- 'Fault Tolerant' hardware, 'hot sites' & 'warm sites', 'grid computing', and 'clouds' are techniques
to assure continuous availability of computing services in the wake of some 'disaster' at
one or more computer centers. An ordinary PC has no features for 'high availability' or fault tolerance:
it has only _one_ of each critical component: one power supply, one CPU, one CPU cooling fan, one RAM...
The failure of any one of these components stops everything. If a business depends on its databases and systems to satisfy its customers,
as many do, the typical low-end server is not a wise choice of hardware.
Mid and High-end, fault tolerant, or 'highly available' computers have two or more of each of the critical components.
Their OS has been rigged so that 'the system' continues to operate when a component fails.
For example: Two or more power supplies supply power to the chassis. When one begins to fail it raises an event
the OS has been rigged to 'notice'. Email is generated for the systems administrator, or another 'alarm' is
raised. The hardware's support company (probably a manufacturer like IBM, Stratus, or Tandem, ...) also gets notice of the malfunction and
sends a replacement power supply. When the power supply arrives in the next day's UPS delivery,
the systems administrator pulls the old one from its slot in the chassis (probably identified by a flashy red light),
plugs in the new one, and returns the failed device to the manufacturer.
In a 'fault tolerant' machine, you can substitute 'CPU', 'Memory Unit', 'Disk Controller', 'Disk Drive', 'Cooling Fan', or other critical component for the
'Power Supply' in the above example. If we can keep power to the chassis, using an uninterruptible power supply (UPS) & generator, these machines are likely to keep running.
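The payoff from doubling up on critical components can be estimated with a little arithmetic, sketched here in Python. The 99% figure is invented for illustration, the function names are hypothetical, and the sketch assumes components fail independently of one another, which a fire or power outage would violate.

```python
def series(*availabilities):
    """Availability of a machine that needs every listed component working."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def redundant(availability, copies):
    """Availability of a group of identical components where any one copy is enough."""
    return 1.0 - (1.0 - availability) ** copies


# Made-up figure: each component is up 99% of the time.
single = series(0.99, 0.99, 0.99)                          # one power supply, one CPU, one fan
doubled = series(*(redundant(0.99, 2) for _ in range(3)))  # two of each
print(f"one of each: {single:.4f}   two of each: {doubled:.4f}")
```

With one of each, the weaknesses multiply; with two of each, a component group is down only when both copies are down at once.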
But, having a fault tolerant machine does not provide complete system availability in the wake of some local disaster like
fire in the switch room; air-conditioner failing in the computer room; roof collapsing in a heavy snow/ice storm;
fire burning down the building and everything in it; &c.
To get to the highest degree of availability in spite of a local or regional disaster:
put two or more fault tolerant machines in widely separated locations; connect them with high-speed digital networks;
run OS services that keep them continually 'in synch' with one another; provide redundant connections to telephone and/or
data networks so that customers can always call. This way, even if the Home Office somehow gets obliterated
the customers and employees everywhere else will still have access to 'the system'.
The cost of continuously available computing resources is high,
but it must be weighed against the costs of not doing business.
A global operation like eBay or an airline, for example, whose entire enterprise is 'information based' cannot
tolerate the system being down. 'Grid Computing' is a technique involving more than two physical sites in a scheme
to provide excellent response time from a system when everything is working and continuous service when components fail.
Compromise is available for enterprises that can stand some 'down time' without losing any business.
'Hot Sites' can be contracted to keep a 'backup computer' running 'in synch' with the main computer.
In the event of some local disaster or failure of the main computer, users can relatively quickly be connected to
the backup system and continue business.
'Warm Sites' may not run continually in synch, but promise quick availability of hardware with compatible OS & _tape drive_
that can be used to restore business to the state of the most recent backup tape, then manual procedures and paper audit trails are
used to ensure no loss of transactions. A small bank, for example, might not be able to bear the expense of a hot site agreement
but is able to satisfy bank regulations using good tape backup procedures and quick availability of 'commodity' computing equipment.
- Interfacing the components
Different signals are used to communicate with and represent data on
different devices in a computer system. The same scheme is usually used within the CPU & RAM.
But every other 'peripheral device' varies in the way
it represents data: keyboards, printers, mice, modems, speakers, &c
all use different cabling and data encoding.
This isn't a problem for us because peripherals don't attach directly
to the
CPU. They attach via an 'interface board' that translates data from one
medium and encoding scheme to one suitable to put on the bus to the
CPU.
An AGP (Accelerated Graphics Port) card is a good example of an
interface, where data are represented as RGB & other signals on
their way to the monitor thru a 15-pin connector and cable outside the
chassis. On the other end, the AGP (or other) card
plugs into the bus and data are represented as bits on the bus ride
from the CPU.
For every slot on a bus (AGP, PCI, ISA (outmoded), Microchannel, Motorola,
&c)
there is, or was, an adaptor card that interfaces a monitor, or other device, to the computer.
If you want to plug your computer into the telephone network (PSTN, POTS) you can
get a modem card and interface it directly to the computer's bus.
Or you can get an external modem that interfaces with the computer's
serial port, which interfaces with the bus. External modems have lights on
them that flash to indicate data transmission and EDC (Error Detection
& Correction by retransmission) activity.
An ISP or large enterprise that needs to support a large number of dialup calls will place a 'card' in
a communications chassis that can handle dozens or hundreds of telephone calls to an 800- or other telephone number.
This allows a midrange or larger computer, or a 'server farm' to interface via a highspeed LAN with the callers, or their computers,
instead of sitting a lot of external modems on shelves, or using a lot of PCs with modem cards.
Lucent (telecommunications equipment maker, spun off from Ma Bell) is a safe choice for such equipment.
The USB provides a relatively new standard for interfacing peripherals of all
types with PCs and Macs today. So, we have to buy fewer interface cards to handle cameras, printers,
modems, network interfaces, and other devices that are built for USB.
Many desktop mainboards have the adaptors 'built into' them. If
you look at the rear of the computer and see that the monitor's
connection is directly on the mainboard, the video processor is built
into the mainboard. Today it's ordinary to have AGP or VGA,
Ethernet, disk, modem, PC slot, WiFi, and/or other interfaces built onto the mainboard
where each required a separate interface card in the past. This makes
the purchase price lower, but carries some risk: that we'll be stuck with
an inferior controller when we want to upgrade some peripheral; or, some surge on the phone line that would fry only
an external modem will fry
our whole computer if we have an internal modem or one built into the mainboard;
or, that we'll have to replace the whole mainboard if a component
fails.
- Buffers
Devices attached to computers & networks handle data at different
rates and 'buffers' help the hardware
accommodate the varying rates of
speed.
For example, a
computer might be capable of delivering data to the printer interface at 38,400
or more characters per second. If a 90 character per second Okidata is hammering thru a
five part form
at 2 pages per minute, most of the time the printer is telling the computer 'I am
busy, hold onto your data...' and the computer is kept busy watching
for the signal to transmit more data to the printer.
We say the printer provides 'flow control' or 'pacing' data for the
computer so that the
very fast computer doesn't overflow the very slow printer's
buffer.
A printer may
have a small amount of memory that it uses to buffer its interface to
the cable, perhaps 512 characters. If the computer is
delivering data at 19.2 kbps (an ordinary setting for a serial port)
and the printer is printing at 30 cps (300 bps) the
printer's buffer will get filled quickly, and the printer will transmit
the 'stop sending data' signal when the buffer reaches some arbitrary level, like
80%. Then, as the printer prints and the buffer gets close to
'empty', the printer sends a 'send more data' signal and the computer complies with
another 'bufferload' of data.
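The fill-and-drain cycle of that 512 character buffer can be simulated roughly in Python. The sketch assumes 10 bits per character on the wire (so 19.2 kbps is 1920 cps, 64 times the printer's 30 cps), an 80% 'stop' mark of 409 characters, and a 10% 'resume' mark of 51 characters; the resume level and all the names are assumptions made for illustration.

```python
def simulate(ticks, capacity=512, high_mark=409, low_mark=51, drain_every=64):
    """Simulate a printer buffer; one tick is the time the computer needs to send one character."""
    level = 0          # characters currently waiting in the printer's buffer
    peak = 0
    sending = True     # whether the computer is currently allowed to transmit
    events = []        # (tick, signal) pairs raised by the printer
    for tick in range(ticks):
        if sending:
            level += 1                       # computer delivers one character
        if (tick + 1) % drain_every == 0 and level > 0:
            level -= 1                       # printer finishes printing one character
        peak = max(peak, level)
        if sending and level >= high_mark:
            sending = False
            events.append((tick, "stop sending data"))
        elif not sending and level <= low_mark:
            sending = True
            events.append((tick, "send more data"))
    return events, peak


events, peak = simulate(30_000)
for tick, signal in events:
    print(f"tick {tick:6d}: printer says '{signal}'")
print("peak buffer level:", peak, "of 512")
```

The buffer fills in a few hundred ticks and then takes tens of thousands of ticks to drain, so the computer spends nearly all of its time waiting on the printer, and the buffer never overflows.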
'Software' flow control signals are embedded in the data stream on the conductor used to transmit data.
For example: In RS-232 networks an ASCII character 19, XOFF, sent to the host computer means 'pause transmission'
and character 17, XON, means 'continue transmission'.
RS-232 also supports 'hardware flow control',
where a separate conductor in the cable carries the signal.
For example, the printer putting 0 thru +12 volts on pin 11 might mean
'pause' and -12 on pin 11 means 'continue'.
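Software flow control is simple enough to sketch in Python. The `Host` class below is hypothetical and stands in for the sending computer; it only shows how XOFF and XON arriving from the printer gate what the host transmits, not how a real serial driver is written.

```python
XON = 17    # ASCII DC1: 'continue transmission'
XOFF = 19   # ASCII DC3: 'pause transmission'


class Host:
    """Hypothetical sending side of a link that honors software flow control."""

    def __init__(self, data):
        self.pending = list(data)   # bytes still waiting to be transmitted
        self.paused = False

    def receive(self, byte):
        """Handle a byte arriving from the printer; only XON/XOFF matter here."""
        if byte == XOFF:
            self.paused = True
        elif byte == XON:
            self.paused = False

    def transmit(self):
        """Return the next byte to send, or None while paused or out of data."""
        if self.paused or not self.pending:
            return None
        return self.pending.pop(0)


host = Host(b"AB")
first = host.transmit()      # sends 'A' (byte value 65)
host.receive(XOFF)           # printer's buffer is nearly full
held = host.transmit()       # nothing goes out while paused
host.receive(XON)            # printer has caught up
second = host.transmit()     # sends 'B' (byte value 66)
print(first, held, second)
```

Because the control characters travel in the data stream itself, this scheme needs no extra wire, but the data being sent can't contain characters 17 or 19 unless they are escaped somehow.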
Most devices involved in transfer of data use some sort of
buffering mechanism or scheme, whether they're at a computer's interface or some
network device: The ports on switches in LANs, and routers on internets, have buffers that
help smooth the flow of data among nodes on the
network. Keyboards and mouses (mice?) are 'buffered character devices'.
Disk drives and DMA devices buffer blocks of data moved among disk and RAM --
large systems may buffer many cylinders' data at a time to avoid repeatedly seeking
a cylinder since the buffer's memory is much quicker than the disk's seek time.
TCP/IP provides for buffering data
and flow control of internet traffic. The receiving node 'watches' its buffers'
performance & as part of its acknowledgements it transmits data
about how much data the remote system should transmit next.
This 'sliding window' flow control allows web servers to adapt to
sluggish, less-than-reliable network situations.
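The idea of a receiver advertising how much it can take next can be sketched in Python. This is a much-simplified model, not real TCP: it ignores packets in flight, retransmission, and congestion control, and the function name, buffer size, and read rate are assumptions made for illustration.

```python
def transfer(total_bytes, buffer_size, app_reads_per_round):
    """Return a list of (bytes_sent, window_advertised) pairs, one per round trip."""
    if buffer_size <= 0 or app_reads_per_round <= 0:
        raise ValueError("receiver needs a buffer and an application that drains it")
    sent = 0
    buffered = 0              # bytes sitting in the receiver's buffer
    window = buffer_size      # receiver starts by advertising its whole buffer
    rounds = []
    while sent < total_bytes:
        chunk = min(window, total_bytes - sent)         # sender never exceeds the window
        sent += chunk
        buffered += chunk
        buffered -= min(buffered, app_reads_per_round)  # application consumes some data
        window = buffer_size - buffered                 # advertised with the acknowledgement
        rounds.append((chunk, window))
    return rounds


rounds = transfer(10_000, 4096, 1024)
for sent_now, next_window in rounds:
    print(f"sent {sent_now:5d} bytes, receiver advertises {next_window:5d}")
```

After the first burst fills the buffer, the sender settles down to the pace at which the receiving application actually reads, which is the point of the scheme.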
Even the keyboard on a PC has a buffer which can overflow if some
process has the CPU 'locked up' so it ignores keyboard interrupts which
ordinarily get a CPU's attention. Users of older, slower
machines with less-capable OSs have heard the beep delivered up by the
computer when the keyboard buffer overflows, maybe at about a dozen or
20 characters.
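A keyboard's type-ahead buffer can be modeled in Python as a small fixed-size queue. The capacity of 16 characters is an assumption chosen to sit in the 'dozen or 20' range above, and the class and method names are hypothetical.

```python
from collections import deque


class KeyboardBuffer:
    """Fixed-size type-ahead buffer; the capacity of 16 is an assumption for illustration."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.keys = deque()
        self.beeps = 0

    def key_pressed(self, ch):
        """Called on each keystroke; returns False (and 'beeps') if the buffer is full."""
        if len(self.keys) >= self.capacity:
            self.beeps += 1          # keystroke is dropped
            return False
        self.keys.append(ch)
        return True

    def read_key(self):
        """Called when the CPU finally services the keyboard; None if nothing is waiting."""
        return self.keys.popleft() if self.keys else None


kb = KeyboardBuffer()
for ch in "the quick brown fox":     # 19 keystrokes while the CPU is 'locked up'
    kb.key_pressed(ch)
print("buffered:", "".join(kb.keys), "| beeps:", kb.beeps)
```

The first 16 keystrokes wait in the buffer for the CPU; the last three are lost and each one earns a beep.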
- Channels & Dedicated Processors
Larger computers, from mid-range thru mainframes, add other layers of circuitry to help meet the special demands
of hundreds or thousands of users of an application server or host computer.
Their circuits include 'Channels' which interface dedicated 'i/o control units'
with the computer's 'main bus'.
A desktop PC has to dedicate only a very small portion of its
CPU power to servicing _one_ user's keyboard interrupts.
We generally get a crisp response at each keystroke and
our PC
still has plenty left over for browsing the web, recalculating a
spreadsheet, or showing that DVD.
But, a mid-range computer supporting a couple hundred users would have
to spend so much CPU bandwidth handling keyboard interrupts
generated by all these users that it would have little time left for processing other data.
So, a mid-range computer supporting 1024 on-line terminal sessions might have 32 'terminal controllers' that
handle 32 terminals each. To facilitate placement of lots of CPUs & all these dedicated processors, a mid-range or larger computer is
often housed in more than one 'bay' or 'chassis', and the busses are a combination of the 'backplanes' in each chassis and the cables that connect
them together.
A PC/Workstation/Server class machine has only one main PCI bus about a foot in length with one switchset to
coordinate the activity on the bus. In contrast, a large mid-range or mainframe computer has multiple
busses arranged both horizontally and vertically with switchsets at the intersections to manage traffic on the busses.
Click Classic Tandem Bus to see an example from an early Tandem computer.
Instead of a minicomputer's CPUs handling all the keyboard interrupts of hundreds or
thousands of users stroking several keys a second,
these larger computers offload most keyboard interrupt processing to a specialized microprocessor that
echoes each character as it is input and adds it to a buffer maintained for each keyboard.
These 'intelligent i/o processors' only interrupt the 'main CPU' when the user hits the
enter key, or uses an editing key that needs immediate response.
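The saving can be sketched in Python with a hypothetical `TerminalController` that buffers each terminal's keystrokes and bothers the main CPU only when the enter key is hit. Editing keys are left out to keep the sketch short, and the typed text is invented for the example.

```python
class TerminalController:
    """Hypothetical i/o processor that buffers keystrokes for many terminals."""

    def __init__(self):
        self.line_buffers = {}      # one buffer per terminal
        self.keystrokes = 0         # interrupts handled by the controller itself
        self.cpu_interrupts = 0     # interrupts passed on to the main CPU
        self.delivered = []         # (terminal, line) pairs handed to the main CPU

    def keystroke(self, terminal, ch):
        self.keystrokes += 1
        if ch == "\n":              # enter key: hand the whole line to the main CPU
            line = "".join(self.line_buffers.pop(terminal, []))
            self.cpu_interrupts += 1
            self.delivered.append((terminal, line))
        else:                       # ordinary key: echo and buffer locally
            self.line_buffers.setdefault(terminal, []).append(ch)


controller = TerminalController()
for terminal in range(32):                       # 32 terminals on one controller
    for ch in "SELECT * FROM orders\n":
        controller.keystroke(terminal, ch)
print(controller.keystrokes, "keystrokes,", controller.cpu_interrupts, "main-CPU interrupts")
```

The controller fields every keystroke, while the main CPU sees one interrupt per completed line.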
Similarly, a PC only has to service one or a few hard disks where
a minicomputer or mainframe may have dozens, hundreds, or thousands. So,
larger computers place multiple, dedicated i/o controllers on the system busses to
streamline disk access, providing several DMA channels where a PC only has one. Ditto for network interfaces,
storage, tape drives, and other devices required in enterprise-class computing.
So, this is a major difference between desktop systems and larger minis
or mainframes, and this is what accounts for the big price on these machines.
It used to be easier to observe when the channels'
controllers were in large, separate chassis connected by large cables to
the CPU chassis. In the old days, if we had access to the computer room, we
might lean on a disk controller and chat with the operator who is
leaning on the tape controller while waiting for data on our tape to
move to disk. A large computer room would have dozens of disk drives
and a half-dozen tape units and their controllers were obvious.
Now channels and controllers are generally contained in one
chassis where a 'backplane' with much wider slots than on a PC's
mainboard connects the components.
Several specialized i/o controllers can be located very near the CPU, eliminating the
long cable to a controller. Since data can move no more than
about a foot (11.8 inches, the distance light travels) in a nanosecond, getting components closer together is important as
more and more instructions are executed, and more data are expected to move, in that nanosecond.
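The speed of light in a vacuum puts an upper bound on how far any signal can get in a nanosecond, and the arithmetic is easy to check in Python. The 3 GHz clock rate is an assumption picked for illustration, the function names are made up, and signals in real copper traces and cables travel noticeably slower than this bound.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458   # in a vacuum; signals in copper are slower
METERS_PER_INCH = 0.0254


def inches_per_nanosecond():
    """Distance light covers in one nanosecond, in inches."""
    return SPEED_OF_LIGHT_M_PER_S * 1e-9 / METERS_PER_INCH


def inches_per_clock_cycle(clock_hz):
    """Upper bound on how far a signal can travel during one clock cycle."""
    return SPEED_OF_LIGHT_M_PER_S / clock_hz / METERS_PER_INCH


print(f"per nanosecond: {inches_per_nanosecond():.1f} inches")
print(f"per cycle at 3 GHz: {inches_per_clock_cycle(3e9):.1f} inches")
```

At a few gigahertz a signal can cross only a few inches per clock cycle, so a controller at the end of a long cable is many cycles away from the CPU.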
It's easy to say something like 'this wristwatch has more power than an IBM
mainframe did in the 60s'. But it would be very difficult to make that
wristwatch handle the i/o from a dozen disk drives, several tape
drives, card readers and printers...
|