Is Data Stored Only on the Hard Drive?

When it comes to data storage, many people assume that data exists only on their computer’s hard drive or on external storage devices like USB drives and external hard disks. However, data can exist in many other places besides local storage. In this article, we’ll explore the different places where data can be stored and how it gets there.

Data Storage on Local Devices

The most obvious place that data is stored is on the hard drive inside your computer or laptop. This is where the operating system, programs, and files you create are located by default. When you save a file, it is written to a physical location on the drive, from which it can later be read back to open and edit the file again. A traditional hard disk drive stores and rewrites data using magnetic recording, while a solid state drive holds it in flash memory cells.

External storage devices like USB flash drives, external hard disks, and CDs/DVDs also store data locally. When you copy files onto these devices, you are physically transferring the data and storing it on that device. The advantage of external storage is that it allows you to transfer data from one computer to another by simply plugging the device into different machines.

Data stored on local devices is easily accessible as long as the storage medium is intact. However, local storage carries a higher risk of data loss, since the physical devices can fail, be damaged, or be lost or stolen. Storing duplicate copies on multiple external devices can mitigate this risk.
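
As a rough illustration of keeping duplicate copies, the following Python sketch copies a file into several backup locations and checks each copy against a checksum of the original. The function names (sha256_of, backup_file) and the directory names are hypothetical, and temporary folders stand in for real external drives.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source: Path, backup_dirs: list) -> list:
    """Copy source into each backup directory and verify every copy."""
    expected = sha256_of(source)
    copies = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copies file contents and metadata
        if sha256_of(target) != expected:
            raise IOError(f"copy at {target} does not match the original")
        copies.append(target)
    return copies


# Demo: temporary directories stand in for two external drives.
with tempfile.TemporaryDirectory() as root:
    original = Path(root) / "report.txt"
    original.write_text("quarterly figures")
    copies = backup_file(original, [Path(root) / "usb_a", Path(root) / "usb_b"])
    verified_copies = len(copies)
    print(verified_copies, "verified copies")
```

Comparing checksums is one simple way to notice a copy that was corrupted in transit; it does not protect against the original already being damaged.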

Cloud Storage

Cloud storage services have become hugely popular in recent years as an alternative to local storage. Services like Dropbox, Google Drive, Apple iCloud and Microsoft OneDrive allow you to upload files from your local device to store them remotely on Internet servers owned by the cloud provider. This gives you the ability to access your data from any Internet-connected device by logging into your cloud account.

When you update a file stored in the cloud, the updated version also exists on the provider’s servers rather than just on your local machine. Cloud storage allows easier sharing of data with other people online as you can invite collaborators to view or edit files in your cloud account. Most cloud services also keep multiple redundant copies of your data on their servers to prevent data loss.

The main disadvantage of cloud storage is that you need an active Internet connection to access your data, unless the files have already been synced to your device for offline use. Free cloud accounts also typically have storage limits, with more space available only on paid plans. Storing extremely sensitive data in the cloud may raise privacy concerns as well.

Types of Cloud Storage

Some major types of cloud storage services include:

  • File storage – Services like Dropbox and Google Drive store files and allow sharing/collaboration.
  • Photo storage – Flickr and Google Photos store images and provide editing tools.
  • Music/video storage – YouTube, Vimeo and SoundCloud allow uploading media files.
  • Document storage – Google Docs, Office 365 store documents and allow live collaboration.
  • Backup storage – Backblaze, Carbonite make backups of local data.

Network Attached Storage (NAS)

NAS devices are dedicated file storage appliances connected to a local area network. This allows all devices on that network, such as computers, smartphones and tablets, to access and share files stored on the NAS. Data stored on a NAS does not depend on any single computer being turned on, only on the NAS itself.

A NAS consists of hard drives inside an enclosure with an operating system optimized for storage and data sharing. Features like built-in RAID technology, data backup, access permissions, and encryption are available. NAS devices are a popular choice for small businesses, households and even personal media storage.

Database Storage

Databases are used to store and manage vast amounts of structured data. Facebook, Google, banks and other large organizations rely on databases to handle their enormous user/account data, financial records, inventory etc. This data is persisted by storing it on physical storage media like high-capacity hard disks and tape drives.

Database management system (DBMS) software defines the layout of the database and provides tools to add, access, secure and back up the data. Storage capacity can be scaled by adding more hardware. Users and applications interact with the database via a query language without needing direct access to the underlying filesystem.

Common database examples include Oracle Database, MySQL, Microsoft SQL Server, PostgreSQL and MongoDB. The Structured Query Language (SQL) is the standard language for querying and manipulating relational databases, while document databases such as MongoDB use their own query interfaces.
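
To make the idea of querying through a language rather than touching files concrete, here is a minimal sketch using SQLite via Python's built-in sqlite3 module. The accounts table, its columns and the sample rows are invented for illustration.

```python
import sqlite3

# An in-memory database keeps the example self-contained; a real system
# would persist the data on disk or on a database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO accounts (owner, balance) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 35.5), ("carol", 990.0)],
)
conn.commit()

# The query states which rows are wanted; the DBMS decides how to find them.
rows = conn.execute(
    "SELECT owner, balance FROM accounts WHERE balance > ? ORDER BY balance DESC",
    (100,),
).fetchall()
print(rows)
conn.close()
```

The application never opens the database's storage files itself; it only sends SQL statements and receives rows back.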

Advantages of Database Storage

  • Structured storage format allowing complex querying
  • Access control to allow multiple users
  • Scalability to vast data sizes
  • Reliability features like replication, failover
  • Encryption and access logging for security

Distributed File Systems

A distributed file system allows files to be stored transparently across multiple servers located in different physical locations. This is commonly used in large scale cloud computing implementations. The files are replicated across the network so there is no single point of failure. Additional servers can be easily added to scale up capacity.

The physical storage could be on hard drives, solid state drives or even tape drives housed in data centers that could be separated geographically. Users access the distributed file system via a client that handles interaction with the servers, metadata management and caching for performance.

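Some examples of distributed file systems include HDFS (Hadoop Distributed File System) and Microsoft Azure Data Lake Storage. Object storage services such as Amazon S3 (Simple Storage Service) play a similar role, although they expose objects in buckets rather than a conventional file hierarchy. These systems are commonly used to store big data for analytics and machine learning processing.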
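
The replication idea can be sketched with a toy in-memory model. This is not how HDFS or any real system is implemented; the ReplicatedStore class, the server names and the placement rule (hash the path, then use the next servers in order) are all assumptions made for illustration.

```python
import hashlib


class ReplicatedStore:
    """Toy store that keeps each file on several in-memory 'servers'."""

    def __init__(self, server_names, replicas=2):
        if replicas > len(server_names):
            raise ValueError("need at least as many servers as replicas")
        self.names = list(server_names)
        self.servers = {name: {} for name in self.names}
        self.replicas = replicas
        self.offline = set()

    def _placement(self, path):
        # Hash the path to pick a starting server, then take the next ones in order.
        start = int(hashlib.sha256(path.encode()).hexdigest(), 16) % len(self.names)
        return [self.names[(start + i) % len(self.names)] for i in range(self.replicas)]

    def write(self, path, data):
        # Store one copy on every server chosen for this path.
        for name in self._placement(path):
            self.servers[name][path] = data

    def read(self, path):
        # Try each replica in turn and skip servers that are down.
        for name in self._placement(path):
            if name not in self.offline and path in self.servers[name]:
                return self.servers[name][path]
        raise FileNotFoundError(path)


store = ReplicatedStore(["dc-east", "dc-west", "dc-north"], replicas=2)
store.write("/logs/app.log", b"started")
first_replica = store._placement("/logs/app.log")[0]
store.offline.add(first_replica)    # simulate one server failing
print(store.read("/logs/app.log"))  # still served by the second replica
```

Because every file lives on two servers, losing any single server leaves the data readable, which is the "no single point of failure" property in miniature.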

Benefits of Distributed File Systems

  • Massive scalability to handle petabytes or even exabytes of data
  • Built-in data replication and failover
  • Support for high throughput I/O
  • Designed for big data and analytics workloads

Data Archival

For long term data retention, archival storage allows preserving data for decades at lower cost. Regulatory requirements often mandate organizations to retain data for several years. Storing all this data on primary storage systems with high redundancy would be prohibitively expensive.

Archival data storage typically uses magnetic tape media housed in physical vaults with controlled temperature and humidity. Because the cartridges sit offline, this provides air-gapped security. Open formats reduce dependence on a single vendor, though a compatible drive is still needed to read older cartridges. Linear Tape-Open (LTO) and IBM 3592 are examples of widely used tape formats.

Data is first staged from primary storage to disk-based secondary storage and then periodically migrated to tape according to policies. Sophisticated data management software automates cataloging, indexing and retrieval when required. Optical discs like Blu-ray, along with cloud archives, are alternatives for long-term storage.
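
A migration policy of this kind can be sketched as a simple age-based rule. The thresholds (90 days before moving to secondary disk, 365 days before moving to tape), the tier names and the StoredFile type are hypothetical; real archival software uses far richer policies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredFile:
    name: str
    last_accessed: datetime
    tier: str = "primary"


def apply_policy(files, now,
                 to_disk_after=timedelta(days=90),
                 to_tape_after=timedelta(days=365)):
    """Assign each file to a tier based on how long ago it was last accessed."""
    for item in files:
        age = now - item.last_accessed
        if age >= to_tape_after:
            item.tier = "tape"
        elif age >= to_disk_after:
            item.tier = "secondary-disk"
        else:
            item.tier = "primary"
    return files


now = datetime(2024, 1, 1)
catalog = [
    StoredFile("invoice-draft.pdf", now - timedelta(days=10)),
    StoredFile("q2-ledger.csv", now - timedelta(days=120)),
    StoredFile("audit-2021.zip", now - timedelta(days=800)),
]
tiers = [item.tier for item in apply_policy(catalog, now)]
print(tiers)
```

If the policy runs periodically, a file passes through the secondary-disk tier on its way to tape, matching the staging order described above.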

Benefits of Archival Storage

  • Very low cost per GB compared to primary storage
  • Meets legal and regulatory retention requirements
  • Energy efficiency and physical security
  • Preservation for decades

Server and Endpoint Memory/Storage

In addition to long-term data storage, data also resides temporarily in the random access memory (RAM) of servers and endpoint devices. This volatile memory holds the working data being actively processed by applications, along with cached copies of frequently accessed data from storage. It provides ultrafast access measured in nanoseconds rather than the milliseconds taken to read data from hard drives.

However, the data is lost when the system is powered down or restarted. Faster forms of non-volatile storage like solid state drives are also increasingly used in servers and endpoints for improved performance compared to hard disk drives.
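
To put the nanosecond-versus-millisecond gap in perspective, the arithmetic can be sketched as follows. The specific latencies (100 ns per RAM access, 10 ms per hard disk access) are assumed round figures for illustration, not measurements.

```python
# Assumed round-number latencies, for illustration only.
RAM_ACCESS_NS = 100             # about 100 nanoseconds per RAM access
HDD_ACCESS_NS = 10 * 1_000_000  # about 10 milliseconds, expressed in nanoseconds


def speedup(slow_ns: int, fast_ns: int) -> float:
    """Return how many times faster the fast device responds."""
    return slow_ns / fast_ns


ratio = speedup(HDD_ACCESS_NS, RAM_ACCESS_NS)
print(f"RAM answers roughly {ratio:,.0f} times faster than the hard disk")
```

Under these assumptions the gap is about five orders of magnitude, which is why systems try to keep actively used data in memory.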

Types of Endpoint Memory and Storage

  • RAM – Provides high speed temporary working memory
  • Cache – On-CPU cache and SSD caching improve access speed
  • Internal SSDs – Faster than HDDs without moving parts
  • HDDs – Older hard disk drives for cheap bulk storage

Network Data Transmission

When you access data over a network, whether from a local server or the Internet, the data has to be physically transmitted over the network infrastructure. This includes Ethernet LAN cables, WiFi, cellular data networks, satellites, routers, switches etc. that collectively form the global telecommunications infrastructure.

While data transmission is not exactly data storage, the data exists transiently on these network components while in transit before reaching its final storage destination. The rate at which a link can carry data is known as its bandwidth, and the amount of data in flight at any instant depends on both bandwidth and delay. Aggregate global network bandwidth is estimated at over 1,000 Tbps and is growing rapidly, fueled by video, IoT and 5G.
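
How much data is "on the wire" at one instant can be estimated as bandwidth multiplied by delay, often called the bandwidth-delay product. The sketch below works through one case; the 10 Gbps link speed and 70 ms one-way delay are assumed example values, and the function name bytes_in_flight is hypothetical.

```python
def bytes_in_flight(bandwidth_bits_per_s: int, one_way_delay_ms: int) -> float:
    """Bandwidth-delay product: data in transit on a link at one instant, in bytes."""
    bits = bandwidth_bits_per_s * one_way_delay_ms / 1000  # ms -> s
    return bits / 8                                        # bits -> bytes


# Assumed example: a 10 Gbps long-haul link with a 70 ms one-way delay.
in_flight = bytes_in_flight(10_000_000_000, 70)
print(f"{in_flight / 1_000_000:.1f} MB in transit")
```

Under these assumptions, tens of megabytes exist only as signals in the cable at any given moment.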

Major Network Components for Data Transmission

  • Submarine cables – Undersea fiber optic cables carry ~99% of intercontinental data traffic.
  • Cellular networks, WiFi – Connect mobile devices and enable mobility.
  • Routers and switches – Process and forward data packets between networks.
  • Network edge servers – Interface between users and core networks, cache popular content.
  • Satellites – Provide connectivity to remote areas where cable/fiber infrastructure is not available.

Non-digital Data Storage

So far we have looked at storage of digital data encoded in bits and bytes. However, information can also be stored non-digitally in various physical and analog media.

Printed materials like books, documents, photographs and film contain information encoded into text, images, graphics and symbols printed on paper, microfilm or photographic media. LPs and analog magnetic tapes encode audio signals into vinyl records or plastic tape reels.

More esoteric non-digital media like punched paper cards and piano rolls have also been used historically. These need manual or electromechanical systems to access rather than computers. However, analog audio and video recordings are now commonly digitized for preservation and access using digital storage media.

Examples of Non-digital Information Storage

  • Printed materials – Books, documents, photographs, blueprints, etc.
  • Microforms – Microfilm, microfiche
  • Phonographic records – Vinyl LPs, gramophone records
  • Magnetic tapes – Reel-to-reel, audio cassettes, VHS
  • Punched cards – Hollerith/IBM cards with punched holes
  • Punched tapes – Tape reels with punched holes used in early computing

Data Stores in Digital Systems

Within the domain of digital systems, there are still several locations where data can be stored temporarily, as it is being processed:

CPU Registers

Registers are small storage units directly on the CPU that hold data such as instruction operands, memory addresses and condition codes. Access times are under 1 nanosecond.

CPU Cache

Cache is fast SRAM located on the CPU chip itself. It stores frequently used data from main memory for even faster access. Cache is managed automatically by the hardware.

RAM

System memory provides working storage for currently executing processes. It is much slower than CPU registers and cache, with access times of roughly 50-100 nanoseconds for DRAM.

Disk Buffers

Buffers are temporary staging areas used by operating systems and databases to hold data being transferred between slower disks and faster RAM.
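
Buffering is visible even from a short script. In this sketch, a small write sits in the program's buffer until it is flushed, and os.fsync then asks the operating system to push its own buffers to the device. The file name and the 8192-byte buffer size are arbitrary choices.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "buffered.bin")

with open(path, "wb", buffering=8192) as handle:
    handle.write(b"hello")                     # 5 bytes, smaller than the buffer
    size_before_flush = os.path.getsize(path)  # data still sits in the program's buffer
    handle.flush()                             # hand the data to the operating system
    os.fsync(handle.fileno())                  # ask the OS to write its buffers to the device
    size_after_flush = os.path.getsize(path)

print(size_before_flush, size_after_flush)
os.remove(path)
```

Until the flush, the file on disk is still empty even though the program has already "written" to it, which is why a crash at the wrong moment can lose buffered data.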

Network Buffers

Network buffers hold data packets being transferred over a network. They help compensate for differences in transmission speed between network links.
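
A pair of connected sockets shows a network buffer at work: data that has been sent but not yet read waits in a buffer managed by the operating system. This is a minimal local sketch using socket.socketpair, so no real network is involved, and the reported buffer size will vary by system.

```python
import socket

sender, receiver = socket.socketpair()

# Send two messages before the receiver reads anything.
sender.sendall(b"packet-1")
sender.sendall(b"packet-2")

# The data now waits in the receiving socket's buffer inside the OS.
receive_buffer_size = receiver.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

# Drain the buffer; loop in case the data arrives in more than one piece.
received = b""
while len(received) < 16:
    received += receiver.recv(1024)

print(receive_buffer_size, received)
sender.close()
receiver.close()
```

The sender finished its work before the receiver started reading, and the buffer absorbed the difference in timing.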

Caching and Copies

An important goal when designing data storage systems is ensuring fast access for applications. The further storage sits from the CPU, the slower it becomes, but the larger its capacity. Various caching mechanisms help bridge this gap:

  • CPU caches hold frequently accessed main memory data
  • RAM acts as cache for hard disk data
  • Local disks cache data from network storage
  • CDNs cache Internet content closer to users
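
The same pattern applies at the application level. As a small sketch, Python's functools.lru_cache keeps recent results in memory so that repeated requests skip the slow source. The load_record function and its counter are hypothetical stand-ins for a read from disk or network storage.

```python
from functools import lru_cache

slow_reads = 0  # counts how often the slow backend is actually hit


@lru_cache(maxsize=128)
def load_record(key: str) -> str:
    """Pretend to fetch a record from slow storage (hypothetical backend)."""
    global slow_reads
    slow_reads += 1
    return f"record for {key}"


load_record("user:1")  # miss: goes to the slow backend
load_record("user:1")  # hit: served from the in-memory cache
load_record("user:2")  # miss: a different key

info = load_record.cache_info()
print(slow_reads, info.hits, info.misses)
```

Three requests resulted in only two slow reads; the repeated key was answered from the cache.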

Making copies of data across multiple locations also helps provide redundancy and fault tolerance if one location becomes unavailable. For archival storage, multiple copies may be stored in different geographic regions to protect against disasters.

Security Considerations

Wherever data resides, it is essential to consider potential security risks. Some key principles for keeping data secure include:

  • Encryption – Scramble data so only authorized parties can read it
  • Access controls – Restrict permissions to modify/delete data
  • Activity logging – Record actions taken on sensitive data
  • Backups – Protect against hardware failures, accidental deletion etc.
  • Physical security – Limit physical access to servers, disks, archives etc.
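
As one small example of an access control, file permissions can restrict who may modify a file. The sketch below assumes a POSIX system (Linux or macOS), where permission bits behave as described; Windows treats chmod differently. The scratch file is a stand-in for a sensitive document.

```python
import os
import stat
import tempfile

# Create a scratch file to stand in for a sensitive document.
fd, path = tempfile.mkstemp()
os.close(fd)

# Allow the owner to read the file and nobody to write it (mode 0o400).
os.chmod(path, stat.S_IRUSR)
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))

# Clean up. On POSIX, deleting depends on the directory's permissions,
# not on the file's own permission bits.
os.remove(path)
```

Permission bits are only one layer; the other measures in the list above still apply.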

Highly sensitive data, such as medical records, financial data and government or military information, requires additional safeguards such as air-gapped systems and multi-factor authentication.

Conclusion

In summary, data can exist in many different states as it flows between storage, memory, network and processing units:

  • At rest on hard drives, flash, optical media, tape archives
  • Cached temporarily in server memory or on local disks
  • In flight on networks and buses between devices
  • Actively processed in CPU caches and registers

Optimizing where data resides based on how it is used can result in tremendous performance and efficiency benefits. The proliferation of endpoints, networks and cloud platforms gives more options than ever before for storing data. Understanding these options is key to building robust, secure and cost-effective data storage solutions.