Some people eat, sleep and chew gum, I do genealogy and write...

Tuesday, December 12, 2017

The Ultimate Challenges of Genealogical Access to Digitized Records


Online genealogically important historical records are rapidly transforming the way genealogists find their ancestors and extended ancestral families. Billions of new records are being added every year by the large online genealogy companies. It would seem that this flood of new records could go on indefinitely. But there are strong indications that the flood may soon diminish to a trickle unless the genealogical community can overcome some looming obstacles.

These obstacles to the continued increase in the number of online genealogical records fall into a number of categories that include the following:
  • Political restrictions on the access to records
  • The monetization of records by governments and other organizations
  • The reverse side of the principle of economies of scale, i.e. the cost of digitizing smaller collections of records
  • Unrealistically restrictive copyright and other similar restrictions on historical records
  • The unrealistic digital resolution and file format requirements imposed by those engineers and administrators of online collections, thereby increasing the inability of the larger collections to ingest smaller collections of records
  • The costs of maintaining ever larger databases including the costs associated with migrating file formats over time
  • The lack of community standards for record formats and the inability of users to move records from one online family tree program to another
  • Ignorance of the members of the genealogical community as to the identity and availability of online digital record collections
Here is my viewpoint on each of these obstacles:

Political restrictions on the access to records

The most difficult and pervasive obstacles to continued digitization are the politically imposed restrictions on record access around the world. In some areas, record access, much less digitization of those records, is virtually impossible. It is clear that the ability of individuals to access records is a major threat to oligarchies and repressive governments no matter what their origin or motivation. This is not an issue that is limited to national governments but can operate on a local level when politicians believe their control and power are threatened by access. In the United States, for example, we would not have national and local freedom of information statutes were politicians and bureaucrats cooperative in providing access to "public" records. In addition, the ongoing destruction of genealogically important records and the attacks on state archives and libraries continue to threaten the availability of records around the country. Absent major changes in some countries of the world and even in parts of less repressive countries, many records will remain unavailable. Ultimately, the reasonably accessible records around the world will all be "cherry picked," leaving huge numbers of records locked up by repressive governments.

The monetization of records by governments and other organizations

It is a fact of life for genealogists that more and more records around the world are being used as local revenue streams by those who maintain or archive them. This occurs wholesale, even in the United States, for many types of records. For example, in almost every state of the United States of America, if you are born, get married or die and you or your family want a copy of an official government certificate of any of those events, you will have to pay a fee to obtain a copy. In England, it is a common practice for local ecclesiastical parishes to charge a fee for access to historical parish registers. I am not of the opinion that all records must be free, but the monetization of the records makes their acquisition by free websites such as FamilySearch.org very unlikely. It also makes digitizing the records and making them available much more expensive overall.

The reverse side of the principle of economies of scale, i.e. the cost of digitizing smaller collections of records

Record acquisition and digitization are labor intensive and the equipment needed for high-quality images is still quite expensive. For these reasons, extensive record digitization efforts can achieve economies of scale. On the other hand, smaller projects require those same assets, but the assets are used for far fewer records, so the cost per record becomes a major concern. In other words, smaller collections have some of the same overhead considerations as larger collections, making the cost per record much higher. Also, the logistics of obtaining smaller collections are usually about the same as for larger collections. The result is that there are distinct disincentives to acquiring smaller collections of valuable records.

Unrealistically restrictive copyright and other similar restrictions on historical records

Unfortunately, US copyright law is vague and overly restrictive. Current copyright claims will likely be in effect longer than any person now living. Even old copyright claims dating back to the 1920s and 30s will likely be arguably enforceable longer than anyone now living. This could be called the "Mickey Mouse" effect. In both 1976 and 1998, the existing copyright interests were extended, in some cases for up to 120 years from the year of creation. See the ArtRepreneur.com post, "How Mickey Mouse Keeps Changing Copyright Law." Because the provisions of these laws are vague, all sorts of claims to copyright now cloud the ability of genealogists to access records online.

In other cases, record repositories claim a "contractual" ownership right to documents that are clearly in the public domain. These claims prevent the free use of all sorts of records, photographs, and other documents. Until there is a realistic overhaul of the copyright laws and a clarification of the unfounded claims by repositories, many valuable records will be subject to restricted access.

The unrealistic digital resolution and file format requirements imposed by those engineers and administrators of online collections, thereby increasing the inability of the larger collections to ingest smaller collections of records

This particular issue is less obvious than any of the other challenges facing genealogical access to digitized records. Essentially, those who are charged with developing the standards for online digital preservation impose unrealistic restrictions on the process of digitization. For example, we have long known that the highest resolution the human eye can distinguish is approximately the equivalent of 170 dpi, or PPI (pixels per inch), when viewed at 20 inches. In contrast, the average laser printer can print at 300 dpi, or roughly double the eye's resolution. See "What is the highest resolution humans can distinguish." Presently, some of the digitization efforts going on around the world are using cameras that have up to 50 Megapixel sensors. Most of the documents being digitized could be adequately preserved with a camera of about 12 Megapixels, the resolution of a present smartphone. The U.S. Library of Congress has established a publication called "Guidelines: Technical Guidelines for Digitizing Cultural Heritage Materials." Quoting from that publication concerning documents:
Image capture resolutions above 400 ppi may be appropriate for some materials, but imaging at higher resolutions is not required to achieve 4* compliance.
The practical effect of an artificially imposed higher standard is that many smaller collections are going to be lost because the large online genealogy companies refuse to ingest even images at the Library of Congress standard or make the process of obtaining images so complicated as to make smaller collections unfeasible.
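
To put the megapixel and ppi figures above on the same footing, here is a rough sketch of the arithmetic. It assumes a 4:3 sensor and a letter-size (8.5 x 11 inch) page that exactly fills the frame; both are my assumptions for illustration, and a real capture setup will lose some pixels to margins. The function name effective_ppi is hypothetical.

```python
import math

def effective_ppi(megapixels, doc_width_in, doc_height_in, aspect=(4, 3)):
    """Approximate pixels per inch when a document exactly fills the frame.

    Assumes a sensor with the given aspect ratio (4:3 by default) and no
    margin around the page, so the result is an optimistic upper bound.
    """
    total_pixels = megapixels * 1_000_000
    aspect_long, aspect_short = aspect
    # Derive the sensor's pixel dimensions from pixel count and aspect ratio.
    long_px = math.sqrt(total_pixels * aspect_long / aspect_short)
    short_px = total_pixels / long_px
    doc_long = max(doc_width_in, doc_height_in)
    doc_short = min(doc_width_in, doc_height_in)
    # The tighter of the two axes limits the usable resolution.
    return min(long_px / doc_long, short_px / doc_short)

# Letter-size page, 8.5 x 11 inches (an assumed document size).
print(round(effective_ppi(12, 8.5, 11)))  # about 353 ppi
print(round(effective_ppi(50, 8.5, 11)))  # about 720 ppi
```

Under these assumptions, a 12 Megapixel capture of a letter-size page comes in somewhat under the 400 ppi figure in the quoted guideline but at roughly twice the 170 ppi figure given for the eye, while a 50 Megapixel capture is well above both. Larger originals, such as ledger pages or newspapers, would yield proportionally fewer pixels per inch.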

The costs of maintaining ever larger databases including the costs of migrating the file formats over time

Even with the dramatic decreases in the cost of data storage, huge online genealogical collections, especially those with photos, videos and audio files, can eat up huge amounts of storage running into the hundreds of Terabytes. Adding in the cost of acquisition and maintenance makes this an extraordinary effort. Adding new records can have an incrementally higher cost. It is only a matter of time until these huge collections run into an economic and practical limit. However, there is a long way to go before this will happen. Right now, there is a major concern with the need to migrate existing collections as new file formats and operating systems evolve. Apple recently introduced a new file format for its smartphones, HEIC, and this will eventually affect the large online genealogy companies.
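
As a back-of-the-envelope illustration of how quickly image storage adds up, the sketch below estimates the uncompressed size of a collection. The figures are assumptions for illustration only: 3 bytes per pixel (24-bit color), no compression, and a hypothetical collection of one million images. Real archives typically use compressed formats, so actual numbers will differ.

```python
def uncompressed_size_tb(megapixels, image_count, bytes_per_pixel=3):
    """Estimate raw storage in terabytes (10**12 bytes).

    Assumes uncompressed 24-bit color (3 bytes per pixel) unless told
    otherwise; this ignores metadata, derivatives, and backups.
    """
    total_bytes = megapixels * 1_000_000 * bytes_per_pixel * image_count
    return total_bytes / 1e12

# Hypothetical collection of one million images.
print(uncompressed_size_tb(12, 1_000_000))  # 36.0
print(uncompressed_size_tb(50, 1_000_000))  # 150.0
```

Under these assumptions, the same million images take about four times as much space at 50 Megapixels as at 12 Megapixels, and the cost is paid again each time the collection is backed up or migrated to a new format.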

The lack of community standards for record formats and the inability of users to move records from one online family tree program to another

This is a major issue and I have written about this recently. Without community standards, each of the large online database companies is essentially an island with its own file formats. Without a standard way to exchange data, if one or more of these companies fail, much of their data could be lost.

Ignorance of the members of the genealogical community as to the identity and availability of online digital record collections

Let's face it. There is a constant loss of genealogical data due to genealogists who ignorantly or even intentionally fail to share their data and adequately prepare for its preservation upon their deaths. This attrition of records will always be a drag on preservation efforts.

There is always hope in the future and it is always possible that some or all of these issues will be resolved, but right now they stand as genealogy's greatest challenges. 

1 comment:

  1. I have always been a believer that preservation should be performed at the highest possible resolution. As time has passed, as you mention, this could be 50 Megapixels today, and who knows how much tomorrow? But the biggest advantage of 50 vs 12 Megapixels is the ability to zoom in and examine details closely. I have found this very helpful with things like scans of old vital records where correct interpretation of handwriting, for example, requires great magnification. It is useless if zooming in only results in a highly pixelated image. This applies likewise to photographs where the only image of GG Grandpa is a tiny section of a larger image. If I want to recognize his features clearly, I am grateful for a 50 Megapixel scan. Obviously, as you mention, file size (storage capacity) is an issue, but less so as time passes. Therefore, I support the ". . . unrealistic digital resolution and file format requirements imposed by those engineers and administrators of online collections . . .". Tomorrow's researchers will thank us for adhering to those high standards.
