Safeguarding Personally Identifiable Information on Microfilm

Legislative bodies define personally identifiable information (PII) differently, but most definitions classify PII as any information about an individual, maintained by an organization, that can be used alone or in combination with other information to distinguish that individual's identity. Examples of PII include an individual's name, date of birth, Social Security number (SSN), and driver's license number.

Highly regulated industries such as the nuclear industry continue to implement new technologies and procedures to monitor and regularly remove PII from within their organizations, regardless of the format in which it exists. IT infrastructure solutions that combine hardware and software commonly address issues like Data Loss Prevention and Data Leakage. These solutions focus on existing digital assets, yet PII often exists in other formats, including paper and microfilm. For example, PII intermingled with other records that researchers access on microfilm creates a risk that the PII will accidentally or intentionally fall into the wrong hands.

Other industries facing similar compliance requirements offer best practices for safeguarding PII on microfilm. County Recorders, for example, are under pressure from state laws that prohibit exposing SSNs to public view. Many County Recorders have chosen to convert their archives to digital form and redact the information during the conversion process.

San Luis Obispo County in California, for example, faced a legislative mandate prohibiting the public display and printing of information that contains individuals' SSNs. The County had 2,600 microfilm rolls representing approximately 3,000,000 official record images.
As a result of this legislation, San Luis Obispo County sought a microfilm conversion solution that could redact SSNs from its public-facing digital archive of official records while also producing a County staff version that left the SSNs visible. That way, staff could consult the unredacted copy to determine conclusively whether a redacted passage should in fact have been redacted.

San Luis Obispo County selected a solution that identified SSNs during the microfilm-to-digital scanning production process. Each digitally converted microfilm roll was processed through an optical character recognition (OCR) engine that identified SSNs both by detecting the SSN character strings themselves and by using the context that often surrounds SSNs (e.g., "Taxpayer Identification Number" or "Social Security #"). Each detected SSN was then manually adjudicated to weed out false positives (text flagged as an SSN that was not one).

Additional Microfilm Conversion Considerations: Accuracy and Image Quality

Conversion accuracy is a key consideration in any microfilm conversion project. Even services that offer 99% conversion accuracy can still miss tens of thousands of images: 1% of a 3,000,000-image archive is 30,000 images. Images are lost for a variety of reasons, including poor microfilm quality and QA processes that depend on human intervention. For example, many services dissociate images from the original microfilm roll. When this occurs, a single mistake during indexing can produce a converted document that cannot be found in document searches.
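Returning to the SSN identification step described earlier, the combination of string detection, surrounding context, and manual adjudication can be illustrated with a minimal Python sketch. This is not the County's or any vendor's actual implementation. It assumes the OCR engine has already produced plain text for a page, and the names `find_candidates`, `redact`, `CONTEXT_PATTERN`, and the 40-character `CONTEXT_WINDOW` are hypothetical choices made for illustration. Candidates are only flagged here; the decision to redact is left to a human reviewer, mirroring the adjudication step.

```python
import re
from dataclasses import dataclass

# Nine digits in a 3-2-4 grouping, with optional hyphen or space separators,
# not embedded in a longer run of digits.
SSN_PATTERN = re.compile(r"(?<!\d)(\d{3})[- ]?(\d{2})[- ]?(\d{4})(?!\d)")

# Hypothetical context phrases; a real project would tune this list.
CONTEXT_PATTERN = re.compile(
    r"social\s+security|\bssn\b|taxpayer\s+identification",
    re.IGNORECASE,
)

CONTEXT_WINDOW = 40  # assumed number of characters inspected before a match


@dataclass
class Candidate:
    start: int         # offset of the match in the OCR text
    end: int           # offset just past the match
    text: str          # the matched character string
    has_context: bool  # True if a context phrase precedes the match


def plausible_ssn(area: str, group: str, serial: str) -> bool:
    """Reject digit groups that are never issued as SSNs."""
    if area in ("000", "666") or area >= "900":
        return False
    return group != "00" and serial != "0000"


def find_candidates(ocr_text: str) -> list[Candidate]:
    """Return every plausible SSN string, noting whether context supports it."""
    candidates = []
    for match in SSN_PATTERN.finditer(ocr_text):
        if not plausible_ssn(*match.groups()):
            continue
        window = ocr_text[max(0, match.start() - CONTEXT_WINDOW):match.start()]
        candidates.append(
            Candidate(
                start=match.start(),
                end=match.end(),
                text=match.group(0),
                has_context=bool(CONTEXT_PATTERN.search(window)),
            )
        )
    return candidates


def redact(ocr_text: str, confirmed: list[Candidate]) -> str:
    """Build the public version; the staff version is the unmodified text."""
    public = ocr_text
    # Replace from the end so earlier offsets stay valid.
    for cand in sorted(confirmed, key=lambda c: c.start, reverse=True):
        public = public[:cand.start] + "XXX-XX-XXXX" + public[cand.end:]
    return public
```

In this sketch, a reviewer would see every candidate, with `has_context` serving only as a hint for prioritizing the queue. A production workflow would redact the corresponding region of the scanned image rather than the OCR text, which requires the character coordinates that OCR engines typically report alongside the recognized text.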