Migrants and the State — Scaling Archival Immigration Documents into a Public Humanities Dataset with Machine Learning

“Migrants and the State: Unlocking the Potential of A-files for the Histories of U.S. Immigration” recently received a 2023 Digital Humanities Advancement Grant from the National Endowment for the Humanities (NEH).

The project will create a digital prototype that will develop open-source machine learning techniques for image segmentation and classification to facilitate expanded access to large sets of heterogeneous government files. It will develop a robust metadata schema for a set of historical migrant records held by the U.S. National Archives (NARA) and gathered in what are known as A-files (formerly Alien Files). A-files are, in essence, portfolios that the state generates for migrants, immigrants, or refugees who enter the U.S. and in the process come into contact with border control agencies. They contain a wide variety of documentation specific to the individual migrant’s life story and migration history. Historical A-files are in the public domain, but are currently accessible only by request from NARA, on a file-by-file basis. Searchability is limited to a few categories such as country of origin and date of birth.

Migrants and the State (M/S) will use a corpus of 550 A-files, gathered prior to the grant period through NARA file requests and estimated to contain 20,000-25,000 pages of documents in total, to model new methods of digital access to large collections of government records. Access will be facilitated and expanded through the segmentation of documents, identification of document types, and the addition of detailed metadata about document types (e.g., government forms, correspondence, and employment records). This level of indexing will make the diverse contents of the files discoverable by social, political, and geographical criteria. It will enable researchers to locate specific topics such as policing and detention, medical care, employment, or types and outcomes of legal proceedings. As transnational records, A-files will shed new light on immigration history in the U.S., the history of migrants’ home countries, and the often understudied ways in which the two interconnect.
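To make the idea of document-level indexing concrete, here is a minimal sketch of what one segmented document's metadata might look like and how it could be searched by topic. This is only an illustration, not the project's actual schema: the `DocumentRecord` class, its field names, the `find_by_topic` helper, and the sample file identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentRecord:
    """One segmented document inside an A-file (hypothetical schema)."""
    file_id: str      # made-up identifier for the A-file
    first_page: int   # page where the segmented document starts
    last_page: int    # page where it ends
    doc_type: str     # e.g. "government form", "correspondence"
    topics: list = field(default_factory=list)  # e.g. "employment"


def find_by_topic(records, topic):
    """Return every record tagged with the given topic."""
    return [r for r in records if topic in r.topics]


# Invented sample data, for illustration only.
records = [
    DocumentRecord("file-001", 1, 2, "government form", ["legal proceedings"]),
    DocumentRecord("file-001", 3, 5, "correspondence", ["employment"]),
    DocumentRecord("file-002", 1, 4, "employment records", ["employment"]),
]

matches = find_by_topic(records, "employment")
for r in matches:
    print(r.file_id, r.doc_type, f"pp. {r.first_page}-{r.last_page}")
```

The point of the sketch is that once each document within a file carries its own type and topic metadata, a search can reach inside the file rather than stopping at file-level categories such as country of origin or date of birth.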


This project is a collaboration with Sibylle Fischer (Associate Professor of Spanish and Portuguese), Ellen Noonan (Clinical Associate Professor and Director, Archives and Public History Program), and others at NYU. I’m currently working to process and publish the data as a IIIF collection using Aperitiiif, which in turn will support (1) the machine learning work of a data science student collaborator, and (2) the prototype web app I’ll develop in the spring. Stay tuned for more!
