
MIAP at Personal Digital Archiving Conference, Feb. 22-24, 2012

This past February, MIAP Director, Howard Besser, and several MIAP students and alumni attended the 2012 Personal Digital Archiving conference, hosted by the Internet Archive, in San Francisco, CA.

Below are extensive notes from the events of the conference, graciously supplied by MIAP student, Marie Lascu. 



Brewster Kahle- Genius

-          ½ million people a day peruse IA
-          “The wayback machine is a shipping container”
-          “The web is very shallow” it doesn’t have everything
-          You have to be proactive about collecting material, be smooth about it, fit into their workflow*

The Library of Congress: Personal Digital Archive Advice for the General Public.  Mike Ashenfelder, Library of Congress

-          Simplify!
-          The general public has been neglected by preservationists and archivists
-          “We want every person in America to understand digi pres issues, and be able to do something about it”
-          Simplify institutional knowledge of digi pres, and share with general public
-          LoC began addressing digi pres about 5 years ago

o   What do people want to save? (photos, email, digi video, docs, mp3s, websites, blogs…)

o   Questions about born digital and scanned material

o   #1 file format that raises concern – photos (particularly cell phones)

o   Do digital feng shui

-          To communicate better make terminology less complex
-          Personal Archiving blog getting much more traffic than website- both are successful
-          They’re not dumbing things down, they’re simplifying, saying it in a concise way
-          Much of the instruction assumes people will take care of their media, perhaps in 10 years computers might do this for us
-          There’s no guarantee gen pub sees this material or understands it
-          LoC started “taking it to the streets”- Personal Archiving Day at LoC held in conjunction with ALA Preservation Week
-          Smithsonian National Museum of American History does a sort of antiques road show for family heirlooms to best inform people on how they can take care of these items*
-          The National Book Festival has proved best opportunity for LoC to reach people on digi pres
-          “Personal Archiving” doesn’t resonate with gen pub- obsolete media are powerful teaching aids
-          “Questions help stretch our awareness”
-          Too much information may bury the message- get to the point in a few words
-          Outreach through local public libraries – “stimulates librarians”
-          The Digital Preservation Presentation Kit- <-- your friend

Cases and Examples:

-          How my Family Archives Affected Others.  Stan James. Very cool project

-          Grandmother and Grandfather’s letters- burned!
-          About 25,000 scanned files at this point
-          Family like “space monkeys” a little bit ahead of the curve in terms of personal archiving
-          4000 pages of family archives transcribed for about $600 on Mechanical Turk

What I’ve learned from gardening my Brain.  Jerry Michalski, The REXpedition. (schedule change)
-          TheBrain 7 Beta
-          “When I put things in my brain…” ~jerry
-          A sort of storage system to keep track of your thought patterns…
-          Howard: Where does the data live? Jerry: On my computer server, outlook is the hardest thing to migrate!

Unstable Archives: Performing the Franko B Archive.  Jo Ana Morfin-Guerrero, University of Bristol.
-          Franko B, Milan, Italy
-          He simply put material in several bins, and over the years this became his “archive”
-          As soon as you document a performance, you are in effect destroying it as its function needs to be ephemeral
-          The performance as the exhibit, and the object
-          The archive is moving, it does not give answers it creates questions.
-          The “performativity” of the archive
-          Howard: standard archival practice IS to totally reflect person’s way of using the material, not user reflected
-          Howard: Library world is now starting to implement standard that allows one to deal with different levels of something- idea is you have Shakespeare’s Romeo + Juliet, and then under that you can gather all the different types of derivatives.

Processing and Delivering Email Archives in Special Collections using MUSE.  Peter Chan,  Stanford University Libraries.
-          Preserving Email- Christopher J. Prom (report)
-          Wendy Cope archive sold to British Library- had 40,000 emails
-          Harvard U started project in 2007 to shape archiving of emails
-          Challenges: copyright/privacy, sensitive material, description (# of emails, recipients, folders, contents?), delivering (same as original? Html file?)
-          Stanford owns Muse software ??

parallel-flickr.  Aaron Straup Cope.
-          Presentation slides:
-          Imagining the worst case scenario- what happens if flickr went away tomorrow?
-          Insane level of trust- majority of people do NOT back up photos

Remember the Web? Practical challenges of Bookmarking for Keeps.  Maciej Ceglowski, Pinboard

Social Network Data:

Arc-chiving: saving social links for study.  Marc A. Smith, Social Media Research Foundation.
-          NodeXL - free, open-source template for Microsoft® Excel® 2007 and 2010 that makes it easy to explore network graphs.
-          How does social media interact with days and events of our lives…
-          Social networks don’t want their graphs to get out because that’s where all the money is- we should break our graphs out!

Personal Interaction Archiving: Saving our Attitudes, Beliefs, and Interests.  Megan Alicia Winget, School of Information, University of Texas at Austin.
-          Tend to focus on the act of collecting “things”
-          “Thing” based behavior
-          Can now keep track of interactions to “things” (in addition to, instead of the Things)
-          Commonplacing- practice of entering literary excerpts in personal journals/writings, “Commonplace books (or commonplaces) were a way to compile knowledge, usually by writing information into books.”
-          commonplace books as precedent for social media. "A forest of things."
-          What are people saving when they upload annotations? Is it a thing that they own, a thing or an idea?
-          People do not own the digital “things” they are interacting with
-          What does it mean to save a personal interaction? What’s the primary artifact?
-          What’s the relationship between ppl, the primary artifact and the interaction? How can we model that relationship? (Is the e-book the artifact? The phrase being highlighted? The network?)
-          Challenges mentioned are not limited to social network artifacts, nature of the materials is fundamentally different from discrete objects → Ownership becomes Licensed Access
-          New project mentioned by audience member:


Putting Personal Archives to Work: Reminiscence, Search and Browsing.  Sudheendra Hangal, Stanford University.
-          Why email archives?
-          Developed MUSE - to make browsing your own long-term email archive convenient and fun
-          People will use MUSE to:

o   1. Reminisce about the past

o   2. add “color to flashbulb memories”

o   3. summarize work in progress

o   4. identify personal email when leaving a job

o   5. retrieve all attachments

o   6.  feel a “renewed sense of confidence”

-          The Experience-Infused Browser
-          How to make the archive more complete- e.g. browsing history, social feeds…and also course materials, papers, books read, archives of interest lists, latent social connections
-          “Total Recall”

Data Triage and Data Analytics for Personal Digital Collections.  Kam Woods, University of North Carolina at Chapel Hill.
-          Issues around the acquisition of personal digital collections in collecting institutions

o   Locating & protecting sensitive info on fixed and removable digital media

o   Providing archivists and other staff w/ tools required to quickly and accurately assess raw data from donor media

o   Generating metadata that facilitates interoperability btw tools and can readily be cross-walked to current collections and preservation standards

o   Or simply put:





Cowbird: A public library of human experience.  Jonathan Harris, Cowbird.
-          An incredibly frightening personal experience turned into cutting edge personal digital archiving
-          Great way of storytelling, debatable if you don’t want to over-narrate documents
-          Check it out:
-          “young mostly male engineers affect the behavior of millions by designing software.”
-          Need for set of awareness and ethics around designing software
-          Easy to design software that appeals to the base elements of people
-          Rise of curation replacing self-expression
-          Howard: consulted with MoMA on how to preserve “I want you to want me”
-          That project was intended as a portrait of the world at a certain moment as experienced through the Internet- so it is ok to freeze it in time
-          Greater responsibility with Cowbird to preserve things as people are entrusting them with these memories
-          We can have cyber monuments or have them “cremated” when we die
-          It’s a little sociopathic to preserve every element of everything all the time forever- it’s ok to let things die
-          Curation is based on reaction not action, Creation is based on action and then people react to that action
-          Utopias fail miserably when cult leaders get into illicit sexual relationships with their followers

Lightning Talks:

miLifeMap- personal content management system, Denim Smith
-          Ways to rediscover content: calendar select, search by tag or description, flashback option
-          Can create subaccount in your account for deceased family members- e-beneficiary

iKive: Towards a Trusted Personal Archives Service.  Christopher Prom, UIUC.
-          What can I do to help people ensure that their records last long enough where some day they can end up in an archive and be accessible and aid in making history?
-          Otherwise it’s just “sweeping up crumbs”
-          Evidence Making Technologies: many out there, but they all take control away from the users
-          More archivist approved version: It would vacuum up metadata from all of these sources using all IP that is accessible, whatever open standards available, and would give the user control over it, and be stored by open archive (like IA)
-          The most effective tools of the anthropologist are sympathy and compassion for the people he studies (apply to archivists)- From The Children of Sanchez

Information Packaging for Personal Archiving.  Henry M. Gladney, HMG Services.
-          You cannot trust documents to archivists…
-          Cryptographic blocks and protective layers in digital objects…I think…

Personal Data Ecosystems. Kaliya Hamlin
-          Co-founder, Internet Identity Workshop- Identity Woman
-          The individual should be at the center of their own data lives
-          “We’re gonna turn off all business on the internet if we turn off data flow!”
-          Personal Data Ecosystem- Data Aggregation for fun, profit and insight

Digital Curation for Excel (DCXL), Carly Strasser, California Digital Library.
-          Archiving Small Science Data Sets
-          Many scientists are “data hoarders”, not taught to think in terms of archiving and access
-          Facilitate data archiving, sharing and publishing for scientists
-          DCXL- goal was to develop open source and free excel add-in
-          Why promote excel? Not archiving friendly- consumer driven, all scientists use it (so do archivists!!)

A Data Archiving Service.  Brewster Kahle, Internet Archive.
-          How do you deal with physical objects? (e.g. box of press clippings from Timothy Leary estate)
-          Roughly $0.25 an image- Leary box about 1900 images
-          It really just comes down to labor
-          Rick Prelinger home movie scanning project:  roughly $20.00 per hour
-          The Müller HM data frame scanner
-          How much does it cost to “endow a terabyte?”
o   S3-interface to the IA takes in 10TB/day, up to 10 files per sec
o   Permission system controls who can write that
o   We (IA) preserve 1TB for ~$2000 (~40x street price)
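The figures in this talk lend themselves to some quick arithmetic. The sketch below only multiplies and divides the numbers quoted in the notes (about $0.25 per image, about 1900 images in the Leary box, about $2000 per TB at roughly 40x street price). The function names are made up for illustration, and the results are rough estimates, not figures the Internet Archive quoted.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the notes.

COST_PER_IMAGE_USD = 0.25      # "Roughly $0.25 an image"
LEARY_BOX_IMAGES = 1900        # "Leary box about 1900 images"
ENDOWMENT_PER_TB_USD = 2000.0  # "~$2000/TB"
STREET_PRICE_MULTIPLE = 40.0   # "~40x street price"


def scanning_cost(images: int, cost_per_image: float = COST_PER_IMAGE_USD) -> float:
    """Estimated cost in USD of scanning the given number of images."""
    return images * cost_per_image


def implied_street_price(endowment_per_tb: float = ENDOWMENT_PER_TB_USD,
                         multiple: float = STREET_PRICE_MULTIPLE) -> float:
    """Raw disk price per TB implied by the endowment price and its multiple."""
    return endowment_per_tb / multiple


print(f"Leary box scanning estimate: ${scanning_cost(LEARY_BOX_IMAGES):,.2f}")
print(f"Implied street price per TB: ${implied_street_price():,.2f}")
```

In other words, the Leary box works out to a few hundred dollars of scanning, and the quoted endowment price implies raw disk at around $50 per TB at the time.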

Active Personal Archiving- a research area at the Internet Archive, Aaron Ximm
-          Collections at IA are becoming increasingly active (dynamic and evolving)
-          Active archiving: Autonomous collection  by an archive of its own content
-          Agent that collects material on its creator’s behalf- “active archives": a collection that knows how to increase itself…

-          “The iPhonification of the Mac” ~Brewster


Dispatch from the Internet Archive-hosted Personal Digital Archiving Conference 2012: Not official, hopefully useful.

NOTE: Once again, highly recommend checking out links, contacting speakers where contact info is applicable. Some people have begun to load their talks to the internets. Internet Archive will be providing the whole conference in the next couple weeks or so.

Ownership, aggregation and re-use of Personal Data.  Cathy Marshall, Microsoft Corp.
-          The world ended twice last year, none of us were raptured :(
-          Used photographs in presentation, photographer saw the presentation on the web, requested that he be emailed before future use
-          Example: giant catfish photo – dozens of people claiming ownership of same photo, different versions, different watermarks
-          Emerging social norms on reuse…
-          Many people believe that everything on the web is public domain, if you don’t want something reused, don’t post it.
-          People misunderstand the term “creative commons” i.e. “well it’s in the creative commons, so I can use it!”
-          The case for institutional archiving:

o   People have shown considerable interest in outreach efforts, classes and products

o   Disaggregation of skills- family archivist is rarely family IT person, and skills involved in each job are different

o   Trends in personal data mgmt- btw social media and cloud stores like Dropbox, more of our stuff is in services…

-          What happens when institutions just snap up a feed?
-          It’s not storing social media that gets people worked up, it’s access and reuse
-          LoC collecting Twitter dating back to origin, eventually will be accessible, currently just open to researchers
-          People don't want things that could be deemed offensive to be collected (a very dangerous mindset to have)
-          Precautions themselves must be approached with caution

User Studies:

***What is your plan for your personal digital archives after your lifetime?  Learning from individuals.  Sarah Kim, University of Texas at Austin.
-          Used case studies of 20 individuals with various backgrounds
-          Interested in long-term preservation of digi docs produced, collected and retained by ordinary individuals in their lives
-          “I’m not planning on it (dying). [laugh]”
-          Paradigm: The Personal Archives Accessible in Digital Media  project


-          Continuing preservation- reasons/motivations for leaving “something” behind

o   Consideration of potential usefulness of personal docs for other people

o   Interests in family history and genealogy

o   Wish to be remembered by friends and family

o   Sentimental attachment

-          Private and public nature of personal digital archiving and digital archives
-          During owner’s lifetime, they are the main or only beneficiary of the personal digi archive
-          After owner’s lifetime, digi archive left behind “like a fossil” and becomes part of other people’s lives
-          Personal digital preservation helps those left behind with bereavement. First people must admit they won't live forever.

Personal Archiving in Not Personal Spaces.  Debbie Weissmann.
-          Blending of work and not-at-work boundaries, questions about representation
-          EX: Waiter fired for tweet about actress who didn’t pay bill
-          First Amendment right to free speech does not apply across the board- particularly when you’re dealing with work, someone’s brand etc.
-          National Labor Relations Board stepped into FB convo that resulted in 5 employees being fired.
-          Basically, great examples of how social networking can ruin your life…
-          Many businesses do not have “media policies” or they are over-broad.
-          Microsoft employee posted photo of Apple computers being delivered to MS headquarters- trade secret!

Use of Personal Archives: Family History Works.  Lori Kendall, UIUC.
-          Family History and Genealogical Discourse
-          Exploration of different sites has led to examination of discourse surrounding genealogical sites
-          Nation: genealogy draws on this theme
- Covers her Czech heritage
-          Howard: Observations about family trees in the sense of biological vs. non-bio parenthood and how that might relate to DNA based research+
-          You can find people doing research that is not solely on biological family, but it is more difficult to do with current templates and social expectations- tree metaphor is problematic in this sense


I, Digital: Personal collections as an archival endeavor.  Christopher (Cal) Lee, University of North Carolina.
-          5 trends have impacted status of personal collections:

1.       Work within collecting institutions has become increasingly professionalized

2.       Individuals have gained more ability to create and store materials they find meaningful

3.       Personal collections are distributed across a diversity of systems, environment and platforms

4.       Researchers have placed more emphasis on personal stories and perspectives- more recognition to importance of everyday life to scholarship!

5.       Previously distinct communities have come to recognize that they share challenges

-          FYI: I, Digital: Personal Collections in the Digital Era, Christopher A. Lee (in BOBST!)
-          Lots of different, but related, fields around personal digital collections. How can we handle the convergence?
-          Huge potential for further collaborations “Gear up and keep the good bits!”

Laura Gurak, University of Minnesota.
-          qualitative researcher studying email and internet as social space
-          Grad training in qualitative research methods emphasizes careful data collection
-          Personal digital research, less motivation to be organized, personal habits morph into sloppy digital practices, use multiple devices- too much stuff, too many choices!
-          Click-on-Knowledge Conference
-          Even the most organized researcher is overwhelmed by choices, learning curves, fads, speed, and associative searching
-          For personal research, issues are even greater

What’s being Lost, What’s being Saved: Practices in digital scholarship and personal archiving.  Smiljana Antonijevic, Royal Netherlands Academy of Arts and Sciences
-          Voices from the Field
-          Spoke with about 95 scholars about their research practices from around the world
-          Challenges

o   Broad spectrum of technology uses and awareness

o   some scholars start with digital materials found online, print it out, hand write comments, research and disseminate through print

o   others start digital, develop databases, use visualization tools, disseminate online with blogs e-journals etc.

o   these approaches use archives in similar ways but we need to keep in mind that people are using both at the same time

o   Preservation issues stand out:

§  Available digital materials in academia increasing faster than underlying services.

§  Preserving files but not the interface is fundamental misunderstanding of intellectual content

§  "we are in the digital dark ages". Future generations will wonder why we didn't preserve better, this is especially true for science data.

o   Preservation +

§  Value-added preservation that would allow researchers to reuse their preserved data in new ways.

§  Humanities needs to have a common framework of language to communicate with IT.

o   Tools


§  product of Roy Rosenzweig Center for History and New Media

-          Fascinating Side Note: Digital Image Archive of Medieval Music

John Butler, University of Minnesota
-          Discover, Gather, Create, Share
-          Detailed activities of “Primitives” listed above (the visual is better)
-          Survey showed a high percentage needing assistance in organizing/storing materials
-          Key findings/affirmations:

o   Diversity of resources/media used

o   Methods learned in “traditional” contexts are not easily transferred to digital context

o   Researchers have unique collections to be shared, usually under personally-specified conditions

-          “Digital Humanities recognizes curation as central feature of the future of Humanities disciplines…” “…the scholar as curator and the curator as scholar…”
-          concept challenging because each has own practice
-          Data Curation- managing data to ensure they are fit for contemporary use and available for reuse
-          Lifecycle Thinking- critical for digital collections, as is active management
-          Best Practices

o   Enduring Access: Digi Pres standards and BPs

o   Discovery and re-use: metadata and leveraging

-          Incentives to Support adoption of best practices:

o   Data mgmt plan mandates

o   Low barrier tools that: abide, expedient, supportive of personal workflow

-          PIM- Personal Information Management for researchers
-          Data management consultation is an important service for academic libraries to offer

Faculty Member as Micro-Librarian: Critical literacies for personal scholarly archiving.  Ellysa Stern Cahoy, Penn State University Libraries.
-          The library of today resides on scholars’ desktops
-          We can embed principles of digital personal archiving without just using disconnected tools
-          If you’re going to make change, you have to go back to the source and change (or create) standards
-          Information literacy has changed dramatically
-          Information literacy in 2000 was Find, Use, Create, but where was Curation, Archiving? Should also include Accumulation, Distribution, Curation, Long-Term Access
- more recent model
-          Ellysa Cahoy’s own model, hopefully you can view the link:

Archive Team and the Case of the Widespread Recognition, Jason Scott
-          Archive Team: “We are going to rescue your shit”
-          Adjunct archivist for Internet Archive, literally life-saving position
-          Goes out for the things the archive doesn’t have time to pursue
-          40x8x8 cube “full of computer history”
-          Has brought in 37TB in one year
-          Dealing with the actual instead of theoretical (!)
-          “scanning a Braille playboy” Useful and useless.
-          Internet Archive- supreme space solution:
- Open source, community of free, legal and unlimited music published under Creative Commons licenses.
- DNA Lounge Live Recordings
- non-profit focused on improving access and exposure to music by creating free resources and educational materials.
-          It's official, Great Virtues of an Activist Archivist: Paranoia, Rage, Kleptomania.
-          Google is a library or an archive like a supermarket is a food museum
- command line program for grabbing content off web page
-  WARC, a preservation format, now built into WGET
- JSMESS- turn computer history into an embedded window
-          archive@home will be the archive version of fold@home
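For anyone who wants to try the WARC support in Wget mentioned above, here is a minimal sketch. It only builds the command line and prints it, without running anything. As far as I know, `--warc-file`, `--mirror` and `--page-requisites` are real Wget options (WARC output arrived in Wget 1.14), but the URL, the output name and the helper function are made up for illustration.

```python
import shlex
from typing import List


def build_wget_warc_command(url: str, warc_basename: str) -> List[str]:
    """Return a wget argument list that mirrors `url` and also writes a WARC.

    Wget appends the .warc (or .warc.gz) extension to `warc_basename` itself.
    """
    return [
        "wget",
        "--mirror",                        # recursive download with timestamping
        "--page-requisites",               # also fetch images, CSS, etc.
        f"--warc-file={warc_basename}",    # write a WARC alongside the files
        url,
    ]


# Hypothetical site and output name, for illustration only.
command = build_wget_warc_command("https://example.org/", "example-site")
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run` once you are sure you want to crawl the site.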

Commercial Services:

The Business of Web Archiving.  Maciej Ceglowski, Pinboard.
-          Model #1 -  accept money in exchange for goods and services
-          Model #2 – Find someone to burn a mountain of money on your back (as a user, you wonder what the rationale is for keeping these services alive)
-          Model #3 – Give everything for free until your business fails
-          PEBKAC (problem exists between keyboard and chair)
-          points out that the term "the cloud" tries to abstract away the inherent physicality of digital storage.
-          Storing 4TB per year costs $2400 if you host it yourself vs. enterprise cloud $5600
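To put the storage comparison above on a per-terabyte footing, here is a small sketch that divides the two quoted totals by the quoted 4TB. Only the three numbers from the notes are used. The function name is made up, and the calculation ignores anything the talk may have said about labor or redundancy.

```python
# Per-terabyte comparison of the two yearly storage figures quoted in the notes.

TB_PER_YEAR = 4
SELF_HOSTED_USD = 2400.0        # "$2400 if you host it yourself"
ENTERPRISE_CLOUD_USD = 5600.0   # "enterprise cloud $5600"


def cost_per_tb(total_usd: float, terabytes: float) -> float:
    """Yearly cost in USD for a single terabyte."""
    return total_usd / terabytes


self_hosted_per_tb = cost_per_tb(SELF_HOSTED_USD, TB_PER_YEAR)
cloud_per_tb = cost_per_tb(ENTERPRISE_CLOUD_USD, TB_PER_YEAR)
cloud_premium = ENTERPRISE_CLOUD_USD / SELF_HOSTED_USD

print(f"Self-hosted:      ${self_hosted_per_tb:,.0f} per TB per year")
print(f"Enterprise cloud: ${cloud_per_tb:,.0f} per TB per year")
print(f"Cloud costs about {cloud_premium:.1f}x as much")
```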

Digital Archive for the Elderly: Facilitating Old-Fashioned Storytelling.  Jed Lau, Memoir Tree.
-          Software engineer
-          You have 100% data loss when a person passes away
-          You have a lot of time to scan photos, but not a lot of time to talk to people
-          Digital archiving is meant to be collaborative, it is a storytelling process meant to pull family members into the process
-          Don't let technology get in the way of the story
-          Protect privacy (naturally), need the ability to packetize audio files and control permissions on sections- granular permissions in an audio file, breaking it into segments with varying privacy settings.
-          Audio becomes metadata, or the photo?
-          This is more about storytelling than archiving
-          Storytelling platform is not exclusive to the elderly
-          Video screen takes away from the interview experience - not implementing video upload because it gets in the way of conversation.

Every House has a History.  Stacy Colleen Kozakavitch. Digital archaeologist
-          1910 home reveals small treasures each time there is a small earthquake
-          Last quake found newspaper clipping from 1947
-          Leads into historical building research; Archaeologist and Library collaborator; researches, writes, designs, publishes customized histories of houses.
-          Wants to start business at which she researches people’s homes
-          Look at special context of where people are living and how they live there
-          Uses many easily available tools to gather extremely detailed historical info
-          Presentation shows that even if the data is open, it still takes a person to make the story and put it together.

Modeling the economics of long-term storage.  David S. H. Rosenthal, Stanford University.
-          LOCKSS Program- Lots of Copies Keep Stuff Safe
-          an international community initiative that provides libraries with digital preservation tools and support so that they can easily and inexpensively collect and preserve their own copies of authorized e-content.
-          Business Models:

o   Rent

o   Monetize the content

o   Endow the data

-          Discounted Cash Flow

o   Costs now vs. future costs?

o   What interest rate to use?

o   This is standard technique investors use, what could go wrong?

-          Statistically sig evidence of short-termism in pricing of companies’ equities
-          This is true across all industrial sectors
-          DCF doesn’t work in theory OR in practice!
-          Fact that present value of actions that affect the far future can shift from a few percent to infinity when we move from a constant interest rate to a geometric random walk calls seriously into question many well regarded analyses of economic consequences of global warming…
-          If Kryder's law keeps going no one will keep drives longer than 5 years anyway.
-          Considering everyone needs storage, why aren’t more people addressing the analysis problem?
-          Tapes are still cheaper in terms of raw media costs, and will continue to be for about 5 more years- operational cost of tape significantly higher than operational cost for disks
-          → I'm David Rosenthal, and this is a place to discuss the work I'm doing in Digital Preservation.

-          Mark H. Kryder and Chang Soo Kim. “After Hard Drives - What Comes Next?” IEEE Transactions on Magnetics, Vol 45 # 10, Oct 2009.


-          Brewster- we try to endow things by owning them
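For readers who have not met Discounted Cash Flow before, the sketch below shows the textbook constant-interest-rate version that the talk argues is unreliable. It is not Rosenthal's model. All inputs (yearly cost, time span, interest rates, rate of price decline) are hypothetical and chosen only to show how much the "endowment" depends on the interest rate you pick.

```python
from typing import Sequence


def present_value(cash_flows: Sequence[float], rate: float) -> float:
    """Discount a series of yearly costs (year 0 first) at a constant rate."""
    return sum(cost / (1.0 + rate) ** year for year, cost in enumerate(cash_flows))


def endowment(annual_cost: float, years: int, rate: float, price_drop: float) -> float:
    """Up-front sum needed to pay for storage over `years` years.

    `price_drop` is the assumed yearly fractional decline in storage cost
    (a Kryder's-law-style assumption); `rate` is the assumed interest rate.
    """
    flows = [annual_cost * (1.0 - price_drop) ** year for year in range(years)]
    return present_value(flows, rate)


# Hypothetical inputs: $100 in year 0, prices falling 20% a year, 100 years.
for rate in (0.01, 0.03, 0.05):
    print(f"rate={rate:.0%}  endowment=${endowment(100.0, 100, rate, 0.20):,.2f}")
```

Changing `price_drop` to a smaller number shows the other half of the problem: if storage prices stop falling as quickly as Kryder's law predicts, the endowment needed grows quickly.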
Lightning Talks:

Anarchive: A Performative archive within the SummerLAB’11.  Jo Ana Morfin-Guerrero, University of Bristol.
-          SummerLAB a meeting place for creators, hackers and artists from all over Spain who principally work in free libre software (FLOSS) contexts.

Singly: An Open Source personal data platform.  Matt Zimmerman, Singly.
-  CONNECT all of your data into a single structured place - personal photos, places, links, contacts and more. CREATE new experiences against one API without writing any server-side code, SHARE your apps. Easy to publish. Easy to distribute.

Personal Digital Photography and the Implications of Selective Positive Representation.  Eric C. Cook, University of Michigan.
-          Need for awareness of patterns of representation in the design and analysis of personal photo archives
-          Importance of individual context and interpretation points to limits for automatic classification and quality assessment
-          Selective positive representation provides counterpoint to lifelogging perspectives

Deep Personal Significance: Computer Gaming & the Notion of Significant Properties, Jerome McDonough, UIUC. (will try to get contact info)
-          Focus on digital preservation of complex media objects
-          Preserving Virtual Worlds 2
-          Significant Properties Defined- “Significant properties are those properties of digital objects that affect their quality, usability, rendering, and behaviour.” (Margaret Hedstrom & Cal Lee, 2002)
-          Digi pres community is operating under assumption that our job is to maintain significant properties of data in our care
-          Hidden assumptions are driving decisions on what significant properties are.