
Book digitization

Sustainability: a critical challenge of digitization

Sustainability raises many concerns, starting with the demagnetization of storage media. The poor reliability of storage media and the rapid evolution of formats (including the appearance of proprietary formats) make the preservation of physical files a real problem. Preservation follows two approaches: backup and archiving.

  • Backup is the duplication of data.
  • Archiving refers to preservation and duplication over the long term.
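The distinction can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a description of how any archiving system works: the function names (`sha256_of`, `back_up`, `is_intact`) and the file name are hypothetical, and the use of a SHA-256 checksum to detect silent corruption is an assumption added here, not something the text prescribes. The backup step duplicates a file; the archiving-style check verifies later that the copy still matches a digest recorded when the file was created.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_dir: Path) -> Path:
    """Backup: duplicate the file into backup_dir and return the copy's path."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, backup_dir / source.name))


def is_intact(copy: Path, recorded_digest: str) -> bool:
    """Long-term check: does the copy still match the digest recorded earlier?"""
    return sha256_of(copy) == recorded_digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file standing in for a digitized page.
        original = Path(tmp) / "scan_0001.txt"
        original.write_text("page one of a digitized book")
        recorded = sha256_of(original)          # stored alongside the archive
        copy = back_up(original, Path(tmp) / "backup")
        print(is_intact(copy, recorded))        # re-run periodically over the years
```

A plain copy answers the backup need; re-running the integrity check over time is one simple way to approach the long-term concern that archiving adds.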

A few examples:

  • OAIS (Open Archival Information System): a reference model for digital document management, used mainly by professionals.
  • LOCKSS: used by smaller institutions such as libraries. They can place their data in a network and copy each file as many times as there are participants.
  • CLOCKSS: similar to LOCKSS, but allows more control over the network of participants.

The first digital libraries

After Project Gutenberg, which was soon followed by other projects of the same type based on copying texts by hand, more ambitious projects emerged, in which works reproduced as scanned images are made available on the Internet or on an intranet.

The risk of a Google monopoly on book scanning

Google Books

Google Books work at the University of Michigan (https://commons.wikimedia.org/wiki/File:Google_Book_Search_-_notice_board_at_michigan_university_library.jpg?uselang=fr)

The various book duplication projects conducted until then were disrupted by Google Books, a project to scan humanity's written heritage in image mode and process it with OCR. Seven million books have now been scanned in various libraries.

To understand Google's strategy, it is useful to distinguish three legal periods in the life of a book: the period in which it is no longer under copyright; the period in which it is under copyright (held by the author or an assignee); and, in between, a gray area in which the work is still covered by copyright but it is unclear whether any claimant is able to assert their rights. Many works are in effect orphans, and this gray area is of great interest. Google decided to revive these untapped works.

Publishers therefore sued Google. In the US, an agreement was reached on keeping a register for managing the rights to orphan works, managed and funded by Google (which has compensated many publishers). This agreement is still not operational.

Google's attempt to monopolize digitization highlights the problem of intellectual property and copyright on the Web. When a contract is drawn up between an author and a publisher, it cannot simply refer to “all media”: each medium must be named (paper, CD-ROM, Internet, etc.). Any medium not mentioned is excluded from the contract. Contracts for published works are therefore being updated to cover Internet rights. Authors have not adopted a clear position, so this is an opportunity to renegotiate contracts, particularly as regards their duration. The licensing regime appeared with the Internet, because of the revolution brought about by network access. Although there are copyright exceptions for people with disabilities and for teaching and research, without these licenses the exchanges between authors and the people who want to use their works would break down, because using a work without its author's authorization is forbidden by law.

Google Books offers several social features for interacting with other users or with the creators of the site:

  • It is possible to report anomalies, which enables Google to improve the quality of its data (which initially left much room for improvement).
  • It is possible to leave a review for other readers.
  • It is possible to share a link to the document by message or on a website (through generated code).
  • It is possible to select a passage of text and share it in the same way.

Some examples of initiatives challenging the Google Books monopoly:

The Open Content Alliance

This non-commercial project has scanned a million books with the permission of the rights holders.

For now, Google has won the battle for speed and for the number of publications online, notably by leveraging the computing capacity of its data centers: thousands of server farms spread all over the world, which also make possible operations such as translation and data mining.


Gallica seems to be considering a partnership with Google.


A European virtual library is being built, and it has announced the digitization of three million works.


Universal Library: inactive since 2002, with hundreds of books available.


55,715 copyright-free texts.
