As the previous
feature on open content noted, the need for an appropriate license was felt
from the earliest days. Strangely, it was not Richard Stallman who filled this
gap: even though the GNU General Public License dates back to 1989, it was only
in 2000 that the corresponding
GNU Free Documentation License was created. As a result, the honor for the
creation of the first formal non-software open license goes to David Wiley.
In the summer of 1998, Wiley had joined the graduate program in Instructional
Psychology and Technology at Brigham Young University, where he began doctoral
work on “learning objects” - small-scale, reusable computer-based educational
materials designed to be used in a variety of settings. This was just a couple
of months after the term “open source” had been devised at the Freeware Summit,
and Wiley realized that what was needed was a kind of open source for instructional
content.
He contacted people like Richard Stallman and Eric Raymond to ask their advice,
and drew up his first license in July 1998. Wiley decided to call his approach
“open content” - a term which he seems to have been the first to use consistently.
For Stallman, the idea of “open” as opposed to “free” is anathema, and he also
refuses to refer to works as “content”, so ultimately he wanted nothing to do
with this new “OpenContent
License”, even though he and Wiley had previously worked together in an
attempt to tweak the GNU GPL for content. Raymond, by contrast, was an important
influence on the fledgling open content idea, as the following
passage from the newly-created Opencontent.org site indicates:
OpenContent advocates adoption of the principles Eric S. Raymond outlines
in his essay “The Cathedral and the Bazaar” for use in the development of
Content. ... The Bazaar model for Content development will bring these same
benefits to online instructional content; namely the creativity, expertise,
and problem-solving power of a potentially infinite team of instructional
designers and subject matter experts. A development effort of this kind
will fill the Internet with high quality, well-maintained, frequently updated
Content.
More input was provided by Tim O'Reilly and Andy Oram, making the license
more palatable to publishers so that online versions of printed books and journals
could be distributed for free. The result was the
Open Publication License (OPL), released in June 1999. Appropriately enough,
Raymond's “Cathedral and the Bazaar” was released under the OPL (as was his
“Brief History of Hackerdom”). A number of other books, mostly in the field
of computing, adopted the license, including
GTK+/Gnome Application Development by Havoc Pennington, and
Grokking the GIMP, by Carey Bunks. It was also adopted for Bruce Perens'
Open Source Series, published by Prentice Hall.
Although the OPL led to a modest increase in open content being made available,
the license still had some problems. One was that it came in four versions –
OPL, OPL-A, OPL-B and OPL-AB – according to which, if any, of two optional clauses
was included. These dealt with the thorny issues of “substantively modified
works” and whether the work or derivatives of it could be published in book
form for commercial purposes. The combinations obviously made it harder to be
sure what exactly an OPL license permitted, and meant that users were forced
to refer to the license to find out what their rights were. What was needed
was some legal input to produce a series of open content licenses that clearly
delineated what could and could not be done with them.
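To see why two optional clauses produce four variants, here is a minimal sketch in Python. The one-line clause descriptions and the mapping of the letters A and B to those clauses are assumptions made for illustration, not quotations from the license text, and opl_variants is a hypothetical helper name.

```python
from itertools import product

# Assumed mapping of option letters to the two optional clauses (illustrative only).
OPTIONS = {
    "A": "restricts substantively modified versions",
    "B": "restricts commercial publication in book form",
}

def opl_variants():
    """Return every license variant name with the optional clauses it includes."""
    variants = {}
    for flags in product([False, True], repeat=len(OPTIONS)):
        chosen = [key for key, on in zip(OPTIONS, flags) if on]
        name = "OPL-" + "".join(chosen) if chosen else "OPL"
        variants[name] = [OPTIONS[key] for key in chosen]
    return variants

for name, clauses in opl_variants().items():
    print(name, clauses)
```

Each extra optional clause would double the number of variants, which is one way to see why readers had to consult the license text itself to know their rights.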
Fortunately, in the second half of the 1990s, a group of lawyers were becoming
increasingly interested in the interrelated issues of copyright, intellectual
property, digital content and the public domain. Pioneers here include Pamela
Samuelson, James Boyle and Yochai Benkler. But the person who has become most
closely associated with this whole area is undoubtedly
Larry Lessig.
He rose to prominence with his book “Code
and other laws of Cyberspace”, which asserted that the Net's software codes
necessarily implied legal codes. From this early interest in architectures and
their growing power to affect everyday life, Lessig's focus gradually shifted
back to the legal domain, where he sought to counter the threats posed by the
music and film industries to the new creative possibilities opened up by the
Net.
His first attempt at a solution was the creation of
Copyright's Commons in 1999, “a coalition devoted to promoting the public
availability of literature, art, music, and film.” Its principal instrument
was the use of what it called “counter-copyright”,
which “strips away the exclusivity that a copyright provides and allows others
to use your work as a source or a foundation for their own creative ideas. The
counter-copyright initiative is analogous to the idea of open source in the
software context.”
When Copyright's Commons became involved in the
Eldred vs. Ashcroft lawsuit – which tried to block the extension
of US copyright by 20 years – it also pioneered what it called “openlaw”, where
legal arguments were posted online for open discussion.
It was Lessig who argued the Eldred vs. Ashcroft case in court – and
lost, much to his
chagrin. A more positive outcome from this work was the creation of a second,
more ambitious, organization called
Creative Commons, and the drawing up of a series of formal open content
licenses. Like Wiley's Open Publication License, these
Creative Commons licenses allow several options. While this lends them great
flexibility, it also means that there is now a confusing array of Creative Commons
licenses. Indeed, Richard Stallman no longer supports the Creative Commons project
because not all of these licenses meet his requirements for freedom.
Despite Stallman's concerns, there is no doubt that the Creative Commons
licenses have transformed the open content scene. They offer creators a range
of rigorous licenses that have been drawn up by lawyers with a deep understanding
of the issues of copyright in the Net age. An important recent court case in
the Netherlands has
confirmed their legality, at least in that jurisdiction.
Wiley's original licenses were created for educational materials, and among
the first applications of the Creative Commons licenses were two major open
content projects in the field of what has come to be called open courseware,
both funded by the
Hewlett Foundation. Just as open source avoids re-inventing the wheel by
building on existing code, so open courseware aims to save time, effort and
money by making educational material freely available for others to re-use,
extend and improve.
The first such project,
Connexions,
came from Rice University. It was the brainchild of Richard Baraniuk, professor
of electrical engineering, who was directly inspired by the example of open
source. Connexions uses a content creation platform called Rhaptos, which is
released under the GNU GPL. The other major open courseware project came from
MIT. One of the people behind the
OpenCourseWare idea – which arose out of an earlier failed attempt to make
money from selling MIT courses online – was Hal Abelson, who is also one of
the founders of Creative Commons. This joint involvement simplified the issue
of licensing, something that was a major issue for Rice initially, until it
too adopted a Creative Commons license.
MIT does not use an open source platform, but David Wiley has started a project
called
eduCommons, based on
Plone,
that offers this facility. Another of his free software projects, called
Open Learning Support, and now part of eduCommons, provides Rice's Connexions
and MIT's OpenCourseWare with online discussion boards. Baraniuk, for his part,
is working on a range of ancillary open source software, including systems to
aid translation, and a rating system for courses. It is also worth mentioning
the free software course management package
Moodle,
which is widely used around the world, and
Sakai, a similar project, funded by the Hewlett Foundation.
Although both Connexions and OpenCourseWare allow course materials to be
modified, they do not make any provision in their platforms for true collaborative
development. The final article in this short series will explore how this issue
has been addressed by open content projects.
Glyn Moody writes about open source and open content at
opendotdotdot.
Posted Apr 27, 2006 6:42 UTC (Thu) by subscriber tzafrir
> [...] there is no doubt that the Creative Commons licenses
> have transformed the open content scene. They offer
> creators a range of rigorous licenses that have been
> drawn up by lawyers with a deep understanding of the
> issues of copyright in the Net age.
Despite being drawn up by experienced lawyers, and despite
the several versions the CC licenses have had so far, they still
seem to fail to comply with the Debian Free Software Guidelines. The
GFDL has basically the same problem. Some of the
issues involved seem to be quite practical (e.g. overly strict
anti-DRM clauses may cause problems when storing the file on
an encrypted filesystem).
Version 2.0 of basically all the CC licenses shares those
problems. See
http://people.debian.org/~evan/ccsummary . That link seems
to summarize the discussions on the debian-legal mailing list from
April 2004.
I can't find any later source, though the wording of the
relevant clauses in 2.5 has remained practically the same. Other
people I have asked seem to believe that those issues still
stand. But IANAL, and probably none of them is either.
Any newer and more authoritative opinions?
open content licensing and the DFSG
Posted Apr 27, 2006 15:47 UTC (Thu) by subscriber smoogen
I doubt they have changed. At this point, I think the Debian people need
to come out with a license that meets their needs and that writers can then
follow.
Freedoms of users of works
Posted Apr 27, 2006 23:13 UTC (Thu) by subscriber bignose
Part of the problem seems to be that artistic or informative works are many
years behind the "mind share" of required freedoms that programs currently enjoy.
It's no longer the case with program authors that they find the ideas of the
GPL to be foreign, but this is commonly the case with other types of works.
Authors seem to seek the CC licenses that prevent commercial redistribution,
or prevent derivative works. Musicians seem more enlightened about derivative
works, but still commonly want to prevent commercial redistribution. Artists
of graphical works are commonly not prepared to share the "source" of the graphical
work, so that others can work with it.
This is very similar to the mental landscape faced by free software twenty
years ago. A core group was trying to educate copyright holders about the benefits
of giving users of their programs the four freedoms enumerated by the FSF. It
took much patience and much working against deep-seated fallacies to bring the
majority to the view that at least it's not *crazy* to give up so much control,
even if one doesn't choose to do so oneself.
Sadly, the FSF seem to be themselves stuck near the beginning of this curve;
they espouse the view that users of some kinds of useful information (programs)
are more deserving of freedom than users of other kinds (e.g. books), with the
result that they promote a license for books that is more restrictive to its
recipients than the license they promote for programs.
It seems creators of written, graphical, audio, and other creative
works need to go through an education period similar to the one software authors have
been through.
Posted Apr 27, 2006 20:50 UTC (Thu) by subscriber k-squire
If you're interested, you can check out a recent
Google Tech Talk presentation presented by the Connexions people.
I recently saw this presentation elsewhere, and was quite impressed. They have
gotten to the point where they can take online textbook-quality material and
produce a bound copy for a fraction of what textbooks cost today.
Their content coverage is a little uneven--lots of Electrical Engineering, Bioinformatics,
and Music, little Computer Science. But there's quite a bit there, almost a
critical mass in some areas. Good stuff!
Kevin
Harnad
Last week we
highlighted Roberto
Casati's contribution to the online text-e symposium about the book, print and
reading in the digital world. The latest contribution is a
paper on Skyreading
and Skywriting for Researchers: A Post-Gutenberg Anomaly and How to Resolve
It by publishing gadfly Stevan Harnad.
Harnad's distinguished by passion and ingenuity in a crusade to free publishing,
in particular scientific journals, from the clutches of commercial publishers.
His latest paper revisits arguments made in the often heated debates about scholarly
publishing, for example in the Nature online forum highlighted in our Electronic
Publishing
guide.
He argues that
There will be a profound and fundamental
dividing line in the PostGutenberg Galaxy, between non-give-away work (books,
magazines, software, music) and give-away work (of which the most important
representative is refereed scientific and scholarly research papers).
It is the failure to make this distinction that causes so much confusion,
and that is delaying the inevitable transition of the give-away work to
what is the optimal solution for scholars and scientists: that the annual
2,000,000+ articles in all 20,000+ refereed journals across disciplines
and languages and around the world should be freed on line through author/institution
self-archiving: http://www.eprints.org. ... questions about copyright, peer
review and other controversial issues can be clarified if the give-away/non-give-away
distinction is made.
Works such as Towards Electronic Journals:
Realities for Scientists, Librarians & Publishers (Washington: Special Libraries
Association, 2000) by Carol Tenopir & Donald King or their recent Lessons For
the Future Of Journals
paper suggest
that Harnad's polemic is overstated but if you are grappling with issues as
an author, reader, publisher or custodian it's worth a look.
We suggest that you read his paper in conjunction with Tenopir & King's
responses to criticisms
of their Towards Electronic Journals study.
OCLC to the rescue?
Last month we
noted worries about
the apparent collapse of netLibrary, one of several dot-coms that crashed and
burned after problems in the online college library market. Critics speculated
that institutions might be left without access to the texts once the smoke cleared.
Dublin (Ohio) based OCLC has now
announced a bid for
netLibrary, accepted in principle but to be approved by a Colorado bankruptcy
judge. OCLC will continue to provide access - for a fee.
And in line with recent dot-crashes, netLibrary is being sued by investors who
claim that they were deceived about its finances.
Etext standards
The US National Information Standards Organization (NISO)
is encouraging development of a Digital Talking Book Standard (DTBS)
to ensure compatibility among competing systems for formatting and providing
audio access to text.
As we've discussed in our Accessibility
guide, many surfers
with poor/no vision rely on speech readers - facilities that convert onscreen
text to a synthesized voice. For most people that's more effective than a device
that provides a braille output.
Unfortunately, most readers have difficulty in dealing with many web pages -
one reason why structure and tools such as ALT tags are important - and are
incompatible with the proprietary systems used in ebook devices. An exposure
draft of the proposed standard was released by NISO earlier this year.
It suggests that the structure of a digital talking book should consist of three
elements:
an audio file, coded using several standard formats
a text file with XML tags for word spelling and text searches
an integrative file in Synchronized Multimedia Integration Language (SMIL) to synchronize the audio and text elements.
About a year ago, Google announced a project to digitize large numbers of books
from five research libraries. Dubbed “the Google Five,” the University of Michigan,
Harvard, Stanford, Oxford, and the New York Public Library signed an agreement
with Google to provide portions (or, in the case of Michigan, all) of their
collections to Google to be digitized. A year later we still don't know much
more about their procedures, but now Google is being sued for digitizing material
under copyright while out-of-copyright books are beginning to appear on the
Google Print web site.
By contrast, a similar initiative was recently announced
about which we already know much more. Maybe that's why it's called the Open
Content Alliance (OCA), put forward by the Internet Archive, Yahoo!, and a number
of large libraries, including my employer, the California Digital Library. Microsoft
shortly thereafter announced support as well, and additional libraries likely
will join. Yahoo!, Microsoft, and the libraries themselves are paying the Internet
Archive to digitize materials at 10¢ a page—an excellent price for nondestructive
scanning. The resulting files will be made available at the Internet Archive
web site and likely at other locations.
Open and accessible
Since the OCA is focusing on out-of-copyright material, it is dodging the
legal fight that Google is taking head-on. This means that all OCA content will
be viewable in its entirety online. But the project goes further. The digitized
files and their associated metadata will be available for complete downloading,
thereby allowing anyone to create singular presentations of this material. Some
books are already available for downloading and printing.
... ... ...
The OCA effort, unlike that of Google, is based on respect for collections
and the principles behind mass digitization of library materials. Research libraries,
writes Dan Greenstein of the California Digital Library in a draft principles
document, must “clearly and unambiguously begin articulating what public goods
are served by massive digitization of their holdings,” plus “articulate and
agree to adhere to a set of principles” to ensure that the resulting products
“support and promote these public goods.”
It's unclear whether the OCA project will rival the Google Library project
in size. Since it is easier for organizations to participate, the OCA will easily
have more participants, but the Google project may lead in the number of digitized
volumes if it fulfills its promise. Only time will tell. In any case, more digitized
content is likely a better thing overall.
The agreement between the University of California and the Internet Archive
emphasizes that the initiative is collaborative, as both parties must agree
to a protocol that will set up procedures for, among other things, moving the
books to and from the Internet Archive digitization shop, identifying and attaching
appropriate metadata to the scanned files, and assessing the scanned files against
appropriate standards.
Collaborations among participating libraries are also likely, if for no other
reason than to minimize duplication. There are other opportunities for collaboration
and not just among OCA libraries but with the “Google Five” and many other institutions
involved with digitizing content. Open digitized content, after all, is a growing
boon to all of our libraries and the users we serve.
SAN FRANCISCO -- Search is on a mission these days.
It's no longer enough to be able to index and point
to everything that's loaded on a Web server somewhere.
Search has moved into a new era in which content owners
and search providers are hustling to digitize
information moldering on the shelf.
"The World Wide Web gives us access to more
information, but almost everything on the net has been
written since 1996," said Brewster Kahle, founder of the
Internet Archive. "I think folks before 1996 also had
something to say."
Kahle spoke to an audience of librarians and
journalists for the kick-off of the Open Content
Alliance (OCA), a group with a plan to scan as many
out-of-print books as possible, then work up the chain
toward books under copyright. The OCA was
announced on October 3.
The digitized books will be made openly available for
search on the Internet Archive Web site or through other
search services.
"Having an open library allows different projects to
build new and different interfaces without having to ask
permission," Kahle said.
The archive created an elegant reading interface that
uses a page-turning metaphor. Entering a term in the
search query box produces a yellow tab on each page on
which the term is found. Clicking on a tab takes the
user to that page, where the term is highlighted in
yellow.
At the event, held in San Francisco's Presidio,
Kahle's staff demonstrated the "scribe station," a
system for scanning books that he said would cost around
ten cents a page. The system uses a 16-megapixel digital
camera that produces images at 500 DPI. Software color
corrects the images and provides thumbnails so the
operator can make sure all of the pages have been
scanned.
The OCA has more software to help determine whether a
particular book might be under copyright, and if so, to
connect with another database created in partnership
with libraries to find the copyright holder. "Copyright
issues are tricky, but they're doable," Kahle said.
Rather than shipping books to a central location for
scanning by a single company, OCA members will for the
most part handle their own scanning, then upload the
digitized documents to the archive.
OCA membership includes prestigious libraries and
research institutions that have pledged to digitize
priceless collections and make them available for
search.
For example, the Smithsonian Institution will
contribute its current digital collection and work to
digitize materials with a focus on history, culture and
biodiversity. The Missouri Botanical Garden will scan
rare botanical prints and books kept under lock and key
in its archives. The Natural History Museum of London,
the New York Botanical Garden and Royal Botanical Garden
of London will contribute materials, as will the
libraries of Columbia, Emory and Johns Hopkins
Universities.
While Yahoo was a founding member of the OCA, and MSN
announced its membership at the event, Google
was conspicuously absent. Google is being
sued by the Association of American Publishers and
the Authors Guild for scanning library books without the
consent of their copyright holders. (Google says its
activities fall under fair use principles.)
Founding OCA members are The Internet Archive, Yahoo!
Inc., Adobe Systems Inc., the European Archive, HP Labs,
the National Archives (UK), O'Reilly Media Inc.,
Prelinger Archives, the University of California, and
the University of Toronto. Fourteen new members were
announced at the event.
Several new OCA members said they wanted to make sure
that these public troves of knowledge remained owned by
the public.
Daniel Greenstein, executive director of the
University of California's California Digital Library,
said, "We want to make sure these works don't become
commodified."
Doron Weber, director of the Sloan Foundation's
programs for public understanding of science and
technology and history of science and technology, said,
"We cannot risk having world knowledge privatized. We
believe an open, non-proprietary approach is better. To
private companies, we say, 'Rein in your impulses.'"
SEATTLE - Microsoft Corp.
is diving into the business of offering online searches of books and other writings,
and says its approach aims to avoid the legal tussles met by rival Google Inc.
The Redmond-based software
giant said Tuesday that it will sidestep hot-button copyright issues for now
by initially focusing mainly on books, academic materials and other publications
that are in the public domain.
Microsoft plans to
initially work with an industry organization called the Open Content Alliance
to let users search about 150,000 pieces of published material. A test version
of the product is promised for next year.
The alliance, whose participants
also include top Internet portal Yahoo Inc., is working to make books and other
offline content available online without raising the ire of publishers and authors.
Danielle Tiedt, a general
manager of search content acquisition with Microsoft's MSN online unit, said
the company also is working with publishers and libraries on ways to eventually
make more copyright material available for online searches.
She said Microsoft is looking
at several options, including models where users would be charged to access
the content.
Microsoft said it has no
plans right now to have targeted ads located in the search results, but the
company cautioned that it was still working out the details of its business
model. "I think about the 150,000 books as a test," Tiedt said.
(MSNBC is a Microsoft -
NBC joint venture.)
Rival Google has
taken a markedly different approach, with plans to index millions of copyright
books from three major university libraries — Harvard, Stanford and Michigan
— unless the copyright holder notifies the company which volumes should be excluded.
The Association of American
Publishers, representing five publishers, and The Authors Guild, which includes
about 8,000 writers, have both sued the search engine giant over the plans.
Google has defended the
effort as necessary to its goal of helping people find information — and insists
that its scanning effort is protected under fair use law because of restrictions
placed on how much of any single book could be read.
Responding to Microsoft's
plans to offer its own book search, Google said in a statement that it "welcomes
efforts to make information accessible to the world."
Tiedt said Microsoft is
coming at book search from a different angle in part because the software maker
itself is so often the target of copyright infringement. Pirated versions
of Microsoft's Windows operating system are widely available in developing countries
for only a few dollars.
Microsoft's approach has
the potential to backfire, however, if Google ends up having more content available
or begins offering ways to search content for free while Microsoft pursues a
model that requires people to pay for it.
Microsoft acknowledges
it is far behind Google. Tiedt said she expects it will take years — and
require a substantial investment — to solidify the MSN product, working out
all the complex issues around searching through books and other materials online.
"This is not a money-maker
for the company," Tiedt said. "This is very much a strategic bet for search
overall."
The move marks Microsoft's
latest effort to play catch-up with Google on various search technologies ranging
from basic Internet search to localized queries.
But Google remains the
search leader by far, accounting for 45.1 percent of all U.S. Internet searches
in September, according to Nielsen/Net Ratings. Microsoft's MSN Search
ranked third, accounting for 11.7 percent of U.S. searches during the same period.
Internet powerhouse Yahoo Inc. is setting out to
build a vast online library of copyrighted books that pleases publishers--something
rival Google Inc. hasn't been able to achieve.
The Open Content Alliance, a project that Yahoo
is backing with several other partners, plans to provide digital versions of
books, academic papers, video, and audio. Much of the material will consist
of copyrighted material voluntarily submitted by publishers and authors, said
David Mandelbrot, Yahoo's vice president of search content.
Other participants in the alliance, which was
announced Oct. 3, include Adobe Systems Inc., Hewlett-Packard Co., the Internet
Archive, O'Reilly Media Inc., the University of California, and the University
of Toronto.
Although Yahoo will power the search engine,
located at the Open Content Alliance
web site, all
of its content reportedly will be made available so it can be indexed by other
major search engines, too, including Google's.
By joining the project, Sunnyvale, Calif.-based
Yahoo is hoping to upstage Google, which has a one-year head start on scanning
and indexing books so more literature and academic research can be accessed
with an internet connection from anywhere in the world.
"My feeling is we are doing something new here,"
Mandelbrot said. "We are building a collaborative effort that will make a great
deal of copyrighted material available in a way that's acceptable to the creators.
That is novel."
The alliance won't include any copyrighted material
unless it receives the explicit permission of a publisher or author. That restriction
means the alliance is bound to be missing much of the material available in
brick-and-mortar libraries.
In an effort to be as comprehensive as possible,
Google plans to index millions of copyrighted books from three major university
libraries--Harvard, Stanford, and Michigan--unless the copyright holder notifies
the company by Nov. 1 about which volumes should be excluded from the search
engine index.
Google's opt-out provision has outraged many
publishers, who contend the company is flouting long-established copyright laws.
The Authors Guild Inc., which represents about 8,000 writers, sued Google for
copyright infringement last month (see "Authors:
Google infringing on copyrights"). Google maintains its scanning represents
"fair use" allowed under the law because it allows web surfers to view only
excerpts from copyrighted books.
Some of the most strident critics of Google's
library project are endorsing the Open Content Alliance, or OCA.
Patricia Schroeder, president for the Association
of American Publishers, described the alliance's approach as "very encouraging."
Sally Morris, chief executive for the Association
of Learned and Professional Society Publishers, said she hopes Google follows
the alliance's example. "The OCA's model of allowing rights holders to control
which of their works are opened up ... and where they are hosted may encourage
others to do so."
Google also applauded the Yahoo-backed alliance.
"We welcome efforts to make information accessible to the world," the company
said.
Everyman
Something fishy with Google library project
Something fishy is going on.
In the NYT on December 14, 2004, "Google Is Adding Major Libraries to Its
Database," by John Markoff and Edward Wyatt:
"Each agreement with a library is slightly different. Google plans to digitize
nearly all the eight million books in Stanford's collection and the seven
million at Michigan."
...
"At Stanford, Google hopes to be able to scan 50,000 pages a day within
the month, eventually doubling that rate, according to a person involved
in the project."
____________
50,000 pages a day is 2,083 pages per hour.
Let's double this rate, as Google will do "eventually," and call it 4,167
pages per hour. How many years will it take to do 8 million x 200 pages
per volume?
8 million x 200 = 1,600,000,000 pages to be scanned.
1,600,000,000 / 4,167 = 383,969 hours to scan Stanford's library at the
speed they hope to attain "eventually."
Let's run 24 hours a day (three shifts of temp workers at minimum wage!)
and assume that the wizards at the Googleplex will never have any down time.
How many days is this? 383,969 / 24 = 15,999 days.
How many years is this? 15,999 / 365.25 = 43.8 years. Even their cookie
won't last that long!
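For anyone who wants to re-run the numbers, here is a minimal Python sketch of the Stanford estimate above. The inputs are this post's own assumptions (8 million volumes, 200 pages per volume, the "eventual" 100,000 pages per day, scanning around the clock), not confirmed Google figures, and years_to_scan is just a hypothetical helper name.

```python
# Back-of-envelope check of the Stanford estimate.
# All inputs are assumptions taken from the post, not confirmed figures.
DAYS_PER_YEAR = 365.25

def years_to_scan(volumes, pages_per_volume, pages_per_day):
    """Years of nonstop scanning needed at a constant daily rate."""
    total_pages = volumes * pages_per_volume
    days = total_pages / pages_per_day
    return days / DAYS_PER_YEAR

# 8 million volumes, 200 pages each, at the doubled rate of 100,000 pages/day
stanford_years = years_to_scan(8_000_000, 200, 100_000)
print(f"{stanford_years:.1f} years")
```

Working from the unrounded daily rate gives 16,000 days rather than 15,999, which still comes to about 43.8 years. Changing pages_per_volume shows how sensitive the estimate is to that guess.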
________________
But there's another army of temp workers at Michigan. Let's look at the
Michigan figures. According to University of Michigan librarian John Wilkin,
as reported in the Detroit Free Press on December 14 by columnist
Mike Wendland:
7,000,000: Volumes in the U-M library to be digitized.
2,380,000,000: Estimated number of pages.
Hold it right there, Mr. Librarian! Are you saying that each volume has
an average of 340 pages? Well okay, you're the librarian!
I have to adjust my Stanford figures. I assumed 200 pages per volume for
8 million volumes. If it's really 340 pages per volume, then the Stanford
project will take 1.7 times longer. Instead of 43.8 years, Stanford will
take 74.46 years! (Two back-to-back cookies are needed!)
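The Stanford estimate above can be re-run as a short script. This is only a sketch of the arithmetic in this post: the function name `years_to_scan` and its parameters are made up for illustration, and the inputs (8 million volumes, 200 or 340 pages per volume, 100,000 pages a day around the clock with no down time) are this post's assumptions, not figures confirmed by Google or Stanford.

```python
DAYS_PER_YEAR = 365.25  # same year length used in the post


def years_to_scan(volumes, pages_per_volume, pages_per_day):
    """Years needed to scan a collection at a constant, nonstop daily rate."""
    total_pages = volumes * pages_per_volume
    days = total_pages / pages_per_day
    return days / DAYS_PER_YEAR


# Stanford at the "eventual" doubled rate of 100,000 pages per day.
stanford_200 = years_to_scan(8_000_000, 200, 100_000)
stanford_340 = years_to_scan(8_000_000, 340, 100_000)

print(f"200 pages/volume: {stanford_200:.1f} years")
print(f"340 pages/volume: {stanford_340:.1f} years")
```

Run as written, this gives roughly 43.8 and 74.5 years. The 74.46 quoted above is slightly lower only because it multiplies the already-rounded 43.8 by 1.7.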
Then Mr. Wilkin goes on to say, "Going as fast as we can with the traditional
means of doing this, it would take us about 1,600 years to do all 7 million
volumes," he said. "Google will do it in six years."
Wow, I'm impressed. Google really is God. What's the scan rate for 7 million
volumes over 6 years, if you run around the clock?
7 million x 340 pages per volume = 2,380,000,000 pages
6 years = 365.25 x 24 x 6 = 52,596 hours
scan rate = 2,380,000,000 / 52,596 = 45,251 pages per hour
For 24 hours, that comes to 1,086,024 pages per day. Now remember at Stanford,
Google will "eventually" double the rate of 50,000 per day, which means
100,000 per day when they do this. Recall from above that this means 4,167
pages per hour.
In other words, even running full-speed 24 hours per day, the scan rate
Google will have to achieve at Michigan to pull it off in six years is
10.86 times the rate they will "eventually" achieve at Stanford.
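The Michigan comparison can be checked the same way. Again, this is a sketch under this post's assumptions (7 million volumes at 340 pages each, six years of nonstop scanning, and an "eventual" Stanford rate of 100,000 pages per day); the helper name `required_pages_per_hour` is hypothetical.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8,766 hours, matching the post


def required_pages_per_hour(total_pages, years):
    """Constant nonstop hourly rate needed to finish within the given years."""
    return total_pages / (years * HOURS_PER_YEAR)


michigan_pages = 7_000_000 * 340  # 2,380,000,000 pages
michigan_hourly = required_pages_per_hour(michigan_pages, 6)
michigan_daily = michigan_hourly * 24

stanford_daily = 2 * 50_000  # "eventual" doubled Stanford rate
ratio = michigan_daily / stanford_daily

print(f"Michigan needs about {michigan_hourly:,.0f} pages per hour")
print(f"That is about {ratio:.2f} times the eventual Stanford rate")
```

The script keeps full precision between steps, so its daily figure differs by a few pages from the 1,086,024 above, which was computed from the rounded hourly rate. The ratio still comes out at about 10.86.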
But of course, the Mike Wendland column also says this:
"The size of the U-M undertaking is staggering. It involves the use of new
technology developed by Google that greatly speeds the digitizing process.
Without that technology -- which Google won't discuss in detail -- the task
would be impossible, says John Wilkin, the U-M associate librarian who is
heading the project."
Wait a minute, the NYT piece said this:
"At least initially, Google's digitizing task will be labor intensive, with
people placing the books and documents on sophisticated scanners whose high-resolution
cameras capture an image of each page and convert it to a digital file."
...
"The company refused to comment on the technology that it was using to digitize
books, except to say that it was nondestructive. But according to a person
who has been briefed on the project, Google's technology is more labor-intensive
than systems that are already commercially available."
So their secret sauce isn't even ready for tasting! Better hurry, the clock
is ticking....
Is it possible that the NYT piece dropped a zero and the rate is really
ten times the figure they reported? I doubt it, from what I know about the
technology. If anyone thinks this is possible, the NYT will probably be
happy to check out their source again and run a correction if they goofed.
update
Google will temporarily stop scanning copyright-protected books from libraries
into its database, the company said late Thursday.
The company's
library project, launched in December, involves the scanning of out-of-print
and copyright works so that their text can be found through the search engine's
database. Google is working on the project with libraries at Stanford University,
Harvard University and other schools.
The plan has
come under fire from several groups, including publishers, who object to
what they claim are violations of their copyrights.
Google said
on its blog late Thursday that, following discussions with "publishers,
publishing industry organizations and authors," it will stop scanning in copyright-protected
books until November, while it makes changes to its
Google Print Publisher Program.
The publisher program also involves scanning
copyright books. In that program, books are scanned--at the publisher's request--to
let Web searchers view excerpts from books, critics' reviews and other book
data, with links back to publishers' Web sites or other places where the books
are for sale.
Google said it is adding new features that will
let publishers submit a list of books that, when scanned through the library
project, will be added to the publisher program. It is also adding a feature
that lets publishers present a list of books that should not be scanned through
the library project.
"We think most publishers and authors will choose
to participate in the publisher program in order (to) introduce their work to
countless readers around the world. But we know that not everyone agrees, and
we want to do our best to respect their views too," Google said on its blog.
Google was not immediately available for comment.
(Google representatives have instituted a policy of not talking with CNET News.com
reporters until July 2006 in response to privacy issues raised by a
previous story.)
But Google's move apparently did not satisfy
all publishers' concerns regarding the project.
"Google's procedure shifts the responsibility
for preventing infringement to the copyright owner rather than the user, turning
every principle of copyright law on its ear," Patricia Schroeder, CEO of the
Association of American Publishers, said in a statement.
"Many AAP members have partnered with Google
in its Print for Publishers Program, allowing selected titles to be digitized
and searchable on a limited basis pursuant to licenses or permission from publishers,"
she said. "We were confident that by working together, Google and publishers
could have produced a system that would work for everyone, and regret that Google
has decided not to work with us on our alternative proposal."
Google is digitizing entire university libraries. Book publishers haven't
decided if the Google Library Project means exposure to new readers or copyright
infringement on a massive scale. It's a question the Supreme Court may have
to decide.
In October, the search goliath announced Google Print, a program that lets
publishers work with Google to digitize books to which they hold the rights
in order to make them available for search. Google promises publishers they
can earn money when searchers click on contextual ads that appear alongside
the book pages.
But book publishers were taken aback when they heard about Google Library,
a project that had been under way since 2002 with the University of Michigan.
Harvard University, Stanford University, Oxford University and the New York
Public Library also are in the process of letting Google scan parts or all of
their collections.
Google broke the news in December, the same day print.google.com officially
went live.
The Library Project was positioned as an extension of Google Print, but some
publishers saw it as more of a collision with it.
Deals with Google were struck one publisher at a time, but they included
restrictions on the amount of material from a work under copyright that Google
could show in search results, maintaining a fair-use argument for the search
engine's use. When searchers click on a listing, they might be able to read
anywhere from several pages to only a few sentences containing the keywords.
Listings also include a shot of the book cover, links to online booksellers
and ads.
But if Google copies a library book instead of making a deal with the publisher
of that book, it's likely the publisher would be cut out of any ad revenue share.
Google could not make executives available for interviews, but John Wilkin,
associate librarian for the University of Michigan and head of its Google Library
Project, said his library had no agreement to share ad revenue with Google.
In other words, all the ad money would stay in Google's pocket.
"Having reached these agreements with publishers for the use of books under
their copyright, Google now announced they'd scan works from several libraries
-- including works that are currently under copyright -- without requesting
the permission of the copyright owners," said Allan Adler, vice president for
legal affairs for the Association of American Publishers (AAP). "Imagine the
consternation that caused among publishing houses who realized the possibility
that books they had agreed to provide to Google under contract might nevertheless
be scanned by Google without those agreements."
Adler said AAP members were wondering why Google had sat down with them,
then announced two months later that it didn't really need publishers' permission
to scan.
"Google has said publishers can opt out works from the Library Project,"
Adler said, "but we understand that to mean not that Google wouldn't scan them
in their entirety and include them in its database, but only that they wouldn't
use part of the works in response to a search query."
The librarians saw the project as a way to make their collections more accessible
to a digital-centric public. They also were lured by Google's offer to give
them their own digital copy of each book. Universities around the world have
begun their own digitization projects, but Google's muscle and money could put
those projects on Internet time.
University of Michigan's Wilkin said, "We had focused on the hard 10 percent
of the problem. Google swooped in and did the easy 90 percent."
While Google will only make snippets of the libraries' copyrighted works
available through search, the University of Michigan plans to make entire digital
copies of works not under copyright available to library users.
Google and the libraries insist they're respecting copyright and acting inside
the law. Said Wilkin, "For everything for which there are no rights issues,
such as pre-1923 works and U.S. government publications, we'll allow multiple
online users to access our copies at once. But for works under copyright, we're
not going to be able to provide full digital access for even our own users."
The AAP's Adler said the publishing community wasn't focusing on the murky
fair use question, but rather on Google's plan to make money from books it hadn't
bought.
"Google's use of these copyrighted works in order to expand the kinds of
responses it offers to users of its search engine is clearly going to be used
to enhance its ability to sell advertising in conjunction with the operation
of that search engine," Adler said.
The American Association of University Presses (AAUP) sent a critical letter
to Google, complaining that Google Library could cut into the presses' earnings.
According to the AAUP, on average, university presses recover 87 percent of
the cost of publishing scholarly books from sales, with payments for permission
to reproduce works in such things as anthologies, paperback editions, course
packs, electronic reserves and document delivery services adding to that take.
The AAUP came in for its own share of criticism for not consulting with all
its members before firing off the letter -- and for providing a copy of the
letter to BusinessWeek before Google had received it. Peter Givler, the
AAUP's executive director and the author of the letter, didn't respond to requests
for comment.
John Wiley & Sons was one publisher that went directly to Google. "We see
potential issues and potential opportunities that could have an impact on our
authors, customers and the business," said Susan Spilka, Wiley's director of
communications. "We're talking to them directly and also through our trade association."
She said Wiley is in the process of learning more about the Google Print
for libraries program and exploring both the issues and the opportunities.
The crux of the copyright issue, according to Adler, is not whether supplying
anywhere from a few sentences to a few pages of a book to searchers is covered
by the admittedly murky fair use provisions of U.S. copyright law. Rather, the
Library Project seems like a way for Google to profit off books without buying
them.
A court date is likely, said Lee Bromberg, a partner in Bromberg & Sunstein,
a law firm specializing in intellectual property. The key question, he said,
is whether the issue is mo
free textbooks useless without problem sets (Score:2)