In David Wiley’s IPT 692r – Intro to Open Ed course, students have split into two small groups, each of which has chosen to research and catalog appropriate open resources that may be used to fulfill learning objectives for one of the secondary education core curricula for the state of Utah. As I have begun searching for, tagging, and sharing resources, I’ve begun to consider the long-standing web question: link or copy?
Though the question is not a staggering one, its answer is often taken for granted, sometimes at the cost of a web project’s long-term success.
The link approach typically uses hyperlinks to the target source document, but may use iframes to embed the element within a locally-hosted web page.
- preserves integrity of the original source by maintaining all original qualities
- respects original source by directing traffic to the host site
- saves local hosting resources (storage & bandwidth)
- ensures that source updates are reflected in the current version
- is, therefore, particularly well-suited for frequently updated or improved sources, like wikis
- is much easier, particularly when numerous multimedia files are embedded, or multiple files are referenced
- may provide learners with context and hyperlinks that lead to further, relevant exploration of the source site and the web
Many of these arguments for linking presume that there is more to the information than the information itself, and that the source has some inherent value that may be passed on to the learners or should be maintained for its own sake.
The copy approach is similarly self-evident: a digital copy of the source file(s) is downloaded, then hosted on the local server.
- provides for adaptation or modification (if the license allows) of:
  - content (cut, insert, remix, extend)
  - presentation (e.g. surface design)
- supports localization
- captures and preserves a version that may be discarded or replaced in the future
- allows designers to produce seamless learning experiences that support learner focus
- respects original source host’s resources (storage & bandwidth)
- ensures technical availability of the resource is within local control (no dead links)
- allows contextual indexing for site (or public) search engines
- may improve reach and increase circulation of source information
- may thereby enlarge original author’s prominence and visibility
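In its simplest form, the copy approach amounts to a download and a save. Here is a minimal sketch in Python using only the standard library; the function name `copy_resource` and the destination folder are my own placeholders, and it assumes the license permits reproduction and that the resource is a single static file (no embedded media or server-generated content).

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def copy_resource(url: str, dest_dir: str) -> Path:
    """Download one resource and save it under dest_dir; return the local path.

    Hypothetical helper for illustration only: it does not follow links,
    fetch embedded media, or check the license of what it copies.
    """
    target_dir = Path(dest_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Use the last path segment as the file name, falling back to index.html
    # when the URL ends in a slash or has no path at all.
    name = Path(urlparse(url).path).name or "index.html"
    local_path = target_dir / name

    # Stream the response to disk so large files are not held in memory.
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_path
```

Calling something like `copy_resource("https://example.org/lessons/cells.html", "local_copies")` (a made-up URL) would leave a file named `cells.html` in `local_copies`, ready to be adapted or served from the local host.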
A couple of notable obstacles to copying:
- Server-generated content, markup, interactions, or hyperlinks may be difficult to acquire or reuse (e.g.
- While CC BY-ND allows reproduction of works, it may restrict modification of presentation or interactions, in addition to its clearer prohibition on modifying content
Dynamic Scraping and Importing
There are other approaches that fall somewhere in between. One example is scraping the source file(s) on the fly, then parsing and processing the data on the local host. This sounds complex, but it’s not too bad; Google Docs & Spreadsheets has built this functionality into its data-importing spreadsheet formulae:
- =importHTML grabs the content of a TABLE or list (OL / UL [/DL?])
- =importXML uses xPath expressions to target XML/XHTML elements
- =importData takes structured data files, such as comma-separated values (CSV)
- =GoogleReader takes in the RSS or Atom feed of a target URL, such as a blog
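To give a feel for the parsing half of that in-between approach, here is a rough sketch in Python of something in the spirit of =importHTML for tables. It is not how Google implements the formula; the names `TableExtractor` and `import_html_table` are my own, the 1-based index is only meant to echo the formula's style of counting, and the parser assumes simple, non-nested tables. In practice the HTML string would be fetched from the source URL at request time.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the rows of every <table> in a page as lists of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Keep text only while inside a table cell.
        if self._cell is not None:
            self._cell.append(data)


def import_html_table(html: str, index: int = 1):
    """Return the index-th table (counting from 1) as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables[index - 1]


sample = (
    "<table>"
    "<tr><th>Course</th><th>License</th></tr>"
    "<tr><td>Biology</td><td>CC BY</td></tr>"
    "</table>"
)
rows = import_html_table(sample)
print(rows)  # [['Course', 'License'], ['Biology', 'CC BY']]
```

Because the rows come back as plain lists, the local host is free to re-style, filter, or index them before showing them to learners, which is where this approach borrows from the copy side.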