Preliminary issues that we have identified:
- PDF meta-data
- Snippets - how do you gather snippets together and organize them in different ways?
- How can we create a database for the articles that are on our computer?
- How do we know if we have already downloaded an article? One possible solution is to match PDF fingerprints against a database, or to compare checksums, which detect files with identical byte information
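The checksum idea in the last item can be sketched as follows. This is only a minimal illustration, assuming the articles are PDF files in one folder tree; the function names `file_checksum` and `find_duplicates` are made up for this sketch, and SHA-256 is one possible choice of checksum.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder):
    """Map each checksum shared by more than one PDF to the matching paths."""
    seen = {}
    for pdf in sorted(Path(folder).rglob("*.pdf")):
        seen.setdefault(file_checksum(pdf), []).append(pdf)
    # Keep only checksums that occur more than once (hypothetical policy).
    return {digest: paths for digest, paths in seen.items() if len(paths) > 1}
```

A checksum like this only catches byte-for-byte copies; the same article saved from two different sites would usually not match.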
Three parts of research process:
- Process reading and markup, tagging, saved searches
- Output - writing, referencing, formatting
We plan to look at
- Brainstorming tools
- Reference Managing tools
A (growing) list of technologies to investigate:
- Evernote - It is an aid for keeping track of "tasty bits of information" and it allows you to sync from your mobile device and laptop to the web. It has the ability to convert text from an image to searchable text. For example, if you are at a conference, you can take a picture on your iPhone of someone you meet holding up their name tag. Evernote will convert the text to searchable text, and you will have a record of the person's name and what they look like. Looks like a fun tool - not sure how powerful it is for research.
- Research Desktop - http://research.microsoft.com/en-us/um/cambridge/projects/researchdesktop/swf/player_01.htm
- Bookends - http://www.sonnysoftware.com/
- SciPlore - www.sciplore.org
- What is it we want to do?
- What tools already exist?
- What are the tools that are closest to what we want to do?
Tasks for week one:
- Write a narrative of our ideal junior researcher workflow and post by Sunday night
- Download Scrivener, DevonThink and other workflow managing tools and briefly assess their functionality
(Rebecca) downloaded Scrivener and fully intended to use it to assemble an interim report of 17 pages. First of all, I didn't find it very intuitive - I am not a person who reads instructions. After about 10 minutes of playing with it, I gave up and went back to Word. I will give Scrivener another try when I don't have a deadline to meet by midnight.
- Redesign DevonThink or a suite of similar tools and incorporate the best features to create a functional and usable workflow for the Junior Researcher.
- Create a screen cast of our ideal workflow scenario
I tried using Zim first, and although it allows me to attach documents and was fun to use, it was difficult to store metadata for them. I also couldn't really search within the Zim wiki itself, which was not that helpful! Then I found a few likely options for open source document management systems: Alfresco, Knowledge Tree, or Smart Content Aggregation and Navigation (SCAN).
What I like about SCAN is that it has more features and is easy to operate. SCAN has a Java-based user interface with browsing, searching, and tagging functions and it supports various file formats. It also allows for management of metadata through document properties. It is pretty neat, and I like the layout of SCAN.
What's convenient is that I can preserve the document directory hierarchy however I want it within the file system. And with SCAN, I can choose to create multiple document repositories and organize my documents with tags.
It would be great if it supported searching mindmaps, perhaps through a basic SOAP (Simple Object Access Protocol) layer that exposes metadata and search to a web interface. Other similar software has this feature, which makes its absence noticeable.
The most exciting discovery I have made is a Microsoft Product called "Research Desktop". It is still in development and unfortunately is not available for Mac. The workflow features it has are truly inspiring - and my narrative has been heavily borrowed from it.
After walking, feeding and watering the dog, I grab my laptop bag and head out the door to my local Starbucks. My tall black Italian Roast coffee in hand, I open up my laptop and connect to the Internet using Starbucks' free "wireless for everybody" service. With some bemusement, I recall how Starbucks used to be one of the only places I could go to work, but now with improved wireless transmission, it is possible to access the Internet nearly everywhere, even in Toronto's subway system.
I check my inbox. Twelve new emails. Not too bad. I quickly scan them - only seven are related to my academic work. Four are notices that a bill has been paid or a service has been renewed. I have all my bills set up to be paid automatically - bookkeeping has never been my strong suit. One is a personal email from my friend Rob, reminding me to finish the drawing I promised him for his wife's fiftieth birthday scrapbook. I'll do that tonight in front of a Ted Talk or something. ResearchWorks, the academic workflow program I use for nearly all my research tasks, will recommend a video that will be current and relevant to my work.
The academic emails take a little more time to process. Three are related to a workshop I attended yesterday. The workshop facilitator has posted some photos from the workshop on her blog. She wants us to stay in touch and continue to comment on the research ideas that we presented at the workshop, through her blog. I click on the link in the email, and a dialogue box from ResearchWorks asks me if I want to add it to the list of blogs I am following. I reply, "Yes." Whenever something new is posted, the facilitator's blog will go to the top of the list. I have all of the blogs I follow organized this way. Blogs most closely related to my research interests appear higher on the list, as do blogs that I visit the most often. The last email is a link to a paper that one of my collaborators thinks is relevant to our research. I click on the link. I am prompted by ResearchWorks to indicate whether or not I want this link added to my project workflow space. I select the project we are working on, ironically "The Future of the Academic PDF", and the link is connected to the project.
Next, I check my Twitter feed using my web browser. There are links to a couple of items of interest. One of the links is to an article in Wired magazine that looks promising. As I skim the article I see that it is related to the research I am doing about using social media tools for learning. There is an icon for ResearchWorks in my web browser which allows me to select the project space I want to connect the web page to, in this case Social Media Enhanced Learning. The other link of interest is to a YouTube video on using Web 3.0 tools in the classroom. This is an item of general interest, and not related to a specific research area. I click the ResearchWorks icon again, and this time select a project workspace I call "Cool Stuff." ResearchWorks allows me to schedule in breaks, so if I set a minibreak for one hour from now, a prompt will suggest that I watch an item from the "Cool Stuff" workspace. ResearchWorks will only play approximately five minutes of the video at a time, fading off at a natural break in the video. During my next break, if I choose to continue watching the video, the video will resume from 30 seconds before it ended the previous time.
With my email and social media tasks complete, I launch ResearchWorks. It prompts me to log in, and I enter the academic account name and password issued to me by the university. I will be able to access all the journals, proceedings and online books that I am interested in. Articles that I have indicated I want to add to ResearchWorks will be automatically downloaded to the appropriate project workspace.
The ResearchWorks interface is uncluttered and clean. I select the "Tasks" icon and a list of about 15 projects drops down. Three are active, as indicated by their bold text. They are also organized by urgency, so the project with the closest deadline appears at the top of the list. I choose the item at the top of the list, "The Future of the Academic PDF", and a window opens up that covers half of the desktop. I am able to "scrub" through all of the research papers, website pages, digital sticky notes, and mindmaps that are related to the project. This gives me a visually rich overview of the work that has been done on the project so far. I select the "Documents" view; only PDF files and Word documents are visible. I want to read the paper that my colleague sent me this morning. I select the "Most Recent" view and the paper has already been downloaded and is ready to read.
I take a sip of black coffee from my white porcelain cup. The PDF document has several layers attached to it. If I want to, I can make my colleague's highlighted text and annotations visible by clicking the eye icon next to her name. I decide to keep her comments turned off for now. Instead, I choose to click on the cloud icon, and a cloud tag for the entire document appears beside it. This gives me a quick overview of the article's key concepts. I wonder if cloud tags can be used to analyze interview data?
I decide to switch tasks for a moment. I click on the "Notes" icon. I can see an overview of my drawing board that contains clusters of images, 3D drawings, notes, emails, documents, and web pages, organized by topic. I zoom in a little and find "Research Methods." I select the "Tweet" icon, and write "Cloud Tags - A Viable Method for Analyzing Research Data?" I might as well get feedback on this instead of keeping it to myself. ResearchWorks will tweet my question. If anyone responds to the question, I will be notified during my break time. As a researcher, one of the greatest challenges I personally face is procrastination. My scheduled breaks allow me to have some control over distractions.
I click back on the "Tasks" icon, and the PDF I was reading is open to where I left off. I continue to read the article, highlighting and making comments as I read. When I am finished reading, I switch the "Read" toggle to complete. I also select the "Export Clips" icon. ResearchWorks asks me where I'd like to export the clips to. I check off current project workspace and email. My highlights and comments are exported in a text file to my current project workspace and are attached to an email, which I will send to my colleague, with a note of thanks. The text file also contains the metadata for the citation information. If I choose to copy and paste the text into the ResearchWorks writing environment, the citation data will automatically be added to the References list, in APA format.
A text box pops up saying it is time for a break. I am informed that there is a response to my Twitter question with a link to a relevant article. As I scan the article, I am excited by this adjunct method of analyzing data! I click the ResearchWorks icon on my browser and add this page to the "Research Methods" cluster in the Notes section. Before starting to watch the YouTube video on Web 3.0 technologies in the classroom, I get up and go for a free refill on my coffee (having paid with my registered Starbucks card, of course!).
When downloading a PDF, the application would use an automatic "fingerprint" to check with an online service whether metadata has already been entered for it. If not, the metadata I enter will be contributed to the database. It will check both for a perfect (byte-for-byte) match and for a content-based match that would survive transformation from DOC to PDF, etc.
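The two kinds of fingerprint could look roughly like this. It is a sketch under several assumptions: the text has already been extracted from the PDF or DOC by some other tool, the "online service" is stood in for by a plain dictionary, and the names `exact_fingerprint`, `content_fingerprint` and `lookup_metadata` are hypothetical.

```python
import hashlib
import re


def exact_fingerprint(data: bytes) -> str:
    """Fingerprint of the raw file bytes; matches only byte-for-byte copies."""
    return hashlib.sha256(data).hexdigest()


def content_fingerprint(text: str) -> str:
    """Fingerprint of the normalized text, ignoring case, spacing and punctuation.

    Assumption: only ASCII letters and digits are kept, so accented
    characters are dropped in this simple sketch.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


def lookup_metadata(known, data, text):
    """Try the exact match first, then fall back to the content-based match.

    `known` is a dict from fingerprint to metadata, standing in for the
    hypothetical online service.
    """
    return known.get(exact_fingerprint(data)) or known.get(content_fingerprint(text))
```

A real content-based match would probably need to tolerate small differences (a changed header, a missing page), which a single hash of the whole text cannot do.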
Support for XMP Metadata in PDFs
Improve the workflow of sharing PDFs and bibliography information
Once I have a PDF with metadata, the metadata is stored physically in the PDF file, so that if I send the file to others, metadata follows automatically.
I have an application like iTunes, which allows me to manage all my PDFs, using a relational database (i.e. view all authors, view all titles, view all journal titles, etc.).
(example: Papers http://mekentosj.com/papers/, http://www.sciplore.org/software/sciplore_mindmapping/ (view screencast))
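To make the relational database idea concrete, here is a small sketch using SQLite from the Python standard library. The table and column names are assumptions made for illustration, not the schema of Papers or any other existing tool.

```python
import sqlite3

# Hypothetical schema: articles and authors are linked through a join table,
# so one article can have many authors and one author many articles.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY, title TEXT NOT NULL,
    journal TEXT, year INTEGER, path TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS authors (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE IF NOT EXISTS article_authors (
    article_id INTEGER REFERENCES articles(id),
    author_id INTEGER REFERENCES authors(id),
    position INTEGER,
    PRIMARY KEY (article_id, author_id));
"""


def open_library(db_path=":memory:"):
    """Open (or create) the library database and make sure the tables exist."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def add_article(conn, title, journal, year, path, authors):
    """Insert one article and link it to its authors, in the order given."""
    cur = conn.execute(
        "INSERT INTO articles (title, journal, year, path) VALUES (?, ?, ?, ?)",
        (title, journal, year, path))
    article_id = cur.lastrowid
    for position, name in enumerate(authors, start=1):
        conn.execute("INSERT OR IGNORE INTO authors (name) VALUES (?)", (name,))
        author_id = conn.execute(
            "SELECT id FROM authors WHERE name = ?", (name,)).fetchone()[0]
        conn.execute("INSERT INTO article_authors VALUES (?, ?, ?)",
                     (article_id, author_id, position))
    conn.commit()
    return article_id


def all_authors(conn):
    """The 'view all authors' list, sorted by name."""
    return [row[0] for row in conn.execute("SELECT name FROM authors ORDER BY name")]


def titles_by_author(conn, name):
    """All titles in the library written by the given author."""
    rows = conn.execute(
        """SELECT a.title FROM articles a
           JOIN article_authors aa ON aa.article_id = a.id
           JOIN authors au ON au.id = aa.author_id
           WHERE au.name = ? ORDER BY a.title""", (name,))
    return [row[0] for row in rows]
```

Views such as "all journal titles" would be similar one-line queries against the `articles` table.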
This application can also easily generate other formats, and syncs with, for example, my iPhone, iPad, or ebook reader (Kindle etc). The PDF extractor is smart enough to understand headers and footers, columns etc. If I am not happy, it provides me with a very simple interface to mark up what elements of the pages go where. Once I have done this, it is synchronized with the server (the same server that keeps the citation data) and made available to anyone else who downloads the same PDF.
I now have a multitude of ways of reading this article - I can read it either as the original PDF, or as nice converted clean text on my computer, I can read it on my iPad, Kindle etc. In all cases, I can easily mark up any amount of text, and add comments and tags at any point in the text. The computer interface shows a nice split screen view, with a window for running general comments, but you could also insert comments at any point in the text, or highlight text. These programs all synchronize with each other - the highlights I make on my Kindle show up when I view the article on my computer, etc.
The notes are stored in a relational database connected to the article. I can view the highlights as color applied to the text, but I can also view a text document containing only the highlighted text (which I can then tag, etc). The software offers powerful organizing features, letting me tag snippets from different articles and do saved searches for certain tags or words. It also lets me display snippets and categories graphically, to help with creative thinking.
See Tinderbox (including some of the screencasts): http://www.eastgate.com/Tinderbox/
I can also simply copy words and paragraphs from the notes. The neat thing is that all the text has a property set referring to a certain article, and a certain location in that article - this property works similar to a color - if you copy a piece of text, it takes the color with it. So if you had something like this:
p 43 using metadata for analyzing PDFs
p 55 massive databases and data manipulation
p 60 egyptian revolution
p 65 twitter as a tool for revolution
p 10 Egyptian history
p 15 Libyan political situation
p 5 politics in the Middle-East
p 15 politics in Egypt
and I was preparing for a chapter on Egypt, I might copy and paste some text from the notes above:
politics in Egypt
If I now moused-over any of these text snippets, they would display the source and location. If I dragged some of them into a document I was just writing, they would create a perfectly formatted reference to the source.
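One way to picture text that "takes its source with it" is a snippet object that carries the article and page alongside the words. This is only a sketch; the `Snippet` class, the `paste` function and the source labels "Article A" and "Article B" are hypothetical, since the notes above do not name the articles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    text: str    # the copied words
    source: str  # label of the article the words came from (assumed)
    page: int    # location within that article

    def tooltip(self):
        """What a mouse-over could display for this snippet."""
        return f"{self.source}, p. {self.page}"


def paste(snippets):
    """Join snippet text for a draft and list the distinct sources used,
    in order of first appearance, ready to be turned into references."""
    body = " ".join(s.text for s in snippets)
    sources = list(dict.fromkeys(s.source for s in snippets))
    return body, sources
```

For example, pasting `Snippet("politics in Egypt", "Article B", 15)` into a draft would bring "Article B" into the list of sources without any extra work from the writer.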
All the references in the articles I had would be parsed through the same citation server, which would have information about any public location where you could download them, or any user in the network who had a copy, and with the push of a button, you could request a copy of the article directly from that user. For articles available as OA, or through your university network, the program can easily download every single article cited, and do things like tag clouds, or even analyze their references, and tell you which are the most frequently cited articles by the articles cited in your article. If you allow the program to share usage data with the server, it will also share your reading habits, and let you know "people who read this article, also read these articles".
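The "most frequently cited articles by the articles cited in your article" analysis is essentially a counting exercise. A minimal sketch, assuming the citation server can hand back a reference list for each article (represented here as a plain dictionary; the function name is made up):

```python
from collections import Counter


def most_cited_by_references(my_references, reference_lists, top=3):
    """Count how many of my references cite each work, most cited first.

    `reference_lists` maps an article to the works it cites; articles the
    server knows nothing about are simply skipped.
    """
    counts = Counter()
    for ref in my_references:
        # set() so a work listed twice by the same article counts once
        counts.update(set(reference_lists.get(ref, [])))
    # Sort by count (descending), then by name, so ties come out in a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]
```

The same counts could feed a tag cloud or a "people who read this also read" list, given usage data instead of reference lists.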
All of your data can easily be shared. For any topic, you can create a collection, containing all of the articles you accessed, tagged with keywords, and with your general notes. These will be available on the web, or directly to other users that access the server - when reading an article, you can easily choose to see the most highlighted passages, comments others have added (either only people in your social graph, or anyone), other articles they have linked to, etc.
All this citation information is also accessible to journals, and some innovative open access journals, such as PLoS ONE, now display the metrics generated by the database - number of times downloaded, articles most commonly linked with it, etc.
- The four panel window works really well. I can scroll through the PDF while writing notes about the article.
- Email article feature. When you are in article view mode, you just click on the Mail icon and the article is attached to your email as a PDF, along with the citation information (APA) and a link to the website that it was downloaded from! I was so excited by this feature and the ease with which I could send my colleagues an article!
- The customer service is very good. I emailed them with proof that I was a student so that I could get a discount when I purchased the product.
- So far, with the version I have, I can't highlight text in my PDFs. Apparently version 2.0 will allow me to do that. I have requested a student discount and am waiting for my discount code so I can upgrade to 2.0! Update: I have upgraded, and have to wait until March 8th to upgrade to 2.0. In the meantime, they are revealing one new feature per day....
- Adding multiple authors is difficult. I can only seem to add one author, or the text is garbled
- In document type, I can select article, editorial, commentary or review, but not book chapter
- I would like to be able to collapse the "Repository" menu, so I don't have to scroll all the way down to see my collections
- I'd like to see a cloud tag for the PDFs I'm reading. It would be a quick way to generate some tags in the "Notes section"
- Shift-click select is useful if items are grouped before moving
- Very neat indicators for where an item will be dropped
- Easy work-arounds for problems regarding node genesis inserting child-parent
- Different zoom levels make it easy to work locally or see the whole picture
- "Fold all" is very powerful after instating a file tree
- Nodes with folded children have a helpful circle on the end
- Using Foxit Reader, I could highlight text in my pdf files
- Scrolling can be glitchy, occasionally repelling the bottom half of the screen or scrolling very, very slowly, as though at a different zoom level
- Rigid and directional tree layout, seemingly unresponsive to radial or free-form shapes
- No new parent node in right-click menu
- No way to 'bump' a node down the tree, to distance a group of children from root
- Hard to drag farther than a single screen while scrolling up or down
- Impossible to read titles on the full-size mind map; perhaps titles could be abridged and the font enlarged at low zoom levels
- There is no icon for SciPlore on the desktop; it has to be launched from the Start menu
- Significant lag on start up for some features, making it hard to find them
Notes on visualizations and mind mapping
Hart (1998) discussed several mapping approaches which might be a useful starting point for thinking about how the mindmapping portion of our workflow tool could work:
- Feature Map: Argumental structures which are developed from summary record sheets
- Subject Tree Map: Summative maps showing the development of topic into sub-themes to any number of levels
- Content Map: Linear structure of organization of content (hierarchical)
- Taxonomic Map: Classification through standardized taxonomies
- Concept Map: Linking concepts enables declarative to procedural knowledge (cause and effect and problem solving)
- Citation Mapping: It is an interactive citation tree that displays both forward and backward citations. It shows the citation relationships between a paper and other papers. It allows users to organize and color-code the results by author, year, journal title, etc.
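The citation mapping item lends itself to a small data-structure sketch. Assuming citations are available as (citing, cited) pairs, the forward and backward directions can be kept as two lookup tables and unfolded into a tree to a chosen depth. The function names and the depth limit are assumptions for illustration.

```python
from collections import defaultdict


def build_citation_map(cites):
    """Build two lookups from (citing, cited) pairs.

    backward[p] = papers that p cites; forward[p] = papers that cite p.
    """
    backward = defaultdict(set)
    forward = defaultdict(set)
    for citing, cited in cites:
        backward[citing].add(cited)
        forward[cited].add(citing)
    return backward, forward


def citation_tree(paper, links, depth=2):
    """Nested dict of papers reachable from `paper` through `links`.

    The depth limit also keeps the recursion finite if the data has cycles.
    """
    if depth == 0:
        return {}
    return {other: citation_tree(other, links, depth - 1)
            for other in sorted(links.get(paper, ()))}
```

Passing `backward` gives the tree of works a paper builds on, and passing `forward` gives the tree of later papers that cite it; grouping or color-coding by author, year or journal would need that metadata attached to each node.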
Hart, C. (1998). Doing a Literature Review: Releasing the Social Science Research Imagination. London: Sage.