How to generate TOC with links to pdf library? SOLVED

Posted: Wed Aug 11, 2021 2:36 am
by geo_c

So I have this huge PDF library, and I'm wondering if there's a way to mass generate links to each file and create an html page as table of contents made of hyperlinks. I'm sure there must be, but I don't know where to look: OpenOffice? The command line? Something else? Ideally it would preserve the relative directory structure.

Thanks to anyone who can point me in the right direction.

~geo


Re: How to generate TOC with links to pdf library?

Posted: Wed Aug 11, 2021 2:54 am
by Flash

Are the pdfs all in one directory? Here's a utility called tree that might help.


Re: How to generate TOC with links to pdf library?

Posted: Wed Aug 11, 2021 4:27 am
by Puppyt

Look at my "desert island" choice of software - Zotero https://www.zotero.org
Sure, it's primarily aimed at building bibliographies and supporting research, but I also use it to take snapshots of webpages for various projects, and all the info stays searchable at any later stage. (It started as a Firefox add-on; it's now standalone but integrates with Chrome- and Firefox-based browsers, and with TextMaker, LibreOffice, OpenOffice and Google Docs for writing articles, etc.)
Fancy having your PDF titles renamable on the fly within a repository system, with all the metadata, tags and keywords pulled out automagically for easy searching and sorting? You can export your library in whatever form you like for use outside of Zotero (not sure about hyperlinks in an HTML doc; I haven't needed that myself, but I recommend you give it a look).
I've just downloaded the tarball and it started right off the bat in Fossapup64 9.5. There's a deb available if you prefer.

Have Fun :)

P.S. Ignore the "You seem to be running Zotero under root" warning. You're in Puppy Territory now...
===========================================
These details copied from here https://www.zotero.org/support/attaching_files seem to suit your intended purpose:
Adding Files via the Zotero window
Drag and Drop
Files can be copied into your library by dragging a file from your operating system's file browser into the Zotero window, and either dropping it onto a collection in the left pane, or onto the center pane. Files dropped onto an existing regular Zotero item in the center pane are added as child items. Files dropped onto a collection, or in an empty space or between items in the center pane, are added as standalone items.

You can also drag and drop an existing standalone file item in Zotero onto a regular Zotero item to create a child item.

Adding linked files
By default, files dragged into Zotero are added as copies of the original files. To instead add links to the original files, hold down Ctrl+Shift (Windows/Linux) or Cmd+Option (Mac) while dropping. (On macOS, it may be necessary to allow the Zotero window to come to the front before letting go for the modifier key to take effect.)


Re: How to generate TOC with links to pdf library?

Posted: Wed Aug 11, 2021 8:08 am
by geo_c

Thanks Puppyt !

It fired right up, now let's see if I can figure out how to use it!

~geo


Re: How to generate TOC with links to pdf library?

Posted: Wed Aug 11, 2021 12:53 pm
by Puppyt

Good luck @geo_c !
Zotero loves to copy every file into its own storage structure, usually one folder per PDF, which can get unwieldy outside Zotero (e.g. when searching in a file manager). So for folders you have already organised, I recommend the emboldened tip:

Adding linked files
By default, files dragged into Zotero are added as copies of the original files. To instead add links to the original files, hold down Ctrl+Shift (Windows/Linux) or Cmd+Option (Mac) while dropping.

There are loads of additional 'plug-ins' from volunteers for various specific features; this one might interest you:
(https://www.zotero.org/support/plugins)

Zotero Folder Import, by Emiliano Heyns.
Plugin to import a folder of attachment files from your computer into a Zotero collection hierarchy. Useful for transitioning to Zotero from a manual folder-based organization system.

I hope you find it useful. But not addictive, as it is for a data hoarder like me :)


Re: How to generate TOC with links to pdf library?

Posted: Wed Aug 11, 2021 3:34 pm
by geo_c

Thanks. I played with it last night and yes, it created a folder, which I kind of liked because it meant I didn't change any original files. I exported the library as bookmarks, but instead of links to the local files I got the original online links: IMPRESSIVE, but not what I wanted. Let me try what you suggested. I'd like to be able to do both: have an online bookmark file and a local link file.

~geo

and yeah, info addiction is a growing problem in these confusing times!

So what does TOC stand for?


Re: How to generate TOC with links to pdf library?

Posted: Wed Aug 11, 2021 9:24 pm
by Wiz57

@geo_c
"So what does TOC stand for?"
Table of contents


Re: How to generate TOC with links to pdf library?

Posted: Thu Aug 12, 2021 2:42 am
by geo_c
Wiz57 wrote: Wed Aug 11, 2021 9:24 pm

@geo_c
"So what does TOC stand for?"
Table of contents

plant face in palms :oops:


Re: How to generate TOC with links to pdf library?

Posted: Thu Aug 12, 2021 4:25 am
by MochiMoppel
Flash wrote: Wed Aug 11, 2021 2:54 am

Are the pdfs all in one directory? Here's a utility called tree that might help.

Good idea. PDFs don't need to be in one directory.
It can "mass generate links to each file and create an html page as table of contents made of hyperlinks", and as far as I understand that's exactly what @geo_c asked for.


Re: How to generate TOC with links to pdf library?

Posted: Thu Aug 12, 2021 4:37 am
by geo_c
MochiMoppel wrote: Thu Aug 12, 2021 4:25 am
Flash wrote: Wed Aug 11, 2021 2:54 am

Are the pdfs all in one directory? Here's a utility called tree that might help.

Good idea. PDFs don't need to be in one directory.
It can "mass generate links to each file and create an html page as table of contents made of hyperlinks", and as far as I understand that's exactly what @geo_c asked for.

That's exactly it. They are sorted in directories by topic. I'll check out the tree utility.

BUT, is that a pet in a squashfile? I'm not sure whether to mount it or try and install it.


Re: How to generate TOC with links to pdf library?

Posted: Thu Aug 12, 2021 4:44 am
by MochiMoppel
geo_c wrote: Thu Aug 12, 2021 4:37 am

BUT, is that a pet in a squashfile?

Yeah, it's a bit oddly packed. It's just a squashfile which you can click in ROX and "View contents" and then pull out the tree binary. A pretty old version, but it does the job once you have figured out how the -H option works ;)


Re: How to generate TOC with links to pdf library?

Posted: Thu Aug 12, 2021 4:54 am
by geo_c

Well, while I was toying around with that file, I noticed there's already a binary called tree in /usr/bin. Is it the same? Is this a command-line utility?


Re: How to generate TOC with links to pdf library?

Posted: Thu Aug 12, 2021 4:57 am
by geo_c

I just entered 'tree' in /root and it tree'd my whole root directory. Is that what we're talking about? Maybe I just need to run the help switch on it and figure out what all it can do.


Re: How to generate TOC with links to pdf library?

Posted: Thu Aug 12, 2021 5:10 am
by geo_c

Oh man! That's nice! I turned the directory tree into an HTML file of links, but I must have put the href in the wrong spot, because the browser can't find the files.


Re: How to generate TOC with links to pdf library? SOLVED

Posted: Thu Aug 12, 2021 5:22 am
by geo_c

DONE!

Why do we use these complicated GUI applications all the time? I mean, typing long directory paths and getting the command-line syntax right is a pain, but in cases like these it's oh so worth it.

So I realized that the href is an actual directory path. Now I'm wondering about a new endeavor: zipping that directory up with the HTML table of contents in the right spot. Could anyone unzip the directory and use it, or would the href have to be relative, since the absolute path on my machine won't exist on theirs?

I only know enough command line to be dangerous, don't you know.

Thanks guys! I love this forum.


Re: How to generate TOC with links to pdf library? SOLVED

Posted: Thu Aug 12, 2021 6:37 am
by geo_c

Well, I finally figured out the relative path thing also. So I'm totally in business. Although I'm not giving up on the Zotero application. It's like a meta-tagger librarian for media files, only for research docs. Very nice program.
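Whether a zipped library still works after someone else unpacks it comes down to the hrefs: a link is only portable if it is relative to the folder that holds the TOC. As a rough pre-zip check, a small bash function like the one below could list any href that is absolute or points at a file that isn't there. The name check_toc is made up for illustration, and the sketch assumes the TOC uses double-quoted href="..." attributes with plain %XX escapes, as the tree output quoted in this thread does.

```bash
#!/usr/bin/env bash
# check_toc TOC.html   (hypothetical helper)
# Print every href in TOC.html that would break once the folder is zipped and
# unpacked elsewhere: absolute paths, file:// or other URLs, and relative links
# whose target does not exist next to the TOC. Returns 1 if anything was found.
check_toc() {
  local toc=$1 base href path bad=0
  base=$(dirname -- "$toc")
  while IFS= read -r href; do
    case $href in
      /* | *://* | file:*)
        printf 'absolute: %s\n' "$href"
        bad=1
        continue
        ;;
    esac
    # Decode %XX escapes (e.g. %20 -> space) so the path can be tested on disk.
    path=$(printf '%b' "$(printf '%s' "$href" | sed 's/%/\\x/g')")
    if [[ ! -e $base/$path ]]; then
      printf 'missing: %s\n' "$href"
      bad=1
    fi
  done < <(grep -o 'href="[^"]*"' "$toc" | sed 's/^href="//; s/"$//')
  return "$bad"
}
```

Running check_toc ALL-CONTENTS.html inside the library folder and getting no output would suggest the folder can be zipped and shared as-is.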


Zipping linux directory for Windows

Posted: Tue Aug 17, 2021 1:17 pm
by geo_c

@MochiMoppel Now I am able to pump out a table of contents quickly for all the files in a directory tree. I also saved a formatted header to paste in and make it look nice. The whole process takes about 30 seconds. I have an update script and file with the header for speed.

So I wanted to see whether I could share the files by zipping the directory, uploading it to my Dropbox and emailing a link to the zip. To find out whether the average person would manage this, I zipped a directory with its table of contents included and emailed a Dropbox link to various people with different skill levels and platforms.

Then it occurred to me that since this is a Linux file system, a Windows user might not actually be able to unzip the directory and see everything.

Do you know the particulars of this, would I have to use FTP or something similar?

Thanks
@geo_c


Re: Zipping linux directory for Windows

Posted: Tue Aug 17, 2021 1:45 pm
by MochiMoppel
geo_c wrote: Tue Aug 17, 2021 1:17 pm

I also saved a formatted header to paste in and make it look nice. The whole process takes about 30 seconds. I have an update script and file with the header for speed.

How and why do you "paste in"? The header option of the tree command is not enough? Care to share your script and header file?

Do you know the particulars of this, would I have to use FTP or something similar?

What is "this"? I don't quite understand what you are asking for and why the transfer protocol would matter. If you have a ZIP file with a directory tree in your Dropbox and if the filenames comply with FAT32 naming conventions I see no reason why a Windows user would not be able to unzip the file.
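To act on the FAT32 naming point before zipping, something like the following bash sketch could flag names that Windows is likely to reject. The function name is hypothetical, and the rule set (the characters < > : " \ | ? * plus a trailing dot or space) covers the common cases rather than every Windows rule; reserved device names such as CON or NUL are not checked.

```bash
#!/usr/bin/env bash
# check_win_names DIR   (hypothetical helper)
# List files and directories under DIR whose names Windows (FAT32/NTFS) will
# not accept: names containing < > : " \ | ? * or ending in a dot or a space.
# Reserved device names such as CON or NUL are NOT checked by this sketch.
check_win_names() {
  local bad='[<>:"\\|?*]' path name
  find "$1" -mindepth 1 -print0 |
    while IFS= read -r -d '' path; do
      name=${path##*/}                 # last path component only
      if [[ $name == *$bad* || $name == *. || $name == *' ' ]]; then
        printf '%s\n' "$path"
      fi
    done
}
```

Empty output for the library folder would mean none of these common problems apply; any listed names could be renamed before zipping.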


Re: Zipping linux directory for Windows

Posted: Tue Aug 17, 2021 1:57 pm
by geo_c

If you have a ZIP file with a directory tree in your Dropbox and if the filenames comply with FAT32 naming conventions I see no reason why a Windows user would not be able to unzip the file.

I wasn't sure a zip file of a directory would work across platforms, and you answered my question! Thank you, @MochiMoppel.

MochiMoppel wrote: Tue Aug 17, 2021 1:45 pm

How and why do you "paste in"? The header option of the tree command is not enough? Care to share your script and header file?

Here's the header. I didn't see any options in tree --help that would change the text and colors to this extent, but I probably missed them. I'm dense that way.

<!--
BODY { font-family : sans, book, sans; background-color: black;}
h1 { background: #000000; font-variant: normal; color : #7C4D2A; font-family: "DejaVu Sans"; font-size: 24pt; font-style: normal; }
P { font-weight: normal; font-family : sans book, sans; color: #1C9E26; background-color: black; }
B { font-weight: normal; color: #1C9E26; background-color: black; }
A:visited { color : #D1AF68; font-weight : normal; text-decoration : none; background-color : black; margin : 0px 0px 0px 0px; padding : 0px 0px 0px 0px; display: inline; }
A:link { color : #71CDF1; font-weight : normal; text-decoration : none; margin : 0px 0px 0px 0px; padding : 0px 0px 0px 0px; display: inline; }
A:hover { color : aqua; font-weight : normal; text-decoration : underline; background-color : #7C4D2A; margin : 0px 0px 0px 0px; padding : 0px 0px 0px 0px; display: inline; }
A:active { color : #D1AF68; font-weight: normal; background-color : aqua; margin : 0px 0px 0px 0px; padding : 0px 0px 0px 0px; display: inline; }
.VERSION { font-size: small; font-family : arial, sans-serif; }
.NORM { color: black; background-color: transparent;}
.FIFO { color: purple; background-color: transparent; }
.CHAR { color: yellow; background-color: transparent; }
.DIR { color: blue; background-color: transparent; }
.BLOCK { color: yellow; background-color: transparent; }
.LINK { color: aqua; background-color: transparent; }
.SOCK { color: fuchsia;background-color: transparent; }
.EXEC { color: green; background-color: transparent; }
-->

@geo_c


Re: How to generate TOC with links to pdf library?

Posted: Tue Aug 17, 2021 2:05 pm
by geo_c

@MochiMoppel

I see the -T string switch now, but I'm not sure how to insert the entire header into it. I'll have to read up on string theory! I'm kidding, I just mean I need to bone up on the syntax.

Though now I remember I looked at the -T string option before and thought that it only set the HTML title and the H1 heading, not the body and background.


Re: How to generate TOC with links to pdf library?

Posted: Wed Aug 18, 2021 2:23 am
by Flash

Wow! You guys figured tree out far beyond anything I ever did. I only used it to make a list of books and authors in my audio book library. I thought it might do what geo_c wanted, but I couldn't have told you how. Thanks for stepping in, MochiMoppel.

Sorry I took so long to get back to this topic. I was binge watching YouTube car crash videos. Cars do the craziest things. :lol:

Geo_c, if your question has been answered, perhaps you could write up the answer as a recipe in the How-to section of this forum.


Re: How to generate TOC with links to pdf library?

Posted: Wed Aug 18, 2021 4:12 am
by geo_c
Flash wrote: Wed Aug 18, 2021 2:23 am

Geo_c, if your question has been answered, perhaps you could write up the answer as a recipe in the How-to section of this forum.

Well @Flash, before I write up a how to I want to hear back from @MochiMoppel about whether it's possible to get that whole html header in the tree command, and if so what that command might look like.

I also thought about using that command that appends files together, which I can't think of which command at the moment, but I seem to remember things could be inserted at certain text points. Whichever method works, it would be nice to put it in a script that runs the tree command and applies the more sophisticated HTML header. The difference between the two isn't big, but the point is being able to tailor the look:
[Screenshots: the default tree header versus the tweaked header]


Re: How to generate TOC with links to pdf library?

Posted: Wed Aug 18, 2021 9:02 am
by MochiMoppel
geo_c wrote: Wed Aug 18, 2021 4:12 am

Well @Flash, before I write up a how to I want to hear back from @MochiMoppel about whether it's possible to get that whole html header in the tree command, and if so what that command might look like.

Not possible with the tree command itself, but for a How-To wouldn't it be best to start with the basics before venturing into manipulation of the tree output? I see from your header file that some of your changes "sabotage" the possibilities that the basic tree command offers, so it really might be best to start simple.

I also thought about using that command that appends files together, which I can't think of which command at the moment, but I seem to remember things could be inserted at certain text points.

There are various ways to exchange the <style> section of the HTML produced by tree with the content of your own header file. This will reduce your 30sec workload to less than a second :lol:
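One of those ways, sketched here in bash with awk, is to copy the page through unchanged and substitute the header file for everything between the opening and closing style tags. It assumes the <style ...> line and the </style> line each sit on their own line, which appears to be the case in tree's HTML output; the function name replace_style and the file names in the usage comment are illustrative.

```bash
#!/usr/bin/env bash
# replace_style PAGE.html HEADER.css   (hypothetical helper)
# Print PAGE.html with every line between the <style ...> line and the
# </style> line replaced by the contents of HEADER.css.
# Usage (illustrative file names):
#   replace_style ALL-CONTENTS.html my-header.css > /tmp/toc.$$ &&
#     mv /tmp/toc.$$ ALL-CONTENTS.html
replace_style() {
  awk -v hdr="$2" '
    /<style/    { print                                  # keep the opening tag
                  while ((getline line < hdr) > 0) print line
                  close(hdr); skip = 1; next }
    /<\/style>/ { skip = 0 }                             # closing tag is printed again
    !skip
  ' "$1"
}
```

Writing to a temporary file and then moving it avoids truncating the page while awk is still reading it.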

The HTML code produced by the tree command is not great, partly even wrong, but for many users the result may be OK. For changing some colors or fonts it might be sufficient to replace them with a sed command instead of a separate external header file. Let's see if there is demand for that.
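For the lighter-weight option of patching only a couple of rules in place, a sed sketch could look like this. The rule texts it matches are assumptions taken from the class lines quoted earlier in this thread (e.g. .DIR { color: blue; ... }), the colours are arbitrary examples, and sed -i is used in its GNU/BusyBox form (BSD sed needs sed -i '').

```bash
#!/usr/bin/env bash
# recolor_toc PAGE.html   (hypothetical helper)
# Edit tree's built-in CSS in place instead of swapping in a whole header:
#  - add a black page background to the BODY rule
#  - change the colour of the .DIR class used for directory names
# The matched rule texts are assumptions; adjust them to your tree version.
recolor_toc() {
  sed -i \
    -e 's/^\([[:space:]]*BODY {\)/\1 background-color: black;/' \
    -e 's/^\([[:space:]]*\.DIR {\)[[:space:]]*color:[^;]*;/\1 color: #D1AF68;/' \
    "$1"
}
```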


Re: How to generate TOC with links to pdf library?

Posted: Wed Aug 18, 2021 2:35 pm
by geo_c
MochiMoppel wrote: Wed Aug 18, 2021 9:02 am

There are various ways to exchange the <style> section of the HTML produced by tree with the content of your own header file. This will reduce your 30sec workload to less than a second :lol:

The HTML code produced by the tree command is not great, partly even wrong, but for many users the result may be OK. For changing some colors or fonts it might be sufficient to replace them with a sed command instead of a separate external header file. Let's see if there is demand for that.

That's funny, because only in the last year have I become comfortable manipulating HTML. I looked at the header that tree generated and thought, "Yeah I completely get this. I could write this code from scratch eventually."

As a side note for HTML management, I just installed a browser add-on called 'Link Gopher'. It extracts all links from a page and makes a nice new list. Useful tool!

https://sites.google.com/site/linkgopher/


Re: How to generate TOC with links to pdf library?

Posted: Thu Aug 19, 2021 2:55 am
by MochiMoppel
geo_c wrote: Wed Aug 18, 2021 2:35 pm

I looked at the header that tree generated and thought, "Yeah I completely get this. I could write this code from scratch eventually."

HTML and CSS are two different things; both would benefit from a rewrite. Even leaving the code as it is, I don't understand what tree is doing. I can't remember, but maybe that's the reason why I once gave up on tree and wrote a script emulating the tree output. It's not difficult.

Here's an example of my tree worries. I tried to generate a tree for the directory /root/puppy-reference. AFAIK this directory exists in all Puppies and contains linked directories, mainly for multimedia files.

My code:

tree -lCH /root/puppy-reference /root/puppy-reference  > /tmp/trial.htm

This produced an HTML page with completely unusable links. It seems that tree is unable to handle linked directories (or I am unable to handle tree).
Next problem is the use of classes. Directories are formatted with class "DIR", which is fine, but the top node, which is always a directory, is formatted as class "NORM". Makes no sense. And then there is the arbitrary use of classes. Why are some WAV files formatted as "NORM" and others as "EXEC"? Same confusion with PNG and XPM files.
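A script emulating tree, as mentioned above, might start out as simply as this bash sketch (it is not MochiMoppel's own script, which wasn't posted). It uses find -L, so symlinked directories like the ones in /root/puppy-reference are followed and linked through the symlink path instead of the target path, and it writes a flat list rather than tree's indented one. The function name, the optional PATTERN argument and the assumption that filenames contain no newlines belong to this sketch only.

```bash
#!/usr/bin/env bash
# make_toc DIR [TITLE] [PATTERN]   (hypothetical helper)
# Print a flat HTML list of links to every file under DIR whose name matches
# PATTERN (case-insensitive, default '*.pdf'). Links are relative to DIR, so
# save the output inside DIR, for example:
#   make_toc /root/puppy-reference "Puppy Media" '*' > /root/puppy-reference/TOC.html
# find -L follows symlinked directories; links go through the symlink path.
# Assumes filenames contain no newlines; TITLE is inserted without escaping.
make_toc() {
  local dir="$1" title="${2:-Table of Contents}" pattern="${3:-*.pdf}"
  (
    cd -- "$dir" || exit 1
    printf '<!DOCTYPE html>\n<html>\n<head><meta charset="UTF-8"><title>%s</title></head>\n' "$title"
    printf '<body>\n<h1>%s</h1>\n<ul>\n' "$title"
    find -L . -type f -iname "$pattern" | LC_ALL=C sort |
      awk '{
        path = substr($0, 3)   # drop the leading "./"
        # percent-encode characters that would break the URL (% first)
        href = path
        gsub(/%/, "%25", href); gsub(/ /, "%20", href); gsub(/#/, "%23", href)
        gsub(/\?/, "%3F", href); gsub(/"/, "%22", href); gsub(/&/, "%26", href)
        # escape HTML special characters in the visible label
        text = path
        gsub(/&/, "\\&amp;", text); gsub(/</, "\\&lt;", text); gsub(/>/, "\\&gt;", text)
        printf "<li><a href=\"./%s\">%s</a></li>\n", href, text
      }'
    printf '</ul>\n</body>\n</html>\n'
  )
}
```

Because every href starts with ./, the generated page keeps working when DIR is moved or zipped, as long as the page is saved inside DIR.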


Re: How to generate TOC with links to pdf library?

Posted: Thu Aug 19, 2021 4:42 am
by geo_c
MochiMoppel wrote: Thu Aug 19, 2021 2:55 am
geo_c wrote: Wed Aug 18, 2021 2:35 pm

I looked at the header that tree generated and thought, "Yeah I completely get this. I could write this code from scratch eventually."

HTML and CSS are two different things; both would benefit from a rewrite. Even leaving the code as it is, I don't understand what tree is doing. I can't remember, but maybe that's the reason why I once gave up on tree and wrote a script emulating the tree output. It's not difficult.

Here's an example of my tree worries. I tried to generate a tree for the directory /root/puppy-reference. AFAIK this directory exists in all Puppies and contains linked directories, mainly for multimedia files.

My code:

tree -lCH /root/puppy-reference /root/puppy-reference  > /tmp/trial.htm

This produced an HTML page with completely unusable links. It seems that tree is unable to handle linked directories (or I am unable to handle tree).
Next problem is the use of classes. Directories are formatted with class "DIR", which is fine, but the top node, which is always a directory, is formatted as class "NORM". Makes no sense. And then there is the arbitrary use of classes. Why are some WAV files formatted as "NORM" and others as "EXEC"? Same confusion with PNG and XPM files.

Yeah, that's interesting. My file didn't generate any classes for links, but I'm not sure what you're doing with the -lCH options. That looks like it's designed to list any symbolic links as directories, using all colors and generating a tree with /root/puppy-reference as the top directory. But then I'm not exactly sure why that directory is entered twice in a row.

My script looks like this:

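# change to the library's top directory: tree lists the working directory,
# and the hrefs it writes are relative to it (stop if the cd fails)
cd /mnt/home/dbox.sync.mir/lib.pdf/cov.lib || exit 1

# -H ./ : base href, so every link is relative to this directory
# -o    : write the listing to the named file
# -T    : page title and <h1> heading
tree -H ./ -o ALL-CONTENTS.html -T "Table of Contents"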

And that script generates the previously posted header with the following body:

<body>
	<h1>Table of Contents</h1><p>
	<a href="./">./</a><br>
	├── <a href=".//ALL-CONTENTS.html">ALL-CONTENTS.html</a><br>
	├── <a href=".//A-LOOK-AT-WHATS-NEW-IN-THIS-UPDATE/">A-LOOK-AT-WHATS-NEW-IN-THIS-UPDATE</a><br>
	│   ├── <a href=".//A-LOOK-AT-WHATS-NEW-IN-THIS-UPDATE/Antibody%20Evolution%20after%20CoV-2%20mRNA%20Vaccination-SUMMARY.pdf">Antibody Evolution after CoV-2 mRNA Vaccination-SUMMARY.pdf</a><br>

Re: How to generate TOC with links to pdf library?

Posted: Thu Aug 19, 2021 5:12 am
by MochiMoppel
geo_c wrote: Thu Aug 19, 2021 4:42 am

My script looks like this:

Could you please use the same directory as I did? This would make it easier to compare and also easier for other users to try for themselves. Hopefully your attempts have better results than mine.

But then I'm not exactly sure why that directory is entered twice in a row

Because tree by default scans the working directory and not the directory supplied with the -H option. It would not be necessary if I first cd'd to this directory (as you do), but that would be two commands, and I prefer to use only the tree command.


Re: How to generate TOC with links to pdf library?

Posted: Thu Aug 19, 2021 6:37 pm
by Flash

Everyone probably already knows this, but hitting the ~ key while you're in a directory opens a console in that directory.


Re: How to generate TOC with links to pdf library?

Posted: Thu Aug 19, 2021 8:19 pm
by geo_c

Well, @MochiMoppel and @Flash, this is fun even though so far it's a fail.

I tried this command from the /root/puppy-reference directory:

tree -l -H /root/puppy-reference/ -o REFERENCES-2.html -T "Puppy Media"

and the file contained paths like these; tree apparently followed the link and then glued the tree's directory and the link target together into one path:

<h1>Puppy Media</h1><p>
	<a href="/root/puppy-reference/">/root/puppy-reference/</a><br>
	├── <a href="/root/puppy-reference//audio/">audio</a><br>
	│   ├── <a href="/root/puppy-reference/usr/share/audio/2barks.au">2barks.au</a><br>
	│   ├── <a href="/root/puppy-reference/usr/share/audio/2barks.wav">2barks.wav</a><br>
	│   ├── <a href="/root/puppy-reference/usr/share/audio/bark.au">bark.au</a><br>

So, either it's lacking some function, or I'm just not getting the options right.

Maybe there's a way to eliminate the /root/puppy-reference/ from the address.
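Going only by the output above, the broken entries differ from the good ones in one detail: normal entries read /root/puppy-reference//name, while symlink targets were glued on with a single slash (/root/puppy-reference/usr/...). A hypothetical sed post-processing step could exploit that, assuming every symlink target is an absolute path. It is a heuristic for this particular output, not a fix for tree itself.

```bash
#!/usr/bin/env bash
# strip_glued_prefix PAGE.html PREFIX   (hypothetical helper)
# Heuristic based on the output quoted above: normal entries look like
# PREFIX//name, but absolute symlink targets were glued on as PREFIX/usr/...
# Drop PREFIX from hrefs where it is followed by one "/" and a name, leaving
# PREFIX// and PREFIX/ itself untouched. PREFIX is used inside a sed regex,
# so it should not contain "#" or regex metacharacters such as ".".
strip_glued_prefix() {
  local prefix=${2%/}    # tolerate a trailing slash on PREFIX
  sed -i "s#href=\"$prefix/\\([^/\"]\\)#href=\"/\\1#g" "$1"
}
```

Applied as strip_glued_prefix REFERENCES-2.html /root/puppy-reference/, the 2barks.au entry would become href="/usr/share/audio/2barks.au", which a browser opening the page from disk should resolve directly.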


Re: How to generate TOC with links to pdf library?

Posted: Thu Aug 19, 2021 8:35 pm
by geo_c

I don't think we want to use the -l option, because it tells tree to follow the link as if it were a directory, so in a sense it's doing what we asked: appending the link target to the directory.

And I've tried quite a few combinations now. No luck with symlinks. But you know, that's not a problem for my initial project, which was to have a portable file directory with a linked table of contents to files at their actual address relative to the top directory.

I almost think tree could work for symlinks if it were run at the very top (which I tried, and got a tree of the whole system), combined with a pattern-matching option to include only the directories in puppy-reference.

But I have to go for now. I'll try it later.