Old 11-30-2009, 05:57 AM   PM User | #1
whopub
SRC URL extraction method - HTML or TXT to TXT...

Hi,

I'm looking for an app, or online form, to extract image URLs from HTML code saved in TXT files. From the SRC attribute of <IMG> tags, to be more exact.

I have several code snippets like this:

Code:
<img src="http://dummy.site.com/here/images/09/10065/file01.jpg" width="64" height="100" alt="image title" />
image name
<img src="http://dummy.site.com/here/images/09/10065/file02.jpg" width="64" height="100" alt="image title" />
image name
<img src="http://dummy.site.com/here/images/09/10065/file03.jpg" width="64" height="100" alt="image title" />
image name
<img src="http://dummy.site.com/here/images/09/10065/file04.jpg" width="64" height="100" alt="image title" />
image name
<img src="http://dummy.site.com/here/images/09/10065/file05.jpg" width="64" height="100" alt="image title" />
image name
<img src="http://dummy.site.com/here/images/09/10065/file06.jpg" width="64" height="100" alt="image title" />
image name
And I need an automated way to extract just the URLs and save them in a TXT file like this:

Code:
http://dummy.site.com/here/images/09/10065/file01.jpg
http://dummy.site.com/here/images/09/10065/file02.jpg
http://dummy.site.com/here/images/09/10065/file03.jpg
http://dummy.site.com/here/images/09/10065/file04.jpg
http://dummy.site.com/here/images/09/10065/file05.jpg
http://dummy.site.com/here/images/09/10065/file06.jpg
One URL per line.

The code snippets are not too big, just a bit over 100 entries for the bigger ones. I don't care if I have to do it one TXT at a time. Beats doing the whole thing by hand.
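One way the extraction described above could be done is with Python's standard-library HTML parser. This is only a minimal sketch, not a tool mentioned in the thread: the names `ImgSrcCollector`, `extract_img_urls` and `convert_file` are made up for illustration, and it assumes the TXT files are UTF-8 text.

```python
from html.parser import HTMLParser
from pathlib import Path


class ImgSrcCollector(HTMLParser):
    """Collects the src value of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC="..."> matches too.
        # Self-closing tags (<img ... />) are routed here by default as well.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.urls.append(value)


def extract_img_urls(html_text):
    """Return the list of <img> src URLs found in html_text."""
    collector = ImgSrcCollector()
    collector.feed(html_text)
    collector.close()
    return collector.urls


def convert_file(in_path, out_path):
    """Read one TXT file of HTML and write its image URLs, one per line."""
    urls = extract_img_urls(Path(in_path).read_text(encoding="utf-8"))
    Path(out_path).write_text("\n".join(urls) + "\n", encoding="utf-8")
    return len(urls)
```

Calling something like `convert_file("snippet.txt", "urls.txt")` (both file names are placeholders) would handle one TXT at a time. The "image name" lines between the tags are ignored because they are plain text, not tags.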

This is the sort of thing that makes me mad for not being a programmer! Any one of you guys could probably come up with a number of ways to pull this off in just a couple of minutes.

And I'm quite sure the tools to pull it off are already out there, but trying a search for it... well, let's just say there's way too much out there, and installing small random apps is really not safe.

I may be completely wrong, but I think I was once able to feed code like this to flashget, and it would just go through the whole thing and list the actual URLs it found in a confirmation box, allowing me to select just a few and copy them to the clipboard, in the exact same one-URL-per-line format I need here. But somehow my flashget installation got screwed up and now I can't figure out what version I was using. I've already tested 4 different ones and none of them seems to be able to do that.

I need those URLs in that format so I can then batch replace URL segments and, finally, feed the updated URLs to flashget. But the first step is extracting the initial URL from that code.
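The batch replace step mentioned here could look roughly like the sketch below, once the URLs are in a list. The function name `replace_segment` and the `/images/` to `/fullsize/` segments in the usage note are hypothetical; the post does not say which segments need replacing.

```python
def replace_segment(urls, old, new):
    """Return the URLs with the first occurrence of `old` swapped for `new` in each one."""
    # count=1 limits the change to the first match, so a segment that happens
    # to appear again later in the same URL is left untouched.
    return [url.replace(old, new, 1) for url in urls]
```

For example, `replace_segment(urls, "/images/", "/fullsize/")` would rewrite that one path segment in every URL and leave the rest of each line as it was.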

So, any ideas?


Thanks.


PS: hope I'm not screwing up by posting this here, but I really couldn't find a better match... And it IS HTML related, I guess.
Old 11-30-2009, 12:13 PM   PM User | #2
Jack Corzine
Why not open it up in a text editor and use the search and replace utility? Just search for <img src=" and replace it with an empty space, then search for the ending string and do the same thing.

Jack
Old 11-30-2009, 12:48 PM   PM User | #3
Rowsdower!
You could do this with JavaScript, or with PHP if you have a web host that supports it.

If you're looking for customized code to be built for you then this thread is probably most appropriately placed in the paid work forum.

If you want to learn to do it yourself and be guided then by all means make an effort of your own and we will help you sort out the issues you run into. The logic involved with this would be pretty simple.
Old 11-30-2009, 02:29 PM   PM User | #4
whopub

Hey guys.

Quote:
Originally Posted by Jack Corzine View Post
why not open it up in a text editor and use the search and replace utility? Just put in search for <img src=" and replace it with an empty space, then search for the ending string and do the same thing
That would be my default approach, but the end string is different every time, because the ALT attribute and its text are never the same!
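A search and replace can still cope with an ending that changes from tag to tag if the search is a regular expression instead of a fixed string. The sketch below is one possible pattern, not something posted in the thread; `IMG_TAG` and `strip_to_src` are illustrative names, and it assumes the src value is in double quotes and that no attribute value contains a ">" character.

```python
import re

# Matches a whole <img ...> tag and captures only the src value.
# [^>]*? skips any attributes before src; the trailing [^>]*> swallows
# whatever comes after it (width, height, alt, ...), however it differs per tag.
IMG_TAG = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*"([^"]*)"[^>]*>', re.IGNORECASE)


def strip_to_src(text):
    """Replace each <img> tag with just its src URL; all other text is left alone."""
    return IMG_TAG.sub(r"\1", text)
```

Like the manual search and replace, this leaves the "image name" lines in place, so they would still have to be removed in a separate step.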

Quote:
Originally Posted by Rowsdower! View Post
You could do this with javascript or PHP if you have a web host that supports PHP.
My webhost supports PHP, and I'll take any solution, it's just that I'm sure there are freely available ready-made solutions out there. From website 'suckers' to download managers, or even HTML tag strippers that can be customized. But, for sure, there's gotta be something that just goes through random text and collects just the HTML links. I just need to be sure about one, so I don't end up installing 10 or more before I get the right one.

Quote:
Originally Posted by Rowsdower! View Post
If you're looking for customized code to be built for you then this thread is probably most appropriately placed in the paid work forum.
I have a friend who can probably do it for free in just a few hours, it's just that the tools I need already exist. And he's a busy guy.

Quote:
Originally Posted by Rowsdower! View Post
If you want to learn to do it yourself and be guided then by all means make an effort of your own and we will help you sort out the issues you run into. The logic involved with this would be pretty simple.
My coding skills are zero. I do HTML and CSS, that's about it. Calling that coding is almost like elevating paper airplane throwing to space exploration...

Last edited by whopub; 11-30-2009 at 03:17 PM.. Reason: spelling
Old 12-01-2009, 08:08 PM   PM User | #5
whopub
Even an app that goes through text and just extracts all words starting with http will do (the " can easily be removed later).
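The "all words starting with http" idea can be sketched in a few lines of Python. The names `URL_WORD` and `extract_http_words` are made up here, and the rule for where a "word" ends (whitespace, a quote, or an angle bracket) is an assumption that fits the snippets in this thread rather than a general URL parser.

```python
import re

# A "word starting with http": http:// or https:// followed by everything
# up to the next whitespace, quote character, or angle bracket.
URL_WORD = re.compile(r"https?://[^\s\"'<>]+")


def extract_http_words(text):
    """Return every http(s) URL-like word in text, in order of appearance."""
    return URL_WORD.findall(text)
```

Because the pattern stops at the closing quote, the " never ends up in the result, and printing the list with `"\n".join(...)` gives the one-URL-per-line format.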

But still, there must be apps out there to suck URLs out of text files. Anyone?