<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: PDFBox : Extract Text from PDF</title>
	<atom:link href="http://thottingal.in/blog/2009/06/24/pdfbox-extract-text-from-pdf/feed/" rel="self" type="application/rss+xml" />
	<link>http://thottingal.in/blog/2009/06/24/pdfbox-extract-text-from-pdf/</link>
	<description>/home/santhosh</description>
	<lastBuildDate>Tue, 03 Apr 2012 15:32:44 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>By: Prakash</title>
		<link>http://thottingal.in/blog/2009/06/24/pdfbox-extract-text-from-pdf/comment-page-1/#comment-18949</link>
		<dc:creator>Prakash</dc:creator>
		<pubDate>Tue, 20 Sep 2011 06:50:28 +0000</pubDate>
		<guid isPermaLink="false">http://thottingal.in/blog/?p=168#comment-18949</guid>
		<description>The output for Indian languages is unrecognizable; I made a mistake in my previous comment. Thanks.</description>
		<content:encoded><![CDATA[<p>The output for Indian languages is unrecognizable; I made a mistake in my previous comment. Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Prakash</title>
		<link>http://thottingal.in/blog/2009/06/24/pdfbox-extract-text-from-pdf/comment-page-1/#comment-18948</link>
		<dc:creator>Prakash</dc:creator>
		<pubDate>Tue, 20 Sep 2011 06:48:56 +0000</pubDate>
		<guid isPermaLink="false">http://thottingal.in/blog/?p=168#comment-18948</guid>
		<description>Thanks for the nice explanation of this utility. I tried it and got very good results with English text, but when it came to extracting Unicode/Indian-language text from a PDF, the output was recognizable in many of the well-known fonts. This must be a well-known issue; can you please suggest a solution?</description>
		<content:encoded><![CDATA[<p>Thanks for the nice explanation of this utility. I tried it and got very good results with English text, but when it came to extracting Unicode/Indian-language text from a PDF, the output was recognizable in many of the well-known fonts. This must be a well-known issue; can you please suggest a solution?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: NAB</title>
		<link>http://thottingal.in/blog/2009/06/24/pdfbox-extract-text-from-pdf/comment-page-1/#comment-18883</link>
		<dc:creator>NAB</dc:creator>
		<pubDate>Mon, 19 Sep 2011 08:23:48 +0000</pubDate>
		<guid isPermaLink="false">http://thottingal.in/blog/?p=168#comment-18883</guid>
		<description>Hi Everyone,

I have the same problem as Patrik:

org.apache.pdfbox.exceptions.WrappedIOException
at org.apache.pdfbox.util.PDFStreamEngine.&lt;init&gt;(PDFStreamEngine.java:125)
at org.apache.pdfbox.util.PDFTextStripper.&lt;init&gt;(PDFTextStripper.java:120)
at PDFTextParser1.pdftoText(PDFTextParser1.java:33)
at PDFTextParser1.main(PDFTextParser1.java:56)
Caused by: java.lang.ClassCastException: org.pdfbox.util.operator.ShowTextGlyph cannot be cast to org.apache.pdfbox.util.operator.OperatorProcessor
at org.apache.pdfbox.util.PDFStreamEngine.&lt;init&gt;(PDFStreamEngine.java:119)
… 3 more

Does anyone know how to solve this? I am running on Windows XP. Thank you very much in advance for helping me sort this out.</description>
		<content:encoded><![CDATA[<p>Hi Everyone,</p>
<p>I have the same problem as Patrik:</p>
<p>org.apache.pdfbox.exceptions.WrappedIOException<br />
at org.apache.pdfbox.util.PDFStreamEngine.&lt;init&gt;(PDFStreamEngine.java:125)<br />
at org.apache.pdfbox.util.PDFTextStripper.&lt;init&gt;(PDFTextStripper.java:120)<br />
at PDFTextParser1.pdftoText(PDFTextParser1.java:33)<br />
at PDFTextParser1.main(PDFTextParser1.java:56)<br />
Caused by: java.lang.ClassCastException: org.pdfbox.util.operator.ShowTextGlyph cannot be cast to org.apache.pdfbox.util.operator.OperatorProcessor<br />
at org.apache.pdfbox.util.PDFStreamEngine.&lt;init&gt;(PDFStreamEngine.java:119)<br />
… 3 more</p>
<p>Does anyone know how to solve this? I am running on Windows XP. Thank you very much in advance for helping me sort this out.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: gayathri</title>
		<link>http://thottingal.in/blog/2009/06/24/pdfbox-extract-text-from-pdf/comment-page-1/#comment-18068</link>
		<dc:creator>gayathri</dc:creator>
		<pubDate>Mon, 05 Sep 2011 13:13:37 +0000</pubDate>
		<guid isPermaLink="false">http://thottingal.in/blog/?p=168#comment-18068</guid>
		<description>I need to extract only the line data (position, thickness, width, height) from a PDF into a text file. If anyone knows how to do this, please help.</description>
		<content:encoded><![CDATA[<p>I need to extract only the line data (position, thickness, width, height) from a PDF into a text file. If anyone knows how to do this, please help.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: vinz</title>
		<link>http://thottingal.in/blog/2009/06/24/pdfbox-extract-text-from-pdf/comment-page-1/#comment-15425</link>
		<dc:creator>vinz</dc:creator>
		<pubDate>Sun, 17 Jul 2011 08:03:01 +0000</pubDate>
		<guid isPermaLink="false">http://thottingal.in/blog/?p=168#comment-15425</guid>
		<description>It&#039;s awesome.

Thanks for sharing.</description>
		<content:encoded><![CDATA[<p>It&#8217;s awesome.</p>
<p>Thanks for sharing.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dazzle</title>
		<link>http://thottingal.in/blog/2009/06/24/pdfbox-extract-text-from-pdf/comment-page-1/#comment-10841</link>
		<dc:creator>dazzle</dc:creator>
		<pubDate>Tue, 29 Mar 2011 07:52:28 +0000</pubDate>
		<guid isPermaLink="false">http://thottingal.in/blog/?p=168#comment-10841</guid>
		<description>Thank you. It worked for every PDF I tried except this one. Can you please check what the issue is here?
http://cid-a3aa7f7d9888874d.office.live.com/self.aspx/Public/getting%5E_started%5E_with%5E_Flex3.pdf</description>
		<content:encoded><![CDATA[<p>Thank you. It worked for every PDF I tried except this one. Can you please check what the issue is here?<br />
<a href="http://cid-a3aa7f7d9888874d.office.live.com/self.aspx/Public/getting%5E_started%5E_with%5E_Flex3.pdf" rel="nofollow">http://cid-a3aa7f7d9888874d.office.live.com/self.aspx/Public/getting%5E_started%5E_with%5E_Flex3.pdf</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: nik</title>
		<link>http://thottingal.in/blog/2009/06/24/pdfbox-extract-text-from-pdf/comment-page-1/#comment-10795</link>
		<dc:creator>nik</dc:creator>
		<pubDate>Sun, 27 Mar 2011 17:10:46 +0000</pubDate>
		<guid isPermaLink="false">http://thottingal.in/blog/?p=168#comment-10795</guid>
		<description>Thanks, really works!

I&#039;ve tried to extract text from a particular region within a PDF with the current PDFBox build available on the Apache website and had no luck. I hope to do this with your jar file.</description>
		<content:encoded><![CDATA[<p>Thanks, really works!</p>
<p>I&#8217;ve tried to extract text from a particular region within a PDF with the current PDFBox build available on the Apache website and had no luck. I hope to do this with your jar file.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Best hosting service</title>
		<link>http://thottingal.in/blog/2009/06/24/pdfbox-extract-text-from-pdf/comment-page-1/#comment-9795</link>
		<dc:creator>Best hosting service</dc:creator>
		<pubDate>Thu, 03 Mar 2011 09:37:15 +0000</pubDate>
		<guid isPermaLink="false">http://thottingal.in/blog/?p=168#comment-9795</guid>
		<description>I have used your program and it really works for me. Thanks for sharing this code. I hope to see more useful code of this type.</description>
		<content:encoded><![CDATA[<p>I have used your program and it really works for me. Thanks for sharing this code. I hope to see more useful code of this type.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sakil Imran</title>
		<link>http://thottingal.in/blog/2009/06/24/pdfbox-extract-text-from-pdf/comment-page-1/#comment-9607</link>
		<dc:creator>Sakil Imran</dc:creator>
		<pubDate>Sat, 26 Feb 2011 22:42:43 +0000</pubDate>
		<guid isPermaLink="false">http://thottingal.in/blog/?p=168#comment-9607</guid>
		<description>It really works. :) Thanks for your work.</description>
		<content:encoded><![CDATA[<p>It really works. <img src='http://thottingal.in/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Thanks for your work.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: arlisa</title>
		<link>http://thottingal.in/blog/2009/06/24/pdfbox-extract-text-from-pdf/comment-page-1/#comment-9320</link>
		<dc:creator>arlisa</dc:creator>
		<pubDate>Sun, 20 Feb 2011 04:21:12 +0000</pubDate>
		<guid isPermaLink="false">http://thottingal.in/blog/?p=168#comment-9320</guid>
		<description>Hi,
I&#039;ve tried the code, and it works. :) Now I want to ask how to read a PDF file by its sections. I mean, if I have a PDF file and I don&#039;t want to read all of its contents, can you help me read it section by section? For example, I might want only the content of the Introduction section, or only the table of contents.
 Thank you.</description>
		<content:encoded><![CDATA[<p>Hi,<br />
I&#8217;ve tried the code, and it works. <img src='http://thottingal.in/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Now I want to ask how to read a PDF file by its sections. I mean, if I have a PDF file and I don&#8217;t want to read all of its contents, can you help me read it section by section? For example, I might want only the content of the Introduction section, or only the table of contents.<br />
 Thank you.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
