How to extract specific text from pdf in java

How to extract specific text from pdf in java. The system is open-source and provides a simple baseline function for extracting text from primary research articles using rules that developers can customize. Dec 26, 2017 · I am trying to extract data from a specific rectangular region specified by two coordinates given inside a PDF. jar filepathToPdf pageToExtract x y width height. Jan 20, 2020 · How can I extract specific information from a pdf file and save it as a table in a xlsx file? For example, I want to take specific information from a report of a company like the release date of th In addition to text and hyperlinks, PDFBox provides the provision to extract images from a document. Dec 8, 2015 · With Gnostice PDFOne (for Java) library, you can directly perform the search in the PDF using a regex (another regex, one created for email addresses, link given above). then ( function (pdf) {. *; import java. 3. (I know PDFBox nicely extracts the text in a specific rectangular area of a page. PdfDocument doc = new PdfDocument(); doc. Call writeText method. You could only use heuristics if you know what to expect. May 29, 2015 · This tutorial shows you how to extract text from a PDF file in simple steps using JPedal Java PDF library. Jul 21, 2011 · Essentially, a very unorganized group of PDFs. pdfbox. listdir(): if PDF_file. But there is no magic solution that will work for all documents. I have limited programming experience and virtually none with Java (used C++ when I was younger). I have used PDFTextStream, but it doesn't work with this file. matcher(text); while (m. Pdf can be used to extract images from PDFs, too. 7. int i; // page no. How to extract content from a PDF using java. Solution. Download the file with the extracted text. You merely have to check whether the image or text which you expect can be found by these classes positioned Extracting text from a PDF document is a common requirement. Please feel free to comment, as this is my first post. file. Please read javadoc on InvalidPdfException. sax. substring (<start index>, <end index>) additionally, you can use the method. nio. itextpdf. Look at the samples referenced from the keywords PARSING PDF > EXTRACTING IMAGES and PARSING PDF > EXTRACTING TEXT. Apr 20, 2015 · The coolest way to extract bookmarks, is by creating an XML file that shows the bookmarks in a nice hierarchical way: PdfReader reader = new PdfReader (src); List<HashMap<String, Object>> list = SimpleBookmark. the Sep 6, 2010 · 6. import java. fromFile () method. Create an instance of AsposeOcr class to apply OCR to an image. The following code shows how to parse PDF document and get values from the filled PDF form Jul 26, 2016 · Extract images from pdf file using PDFBox. You may try Algodocs: algodocs. This library provides PDFTextStripper class which is used to strip text from PDF files. parser. pdf'): for page Oct 1, 2014 · PDFTextStripperByArea stripper = new PDFTextStripperByArea(); stripper. i'm having some troubles using Apache TIKA (version 1. js. It takes two parameters: the starting index (inclusive) and the ending index (exclusive) of the desired substring. But as you do not share the pdf in question, it is hard to tell Jan 10, 2015 · The result of PDFTextStripper is plain text. layout import LTTextContainer, LTChar,LTLine,LAParams import os path=r'path\whereyour pdffile' os. group(2)); but without seeing the information you want to save I couldn't help you with getting it. one way is you can use window. text from both columns in the same line) this is the code: Jul 24, 2023 · Add a comment. If you don't want that then you should use a calculation script for Text2, instead. // Import Mar 28, 2013 · You can set parameters on the PDFTextStripper to read particular pages: PDDocument doc; // document. For PDF files, PDFBox should be able to give you some page stuff. groupdocsdev. load(file); PDFTextStripperByArea stripper = new PDFTextStripperByArea(); Nov 16, 2016 · Here's a shorter (not necessarily better) version: async function getPdfText(data) { let doc = await pdfjsLib. Set page boundaries (from first page to last page) to strip text and call the method writeText. Extract text from PDF using Java. Open the PDF using Document class and call the Accept method of the Pages collection. We need to extract the value of Invoice Number, Due Date and Total Due from the whole PDF file. Break & Mix − With the help of PDFBox, you can divide an individual PDF document into multiple documents, and mix them back into a single document. The TextPosition class has getXDirAdj () and getYDirAdj () methods which transform coordinates according to the direction of the text piece the respective TextPosition object represents (Corrected based on comment from @mkl) The final output is consistent Feb 12, 2016 · I am using iText to extract some text from a pdf file at a specific location. getResources () method of PDPage class gives you the list of all resource objects (like images Sep 2, 2015 · 9. Following is a step by step process to extract text line by line from PDF. *; import com. Initialize AsposeOcrPdf object to read text from the PDF. You'll need to work with the underlying libraries - Tika doesn't do anything at the page level. String originalString = "Hello, world!"; int startIndex = 7; int endIndex = 12; Feb 18, 2017 · for getting a part of a string, use the method. Say, if there is a single word, whose letters are each presented with a different font, then each letter would be a separate Oct 28, 2016 · I need to extract text with its coordinates using c# i am using pdfboxnet using c# and here it is class MyTextStripper : PDFTextStripper { protected override void processTextPosition(TextPos To resolve this I used PDFTextStripperByArea and using coordinates I extracted the data column by column for each row. Each e_text element represents a text run, which represents a sequence of text glyphs that use the same font and graphics attributes. com. indexOf (<character>) in order to get the index of a given character. Upload the File. load("Input_Docs\\input_doc. public class GetCharLocationAndSize extends PDFTextStripper {. Mar 10, 2021 · I have working code which extracts text from pdf with the size of the font. OCR in your project by specifying Maven Repository configurations. Paths; I need to extract text (word by word) from a pdf file. Yes, you can extract annotations from a specific page by accessing that page’s annotations using Aspose. Add a page to PDF using PdfDocument. extractRegions(docPage); String textForRegion = textStripper. this video might help: Pratik_Wavhal (Pratik Wavhal) July 9, 2020, 7:03am 8. load(file); PDFTextStripper pdfTextStripper = new PDFTextStripper(); String text = pdfTextStripper. The code is simple. I am using pdfbox to extract text from pdf files. There are differences, but you can define rows, cells in rows, column headers, row headers, etc. FileWriter; public class ExtractText { public static void main (String [] args) throws Exception { //Load the PDF file PdfDocument pdf = new PdfDocument (); pdf. endswith('. Installation pip install pymupdf pip install fitz. Solution 1: String[] lines = myString. That means each page is just an image. getPageN(i); PdfIndirectReference ir = d. ParseContext; import org. Document Headings and Sub headings. find()) System. util. Let me try itextpdf. Nov 19, 2018 · Hi, In this video, we will include PDFBox library in our java project, after including library we will write code to extract text from pdf file. What other OCR libraries are capable of doing it? Please Help. Fill Forms − With the help of PDFBox, you can fill the application data in a document. Operator; import Oct 26, 2014 · PDFBOX, Reading a pdf line by line and extracting text properties. From the Maven repository, configure Aspose. I have tried to convert the pdf to excel/text using tabula library. May 2, 2011 · Is there any way to extract the text of a specific region using ICEpdf? I was able to extract whole pages, but that's not what I want to do. It is problem also in second solution, that is why I am looking for a PDFBox based solution. Nov 30, 2012 · public LineSegment getBaseline(); // the baseline for the text (i. Oct 6, 2015 · In general. extract() method will extract all text from a specified page without needing to specify a certain extract option. with help of PDfminer, I have achieved this job. read the whole content and splitting the required content or finding the text by it's exact position. Desired result Sep 30, 2020 · How to extract some of the specific text only from PDF files using python and store the output data into particular columns of Excel. Specify the image size and position in the page. I want to get the same result like in the picture below. addTable () method. To extract the text from the pdf, we need to follow Jun 4, 2023 · This object represents the entire PDF document and serves as a starting point for text extraction. Provide your attempts and if you have a certain problem post a specific question. Thank you. tika. extractRegions(page); System. pdf" , null ) ; // get the first page PDFPage page = pdfDoc. My goal is to extract the text of the PDF files anyway. Sep 18, 2020 · I need my programm to move pdf without text in other folder, but now I'm getting just an empty txt file. Apr 22, 2021 · 1 Correct answer. ) using PDFBox, you instantiate a PDFTextStripper or a class derived from it and use it like this: PDFTextStripper stripper = new PDFTextStripper(); String text = stripper. promise; let pageTexts = Array. BufferedWriter; import java. But for now, in this article, we are going to concentrate on PDF (Portable Document Format) files. the line that the text 'sits' on) public LineSegment getAscentLine(); // the ascentline for the text (i. com get coordinates of text in pdf java: Parsing PDFs Part 1 ( iText 5) - In Depth Tutorials and Information get coordinates of text in pdf java: Example of using Java & PDFBox to extract text positions from a java read pdf to Apr 22, 2013 · LA-PDFText is a system for extracting accurate text from PDF-based research articles (and an interface to be able to improve performance where needed). You’ll also learn how to extract specific sections or keywords, making your PDF data easily accessible and usable in your Java applications. PDF for Java has a tool to make it easy to retrieve tables. – An element of type e_text directly corresponds to a Tj element in the PDF document. Hi @muthu. loadFromFile ("sample. 2. FileInputStream; import org. Initialize the Parser object with the target PDF form. Jul 8, 2020 · You can extract any specific field , text or tble from PDF using UiPath document understanding and intelligent OCR activities. out. getText (int rowNum, int colNum) method. Traverse the collected data to get the desired field values. classes; import java. pdf Dec 30, 2021 · Steps to Extract Text from Scanned PDF in Java. Aug 21, 2022 · For the solution which converts PDF to Text by page, use the following code snippet instead. Print − With Our tutorials guide you through the process step by step, teaching you how to extract text from PDF pages, handle complex formatting, and even work with encrypted PDFs. substring (3,7); Here 3 specifies the beginning index, and 7 specifies the stopping point. For the above list I could imagine its only Jul 18, 2012 · Matcher m = p. Is it possible to do this in a PDF or would I have to convert it into a image and use OCR? If so, does PDFBox or iText include a way to analyze images via OCR? Thanks! Following is a step by step process to extract coordinates or position of characters in PDF. io. Image 2:- Extract text from PDF preview. split(System. getDocument ( data ). After extracting it, therefore, it is too late. -1. close (); Share. I tried changing line to line=scanner. getField ("Text2"). metadata. Extract Text from PDF Documents using Java. toString () provided by the parent class. This site is supposed to be used to manage specific programming problems. getPages (). this can help you find index values for your string if you don't know them or want to use a character as the reference instead. PDF for Java? Aspose. Aug 27, 2019 · I want to extract specific details from the pdf that has a table format and want to save it to excel using python. value. Extract text content: Create an instance of PDFTextStripper and set any desired configuration Jul 17, 2020 · Effective screenshot: Extract text from a particular page of PDF: import com. getDocument({data}). Add the reference to Aspose. value = event. Asking for help, clarification, or responding to other answers. PDF for Java provides the PdfTextExtractor class to extract text from a searchable PDF and the PdfTextExtractOptions class to manage the extract options. . If you are trying to automate a test case where you want to fetch a text fro May 31, 2018 · Extract Text − With the help of PDFBox, you can extract Unicode text from PDF documents. Dec 9, 2020 · The following simple steps for how to extract field values from PDF form. <your string>. The string returned contains all the characters from the beginning index, up to, but not including, the ending index. Loop Feb 16, 2015 · Your code prints as supposed, but it creates a new line with every word instead of the phrases. exportToXML (list, new FileOutputStream (dest), "ISO8859-1", true); reader. PDFStreamEngine; import org. Apache Tika is basically a toolkit that extracts data from many types of documents, including PDFs. Decide which pages to be analyzed and apply visit to the desired pages. Aug 26, 2017 · In this video, we will learn about how to read PDF document in selenium commands. Explore our fully-featured PDF editor that lets you modify your PDFs, create Jul 2, 2013 · The following example would extract all the text only from the first page of the PDF: /**. getBookmark (reader); SimpleBookmark. println(m. println(textForRegion); } } Run it from command line, ex: xxx. I read the pdf document as follows. Extracting text from a pdf file using Java is quite easy using the Apache PDFBox Java library. File file = new File("fileName. tabula. OCR in your project to read scanned PDF text. Or you can explore more like itextpdf, etc. 5. * @param {PDFDocument} PDFDocumentInstance The PDF document obtained. try (FileWriter writer = new FileWriter(outputFile)) {. PDFTextStripper reader = new PDFTextStripper(); reader. Loop through the tables in the array. Load an image using PdfImage. text. edited Mar 1, 2013 at 9:55. getSelection () to get the co-ordinates of the text selected and using toString () to convert it to the text format. So from your img you want to extract Full Name, Reg Number, Mobile Number, Jul 9, 2020 · How can I search for text and get position in pdf with java ? I tried with apache pdfbox and pdfclown but whenever the text goes down or start a new paragraph, it doesn't work. In this example, we are going to use the following input PdfWithTable. In this example, you’ll see how Aspose. * Retrieves the text of a specif page within a PDF Document obtained through pdf. extractTable (int pageIndex) method. In case of the PDFTextStripper you have to override. I've done it successfully with the following code. I got some PDF files which are just scanned pieces of paper. Surely this is possible. You need to code this way: . Provide details and share your research! But avoid . 0. I am new with pdf files. Choose or drop the PDF file from which you would like to extract text. Script i have used so far: Jan 23, 2018 · Since the positons of the desired text keep changing for different files, I can't use the ordinary approaches i. ng2-pdf-viewer: get coordinates of highlighted text. 10). chdir(path) Extract_Data=[] for PDF_file in os. Create a Java Class and extend it with PDFTextStripper. Jan 8, 2024 · In this article, we learned two different ways of reading PDF files in Java. setStartPage(i); reader. It offers functions to perform a text search and get the search results: To specify the search pattern and options, use functions TextSearch. getAsIndirectObject(PdfName. String s = "ADACADABRA"; String s2 = s. The data needs to be parsed and assembled from many sources to create the text in a PDF file. Right now, i can extract data by taping the area coordinations using this code: I need to extract text from pdf files using iText. Jun 24, 2009 · (Question answered in the comments. PDFParser Java provides an in-built package that provides a class PDFParser, which parses the contents of PDF documents. Instantiate the DocumentRecognitionSettings class object for setting the recognition parameters. The end result would be a large text file containing the contents of the field within each individual PDF. As you talk about text extraction with PDF Clown, I assume you are using the TextExtractor class of that library. * @param {Integer} pageNum Specifies the number of the page. tmwanik. But indeed, text from broken pdfs in particular is likely to fail with it. Image 1:- Input file for PDF to Text extraction. separator")); Solution 2: Apr 16, 2016 · I have a pdf file (some part of it given below), and want to extract text from it. This library works only with PDF files. SetStartPage (only useful for a text That gave me all the images, but the images were in the wrong order. apache. To extract text from all the PDF pages: Create an object of the TextAbsorber class. Both libraries offer simple and effective APIs for extracting text from PDF documents. Metadata; import org. If I wanted a new PDF I would put the information into Microsoft Word or Google Docs and save a PDF. The data Apr 28, 2019 · Text Search. Firstly, upload the PDF document to the cloud using the code example given below: package com. Create a PdfDocument object. There are multiple ways of doing this. pdf) Link to the full PDF file File. getTextContent (). Or it does not contain an FDF Form. PDF for Java. Save extracted text to a text file using FileWriter object. Add validation code for parameters and some usage info. Use form field extractor. I am trying to extract text coordinates and line (or rectangle) coordinates from a PDF. Extract the text from PDF and then validate if the text you are looking is present in the PDF document or not. PDFJS. . pdf. getTextForRegion("region"); System. The problem is: some pdf files contain 2 columns and when I extract text I get a text file where columns are merged as the result (i. addRegion(regionName, region); stripper. jar ExtractText [OPTIONS] <inputfile> [output-text-file] Options: -password <password> : Password to decrypt document -encoding <output encoding> : UTF-8 (default) or ISO-8859-1, UTF-16BE, UTF-16LE, etc. To extract table data, you should perform the following steps: Open document - instantiate a Document object; Create a TableAbsorber object. BodyContentHandler; public class Mar 14, 2019 · 3. Refer to the code example in the article for details. pdf"); In this tutorial we will now see to validate Print PDF functionality by using below two ways. – Jan 1, 1997 · This question deals about doing your work. pdf"); // Obtain all website addresses in page 2. PDFParser; import org. Take your PDF files to another level with PDF Architect. Sorted by: 0. My next attempt looked like this: PdfDictionary d = reader. PDDocument document = new PDDocument(). See Question with no answers, but issue solved in the comments (or extended in chat)) @WeloSefer wrote: maybe this can help you get started Jun 30, 2016 · If you are going to go for the second option then you would need to extract coordinates of the text also. To extract text (with or without extra information like positions, colors, etc. Set start page and number of pages in the PDF for Mar 1, 2013 · answered Mar 1, 2013 at 9:55. Jun 6, 2023 · Here’s how to do it: Upload the PDF file to the cloud. However, since the image rendering works a lot better in ICEpdf, I'd like to use that library. Change PdfReader pr=new FdfReader("C:\\Users\\saviour\\Desktop\\doc308-999. Aug 23, 2019 · How to extract text from PDF document using Apache PDFBox API in Java. What I'd like to do is parse each file, pulling the contents of one specific field, and dumping the output to a text file. from pdfminer. Extend PDFTextStripper. pop (); The code above will work, but will allow the user to overwrite that value (unless Text2 is set to read-only). getText(document); writer. 1. Alternatively, if all you want is to to "extract all the Jul 9, 2018 · I tried to get all the lines from file in an array, but it is unstable, because unable to filter to a specific text. m. pdf"); PDDocument document = PDDocument. spire. Document images. Pdf library may be used to extract text from PDF files as plain text or as a collection of text chunks with coordinates for each chunk. What other types of annotations can I extract using Aspose. File; import java. PDF for Java allows extracting text from all the pages of a PDF document. This is provided that you have a fixed format pdf. *. 4. NoSuchElementException" that seems to be on that same line. This class offers numerous attributes helping to restrict the parsing area: public void setAreas(List<Rectangle2D> value); public void setAreaTolerance(double value); public void setAreaMode(AreaModeEnum Oct 13, 2020 · Extract and Strip Text From PDF in Java Example. I also get an exception "java. My tesseract is set up correctly and extracting JPG and PNG files works like a charm. getTextForRegion("region")); For most documents this code will extract no text because it looks at a small (10x10 pt) region in the upper left region of the second May 21, 2013 · 1 Answer. Docotic. Then you need to think on how to use those information to get the details you need. Following is the program to extract content from a PDF using java. add () method. Call the parseForm method to get all the data from the PDF form. Feb 6, 2014 · iText offers the classes of the parser package which allow extraction of text and/or bitmap images from content streams. Back. The tabular data will be scanned, and the result will be saved in Feb 1, 2023 · Extract Text from PDF using Java# Let’s explore the details to extract text from PDF using Java cloud SDK. We used iText and Apache PDFBox libraries to extract text from a sample PDF file. with many pdf's. pdf Apr 12, 2022 · The following code sample shows how to extract text from a PDF file using Java. Jun 28, 2021 · 1. Share. Jul 16, 2021 · The specified text can be retrieved using the method ContentHandlerDecorator. You can override writestring method of pdfTextStripper class (you can google about this). ) Mar 15, 2016 · Here is a sample code to extract text contained within a rectangle, at a specific position on a PDF page. To extract the data from a PDF, all you need to do is create a Parser class that implements the Parser interface and define a parse () method: public void parse(. g. DOI, ISBN, ISSN, PUI etc. operator. To create a table in Word, you use the Section. this. iText in Action contains a good overview of the limitations of text extraction from PDF, regardless of the library used (Section 18. One of my seniors instructed me to try with PDFTableExtractor class. println("Region is "+ stripper. split (" "). In order to do that I am using the LocationTextExtractionStrategy: Dec 29, 2021 · Steps to Extract Text from Image using Java. As usual, the complete source code for the examples is available over on GitHub. Our Feb 20, 2014 · There is a possibility that the pdf file you are trying to load is either not a pdf file. group(1) + ": " + m. Download jar file ; java -jar pdfbox-app-2. – Apr 15, 2015 · You can try Apache Tika. For Word, HWPF and XWPF from Apache POI don't really do page level things - the page breaks aren't stored in the file, but instead need to be calculated on the fly based on the text Aspose. But you can override certain methods of it and only let through text which is formatted according to your wishes. Oct 22, 2021 · Loop through the pages in the PDF, extract tables from each page into a PdfTable array using PdfTableExtractor. 2: Extracting and editing text), and a convincing explanation why the library Extract Words from PDF Document. Foxit PDF SDK provides APIs to search text in a PDF document, a XFA document, a text page or in a PDF annotation’s appearance. What you get in each step is a promise. Apr 28, 2011 · 5. Use Substring: To extract a substring from a string, you can use the substring () method. Feb 7, 2017 · I followed some previous advice on StackOverflow here (Extract specific parts of PDF documents) to extract data from PDFs. Extract Text from Specific Page of a PDF Document using Java# You can parse a PDF document and extract text from a specific page by following the simple steps mentioned below: Firstly, load the PDF file using the Parser class. Wait a few seconds while the text is being extracted. InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException {. CONTENTS); PdfObject o = reader. Tagged is really powerful, but unfortunately most of the PDFs found in the wild aren't tagged. PDF for Java supports various annotation types, including text annotations, highlight Nov 15, 2023 · After that, you can get the text value of a specific cell using the PdfTable. PDFBox Tutori Apr 6, 2017 · I am attempting to use Apache Tika™ to extract the following information from PDF documents. Disclaimer: I work for Bit Miracle. getText(document); (There are a number of PDFTextStripper attributes allowing you PdfBox 2. This sample is using Qoppa’s jPDFProcess library : // Load the PDF document PDFDocument pdfDoc = new PDFDocument ( "C: \\ doc. – Apr 21, 2022 · The following are the main steps to insert an image to a new PDF document in Java. You can try the following code change and get expected result. This will capture your API key and file input parameters in the same way as the first: Java. import org. protected void processTextPosition( TextPosition text ) Aug 8, 2022 · The new pdf. js has a function: page. List of Authors (Not just the lead Author) The document Abstract. iText has more low-level support for text manipulation, but you'd have to write a considerable amount of code to get text extraction. PDFParser parser = null; String text = ""; PDFTextStripper stripper = null; PDDocument pdoc = null; COSDocument cdoc = null; File file = new File("path"); Problem Description. SetPattern, TextSearch. getPage ( 0 ) ; // Define your position rectangle Oct 18, 2016 · textStripper. nextLine() but it does not print anything. the line that represents the topmost extent that a string of the current font could have) public LineSegment getDescentLine(); // the descentline for the text (i. Here is the sample input PDF file (File. 3 has a command line tool as well. **/. You can get the text content from that. (However it worked with other file, that has simple text). Efficiency and Precision. Now let’s try to understand the above specified code snippet: Apr 10, 2017 · PDF doesn't have a feature to extract titles, unless it's a tagged PDF, and it almost never is, and PDFBox can't extract these anyway. getProperty("line. from If using a specific strategy * is important, use {@link PdfTextExtractor#getTextFromPage(PdfReader, int, TextExtractionStrategy)} * @param reader the reader to extract text from * @param pageNumber the page to extract text from * @return the extracted text * @throws IOException if any operation fails while reading from the provided PdfReader Aug 9, 2019 · Thanks for contributing an answer to Stack Overflow! Please be sure to answer the question. Oct 17, 2016 · Back then itext text extraction essentially was merely a proof of concept, not ready for production use. 1,6431420. The code i'm using looks like that (don't How to extract text from PDF files. Why use a third party library to handle PDF files? PDF files are a very complex binary/text hybrid data structure. Jul 10, 2015 · You can think of a table inside a tagged PDF as if it were an HTML table. -console : Send text to console instead of file -html : Output in HTML format instead of raw text Jun 4, 2015 · I am working on a program that extract texts from a PDF file in a specific area, I am using java and iText library. It works perfectly for PDFs with tables that even span to hundreds of pages. contentstream. e. By default, the PdfTextExtractor. high_level import extract_pages from pdfminer. List of citations. The question is whether it can be done pdfbox example code how to extract text from pdf file with java: PDFBox – How to read PDF file in Java – Mkyong. write(text); Dec 2, 2022 · Spire. getText(doc); As far as I'm aware, PDPage is more used with representing a page onscreen, rather than Mar 8, 2024 · Extracting text from a PDF file using the PyMuPDF library. pdf file. Read image using RecognizePage method for text extraction. then ( function () {}) to proceed to the next step. A very simple way without using any third party libraries. All associated document Identifiers, e. getPdfObject(ir. getNumber()); PdfStream stream = (PdfStream) o; // rest from example above. setEndPage(i); String pageText = reader. This library can be included using Gradle, maven, and other builds systems from the Maven repository. PyMuPDF is a Python library that supports file formats like XPS, PDF, CBR, and CBZ. ax nt wn vb bm mk dt zm hp dc