4WebHelp

 XML and HTML characters etc
Darren
Team Member
Joined: 05 Feb 2002
Posts: 549
Location: London

Posted: Sat May 31, 2003 2:18 pm

I'm using XML files to hold data for some product reviews.

I'm then parsing it with PHP and displaying the data using HTML.

There's a few things I was wondering about:

1. the use of HTML character entities, e.g. &bull;
I'd like to be able to use them within the text I'm storing, but they don't get included in the final output. An &nbsp; will even cause the text after it not to appear. Am I missing something or is this normal behaviour?

2. new lines
I don't seem to be able to store new lines in the normal way, e.g. adding a couple of <br />'s is a no-no, as is leaving new lines within the text and then using something like PHP's nl2br() function. At the moment I'm just using a | character as a new line delimiter and then using ereg_replace() to convert it to a <br />. Not ideal. Any thoughts?

3. putting links within text
Similar to the new line issue: I can't use <a></a>, so I'm using [a][/a] and ereg_replace() to change it. Is there a better way?

p.s. This may be better suited to server-side coding - wasn't sure ;)
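
On the stop-gap described in points 2 and 3: ereg_replace() was deprecated in PHP 5.3 and removed in PHP 7, so the same conversion is probably better expressed with str_replace() and preg_replace(). A minimal sketch, assuming links are written as [a]URL[/a] (the post does not show the exact format) and using a hypothetical helper name, render_markup():

```php
<?php
// Hypothetical helper: turns the ad-hoc markers from the post into HTML.
// Assumes "|" marks a line break and "[a]URL[/a]" marks a link.
function render_markup(string $text): string
{
    // Escape first so stray <, > and & in the data cannot break the page.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    // "|" becomes a line break.
    $html = str_replace('|', "<br />\n", $html);
    // "[a]URL[/a]" becomes a link whose text is the URL itself.
    return preg_replace('~\[a\](.+?)\[/a\]~', '<a href="$1">$1</a>', $html);
}

echo render_markup('Minimal dust|See [a]http://example.com/[/a]');
```

Escaping before inserting the tags keeps the generated <br /> and <a> intact while anything else in the stored text is shown literally. The sketch does not check the URL scheme, which a real page would want to do.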
verto
Senior WebHelper
Joined: 14 Jan 2002
Posts: 220
Location: Cambridge MA USA

Posted: Sun Jun 01, 2003 2:58 am

Did you write your own DTD or schema, Darren? I.e., what or whose tag set are you using?

Without seeing your code, I'm guessing you may be confusing some capabilities of HTML with XML's. HTML tags such as <a> and <br> inside your XML don't automatically take on the same meaning they have in HTML, and also won't automatically generate a tag with the same name in your output. The situation with HTML special characters is a bit similar, but there are actually a lot of related issues there that can interact to trip you up.

Maybe you can post a simplified version of your code. There're lots of alternatives when it comes to generating HTML from XML; what may sometimes be a shortcut can other times be the long way round.

________________________________
>>>>>>>>>>>>>
GENERAL DISCLAIMER:This disclaimer may be void where null in all cases unless explicitly not unprohibited or (p)re-exclusively assigned by sufficient presedimentation on behalf of every non-interested party to wit (or so it was said).
:::
.: :. . : :....: :.: .: :. verto .: :. . : :....: :.: .: :.
Darren
Posted: Mon Jun 02, 2003 7:54 am

Firstly, here is an example of the XML file...
Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE product_review [
  <!ELEMENT details (product_type, product_name, product_description, product_image, product_rating, product_where, product_pros, product_cons)>
  <!ELEMENT product_type (#PCDATA)>
  <!ELEMENT product_name (#PCDATA)>
  <!ELEMENT product_description (#PCDATA)>
  <!ELEMENT product_image (#PCDATA)>
  <!ELEMENT product_rating (#PCDATA)>
  <!ELEMENT product_where (#PCDATA)>
  <!ELEMENT product_pros (#PCDATA)>
  <!ELEMENT product_cons (#PCDATA)>
  <!ELEMENT review (title, rating, body, reviewer, date)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT rating (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ELEMENT reviewer (#PCDATA)>
  <!ELEMENT date (#PCDATA)>
]>
<product_review>
  <details>
    <product_type>litter</product_type>
    <product_name>Bio-Catolet</product_name>
    <product_description>100% paper based cat litter</product_description>
    <product_image>litter_biocatolet.jpg</product_image>
    <product_rating>4</product_rating>
    <product_price>0.00 (5 litres 2.2Kg)|0.00 (12 litres 4.2Kg)</product_price>
    <product_where>Supermarkets, pet stores</product_where>
    <product_pros>&#149;100% paper|Minimal dust|Reduces odour</product_pros>
    <product_cons>Expensive if used as floor covering|Small pieces can easily leave cage</product_cons>
  </details>
  <review>
    <title>Review Title</title>
    <rating>4</rating>
    <body>I have used Bio-Catolet for all my rats and although it is a cat litter it is suitable for rats as it is 100% paper.||You can find it in Tesco's for about 4.99 for a 12 litre bag in the pet section.</body>
    <reviewer>Persons Name</reviewer>
    <date>30 May 2003</date>
  </review>
  <review>
    <title>Review Title?</title>
    <rating>4</rating>
    <body>I have found this in Tesco and Pets at Home for the same price, 4.19 for a 12 litre bag.</body>
    <reviewer>Persons Name</reviewer>
    <date>30 May 2003</date>
  </review>
</product_review>

And here is the php...
Code:
function startElementHandler($parser, $element_name, $element_attribs)
{
   global $xml_current_tag_state;
   $xml_current_tag_state = $element_name;
}

function endElementHandler($parser, $element_name)
{
   global $review_counter, $review_data, $xml_current_tag_state;
   $xml_current_tag_state = '';
   if($element_name == "REVIEW") $review_counter++;
}

function characterDataHandler($parser, $data)
{
   global $product_data, $review_counter, $review_data, $xml_current_tag_state;
   if($xml_current_tag_state == '')
      return;
   if($xml_current_tag_state == "PRODUCT_TYPE") $product_data["type"] = $data;
   if($xml_current_tag_state == "PRODUCT_NAME") $product_data["name"] = $data;
   if($xml_current_tag_state == "PRODUCT_DESCRIPTION") $product_data["description"] = $data;
   if($xml_current_tag_state == "PRODUCT_IMAGE") $product_data["image"] = $data;
   if($xml_current_tag_state == "PRODUCT_RATING") $product_data["rating"] = $data;
   if($xml_current_tag_state == "PRODUCT_PRICE") $product_data["price"] = $data;
   if($xml_current_tag_state == "PRODUCT_WHERE") $product_data["where"] = $data;
   if($xml_current_tag_state == "PRODUCT_PROS") $product_data["pros"] = $data;
   if($xml_current_tag_state == "PRODUCT_CONS") $product_data["cons"] = $data;
   
   if($xml_current_tag_state == "TITLE") $review_data[$review_counter]["title"] = $data;
   if($xml_current_tag_state == "RATING") $review_data[$review_counter]["rating"] = $data;
   if($xml_current_tag_state == "BODY") $review_data[$review_counter]["body"] = $data;   
   if($xml_current_tag_state == "REVIEWER") $review_data[$review_counter]["reviewer"] = $data;
   if($xml_current_tag_state == "DATE") $review_data[$review_counter]["date"] = $data;   
}

$fp = @fopen("reviews/".$product.".xml","r");

$review_counter = 0;
$review_data = array();
$product_data = array();
$xml_current_tag_state = '';

if(!($xml_parser = xml_parser_create()))
      die("Couldn't create xml parser!");

xml_set_element_handler($xml_parser,"startElementHandler","endElementHandler");
xml_set_character_data_handler($xml_parser,"characterDataHandler");

while($data = fread($fp, 4096))
{
   if(!xml_parse($xml_parser, $data, feof($fp)))
   {
      break;
   }
}

xml_parser_free($xml_parser);


This is all based on an example in the book "Beginning PHP4"

I realise that HTML tags won't have the same meaning within the XML and I can see that including them will just confuse the PHP functions that are parsing them. Understanding why the HTML special characters should cause it to break, though, is not quite so easy.

Maybe I should take a look at a few more examples :D but any ideas welcome.
Robert Wellock
WebHelper
Joined: 18 Jan 2002
Posts: 61
Location: Yorkshire - UK

Posted: Mon Jun 02, 2003 8:54 am

Have you tried the alternatives, for example: &#x95; &#xA0; &#x20; or &#xD;
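
A note on the references listed above: &#x95; is the same code point as &#149; (hex 95 is decimal 149), and in Unicode that position is a control character rather than a bullet. The bullet is U+2022, written &#8226; or &#x2022;, while &#xA0; is the no-break space. A small sketch of how such references come out of PHP's XML parser, assuming UTF-8 output; parse_text() is a hypothetical helper name, not something from the thread:

```php
<?php
// Hypothetical helper: collects all character data of a tiny document.
function parse_text(string $xml): string
{
    $text = '';
    $parser = xml_parser_create();
    // Ask for UTF-8 explicitly so the result does not depend on defaults.
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
    xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
        $text .= $data; // append: the handler may fire several times
    });
    xml_parse($parser, $xml, true);
    return $text;
}

$text = parse_text('<pros>&#8226; 100% paper&#xA0;based</pros>');
echo bin2hex($text), "\n"; // the UTF-8 bytes of the decoded text
```

The page that finally displays the text needs to declare the same encoding, otherwise the multi-byte characters will look garbled in the browser.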

A special attribute named xml:space may be attached to an element to signal an intention that in that element, white space should be preserved by applications.

<!ATTLIST element xml:space (default|preserve) 'defaultchoice'>

<!ATTLIST pre xml:space (preserve) #FIXED 'preserve'>

________________________________
};-) http://www.xhtmlcoder.com/
Darren
Posted: Mon Jun 02, 2003 11:37 am

Robert Wellock wrote:
Have you tried the alternatives, for example: &#x95; &#xA0; &#x20; or &#xD;
They just don't appear, and neither does any normal text before them :(

Robert Wellock wrote:
A special attribute named xml:space may be attached to an element to signal an intention that in that element, white space should be preserved by applications.

<!ATTLIST element xml:space (default|preserve) 'defaultchoice'>

<!ATTLIST pre xml:space (preserve) #FIXED 'preserve'>

Assuming I implemented this in the correct way it has no effect. And again anything before a new line gets ignored.

Code:
<!ELEMENT details (product_type, product_name, product_description, product_image, product_rating, product_where, product_pros, product_cons)>
<!ELEMENT product_type (#PCDATA)>
<!ELEMENT product_name (#PCDATA)>
<!ELEMENT product_description (#PCDATA)>
<!ATTLIST product_description xml:space (default|preserve) 'preserve'>
<!ELEMENT product_image (#PCDATA)>
<!ELEMENT product_rating (#PCDATA)>
<!ELEMENT product_where (#PCDATA)>
<!ELEMENT product_pros (#PCDATA)>
<!ELEMENT product_cons (#PCDATA)>
<!ELEMENT review (title, rating, body, reviewer, date)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT rating (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ELEMENT reviewer (#PCDATA)>
<!ELEMENT date (#PCDATA)>
Darren
Posted: Mon Jun 02, 2003 2:20 pm

I've come to the conclusion that my problems are occurring because I'm putting the data into an array.

If I echo the data straight out of my characterDataHandler() function I can see the special characters are working and white space is being retained.
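
One likely explanation for this, offered as a sketch rather than something confirmed in the thread: the parser is allowed to deliver the text of a single element in several calls to the character data handler, and it often splits at newlines and character references. A handler that assigns with = keeps only the last piece, whereas echo prints every piece. The hypothetical body_text() below compares the two behaviours:

```php
<?php
// Parses one <body> element, either overwriting on every callback
// (as in the handler posted earlier) or appending each chunk.
function body_text(string $xml, bool $append): string
{
    $body = '';
    $parser = xml_parser_create();
    xml_set_character_data_handler($parser, function ($p, $data) use (&$body, $append) {
        if ($append) {
            $body .= $data; // keep every chunk
        } else {
            $body = $data;  // keep only the most recent chunk
        }
    });
    xml_parse($parser, $xml, true);
    return $body;
}

$xml = "<body>First line\nSecond &#8226; line</body>";
var_dump(body_text($xml, true));  // the full text
var_dump(body_text($xml, false)); // whatever the final callback delivered
```

Where exactly the text is split depends on the underlying parser library, so the second result can vary between installations. Appending with .= gives the same result either way.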
verto
Posted: Tue Jun 03, 2003 5:03 pm

Darren wrote:
I've come to the conclusion that my problems are occurring because I'm putting the data into an array.

If I echo the data straight out of my characterDataHandler() function I can see the special characters are working and white space is being retained.

So did you fix the special chars, or are they still a problem?

The easiest way to get newlines in the code you've shown is probably to define your own <br> tag. In XML you can name it pretty much anything, but I recommend a more distinctive name, like <xBR/>, to remind you that it's not really HTML. The idea is to get something in the XML to trigger an event handler in the PHP. An empty element carries no character data, so it has to be caught in startElementHandler(), and the assignment in characterDataHandler() has to become an append so that the text on either side of the tag is kept. For example, if you want newlines inside your <body> element:
Code:
    // add to startElementHandler(), before the tag state is set
    // (the parser upper-cases element names by default):
   global $review_data, $review_counter;
   if($element_name == "XBR") { $review_data[$review_counter]["body"] .= "<br />"; return; }
    // add to endElementHandler(), before the tag state is cleared:
   if($element_name == "XBR") return;
    // modify this old line in characterDataHandler() to:
   if($xml_current_tag_state == "BODY") $review_data[$review_counter]["body"] .= $data;

Note the changed assignment operator: .= (append) instead of =. If you're validating your XML, you'll need to add to and modify the element definitions in the DTD. You can do the same with hyperlinks, defining your own tags to hold the info needed to build links. I'd recommend a separate output function to keep the code cleaner and modularized. You can also parse attributes for a link tag, but that's probably too much trouble if you're just toying around.

That's the general idea, and be aware that I didn't test the code above for mistakes.

CODE RED DISCLAIMER: All code snimpets above are unapproved by the FDA, NRA, IEEE, W3C, or Microsoft, are TOTALLY untested, and are not deemed fit for CPU consumption.
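
For anyone who wants to try the custom-tag idea in isolation, here is a self-contained sketch. It is not the code from the book or from the posts above: body_to_html() and the <xBR/> element are illustrative names, and it relies on the parser's default case folding, which upper-cases element names.

```php
<?php
// Sketch: collect <body> text and turn a custom empty <xBR/> element
// into "<br />". Element names arrive upper-cased by default.
function body_to_html(string $xml): string
{
    $html = '';
    $inBody = false;
    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attribs) use (&$html, &$inBody) {
            if ($name === 'BODY') {
                $inBody = true;
            } elseif ($name === 'XBR' && $inBody) {
                $html .= '<br />'; // the empty tag itself triggers the output
            }
        },
        function ($p, $name) use (&$inBody) {
            if ($name === 'BODY') {
                $inBody = false;
            }
        }
    );
    xml_set_character_data_handler($parser, function ($p, $data) use (&$html, &$inBody) {
        if ($inBody) {
            // Append every chunk, escaped for HTML output.
            $html .= htmlspecialchars($data, ENT_QUOTES, 'UTF-8');
        }
    });
    xml_parse($parser, $xml, true);
    return $html;
}

echo body_to_html('<review><body>Line one.<xBR/>Line two.</body></review>');
```

Tracking "inside body" with a flag, rather than remembering only the current tag name, means the text after the <xBR/> is still recognised as belonging to the body.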



Last edited by verto on Sat Jun 07, 2003 9:45 pm, edited 1 time in total
Darren
Posted: Tue Jun 03, 2003 6:12 pm

Cheers Verto, I will give all this a try when I get a chance.

I hadn't managed to fix the problem with the special characters or the white space. As I said, it seemed to be the process of assigning the data to the array (or any variable). When I tried echoing it instead, the characters worked and white space was retained. In fact most of the subsequent examples I have found online and in other books tend to work this way anyway - outputting the display directly from the 3 functions. Unfortunately my layout is a little too complex for this and I'd rather keep it separate from the functions anyway.

I'd also much rather keep the data in the XML as 'clean' as possible without filling it with tags telling it how to display - surely that's the whole point of XML. :D

In terms of adding stuff to the DTD, it would appear that I have to first instruct Expat to use it, as I believe it ignores it by default - can't remember where I read that. At the moment it makes no difference whether it's there or not :(
verto
Posted: Tue Jun 03, 2003 7:26 pm

Darren wrote:
Cheers Verto, I will give all this a try when I get a chance.

I hadn't managed to fix the problem with the special characters or the white space. As I said, it seemed to be the process of assigning the data to the array (or any variable). When I tried echoing it instead, the characters worked and white space was retained. In fact most of the subsequent examples I have found online and in other books tend to work this way anyway - outputting the display directly from the 3 functions. Unfortunately my layout is a little too complex for this and I'd rather keep it separate from the functions anyway.

So was the code you posted a simplified version of what you're actually using then?

Quote:
I'd also much rather keep the data in the XML as 'clean' as possible without filling it with tags telling it how to display - surely that's the whole point of XML. :D

Are you referring to the white space and special chars here, or to the <BR> tags, or to all of them?

Quote:
In terms of adding stuff to the DTD, it would appear that I have to first instruct Expat to use it, as I believe it ignores it by default - can't remember where I read that. At the moment it makes no difference whether it's there or not :(

You don't need to do that for the code I gave if it works correctly. I mentioned that mainly because I assumed that since you WERE already using a DTD, you may've had it there for some useful purpose. Was that just the DTD from the book you mentioned?

Not sure about your purpose for this page, but even if you're just doing it for practice or to learn XML, it can generally be a good idea to validate your XML against the DTD, much as you'd validate your Web pages. However, you don't really need expat to do that, as there're several decent online validators readily available.
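
If an offline check is preferred over an online validator, PHP's DOM extension can validate a document against its own DOCTYPE. A minimal sketch, assuming the DOM extension is enabled; is_valid_xml() is a hypothetical helper and the tiny DTD is made up for the example:

```php
<?php
// Returns true when the document is valid against its own DOCTYPE.
function is_valid_xml(string $xml): bool
{
    libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return false; // not even well-formed
    }
    return $doc->validate();
}

// Made-up DTD for illustration only.
$dtd = "<!DOCTYPE review [\n"
     . "  <!ELEMENT review (title, body)>\n"
     . "  <!ELEMENT title (#PCDATA)>\n"
     . "  <!ELEMENT body (#PCDATA)>\n"
     . "]>\n";

var_dump(is_valid_xml($dtd . '<review><title>T</title><body>B</body></review>'));
var_dump(is_valid_xml($dtd . '<review><title>T</title><price>1</price></review>'));
```

Run against the file posted earlier in the thread, a check like this would probably report it as invalid, since product_review and product_price are used in the document but never declared in its DTD.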
