agnos.is Forums


How to make Emacs org-publish export files with a file name encoding that web servers can read?

    #1 [email protected] wrote:

    Hello, guys!

    I'm in the process of moving my notes from Joplin (also a great tool) to Emacs 30.1. I use Denote to manage my notes.

    I've run into strange behavior with org-publish: almost every note I export with it can't be served by a web server. It happens when the file name contains Cyrillic letters. I've tried nginx, Apache, Python's http.server and web-static-server. When I run a server and open an HTML file with a Latin name, it works, but when the file name contains Cyrillic letters, the web server tells me it can't find a file with a name like "%u...". However, when I open the HTML files locally in Firefox, everything works just fine.

    After a couple of days of research, I found that one possible reason for this behavior is a wrong file name encoding. Since I'm not an expert, maybe somebody can explain how to make org-publish export notes with a file name encoding that any web server can read?

    My Emacs config contains:

    ;; Publish ~/org/denotes/*.org as HTML into ~/public_notes
    (setq org-publish-project-alist
          '(("notes"
             :base-directory "~/org/denotes/"
             :recursive nil
             :publishing-directory "~/public_notes"
             :section-numbers nil
             :with-toc nil
             :with-author nil
             :with-creator nil
             :with-date nil
             :html-preamble "<nav><a href='index.html'>Notes</a></nav>"
             :html-postamble nil
             :auto-sitemap t
             :sitemap-filename "index.org"
             :sitemap-title "Notes"
             :sitemap-sort-files anti-chronologically)))

    The host is Debian 13, and UTF-8 is the only encoding enabled in its locales. The servers I've tried so far also run on Debian 13 with UTF-8.

      #2 [email protected] wrote:

      This sounds like you should check the httpd output for the right Content-Type headers and adjust the server config if you have to.

        #3 [email protected] wrote:

        Could you please provide a bit more detail about your suggestion? I don't understand which headers I need to fix to make everything work. For example, my nginx config, which is pretty much the default, contains:

            charset utf-8;
        

        When I curl a page I see:

        Content-Type: text/html; charset=utf-8
        
          #4 [email protected] wrote:

          That looks like the right content type. Can you use browser tools or telnet to see that the header is really being sent?
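The header check can also be done from a terminal: `curl -D -` prints the response headers, much like reading them over telnet. The sketch below is an illustration under assumptions (python3 and curl installed, port 8765 free, and a made-up note name `заметка.html`): it serves a temporary directory with Python's http.server, one of the servers tried above, and requests the Cyrillic file name by its percent-encoded UTF-8 path, the form a browser sends.

```shell
#!/usr/bin/env bash
# Sketch: serve a throwaway directory with Python's http.server and print the
# response headers for a Cyrillic file name requested by its percent-encoded
# UTF-8 path. The file name and port 8765 are hypothetical placeholders.
set -euo pipefail

dir=$(mktemp -d)
printf '<p>ok</p>\n' > "$dir/заметка.html"
port=8765

# Run the server in the background; stop it and clean up when the script exits.
python3 -m http.server "$port" --bind 127.0.0.1 --directory "$dir" \
  > /dev/null 2>&1 &
server=$!
trap 'kill "$server" 2> /dev/null; rm -rf "$dir"' EXIT

# "заметка.html" written as percent-encoded UTF-8 bytes, as a browser sends it.
path='%D0%B7%D0%B0%D0%BC%D0%B5%D1%82%D0%BA%D0%B0.html'

# Wait (up to about 5 seconds) until the server accepts connections.
for _ in $(seq 1 50); do
  curl -s -o /dev/null "http://127.0.0.1:$port/" && break
  sleep 0.1
done

# -D - dumps the response headers to stdout; the body is discarded.
curl -sS -o /dev/null -D - "http://127.0.0.1:$port/$path"
```

Besides the Content-type header, the status line tells you whether the server decoded the percent-escapes before looking up the file: with Python's http.server it should report `200 OK`, while a 404 would mean the encoded path was not mapped back to the UTF-8 name. Pointing the last `curl` at your own server and one of the failing note URLs gives the same comparison.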

            #5 [email protected] wrote:

            URIs can only contain ASCII characters, so the web server receives requests for URLs in percent-encoded form, not as raw UTF-8, and so there is no way for the server to know which file to respond with. Unfortunately, you'll have to URL-encode the file names yourself so that they match the incoming requests. The jq tool can URL-encode Cyrillic characters:

            echo "људиа" | jq -rR @uri
            

            You could probably do this as part of the build process if you are clever enough.

            This is only for the file name itself; the exported document should share the source document's encoding unless overridden by the org-export-coding-system option.
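The build step suggested above could look roughly like this. It is a sketch, not a tested recipe: `urlencode` and `link_encoded_names` are hypothetical helper names, and the encoding is done with `od` instead of jq so no extra package is needed (for Cyrillic names it should give the same `%XX` bytes as `jq -rR @uri`). Rather than renaming files, it adds a symlink with the encoded name next to each affected file, so the original names stay in place. It only scans the top level of the directory, matching `:recursive nil` in the config above. Whether this is needed at all depends on the server: many servers, including nginx and Python's http.server, decode percent-escapes themselves, so check that first.

```shell
#!/usr/bin/env bash
# Sketch only: percent-encode published file names as a post-build step.
# urlencode and link_encoded_names are hypothetical helper names.

# Percent-encode a string byte by byte, keeping the RFC 3986 unreserved
# characters (A-Z a-z 0-9 - . _ ~). Uses od, so jq is not required.
urlencode() {
  local hex ch out=""
  for hex in $(printf '%s' "$1" | od -An -v -tx1); do
    case "$hex" in
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
        printf -v ch "\\x$hex"   # unreserved byte: keep the character
        out+=$ch ;;
      *)
        out+="%${hex^^}" ;;      # any other byte: %XX with uppercase hex
    esac
  done
  printf '%s\n' "$out"
}

# For each regular .html file in directory $1 whose encoded name differs,
# add a symlink named with the encoded form next to the original.
link_encoded_names() {
  local dir=$1 f name enc
  for f in "$dir"/*.html; do
    # Skip the literal pattern when nothing matches, and links from earlier runs.
    [ -f "$f" ] && [ ! -L "$f" ] || continue
    name=${f##*/}
    enc=$(urlencode "$name")
    if [ "$enc" != "$name" ] && [ ! -e "$dir/$enc" ]; then
      ln -s "$name" "$dir/$enc"
      printf '%s -> %s\n' "$enc" "$name"
    fi
  done
}

# Example: run after org-publish, using the publishing directory from the config.
# link_encoded_names ~/public_notes
```

For a published note named `заметка.html`, running `link_encoded_names ~/public_notes` prints `%D0%B7%D0%B0%D0%BC%D0%B5%D1%82%D0%BA%D0%B0.html -> заметка.html`; running it again creates nothing new, because existing links are skipped.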

              #6 [email protected] wrote:

              One more note: while some searching did turn up web servers that can decode URIs into UTF-8 before handling them, I believe this is very unsafe for a public server and, in the worst case, could allow public access to your entire drive. The vulnerabilities arise because different systems, and even different services on a single system, can treat specific Unicode characters differently. My advice above, to URL-encode the file names before serving or while building them, would avoid the need to decode requests as they come in.

                #7 [email protected] wrote:

                Yes, I see exactly the same type in the header.

                  #8 [email protected] wrote:

                  Thank you, I'll dig into that.
