How do I make Emacs export Org files with org-publish using a file name encoding that web servers can read?
-
Hello, guys!
I'm in the process of moving my notes from Joplin, which is also a great tool, to Emacs 30.1. I use denote for managing notes.
I found some strange behavior when using org-publish: almost every note I create and export with org-publish can't be read by the web server. It happens when the file name contains Cyrillic letters. I've tried nginx, Apache, python http.server, and web-static-server. When I run a server and open an HTML file whose name uses only Latin letters, it's OK, but when there are Cyrillic letters in the file name, the web server tells me it can't find a file with a name like "%u...". However, when I open the HTML files locally with Firefox, everything works just fine.
After a couple of days of research I found that one reason for this behavior could be the wrong file name encoding. Since I'm not an expert, maybe somebody can explain how to make Emacs publish notes with org-publish in an encoding that any web server can read?
My Emacs config contains:
(setq org-publish-project-alist
      '(("notes"
         :base-directory "~/org/denotes/"
         :recursive nil
         :publishing-directory "~/public_notes"
         :section-numbers nil
         :with-toc nil
         :with-author nil
         :with-creator nil
         :with-date nil
         :html-preamble "<nav><a href='index.html'>Notes</a></nav>"
         :html-postamble nil
         :auto-sitemap t
         :sitemap-filename "index.org"
         :sitemap-title "Notes"
         :sitemap-sort-files anti-chronologically)))
The host is Debian 13. UTF-8 is the only encoding enabled in the locales. The servers I've tried so far also run on Debian 13 with UTF-8.
-
This sounds like you should check the httpd output for the right application type headers and adjust the server config if you have to.
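If you want to look at those headers from inside Emacs rather than with curl or telnet, a quick sketch like the one below prints the raw response; the localhost URL is just a placeholder for wherever your test server is listening.
(require 'url)

;; Fetch a published page and show the raw status line and headers.
;; The URL below is a placeholder; point it at your own test server.
(with-current-buffer
    (url-retrieve-synchronously "http://localhost:8000/index.html")
  (goto-char (point-min))
  ;; In the response buffer, everything before the first blank line
  ;; is the status line plus the headers.
  (message "%s"
           (buffer-substring (point-min)
                             (progn (search-forward "\n\n" nil t)
                                    (point)))))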
-
Could you please provide a bit more detail about your suggestion? I don't understand which headers I need to fix to make everything work. For example, my nginx config, which is pretty much the default, contains:
charset utf-8;
When I curl a page I see:
Content-Type: text/html; charset=utf-8
-
That looks like the right content type. Can you use browser tools or telnet to see that the header is really being sent?
-
URIs can only contain ASCII characters, so the web server receives the requested URLs in percent-encoded form, not in UTF-8, and it has no way of knowing which file to respond with. Unfortunately, you'll have to URL-encode the file names yourself so that they match the incoming requests. The tool jq can URL-encode Cyrillic characters:
echo "људиа" | jq -rR @uri
You could probably do this as part of the build process if you are clever enough.
This only applies to the file name itself; the exported document should share the source document's encoding unless overridden by the org-export-coding-system option.
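If you ever do want to force it, setting that option explicitly is a one-liner (shown only for completeness; by default the export should follow the source file, as noted above):
;; Force UTF-8 for all exported files instead of inheriting the
;; source buffer's coding system.
(setq org-export-coding-system 'utf-8)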
-
One more note on this: while some searching did turn up web servers that can decode URIs into UTF-8 before handling them, I believe this is very unsafe for a public server and, in the worst case, could allow public access to your entire drive. There are vulnerabilities here because different systems, and even different services on a single system, can treat specific Unicode characters differently. My advice above, to URL-encode the file names while building or before serving them, avoids the need for any decoding of requests as they come in.
-
That looks like the right content type. Can you use browser tools or telnet to see that the header is really being sent?
Yeah, I see exactly the same type in the headers.
-
Unfortunately, you'll have to URL-encode the file names yourself so that they match the incoming requests.
Thank you, I'll dig into that.