Re: [Ann] CintaNotes 2.0.2 Beta 1

Posted: Mon Jan 14, 2013 12:33 pm
by Thomas Lohrum
CintaNotes Developer wrote:
Thomas wrote:The problem is with the Link-field, not the Tag-field!
It's essentially the same problem.
It's the difference between:
SELECT * FROM Notes WHERE text MATCH "engine google" OR link MATCH "engine google"
and
SELECT * FROM Notes WHERE (text MATCH "engine" OR link MATCH "engine") AND (text MATCH "google" OR link MATCH "google").

It is obvious that the second query will be much slower than the first.

If I want speed over hits, I can select title+text. If I go for "Anywhere", I want maximum hits. The help says, "In the Anywhere-mode, (...), CintaNotes includes all fields into the search." My expectation as a user is that in this case the search rules always behave identically. That is, a search for "word1 word2" should give me all notes containing both words "anywhere" in the note, regardless of whether they are in the title, text or link!

As for speed, how about the following suggestion: when storing the note in the database, append the link to the note's text, using an internal syntax to separate the two. The search query can then be executed in the best way in terms of both hits and speed. When reading the note for editing or displaying, the link value must be stripped from the note's text. The appended link value is redundant and serves search only. The use and storage of the link field itself are not affected.
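A minimal sketch of the append-and-strip idea, assuming a separator character that never occurs in note text. The names `SEPARATOR`, `pack` and `unpack` are hypothetical and are not part of CintaNotes:

```python
# Hypothetical separator: the ASCII "unit separator" control character,
# assumed never to appear in a note's text or link.
SEPARATOR = "\x1f"


def pack(text, link):
    """Build the searchable value: note text with the link appended."""
    return f"{text}{SEPARATOR}{link}" if link else text


def unpack(stored):
    """Split a stored value back into (text, link) for editing or display."""
    text, _, link = stored.partition(SEPARATOR)
    return text, link
```

The link column itself would still be stored separately; the packed value is only a redundant copy for searching, so it would have to be rebuilt whenever the text or the link changes.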

Thomas

Re: [Ann] CintaNotes 2.0.2 Beta 1

Posted: Mon Jan 14, 2013 12:59 pm
by CintaNotes Developer
Thomas Lohrum wrote:If I want speed over hits, I can select title+text. If I go for "Anywhere", I want maximum hits. The help says, "In the Anywhere-mode, (...), CintaNotes includes all fields into the search." My expectation as a user is that in this case the search rules always behave identically. That is, a search for "word1 word2" should give me all notes containing both words "anywhere" in the note, regardless of whether they are in the title, text or link!

I understand. However, please consider that there are 6 fields overall: title, text, tags, link, time created and time modified. They all need to be combined using the OR operator, which gives us, taking N as the number of words in the query, 6*N MATCH clauses. This, unfortunately, will get very slow very quickly!
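To make the clause count concrete, here is a rough sketch that builds the cross-field query in the same notation as the two example queries earlier in the thread. The column names and the `build_anywhere_query` helper are assumptions for illustration; the real CintaNotes schema and query code may look quite different:

```python
# Hypothetical column names for the six fields listed above.
FIELDS = ["title", "text", "tags", "link", "created", "modified"]


def build_anywhere_query(search, fields=FIELDS):
    """Return (sql, params) where every word must match in at least one field."""
    groups = []
    params = []
    for word in search.split():
        # One OR-group per word, with one MATCH clause per field.
        clauses = " OR ".join(f"{field} MATCH ?" for field in fields)
        groups.append(f"({clauses})")
        params.extend([word] * len(fields))
    # An empty search matches everything.
    where = " AND ".join(groups) if groups else "1"
    return f"SELECT * FROM Notes WHERE {where}", params
```

For a two-word search over six fields this produces 12 MATCH clauses, and every extra word adds six more. Passing a shorter `fields` list reduces the count proportionally.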

Thomas Lohrum wrote:As for speed, how about the following suggestion: when storing the note in the database, append the link to the note's text, using an internal syntax to separate the two. The search query can then be executed in the best way in terms of both hits and speed. When reading the note for editing or displaying, the link value must be stripped from the note's text. The appended link value is redundant and serves search only. The use and storage of the link field itself are not affected.

Yes, this is a solution. Unfortunately, it horribly violates an important law of good database design, namely the very first normal form: "column values should be atomic, no column should contain multiple values". In practical terms, it means that many database operations on these columns, such as sorting by link, will be severely hampered.

To summarize, I totally accept the usefulness of your request and suggest the following:
1) Add a special search mode "Search across field boundaries" which would apply F*N matching (F fields times N words) to the SQL query.
2) Use only 4 fields for this matching: title, text, tags (stringized), and link. There's no real need to search "engine google 10.12.2012". Or is there?.. :)

Also, an altogether different idea: use some hints for the parser, like: engine title:google link:cintanotes.com (somewhat similar to Google search syntax).
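A small sketch of how such parser hints could be split off from the plain words. The set of recognized prefixes and the `parse_query` name are assumptions; nothing here reflects an actual CintaNotes parser:

```python
# Hypothetical set of field prefixes the parser would recognize.
KNOWN_FIELDS = {"title", "text", "tags", "link"}


def parse_query(search):
    """Split a search string into (field, word) pairs.

    field is None for words without a recognized "field:" prefix,
    meaning the word may match in any field.
    """
    terms = []
    for token in search.split():
        prefix, sep, rest = token.partition(":")
        if sep and rest and prefix.lower() in KNOWN_FIELDS:
            terms.append((prefix.lower(), rest))
        else:
            # Unknown prefixes (e.g. "http://...") stay ordinary words.
            terms.append((None, token))
    return terms
```

With this, `engine title:google link:cintanotes.com` yields one unrestricted word plus one word pinned to the title and one pinned to the link, so only the unrestricted words need the expensive all-fields matching.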

Re: [Ann] CintaNotes 2.0.2 Beta 1

Posted: Mon Jan 14, 2013 1:15 pm
by ChrisCN
Thomas, I agree: there are times when it is necessary to find that one note you are searching for, regardless of speed.
Alex, I agree: I would not start messing up the database design, as this will come back to haunt you.

As far as I have heard (although not an expert) there are still possible ways to accomplish the task.
What about building up some additional look-up tables for the search which could be optimized for this task?
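One possible shape for such a look-up table is an inverted index from words to note IDs, built over all fields at once. The sketch below keeps it in memory purely for illustration; the data layout and the names `build_index` and `search` are assumptions, and whole-word matching on whitespace is a simplification:

```python
from collections import defaultdict


def build_index(notes):
    """Map each lowercase word to the set of note IDs containing it in any field.

    notes is assumed to be {note_id: {field_name: field_value}}.
    """
    index = defaultdict(set)
    for note_id, fields in notes.items():
        for value in fields.values():
            for word in value.lower().split():
                index[word].add(note_id)
    return index


def search(index, query):
    """Return IDs of notes containing every query word, in whatever field."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Because the index ignores field boundaries, "word1 word2" finds a note with one word in the text and the other in the link using one lookup per word. The price is the extra storage and the need to update the index on every note change.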

Re: [Ann] CintaNotes 2.0.2 Beta 1

Posted: Mon Jan 14, 2013 2:29 pm
by Thomas Lohrum
CintaNotes Developer wrote:
Thomas Lohrum wrote:If I want speed over hits, I can select title+text. If I go for "Anywhere", I want maximum hits. The help says, "In the Anywhere-mode, (...), CintaNotes includes all fields into the search." My expectation as a user is that in this case the search rules always behave identically. That is, a search for "word1 word2" should give me all notes containing both words "anywhere" in the note, regardless of whether they are in the title, text or link!

I understand. However, please consider that there are 6 fields overall: title, text, tags, link, time created and time modified. They all need to be combined using the OR operator, which gives us, taking N as the number of words in the query, 6*N MATCH clauses. This, unfortunately, will get very slow very quickly!

Nevertheless, imo the current design is not correct. It does not give me the results I expect. It also conflicts with the help.

CintaNotes Developer wrote:
Thomas Lohrum wrote:As for speed, how about the following suggestion: when storing the note in the database, append the link to the note's text, using an internal syntax to separate the two. The search query can then be executed in the best way in terms of both hits and speed. When reading the note for editing or displaying, the link value must be stripped from the note's text. The appended link value is redundant and serves search only. The use and storage of the link field itself are not affected.

Yes, this is a solution. Unfortunately, it horribly violates an important law of good database design, namely the very first normal form: "column values should be atomic, no column should contain multiple values". In practical terms, it means that many database operations on these columns, such as sorting by link, will be severely hampered.

I am aware of the "very first normal form" :) Still, CN cannot be compared to a CRM system, where a proper database design is much more important. As for the database operations: my suggestion is to use the largest field (note text) and append the short fields internally. The design of the database does not need to change; that is, the short fields (title, link, tags (?)) remain separate fields.

CintaNotes Developer wrote:To summarize, I totally accept the usefulness of your request and suggest the following:
1) Add a special search mode "Search across field boundaries" which would apply F*N matching (F fields times N words) to the SQL query.
2) Use only 4 fields for this matching: title, text, tags (stringized), and link. There's no real need to search "engine google 10.12.2012". Or is there?.. :)

Thank you for considering my suggestions. I guess it is a theoretical case to select all notes containing some text and having a modification date before some given date. If I really need this, I can search for the word and scroll down the list to find the given date. Imo, title, text + link is a must. I'd also be pleased to see the tags field included. However, stringizing the field also violates 1NF of database design ;) I think a join should be used for tags, even if this is slower/the slowest.

Limiting the search to the four fields as you described makes sense to me. Having one more option, "Search across field boundaries", complicates things a little, but I agree it is a good compromise between speed and hits.

There is also another suggestion regarding the offered field selection. Right now we have title+text, text only, title only, link only, etc. What about having each field as a checkable selection, so that fields can be combined, e.g. [X] title [X] link? What do you think about this? I am considering making this a suggestion on the roadmap.

Thomas

Re: [Ann] CintaNotes 2.0.2 Beta 1

Posted: Tue Jan 15, 2013 11:33 pm
by jkoerner
CintaNotes Developer wrote:Alex
I wrote a lengthy reply on the subject of how to notify users about very bad bugs, but you did not respond to it.
In the 201 final topic.
Jon

Re: [Ann] CintaNotes 2.0.2 Beta 1

Posted: Thu Jan 17, 2013 4:38 pm
by CintaNotes Developer
jkoerner wrote:
CintaNotes Developer wrote:Alex
I wrote a lengthy reply on the subject of how to notify users about very bad bugs, but you did not respond to it.
In the 201 final topic.
Jon


Jon, I didn't forget about it; I've just taken some time to think the issue over. Will reply soon!

Re: [Ann] CintaNotes 2.0.2 Beta 1

Posted: Thu Jan 17, 2013 4:41 pm
by CintaNotes Developer
ChrisCN wrote:As far as I have heard (although not an expert) there are still possible ways to accomplish the task.
What about building up some additional look-up tables for the search which could be optimized for this task?


Good idea, but many people already complain that the .db file is bigger than the exported XML. Imagine what would be said if it became even 10% bigger :)

Re: [Ann] CintaNotes 2.0.2 Beta 1

Posted: Thu Jan 17, 2013 4:57 pm
by CintaNotes Developer
Thomas Lohrum wrote:Nevertheless, imo the current design is not correct. It does not give me the results I expect. It also conflicts with the help.

I agree. However, there must be a way to specify that all search terms must be in the same field.

Thomas wrote:I am aware of the "very first normal form" :) Still, CN cannot be compared to a CRM system, where a proper database design is much more important. As for the database operations: my suggestion is to use the largest field (note text) and append the short fields internally. The design of the database does not need to change; that is, the short fields (title, link, tags (?)) remain separate fields.

Well, the instant search is one of the main wow-factors of CN, and I'd really hate to lose it. However, duplicating content is also not a solution. Just think: all this cached data would then need to be kept in sync.
I say we test the naive implementation first to see just how bad it is, and then try to think of ways to optimize.

Thomas wrote:Thank you for considering my suggestions. I guess it is a theoretical case to select all notes containing some text and having a modification date before some given date. If I really need this, I can search for the word and scroll down the list to find the given date. Imo, title, text + link is a must. I'd also be pleased to see the tags field included. However, stringizing the field also violates 1NF of database design ;) I think a join should be used for tags, even if this is slower/the slowest.

Don't worry, I meant stringizing just as a metaphor :) Real tags will remain in the Tags table :)

Thomas wrote:Limiting the search to the four fields as you described makes sense to me. Having one more option, "Search across field boundaries", complicates things a little, but I agree it is a good compromise between speed and hits.

Good! Hopefully it won't complicate the UI too much. But it is conceptually similar to "Search inside words", so I hope that the two options will be seen just as search fine-tuning.

Thomas wrote:There is also another suggestion regarding the offered field selection. Right now we have title+text, text only, title only, link only, etc. What about having each field as a checkable selection, so that fields can be combined, e.g. [X] title [X] link? What do you think about this? I am considering making this a suggestion on the roadmap.

For me it would be too cumbersome, because anytime you'd want to switch to another field, you'd have to also deactivate the old field.

Re: [Ann] CintaNotes 2.0.2 Beta 1

Posted: Thu Jan 17, 2013 9:20 pm
by Thomas Lohrum
CintaNotes Developer wrote:I say we test the naive implementation first to see just how bad it is, and then try to think of ways to optimize.

OK :)

CintaNotes Developer wrote:
Thomas wrote:There is also another suggestion regarding the offered field selection. Right now we have title+text, text only, title only, link only, etc. What about having each field as a checkable selection, so that fields can be combined, e.g. [X] title [X] link? What do you think about this? I am considering making this a suggestion on the roadmap.
For me it would be too cumbersome, because anytime you'd want to switch to another field, you'd have to also deactivate the old field.

You're probably right. So we should leave the existing options, but enhance "Anywhere" the way discussed, e.g. "Search across field boundaries".

Re: [Ann] CintaNotes 2.0.2 Beta 1

Posted: Fri Jan 18, 2013 10:20 am
by CintaNotes Developer
Thomas Lohrum wrote:You're probably right. So we should leave the existing options, but enhance "Anywhere" the way discussed, e.g. "Search across field boundaries".
Agreed 8-)